I am looking for a way to deindex all the URLs with the query string ?te= from Google. For example, I want to deindex every URL of the form https://example.com/?te= from Google.

Google has currently indexed 21k URLs with this query string, and I want them all deindexed. Should I use the X-Robots-Tag header to do so?

What are the possible solutions?

I have tried blocking them in robots.txt with the rule

Disallow: /*?te=

But it didn't help me out.

  • Is it specific to Google? You can go to the Google webmaster console and request de-indexing. This isn't a PHP issue. Commented Apr 28, 2023 at 12:30
  • @user3783243 It is specific to Google. I have already requested removal and some of the URLs were removed, but the removal is temporary; I want to remove them from Google permanently. I also don't want Google to index those pages ever again, because they consume a lot of bandwidth and the URL parameters are useless: each URL just redirects to an internal link on my website. Commented Apr 29, 2023 at 10:07
  • @StephenOstermiller Oh, the query parameter is /?te= Commented Apr 29, 2023 at 10:09

1 Answer

Your robots.txt solution would mostly work if you gave it enough time. Google usually stops indexing URLs it can't crawl. However, Google occasionally indexes such URLs based on external links, without indexing the contents of the page.

Using X-Robots-Tag is a much better idea. It will prevent Google from indexing the pages. You will need to remove your disallow rule from robots.txt or Googlebot won't be able to crawl your URLs and see the X-Robots-Tag. You'll also need to give Googlebot time to crawl all the pages. Some pages will start getting de-indexed in a few days, but it could take months for Googlebot to get through all of them.
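Before relying on the header, it can help to confirm that the Disallow rule really is gone from the live robots.txt. The following is only a rough shell sketch: the function name blocks_te is made up for illustration, and the grep pattern assumes the rule is written the way it is in the question (a Disallow line containing ?te=).

```shell
#!/bin/sh
# blocks_te (hypothetical helper): reads robots.txt content on stdin and
# exits 0 if any Disallow line still mentions a "?te=" pattern.
blocks_te() {
    grep -Eiq '^[[:space:]]*disallow:.*[?]te='
}

# Example use against the live file:
#   curl --silent "https://example.com/robots.txt" | blocks_te \
#       && echo "robots.txt still blocks ?te= URLs"
```

If the example command prints its message, Googlebot is still being told not to crawl those URLs and will not see the X-Robots-Tag header.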

If you are using Apache 2.4 or later, you can set the header in .htaccess using Apache's built-in expressions:

# Match te= only as a whole parameter name, so that e.g. ?site= is left alone
<If "%{QUERY_STRING} =~ /(^|&)te=/">
    Header set X-Robots-Tag "noindex"
</If>

If you are still on Apache 2.2 or earlier, you'll have to use a rewrite rule and environment variable to achieve the same effect:

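RewriteEngine On
# Flag requests whose query string has a te= parameter; "-" leaves the URL untouched
RewriteCond %{QUERY_STRING} (^|&)te=
RewriteRule ^ - [E=teinquery:1]
Header set X-Robots-Tag "noindex" env=teinquery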

I recommend using curl on the command line to test whether it is working.

curl --head "https://example.com/"

should NOT show a line that is X-Robots-Tag: noindex, but the following command should show it:

curl --head "https://example.com/?te=foo"
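With 21k URLs, checking them one at a time gets tedious. As a rough sketch, the same curl check can be wrapped in a small shell function. The names has_noindex and report are made up for illustration, and the sketch only looks at the first response curl receives (it does not follow redirects).

```shell
#!/bin/sh
# has_noindex (hypothetical helper): reads HTTP response headers on stdin and
# exits 0 if an X-Robots-Tag header containing "noindex" is present.
# The match is case-insensitive because HTTP/2 header names arrive in lowercase.
has_noindex() {
    grep -Eiq '^x-robots-tag:.*noindex'
}

# report (hypothetical helper): fetches only the headers of one URL and
# prints whether the noindex header was found.
report() {
    if curl --silent --head "$1" | has_noindex; then
        echo "noindex    $1"
    else
        echo "indexable  $1"
    fi
}

# Example use:
#   report "https://example.com/"
#   report "https://example.com/?te=foo"
```

Feeding report a sample of the indexed URLs gives a quick view of whether the rule is being applied everywhere you expect.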

8 Comments

Can you tell me how to find out which server I am using? My website is built on WordPress.
@mehtabfatima Your response headers might tell you what version you are running. (If so, you should change that, because it can tell malicious users that you are running software susceptible to certain vulnerabilities.)
@StephenOstermiller OK, can you clarify one thing: where should I add this code? Should it go in the functions.php file (under the theme editor) or in the .htaccess file?
You can try the Apache 2.4 configuration and see if it works. It will show a "500 internal server error" on Apache 2.2. If you are running your own server, see "How can I tell what version of apache I'm running?" If you are on shared hosting, ask your hosting provider.
As it says in my answer, this is .htaccess code. It is not for PHP.