Modify robots.txt to fix duplicate urls for Google crawler



Ivillages
05-02-2007, 04:21 PM
One of our clients has determined that with the URL rewrite module enabled, Google can find two URLs for the same page, like:

/index.php?a=2&b=8343
and
/index/listings/page8343.htm

This is not a good thing, since Google filters out pages that are reachable at duplicate URLs. We found a fix, which involves adding the following to your robots.txt file:

Disallow: /*?
Disallow: /*&
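As a rough illustration of what those two lines do, here is a simplified sketch of Google-style robots.txt wildcard matching, checked against the two example URLs above. It is hand-rolled because Python's standard urllib.robotparser does not understand "*" inside a path. The function names are my own, not from any library. Note also that in a real robots.txt file the Disallow lines only take effect inside a User-agent group, e.g. below a "User-agent: *" line.

```python
import re

def rule_to_regex(path_pattern: str) -> re.Pattern:
    """Translate a Google-style robots.txt path pattern into a regex.

    Simplified: '*' matches any run of characters, a trailing '$' anchors
    the end of the URL, everything else is literal and matched from the
    start of the path (query string included).
    """
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    body = ".*".join(re.escape(part) for part in path_pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(url_path: str, disallow_patterns: list[str]) -> bool:
    """True if any Disallow pattern matches the start of the path."""
    return any(rule_to_regex(p).match(url_path) for p in disallow_patterns)

# The two rules from the post: block any URL containing '?' or '&'.
DISALLOW = ["/*?", "/*&"]

for path in ["/index.php?a=2&b=8343", "/index/listings/page8343.htm"]:
    status = "blocked" if is_blocked(path, DISALLOW) else "crawlable"
    print(f"{path}: {status}")
# /index.php?a=2&b=8343: blocked
# /index/listings/page8343.htm: crawlable
```

Google also honours Allow lines and resolves conflicts in favour of the most specific (longest) matching rule; the sketch skips that because these rules contain no Allow lines.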

You can find more info on tweaking this at
http://www.google.com/support/webmasters/bin/answer.py?answer=40367

I thought y'all might find this pretty handy, since the term 'search friendly URL' can be deceiving when Google will actually omit that URL if there is a duplicate 'non-search-friendly' URL on your site!

Bob

Festus
05-03-2007, 06:01 PM
Hmm, that's interesting. Thank you for the tip. About once a month I check Google to see how many of my pages are indexed on my site as a whole (site:www.mysite.com) and in specific directories. I'll modify my robots.txt file and see what happens.

Ivillages
05-03-2007, 06:44 PM
You're welcome! Please post your findings for the benefit of others.

Festus
08-29-2007, 04:27 PM
I have the URL rewrite module enabled so I was eager to see if this code


Disallow: /*?
Disallow: /*&

would help me. I've tried it a couple of times now, and the number of my pages indexed by Google drops significantly whenever I use the code. I had been keeping notes about once a month on how many of my pages were being indexed. When I added the above code to my robots.txt, it took a little over a month, and then my indexed pages dropped sharply. So I took the code out of my robots.txt file, and about a month later my indexed pages went back up. That got me curious, so I put the code back in, and sure enough, the number of indexed pages went down again. I took the code out, and indexed pages are back up.

Most importantly, my actual page views go up when indexed pages go up (which seems logical, but I've been tracking my stats to make sure).

Seems to me like the code should be a good thing, so I don't understand why I get the results I do. Just thought I'd pass along my findings...

geomodules
08-29-2007, 06:26 PM
Read this. On May 29th I posted it in my knowledge base, complete with video:
http://geomodules.com/support/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=7&nav=0

EagleMark
07-03-2008, 12:29 AM
I'll have to try that, because NONE of my pages from this software have ever been listed in Google. All I get from the Google Webmaster area is a bunch of error pages, mostly duplicate pages found several different ways. Some pages can be found 10 times, so I'm guessing Google has banned my site. The only page that can be found is the plain HTML index page. I have given up on this software and on the ability to be found in search engines. For what this software costs, it should be findable in search engines without spending more money on add-ons...

danzo
07-03-2008, 01:15 AM
Which site is yours?
It can sometimes take months for Google to even start crawling your site, and then for your indexed pages to appear in search results, especially if no other site links to yours yet.

If you are linked from a highly regarded site, it should appear within weeks.


EagleMark
07-03-2008, 02:51 AM
The site has been up since Oct 2007.

I can make new sites with no links come up in the top five within weeks, sometimes in as little as days, and it only gets better from there. The exception is any page from this software.

I have worked on this site here, in support, on the web and with friends (other professionals), with no results. I have easily 100 hours into this issue and have lost $1,000s in advertising. I stopped putting time and money into this a long time ago.

I just happened to look for updates before my one-year support runs out, to see if this was fixed, and found another company fixing the issue and offering add-ons to fix it... I have also yet to see this software in any web searches, except the ones with another company'$ $oftware for more buck$!

Can you see why this makes me angry?

Refunds?

By the way I think the software otherwise is TOP NOTCH! :D

Parked
07-06-2008, 08:31 PM
..

jonyo
07-08-2008, 12:40 PM
Posts referencing a specific website were removed at the request of the site owner.


The gist of the removed posts: there are many sites that use Geo software and have thousands of pages indexed, all of them pages from the Geo software. The posts referenced one site in particular where googling "site:_________.com" returned more than 10 pages of results, all of them pages generated by the software, proving that the software's pages were being indexed by Google.

My input: make sure you have at least Geo 3.1, and it should work just fine with search engines, even better if you have the Geo SEO addon or one of the other excellent 3rd-party SEO addons.

Also, since this was tacked onto a thread about the robots.txt file: if you don't have the SEO addon installed, make sure you haven't done what the first post in this thread suggests. By following the first post's instructions without an SEO addon installed, you effectively block search engines from every part of your site except the front page.
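To make that last point concrete, here is a rough sketch using simplified Google-style wildcard matching (hand-rolled; the function names are my own). The URL list is hypothetical, modelled on the /index.php?a=2&b=8343 example from the first post, and assumes that without URL rewriting every page except the front page is served through a query string. Under that assumption, the two Disallow rules leave only "/" crawlable.

```python
import re

def rule_to_regex(path_pattern: str) -> re.Pattern:
    # Simplified Google-style matching: '*' = any characters, a trailing '$'
    # anchors the end of the URL, everything else is literal, matched from
    # the start of the path (query string included).
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    body = ".*".join(re.escape(part) for part in path_pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(url_path: str, disallow_patterns: list[str]) -> bool:
    return any(rule_to_regex(p).match(url_path) for p in disallow_patterns)

DISALLOW = ["/*?", "/*&"]

# Hypothetical URLs for a site WITHOUT URL rewriting or an SEO addon:
# every page except the front page goes through index.php with a query string.
site_paths = [
    "/",
    "/index.php?a=2&b=8343",
    "/index.php?a=5&b=12",
    "/index.php?a=19",
]

crawlable = [p for p in site_paths if not is_blocked(p, DISALLOW)]
print(crawlable)  # ['/']
```

Running the same check over a real list of your own site's URLs before editing robots.txt would show how much of the site the rules would hide.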