SEOHigh.Com SEO/SEM Services

Toronto Search Engine Optimization (SEO) Search Engine Marketing (SEM)

 

 

You can list several Disallow lines under one User-agent. In the following example, all spiders are told not to crawl the cgi-bin and images directories.

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
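
To see how a spider might interpret these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The spider name "AnySpider" and the sample paths are made up for illustration; real crawlers may evaluate rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the example above.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "AnySpider" and these paths are hypothetical; the * record applies to every agent.
for path in ("/cgi-bin/form.pl", "/images/logo.gif", "/index.html"):
    allowed = parser.can_fetch("AnySpider", path)
    print(path, "allowed" if allowed else "blocked")
```

Both directories are blocked for every spider, while anything outside them stays open to crawling.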


We can also use the robots.txt file to help protect search engine rankings that we may have achieved with dynamic pages, such as PHP pages. Googlebot may have problems with these pages if their URLs carry session IDs or too many variables.


A URL with session IDs looks similar to this:
http://www.yourcoolsite.com/cat.php?par=887&show=subcats?=0431Tr
If your site is written in PHP and you convert it into HTML pages for Googlebot to index, the robot will still try to index the PHP pages. After copying the pages from PHP to HTML, keep the two sets apart and give any folders names that are easy for you to remember. Place all the PHP pages into a folder named "php".

This lets you leave the HTML pages under the root directory, where the spiders can index them easily. Then, using what you have learned so far, add the following to your robots.txt file:


User-agent: googlebot
Disallow: /php/


Now we have kept Googlebot out of the PHP pages, which the bot often has problems crawling. That leaves the spider to crawl the friendlier HTML pages, and it will not see your original content duplicated on your site between the PHP and HTML versions. If the pages are cleanly coded, this will often result in improved rankings. Note that this rule names only googlebot; to keep the spiders of the other major search engines out of the folder as well, use User-agent: * as in the earlier example.
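
As a rough check of what this rule does and does not cover, the sketch below feeds it to Python's standard urllib.robotparser. The agent name "OtherBot" and the file names cat.php and cat.html are assumptions chosen for illustration, and Python's parser is only one interpretation of the rules.

```python
from urllib.robotparser import RobotFileParser

# The rule from the example above: it names googlebot only.
rules = [
    "User-agent: googlebot",
    "Disallow: /php/",
]

parser = RobotFileParser()
parser.parse(rules)

# Hypothetical paths: a PHP page inside /php/ and its HTML copy under the root.
print(parser.can_fetch("Googlebot", "/php/cat.php?par=887"))  # False
print(parser.can_fetch("Googlebot", "/cat.html"))             # True
# "OtherBot" is a made-up agent; no record applies to it, so nothing is blocked.
print(parser.can_fetch("OtherBot", "/php/cat.php"))           # True
```

The last line shows why a record that names a single spider leaves the folder open to every other one.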


You can also use comments in your robots.txt file, but you need to be careful where you put them.
Disallow: /images/ #comment send googlebot away
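A comment at the end of a rule line is allowed by the robots.txt convention, but we could run into a problem if a spider does not strip it and instead tries to disallow "/images/ #comment send googlebot away", which is not a folder on the server. The rule would then fail to protect /images/, or the spider might misread the file in some other way.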


It is safer to put each comment on its own separate line. See the example below.


#keeps googlebot out of my porn
User-agent: googlebot
Disallow: /images/
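
The difference between a spider that strips comments and one that does not can be sketched in a few lines of Python. Both functions below are hypothetical illustrations written for this article, not the code of any real spider.

```python
def parse_disallow_naive(line):
    """Hypothetical careless parser: keeps everything after the colon."""
    _field, _sep, value = line.partition(":")
    return value.strip()


def parse_disallow(line):
    """Hypothetical careful parser: drops '#' and what follows it first."""
    line = line.split("#", 1)[0]
    _field, _sep, value = line.partition(":")
    return value.strip()


line = "Disallow: /images/ #comment send googlebot away"
print(repr(parse_disallow_naive(line)))  # '/images/ #comment send googlebot away'
print(repr(parse_disallow(line)))        # '/images/'
```

With the comment on its own line, both parsers would read the same path, which is why that layout is the safer choice.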

So, as we can see, there are valid and legitimate reasons to use the robots.txt file, and there are many other occasions to use it. In some cases it can stop a large company from looking foolish for leaving its intellectual property open to search engines; in others it keeps well-behaved spiders from crawling and indexing sensitive data; and it can help a site climb in the natural (organic) search results.

After you have written your robots.txt file and placed it on your server, you should validate it with one of the robots.txt validation tools online.
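
Before turning to an online validator, a quick local sanity check can catch simple typos. The sketch below is a hypothetical helper, not a full validator: the function name lint_robots and the set of field names it accepts are assumptions made for this example, and the sample file contains a deliberate misspelling.

```python
# Assumed set of commonly seen field names; adjust to the spiders you care about.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return a list of (line_number, message) pairs for suspicious lines."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        field, sep, _value = line.partition(":")
        field = field.strip().lower()
        if not sep:
            problems.append((number, "missing ':' separator"))
        elif field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            seen_agent = True
        elif field in ("disallow", "allow") and not seen_agent:
            problems.append((number, f"'{field}' before any User-agent line"))
    return problems


sample = """\
#keeps googlebot out of the images folder
User-agent: googlebot
Disalow: /images/
"""
print(lint_robots(sample))  # [(3, "unknown field 'disalow'")]
```

A check like this only catches obvious slips; an online validation tool is still worth running afterward.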
