SEOHigh.Com SEO/SEM Services

Toronto Search Engine Optimization (SEO) Search Engine Marketing (SEM)

 

 

The first line is the User-agent line. This is where you name the search spider(s) that the rules which follow apply to. The second line is the directive line, or Disallow field. This is where you list the folders or files you want to keep spiders out of.

Of particular note: if the publishers of Perfect 10 magazine (an online porn magazine suing Google for linking to their images) had used a robots.txt file, they could have stopped search spiders from indexing their images. Is it Google's fault the magazine hired incompetent IT staff? I don't think so. To me, it's another adult webmaster looking for more free publicity.

To write the robots.txt file, you would start by addressing specific search engines. The User-agent line would start as:


User-agent:
Adding a specific search engine's spider name here tells that spider to follow the next line for its instructions, e.g.:
User-agent: googlebot


This tells googlebot that it is to follow the next line's directions on how to proceed through your website, or to leave altogether. You may also use an asterisk (*) as a wildcard to address all search spiders.
The second line, known as the directive, is written as:
Disallow:


By adding a folder after the Disallow statement, the search spider should ignore the folder for indexing purposes and move to others where there is no restriction.
Disallow: /images/


This is a special example, just for Perfect 10. This one-minute bit of instruction could have saved a ton in wasted legal fees on a frivolous lawsuit. As this is a basic step in building websites, it is incumbent on website owners to protect their intellectual property; it is not a third-party search engine's duty.
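If you want to see how a well-behaved spider would read the googlebot and /images/ lines above, here is a minimal sketch using Python's standard urllib.robotparser module. The domain www.example.com, the file names, and the helper can_crawl are placeholders I made up for illustration, and a real crawler's parser may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# The two-line record from the text: one User-agent line, one Disallow line.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /images/
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# www.example.com and the file names are hypothetical placeholders.
print(can_crawl(ROBOTS_TXT, "googlebot", "http://www.example.com/images/cover.jpg"))
print(can_crawl(ROBOTS_TXT, "googlebot", "http://www.example.com/index.htm"))
```

The first call should report that the image is off limits, while the second shows the rest of the site is still open to the spider.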


You can also disallow specific files this way:
Disallow: /cheeseyporn.htm
One use I always recommend is keeping robots out of your cgi-bin directory:
Disallow: /cgi-bin/
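To double-check file and folder rules like these before you upload them, a rough sketch along the following lines may help. It again leans on Python's standard urllib.robotparser; the spider name "anybot", the script form.cgi, and the helper is_blocked are hypothetical, and the paths are written with a leading slash because this parser matches rules against the start of the URL path.

```python
from urllib.robotparser import RobotFileParser

# One record addressed to every spider, blocking one file and one folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cheeseyporn.htm
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(path: str, agent: str = "anybot") -> bool:
    """True if the rules above tell `agent` (a made-up name) to skip `path`."""
    return not parser.can_fetch(agent, path)


print(is_blocked("/cheeseyporn.htm"))   # the single file named in the rules
print(is_blocked("/cgi-bin/form.cgi"))  # a hypothetical script inside /cgi-bin/
print(is_blocked("/index.htm"))         # a page no rule mentions
```

Only the first two paths should come back as blocked; anything the rules do not mention stays open.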


If you leave the Disallow line blank, this indicates that ALL files may be retrieved and indexed by the specified robot(s). The following lets all robots index all files:
User-agent: *
Disallow:
Conversely, you can just as easily keep all robots out:
User-agent: *
Disallow: /

In the example above, the single forward slash (/) stands for your root directory. Since the root directory is blocked, none of the other folders and files can be crawled. Spiders that honor robots.txt will stop crawling your site once they read the file, and your pages should generally drop out of their results as they update their indexes.
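The difference between a blank Disallow and a lone forward slash is easy to get backwards, so here is a small sketch that runs both records through Python's standard urllib.robotparser. The spider name "anybot", the sample paths, and the helper allowed are assumptions of mine for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The two records from the text: blank Disallow versus Disallow: /
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def allowed(robots_txt: str, path: str, agent: str = "anybot") -> bool:
    """Return True if `agent` (a made-up name) may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical sample paths: the root, an image, and a script.
for path in ("/", "/images/cover.jpg", "/cgi-bin/form.cgi"):
    print(path, allowed(ALLOW_ALL, path), allowed(BLOCK_ALL, path))
```

Every path should be allowed under the blank Disallow record and refused under the forward-slash record.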
