How to Control Search Engine Spiders
Robots.txt Implementation
One of the most fundamental steps in optimizing a website is
writing a robots.txt file. It tells spiders which content is useful
and public for sharing in the search engine indexes and which is
not. Note that not all search spiders will follow the instructions
in your robots.txt file. In addition, a poorly written robots.txt
file can stop search spiders from crawling and indexing your
website properly. In this article I will show you how to make sure
everything works correctly.
While many other SEOs will tell you that a robots.txt file will not
improve your rankings, I disagree. For the robots to index your
site properly, they need instructions on which folders or files not
to crawl or index, as well as which ones you want indexed.
Another good reason to use a robots.txt file is that many of the
search engines tell the public to use one on their websites.
Below is a quote taken from Google:
Make use of the robots.txt file on your web server. This file tells
crawlers which directories can or cannot be crawled. Make sure it's
current for your site so that you don't accidentally block the
Googlebot crawler.
Even though others feel this is of no use unless you are blocking
content, keep this in mind: when a search engine goes out of its
way (and this is the tightest-lipped search engine ever) to tell us
to use something, it is usually to one's advantage to follow the
little clues we are offered.
Also, if you read the stats file on your web hosting server, you
will usually find the URL of your robots.txt being requested. If a
search bot asks for the robots.txt and does not find it on your
server, the spider may just leave.
How do you build a robots.txt file for your website? I am glad
you asked. One thing you do not want to do is use an HTML editor to
build this file. The easiest way to create the file is with a text
editor like Notepad. After opening Notepad (or another text
editor), save the blank file as robots.txt. Once it is complete,
this file will be placed at the root level of your web server, in
other words the same folder as your index page.
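
For readers who would rather script this step than use Notepad, here
is a minimal sketch in Python. It assumes you have a local copy of
your site's root folder; the function name create_blank_robots_txt
and the web_root argument are made up for this illustration and are
not part of any tool.

```python
import os
import tempfile


def create_blank_robots_txt(web_root):
    """Create an empty plain-text robots.txt in web_root and return its path.

    web_root is assumed to be the folder that holds your index page.
    Note: mode "w" truncates any robots.txt that already exists there.
    """
    path = os.path.join(web_root, "robots.txt")
    # Plain text only, with no HTML markup, as a text editor would save it.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("")
    return path


if __name__ == "__main__":
    # A temporary folder stands in for the real web root in this demo.
    with tempfile.TemporaryDirectory() as web_root:
        created = create_blank_robots_txt(web_root)
        print(os.path.basename(created), os.path.getsize(created))
```

The result is the same blank file you would get from a text editor,
ready to be filled in and uploaded next to your index page.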
Now I will cover several different methods of efficiently using
a robots.txt file to direct the robot to crawl the correct
directories and avoid others.
First we will discuss how to format the information. The text file is
actually a list. Its directions consist of two fields, or lines of
instruction.
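
To make the two-field format concrete, here is a small sketch. It
assumes the two fields are the standard User-agent and Disallow
lines, and it uses Python's built-in urllib.robotparser module to
read them the way a well-behaved spider would. The /private/ folder,
the ExampleBot name, and the www.example.com address are invented
for this illustration.

```python
from urllib.robotparser import RobotFileParser

# One record: the first field names the spider ("*" means every spider),
# the second field names what that spider should stay out of.
# The /private/ folder is a hypothetical example.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""


def build_parser(rules_text):
    """Parse robots.txt text and return a parser that can answer queries."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(EXAMPLE_RULES)

# A page inside the disallowed folder is off limits.
print(parser.can_fetch("ExampleBot", "https://www.example.com/private/report.html"))  # False

# Anything the record does not mention may be crawled.
print(parser.can_fetch("ExampleBot", "https://www.example.com/index.html"))  # True
```

Running your own rules through a parser like this before uploading
is one way to catch a record that blocks more than you intended.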