Robots.txt Ultimate How To Tutorial

Robots.txt is a simple text file in the root of your web server that tells search engine bots (spiders such as Googlebot, the Yahoo bot, and the MSN bot) what to do. You can easily tell them which parts of your website you don't want indexed.

What are robots - spiders?

They are simply the search engines' "field workers": programs that fetch website content so it can be stored in the search engine's index (servers).

There are good and bad (spam) bots and spiders. Good bots are, for example, the crawlers of the major search engines; bad bots are spam bots built and used only for negative purposes, such as harvesting email addresses from websites to send the spam emails we are all sick of.

How to create a robots.txt file?

It is very simple: create an empty text file named robots.txt and upload it to your website's root folder (usually the public_html folder). Just make the file in Notepad, name it robots.txt, and upload it over FTP to your site's root folder. Now let's make it work.

In the first example we will address one specific bot (spider). To do that, put this in the first line of your robots.txt file:

User-agent: BotName

Replace BotName with the name of the robot you want to address. To address all robots, use this line instead:

User-agent: *

The second part tells robots which parts of the website should not be crawled (visited):

 

Disallow: /docs/

 

In this example, any path on your website starting with the string /docs/ will not be crawled. You can add multiple lines to exclude different directories and files.

Multiple paths can be excluded per robot by using several Disallow lines:

 

User-agent: *
Disallow: /docs/
Disallow: /temp/
Disallow: /mypictures

 

In this example, the robots.txt file applies to all bots and instructs them to stay out of the directories /docs/ and /temp/.

The last Disallow line tells them to exclude all URLs starting with /mypictures, which covers both folders and files (note that the trailing slash is omitted).
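The prefix-matching behaviour described above can be checked with Python's standard urllib.robotparser module. A minimal sketch, using the example rules from above (the file and image paths are just made-up illustrations):

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, parsed directly from a string.
rules = """\
User-agent: *
Disallow: /docs/
Disallow: /temp/
Disallow: /mypictures
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Disallow matches by prefix: anything under /docs/ is blocked...
print(rp.can_fetch("Googlebot", "/docs/manual.html"))    # False
# ...and /mypictures (no trailing slash) blocks files and folders alike.
print(rp.can_fetch("Googlebot", "/mypictures.jpg"))      # False
print(rp.can_fetch("Googlebot", "/mypictures/cat.jpg"))  # False
# Paths outside the listed prefixes stay crawlable.
print(rp.can_fetch("Googlebot", "/about.html"))          # True
```

This is also a handy way to test your robots.txt before uploading it.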

To block a specific bot from the whole site, add these lines to the robots.txt file. Instead of BotName, put the name of the bot.

 

User-agent: BotName
Disallow: /
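You can verify with urllib.robotparser that a group addressed to one bot does not affect the others. A small sketch, with "BadBot" as a hypothetical bot name:

```python
from urllib.robotparser import RobotFileParser

# Rules that shut one (hypothetical) bot out of the whole site.
rules = """\
User-agent: BadBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The named bot is blocked everywhere...
print(rp.can_fetch("BadBot", "/index.html"))     # False
# ...but bots not covered by any group are unaffected.
print(rp.can_fetch("Googlebot", "/index.html"))  # True
```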

 

This robots.txt has no restrictions at all: it allows all bots to crawl the whole site, including all files and folders.

 

User-agent: *
Disallow:
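Again with urllib.robotparser, you can confirm that an empty Disallow value imposes no restriction at all:

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value allows everything for every bot.
rules = """\
User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("AnyBot", "/"))                  # True
print(rp.can_fetch("AnyBot", "/docs/secret.html"))  # True
```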

 

Here is a short list of well-known bots/spiders:

Robot name

Googlebot (Google)

Googlebot-Image (Google Image Search)

Slurp (Yahoo)

ZyBorg (WiseNut)

FAST (AllTheWeb)

Openbot (Openfind)

Scooter (AltaVista)
