Robots.txt Ultimate How To Tutorial
Search Engine Optimization
Monday, 07 January 2008 10:27
Robots.txt is a simple text file in the root folder of your website (the folder you upload to over FTP) that tells bots, or spiders, such as the Google, Yahoo and MSN bots, what to do. With it you can easily tell them which parts of your website you don't want them to crawl.
What are robots (spiders)?
There are good and bad (spam) bots, or spiders. Good ones are, for example, the bots of the major search engines; bad ones are spam bots built only for negative purposes, such as harvesting email addresses from websites so they can be used later for the spam emails we are all sick of. Keep in mind that robots.txt is only a request: well-behaved bots follow it, while spam bots usually ignore it.
How to create a robots.txt file?
It is very simple: create an empty text file in Notepad, save it as robots.txt and upload it with FTP to your website's root folder (usually the public_html folder), so that it can be reached at an address like http://www.example.com/robots.txt.
Now let's make it work. In the first example we will address a specific bot (spider). To do that, put this in the first line of your robots.txt file:
User-agent: BotName
Replace BotName with the name of the robot you want to address. To address all robots, use this line instead:
User-agent: *
The second part tells robots which parts of the website should not be crawled (visited):
Disallow: /docs/
In this example, any path on your website starting with the string /docs/ will not be crawled. You can exclude several directories or files for the same robot by adding one Disallow line per path.
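One way to see exactly what a rule like Disallow: /docs/ matches is Python's standard urllib.robotparser module, which can parse robots.txt text offline. This is only an illustrative sketch: the file content, the Googlebot user-agent string and the sample paths are assumptions, and the Disallow line is wrapped in a User-agent: * group because a Disallow line that does not follow a User-agent line is ignored.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the Disallow rule must sit inside a User-agent group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /docs/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so nothing is downloaded.
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths; the Disallow value is matched as a path prefix.
for path in ["/docs/", "/docs/manual.html", "/docs", "/index.html"]:
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(f"{path:<20}{verdict}")
```

Note that /docs itself stays allowed, because it does not start with /docs/ (trailing slash included). A rule written as Disallow: /docs would also block /docs and a file such as /docs.html.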
User-agent: *
Disallow: /docs/
Disallow: /temp/
Disallow: /mypictures
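Each Disallow line is checked on its own as a path prefix, which is why /mypictures (without a trailing slash) covers folders and files alike. A minimal sketch, again assuming Python's standard urllib.robotparser module and made-up paths, that checks the three rules of this example:

```python
from urllib.robotparser import RobotFileParser

# The three-rule example, one Disallow directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /docs/
Disallow: /temp/
Disallow: /mypictures
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths to test against the rules.
paths = [
    "/docs/guide.html",     # under /docs/
    "/temp/cache.txt",      # under /temp/
    "/mypictures/cat.jpg",  # folder whose path starts with /mypictures
    "/mypictures.html",     # file whose path starts with /mypictures
    "/pictures/cat.jpg",    # matches none of the rules
]
for path in paths:
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(f"{path:<22}{verdict}")
```

Only /pictures/cat.jpg comes back as allowed; the other four paths each start with one of the Disallow values.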
This three-rule robots.txt applies to all bots and tells them to stay out of the /docs/ and /temp/ directories. The third Disallow line excludes every URL whose path starts with /mypictures, which covers folders and files alike (note that it has no trailing slash): /mypictures/, /mypictures/photo.jpg and /mypictures.html are all blocked.
To block a specific bot from the whole site, add these lines to your robots.txt file, replacing BotName with the name of the bot:
User-agent: BotName
Disallow: /
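With Disallow: / every path is blocked for the named bot, since every URL path starts with a slash, while bots that have no matching User-agent group may still crawl everything. A sketch using Python's standard urllib.robotparser module, where BotName is the article's placeholder and Googlebot simply stands in for any other bot (both are assumptions for illustration):

```python
from urllib.robotparser import RobotFileParser

# BotName is the placeholder from the article; substitute a real bot name.
ROBOTS_TXT = """\
User-agent: BotName
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named bot is shut out; a bot with no matching group is not restricted.
for agent in ["BotName", "Googlebot"]:
    for path in ["/", "/about.html"]:
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:<10}{path:<12}{verdict}")
```

If you want to block every bot instead, the same two lines with User-agent: * do it.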
The following robots.txt has no restrictions at all: the empty Disallow value lets all bots crawl the whole site, including all files and folders.
User-agent: *
Disallow:
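An empty Disallow value excludes nothing. A short sketch that confirms this with Python's standard urllib.robotparser module (the paths and the Googlebot user-agent string are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Allow-everything file: the Disallow value is intentionally left empty.
ROBOTS_TXT = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/", "/docs/manual.html", "/mypictures/cat.jpg"]:
    print(path, parser.can_fetch("Googlebot", path))
```

Every path prints True. Compare this with Disallow: / in the previous example: that single slash is the whole difference between blocking the entire site and allowing all of it.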
Here is a short list of famous bots (spiders):
Robot Name