Robots Exclusion Standard

The Robots Exclusion Standard is a "non-official" standard which is followed by all polite web crawler software. A robots.txt file instructs web crawlers on how to behave when visiting a web server.

How to instruct crawlers

Create a file named robots.txt in your website's root, so that it is served at domain.tld/robots.txt.

The two basic instructions are "User-agent" and "Disallow". This allows every crawler to access everything:

User-agent: *
Disallow:

Disallowing nothing means allow everything. Disallowing / will disallow your whole domain:

User-agent: *
Disallow: /
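As a minimal sketch of how a crawler might interpret these two files, the snippet below feeds each one to `urllib.robotparser` from Python's standard library. The agent name `ExampleBot`, the page URL and the helper `parse_rules` are placeholders chosen for illustration, not part of the standard.

```python
from urllib.robotparser import RobotFileParser

def parse_rules(text):
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# The two basic files from the text above.
allow_all = parse_rules("User-agent: *\nDisallow:\n")
deny_all = parse_rules("User-agent: *\nDisallow: /\n")

# "ExampleBot" and the URL are hypothetical values for illustration.
url = "http://domain.tld/page.html"
print(allow_all.can_fetch("ExampleBot", url))  # True: nothing is disallowed
print(deny_all.can_fetch("ExampleBot", url))   # False: the whole domain is disallowed
```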

These are the basic instructions. It is possible to disallow many files and folders. It is also possible to have several User-agent/Disallow groups in order to instruct different crawlers differently:

User-agent: NameOfBotWeDislike
Disallow: /

User-agent: CatchBadBots
Disallow: /trap/

User-agent: *
Disallow: /directory/file1.html
Disallow: /directory/file2.html
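The grouped file above can be checked the same way. This is a sketch using Python's `urllib.robotparser`; the agent `SomeOtherBot` and the paths being tested are made up for illustration. It assumes the parser's behaviour of applying only the first group whose name matches the crawler and falling back to the `*` group otherwise.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: NameOfBotWeDislike
Disallow: /

User-agent: CatchBadBots
Disallow: /trap/

User-agent: *
Disallow: /directory/file1.html
Disallow: /directory/file2.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# (agent, path) pairs to test; "SomeOtherBot" is a hypothetical crawler
# that only matches the "*" group.
checks = [
    ("NameOfBotWeDislike", "/index.html"),
    ("CatchBadBots", "/trap/page.html"),
    ("CatchBadBots", "/directory/file1.html"),
    ("SomeOtherBot", "/directory/file1.html"),
    ("SomeOtherBot", "/directory/file3.html"),
]
results = {(agent, path): parser.can_fetch(agent, path) for agent, path in checks}
for (agent, path), allowed in results.items():
    print(f"{agent:20} {path:25} {'allowed' if allowed else 'blocked'}")
```

Note that a crawler matching its own named group is not bound by the `*` group, so in this sketch CatchBadBots may fetch /directory/file1.html while SomeOtherBot may not.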

Respected by some, not by others

The two basic instructions mentioned above are followed by all "polite" crawler software.

Some will follow "crawl-delay" (in seconds):

User-agent: *
Disallow: /trap/
Crawl-delay: 10 # Wait at least 10 seconds between crawls
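A crawler that chooses to honour Crawl-delay could read it roughly as follows. This sketch relies on `RobotFileParser.crawl_delay` from Python's standard library; the function `polite_schedule`, the agent `ExampleBot`, the paths and the fallback delay of one second are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
Crawl-delay: 10 # Wait at least 10 seconds between crawls
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def polite_schedule(agent, paths, default_delay=1):
    """Yield (path, delay) pairs for allowed paths, using Crawl-delay if present."""
    delay = parser.crawl_delay(agent)
    if delay is None:
        # No Crawl-delay for this agent: fall back to an assumed default.
        delay = default_delay
    for path in paths:
        if parser.can_fetch(agent, path):
            yield path, delay

schedule = list(polite_schedule("ExampleBot", ["/", "/trap/x.html", "/about.html"]))
for path, delay in schedule:
    # A real crawler would fetch the page here and then call time.sleep(delay).
    print(f"fetch {path}, then wait {delay}s")
```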

Some also follow Request-rate (pages per interval in seconds) and Visit-time. Visit-time is read as GMT.

User-agent: *
Disallow: /trap/
Request-rate: 1/5         # maximum rate is one page every 5 seconds
Visit-time: 0600-0845     # only visit between 6:00 AM and 8:45 AM UT (GMT)
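The following sketch shows one way a crawler might read these two directives. Python's `urllib.robotparser` exposes Request-rate through `request_rate`, but it does not interpret Visit-time, so the helpers `parse_visit_time` and `in_window` are hypothetical and hand-written here. They assume the HHMM-HHMM form shown above and a window that does not cross midnight.

```python
from datetime import time
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
Request-rate: 1/5         # maximum rate is one page every 5 seconds
Visit-time: 0600-0845     # only visit between 6:00 AM and 8:45 AM UT (GMT)
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# request_rate returns a named tuple with fields (requests, seconds).
rate = parser.request_rate("ExampleBot")
seconds_per_page = rate.seconds / rate.requests

def parse_visit_time(value):
    """Hypothetical helper: turn 'HHMM-HHMM' into a (start, end) pair of times."""
    start, end = value.strip().split("-")
    return (time(int(start[:2]), int(start[2:])),
            time(int(end[:2]), int(end[2:])))

def in_window(now_utc, window):
    """Return True if a GMT/UTC time of day falls inside the window (no midnight wrap)."""
    start, end = window
    return start <= now_utc <= end

window = parse_visit_time("0600-0845")
print(seconds_per_page)                # 5.0 seconds between pages
print(in_window(time(7, 30), window))  # True: 07:30 is inside the window
print(in_window(time(12, 0), window))  # False: noon is outside the window
```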

More information

Google Blog: Controlling how search engines access and index your website

Examples:

Google: http://google.com/robots.txt
Microsoft: http://www.microsoft.com/robots.txt
