The indexable web

The indexable web (also called "the surface web" and, loosely, "the visible web") is the part of the World Wide Web that can be indexed by search engines which follow the rules of web crawling, such as obeying robots.txt.

The Search Engines

Search engines build a (distributed) database of the web using computer programs known as spiders (or "web crawlers"). A spider starts with a list of one or more websites and gradually follows hyperlinks from one page to another until it has indexed "the whole web". The set of pages these spiders can reach is the indexable web.
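The crawling process described above can be sketched as a breadth-first walk over a link graph. This is only an illustration, not how any particular search engine works: the `LINKS` dictionary, the `crawl` function and the example page names are all made up, and a small in-memory graph stands in for the live web.

```python
from collections import deque

# Hypothetical link graph standing in for the live web: page -> pages it links to.
LINKS = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["b.example/post"],
    "b.example/post": [],
    "c.example/": ["a.example/"],  # links out, but nothing links to it
}


def crawl(seeds, links):
    """Return every page reachable from the seeds by following hyperlinks."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:  # visit each page only once
                seen.add(target)
                queue.append(target)
    return seen


reachable = crawl(["a.example/"], LINKS)
print(sorted(reachable))
```

In this toy graph `c.example/` is never reached, because no page links to it. A spider that starts from `a.example/` would not find it, even though nothing forbids indexing it.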

The Robots Exclusion Standard

The Robots Exclusion Standard lets webmasters place a file called robots.txt on their site. If this file says that /foo may not be indexed, most (polite) spiders will leave it alone. You can still read everything under /foo when you visit such a site, but you have to find it some other way. Thus /foo is on the web, but it is not part of the indexable web.
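As a rough sketch of how a polite spider might apply such a rule, Python's standard `urllib.robotparser` module can evaluate a robots.txt file. The file contents, the `ExampleBot` user agent, the `example.com` URLs and the `allowed` helper are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that forbids every spider from indexing /foo.
ROBOTS_TXT = """\
User-agent: *
Disallow: /foo
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/foo"))  # forbidden
print(allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/bar"))  # permitted
```

Nothing enforces the answer. The check only tells a spider what the webmaster has asked for, which is why the standard depends on spiders being polite.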

Note also that many spiders do not follow links that are generated by JavaScript or embedded in Flash files. Polite spiders also do not try to break into password-protected areas.
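The JavaScript limitation follows from how a simple spider finds links: it reads the HTML as text and collects `href` attributes, without running any scripts. The sketch below uses Python's standard `html.parser` module; the `LinkExtractor` class, the `extract_links` helper and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag present in the static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page: one plain link, and one link that only exists
# after a browser has executed the script.
PAGE = """
<html><body>
<a href="/static">A plain link</a>
<script>
  var link = document.createElement("a");
  link.href = "/generated";
  document.body.appendChild(link);
</script>
</body></html>
"""


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


print(extract_links(PAGE))
```

Only `/static` is found. The link to `/generated` is created when the script runs in a browser, so a spider that does not execute JavaScript never sees it.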

The Visible web

The visible web is the part of the web you can actually find in search engines.

This is not the same as the indexable web.

The difference between the indexable and the visible web is that some sites can be indexed but are not:

Most search engines censor sites. Big search engines can say "Linuxreviews? We don't like that site. We're going to put it on our list of sites that don't appear in our search engine right now".

There are other reasons too: a site may be new, or the crawlers may not have stumbled on any links to it yet.

The visible web is much smaller than the indexable web.
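The relationship between the two can be written as simple set arithmetic. This is a toy model only: the site names and the `visible_web` function are invented, and real search engines do not publish such lists.

```python
# Hypothetical sets of sites, for illustration only.
indexable = {"a.example", "b.example", "c.example", "d.example"}
blocklisted = {"b.example"}         # indexable, but the search engine refuses to list it
not_yet_discovered = {"d.example"}  # indexable, but no crawler has found a link to it


def visible_web(indexable, blocklisted, undiscovered):
    """Sites that could be indexed and actually show up in search results."""
    return indexable - blocklisted - undiscovered


visible = visible_web(indexable, blocklisted, not_yet_discovered)
print(sorted(visible))
print(visible <= indexable)  # the visible web is a subset of the indexable web
```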

The Deep Web

The Deep Web is a term sometimes used for the parts of the Internet that exist but cannot be found by search engines. The term is used inconsistently: sometimes it means the whole Internet (visible and invisible), and sometimes only the "invisible" parts of the net that can't be found using search engines.
