The Robots Exclusion Standard, commonly known as robots.txt, is a file that tells search engine crawlers which parts of a website they may crawl and which parts they should leave alone. Crawlers that honor the standard read this file and follow its rules before fetching pages. The file must be placed at the root level of a website; if the site owner does not provide one, crawlers are free to crawl the whole site.
Location of the robots.txt file: http://www.domain.com/robots.txt
A robots.txt file consists of User-agent and Disallow lines. With these, site owners can tell web robots which parts of a site may be crawled and which parts should be excluded from crawling.
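As a rough sketch of how a well-behaved crawler consults this file, the Python snippet below uses the standard library's urllib.robotparser module. The domain www.example.com and the user-agent name 'MyBot' are placeholders for illustration, not part of the standard.

from urllib import robotparser

# Point the parser at the root-level robots.txt of the site (placeholder domain).
rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()  # fetch and parse the file

# A polite crawler asks before fetching any page. 'MyBot' is a
# hypothetical user-agent name used only for this example.
if rp.can_fetch("MyBot", "http://www.example.com/temp/page.html"):
    print("Allowed to crawl this URL")
else:
    print("Disallowed by robots.txt")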
A few examples of robots.txt rules using User-agent and Disallow are shown below, followed by a short script that tests them:
1. User-agent: *
Disallow:
The above example allows all search robots to visit all files on the website: the * matches every robot, and an empty Disallow line excludes nothing.
2. User-agent: *
Disallow: /
Placed in a robots.txt file, the above example excludes the whole website from all search robots, since / matches every path.
3. User-agent: *
Disallow: /temp/
Disallow: /cgi-bin/
This example tells all robots not to crawl the specified folders; here the 'temp' and 'cgi-bin' directories are excluded from crawling.
4. User-agent: *
Disallow: /directory/
This tells all robots not to crawl the specific directory named in the robots.txt file.
5. User-agent: x robot
Disallow: /
The above example excludes the robot named 'x robot' from crawling the entire site; robots with other names are not affected by this rule.
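To see these rules in action, the following sketch feeds the rules from example 3 into Python's standard urllib.robotparser and checks a couple of URLs. The user-agent name 'AnyBot', the domain www.example.com, and the tested paths are made up for illustration.

from urllib import robotparser

# The rules from example 3 above, exactly as they appear in robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /temp/",
    "Disallow: /cgi-bin/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)  # parse the rules directly instead of fetching a URL

# 'AnyBot' is a hypothetical crawler name; the * group matches it too.
print(rp.can_fetch("AnyBot", "http://www.example.com/temp/report.html"))  # False: under /temp/
print(rp.can_fetch("AnyBot", "http://www.example.com/about.html"))        # True: not disallowed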