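A WWW robot is a program that automatically traverses the Web's hypertext link structure: it retrieves a document, then recursively retrieves every document that document references.
Note that "recursive" here does not restrict the definition to any particular traversal algorithm. Even if a robot uses heuristics to choose which documents to visit and in what order, and spreads its requests out over a long period of time, it is still a robot.
Ordinary web browsers are not robots, because a human operates them and they do not automatically retrieve referenced documents (other than inline images).
Web robots are sometimes called Web Wanderers, Web Crawlers, or Spiders. These names are somewhat misleading because they suggest that the software itself moves from site to site like a virus. That is not the case: a robot simply visits sites by requesting documents from them.
Source: http://www.robotstxt.org/faq/what.html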
A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
Note that "recursive" here doesn't limit the definition to any specific traversal algorithm; even if a robot applies some heuristic to the selection and order of documents to visit and spaces out requests over a long space of time, it is still a robot.
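As a rough illustration of the definition above, here is a minimal Python sketch. It assumes a hypothetical in-memory `web` dictionary standing in for real HTTP retrieval so that it runs offline; `crawl`, `fetch`, `depth_first`, and `breadth_first` are illustrative names, not part of any library. The selection heuristic is a pluggable function and `delay` spaces out requests, reflecting the point that the traversal order does not change whether the program is a robot.

```python
import time
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the Web: each URL maps to the URLs its document references.
Web = Dict[str, List[str]]


def fetch(web: Web, url: str) -> List[str]:
    """Simulate retrieving a document; return the links it references."""
    return web.get(url, [])


def depth_first(frontier: List[str]) -> int:
    """Heuristic: visit the most recently discovered URL next."""
    return len(frontier) - 1


def breadth_first(frontier: List[str]) -> int:
    """Heuristic: visit the earliest discovered URL next."""
    return 0


def crawl(
    web: Web,
    start: str,
    choose_next: Optional[Callable[[List[str]], int]] = None,
    delay: float = 0.0,
    max_pages: int = 100,
) -> List[str]:
    """Retrieve `start`, then keep retrieving documents it (transitively) references."""
    pick = choose_next or depth_first
    visited: List[str] = []
    seen = {start}       # URLs already queued, so nothing is retrieved twice
    frontier = [start]   # discovered but not yet retrieved
    while frontier and len(visited) < max_pages:
        url = frontier.pop(pick(frontier))
        if visited and delay:
            time.sleep(delay)  # space out successive requests
        visited.append(url)
        for link in fetch(web, url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

With `web = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}`, `crawl(web, "a")` visits `["a", "c", "d", "b"]`, while `crawl(web, "a", choose_next=breadth_first)` visits `["a", "b", "c", "d"]`: the same robot with a different ordering heuristic.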
Normal Web browsers are not robots, because they are operated by a human, and don't automatically retrieve referenced documents (other than inline images).
Web robots are sometimes referred to as Web Wanderers, Web Crawlers, or Spiders. These names are a bit misleading as they give the impression the software itself moves between sites like a virus; this is not the case. A robot simply visits sites by requesting documents from them.
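To make "requesting documents" concrete, the following Python sketch uses only the standard library: `fetch_document` issues an ordinary HTTP GET, and `extract_links` collects the `<a href>` targets a robot would queue next, resolving relative links and dropping `#fragment` parts. The function names and the `ExampleRobot/0.1` User-Agent string are hypothetical; a real robot would also honor `robots.txt`, which this sketch does not do. Unlike a browser, it ignores inline images.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # e.g. <img> is skipped: a robot follows links, not inline images
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links; strip #fragments (they point into the same document).
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html: str, base_url: str) -> List[str]:
    """Return the referenced URLs found in one retrieved document."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_document(url: str, user_agent: str = "ExampleRobot/0.1") -> str:
    """Request one document over HTTP; this is all a robot's "visit" amounts to."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `extract_links('<a href="/about">About</a> <a href="next.html#s2">Next</a>', "http://example.com/dir/page.html")` returns `["http://example.com/about", "http://example.com/dir/next.html"]`.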