The internal structure of robots.txt is really simple, and once you understand how it works it is difficult to make mistakes. You just need to understand what the various directives found in this file are, namely (among the main ones): User-agent, Disallow, Allow, Crawl-delay and Sitemap. So let’s go and see their meaning and their functions, shall we? User-agent: this value indicates the crawler to which the following directives are addressed. To refer to Google’s general crawler we therefore write Googlebot, for images only Googlebot-Image, for Baidu Baiduspider, and so on. If you want to refer to all crawlers, just enter an asterisk.
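To make this concrete, here is a minimal sketch of how User-agent groups might look in practice (the folder names are just placeholders, and the Disallow lines are explained right below):

    # Rules addressed to Google's general crawler
    User-agent: Googlebot
    Disallow: /example-private/

    # Rules addressed to Google's image crawler
    User-agent: Googlebot-Image
    Disallow: /example-images/

    # Rules addressed to every crawler (an empty Disallow blocks nothing)
    User-agent: *
    Disallow: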
Disallow: this is the directive that instructs the specified User-agent NOT to crawl a certain URL. In short, it is the true heart of the file, its profound reason for being. Allow: unlike Disallow, this directive specifies which pages or subfolders can be crawled. But why is this directive needed, when we know that crawlers automatically crawl every URL unless told otherwise? Well, it is used above all to give access to pages that we do want indexed but which sit inside a directory blocked by a Disallow directive. Clear, right? However, remember that Allow is only guaranteed to work with Googlebot.
Here is an example:
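A minimal sketch, with placeholder folder and file names:

    User-agent: *
    # Block the whole /archive/ directory...
    Disallow: /archive/
    # ...but still let this one page inside it be crawled
    Allow: /archive/important-page.html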
Crawl-delay: this special directive is used to tell crawlers to wait a certain amount of time before crawling the next page of the website. The value to be entered is in seconds. In this case you should know that Googlebot turns a deaf ear to this directive. In short, you can use it for Bing, Yahoo! and Yandex, but not for Mister G, for which you instead have to adjust the crawl rate setting within Google Search Console. Sitemap: this directive is used to tell the search engine the URL where the site’s sitemap is located. How do you create a robots.txt file? Well, now you know pretty much everything you should know about this important file.
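As a quick recap before you build your own, here is what a complete robots.txt combining the directives above might look like (the paths, the delay value and the sitemap URL are placeholders):

    User-agent: *
    Disallow: /private/
    Allow: /private/public-page.html
    # Ignored by Googlebot, honoured by Bing, Yahoo! and Yandex
    Crawl-delay: 10

    Sitemap: https://www.example.com/sitemap.xml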