We are announcing a new user-agent for the robots.txt file, called Googlebot-News, that gives publishers even more control over their content. In case you have not heard of robots.txt, it is an Internet standard that has been in use since 1994 and has been adopted by all the major search engines and by all well-behaved “robots” that process the Web. When a search engine checks whether it has permission to crawl and index a web page, robots.txt is the mechanism it uses to check whether it is allowed to crawl that page.
Until now, publishers could contact us through a form if they did not want to be included in Google News but did want to appear in Google's web search index. Now publishers can control their content in Google News in an even more automated way. Site owners can simply add directives specific to Googlebot-News to their robots.txt files. Like the Googlebot and Googlebot-Image user-agents, the new Googlebot-News user-agent can be used to specify which pages of a website should be crawled and appear in Google News.
Here are some examples for publishers:
To include pages in both Google web search and Google News:
User-agent: Googlebot
Disallow:
This is the simplest case. In fact, a robots.txt file is not even needed for it.
To include pages in Google web search, but not in Google News:
User-agent: Googlebot
Disallow:
User-agent: Googlebot-News
Disallow: /
This robots.txt file says that no files are off limits to Google's general web crawler, called Googlebot, but that the “Googlebot-News” user-agent is blocked from every file on the website.
To include pages in Google News, but not in Google web search:
User-agent: Googlebot
Disallow: /
User-agent: Googlebot-News
Disallow:
When parsing a robots.txt file, Google obeys the most specific directives. The first two lines tell us that Googlebot (the user-agent for Google's web index) is blocked from crawling any page of the website. The next directive, which applies to the more specific user-agent for Google News, overrides the blocking of Googlebot and gives Google News permission to crawl the pages of the website.
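The precedence rule described above can be sketched in Python. This is a simplified illustration, not Google's actual parser: the function names (parse_groups, select_group, is_allowed) are hypothetical, user-agent matching is assumed to be a case-insensitive prefix comparison, and only plain Disallow path prefixes are handled (no Allow lines, no wildcards).

```python
def parse_groups(robots_txt):
    """Map each lowercased user-agent token to its list of Disallow paths."""
    groups = {}
    current_agents = []
    reading_agents = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:  # a new group starts here
                current_agents = []
                reading_agents = True
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            reading_agents = False
            if value:  # an empty Disallow blocks nothing
                for agent in current_agents:
                    groups[agent].append(value)
    return groups


def select_group(groups, crawler):
    """Return the rules of the longest user-agent token matching the crawler."""
    crawler = crawler.lower()
    # Assumption: a group applies when its token is a prefix of the crawler name.
    matches = [a for a in groups if a != "*" and crawler.startswith(a)]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])


def is_allowed(robots_txt, crawler, path):
    """True if no Disallow rule in the selected group is a prefix of the path."""
    rules = select_group(parse_groups(robots_txt), crawler)
    return not any(path.startswith(rule) for rule in rules)


NEWS_ONLY = """\
User-agent: Googlebot
Disallow: /
User-agent: Googlebot-News
Disallow:
"""

print(is_allowed(NEWS_ONLY, "Googlebot", "/story.html"))       # False
print(is_allowed(NEWS_ONLY, "Googlebot-News", "/story.html"))  # True
```

Under the same assumptions, a file that contains only a Googlebot group is also applied to Googlebot-News, because "Googlebot" is then the longest matching token.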
To block different sets of pages from Google web search and Google News:
User-agent: Googlebot
Disallow: /latest_news
User-agent: Googlebot-News
Disallow: /archives
The pages blocked from Google web search and from Google News can be controlled independently. This robots.txt file blocks the most recent news (the URLs in the /latest_news folder) from Google web search, but allows them to appear in Google News. Conversely, it blocks premium content (the URLs in the /archives folder) from Google News, but allows it to appear in Google web search.
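For sites that manage these rules programmatically, a file like the one above could be generated from a per-crawler list of blocked paths. The following is only a minimal sketch: render_robots and the policy mapping are hypothetical names introduced for illustration, and the blank line between groups is a readability choice rather than a requirement.

```python
def render_robots(blocked_paths):
    """Render one robots.txt group per crawler from {user_agent: [paths]}."""
    lines = []
    for agent, paths in blocked_paths.items():
        lines.append(f"User-agent: {agent}")
        if paths:
            lines.extend(f"Disallow: {path}" for path in paths)
        else:
            # An empty Disallow value blocks nothing for this crawler.
            lines.append("Disallow:")
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip("\n") + "\n"


# Hypothetical policy mirroring the example: each crawler gets its own list.
policy = {
    "Googlebot": ["/latest_news"],
    "Googlebot-News": ["/archives"],
}
print(render_robots(policy), end="")
```

Keeping one list per crawler makes it explicit that the two sets of blocked pages are independent of each other.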
To stop pages from being crawled for both Google web search and Google News:
User-agent: Googlebot
Disallow: /
This robots.txt file tells Google that Googlebot, the user-agent for our web search crawler, should not crawl any page of this website. Because no directive has been specified for Googlebot-News, our News search will follow the guidance given to Googlebot and will not crawl pages for Google News.
For some queries, we show results from Google News in a separate box or section on the web results page, alongside our regular search results. We sometimes do the same with Images, Videos, Maps and Products. This is known as Universal Search. Because Google News feeds the “News” results in Universal Search, if you block the Googlebot-News user-agent, your site's news stories will not be included in Universal Search results.
We are currently testing support for the new user-agent. If you see any problems, please let us know. Remember that in certain cases Google may show a link to a page even when we have not crawled that page. If you want to read more about robots.txt files, we offer additional documentation on our website. We hope that webmasters enjoy the flexibility and easier management that the Googlebot-News user-agent offers.
Posted by Jonathan Simon, Webmaster Trends Analyst; translated by Hope, Search Quality team.