Matt Cutts interview: links, robots.txt, nofollow, noindex

If there ever was a groundbreaking, tell-all interview about links, robots.txt, nofollow and noindex, this one by Eric Enge with Matt Cutts is it!

Have a good read, and then reread it.

Matt Cutts and Eric Enge, 24 Sep 2007 (transcript 8 Oct 2007)

Some interesting points:
  • Robots.txt - Google does not support the Crawl-delay directive in robots.txt, because a number of people get the parameters wrong, e.g. telling Google to crawl only one page per month.
  • Pages excluded via robots.txt still accrue Google PR (PageRank).
  • If a page is excluded via robots.txt, it can still appear in the SERPs if people link to it. For the home pages of such sites, the ODP/DMOZ listing title/description may be used.
  • If there is a noindex meta tag on a page, then that page won't be returned in the SERPs.
  • Noindex pages accrue Google PR and can pass it on, even though the pages themselves do not appear in the SERPs.
    - A good use for this would be pages such as sitemaps that you don't want to appear in the SERPs, but that you want to accrue and pass on PageRank.
    - Another is a login page that provides no useful content: add the noindex meta tag and it won't appear in Google, but it will still accrue and pass on Google PR.
  • A nofollow meta tag means that none of the links on the page are followed.
  • A nofollow attribute on a link (rel="nofollow") means that only that link is not followed.
  • Pages on a site that are not important for SEO should have the links to them nofollowed, so as to conserve Google PR.
  • Where pages on a site duplicate the content of other pages on the same site, they can have the noindex meta tag added.
    - It is also good to change all the links that point to those pages, and to use the Yahoo or Google webmaster tools to find external inbound links and try to get them changed as well.
  • Matt Cutts: Well, I have made a promise that my Webspam team wouldn't go to the Google Analytics group and get their data and use it. Search quality or other parts of Google might use it, but certainly my group does not
  • Google toolbar data is not likely to be used for search results - too much noise, and too much bias towards webmaster websites.
  • Hidden Text - 14 different ways of creating hidden text
    - Hidden text philosophy - So, our philosophy has tried to be not to find any false positives, but to try to detect stuff that would qualify as keyword stuffing, or gibberish, or stitching pages, or scraping, especially put together with hidden text.
    - Google notifies webmasters about small incidences of hidden text by email and with an alert in Webmaster Central. The site is likely to get a 30-day penalty (removal from the index) from Google, which can increase if you don't remove the text.
    - "I think Google handles the vast majority of idioms like dynamic menus and things like that very well. In almost all of these cases you can construct interesting examples of hidden text. Hidden text, like many techniques, is on a spectrum. The vast majority of the time, you can look and you can instantly tell that it is malicious, or it's a huge amount of text, or it's not designed for the user. Typically we focus our efforts on the most important things that we consider to be a high priority. The keyword stuffed pages with a lot of hidden text, we definitely give more attention."
  • Reciprocal links - Google has said that "excessive reciprocal linking is bad". It has not said that "reciprocal linking is bad" - just don't be excessive.
    - What is excessive? "People pretty reasonably and pretty quickly came to a fairly good definition of what is excessive and that's the sort of thing where we try to give general guidance so that people can use their own common sense. Sometimes people help other people to know where roughly those lines are, so they don't have to worry about getting too close to them."
  • Links need editorial content around them - "so the links you get with these campaigns [viral marketing/Digg] have some editorial component to them, which is what we are looking for."
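
To make the difference between the robots.txt, noindex and nofollow points above a little more concrete, here is a minimal Python sketch that reads the same signals from a page. It is only an illustration using the standard library, not a description of how Google processes pages: the names RobotsHintParser, summarize, blocked_by_robots_txt and SITEMAP_PAGE are made up for this example, and the sample HTML and robots.txt are assumptions.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsHintParser(HTMLParser):
    """Collects page-level robots directives and per-link nofollow flags."""

    def __init__(self):
        super().__init__()
        self.page_directives = set()  # from <meta name="robots" content="...">
        self.links = []               # (href, has_rel_nofollow) for each <a href>

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <a href>), so normalise to "".
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            for token in attrs.get("content", "").split(","):
                if token.strip():
                    self.page_directives.add(token.strip().lower())
        elif tag == "a" and "href" in attrs:
            rel_tokens = attrs.get("rel", "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel_tokens))


def summarize(html):
    """Report whether a page asks not to be indexed and which links are nofollowed."""
    parser = RobotsHintParser()
    parser.feed(html)
    # A nofollow meta tag applies to every link on the page;
    # rel="nofollow" applies only to the link that carries it.
    page_nofollow = "nofollow" in parser.page_directives
    return {
        "indexable": "noindex" not in parser.page_directives,
        "followed": [h for h, nf in parser.links if not (nf or page_nofollow)],
        "not_followed": [h for h, nf in parser.links if nf or page_nofollow],
    }


def blocked_by_robots_txt(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt text disallows crawling of url for agent."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(agent, url)


# Hypothetical sitemap-style page: kept out of the index, one link nofollowed.
SITEMAP_PAGE = """
<html><head><meta name="robots" content="noindex"></head>
<body>
  <a href="/products">Products</a>
  <a href="/login" rel="nofollow">Log in</a>
</body></html>
"""

if __name__ == "__main__":
    print(summarize(SITEMAP_PAGE))
    print(blocked_by_robots_txt("User-agent: *\nDisallow: /private/", "/private/page.html"))
```

Note that the sketch only reports what the page and robots.txt ask for. Whether a blocked or noindexed page still accrues or passes PageRank is Google's behaviour as described in the interview, and cannot be read from the markup.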