Technical Interview with Matt Cutts - Crawling and Indexing

I have a lot of respect for Eric Enge and he has put together a great summary of a technical interview with Matt Cutts about Crawling and Indexing.

I will highlight a few points below, but Eric's full article is well worth a detailed read.

Matt Cutts quotes:

Matt Cutts: "the number of pages that we crawl is roughly proportional to your PageRank"
Matt Cutts: "One idea is that if you have a certain amount of PageRank, we are only willing to crawl so much from that site. But some of those pages might get discarded, which would sort of be a waste"
Matt Cutts: "Imagine we crawl three pages from a site, and then we discover that the two other pages were duplicates of the third page. We'll drop two out of the three pages and keep only one, and that's why it looks like it has less good content"
Matt Cutts: re: affiliate programs: "Duplicate content can happen. If you are operating something like a co-brand, where the only difference in the pages is a logo, then that's the sort of thing that users look at as essentially the same page. Search engines are typically pretty good about trying to merge those sorts of things together, but other scenarios certainly can cause duplicate content issues"


When I SEO a website, I work very hard at getting rid of duplicate content. Google works hard at identifying it, but there are costs: as the quotes above show, crawl effort spent on pages that are later discarded is wasted, and Google may not reach the decision you would like it to make. So decide for yourself which pages you want to rank for which phrases, and make it easy for Google to come to that same decision.
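
As a rough illustration of finding duplicates yourself before Google has to, here is a minimal Python sketch. It assumes you already have the body text of each page in a dictionary (the `pages` data and its URLs are made up for the example), and it only catches pages whose text is identical after tidying up case and whitespace. It is not a description of how Google detects duplicates.

```python
import hashlib
import re
from collections import defaultdict


def normalise(text):
    """Lower-case the text and collapse whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicate_groups(pages):
    """Group URLs whose normalised body text is identical.

    `pages` maps URL -> extracted body text (a hypothetical input format).
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups with more than one URL are duplicates of each other.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical example data: three URLs, two of which carry the same text.
pages = {
    "/widgets": "Blue widgets for sale.",
    "/widgets?sort=price": "Blue  widgets for SALE.",
    "/about": "About our company.",
}
print(find_duplicate_groups(pages))
```

Each group it reports is a set of URLs where you, rather than Google, should pick the one page that is meant to rank.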

Google drops pages from its index! I don't like the idea of Google dropping anything, and I certainly don't want Google to make up its own mind about what to drop. So duplicate pages can be made unique, left off the site altogether, or given a rel=canonical link that tells Google to index another page instead of this one.
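
To check which page a URL declares as its canonical version, something like the following Python sketch could be used. It relies only on the standard library's `html.parser`; the sample HTML and the `example.com` address are invented for illustration, and a real audit would fetch the pages first.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, in any case.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_urls(html):
    """Return every canonical URL declared in `html`, in document order."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page: a sorted listing that points at the main category URL.
html = """
<html><head>
<title>Blue widgets, sorted by price</title>
<link rel="canonical" href="https://www.example.com/widgets">
</head><body>...</body></html>
"""
print(canonical_urls(html))
```

A page that returns no canonical URL, or more than one, is worth a closer look, because it leaves the choice to Google.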


Matt Cutts: "We do have the ability to execute a large fraction of JavaScript when we need or want to. One thing to bear in mind if you are advertising via JavaScript is that you can use NoFollow on JavaScript links"

"When we need or want to" - another decision point for Google. I would rather not use JavaScript for navigation where there is an alternative, so that Google has no decision to make.

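One way to spot navigation that depends on JavaScript is to list the links that have no ordinary URL in their href. The Python sketch below is only an assumption about what such a check might look like: it treats a missing href, an empty href, "#" or a "javascript:" href as JavaScript-dependent, and it also records whether each link carries rel="nofollow". The sample markup and the `ads.example.com` address are hypothetical.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Sort <a> tags into plain links and links that depend on JavaScript."""

    def __init__(self):
        super().__init__()
        self.plain = []        # (href, has_nofollow)
        self.script_only = []  # (href or None, has_nofollow)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        nofollow = "nofollow" in (attr.get("rel") or "").lower().split()
        # Heuristic (an assumption): these hrefs only work with JavaScript.
        needs_js = (
            href is None
            or href.strip() in ("", "#")
            or href.strip().lower().startswith("javascript:")
        )
        (self.script_only if needs_js else self.plain).append((href, nofollow))


def audit_links(html):
    """Parse `html` and return the populated LinkAudit."""
    audit = LinkAudit()
    audit.feed(html)
    return audit


# Hypothetical navigation markup for illustration.
html = """
<a href="/widgets">Widgets</a>
<a href="javascript:openMenu()">Menu</a>
<a href="#" onclick="go('/sale')">Sale</a>
<a href="https://ads.example.com/click" rel="nofollow">Sponsor</a>
"""
audit = audit_links(html)
print("Plain links:", audit.plain)
print("JavaScript-dependent links:", audit.script_only)
```

Anything in the second list is a candidate for replacing with a plain link, so that the page does not rely on Google choosing to run the script.
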
Matt Cutts: " ... You really want to have most of your pages have actual products with lots of text on them."

Having unique text, and enough of it, on every page is most important. I see so many ecommerce sites that have no text on their category pages and poor SEO on their product pages.

Matt Cutts: re: link juice loss in the case of a domain change: "I can certainly see how there could be some loss of PageRank. I am not 100 percent sure whether the crawling and indexing team has implemented that sort of natural PageRank decay"
Eric Enge Comment: In a follow-on email, Matt confirmed that this is in fact the case. There is some loss of PR through a 301.

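Having correct inbound links and correct internal navigation is rather important: if even a 301 loses some PageRank, links should point straight at the final URL rather than through a redirect. Keep good housekeeping in your website and its linking arrangements.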
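
As a housekeeping aid, the Python sketch below takes a map of 301 redirects and a list of internal links, and reports which links should be updated to point directly at their final destination. The redirect map, the link list and the function names are all hypothetical; in practice the map would come from your server configuration or a crawl.

```python
def resolve(url, redirects, max_hops=10):
    """Follow `redirects` (old URL -> new URL) until a final URL is reached.

    Returns (final_url, number_of_hops). Raises ValueError on a loop or an
    overly long chain.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError("redirect loop or chain too long at " + url)
        seen.add(url)
    return url, hops


def links_to_update(internal_links, redirects):
    """Return {linked URL: final URL} for every internal link that hits a redirect."""
    fixes = {}
    for link in internal_links:
        final, hops = resolve(link, redirects)
        if hops:
            fixes[link] = final
    return fixes


# Hypothetical 301 map and internal links for illustration.
redirects = {
    "/old-widgets": "/widgets-2009",
    "/widgets-2009": "/widgets",
}
internal_links = ["/old-widgets", "/widgets", "/about"]
print(links_to_update(internal_links, redirects))
```

Every entry in the result is an internal link that currently passes through one or more 301s and could be pointed at the final URL instead.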

2 Comments

Girish Chandran - Girish Chandran - Mar 17, 2010

Internal linking architecture is one of the prominent areas to concentrate on when dealing with site optimisation. Matt's words from this interview again emphasize the importance of having quality text around the site. It's so true, and I have seen in person several ecommerce websites with poor content that sometimes don't even have a product title. Pathetic. I am wondering who will put an end to their ignorance. Without correcting such minor things, there is no point in complaining about poor returns from the web :) What do you think?

Bob Jenson - Action Online Web Design - May 5, 2010

Completely agree, especially on the duplicate content points. I keep getting clients who have duplicate content on their old sites.

