I have a lot of respect for Eric Enge, and he has put together a great summary of a technical interview with Matt Cutts about crawling and indexing. The interview is no longer available at the original source (checked March 2018).
I will highlight a few points below, but the full article referenced above is well worth a detailed read.
Matt Cutts quotes:
Matt Cutts: “the number of pages that we crawl is roughly proportional to your PageRank”
Matt Cutts: “One idea is that if you have a certain amount of PageRank, we are only willing to crawl so much from that site. But some of those pages might get discarded, which would sort of be a waste”
Matt Cutts: “Imagine we crawl three pages from a site, and then we discover that the two other pages were duplicates of the third page. We’ll drop two out of the three pages and keep only one, and that’s why it looks like it has less good content”
Matt Cutts: re: affiliate programs: “Duplicate content can happen. If you are operating something like a co-brand, where the only difference in the pages is a logo, then that’s the sort of thing that users look at as essentially the same page. Search engines are typically pretty good about trying to merge those sorts of things together, but other scenarios certainly can cause duplicate content issues”
When I SEO a website, I work very hard at getting rid of duplicate content. Google works hard at identifying it, but there are costs: as the quotes above note, crawled pages that turn out to be duplicates get discarded, which wastes part of the crawl. Google may also not make the decision you would like it to make. So decide for yourself which pages you want to rank for which phrases, and make it easy for Google to come to that same decision.
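As a rough illustration of hunting for duplicates on your own site, here is a minimal Python sketch. It assumes you already have the visible text of each page in a dictionary keyed by URL; the function names and sample URLs are hypothetical. It only catches pages whose text is identical after whitespace and case are normalized, and it is not a description of how Google detects duplicates.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the page text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose normalized text is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical sample data, not taken from any real site.
pages = {
    "/widgets": "Blue widgets for sale.",
    "/widgets?sort=price": "Blue  widgets for sale.",
    "/about": "About our shop.",
}
duplicate_groups = find_duplicate_groups(pages)
```

Each group it returns is a set of URLs where you need to pick the one page you want indexed. Near-duplicates, such as co-branded pages that differ only by a logo or a few words, would need a fuzzier comparison than this exact-match hash.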
Google drops pages from its index! I don’t like the idea of Google dropping anything, and I certainly don’t want Google to make up its own mind about what it should drop. So duplicate pages can be made unique, left off the site, or marked with rel=canonical to tell Google that another page should be indexed instead.
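For anyone who has not used rel=canonical, it is a link element placed in the head of the duplicate page that points at the page you would rather have indexed. A minimal Python sketch of building that tag follows; the helper name and the example.com URL are placeholders of mine, not part of any particular CMS.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    # Escape the URL so characters such as & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


# Placeholder URL for illustration only.
tag = canonical_link_tag("https://www.example.com/widgets")
```

The same tag would go on every variant of the page (for example, sorted or filtered versions), each pointing at the one preferred URL.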
Matt Cutts: “ … You really want to have most of your pages have actual products with lots of text on them.”
Having unique text, and enough of it, on your pages is most important. I see so many ecommerce sites that have no text on their category pages and poorly optimized product pages.
Matt Cutts: re: link juice loss in the case of a domain change: “I can certainly see how [there] could be some loss of PageRank. I am not 100 percent sure whether the crawling and indexing team has implemented that sort of natural PageRank decay”
Eric Enge comment: In a follow-on email, Matt confirmed that this is in fact the case. There is some loss of PR through a 301.
Having correct inbound links and correct internal navigation is rather important. Keep good housekeeping on your website and its linking arrangements, so that links point straight at the live page rather than at an old URL that redirects.
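Since some PageRank is lost through a 301, one piece of housekeeping is to find internal links that still point at redirected URLs and repoint them at the final destination. The Python sketch below assumes you have already exported your redirects as a simple old-to-new mapping and a list of internal link targets; the function names and paths are hypothetical.

```python
def resolve_redirect(url: str, redirects: dict[str, str]) -> str:
    """Follow the redirect map until reaching a URL that does not redirect."""
    seen = set()
    while url in redirects:
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
        url = redirects[url]
    return url


def links_to_update(internal_links: list[str], redirects: dict[str, str]) -> dict[str, str]:
    """Map each internal link target that redirects to its final destination."""
    return {
        target: resolve_redirect(target, redirects)
        for target in internal_links
        if target in redirects
    }


# Hypothetical redirect map and link targets for illustration.
redirects = {"/old-page": "/interim-page", "/interim-page": "/new-page"}
internal_links = ["/old-page", "/new-page", "/contact"]
updates = links_to_update(internal_links, redirects)
```

Here `updates` lists the link targets worth changing in your navigation and page copy. Inbound links from other sites are outside your direct control, so the redirects themselves still need to stay in place for those.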