
Google Indexes Document's First 100k



Garrett
04-05-2004, 05:26 PM
How big are your web pages? If you're creating especially long, text-heavy pages, you might consider breaking them up into smaller pieces: pieces of 100k or less, to be exact, according to GoogleGuy.

Mark Carey reported (http://www.markcarey.com/googleguy-says/archives/google-indexed-first-101k-of-a-document.html) that GoogleGuy said, "we'll typically index the first 101K of a web page -- in practice, more content of a page can be indexed (e.g. PDFs), but if you keep your main content under 100K or so, that's the safest."

Remember that Google's not indexing your images (well, they are, but not in the same index as their web pages), so images don't count toward the limit, and a page whose HTML alone is over 100k is enormous.

If your pages run over 100k without images you should find a way to break them up some. There's a good chance they're hard for your site visitors to navigate anyhow.

If you absolutely have to have more than 100k on a page, make sure the content you want indexed falls within the first 100k.
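Here's a rough way to check that, as a minimal sketch. It assumes the cutoff is counted in bytes of the raw HTML and uses the "100K or so" figure from the quote. The function names and the `marker` parameter are made up for illustration; nothing here reflects how Google actually measures a page.

```python
# Assumed cutoff: the "100K or so" figure from GoogleGuy's quote, counted in bytes.
LIMIT_BYTES = 100 * 1024


def page_size_bytes(html, encoding="utf-8"):
    """Size of the HTML source in bytes (images and other assets are not included)."""
    return len(html.encode(encoding))


def content_within_limit(html, marker, limit=LIMIT_BYTES, encoding="utf-8"):
    """Return True if the first occurrence of `marker` ends within the first `limit` bytes."""
    raw = html.encode(encoding)
    needle = marker.encode(encoding)
    pos = raw.find(needle)
    if pos < 0:
        raise ValueError("marker not found in page")
    return pos + len(needle) <= limit


# A page with its main content buried after about 200k of filler.
late = "<html>" + "x" * 200000 + "<p>main content</p></html>"
# The same content moved to the top of the page.
early = "<html><p>main content</p>" + "x" * 200000 + "</html>"

print(page_size_bytes(late) > LIMIT_BYTES)          # True: the page is over the limit
print(content_within_limit(late, "main content"))   # False: content sits past the cutoff
print(content_within_limit(early, "main content"))  # True: content sits inside the cutoff
```

Pick a `marker` string that only appears in the part of the page you care about, such as the last sentence of your main article text.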

Mark Carey
04-05-2004, 06:55 PM
While the 101K limit has been known for some time, there is debate about whether Google crawls beyond the 101K mark on a page. For example, suppose a page is 150K in size and consists mostly of links. Will Google simply stop crawling the page after 101K, and so not follow the links at the bottom? Or does Google index only the first 101K but still follow the remaining links on the page? I have read claims on both sides of the debate but have never tested it myself. The answer can have a significant impact on large sitemaps. Nobody cares whether an entire 200K sitemap is indexed, but we certainly care that all of the links are crawled.
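To see how many links would be at risk on a given page, you can split its links by whether the opening tag finishes before the 101K mark. This is only a sketch: it assumes the cutoff is measured in bytes of raw HTML, and `links_by_cutoff` is a hypothetical helper name. It says nothing about what Google really does past the mark; it just shows which links sit there.

```python
from html.parser import HTMLParser

# Assumed cutoff: the "101K" figure discussed in this thread, counted in bytes.
CUTOFF_BYTES = 101 * 1024


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def collect_links(html):
    parser = LinkCollector()
    # feed() without close(): a tag cut off at the very end stays buffered
    # and is not reported as a link.
    parser.feed(html)
    return parser.links


def links_by_cutoff(html, cutoff=CUTOFF_BYTES, encoding="utf-8"):
    """Return (links inside the first `cutoff` bytes, links after it)."""
    raw = html.encode(encoding)
    # Drop any multi-byte character that the cut splits in half.
    head = raw[:cutoff].decode(encoding, errors="ignore")
    head_links = collect_links(head)
    all_links = collect_links(html)
    return head_links, all_links[len(head_links):]


# A link-only test page: 6000 links of 24 bytes each, about 144k in total.
page = "".join('<a href="/p/%04d">x</a>\n' % i for i in range(6000))
inside, beyond = links_by_cutoff(page)
print(len(inside), len(beyond))
```

If the second number is zero, the debate doesn't affect that page either way.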

adore
04-06-2004, 03:10 PM
With sitemaps that big, there's another question: whether such a large number of links can be spidered at all. As you probably know, the usual suggestion is to have no more than 100 links on one page. Are the extra links spidered or ignored? It's difficult to say.
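If you'd rather stay under that suggestion than guess, one option is to split the sitemap into several pages. A minimal sketch, assuming the 100-links-per-page figure mentioned above; `chunk_links` and the example URLs are made up for illustration.

```python
# Assumed ceiling: the "no more than 100 links" suggestion mentioned above.
MAX_LINKS_PER_PAGE = 100


def chunk_links(links, per_page=MAX_LINKS_PER_PAGE):
    """Split a list of links into consecutive pages of at most `per_page` links."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return [links[i:i + per_page] for i in range(0, len(links), per_page)]


# Example: 250 article links become three sitemap pages.
all_links = ["/article/%d" % n for n in range(250)]
pages = chunk_links(all_links)
print([len(p) for p in pages])  # [100, 100, 50]
```

Each chunk would then go on its own sitemap page, with the pages linked to one another so a spider can reach all of them.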

Riklaunim
04-14-2004, 08:11 AM
Some time ago (1-2 months) I made a simple HTML page that was something like a site map. I put links to all the articles on my site on it, like this:

- Category
{Blockquote here}-link: small descr.{/blockquote}
{Blockquote here}-link: small descr.{/blockquote}

There were more than 200 dynamic links. The page had only a title and a robots index/follow tag. It got listed on Google and some other search engines. And as I noticed, Google did follow those dynamic links, indexing links to forum categories etc. :)