
Google Announces 6 Billion Item Index



Brittany
02-17-2004, 09:23 AM
Google, the search engine named after the mathematical term for a very large number ("googol"), announced today that it has expanded its index to include a very impressive 6 billion items.

“People worldwide can find more information with Google than with any other search engine,” boasts Larry Page, Google co-founder and president of Products.

The 6 billion “items” in the Google web index comprise 4.28 billion web pages, 845 million Usenet messages, a set of book-related information pages, and an image index that has doubled to over 880 million images.
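As a quick sanity check on the headline figure, the three component counts quoted above can be added up in Python. The book-related pages are left out because no count is given for them, so under that assumption this is a lower bound on the total.

```python
# Component counts quoted in the announcement (book-related pages omitted: no count given).
web_pages = 4_280_000_000
usenet_messages = 845_000_000
images = 880_000_000

total = web_pages + usenet_messages + images
print(f"{total:,}")  # 6,005,000,000
```

So the three listed collections alone already account for the "6 billion items".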

As if those numbers aren’t impressive enough, Google offers a wide variety of options for its searchers. Priding itself on relevant results, Google, which swept the 2004 Search Engine Watch Awards, offers services including: Google Web Search, which implements “powerful and scalable” technology and is capable of searching non-HTML file formats such as PDF, Microsoft Office, and Corel; Google Image Search, with advanced features allowing users to search by file size, format, coloration, and more; Google Groups, a 20-year Usenet conversation archive; and the newest feature, Google Print, a service allowing users to access book-related information.

This is just the latest accomplishment by the world’s most popular search engine in an effort to keep its searchers coming back for more. Interestingly, the March 2004 issue of Technology Review questions how long Google will be able to stay at the top of the search engine world, with competitors such as Microsoft scrambling to find the next breakthrough search technology.

Any thoughts?

Mel
02-17-2004, 10:42 AM
As with all Google press releases, I tend to take this with a grain of salt.

There is no mention of the fact that many of the pages Google indexes are in its supplemental index, which apparently is not routinely used for results but only for "difficult searches". IMO that makes it a second-class index: inclusion in the supplemental index seems to put a page at the end of the queue when rankings are being dished out.

Last month Google was saying that it had an index of 3.5 million pages yet it was reporting 5.4 billion pages returned for the search term +the. (Kind of hard to return more pages than you have in your index IMO). Tonight the Google home page shows that they are indexing 4,285,199,774 pages yet a search on +the returns 5,480,000,000 pages.

As always Google likes to keep us guessing rather than providing straight information.
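For anyone who wants to check Mel's point, here is the arithmetic on the two figures he quotes, as a quick Python sketch. It assumes both numbers are exactly as reported on the home page and in the results header; Google's result counts are estimates, so this only measures the size of the gap, not its cause.

```python
# Figures quoted in the post above.
indexed_pages = 4_285_199_774    # count shown on the Google home page
results_for_the = 5_480_000_000  # estimated result count for "+the"

# How far the result estimate overshoots the advertised index size.
excess = results_for_the - indexed_pages
ratio = results_for_the / indexed_pages

print(f"Excess results: {excess:,}")             # Excess results: 1,194,800,226
print(f"Results per indexed page: {ratio:.2f}")  # Results per indexed page: 1.28
```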

minstrel
02-17-2004, 11:53 AM
Any way you slice it, that's still a LOT of pages.

As for the supplemental index being "second class", I'm not sure I understand that - seems to me if those are not needed for routine searches but only for "difficult searches", most are probably secondary pages on sites which are already in the primary index.

BOBW
02-17-2004, 06:26 PM
I know Google is the best when it comes to relevancy of results, and I am very glad to be listed in Google. I know that Google is the default search engine for many ISPs because it works. I am new to this game, but have studied it from the sidelines for years.

I am ranking well in Google.
I was coming up on top in Yahoo, but now my site has vanished from Yahoo. I am not listed in MSN, Lycos, or Yahoo. I have been listed in the ODP, but only for two weeks now. I don't know whether updates to the ODP database will help or not. Let's hope the posting about it hurting is not true.

QUESTION for experts:
What is the best avenue for paid inclusion for Lycos, MSN, Yahoo, etc?

Mel
02-17-2004, 08:03 PM
Any way you slice it, that's still a LOT of pages.

As for the supplemental index being "second class", I'm not sure I understand that - seems to me if those are not needed for routine searches but only for "difficult searches", most are probably secondary pages on sites which are already in the primary index.
It's not that they are not needed, just that they are not used. A search engine IMO should give equal opportunity to all the pages it indexes, not exclude some pages from the primary search results.

There seems to be no rhyme or reason to the way sites are included in the supplemental index. One jewelry site, for instance, may have all its pages in the primary index, while another jewelry site may have all its pages in the supplemental index. Since there are plenty of results for jewelry terms, if your site is put into the supplemental index you are never going to rank for any keywords no matter how good the site is, since the index containing your pages is never included in rankings for your keywords.

This is what I mean by second class - not having equal opportunity to rank, all other things being equal.

minstrel
02-17-2004, 08:54 PM
How does one determine which sites are in the primary index and which in the supplementary index (I admit I've never heard of the latter)? And how or why are sites chosen for one versus the other?

What you say seems to me to be a blatant contradiction of the founding principle of Google, viz., a democratic search engine.

greeneagle
02-17-2004, 11:59 PM
An Associated Press Business Writer answered most of the questions in this thread within today's news article about Google.

"Google adds 1B more pages to Web Index"

I guess it takes a little preparation to list 1 billion new web pages, not to mention the SERP jockeying. Obviously they are taking the heightened competition from MSN, Yahoo and numerous startups very seriously. A 25% - 35% increase in indexed web pages is quite a bit to digest at once. I don't think the ride is over on this one. No wonder Page Ranking activities have slowed down.

Off to the races - Maybe this should be dubbed the "Kentucky" update.

Here's the story (http://seattlepi.nwsource.com/business/aptech_story.asp?category=1700&slug=Google%20Expansion)

Ken

cbp
02-18-2004, 12:29 AM
I was just driving home from work and it's made the radio news!!!! The radio treated it as big news that Google reached 6 billion... or it could be what they call a "slow news day".

CBP

ldyguique
02-18-2004, 07:05 AM
Google Adds 1 Billion More Pages to Web Index (http://www.washingtonpost.com/wp-dyn/articles/A48855-2004Feb17.html)


Google's search engine now spans 4.28 billion Web pages, up from 3.3 billion pages earlier this week. The Mountain View, Calif.-based company also said it has enlarged its index of Web images to 880 million, up from slightly more than 400 million.

Even with the expansion, Google still isn't close to capturing the constantly expanding constellation of online content. By some estimates, there are 10 billion pages on the Web.


Groan. . .do you have any idea just how much of that is pure crud? It's like email spam. I get maybe 4 or 5 emails per day that are useful, but I receive well over 200/day. At least I don't have to look at all of the crud on the net before arriving at places I might want to be. I just have to look at some of it.
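Taking the Washington Post figures at face value, here is a rough Python check of the "25% - 35%" estimate earlier in the thread and of the image-index growth. The 400 million image baseline is an assumption: the article says "slightly more than 400 million", so the true growth factor is a bit under the value computed here. The 10 billion total is the article's rough estimate, not a measured figure.

```python
# Figures from the Washington Post article quoted above.
pages_before, pages_after = 3_300_000_000, 4_280_000_000
images_before, images_after = 400_000_000, 880_000_000  # baseline was "slightly more than 400 million"
estimated_web_pages = 10_000_000_000                     # "by some estimates"

page_growth = (pages_after - pages_before) / pages_before
image_growth = images_after / images_before  # an upper bound, given the baseline above
coverage = pages_after / estimated_web_pages

print(f"Web page index growth: {page_growth:.1%}")       # 29.7%
print(f"Image index growth factor: {image_growth:.1f}x")  # 2.2x
print(f"Share of the estimated Web: {coverage:.0%}")     # 43%
```

That works out to roughly 30% more web pages, inside the 25% - 35% range, an image index that roughly doubled, and a bit under half of the estimated 10 billion pages.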

cbp
02-18-2004, 07:10 AM
http://www.webproworld.com/viewtopic.php?t=14169

BOBW
02-18-2004, 03:01 PM
Back in Yahoo today. It must have been a Snafooooo. Yahoo is showing older database results. They were definitely pulled from Google prior to the latest changes. Google is different, and I am trying to optimize for a couple of key phrases that have fallen out of the results. Will report findings.

BOBW
02-18-2004, 03:07 PM
I have a theory I want to float. I think it is plausible that Google is nixing some geographical terms such as "county" in favor of paid advertisers. I took a look at the top 20 or so on specific results, and most of them didn't even use meta tags. Also, the keyword saturation in the content was not consistent with the results. The URL vs. meta vs. page name made no sense either. I can only speculate that paid inclusion is working on some search strings.

BOBW
02-18-2004, 03:09 PM
I give up....Now I am ranking high in county searches.

Jan Shepherd
02-18-2004, 03:58 PM
The day before Yahoo! search goes live, wouldn't you say? Particularly considering the results.

You know that old Chinese curse "May you live in interesting times"? I reckon we are...and it may just turn out to be a blessing!!

Regards

rambodog
02-18-2004, 04:33 PM
This appears to be more than hype. I have a large number of well-optimized, legitimate business clients whose sites were decimated by the Florida update (dropping from the top 10 into complete obscurity) and that have roared back into the top 20 today.

It will need more analysis but big changes appear to be afoot again.

rambodog
02-18-2004, 04:36 PM
I have a substantial number of clients with business sites that went from top 10 to obscurity with the Florida update that have moved back into top 20 and top 30 positions this afternoon (2/18). This looks to be a substantial change to the rankings.
Note: the changes are currently shifting in and out of the results, and there is no significant change in inbound links at this point.

Mel
02-18-2004, 11:26 PM
How does one determine which sites are in the primary index and which in the supplementary index (I admit I've never heard of the latter)? And how or why are sites chosen for one versus the other?

What you say seems to me to be a blatant contradiction of the founding principle of Google, viz., a democratic search engine.

If a page is in the supplemental index, it will have the word "Supplemental" in green in the SERPs. You will seldom (if ever) see that in regular search SERPs, but it is often seen in allinurl: searches.

I have no idea nor can I find any information regarding how Google decides to put a page in the regular or supplemental index. The only information Google has seems to be at http://www.google.com/help/interpret.html and says only this:

Supplemental Result
Google augments results for difficult queries by searching a supplemental collection of web pages. Results from this index are marked in green as "Supplemental."

But since many (most?) of the pages in the supplemental index appear to be regular pages relevant to normal terms it appears to me that such pages have little chance of making it into the rankings.

As a concrete example, I know of a jewelry site which has roughly 1/3 of its indexed pages in the supplemental index and the rest in the regular index.

What makes it confusing is that some of the most important and content-filled pages (such as one which is one of the best pages on the web for information on diamonds) are in the supplemental index, while some minor pages are in the regular index.

Now since Google only uses supplemental pages to augment results for "difficult" queries, and there are lots of diamond information pages on the web, this page might as well not be indexed at all as be in the supplemental index.

minstrel
02-18-2004, 11:54 PM
As a concrete example, I know of a jewelry site which has roughly 1/3 of its indexed pages in the supplemental index and the rest in the regular index.

What makes it confusing is that some of the most important and content-filled pages (such as one which is one of the best pages on the web for information on diamonds) are in the supplemental index, while some minor pages are in the regular index.

Now since Google only uses supplemental pages to augment results for "difficult" queries, and there are lots of diamond information pages on the web, this page might as well not be indexed at all as be in the supplemental index.
That actually tends to support what I was suggesting earlier: that Google uses something to identify pages on a site as central or more important and puts those in the primary directory, while other pages from that site go into the supplemental directory. This would be entirely consistent with their "democratic" view of their search engine. As an example, I have yet to see ALL of the pages on my site show up in a Google search, although which ones show up and which ones don't varies over time.

It would be different if there were some entire sites dumped into the supplemental directory - that would be a contradiction of the "principle of democracy" that is Google's goal.

Mel
02-19-2004, 12:58 AM
As a concrete example, I know of a jewelry site which has roughly 1/3 of its indexed pages in the supplemental index and the rest in the regular index.

What makes it confusing is that some of the most important and content-filled pages (such as one which is one of the best pages on the web for information on diamonds) are in the supplemental index, while some minor pages are in the regular index.

Now since Google only uses supplemental pages to augment results for "difficult" queries, and there are lots of diamond information pages on the web, this page might as well not be indexed at all as be in the supplemental index.
That actually tends to support what I was suggesting earlier: that Google uses something to identify pages on a site as central or more important and puts those in the primary directory, while other pages from that site go into the supplemental directory. This would be entirely consistent with their "democratic" view of their search engine. As an example, I have yet to see ALL of the pages on my site show up in a Google search, although which ones show up and which ones don't varies over time.

It would be different if there were some entire sites dumped into the supplemental directory - that would be a contradiction of the "principle of democracy" that is Google's goal.

Well, there are entire sites in the supplemental index too, but look at the bolded portion above: on a site that sells diamond rings, Google has placed the page that tells you how to choose a good diamond in the supplemental index. If they have such a mechanism to determine important pages (I have never heard this idea broached before), then it's broken IMO.

minstrel
02-19-2004, 01:11 AM
But of course your definition of minor vs. major isn't based on the same criteria as Google's - in your case, you're probably looking at content, while Google is probably looking at links relative to content and other factors. I would guess that this could be affected by how well the page is optimized...

Re: entire sites in the supplemental index -- can you show me one or two? I'd be interested in seeing what those sites look like...

Mel
02-19-2004, 02:42 AM
Well, you can imagine things like that, Minstrel, but then that might not be the case, no?

In this case, this page is linked to from every page on an 1,800-page site, shows 95 good links from external sites at Alltheweb (Zeal, for instance), and just happens to be well optimized and possibly one of the most important pages on the site.

This is a site that sells diamond rings and this is the page that tells you how to inspect diamond rings and how to determine the quality and read the certificates.

Now it may be that Google is not clever enough to figure that all out, but IMO if you can't figure out which are the most important pages to a consumer, then that is a very good argument for not having a superior and an inferior index.

minstrel
02-19-2004, 03:08 AM
Well, you can imagine things like that, Minstrel, but then that might not be the case, no?
No need to get shirty, Mel... I'm just asking a question.

For a site that is indexed in the primary directory, you may well disagree with certain pages being shifted to the secondary directory but that's a bit like arguing that site #9 or #17 in Google's search results is more relevant than the site listed #1.

That isn't the point, though. In the interests of speed alone, I can see why Google might want to shift some of its indexed pages into a supplementary directory. If they shift whole websites into the supplementary directory, that would be a very different thing. That's why I'm wondering about those instances you mentioned where this occurred: what is different about those websites that shuffles them into the supplementary directory? Are they old? Out of date? Penalized? What?