Analysis of Google's new algo



cbp
12-29-2003, 07:31 PM
I have just had a read of this analysis from Atul Gupta at SEORank:
http://www.seorank.com/analysis-of-hilltop-algorithm.htm

It is worth a careful read and consideration. It does seem to explain a lot of what we are seeing.

The author does not believe that there is a 'filter' or 'money list keywords'.

He also speculates that 20% of weight is given to 'on page' factors (relevance score) and 40% is given to PR - BUT, now 40% weighting is given to a Local Score from the Hilltop algo.

What say you?

CBP

minstrel
12-29-2003, 08:24 PM
Interesting article, cbp - I just gave it a quick scan and bookmarked it for later but my initial reaction is that it sounds like it may be an explanation that "fits the data" we've seen...

minstrel
12-29-2003, 08:50 PM
I'm reading more of this article now, while multitasking between making supper, feeding the pets, and doing laundry (I know - not the best formula for concentration but I'm used to it).

This bit is fascinating:

Our analysis of the above formula and Google behavior indicates that the total weight distributed to the 3 components (RS group, PR group and LS group) is as follows –

RelevanceScore = 20%, PageRank = 40%, LocalScore = 40%

Where:

RS is the translation of all SEO efforts
PR is the translation of Link-building efforts
LS is the translation of links from the expert documents

With this implementation, Google has shifted significant weight to the off-page factors, taking away ranking control from webmasters. As you can see, there is a fairly low score level available to gain just from your SEO efforts. If an average SEO expert is able to leverage 10% of this weight and a super expert SEO can leverage 18% of this weight, the total difference in ranking between an average SEO and a great SEO is just about 8%.
And for those who argue that meta tags and such are irrelevant to today's search engines:


New Google Ranking Formula = {(1-d)+a (RS)} * {(1-e)+b (PR * fb)} * {(1-f)+c (LS)}

Where:

RS = RelevanceScore: (Score based on keywords appearing in Title, Meta tags, Headlines, Body text, URL, Alt text, Title attribute, anchor text etc. of your site)
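
For anyone who wants to play with the numbers, here is a rough Python sketch of the quoted formula. The formula shape and the 20/40/40 weights come from the article; the function name, the 0-to-1 score scale, the choice of d, e, f equal to the weights, and fb defaulting to 1.0 are all assumptions made for illustration, not anything the article or Google confirms.

```python
def speculative_rank_score(rs, pr, ls, a=0.2, b=0.4, c=0.4,
                           d=0.2, e=0.4, f=0.4, fb=1.0):
    """Evaluate {(1-d)+a*RS} * {(1-e)+b*(PR*fb)} * {(1-f)+c*LS}.

    rs, pr, ls are assumed to be normalised to the range 0..1.
    a, b, c are the article's speculated weights (20%, 40%, 40%).
    d, e, f and fb are not defined in the quote; the defaults are guesses.
    """
    relevance_term = (1 - d) + a * rs
    pagerank_term = (1 - e) + b * (pr * fb)
    local_term = (1 - f) + c * ls
    return relevance_term * pagerank_term * local_term


# Same links, different on-page skill: an "average" SEO using half of the
# relevance weight versus a "super expert" using 90% of it.
average_seo = speculative_rank_score(rs=0.5, pr=0.5, ls=0.5)
expert_seo = speculative_rank_score(rs=0.9, pr=0.5, ls=0.5)
print(average_seo, expert_seo)
```

Under these assumed defaults, on-page work can only move the first bracket between 0.8 and 1.0, which is the article's point about the limited room left for SEO efforts alone.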

ronniethedodger
12-29-2003, 09:10 PM
At first glance (I just skimmed it) I am wondering how this Hilltop Algo actually does come up with these "expert" pages.

Outside of known expert pages such as DMOZ, Yahoo Directory, etc...there is a lot to still be had. If the Hilltop Algo is used to determine these missing pages sometime in the future, the prospect of it is a little scary.

There is no mention of it in the article...but Google does track user "back" clicks. It also tracks which results get clicked on. How that may impact in determining expert pages was not addressed.

To some extent, the wild roller coaster effect of some of the two-word phrases is explained by continual and daily habitual checking of results. It also would explain spam appearing to be lodged on page one and not moving at all too. The article did mention the possibility of a spam page being picked up as an "expert" page....and that is the scary part.

cbp
12-30-2003, 11:58 PM
If there is any truth to what is proposed in the article then, as Ron pointed out, what are the "expert pages" ...

Just found this from the original Hilltop document:
http://www.cs.toronto.edu/~georgem/hilltop/

Our approach is based on the same assumptions as the other connectivity algorithms, namely that the number and quality of the sources referring to a page are a good measure of the page's quality. The key difference consists in the fact that we are only considering "expert" sources - pages that have been created with the specific purpose of directing people towards resources. In response to a query, we first compute a list of the most relevant experts on the query topic. Then, we identify relevant links within the selected set of experts, and follow them to identify target web pages. The targets are then ranked according to the number and relevance of non-affiliated experts that point to them. Thus, the score of a target page reflects the collective opinion of the best independent experts on the query topic.


We define an expert page as a page that is about a certain topic and has links to many non-affiliated pages on that topic
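
The steps in that quote can be sketched roughly in Python. This is only an illustration of the idea as quoted: the function name, the input shapes, and the rule of letting each affiliated group of experts count once are assumptions, and the real paper has more detail than this.

```python
from collections import defaultdict


def rank_targets(experts, affiliated):
    """Rank target hosts by the experts that point to them.

    experts:    {expert_host: (relevance_to_query, [target_host, ...])}
    affiliated: function(host_a, host_b) -> bool (hypothetical helper)
    Returns a list of (target_host, score), best first.
    """
    scores = defaultdict(float)
    counted = defaultdict(list)  # target -> experts already credited

    # Visit the most relevant experts first, so each affiliated group is
    # represented by its strongest member.
    by_relevance = sorted(experts.items(),
                          key=lambda item: item[1][0], reverse=True)
    for expert, (relevance, targets) in by_relevance:
        for target in set(targets):
            if affiliated(expert, target):
                continue  # an expert's link to its own affiliate is ignored
            if any(affiliated(expert, other) for other in counted[target]):
                continue  # only independent experts add to the score
            counted[target].append(expert)
            scores[target] += relevance
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up example hosts; "shop-links.example" belongs to the shop itself.
def same_owner(a, b):
    return a == b or {a, b} == {"shop-links.example", "shop.example"}


experts = {
    "dir-a.example": (0.9, ["shop.example", "blog.example"]),
    "dir-b.example": (0.5, ["shop.example"]),
    "shop-links.example": (0.8, ["shop.example"]),
}
ranking = rank_targets(experts, same_owner)
```

Here the shop's own links page adds nothing to the shop's score, while the two independent directories both count.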

CBP

ronniethedodger
12-31-2003, 12:20 AM
The key difference consists in the fact that we are only considering "expert" sources - pages that have been created with the specific purpose of directing people towards resources.

We define an expert page as a page that is about a certain topic and has links to many non-affiliated pages on that topic.


If what the article already identified as some of the possible expert sources to be DMOZ, Yahoo, and I can't remember all of the ones cited....then I do not think this Hilltop Algo has been implemented yet.

The article "suggests" that 40% of the weight would be Hilltop. Since I am not in any of these directories, nor am I anything at all really...then one would assume that I would also not place well in any results....right? But the complete opposite of that is happening.

The second definition I quoted above (I don't know where that is exactly in the text or in what context it was taken out of), but it almost sounds like a link page to me. In fact it is the definition of a link page. It could also be the definition of a directory....but what is the difference between the two really?

I don't know yet. Give me a beer, and feed us some more links cbp. ;0)

minstrel
12-31-2003, 12:23 AM
We define an expert page as a page that is about a certain topic and has links to many non-affiliated pages on that topic.
I'd be interested in knowing precisely how they define "non-affiliated pages"...

cbp
12-31-2003, 12:35 AM
Ron wrote:

If what the article already identified as some of the possible expert sources to be DMOZ, Yahoo, and I can't remember all of the ones cited....then I do not think this Hilltop Algo has been implemented yet.

The expert site stuff is intriguing - as if this algo or a version of it has been implemented, then links from expert sites are more important (I will have to revise my opinion of the importance of a DMOZ link :-) ... sites like minstrel's http://www.psychlinks.ca/ may also take on considerably more importance in this version of the algo, as his site could be considered an expert hub/authority site because of its links to resources ... at least that is my understanding of the "expert" site - the problem now is how does the algo identify them?

minstrel wrote:

I'd be interested in knowing precisely how they define "non-affiliated pages"...

The original Hilltop document says:

We define two hosts as affiliated if one or both of the following is true:
They share the same first 3 octets of the IP address.
The rightmost non-generic token in the hostname is the same.
We consider tokens to be substrings of the hostname delimited by "." (period). A suffix of the hostname is considered generic if it is a sequence of tokens that occur in a large number of distinct hosts. E.g., ".com" and ".co.uk" are domain names that occur in a large number of hosts and are hence generic suffixes. Given two hosts, if the generic suffix in each case is removed and the subsequent right-most token is the same, we consider them to be affiliated.

E.g., in comparing "www.ibm.com" and "ibm.co.mx" we ignore the generic suffixes ".com" and ".co.mx" respectively. The resulting rightmost token is "ibm", which is the same in both cases. Hence they are considered to be affiliated. Optionally, we could require the generic suffix to be the same in both cases.

The affiliation relation is transitive: if A and B are affiliated and B and C are affiliated then we take A and C to be affiliated even if there is no direct evidence of the fact. In practice some non-affiliated hosts may be classified as affiliated, but that is acceptable since this relation is intended to be conservative.

I guess this is one way of dealing with the "networks" of sites set up for spam reasons..
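
The two rules and the transitive step quoted above can be sketched in Python like this. The short list of generic suffixes and all function names are assumptions for the example; a real list of generic suffixes would be far longer, and only IPv4 addresses are handled.

```python
# Assumed sample of "generic" suffixes; not exhaustive.
GENERIC_SUFFIXES = {"com", "org", "net", "co.uk", "co.mx"}


def rightmost_non_generic_token(hostname):
    """Return the token just left of the generic suffix, e.g. 'ibm'."""
    tokens = hostname.lower().split(".")
    for size in (2, 1):  # try two-token suffixes such as "co.uk" first
        if len(tokens) > size and ".".join(tokens[-size:]) in GENERIC_SUFFIXES:
            return tokens[-size - 1]
    return tokens[-1]  # no known generic suffix: fall back to the last token


def directly_affiliated(host_a, ip_a, host_b, ip_b):
    """True if the hosts share the first 3 IP octets or the same token."""
    same_block = ip_a.split(".")[:3] == ip_b.split(".")[:3]
    same_token = (rightmost_non_generic_token(host_a)
                  == rightmost_non_generic_token(host_b))
    return same_block or same_token


def affiliation_groups(hosts):
    """hosts: {hostname: ip}. Returns {hostname: group representative}.

    Uses union-find so that affiliation is transitive, as in the quote.
    """
    parent = {host: host for host in hosts}

    def find(host):
        while parent[host] != host:
            parent[host] = parent[parent[host]]  # path compression
            host = parent[host]
        return host

    names = list(hosts)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if directly_affiliated(a, hosts[a], b, hosts[b]):
                parent[find(a)] = find(b)
    return {host: find(host) for host in hosts}


# Made-up IP addresses for illustration only.
groups = affiliation_groups({
    "www.ibm.com": "9.1.2.3",
    "ibm.co.mx": "200.1.1.1",
    "partner.org": "200.1.1.77",   # shares 200.1.1.x with ibm.co.mx
    "other.net": "10.0.0.1",
})
```

In this example www.ibm.com and partner.org end up in the same group even though they share neither a token nor an IP block, purely through ibm.co.mx. That is the "conservative" behaviour the document describes.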

CBP

minstrel
12-31-2003, 12:45 AM
In practice some non-affiliated hosts may be classified as affiliated, but that is acceptable since this relation is intended to be conservative.
Is it just me, or does this sound a little like the Texas formula?

"We'd rather execute an innocent man than take a chance on letting a guilty one off the hook."

ronniethedodger
12-31-2003, 01:03 AM
The expert site stuff is intriguing - as if this algo or a version of it has been implemented, then links from expert sites are more important (I will have to revise my opinion of the importance of a DMOZ link :-)

That is what I have NOT seen any evidence of. I don't even have a link from that other expert hub of Minstrel's either.


The original Hilltop document says:
We define two hosts as affiliated if one or both of the following is true:
They share the same first 3 octets of the IP address. blah...blah...etc.

I guess this is one way of dealing with the "networks" of sites set up for spam reasons..


This part I have read about quite some time ago. One SEO paper I read addressed this...referred to it as the Class 3 Block of an IP address (the first three octets, more often called the Class C block).

In the article he used a Widget Company as an example of linking their various sites to build up link popularity. Each site was on different Web Hosts, so that they did not have the possibility of being in the same Class 3 IP block.

There were 5 sites total. Each dealt with the Widget Keyword, but each site was not related per se. One dealt with Ordering Widgets thru an Online store, another dealt with Widgets How-to's and uses (but did not sell anything), one was the main Widget HQ Corporate Site, etc.

The crux of the article dealt with the Class 3 block though. How search engines were already savvy to the spamming aspect. It went into detail on how to build these sites on other hosts and build the sites to even appear that they were not even the same company.

I have seen this type of setup already in place on some sites. One of our competitors uses it...they have one main product per site and advertise it over the others -- but the other products are low-key and have links to the other sites. They are all linked together. Heck these guys have pages for stuff that they do not even sell, complete with product write ups and pricing (always out of stock).

But the point is.....what if one of these sites were picked up or chosen to be an expert site? There has to be more to it than just looking at the IP address to determine what is considered a non-affiliated site.

ronniethedodger
12-31-2003, 01:19 AM
In practice some non-affiliated hosts may be classified as affiliated, but that is acceptable since this relation is intended to be conservative.
Is it just me, or does this sound a little like the Texas formula?

"We'd rather execute an innocent man than take a chance on letting a guilty one off the hook."

I think you are looking at the affiliation aspect in the wrong light.

The only time they would look at affiliation is if the links that are pointing to an expert site are from the same sub-domain or class 3 IP block. In that case they will not use those links to rank the expert site with...they are thrown out.

The way I read it is there are expert sites selected. How that process is determined is still unclear.

Once the expert sites are selected...then they get ranked based on non-affiliated links that point to that site (somewhat like backlinks and PR).

Now if two expert sites with outbound links to your site add up to 36 pts (who knows what the point system is) and one expert site with an outbound link to your competitor gives him 48 pts (cuz this expert site is the 800 lb gorilla of expert sites) then your competitor has 12 more pts at 40% which is 4.8 more pts than you in the results.

The points that each expert site hands out are based on its ranking, which is determined by backlinks from non-affiliated host sites.
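
The arithmetic in that example, written out as a small Python sketch. The point values are the made-up ones from the post, the 18/18 split between the two experts is assumed, and the 40% figure is the article's speculated weighting, so this only checks the sums and says nothing about how Google scores anything.

```python
LOCAL_SCORE_WEIGHT = 0.40  # the article's speculated Hilltop weighting


def weighted_local_score(expert_points):
    """Sum the points handed out by expert sites and apply the weight."""
    return sum(expert_points) * LOCAL_SCORE_WEIGHT


yours = weighted_local_score([18, 18])  # two experts, 36 pts in total
theirs = weighted_local_score([48])     # one heavyweight expert, 48 pts
gap = theirs - yours                    # 12 pts at 40%, about 4.8
print(round(gap, 2))
```

So one link from a very strong expert can outweigh two links from weaker ones, which is the worry raised above.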

cbp
12-31-2003, 01:30 AM
Ron wrote:

That is what I have NOT seen any evidence of.

It comes back to finding a theory to fit the facts. The theory linked in the first message of the thread can fit the facts - a lot of other theories can be used to fit the facts as well - there is the OOP (over optimization penalty) theory; the money words filter theory; the applied semantics theory; the stemming theory etc - there has been intelligent discussion on all of them and intelligent debunking of them all, including the one here (I would not say 'disprove' yet).

The theories are not necessarily stacking up when tested in practice.

The problem is, some read a theory, take it as fact and go off "half cocked" and things take on a life of their own. (eg I have seen advice given that, to get back on top of the rankings, you need to replace all your H1 tags with H4's !!!! - it's being asked about and repeated more often recently). We have seen other examples of this in this forum.

The problem is what am I supposed to do with my sites ... so far I have done nothing, but none of the sites were affected by Florida, but if I want to do better what can I do? How do I and others know what is good advice and what is speculation - it's not easy.

My interest in the particular theory in this thread is that I do have a lot of respect for its author and it does make a lot of sense (this does not mean it is right). If it is right, I need to get my site listed on more authority/expert/hub sites (as defined by the Hilltop document) rather than just get as many links as I can - this is something I can do and will not harm me (even if the theory is wrong) (but if I followed the advice to change my H1 to H4 - this had the potential to harm me and was bad advice anyway).

CBP

minstrel
12-31-2003, 01:39 AM
I agree, cbp - we need to remember that much of what we're reading are educated guesses but I do like the sense of the article you found.

My feeling is that even if Hilltop isn't it, the trend for Google will be toward something like it - some way of changing the weighting on links pointing to a site to better evaluate what those pointers actually mean in determining where to rank that site.

ronniethedodger
12-31-2003, 02:14 AM
Right Minstrel...it still boils down to links. It is the one constant in every one of the theories except maybe the money word theory. But even that theory had speculation that it applied to high concentration of backlinks with those words in the link text itself.

The Hilltop theory though is the scarier of the bunch when you look at it in its possible worst-case scenario -- what or who the expert sites will be and how they are determined.

It could be, like you said, that certain aspects of Hilltop are being incorporated into the Google algo or it is being applied after the fact as some type of filter.

I am following RL's thread too, and the changes at the datacenters. Seems something is shaking up there. It will be interesting to see how that all relates to this and fits in with the theory.

cbp
01-01-2004, 02:30 AM
I post this URL in good faith:
http://www.vaughns-1-pagers.com/google-florida-chart.htm

It is another perspective on the update. I post it with a little reluctance. It contains an interesting look at it - the chart needs to be read in conjunction with the table below.

I posted it with reluctance as the author has done just what we discussed above - taken ideas and theories and presented them as fact. There are some aspects that I disagree with, but he does incorporate the Hilltop algo.

Good luck trying to understand it !!

CBP

minstrel
01-01-2004, 12:42 PM
I posted it with reluctance as the author has done just what we discussed above - taken ideas and theories and presented them as fact. There are some aspects that I disagree with, but he does incorporate the Hilltop algo.
Yes, I don't mean this as a criticism of your decision, cbp, but I probably wouldn't have bothered to post the URL - I started trying to figure out what he was saying but by the time I saw the second reference to Scroogle (you may recall these people as the "Google shot President Kennedy and created all those crop circles" conspiracy theorists - see thread in Search Engines Insider Reports) and to "theories"/rumours propagated by Scroogle my reaction was, "why bother?"

cbp
01-05-2004, 12:51 AM
Another theory to fit the facts:
http://www.seoresearchlabs.com/seo-research-labs-google-report.pdf (this is a .pdf file)

What say you?

CBP

minstrel
01-05-2004, 01:07 AM
What say you?
What say me? I say another excellent find, cbp. It is refreshing to see common sense in the middle of the ocean of hysteria and gossip that has swirled around Google over the past couple of months. Not only that but Dan Thies tells you straight out when he's offering a wild guess - and my opinion is that even his wild guesses are a huge step forward from the scroogle conspiracy theories.

Even if you only read the section on "rumors that have to be stopped", this 100kb download is worth the time. For anyone currently having panic attacks over Google changes, this article should hit you like a fast-acting, under the tongue Ativan.

cbp
01-05-2004, 01:14 AM
The refreshing thing for me about his approach is that, even if he is wrong and it's really the Hilltop algo referenced in my first message, the practical things to do are pretty much the same for both theories - ie RELEVANT LINKS and LINKS from AUTHORITY SITES (combined with the appropriate on-page stuff to make the site relevant to keywords).

CBP

ronniethedodger
01-05-2004, 01:18 AM
The refreshing thing for me about his approach is that, even if he is wrong and it's really the Hilltop algo referenced in my first message, the practical things to do are pretty much the same for both theories - ie RELEVANT LINKS and LINKS from AUTHORITY SITES (combined with the appropriate on-page stuff to make the site relevant to keywords).


Still have to determine what those Authority Sites are. It is the only unknown out of our control, unless someone already does control it....but then that would be another conspiracy theory, eh? ;0)

cbp
01-05-2004, 01:30 AM
I have no idea how Google would determine these, but I assume there could be something like:

* there are a lot of links out on the page that are focused on one topic/keyword area and the links do not go to 'affiliated' sites (eg DMOZ)- the logic here being if your site is good enough to be listed with a whole lot of other sites on the same topic, means you must be good (the more of these links you have, maybe the better) .... just my piece of speculation.

CBP

JayDrake
01-05-2004, 05:01 PM
What a great thread this has been! I've spent plenty of my work day reading through it and can honestly say it wasn't a waste of time. It's surely good to see people talking about Google in a positive light which hasn't been entirely the normal way of things recently.

Personally, I continue to believe in Google and constantly seek to remind people that finding the real method in Google's madness involves remembering that, at the head of all this, Google seeks to provide the most relevant results as often as it can. With this in mind, most of the Google conspiracy theories simply don't hold water in my mind. Anything related to Google trying to get people to use Adwords so they can milk them for money is silly. Yes. Adwords makes them money. Relevant search results make them more, and without those relevant search results they would lose their loyal searchers and Adwords also would suffer. The scientists at Google are smart.

Noting that the scientists at Google are smart (and scientists, for that matter) the concept of not using the h1 tag or really not using or overusing any compliant tag seems unfounded and downright silly to me. HTML was not meant to be used as it has been since the explosion of the web and its role as the new medium for advertising. HTML was, in fact, meant to exchange data in a structured, easily understood common format. (Much like the format of the '...for dummies' books and other similar series.) This recognized, it is worth noting that many of these scientists quite likely are aware of how HTML was intended to be used and can consider some finer points of the structure of the page such as the use of the 'h' tags. How much weight is given to this I wouldn't know, though I would like to think there's some. Moreover, I continue to believe that if the validity of your HTML isn't already a factor, it will be. This is to say that your HTML should validate against the specification that your document claims to adhere to.

Some quick opinions:

1. Reciprocal links should count for less than one way links. This is to say that if domain A has a page linked to domain B and domain B also has a page linked to domain A, both links should be worth less than if only one or the other domain were to post a link. You might ask why, so I'll tell you. The idea is to find relevant web content, not to find web pages that have savvy people working for them that are good at building relationships with other related sites. One's ability to trade links is not equal to one's ability to create a site that others link to of their own prerogative.

2. There is no number two.

3. If you're honestly doing the right thing and you get hit, keep doing the right thing. Our buddies at Google make mistakes too. They then check to see how their changes went and fix their mistakes.

To me, nothing has changed so far as what I should do to gain Google's favour. Well designed, content rich sites with valid markup and good navigation. Solid linking strategies that involve getting links from related sites and anywhere that simply offers.

I'm sure there's much more I can say to many more of the posts and articles, but I'm somewhat scattered today and it's about time to go home. ;)

PhilC
01-05-2004, 06:16 PM
Here's another article, written by me way back on the 3rd December, that suggests a Hilltop-like expert system:- http://www.webworkshop.net/florida-update.html

I still think that an expert system is the theory that best explains what we see at Google; specifically that two different algos are being used and that the old algo is being used for more specific searchterms, whereas the new algo is being used for more general searchterms.

janeth
01-05-2004, 06:31 PM
I will start by saying that a lot of the original guesses started because sites had disappeared and no one knew why. Later those sites reappeared and again no one knows why.

So the information and things we are looking at today are not the same as the ones we were looking at a month ago.

Also last week the only thing I changed on my site was taking a header tag from an H4 to an H3 and we dropped about 3 to 4 points for every search.

I have heard this in about four different places but could not put my finger on any of them now.

But I heard that links from government and colleges carry more weight than any other links. If this is true this could be one type of the Authority Sites.

Again I do not know how true it is or where I read it but I want to say it was from Hilltop.

But some of the same people that were talking about the h tags were also talking about Hilltop when the Fla thing just started.

http://www.internet-marketing-research.net/forums/viewtopic.php?t=1993

But you have to keep in mind those posts were from the same time everything went crazy and what we see today is not the same as what we were seeing then.

verygreatful
01-05-2004, 06:35 PM
Hi all, Love this forum and just wanted to drop in and tell you that I am a BIG Google fan and if they make any changes whatsoever - it's all for the good of us all. And sometimes rumors can destroy people, so think before you all jump ship from such a super search engine.

ronniethedodger
01-05-2004, 08:28 PM
I have no idea how Google would determine these, but I assume there could be something like:

* there are a lot of links out on the page that are focused on one topic/keyword area and the links do not go to 'affiliated' sites (eg DMOZ)- the logic here being if your site is good enough to be listed with a whole lot of other sites on the same topic, means you must be good (the more of these links you have, maybe the better) .... just my piece of speculation.

Also the more links you have, the better your chance of hitting an Authority site too. Which will be even better. Links are good.

No matter what is going on at Google, they are not the only game in town. Linking is still good on the other SE's. Google may go bye-bye...then again they may not. But I think either way it will not matter, cuz the links will still be a vital part of the SE game no matter who is on top.


To me, nothing has changed so far as what I should do to gain Google's favour. Well designed, content rich sites with valid markup and good navigation. Solid linking strategies that involve getting links from related sites and anywhere that simply offers.


Nothing has changed for me either. I took one bad hit in November for one "one-word" term (whoopee), and now it is back. Of course even before that...I was not really up there very good anyway so it really didn't matter if it took a hit or not.

On the whole, I am a lot better. Not only at Google, but MSN is doing something right now too. A lot of people do not seem to be interested in that part, but I am...cuz I am looking damn good over there. Referrals are up, and someone else around here at WPW said MSN is coming unglued and sending him a lot more too.

You are right about good content and linking too. They will prevail no matter what happens, and you will be sitting better after the dust has settled every time.

Mel
01-05-2004, 09:05 PM
Great thread all!

In looking at what may be going on at Google, I stopped reading the Gupta paper the minute he equated the Hilltop algo and local PR as one and the same. They are two different patents, and thus necessarily represent two different technologies. I also find it very hard to believe that Google now relies on only three areas for ranking.

Dan Thies' paper needs to be considered carefully. Dan is a man who tends to think things through carefully and uses a lot of common sense.

I think that any serious researcher can see that there are more directory, educational and government pages in the top results than before Florida, and that is IMO good evidence that some sort of expert system is in play, since such pages tend to be classified as expert pages.

I personally think that there is more than one basic change to the Google algo at work here, which is one of the reasons it's so difficult to decipher exactly what's happening.

I tend to believe that Google is using all of the following to some degree:

CIRCA technology to better understand the meaning of the page content, which is a basic chore of any search engine.

Hilltop in order to start to bring some order to the chaos that too much dependence on anchor text has brought

Local PR (and/or Topic Sensitive PageRank) possibly to eliminate any IPO problems, but more importantly to sort out the chaos that rampant and indiscriminate linking only for ranking purposes has brought.

Stemming is now a fact, but how important is it to rankings?

Now the question in my mind is just how these are all blended into the mix and how much authority each is given in the final rankings.

Combined with the fact that we will soon (NOW?) have to start paying more attention to Inktomi, the next few months will be interesting in the search engine world.

amberstar702
01-06-2004, 12:25 AM
Reading all the previous comments has been very interesting. However, while monitoring one specific website that has no links and no content at all jump up to positions 1 and 2 in Google and Yahoo - I am reminded of times I would tell my clients during counseling sessions that this is neither a logical nor a fair world. Especially since I had done SEO and promotions for this company several months ago. And especially since I have links and content and have dropped off the screen on these two search engines. Unfortunately I am not able to name this website for others to view.

These prior comments are intellectually impressive but I believe are an exercise in futility. In my opinion, it is like trying to understand the nature of God. It makes for stimulating conversation but can never be solved.

cbp
01-06-2004, 12:29 AM
has no links and no content at all jump up to positions 1 and 2 in Google and Yahoo

It must have links. How did you determine that they have none?

CBP

Deep13
01-06-2004, 12:29 AM
just finished reading the article...
very nice explanation..

getting listed on directories like DMOZ is becoming tougher and tougher..

I know it's free, but with people complaining about the behaviour of DMOZ mods, it always makes me think...do they also need money to get listed there?

I hope getting listed in directories doesn't get TOTALLY commercialised and also DMOZ people should also specify the reason why the site did not get listed...

u learn from ur mistakes only...
regards
Deep

minstrel
01-06-2004, 12:37 AM
I agree with you, Deep13, but don't get me started on THAT topic again ;o)

We'd also better hope Steven Glover doesn't read this thread... :O)

Mel
01-06-2004, 12:54 AM
Reading all the previous comments has been very interesting. However, while monitoring one specific website that has no links and no content at all jump up to positions 1 and 2 in Google and Yahoo - I am reminded of times I would tell my clients during counseling sessions that this is neither a logical nor a fair world. Especially since I had done SEO and promotions for this company several months ago. And especially since I have links and content and have dropped off the screen on these two search engines. Unfortunately I am not able to name this website for others to view.

These prior comments are intellectually impressive but I believe are an exercise in futility. In my opinion, it is like trying to understand the nature of God. It makes for stimulating conversation but can never be solved.

While we will perhaps never be privy to all the details of Google's algo, we strive to learn as much as we can and even that imperfect knowledge is enough to get a client rankings that are beneficial to his business.

IMO an imperfect understanding is not a reason to stop learning ... or believing for that matter.

T2DMan
01-06-2004, 06:20 AM
At long last, a forum/thread that has more than just the endless theorising. Some ideas that start to make sense of the whole Florida update. The articles mentioned on this thread have a lot more concrete evidence, if one can call them that. Hilltop - can't get more concrete than patents, and new Google employees. Much of the other conspiracy, theories, oop's, ... just didn't gel.

Still not easy. Have a CRM specialist client who I can't get listed for a subpage "CRM consultants", yet their index page is listed in the "expert" dmoz where lots of other crm'ers are listed. And they are top for "CRM Consultants city", and top for "CRM Consulting". Each affected subpage needs to be listed on 2+ expert sites. Ahhhh. You need to put up a site. See what terms don't work, then get those subpages listed in expert directories. When new phrases start becoming popular and pages drop, you then need to get new expert links.

Time2Dine - Online Restaurant Booking & Search Engine Optimisation Auckland (http://www.time2dine.co.nz/search-engine-optimisation-auckland.php)

cbp
01-07-2004, 11:35 PM
Here is another document with a very long URL (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=6,658,423.WKU.&OS=PN/6,658,423&RS=PN/6,658,423) to read .... whoops....I mean put you to sleep. It's another patent that Google got on 2 Dec (it was filed in 2001).

Abstract

Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered to be near-duplicates if any one of their fingerprints match.

A paragraph in the summary says:

The present invention may function to generate clusters of near-duplicate documents, in which a transitive property is assumed. Each document may have an identifier for identifying a cluster with which it is associated. In this alternative, in response to a search query, if two candidate result documents belong to the same cluster and if the two candidate result documents match the query equally well, only the one deemed more likely to be relevant (e.g., by virtue of a high Page rank, being more recent, etc.) is returned.

and also says:

In this alternative, in response to a search query, if two candidate result documents belong to the same cluster and if the two candidate result documents match the query equally well (e.g., have the same title and/or snippet) if both appear in the same group of results (e.g., first page), only the one deemed more likely to be relevant (e.g., by virtue of a high PageRank, being more recent, etc.) is returned.

This is a bit scary: if I read it right, this is different from the so-called duplicate content penalty, where exactly identical sites get a penalty (one is PR0). I think this means that if two pages match for title and some other things ("near duplicates"), one is dropped !!!!
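
A rough sketch of the filtering step the summary describes, assuming each candidate result already carries a cluster identifier. The field names (`url`, `cluster`, `score`, `pagerank`) are hypothetical, and PageRank is used as the only tie-breaker purely for illustration; the patent text also mentions recency.

```python
def drop_near_duplicates(results):
    """Keep one result per (cluster, score) pair, preferring higher PageRank.

    `results` is a list of dicts in ranked order. Results in the same
    cluster that match the query equally well are collapsed to one.
    """
    best = {}
    for result in results:
        key = (result["cluster"], result["score"])
        if key not in best or result["pagerank"] > best[key]["pagerank"]:
            best[key] = result
    kept = {id(result) for result in best.values()}
    # Preserve the original ranking order of the survivors
    return [result for result in results if id(result) in kept]
```

With two pages in the same cluster and the same score, only the one with the higher PageRank is returned, which matches the "one is dropped" reading.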

How much of this needs to be integrated into the theories surrounding Florida?

What say you?

CBP

simonm
01-12-2004, 05:19 AM
Minstrel's response with regard to comments about DMOZ

I agree with you, Deep13, but don't get me started on THAT topic again ;o)

We'd also better hope Steven Glover doesn't read this thread... :O)


I second that.

On the main subject of this thread, having had a strategy of big corporate web sites and small targeted 'satellite' sites which were focused and had better positioning for particular key words and terms (pre Florida update), clearly the big site now wins out.

This is even where page optimisation and PR are poorer on the page of the big site than on the equivalent content page on the smaller site.

Though I will retain the smaller sites as informational sites, I will certainly put more into page optimisation of the big site pages, also placing links from home page to the more important pages that are otherwise buried deep in the site.

kevinmc2
01-13-2004, 08:21 AM
SHs says "Hi all, Love this forum and just wanted to drop in and tell you that I am a BIG Google fan and if they make any changes whatsoever, it's all for the good of us all. And sometimes rumors can destroy people, so think before you all jump ship with such a super search engine."

Google USED to be a legitimate search engine. Now the irrelevancy of search results due to their so-called Florida update frankly makes it a near-worthless search engine. I do all my searches now with Ask Jeeves and am trying some of the lesser-known search engines with better, more relevant results.

cbp
01-13-2004, 02:58 PM
Google USED to be a legitimate search engine. Now the irrelevancy of search results due to their so-called Florida update frankly makes it a near-worthless search engine.

This thread started off as an analysis of what was happening at Google ... I guess it was only a matter of time before it degenerated into an anti-Google thread.

This is what is good about WebProWorld in comparison to other forums - it's not Google worship, but it's also not Google bashing - it's trying to understand and do some analysis.

CBP

Mel
01-13-2004, 08:57 PM
SHs says "Hi all, Love this forum and just wanted to drop in and tell you that I am a BIG Google fan and if they make any changes whatsoever, it's all for the good of us all. And sometimes rumors can destroy people, so think before you all jump ship with such a super search engine."

Google USED to be a legitimate search engine. Now the irrelevancy of search results due to their so-called Florida update frankly makes it a near-worthless search engine. I do all my searches now with Ask Jeeves and am trying some of the lesser-known search engines with better, more relevant results.

Hi Kevin
While I tend to agree that the current Google search results are not as good as they were pre-Florida, Google's new algo is still a work in progress IMO, and the results get better with each tweak or twist of the knob.

I wouldn't write Google off just yet and anticipate more improvements in the near future. I know that a 2 month hiatus of not so relevant results (especially over the Holidays) is a disaster for many but the best may be yet to come.

cbp
02-02-2004, 04:01 AM
Here's another read that puts a slightly different slant on things:
http://answers.google.com/answers/threadview?id=300810

There is also a link to a thread here at WPW at the bottom of the article!!!

CBP

Mel
02-02-2004, 04:19 AM
Well, notwithstanding the fact that this comes from a Google Answerer, they too put their pants on one leg at a time, and I have seen significant evidence post-Florida which shows that repeated anchor text can work wonders with rankings.

cbp
02-02-2004, 04:25 AM
I agree with the power of the anchor text.

BUT, one of my PR5 sites is MIA since Austin (went up after Florida) - every single link to the site has the main keyword in the anchor text .... that's why this got me thinking .... I can see no other reason for the drop. Most of the links are on topic and from important sites (e.g. DMOZ) - once the dust has settled, I will set about changing the anchor text on 50% of the links to something like 'click here' or similar - see what happens.

CBP

Mel
02-02-2004, 04:58 AM
Hi Cbp

IMO Google has severely deprecated internal anchor text, which is resulting in many sites losing their rankings, but I see no indication of that for external links, especially links from expert sites.

fathom
02-02-2004, 04:58 AM
I agree with the power of the anchor text.

BUT, one of my PR5 sites is MIA since Austin (went up after Florida) - every single link to the site has the main keyword in the anchor text .... that's why this got me thinking .... I can see no other reason for the drop. Most of the links are on topic and from important sites (e.g. DMOZ) - once the dust has settled, I will set about changing the anchor text on 50% of the links to something like 'click here' or similar - see what happens.

CBP

For what it's worth - "click here" or some other facsimile will get you better ranks for "click here"... and I highly recommend avoiding that direction.

Notwithstanding - I wouldn't change anything - at least not at a 50% gamble level where you have no idea what caused what - and the next update could result in a total reversal.

Frustrating as hell - I know -- but never ever base major decisions and major changes on a single update -- you will almost always be wrong.

cbp
02-11-2004, 04:45 AM
Here is the latest bed time read, if you can't sleep:
http://javelina.cet.middlebury.edu/lsa/out/cover_page.htm

Like some of the other "dry" reads linked in this thread, it is well worth reading and digesting carefully, as it does appear to fit in with some of what is happening at Google.

CBP

Mel
02-11-2004, 06:17 AM
Nice find CBP. If latent semantic indexing can be developed for a large-scale search engine, it should be a very powerful tool, and almost spam proof.

Now if Google could just add an LSI node to their engine...

webnewton
02-11-2004, 08:42 AM
There is more to this, guys! Google has become really smart this time.

cbp
02-11-2004, 03:06 PM
Now if Google could just add an LSI node to their engine

I think they may have already done some form of this. LSI is the technology behind the Applied Semantics purchase by Google last year.

CBP

ppayne
02-14-2004, 08:05 AM
I have a question about affiliate program type links, i.e. getting 100 of my customers to link to my page with a certain URL that allows them to get paid for sales they send me. Is having 100 people linking to jlist.com from their sites, be they anime fan sites or personal web pages or blogs, a Good Thing overall? Does it raise my page's ranking in Google, as long as they're not abusing it in any way, e.g. making doorway pages leading to my site to increase their own revenues? I'm not sure where the "affiliate connection" to linking is.

PhilC
02-14-2004, 08:25 AM
Links into a site are always a good thing to have, ppayne. We are still in the dark about Google's new algorithm, and it may turn out that some links are better than others. It may even turn out that some links aren't counted at all for rankings, but no links into a site can hurt, and all of them may be beneficial.

compar
02-15-2004, 11:38 AM
I have just published a new article in our InfoPool. You can read it here: Content and the New Google Algorithm (http://www.compar.com/infopool/articles/news24.html).

Comments are welcome.

cbp
02-15-2004, 02:27 PM
Good read compar.

Mods - how about moving this to this thread:
http://www.webproworld.com/viewtopic.php?t=10802

...as it's another good resource.

CBP

minstrel
02-15-2004, 02:38 PM
Okay...

Of course, maybe we need to re-title this thread "The NEW new Google Algorithm"...

cbp
03-09-2004, 12:08 AM
Here is another OK read:
http://www.sitepoint.com/article/1290

CBP

Mel
03-09-2004, 01:56 AM
Interesting, but no substance to back up the suppositions.

cbp
03-17-2004, 07:17 PM
Here is a statement from GG over at WMW about 18 months ago and reposted at Cre8asite forums (yes, I know it's old):

Of course, folks never know when we're going to adjust our scoring. It's pretty easy to spot domains that are hoarding PageRank; that can be just another factor in scoring. If you work really hard to boost your authority-like score while trying to minimize your hub-like score, that sets your site apart from most domains. Just something to bear in mind..

No wonder GG has to carefully choose his words, as I am sure there are some hints in there that may shed further light on the algo.
ie
authority-like score

hub-like score

which are relevant to the topics in this thread

CBP

mtbot
03-17-2004, 08:04 PM
I am sure there are some hints in there that may shed further light on the algo.
ie
authority-like score

hub-like score


Looks like you found gold! Thanks CBP.

For those that don't know, a hub-like score increases with having more outgoing links on your site and an authority-like score increases with receiving more incoming links.
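
The hub and authority terms match those used in Kleinberg's HITS algorithm, so here is a minimal sketch of how such scores are usually computed. Whether Google does anything like this is pure speculation; the link graph, the function name `hits` and the iteration count are made up for the example.

```python
import math


def hits(links, iterations=20):
    """Compute (hub, authority) scores for a small link graph.

    `links` maps each page to the list of pages it links out to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking in
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub: sum of the authority scores of the pages linked out to
        hub = {page: sum(auth[t] for t in links.get(page, [])) for page in pages}
        # Normalise both score sets so they do not grow without bound
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm
    return hub, auth
```

For a toy graph such as `{"directory": ["a", "b", "c"], "a": ["b"], "b": [], "c": []}`, the page called `directory` ends up with the highest hub score and `b`, which has the most incoming links, gets the highest authority score.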

This may be one way that Google has tried to lower the ranking of link directories and link pages which, by their nature, provide no unique content.

The problem, of course, is what this may mean in practice if it is being implemented in any way. Last weekend, I removed all junky, reciprocal outgoing links from my site and put them on another site. So my hub-like score has greatly decreased. Hopefully, this will help me in at least some small way.

fathom
03-17-2004, 09:40 PM
I am sure there are some hints in there that may shed further light on the algo.
ie
authority-like score

hub-like score


Looks like you found gold! Thanks CBP.

For those that don't know, a hub-like score increases with having more outgoing links on your site and an authority-like score increases with receiving more incoming links.

This may be one way that Google has tried to lower the ranking of link directories and link pages which, by their nature, provide no unique content.

The problem, of course, is what this may mean in practice if it is being implemented in any way. Last weekend, I removed all junky, reciprocal outgoing links from my site and put them on another site. So my hub-like score has greatly decreased. Hopefully, this will help me in at least some small way.

It's also worthwhile viewing all pages that Google knows of as "one great big single website" where your particular pages are not actually the center (hub or authority)... in this context you can really start appreciating how many "other things" come into play besides PageRank (and scoring).

The greatest problem we deal with is "perspective". That perspective tends to automatically place all the conditions of the "total Google web" around ourselves - because the total equation is too complex to view - thus we choose to look "only from our own angle".

jackson992
03-29-2004, 03:41 PM
This leads me to believe that the bigger your site is, i.e. the more pages it has, the better you will rank. This is for sure indicated by the fact that my biggest site was the only one to get hit by this month's algo change