 |

01-10-2006, 11:44 AM
|
|
WebProWorld 1,000+ Club
|
|
Join Date: Dec 2003
Location: Houston
Posts: 5,716
|
|
LSI/LSA – Is MSN Ahead of GOOGLE in AI?
In another thread it was claimed that MSN has the advantage over GOOGLE in their algorithms in the use of Artificial Intelligence (AI) utilizing LSI/LSA.
I have yet to see anything substantial indicating that MSN has gone to that extent. Yet GOOGLE selects what it determines to be the most accurate description of a page from multiple on-page and off-page components for SERP listings, including a recently renewed emphasis on the "Description Metatag" when it is judged relevant. IMO, that indicates that LSI/LSA has been established to some extent.
What are your thoughts?
Ken
|

01-10-2006, 11:53 AM
|
|
WebProWorld Veteran
|
|
Join Date: Jul 2003
Location: GoogleVille
Posts: 911
|
|
I don't personally see it as MSN being ahead of Google, only using it first in their algo. That does not say they are actually ahead in what they can do with it yet.
But this is Mel's baby, so I will shut up now until he posts :)
|

01-10-2006, 01:06 PM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: May 2005
Location: Norway
Posts: 4,659
|
|
Some of you WPW members must not know what LSI/LSA are.
On the latent semantic indexing webpage you find a lot of articles that explain the concepts.
Click on papers in the left menu. A related subject is term vectoring.
|

01-10-2006, 01:13 PM
|
|
WebProWorld Veteran
|
|
Join Date: Jul 2003
Location: GoogleVille
Posts: 911
|
|
Actually, I was one of the first people saying it was going to happen, along with Jake baille and others. I am well familiar with it.
|

01-10-2006, 02:01 PM
|
|
WebProWorld Veteran
|
|
Join Date: Jan 2006
Posts: 372
|
|
MSN wasn't the first. Fast has had niche engines using LSI for quite a while. The problem hasn't been implementing LSI in a search engine, but implementing it in a very large engine.
|

01-10-2006, 02:20 PM
|
|
WebProWorld Pro
|
|
Join Date: Nov 2005
Location: Devon UK
Posts: 102
|
|
There's a new search engine based in the UK which claims to be the most advanced in the world!! So the press release said! They use AI, I believe.
Have a look at previewseek - I use it a lot as I like the clustering and the results.
Regards
|

01-10-2006, 05:09 PM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: May 2005
Location: Norway
Posts: 4,659
|
|
Is it a meta SE?
Previewseek was also more comprehensive, he said. "It visits all the major search engines on your behalf, so that you can be sure that you are not missing any result. We put the results back together and then we grade them according to our BehaveRank algorithm that continuously evolves based on user behaviour."
My underline or a combination of a unique AI and a meta SE?
Then you have to define what you mean by better? Better at finding new sites / pages?
They say it is more comprehensive and uses what they call their BehaveRank algorithm, which changes with user actions.
Good, but I should like to see that algorithm.
|

01-10-2006, 05:17 PM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: Oct 2003
Location: Encinitas, CA
Posts: 1,908
|
|
I think your premise is false. Google relies on IBLs, not content. There is little that is intelligent about Google.
__________________
DrTandem's San Diego Web Page Design, drtandem.com
|

01-10-2006, 05:21 PM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: May 2005
Location: Norway
Posts: 4,659
|
|
Good evening DrTandem1
It is evening in Norway.
You disagree as usual, as I with you :-)
"PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper."
Yes, mathematics is trivial, as my professor told me, but Google is much more than algorithms.
It is n datacenters, hardware ....
How large is the global Google index?
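For anyone who wants to see what the "simple iterative algorithm" in that quote looks like, here is a minimal sketch of the textbook power iteration, not Google's production code. The tiny link graph, the function name `pagerank`, the damping factor of 0.85, and the way dangling pages are handled are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a dict mapping page -> list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its damped rank evenly among its outbound links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page without outbound links spreads its rank evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" collects the most inbound links, "d" has none.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

The ranks always sum to 1, and repeating the update is what converges toward the principal eigenvector the quote mentions. Note that only the link structure goes in; page content plays no part in this calculation.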
|

01-10-2006, 05:43 PM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: Oct 2004
Location: Kent, England
Posts: 1,425
|
|
All other things being equal
If all other things were equal google would work just fine. The trouble is they are not.
The Google algo is predicated on the basis that somehow a selection of links taken from the web accurately reflects the relevance of a page to the needs of a searcher.
Now that half the world has a fair idea how the Google algorithm works, they are out there trying to manipulate it. Optimising pages, buying links etc.
It's against this background that you can see the intrinsic honesty of the straightforward salesmanship encompassed in pay per click.
Whatever algorithm or update Google puts in place, there will always be attempts to manipulate it.
|

01-10-2006, 06:05 PM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: May 2005
Location: Norway
Posts: 4,659
|
|
buying links etc I dislike that feature myself. Buy your way to the top.
Do we have any idea about which SE is best at
1. Fighting spam?
2. Finding new sites?
I vote for Google.
If we should mention another SE, one of my favourites is the Australian SE
Factbites, which gives good results on other regions of the world.
Example KWs:
Finance in Russia
Look at the great SERPs. Directory-like listings.
Quality is sometimes more important than quantity.
|

01-10-2006, 06:18 PM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: Oct 2003
Location: Encinitas, CA
Posts: 1,908
|
|
Quote:
|
Originally Posted by kgun
buying links etc I dislike that feature myself. Buy your way to the top.
Do we have any idea about which SE is best at
1. Fighting spam?
2. Finding new sites?
I vote for Google.
Quality is sometimes more important than quantity.
|
When a new site is buried for up to a year, simply being indexed does no good. Spam? Massive artificial IBLs is all it takes to manipulate Google.
I agree about quality v. quantity. Google has lost the quality battle. Tons of 404 errors that are ages old. This is beyond a quality issue, it goes to purposeful dilution of the SERPs. Like the bar that waters down its liquor on its "special" drinks, you need to pay.
|

01-10-2006, 06:35 PM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: Oct 2004
Location: Kent, England
Posts: 1,425
|
|
Mystery of SEO
In many respects SEO has become like product placement in films. Do the stars really prefer brand X to smoke, drink, drive etc? Of course not. The brands pay to feature in films. Now the brands pay to feature on the SERPs. Either through an intermediary SEO or through pay per click.
All the SE's have become corrupted. It's just a question of degree and definition. Spam is an arbitrary definition of illicit manipulation. Sure there's crude black hat. But most SEO involves shades of grey where one man's spam is another man's garlic sausage. Given the combined market cap of Google and Yahoo is close to $200 billion it would be a surprise if everything stayed clean and above board.
|

01-10-2006, 06:36 PM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: May 2005
Location: Norway
Posts: 4,659
|
|
Massive artificial IBLs is all it takes to manipulate Google.
Agree.
I am sure Google is working on other algorithms. I await a new update where that is the next spam issue.
Since everybody's eyes are on Google, Google is also the number one spam object.
I think that it will be more and more difficult to spam Google.
Look at the meta tag discussions here at WPW. What emphasis Google places on the various tags will, in my view, be more and more difficult to know in advance. By the time you have figured it out, there is a new update.
The weakest point in the Google algorithm now is in my opinion exactly link buying.
"To be or not to be, that is the question."
Translated to SE's, IMO
Staying power / consistency over time, that is the question.
|

01-10-2006, 06:51 PM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: May 2005
Location: Norway
Posts: 4,659
|
|
In many respects SEO has become like product placement in films. Do the stars really prefer brand X to smoke, drink, drive etc? Of course not. The brands pay to feature in films. Now the brands pay to feature on the SERPs. Either through an intermediary SEO or through pay per click.
Yes, maybe.
One of the most popular TV programs on Norwegian TV, "Hotel Cesar", does exactly what you say, David. Brands are shown for seconds. I have been told that they are shown so fast that you barely register them consciously.
But people love the program, I think it has been there for more than 4 years.
|

01-10-2006, 09:39 PM
|
|
WebProWorld Veteran
|
|
Join Date: Jul 2003
Location: GoogleVille
Posts: 911
|
|
Once again I have read thru an entire thread here where the first 3-4 posts are about the actual topic at hand, and then it just degrades into multiple other topics that have nothing to do with the actual topic.
Nice.
|

01-10-2006, 10:59 PM
|
|
WebProWorld 1,000+ Club
|
|
Join Date: Jul 2003
Posts: 1,919
|
|
OK, first off: what do we all agree latent semantic indexing is about?
IMO, LSI is best described, with regard to search engines, as a solution to the problem of "vocabulary mismatch": search engines might not find the same pages for, say, SEO, Search Engine Optimization, and Search Engine Marketing, even though the writers of pages using these terms might in fact be talking about the same thing in different words. For a more formal discussion of what LSI as applied to a search engine might be, see http://www.cs.utk.edu/~berry/lsi++/node2.html
Next, is Google even using LSI in its ranking algorithm?
There are those who say that Google's purchase of Applied Semantics (who have systems based on LSI) is indicative of the fact that Google is using LSI in the ranking algo.
There are others who say that since you can do a special search in Google by prefixing the search term with a tilde (~), this search must be based on LSI, and therefore Google is using LSI.
My understanding of the Applied Semantics situation is that Google bought Applied Semantics because they were already using one of their products to determine which AdWords ads were applicable to display for various search terms. I believe that Google is still using this technology in one way or another in both AdWords and AdSense, but I seriously doubt that it is being used in organic search to any degree.
As an example, you can do a bit of research on this by doing a search and noticing that the pages displayed by AdWords do not match up very well with the pages displayed in the organic search alongside. Hence the ranking systems must be somewhat different, which leads me to the conclusion that LSI as used in the AdWords algo is not a major component of the Google organic ranking algo.
Likewise, the ~ search, which Google says adds synonyms to your search, is quite a bit different from LSI: LSI determines the semantic connections between words in documents, whereas synonyms are simply lists of words that mean generally the same thing. As an example, Search Engine Marketing may be semantically connected to Search Engine Optimization in many documents, but they are not synonyms.
Still others have suggested that Google may be applying LSI not to the text content of webpages but to the anchor text of links that point to them, and that for this reason you should vary your anchor text to achieve higher LSI scores. I have not seen anything more than speculation that this is in fact the case.
I welcome comments on any evidence that Google is in fact using LSI.
|

01-11-2006, 05:47 AM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: May 2005
Location: Norway
Posts: 4,659
|
|
Is it possible to explain the mathematics?
Think of a three-dimensional space (a room) with arrows pointing from the origin to different places in it. Place a document at the tip of each arrow (vector). Documents that are related lie close to each other (in Euclidean distance) in that room. Even if you cannot visualize more than three dimensions, the computation is done in exactly the same manner mathematically (Euclidean distance etc. in higher-dimensional spaces is analogous).
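To make the room picture concrete, here is a small sketch, assuming three made-up documents with hand-picked coordinates. The document names, the numbers, and the helper names `euclidean` and `nearest` are hypothetical and only illustrate the distance computation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length (any dimension)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(name, vectors):
    """Return the name of the document closest to `name`, excluding itself."""
    others = (other for other in vectors if other != name)
    return min(others, key=lambda other: euclidean(vectors[name], vectors[other]))


# Hypothetical coordinates: each document sits at the tip of one arrow.
documents = {
    "seo_guide": [0.9, 0.8, 0.1],
    "sem_guide": [0.8, 0.9, 0.2],
    "pasta_recipe": [0.1, 0.0, 0.9],
}

print(nearest("seo_guide", documents))
print(round(euclidean(documents["seo_guide"], documents["pasta_recipe"]), 3))
```

The same `euclidean` function works unchanged for vectors with hundreds of coordinates, which is the point about higher dimensions being analogous.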
LSI represents terms and documents in a rich, high-dimensional space, allowing the underlying ("latent") semantic relationships between terms and documents to be exploited during searching. Each dimension is merely assumed to represent one or more semantic relationships in the term-document space.
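Here is a toy sketch of that idea, assuming five invented one-line documents and a plain word-count matrix. LSI is normally computed with a singular value decomposition. To stay within the standard library, this sketch instead takes the top eigenvectors of the document-by-document matrix with power iteration, which yields the same document coordinates for a small example. All function names are hypothetical, and this is not how any particular search engine does it.

```python
import math

docs = [
    "seo tips",                          # d0
    "search engine optimization guide",  # d1: shares no word with d0
    "seo search engine optimization",    # d2: uses both vocabularies
    "pasta recipe cooking",              # d3: unrelated topic
    "pasta sauce",                       # d4: unrelated topic
]


def term_document_matrix(texts):
    """Rows are terms, columns are documents, entries are raw word counts."""
    vocab = sorted({w for t in texts for w in t.split()})
    return [[t.split().count(w) for t in texts] for w in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_eigenpairs(sym, k, iters=300):
    """Top-k eigenpairs of a small symmetric matrix: power iteration plus deflation."""
    n = len(sym)
    m = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found so the next one can surface.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def lsi_document_vectors(a, k):
    """Coordinates of each document in a k-dimensional latent space."""
    n_docs = len(a[0])
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n_docs)]
            for i in range(n_docs)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(lam) * v[d] for lam, v in pairs] for d in range(n_docs)]


matrix = term_document_matrix(docs)
raw = [list(col) for col in zip(*matrix)]  # one raw count vector per document
latent = lsi_document_vectors(matrix, 2)

print("raw d0 vs d1:   ", round(cosine(raw[0], raw[1]), 3))
print("latent d0 vs d1:", round(cosine(latent[0], latent[1]), 3))
print("latent d0 vs d3:", round(cosine(latent[0], latent[3]), 3))
```

In the raw word-count space, d0 and d1 have nothing in common because they share no words. After reduction to two dimensions they land close together, because d2 uses both vocabularies, while the pasta documents stay far away. That is the "vocabulary mismatch" problem being addressed on a very small scale.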
Mathematical methods (singular value decomposition - SVD) are used to reduce the dimension of the term-document space, to filter out noise etc. and find related documents. In this way the underlying semantic relationships between docum | |