| Google Discussion Forum Google Discussion forum is for topics specifically related to Google. There is a subforum dedicated to AdSense/AdWords subjects. |
LSI/LSA – Is MSN Ahead of GOOGLE in AI?
In another thread it was claimed that MSN has the advantage over GOOGLE in their algorithms in the use of Artificial Intelligence (AI) utilizing LSI/LSA. I have yet to see anything substantial indicating that MSN has endeavored to that extent. Yet, GOOGLE selects what they determine to be the most accurate description of a page from multiple on page and off page components for SERP listings including a recent revisited importance favoring the “Description Metatag” when determined relevant. IMO – That indicates to me that LSI/LSA has been established to some extent. What are your thoughts? Ken |
|
||||
|
I don't personally see it as MSN being ahead of google, only using it first in their algo. That does not say they are actually ahead in what they can do with it yet.
But this is Mel's baby, so I will shut up now until he posts :)
__________________
William Cross Expert Search Engine Optimization Man's Best Friend: If you don't believe it, just try this experiment. Put your dog and your wife in the trunk of the car for an hour. When you open the trunk, who is really happy to see you? |
|
||||
|
Some of you WPW members must not know what LSI/LSA are.
On the latent semantic indexing webpage you find a lot of articles that explain the concepts. Click on papers in the left menu. A related subject is term vectoring.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. |
|
||||
|
Actually, I was one of the first people saying it was going to happen, along with Jake baille and others. I am well familiar with it.
__________________
William Cross Expert Search Engine Optimization Man's Best Friend: If you don't believe it, just try this experiment. Put your dog and your wife in the trunk of the car for an hour. When you open the trunk, who is really happy to see you? |
|
|||
|
There's a new search engine based in the UK which claims to be the most advanced in the world!! So the press release said! They use AI, I believe.
Have a look at previewseek - I use it a lot as I like the clustering and the results. Regards |
|
||||
|
Is it a meta SE?
Previewseek was also more comprehensive, he said. "It visits all the major search engines on your behalf, so that you can be sure that you are not missing any result. We put the results back together and then we grade them according to our BehaveRank algorithm that continuously evolves based on user behaviour."
(My underline.) Or is it a combination of a unique AI and a meta SE? Then you have to define what you mean by better. Better at finding new sites / pages? They say it is more comprehensive and uses what they call their BehaveRank algorithm, which changes with user actions. Good, but I should like to see that algorithm.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. |
|
||||
|
It is evening in Norway.
You disagree as usual, as I do with you :-)
"PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper."
Yes, mathematics is trivial, as my professor told me, but Google is much more than algorithms. It is n datacenters, hardware .... How large is the global Google index?
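For readers who want to see what the "simple iterative algorithm" in that quote looks like, here is a minimal sketch. The four-page toy web, the damping factor of 0.85, the iteration count and the way pages without outlinks are handled are all assumptions for illustration. It uses the variant where the ranks sum to 1, and it says nothing about what Google actually runs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration on a dict mapping each page to the pages it links to.

    Assumes every link target also appears as a key of `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform guess
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank over all pages (assumption).
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Repeating the update until the numbers stop changing is the same thing as finding the principal eigenvector of the normalized link matrix. In this toy web page C collects the most rank because three pages link to it, and page D the least because nothing links to it.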
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. |
|
||||
|
If all other things were equal google would work just fine. The trouble is they are not.
The Google algo is predicated on the basis that somehow a selection of links taken from the web accurately reflects the relevance of a page to the needs of a searcher. Now that half the world has a fair idea how the Google algorithm works, they are out there trying to manipulate it. Optimising pages, buying links etc. It's against this background that you can see the intrinsic honesty of the straightforward salesmanship encompassed in pay per click. Whatever algorithm or update Google puts in place, there will always be attempts to manipulate it.
__________________
Simply Clicks | SEO | SEO Training| Pay Per Click Advertising | Search Engine Powered Marketing |
|
||||
|
"buying links etc" I dislike that feature myself. Buy your way to the top.
Do we have any idea about which SE is best at 1. Fighting spam? 2. Finding new sites? I vote for Google. If we should mention another SE, one of my favourites is the Australian SE Factbites, which gives good results on other regions of the world. Example KWs: Finance in Russia. Look at the great SERPs. Directory-like listings. Quality is sometimes more important than quantity.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. |
|
||||
|
In many respects SEO has become like product placement in films. Do the stars really prefer brand X to smoke, drink, drive etc? Of course not. The brands pay to feature in films. Now the brands pay to feature on the SERPs. Either through an intermediary SEO or through pay per click.
All the SE's have become corrupted. It's just a question of degree and definition. Spam is an arbitrary definition of illicit manipulation. Sure there's crude black hat. But most SEO involves shades of grey where one man's spam is another man's garlic sausage. Given the combined market cap of Google and Yahoo is close to $200 billion it would be a surprise if everything stayed clean and above board.
__________________
Simply Clicks | SEO | SEO Training| Pay Per Click Advertising | Search Engine Powered Marketing |
|
||||
|
Massive artificial IBLs is all it takes to manipulate Google.
Agree. I am sure Google are working on other algorithms. I await a new update, where that is the next spam issue. Since everybody's eyes are on Google, Google is also the number one spam object. I think that it will be more and more difficult to spam Google. Look at the meta tag discussions here at WPW. What emphasis Google places on the various tags will, in my view, be more and more difficult to know in advance. By the time you have figured it out, there is a new update. The weakest point in the Google algorithm now is, in my opinion, exactly link buying. "To be or not to be, that is the question." Translated to SEs, IMO: staying power / consistency over time, that is the question.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. |
|
||||
|
In many respects SEO has become like product placement in films. Do the stars really prefer brand X to smoke, drink, drive etc? Of course not. The brands pay to feature in films. Now the brands pay to feature on the SERPs. Either through an intermediary SEO or through pay per click.
Yes, maybe. One of the most popular TV programs on Norwegian TV, "Hotel Cesar", does exactly what you say, David. Brands are shown for seconds. I have been told that they are shown so fast that you (consciously) nearly do not register them. But people love the program; I think it has been there for more than 4 years.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. |
|
||||
|
Once again I have read thru an entire thread here where the first 3-4 posts are about the actual topic at hand, and then it just degrades into multiple other topics that have nothing to do with the actual topic.
Nice.
__________________
William Cross Expert Search Engine Optimization Man's Best Friend: If you don't believe it, just try this experiment. Put your dog and your wife in the trunk of the car for an hour. When you open the trunk, who is really happy to see you? |
|
|||
|
OK, first off, what do we all agree latent semantic indexing is about?
IMO LSI is best described, with regard to search engines, as a solution to the problem of "vocabulary mismatch": search engines might not find the same pages for, say, SEO, Search Engine Optimization and Search Engine Marketing, even though the writers of pages using these terms might in fact be talking about the same thing in different words. For a more formal discussion of what LSI as applied to a search engine might be, see http://www.cs.utk.edu/~berry/lsi++/node2.html
Next, is Google even using LSI in its ranking algorithm? There are those who say that Google's purchase of Applied Semantics (who have systems based on LSI) indicates that Google is using LSI in the ranking algo. There are others who say that since you can do a special search in Google by prefixing the search term with a tilde (~), this search must be based on LSI and therefore Google is using LSI.
My understanding of the Applied Semantics situation is that Google bought Applied Semantics because they were already using one of their products to determine which AdWords ads were applicable to display for various search terms. I believe that Google are still using this technology in one way or another in both AdWords and AdSense, but I seriously doubt that it is being used in organic search to any degree. You can do a bit of research on this yourself: do a search and notice that the pages displayed by AdWords do not match up very well with the pages displayed in the organic results alongside. Hence the ranking systems must be somewhat different, which leads me to the conclusion that LSI as used in the AdWords algo is not a major component of the organic ranking algo.
Likewise the ~ search, which Google says adds synonyms to your search, is quite a bit different from LSI: LSI determines the semantic connections between words in documents, whereas synonyms are simply lists of words that mean generally the same thing. As an example, Search Engine Marketing may be semantically connected to Search Engine Optimization in many documents, but they are not synonyms.
Still others have suggested that Google may be applying LSI not to the text content of webpages but to the anchor text of links that point to them, and that for this reason you should vary your anchor text to achieve higher LSI scores, but I have not seen anything more than speculation that this is in fact the case. I welcome comments on any evidence that Google is in fact using LSI.
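To make the "vocabulary mismatch" point concrete, here is a small sketch comparing literal matching with a synonym-list expansion of the kind the ~ search is described as doing. The three page titles and the hand-made `SYNONYMS` table are hypothetical, and the substring matching is deliberately crude. This is not how Google implements either feature.

```python
# Hypothetical hand-made synonym table, for illustration only.
SYNONYMS = {"seo": {"seo", "search engine optimization"}}


def literal_match(query, documents):
    """Return documents that contain the query text as typed."""
    q = query.lower()
    return [doc for doc in documents if q in doc.lower()]


def expanded_match(query, documents, synonyms=SYNONYMS):
    """Return documents that contain the query or any listed synonym."""
    variants = synonyms.get(query.lower(), {query.lower()})
    return [doc for doc in documents if any(v in doc.lower() for v in variants)]


# Hypothetical page titles.
docs = [
    "SEO tips for small sites",
    "A guide to search engine optimization",
    "Search engine marketing on a budget",
]

print(literal_match("SEO", docs))   # only the page that literally says "SEO"
print(expanded_match("SEO", docs))  # also the page using the spelled-out term
```

Neither function finds the search engine marketing page. It is related to the query but not a synonym of it, which is exactly the gap a synonym list leaves and LSI is meant to close.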
|
|
||||
|
Is it possible to explain the mathematics?
Think of a three-dimensional space (a room) and arrows pointing from the origin to different places in the space. Place documents on the tip of each arrow (vector). Documents that are related lie close to each other (Euclidean distance) in that room. Even if you cannot visualize more than three dimensions, mathematically the computation is done in exactly the same manner (Euclidean distance etc. in higher-dimensional spaces is analogous).
LSI represents terms and documents in a rich, high-dimensional space, allowing the underlying ("latent") semantic relationships between terms and documents to be exploited during searching. Each dimension is merely assumed to represent one or more semantic relationships in the term-document space. Mathematical methods (singular value decomposition, SVD) are used to reduce the dimension of the term-document space, to filter out noise etc. and find related documents. In this way the underlying semantic relationships between documents are revealed. LSI statistically analyses the pattern of word usage across the entire document collection, placing documents with similar word usage near each other in the term-document space, and allowing semantically related documents to be near each other even though they may not share terms.
This is simple mathematics (mostly linear algebra learned in undergraduate courses in mathematics) and the methods have been known for many years, so it should be no problem for Google to use LSI.
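The description above can be sketched in a few lines with NumPy's SVD. The five tiny "documents", the choice of k = 2 dimensions and the query are assumptions picked for illustration. A real collection would use weighted term counts and far more dimensions, and nothing here is claimed to be what any search engine runs.

```python
import numpy as np

# Hypothetical mini collection: three SEO pages and two cooking pages.
docs = [
    "seo ranking",
    "search engine optimization ranking",
    "search engine optimization",
    "garlic sausage recipe",
    "sausage recipe",
]
terms = sorted({word for doc in docs for word in doc.split()})

# Term-document matrix: one row per term, one column per document.
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)

# Singular value decomposition, keeping only the k strongest dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k = U[:, :k]
doc_vecs = (s[:k, None] * Vt[:k]).T  # each row: one document in the reduced space


def fold_in(query):
    """Project a query into the same reduced space as the documents."""
    q = np.array([query.split().count(t) for t in terms], dtype=float)
    return q @ U_k


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


q_vec = fold_in("seo")
sims = [cosine(q_vec, d) for d in doc_vecs]
for doc, sim in zip(docs, sims):
    print(f"{sim:6.3f}  {doc}")
```

The third document never mentions "seo", yet it should score close to 1 against the query because "seo" and "search engine optimization" both co-occur with "ranking" elsewhere in the collection. The cooking pages should score close to 0.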
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. |
|
|||
|
The math may be rather simple to you (but in all likelihood not to most posters here), but have you considered the computational resources necessary to run that calculation on 8 billion pages at ranking time?
|
|
|||
|
Double post
|
|
||||
|
1. Explaining latent semantic indexing (LSI) in more detail as I see it.
Let the space be three-dimensional (x, y, z), n = 3 (a room, with the origin in the center of the room so you can have negative elements on the x, y and z axes). Pick two documents in that room. If the arrows are perpendicular to each other (orthogonal), there is no relation between the two documents. They are not candidates in the same search query. If they lie in the same two-dimensional plane, the space collapses to a plane, n = 2. If they lie on the same arrow, it collapses to the real line. In both cases, the documents have related content. If two documents are placed on the same arrowhead, they should be (near) identical, depending on how accurate the algorithm is (at filtering out noise (spam) etc).
2. Computational burden. How is the space represented? If you think of C++ that is used in LSI, a three-dimensional space is a pointer to a pointer to a pointer (pointer***). Similarly for higher dimensions. Today, there are a lot of C++ libraries with container classes. The computational burden of course depends on the dimensionality, the number of elements (documents) and the methods that are implemented to operate on the vectors (arrows). It should not be a problem with high-dimensional systems, given today's computing power and parallelism. Maybe (distributed) beta (objects) - see toolbar - may be used to make the computations more compact and efficient.
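The "arrows in a room" picture can be checked numerically with a short sketch. Here each document vector is simply a tuple of n coordinates, which is an assumption of this sketch rather than a statement about any LSI implementation, and the example vectors are made up. The same functions work unchanged for any number of dimensions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cosine(u, v):
    """Cosine of the angle between two arrows: 0 if orthogonal, 1 if same direction."""
    return dot(u, v) / (norm(u) * norm(v))


def euclidean(u, v):
    """Straight-line distance between the two arrowheads."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical document vectors in a three-dimensional "room".
doc_a = (1.0, 0.0, 0.0)
doc_b = (0.0, 1.0, 0.0)   # perpendicular to doc_a
doc_c = (1.0, 2.0, 3.0)
doc_d = (2.0, 4.0, 6.0)   # same arrow as doc_c, only longer

print(cosine(doc_a, doc_b))     # orthogonal arrows: no relation
print(cosine(doc_c, doc_d))     # same direction: related content
print(euclidean(doc_c, doc_c))  # same arrowhead: distance zero
```

Cosine similarity ignores the length of the arrows and only looks at their direction, which is why two documents on the same arrow count as related even though their arrowheads are some distance apart.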
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. |
|
|||
|
Yes Kgun, you can explain it that way if you like, but I doubt that such an explanation will be helpful to many who do not practice mathematics daily.
You seem to have a good handle on LSI. Do you agree that for the average searcher, webmaster and SEO, it is simply a way for search engines to find related pages which cannot be found by lexical analysis? IMO we do not have to have a mathematical understanding of the process to know if it works or is useful, sort of like you don't have to know the stress levels in the connecting rods of your car engine in order to use it and find it useful. |
|
||||
|
You seem to have a good handle on LSI. Do you agree that for the average searcher, webmaster and SEO, it is simply a way for search engines to find related pages which cannot be found by lexical analysis?
Correct, as far as I understand. I am not an expert. I have only tried to boil down my mathematical understanding of LSI in the hope that it may be helpful to other members / surfers.
IMO we do not have to have a mathematical understanding of the process to know if it works or is useful, sort of like you don't have to know the stress levels in the connecting rods of your car engine in order to use it and find it useful.
Agree. See my two last posts in the thread about information.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. |
|
|||
|
Well, two days have passed with no one responding to my request for any substantiation that Google is in fact using LSI as a part of its ranking algo, so I guess it's fair to say that no one has any evidence to support this assumption, and I guess then we should just let the matter die of old age.
|
|
||||
|
I think that to give a definite answer to that question is like saying that you can square the circle, that is, construct a square with side length r*sqrt(pi). Seemingly a very simple problem.
You can not solve that problem numerically with all available computing power on the earth. The Google algorithm is perhaps so complex now, that it is nearly impossible to simulate every aspect of it. If you look at the original paper above, Google uses linear algebra to compute the page rank. There is no problem as I said above for Google to use LSI as an element in their indexing. Most probably they use it or a more general method that has LSI as a special case. Only my guessing, even without having studied the original paper or its relation to LSI in any detail. GoogleBOT: "Do not touch my circles" :-)
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. |
|
||||
|
Mel didn't ask to prove that google was using it, merely to supply evidence that they might be using it. He wanted to see what reasons people had for thinking google might be using LSI at this stage.
__________________
William Cross Expert Search Engine Optimization Man's Best Friend: If you don't believe it, just try this experiment. Put your dog and your wife in the trunk of the car for an hour. When you open the trunk, who is really happy to see you? |
|
||||
|
Quote:
You may use grid hosting for a search tool. The positions of independent documents should be orthogonal to each other. So you may use "orthogonal principal components decomposition" to find less related documents. The inverted link matrix of the web will give related (linked) documents. Of course, Google knows all about this.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. Last edited by kgun; 04-14-2008 at 01:34 PM. |
|
||||
|
Quote:
Quote:
Perhaps because the index of a search engine is constantly updated and you can therefore say that new information is fed into it. But as I understand it, LSI/LSA does not include some kind of memory function, so when data is no longer available in the index, it's not going to use memorized information. To me that would be an important part of AI. (Though I guess it is possible to make the algorithms use memorized info that is no longer available, as well.) As to MSN, as far as I understood it, they're trying to really use artificial intelligence, where Google is still going more for the algorithmic approach. Perhaps these 2 things are pretty much the same thing. Overall though, I think we're just on the level of "information retrieval" and far from actual Artificial Intelligence.
__________________
FREE SEO ! Really? YES! All you have to do is implement it! Follow me on Twitter PeterIMC |
|
||||
|
Quote:
Search for: * something
The star can be any word Google sees fit. The ideal is that they come up with something that makes sense in language and not just whatever. LSI would make that a lot easier. Take 2 combinations of 2 words, where the last word is the same. Choose them such that you know for sure that one combination is more commonly used than the other. If LSI is used, it is more likely that they will choose the most used one to rank high when doing the * something search.
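As a rough illustration of what "more commonly used" means in that test, here is a sketch that counts which words precede a given last word in a tiny made-up corpus. This is plain frequency counting, not LSI, and the corpus and function name are hypothetical. It only shows how you could decide beforehand which of your two combinations is the common one.

```python
from collections import Counter


def rank_fillers(corpus, last_word):
    """Count the words that appear directly before `last_word`, most common first."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        for first, second in zip(words, words[1:]):
            if second == last_word:
                counts[first] += 1
    return counts.most_common()


# Hypothetical corpus, for illustration only.
corpus = [
    "search engine optimization is hard",
    "a search engine ranks pages",
    "the steam engine changed industry",
    "every search engine has an index",
]

print(rank_fillers(corpus, "engine"))
```

With this corpus "search engine" is the more common combination, so it is the one you would expect to see favoured for a "* engine" search if the test described above works as supposed.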
__________________
FREE SEO ! Really? YES! All you have to do is implement it! Follow me on Twitter PeterIMC |
|
||||
|
Quote:
Great comment Peter. Clever man.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. Last edited by kgun; 04-15-2008 at 07:42 PM. |
|
||||
|
Any updated information on this subject?
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started I will use a search engine before I ask dumb questions. |
WebProWorld is an iEntry, Inc. ® site - © 2010 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |