WebProWorld Part of WebProNews.com

#1  02-25-2005, 06:46 AM
Faglork, WebProWorld Veteran (Join Date: Feb 2005; Location: Forchheim, Germany; Posts: 947)
Latent Semantic Indexing (LSI) - theory, praxis, outlook

Hello everybody,

as suggested by greeneagle, I'm starting a discussion about Latent Semantic Indexing (LSI), since it is becoming a widely discussed topic, with posts scattered all over the WPW forum.

Latent Semantic Indexing seems to provide a new, but long expected shift in the way the big engines rank sites in the SERPs.

What is LSI?

Latent Semantic Indexing (LSI) is a statistical information retrieval method which lets you find relevant information not by simple keyword match, but by recognizing the "theme", the "topic" of a given document, in our case a website (Do I hear echoes of "topic distillation"? http://tinyurl.com/3wucm ). What's more, the document does not even have to include the keyword. LSI will get the "meaning" of the page and find it anyway. LSI is even capable of finding relevant pages in *other* languages.
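To make the "find it even without the keyword" claim concrete, here is a minimal pure-Python sketch of textbook LSI: build a term-document matrix A, keep only its top k singular directions, and compare a query with the pages in that reduced space. The five-page corpus, the rank k=2 and all function names are hypothetical, and the SVD is obtained by power iteration on A^T A only to avoid dependencies. It illustrates the general technique, not anything a search engine is known to run.

```python
import math

# Hypothetical toy corpus: three "animal" pages and two "finance" pages.
DOCS = [
    "zoo animals wildlife",
    "zoo tour animals",
    "wildlife safari animals",  # about animals, but never mentions "zoo"
    "stock market shares",
    "market trading shares",
]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def doc_term_rows(docs):
    """Return (vocab, rows) where rows[d][t] counts term t in document d.
    The rows are the columns of the term-document matrix A."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    rows = [[float(doc.split().count(w)) for w in vocab] for doc in docs]
    return vocab, rows


def top_eigenpairs(m, k, iters=1000):
    """Top-k eigenpairs of a small symmetric positive semi-definite matrix,
    found by power iteration with deflation (fine for toy sizes only)."""
    n = len(m)
    m = [row[:] for row in m]
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start vector
        for _ in range(iters):
            w = [dot(row, v) for row in m]
            nw = math.sqrt(dot(w, w))
            if nw == 0.0:
                break
            v = [x / nw for x in w]
        lam = dot(v, [dot(row, v) for row in m])  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate so the next-largest eigenpair dominates the next round.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsi(docs, k):
    """Rank-k LSI: the SVD of A is taken from the eigenpairs of A^T A."""
    vocab, rows = doc_term_rows(docs)
    gram = [[dot(a, b) for b in rows] for a in rows]  # A^T A (document x document)
    pairs = top_eigenpairs(gram, k)
    sigmas = [math.sqrt(max(lam, 0.0)) for lam, _ in pairs]  # singular values
    vs = [v for _, v in pairs]                               # right singular vectors
    # Document d in the latent space: its projection onto U_k, i.e. V_k[d] * Sigma_k.
    doc_vecs = [[vs[r][d] * sigmas[r] for r in range(k)] for d in range(len(docs))]
    return vocab, rows, vs, sigmas, doc_vecs


def fold_in(query, vocab, rows, vs, sigmas):
    """Project a query onto the same axes: q^T U_k = (A^T q)^T V_k / Sigma_k."""
    q = [float(query.split().count(w)) for w in vocab]
    atq = [dot(row, q) for row in rows]  # A^T q: raw term overlap with each document
    return [dot(atq, v) / s if s else 0.0 for v, s in zip(vs, sigmas)]


vocab, rows, vs, sigmas, doc_vecs = lsi(DOCS, k=2)
query_vec = fold_in("zoo", vocab, rows, vs, sigmas)
for doc, row, vec in zip(DOCS, rows, doc_vecs):
    hits = row[vocab.index("zoo")]
    print(f"keyword hits={hits:.0f}  LSI cosine={cosine(query_vec, vec):.2f}  {doc}")
```

On this toy data the third page has zero keyword hits for "zoo" yet should score close to 1.0 alongside the two zoo pages, while the finance pages score close to 0. A real system would use a sparse SVD routine on a huge matrix rather than power iteration, which is where the computational cost mentioned later in the thread comes in.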

Now, some people claim that Google is already using LSI and that this might account for the seemingly erratic behavior noticed since the last "Allegra" update. Fact is, Google IS using some sort of LSI when you apply the ~ operator to a search. Try
http://www.google.com/search?q=~zoo
and notice the highlighted terms.

If you want to play around with an LSI implementation, visit the Telcordia Latent Semantic Indexing (LSI)
Demo Machine at http://lsi.research.telcordia.com/
and use the demo.

The most comprehensive article on LSI can be found at
http://javelina.cet.middlebury.edu/l.../lsa_intro.htm

A Google search for "definition: latent semantic indexing" will give you an overview. What really stunned me is that the first result is a SEO company claiming to optimize for LSI ...


So much for a first introduction. I will try to find more information, above all on whether and how existing SEs already use LSI.


Any comments, ideas, suggestions ...?

Alex

#2  02-25-2005, 06:51 AM
Mel, WebProWorld 1,000+ Club (Join Date: Jul 2003; Posts: 1,922)
Re: Latent Semantic Indexing (LSI) - theory, praxis, outlook

Quote:
Originally Posted by faglork
...

Latent Semantic Indexing seems to provide a new, but long expected shift in the way the big engines rank sites in the SERPs.

...
I think that the cart may be before the horse here. I have not seen a single shred of verifiable evidence that LSI is being used at all by any major search engine.
__________________
Mel Nelson
Expert SEO

#3  02-25-2005, 07:24 AM
Faglork

Just read
Semantic Web Ontologies: What Works and What Doesn't
Google's director of search quality discusses challenges of automation, knowledge, spam, and even politics.
http://www.alwayson-network.com/comm...d=7480_0_3_0_C

#4  02-25-2005, 08:01 AM
ctabuk, WebProWorld Moderator (Join Date: Jul 2003; Location: Lincolnshire; Posts: 4,453)

Alex, firstly, a very well-deserved MVP.
You know me, I'm simple, but from reading everything (for once) it does appear that politics and political URLs are, and always have been, Google's first choice, so has this been around for some time, or is it the shape of things to come on all SEs?

#5  02-25-2005, 09:09 AM
Faglork

Quote:
Originally Posted by ctabuk
(...) so has this been around for some time, or is it the shape of things to come on all SEs?
While LSI has obviously been around for some time, its practical use in SEs may still be some way off. The SEs themselves keep mum on the topic, maybe because it is too soon to go public, or because they do not want to tell the competition where they are heading.

The purpose of this discussion is to bring together the pieces of information. There is a lot of speculation going on, maybe together we can shed some light on this interesting topic.

So what I'd like to see are posts with EVIDENCE, either for or against the use of LSI in SEs, not just posts like "I do not believe it" or "I believe it".

The above article by Google's "director of search quality" only acknowledges that Google is at least toying with it, so to speak. Whether, and how far, this has already been incorporated into Google's algo is still a matter of speculation. The test with ~ as a search operator shows that there IS some kind of semantic algo already in use. Whether it is also used in normal search, and whether it is an LSI algo, nobody knows so far.

But I would LIKE to know, hence the discussion.

Alex

#6  02-25-2005, 09:52 AM
Mel

Ok Alex that makes more sense. LSI is not new nor a magic bullet, after all it effectively just does the same thing that parsing the words into barrels does, but it may enable search engines to understand what the writer of the page means a bit better.

The article is interesting, but remember it is the director of search quality talking, not the director of search engineering. I believe that Google has many similar projects that they are working on in the lab; most may never see the light of day as part of their search engine, but there may be one that will truly be a breakthrough. My take on the article is that he is not talking about LSI at all, but about the problems of producing more relevant results by any means.

Perhaps O/T or perhaps not

Every time there is a major shakeup in Google's algo, we see all sorts of theories surfacing that Google has implemented this or that new and revolutionary technology, but it never seems to be the case. If you do a bit of research on the infamous Florida update, you will find that theories aplenty were advanced that Google had implemented Hilltop, LocalRank, topic-specific PageRank, and many more, but until the Allegra update all of these had quietly died.

Now we have another update that has people scratching their heads, but just like Florida the results seem to be slowly gravitating back towards where they were before Allegra. Please pardon me if I am less than enthusiastic about the new crop of theories.

I am, with minor variations, still optimizing pages in much the same way as I was four years ago, and they still rank well. I have watched many top-three Google rankings drop down to the bottom of the first page or even onto the third page, but over the past two weeks some have climbed back up to the #3 spot and some have climbed from the third page to the top of the second without me changing a period on a page. The exact same thing happened after Florida, when it took a full two months for things to return to normal.

My intended response to this update is to have another cup of coffee, read a few more forum posts to see if anyone has come up with anything startling, then build a few more sites and see if the rankings come back.

#7  02-25-2005, 10:23 AM
Faglork

Quote:
Originally Posted by Mel
Ok Alex that makes more sense. LSI is not new nor a magic bullet, after all it effectively just does the same thing that parsing the words into barrels does,
No. Did you try the LSI demo machine? You can find pages which do not even have the keyword in it. This is way beyond "parsing into barrels".
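For contrast with the "parsing into barrels" comparison: a plain keyword index can only return pages that literally contain the query term (the "barrels" in the original Google paper are partitions of keyword indexes of this general kind). A minimal sketch, assuming a hypothetical four-page corpus and a single in-memory dictionary instead of sharded barrels:

```python
from collections import defaultdict

# Hypothetical toy corpus; document IDs are list positions.
DOCS = [
    "zoo animals wildlife",
    "zoo tour animals",
    "wildlife safari animals",  # about animals, but never says "zoo"
    "stock market shares",
]


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def keyword_search(index, query):
    """Return IDs of documents containing every query term (boolean AND)."""
    postings = [set(index.get(t, ())) for t in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


index = build_inverted_index(DOCS)
print(keyword_search(index, "zoo"))  # [0, 1]: document 2 is never returned
```

However the postings are split up or stored, a lookup like this cannot surface document 2 for "zoo"; that is the difference the LSI demo machine shows.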

Quote:
Originally Posted by Mel
My take on the article is that he is not talking about LSI at all, but about the problems of producing more relevant results by any means.
The article appeared in a series:
Quote:
This text is excerpted from SDForum's Semantic Technologies Seminar
Quote:
Originally Posted by Mel
Now we have another update that has people scratching their heads, but just like Florida the results seem to be slowly gravitating back towards where they were before Allegra. Please pardon me if I am less than enthusiastic about the new crop of theories.
Google acquired APPLIED SEMANTICS:
Quote:
Applied Semantics' products are based on its patented CIRCA technology, which understands, organizes, and extracts knowledge from websites and information repositories in a way that mimics human thought and enables more effective information retrieval. A key application of the CIRCA technology is Applied Semantics' AdSense product that enables web publishers to understand the key themes on web pages to deliver highly relevant and targeted advertisements.
It should be natural for Google to try and incorporate this into web search as well.

I, too, am not enthusiastic. I am just curious. The topic itself is fascinating, and there is a lot of buzz going round. I'd like to get some clarity.

Alex

#8  02-25-2005, 12:03 PM
Mel

Quote:
Originally Posted by faglork
Quote:
Originally Posted by Mel
Ok Alex that makes more sense. LSI is not new nor a magic bullet, after all it effectively just does the same thing that parsing the words into barrels does,
No. Did you try the LSI demo machine? You can find pages which do not even have the keyword in it. This is way beyond "parsing into barrels".
That may be, but Google will happily show you rankings for pages that do not have the word on the page using the current technology. That is not to say that LSI could not be more useful if its very high overheads could be conquered.

Quote:
Originally Posted by faglork
Quote:
Originally Posted by Mel
My take on the article is that he is not talking about LSI at all, but about the problems of producing more relevant results by any means.
The article appeared in a series:
Quote:
This text is excerpted from SDForum's Semantic Technologies Seminar
Sure enough, but that does not automatically make all the topics discussed about LSI. Semantics is a very big field.

Quote:
Originally Posted by faglork
Quote:
Originally Posted by Mel
Now we have another update that has people scratching their heads, but just like Florida the results seem to be slowly gravitating back towards where they were before Allegra. Please pardon me if I am less than enthusiastic about the new crop of theories.
Google acquired APPLIED SEMANTICS:
Quote:
Applied Semantics' products are based on its patented CIRCA technology, which understands, organizes, and extracts knowledge from websites and information repositories in a way that mimics human thought and enables more effective information retrieval. A key application of the CIRCA technology is Applied Semantics' AdSense product that enables web publishers to understand the key themes on web pages to deliver highly relevant and targeted advertisements.
People have been speculating that Google bought Applied Semantics for search reasons, but at the time of the sale it was said that the main reasons for buying that company were to have in-house control over AdSense and to enlarge the distribution channels for it.


Quote:
Originally Posted by faglork
I, too, am not enthusiastic. I am just curious. The topic itself is fascinating, and there is a lot of buzz going round. I'd like to get some clarity.
That buzz is one of the things I am talking about in the last part of my post.

#9  02-25-2005, 12:37 PM
greeneagle, WebProWorld 1,000+ Club (Join Date: Dec 2003; Location: Houston; Posts: 5,715)

faglork wrote:
Quote:
"No. Did you try the LSI demo machine? You can find pages which do not even have the keyword in it. This is way beyond "parsing into barrels"."
Interesting that you should point that out. Yesterday I looked at some of the most competitive generic single-keyword categories one can think of and reviewed the sites in the top Google SERPs. In virtually every category there were top sites without meta tags, titles, and, in one case, even an HTML body.

Here's just a few:

Marble: http://www.marble-institute.com/ - #1,out of 10,500,000 returns – Title only, no metatags.

Chickens: http://www.ansi.okstate.edu/poultry/chickens/ #1 out of 3,630,000 returns – Title only, No Metatags, No HTML Body, list of breeds in the navigation frame.

Trucks: http://www.chevrolet.com/ - #1 out of 23,400,000 – Title and Description, “truck” or “trucks” not used in either, no keywords.

Tires: http://www.goodyear.com/ - #1 out of 12,800,000 returns. Title only, no metatags.

Fertilizer: http://www.fertilizer.com/ - #1 out of 4,310,000 returns. Title only, no metatags.

Snakes: http://www.pitt.edu/~mcs2/herp/SoNA.html - #1 out of 4,330,000 returns. Title only, no metatags.

Oil: http://www.oilonline.com/ #2 out of 91,300,000 returns. Title only, no metatags.

Ken

#10  02-25-2005, 12:43 PM
greeneagle

Mel:
Quote:
"People have been speculating that Google bought Applied Semantics for search reasons but at the time of the sale it was said that the main reasons for buying that company were to have inhouse control over Adsense, and to enlarge the distribution channels for it."
_______

Why can't that technology be applied to searches as well? The information is indexed for millions of pages that have Google ads on them anyway. Maybe the technology has matured enough. If so, they have been quite brilliant about the way they went about funding it.

Ken

#11  02-25-2005, 01:33 PM
greeneagle

faglork:
Quote:
"A Google search for "definition: latent semantic indexing" will give you an overview. What really stunned me is that the first result is a SEO company claiming to optimize for LSI ... "
_____

I am going to go ahead and provide a link to their LSI Optimization page. IMO they are narrowly focused, and their site is one of the more appealing SEO sites I have visited: well designed and well SEO'd. They may be ahead of the game:

http://www.bayst-search-engine-optim...n.com/LSI.html

I particularly agree with this comment on their Site:
Quote:
"Anchor Text Repetition: another form of duplication is the presence of repetitive anchor text in links (e.g., web promotion) pointing to a site. The overuse of anchor text in links is one of the major reasons LSI was brought in."
There are some more LSI links there too.

Ken

Sorry about the multiple posts in a row, I walked in a little late.

#12  02-25-2005, 03:23 PM
voasi, WebProWorld Pro (Join Date: Jan 2004; Location: California, The OC; Posts: 283)

Here's a list of LSA/I topics and papers: http://www.voasi.com/2005/02/lsa-for...e-rankings.htm

Here's a lengthy thread over at Search Engine Watch that goes into the dynamics of LSA, with some Ph.D.s and grad students joining the discussion, which really helps break down the complexity of the subject. http://forums.searchenginewatch.com/...ead.php?t=4009
__________________
Voasi Blog
SEM Inc. Blog

#13  02-28-2005, 03:56 PM
Jason Tor, WebProWorld Veteran (Join Date: Jul 2004; Location: Arizona; Posts: 444)

Wow! This thread is all over the place. I think we know as much about LSA as anybody else does. I was reading several threads on different forums, and the only consistent point is that Google's SERPs are a bit crazy and LSA is probably to blame.

Which is the main point, isn't it? Google's results are decent for some terms and absolutely ignorant for all the rest. So is LSA responsible for this? I personally don't see every result having anything to do with LSA, but I do see some such results mixed in; these are usually the ignorant results that really have nothing to do with what we're searching for.

I've even noticed that in my AdSense content ads I'm getting some results that have nothing to do with my page content.

Is Google trying too hard to produce relevant results? Most likely. Will they stop and go back to the more relevant results they used to produce? Not likely.

I think I'll learn as much about LSA as possible and experiment with LSA/SEO, if such a thing can actually exist. In the meantime, I'll keep searching MSN and Yahoo when I actually want to find what I'm looking for.

Jason Tor

#14  02-28-2005, 04:07 PM
greeneagle

Jason:
Quote:
"Is Google trying to hard to produce relevant results? Most likely. Will they stop and go back to the most relevant results they used to produce? Not likely."
I think the technological capacity for achieving the desired results is there, but, being quite complex, it is definitely in its infancy. They are just learning to crawl.

On the other hand, with the "braintrust" and capital on hand, I would expect a "toddler" soon!

Ken

#15  02-28-2005, 06:45 PM
Jason Tor

As much as the evil Google irritates me at the moment (and that moment may or may not pass), LSA intrigues me! I saw a post on http://forums.searchenginewatch.com/ by Randfish that explains LSA more thoroughly:

LSA - Latent Semantic Analysis
The idea behind this is that by taking a huge composite (index) of millions of web pages, the search engines can "learn" which words are related and which noun concepts relate to one another.

For example, using LSA, a search engine would recognize that trips to the zoo often include viewing wildlife and animals, possibly as part of a tour.

Now, conduct a search at Google for ~zoo ~trips. Note that the bolded words match the terms I italicized in the paragraph above. Google is bolding 'related' terms, recognizing which terms frequently occur concurrently (together / on the same page / in close proximity) in their index.

Some forms of LSA are too computationally expensive. For example, Google isn't smart enough to 'learn' the way some of the newer learning computers at MIT do (see some news reports on this). They cannot, for example, learn through their index that zebras and tigers are both examples of striped animals, although they may realize that stripes and zebras are more semantically connected than ducks and stripes.

Theming
Theming is more of an SEO-concocted subject that is often floated around: choosing a 'themed' page for a link rather than a non-themed page. Basically, theming is what Google bought the company Kaltix for. They created the site-themed (flavored) search for Google, which is able to categorize many websites, based on their content/links/etc., into varying themes through a categorization structure.

This helped me to understand it a little bit more, here is the thread
http://forums.searchenginewatch.com/...ead.php?t=4009.

Jason Tor
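Randfish's description of related terms as words that "frequently occur concurrently" can be illustrated with plain co-occurrence counting. This is a simpler technique than LSA (no matrix decomposition), and it is only an assumption about what the ~ operator might resemble; the page list and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy "index" of pages (lower-case, whitespace-separated words).
PAGES = [
    "zoo trips wildlife animals tour",
    "zoo animals tour tickets",
    "wildlife animals safari",
    "stock market trading",
    "zoo wildlife animals",
]


def cooccurrence_counts(pages):
    """Page frequency of each term, and of each unordered term pair."""
    term_df, pair_df = Counter(), Counter()
    for page in pages:
        terms = sorted(set(page.lower().split()))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))  # pairs come out as (a, b) with a < b
    return term_df, pair_df


def related_terms(seed, pages, top_n=3):
    """Rank terms by how often they share a page with `seed`, normalised by
    both terms' page counts (cosine over binary page-incidence vectors)."""
    term_df, pair_df = cooccurrence_counts(pages)
    scores = {}
    for (a, b), both in pair_df.items():
        if seed in (a, b):
            other = b if a == seed else a
            scores[other] = both / math.sqrt(term_df[seed] * term_df[other])
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


for term, score in related_terms("zoo", PAGES):
    print(f"{term}: {score:.2f}")
```

Terms that never share a page with "zoo" (here the finance words) get no score at all, which is the limitation LSA's reduced space is meant to soften.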

#16  02-28-2005, 07:59 PM
incrediblehelp, WebProWorld Moderator (Join Date: Jan 2004; Location: Live in Cincy Now; Posts: 7,702)

LSI is probably not new to search results, and it is probably not too hard to organize the architecture (barrels of keywords) for these same-meaning/homogeneous keywords. I think the problem for the search engines (we are seeing this now with Allegra) is trying to apply the correct rank value to websites that fall into each of the LSI barrels. For instance:

http://www.google.com/search?q=~zoo

Some websites talk about "wildlife" and some about "aquariums", which both relate to "zoos". Sure, this does a good job of getting related websites into a pool that previously might not have been related according to Google's search results, but what value does this relationship to the "zoo" keyword provide in ranking one website higher than another? My assumption is that this LSI feature is going to be just one equal piece of the overall ranking algorithm that Google or others use, definitely no more or less important than the current pieces of the puzzle.

#17  02-28-2005, 08:41 PM
greeneagle

incrediblehelp:

Don't you think that the "barrels" now have "sub-barrels" of "similar" terms? Is that really too far fetched? I surely can't classify that possibility as "science fiction"!

Ken

#18  02-28-2005, 09:03 PM
incrediblehelp

Quote:
Originally Posted by greeneagle
incrediblehelp:

Don't you think that the "barrels" now have "sub-barrels" of "similar" terms? Is that really too far fetched? I surely can't classify that possibility as "science fiction"!

Ken
Of course they do, and I am sure the sub-barrels have many more sub-barrels underneath, and so on and so on. No sci-fi here, but that still doesn't do much more than add another component to a now "dizzying" algorithm from Google.