WebProWorld
Search Engine Optimization Forum SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here.

#1
04-08-2004, 09:33 AM
suthra
WebProWorld New Member
Join Date: Mar 2004
Posts: 22
Identifying duplicate pages

Hi, how does a search engine like Google identify a duplicate page?

For example, say I create two websites (on separate domains) that use the same template, header, and footer, and whose content goes like this:

1st domain:
In todays world everyone searches for money....

2nd domain:
In this world Everyone searches for money.....

A human can tell that these are duplicate pages, but how does a bot know?

Please reply.
Regards,
sarathy.s
#2
04-08-2004, 12:12 PM
Mel
WebProWorld 1,000+ Club
Join Date: Jul 2003
Posts: 1,903

IMO it depends on the search engine involved, but to take Google as an example: they parse the text of each page into hit list records, and I would imagine that the hit lists for two pages like these would be near duplicates.
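To make the "near duplicate" idea concrete, here is a rough sketch in Python. It is only an illustration, not how Google or any other engine actually works: it assumes each page is reduced to its set of distinct lowercase words and that overlap is measured with Jaccard similarity. The names `word_set` and `jaccard` are made up for this example.

```python
import re


def word_set(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Shared words divided by all distinct words across both pages."""
    if not a and not b:
        return 1.0  # two empty pages are treated as identical
    return len(a & b) / len(a | b)


# The two example sentences from the first post.
page_1 = "In todays world everyone searches for money"
page_2 = "In this world Everyone searches for money"

similarity = jaccard(word_set(page_1), word_set(page_2))
print(similarity)  # 6 shared words out of 8 distinct words -> 0.75
```

A score close to 1.0 would suggest the two pages are near duplicates; where to draw the line is a judgment call that this sketch does not try to answer.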

Another scheme, which Google has patented but which I don't know to be in actual use, is a technique to "fingerprint" each page by recording dozens of parameters and combining them into what is effectively a unique ID for each page, based on its content.
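The patented scheme itself is not described here, so the following is only a generic sketch of content fingerprinting, using a swapped-in technique: hashing overlapping three-word "shingles". The functions `shingles`, `fingerprint`, and `resemblance` are hypothetical names for this example, and the shingle size of 3 is an arbitrary assumption.

```python
import hashlib


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def fingerprint(text, size=3):
    """Hash every shingle; the set of hashes stands in for the page."""
    return frozenset(
        hashlib.sha1(s.encode("utf-8")).hexdigest() for s in shingles(text, size)
    )


def resemblance(fp_a, fp_b):
    """Fraction of shingle hashes the two fingerprints have in common."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


# The two example sentences from the first post.
fp_1 = fingerprint("In todays world everyone searches for money")
fp_2 = fingerprint("In this world Everyone searches for money")

print(resemblance(fp_1, fp_2))  # 3 shared shingles out of 7 distinct ones
```

Because shingles preserve word order, changing one word near the start alters several shingles at once, so this measure reacts more strongly to small edits on a very short text than a plain word comparison would. On full-length pages a single changed word affects only a tiny fraction of the shingles.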
__________________
Mel Nelson
