Search Engine Optimization Forum
SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here.
1. The difference between a batch-based search engine and an incrementally-updated search engine.

I have read a very interesting recent interview with Matt Cutts, by Aaron Wall. One of the questions was:

"When you guys roll out new algorithms, filters, and patches some good sites end up getting filtered out with the bad. Do you pre-test most of the algorithms prior to launching them? How do you know how strongly to apply filters? By default do you usually lean on one side or the other and then tweak your way back?"

Matt's answer was:

"We always put algorithmic changes into our test harnesses to poke and prod in lots of different ways. But you also have to be adaptive. If someone in the outside world notices an issue after a launch that you didn't notice, it's important to take that feedback and act on it, and also to try to improve the testing procedure to cover that in the future. We usually have a pretty strong sense of whether something will be a large-impact launch or not. But you can't completely avoid having a large impact with a launch. An example might be if you're replacing a large subsystem in the crawl-index-serve pipeline. We continually go back and improve or replace sections of our system. Sometimes the results can't be bit-for-bit compatible in output, so you have to do the best you can. Update Fritz in 2003 is the canonical example of that; you can't go from a batch-based search engine to an incrementally-updated search engine without some visible impact. To answer your last question, I personally lean toward softer launches; webmasters never need any extra stress. But sometimes launches can't be made completely soft or invisible, as I mentioned."

See msg #298 in this link: http://www.webmasterworld.com/forum30/31688-10-30.htm

2. Stop 302 Redirect Hijacking: http://www.loriswebs.com/hijacking_web_pages.html

3. How to stop bad bots and robots.txt: http://www.garykeith.com/browsers/downloads.asp

4. Mod_Rewrite Basics: http://www.macwoms.com/, http://www.modrewrite.com/ and ISAPI rewrite http://www.isapirewrite.com/

5. 301 Redirects: http://www.heatherswebdesign.com/zw12.htm

6. Use the .htaccess file to block referrer spam: http://www.aaronlogan.com/downloads/htaccess.php
I enjoyed that article, and it is great to see Aaron Wall be the one to interview Matt.
Please stop posting the WMW links; most of us refuse to pay to get in there! ;-( And you can learn a lot from what Matt Cutts says, and more from what he doesn't. :-)
Quote:
As Kgun's example illustrates, it is good to get your info from as many sources as possible.
"How search engines work

Creating and maintaining an inverted index is the central problem when building an efficient keyword search engine. To index a document, you must first scan it to produce a list of postings. Postings describe occurrences of a word in a document; they generally include the word, a document ID, and possibly the location(s) or frequency of the word within the document. If you think of the postings as tuples of the form <word, document-id>, a set of documents will yield a list of postings sorted by document ID. But in order to efficiently find documents that contain specific words, you should instead sort the postings by word (or by both word and document, which will make multiword searches faster). In this sense, building a search index is basically a sorting problem. The search index is a list of postings sorted by word."

and

"Incremental versus batch indexing: Some search engines only support batch indexing; once they create an index for a set of documents, adding new documents becomes difficult without reindexing all the documents. Incremental indexing allows easy adding of documents to an existing index. For some applications, like those that handle live data feeds, incremental indexing is critical."

http://www.javaworld.com/javaworld/j...15-lucene.html
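To make the quoted description a bit more concrete, here is a rough Python sketch of postings, a batch-built index, and an incremental add. It is only a toy illustration of the idea in the quote, not how Lucene or Google does it; the function names (postings, build_batch_index, add_document, search) and the whitespace tokenizer are my own assumptions.

```python
import bisect
from collections import defaultdict


def postings(doc_id, text):
    """Scan one document and yield <word, document-id> postings (hypothetical tokenizer: lowercase + split)."""
    for word in set(text.lower().split()):
        yield (word, doc_id)


def build_batch_index(docs):
    """Batch indexing: collect every posting, sort by word (then doc ID), and group."""
    all_postings = []
    for doc_id, text in docs.items():
        all_postings.extend(postings(doc_id, text))
    all_postings.sort()  # the "sorting problem" from the quote
    index = defaultdict(list)
    for word, doc_id in all_postings:
        index[word].append(doc_id)
    return index


def add_document(index, doc_id, text):
    """Incremental indexing: merge one new document into an existing index without rebuilding it.

    Assumes doc_id has not been indexed before.
    """
    for word, _ in postings(doc_id, text):
        bisect.insort(index.setdefault(word, []), doc_id)


def search(index, *words):
    """Return the sorted IDs of documents that contain every query word."""
    sets = [set(index.get(w.lower(), [])) for w in words]
    return sorted(set.intersection(*sets)) if sets else []
```

For example, build the index once with build_batch_index({1: "batch search engine", 2: "incremental search engine"}), then call add_document(index, 3, "incremental index update") when a new page arrives. A batch-only engine would have to rerun build_batch_index over all documents instead.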
WebProWorld is an iEntry, Inc. ® site - © 2009 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |