1. Background:
- http://www.webproworld.com/google-di...tml#post437745
- http://www.webproworld.com/search-en...tml#post403724
- http://www.webproworld.com/search-en...tml#post442147
I found some interesting articles on these subjects on the Smart IT Consulting Services site (related blog: sebastianx.blogspot.com; example post: "Just another victim of the nofollow plague"). Here are some relevant snippets.
2. Tagging irrelevant page areas: class=robots-nocontent
"Telling a search engine that particular page areas aren't related to a page's core contents was a problem, until Yahoo! introduced the 'robots-nocontent' class name in May 2007. Perhaps other search engines will follow and support this mechanism too. Google has something similar called section targeting for the AdSense crawler, but puts crawler directives in HTML comments instead of the class attribute."
Source: Tagging irrelevant page areas: class=robots-nocontent
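To make the two approaches concrete, here is a minimal PHP sketch, assuming the page is assembled server-side. The helper names `markAsNoContent()` and `markAsAdSectionIgnore()` are hypothetical. The `robots-nocontent` class name comes from the quote; the `google_ad_section_start(weight=ignore)` / `google_ad_section_end` comments are AdSense's section-targeting syntax as I recall it, so check the current AdSense documentation before relying on them.

```php
<?php
// Hypothetical helpers that mark page areas outside the core content.

// Yahoo!'s mechanism: a class name on the element enclosing the area.
function markAsNoContent(string $html, string $tag = 'div'): string
{
    return '<' . $tag . ' class="robots-nocontent">' . $html . '</' . $tag . '>';
}

// Google's AdSense section targeting: the directive lives in HTML comments;
// weight=ignore asks the AdSense crawler to disregard the enclosed area.
function markAsAdSectionIgnore(string $html): string
{
    return "<!-- google_ad_section_start(weight=ignore) -->\n"
        . $html
        . "\n<!-- google_ad_section_end -->";
}

// Example: a navigation bar that says nothing about the page's topic.
$nav = '<a href="/">Home</a> | <a href="/contact.html">Contact</a>';
echo markAsNoContent($nav), "\n";
echo markAsAdSectionIgnore($nav), "\n";
```

Since the class attribute takes a space-separated list, `robots-nocontent` can also sit next to existing class names, e.g. `class="sidebar robots-nocontent"`.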
3. Link Specific Regulation: REL=NOFOLLOW
Google introduced the NOFOLLOW value for the REL attribute of the <A> tag in January 2005 as an instrument to prevent comment spam. Yahoo, MSN and guestbook/forum/blog software makers quickly joined the initiative. Since then, this syntax has found its way into Google's guidelines, and Google reps encourage webmasters to use it whenever they can't vouch for a link.
[...]
There are good reasons not to use REL=NOFOLLOW to hoard PageRank™. First, PageRank™ hoarding is easy to discover and you will earn a ranking penalty for over-optimizing. Second, other webmasters are smart too and will cancel link trades if you cheat.
Source: Link Specific Regulation: REL=NOFOLLOW
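A minimal PHP sketch of the "can't vouch for it" rule, e.g. for links posted in comments or guestbook entries. `renderLink()` and its `$vouchedFor` flag are hypothetical names; how a site decides what it vouches for is up to the site.

```php
<?php
// Hypothetical helper: render a link, adding rel="nofollow" unless the
// site owner vouches for the target (e.g. links left in visitor comments).
function renderLink(string $url, string $text, bool $vouchedFor): string
{
    // Escape the URL and text so user input can't break out of the markup.
    $attrs = 'href="' . htmlspecialchars($url, ENT_QUOTES) . '"';
    if (!$vouchedFor) {
        $attrs .= ' rel="nofollow"';
    }
    return '<a ' . $attrs . '>' . htmlspecialchars($text, ENT_QUOTES) . '</a>';
}

// A link from a blog comment the author cannot vouch for:
echo renderLink('http://example.com/?a=1&b=2', 'my site', false), "\n";
// prints <a href="http://example.com/?a=1&amp;b=2" rel="nofollow">my site</a>
```

In line with the snippet, the flag is meant to reflect whether you trust the link target, not to hoard PageRank on links you would otherwise pass it to.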
4. URL Specific Control: the Robots META Tag
Note that the robots META tag can only be used in HTML documents. If you also offer your content in PDF or DOC format and don't want the PDF/DOC files to show up in search results, store them in a directory that is disallowed in robots.txt, or disallow these file extensions in general.
Source: URL Specific Control: the Robots META Tag
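A hedged PHP sketch of both cases. `robotsMetaTag()` and `robotsTxt()` are hypothetical helper names, and `/downloads` is just an example directory. The extension rules use the `*` and `$` wildcards, which Google and other major crawlers understand (and which RFC 9309 now standardises), but older robots.txt parsers that only do prefix matching may ignore them, so the directory approach is the more conservative option.

```php
<?php
// HTML pages: emit a robots META tag for the <head>.
function robotsMetaTag(bool $index, bool $follow): string
{
    $content = ($index ? 'index' : 'noindex') . ',' . ($follow ? 'follow' : 'nofollow');
    return '<meta name="robots" content="' . $content . '">';
}

// Non-HTML files (PDF, DOC, ...) can't carry a META tag, so block them in
// robots.txt instead: whole directories and/or by file extension.
// Assumption: the crawlers you care about support the "*" and "$" wildcards.
function robotsTxt(array $dirs, array $extensions): string
{
    $lines = ['User-agent: *'];
    foreach ($dirs as $dir) {
        $lines[] = 'Disallow: ' . rtrim($dir, '/') . '/';
    }
    foreach ($extensions as $ext) {
        $lines[] = 'Disallow: /*.' . $ext . '$'; // "$" anchors the match at the URL end
    }
    return implode("\n", $lines) . "\n";
}

echo robotsMetaTag(false, true), "\n";
echo robotsTxt(['/downloads'], ['pdf', 'doc']);
```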
5. Steering and Supporting Search Engine Crawling
Source: Steering and Supporting Search Engine Crawling
6. Identifying and Tracking SE Crawling
An interesting PHP function:
PHP Code:
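function isSpider($userAgent) {
    // Case-insensitive substring match against known crawler user agents.
    // User-agent strings are easily spoofed, so this is only a heuristic;
    // short tokens such as "FAST" may also match unrelated agents.
    if (stristr($userAgent, "Googlebot") ||   /* Google */
        stristr($userAgent, "Slurp") ||       /* Inktomi/Y! */
        stristr($userAgent, "MSNBOT") ||      /* MSN */
        stristr($userAgent, "teoma") ||       /* Teoma */
        stristr($userAgent, "ia_archiver") || /* Alexa */
        stristr($userAgent, "Scooter") ||     /* Altavista */
        stristr($userAgent, "Mercator") ||    /* Altavista */
        stristr($userAgent, "FAST") ||        /* AllTheWeb */
        stristr($userAgent, "MantraAgent") || /* LookSmart */
        stristr($userAgent, "Lycos") ||       /* Lycos */
        stristr($userAgent, "ZyBorg")         /* WISEnut */
    ) {
        return TRUE;
    }
    return FALSE;
}

// getenv() returns FALSE if the header is missing; treat that as an empty string.
$userAgent = (string) getenv("HTTP_USER_AGENT");

if (isSpider($userAgent)) {
    $useSessionID = FALSE; // keep session IDs out of URLs that crawlers index
    $logAccess = TRUE;     // record this crawler visit
}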
Source: Identifying and Tracking SE Crawling
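The snippet sets `$logAccess` but doesn't show the tracking itself. Here is a minimal sketch of what that might look like, assuming a plain tab-separated log file; `logCrawlerVisit()` and the log path are hypothetical. It would be called inside the `if (isSpider(...))` branch above.

```php
<?php
// Hypothetical tracker: append one line per crawler request to a log file.
function logCrawlerVisit(string $logFile, string $userAgent, string $uri, int $timestamp): bool
{
    // Replace tabs/newlines so each visit stays on a single tab-separated line.
    $clean = function (string $s): string {
        return str_replace(["\t", "\r", "\n"], ' ', $s);
    };
    $line = gmdate('Y-m-d H:i:s', $timestamp) . "\t" . $clean($uri) . "\t" . $clean($userAgent) . "\n";
    // LOCK_EX avoids interleaved lines from concurrent requests.
    return file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX) !== false;
}

// Example call; the log location is an assumption.
$log = sys_get_temp_dir() . '/crawler-visits.log';
$ua = (string) getenv('HTTP_USER_AGENT');
$uri = $_SERVER['REQUEST_URI'] ?? '/';
logCrawlerVisit($log, $ua, $uri, time());
```

Timestamps are written in UTC via `gmdate()`, so logs from servers in different time zones can be compared directly.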
7. Why is there not a rel="follow" attribute?
A nofollow directive in the robots META tag (in the head element) applies to all links on the page:
http://www.google.com/support/webmasters/bin/answer.py?answer=33581&t...
If you then add rel="follow" to individual links, it will either have no effect or confuse the bot.
Source: Google Webmaster Help (Google Groups) discussion thread: rel="follow"? - Crawling, indexing, and ranking
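A simplified, hypothetical PHP model of the rule quoted above: a page-level nofollow in the robots META tag wins for every link on the page, and `follow` in a link's rel attribute carries no weight. The function names are made up; treating `none` as implying `nofollow` follows Google's documented meaning of that directive, and real crawlers may differ in the details.

```php
<?php
// Does the META robots content contain a page-wide nofollow?
function metaRobotsHasNofollow(string $metaContent): bool
{
    // META robots directives are comma-separated and case-insensitive.
    $directives = array_map('trim', explode(',', strtolower($metaContent)));
    // "none" is shorthand for "noindex, nofollow".
    return in_array('nofollow', $directives, true) || in_array('none', $directives, true);
}

function linkIsFollowed(string $metaContent, string $rel): bool
{
    if (metaRobotsHasNofollow($metaContent)) {
        return false; // page-wide nofollow wins; rel="follow" is simply ignored
    }
    // rel holds whitespace-separated tokens; only "nofollow" matters here.
    $tokens = preg_split('/\s+/', strtolower(trim($rel)), -1, PREG_SPLIT_NO_EMPTY);
    return !in_array('nofollow', $tokens, true);
}

var_dump(linkIsFollowed('noindex, nofollow', 'follow')); // bool(false)
```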
8. A guide to clever linking for geeks and savvy programmers
Anatomy and Deployment of Links
9. Links for the future?
Advanced semantic linking and transclusion.