Thread: Caching and Indexing

  1. #1
    WebProWorld MVP mjtaylor
    Join Date: Dec 2003 | Posts: 6,237

    Caching and Indexing

    I admit it, I am confused. I used to think that a page being 'cached' indicated that it was indexed. And if a page was NOT cached, it was not indexed. But these days I find lots of pages in SERPs that are not cached. What is the significance of a page being cached?

  2. #2
    Administrator weegillis
    Join Date: Oct 2003 | Posts: 5,793
    My guess: that it was once indexed? I have no idea of the logic behind page caching, but it might actually be there as an alternative to searching only the index for a search query. Could the results be pulled straight from the cached pages?

  3. #3
    WebProWorld MVP chandrika
    Join Date: Oct 2005 | Location: UK | Posts: 742
    It is possible that those sites have used the meta tag
    Code:
    <meta name="robots" content="noarchive">
    which prevents Google and other bots from caching their content. People might use it if they did not want old copies of their site kept in places such as the Wayback Machine, since that meta tag also keeps a site's content from being added there.
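    As a rough illustration of what "using the meta tag" means in practice, here is a minimal Python sketch that checks a page's HTML for a robots meta tag carrying noarchive. The names RobotsMetaParser and has_noarchive are hypothetical, and the sketch only inspects the markup; it says nothing about how Google or the Wayback Machine actually honour the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        for token in (attrs.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noarchive(html: str) -> bool:
    """Return True if a robots meta tag asks crawlers not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noarchive" in parser.directives


page = '<html><head><meta name="robots" content="noarchive"></head><body>Hi</body></html>'
print(has_noarchive(page))  # True
```

    Self-closing tags such as <meta ... /> are handled too, because HTMLParser routes them through handle_starttag by default.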

  4. #4
    Senior Member davidweb
    Join Date: Mar 2007 | Posts: 273

    Quote Originally Posted by mjtaylor
    I admit it, I am confused. I used to think that a page being 'cached' indicated that it was indexed. And if a page was NOT cached, it was not indexed. But these days I find lots of pages in SERPs that are not cached. What is the significance of a page being cached?
    The real purpose of the cache is to store all the data pertaining to your website [the content part] in Google's database. This data is then poured into Google's algorithm, where it is processed like cheese. If the ingredients of your website are good, then Google says "Cheeese"; otherwise you get a finger, I mean a lady finger.

    The Google cache is where all the on-page SEO things are stored.

  5. #5
    Member peskyhuman
    Join Date: Jun 2010 | Location: New Zealand | Posts: 52
    Quote Originally Posted by davidweb
    The real purpose of the cache is to store all the data pertaining to your website [the content part] in Google's database. This data is then poured into Google's algorithm, where it is processed like cheese. If the ingredients of your website are good, then Google says "Cheeese"; otherwise you get a finger, I mean a lady finger.

    The Google cache is where all the on-page SEO things are stored.
    How can you get Google to update the cache more often? Sometimes it lists quite old content in search results because of caching.

  6. #6
    Senior Member deepsand
    Join Date: May 2004 | Location: State College, PA | Posts: 16,489
    The crawler, which does no more than request copies of resources, stores them in the cache before they are processed by the indexing engine. Thus, the cache can only be updated if and when a resource is re-crawled.
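    The crawl-then-cache-then-index flow described above can be sketched as a toy model. Everything here (SearchEngine, Snapshot, the fetch callback) is a hypothetical name chosen for illustration, not a description of how Google is actually built; the point is only that the indexer reads the cached copy, so the cache stays stale until the page is fetched again.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Snapshot:
    url: str
    body: str
    crawl_id: int  # which crawl pass produced this copy


@dataclass
class SearchEngine:
    fetch: Callable[[str], str]  # hypothetical fetcher: URL -> page text
    cache: Dict[str, Snapshot] = field(default_factory=dict)
    index: Dict[str, Set[str]] = field(default_factory=dict)  # term -> URLs
    crawls: int = 0

    def crawl(self, url: str) -> None:
        """The crawler only requests the resource and stores the copy in the cache."""
        self.crawls += 1
        self.cache[url] = Snapshot(url, self.fetch(url), self.crawls)

    def index_cached(self, url: str) -> None:
        """The indexer reads the cached copy, never the live page."""
        for urls in self.index.values():
            urls.discard(url)  # forget terms from any earlier snapshot
        for term in self.cache[url].body.lower().split():
            self.index.setdefault(term, set()).add(url)


live = {"http://example.com/": "old text"}
engine = SearchEngine(fetch=lambda u: live[u])
engine.crawl("http://example.com/")
engine.index_cached("http://example.com/")

live["http://example.com/"] = "new text"         # the site changes...
print(engine.cache["http://example.com/"].body)  # ...but the cache still says "old text"

engine.crawl("http://example.com/")              # only a re-crawl refreshes it
print(engine.cache["http://example.com/"].body)  # "new text"
```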

    The search engine decides in real time, in response to a query, whether or not to display a link to the cached copy, assuming one is archived. The criteria for making that decision are, to the best of my knowledge, undisclosed. The displayed cache data are limited to, I believe, the first 101 KB of text.
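    If the display limit really is 101 KB (the figure above is a recollection, and reading it as 101 × 1024 bytes of UTF-8 text is an assumption), truncating a cached page for display might look like this minimal sketch. The constant and function names are hypothetical.

```python
# Assumed: the poster's "101 KB" read as 101 * 1024 bytes of UTF-8 text (hypothetical constant).
CACHE_DISPLAY_LIMIT = 101 * 1024


def truncate_for_display(text: str, limit: int = CACHE_DISPLAY_LIMIT) -> str:
    """Keep at most `limit` bytes of UTF-8 text without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # The cut can only break the final multi-byte character;
    # errors="ignore" drops that incomplete tail.
    return data[:limit].decode("utf-8", errors="ignore")


print(len(truncate_for_display("a" * 200_000)))  # 103424
```

    Cutting on bytes rather than characters matches a size limit stated in KB, while the decode step keeps the result valid text.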

    For display purposes, cached files are, with the exceptions of Text and SWF files, converted into HTML format, so that no special viewer application is required.

    You can access the cached data for any archived resource via the cache: operator; e.g. .
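    For illustration, a query using the cache: operator can be built as an ordinary search URL. This is a minimal sketch that assumes the operator is passed in the q parameter of a google.com search URL; the helper name cache_query_url and the example page are hypothetical.

```python
from urllib.parse import urlencode


def cache_query_url(page_url: str) -> str:
    """Hypothetical helper: build a search URL asking for a page's cached copy."""
    # Assumption: the cache: operator is sent in the ordinary q= query parameter.
    # urlencode percent-encodes ":" and "/" inside the value.
    return "https://www.google.com/search?" + urlencode({"q": "cache:" + page_url})


print(cache_query_url("www.example.com/page.html"))
# https://www.google.com/search?q=cache%3Awww.example.com%2Fpage.html
```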

    Google has its own directive, <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">, which tells it not to make the cached copy public and not to archive the page once indexing is completed. <META NAME="ROBOTS" CONTENT="NOARCHIVE"> is a universal directive.
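    To make the difference between the two tags concrete, here is a minimal sketch that decides whether noarchive applies to a given crawler: a tag named "robots" applies to every bot, while a tag named after one bot (here GOOGLEBOT) applies only to that bot. The helper names and the "otherbot" crawler are hypothetical, and matching names and values case-insensitively is an assumption based on the upper-case tags above.

```python
from html.parser import HTMLParser


class MetaRobotsCollector(HTMLParser):
    """Hypothetical helper: maps each meta name (lower-cased) to its directives."""

    def __init__(self):
        super().__init__()
        self.by_name = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = (attrs.get("content") or "").lower()
        if name:
            tokens = {t.strip() for t in content.split(",") if t.strip()}
            self.by_name.setdefault(name, set()).update(tokens)


def noarchive_for(html: str, bot: str) -> bool:
    """True if noarchive applies to `bot` via the universal or a bot-specific tag."""
    collector = MetaRobotsCollector()
    collector.feed(html)
    collector.close()
    # "robots" addresses every crawler; a tag named after the bot addresses only that bot.
    universal = collector.by_name.get("robots", set())
    specific = collector.by_name.get(bot.lower(), set())
    return "noarchive" in (universal | specific)


page = '<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">'
print(noarchive_for(page, "googlebot"))  # True
print(noarchive_for(page, "otherbot"))   # False
```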

    There are other meta tags relating to caching that apply to devices other than SEs, such as proxy servers. For details, see Useful HTML Meta Tags.

  7. #7
    Senior Member
    Join Date: Jan 2010 | Posts: 123
    Quote Originally Posted by davidweb
    The real purpose of the cache is to store all the data pertaining to your website [the content part] in Google's database. This data is then poured into Google's algorithm, where it is processed like cheese. If the ingredients of your website are good, then Google says "Cheeese"; otherwise you get a finger, I mean a lady finger.

    The Google cache is where all the on-page SEO things are stored.
    Too funny.

    In reality, a site may be included in the index and returned (ranked) for relevant queries without a cached version being available. A page also does not need to be cached to be reported and/or counted as a backlink; it only needs to be indexed.

    http://www.google.com/intl/en/help/f...st.html#cached

    Cached Links
    Google takes a snapshot of each page examined as it crawls the web and caches these as a back-up in case the original page is unavailable. If you click on the "Cached" link, you will see the web page as it looked when we indexed it. The cached content is the content Google uses to judge whether this page is a relevant match for your query.

    When the cached page is displayed, it will have a header at the top which serves as a reminder that this is not necessarily the most recent version of the page. Terms that match your query are highlighted on the cached version to make it easier for you to see why your page is relevant.

    The "Cached" link will be missing for sites that have not been indexed, as well as for sites whose owners have requested we not cache their content.
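    The rules in the quoted help text, plus the point about backlinks, can be summarised as two tiny predicates. This is a simplified sketch with hypothetical names (PageStatus, shows_cached_link, can_rank_or_count_as_backlink); it ignores the undisclosed real-time criteria mentioned earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class PageStatus:
    indexed: bool
    owner_opted_out_of_cache: bool  # e.g. a noarchive meta tag


def shows_cached_link(page: PageStatus) -> bool:
    # Per the quoted help text: no "Cached" link if the page is not indexed
    # or if its owner asked not to be cached.
    return page.indexed and not page.owner_opted_out_of_cache


def can_rank_or_count_as_backlink(page: PageStatus) -> bool:
    # Per the post above: only indexing matters; a cached copy is not required.
    return page.indexed


noarchived = PageStatus(indexed=True, owner_opted_out_of_cache=True)
print(shows_cached_link(noarchived))              # False
print(can_rank_or_count_as_backlink(noarchived))  # True
```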

  8. #8
    Senior Member NetProwler
    Join Date: Jan 2007 | Posts: 197
    Some CMSs adopt a no-cache directive by default. For example, a typical default Joomla installation sets these meta tags in the page header:
    <meta HTTP-EQUIV="Pragma" Content="no-cache">
    <meta HTTP-EQUIV="cache-control" content="no-cache">
    and sends this HTTP response header:

    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    In such cases, Google will crawl and index the pages, but will not display the cache option.
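    To see what that Cache-Control header actually asks for, here is a minimal Python sketch that splits it into individual directives. parse_cache_control is a hypothetical helper and a simplified parser (it assumes no commas inside quoted values); it shows what the header says, not what Google does with it.

```python
def parse_cache_control(header: str) -> dict:
    """Hypothetical helper: split a Cache-Control header into {directive: value or None}."""
    name, sep, rest = header.partition(":")
    # Accept either the full "Cache-Control: ..." line or just the value part.
    value = rest if sep and name.strip().lower() == "cache-control" else header
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        key, eq, arg = part.partition("=")
        directives[key.strip().lower()] = arg.strip().strip('"') if eq else None
    return directives


joomla = "Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
print(parse_cache_control(joomla))
# {'no-store': None, 'no-cache': None, 'must-revalidate': None, 'post-check': '0', 'pre-check': '0'}
```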

    For end users, the cached version is sometimes faster to view than the original site, if that site is slow.

  9. #9
    Senior Member deepsand
    Join Date: May 2004 | Location: State College, PA | Posts: 16,489
    It is my understanding that said directives are not used by SEs, but by proxy servers and clients.
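    For reference, the HTTP caching rules describe what a shared cache such as a proxy may do with a response. The sketch below is a simplified reading of those rules: no-store forbids storing the response at all, private forbids a shared cache from storing it, and no-cache allows storing but requires revalidation before reuse. shared_cache_policy and its return labels are hypothetical.

```python
def shared_cache_policy(cache_control: str) -> str:
    """Hypothetical helper: classify what a shared cache (e.g. a proxy) may do.

    Simplified: a qualified form such as private="Set-Cookie" is treated
    like plain private, which is the conservative choice.
    """
    directives = {
        part.split("=", 1)[0].strip().lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if "no-store" in directives or "private" in directives:
        return "do-not-store"
    if "no-cache" in directives:
        return "store-but-revalidate"
    return "store"


print(shared_cache_policy("no-store, no-cache, must-revalidate, post-check=0, pre-check=0"))
# do-not-store
```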

  10. #10
    WebProWorld MVP dburdon
    Join Date: Oct 2004 | Posts: 1,602
    Since Caffeine, Google has broken the link between the old crawl, index and cache sequence and the SERPs. See: http://uksearch.blogspot.com/2010/06...goes-live.html
    In essence, once you grasp the diagram in that post, you can see that Google can draw on results without the weight and time lag of the old system.
