WebProWorld
Google Discussion Forum: topics specifically related to Google. There is a subforum dedicated to AdSense/AdWords subjects.

  #1 (permalink)  
Old 07-09-2004, 08:44 AM
WebProWorld New Member
 
Join Date: Apr 2004
Posts: 15
Mark Carey RepRank 0
Default All my pages showing no title or description

Over the past week, I have noticed a disturbing trend with my pages in Google. It started with the major (highest PR) pages: they remained in Google, but they were not showing a title or description. Over the next few days, the same thing started happening to other pages, and now most of my pages in the index are showing this way. They are still in Google, and some are holding rankings (due to anchor text, I assume).

It is almost as if there were a "noindex" meta tag on all of the pages, but there is none.

This has happened for multiple domains, all on the same hosting account. I use .htaccess to redirect some domains to sub-folders, a technique that had worked fine for many months without issue.

Here is an example from my Seinfeld site, www.stanthecaddy.com. Most of the pages are showing no title and description.

Thoughts? Advice?
  #2 (permalink)  
Old 07-09-2004, 09:43 AM
WebProWorld New Member
 
Join Date: Apr 2004
Posts: 15
Mark Carey RepRank 0
Default Googlebot not asking for HTML pages

I just checked my logs for the past 5 days.

Googlebot is coming, but ONLY requesting images and XML files (some of which don't exist: rss.xml, etc.)

Is it possible that the "HTML bots" have their DNS screwed and can't find the pages? Any other potential causes?

I hate to use the "B" word, but could this be a ban? If so, why? The toolbar still shows PR, and backlinks are still reported -- but after the next PR update, who knows?
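For anyone who wants to repeat the log check described above, here is a minimal sketch that tallies Googlebot requests by file extension. It assumes the server writes Apache "combined" format logs; the sample lines, IP addresses, and the function name `googlebot_requests_by_extension` are made up for illustration and are not taken from the actual logs.

```python
import posixpath
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the request line, status, size, referrer and user agent of an
# Apache "combined" log entry (assumed format, not confirmed by the thread).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_requests_by_extension(lines):
    """Count requests whose user agent mentions Googlebot, per file extension."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None or "googlebot" not in match.group("agent").lower():
            continue
        # Drop any query string, then take the extension ('' if there is none).
        path = urlsplit(match.group("path")).path
        counts[posixpath.splitext(path)[1].lower()] += 1
    return counts


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [08/Jul/2004:10:00:00 -0400] "GET /rss.xml HTTP/1.0" 404 210 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [08/Jul/2004:10:00:05 -0400] "GET /images/logo.gif HTTP/1.0" 200 1532 "-" "Googlebot-Image/1.0"',
    '198.51.100.7 - - [08/Jul/2004:10:01:00 -0400] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for extension, count in sorted(googlebot_requests_by_extension(SAMPLE_LOG).items()):
        print(extension or "(no extension)", count)
```

If `.html` never shows up in the tally over several days while images and XML files do, that matches the pattern described in this post.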
  #3 (permalink)  
Old 07-09-2004, 01:32 PM
WebProWorld Member
 
Join Date: Jun 2004
Posts: 89
emils RepRank 0
Default

I am not familiar with XHTML and have only basic knowledge of XML. Anyway, I went to your site and here is how your <html> tag looks:

<html xmlns="http://www.w3.org/1999/xhtml">

As I understand it, this tag declares an XHTML page rather than HTML, and an XHTML page is itself an XML document. I bet Google, although able to scan this, may not follow links from this type of document the regular way. I could not find any information from Google about indexing this type of document, so if someone can shed some light on this, please post to this topic.

The usual html tag everyone uses is something like:
<html lang="en">.

Also, there is a considerable number of blank lines at the beginning of the page. While this by itself should not cause trouble with Googlebot, I'd rather remove them.

The third thing is the page content itself. Take, for example, http://www.stanthecaddy.com/judge-vandelay-discuss.html. There is no actual Description tag. The page contains lots of JavaScript and forms, but the actual textual content is really low. If I had a page with so little content, and it did not have a lot of external links pointing to it, I would not find it very strange to see it disappear from the index.

Out of these, the XHTML issue seems the most significant. The pages not showing a title and description in Google looks to me like confirmation that Google is treating them differently from HTML.

As to the DNS thing, I strongly doubt there's a problem there.
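The points above (no DOCTYPE, no Description tag) can be checked mechanically. Below is a rough sketch using Python's standard `html.parser` module that reports the DOCTYPE, title, and meta description of a page. The class name `HeadSummary`, the helper `summarize_head`, and the sample markup are assumptions made up for this example; the sample is not a copy of the actual site.

```python
from html.parser import HTMLParser


class HeadSummary(HTMLParser):
    """Collects the DOCTYPE, <title> text and meta description of a page."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_decl(self, decl):
        # Called for declarations such as <!DOCTYPE ...>.
        if decl.lower().startswith("doctype"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize_head(html_text):
    """Return a dict with the doctype, title and description (None if absent)."""
    parser = HeadSummary()
    parser.feed(html_text)
    parser.close()
    return {
        "doctype": parser.doctype,
        "title": parser.title.strip(),
        "description": parser.description,
    }


# Hypothetical markup loosely modelled on the <html> tag quoted above.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Example discussion page</title></head>
<body><p>Short thread.</p></body>
</html>"""

if __name__ == "__main__":
    print(summarize_head(SAMPLE_PAGE))
```

A page that comes back with a title but no doctype and no description would match what is described here. Note that this only shows what is in the markup, not how Googlebot interprets it.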
  #4 (permalink)  
Old 07-09-2004, 02:20 PM
WebProWorld New Member
 
Join Date: Apr 2004
Posts: 15
Mark Carey RepRank 0
Default

Thanks for the reply, emils.

First, I am not an expert with XHTML stuff either. The format of that tag is the default for pages created by Movable Type, a CMS that powers hundreds of thousands (millions?) of pages on the web. I have had no problem with that for ~14 months. However, I did remove a <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> declaration from the top of the default MT templates (because it was causing a problem with external stylesheets in Mozilla). With Google experimenting with XML/RSS indexing, perhaps a recent Google spidering change could be treating the files as XML?

Yes, some pages have little content, like the short discussion thread you cited. But this is a global problem (20,000+ pages), affecting pages of all content sizes.

The disturbing thing is that Google hasn't requested any of these pages in at least 5 days...
  #5 (permalink)  
Old 07-09-2004, 05:18 PM
WebProWorld Member
 
Join Date: Jun 2004
Posts: 89
emils RepRank 0
Default

Quote:
Originally Posted by Mark Carey
I have had no problem with that for ~14 months. However, I did remove a <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> declaration from the top of the default MT templates (because it was causing a problem with external stylesheets in Mozilla).
If I were you, I'd put that back right away. This changes a lot of things... it's your declaration of document type.

Quote:
Originally Posted by Mark Carey
With Google experimenting with XML/RSS indexing, perhaps a recent Google spidering change could be treating the files as XML?
I bet it treats them as XML. As they are declared now, they *are* XML.

Quote:
Originally Posted by Mark Carey
The disturbing thing is that Google hasn't requested any of these pages in at least 5 days...
Well, I would put the !DOCTYPE statement back right away, then wait a bit. If your site has several links pointing to the homepage, that should get Google to revisit you within a few days at most. If it sees the home page as HTML, it should begin recrawling your site. I don't have any other advice for now, but I think this one is worth trying.
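One related check, offered as a sketch rather than a diagnosis: browsers choose between an HTML parser and an XML parser based on the Content-Type header the server sends, not on the xmlns attribute or the DOCTYPE. Whether Googlebot behaved the same way at the time is an assumption the thread does not settle. The helper below, with the made-up name `parser_family`, classifies a Content-Type header value so you can see which family your server is announcing.

```python
from email.message import Message

# Media types that are handled as XML; anything ending in "+xml" counts too.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parser_family(content_type_header):
    """Classify a Content-Type header value as 'html', 'xml' or 'other'."""
    message = Message()
    message["Content-Type"] = content_type_header
    # get_content_type() lowercases the value and strips parameters like charset.
    media_type = message.get_content_type()
    if media_type == "text/html":
        return "html"
    if media_type in XML_TYPES or media_type.endswith("+xml"):
        return "xml"
    return "other"


if __name__ == "__main__":
    for header in ("text/html; charset=UTF-8", "application/xhtml+xml", "image/gif"):
        print(header, "->", parser_family(header))
```

If the server announces `text/html`, the pages are being served as HTML regardless of the XHTML namespace in the `<html>` tag.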
  #6 (permalink)  
Old 07-09-2004, 06:11 PM
WebProWorld New Member
 
Join Date: Apr 2004
Posts: 15
Mark Carey RepRank 0
Default

Thanks, emils.

I have tried the opposite: I did not put back the DOCTYPE, but I changed the <html xmlns="http://www.w3.org/1999/xhtml"> tag to a plain <html> tag instead. I did this for one large section of one of the sites, as a test. The "home page" of the section has many links pointing to it, so hopefully GB will stop by soon...
  #7 (permalink)  
Old 07-09-2004, 07:35 PM
WebProWorld Pro
 
Join Date: Sep 2003
Location: United Kingdom
Posts: 215
kikkertm RepRank 0
Default

It might be that Google is just getting confused about what it is spidering... Your pages are definitely not valid HTML:

http://validator.w3.org/check?uri=ww...a-discuss.html
  #8 (permalink)  
Old 07-12-2004, 10:10 AM
WebProWorld New Member
 
Join Date: Apr 2004
Posts: 15
Mark Carey RepRank 0
Default

Quote:
Originally Posted by kikkertm
It might be that Google is just getting confused about what it is spidering... Your pages are definitely not valid HTML:
Google doesn't care about valid HTML. The Google index would be a fraction of its current size if it only indexed "valid" HTML.

You're also forgetting that Google is not even requesting these pages -- or at least my server is not receiving and responding to any such requests.
All times are GMT -4.