WebProWorld
Google Discussion Forum: topics specifically related to Google. There is a subforum dedicated to AdSense/AdWords subjects.

#1
12-28-2005, 07:41 PM
WebProWorld New Member
Join Date: Jun 2005
Posts: 21
travisbickle100
Google indexed pages question

I went to uptimebot, entered my site, and found that Google had indexed only a couple hundred of my pages. The website is a database of private schools, and I have almost 30,000 pages on the site. The site has been up for almost 5 years. Can anyone tell me why the entire site has not been indexed after so long?
__________________
"You miss 100% of the shots you never take" - Wayne Gretzky

http://www.eschoolsearch.com
#2
12-28-2005, 10:29 PM
WebProWorld Pro
Join Date: May 2004
Location: indianworld
Posts: 151
bobkom
Re: Google indexed pages question

Quote:
Originally Posted by travisbickle100
I went to uptimebot, entered my site, and found that Google had indexed only a couple hundred of my pages. The website is a database of private schools, and I have almost 30,000 pages on the site. The site has been up for almost 5 years. Can anyone tell me why the entire site has not been indexed after so long?
I think this can be fixed by increasing the crawl time, by adding this code to robots.txt:


User-agent: googlebot
Crawl-delay: 120


I don't know what value (i.e. 120 or something else) is needed to crawl all 30,000 pages, but robots.txt works if your site has proper navigation. Increase the value above 120 if you are not being crawled well.
__________________
Mission www.INdianworld.IN
www.venkyfans.in
#3
12-29-2005, 12:03 AM
WebProWorld 1,000+ Club
Join Date: Dec 2003
Location: Toronto, Ontario, Canada
Posts: 2,181
ADAM Web Design

Travis, you've got a few issues here:

1) Your problem is at least partly being caused by invalid code.

http://validator.w3.org/check?uri=ht...m%3Fstate%3DAL

You got yourself a mess of bad code there, partner. And that's at the state level.

Your city level is better, but still has some problems.

http://validator.w3.org/check?uri=ht...doctype=Inline

If I were to guess at any ONE thing (and I don't think ONE thing will solve it), I'd say it's your double-declaration of the <body> tag.

Your detail pages are better again, from what I can tell, but still have some minor issues. The more you can solve, the easier it will be to crawl.

http://validator.w3.org/check?uri=ht...doctype=Inline

2) If that doesn't work, try running a little Bruce Clay magic on the pages of your site. He doesn't seem to like your state pages:

http://www.seotoolset.com/cgi-bin/kd...hEngine=Google

You can run with it from there, I assume.

If this doesn't solve things, you should be a lot closer and usually more things tend to reveal themselves.
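A rough way to check for the duplicated <body> tag Adam mentions, without the validator, is to count start tags with Python's standard html.parser module. The SAMPLE markup below is a made-up fragment, not the site's real code; to check a live page you would feed the parser the downloaded HTML instead.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every start tag; html.parser reports tag names in lowercase."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


# Hypothetical page fragment with a duplicated <body> start tag.
SAMPLE = """<html><head><title>Private Schools</title></head>
<body>
<body>
<p>Schools in Alabama</p>
</body></html>"""

counter = TagCounter()
counter.feed(SAMPLE)
counter.close()

# These structural tags should each appear exactly once in a valid page.
for tag in ("html", "head", "body"):
    if counter.counts[tag] != 1:
        print(f"<{tag}> appears {counter.counts[tag]} times")
```

Run as-is, it should print: <body> appears 2 times. This only catches duplicated structural tags; it is no substitute for a full validator run.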
#4
12-29-2005, 12:06 AM
WebProWorld MVP
Join Date: Jul 2004
Location: Omaha
Posts: 2,714
brian.mark
Re: Google indexed pages question

Quote:
Originally Posted by nagarjuna55334
I think this can be fixed by increasing the crawl time, by adding this code to robots.txt:


User-agent: googlebot
Crawl-delay: 120


I don't know what value (i.e. 120 or something else) is needed to crawl all 30,000 pages, but robots.txt works if your site has proper navigation. Increase the value above 120 if you are not being crawled well.
Surely you're not advising him to set it to 120? That means waiting 2 minutes between each page request. That would take forever for 30,000 pages. Leaving out the crawl delay would be much better advice.
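To put numbers on Brian's point: with a 120-second delay, a bot that honors Crawl-delay fetches at most one page every two minutes, so 30,000 pages need at least 30,000 × 120 = 3,600,000 seconds, i.e. 1,000 hours or roughly 42 days, before counting the time to download each page. A minimal Python sketch of that arithmetic (the function name is made up for illustration):

```python
def min_crawl_time_seconds(pages: int, crawl_delay_s: float) -> float:
    """Lower bound on total crawl time when the bot waits crawl_delay_s between requests.

    Simplification: counts one delay per page and ignores download time.
    """
    return pages * crawl_delay_s


seconds = min_crawl_time_seconds(30_000, 120)
hours = seconds / 3600
days = seconds / 86400
print(f"{seconds:,.0f} s = {hours:,.0f} h = {days:.1f} days")
```

This prints 3,600,000 s = 1,000 h = 41.7 days, and that assumes the bot keeps requesting pages continuously, which no search engine guarantees.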

Brian.
__________________
ToolBarn.com, an Internet Retailer Top 500 and Inc. 500 Company | Tool Parts | Pet Supplies
#5
12-29-2005, 11:02 AM
WebProWorld New Member
Join Date: Jun 2005
Posts: 21
travisbickle100
Re: Google indexed pages question

nagarjuna55334 wrote:

I think this can be fixed by increasing the crawl time, by adding this code to robots.txt:

User-agent: googlebot
Crawl-delay: 120



Are you saying that I have too long a delay in my robots.txt file? Maybe I should decrease it?
#6
12-29-2005, 11:17 AM
WebProWorld New Member
Join Date: Jun 2005
Posts: 21
travisbickle100

ADAM Web Design wrote:

1) Your problem is at least partly being caused by invalid code.

I see your point. However, I put in a URL from my site that has been indexed, and it came up with the same number of errors. So I am at a loss to understand why the full site has not been indexed after 5 years. I can understand that my code makes the site more difficult to index, but surely 5 years is enough.
#7
12-29-2005, 11:29 AM
WebProWorld New Member
Join Date: Jun 2005
Posts: 21
travisbickle100

Adam,

I am now more confused than ever. I went to the SEO tools site you recommended, ran a server validation and used the server page tool, and it said that I have no robots.txt file. Also, below that, the only result that came back under all three was:

Spider Input 1 to 3: (empty)
Spider Input 4: <base href="http://www.eschoolsearch.com/">
Spider Input 5 to 19: (empty)
#8
12-29-2005, 11:48 AM
WebProWorld New Member
Join Date: Jun 2005
Posts: 21
travisbickle100

This is very interesting. I don't mean to hijack my own topic, but I ran the Yahoo home page through that validator, and it has 285 errors!!!
#9
12-30-2005, 08:11 AM
WebProWorld Veteran
Join Date: Feb 2005
Location: Forchheim, Germany
Posts: 936
Faglork
Re: Google indexed pages question

Quote:
Originally Posted by nagarjuna55334
I think this can be fixed by increasing the crawl time, by adding this code to robots.txt:


User-agent: googlebot
Crawl-delay: 120


I don't know what value (i.e. 120 or something else) is needed to crawl all 30,000 pages, but robots.txt works if your site has proper navigation. Increase the value above 120 if you are not being crawled well.

You seem to misunderstand the concept of "Crawl-delay". It makes the spider wait between fetching one page and the next; this is done to reduce the load on the server:
http://www.ilovejackdaniels.com/deve...ts-txt-file/3/

It does not give a bot more time to spider your pages; it only makes the bot wait longer between requests.
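Faglork's point can be illustrated with Python's standard urllib.robotparser module: parsing the two robots.txt lines from this thread yields a crawl delay of 120 (seconds between requests, for a crawler that chooses to honor the directive; not every crawler does), while every URL stays fetchable. The example.com URL is only a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt lines suggested earlier in this thread.
ROBOTS_TXT = """\
User-agent: googlebot
Crawl-delay: 120
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Crawl-delay is only a pause between requests...
print("crawl delay:", rp.crawl_delay("googlebot"))
# ...it neither blocks nor unlocks any page; with no Disallow lines everything is allowed.
print("can fetch:", rp.can_fetch("googlebot", "http://www.example.com/some-page.html"))
```

It should print "crawl delay: 120" and "can fetch: True": the directive changes the pace of crawling, not which pages get crawled.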

faglork
#10
12-30-2005, 03:05 PM
WebProWorld New Member
Join Date: Jun 2005
Posts: 21
travisbickle100

I went to Google directly and searched for:

site:eschoolsearch.com esearchforit

and 36,700 results came up instead of 440.
#11
01-03-2006, 05:02 PM
WebProWorld New Member
Join Date: Jun 2005
Posts: 21
travisbickle100

It turns out that all of my pages are indexed, but most of them are in the supplemental index because they are so similar. I am not sure what the consequences of this are.
#12
01-03-2006, 07:31 PM
WebProWorld Veteran
Join Date: Jan 2006
Posts: 352
SPC2

Invalid code is irrelevant to crawling, unless there is broken code that affects the links. For instance, I saw a page recently where a comment had been opened in the head but never closed, so everything after it in the page was treated as a comment.

I had a quick look at the site in your signature, and, unless I missed a link somewhere, I'm amazed that Google has indexed more than a dozen or so pages. They probably got most of the 200 from links pointing at them from other sites.

Your problem is that most of the pages aren't crawlable. They are hidden behind forms, and spiders can't fill forms in. You need to make paths for spiders to follow, so they can reach all the pages.
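A minimal sketch of why forms are dead ends, using Python's standard html.parser module: a simple spider collects only <a href> targets, and while it can see a <form>, it has no way to choose a state and submit it. The markup and URLs below are hypothetical, not taken from the site.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the URLs a simple link-following spider could visit."""

    def __init__(self):
        super().__init__()
        self.links = []  # <a href> targets: followable
        self.forms = []  # <form action> targets: need user input, so not followed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action"))


# Hypothetical home page: the school listings are reachable only through the form.
PAGE = """
<html><body>
  <form action="/search.php" method="get">
    <select name="state"><option>AL</option><option>OH</option></select>
    <input type="submit" value="Search">
  </form>
  <a href="/about.html">About</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(PAGE)
collector.close()
print("followable links:", collector.links)
print("forms (dead ends for the spider):", collector.forms)
```

The only page such a spider can reach from here is /about.html; everything behind /search.php stays invisible unless some other page links to it.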

One way of doing it would be to add a directory as an alternative way for people to use the site. E.g. home page -> directory top (lists states) -> state pages (lists cities) -> city pages.
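The directory idea can be sketched in Python: build plain link pages for directory top -> states -> cities -> schools, then check that a spider following only links reaches every page. The sample data, the /directory/ URL scheme and the function names are all hypothetical.

```python
from collections import deque
from urllib.parse import quote

# Hypothetical sample data; a real site would read this from its school database.
SCHOOLS = {
    "Alabama": {"Mobile": ["School D"]},
    "Ohio": {"Cincinnati": ["School A", "School B"], "Columbus": ["School C"]},
}


def build_directory(schools):
    """Map each directory page URL to the plain <a href> links it would contain."""
    top = "/directory/"
    pages = {top: []}
    for state in sorted(schools):
        state_url = f"{top}{quote(state)}/"
        pages[top].append(state_url)  # directory top lists states
        pages[state_url] = []
        for city in sorted(schools[state]):
            city_url = f"{state_url}{quote(city)}/"
            pages[state_url].append(city_url)  # state page lists cities
            pages[city_url] = []
            for school in schools[state][city]:
                detail_url = f"{city_url}{quote(school)}.html"
                pages[city_url].append(detail_url)  # city page lists schools
                pages[detail_url] = []  # detail page (leaf)
    return pages


def reachable(pages, start="/directory/"):
    """Breadth-first walk that follows links only, like a spider that cannot submit forms."""
    seen, queue = {start}, deque([start])
    while queue:
        for link in pages[queue.popleft()]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


pages = build_directory(SCHOOLS)
print(len(pages), "pages;", "all reachable" if reachable(pages) == set(pages) else "some unreachable")
```

With the sample data it should print "10 pages; all reachable". Linking the directory top from the home page is what lets a spider enter the tree in the first place.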

Another way would be to add an alphabar to the homepage, so that people can click a letter to get a list of cities that start with the letter. Clicking on one of those would return the list of schools in the city, and so on.

I may have missed something in the site, but I can't see any way of reaching any state, city, or detail pages without filling in the Search form, and spiders can't do that.

<added>
I've just seen your last 2 posts, and it looks like I've missed something in your site. Or maybe you've changed it to a form-only system since all the pages were crawled.
#13
01-03-2006, 07:37 PM
WebProWorld Veteran
Join Date: Jan 2006
Posts: 352
SPC2

Pages from the Supplemental index are only listed in the SERPs when the normal index can't produce sufficient results. The Supplemental index is not the place you want any of your pages to be.
#14
01-03-2006, 08:19 PM
WebProWorld 1,000+ Club
Join Date: Oct 2004
Location: Kent, England
Posts: 1,458
dburdon
Privilege

As we say about private education in England - "You can buy privilege, but you can't buy brains".

Solution: links. A logical site structure will also help solve the problem.
__________________
Simply Clicks | SEO | SEO Training| Pay Per Click Advertising | Search Engine Powered Marketing
#15
01-03-2006, 11:26 PM
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Jan 2004
Location: Live in Cincy Now
Posts: 7,573
incrediblehelp

1. Get rid of the crawl delay

2. I still don't subscribe to page validation as a reason for not ranking. I have seen far too many invalid pages rank well.

3. Your lack of indexing probably has more to do with a lack of content than anything else. Your website is simply a directory full of links leading to the contact info for private schools.

4. You seem to be indexed and ranking on Google fine:

http://www.google.com/search?hl=en&q...=Google+Search

http://www.google.com/search?hl=en&l...chools+in+ohio

Add some content, add some more content weekly (private school news blog?) and magic will happen.
#16
01-03-2006, 11:34 PM
WebProWorld Veteran
Join Date: Jan 2006
Posts: 352
SPC2

So that's where it was - the sitemap. I missed it completely :(
#17
01-11-2006, 04:40 PM
WebProWorld New Member
Join Date: Jun 2005
Posts: 21
travisbickle100

Thanks for the help here. I will work on the site and try to correct some of the problems.
All times are GMT -4.