Google indexing only few pages



retrovisor
05-07-2009, 01:38 PM
Hi everyone,
After months of not being able to figure out Google's logic for indexing the pages of my website, I decided to ask you guys for some help.

I have a "free ads" website, working with PHP and MySQL. I use .htaccess and mod_rewrite to deliver friendly URLs.
I have more than 3000 ads posted on the website today. However, Google won't index more than 300 of them at the same time. For some strange reason, the indexed pages also seem to differ from one week to the next.

(by the way, here it is: moracomigo DOTCOM DOTBR ) - sorry, the forum won't allow me to post urls!

I assume there might be something wrong with the website's architecture. I just don't know where it is. I use a kind of dropdown menu on the front page, where the user can choose a State and City in order to search for ads. Why can't Google follow it? What could explain why some pages are indexed and some are not, when the ads all have the same structure?


Do you guys have any tips about what could be done? :B

Thank you!

ebuzzmaster
05-08-2009, 03:56 PM
There are a couple of things for you to look at:
1. All of your top-level navigation pages have the same meta title. Each of these pages should have a unique set of meta data. The pages that I think are individual ad pages DO seem to have unique meta titles, at least (I do not know about descriptions or keywords).

2. The site has canonical URL issues: it should show up under just one hostname (www.yourdomain) rather than both with and without the www. Do a 301 redirect from whichever one you do not want to be primary.

3. There doesn't seem to be a way for Google to find the pages by spidering your site. I suggest adding "similar" ad links on some ad pages. Alternatively (from what I can tell it is an apartment rental site? I don't speak Portuguese...), add a list of cities or localities that a person could click through to see all of the current ads for that city / location.

4. Set up an XML sitemap GENERATOR so that every time a new ad is added to your site, it is also added to the sitemap. This is probably one of the more important things you can do.

Hope this helps!!
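
To make point 4 a bit more concrete, here is a minimal sketch of a sitemap builder. It is written in Python purely for illustration (the site in question runs PHP), and the function name `build_sitemap` and the example.com URLs are assumptions, not anything taken from the site. The idea is simply to regenerate the file from the full list of ad URLs whenever an ad is added.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML as a string, listing each URL once in input order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            continue  # skip duplicates so each page is listed only once
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped on output
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical ad URLs; a real site would read these from its database.
    print(build_sitemap([
        "http://www.example.com/ad-1.html",
        "http://www.example.com/ad-2.html",
    ]))
```

The resulting file would then be submitted through Google Webmaster Tools.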

jordy3738
05-08-2009, 04:33 PM
JavaScript navigation can cause serious issues in this arena too! Are you using JavaScript?

What about a sitemap, do you have one and have you submitted it to Google?

retrovisor
05-08-2009, 05:06 PM
Thank you for the help so far.
I'll work on the 301 redirect soon, thanks for the tip.
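
For reference, the rule behind the 301 suggestion can be sketched roughly as below. This is only an illustration in Python, assuming the www form is chosen as primary; `CANONICAL_HOST` and `canonical_redirect` are hypothetical names, and on this site the rule would more likely live in .htaccess via mod_rewrite.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www hostname is the one chosen as primary.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(url):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == CANONICAL_HOST:
        return None  # already on the primary hostname
    if "www." + host == CANONICAL_HOST:
        # Same site without the www: keep path and query, swap the host.
        return urlunsplit(
            (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
        )
    return None  # unrelated host: leave it alone
```

A request for the bare hostname would get a 301 to the same path and query string on the www hostname, so only one version of each page ends up indexed.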

ebuzzmaster, by meta title tags you mean the title tags on the menu links, not the ones on the page that the link leads to when clicked, right?

I don't understand why Google can crawl some pages while in other cases it cannot. On the homepage there are two boxes showing the latest ads posted. Could it be that Google crawls them while they're there, and when they drop off the list, they disappear from the list of indexed pages?

If I'm using JavaScript navigation, should I try to set up an alternative navigation through plain links, so that Google can follow it?

I tried many different sitemap generators, but they would only crawl about 10 pages, even though I have more than 300 of them!

Thanks a lot!

deepsand
05-08-2009, 05:14 PM
retrovisor wrote: ... not being able to figure out Google's logic for indexing the pages of my website
Firstly, are you referring to the pages which appear in the SERPs or in your Webmaster Tools?

Secondly, is it your expectation that all pages should at all times appear in either or both of these places?

retrovisor
05-08-2009, 05:40 PM
deepsand, I refer to the pages that exist for "human navigation". Each ad posted generates a new URL, and based on that I expect the SE to index a certain number of pages.

I know how unstable the number of indexed pages of a website can be, but I assume there might be something wrong with my website when I notice that SEs are randomly indexing only 10% of the existing pages.

deepsand
05-08-2009, 06:15 PM
It's still not clear what measure of "pages indexed" you are using.

Are you referring to, i.e. looking at, the pages which appear in the SERPs or your Webmaster Tools?

And, are you expecting that more of the unique URLs created for each ad should appear wherever it is that you are looking? Or, is it that you feel too few of the static URLs appear there?

retrovisor
05-08-2009, 07:03 PM
By "indexed pages" I mean the pages that appear in Google.

I expect that more of the unique URLs created for each ad should appear on the SERPs. They all become static through mod_rewrite; here's an example:

moracomigoDOTCOM DOTBR/something DOT html

deepsand
05-08-2009, 08:05 PM
retrovisor wrote: By "indexed pages" I mean the pages that appear in Google.
There is a fine distinction between being "indexed" and being "listed" in the SERPs.

SERPs = Search Engine Results Pages = pages publicly listed in response to a specific search query string.

site:Site_URL_here = pages indexed, independent of any specific search query.

The latter yields 202 hits; 205 if "similar" results are included.


retrovisor wrote: I expect that more of the unique URLs created for each ad should appear on the SERPs. They all become static through mod_rewrite, ...
Whether rewritten or not, the fact remains that they are temporary URLs, whose content very much mirrors that of others.

Considering that no SE continuously crawls & re-indexes your site, and that none have an interest in displaying 3000 temporary & very similar pages, it seems to me that you're fortunate to have so many indexed.

Why is it important that all be listed in the SERPs?

crankydave
05-08-2009, 08:21 PM
Google cannot see/follow your links.

Text Only Cache (http://74.125.95.132/search?q=cache:HOrN6a7khvUJ:www.moracomigo.com.br/&hl=en&gl=us&strip=1)

Dave

retrovisor
05-08-2009, 08:42 PM
deepsand, thank you for making it clear to me. I always thought that the pages found through that "site:Site_URL_here" query were the pages that could possibly be found through a search for certain keywords.

I agree that the pages are similar: there are three patterns that vary according to the information provided by the visitor who's posting the ad.
But still, if this is the problem, how come websites like OLX or other free ads sites have more than 1 million pages indexed? The content of their pages varies as much as mine does.

I assume it is important to have them all listed because even though they seem so similar, they are different ads. The pages listed now only cover a third of the cities with existing ads. Today, someone searching for "quartos sao paulo" (rooms sao paulo) might get to my website. Someone searching for "quartos brasilia" (rooms brasilia) cannot find any occurrence of my website at all.

Am I right thinking this way?

deepsand
05-08-2009, 09:18 PM
retrovisor wrote: deepsand, thank you for making it clear to me. I always thought that the pages found through that "site:Site_URL_here" query were the pages that could possibly be found through a search for certain keywords.
Your assumption is correct. It's just that I wanted to look at what you were looking at, but had no way of knowing if that operator was what you were using, or, if it was a particular search query string.


retrovisor wrote: I agree that the pages are similar: there are three patterns that vary according to the information provided by the visitor who's posting the ad. But still, if this is the problem, how come websites like OLX or other free ads sites have more than 1 million pages indexed? The content of their pages varies as much as mine does.
Crawlers/robots/spiders can only see what a text based browser sees. If you look at the cache that Dave linked to, you will see that only those URLs which are expressly set forth on your home page are readable by Google's spider.


retrovisor wrote: I assume it is important to have them all listed because even though they seem so similar, they are different ads. The pages listed now only cover a third of the cities with existing ads. Today, someone searching for "quartos sao paulo" (rooms sao paulo) might get to my website. Someone searching for "quartos brasilia" (rooms brasilia) cannot find any occurrence of my website at all. Am I right thinking this way?
Bear in mind that it is impossible for any SE to index and list any links that were not present when your site was last crawled; nor has it any way of knowing which have been removed since such indexing. Therefore, ad links that were added in the interim cannot be listed; and, those removed will still be present, yielding a 404 error if clicked on.

Rather than the links to all of the ads, it seems to me that the matter of greater import is having the URLs for your State/City pages, e.g. MoraComigo.com.br (http://www.moracomigo.com.br/Acre/Morador-em-Rio-Branco.html?flat_state=Acre&flat_city=Rio-Branco) , indexed. I.e., rather than trying to get the temporary ads themselves indexed, get the permanent pages which actually contain the ads indexed.

Have you submitted a sitemap containing such to Google?

Peter (IMC)
05-09-2009, 10:23 AM
How nice, a Brazilian here on WebProWorld. :) Which city do you live in?


The problem is: a search engine cannot find your pages because you're not linking to them. The entries in the dropdown menu on the home page you refer to aren't links. You're even using AJAX to get the list of cities based on the state chosen. Search engines don't execute JavaScript, so you're pretty much blocking them from all your internal pages.
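
A rough way to see this for yourself is to list only the plain `<a href>` links in a page, which is approximately what a non-JavaScript crawler has to work with. The sketch below uses Python's standard `html.parser`; the two HTML snippets are hypothetical stand-ins for the home page markup, not copies of it.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, ignoring scripts and form controls."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawlable_links(html):
    """Return the link targets that a crawler without JavaScript could follow."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical markup: a dropdown exposes no links, a plain list does.
DROPDOWN_ONLY = '<select name="flat_state"><option value="Acre">Acre</option></select>'
WITH_LINKS = '<ul><li><a href="/Acre/Morador-em-Rio-Branco.html">Rio Branco</a></li></ul>'

if __name__ == "__main__":
    print(crawlable_links(DROPDOWN_ONLY))
    print(crawlable_links(WITH_LINKS))
```

The dropdown markup yields no links at all, while the plain list yields one followable URL per city.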

Do the following: Create a directory structure in which you only show states and cities that actually have ads in them. This way you won't show states and cities that don't have ads. This is important because you don't want to have thousands of empty pages in your site.

You can leave the AJAX dropdown menu on the home page, but you need to create a page that gives access to the directory structure. Link to these pages from the top menu. You can use a link text like "todos as moradias" (excuse me if that Portuguese is wrong, I'm still not 100% in Brazilian Portuguese), and maybe you can also come up with some other directory structures based on size or type of neighborhood, etc.
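
The directory idea can be sketched as follows, in Python for illustration only. The field names `state` and `city`, the `slug` helper and the `/State/City.html` URL pattern are all assumptions; the real site would build this from its own MySQL tables and its existing URL scheme. Because the structure is built from the ads themselves, states and cities without ads never show up.

```python
from collections import defaultdict


def slug(name):
    """Turn a place name into a URL segment (hypothetical URL scheme)."""
    return name.strip().replace(" ", "-")


def build_directory(ads):
    """Group ads by state and city, counting the ads in each city."""
    counts = defaultdict(lambda: defaultdict(int))
    for ad in ads:
        counts[ad["state"]][ad["city"]] += 1
    # Sort for a stable, browsable listing.
    return {
        state: dict(sorted(cities.items()))
        for state, cities in sorted(counts.items())
    }


def render_directory(directory):
    """Render the directory as plain links that a crawler can follow."""
    lines = []
    for state, cities in directory.items():
        lines.append(state)
        for city, count in cities.items():
            href = f"/{slug(state)}/{slug(city)}.html"
            lines.append(f'  <a href="{href}">{city} ({count})</a>')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [
        {"state": "Acre", "city": "Rio Branco"},
        {"state": "Sao Paulo", "city": "Campinas"},
    ]
    print(render_directory(build_directory(sample)))
```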

iambored
05-09-2009, 04:50 PM
In my opinion, you have to create a sitemap and then submit it to Google through Google Webmaster Tools.

retrovisor
05-09-2009, 05:05 PM
deepsand, thank you again for the explanations. The fact is that I'm unable to generate a sitemap, probably because of these navigation issues. I'm not an expert in PHP and JavaScript, but I'll have to figure out a way to improve that on the site.

Hi Peter, how are you? I'm from Fpolis, but I live in São Paulo now.
You made me understand the problems I have there. I've tried a temporary solution based on your suggestion to use a link text like "todos as moradias": I put on the homepage a link to a specific page that lists all the ads in the database. I didn't put it as a big link in the menu because I prefer the ads to be visible only to registered users, if you know what I mean.

So I put a small link in the footer (linked with the "_" symbol, right next to the sitemeter icon) that leads to the page listing all the ads in the database. I've noticed that a text-based browser can see it, so I assume Google can too, right?

Am I using a bad technique this way or is it ok?

Thank you all again for the help

Peter (IMC)
05-10-2009, 02:12 AM
You have a problem. You want to hide the ads from unregistered users, but you do want Google to index them. That's contradictory. When a search engine has indexed them, people will find those internal pages and access them directly through the Search Engine Result Pages.

That small link is a very bad idea. You're trying to hide a link there. Very bad idea.

You have to make a choice:

Either you allow search engines to index all pages, and with that you also allow all visitors access, or you do not allow unregistered visitors access to the ads, and with that you accept that search engines also don't have access.

Search engine robots like Googlebot will never register; that is one thing you can be sure of.

You mentioned that some other sites have so many pages indexed. Is it required to register in those sites to access the ads?

innominds
05-10-2009, 05:30 AM
Peter (IMC) wrote: You allow search engines to index all pages and with that you also allow all visitors access

I think this is the only option left to you if you are serious with Google Indexing.

spirulinaworld
05-10-2009, 07:23 AM
Submitting a website sitemap in XML format through Google Webmaster Tools helps.

Regards,
Ajay

deepsand
05-10-2009, 05:06 PM
retrovisor wrote: The fact is that I'm unable to generate a sitemap, probably because of these navigation issues.
In the interim, you can still add the static URLs for your State/City pages to your site map by using any number of site map creation/editing tools available on-line, such as those at Google Sitemaps Generator, Editor and Keyword Analyzer Free Online (http://www.sitemapdoc.com/) .

As for the temporary ad pages themselves, I would urge against their being indexed, as your primary goal is to get people to your site, rather than to a specific ad.

retrovisor
05-10-2009, 09:00 PM
Peter,
Let me explain better what I said about hiding the ads to unregistered users.
It is not that they're hidden from view. What I want is that, in order to browse freely through the site, the user should be registered. Non-registered users can see a limit of two ads for the desired city. If coming from a SE, the visitor can also view the ad found through the keywords.
I don't know if I made myself clear, since this strategy mixes two different ideas.

(By the way, registration is free)

I do want the search engines to index all pages of the website. I don't care if visitors view them by landing directly on the ad's URL. I just think that it is important to have the biggest number of registered users possible.

deepsand,
I am still figuring out how to create static URLs for the State/City pages, thanks for the tip.

retrovisor
05-11-2009, 09:24 PM
is there anybody in there? :o)

ericajoieake
05-11-2009, 10:54 PM
That is how Google indexes websites; Google does not show all of your indexed pages, for security purposes.

deepsand
05-11-2009, 11:34 PM
What "security purposes?"

BanquetTables.Pro
05-11-2009, 11:44 PM
deepsand wrote: What "security purposes?"

Their stock, market share and PPC revenue are at risk. That's why they have so many filters etc.

deepsand
05-12-2009, 12:27 AM
What have these matters to do with "security?"

And, how does indexing those ads which are detectable serve to protect Google's stock value and/or revenue streams?

zeruel
05-12-2009, 08:04 AM
I don't know what's happening with Google's indexing either. I see odd changes from day to day. One thing I noticed is that the other site I am working on was already indexed in Google last Wednesday, but when I checked again the following day, it was not indexed anymore.