Google started indexing, I wasn't ready



selfbuilder
09-16-2008, 11:05 AM
I've been putting a new web site together and uploading the progress to the domain so that my copywriter can see it.
One day my copywriter mentioned that we were on Google's first page with a certain keyword.

Seeing as there were only two of us viewing it at that time, and no other sites or people were aware of its presence,
I'm wondering how Google managed to discover the site and start indexing it?

tamecrow
09-16-2008, 11:50 AM
Do either of you have the Google Toolbar installed? This alerts Google to every site you visit.

selfbuilder
09-16-2008, 03:58 PM
I do indeed have Google's Toolbar installed within the superb Firefox 3.

Thanks for your speedy reply tamecrow

tamecrow
09-17-2008, 04:43 AM
No problem - you may want to place a robots.txt file on the server to disallow crawling of the domain/subdomain and password protect the folders as a safeguard.
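
To illustrate the robots.txt suggestion, here is a minimal sketch in Python that uses the standard library's urllib.robotparser to check how a well-behaved crawler would read a "disallow everything" file. The hostname dev.example.com and the helper name is_crawl_allowed are made up for the example. Note that robots.txt is only a request to crawlers, which is why password protection is suggested as the safeguard.

```python
from urllib.robotparser import RobotFileParser

# Contents of a robots.txt that asks every crawler to stay out of the whole site.
# The file would live at the root of the (sub)domain, e.g. /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # dev.example.com is a placeholder hostname, not a real staging site.
    print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "http://dev.example.com/index.html"))
```

An empty robots.txt (or none at all) is treated as "everything allowed", so the file only helps if it is actually in place before the crawler arrives.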

false
09-17-2008, 11:28 AM
I wish I had a similar problem. I am trying to get a new site indexed by Google and nothing seems to have worked. I used all the standard tricks, including signing up for AdWords and properly adding the AdSense, Analytics, and Webmaster Tools code. Still not one page is indexed.

Niche
09-17-2008, 01:26 PM
Also depends on who your hosting account is with. Some hosting accounts automatically put you in their directory.
A friend's website was indexed while it had a coming soon page up and no links.

ArthurNYC
09-17-2008, 05:01 PM
I have seen that in the past but I assume that they were accessing newly registered domains from a whois directory or something. But the toolbar makes sense.

A

BSmithTTS
09-17-2008, 05:16 PM
Google says it doesn't.
Debunking: Toolbar doesn’t lead to page being indexed (http://www.mattcutts.com/blog/debunking-toolbar-doesnt-lead-to-page-being-indexed/)

And as a test, I have a start page that is used with our company internally only and is not for public use... but is not hidden from public view, if someone happened across it, it would not be a big deal.

We all use the google toolbar on various browsers... I can't find the page in the SERPs.
I disagree... I think it got leaked.

Any third party services installed on the site? (tracking, counters, ads, etc)
Wouldn't happen to be a cms with rss features on it?

cgrantski
09-17-2008, 05:40 PM
Last time I experienced this, I made a page with no links to it and browsed to it within a minute of making it. The next day I saw Googlebot in the log, visiting that page with a time stamp of 20 minutes later. I don't see how it can be anything other than Google Toolbar, no matter what Google says. Google's denial may have been true in 2006 when Matt Cutts wrote the debunk post.

SemAdvance
09-17-2008, 05:54 PM
Google (like most search spiders) crawls servers, and happens to follow the links on the pages it finds there.

It does not need a toolbar. Just needs a link to a site existing on the server and in time it will crawl the entire server.

If a properly formatted robots.txt file is installed at the root, the search spiders should read & follow the instructions within.

Hope this helps,

;->

a53mp
09-17-2008, 06:15 PM
It does not need a toolbar. Just needs a link to a site existing on the server and in time it will crawl the entire server.

That is not true. If you have a directory called /thisisaspecialdirectory and do not link to it via any links, images, css, sitemap, etc... a search engine will NOT be able to access it, because it will NOT know of its presence. Search spiders do not go to every site on the internet trying to access directories they think might exist. Spam bots do that, search engines don't. The only way a spider can index the ENTIRE server is if it has system access to the ENTIRE server.
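
A rough way to picture the point above: a link-following crawler can only reach what is linked from something it already knows. The sketch below is a toy model in Python, not how Googlebot actually works. The SITE dictionary, the page contents, and the discover function are all invented for illustration; only /thisisaspecialdirectory comes from the post.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical site held in memory: path -> HTML served at that path.
SITE = {
    "/": '<a href="/about">About</a> <a href="/products">Products</a>',
    "/about": '<a href="/">Home</a>',
    "/products": '<a href="/products/widget">Widget</a>',
    "/products/widget": "<p>No links here.</p>",
    "/thisisaspecialdirectory": "<p>Nothing links to this page.</p>",
}


def discover(site, start="/"):
    """Breadth-first crawl that only follows links found on fetched pages."""
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        extractor = LinkExtractor()
        extractor.feed(site.get(path, ""))
        for link in extractor.links:
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


if __name__ == "__main__":
    found = discover(SITE)
    print(sorted(found))
    print("/thisisaspecialdirectory" in found)
```

In this model the unlinked directory is never visited. It says nothing about other ways a URL can leak (toolbars, third-party scripts, feeds), which is what the rest of the thread is arguing about.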

morestar
09-17-2008, 06:28 PM
No problem - you may want to place a robots.txt file on the server to disallow crawling of the domain/subdomain and password protect the folders as a safeguard.

well, i wouldn't want to dis-allow anything...

just let google watch the progress of your site - well that's what i would do, i mean, i wouldn't panic...at least google knows you're there now and will start crawling your site every few days or so...

worry not...google will figure out what your site is all about in time especially if you seo it so to speak...

deepsand
09-17-2008, 06:28 PM
SEs can learn of new Domain Names simply by interrogating Registration data. Google became a Registrar so that they would have quick & easy access to such data.

MajorTom
09-17-2008, 06:41 PM
Google says it doesn't.
Debunking: Toolbar doesn’t lead to page being indexed (http://www.mattcutts.com/blog/debunking-toolbar-doesnt-lead-to-page-being-indexed/)

And as a test, I have a start page that is used with our company internally only and is not for public use... but is not hidden from public view, if someone happened across it, it would not be a big deal.

We all use the google toolbar on various browsers... I can't find the page in the SERPs.
I disagree... I think it got leaked.

Any third party services installed on the site? (tracking, counters, ads, etc)
Wouldn't happen to be a cms with rss features on it?

Quoting a post from Matt Cutts' blog that says Google doesn't break privacy rules is ridiculous. Of course Matt Cutts is going to always say Google respects privacy, he has stock options worth millions, so he'll say anything. Just like he said Chrome browser didn't track user data until ZDnet debunked him with proof Google was tracking data.

What you're saying is like saying "Go ask John McCain or George Bush if the economy is strong"

btw, using the Alexa toolbar or SearchStatus can also lead to getting indexed. Check if the site has been indexed in Alexa.

The best way to prevent unintentional indexing is to use .htaccess password protection until you're ready for the site to be launched.
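
For anyone wondering why password protection stops crawlers: with Apache, .htaccess protection is normally HTTP Basic authentication (AuthType Basic plus a password file), and any request without valid credentials gets a 401 instead of the page. The Python sketch below only models that check. The function name basic_auth_status and the credentials are hypothetical, and a real setup would leave this to the web server.

```python
import base64
import hmac


def basic_auth_status(authorization_header, username, password):
    """Return 200 if the header carries the expected Basic credentials, else 401.

    authorization_header is the raw value of the request's Authorization
    header, or None when the client (for example a crawler) sent none.
    """
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    expected = "Basic " + token
    supplied = authorization_header or ""
    # compare_digest avoids leaking how much of the value matched.
    if hmac.compare_digest(supplied.encode(), expected.encode()):
        return 200
    return 401


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    good = "Basic " + base64.b64encode(b"staging:s3cret").decode()
    print(basic_auth_status(good, "staging", "s3cret"))
    print(basic_auth_status(None, "staging", "s3cret"))
```

A crawler never sends credentials, so it only ever sees the 401 and has nothing to index, no matter how it learned the URL.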

freetutes
09-18-2008, 12:54 AM
As Google mentions on its Add URL page, it crawls the web to add new sites to its index and update existing ones. It is obvious you don't have to submit your site to get indexed. It automatically happens. If you want to stop getting indexed for the time being, use a robots.txt file and disallow crawling.

cbosleeds
09-18-2008, 02:23 AM
We had a similar scenario recently and I wish I'd read all this before. For some reason we looked at every option except .htaccess password protection which is by far the most common sense and practical of all the alternatives given.

tamecrow
09-18-2008, 04:14 AM
Google says it doesn't.
Debunking: Toolbar doesn’t lead to page being indexed (http://www.mattcutts.com/blog/debunking-toolbar-doesnt-lead-to-page-being-indexed/)

And as a test, I have a start page that is used with our company internally only and is not for public use... but is not hidden from public view, if someone happened across it, it would not be a big deal.

We all use the google toolbar on various browsers... I can't find the page in the SERPs.
I disagree... I think it got leaked.

Any third party services installed on the site? (tracking, counters, ads, etc)
Wouldn't happen to be a cms with rss features on it?

I've done tests on this specifically to discover whether Google indexes pages that are visited with the Google Toolbar, and so far as I can see, it happens. Of course, no SEO experiments can be entirely scientific and without the possibility of extraneous causes.

tamecrow
09-18-2008, 04:14 AM
I wish I had a similar problem. I am trying to get a new site indexed by Google and nothing seems to have worked. I used all the standard tricks, including signing up for AdWords and properly adding the AdSense, Analytics, and Webmaster Tools code. Still not one page is indexed.

PM me your URL and I'll take a look.

tamecrow
09-18-2008, 04:16 AM
Of course, if you're able, another good option would be to host the site locally (on another machine within your network) which is not externally accessible.

Janna122003
09-18-2008, 06:01 AM
I wish I had a similar problem. I am trying to get a new site indexed by Google and nothing seems to have worked. I used all the standard tricks, including signing up for AdWords and properly adding the AdSense, Analytics, and Webmaster Tools code. Still not one page is indexed.

Have you tried building links for the site so that google will find your site?

Terry Van Horne
09-18-2008, 09:50 AM
There is no end to the ways this could happen. With age and a number of other factors in play there is, IMO, no reason to robots.txt the content. If it is just copy and a crappy design, so what? I never stop SEs from indexing; that's kinda' like a farmer not feedin' the cows until he needs the milk. Controlled rollout is the best way to go because SEs to some extent still refresh/index pages based on the rate of change of the site, so... if a SE sees new pages being added often and those pages are being linked to... bada bing... you are semi-controlling the indexing of the site. I roll out stores a few categories at a time. That way the content also improves as I tweak the CMS writing the meta description and title tags, and target article syndication and community development activities at the specific pages being rolled out.

flhu
09-18-2008, 12:11 PM
Google definitely, unequivocally, and without question uses new domain registrations to find new sites.

What I did was have a whitelist of IP addresses I allowed to view the site, and displayed a "Coming Soon" page for all others. That worked pretty good.
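
Roughly, the whitelist idea looks like this in Python. This is only a sketch of the approach, not the code I ran back then. The addresses are documentation-range placeholders and the names ALLOWED_NETWORKS and page_for are made up for the example.

```python
import ipaddress

# Placeholder addresses from the reserved documentation ranges;
# swap in the real addresses of the people allowed to preview the site.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.10/32"),   # a single machine
    ipaddress.ip_network("198.51.100.0/24"),   # a whole office network
]


def page_for(client_ip, allowed=ALLOWED_NETWORKS):
    """Return "site" for whitelisted visitors and "coming-soon" for everyone else."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in allowed):
        return "site"
    return "coming-soon"


if __name__ == "__main__":
    for ip in ("203.0.113.10", "198.51.100.77", "192.0.2.55"):
        print(ip, page_for(ip))
```

Any crawler arrives from an address that is not on the list, so it only ever gets the "Coming Soon" page.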

Terry Van Horne
09-18-2008, 02:31 PM
Not just registrations, IMO. Google is sensitive to any change to a domain record. The type of change determines the activity it produces. So, you park a domain and Google attempts to index it. You activate it by switching the DNS from the registrar to a live DNS server and, IMO, Google is again going to attempt to index shortly thereafter. I've seen sites that are A-OK; they renew some old under-the-radar spammy domains and the main site gets slammed. Turn off the sites by removing the records from the DNS server and suddenly there is Google indexing what it dropped and all the new pages. I've had suspicions about this for over 4 years and continually get data indicating this sensitivity is a factor you want to consider when changing and registering domain records.

morestar
09-18-2008, 08:51 PM
There is no end to the ways this could happen. With age and a number of other factors in play there is, IMO, no reason to robots.txt the content. If it is just copy and a crappy design, so what? I never stop SEs from indexing; that's kinda' like a farmer not feedin' the cows until he needs the milk. Controlled rollout is the best way to go because SEs to some extent still refresh/index pages based on the rate of change of the site, so... if a SE sees new pages being added often and those pages are being linked to... bada bing... you are semi-controlling the indexing of the site. I roll out stores a few categories at a time. That way the content also improves as I tweak the CMS writing the meta description and title tags, and target article syndication and community development activities at the specific pages being rolled out.

i agree