How did Google find my test page?



Tarzan2
11-27-2009, 09:18 PM
I was led to believe that Google found web pages by following links.

I made a test page with no links pointing to it. It is on a brand-new .ca domain name that I just registered. It is hosted in a subfolder of one of my other domains, but uses the domain-pointing tool to point to its own domain name.

I was doing some keyword research and found this test page ranking fairly high in the SERPs (high enough that I found it by accident), when there should be no way for Google to know the page exists! The page has no practical purpose yet, so I did not want the search engines to find it at this point. How would Google have found it?

wige
11-28-2009, 02:37 AM
You mentioned the page has a domain name pointing to it. That may be how Google found it. I have heard enough rumors to believe that Google may be getting proactive about finding and "preindexing," if you will, new domains, sometimes within hours of their going live. Of course, there is always the possibility that someone else owned the same domain at some point in the past and Google found it that way.

There are also some theoretical ways Google could discover a page, although Google denies using them: if you traveled from the page in question to a site that has AdWords or Analytics installed, Google might be able to see the URL in the Referer header.
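To illustrate the referrer idea: when a browser follows a link from page A to page B, it typically sends page A's URL in the `Referer` request header, and any script running on page B's site can read it. The Python sketch below pulls the host and path out of such a header. The header value and the `test.example.ca` domain are hypothetical, modern browsers often trim cross-site referrers down to just the origin, and none of this says anything about what Google actually collects.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

def referring_page(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (host, path) of the page the visitor came from, if the
    request carried a Referer header; otherwise None."""
    ref = headers.get("Referer")
    if not ref:
        return None
    parts = urlsplit(ref)
    # An empty path means the site root.
    return parts.hostname or "", parts.path or "/"

# Hypothetical headers seen by a site that runs a tracking script.
incoming = {"Referer": "http://test.example.ca/secret-page.html"}
print(referring_page(incoming))  # ('test.example.ca', '/secret-page.html')
```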

Web Marketer
11-28-2009, 05:23 AM
Yeah, Google noticed you just because the page has a domain pointing to it.

Tarzan2
11-30-2009, 06:52 PM

If what you say is true, then Google would have to have made a deal with all the registrars to be notified when a new domain is registered.

As for the name being pre-owned, I rather doubt it, as most pre-owned domains are parked and used to serve ads alongside a message saying the name may be for sale. At least, that is my experience with pre-owned domain names.

I have a hunch that Google somehow peeks into your public HTML folder and scouts around where it shouldn't. I built the page offline and uploaded it via FTP. There are no links to any Google service, so there are no referrer stats to collect. There is ONE OTHER POSSIBILITY, though: I use the Google Chrome browser, and it could be spying on me!

Tarzan2
11-30-2009, 07:09 PM
Yeah, Google noticed you just because the page has a domain pointing to it.

This remark is good only for boosting your post count and doesn't contribute anything of value to the question. You made a statement that has nothing to back it up and doesn't even make sense. It does nothing to validate your knowledge of SEO; in fact it does the opposite. Please take the time to say something of value or say nothing at all. My guess is that you just want the backlinks and don't care to contribute to this forum in exchange. Filling the forum with useless junk just degrades the forum.

deepsand
11-30-2009, 08:57 PM
If what you say is true, then Google would have to have made a deal with all the registrars to be notified when a new domain is registered.
Not necessary, as Google is a registrar; rumor has it that they became one solely to have unfettered access to such data.


I use the Google Chrome browser and it could be spying on me!
That would be my guess.

Tarzan2
11-30-2009, 10:18 PM
But I would have had to register through Google for them to know about the site, and I used a Canadian registrar. So unless I misunderstand the role of a registrar, the only other logical explanation is that Chrome is spyware.

wige
12-01-2009, 10:36 AM
Although Google is a registrar, as far as I know they don't actually sell domains, at least not currently. When a domain is purchased, the registrar passes the information to the registry that operates that TLD, which adds it to the DNS and to the WHOIS database. As a registrar, Google may be able to run certain queries that are not available to non-registrars, including getting lists of newly registered domain names.

In fact, I know that, as a registrar, Google has access to this information for the gTLDs; ICANN makes no secret of it. Data for ccTLDs, on the other hand, is harder to obtain.
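For the curious, the public side of WHOIS is a very simple protocol (RFC 3912): open a TCP connection to port 43 on a WHOIS server, send the domain name followed by CRLF, and read until the server closes the connection. The sketch below assumes `whois.cira.ca` as the server for .ca names (check the registry's own documentation). It covers one-at-a-time public lookups only, not the bulk registrar access described above.

```python
import socket

WHOIS_PORT = 43  # TCP port assigned to WHOIS (RFC 3912)

def build_whois_query(domain: str) -> bytes:
    """A WHOIS request is just the query text followed by CRLF."""
    return domain.strip().encode("ascii") + b"\r\n"

def whois_lookup(domain: str, server: str = "whois.cira.ca",
                 timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the raw text reply.
    The default server is an assumption (the .ca registry's WHOIS host)."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as sock:
        sock.sendall(build_whois_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `whois_lookup("example.ca")` needs network access and returns whatever free-form text the server sends back.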

As for Chrome talking to Google, the nice thing is that the browser is built on the open-source Chromium project, so anyone really curious can find out by browsing the source code, or even by setting up a tracking proxy. I am too lazy to do that today, so I am just going by what the hacker community at large has found, which is fairly in line with what Google readily admits. Chrome only sends the following information to Google.com:


Search queries (generally, anything missing a TLD or containing spaces or special operators) entered into the address bar.
Usage statistics, if the user opts in.
Crash reports, following user prompt.
The URL being visited if and only if the status code is 404 and the file length is less than or equal to 512 bytes.
Automatic update check every 25 hours.
Suspicious sites file downloaded every 30 minutes.
Bookmarks. When you bookmark a page, the URL is sent to your Google Account - but, only if you are using a development build (Alpha or Beta) and have opted in to Bookmark Synchronization.
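The 404 item in that list is a simple rule, and restated as code it is just a predicate on the server's response. This is a sketch of the rule exactly as described in the post (status 404, body of at most 512 bytes), not Chrome's actual source; the function and constant names are hypothetical.

```python
# Threshold taken from the post above; treat it as the poster's claim.
MAX_REPORTED_BODY = 512

def would_report_url(status_code: int, body_length: int) -> bool:
    """True if, under the described rule, the visited URL is sent to Google."""
    return status_code == 404 and body_length <= MAX_REPORTED_BODY

print(would_report_url(404, 300))   # True: short 404 page
print(would_report_url(404, 2048))  # False: the site's own, longer 404 page
print(would_report_url(200, 300))   # False: a page that loads normally
```

Under this rule a working test page, which returns status 200, would never be reported this way.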

And that's it. Google Chrome does not send browsing or history information to Google. OK, there is a way to specifically enable such behavior in production builds, I believe, but I am not going to post the steps here. If Chrome did send such usage information, it would be fairly easy to detect and would have made big news.

On the other hand, as mentioned above, what you do on pages that are owned by Google, or that use Google services such as Analytics or AdWords, is reported back to Google.

Tarzan2
12-02-2009, 12:41 AM
Thanks Wige,

I'm still shaking my head trying to get the cobwebs out in order to figure out how that page got indexed. None of what you explained (except for the part about getting access to new registrations) would have fit as far as I can tell.

The only good thing about it is that when I am ready to actually use the page, it will have had some time to ripen! The fact that it is already listed fairly high gives me confidence that it will be found once I start using it. Pointing links to it will only make it rank higher!

What are g and cc top-level domains?

wige
12-02-2009, 09:56 AM
gTLDs are generic TLDs, such as .com, .net and .edu. ccTLDs are country-code TLDs, such as .us and .uk.
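A rough way to tell the two apart by eye: country-code TLDs are the two-letter codes (.ca, .uk, .us), while generic ones are longer. The sketch below applies only that length rule to the last label of a hostname; it ignores internationalized ccTLDs (the `xn--...` forms), so treat it as an approximation rather than a lookup against the official root zone list.

```python
def classify_tld(hostname: str) -> str:
    """Classify a hostname's top-level domain as 'ccTLD' or 'gTLD'.

    Approximation: every two-letter ASCII TLD is a country code and
    anything longer is treated as generic."""
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    return "ccTLD" if len(tld) == 2 and tld.isalpha() else "gTLD"

for host in ("example.ca", "example.com", "example.co.uk", "example.edu"):
    print(host, classify_tld(host))
```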

deepsand
12-02-2009, 08:51 PM

Nicely done.

Was this from a single source or multiple ones?

Which one(s)?

wige
12-02-2009, 10:49 PM
I took most of the points from Matt Cutts' blog, then searched for confirmation. The last point is based on a new feature that is being developed for Chrome. I have not found any references to other communication between Chrome and Google.

It is fairly easy to confirm this information though. Simply install Paros Proxy on your computer, and route the browser to run through it. Paros will give you a report of all HTTP traffic coming out of the browser.
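Browsers are pointed at Paros through their proxy settings, and a script can be routed through it the same way. The sketch below builds a Python `urllib` opener that sends plain-HTTP requests through a local intercepting proxy. It assumes the proxy listens on 127.0.0.1:8080, which is Paros's usual default but should be checked against your own setup; HTTPS interception would additionally require trusting the proxy's certificate.

```python
import urllib.request

def proxied_opener(host: str = "127.0.0.1", port: int = 8080) -> urllib.request.OpenerDirector:
    """Build a urllib opener whose plain-HTTP requests go through a local
    intercepting proxy such as Paros (the address is an assumption)."""
    proxy = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)

opener = proxied_opener()
# With the proxy running, opener.open("http://example.com/") would appear
# in the proxy's traffic log; without it, the request fails to connect.
```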

Tarzan2
12-03-2009, 03:21 AM
Thanks for the clarification, wige. I would never have guessed "generic," but I might have figured out the country-code abbreviation on my own.

deepsand
12-03-2009, 09:19 PM
install Paros Proxy on your computer, and route the browser to run through it. Paros will give you a report of all HTTP traffic coming out of the browser.
Oh, great, just what I need; another neat toy to suck up my time. ;)

Presumably, the data captured & reported by Paros is not unlike that provided by HttpFox.

And, from a quick overview of the Parosproxy.org site, it appears that one toggles its use on/off by way of the browser's Proxy settings. Correct?

From one tinkerer to another, thanks for the pointer.

wige
12-04-2009, 10:03 AM
Correct.

The main difference between Paros Proxy and HttpFox is that Paros lets you change information while it is in transit. Plus, you can use it with any browser. This is the tool I used to determine that Google is capable of detecting rank-checking utilities and scrambling the results it returns, and to figure out that there is actually a server, owned by Google, known as the TrustRank Server.

deepsand
12-04-2009, 07:12 PM
This is the tool I used to determine that Google is capable of detecting rank checking utilities and scrambling the results it returns. And to figure out that there is actually a server, owned by Google, known as the TrustRank Server.
Were either of these two matters elaborated in other threads?

If so, can you point me to such?

If not, each sounds like a topic worth its own thread.

Doc
12-04-2009, 10:23 PM
HUH! That's interesting! Any further information on that, wige?

Tarzan2
12-06-2009, 04:07 AM
Isn't it amazing how one thing leads to another and there is always more interesting information to be obtained from this forum?
I'm sure glad I found this site!

sequencehosting
12-10-2009, 02:03 PM
Ah, this also happened to me. It's very annoying. You may wish to create a robots.txt file to deny search engines access to that folder.

deepsand
12-10-2009, 02:11 PM
You may wish to create a robots.txt file to deny search engines access to that folder.
Such will not suffice to deny access, but only to ask that a well behaved 'bot not crawl the specified page(s).
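To make that point concrete: robots.txt is just a text file that a crawler chooses to read and obey. Python's standard `urllib.robotparser` shows what a well-behaved bot does with it; the rules and the `/testpage/` folder below are hypothetical. Nothing on the server side enforces the answer, so a client that never checks the file can still fetch the page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to skip the test folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /testpage/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching; the answer is purely advisory.
for url in ("http://example.ca/testpage/index.html", "http://example.ca/about.html"):
    print(url, parser.can_fetch("Googlebot", url))
```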

sequencehosting
12-10-2009, 02:15 PM
Such will not suffice to deny access, but only to ask that a well behaved 'bot not crawl the specified page(s).

Sorry for my bad explanation. Yes you are right.