Can Google Or Would Google Read The Name Of My Databases?
morestar
03-07-2011, 08:14 AM
Good Monday, members of WebProWorld.com, and thank you for your time. I have a question about Google and their relationship to my database. Do they have one?
I've seen a lot of strange things with Google bot over the years, or, something that's come from Google that seems to be able to read my forms and submit them - not to mention the fact that I can give Google-bot access to private member areas of my site (Google's own user-name) if I so choose to.
So with that said, would it be possible, or of any interest to Google, to at least read or find the names of my databases?
I ask because for some of the sites I'm playing with right now, I may want to use the same database on different sites, with the content displayed differently etc., but I have a suspicion Google may not like this practice too much...
Any thoughts are always appreciated!
;)
DaveSawers
03-07-2011, 08:37 AM
Presumably, the only place your database name exists is in code on your server. There's no need for that name to make it into the HTML code of the site, so how would a search engine or anyone else discover it?
mjtaylor
03-07-2011, 08:44 AM
I would not rule out Google being able to see the name of a database, but I would put a small wager at least that says Google does not care what you name your databases.
morestar
03-07-2011, 09:06 AM
See well, when it comes to fighting spam (content farms) and being a multi-billion dollar company I can't see why they wouldn't want to read the name of a database on a server.
I certainly don't know the ins and outs of major search engines, who they employ, or why they employ them, but if I were a search engine that concerned about spam, then I might ask one of my engineers to see if it's possible to read the database name, just to see if other sites on the same IP are using it.
Just a thought really...
mjtaylor
03-07-2011, 09:16 AM
And what does a name have to do with relevance? Seriously. How many databases are named database1? Or db1?
a53mp
03-07-2011, 02:55 PM
The only way Google will ever know your database, or even that you have a database, is if you are running your site on a Google server or give Google full access to every file on your server.
Anything server side (php, asp, cf, .net, etc) is just that.. server side code. It cannot be accessed unless you have access to the server.
morestar
03-07-2011, 03:01 PM
OK, and would it be considered unethical on Google's or anyone's part to even try to obtain that information?
MJ, I'm talking more along the lines of content farms and so forth. If two or more sites are using the same database and offering the same content, then to a degree they can be considered a content farm. Let's say Google can see that the two duplicate sites have the same database; that could reinforce the conclusion that the two are connected. Of course, all this assumes Google could read the name of your databases, or figure out how to, or have the name of a site's database scanned in the crawling process somehow.
I know this may sound like a silly idea but I know it's not impossible.
a53mp
03-07-2011, 03:05 PM
It would be duplicate content, which they can do now.
I just finished a project for a client where the script I wrote can serve content for an unlimited number of websites.. whose domains point to the same hosting path, and depending on the domain it will pull different content from the database. Right now there are only 2 sites but theoretically there could be thousands or whatever.. all using the same DB, but all with different content. It's not just content though, it's content, navigation items, pages, profiles, etc..
Wordpress MU? has the same thing though where you run multiple blogs off the same database, it just creates a bunch of tables as needed.
Some hosting only offers 1 DB, so the user is forced to use multiple tables with 1 DB.. they shouldn't be penalized for that. Content is different though.. which like I said, Google knows about
computergenius
03-07-2011, 03:06 PM
Google is not magic, it can only read pages that visitors to your site can read.
If some of your coding is bad enough to allow your database name to be read by site visitors, then Google is the least of your worries!
SemAdvance
03-07-2011, 04:41 PM
The only way Google will ever know your database, or even that you have a database, is if you are running your site on a Google server or give Google full access to every file on your server.
Anything server side (php, asp, cf, .net, etc) is just that.. server side code. It cannot be accessed unless you have access to the server.
and googlebot has access to most servers, (as does almost any crawler) wherein the crawlers index the files and folders found on the server,... unless instructed otherwise, it is after all an interconnected worldwide web.....
Besides googlebot or any crawler, does not need to know the name of, or have access to a database, to catch dupe content....
.
computergenius
03-07-2011, 05:22 PM
and googlebot has access to most servers, (as does almost any crawler)
Neither Googlebot, nor any other crawler, has access to any files on any of my domains, other than files that the public see. And the public cannot see any database names on any of my servers.
All a bot is, is an automatic site visitor, following the links around.
microtekblue
03-07-2011, 05:51 PM
The answer is a big NO...here's why:
1 - the database name is hard coded into a connection string variable
2 - it's arbitrary, so you can have any name; it doesn't matter what you name your database
3 - Google does not have permission to access all the files on your server. Most database connections are hidden in certain db classes or config files that are NOT publicly accessible
a53mp
03-07-2011, 07:15 PM
and googlebot has access to most servers, (as does almost any crawler) wherein the crawlers index the files and folders found on the server,... unless instructed otherwise, it is after all an interconnected worldwide web.....
.
That is wrong.. well the concept is.. you are misunderstanding how it works. When a spider/crawler accesses your website, it is accessing the HTML that is sent to the browser. It does not access the code on the server. When it visits a pure HTML site, for example, it is crawling the HTML page that is sent to the browser (which is the same HTML file that is on the server). However, when a spider accesses a PHP or ASP page (or any server side language), the spider is only accessing the OUTPUT HTML, not the source code which lives on the server.
How it works: the visitor requests a page in the browser, and the browser connects to the server. If the page is plain HTML, the server sends the HTML code back for the browser to display. With PHP, for example, once the server receives the request it hands the PHP page to the PHP interpreter, which processes it and creates/outputs the HTML code that is sent back to the browser to display. This is how it works whether you are a visitor, bot, spider, dinosaur, alien.. whatever.. that's how it works and there is no way around it. The only way you can see the server-side source code is if you gain access to the server itself (ftp, ssh, etc).. This is why in server side languages you can have a configuration file like config.php which has all the relevant information about your site, like database connections, usernames, passwords, server paths, etc.. because even if you access config.php directly from a browser, you will not see any output HTML, because PHP would not send any of that information.. unless you do some really bad stupid coding and print that information out.
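To make the request/response point concrete, here is a minimal sketch in Python rather than PHP (my choice for illustration; the idea is the same). The names CONFIG, fetch_title and render_page, and the values inside CONFIG, are all made up for the example. The config is used on the server to build the page, but only the rendered HTML is returned to whoever asked for it.

```python
# Hypothetical server-side configuration; it lives on the server only.
CONFIG = {"db_name": "shop_db", "db_user": "webapp", "db_password": "s3cret"}


def fetch_title(config):
    # Stand-in for a real database query that would connect using `config`.
    return "Welcome"


def render_page(config):
    # The server uses the config internally, but returns only the output HTML.
    title = fetch_title(config)
    return f"<html><body><h1>{title}</h1></body></html>"


html = render_page(CONFIG)
print(html)

# True only if some config value leaked into what a visitor or spider receives.
leaked = any(value in html for value in CONFIG.values())
print(leaked)
```

A browser, Googlebot, or any other client only ever gets the string that render_page returns, so nothing from CONFIG reaches it unless the code explicitly prints it.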
Hope that helps.
SteveGerencser
03-07-2011, 09:06 PM
And what does a name have to do with relevance? Seriously. How many databases are named database1? Or db1?
How many people have named things in css 'hidden text' or 'seo'?? ;)
Google has shown the desire and ability in the past to guess at directory names such as wp-content/plugins, crawl them, and see what is in there.. I see no reason to think that they wouldn't at least try to see what a db name is..
a53mp
03-07-2011, 09:44 PM
How many people have named things in css 'hidden text' or 'seo'?? ;)
Google has shown the desire and ability in the past to guess at directory names such as wp-content/plugins, crawl them, and see what is in there.. I see no reason to think that they wouldn't at least try to see what a db name is..
There are only two ways Google can see the database name
1. Hack into your hosting server and read your config file
2. Hack into your mysql server and read your database
Even if Google knew your database name (which is impossible) there is nothing they can do with it. First thing, it's not really possible that Google will hack into your server.. but even if there was a way for Google to know your database name, the only way it would know that it was the correct database name for the site using it would be to also hack into your mysql server.
Yes, Google can theoretically search in folders like /wp-content/plugins but I doubt that Google does it. Is there documentation showing that actually happens? What DOES happen though is hackers will send out spiders looking for directories like /forum /bb /phpbb /admin /phpmyadmin etc to find common scripts with known vulnerabilities.
Either way... the bottom line is
1. Google can NOT see your database name and I have absolutely no reason to think that they will ever even attempt to try.
2. Google can NOT see what is inside your database
3. Google can NOT see your server side code (only output HTML)
4. Google, while it may be taking over the world, does NOT have access to your server, will NOT hack your website, and anyone arguing otherwise only shows their lack of understanding of the web, spiders, and servers.
mjtaylor
03-08-2011, 07:54 AM
Whether Google can or cannot see your database is the question because if it can't then there is no concern, right morestar?
But let's assume for a moment that they can. Why would they? Your example
MJ, I'm talking more along the lines of content farms and so forth. If two or more sites are using the same database and offering the same content, then to a degree they can be considered a content farm. Let's say Google can see that the two duplicate sites have the same database; that could reinforce the conclusion that the two are connected. Of course, all this assumes Google could read the name of your databases, or figure out how to, or have the name of a site's database scanned in the crawling process somehow.
I know this may sound like a silly idea but I know it's not impossible.
It may be possible. The silly part ;) is that it can't matter to Google since there is no logical connection between the coincidence of two identically named dbs. It doesn't necessarily reveal a connection. It is not logical to draw any pertinent conclusion based on a database name. And an algorithm must be logical in order to function effectively, does it not?
If your databases contain the same content, G will see that on the page files and doesn't need the db name to make the connection. Google doesn't care if you own all the dbs. It only cares if the content is relevant to the query.
OK, and would it be considered unethical on Google's or anyone's part to even try to obtain that information?
Ethical? How would it be unethical?
SteveGerencser
03-08-2011, 10:09 AM
Yes, Google can theoretically search in folders like /wp-content/plugins but I doubt that Google does it.
http://www.google.com/search?hl=en&rls=GGGL,GGGL:2006-34,GGGL:en&q=inurl:wp-content/plugins&btnG=Search
192,000,000 results show you are wrong.. I'm just saying that I do not trust Google to always do the right thing and stay where they belong.. They have a long tradition of going after every single bit of data they can get their crawlers into, and I see no reason to think that they would ever stop just because 'there is no need'..
dagaul101
03-08-2011, 01:58 PM
I don't think they would have any reason to have your database info, much less know where it is
a53mp
03-08-2011, 03:15 PM
http://www.google.com/search?hl=en&rls=GGGL,GGGL:2006-34,GGGL:en&q=inurl:wp-content/plugins&btnG=Search
192,000,000 results show you are wrong.. I'm just saying that I do not trust Google to always do the right thing and stay where they belong.. They have a long tradition of going after every single bit of data they can get their crawlers into, and I see no reason to think that they would ever stop just because 'there is no need'..
I'm not wrong. All that shows is that 192 million results come from sites that have WordPress installed using a plugin that has an image or link accessible on the website, and that do not have Directory Listing disabled. Remember, spiders index everything they see, including images and links. If you run a WP site and have a plugin that has a link in it, once it gets indexed G will know the path to that image, which includes /wp-content/plugins
I would put money down that if you check your WP logs, the ONLY time you will see G indexing the contents of the plugins folder is if you do NOT have Directory Listing disabled, and even then you still need active plugins being linked somehow on the main site.
Case in point 1
This is a client's site I set WP up on: hg3law.com/blog
His site is indexed in G
/wp-content/plugins/
is NOT indexed.. he also does NOT have Directory Listings shown.
Case in point 2
The link you gave: http://www.google.com/search?hl=en&rls=GGGL,GGGL:2006-34,GGGL:en&q=inurl:wp-content/plugins&btnG=Search
That's nice and all.. but you are also getting results for people talking about plugins fixing WP... not really a good example.
Why not try searching for inurl:wp-content/plugins +index
That will show you ONLY plugin results with Directory Listing enabled... 74 million results, ALL directory listing results
Ok.. maybe a coincidence. but maybe not?
Why not try searching for inurl:wp-content/plugins -index
Wait.. what just happened? Not ONE result showing the plugin directory.
morestar
03-08-2011, 03:35 PM
...it can't matter to Google since there is no logical connection between the coincidence of two identically named dbs. It doesn't necessarily reveal a connection.
Right it doesn't necessarily reveal a direct connection and of course my thoughts were based on pure speculation but from a speculative standpoint, if I were a search engine with a lot at stake and encountered 3 websites that were serving the same content, clearly from the same DB (dbname), I would like to discredit two of them in the search results - significantly.
If I were the owner of a search engine as wealthy as the big wigs, I'd order a few people to investigate all the ramifications of search related to content and their sources. That's all...
If your databases contain the same content, G will see that on the page files and doesn't need the db name to make the connection. Google doesn't care if you own all the dbs. It only cares if the content is relevant to the query.
Right and with that, back to my OP, I could launch 3 websites with the same content but of course changed up a bit for originality.
a53mp
03-08-2011, 03:55 PM
If I were the owner of a search engine as wealthy as the big wigs, I'd order a few people to investigate all the ramifications of search related to content and their sources. That's all...
except the database.. since it is impossible, and it would be unethical if they found a way to do it without the owner's permission. :)
Tiggerito
03-09-2011, 01:40 AM
I'd say that if Google can find out the name of your database, then you have a security hole and need to fix it.
For most servers, the database is not accessible outside the hosting system and any database details should be kept in files that are not visible outside your own private part of the server. The system should also be internal or secured so no one can eavesdrop on messages.
Anything less and you're opening yourself up to potential attack.
In some cases, you may have to open up the database to the public. Then it is even more important that no database details are available outside your private and protected servers.
Conclusion: if Google can find out your database's name, you have far bigger issues, because others can too. And often the database name is stored with login details.
I'd go on the side that detecting actual duplicate content on the website is the right way. It raises an interesting point though. Websites using the same database are likely to use the same URL structures for the duplicated pages, be it the ID for the page or the friendly name used. That would be a big tell.
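As a rough illustration of that "tell", here is a small Python sketch that compares the URL paths of two sites and reports how much they overlap. This is only my assumption of how such a comparison could be done, not anything Google is known to do; the function names, the example domains and the URLs are all hypothetical.

```python
from urllib.parse import urlparse


def path_set(urls):
    # Keep only the path of each URL, ignoring the domain and trailing slashes.
    return {urlparse(url).path.rstrip("/") for url in urls}


def path_overlap(urls_a, urls_b):
    # Jaccard similarity: shared paths divided by all distinct paths.
    paths_a, paths_b = path_set(urls_a), path_set(urls_b)
    if not paths_a and not paths_b:
        return 0.0
    return len(paths_a & paths_b) / len(paths_a | paths_b)


# Hypothetical URL lists for two sites fed from the same database.
site_a = [
    "http://site-a.example/page/12",
    "http://site-a.example/about",
    "http://site-a.example/red-widgets",
]
site_b = [
    "http://site-b.example/page/12",
    "http://site-b.example/red-widgets",
    "http://site-b.example/contact",
]

print(path_overlap(site_a, site_b))
```

A high overlap on its own proves nothing (lots of sites have /about and /contact), but combined with matching page content it would be a strong hint, and none of it needs the database name.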
a53mp
03-09-2011, 02:08 AM
I'd go on the side that detecting actual duplicate content on the website is the right way. It raises an interesting point though. Websites using the same database are likely to use the same URL structures for the duplicated pages, be it the ID for the page or the friendly name used. That would be a big tell.
Great point.. especially with WP sites
DonOmite
03-09-2011, 06:40 PM
Now, in steps a nice Security+ certified person (me). It doesn't matter if G can see the name of your database, but if a spider or visitor CAN then you have done something seriously wrong. The most common way this happens is an error page pops up dumping a lot of data that nobody should be seeing but the developers. It may show lines of code that include your connection string. Always be sure that there is a custom error page for your website that does NOT show code. It should just say something like "ooooops. we are working on it". Meanwhile the actual error is either logged or emailed to somebody.
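A minimal sketch of that custom error page idea, in Python. The names handle_request, run_query and GENERIC_ERROR_PAGE, and the error message itself, are invented for illustration; a real site would hook this into whatever framework it uses. The details go to the log, and the visitor (or spider) gets a page with no code or connection details in it.

```python
import logging

logger = logging.getLogger("site_errors")

GENERIC_ERROR_PAGE = "<html><body><p>Oops. We are working on it.</p></body></html>"


def run_query():
    # Hypothetical failure whose message contains connection details.
    raise RuntimeError("could not connect: dbname=shop_db user=webapp")


def handle_request(handler):
    try:
        return 200, handler()
    except Exception:
        # The full traceback goes to the log (or an email handler), not the visitor.
        logger.exception("Unhandled error while serving request")
        return 500, GENERIC_ERROR_PAGE


status, body = handle_request(run_query)
print(status)
print("shop_db" in body)
```

With no logging configuration, Python writes the logged traceback to stderr; in practice you would attach a file or email handler to the logger.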
Next, your db should always require credentials to access it, so the queries will require a username and password. Those should be encrypted, and the encrypted values stored in an application variable that spiders can't see.
Finally, the db is normally on a separate server. I don't know of any hosting companies that have the webserver and db server combined. And even if the spider can see every folder on the server and the db is on that server, the spider can't actually access the tables and fields in the db without running a query against it. That would be a mighty sophisticated spider.
Do those spiders exist? Yes. But they do not look for the db name they look for forms. They will do what is called a SQL Injection attack. Once again, proper programming stops these cold but when I do security checks on websites I find this to be the biggest problem around. Make sure your programmers know to close that hole.
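For anyone who hasn't seen the hole in question, here is a small self-contained sketch using Python's built-in sqlite3 module with an in-memory database. The table, the data and the function names are made up for the example. The first function builds the query by pasting form input into the SQL text; the second passes the input as a parameter, which is the usual fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)


def find_user_unsafe(name):
    # Vulnerable: form input is pasted straight into the SQL text.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterized: the driver treats the input purely as data.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Input a malicious bot might submit through a form field.
malicious = "x' OR '1'='1"

print(len(find_user_unsafe(malicious)))  # every row comes back
print(len(find_user_safe(malicious)))    # no row matches the literal string
```

Note that the attack works without the bot ever knowing the database name; it only needs a form whose input reaches a query unescaped.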
I could go on and on. Bottom line is Google can not know what is in your db and the name should not be accessible to the spider.
TechEvangelist
03-16-2011, 08:00 AM
except the database.. since it is impossible, and it would be unethical if they found a way to do it without the owner's permission. :)
I would not call anything impossible with the web. I agree that it should not be possible if proper security is in place on a server, but nothing is impossible. Good hackers (an oxymoron) find very creative ways to circumvent security.
Ethics are an issue that is "in the eye of the beholder". Google tracks a ton of information about user activities that they never reveal. They have also been in court numerous times in Europe for privacy issues and violations. In the USA, we don't express the same concerns about Big Brother that the EU does, but that is all a matter of perspective.
Finally, the db is normally on a separate server. I don't know of any hosting companies that have the webserver and db server combined.
Have you ever heard of localhost? I've been doing web development for 13+ years and find that most hosting companies have the DB on the same server. That doesn't really matter, because where the DB is set up does not necessarily make it easier to break into. It is still a server-side function that requires that someone knows a DB name, username, password and other info in order to gain access.
This entire issue really comes down to whether or not G would have any desire to break into a database. I don't see what they would gain by doing that, because everything that they need to see is available in the client-side code.
a53mp
03-16-2011, 04:49 PM
If it was that easy to read server side code without hacking into a system, then millions of websites would be at the mercy of anyone who knows how to exploit it. Fortunately, that is impossible. Can Google hack your site? Theoretically, say you have Gmail and have hosting with a company: G could get into your email, find your login information, then get into your site via ftp, ssh, control panel, etc.. but that isn't really related to the question.
TechEvangelist
03-17-2011, 01:58 PM
It sounds like you are saying that it is possible, but just not probable. :D
a53mp
03-17-2011, 03:58 PM
Nope.. it's not possible, unless they hack into your system. But that is not what the original question was about.
C0ldf1re
03-18-2011, 11:13 AM
... Seriously. How many databases are named database1? Or db1?
Especially when countless webmasters install site software through cPanel and just accept the default database name offered.
bmservice
05-05-2011, 03:09 AM
Actually they can not. Spiter is not so clever as hacker to break into you database.
C0ldf1re
05-05-2011, 04:33 AM
Actually they can not. Spiter is not so clever as hacker to break into you database.
I'm assuming you meant spider! Even if Google could do it, would it be worth their while, especially in view of the risk of adverse publicity?