Problems Getting Site Relaunched Crawled
savantcreative
12-09-2010, 09:20 AM
I relaunched a website on 11/18/2010, and I'm pretty good with this kind of stuff: sitemaps, Webmaster Tools, link building, SEO...
I am having the weirdest problem, though. It is a 7-page site, and Google has crawled 6 of my pages; they show up in the SERPs as cached, but my home page does not! It is the one URL that has remained the same, and it has not been crawled or cached yet.
What can I do? Until it is, none of my new keywords are helping me.
The other problem is the opposite one with Yahoo and Bing: after all this time, they have only indexed and re-crawled my home page, which is ranking great.
Thanks for your help
mjtaylor
12-09-2010, 05:00 PM
What's your domain? You can type it domainnameDOTcom since you don't have enough posts to make a link.
claybutler
12-09-2010, 05:08 PM
I know it seems like a long time to you, but six out of seven is not unreasonable for a relaunch less than a month ago, especially since the one that isn't indexed is the one that didn't change its URL.
I'd give it another week before getting overly concerned.
mjtaylor
12-09-2010, 05:10 PM
Good point, Clay, though I thought it just a bit odd that the index page would be last to be, um, indexed. ;)
claybutler
12-09-2010, 05:26 PM
I'd bet good money that Google has indexed it. It just hasn't figured out it's a new version yet, so it's still showing the old one. The other ones were a no-brainer because the URLs are different. Google's just a machine, after all.
savantcreative
12-09-2010, 05:56 PM
savantcreativegroupDOTcom
Thanks :)
savantcreative
12-09-2010, 05:58 PM
When they crawl a site, they cache it as well, right? I am not showing up for my new keywords in Google, but I am in Yahoo and Bing. Isn't that strange?
Thanks
Tony_V
12-09-2010, 10:08 PM
Search for savantcreativegroup and it comes up, although the cache is missing most of the layout, so things are still updating, I take it.
Tony V
martindow
12-10-2010, 05:28 AM
I wonder if it is really such a problem. If someone is searching for 'web design new york' or 'green advertising' it is better if they go directly to your pages dealing specifically with that. Many searches bypass the home page.
savantcreative
12-10-2010, 08:31 AM
I am dependent on the keywords on my home page, though.
Bernd
12-10-2010, 11:11 AM
Get some quality backlinks and crawling and indexing will be faster and the ranking of your site will be better.
savantcreative
12-10-2010, 11:26 AM
It's not the indexing I'm worried about. It's having my home page crawled and cached. Because it hasn't been, it doesn't turn up in searches for the new keywords.
Bernd
12-10-2010, 11:30 AM
That's the indexing. :)
savantcreative
12-10-2010, 11:48 AM
Because it is an existing domain, the index.html file is indexed. The problem is that it has not been crawled, so Google doesn't know the content of the new page, just the old one. Does that make sense?
Thanks
Bernd
12-11-2010, 07:39 AM
Your new content isn't indexed, because it isn't even crawled.
http://seoblog.intrapromote.com/2008/08/the_difference_1.html
claybutler
12-11-2010, 07:46 AM
You got that backwards. He has been crawled and indexed. That's why his new pages are showing. It's the new index page that isn't in the SERPs yet. The new pages have new URLs, so it was very clear to index them. The home page has the same URL, so it's still showing the old content until Google figures out it's also new content.
savantcreative
12-11-2010, 11:07 AM
Thank you Clay. That is exactly what is happening. Do you think there is anything I can do to speed things up a bit?
mjtaylor
12-11-2010, 09:01 PM
The cache is clearly not the newer site.
It doesn't look as though your site was relaunched on 11/18, since the Google cache of 11/24 shows text that is not what your current site shows.
With PR2, I am guessing your site is not going to be crawled more frequently than once a month. What was the usual interval? A powerful link or two might affect it, or a big social network push. If you could drive a large amount of traffic and get a series of retweets, you might succeed in shortening the time.
deepsand
12-11-2010, 10:54 PM
There seems to be some confusion here re. what indexing is and is not.
Simply put, indexing is no more than an SE taking note that it knows of the existence of something. That something may be merely a URI; or, it may be the contents of that URI.
What it does or does not do based on such knowledge is immaterial to its being indexed. Absence of evidence of such in the SERPs does not speak to its being or not being indexed, with one exception: if one searches for a URI and no match is found, then that URI is either not indexed or is banned.
Now, obviously, a URI can be indexed without that being the case for its content; however, the converse cannot be true. And, for content to be indexed, it must first be crawled.
As both the cache and description snippets are dependent on the content being crawled and indexed, failure of a SERP listing to reflect current content in one or both of these elements signals four possible causes:
Content not crawled;
Content not indexed;
Corrupted indices; or,
Corrupted or delayed data base replication.
Before spending any time guessing at the cause of the perceived problem, first determine whether one actually exists by finding out when the resource(s) in question was (were) last visited by Googlebot.
Between your host logs and your Google Webmaster Tools account, you should be able to find that data.
deepsand
12-11-2010, 11:02 PM
With PR2, I am guessing your site is not going to be crawled more frequently than once a month ...
I see PR 0 sites crawled 20 or more times a month on an ongoing basis.
Flyinjs
12-11-2010, 11:43 PM
I am getting a lot of white space after your text ends on your home page. Is it just me or what?
deepsand
12-11-2010, 11:59 PM
Same here; there are two empty & unterminated divs at the end.
mjtaylor
12-12-2010, 12:43 PM
I see PR 0 sites crawled 20 or more times a month on an ongoing basis.
I meant page. Oops.
savantcreative
12-12-2010, 01:28 PM
I can tell that I have all 8 URLs indexed when I view my sitemap. I checked my hosting logs and found the entries below. Do you see the Http Code: 304 for my home page? Does that mean that Googlebot thinks the page hasn't changed? If so, what should I do? Thanks!
Host: 66.249.72.82
/seo-consulting.html
Http Code: 200 Date: Dec 12 07:14:59 Http Version: HTTP/1.1 Size in Bytes: 7580
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/website-development.html
Http Code: 200 Date: Dec 12 07:24:36 Http Version: HTTP/1.1 Size in Bytes: 8281
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/robots.txt
Http Code: 200 Date: Dec 12 07:34:37 Http Version: HTTP/1.1 Size in Bytes: 24
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/contact-a-savant.html
Http Code: 200 Date: Dec 12 08:15:00 Http Version: HTTP/1.1 Size in Bytes: 6871
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/commercial-photographer.html/robots.txt
Http Code: 404 Date: Dec 12 09:25:32 Http Version: HTTP/1.1 Size in Bytes: 938
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/commercial-photographer.html
Http Code: 200 Date: Dec 12 09:25:32 Http Version: HTTP/1.1 Size in Bytes: 9784
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/blog
Http Code: 404 Date: Dec 12 12:11:18 Http Version: HTTP/1.1 Size in Bytes: 938
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/
Http Code: 304 Date: Dec 12 13:04:45 Http Version: HTTP/1.1 Size in Bytes: -
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
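To answer the earlier question of when Googlebot last fetched each URL from a listing like the one above, the entries can be grouped by path. This is a minimal Python sketch that assumes the exact layout pasted here (a path line, an "Http Code:" line, a "Referer:" line and an "Agent:" line per request); the function names are hypothetical, and raw Apache access logs or other control panels will use a different format.

```python
import re

# Matches lines such as: "Http Code: 304 Date: Dec 12 13:04:45 Http Version: HTTP/1.1 ..."
CODE_RE = re.compile(r"Http Code: (\d{3}) Date: (\w{3} +\d+ [\d:]+)")


def parse_visits(text):
    """Turn the pasted visitor listing into a list of dicts, one per request."""
    visits, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            current = {"path": line}  # a request path starts a new entry
        elif current is not None and line.startswith("Http Code:"):
            match = CODE_RE.match(line)
            if match:
                current["status"] = int(match.group(1))
                current["date"] = match.group(2)
        elif current is not None and line.startswith("Agent:"):
            current["agent"] = line[len("Agent:"):].strip()
            visits.append(current)  # the Agent line closes the entry
            current = None
    return visits


def last_googlebot_status(visits):
    """Map each path to the status of Googlebot's most recent request for it."""
    latest = {}
    for visit in visits:
        if "Googlebot" in visit.get("agent", ""):
            latest[visit["path"]] = visit.get("status")  # later entries win
    return latest
```

Run against the excerpt above, this would report 304 for / and 404 for /blog, which is the same thing you can read off by eye; the script only pays off on longer logs.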
williamc
12-12-2010, 01:56 PM
It means your hosting company's server setup is very, very odd. What host are you using?
savantcreative
12-12-2010, 02:01 PM
Is that what is causing my new home page not to get crawled and cached? My hosting company is HostPapa.
Thanks
williamc
12-12-2010, 02:13 PM
Good question. I don't recall ever running into a 304 on a homepage, and am not sure how Google handles that specific instance, to be honest. I would say there is a damn good chance that could be the cause, though.
savantcreative
12-12-2010, 02:22 PM
Is there anything that I can do? I would really like to get my new home page properly indexed because I have made some keyword changes.
Thanks again for your help :)
williamc
12-12-2010, 02:29 PM
If it were me, I would have already submitted a support ticket to the host, or found a real host...
savantcreative
12-12-2010, 02:31 PM
Why do you think this is a hosting problem and not a Google one? Yahoo and Bing have correctly picked up the new home page. Why is HostPapa not a real host?
Thanks.
williamc
12-12-2010, 03:11 PM
In retrospect, I checked your site with lynx...
You are correct, it is NOT a hosting issue. Nor is it a Google issue. It is pure and simple PEBKAC.
[root@delta1 ~]# lynx -head 'http://www.savantcreativegroup.com/'
HTTP/1.1 301 Moved Permanently
Date: Sun, 12 Dec 2010 20:08:52 GMT
Server: Apache
Location: http://savantcreativegroup.com/406.shtml
Connection: close
Content-Type: text/html; charset=iso-8859-1
[root@delta1 ~]# lynx -head 'http://savantcreativegroup.com'
HTTP/1.1 406 Not Acceptable
Date: Sun, 12 Dec 2010 20:11:03 GMT
Server: Apache
Accept-Ranges: bytes
Content-Length: 877
Connection: close
Content-Type: text/html
You have some really messed up redirects going there mate.
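The lynx -head check above can also be reproduced without Lynx. This rough Python sketch (standard library only) sends HEAD requests, follows Location headers by hand, and records the status of every hop, so a chain like www, then 301, then /406.shtml, then 406 is visible at a glance. The function name, hop limit and default User-Agent string are illustrative choices, not anything from the thread; a server that filters by User-Agent may answer this script differently from how it answers Lynx.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url, max_hops=5, user_agent="redirect-tracer/0.1"):
    """Return a list of (url, status) pairs, one per HEAD request made."""
    hops = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.hostname, parts.port, timeout=10)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        try:
            conn.request("HEAD", target, headers={"User-Agent": user_agent})
            resp = conn.getresponse()
            status, location = resp.status, resp.getheader("Location")
        finally:
            conn.close()
        hops.append((url, status))
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
        else:
            break  # final answer reached (or a redirect with no Location)
    return hops
```

Calling trace_redirects("http://www.savantcreativegroup.com/") and printing each (url, status) pair gives the same information as the two lynx commands in one pass.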
savantcreative
12-12-2010, 03:47 PM
Thanks for checking it out but I haven't written any redirects. There is just an htaccess file and when I switched hosting providers, I updated the nameservers on the domain. Any ideas?
Thanks a million
savantcreative
12-12-2010, 03:48 PM
One more thing. What is PEBKAC?
Thanks
williamc
12-12-2010, 04:04 PM
Thanks for checking it out but I haven't written any redirects. There is just an htaccess file
That is where the redirect(s) would be.....
One more thing. What is PEBKAC?
Problem Exists Between Keyboard And Chair
savantcreative
12-12-2010, 04:44 PM
I spoke with my hosting company, and they are saying that because the original site was WordPress, things are whacked. Does that sound right?
Thanks
williamc
12-12-2010, 06:02 PM
Only if they kept the original WP .htaccess file when you uploaded your new site; that could cause issues.
savantcreative
12-12-2010, 07:08 PM
No. I wrote that file and have been using it for years without issue.
deepsand
12-12-2010, 08:47 PM
Good question. I don't recall ever running into a 304 on a homepage, ...
That's because, if the 304 is correct, it means that the client should have a current copy of the resource in its cache.
... and am not sure how Google handles that specific instance, to be honest.
The 304 means that the resource has not changed since the date specified in the Googlebot's Request Header.
The odd thing is that Google said fairly recently that it no longer checks If-Modified-Since before downloading a resource.
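The 304 logic described above comes down to comparing the resource's Last-Modified time with the client's If-Modified-Since date. This minimal Python sketch shows how a server typically makes that decision; it is an illustration, not Apache's or Google's actual code, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the resource is unchanged since the client's date, else 200.

    last_modified must be timezone-aware (HTTP dates are expressed in GMT).
    """
    if not if_modified_since:
        return 200  # unconditional request: send the full body
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an invalid HTTP-date must be ignored (RFC 7232, section 3.3)
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
    # HTTP dates have one-second resolution, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # client copy is current: headers only, no body
    return 200
```

The sketch also shows why a wrong modification date matters: if the new index.html were served with a Last-Modified date older than the If-Modified-Since date Googlebot sends, the server would answer 304 and the new content would not be downloaded.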
williamc
12-12-2010, 08:54 PM
Right, but if you read a couple of posts further down, I checked it from one of my servers and got very different results from what Google reported.
deepsand
12-12-2010, 09:07 PM
Do you mean that you created the .htaccess file? If so, why the redirect to the server-side include file 406.shtml?
deepsand
12-12-2010, 09:16 PM
Right, but if you read a couple posts further, I checked it from one of my servers and got some very different results than what Google told of.
That's because the request headers differed; Googlebot's included an If-Modified-Since check.
savantcreative
12-13-2010, 08:31 AM
The .htaccess file was just to redirect all www traffic to non-www. I have used it for years without issue. I do not know what the other one is about. Neither does the hosting company.
deepsand
12-13-2010, 04:46 PM
Normally a 406 is generated when a resource exists, but is of a (MIME) type other than that expected by the requesting agent.
Seeing your htaccess file might be useful.
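For reference, the usual source of a 406 is content negotiation: the server compares the type it would send against the request's Accept header. The following is a simplified Python sketch of that check; it handles exact types, type/* and */* ranges, and q=0 exclusions, but ignores media-type parameters other than q, and the function names are hypothetical. A server is also allowed to ignore Accept and send its default representation instead of a 406. Later in the thread the 406 is traced to the host's mod_security filter instead, so this explains the status code's normal meaning rather than this site's behavior.

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((fields[0].lower(), q))
    return ranges


def negotiate_status(content_type, accept_header):
    """Return 200 if content_type is acceptable to the client, else 406."""
    if not accept_header:
        return 200  # no Accept header: the client accepts anything
    ctype = content_type.split(";")[0].strip().lower()
    main_type = ctype.partition("/")[0]
    best = None  # (specificity, q) of the most specific matching range
    for media_range, q in parse_accept(accept_header):
        if media_range == ctype:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if best is None or specificity > best[0]:
            best = (specificity, q)
    return 200 if best is not None and best[1] > 0 else 406
```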
williamc
12-13-2010, 04:53 PM
Agreed. If you can paste the contents of the .htaccess file here we would have a better idea what to tell you from this point forward.
savantcreative
12-13-2010, 05:37 PM
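RewriteEngine On
# Send any request on the www host to the same path on the bare domain.
RewriteCond %{HTTP_HOST} ^www\.savantcreativegroup\.com
RewriteRule (.*) http://savantcreativegroup.com/$1 [R=301,L]
# Send explicit requests for index.htm(l) to the directory URL. THE_REQUEST is
# checked so Apache's internal DirectoryIndex lookup does not cause a loop.
RewriteCond %{THE_REQUEST} ^.*\/index\.html?
RewriteRule ^(.*)index\.html?$ http://savantcreativegroup.com/$1 [R=301,L]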
Thanks again for all of your help.
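To see what these two rules can and cannot do, here is a rough Python simulation of them. It is a hypothetical helper that uses plain regular expressions in place of mod_rewrite, ignores query strings, and assumes the rules sit in the site's root .htaccess, where the matched path has no leading slash. Neither rule ever produces a 406, which suggests the 406 came from somewhere else; one plausible reading of the lynx output is that the www request was rejected first and the custom 406.shtml error page then passed through the first rule, which would yield exactly that Location header.

```python
import re

CANONICAL = "http://savantcreativegroup.com/"


def simulate_rules(host, request_path):
    """Mimic the two .htaccess rules for 'GET request_path' on host.

    Returns (301, location) when a rule fires, or None when the request
    falls through to Apache's normal handling.
    """
    the_request = f"GET {request_path} HTTP/1.1"  # what %{THE_REQUEST} holds
    per_dir_path = request_path.lstrip("/")  # .htaccess rules see no leading slash

    # Rule 1: www host -> same path on the bare domain.
    if re.match(r"^www\.savantcreativegroup\.com", host):
        return 301, CANONICAL + per_dir_path

    # Rule 2: an explicit /index.htm(l) in the request line -> the directory URL.
    if re.match(r"^.*/index\.html?", the_request):
        match = re.match(r"^(.*)index\.html?$", per_dir_path)
        if match:
            return 301, CANONICAL + match.group(1)

    return None
```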
deepsand
12-13-2010, 06:27 PM
The 406 error appears to have been an artifact of the Lynx request header having used a request method of "TRACE"; both "GET" and "HEAD" yield a 200 response code. As the site has a custom 406 page, the result of the 301 is the file named 406.shtml.
With that out of the way, the issue then becomes one of why Googlebot's If-Modified-Since check yields a 304. Either it is using a bad date in its request, or the file's Last-Modified date is wrong.
HighFalutin
12-15-2010, 07:46 PM
Hi, I'm just wondering about the wisdom of deleting all your pages except the home page and creating new URLs in their place for a relaunch. Aren't the URLs you deleted indexed and aged in the search engines, and most likely have inbound links pointing to them? Wouldn't deleting them wipe out all those links and any seniority your subpages have earned? Wouldn't it be better to redo the pages, keep the URLs the same, and add new URLs if you have additional content? Just curious, that's all. Any input would be welcome.
mjtaylor
12-15-2010, 10:10 PM
This is often done because a site has changed from simple html to a database site, perhaps with different extensions. Or perhaps because content has changed and the new file names reflect keywords ... or other similar reasons. A 301 redirect is then put in place and the links with their link juice flow to the new pages. The 301 redirects are not perfect and Google warns that some PR may be lost, but the system seems to work quite well.
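As an illustration of the 301 approach just described, old-to-new URL pairs can be turned into Apache mod_alias RedirectMatch lines for the .htaccess file. This is a hedged Python sketch: the function name and the old WordPress-style paths in the test are hypothetical, and anchored RedirectMatch patterns are used instead of plain Redirect because Redirect matches by prefix and appends any remaining path to the target.

```python
import re


def redirect_lines(mapping, host):
    """Build 'RedirectMatch 301' lines from {old_path: new_path}.

    Each old path is matched exactly, with or without a trailing slash.
    """
    lines = []
    for old, new in sorted(mapping.items()):
        if not old.startswith("/") or not new.startswith("/") or old == "/":
            raise ValueError(f"bad mapping {old!r} -> {new!r}")
        pattern = "^" + re.escape(old.rstrip("/")) + "/?$"
        lines.append(f"RedirectMatch 301 {pattern} http://{host}{new}")
    return "\n".join(lines)
```

After pasting the output into .htaccess, it is worth requesting each old URL once to confirm it answers 301 with the expected Location.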
savantcreative
12-17-2010, 08:02 AM
The problem is that I changed the site model from WordPress to static. If the old pages had been flat HTML, I would have left them. Does that make sense?
Thanks
deepsand
12-17-2010, 02:35 PM
Have you yet reviewed the last_modified date of the file in question, per http://www.webproworld.com/webmaster-forum/threads/105417-Problems-Getting-Site-Relaunched-Crawled?p=542490&viewfull=1#post542490 ?
savantcreative
12-18-2010, 03:01 PM
Hi Deepsand. I think it was a Google issue more than anything else. I just got crawled on 12/15 and started showing up in the SERPs.
Thanks for your help :)
williamc
12-18-2010, 03:07 PM
Your headers are still messed up. You STILL have an issue there that should be corrected. Unless you wish it to come back and bite you in the arse later, you would do well to correct it.
savantcreative
12-18-2010, 03:10 PM
Can you explain what I need to do?
Thanks
deepsand
12-18-2010, 03:11 PM
Your headers are still messed up.
In what way?
williamc
12-18-2010, 03:13 PM
I am still seeing 301 and 406 headers when accessing the site via www and non-www respectively from any server.
Same results as posted here: http://www.webproworld.com/webmaster-forum/threads/105417-Problems-Getting-Site-Relaunched-Crawled?p=542340&viewfull=1#post542340
savantcreative
12-18-2010, 03:15 PM
I do not know what that means.
williamc
12-18-2010, 03:18 PM
savant: Email your hosts support department and tell them to run lynx on your site with a full head check like this:
lynx -head 'http://savantcreativegroup.com'
Ask them why in the heck your site is returning a 406 (not acceptable) header instead of the proper 200 (success) header.
I have a feeling that they are doing this based on UserAgent, and if so, Google very well may run into more issues 'at times' while re-spidering your site. Which is why I said correct them before they come back to bite you in the arse.
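One way to test the User-Agent theory is to send the same HEAD request with different User-Agent strings and compare the status codes. This Python sketch does that with the standard library; the agent strings are illustrative examples (the Googlebot one is copied from the host log earlier in the thread), and the function names are hypothetical. If the Lynx and Wget strings get 406 while the others get 200, filtering by User-Agent is the likely cause.

```python
import http.client
from urllib.parse import urlsplit

# Illustrative User-Agent strings; the Googlebot one is taken from the host log.
AGENTS = {
    "lynx": "Lynx/2.8.6rel.5 libwww-FM/2.14",
    "wget": "Wget/1.12 (linux-gnu)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "firefox": "Mozilla/5.0 (Windows NT 6.1; rv:1.9.2) Gecko/20100101 Firefox/3.6",
}


def head_status(url, user_agent):
    """Send one HEAD request (redirects are not followed) and return the status code."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target, headers={"User-Agent": user_agent})
        return conn.getresponse().status
    finally:
        conn.close()


def compare_agents(url, agents=AGENTS):
    """Return {agent_name: status} for the same URL requested with each agent."""
    return {name: head_status(url, ua) for name, ua in agents.items()}
```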
deepsand
12-18-2010, 03:24 PM
I am still seeing 301 and 406 headers when accessing the site via www and non-www respectively from any server.
As noted at http://www.webproworld.com/webmaster-forum/threads/105417-Problems-Getting-Site-Relaunched-Crawled?p=542490&viewfull=1#post542490 ,
The 406 error appears to have been an artifact of the Lynx request header having used a request method of "TRACE"; both "GET" and "HEAD" yield a 200 response code. As the site has a custom 406 page, the result of the 301 is the file named 406.shtml.
I.e., I can only duplicate the 406 when using "TRACE."
williamc
12-18-2010, 03:28 PM
Nope, I checked that the first time you said it, and set TRACE off by default. I also checked it just now with TRACE off, with the same results; even forcing a GET comes back 406. Not to mention, if you actually look at the command line I posted both times, I already had lynx set to force HEAD with '-head'.
deepsand
12-18-2010, 03:36 PM
Most peculiar.
I just now rechecked it on http://web-sniffer.net/ , using every User Agent provided for there, with the same results.
Twilight Zone? Or, Outer Limits?
Addendum: Did a check using www.seoconsultants.com/tools/headers , using User Agent Lynx 2.8.6, and there get 406 for "GET," "HEAD," and "POST."
williamc
12-18-2010, 03:45 PM
here ya go:
http://www.seo-shop.com/savant.php?url=http://savantcreativegroup.com
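<?php
// Show the raw response headers lynx receives for the ?url= parameter.
$url = isset($_REQUEST['url']) ? $_REQUEST['url'] : '';
if (!filter_var($url, FILTER_VALIDATE_URL) || !preg_match('#^https?://#i', $url)) {
    exit('<p>Please supply an http:// or https:// URL.</p>');
}
// escapeshellarg() stops the url parameter from injecting shell commands.
$command = 'lynx -dump -head ' . escapeshellarg($url);
echo '<p>' . htmlspecialchars($command) . "</p>\n";
echo '<pre>' . htmlspecialchars((string) shell_exec($command)) . '</pre>';
?>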
also try variations such as http://www.seo-shop.com/savant.php?url=http://www.savantcreativegroup.com
now do you see why it is giving the 406? :)
deepsand
12-18-2010, 03:48 PM
Appears that it could be a Lynx specific problem; see http://www.google.com/search?q=lynx+406+error&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a .
406 Not Acceptable Error
I received an email from Joe Clark a few days ago titled Your site is borking in Lynx and it stated he received a 406 Not Acceptable error every time. I used the Yellowpipe Lynx Firefox plugin (or you could just use the Yellowpipe online Lynx viewer) and he was entirely right. Any ideas why this would be so?
My own cursory investigation shows that my accounts on the newer PHP5 server throw the 406 Not Acceptable error in Lynx every time. Whereas my other accounts on a different server using an older version of PHP work perfectly fine.
This could even be caused by the Apache module mod_security if its filter list includes the word lynx, although the .htaccess fix for that didn’t work. Apparently the mod_security filter setting inclusion of lynx isn’t that uncommon, although I’d have to ask why Lynx winds up on that security list.
savantcreative
12-18-2010, 03:48 PM
I just finished an online chat with HostPapa. I have pasted the conversation below. What would you do?
Can you please run lynx on the site with a full head check like this:
lynx -head 'http://savantcreativegroup.com'
Why is my site is returning a 406 (not acceptable) header instead of the proper 200 (success) header.
SebastienV
maybe it's due to mod_security; you should ask your webmaster, and we can disable it for you if that's the issue
Bruce
I am the webmaster
SebastienV
do you think this could be related to mod_security ?
Bruce
No.
SebastienV
then I don't
SebastienV
know
Bruce
What is mod_security?
SebastienV
to me, it looks like it's because there's a redirection on http://savantcreative.com/ to http://savantcreativegroup.com/
Bruce
What do you mean?
SebastienV
some kind of firewall
SebastienV
“ModSecurity is an open source intrusion detection and prevention engine for web applications. Operating as an Apache Web server module, the purpose of ModSecurity is to increase web application security, protecting web applications from known and unknown attacks.”
Bruce
I have no firewall. I simply created a static site and uploaded it to your server
SebastienV
mod_sec is enabled by default on every account at HostPapa
SebastienV
usually the webmaster knows whether he needs to have it disabled or not
SebastienV
if you'd like to disable it (to try if it solves the problem)
SebastienV
just submit a ticket on the help centre and tech-support will do that for you
Bruce
I have never heard of this. What is involved in disabling it?
SebastienV
by disabling it, security is not as good
Bruce
Can you explain a bit more
Bruce
What kind of security?
SebastienV
I am sorry, we do not provide this level of support; I can't help you develop this site or fix the code, but I can forward your ticket to tech if you want to try disabling it
Bruce
There is nothing wrong with my code
SebastienV
I am not saying there is, sorry
SebastienV
we just don't provide that type of support, but if you ask me... I think you should give a go to disable mod_sec
SebastienV
and see how it goes
Bruce
How will I know if it works
SebastienV
you shouldn't be getting 406 then
Bruce
how long will it take to show
SebastienV
open a ticket > we make the changes > you test and you know
Bruce
what should i request?
SebastienV
in the subject line put "Disable mod_sec (your-domain.xxx)"
Bruce
Is that it? Will it be done immediately?
SebastienV
yes, don't forget to say something in the message itself (hello, or can you please do this) as it can't be empty
SebastienV
I'll forward it immediately
williamc
12-18-2010, 03:52 PM
Appears that it could be a Lynx specific problem; see http://www.google.com/search?q=lynx+406+error&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a .
Checked that as well. wget returns the same.
Savant as soon as they reply that it is turned off, let me know and you can check it at this url:
http://www.seo-shop.com/savant.php?url=http://savantcreativegroup.com
it will return a status of 200 (Ok) when it is right.
savantcreative
12-18-2010, 03:59 PM
sorry this was a double entry
williamc
12-18-2010, 04:00 PM
I replied to this up above here: http://www.webproworld.com/webmaster-forum/threads/105417-Problems-Getting-Site-Relaunched-Crawled?p=543107&viewfull=1#post543107
savantcreative
12-18-2010, 04:00 PM
Is it ok to turn it off? What does it do?
Thanks
williamc
12-18-2010, 04:01 PM
It is meant to keep some spiders away mainly, but turning it off should not bother your site or its performance at all. The simple fact that it has both Lynx and wget in its list of bad spiders would be enough for me to shut it off, if I used it, which I do not.
savantcreative
12-18-2010, 04:10 PM
Will do. Is there somewhere I can read about what all this means?
Thanks for all your help!
williamc
12-18-2010, 04:26 PM
Header fields
http://en.wikipedia.org/wiki/List_of_HTTP_header_fields
Status codes
http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
savantcreative
12-18-2010, 05:00 PM
Thanks. I will check this out ASAP. The hosting company still didn't make the changes. Do you know of any good GREEN hosting providers you can recommend?
williamc
12-18-2010, 05:10 PM
What exactly do they say makes them a 'Green' hosting provider? All servers take energy, all hardware takes energy.
And do you realize that in actuality, most 'Green' energy sources actually take more fossil fuels than just running from fossil fuels, just in the 'creation' of the technologies?
savantcreative
12-18-2010, 05:15 PM
Because I do green advertising and marketing for companies, I work with printers that use recycled materials, FSC-certified paper, and non-petroleum-based inks. I need to host my site, as well as client sites, with providers who power their servers with energy from sources like solar and wind. That is how I picked the provider I just switched to, but I am less than thrilled so far. You have provided me with much better info than their support staff has :)
williamc
12-18-2010, 05:24 PM
Okay, those are good reasons that I can understand well enough. Pleasing clientbases is always a good thing. You should be able to find a site that does reviews on green hosting somewhere. I have no clue as I have never looked for a 'green' host. If you had provided any answer other than the one you did I had this ready to go for it:
Fun Facts: Green energy equipment such as photovoltaic cells, turbines, and the steel used in wind power frames is still manufactured using fossil fuels. It is then generally shipped long distances, because very few manufacturers make it currently (many are overseas), via vehicles (ships, trucks, trains, planes, etc.) that are powered by fossil fuels. And because the units are not currently made to last very long, more of them have to be bought, causing more fossil fuel usage in manufacturing and shipping, and you get a roundhouse effect that makes any savings to the ecosystem null and void.
Inks and the like that use pre-existing cartridges, however, I can see as actually being 'green', so that's good on your clients. :)
Found this from a simple search: http://webhostinggeeks.com/greenwebhosting.html
deepsand
12-18-2010, 05:32 PM
You might take a look at the list at http://www.hosting-review.com/hosting-directory/top-10-lists/Top-Green-Web-Hosting-Companies.shtml?gclid=CI2S9JHt9qUCFUGo4Aodkiq2nQ ; and, then, in a separate thread, seek input from other members re. their experiences with said hosts.
savantcreative
12-18-2010, 05:35 PM
Good points made here. All things involve some kind of compromise and many things are presented using logic that collapses under scrutiny.
Common sense and gut usually point out the best way to go.
Thanks for the link. The only one I have used is DreamHost and am not that thrilled with them either. I will check the other ones out. Strangely enough, the company I am using ranked very highly in a couple of sources I have used. Besides reliability, customer support is most important to me.
savantcreative
12-18-2010, 05:36 PM
Thanks. Ironically HP is at the top of this list. I will do some more research.
deepsand
12-18-2010, 05:42 PM
It is meant to keep some spiders away mainly, but turning it off should not bother your site or its performance at all. The simple fact that it has both Lynx and wget in its list of bad spiders would be enough for me to shut it off, if I used it, which I do not.
If Lynx and Wget are not to be trusted, then one has to wonder why only them?
deepsand
12-18-2010, 05:45 PM
One on that list that has gotten a lot of mention on-line whenever the subject of hosts comes up is HostGator. To the best of my recollection, most, although not all, have expressed a positive opinion.
williamc
12-18-2010, 05:47 PM
Exactly. They are both simply a means of pulling documents down; LWP for Perl, and include(), require(), file(), file_get_contents() and cURL for PHP, are used in spiders just as much as Lynx or Wget. It makes no sense at all.
savantcreative
12-18-2010, 05:48 PM
The only thing that pissed me off about them is that when you use their contact form script they try to sell their seo services at the bottom.
williamc
12-18-2010, 05:48 PM
HostGator is not a terrible host and their support is decent enough as well, IMO.
savantcreative
12-18-2010, 05:53 PM
Should I install lynx or wget on my Windows machine? Is that the best way for me to use them?
deepsand
12-18-2010, 05:58 PM
What do you mean? You don't have to use them in order to allow them to access your site.
As a separate matter, you may very well find Lynx useful. Being a text only browser, it gives you a good view of what bots see as being a page's content after it has parsed out the code.
It's also useful for exploring sites whose trustworthiness is in doubt, as it won't run any scripts.
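For a rough idea of that text-only view without installing anything, the markup can be stripped with Python's built-in HTML parser. This sketch is only an approximation of what Lynx or a search engine extracts: it drops the contents of script and style elements and keeps every other text node, ignoring layout, links and alt text. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; into &
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the page's visible text, one text node per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```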
savantcreative
12-18-2010, 06:09 PM
I mean, since it is a browser, should I install it like I have with the other browsers I use to check my sites out with?
deepsand
12-18-2010, 06:17 PM
As noted, Lynx is a text only browser; i.e., it has no GUI, but simply displays plain text in a DOS box.
See http://en.wikipedia.org/wiki/Lynx_%28web_browser%29 and http://www.vordweb.co.uk/standards/download_lynx.htm .
And, Wget is not a browser at all, but a file retriever using the HTTP, HTTPS and FTP protocols.
See http://www.gnu.org/software/wget/ .
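What Wget does at its core, fetching a URL and saving it to disk, can be approximated with Python's standard library. This is a minimal sketch, not a replacement for Wget (no recursion, retries or resume); the function name and the default agent string are hypothetical.

```python
import shutil
from pathlib import Path
from urllib.request import Request, urlopen

def retrieve(url: str, dest: str, agent: str = "Wget/1.12 (linux-gnu)") -> Path:
    """Save the resource at `url` to the file `dest`, wget-style.

    urllib.request understands http://, https://, ftp:// and file:// URLs.
    The User-Agent value is only an example string.
    """
    req = Request(url, headers={"User-Agent": agent})
    out = Path(dest)
    with urlopen(req, timeout=30) as resp, out.open("wb") as fh:
        shutil.copyfileobj(resp, fh)  # stream the body to disk
    return out
```

For example, `retrieve("http://example.com/", "index.html")` would save that page as index.html in the current directory.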
savantcreative
12-19-2010, 09:52 AM
I finally see what you have been talking about. I got Lynx working and see the issue you mentioned. I checked the two other sites that I have hosted with HP and received the same 406. I then checked a bunch of other sites I have done this year that are hosted with other providers and all of them come back 200. My ticket still remains open with HP even though they said they would take care of the issue ASAP. I will follow up with them again. Thank you for all the help you are providing me :)
Do you think they are being overly secure or just lame?
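For anyone who wants to repeat this check without Lynx, here is a minimal Python sketch that requests a URL under several User-Agent strings and reports the status code for each. Because the host's actual configuration isn't known, the block includes a hypothetical local stand-in that mimics the behaviour described here (406 for Lynx and Wget, 200 otherwise); point `status_for()` at a real URL to test an actual site.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, Request, build_opener

# An opener that ignores proxy environment variables, so requests go
# straight to the server being checked.
_OPENER = build_opener(ProxyHandler({}))

def status_for(url: str, agent: str) -> int:
    """Return the HTTP status the server sends back for this User-Agent."""
    req = Request(url, headers={"User-Agent": agent})
    try:
        with _OPENER.open(req, timeout=10) as resp:
            return resp.status
    except HTTPError as err:  # 4xx/5xx responses are raised as HTTPError
        return err.code

class StandInHost(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the host's rule: 406 for Lynx/Wget, else 200."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "").lower()
        code = 406 if ("lynx" in agent or "wget" in agent) else 200
        body = b"blocked" if code == 406 else b"ok"
        self.send_response(code)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def start_stand_in():
    """Start the stand-in on a free localhost port; returns (server, url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StandInHost)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"

if __name__ == "__main__":
    server, url = start_stand_in()
    for agent in ("Lynx/2.8.7rel.1", "Wget/1.12", "Mozilla/5.0 (compatible; Googlebot/2.1)"):
        print(status_for(url, agent), agent)
    server.shutdown()
    server.server_close()
```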
deepsand
12-19-2010, 10:37 AM
Thank you for all the help you are providing me
And, thank you for actually sticking with us here, rather than vanishing after 1 or 2 posts like so many do. It's frustrating when you try to help someone, only to hear the sound of your own voice, not knowing if anything was accomplished.
Do you think they are being overly secure or just lame?
Definitely irrational to block those two, and only those two.
Perhaps it's a case of blindly doing what others have done before, on the assumption that they must have had a good reason for it.
And, it may even be the case that, at one time, such concerns were justified. However, like all applications, web servers evolve, vulnerabilities are mitigated, so that that which was once a threat is no more.
As various searches for "406" show, the problem with Lynx and several other applications has been floating around for years now, making the odd appearance here and there, which suggests that it may indeed be a remnant of times past that has yet to be fully eradicated.
williamc
12-19-2010, 10:45 AM
Do you think they are being overly secure or just lame?
Both, IMO.
savantcreative
12-19-2010, 10:45 AM
I stick around. I just needed some time to digest and learn about what you guys were telling me.
The other issue with HP is that I have never waited so long to have sites indexed before. A site that I put up there a couple of months back still is not fully indexed, and it's a small site. Even my site is not fully indexed by Yahoo yet. I am wondering if it is due to the fact that the spiders are having difficulty. Do you think this is possible too?
savantcreative
12-19-2010, 10:49 AM
Both, IMO.
Unfortunately I paid for a year of hosting for myself and the 2 clients I mentioned. IF I can get this straightened out with my account I will put in the same tickets for the 2 clients and then TRY to stick out the year unless I have other problems.
williamc
12-19-2010, 10:53 AM
I am wondering if it is due to the fact that the spiders are having difficulty. Do you think this is possible too?
Let me give you a little background here and a small story...
I am a programmer; I mainly code tools, most of which spider websites, and search engines at times. Whenever I create a spider, I build in redundancy, meaning that I use numerous methods of grabbing any given document/page. In the event the main fetch function fails, it then steps down to the other methods until it can grab the page. However, rather than spending a lot of time on any one page, my spiders mark an entry as unfetched, and the next time it is found in the iterations they move on to the next method, until either all pages are fetched or all methods have been tried without success. This method means I generally get a 99.99% success rate.
What the above means to you
I am sure Google has similar flexibility in its systems, so it stands to reason that the more tries it has to make to fetch a page from your site properly (200 OK status), the longer it is going to take to fetch your entire site.
In other words
Yes, I think it is more than possible this affects the search engines.
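The step-down approach williamc describes can be sketched in Python as follows. This is an illustration of the idea, not williamc's actual code; the fetch methods are placeholders for whatever a spider really uses (urllib, cURL bindings, and so on), and the `crawl` name is hypothetical.

```python
from typing import Callable, Dict, List, Optional

Fetcher = Callable[[str], Optional[str]]  # returns page text, or None on failure

def crawl(urls: List[str], methods: List[Fetcher]) -> Dict[str, str]:
    """Fetch every URL, stepping down to the next method on failure.

    Each pass makes one attempt per still-unfetched URL, using the next
    method for that URL; a URL is abandoned once all methods have been tried.
    """
    pages: Dict[str, str] = {}
    next_method = {url: 0 for url in urls}  # index of the method to try next
    while True:
        pending = [u for u in urls if u not in pages and next_method[u] < len(methods)]
        if not pending:
            break  # everything fetched, or every method exhausted
        for url in pending:
            fetch = methods[next_method[url]]
            next_method[url] += 1  # mark as tried; the next pass uses the next method
            try:
                body = fetch(url)
            except Exception:  # any error just leaves the URL unfetched
                body = None
            if body is not None:
                pages[url] = body
    return pages
```

Note how a page that fails its first method is only retried on a later pass, which is why repeated failures stretch out the time needed to fetch a whole site.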
deepsand
12-19-2010, 10:57 AM
The other issue with HP is that I have never waited so long to have sites indexed before. A site that I put up there a couple of months back still is not fully indexed, and it's a small site. Even my site is not fully indexed by Yahoo yet.
By what evidence do you reach such conclusions?
I am wondering if it is due to the fact that the spiders are having difficulty. Do you think this is possible too?
What do your server logs show re. bot activity?
From a functional standpoint, the only factor involving the host that should be expected to come into play here would be bandwidth. If it's too small, such that the bots have trouble getting stuff quickly enough, the well behaved ones will throttle back their attempts.
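How a well-behaved bot might throttle itself can be sketched as a simple backoff rule. Search engines do not publish their actual crawl-rate logic, so the `next_delay` function, its thresholds and the doubling/easing factors below are purely hypothetical.

```python
def next_delay(delay: float, status: int, response_secs: float,
               min_delay: float = 1.0, max_delay: float = 60.0,
               slow_secs: float = 2.0) -> float:
    """Return seconds to wait before the next request to the same host.

    Hypothetical policy: back off (double the delay) when the server looks
    overloaded, i.e. it answered 429/503 or took longer than `slow_secs`;
    otherwise ease back toward `min_delay`. Result is clamped to the limits.
    """
    if status in (429, 503) or response_secs > slow_secs:
        delay *= 2
    else:
        delay *= 0.75
    return max(min_delay, min(max_delay, delay))

if __name__ == "__main__":
    delay = 1.0
    for status, secs in [(200, 0.4), (200, 3.1), (503, 0.2), (200, 0.3)]:
        delay = next_delay(delay, status, secs)
        print(f"{status} in {secs}s -> wait {delay}s")
```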
savantcreative
12-19-2010, 11:03 AM
If I can get THIS issue straightened out with them do you think I am crazy to hang out for the year?
williamc
12-19-2010, 11:13 AM
Not if everything else with them has been satisfactory. If it has, then why change? Just make sure any new sites you add to their servers have this issue taken care of immediately. The fact that it has been almost 20 hours with no reply to the support ticket, though, means I would already be gone, but that is just me.
deepsand
12-19-2010, 11:14 AM
No business, hosting companies included, is without warts.
If the service itself is satisfactory, and bearing in mind that the 406 error is not material to your needs, as it affects no search engine's bots, it would be foolhardy to decide in haste.
savantcreative
12-19-2010, 11:21 AM
I just got a response that my ticket has been forwarded to a tech person and that they will be in touch shortly.
@deepsand - What do you mean that the 406 error is not material to my needs?
williamc
12-19-2010, 11:24 AM
@deepsand - What do you mean that the 406 error is not material to my needs?
He means that apparently it is not stopping Googlebot from crawling your site.
That said, Googlebot is a bot and does pay attention to HTTP status codes, so it may actually be material at times.
Good that they finally got back to you.
savantcreative
12-19-2010, 11:39 AM
Thanks. I think I might as well make it as easy for Google to do its job as possible.
deepsand
12-19-2010, 11:46 AM
Unless googlebot deliberately disguises itself as the User Agent "lynx" or "wget," it will have no problem.
savantcreative
12-19-2010, 12:00 PM
Unless googlebot deliberately disguises itself as the User Agent "lynx" or "wget," it will have no problem.
Do you think it can slow the spiders down a bit? Since I am still not fully indexed with Yahoo I wonder if it is this or Yahoo.
deepsand
12-19-2010, 12:10 PM
The 406 issue cannot affect any User Agent other than "lynx" or "wget."
Google identifies itself as User Agent "googlebot," Yahoo, as "slurp," and Bing as "msnbot."
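Since the block is keyed on the User-Agent string, a substring rule along the lines sketched below would refuse Lynx and Wget while letting the search-engine bots through. The host's real rule isn't shown in the thread, so the pattern and the `host_response` function are assumptions, and the agent strings are examples of the form each client typically sends.

```python
import re

# Hypothetical version of the host's rule: refuse any User-Agent that
# mentions lynx or wget (case-insensitive), allow everything else.
BLOCKED_AGENTS = re.compile(r"lynx|wget", re.IGNORECASE)

def host_response(user_agent: str) -> int:
    """Return the status this rule would send for the given User-Agent."""
    return 406 if BLOCKED_AGENTS.search(user_agent) else 200

# Example User-Agent strings of the kind each client sends.
EXAMPLES = {
    "Lynx": "Lynx/2.8.7rel.1 libwww-FM/2.14",
    "Wget": "Wget/1.12 (linux-gnu)",
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Yahoo Slurp": "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "msnbot": "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
}

if __name__ == "__main__":
    for name, agent in EXAMPLES.items():
        print(f"{name:12} -> {host_response(agent)}")
```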
BTW, understand that Yahoo's SERPs are now derived from Microsoft's Bing.
Again, what evidence leads you to your conclusions re. what is and is not indexed?
savantcreative
12-19-2010, 12:24 PM
So why would I concern myself with it at all?
This all started when Googlebot was receiving a 304 (Not Modified) response instead of a 200 after I posted my new site.
I understand that Yahoo is being fed from Bing.
Pages are not showing up in Yahoo site explorer.
deepsand
12-19-2010, 12:38 PM
So why would I concern myself with it at all?
This all started when Googlebot was receiving a 304 (Not Modified) response instead of a 200 after I posted my new site.
As explained earlier, the 304 was an artifact of an If-Modified-Since check gone awry, a matter that is wholly independent of the 406.
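For readers unfamiliar with conditional requests: a bot that already has a copy sends If-Modified-Since with the date of that copy, and the server answers 304 (Not Modified) if it believes nothing has changed. A minimal Python sketch of that server-side decision follows; it also shows one plausible way the check can go awry, namely the server reporting a stale modification date for a page whose content has actually changed. The function name and the dates are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional

def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Decide between 200 and 304 for a GET carrying If-Modified-Since.

    `last_modified` must be a timezone-aware datetime for the resource.
    Returns 304 only if the resource has not changed since the date the
    client sent; a missing or unparsable header means a full 200 response.
    """
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return 200
    if since.tzinfo is None:  # e.g. a "-0000" zone; treat it as UTC
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so compare whole seconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200

if __name__ == "__main__":
    crawled = format_datetime(datetime(2010, 11, 1, tzinfo=timezone.utc), usegmt=True)
    stale = datetime(2010, 10, 15, tzinfo=timezone.utc)  # date the server wrongly reports
    fresh = datetime(2010, 11, 18, tzinfo=timezone.utc)  # date the page really changed
    print(conditional_status(stale, crawled))  # bot is told to keep its old copy
    print(conditional_status(fresh, crawled))  # bot fetches the new page
```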
Pages are not showing up in Yahoo site explorer.
Absence of evidence does not constitute evidence of absence.
Have you tried doing a search for the URLs(s) in question on Bing?
savantcreative
12-19-2010, 12:58 PM
Got ya.
Yes and Bing has everything indexed.
deepsand
12-19-2010, 01:47 PM
Not unexpected.
Be it Google, MSN/Bing, or Yahoo, webmaster data are always suspect.
savantcreative
12-19-2010, 01:50 PM
Yes. I have and continue to see some wild things in the tools and analytics :(
savantcreative
12-20-2010, 06:10 PM
Thanks williamc and deepsand! My hosting company finally made the security change this afternoon and I just finished checking my site with Lynx. Everything looks OK to me. I even made some changes to the site because of some jQuery stuff I am using. I think I improved the SEO. You guys are the best.
williamc
12-20-2010, 06:32 PM
Anytime at all :)
deepsand
12-20-2010, 07:35 PM
And, thanks for the feedback. It's good to know that our time was well spent.