Can a TXT sitemap URL be placed in the robots.txt file?
Clint1
07-29-2009, 03:38 AM
I've seen some putting their sitemap.xml file's URL in their robots.txt file, but is that allowed and will it work for a sitemap.txt file?
Thanks.
angilina
07-29-2009, 08:53 PM
At the end of your robots.txt file, you can simply put the full sitemap URL like this:
Sitemap: http://www.yoursite.com/sitemap.txt
In Google webmaster tools, I see this message:
Line 2340: Sitemap: http://www...................... Valid Sitemap reference detected
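For anyone who wants to check the Sitemap line locally before waiting on Webmaster Tools, here is a minimal sketch using Python's standard robots.txt parser. The robots.txt body and the yoursite.com/sitemap.txt location are made-up examples, and `sitemap_urls` is just a helper name picked for illustration. It only shows that the line parses; it says nothing about how Google itself treats the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the sitemap.txt location is an assumed example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.yoursite.com/sitemap.txt
"""


def sitemap_urls(robots_text):
    """Return the Sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemap_urls(ROBOTS_TXT))
```

The parser does not care whether the URL ends in .xml or .txt, which fits what the thread is asking about.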
What is the format of the text file? Is it simply a list of URLs, or is it an XML formatted document that just has a .txt extension?
Clint1
07-30-2009, 09:17 AM
Angilina, thanks.
What is the format of the text file? Is it simply a list of URLs, or is it an XML formatted document that just has a .txt extension?
No, it's just a list of URLs. A true plain text .txt file.
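As a rough illustration of that format, here is a small sketch that checks a plain-text sitemap body: one full URL per line, nothing else. The 50,000-URL ceiling comes from the sitemaps.org protocol; the function name and the sample URLs used to exercise it are assumptions, not anything from my actual file.

```python
from urllib.parse import urlsplit

MAX_URLS = 50_000  # per-file URL limit in the sitemaps.org protocol


def check_text_sitemap(text):
    """Return (valid_urls, problems) for the body of a plain-text sitemap."""
    valid, problems = [], []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = urlsplit(line)
        # A usable entry is a full http(s) URL with a host and no embedded spaces.
        if parts.scheme in ("http", "https") and parts.netloc and " " not in line:
            valid.append(line)
        else:
            problems.append(f"line {number}: not a full URL: {line!r}")
    if len(valid) > MAX_URLS:
        problems.append(f"{len(valid)} URLs exceeds the {MAX_URLS} limit")
    return valid, problems


if __name__ == "__main__":
    sample = "http://www.yoursite.com/\nwww.yoursite.com/missing-scheme\n"
    print(check_text_sitemap(sample))
```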
You know, I have always wondered if Googlebot would "see" and follow or index a URL that was just plain text, be it in a text file or just in the body of a web page. I guess now we kind of have a way to test that, don't we?
Ok... tested. Seems to work without a problem. I would just make sure the document shows up in Webmaster Tools, and is listed as being indexed correctly.
Clint1
07-30-2009, 10:24 AM
You know, I have always wondered if Googlebot would "see" and follow or index a URL that was just plain text, be it in a text file or just in the body of a web page. I guess now we kind of have a way to test that, don't we?
Yes, my txt sitemap shows up in G searches for specific URLs. There are also numerous places on the web where my URLs are on webpages as non-clickable plain text, and those show up as IBLs (inbound links). In fact, many times they are screwed up and misspelled, and those show up as invalid URLs in the WMT area.
Ok... tested. Seems to work without a problem. I would just make sure the document shows up in Webmaster Tools, and is listed as being indexed correctly.
How did you test that so quickly?
Yes my txt sitemap shows up in G searches for specific URL's. There are also numerous places on the web where my URL's are on webpages as non-clickable plain text, and those show up as IBL's.
Interesting. This is something I have tried to test, but never really found the results convincing. But for the plain text to show up as an IBL is pretty convincing to me.
How did you test that so quickly?
I posted it, had Google scan the file, and it was in the sitemap list in about three minutes, showing that Google recognized all of the URLs as valid.
Clint1
07-30-2009, 10:59 AM
At the end of your robots.txt file, you can simply put the full sitemap URL like this:
Sitemap: http://www.yoursite.com/sitemap.txt
In Google webmaster tools, I see this message:
Line 2340: Sitemap: http://www...................... Valid Sitemap reference detected
If you mean you tested your robots.txt file in the WMT area where you can paste it in: so far I've tried that 5 times and the page keeps crashing. :confused: As soon as I do anything to the text that's already there, the WMT page just locks up. So just to clarify; you put a sitemap.txt line in your robots.txt and you got a recognition for it from your G WMT?
Thanks.
Clint1
07-30-2009, 11:07 AM
Interesting. This is something I have tried to test, but never really found the results convincing. But for the plain text to show up as an IBL is pretty convincing to me.
Yeah, there are so many of the screwed up URLs that I had to add 301s from the botched URLs to the real pages. The Gbot will actually see the URL as it's typed on a page. Many times they appear truncated, obviously copied/pasted from another webpage that truncates URLs but makes them clickable (as happens with links here). So the plain text unclickable version will show as something like:
http:// www . domain.com/whatever/whate......html (Spaces added here so the URL would show)
And that is the exact URL upon which Gbot will hit! With those ..... marks in it.
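To give an idea of how those botched hits could be mapped back to real pages before writing the 301s, here is a rough sketch. It assumes the truncation always shows up as a run of three or more dots and that you have a list of your real paths; `resolve_truncated` and the example paths are hypothetical, not what I actually run on my server.

```python
import re


def resolve_truncated(botched_path, known_paths):
    """Map a path containing a run of dots (a truncation mark) to the single
    known path that shares its beginning and ending, or None if unclear."""
    match = re.search(r"\.{3,}", botched_path)
    if not match:
        return None  # no truncation mark, nothing to resolve
    prefix = botched_path[:match.start()]
    suffix = botched_path[match.end():]
    candidates = [
        path for path in known_paths
        if path.startswith(prefix)
        and path.endswith(suffix)
        and len(path) >= len(prefix) + len(suffix)
    ]
    # Only redirect when exactly one real page fits; otherwise leave it alone.
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    pages = ["/whatever/whatever-page.html", "/other/page.html"]
    print(resolve_truncated("/whatever/whate......html", pages))
```

Returning None for ambiguous matches is deliberate: a wrong 301 is worse than a 404.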
I posted it, had Google scan the file, and it was in the sitemap list in about three minutes, showing that Google recognized all of the URLs as valid.
Could you please elaborate on that? Where did you "post" it and how did you get G to "scan" the file? You put a temp URL in your robots.txt file and G picked that up in 3 minutes?
Clint1
07-30-2009, 11:44 AM
Regarding my post #9 above, I tried several more times in IE and it still crashed. I tried in FF and it also was "exhibiting strange sluggishness behavior". I couldn't add the line to the robots.txt text input area, but a full "Select all" then copy/paste from my file worked. After running the test, I see the "Valid Sitemap reference detected".
Thanks guys. ;) Hopefully that will help with my "Cached link in G results in infinite refresh loop when clicked" page problem.
My guess would be that Google is timing out. I took a two-pronged approach to test my text file. I added it to robots.txt and made sure Google saw it (I find that if it has been several hours since Google last saw the file, Google will recheck it as soon as I go to the tool) then I manually submitted the file to make sure Google was able to see the URLs listed within.
I am not sure if this is better than submitting an XML version for your refreshing situation, however. The reason is that in the XML sitemaps you have a place to indicate when the file was updated - we already know Google knows the URL exists, what we need to tell Google is that the page has changed and needs a recrawl.
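For comparison, here is a minimal sketch of what the XML version with an update date looks like when generated from Python. The `urlset`, `url`, `loc` and `lastmod` element names and the namespace are from the sitemaps.org protocol; the function name, URL and date are made-up examples.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, lastmod) pairs; lastmod is a YYYY-MM-DD string."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod is the field a plain-text sitemap has no place for.
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([("http://www.yoursite.com/page.html", "2009-07-30")]))
```

Whether a newer lastmod actually triggers a faster recrawl is up to Google; the field is only a hint.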
Clint1
07-30-2009, 12:37 PM
My guess would be that Google is timing out.
If you mean the page crashing problem, no, it's apparently a browser problem caused by something on their "Settings" page where the robots.txt file is. Both browsers gave a "Not responding", but FF was finally able to work. There's something new on or in the page that's causing IE(6) to totally lock up. Nothing on the page is clickable once you put the cursor in the robots.txt text area, and the hourglass just keeps showing. The only way to close the page is repeatedly clicking the X, then XP will finally show that window saying "This page is not responding....". This only started happening today.
I took a two-pronged approach to test my text file. I added it to robots.txt and made sure Google saw it (I find that if it has been several hours since Google last saw the file, Google will recheck it as soon as I go to the tool) then I manually submitted the file to make sure Google was able to see the URLs listed within.
Ok, what "tool"?
I am not sure if this is better than submitting an XML version for your refreshing situation, however. The reason is that in the XML sitemaps you have a place to indicate when the file was updated - we already know Google knows the URL exists, what we need to tell Google is that the page has changed and needs a recrawl.
Yeah I know, it's something I thought I would try. I have been making changes on the page and resubmitting it, but it's still showing that screwed up July 1 cached page. I haven't checked my logs to see if the Gbot hit on it lately but I will.
Ok, what "tool"?
The robots.txt test page in GWT.
Clint1
07-30-2009, 12:48 PM
The robots.txt test page in GWT.
Ahh. :facepalm: :D
angilina
08-02-2009, 01:40 AM
" So just to clarify; you put a sitemap.txt line in your robots.txt and you got a recognition for it from your G WMT?"
Yes, simply put the URLs of sitemaps at the end of your robots.txt file and you will then see the details in WMT.
You can find people talking about it here:
google.com/support/forum/p/Webmasters/thread?tid=5d8cd6f39c0a6237&hl=en
petefreitag.com/item/636.cfm
Clint1
08-02-2009, 01:58 AM
" So just to clarify; you put a sitemap.txt line in your robots.txt and you got a recognition for it from your G WMT?"
Yes, simply put the URLs of sitemaps at the end of your robots.txt file and you will then see the details in WMT.
Yeah I got that done ok. See my post #11. But thanks for following up. ;)