View Full Version : Validated HTML - How valuable is this?
08-19-2003, 02:00 PM
I've seen it brought up in a few places that properly validated web pages may gain better rankings with search engines, and I'm curious as to how much truth there is to the claim.
I should think it is in the web site developer's best interest to produce clean, valid HTML, to be assured that the site will render as expected in a standards-compliant web browser. I also feel that a search engine would benefit in the very same way: sites that are valid will render properly for the SE's visitors and not cause frustration, which might well boil over onto the SE itself for sending the visitor to that 'terrible page'.
That said, for a search engine to factor validation into ranking, the SE would need to either validate the pages itself or use specific clues to infer that a page is valid. (i.e. The search engine might assume that, seeing as there is a valid doctype, the document is valid per that specification.)
Does anyone actually know whether search engines acknowledge validated source? Do they validate it themselves, or do they take a shortcut?
08-19-2003, 04:22 PM
Yes - always - even when there is a formatting problem, I put in the research time to work out a solution.
Compliance with standards is extremely important... even when it is not the law... in all likelihood it will be some day, and everyone who doesn't comply will be doing some back-pedaling.
I also promote "Standard Compliance Services" -- it seems like a good competitive advantage.
08-19-2003, 04:47 PM
I'm in complete agreement that everyone should comply with standards as a general rule, but does it help us with our SEO (or hurt if you don't comply, or don't understand how to)?
08-19-2003, 05:27 PM
Can I prove that statement -- not really.
1. The 50 PhDs working at Google to develop the engine are "academics", and academics by nature start by planning in accordance with some guide.
Obviously they could write their own guide, but I tend to believe an existing one is a good place to start.
2. Validation catches formatting problems. A WYSIWYG editor is a tool that does what you tell it; what it doesn't do so well is cope with you changing your mind, e.g. "the table is in the wrong place, I'll move it."
Sometimes all the code moves and sometimes it doesn't, and in the latter case you can end up with leftover code snippets that don't affect the visible content but do affect crawlability.
We often hear people saying "googlebot crawled and listed my main page two months ago, but nothing else... what's wrong!".
This can be a case of bot confusion. If the code doesn't make sense "to humans", then a program that needs precise instructions to perform a task must get confused a lot. However, we never really get to see or appreciate the results of "bad code", and many simply say it is too time-consuming to bother with, as they see no evidence that it helps.
I know of no other industry though where standards hurt.
08-20-2003, 01:18 PM
We validate all the pages at our site. Validation can help you correct format problems even when you don't know you have them. It ensures that your pages will display consistently across browsers, not to mention that it also ensures accessibility for people with disabilities.
Validation and SEO: according to the WebProNews newsletter dated 7/3/03, in the article "Making Your Site Accessible Creates High Rankings" by Tiffany K. Edmonds -- YES, it does help.
As far as it becoming THE LAW: screw that, the government is already too big and wastes way too much money. And how would they regulate it? The way I see it, if web designers want to produce quality websites, they will validate their code without any government involvement.
08-20-2003, 01:32 PM
Love your thought process there joliettech... screw the government! :-)
08-27-2003, 07:58 PM
I always use W3C's HTML and CSS validators to catch those simple typos and to check valid syntax.
Depending on client requirements, I may incorporate non-standard, browser-specific markup where necessary, especially where browsers don't interpret the HTML or CSS in a uniform fashion, but only as an exception, and only after I've eliminated as much as possible of the (invalid) code that contributes to cross-browser compatibility issues.
10-01-2003, 11:57 PM
Hi, I need some assistance with my doctype declaration. I'm using FrontPage to generate my pages, but the doctype is not included. I've tried to validate the pages, but it always says incorrect doctype.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
This seems to be the closest one, but it still won't validate. I've tried all the declarations I could find, but nothing seems to work. Could someone please look at my site and suggest one.
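For what it's worth, the W3C validator generally wants the full declaration, DTD URL included, as the very first line of the page. Assuming your markup is otherwise HTML 4.x Transitional, something like this usually validates:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">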
First of all, we need to clearly understand that the W3C recommendations are just that -- recommendations, not rules -- and that most browsers are not even HTML-compliant per the W3C recommendations.
About 4 months ago I went through the validation process on three sites with some 250 pages and corrected several "errors" that caused the pages not to validate (the most common "error": not having a blank alt attribute on spacer GIFs). I saw absolutely no change in spidering or ranking, which confirms my suspicion that search engines are concerned with relevance, not syntax.
10-02-2003, 06:17 AM
Search engines do return their results based on relevance, but before they can return a website, it has to be in their database, and before it can be in the database, it has to be spidered.
Just like a browser, a spider can only reliably read valid HTML code. Imagine a browser decides that <title title="hello"></title> is a nice new title tag: although that browser will display the title nicely, the bot would see "" as the title, because it's not standard code. Likewise, if you write <title>theTitle</title the bot might not see the title at all, because of the bad closing tag.
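Here's a minimal sketch in Python (not any real engine's code) of how a strict title extractor loses the title when the closing tag is malformed:

import re

def extract_title(html):
    # Strict pattern: only matches when the closing tag is well formed.
    m = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return m.group(1) if m else None

print(extract_title("<title>theTitle</title>"))  # theTitle
print(extract_title("<title>theTitle</title"))   # None - broken closing tag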
Tables with too many or mismatched row/cell tags can also confuse the spider and result in valuable data being left out of the index. I see this on many sites, for example:
<table>
  <tr>
    Some important data that uniquely identifies this site.
    <td>A normal cell.</td>
  </tr>
  This is the most important data on my website that needs indexing.
</table>
Neither of those sentences may be seen by a robot that only works on compliant code: the first is not inside a <td></td> tag, and the other sits outside any row or cell tag entirely. You would not believe how many times I see these errors on web pages, and although IE can do a great job of ignoring the syntax errors, a compliant robot or browser may not.
If you want an indication of how well Google can parse less-than-perfect HTML, just submit the top-ranking URLs from Google's results to an HTML validator. I have run several such searches against validations, which leads me to conclude that valid HTML does not affect search engine rankings.
Try this random search in Google, for instance:
dentists New York
You will see that none of the top ten results validates with the W3C validator, yet they rank at the top of Google.
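If you want to repeat the experiment, here is a rough Python sketch. It assumes the W3C Markup Validator reports its verdict in the X-W3C-Validator-Status response header, and the URL list is a placeholder you would fill in with the real top-ten results:

from urllib.parse import quote
from urllib.request import urlopen

# Placeholder list: paste in the top-ranking URLs from a Google search.
URLS = [
    "http://www.example.com/",
]

def validator_status(url):
    # Ask the W3C Markup Validator about a page and read its status header.
    check = "http://validator.w3.org/check?uri=" + quote(url, safe="")
    with urlopen(check) as resp:
        return resp.headers.get("X-W3C-Validator-Status", "unknown")

for url in URLS:
    print(url, "->", validator_status(url))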
10-03-2003, 05:41 AM
Oh, I agree with that. Whatever info it picks up is what info it runs its complex text-matching techniques on; it doesn't care whether you have alt attributes, or whether you use invalid attributes like margin= in your body tag. The robot might read the doctype so it knows how to parse the code; on the other hand, it may not. The small spider I wrote for my own website doesn't care at all about syntax: it strips all tags and just looks for text, although bad tags can cause invalid stripping. For example, </title might strip everything between <title> and <body>...
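A minimal sketch of that kind of naive tag stripper (the markup here is illustrative, not from any real site) shows how one missing '>' can swallow real page text:

import re

# Naive stripper: treats everything from '<' to the next '>' as a tag.
TAG = re.compile(r"<[^>]*>")

def strip_tags(html):
    return " ".join(TAG.sub(" ", html).split())

good = "<html><title>Hi</title><body>Important page text</body></html>"
bad = "<html><title>Hi</title Important page text<body>end</body></html>"

print(strip_tags(good))  # Hi Important page text
print(strip_tags(bad))   # Hi end  (the broken </title swallowed the text)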
It is worth running your code through a validator just to check the syntax and ensure all of your page is being seen, as part of the job of making yourself search-engine friendly.