09-14-2004, 07:36 PM
WebProWorld Member
Join Date: Apr 2004
Location: Australia
Posts: 30
HTML vs PHP in Google SERPs
I have heard and read differing views on this subject. If you want a page that ranks well and is optimized for Google, is there any difference between index.html and index.php, assuming both have the same content? Would the HTML page rank higher just because its URL ends in .html?
I know that a lot of extra data following a ? character in the URL is not attractive to Googlebot, so we are talking about plain page URLs here.
Opinions are good but facts would be better. :)

09-15-2004, 01:13 PM
WebProWorld Pro
Join Date: Mar 2004
Location: UK
Posts: 202
At the SES conference in London I attended a session where the panel member from Google said that the page extension itself will not affect ranking. As you've said, long dynamic URLs with lots of parameters may suffer, but that's not because the pages are PHP or ASP pages, it's partly because the SEs consider some dynamic content to be so fickle that it isn't worth indexing... or so they say.
Empirically, I have sites using .ASP, .HTML and .PHP extensions. All rank similarly.

09-15-2004, 11:55 PM
WebProWorld Member
Join Date: Apr 2004
Location: Australia
Posts: 30
Well that is pretty good getting it straight from Google as it were. I suspected as much, with some of my sites getting quite decent PR using PHP pages. Conflicting opinions seem to abound on the many forums I've investigated on the subject though. Overall, it would appear that there is enough evidence to support staying with PHP.
Thanks.

09-17-2004, 08:41 PM
WebProWorld 1,000+ Club
Join Date: Aug 2003
Location: Central US
Posts: 1,581
Dynamic URLs are only part of the problem. Take this forum, and the page you are looking at right now, as an example. There can be 5 or 6 different dynamic URL "patterns" that bring up the exact same page.
Each SE takes a different approach, with Google being the most sophisticated. Some SEs will not index dynamic URLs with more than 3 parameters in them, and some will not index dynamic URLs at all. Google will eventually filter out the duplicate URLs and drop them from the index, but this takes time.
Even if you mod the board so that the pages appear to have static URLs, a few will still point to the same page. Effective use of a robots.txt file can eliminate these duplicates and steer the spider to the true page for indexing. This also gets those pages into the index faster and does not waste the spider's time on duplicate content.
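To make the "several URL patterns, one page" point concrete, here is a minimal sketch of how you might group crawled URLs that collapse onto the same page, so you can see which patterns are worth blocking in robots.txt. The parameter names in IGNORED_PARAMS and the example.com URLs are assumptions for illustration only, not taken from any particular board.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not change which page is shown.
IGNORED_PARAMS = {"sid", "highlight"}


def canonical_url(url):
    """Drop ignorable parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    # The fragment is discarded because it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each canonical URL to the list of raw URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return dict(groups)


# Hypothetical crawl log: the first three entries are the same topic page.
crawled = [
    "http://example.com/viewtopic.php?t=42",
    "http://example.com/viewtopic.php?t=42&sid=abc123",
    "http://example.com/viewtopic.php?sid=zzz&t=42&highlight=php",
    "http://example.com/viewtopic.php?t=43",
]
duplicates = group_duplicates(crawled)
```

Any canonical URL with more than one raw URL behind it is a candidate for a robots.txt rule or a link clean-up. Which parameters are safe to ignore depends entirely on the software running the site.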

09-17-2004, 09:09 PM
WebProWorld Veteran
Join Date: Sep 2004
Location: Las Vegas
Posts: 570
Dynamic Driven Shopping Carts
< self edited >

09-17-2004, 10:40 PM
WebProWorld Pro
Join Date: Aug 2004
Location: the land of lost content
Posts: 224
Interesting - but not proof positive that HTML is rated higher than PHP.
My guess is that the content of each of your pages is more individually crafted.
Maybe Google is penalising THEIR pages for being too similar.
I get this effect with my shopping cart - Google shows only a selection of my pages, whereas I have seen other dynamic sites get nearly all their pages listed because they pay attention to the content on EACH page.
So I believe I simply need to create better descriptions for each one of my products - hard work that may pay off.

09-18-2004, 12:48 AM
WebProWorld New Member
Join Date: Aug 2003
Posts: 3
Quote:
|
Originally Posted by buddhu
Empirically, I have sites using .ASP, .HTML and .PHP extensions. All rank similarly.
|
I have .asp pages that come up as the number one search result in Google. They're static pages, not dynamic, so there's no long string of parameters.

09-18-2004, 04:24 AM
WebProWorld Veteran
Join Date: Sep 2004
Location: Posse's On Broadway
Posts: 953
not the issue
I think most of us have found that rewrite changes the issue. The question now is whether the spiders can sense the rewrite and treat those pages as pages trying to look like something other than what they are. Long load times may give away heavy dynamic pages.
I have gone both ways, and so far I have yet to see the PHP pages do as well in overall ranking, but it will take time to know for sure. All my pages appear to be HTML now and are getting picked up slowly, but I think the delays are mostly down to Googlebot being a slow dog for the last few months, too lazy to go deep into anything. I think PHP has changed the playing field, and with rewrite added to the mix I expect there have been tons of algo changes to address the issue.

09-18-2004, 06:08 AM
WebProWorld 1,000+ Club
Join Date: Aug 2003
Location: Central US
Posts: 1,581
Re: not the issue
Quote:
|
Originally Posted by hoptoo
I think most of us have found that rewrite changes the issue. The question now is whether the spiders can sense the rewrite and treat those pages as pages trying to look like something other than what they are. Long load times may give away heavy dynamic pages.
I have gone both ways, and so far I have yet to see the PHP pages do as well in overall ranking, but it will take time to know for sure. All my pages appear to be HTML now and are getting picked up slowly, but I think the delays are mostly down to Googlebot being a slow dog for the last few months, too lazy to go deep into anything. I think PHP has changed the playing field, and with rewrite added to the mix I expect there have been tons of algo changes to address the issue.
|
This is a load of garbage. Sorry, but it makes no sense.
First, long delays can happen on static .html pages too, due to high server load on cheap hosting, and Googlebot will no more wait for them than anyone else will. Delays have nothing to do with whether the pages are PHP or not.
If you are using extensive (or poorly planned) mod_rewrite rules that put undue burden on the server, or your server is just not equipped to handle them in the first place, then I strongly suggest that you look for a new host or upgrade your package. The burden can also come from high traffic, in which case you should likewise upgrade or move your site.
Second, Googlebot is not a slug. If it is not crawling your site in a timely fashion, the reason is something other than the pages being PHP. Usually there are not enough backlinks pointing at your site to warrant it, or you have no new content on the home page and no new pages being added.
Googlebot is an efficient spider, and if it does not see changes it does not crawl as often. One key thing you can do is keep a constantly changing home page that gives the bots links to your fresh content. If those links are buried one or two clicks away from the home page, you will have to wait for Googlebot to crawl that page to spot them. In that case you are at the mercy of the bot's schedule for refreshing those pages in its index.
I administer a forum that Googlebot crawls daily. Sure, there are lapses in the crawl schedule, but on the whole it crawls over 1000 URLs a week (notice how I did not say 'pages' ..hehehe). Google is sending about 80 referrals a day. All of this is on a phpBB board with no mod_rewrite or spider-friendly URL modifications, and phpBB out of the box has some of the worst URLs to crawl imaginable.
There is no algo change or anything mysterious about it. They do not "sense" anything, as you put it; they will crawl you like a bat out of hell and post the results.
Googlebot cannot "sense" anything that is happening on the server side; it operates on the client side. Unless you are using PHP to do something along the lines of cloaking or some other devious behaviour, you should not have any problems. If you are, Google will find out via spam reports from your competitors.

09-18-2004, 07:48 AM
WebProWorld 1,000+ Club
Join Date: Aug 2003
Location: Edmonton, AB, Canada
Posts: 3,406
According to Google, they pay almost no heed to the '?' in URLs, and their bots adjust slightly to give pages time to generate. The shorter the URL, the easier it is to spider. Session IDs and query strings cause problems, but not always insurmountable ones.
From AKA Marketing Forum
Quote:
3) Google will crawl dynamic URL's at about a third the speed and depth at which it indexes static pages. It will barely crawl at all if there are session IDs in the query strings, because it soon discovers that multiple URLs lead to the same page and regards the site as being full of duplicate content.
Another challenge dynamic sites throw at search engines is serving up different core content at the same URL. This might result when a site has content that may be viewed at the same URL in multiple languages, depending on the browser settings, or content, such as on a news site, which changes every few minutes.
|
Digital Point
Quote:
|
Dynamic URLs are fine with me too... I've seen as many as 30 Googlebots spidering this forum (which of course has dynamic URLs) at once.
|
More at Digital:
Quote:
Interesting Observation Regarding Dynamic URLs
--------------------------------------------------------------------------------
Google had this page in its index for about 3 months but it would not spider/index the actual articles listed on the page:
http://www.equitysafeteam.com/articles&type=buying
Early last week I decided to put in an isapirewrite rule to make all these links to the individual articles appear to be static.
BAM! All the pages are now indexed.
Here is an example of how a link was before and after the change:
|
Here is a link to some info about 'mod_rewrite' for apache hosted sites:
http://www.webstractions.com/news/20...s-regular.html
From what I've been reading the last few days, it is the characters 'id' in a URL that cause most spiders to balk. It is a matter of being wary of 'spider traps'.
I am just getting a shopping cart up, and I have left only the cat numbers in the generated pages: /customer/home.php?cat=249
The order pages, newsletter sign-ups, etc. all still have a session ID.
I generate the menu links with just the cat number, and put a link to a static site map as the first link in the HTML code.
This page also has all the URLs in static HTML. The pages may be generated on the fly, but the URLs stay the same.
It is X-Cart, and this is my first experience with it, so I can't offer specific info for other carts.
It seems there are ways to handle ASP and CFM pages as well, with a mod_rewrite sort of thing.
We will be watching closely to see how this works out. I can also generate a static HTML version of the catalogue, and we may try that too.
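For anyone unsure what a rewrite rule actually does, the idea can be sketched in a few lines of Python: the server matches a static-looking path against a pattern and quietly serves the dynamic URL behind it. The /articles/... layout, the articles.php script name, and both function names are hypothetical, chosen only to illustrate the mapping. This is not the rule from the quoted post, and a real site would express it in its server's own rewrite syntax.

```python
import re

# Hypothetical rule: /articles/<type>/<id>.html -> /articles.php?type=<type>&id=<id>
STATIC_PATTERN = re.compile(r"^/articles/(?P<type>[a-z-]+)/(?P<id>\d+)\.html$")


def to_dynamic(path):
    """Translate a static-looking path into the dynamic URL that serves it."""
    match = STATIC_PATTERN.match(path)
    if match is None:
        return None  # not covered by the rule; leave the request alone
    return f"/articles.php?type={match['type']}&id={match['id']}"


def to_static(article_type, article_id):
    """Build the static-looking link to print in menus and site maps."""
    return f"/articles/{article_type}/{article_id}.html"
```

The spider only ever sees the output of to_static. The translation back to the dynamic form happens on the server, so nothing in the URL itself shows that the page is generated.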
__________________
What I am is what I am, are you what you are, or what.
Eddie Brickel

09-18-2004, 07:52 AM
WebProWorld 1,000+ Club
Join Date: Aug 2003
Location: Edmonton, AB, Canada
Posts: 3,406
Oops! Sorry dodger, I stand corrected on the session ID. I was aware of the spidering going on here, but hadn't looked at the urls!
http://www.bandofgonzos.com/phpbb/in...b5b701de1e3ac3
Might it make a difference to other bots?

09-18-2004, 08:17 AM
WebProWorld 1,000+ Club
Join Date: Aug 2003
Location: Central US
Posts: 1,581
No. You are looking at the page while logged in. Session ID numbers are turned off for all guest accounts (not logged in), which includes spiders. =)
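As a rough illustration of that behaviour (not the board's actual code), a link builder can append the session ID only for logged-in visitors, so guests and spiders always see the same clean URL. The function name, the sid parameter name, and the sample path are assumptions made for this sketch.

```python
from urllib.parse import urlencode


def build_link(path, params, session_id=None, logged_in=False):
    """Return a link that carries the session ID only for logged-in visitors."""
    query = dict(params)
    if logged_in and session_id:
        # Hypothetical parameter name; guests and spiders never get it.
        query["sid"] = session_id
    return f"{path}?{urlencode(query)}" if query else path


# A spider is treated like any other guest and gets the clean URL.
guest_link = build_link("/viewtopic.php", {"t": 42}, session_id="abc123")
member_link = build_link("/viewtopic.php", {"t": 42}, session_id="abc123", logged_in=True)
```

Because every guest request yields the same URL for the same page, the spider has no session-specific variants to mistake for duplicate content.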

09-18-2004, 08:29 AM
WebProWorld Pro
Join Date: Feb 2004
Location: Latvia, Europe
Posts: 175
Well, in my opinion page rank is simply secondary.
Of course it is needed.
I build my pages with PHP, but I give them .html extensions and have the server parse .html files as PHP. First of all, this lets me have .html pages, which look better than .php ones. Where I can, I also try to export these pages to avoid URLs like .php?param=3049
It is better to simply have a file like /my-keyword.html
and inside it you can use
<?php
// Set the parameter for this page, then let the shared script render it.
$param = 3409;
include 'generic.php';
?>
and it looks like HTML.
I love this approach.

09-18-2004, 08:29 AM
WebProWorld Member
Join Date: Jan 2004
Location: UK
Posts: 49
Hi,
I'm in the throes of a similar query myself regarding ASP pages. I have a site up for review at the moment because we're hardly getting ranked at all by any of the main search engines. We have a site map which is a list of our advertisers, one click from the homepage which has a link to every single listing page generated by our database. We have used a program to generate static URLs from our previous dynamic ones but the content just isn't being indexed. I analysed my pages with www.sitereportcard.com and it says they are full of broken links, yet I can click on every single one of them and they exist just fine.
I checked out my competition yesterday and nearly everyone at the top of the rankings in Google has HTML pages or static URLs without ASP in them. That may be our problem, because our static URLs still have .ASP at the end. I think we may have to change the main content pages of the website (excluding the database pages) over to HTML, just for starters, to try to make some headway. It's so frustrating.
Heather
__________________
EarthFire
www.earth-fire.com - eMedia specialists. Lead your competitors.

09-19-2004, 04:39 PM
WebProWorld Veteran
Join Date: Sep 2004
Location: Posse's On Broadway
Posts: 953
Thanks for the warm greeting
I was under the impression the thread was about the file extension alone, to which I posted that rewrite would seem to me to squash the issue. I can present any extension I want for any actual page content I want. So I went on to pose the question of whether there is another way the spiders can tell what is PHP and what is HTML, since the extension seemed to me to be a non-issue if it is being rewritten anyhow. If you took the post as anything other than this, then I'll take that as the 'why' in your instant flame. I'll assume you have a middle finger on your welcome mat at home.

09-20-2004, 10:53 AM
WebProWorld Member
Join Date: Aug 2004
Location: UK
Posts: 42
I have a PHP site with a section in which the .php URLs include 3 parameters. Initially Google didn't spider these pages, but at some point the site's PR rose and those pages started getting spidered, without any changes on my part. So HTML is clearly more attractive to Googlebot in certain circumstances. It's capable of spidering multiple parameters, but may not always be inclined to do so.