| Search Engine Optimization Forum SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here. |
|
|||
|
The company I work for are about to upload a shedload of pdfs to the website and I would like them to be as optimised as possible. I've never gone about optimising pdfs before though.
I did search on this forum before posting but couldn't find any information about this topic at all. I have a couple of questions which I hope you can help me answer: Firstly, does Google read and rank pdfs pretty well? Secondly, what are your best pdf optimisation tips? Many thanks in advance for any help you can give me. |
|
||||
|
I don't think GoogleBOT is able to read PDF documents (that would imply OCR-like scanning ability).
The ranking is another question based on citations / references / that translates to IBL's in WWW. Write quality documents that other people link to with different anchor text. Other members may say, submit, write about your documents etc. etc. As always it is a question about semantics and context. Write and link where it is natural and useful for the surfer. Finding your own niche may be important. |
|
||||
|
As long as the PDF is rasterized (flattened) to an image and the text is embedded, it's pretty much a cinch.
There are a few caveats though. If you save out a PDF and apply security preferences to it, the contents are encrypted and can't be read by the Search Engines. If it doesn't matter that the end user can copy text or print the page, don't lock it up. Here are a couple of articles of interest: SEO Your PDF's - Does This Work? Optimizing PDFs for SEO Good Luck! |
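The encryption caveat above can be checked before upload with a rough Python sketch. It is a heuristic, not a PDF parser: it assumes an encrypted file exposes the `/Encrypt` key of its trailer as plain bytes, and the function names are hypothetical.

```python
from pathlib import Path


def looks_encrypted(data: bytes) -> bool:
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary in the trailer.

    Assumption: the key is visible as plain bytes. A file that merely contains
    the text "/Encrypt" somewhere would be flagged too (false positive).
    """
    return b"/Encrypt" in data


def encrypted_pdfs(folder: str) -> list:
    """Return the names of the .pdf files in a folder that look encrypted."""
    return sorted(
        path.name
        for path in Path(folder).glob("*.pdf")
        if looks_encrypted(path.read_bytes())
    )
```

Running `encrypted_pdfs` over the upload folder would list the files worth re-saving without security settings if they are meant to be indexed.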
|
||||
|
We've all been seeing PDF's and content snippets listed alongside web pages in the SERPs for some time, so it would only stand to reason that the PDF files are being spidered and indexed as well.
|
|
||||
|
Quote:
Quote:
Natural search term: does googlebot spider pdf documents site:google.com
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started Last edited by kgun; 05-07-2009 at 11:56 AM. |
|
|||
|
Google does rank the PDF pages, however it is very useful to input all those optional fields under File -> Properties to help placement, so you need to use Acrobat or a full-fledged PDF editor rather than just a PDF writer.
For a product-specific search term with 453k results, our PDF page ranks 14th on Google and the HTML is listed underneath it. It might rank higher if we had a better page title. The quality of copy etc. I believe still applies, along with the basics of keywords. |
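To see which of the File -> Properties fields a PDF already carries, a minimal Python sketch along these lines may help. It assumes the document information dictionary is stored uncompressed and uses simple literal strings (no nested parentheses, escapes, or UTF-16), which is not true of every PDF; the function names are hypothetical.

```python
import re

# Matches entries such as "/Title (Widget Data Sheet)".
# Assumption: plain literal strings without parentheses or backslashes inside.
FIELD_PATTERN = re.compile(rb"/(Title|Author|Subject|Keywords)\s*\(([^()\\]*)\)")

WANTED_FIELDS = ("Title", "Author", "Subject", "Keywords")


def read_pdf_properties(data: bytes) -> dict:
    """Collect the first value found for each document property."""
    properties = {}
    for name, value in FIELD_PATTERN.findall(data):
        properties.setdefault(name.decode("ascii"), value.decode("latin-1"))
    return properties


def missing_properties(data: bytes) -> list:
    """List the wanted properties that are absent or empty."""
    found = read_pdf_properties(data)
    return [field for field in WANTED_FIELDS if not found.get(field)]


sample = b"%PDF-1.4\n1 0 obj\n<< /Title (Widget Data Sheet) /Author (Acme) >>\nendobj\n"
```

With the made-up `sample` above, `missing_properties(sample)` reports the Subject and Keywords fields as still to be filled in.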
|
||||
|
Based on spidered file content is my question.
Quote:
I looked at some of the above hits that I suggested in my search term and could not find an article on Google.com that said that GoogleBot is able to scan PDF documents and rank content based on such a scanning.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started Last edited by kgun; 05-07-2009 at 01:44 PM. |
|
|||
|
They definitely rank. I have a client that had several pdf files that had pretty high rankings. I actually spent time replacing those rankings with other html pages. The reason? The files were too big. It took forever to download them and people just bounced off the site. The moral of the story for me is to create pdf files that have a pretty fast download.
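The download-time point can be put into rough numbers with a small Python sketch. The line speed and the 10-second limit are assumptions chosen for illustration, not measured figures, and the calculation ignores latency and protocol overhead.

```python
def download_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Idealised transfer time: file size in bits divided by line speed."""
    return size_bytes * 8 / bits_per_second


def is_too_slow(size_bytes: int,
                bits_per_second: int = 1_000_000,
                max_seconds: float = 10.0) -> bool:
    """Flag files whose idealised download time exceeds the chosen limit."""
    return download_seconds(size_bytes, bits_per_second) > max_seconds
```

For example, a 5 MB file on a 1 Mbit/s connection comes out at 40 seconds under these assumptions, well past the point where visitors are likely to give up.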
|
|
|||
|
I would tend to put the content in flat HTML pages and have a .pdf hanging off that for download if required.
Why? HTML pages have your navigation around them and the opportunity to place calls to action. And you know HTML gets indexed and ranked |
|
|||
|
Sorry for going against the topic. What should one do if PDFs are not to be indexed? Will zipping the file help?
__________________
Download Free DVD Movies || Cold Sore Treatment || Best eBooks & Software Downloads || |
|
|||
|
Search engines can crawl & index PDF files. For optimizing PDF files, a few points are worth considering.
1. Use keyword-rich content in the PDF file.
2. Make proper use of headings and emphasis (<H1>, <b>, <I>) in the PDF file.
3. Include targeted keywords in the page URLs.
4. Use Acrobat 6.0 or above for creating PDF files.
5. Don't use too many graphics/images in PDF files.
6. Keep PDF files as short as possible.
These are some tips from my side. |
|
||||
|
@innominds:
When you are creating/saving your pdf file - select the security tab from the application you are using and select the 'Encrypt the PDF document' option which will prompt you for a password. Your saved file can't be indexed by any search engine. |
|
|||
|
I just searched for an arbitrary PDF file on the web. I took a snippet of text from it and searched for that snippet in "" in Google. The document was found.
Many PDF files are offered as HTML. Does that not imply that Google can read them? I think they can. They don't rank as well in my opinion for reasons I don't understand or can't explain. Just an opinion there. They do rank though. I will write a PDF file and link to it from one of my sites. Put arbitrary content in it and see what happens as an experiment because it could be useful in the future. Oh and just to get KGUN to be a bit less confrontational in the future - |
|
||||
|
Google has been reading and ranking pdf's for years.
Optimising for a pdf is the same as optimising for any other form of document. Think target market, do keyword research, build keywords into content, build content into meta tags. I've got a 2005 pdf document out there that still - rather embarrassingly - ranks highly.
__________________
Simply Clicks | SEO | SEO Training| Pay Per Click Advertising | Search Engine Powered Marketing |
|
|||
|
PDFs can be optimised and do rank. I am just reviewing the rankings report for one of our clients. This firm publishes a regular newsletter that we re-save in Adobe Acrobat with title, description, keyword tags. Links can be added to the text in the document.
How well does it work? Currently, this client has several PDFs on page 1 and many others in the top 50 results. mark chapman. |
|
|||
|
Quote:
__________________
Peace, through superior firepower. "Roach" SAP Jobs : Search Engine Optimisation : SAP |
|
|||
|
Firstly, does Google read and rank pdfs pretty well?
Yes, most of the major search engines now can read the basic contents of PDF files, though getting these pages to rank as well as HTML files is still questionable.
Secondly, what are your best pdf optimisation tips?
The simple answer is, yes. The title tag and body copy can still be optimized and the major search engines will index it accordingly. As for the Keywords and Description meta tags, Google ignores these in PDFs just as it does in HTML documents, and Yahoo!, which does use the description tag, is only halfway to where it needs to be. |
|
|||
|
We run a site for a client that has a lot of PDF White Paper downloads. All the main search engines, including Google can index PDFs.
Google doesn't really seem to give them as much priority in SERPS as HTML - they mostly come up in the results for longer tail / very topic specific searches, but they do still get returned some of the time. This site also has Google site search in it, and the PDFs routinely show in the results. Personally I don't think Google has put the same sort of effort into analysing / indexing PDF format with a view to ranking, or maybe PDFs don't tend to get as much PageRank - probably fewer links etc. Have found that Live and Yahoo tend to show PDFs more often - I'd say Live are currently the best at being able to deliver relevant PDFs in results. Quote:
|
|
|||
|
Many thanks for all the replies. By the looks of the webpages I had been looking at before posting, it seems that this topic had been overlooked for a while - nice to get a fresh view on it.
Thanks for the optimisation tips too, very useful and I will definitely be putting them into practice! Edit: just seen Clarrie's post. That format was exactly what I had planned - a list of optimised intro snippets on one page with the PDFs all linked off. |
|
||||
|
Quote:
That is no proof that PDF documents are crawled by GoogleBOT.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started Last edited by kgun; 05-08-2009 at 08:35 AM. |
|
|||
|
From a user point of view I find pdfs frustrating; on my set-up (Mac Tiger/Safari) there's around a 50/50 success rate for them loading within the browser (I often have to download them first and then open them in Acrobat), and when they do work they load very slowly. Yes, this may be a browser issue, but there are many Safari users out there.
It doesn't matter what Google thinks if your visitors are put off by slow load times or documents failing to load altogether. If it is too great a task to convert all the documents to HTML it might be worth offering the first page of each in HTML with a link to the full document so your visitors know it's worth the trouble. |
|
||||
|
Quote:
|
|
|||
|
Did a test, searched for...Impact of Managed Care in
the Developmental Disabilities Sector... and PDFs come up peppered throughout the SERPs. Looks like copy is cited and Google gives the option to view the PDF as HTML. Don't know if this would prove that some level of "spidering" occurs. Some of our clients have successful, high level traffic for their PDFs. We do encourage them to develop HTML instead of, or in addition to (as support), the PDFs. IMO HTML equivalents will fare better in results. |
|
||||
|
I googled:
Impact of Managed Care in the Developmental Disabilities Sector
Result: an HTML document starting like this: Ten Dimensions of Public-Sector Managed Care Michael A. Hoge, Ph.D., Selby Jacobs, M.D., Neil M. Thakur, M.Phil. and Ezra E.H. Griffith, M.D.
First hit: Ten Dimensions of Public-Sector Managed Care -- Hoge et al. 50 (1): 51 -- Psychiatr Serv
Second hit: http://psychservices.psychiatryonlin...nt/50/1/51.pdf
Do you see any similarities?
<quote> Looks like copy is cited and Google gives the option to view the PDF as HTML. Don't know if this would prove that some level of "spidering" occurs. </quote>
My bolding. At least not the above example. This example comes under the same category as that mentioned in my above post. I am sure that there are many more complex examples if text from a PDF document is written on n other sites on the internet. Do you find that likely? Can <link rel="Canonical" href="http://www.yourdomain.com"> prevent it? No, it cannot, since the (stolen / duplicated) HTML document can rank higher as "the original document" (especially if the PDF version cannot be spidered). So far I have seen no proof that it can.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started Last edited by kgun; 05-08-2009 at 11:55 AM. |
|
||||
|
I suggested the search and pulled up this link:
NLCDD Policy Insights Bulletin (March 2009).pub At the top of the page, Google inserted a message that it automatically creates HTML versions of PDF documents that it crawls. That should be enough proof. However, the PDF version loaded so much slower and I believe users would appreciate having an HTML version more. Also, as others noted, the HTML version can have your navigation, which I also believe gives the user a better experience, especially when it comes to a search. If you open the PDF from the search it is much more difficult to get to the home page or other pages on the site. cd :O) |
|
||||
|
Quote:
<quote> This is the html version of the file http://www.nasddds.org/pdf/PolicyInsightsBulletin(March2009).pdf. Google automatically generates html versions of documents as we crawl the web. </quote> Where are these documents placed? I use Opera's principle and do not rely on any site on the internet. Do you rely on the above? Compare the hits: Google automatically generates html versions of documents as we crawl the web Google automatically generates html versions of documents as we crawl the web site:google.com Do you find any proof on google.com?
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started Last edited by kgun; 05-08-2009 at 12:23 PM. |
|
|||
|
Does this link help answer it?
Official Google Webmaster Central Blog: First date with the Googlebot: Headers and compression
In there is some text:
Googlebot: Website, let me give a bit more background. After actually downloading a file, I use the Content-Type header to check whether it really is HTML, an image, text, or something else. If it's a special data type like a PDF file, Word document, or Excel spreadsheet, I'll make sure it's in the valid format and extract the text content. Maybe it has a virus; you never know. If the document or data type is really garbled, there's usually not much to do besides discard the content.
My understanding is that Googlebot visits a site and crawls all the links for file types. It then decides to look at particular files that claim to be of a given type, interrogates each file further by looking at its content, and then lets the algorithm decide how to rank it or include it in the index. Is this what you are looking for, Kgun? Or am I missing the point again? |
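The check described in that quote, comparing the declared type with what the file really contains, can be illustrated with a small Python sketch. This is not Googlebot's implementation, only an illustration of the idea: it guesses the declared type from the file name with the standard `mimetypes` module and compares it with the `%PDF-` marker that PDF files begin with. The function names are hypothetical, and the sketch assumes the marker sits at the very start of the file.

```python
import mimetypes

PDF_MAGIC = b"%PDF-"  # the marker that opens a PDF file


def declared_type(filename: str) -> str:
    """Guess the Content-Type a server would typically send for this name."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


def is_consistent_pdf(filename: str, first_bytes: bytes) -> bool:
    """True when the name claims PDF and the content starts like one."""
    return (
        declared_type(filename) == "application/pdf"
        and first_bytes.startswith(PDF_MAGIC)
    )
```

A file named `report.pdf` that actually starts with `<html>` would fail this check, which is the kind of "garbled" case the quote says gets discarded.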
|
||||
|
Quote:
Quote:
I have seen enough nonsense from informal Google sites. Here is an example: AdSense: Support bad without responsibility? More precisely here: http://www.google.com/support/forum/...c9227b77&hl=en That is even from the Google.com forum. The people participating there are not Google employees. They have no responsibility on behalf of Google as far as I know.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started Last edited by kgun; 05-08-2009 at 12:45 PM. |
|
||||
|
Proof that google understands and spiders pdfs:
1. Search Google for manual .pdf
2. Click VIEW AS HTML on any of the PDF results.
3. The resulting link is the PDF, presented as HTML, with all search words highlighted, served from Google's site. (ie: The Manual The attached manual was located by the Manchester (England) Metropolitan Police during a search an al Qaeda) (Google is google)
As far as ranking and optimizing for spidering, that's a different story. Not all PDFs are just a collection of images. Those originally generated directly from publishing layout software usually contain the text content with instructions on how to position and display it. If your PDFs are scans, consider running them through OCR so the text is in the file as, well, text. Otherwise, you could also use related text around, and descriptive links to, the PDF, but that won't be nearly as effective.
Here's some fun: find a PDF that has text that can be highlighted as text, change its extension to .txt and load it in your favorite text editor.
__________________
I liken SEO to voodoo and make a sacrifice of rum and decapitate a chicken to Papa Legba, spirit of communications and crossroads, before every site launch. Last edited by flhu; 05-08-2009 at 01:01 PM. |
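The distinction drawn above between text-based PDFs and pure scans can be roughly tested in Python. This is only a heuristic sketch under stated assumptions: a page that shows text needs a font resource, so it looks for the `/Font` key in the raw bytes, and it treats a file with image objects but no fonts as a likely scan. PDFs that keep their resource dictionaries in compressed object streams will come out as "unknown". The function name is hypothetical.

```python
def classify_pdf(data: bytes) -> str:
    """Return "text", "image-only" or "unknown" for raw PDF bytes.

    Assumption: resource dictionaries are stored uncompressed, so the
    /Font and /Subtype /Image keys are visible as plain bytes.
    """
    has_fonts = b"/Font" in data
    has_images = b"/Subtype /Image" in data or b"/Subtype/Image" in data
    if has_fonts:
        return "text"        # includes scans that have been through OCR
    if has_images:
        return "image-only"  # likely a scan with no extractable text
    return "unknown"
```

Files reported as "image-only" are the candidates for the OCR step suggested in the post.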
|
|||
|
Ok, my link was for the google webmaster blog, however the link below is directly on the Google.com website
Site not appearing in search results, or appearing lower - Webmasters/Site owners Help That isn't a Google forum message answer so I take it as an "official employee" answer. The relevant part is: Googlebot processes each of the pages it crawls in order to compile a massive index of all the words it sees and their location on each page. In addition, we process information included in key content tags and attributes, such as title tags and alt attributes. Google can process many types of content. However, while we can process HTML, PDF, and Flash files, we have a more difficult time understanding (e.g. crawling and indexing) other rich media formats, such as Silverlight. |
|
||||
|
Google:
"C++ Builder 2009 Professional::: Getting started PDF download." Results 1 - 1 of 1 for "C++ Builder 2009 Professional::: Getting started PDF download.". (0.33 seconds) Learn object oriented programming, at OopSchool.com C++ Builder 2009 Professional::: Getting started PDF download. Kjell Gunnar Bleivik 05.01.2009::: Computing or drawing? Today, French-Russian mathematician ... Learn object oriented programming, at OopSchool.com - It can not be viewed as an html document today, may 8 2009.
User-agent: *
Disallow: error_log
Disallow: .ftpquota
Disallow: /cgi-bin/
Disallow: /include/
Disallow: /javascript/
Disallow: /styling/

and here

#The following line allow html extension pages to act as php pages
AddType application/x-httpd-php .php .html .htm
#The following line removes the identifier telling the user that the page uses PHP
#Header unset X-Powered-By
#Only include this line once to enable the rewriting engine
RewriteEngine on
#Restricts access
## File paths are relative to the Document Root (/)
# '404 Not Found' error
ErrorDocument 404 /404.htm
# '403 Forbidden' error
ErrorDocument 403 /my.htm
# '401 Unauthorized' error
ErrorDocument 401 /401.htm
# Or..
# ErrorDocument 401 "The webserver could not authorise you for content access.
order deny,allow
allow from all

is my addon domain .htaccess

Here are the most important parts of my main domain .htaccess

#The following line allow html extension pages to act as php pages
#AddType application/x-httpd-php .php .html
#The following line removes the identifier telling the user that the page uses PHP
#Header unset X-Powered-By
#Only include this line once to enable the rewriting engine
RewriteEngine on
#Restricts access
## File paths are relative to the Document Root (/)
# '404 Not Found' error
ErrorDocument 404 /404.htm
# '403 Forbidden' error
ErrorDocument 403 "Sorry: We have no capacity to allow you access now. Please, try later.
#ErrorDocument 403 "Sorry: We are upgrading our forum. Please, try later.
# Or..
#ErrorDocument 403 /403.htm
# '401 Unauthorized' error
ErrorDocument 401 "The webserver could not authorise you for content access.
# Or..
#ErrorDocument 401 /401.htm
#
# Managing server access
#
<Files "config.php">
Order Allow,Deny
Deny from All
</Files>
#
<Files "common.php">
Order Allow,Deny
Deny from All
</Files>
order deny,allow
deny from all
#
# White list start ..............................
# White list end
# The next line is commented out when the above white list is used.
allow from all
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started Last edited by kgun; 05-08-2009 at 01:41 PM. |
|
|||
|
You're right kgun, it is difficult to find definitive documentation from Google that they officially crawl the content of PDF files and rank specifically on content instead of IBL influence. I still believe they can do it.
There are only two links to the PDF at www dot nlcdd dot org/managedcare/policy-bulletin-short.pdf: one from the site www dot nlcdd dot org as follows [a href="www dot nlcdd dot org/managedcare/policy-bulletin-short.pdf" target="_blank" title="PDF file opens in a new window."][img src="images/nlcdd-policy.gif" alt="NLCDD Policy Brief on Managed Care in DD Field - Click Here!" width="163" height="181" id="brief" /][/a] (linked from an image, not text; the alt attribute has "Managed Care in DD Field", not an exact match to the search phrase, which may have something to do with IBL, I doubt it though) and one from a larger PDF at www dot nlcdd dot org/pdf/PolicyInsightsBulletin(March2009).pdf --from PDF doc-- "...which is posted on the website of the National Leadership Consortium on Developmental Disabilities at www dot nlcdd dot org/managedcare." So why would a search for "managed care in the developmental disabilities sector" bring this PDF up as #2 in SERP with a snippet of content including the phrase if Google hadn't crawled the document in some way? |
|
||||
|
Quote:
|
|
||||
|
I searched using Google.com. The message was inserted by google.com.
I typed "google indexing pdf" into Google's SE and found tons of links from sites that I would consider authoritative: Google Now Indexing Text Within Scanned Adobe PDF Files Official Google Blog: A picture of a thousand words? Google Does PDF & Other Changes - Search Engine Watch (SEW) So, Kgun, I'm not sure what "proof" you are looking for, but the evidence seems spot on to me. But, like I said, I think the user experience is better if the pdf is converted into html and then the pdf can be offered as a download for a potentially fancier, portable presentation/document. cd :O) |
|
|||
|
When a client insists on a PDF I will continue to suggest no security measures on the doc and that it have great content full of targeted keywords/phrases and links to their site(s), if possible. And I will stick to my theory that Google can crawl and index PDFs based on that content.
|
|
|||
|
I have some information on SEO for PDFs and a video here: Link building with PDFs
I'm not sure what damage password protecting a PDF might do to be honest after reading the above so maybe ignore that bit. |
|
|||
|
The phrase I searched for was at the bottom of the pdf. There were only 2 results returned. First place was the document found.
I didn't post links because in the other forums I frequent it is frowned upon. PDFs are crawled to be converted into HTML, they are indexed and they are ranked. |
|
||||
|
Please do the internet community a favor and talk your client out of using pdf's.
They suck. And they just keep getting worse. Here's a couple of links: pdfs suck - Yahoo! Search Results (I changed it to the SE results because there are just so many sites out there explaining the numerous ways PDFs aren't fit for use except in very, very limited circumstances.)
__________________
Take a break and watch some stupid video clips |
|
||||
|
I agree with texxs; PDFs are awful as web content. It seems everyone on this post is off on academic discussions on PDFs getting crawled or not when the first question to Vithe should be why all the PDFs?
PDFs were never meant to replace HTML and just break people's surfing experience. I often see them used out of pure laziness; people don't want to recreate marketing material in print and online formats... Now, for manuals and other documents that mostly get printed, yeah, use PDFs, but for web content they are the ultimate party poopers |
|
|||
|
Well to give you a bit of background, my company has never uploaded its press releases to the web so we have loads (read: like 60) stored up.
My choices seem to be a) upload them as they are or b) take the content and make a load of brand new html pages. Unfortunately I'm not that technical - I don't ever create new pages, I have to get one of the IT guys to do that - but I can add to existing pages through the CMS. So, option a) would be fine for me to do because I can upload the pdfs using our FTP, whilst option b) would involve having to get one of our technical guys to spend ages creating new pages. What I was hoping was that PDFs could be read by search engines quite well so option a) would suffice and I could do the work myself. This is why I was planning to go down the snippet + pdf route. I realise that this might seem a bit of a lame reason - and probably makes me look quite daft to those of you with much more technical knowledge - but the practical side of it is quite important for me. I really appreciate all of the comments that have been made here and I can see the benefits of adding the page in proper online format. I'm not keen on PDFs when I'm surfing the web either. I think I will try to go down the html route as it really seems that that is going to be the best for both search engines and visitors. Pity the poor overworked technical team. Thanks for all your comments. I hope you don't think me too naive but my background is mostly as an SEO copywriter and I'm having to do some catch up on the technical side of things. |
|
||||
|
Quote:
Doing this will give you the best result, but it might be an awful lot of work.
__________________
FREE SEO ! Really? YES! All you have to do is implement it! Follow me on Twitter PeterIMC |
|
||||
|
Quote:
Hate to keep spouting off negativity, but PDFs aren't good for printing either. The only time you should use them is when you are required to have an encrypted document that the average person can't edit, as in a contract. Even then you should take the time to learn how to "lock" it in the software you created the document in in the first place. The whole PDF thing just doesn't make sense to use EVER in my book: Here's a typical efficient workflow:
Here's the PDF workflow
Now why does anyone use PDFs again? Please, there must be some reason people use PDFs? Is it really because they don't want to take 5 minutes to learn how to lock documents in their word processor? Seriously?
__________________
Take a break and watch some stupid video clips |
|
||||
|
Quote:
IMO, Because it is portable, platform independent. That was the main reason, at least as far as I understand, that Adobe made Acrobat reader and the professional version that can make PDF documents. I create them very fast in MS Word. When the Word document is finished, I click the PDF button and a document of 50 pages is created in a few minutes. My bolding. Another example of WWW rumor or fact? |
|
|||
|
Quote:
How about zipping the pdf file? Will it index it?
__________________
Download Free DVD Movies || Cold Sore Treatment || Best eBooks & Software Downloads || |
|
|||
|
Mozilla now supports PDF too, so maybe Google is also able to crawl PDF content.
__________________
Granite Worktops London |
|
|||
|
Quote:
I have never heard about using pdf's for search engine marketing, but I guess it could be done. It is certainly worth exploring.
__________________
Bill Platt - The Phantom Writers & Performance Based SEO. Bill Platt Services News |
|
|||
|
You could also put your pdf files into a special directory, then tell the robots in your robots.txt that the directory is off limits to them.
__________________
Bill Platt - The Phantom Writers & Performance Based SEO. Bill Platt Services News |
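The robots.txt suggestion can be tried out locally with Python's standard `urllib.robotparser` before anything goes live. This is a minimal sketch: the directory name `/private-pdfs/`, the example.com URLs and the helper name are made up for illustration, and robots.txt only asks well-behaved crawlers to stay out; it does not protect the files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that puts one PDF directory off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-pdfs/
"""


def is_crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse robots.txt text and ask whether the agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With the rules above, a PDF under `/private-pdfs/` is reported as blocked while a PDF elsewhere on the site is still allowed.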
|
||||
|
Try out this free software A-PDF INFO Changer: A free utility for reading and changing properties of PDF files, includes author, title, subject, keywords.! [A-PDF.com] for your PDFs.
Create a PDF file properly, get it indexed and tell us what happens. Here are some tips that may help: http://searchengineland.com/eleven-t...-engines-12156
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO Last edited by Webnauts; 05-11-2009 at 11:26 AM. |
WebProWorld is an iEntry, Inc. ® site - © 2009 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |