07-31-2005, 10:43 PM
WebProWorld Pro
Join Date: Oct 2004
Location: NYC, USA
Posts: 148
Program or script to strip links from html
I need a program or script that will take an html file or a section of html and remove all the links--i.e., all the <a href....... you know what I mean.
Problem is that I have a public-domain document I want to add to my site (US-gov-created), but they did that annoying thing of hyperlinking every term of any importance to other web pages, and I just want to take them all out.
I do not, however, want to remove their other HTML tags, since the HTML is perfectly standards-compliant with nothing weird.
I would prefer a solution that leaves the anchor text but only removes the anchor tags.
Any ideas?

08-03-2005, 05:03 PM
WebProWorld New Member
Join Date: Nov 2003
Posts: 3
You might try copying the page source into an MS Word document, then highlighting the entire page and removing the hyperlinks.
Correction: Highlight the entire document, then press Ctrl-Z. This will remove all hyperlinks and should leave the HTML alone.
__________________
Visit The Las Vegas Gambler on line at http://www.LasVegasGambler.net for info on how to play our games and how to win with casino stocks.

08-04-2005, 04:53 AM
WebProWorld Veteran
Join Date: Aug 2003
Location: Cornwall, UK
Posts: 862
This isn't pretty but does strip tags (requires PHP 4.3.0 or later):
Code:
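<?php
// Load the page to strip (a URL or a local file name both work).
$data = file_get_contents('http://www.tolranet.co.uk');
// Remove closing anchor tags, then opening anchor tags that carry an href.
// The i flag also catches upper-case tags such as <A HREF=...>.
$data = preg_replace('#</a\s*>#i', '', $data);
$data = preg_replace('/<a\s[^>]*href[^>]*>/i', '', $data);
// Write to browser
echo $data;
// Write to file
$f = fopen('noatags.html', 'w');
fwrite($f, $data);
fclose($f);
?>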
Put that into a file, e.g. strip.php, and change the URL of the page to strip from www.tolranet.co.uk to whatever you want. You don't have to use a URL; you can use the name of a file on the same machine.
When you access the script from a browser it will load the page at the URL without the <a> tags, and save it to a file called noatags.html (assuming you have write permissions for that file).
Any problems with it let me know.
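One thing to watch with the script above: it removes every closing </a>, including those that belong to anchors without an href (for example <a name="top">), so those anchors end up unclosed. A rough sketch of a variant that unwraps only href links as complete pairs follows. The function name strip_links and the sample markup are made up for illustration, and a regular expression is not a real HTML parser, so unusual markup (such as a > inside an attribute value) may still slip through.

```php
<?php
// Sketch only: strip_links() is a made-up helper name, not part of PHP.
// Unwraps <a ... href=...>text</a> pairs and keeps the text in between.
// Anchors without an href (such as <a name="top">) are left alone.
function strip_links($html)
{
    // i = match <A HREF> as well as <a href>; s = let . span line breaks.
    $pattern = '~<a\b[^>]*\shref\s*=[^>]*>(.*?)</a\s*>~is';
    return preg_replace($pattern, '$1', $html);
}

// Hypothetical sample input, not taken from any real page.
$sample = '<p>See <a href="x.html" class="c">the <b>term</b></a>, '
        . '<A HREF="y.html">another</A> and <a name="top">an anchor</a>.</p>';
echo strip_links($sample), "\n";
```

To use it in the script above, the two replacement lines could be swapped for $data = strip_links($data);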

08-04-2005, 10:12 AM
WebProWorld 1,000+ Club
Join Date: May 2005
Location: Norway
Posts: 5,402
Should be simple and effective in C++, C# or Java.
A simple, effective solution if you have the tools and expertise in your organization:
It is simple string manipulation (on a pointer / array).
More advanced:
A simple database such as MS Access has a "hyperlink" data type.
1. Parse the code and put the URLs into Access
(URL in one field and anchor text in a text field
- or was it record?)
2. Now it is easy to sort etc.
3. By combining it with Visual Basic for Applications and (embedded) SQL you should be able to write programs that do the operations you need on the database.
http://www.techonthenet.com/access/f...reate_date.php
C# is perhaps the most effective language. Its advanced IntelliSense functionality makes it extremely productive.
Some helpful links:
http://www.c-sharpcorner.com/code/20...ngLanguage.asp
http://msdn.microsoft.com/vstudio/
http://groups.msn.com/
http://www.microsoft.com/communities...ortalHome.mspx
http://msdn.microsoft.com/chats/
http://forums.microsoft.com/msdn/
Perhaps not a simple solution, but your question may be the top of an iceberg?
P.S. As gambler said, open the document in Word. Then you can save it in different formats, and because of the DDE in MS Office you may:
1. Save it as plain text.
2. Import that text into Excel in different columns (depends on the separator for columns).
3. Save it as an Excel file.
4. Import it from Excel to Access as a database where you select fields and records.
5. Perhaps import it directly into Access from Word. Use Help to check for possibilities.
6. Import it from Access to Oracle, Sybase or MySQL etc.
7. Combine it with a crawler and you have an auto-generated directory à la http://www.craigslist.com/
It is mostly (some will say only) a programming (and embedding) task.
Kjell Gunnar Bleivik
http://www.multifinanceit.com/
http://www.blognorway.com/

08-04-2005, 01:01 PM
WebProWorld Pro
Join Date: Jul 2005
Location: Eielson AFB, AK
Posts: 174
Don't use Word - you'll get a ton of IE proprietary code and lose your W3C compliance.
I'd go with the PHP solution that was previously mentioned.

08-07-2005, 03:23 PM
WebProWorld Pro
Join Date: Oct 2004
Location: NYC, USA
Posts: 148
Hi Speed,
I'm a complete PHP idiot. I saved it as striplinks.php without making any changes (to test it out with your URL), and when I opened it, all that came back was: ', '', $data); $data = preg_replace('/]+href[^>]+>/', '', $data); // Write to browser echo $data; // Write to file $f = fopen('noatags.html', 'w'); fwrite($f, $data); fclose($f); ?>
No html file was created. What went wrong?
Thanks

08-07-2005, 03:56 PM
WebProWorld Veteran
Join Date: Aug 2003
Location: Cornwall, UK
Posts: 862
It sounds like you don't have the ability to run PHP on that web server.
However, it's worth checking that the <?php tag is at the start of the file. Assuming it is, you need to check whether your web host supports PHP and whether you need to do anything special to run PHP scripts. I know of one host where PHP scripts have to be uploaded to a different area from normal HTML pages.
If the host doesn't support PHP then we'll have to have another think, unless you can borrow some PHP-enabled web space to strip the documents.
Let me know how you get on.

08-07-2005, 07:00 PM
WebProWorld Pro
Join Date: Oct 2004
Location: NYC, USA
Posts: 148
Hi--thanks a lot. Is it possible to get the script to work with a file located on the local hard drive and not just one uploaded to the server? That would be a big time-saver: just set a default filename, strip.htm, save any file I want to strip under that name, and voila!
Found my server's scripts directory, thanks.

08-07-2005, 08:05 PM
WebProWorld Veteran
Join Date: Aug 2003
Location: Cornwall, UK
Posts: 862
Yes, it's possible to run PHP on a local machine. http://www.php.net/downloads.php is the main PHP download page, and http://www.firepages.com.au/devindex.htm has a complete PHP and Apache setup for Windows.
Changing the script to:
Code:
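<?php
// Load the local file to strip (it must sit in the same folder as this script).
$data = file_get_contents('strip.html');
// Remove closing anchor tags, then opening anchor tags that carry an href.
// The i flag also catches upper-case tags such as <A HREF=...>.
$data = preg_replace('#</a\s*>#i', '', $data);
$data = preg_replace('/<a\s[^>]*href[^>]*>/i', '', $data);
// Write to browser
echo $data;
// Write to file
$f = fopen('noatags.html', 'w');
fwrite($f, $data);
fclose($f);
?>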
This lets you upload the file to convert as strip.html into the same folder as the PHP script above, access the script from the browser, and download noatags.html.
If you've got command line access you can invoke the script with something like "php strip.php" (depending on your installation) rather than accessing it with a browser.
I don't know how many files you have to strip. If it's only a few then it's not worth installing PHP locally; if you've got hundreds then it would probably be better to update the script to convert all HTML files in a folder so you can bulk upload, convert, and download.
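For the command line route, here is a rough sketch that takes the input and output file names as arguments instead of fixing them in the script. The helper name strip_file and the argument order are my own choices for illustration, and it assumes the script is saved as strip.php.

```php
<?php
// Sketch only: strip_file() is a hypothetical helper, and the two file names
// come from the command line instead of being fixed in the script.
function strip_file($inFile, $outFile)
{
    $data = file_get_contents($inFile);
    if ($data === false) {
        return false; // input could not be read
    }
    // Same kind of replacement as the script above (case-insensitive here).
    $data = preg_replace('#</a\s*>#i', '', $data);
    $data = preg_replace('/<a\s[^>]*href[^>]*>/i', '', $data);

    $f = fopen($outFile, 'w');
    if ($f === false) {
        return false; // output could not be opened for writing
    }
    fwrite($f, $data);
    fclose($f);
    return true;
}

// Only runs when started as: php strip.php input.html output.html
if (PHP_SAPI === 'cli' && isset($argv[1], $argv[2])) {
    echo strip_file($argv[1], $argv[2]) ? "Done\n" : "Failed\n";
}
```

Invoked as "php strip.php strip.html noatags.html" this should behave like the script above, minus the echo of the page to the browser.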

08-20-2005, 06:55 PM
WebProWorld Pro
Join Date: Oct 2004
Location: NYC, USA
Posts: 148
How do I do it so I can do it in bulk?
How bad is this? Just remove the .html to go from a file to a directory?
Code:
<?php
$data = file_get_contents('strip');
$data = str_replace('</a>', '', $data);
$data = preg_replace('/<a[^>]+href[^>]+>/', '', $data);
// Write to file
$f = fopen('noatags', 'w');
fwrite($f, $data);
fclose($f);
?>

08-21-2005, 06:03 AM
WebProWorld Veteran
Join Date: Aug 2003
Location: Cornwall, UK
Posts: 862
Put the following in a .php file:
Code:
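<?php
// Convert every file in 'in' and write the result to 'out' under the same name.
$d = opendir('in');
while (($file = readdir($d)) !== false) {
    // Skip the '.' and '..' entries and any sub-folders.
    if (is_file('in/' . $file)) {
        $data = file_get_contents('in/' . $file);
        // Remove closing anchor tags, then opening anchor tags that carry an href.
        $data = preg_replace('#</a\s*>#i', '', $data);
        $data = preg_replace('/<a\s[^>]*href[^>]*>/i', '', $data);
        // Write to browser
        echo "file $file<br />\n";
        // Write to file
        $f = fopen('out/' . $file, 'w');
        fwrite($f, $data);
        fclose($f);
    }
}
closedir($d);
echo "Done...<br />\n";
?>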
Create a folder called 'in' and a folder called 'out' in the same folder as the script. NOTE: 'out' must be writable by the script.
Upload all the .html files to the 'in' folder, access the script from a web browser, then download all the converted ones from 'out'.
The script will overwrite files in the 'out' folder that have the same name.
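If the 'in' folder might also hold images or other non-HTML files, a variant along these lines could skip them. This is only a sketch: the helper name strip_folder is made up, it assumes PHP 5 or later for scandir and file_put_contents, and it expects the output folder to exist already.

```php
<?php
// Sketch only: strip_folder() is a hypothetical helper. Assumes PHP 5 or
// later (scandir, file_put_contents) and that $outDir already exists.
function strip_folder($inDir, $outDir)
{
    $converted = array();
    foreach (scandir($inDir) as $name) {
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        // Skip sub-folders and anything that is not .html or .htm.
        if (!is_file("$inDir/$name") || !in_array($ext, array('html', 'htm'))) {
            continue;
        }
        $data = file_get_contents("$inDir/$name");
        $data = preg_replace('#</a\s*>#i', '', $data);
        $data = preg_replace('/<a\s[^>]*href[^>]*>/i', '', $data);
        file_put_contents("$outDir/$name", $data);
        $converted[] = $name;
    }
    return $converted; // names of the files that were written
}

// Only runs when the 'in' and 'out' folders described above exist.
if (is_dir('in') && is_dir('out')) {
    foreach (strip_folder('in', 'out') as $name) {
        echo "converted $name\n";
    }
}
```

As with the script above, files in 'out' with the same name are overwritten.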