 |

10-17-2003, 11:41 AM
|
 |
WebProWorld Veteran
|
|
Join Date: Oct 2003
Location: Northern Ireland
Posts: 498
|
|
Implementing a 'Search' option in my web site.
As the number of articles etc. on my web site is increasing all the time, I will have to look at implementing a search option soon.
I do not want to use an external 'free' site search option that displays adverts on the results pages, and I don't particularly want to pay for one either.
This is completely new territory for me.
Does anyone have an 'Idiot's Guide' to the above?
NB - the ISP that hosts our web site uses a Unix / Apache web server, I think.
Thanks in advance.
|

10-17-2003, 03:05 PM
|
 |
WebProWorld Veteran
|
|
Join Date: Aug 2003
Location: Grand Rapids, MI USA
Posts: 553
|
|
What you are trying to accomplish would most likely be done best with PHP and MySQL. You might be able to find some information on them by searching Google.
hotscripts.com also has some premade scripts. Although I have never used any of them, this one looked good: http://www.hotscripts.com/Detailed/10525.html
|

10-20-2003, 07:02 AM
|
 |
WebProWorld Veteran
|
|
Join Date: Oct 2003
Location: Northern Ireland
Posts: 498
|
|
Thanks.
This does look promising. I will look into it further.....
|

10-24-2003, 12:30 PM
|
 |
WebProWorld Veteran
|
|
Join Date: Aug 2003
Posts: 659
|
|
One thing to consider in your quest for a site search script is that many free scripts will index every HTML and plain-text file on your site, including your robots.txt file, .htaccess, and your Perl scripts. I recommend that you select a script that allows you to exclude certain files or even entire directories.
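As a rough illustration of the kind of exclusion rule recommended above, here is a minimal sketch in Python (chosen only for readability; the scripts discussed in this thread are PHP and Perl). The names `EXCLUDED_NAMES`, `EXCLUDED_DIRS`, `INDEXED_EXTENSIONS` and `should_index` are hypothetical and are not taken from any script mentioned here.

```python
from pathlib import PurePosixPath

# Hypothetical exclusion rules; adjust these to your own site layout.
EXCLUDED_NAMES = {"robots.txt", ".htaccess"}
EXCLUDED_DIRS = {"cgi-bin", "private"}
INDEXED_EXTENSIONS = {".html", ".htm", ".txt"}


def should_index(relative_path: str) -> bool:
    """Return True if a site-relative path should go into the search index."""
    path = PurePosixPath(relative_path)
    # Never index server or crawler control files.
    if path.name in EXCLUDED_NAMES:
        return False
    # Skip anything inside an excluded directory, at any depth.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    # Only index the file types we actually want visitors to find.
    return path.suffix.lower() in INDEXED_EXTENSIONS
```

With these assumed rules, `should_index("articles/intro.html")` is accepted, while `robots.txt`, anything under `cgi-bin/`, and image files are skipped.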
|

10-25-2003, 05:49 PM
|
 |
WebProWorld New Member
|
|
Join Date: Sep 2003
Location: Poland
Posts: 22
|
|
'Search' option in your web site
Hello,
Do you want to search within the contents of your website?
Things may be simpler if the "number of articles etc" are in a database.
Database search is usually a rather common task.
However, if your website consists of a set of HTML files, you need to scan them using regular expressions to build a database of keywords.
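To make the idea concrete, here is a minimal sketch in Python of scanning HTML with regular expressions to build a keyword "database" (here just an in-memory dictionary). All function names are hypothetical, and regex-based tag stripping is only an approximation; a real HTML parser would be more robust.

```python
import re
from collections import defaultdict

# Rough regex cleanup: drop <script>/<style> blocks, then any remaining tags.
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z0-9]+")


def extract_words(html):
    """Strip script/style blocks and tags, then return lowercase words."""
    text = SCRIPT_STYLE_RE.sub(" ", html)
    text = TAG_RE.sub(" ", text)
    return WORD_RE.findall(text.lower())


def build_index(pages):
    """Map each keyword to the set of page names that contain it."""
    index = defaultdict(set)
    for name, html in pages.items():
        for word in extract_words(html):
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

The index would normally be rebuilt whenever pages change and stored somewhere persistent (a file or a database table); that part is left out of the sketch.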
BTW: have a look at: http://www.ij.nq.pl/english/technology.html#index
Igor
|

10-26-2003, 09:59 AM
|
 |
WebProWorld Pro
|
|
Join Date: Jul 2003
Location: EU, Poland
Posts: 139
|
|
If your server is MS-based, you might consider implementing a solution based on Microsoft Indexing Service.
|

10-27-2003, 07:06 AM
|
 |
WebProWorld Veteran
|
|
Join Date: Oct 2003
Location: Northern Ireland
Posts: 498
|
|
Thanks everyone for the replies.
The Google search in particular is an appealing option for someone like me (lazy), although I don't think it would look quite as professional as having your 'own' search function.
After a bit of further investigation, the ISP that hosts my site - www.tibus.net - will provide a pretty decent, easy to implement search facility for an extra £10 per month. As we would have to pay extra anyway for the privilege of running PHP etc, I think this is probably the way we will go. (Although I might mess with the Google option for a bit first).
|

11-01-2003, 12:13 PM
|
|
WebProWorld New Member
|
|
Join Date: Oct 2003
Posts: 6
|
|
php search engine
Don't pay your host anything extra before trying this:
http://www.isearchthenet.com/isearch/
You can see the search script in action on my site (link in signature). The only advertising in the script is a link back to isearch on the results page.
|

11-01-2003, 01:05 PM
|
|
WebProWorld Veteran
|
|
Join Date: Sep 2003
Location: SD
Posts: 771
|
|
If this works as well as it looks, I will gladly be making a donation!
|

11-03-2003, 07:34 AM
|
|
WebProWorld Veteran
|
|
Join Date: Sep 2003
Location: SD
Posts: 771
|
|
Has anyone tried working with this?
I'm having a problem with it going very slowly partway through spidering my site.
|

09-28-2004, 12:01 PM
|
 |
WebProWorld Veteran
|
|
Join Date: Oct 2003
Location: Northern Ireland
Posts: 498
|
|
Looks like you need to have an SQL database available to you to be able to use this (we do not have one with the hosting package we have).
Thanks for the info anyway.
|

09-28-2004, 04:08 PM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: Aug 2003
Location: Edmonton, AB, Canada
Posts: 3,406
|
|
I have been keeping my eyes open for a while now (at least 6 months) and have found a few very promising ones.
I do not have MySQL, but there are two types of server-side engines I know of. With one, you create an index on your own computer and upload it. It is not dynamic, so the index must be rebuilt every time the site changes.
The other is CGI/Perl.
This is a PHP-based one - no MySQL (in case someone is interested):
Triality
Quote:
|
Triality is my new php search engine. It does not need a database or indexing. It is a spider/crawler. It now includes customizable results headers so you can make it look how you want. Triality automatically searches up to 2 directories deep. It can also take search options, so a user can select an option, and Triality will start digging down at that path instead of the main level. It can also be used as complete file listing for a website, it will build the complete file tree
|
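For readers wondering what an index-free search of this kind might look like, here is a minimal sketch in Python. It is not Triality's code; the function name `search_files`, the meaning given to "2 directories deep", and the list of file extensions are all assumptions made for illustration.

```python
import os


def search_files(root, query, max_depth=2, extensions=(".html", ".htm", ".txt")):
    """Scan files under root at query time, descending at most max_depth directories."""
    needle = query.lower()
    root = os.path.abspath(root)
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Assumption: the root itself is depth 0, its subdirectories depth 1, and so on.
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # tell os.walk not to descend any further
        for filename in filenames:
            if not filename.lower().endswith(extensions):
                continue  # skip images and other non-text files
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                if needle in handle.read().lower():
                    matches.append(os.path.relpath(path, root))
    return sorted(matches)
```

Because every search re-reads the files, this approach needs no database or index, but it gets slower as the site grows, which is the trade-off against the indexed engines discussed elsewhere in this thread.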
However, these guys make the most excellent scripts I have run - I am not kidding.
They will even install it for you, for free.
This is perl, and it has everything I can think of, including searching PDFs:
Quote:
What is the Fluid Dynamics Search Engine?
FDSE is a search engine that you install on your own site. Visitors to your site use it to find files on your site or on a small cluster of sites. The search box at the top of this page is an example of how FDSE is typically used.
FDSE is different than Google or Altavista, which search the entire Internet. FDSE only searches the sites that you tell it to. It can handle about 10,000 documents in all, which is plenty for one site but much fewer than the total number of documents on the Internet. (more info on size limits)
FDSE is smaller than Google or Altavista, but it is qualitatively identical to them. It has its own built-in web robot for retrieving files, which means it is not limited to searching only documents on its own server. It builds its own index files and returns results from them, unlike some "meta-search" scripts which make behind-the-scenes requests to major search engines to gather results.
FDSE runs entirely on your server, so visitors aren't redirected to a separate centralized server to get their results (as with Atomz and Freefind). If your web server doesn't support Perl CGI at all, then you might be better off with one of those remotely-hosted solutions.
FDSE is a flat search engine - it accepts keywords and shows a ranked list of search results. It does not organize pages into browsable categories and subcategories like Yahoo does.
Features and Benefits:
Unrestricted full version download - you can try before you buy.
Code executes 100% locally on your own server - no dependencies on other sites or companies.
Code is 100% pure Perl - no dependencies on external modules or system calls.
No forced banner advertisements to distract your visitors.
Extras are optional. For example, you can configure your own keyword-triggered banner ads, but that's your choice. They aren't forced on you.
Platform independence - runs well on Unix, Linux, Windows NT, Windows 200X, Win95/98/ME.
Completely template-based: you control the entire look-and-feel of the site by editing text/html template files. No need to edit the source code... though you can do that too. You can always preserve your existing templates and data when upgrading or re-installing the product.
Dependable user support, featuring many in-depth help files and an active discussion forum.
Code is modular and heavily commented for the benefit of those who want to be hardcore. Can be called as an API from another Perl script. Format of all data files is documented in the help file.
Highly customizable filter rules allow you to programmatically control which web pages are included in the index. Filtering can be done based on patterns in the hostname, URL, or Document Text, or based on RSACi and Safesurf PICS headers.
Resource-intensive actions, like indexing entire web sites, are spread across multiple CGI executions, using META refreshes. This prevents web server timeouts due to excessive resource usage, and allows the action to recover if some individual CGI executions fail.
Searches text and HTML files. Can also search PDF, MP3, and MS Word files with helper applications (help file).
Add Your URL - any visitor can add her own website to the index, at your option. This can be turned on or off by the script owner. (more info)
Attribute Indexing - a document's text, keywords, description, title, and address are all extracted and used for searching.
Rich Display - the title, description, size, last modified time, and address of each document are shown to the user in the list of hits. The admin can configure the number of hits to show per page.
Relevance Listing - documents are sorted by the number of keyword hits, so that the most relevant document appears first. Search terms found in the title, keywords, or description are given additional weight.
Smart HTML Parsing - the search engine does not index text appearing inside of HTML tags, nor inside <SCRIPT> or <STYLE> blocks
Attribute Searching - by default, searches find words in the body, title, keywords, URL, links, or text of a document. By using attribute:value searches, each portion of a document can be searched. The supported attributes are:
url:value (host:value) (domain:value)
Finds "value" in the web address of the document. For example, host:whitehouse.gov will only find matches on that website. The prefixes "url," "host," and "domain" all act the same.
title:value
Finds "value" between the <TITLE> and </TITLE> tags of the target document.
text:value
Searches only the actual text of the document, not the links or the URL. Due to the data structure of the index file, this attribute will include the title, keywords, and description of the file
link:value
Searches only the text extracted from hyperlinks in the document. Useful to see which documents link to a particular page, such as "link:http://my.host.com/". Relative links are extracted as-is, and are not expanded.
Phrase Searching - Enclosing words in quotation marks causes them to be evaluated as a phrase. That is, all terms must occur next to each other and in order. "My bad self", when quoted, will not match "my self is bad".
Intended Phrase Optimization - a set of unquoted search terms will be treated as a phrase first, and as individual terms second. Thus, users who don't quote their phrases will still see phrase matches near the top of the results list.
Punctuation for Phrase Binding - words joined by punctuation will be treated as a phrase. Searching for "Bill.Clinton" (unquoted) is the same as "Bill Clinton" when quoted.
Punctuation-Insensitive - only alpha-numeric characters can be used for search terms. The characters "+," "|," "-," ":," and "*" all have special meaning (require term, prefer term, forbid term, bind attribute and wildcard match, respectively.) All other punctuation characters are treated as whitespace.
Case Sensitivity - All searches are case insensitive and accent insensitive. Searching for "Fur" will match the lowercase "fur", uppercase "FUR", and German "für".
Granular Any/All Control - users may configure each search to find "any" keyword or "all" keywords in the set. In addition to setting a default for all keywords, users can specify whether specific keywords should be required by using a "+" sign before them. Words can be optional with a leading "|", and forbidden with a leading "-".
For example, the .... (more features!!!)
|
http://www.xav.com/scripts/search/features.html
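The "+", "|" and "-" operators described in the quoted feature list can be illustrated with a short sketch. This is Python written for this thread, not FDSE's Perl code, and the function names and the treatment of unprefixed terms as optional are assumptions.

```python
def parse_query(query):
    """Split a query into required (+), optional (|), and forbidden (-) terms."""
    parsed = {"required": [], "optional": [], "forbidden": []}
    for token in query.lower().split():
        prefix, term = token[0], token[1:]
        if prefix == "+" and term:
            parsed["required"].append(term)
        elif prefix == "-" and term:
            parsed["forbidden"].append(term)
        elif prefix == "|" and term:
            parsed["optional"].append(term)
        else:
            # Assumption: unprefixed terms behave as optional ("any" mode).
            parsed["optional"].append(token)
    return parsed


def matches(parsed, text):
    """Check whether a document's text satisfies the parsed query."""
    words = set(text.lower().split())
    if any(term in words for term in parsed["forbidden"]):
        return False
    if not all(term in words for term in parsed["required"]):
        return False
    if parsed["required"]:
        return True
    # With no required terms, at least one optional term must be present.
    return any(term in words for term in parsed["optional"])
```

For example, the query `+apache |hosting -windows` would accept a page mentioning "apache hosting on unix" and reject any page containing "windows". Phrase, wildcard and attribute searches from the feature list are deliberately left out of this sketch.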
Quote:
Download and Installation
Download / Manual Install - download source, configure, then install by hand.
Automated Install - immediately get a working search engine. No download, no FTP, no mess.
Need help? Get a custom install for free.
Purchase and Licensing
You may use this script in freeware mode, or you may Purchase the script for $40 to use the registered mode.
More Information
FDSE User's Guide
Help - includes 200 articles
etc.,
|
I have absolutely nothing to do with these guys, but I sure have no problem telling about good deals I have found.
I used the AXS before I even knew what Perl and PHP were. It worked flawlessly, on IIS, and it is made for Apache.
__________________
What I am is what I am, are you what you are, or what.
Eddie Brickel
|

09-29-2004, 04:12 AM
|
 |
WebProWorld Veteran
|
|
Join Date: Oct 2003
Location: Northern Ireland
Posts: 498
|
|
Thanks for the info mikmik. Sounds like it is definitely worth looking into further....
|

10-19-2004, 11:36 AM
|
 |
WebProWorld Veteran
|
|
Join Date: Oct 2003
Location: Northern Ireland
Posts: 498
|
|
Here is an update. I downloaded the triality.php search and got it working on our web site in a test version. It was pretty impressive, but I wanted to change it a wee bit so the 'look' of the search results was in keeping with the rest of our web site (and modify it to exclude JPEG filenames etc. from the search results).
The readme that came with it said to contact the author if you wanted to make ANY modifications. I did this, but over two weeks later I have not heard anything back from the author!
(Having said that, it certainly is a VERY good basis for a search tool for people who do not have access to a MySQL database.)
|

10-19-2004, 04:32 PM
|
|
WebProWorld 1,000+ Club
|
|
Join Date: Sep 2003
Location: Texas
Posts: 1,283
|
|
PhpDig & Google API
I use PhpDig. It is open source, you can pick what content you want indexed (anywhere on the web), and it comes with a great back end.
You can see my implementation of it here. I really like it a lot; you can modify it to do just about anything you need.
The other option I like is to use the Google API to set up your own site search engine. Assuming your content is mostly indexed by Google, you can set it up so that no one ever knows it is using Google. I found this tutorial. I found another on DevShed about using the API to make a site search engine that I used to help me make the "Link Development Tool" in my sig.
Remember that only the content indexed by Google would be available in your search results, and your site would be limited to 1000 searches per day. Since you can only pull 10 results at a time, each "Next Page" click counts toward your 1000/day.
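As a back-of-the-envelope illustration of that limit, the sketch below (Python, with hypothetical function names) uses only the two figures quoted above: 1000 queries per day and 10 results per query.

```python
import math

DAILY_QUERY_LIMIT = 1000  # figure quoted in the post above
RESULTS_PER_QUERY = 10    # figure quoted in the post above


def queries_needed(results_wanted):
    """Number of API queries needed to show a given number of results."""
    return math.ceil(results_wanted / RESULTS_PER_QUERY)


def searches_supported(avg_pages_viewed):
    """Estimate how many visitor searches fit into one day's quota."""
    if avg_pages_viewed < 1:
        raise ValueError("each search views at least one results page")
    return int(DAILY_QUERY_LIMIT // avg_pages_viewed)
```

So if visitors view two results pages per search on average, the quota covers roughly 500 searches a day rather than 1000.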
Anyway, good luck.
|

10-19-2004, 07:03 PM
|
 |
Administrator
|
|
Join Date: Jul 2004
Location: Omaha
Posts: 2,717
|
|
Fluid Dynamics
We used the Fluid Dynamics search for quite a while with a lot of success. Very well done. Most of the scripts on xav.com seem to be done very well.
The only problem I've ever had with site search is that many people don't care what order the results are shown in. Make sure whatever option you choose shows the results in order of relevance (Google obviously does this already), not most recent, first added, or in order of file name.
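For anyone rolling their own, ordering by relevance can be as simple as counting keyword hits and giving title hits extra weight, much as the FDSE feature list quoted earlier describes. The following Python sketch shows the idea; the weight of 3 and the `title`/`body` document layout are assumptions, not anything taken from FDSE.

```python
import re

WORD_RE = re.compile(r"[a-z0-9]+")
TITLE_WEIGHT = 3  # hypothetical weight; tune to taste


def score(document, query):
    """Count query-term hits, weighting title hits more heavily than body hits."""
    terms = set(WORD_RE.findall(query.lower()))
    title_words = WORD_RE.findall(document["title"].lower())
    body_words = WORD_RE.findall(document["body"].lower())
    title_hits = sum(1 for word in title_words if word in terms)
    body_hits = sum(1 for word in body_words if word in terms)
    return TITLE_WEIGHT * title_hits + body_hits


def rank(documents, query):
    """Return the matching documents, most relevant first."""
    scored = [(score(doc, query), doc) for doc in documents]
    matching = [(s, doc) for s, doc in scored if s > 0]
    # Sort on the score only; the sort is stable, so ties keep their input order.
    matching.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matching]
```

A page with the search term in its title will therefore outrank a page that only mentions it once in the body, regardless of file name or date added.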
|

10-19-2004, 10:08 PM
|
 |
WebProWorld MVP
|
|
Join Date: Jul 2003
Location: Denver, Colorado USA
Posts: 1,437
|
|
jordanmcclements,
I was one of the first customers of Atomz.com until they told me to "go away" as they needed "a minimum of $10K/yr from each client".
I then found FreeFind.com
They are inexpensive.
They are a small company that is very responsive.
The paid versions search HTML and PDF.
I have 20 sites using them.
,dave
|

10-20-2004, 12:28 AM
|
|
WebProWorld Pro
|
|
Join Date: May 2004
Location: Anjo, Japan
Posts: 108
|
|
I chose Atomz for my new site as it gives detailed stats on what users are searching for internally. This has obvious benefits for future SEO and usability efforts. The other thing is that they will now do up to 750 pages for free (up from 500).
Knowing what the visitors are looking for and the fact that it was free was what did it for me.
|

10-20-2004, 02:12 AM
|
|
WebProWorld Pro
|
|
Join Date: Jul 2004
Location: Reno, NV and Vancouver, BC
Posts: 122
|
|
If you are interested | |