Implementing a 'Search' option in my web site.



jordanmcclements
10-17-2003, 11:41 AM
As the number of articles etc. on my web site is increasing all the time, I will have to look at implementing a search option soon.
I do not want to use an external 'free' site search option that displays adverts on the results pages, and I don't particularly want to pay for one either.
This is completely new territory for me.
Does anyone have an 'Idiot's Guide' to the above?

NB - the ISP that hosts our web site uses a Unix / Apache web server, I think.

Thanks in advance.

redcircle
10-17-2003, 03:05 PM
What you are trying to accomplish would most likely be done best with PHP and MySQL. You might be able to find some information on them by searching Google.

hotscripts.com also has some premade scripts. Although I have never used any of them, this one looked good: http://www.hotscripts.com/Detailed/10525.html

jordanmcclements
10-20-2003, 07:02 AM
Thanks.

This does look promising. I will look into it further.....

rlrouse
10-24-2003, 12:30 PM
One thing to consider in your quest for a site search script is that many free scripts will index every HTML and plain text file on your site, including your robots.txt file, .htaccess, and your Perl scripts. I recommend that you select a script that allows you to exclude certain files or even entire directories.
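
To illustrate the kind of exclusion rule described above, here is a minimal Python sketch. It is not taken from any of the scripts mentioned in this thread; the directory names, the file patterns, and the function name `should_index` are all assumptions chosen for the example.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical exclusion rules; adjust these to your own site layout.
EXCLUDED_DIRS = {"cgi-bin", "admin"}
EXCLUDED_PATTERNS = ["robots.txt", ".htaccess", "*.pl", "*.cgi"]


def should_index(relative_path: str) -> bool:
    """Return True if a site-relative path may be added to the search index."""
    path = PurePosixPath(relative_path)
    # Skip anything that lives under an excluded directory.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    # Skip individual files whose name matches an excluded pattern.
    return not any(fnmatchcase(path.name, pattern) for pattern in EXCLUDED_PATTERNS)


if __name__ == "__main__":
    for candidate in ["articles/news.html", "robots.txt", "cgi-bin/search.pl"]:
        print(candidate, should_index(candidate))
```

A crawler or indexer would call a check like this on every file it finds before reading the contents.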

igor1
10-25-2003, 05:49 PM
Hello,

Do you want to search within the contents of your website?

Things may be simplified if the "number of articles etc" are in a database.
Database search is usually a rather common task.

However, if your website consists of a set of HTML files, you need to scan them using regular expressions to build a database of keywords.
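
As a rough sketch of that idea (written in Python, not the code behind any tool linked in this thread), the example below strips tags with regular expressions and builds a keyword-to-page index in memory. The function names and the sample pages are assumptions, and a real site would need proper HTML parsing and entity handling.

```python
import re
from collections import defaultdict

# Regular expressions used to reduce HTML to plain words.
SCRIPT_OR_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")
WORD = re.compile(r"[a-z0-9]+")


def extract_words(html: str) -> list[str]:
    """Remove script/style blocks and tags, then return lowercase words."""
    text = SCRIPT_OR_STYLE.sub(" ", html)
    text = ANY_TAG.sub(" ", text)
    return WORD.findall(text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the set of page names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, html in pages.items():
        for word in extract_words(html):
            index[word].add(name)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    sample_pages = {
        "a.html": "<title>Cat care</title><script>var dog=1;</script><p>Feeding your cat</p>",
        "b.html": "<p>Dog walking</p>",
    }
    print(search(build_index(sample_pages), "cat"))
```

In practice the index would be saved to a file or database table so that it is built once and only read at search time.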


BTW: have a look at: http://www.ij.nq.pl/english/technology.html#index

Igor

ranjan
10-25-2003, 07:20 PM
You could take a look at Google search

http://www.google.com/searchcode.html#both

or Atomz search

http://www.atomz.com/search/

These are easier to implement for small websites

softwaresubmit
10-26-2003, 09:59 AM
If your server is MS-based, you might consider implementing a solution based on Microsoft Indexing Service.

jordanmcclements
10-27-2003, 07:06 AM
Thanks everyone for the replies.

The Google search in particular is an appealing option for someone like me (lazy), although I don't think it would look quite as professional as having your 'own' search function.

After a bit of further investigation, the ISP that hosts my site - www.tibus.net - will provide a pretty decent, easy to implement search facility for an extra £10 per month. As we would have to pay extra anyway for the privilege of running PHP etc, I think this is probably the way we will go. (Although I might mess with the Google option for a bit first).

matthew.ag
11-01-2003, 12:13 PM
Don't pay your host anything extra before trying this:
http://www.isearchthenet.com/isearch/

You can see the search script in action on my site (link in signature). The only advertising in the script is a link back to isearch on the results page.

jackson992
11-01-2003, 01:05 PM
If this works as well as it looks, I will gladly be making a donation!

jackson992
11-03-2003, 07:34 AM
Has anyone tried working with this?

I'm having a problem with it going very slowly partway through spidering my site.

jordanmcclements
09-28-2004, 12:01 PM
Looks like you need to have an SQL database available to you to be able to use this (we do not have one with the hosting package we have).

Thanks for the info anyway.

mikmik
09-28-2004, 04:08 PM
I have been keeping my eyes open for a while now (at least 6 months) and have found a few very promising ones.

I do not have MySQL, but there are two types of server-side engines I know of. One you run from your own computer to create an index, which you then upload. It is not dynamic, so the index must be redone all the time.

The other is CGI/Perl.
This is a PHP-based one - no MySQL (in case someone is interested):
Triality (http://www.pyroxpro.com/phpapps/index.php?s=10&e=15)

Triality is my new PHP search engine. It does not need a database or indexing. It is a spider/crawler. It now includes customizable results headers so you can make it look how you want. Triality automatically searches up to 2 directories deep. It can also take search options, so a user can select an option, and Triality will start digging down at that path instead of the main level. It can also be used as a complete file listing for a website; it will build the complete file tree.

However, these guys make the most excellent scripts I have run - I am not kidding.
They will even install it for you, for free.

This is perl, and it has everything I can think of, including searching PDFs:


Fluid Dynamics Search Engine (http://www.xav.com/scripts/search/) - Features
What is the Fluid Dynamics Search Engine?

FDSE is a search engine that you install on your own site. Visitors to your site use it to find files on your site or on a small cluster of sites. The search box at the top of this page is an example of how FDSE is typically used.

FDSE is different than Google or Altavista, which search the entire Internet. FDSE only searches the sites that you tell it to. It can handle about 10,000 documents in all, which is plenty for one site but much fewer than the total number of documents on the Internet. (more info on size limits)

FDSE is smaller than Google or Altavista, but it is qualitatively identical to them. It has its own built-in web robot for retrieving files, which means it is not limited to searching only documents on its own server. It builds its own index files and returns results from them, unlike some "meta-search" scripts which make behind-the-scenes requests to major search engines to gather results.

FDSE runs entirely on your server, so visitors aren't redirected to a separate centralized server to get their results (as with Atomz and Freefind). If your web server doesn't support Perl CGI at all, then you might be better off with one of those remotely-hosted solutions.

FDSE is a flat search engine - it accepts keywords and shows a ranked list of search results. It does not organize pages into browsable categories and subcategories like Yahoo does.

Features and Benefits:

Unrestricted full version download - you can try before you buy.

Code executes 100% locally on your own server - no dependencies on other sites or companies.

Code is 100% pure Perl - no dependencies on external modules or system calls.

No forced banner advertisements to distract your visitors.

Extras are optional. For example, you can configure your own keyword-triggered banner ads, but that's your choice. They aren't forced on you.

Platform independence - runs well on Unix, Linux, Windows NT, Windows 200X, Win95/98/ME.

Completely template-based: you control the entire look-and-feel of the site by editing text/html template files. No need to edit the source code... though you can do that too. You can always preserve your existing templates and data when upgrading or re-installing the product.

Dependable user support, featuring many in-depth help files and an active discussion forum.

Code is modular and heavily commented for the benefit of those who want to be hardcore. Can be called as an API from another Perl script. Format of all data files is documented in the help file.

Highly customizable filter rules allow you to programmatically control which web pages are included in the index. Filtering can be done based on patterns in the hostname, URL, or Document Text, or based on RSACi and SafeSurf PICS headers.

Resource-intensive actions, like indexing entire web sites, are spread across multiple CGI executions, using META refreshes. This prevents web server timeouts due to excessive resource usage, and allows the action to recover if some individual CGI executions fail.

Searches text and HTML files. Can also search PDF, MP3, and MS Word files with helper applications (help file).

Add Your URL - any visitor can add her own website to the index, at your option. This can be turned on or off by the script owner. (more info)

Attribute Indexing - a document's text, keywords, description, title, and address are all extracted and used for searching.

Rich Display - the title, description, size, last modified time, and address of each document are shown to the user in the list of hits. The admin can configure the number of hits to show per page.

Relevance Listing - documents are sorted by the number of keyword hits, so that the most relevant document appears first. Search terms found in the title, keywords, or description are given additional weight.


Smart HTML Parsing - the search engine does not index text appearing inside of HTML tags, nor inside <SCRIPT> or <STYLE> blocks.


Attribute Searching - by default, searches find words in the body, title, keywords, URL, links, or text of a document. By using attribute:value searches, each portion of a document can be searched. The supported attributes are:

url:value (host:value) (domain:value)
Finds "value" in the web address of the document. For example, host:whitehouse.gov will only find matches on that website. The prefixes "url," "host," and "domain" all act the same.


title:value
Finds "value" between the <TITLE> and </TITLE> tags of the target document.


text:value
Searches only the actual text of the document, not the links or the URL. Due to the data structure of the index file, this attribute will include the title, keywords, and description of the file.


link:value
Searches only the text extracted from hyperlinks in the document. Useful to see which documents link to a particular page, such as "link:http://my.host.com/". Relative links are extracted as-is, and are not expanded.


Phrase Searching - Enclosing words in quotation marks causes them to be evaluated as a phrase. That is, all terms must occur next to each other and in order. "My bad self", when quoted, will not match "my self is bad".


Intended Phrase Optimization - a set of unquoted search terms will be treated as a phrase first, and as individual terms second. Thus, users who don't quote their phrases will still see phrase matches near the top of the results list.


Punctuation for Phrase Binding - words joined by punctuation will be treated as a phrase. Searching for "Bill.Clinton" (unquoted) is the same as "Bill Clinton" when quoted.


Punctuation-Insensitive - only alpha-numeric characters can be used for search terms. The characters "+," "|," "-," ":," and "*" all have special meaning (require term, prefer term, forbid term, bind attribute and wildcard match, respectively.) All other punctuation characters are treated as whitespace.


Case Sensitivity - All searches are case insensitive and accent insensitive. Searching for "Fur" will match the lowercase "fur", uppercase "FUR", and German "für".


Granular Any/All Control - users may configure each search to find "any" keyword or "all" keywords in the set. In addition to setting a default for all keywords, users can specify whether specific keywords should be required by using a "+" sign before them. Words can be optional with a leading "|", and forbidden with a leading "-".

For example, the .... (more features!!!)

http://www.xav.com/scripts/search/features.html
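
The "+", "|" and "-" prefixes described in the feature list can be illustrated with a short Python sketch. This is not FDSE's code (FDSE is written in Perl); the function names `parse_query` and `matches` and the `default_mode` parameter are assumptions made for the example, and wildcard and attribute searches are left out.

```python
def parse_query(query: str, default_mode: str = "any"):
    """Split a query into required, optional and forbidden terms.

    "+term" is required, "|term" is optional and "-term" is forbidden.
    Unprefixed terms follow default_mode: "all" makes them required,
    anything else makes them optional.
    """
    required, optional, forbidden = [], [], []
    for token in query.lower().split():
        if len(token) > 1 and token[0] == "+":
            required.append(token[1:])
        elif len(token) > 1 and token[0] == "-":
            forbidden.append(token[1:])
        elif len(token) > 1 and token[0] == "|":
            optional.append(token[1:])
        elif default_mode == "all":
            required.append(token)
        else:
            optional.append(token)
    return required, optional, forbidden


def matches(document_words: set, parsed) -> bool:
    """Decide whether a document's word set satisfies a parsed query."""
    required, optional, forbidden = parsed
    if any(term in document_words for term in forbidden):
        return False
    if not all(term in document_words for term in required):
        return False
    # With no required terms, at least one optional term must be present.
    if not required:
        return any(term in document_words for term in optional)
    return True


if __name__ == "__main__":
    parsed = parse_query("+perl |php -asp search")
    print(parsed)
    print(matches({"perl", "search"}, parsed))
```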

Download and Installation

Download / Manual Install - download source, configure, then install by hand.

Automated Install - immediately get a working search engine. No download, no FTP, no mess.

Need help? Get a custom install for free.

Purchase and Licensing

You may use this script in freeware mode, or you may Purchase the script for $40 to use the registered mode.

More Information

FDSE User's Guide

Help - includes 200 articles

etc.,

I have absolutely nothing to do with these guys, but I sure have no problem telling about good deals I have found.

I used the AXS before I even knew what Perl and PHP were. It worked flawlessly, on IIS, and it is made for Apache.

jordanmcclements
09-29-2004, 04:12 AM
Thanks for the info mikmik. Sounds like it is definitely worth looking into further....

jordanmcclements
10-19-2004, 11:36 AM
Here is an update. I downloaded the triality.php search, and got it working on our web site in a test version. It was pretty impressive, but I wanted to change it a wee bit so the 'look' of the search results was in keeping with the rest of our web site (and modify it to exclude jpeg filenames etc. from the search results).
The readme that came with it said to contact the author if you wanted to make ANY modifications. I did this, but over two weeks later I have not heard anything back from the author!
(But having said that, it certainly is a VERY good basis for a search tool for people that do not have access to a MySQL database).

flood6
10-19-2004, 04:32 PM
I use PhpDig (http://www.phpdig.net/). It is open source, you can pick what content you want indexed (anywhere on the web), and it comes with a great back end.

You can see my implementation of it here (http://www.divergentlines.com/search/index.php). I really like it a lot; you can modify it to do just about anything you need.

The other option I like is to use the Google API to set up your own site search engine. Assuming your content is mostly indexed by Google, you can set it up so that no one ever knows it is using Google. I found this tutorial (http://blog.outer-court.com/archive/2003_06_22_index.html). I found another on DevShed about using the API to make a site search engine that I used to help me make the "Link Development Tool" in my sig.

Remember that only the content indexed by Google would be available in your search results, and your site would be limited to 1000 searches per day. Since you can only pull 10 results at a time, each "Next Page" click counts toward your 1000/day.
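
To put those limits in perspective, here is a small Python sketch of the arithmetic. The two constants come from the figures above; the function names and the "pages viewed per visitor" input are assumptions for illustration.

```python
import math

DAILY_QUERY_LIMIT = 1000   # API calls allowed per day, as stated above
RESULTS_PER_QUERY = 10     # results returned by a single API call


def queries_needed(results_viewed: int) -> int:
    """API calls needed for one visitor to see the given number of results."""
    return math.ceil(results_viewed / RESULTS_PER_QUERY)


def visitors_supported(pages_viewed_each: int) -> int:
    """Visitors per day if each one views this many result pages."""
    return DAILY_QUERY_LIMIT // pages_viewed_each


if __name__ == "__main__":
    print(queries_needed(25))      # a visitor who reads 25 results
    print(visitors_supported(3))   # every visitor views 3 result pages
```

So if a typical visitor clicks through to a third page of results, the daily allowance covers a few hundred searches rather than a thousand.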

Anyway, good luck.

brian.mark
10-19-2004, 07:03 PM
We used the Fluid Dynamics search for quite a while with a lot of success. Very well done. Most of the scripts on xav.com seem to be done very well.

The only problem I've ever had with searching on a site is that many people don't care what order the results are shown in. Make sure whatever option you choose (Google obviously does this already) shows the results in order of relevance, not most recent, first added, or in order of file name.
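
As a rough illustration of relevance ordering, the Python sketch below scores each page by counting keyword hits and gives hits in the title extra weight. It is not how any of the engines in this thread actually rank results; the weight value, the field names, and the sample documents are assumptions.

```python
import re

WORD = re.compile(r"[a-z0-9]+")
TITLE_WEIGHT = 3  # assumed bonus: a title hit counts three times a body hit


def score(document: dict, query: str) -> int:
    """Count query-term hits in the body plus weighted hits in the title."""
    terms = set(WORD.findall(query.lower()))
    body_hits = sum(1 for w in WORD.findall(document["body"].lower()) if w in terms)
    title_hits = sum(1 for w in WORD.findall(document["title"].lower()) if w in terms)
    return body_hits + TITLE_WEIGHT * title_hits


def rank(documents: list, query: str) -> list:
    """Return URLs of matching documents, most relevant first."""
    scored = [(score(doc, query), doc) for doc in documents]
    scored = [(points, doc) for points, doc in scored if points > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc["url"] for _, doc in scored]


if __name__ == "__main__":
    docs = [
        {"url": "old.html", "title": "Archive", "body": "search tips"},
        {"url": "guide.html", "title": "Search guide", "body": "how to search the site search box"},
        {"url": "contact.html", "title": "Contact", "body": "phone"},
    ]
    print(rank(docs, "search"))
```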

paulchri
10-19-2004, 08:05 PM
Northern Ireland...

May I direct you to a cheap, quality, proven solution? No testing headaches with new PHP code, etc. Just $49...per year! Spiderline (http://www.spiderline.com). Customizable in your site motif, simple to implement. We've got it on this distributor's site, Test Equipment Distributors (http://www.tedndt.com/cat/cat44.html). Good luck in your efforts….

Paul C.

Target Marketing Solutions – Search Engine Marketing Services (http://www.targetmarketingsol.com)

davebarnes
10-19-2004, 10:08 PM
jordanmcclements,

I was one of the first customers of Atomz.com until they told me to "go away" as they needed "a minimum of $10K/yr from each client".

I then found FreeFind.com (http://www.freefind.com)
They are inexpensive.
They are a small company that is very responsive.
The paid versions search HTML and PDF.
I have 20 sites using them.

,dave

stephenmunday
10-20-2004, 12:28 AM
I chose Atomz for my new site as it gives detailed stats on what users are searching for internally. This has obvious benefits for future SEO and usability efforts. The other thing is that they will now do up to 750 pages for free (up from 500).

Knowing what the visitors are looking for and the fact that it was free was what did it for me.

salomon741
10-20-2004, 02:12 AM
If you are interested in a CGI script, I suggest visiting www.bignosebird.com

Since you mentioned that you are using more and more pages now, here is their fastest site search, named Xavatoria (http://bignosebird.com/carchive/search2.shtml). It has an AltaVista-like interface, and is also designed to index and search at high speed.

All of the scripts in their archive are for free and are generally cut and paste. Xavatoria is a little more complex.

Hope this helps...

wwwizzard
10-20-2004, 06:08 AM
Here is another freebie (for the time being):
http://www.interspire.com/fastfind/
I have not used it, but I had a play with their demo and it seems very promising.

Cheers

jordanmcclements
10-20-2004, 06:37 AM
Thanks everyone. I have more information than I can shake a stick at now.......

Larke
10-20-2004, 08:37 AM
I have been trying site search scripts for the last 2 weeks. Of all that I tried (including freefind.com) I preferred Perlfect's Search.

http://www.perlfect.com/freescripts/search/

I found it highly customizable and very easy to exclude pages or entire directories. You can set it to search your site with a cron job, or do it manually. Each search result indicates the last time the indexing was updated.

A live example on my site http://cattery-index.com

jbolanos
10-20-2004, 11:27 AM
Tried the isearch option on Matthew's site... I don't know why, but I could not get valid results. I entered the words "raise" and "diarios", taken from names on the first page, one at a time. In the first case, "Raise your voice" was not listed in the results. In the second case, there were no valid results ("Diarios de motocicleta" was not found).

I know I am misunderstanding something but don't know what. Could you explain this?

globalhostinggroup
10-20-2004, 12:48 PM
jordanmcclements, what do you pay for your hosting, and how much space and bandwidth do you get for that? Hosting is more competitive now. I include MySQL and many other features at no additional cost, as do many other reliable hosting companies.

jordanmcclements
10-21-2004, 10:34 AM
Re Hosting - We - www.campbell-fitzpatrick.co.uk - use a local company www.tibus.net - not because they are particularly good value, but because they are just down the road, and their support is excellent (you can always get someone on the phone if there is a problem). Since they are only £10 per month - I am 99.99% certain that we will not be changing no matter how good value the opposition is....

Re Search Options (I should really just take the £10 per month option that Tibus offer - but I can't resist trying out the alternatives to see what they are like). I have now spent a bit of time trying out freefind.com and have gotten a search working with a custom template, and think the service is absolutely first class. At $5 per month for searching up to 250 pages - it is pretty good value also (the free version with the adverts included in the search results is not really an option for most web sites).

But - I think I will persevere a bit more with PHP-based searches that do not require MySQL before I go with a paid service...

Intensity
10-21-2004, 02:15 PM
I would definitely recommend Fluid Dynamics Search Engine.

Very easy to install and good / powerful functionality.

jordanmcclements
10-22-2004, 08:30 AM
Re Fluid Dynamics...

For www.campbell-fitzpatrick.co.uk this is not an option as our hosting account does not seem to allow Perl CGI. (So I will be going with freefind.com which seems to work extremely well, and provide all sorts of reports etc on searches performed).

But I have started messing with the Fluid Dynamics Search Engine on my own site - www.jmcwd.com and am VERY impressed with the instructions and the install procedure (considering that I know NOTHING about Perl).

Update - Thanks to those who suggested the Fluid Dynamics Search Engine - it most certainly is a fantastic piece of programming. I have now got it fully integrated into my web site, and I heartily recommend it to anyone who has a hosting account that allows Perl scripting (most people, I would think).

jordanmcclements
12-02-2004, 08:01 AM
Can I just say one more time...

The Fluid Dynamics search engine is absolutely first class and superb! :-

http://www.xav.com/scripts/search/

pagetta
03-30-2005, 11:45 AM
I have only briefly read through this thread as it's pretty long and it is my home time, so apologies if I am repeating something already asked! We are in the process of implementing the Fluid Dynamics search and it has been just as easy as everyone said - just one question for anyone who's used it:

I want to add the search box to a bar that is 29 pixels high, however, whenever I add the form code it automatically increases the height of the bar - it puts an extra line under the form, which ruins the look of the site! Does anyone know why it does this or how to stop it - have tried a variety of styles on it but to no avail!

any advice much appreciated

Biggles
04-01-2005, 10:46 AM
Have you tried giving it an id or class name and referring to it in your style sheet?

e.g.

html file:

<input type="text" id="search" name="">

css file:

input#search {
height: 14px;
width: 50px;
background-color: Silver;
border: 1px solid Gray;
padding-top: 1px;
font-size: 8px;
font-weight: bold;
}


etc. etc.

pagetta
05-03-2005, 10:00 AM
I know some people in this thread have used fluid dynamics search very successfully. I have 3 questions

1. I have applied the right classes to all the sections, but for some reason on my results page there is a large left-hand margin and I cannot find where it's coming from - any ideas?

http://www.codestone.net/index-s.html

2. How do you get the SE to recrawl your site? It was put on our server whilst we were still building the site, and it has some pages stored that no longer exist - which section of the admin will allow me to do this?

3. Why do my results display the word 'february' at the top? I can see it's being pulled from 'strings.txt', but why it is printing it I have no idea!

Any advice/answers/tips most welcome - thank you!

vark
05-03-2005, 10:02 PM
my .02, PhpDig is what I use. Have not had any problems with it and like the features of being able to index PDF, MS-Word, MS-Excel, and MS-PowerPoint files if you install external binaries for this.