But the important question is what the search engine (SE) indexes and finds. Does it scan every page on your site for:
"bug"
" bug "
for example?
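To make the difference between the two searches concrete, here is a minimal sketch. The helper names containsSubstring and containsWord are made up for this illustration; only stripos, preg_quote and preg_match are PHP built-ins. The first helper finds "bug" anywhere in the text, the second only finds it as a separate word.

```php
<?php
// Hypothetical helpers for illustration, not part of the engine discussed here.

// Substring match: "bug" is found anywhere, also inside "debugger".
function containsSubstring(string $text, string $term): bool
{
    return stripos($text, $term) !== false;
}

// Whole-word match: "bug" must be delimited by word boundaries (\b).
function containsWord(string $text, string $term): bool
{
    $pattern = '/\b' . preg_quote($term, '/') . '\b/i';
    return preg_match($pattern, $text) === 1;
}

$text = 'Attach the debugger before you start.';
var_dump(containsSubstring($text, 'bug')); // bool(true)
var_dump(containsWord($text, 'bug'));      // bool(false)
```

Which of the two behaviours you want depends on your documents, and that is exactly the kind of choice a bought engine makes for you.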
The code I mention above is so simple that it should be easy to modify it to find whatever you are looking for in your documents, especially if you have structured your documents and files well.
If you buy an engine, it scans documents based on criteria chosen by another programmer. If you program it yourself, it scans for what you specify. This may be very important, not least for large companies that define their own XML meta language. The search can be made very efficient by choosing the correct tags and attributes.
How many of you have used site search from engines like FreeFind, Yahoo, Google etc. on your sites, searched for words you know are in your documents, and not found them? The reason is that the engine does not scan the complete document as a string and match the word(s) you specify.
There is no theoretical limit on the number of documents that can be scanned. The engine runs in a loop using XMLReader or another parser and loads one file at a time into an object or variable, whose tags and attributes are scanned for the chosen criteria.
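As a rough sketch of what scanning tags and attributes with XMLReader can look like: the function name findLiveHeadlines, the item and headline elements and the status attribute are assumptions made up for this example, and the XML is passed in as a string to keep the sketch self-contained. For files on disk you would call $reader->open($path) instead of $reader->XML($xml).

```php
<?php
// Sketch only: element and attribute names are assumptions for this example.
function findLiveHeadlines(string $xml, string $term): array
{
    $reader = new XMLReader();
    $reader->XML($xml);          // use $reader->open($path) to read a file instead
    $hits = array();
    $isLive = false;
    while ($reader->read()) {
        // Only element start tags are interesting here.
        if ($reader->nodeType !== XMLReader::ELEMENT) continue;
        if ($reader->name === 'item') {
            // Remember whether the current item has status="live".
            $isLive = ($reader->getAttribute('status') === 'live');
        } elseif ($reader->name === 'headline' && $isLive) {
            $headline = $reader->readString();
            if (stripos($headline, $term) !== false) $hits[] = $headline;
        }
    }
    $reader->close();
    return $hits;
}

$xml = '<items>'
     . '<item status="live"><headline>Bug fixed in parser</headline></item>'
     . '<item status="draft"><headline>Bug still open</headline></item>'
     . '</items>';
print_r(findLiveHeadlines($xml, 'bug')); // only the headline of the live item
```

XMLReader streams through the document node by node, so it suits large files. The code in this post uses SimpleXML instead, which loads a whole file at once.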
Here is the most important and difficult part of the code:
Code:
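// $fileDir is assumed to end with a directory separator, e.g. "data/".
$handle = opendir($fileDir);
$items = array();
while (($file = readdir($handle)) !== FALSE) {
    // Skip subdirectories (this also skips "." and "..").
    if (is_dir($fileDir . $file)) continue;
    // Only scan news*, article* and webcopy* XML files. preg_match with the
    // /i flag replaces eregi, which was removed in PHP 7.
    if (!preg_match('/^(news|article|webcopy).*\.xml$/i', $file)) continue;
    $xmlItem = simplexml_load_file($fileDir . $file);
    // Skip files that fail to parse.
    if ($xmlItem === FALSE) continue;
    if ((stripos($xmlItem->keywords, $term) !== FALSE or
         stripos($xmlItem->headline, $term) !== FALSE or
         stripos($xmlItem->description, $term) !== FALSE) and
        (string)$xmlItem->status == 'live') {
        $item = array();
        $item['id'] = (string)$xmlItem['id'];
        $item['headline'] = (string)$xmlItem->headline;
        $items[] = $item;
    } // The if test ends here.
} // The while loop ends here.
closedir($handle);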
Note that $items is an array of $item arrays. The if test, the part in red, is the part you can modify to your own needs. As you can see, this particular engine only scans documents with a status of live. You can choose whatever criteria you want.
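One way to make the if test easier to change is to move it into a small function. This is only a sketch: itemMatches and its parameters are hypothetical names, not part of the code above, and the sample document is made up for the example.

```php
<?php
// Hypothetical helper: the fields to search and the required status are parameters.
function itemMatches(SimpleXMLElement $xmlItem, string $term, array $fields, string $status): bool
{
    // Reject documents that do not have the wanted status.
    if ((string)$xmlItem->status !== $status) return false;
    // Accept the document as soon as one of the chosen fields contains the term.
    foreach ($fields as $field) {
        if (stripos((string)$xmlItem->{$field}, $term) !== false) return true;
    }
    return false;
}

// Made-up sample document for the example.
$xmlItem = simplexml_load_string(
    '<article id="a1"><status>live</status><headline>Search engines</headline>'
    . '<keywords>xml, php</keywords><description>Home-made site search</description></article>'
);
var_dump(itemMatches($xmlItem, 'php', array('keywords', 'headline'), 'live')); // bool(true)
var_dump(itemMatches($xmlItem, 'php', array('headline'), 'live'));             // bool(false)
```

With a helper like this, searching other tags or another status means changing the arguments, not the loop.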
simplexml_load_file loads the document into an object whose elements can be scanned as strings. For example, "stripos($xmlItem->headline, $term)" scans the headline for the $term you choose. stripos automatically casts its first argument to a string. That is not done in the last red line, where you have to cast the $xmlItem->status object to a string yourself. Casting is everyday work in OOP.
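A small sketch of why the cast matters. The sample XML is made up, and the strict comparison === is used here (the code above uses ==) only to make the type difference visible.

```php
<?php
// Made-up sample document for the example.
$xmlItem = simplexml_load_string(
    '<article id="a1"><status>live</status><headline>XML search</headline></article>'
);

// A child element is an object, not a string.
var_dump($xmlItem->status instanceof SimpleXMLElement); // bool(true)
var_dump($xmlItem->status === 'live');                  // bool(false): object vs. string
var_dump((string)$xmlItem->status === 'live');          // bool(true)

// Casting when you copy values gives you plain strings in the result array.
$item = array(
    'id'       => (string)$xmlItem['id'],
    'headline' => (string)$xmlItem->headline,
);
var_dump(is_string($item['id']));                       // bool(true)
```

Without the casts, $item would hold SimpleXMLElement objects, which can surprise you later, for example when you try to store the results in a session.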
If you looked up the two related posts at the W3 Schools forum, you may have noted that my signature there is:
$MyProfile = simplexml_load_file('myprofile.xml');
echo $MyProfile->name;
:: Kjell Gunnar Bleivik
echo $MyProfile->xpath('/personalinfo/profile[@profileID="W3Schools"]');
:: Why learn the bad dialect of HTML when you get it free when tagging XML?
Learn to tag with XML, program with XSL, then improve using Ruby on Rails, PHP...
Later you may learn to fly using BETA.
I liked the Borland C++ Builder with the possibility to use inline ASM {statements, but I am an economist}.
Overall principle: make it simple, as simple as possible but no simpler.
as of 30 August 2007.
It is when you combine the parsers with XPath (the red line above), XPointer, XLink and XSL(T) that the power of XML comes into its own.
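A minimal sketch of SimpleXML combined with XPath, modelled on the signature above. The document content, including the motto element, is invented for the example. Note that xpath() returns an array of matching elements, so you loop over the result.

```php
<?php
// Made-up profile document; only the personalinfo/profile/profileID names
// come from the signature above.
$xml = simplexml_load_string(
    '<personalinfo>'
    . '<profile profileID="W3Schools"><motto>Tag with XML</motto></profile>'
    . '<profile profileID="Other"><motto>Something else</motto></profile>'
    . '</personalinfo>'
);

// Select only the profile whose profileID attribute equals "W3Schools".
$matches = $xml->xpath('/personalinfo/profile[@profileID="W3Schools"]');

foreach ($matches as $profile) {
    echo (string)$profile->motto, "\n"; // prints "Tag with XML"
}
```

The same kind of expression could replace the chain of stripos tests when the criteria are about structure (which tag, which attribute value) and not free text.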