
Page 2 of 4 (results 11 to 20 of 31)

Thread: free internal site search code ?

  1. #11
    WebProWorld MVP kgun
    Join Date
    May 2005
    Location
    Norway
    Posts
    8,007

    Re: free internal site search code ?

    But the important question is what the search engine indexes and finds. Does it scan every page on your site for:

    "bug"

    " bug "

    for example?

    The code I mentioned above is so simple that it should be easy to modify it to find whatever you are looking for in your documents, especially if you have structured your documents and files well.

    If you buy an engine, it scans documents based on criteria chosen by another programmer. If you program it yourself, it scans for what you specify. This may be especially important for large companies that create their own XML-based markup languages. The search can be made very efficient by choosing the right tags and attributes.

    How many of you have used site search engines such as FreeFind, Yahoo! or Google site search and searched for words you know are in your documents without finding them? Usually the reason is that the engine does not scan the complete document as a string and match the word(s) you specify.
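The difference between finding "bug" anywhere in the text and finding " bug " as a whole word can be sketched in PHP like this. The helper names `matchesSubstring()` and `matchesWholeWord()` are hypothetical; the regex word boundary `\b` is one common way to get whole-word matching, not the only one.

```php
<?php
// Hypothetical helpers contrasting substring search with whole-word search.

function matchesSubstring(string $text, string $term): bool
{
    // stripos() finds "bug" anywhere, including inside "debugging"
    return stripos($text, $term) !== false;
}

function matchesWholeWord(string $text, string $term): bool
{
    // \b is a word boundary; preg_quote() escapes regex characters in the term.
    // The i modifier makes the match case-insensitive, like stripos().
    $pattern = '/\b' . preg_quote($term, '/') . '\b/i';
    return preg_match($pattern, $text) === 1;
}

$text = 'We spent the day debugging the parser.';
var_dump(matchesSubstring($text, 'bug'));               // bool(true)
var_dump(matchesWholeWord($text, 'bug'));               // bool(false)
var_dump(matchesWholeWord('Found a Bug today', 'bug')); // bool(true)
```

Which behaviour is "right" depends on what your visitors expect; a site search can also offer both.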

    There is no theoretical limit on the number of documents that can be scanned. The engine loops over the files, using XMLReader or another parser to load one file at a time into an object/variable whose tags and attributes are scanned for the chosen criteria.

    Here is the most important and difficult part of the code:
    Code:
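    // $fileDir must end with a slash, e.g. 'content/'
    $handle = opendir($fileDir);
    $items = array();
    while (($file = readdir($handle)) !== FALSE) {
      if (is_dir($fileDir . $file)) continue;
      // eregi() was removed in PHP 7; preg_match() with the i modifier does the same case-insensitive test
      if (!preg_match('/^(news|article|webcopy).*\.xml$/i', $file)) continue;
      $xmlItem = simplexml_load_file($fileDir . $file);
      if ($xmlItem === FALSE) continue; // skip files that are not well-formed XML
      // Match the term in keywords, headline or description, but only in live documents
      if ((stripos($xmlItem->keywords, $term) !== FALSE or
           stripos($xmlItem->headline, $term) !== FALSE or
           stripos($xmlItem->description, $term) !== FALSE) and
          (string)$xmlItem->status == 'live') {
        $item = array();
        $item['id'] = (string)$xmlItem['id'];
        $item['headline'] = (string)$xmlItem->headline;
        $items[] = $item;
      } // The if test ends here.
    } // The while loop ends here.
    closedir($handle);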
    Note that $items is an array of $item arrays. The conditions in the if test are the part you can modify to your own needs. As you will see, this particular engine only scans documents with a status of live; you can choose whatever criteria you want. simplexml_load_file loads the document into a SimpleXMLElement object whose elements can be read as strings. For example, "stripos($xmlItem->headline, $term)" checks whether the document's headline contains the $term you search for. stripos automatically casts its first argument to a string. That is not done in the last line of the test, where you have to cast the $xmlItem->status object to a string yourself. Casting is everyday work in OOP.
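A minimal sketch of the same loop wrapped in a reusable function, assuming each file has a root element with an `id` attribute and `keywords`, `headline`, `description` and `status` children. `searchXmlItems()` and the demo files are hypothetical; `glob()` stands in for `opendir()`/`readdir()`, and the casts to string are written out explicitly.

```php
<?php
// Hypothetical helper: the directory scan above as a function.
// Assumes files like <item id="..."><keywords/><headline/><description/><status/></item>.
function searchXmlItems(string $fileDir, string $term): array
{
    $items = [];
    // glob() returns a sorted list of paths (or false on error)
    foreach (glob(rtrim($fileDir, '/') . '/*') ?: [] as $path) {
        if (!is_file($path)) {
            continue;
        }
        // Same file-name filter as the original, case-insensitive
        if (!preg_match('/^(news|article|webcopy).*\.xml$/i', basename($path))) {
            continue;
        }
        $xmlItem = simplexml_load_file($path);
        if ($xmlItem === false) {
            continue; // not well-formed XML
        }
        $matches = stripos((string)$xmlItem->keywords, $term) !== false
            || stripos((string)$xmlItem->headline, $term) !== false
            || stripos((string)$xmlItem->description, $term) !== false;
        if ($matches && (string)$xmlItem->status === 'live') {
            $items[] = [
                'id' => (string)$xmlItem['id'],
                'headline' => (string)$xmlItem->headline,
            ];
        }
    }
    return $items;
}

// Demo with three throw-away files in a temporary directory
$dir = sys_get_temp_dir() . '/xmlsearch_demo_' . getmypid();
is_dir($dir) || mkdir($dir);
file_put_contents("$dir/news1.xml", '<item id="n1"><keywords>php, xml</keywords><headline>Site search in PHP</headline><description>Demo</description><status>live</status></item>');
file_put_contents("$dir/news2.xml", '<item id="n2"><keywords>php</keywords><headline>Draft post</headline><description>Demo</description><status>draft</status></item>');
file_put_contents("$dir/notes.xml", '<item id="x1"><keywords>php</keywords><headline>PHP notes</headline><description>Demo</description><status>live</status></item>');
$results = searchXmlItems($dir, 'php');
print_r($results);
```

Only news1.xml is returned here: news2.xml is not live, and notes.xml fails the file-name filter.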

    If you looked up the two related posts at the W3 Schools forum, you may have noted that my signature there is:

    $MyProfile = simplexml_load_file('myprofile.xml');
    echo $MyProfile->name;
    :: Kjell Gunnar Bleivik
    echo $MyProfile->xpath("/personalinfo/profile[@profileID='W3Schools']")[0];
    :: Why learn the bad dialect of HTML when you get it free when tagging XML?
    Learn to tag with XML, program with XSL, then improve using Ruby on Rails, PHP...
    Later you may learn to fly using BETA.
    I liked the Borland C++ Builder with the possibility to use inline ASM {statements, but I am an economist}.
    Overall principle, make it simple, as simple as possible but no simpler.

    as of 30 August 2007. It is when you combine the parsers with XPath (the xpath() line above), XPointer, XLink and XSL(T) that the power of XML really comes into its own.
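As an illustration of letting XPath do the filtering instead of PHP if tests, here is a sketch assuming a hypothetical `<articles>` file with one `<article>` per item. XPath 1.0 `contains()` is case-sensitive, so `translate()` lower-cases the keywords first (ASCII letters only).

```php
<?php
// Hypothetical sample document; in practice you would use simplexml_load_file().
$source = <<<XML
<articles>
  <article id="a1"><headline>XML site search</headline><keywords>xml search</keywords><status>live</status></article>
  <article id="a2"><headline>Old XML notes</headline><keywords>xml</keywords><status>draft</status></article>
  <article id="a3"><headline>Chess in C++</headline><keywords>cpp chess</keywords><status>live</status></article>
</articles>
XML;
$xml = simplexml_load_string($source);

$term = 'xml';
// The term is pasted into the query, so it must not contain a single quote.
$query = sprintf(
    "//article[status='live' and contains(translate(keywords, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '%s')]",
    strtolower($term)
);

$ids = [];
foreach ($xml->xpath($query) as $article) {
    $ids[] = (string)$article['id'];
}
print_r($ids); // only a1: a2 is a draft, a3 has no "xml" keyword
```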

  2. #12
    WebProWorld MVP kgun

    Re: free internal site search code ?

    If there is such a thing as a Web 2.0 site search engine/function, it is one especially well suited to XML, its family of technologies and XML parsers. It should be no problem for a clever programmer who knows:
    1. XML, XPath, XPointer and XLink
    2. PHP or another server-side language with good XML parsers
    3. AJAX
    to build a fairly advanced site search engine with Google Suggest functionality. Maybe there is one already.
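A minimal sketch of the server-side half of such a "Google Suggest" box, assuming the headlines have already been collected from the XML files. `suggest()` and the sample headlines are hypothetical, and the browser side (an AJAX request on each keystroke) is not shown.

```php
<?php
// Return up to $limit headlines that start with what the user has typed so far.
function suggest(array $headlines, string $typed, int $limit = 5): array
{
    $typed = trim($typed);
    if ($typed === '') {
        return [];
    }
    $hits = [];
    foreach ($headlines as $headline) {
        // stripos() === 0 means the headline starts with the typed text (case-insensitive)
        if (stripos($headline, $typed) === 0) {
            $hits[] = $headline;
            if (count($hits) >= $limit) {
                break;
            }
        }
    }
    return $hits;
}

$headlines = ['XML basics', 'XLink and linkbases', 'XPath queries', 'PHP site search'];
// In a real endpoint $typed would come from $_GET, and you would send
// header('Content-Type: application/json') before echoing.
echo json_encode(suggest($headlines, 'xp')), "\n"; // ["XPath queries"]
```

Prefix matching keeps the response small and predictable; matching anywhere in the headline is a one-line change (`!== false` instead of `=== 0`).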

    You will find more information in my Web 2.0 static link collection of resources, especially the links with the anchor text:
    • Improving Web linking using XLink.
    • XML linking language.
    These technologies make it possible to have dynamic and generic links, multi-source and multi-destination links, and much more. Using linkbases, location sets, arcs and associations between these sets in your XML documents, the search engine can be fairly advanced and flexible.

    Here is a link base example:

    "Example: Annotating a Specification

    Following is a non-normative set of declarations for an extended link that specializes in providing linkbase arcs:

    Code:
    <!ELEMENT basesloaded ((startrsrc|linkbase|load)*)>
    <!ATTLIST basesloaded xlink:type (extended) #FIXED "extended">
    <!ELEMENT startrsrc EMPTY>
    <!ATTLIST startrsrc xlink:type  (locator) #FIXED "locator"
                        xlink:href  CDATA     #REQUIRED
                        xlink:label NMTOKEN   #IMPLIED>
    <!ELEMENT linkbase EMPTY>
    <!ATTLIST linkbase xlink:type  (locator) #FIXED "locator"
                       xlink:href  CDATA     #REQUIRED
                       xlink:label NMTOKEN   #IMPLIED>
    <!ELEMENT load EMPTY>
    <!ATTLIST load xlink:type    (arc) #FIXED "arc"
                   xlink:arcrole CDATA #FIXED "http://www.w3.org/1999/xlink/properties/linkbase"
                   xlink:actuate (onLoad|onRequest|other|none) #IMPLIED
                   xlink:from    NMTOKEN #IMPLIED
                   xlink:to      NMTOKEN #IMPLIED>
    Following is how an XML element using these declarations might look. This would indicate that when a specification document is loaded, a linkbase full of annotations to it should automatically be loaded as well, possibly necessitating re-rendering of the entire specification document to reveal any regions within it that serve as starting resources in the links found in the linkbase.

    Code:
    <basesloaded>
     <startrsrc xlink:label="spec" xlink:href="spec.xml" />
     <linkbase xlink:label="linkbase" xlink:href="linkbase.xml" />
     <load xlink:from="spec" xlink:to="linkbase" xlink:actuate="onLoad" />
    </basesloaded>
    Following is how an XML element using these declarations might look if the linkbase loading were on request. This time, the starting resource consists of the words "Click here to reveal annotations." If the starting resource were the entire document as in the example above, a reasonable behavior for allowing a user to actuate traversal would be a confirmation dialog box".

    Code:
    <basesloaded>
     <startrsrc xlink:label="spec" xlink:href="spec.xml#string-range(//*,'Click here to reveal annotations.')" />
     <linkbase xlink:label="linkbase" xlink:href="linkbase.xml" />
     <load xlink:from="spec" xlink:to="linkbase" xlink:actuate="onRequest" />
    </basesloaded>
    Source: XML Linking Language (XLink) Version 1.0

    For those who only know HTML, there should be nothing revolutionary in the above markup. The difficult part is learning the new markup languages and the important concept of XML namespaces, which you use to bind different resources. Note that an Internationalized Resource Identifier (IRI) is a generalization of a URI, which in turn is a generalization of a URL.
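A sketch of reading the `xlink:` attributes from the linkbase example with SimpleXML. Namespaced attributes are only returned when you pass the namespace URI to `attributes()`. The `xmlns:xlink` declaration is an addition here, needed for the sample to be namespace-well-formed, and `actuate` carries the `xlink:` prefix as the DTD declares it.

```php
<?php
const XLINK_NS = 'http://www.w3.org/1999/xlink';

// The linkbase example, with the xlink namespace declared on the root element
$source = <<<XML
<basesloaded xmlns:xlink="http://www.w3.org/1999/xlink">
  <startrsrc xlink:label="spec" xlink:href="spec.xml"/>
  <linkbase xlink:label="linkbase" xlink:href="linkbase.xml"/>
  <load xlink:from="spec" xlink:to="linkbase" xlink:actuate="onLoad"/>
</basesloaded>
XML;
$doc = simplexml_load_string($source);

// Map each locator's label to the resource it points at
$resources = [];
foreach ([$doc->startrsrc, $doc->linkbase] as $locator) {
    $attrs = $locator->attributes(XLINK_NS);
    $resources[(string)$attrs->label] = (string)$attrs->href;
}

// Follow the arc from its "from" label to its "to" label
$arc = $doc->load->attributes(XLINK_NS);
printf("%s -> %s (%s)\n",
    $resources[(string)$arc->from],
    $resources[(string)$arc->to],
    (string)$arc->actuate); // spec.xml -> linkbase.xml (onLoad)
```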

    So, to sum up: the technology to build a very efficient Web 2.0, XML-powered site search engine with AJAX functionality (like Google Suggest) is already there. It is very important to think thoroughly when you structure your XML (CMS) site. Clever use of tags, nodes, attributes, linkbases, location sets etc. may make your site search engine stand out. If you take the time to write one based on the above ideas, please cite this source and give me an example for free. Google Suggest (AJAX) functionality will be much appreciated. Preferably, use PHP parsers to keep the code compact, and streaming parsers like XMLReader to make it efficient.
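For the streaming part, a minimal sketch using XMLReader, assuming a hypothetical feed file of `<item id="...">` elements with `headline` and `status` children. Each `<item>` is read with `readOuterXml()` and handed to SimpleXML, so only one item is held as an object at a time; `streamSearch()` is a made-up name.

```php
<?php
// Stream through a (possibly large) XML file and return the ids of live items
// whose headline contains $term.
function streamSearch(string $path, string $term): array
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        return [];
    }
    $hits = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'item') {
            // readOuterXml() returns just this <item> subtree as a string
            $item = simplexml_load_string($reader->readOuterXml());
            if ($item !== false
                && (string)$item->status === 'live'
                && stripos((string)$item->headline, $term) !== false) {
                $hits[] = (string)$item['id'];
            }
        }
    }
    $reader->close();
    return $hits;
}

// Demo with a small temporary feed file
$file = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($file, '<feed>'
    . '<item id="1"><headline>XMLReader streaming</headline><status>live</status></item>'
    . '<item id="2"><headline>XMLReader draft</headline><status>draft</status></item>'
    . '<item id="3"><headline>Chess engines</headline><status>live</status></item>'
    . '</feed>');
$result = streamSearch($file, 'xmlreader');
print_r($result); // only item 1
unlink($file);
```

The reader still walks the child nodes of each item; for flat feeds like this that costs little, and it avoids the pitfalls of mixing `next()` with `read()`.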

  3. #13
    Member
    Join Date
    Sep 2006
    Posts
    35

    Re: free internal site search code ?

    Hi All,

    I wrote my own internal site search for the website I work on, derived from the project I worked on for my MSc. I used a ranking based on natural language and the stem forms of words. (For example, if someone writes reporting, reported or report, these are all converted to their stem form, 'report'.) That made it easy to compare words, since I had set up a database table with a reference to each page on my site, including the key phrases on each page.

    This method lets me rank pages based on what I want them to rank for and how I want them to appear, and of course, because I built it, it's free. Woo-hoo! The only bummer is that I haven't written anything to automatically index new pages, so I have to update the external page search parameters via MS SQL. If anyone wants to know more or access the source, drop me a line.
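The stemming-and-ranking idea can be sketched roughly like this. The suffix stripper is a deliberately naive stand-in, not the Porter stemmer and not necessarily what mamola used, and the page/key-phrase array is hypothetical (in the setup described it would come from the database table).

```php
<?php
// Naive suffix stripper (NOT the Porter stemmer): reporting/reported/reports -> report
function crudeStem(string $word): string
{
    $word = strtolower($word);
    foreach (['ing', 'ed', 's'] as $suffix) {
        // Strip a suffix only if at least three letters remain
        if (strlen($word) - strlen($suffix) >= 3 && substr($word, -strlen($suffix)) === $suffix) {
            return substr($word, 0, -strlen($suffix));
        }
    }
    return $word;
}

// Split text into words and return the set of their stems
function stems(string $text): array
{
    $words = preg_split('/[^a-z0-9]+/i', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_unique(array_map('crudeStem', $words));
}

// Score = number of query stems found among a page's key-phrase stems
function rankPages(array $pages, string $query): array
{
    $queryStems = stems($query);
    $scores = [];
    foreach ($pages as $url => $keyPhrases) {
        $score = count(array_intersect($queryStems, stems($keyPhrases)));
        if ($score > 0) {
            $scores[$url] = $score;
        }
    }
    arsort($scores); // highest score first
    return $scores;
}

$pages = [
    '/reports.html' => 'annual reporting, reported results',
    '/contact.html' => 'contact us, opening hours',
    '/news.html'    => 'latest report, news archive',
];
print_r(rankPages($pages, 'reports archive'));
```

Here /news.html scores 2 (report, archive) and /reports.html scores 1; /contact.html is left out.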

    cheers

    mamola

  4. #14
    WebProWorld MVP kgun

    Re: free internal site search code ?

    I would like to see the code.

  5. #15
    WebProWorld MVP mikmik
    Join Date
    Aug 2003
    Posts
    1,557

    Re: free internal site search code ?

    Quote Originally Posted by kgun
    If there is such a thing as a Web 2.0 site search engine/function, it is one especially well suited to XML, its family of technologies and XML parsers. It should be no problem for a clever programmer who knows:
    1. XML, XPath, XPointer and XLink
    2. PHP or another server-side language with good XML parsers
    3. AJAX
    to build a fairly advanced site search engine with Google Suggest functionality. Maybe there is one already.

    You find more information in my Web 2.0 static link collection of resources. Especially the links with anchor text:
    • Improving Web linking using XLink.
    • XML linking language.
    These technologies make it possible to have dynamic and generic links, multi-source, multi-destination links and much much more. Using link bases, location sets, archs and assiciations between these sets in your XML documents, the search engine may be fairly advanced and flexible.

    Here is a link base example:

    "Example: Annotating a Specification

    Following is a non-normative set of declarations for an extended link that specializes in providing linkbase arcs:

    Code:
    <!ELEMENT basesloaded ((startrsrc|linkbase|load)*)>
    <!ATTLIST basesloaded  xlink:type      (extended)      #FIXED "extended">
    <!ELEMENT startrsrc EMPTY>
    <!ATTLIST startrsrc  xlink:type      (locator)       #FIXED "locator"  xlink:href      CDATA           #REQUIRED  xlink:label     NMTOKEN         #IMPLIED>
    <!ELEMENT linkbase EMPTY>
    <!ATTLIST linkbase  xlink:type      (locator)       #FIXED "locator"  xlink:href      CDATA           #REQUIRED  xlink:label     NMTOKEN         #IMPLIED>
    <!ELEMENT load EMPTY>
    <!ATTLIST load  xlink:type      (arc)           #FIXED "arc"  xlink:arcrole   CDATA           #FIXED "http://www.w3.org/1999/xlink/properties/linkbase"  xlink:actuate   (onLoad                  |onRequest                  |other                  |none)          #IMPLIED  xlink:from      NMTOKEN         #IMPLIED  xlink:to        NMTOKEN         #IMPLIED>
    Following is how an XML element using these declarations might look. This would indicate that when a specification document is loaded, a linkbase full of annotations to it should automatically be loaded as well, possibly necessitating re-rendering of the entire specification document to reveal any regions within it that serve as starting resources in the links found in the linkbase.

    Code:
    <basesloaded>
     <startrsrc xlink:label="spec" xlink:href="spec.xml" /> 
     <linkbase xlink:label="linkbase" xlink:href="linkbase.xml" /> 
     <load xlink:from="spec" xlink:to="linkbase" actuate="onLoad" />
    </basesloaded>
    Following is how an XML element using these declarations might look if the linkbase loading were on request. This time, the starting resource consists of the words "Click here to reveal annotations." If the starting resource were the entire document as in the example above, a reasonable behavior for allowing a user to actuate traversal would be a confirmation dialog box".

    Code:
    <basesloaded>  
    <startrsrc    xlink:label="spec"    xlink:href="spec.xml#string-range(//*,'Click here to reveal annotations.')" /> 
     <linkbase xlink:label="linkbase" xlink:href="linkbase.xml" />  
    <load xlink:from="spec" xlink:to="linkbase" actuate="onRequest" />
    </basesloaded>
    Source: ML Linking Language (XLink) Version 1.0

    For those that only know HTML, there should be nothing revolutionary in the above markup. The difficult part is to learn the new markup languages and the important concept of XML name spaces that you use to bind different resources. Note that Internationalized Resource Identifiers (IRIs) is a generalization of an URI that is an generalization of an URL.

    So to sum up. To make a very efficient Web 2.0 XML powered site search engine with AJAX functionality (like Google suggest) the technology is there already. It is very important to think thoroughly when you structure your XML (CMS) site. Clever and smart use of tags, nodes, attributes, link bases and location sets etc. etc. may make your site search engine stand out. If you take the time to write one based on the above ideas, please cite this source and give me an example for free. Google suggest (AJAX) functionality will be much appreciated. Preferrably, use PHP paresers to make the code compact. Use streaming parsers like XMLReader to make it efficient.
    This is incredible. I have to learn XLink and XForms, but I use AJAX and, of course, the basics of XSLT and schemas (and DTDs).
    I am getting a new website this weekend, with JSP. This is XML with its longest-developed API.
    Dynamic search responses work very well using JavaScript and XML files.
    Babies don't need a vacation, but I still see them at the beach... it pisses me off! I'll go over to a little baby and say 'What are you doing here? You haven't worked a day in your life!'
    Steven Wright

  6. #16
    Junior Member
    Join Date
    Nov 2007
    Posts
    6

    Re: free internal site search code ?

    Thanks, guys.
    I like this site because I get useful info here.

  7. #17
    WebProWorld MVP mikmik

    Re: free internal site search code ?

    Fluid Dynamics, written in Perl, is the best I have ever used.

  8. #18
    WebProWorld MVP kgun

    Re: free internal site search code ?

    I am sure there is an exceedingly fast one out there written in today's "assembler", C++. My preferred platform is the one listed just above ASP.NET on this site.

    <digression>
    Why is the Russian Academy of Sciences ranked just above Harvard on that site? Here is the reason, as told by one of my mathematics professors (whom I rely on).
    1. An American mathematician (don't ask me who, or about what) claimed to have proved an unproved mathematical "theorem"; the proof was 400 pages long.
    2. In the USA they laughed at him.
    3. He travelled to Russia and held a lecture on his proof.
    4. The audience listened in silence for days.
    5. When he finished, they applauded.
    6. Even if rocket scientists sometimes crash out of orbit, mathematics is very important in economics and finance.
    Fluid dynamics involves solving partial differential equations. (Conservation laws, used in the oil industry, are parabolic as far as I remember. Are those used in fluid dynamics elliptic? Solutions can be discontinuous too, if the discontinuity is not too wild.) The most advanced are non-linear, where differential operators are defined on distributions like the Dirac delta function or, more generally, on generalized functions (scroll down to the lecture by Michael Oberguggenberger with the title "Nonlinear SDEs: Colombeau solutions and pathwise limits").

    Evolution is not always continuous, but may proceed in steps, known from genetic algorithms as mutation.

    This is not a linear or continuous reality (look in the right column).

    Recommended deeper search.

    Today I got an idea:

    Project for a student.

    Write a book with the following title: "From tagging, via XSL rule based programming, procedural and object oriented programming in C++ to distributed object programming and patterns in Beta".

    There are some threads in the C++ subforum of my forum that show how compact, generic and flexible C++ is.

    This is definitely not W3 Schools stuff at present.

    P.S. Is Beta still ahead of its time, or are there some true pattern-based languages out there?
    </digression>

    Note: Perl and PHP are scripting languages. PHP is interpreted. I do not know Perl, so it may be compiled to machine code on the server.

    The version of Borland C++ that I have could combine C++ code with inline assembler code. I once saw an example of a fractal programmed in C (which may be faster than C++ because of the overhead in C++) and the same program written in assembler (you may use conditional compilation to solve problems with different assemblers). The assembler version was 100 times faster.

    H. W. Stockman, "Fast Fractals", Micro Cornucopia #43 (Sept-Oct 1988), pp. 22-29.

    <cite>
    This is one of our trick articles. The fractals are really just a sneaky way to get our attention. After all, how many of you would read an article about speeding up 386 software by a factor of 100? (On second thought ...)
    </cite>

    Fast computers or fast code or both?

    Example chess program written in Borland C++ Builder in 1995. Click on the link "sjakkprogram".

    mikmik, if you beat that program 10 times out of 100 when the computer has 5 seconds per move, you are definitely a better chess player than me. I am an unhappy amateur.

    PM me the code / link if you find one in C++.

  9. #19
    Junior Member
    Join Date
    Dec 2007
    Posts
    4

    Re: free internal site search code ?

    OK guys, nice to see this post, but:
    Does that code work for a search engine without a database?
    All my pages are static files in folders on the server.
    Can I use it?
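For static pages, one possible approach (a sketch, not tested against any particular site) is to read each `.html` file with PHP's `DOMDocument`, pull out the `<title>` and the `keywords`/`description` meta tags, and match against those, much like the XML code earlier in the thread. `searchStaticPages()` and the demo files are hypothetical, and the page body is not searched.

```php
<?php
// Search the <title> and the keywords/description meta tags of every .html file in $dir.
function searchStaticPages(string $dir, string $term): array
{
    $hits = [];
    foreach (glob(rtrim($dir, '/') . '/*.html') ?: [] as $path) {
        $dom = new DOMDocument();
        // Real-world HTML is rarely valid XML; @ hides the parser's warnings
        if (!@$dom->loadHTMLFile($path)) {
            continue;
        }
        $fields = [];
        $titles = $dom->getElementsByTagName('title');
        $fields[] = $titles->length > 0 ? $titles->item(0)->textContent : '';
        foreach ($dom->getElementsByTagName('meta') as $meta) {
            $name = strtolower($meta->getAttribute('name'));
            if ($name === 'keywords' || $name === 'description') {
                $fields[] = $meta->getAttribute('content');
            }
        }
        foreach ($fields as $field) {
            if (stripos($field, $term) !== false) {
                $hits[] = basename($path);
                break; // one match per page is enough
            }
        }
    }
    return $hits;
}

// Demo with three throw-away pages
$dir = sys_get_temp_dir() . '/static_demo_' . getmypid();
is_dir($dir) || mkdir($dir);
file_put_contents("$dir/a.html", '<html><head><title>Bug tracker</title></head><body></body></html>');
file_put_contents("$dir/b.html", '<html><head><title>Home</title><meta name="description" content="Our bug reports"></head><body></body></html>');
file_put_contents("$dir/c.html", '<html><head><title>Contact</title></head><body>bug</body></html>');
$found = searchStaticPages($dir, 'bug');
print_r($found);
```

a.html matches on its title and b.html on its description; c.html only mentions the term in the body, which this sketch ignores.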

  10. #20

    Re: free internal site search code ?

    "If you structure your site very well, you can make a very efficient site search engine in my view. It is not so very difficult to modify the code to your own needs. That site search function will retrieve content by:
    • keywords,
    • titles,
    • and description
    and display those pieces that" - that's a good idea.


