Build JavaScript Search Dialog


This command is available from the Authoring menu in the FAR main window.

Introduction

Use it to build a JavaScript-based search engine that works with loose HTML files at any location (DVD, CD, intranet, Internet, USB key drive, etc.). All modern browsers are fully JavaScript enabled. We have successfully tested browsers as old as MS Internet Explorer 4, Firefox 1.0, Opera 7.5, Safari 3.x and Netscape 7.2.

The command is disabled until you add HTML files [.htm|.html|.asp|.hta|.mht] to the FAR file list. Add only the HTML files you want to rip into the JS search data. Click Build to collect all search terms from all HTML files and generate a searchdata.js file, then go to the Test page and generate a test search system. You can use the result directly on your own help web, but the output file searchdata.js is primarily consumed by the FAR uncompressed help dialog (see Authoring > Make Uncompressed Help dialog), which generates TOC, Index and Search navigation for your site.

You can see a real-world example of a FAR JavaScript search system on the Helpware web site.

About FAR version 5.x Unicode

Past versions of FAR were ANSI based, which meant you were limited to working with a single foreign language (which had to match the operating system language) plus English (which is part of every ANSI code page). This was very limiting and caused a lot of confusion.

FAR v5 is now Unicode, which means you can work with all languages simultaneously. The search database can contain search terms from German, French, Japanese, Chinese, etc., all together.

If you have foreign-language source HTML files that do not match the operating system language, make sure they are in Unicode (UTF-8 or UTF-16) format so FAR can index them correctly.

CHM2WEB Compatible

We have attempted to make the FAR JavaScript search data file (searchdata.js) fully compatible with the search data file of CHM2WEB, a product of A!K Research Labs. The file format is used with permission. This means you could conceivably regenerate a CHM2WEB searchdata.js file using FAR without performing the full port from the original CHM file. CHM2WEB is an award-winning product, perfect for porting a CHM to an HTML web along with Contents, Index and Search. We highly recommend it.

Licensing

Please respect our intellectual property by asking permission before redistributing or sharing source with other web developers and authors. As far as your web site is concerned you don't need to ask permission to ship our files as long as you are a licensed user of FAR HTML.

Build Page
Output File

Base Dir: This read-only field is set when you add files to the FAR file list (FAR main window). Normally it is the root folder of your web site. All search data files will be created or copied to this location.

File name: This is the name of the search data file that will be created when the Build button is clicked. Normally it should be left as "searchdata.js". If you change the name, be sure to either rename the generated file back to searchdata.js later OR modify the search HTML file that uses this file.

You can also enter the name of a .HTM file. FAR will create the HTML file, collect all search words found in the FAR file list, and add them to a hidden <DIV> section in the HTML file. For more info see Special Use.
Search Options

These options affect which search terms FAR will store in the searchdata.js file.

Ignore Words x chars long and smaller: Check this option to exclude words up to X characters in length.

Ignore words containing only numbers: Check this option to exclude words containing only numbers, e.g. 1234 or 911.

Use Stop Word List file: Enter the full path of a file containing a list of words to exclude. This is a simple text file containing one word per line (the same type of file authors use in MS compiled HTML Help). If you use non-English characters, it is best to make sure the file format is Unicode (UTF-8 is preferred).

Additional search chars: By default FAR gathers all words containing alphanumeric characters, "-" and "_". Normally, if a character such as "\" is found in a word, FAR treats it like a space and breaks the word into two words. If you want to allow words containing, say, "\" and "/", simply enter \ and / into the entry field.

Alternatively you can place all your extra chars into a text file. Enter the full path of the text file into the entry field. Spaces and Control characters (such as line feeds and tabs chars) will be ignored.
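
As a rough illustration of how the options above interact, here is a minimal JavaScript sketch. It is not FAR's actual indexing code: the function name extractTerms and the option names (minLength, ignoreNumbers, stopWords, extraChars) are made up for this example, and the exact word rules FAR applies may differ.

```javascript
// Sketch only: hypothetical names, not FAR's real indexer.

// Escape the characters that are special inside a [...] character class.
function escapeForCharClass(chars) {
  return chars.replace(/[\\\]\[^-]/g, "\\$&");
}

function extractTerms(text, options) {
  const opts = Object.assign(
    { minLength: 0, ignoreNumbers: false, stopWords: [], extraChars: "" },
    options
  );
  // Default word characters: letters, digits, "-" and "_", plus any extras.
  // Spaces and control characters in the extras are ignored.
  const extra = escapeForCharClass(opts.extraChars.replace(/\s/g, ""));
  const wordPattern = new RegExp("[\\p{L}\\p{N}_\\-" + extra + "]+", "gu");
  const stop = new Set(opts.stopWords.map((w) => w.toUpperCase()));

  const terms = new Set();
  for (const word of text.match(wordPattern) || []) {
    const term = word.toUpperCase();
    if (term.length <= opts.minLength) continue; // "x chars long and smaller"
    if (opts.ignoreNumbers && /^\p{N}+$/u.test(term)) continue; // only numbers
    if (stop.has(term)) continue; // stop word list
    terms.add(term);
  }
  return Array.from(terms);
}

const terms = extractTerms("See C:\\docs\\readme for the 911 info", {
  minLength: 2,
  ignoreNumbers: true,
  stopWords: ["the", "for"],
  extraChars: "\\ /",
});
console.log(terms.join(", ")); // SEE, \DOCS\README, INFO
```

Without the extra characters, the same text would yield DOCS and README as two separate terms, because "\" would act as a word break.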

FAR v5 Note: In the past (FAR v4 ANSI) to do Japanese or Chinese you had to include every foreign language character in this list. Under FAR v5 Unicode this is no longer required.

Select a Code Page to use when reading ANSI files: If you are reading foreign-language ANSI files, select the Code Page here so that FAR can read them correctly. Older versions of FAR required you to set the code page in the Windows Control Panel. If you have a mixture of foreign languages then you will need to convert your source files to Unicode (UTF-8 is probably best because of its smaller size).

Break apart CJK text: Chinese/Japanese/Korean pictogram character text is often _not_ broken into words by space characters and punctuation. This is not a problem for search, since checking the "Partial match" checkbox on the search form will always find the text. If you check this option, all CJK paragraphs are broken into as many substrings as possible and added to the database. "Partial match" is then no longer required, but the search data file becomes considerably larger, which could cause the search page to load very slowly on the web. We recommend you leave this unchecked and train your users to use "Partial match".
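
To get a feel for why this option bloats the data file, consider the sketch below. FAR's exact splitting rule is not described here, so the sketch assumes the worst case, where every contiguous substring of a CJK run becomes a search term. The function name allSubstrings is hypothetical.

```javascript
// Sketch under an assumption: every contiguous substring of a CJK run is indexed.
function allSubstrings(run) {
  const chars = Array.from(run); // split by code point, not by UTF-16 unit
  const out = new Set();
  for (let start = 0; start < chars.length; start++) {
    for (let end = start + 1; end <= chars.length; end++) {
      out.add(chars.slice(start, end).join(""));
    }
  }
  return out;
}

const run = "検索機能"; // 4 distinct characters
console.log(allSubstrings(run).size); // 10, i.e. 4 * 5 / 2
```

Under that assumption the number of terms grows roughly with the square of the run length, so a 40-character sentence could contribute up to 820 entries on its own.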

Build searchdata.js File

This button kicks off the generation of a new searchdata.js file into the Base Directory. If you have a lot of files to process it can take a long time to scan them all for search terms. In this case go and get a cup of coffee. :-)

Note that FAR only knows how to parse HTML files for words. If you include any other file types in the file list FAR will ignore them. FAR considers the following file types to be HTML: [.htm|.html|.asp|.hta|.mht]. This list can be modified by editing the Settings.ini file (in the FAR.EXE folder).

[MAIN]
HtmlHelpFileTypes = .htm|.html|.asp|.hta|.mht

However remember that other functions in FAR also use this rule. Also note that the HTML parsing function is expecting to find at least one <body> tag.
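
To make the format of the HtmlHelpFileTypes value concrete, here is a small JavaScript sketch of how a pipe-separated extension list like this could be applied to a file name. The function name isHtmlFile is hypothetical, and the case-insensitive comparison is an assumption, not documented FAR behavior.

```javascript
// Sketch: apply a pipe-separated extension list such as ".htm|.html|.asp|.hta|.mht".
// Case-insensitive matching is an assumption made for this example.
function isHtmlFile(fileName, typeList) {
  const extensions = typeList.split("|").map((e) => e.trim().toLowerCase());
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false; // no extension at all
  return extensions.includes(fileName.slice(dot).toLowerCase());
}

const types = ".htm|.html|.asp|.hta|.mht";
console.log(isHtmlFile("index.HTM", types));  // true
console.log(isHtmlFile("manual.pdf", types)); // false
```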

Progress Panel

Above the Build button are some progress fields: the name of the file currently being scanned, the current file number / total files to be scanned, and the words found in the current file / total words found so far.

Test Page
Main search files required by FAR JS Search.

searchdata.js: This is the data file of search terms you generated on the build page.

search.js: Contains functions used to search the data file, list and display results. Do not edit this file unless you have a very good reason to do so.

search0?.html, search0?.frame.html: Search form examples. Open the frame file to test your search. You can modify these files to create your own search forms.

Apart from searchdata.js, which is created from scratch, all these files live in the .\extra\ folder below the FAR.EXE folder.

Controls

Drop Down Control - This control contains a few simple search form examples. Select a search form then click the Create button.

Create button - Press the Create button to copy all search files into the Base Directory (searchdata.js is already there). Create also opens the frame file so you can test your search engine.

Windows Explorer - This button opens Windows Explorer at the current Base Directory so you can examine the files.

Tutorial

The procedure is quite simple.

  1. Add to the main FAR file list all the files you want to include into searchdata.js.
    I normally set my Drop Filter to ".HTM* file only" then drag and drop the root folder of my local web onto FAR.
     
  2. Open this window and click "Build searchdata.js File". This collects all the search terms in all the HTML files and generates a data file called searchdata.js (or whatever Output file name you specified) in the base directory (which is set when you add files to the FAR file list).
     
  3. Go to the Test page, select a search form example and click "Create". This copies the selected example files into your Base Dir and opens the search in your browser so you can test it.
How many files can JS Search handle?

It depends on how big your searchdata.js file becomes. Webs containing thousands of files can end up with searchdata.js files several MBytes in size. For a web site this is a very big download. For a local file on the hard disk of a fast PC this is not a problem. On slower computers the user may experience a slight delay when the search form loads. In general a few thousand average sized topic files is not a problem. Experiment and test for yourself. Once the data loads the search operation itself is relatively fast. If the data file becomes too big then try breaking the web site into several logical areas and generate a search data file for each area.

How do I search non-html files?

You could create HTML files containing nothing but search keywords that, when opened, divert the user to a non-HTML file such as a PDF or JPG file. In this way you could indirectly allow non-HTML files to be present in the search results generated by any search engine. See Special Use below.
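
As an illustration, the JavaScript sketch below builds such a keyword-only stub page as a string. It is a hand-rolled example, not the file FAR generates: the function names, the meta refresh redirect and the hidden DIV layout are all assumptions chosen for this sketch.

```javascript
// Sketch: build a keyword-only stub page that forwards to a non-HTML target.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildStubPage(title, keywords, targetFile) {
  const target = escapeHtml(targetFile);
  return [
    "<html>",
    "<head>",
    "  <title>" + escapeHtml(title) + "</title>",
    // Redirect to the real document as soon as the stub is opened.
    '  <meta http-equiv="refresh" content="0; url=' + target + '">',
    "</head>",
    "<body>",
    // Keywords for the indexer; hidden from the reader.
    '  <div style="display:none">' + escapeHtml(keywords.join(" ")) + "</div>",
    // Fallback link in case the redirect is blocked.
    '  <a href="' + target + '">' + escapeHtml(title) + "</a>",
    "</body>",
    "</html>",
  ].join("\n");
}

const page = buildStubPage("Install Guide", ["install", "setup", "license"], "manual.pdf");
console.log(page);
```

The stub keeps a title and a body tag so that it can be parsed and listed like any other topic.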

Search Form Examples
Search00.HTML

This is a multi-line search form. See our web site for a live example of this form. It works best in the left frame of a web site. The result list is configured with Target="right" so that when you click a result item the topic page opens in the right-hand pane (the pane with name="right"). If your right pane is called, say, "contents", simply change the last parameter of the DoSearch() function from 'right' to 'contents'. You can find this call by searching the code in the search form Search00.HTML.

Transplanting code from Search00.HTML into your own search page is simple. There are 2 main sections: one in the Head section (containing the JavaScript include statements) and one in the Body section (containing the search form). These sections are clearly marked with comment tags.

<HTML>
<HEAD>
  <META http-equiv="Content-Type" content="text/html; charset=windows-1252">
  <TITLE>Search</TITLE>

  <!-- BEGIN_SEARCH_SCRIPT -->
  <!-- Helpware Search data generated by FAR.EXE http://helpware.net/FAR/ -->
  <script language="JavaScript" src="searchdata.js"></script>
  <!-- Search Code to find words -->
  <script language="JavaScript" src="search.js"></script>
  <!-- END_SEARCH_SCRIPT -->

</HEAD>
<BODY bgcolor="white" topmargin="0" leftmargin="0" rightmargin="0" bottommargin="0">

<!-- BEGIN_SEARCH_FORM -->
<form name="searchform">
  <script language="javascript">
  <!--
  function SearchForWords() {DoSearch(searchform.searchdata1.value, searchform.searchdata2.value, searchform.searchdata3.value, searchform.searchdata4.checked, 'right');}
  function KeyDownEvent() {if ((event.which && event.which==13) || (event.keyCode && event.keyCode==13)) {
    SearchForWords(); return(false); } else return(true);}
  -->
  </script>
  <table width="100%" height="100%" cellpadding="8">
    <tr height="10%">
    <td bgcolor="#FFFFEE">
    Topics containing any of these words: <input type="text" name="searchdata1" style="width:100%" onKeyDown="javascript:return(KeyDownEvent());"><br>
    Topics containing all of these words: <input type="text" name="searchdata2" style="width:100%" onKeyDown="javascript:return(KeyDownEvent());"><br>
    Topics should not contain these words: <input type="text" name="searchdata3" style="width:100%" onKeyDown="javascript:return(KeyDownEvent());"><br>
    <input type="checkbox" value="1" name ="searchdata4"> Partial word matching<br>
    <br>
    <input name="searchbutton" type="button" value="Search" style="width:150" onclick="javascript:SearchForWords();"></td>
    </tr>
    <tr height="90%">
    <td>
    <select name="SearchResultList" size="500" style="width:100%; height:100%; border-color:#FF0000" onchange="javascript:OpenResultListDoc()"></select>
    </td>
    </tr>
  </table>
</form>
<!-- END_SEARCH_FORM -->

</BODY>
</HTML>
Search00.HTML Walkthrough

The Head section code (above) is simple enough. It simply includes both JavaScript files searchdata.js & search.js.

The Form section in the Body of the document is slightly more complex.

Function SearchForWords() is called whenever the search button is clicked (or Enter is pressed in one of the entry fields). It calls DoSearch(), the main function in search.js, passing it the search query terms entered. search.js in turn populates the listbox "SearchResultList" with results. If no list box called "SearchResultList" is defined then the search results are simply written to a new HTML document.

DoSearch(s1, s2, s3, PartialMatch, Target);

S1 (OR)
Passes user entered search words from Form field 1. By default all space separated words are OR'd together during the search. This means that results will be those topics that contain any of these search words.
S2 (AND)
Passes user entered search words from Form field 2. By default all space separated words are AND'd together during the search. This means that results will be those topics that contain all of these search words.
S3 (NOT)
Passes user entered search words from Form field 3. By default all space separated words are NOT'd together during the search. This means that results will be those topics that do not contain any of these search words.
PartialMatch
This Boolean parameter allows you to perform partial matches, e.g. when TRUE the search term "nut" will match topics containing whole words "nut", "nuts", "beernuts" etc.
Target
This is where a document will open when you click on a result list item. If your right-hand frame has name="right" then set this parameter to "right" if that's where you want the result topic to be opened.

The nice thing about the Search00 example is that users don't need to use the key words OR and AND and NOT. They just need to enter search terms into the appropriate entry fields. You could however alter the form and use only one entry field. If you wanted the default action to be "words OR'd" then you would pass all search words entered in the first parameter S1. Parameters S2 and S3 would be left blank. If you want the default action to be "Words AND'd" then pass all search words entered in the second parameter S2 (S1 & S3 left blank). Other examples show you how to do this.
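
The single-field variations described above can be sketched as two small wrapper functions. The DoSearch() parameter order is taken from the walkthrough; the wrapper names are hypothetical, and the DoSearch() defined here is only a stand-in that records its arguments so the sketch runs on its own. In a real search page you would delete the stand-in and rely on the DoSearch() in search.js.

```javascript
// Stand-in for the real DoSearch() from search.js, so the sketch runs on its own.
// It only records the arguments it was called with.
const calls = [];
function DoSearch(s1, s2, s3, partialMatch, target) {
  calls.push({ s1, s2, s3, partialMatch, target });
}

// Single entry field, default action "words OR'd": everything goes in S1.
function searchAnyWord(query, partialMatch, target) {
  DoSearch(query, "", "", partialMatch, target);
}

// Single entry field, default action "words AND'd": everything goes in S2.
function searchAllWords(query, partialMatch, target) {
  DoSearch("", query, "", partialMatch, target);
}

searchAnyWord("dogs cats", false, "right");
searchAllWords("dogs cats", true, "contents");
console.log(calls);
```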

OR, AND, NOT Keywords

With the first 2 fields the user can control whether a search term is OR'd, AND'd or NOT'd by prefixing each search term with the keyword OR, AND or NOT, just as in MS Help or Google.

Example: Here's what happens for a search spec of "Dogs or Cats and Birds not Fish".

  1. Results list = All topics found containing the word "Dogs"
  2. Results list = Add topics containing the word "Cats"
  3. Results list = Reduce the results to topics containing the word "Birds".
  4. Results list = Reduce the results to topics not containing the word "Fish".

To put it another way: Topics containing ("Dogs" OR "Cats") AND "Birds" but NOT "Fish".
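
The four steps above amount to a left-to-right pass over the query, applying one set operation per term. The JavaScript sketch below shows that idea. It is not the code in search.js: the function name, the index shape (a Map of uppercase word to topic file names) and the rule that a term without a keyword falls back to a default operator are assumptions made for this example.

```javascript
// Sketch only: hypothetical index shape and function name.
function evaluateQuery(index, query, defaultOp) {
  const fallback = defaultOp || "OR";
  let results = new Set();
  let op = fallback;
  for (const token of query.toUpperCase().split(/\s+/).filter(Boolean)) {
    if (token === "OR" || token === "AND" || token === "NOT") {
      op = token; // keyword applies to the next search term
      continue;
    }
    const topics = new Set(index.get(token) || []);
    if (op === "OR") {
      topics.forEach((t) => results.add(t)); // add topics
    } else if (op === "AND") {
      results = new Set([...results].filter((t) => topics.has(t))); // keep matches
    } else {
      results = new Set([...results].filter((t) => !topics.has(t))); // drop matches
    }
    op = fallback; // next term uses the default unless prefixed again
  }
  return [...results].sort();
}

const index = new Map([
  ["DOGS", ["a.htm", "b.htm"]],
  ["CATS", ["c.htm"]],
  ["BIRDS", ["a.htm", "c.htm", "d.htm"]],
  ["FISH", ["c.htm"]],
]);
console.log(evaluateQuery(index, "Dogs or Cats and Birds not Fish")); // [ 'a.htm' ]
```

With this sample data, steps 1 and 2 collect a.htm, b.htm and c.htm, step 3 narrows the list to a.htm and c.htm, and step 4 removes c.htm.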

Highlighting search terms in the result topic

When you open a result topic FAR automatically highlights the search terms found in that file. This is a feature limited to Internet Explorer browsers only.

Result List - Examples 02 & 03

If you examine the code in search.js you will see that if the search form has a result list named "SearchResultList" then it is populated with search results. If no "SearchResultList" control is found then a new result list document is dynamically created and results are displayed in it as a simple list of links. See Examples 02 and 03 to see this in action. These are not polished examples. To modify the search results window you need to tweak the search.js function called ShowSearchResultsWindow().

Rules & Limitations
  1. The FAR JavaScript search system does not support the use of quotes and brackets. Thus you cannot for instance search for "My diary". The best you could do is search for the words MY AND DIARY.
  2. Search is not case sensitive. In fact the search database searchdata.js has all its data stored in uppercase to optimize storage size and search speed.
  3. Like MS help search engines the FAR JS search engine displays a maximum of 500 result items.
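
Rules 2 and 3, together with the PartialMatch option, can be pictured with the following JavaScript sketch. It is an illustration only: the real searchdata.js format is not shown in this topic, so the index shape (a Map of uppercase word to topic file names), the function name findTopics and the way the 500-item limit is enforced are all assumptions.

```javascript
const MAX_RESULTS = 500; // limit from rule 3

// Hypothetical index shape: Map of UPPERCASE word -> array of topic file names.
function findTopics(index, term, partialMatch) {
  const wanted = term.toUpperCase(); // rule 2: search is not case sensitive
  const found = new Set();
  for (const [word, topics] of index) {
    const hit = partialMatch ? word.includes(wanted) : word === wanted;
    if (!hit) continue;
    for (const topic of topics) {
      found.add(topic);
      if (found.size >= MAX_RESULTS) return [...found]; // stop at the limit
    }
  }
  return [...found];
}

const sample = new Map([
  ["NUT", ["a.htm"]],
  ["NUTS", ["b.htm"]],
  ["BEERNUTS", ["c.htm"]],
  ["BOLT", ["d.htm"]],
]);
console.log(findTopics(sample, "nut", false)); // [ 'a.htm' ]
console.log(findTopics(sample, "nut", true));  // [ 'a.htm', 'b.htm', 'c.htm' ]
```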
Special Use
HTML file with hidden list of keywords

Here's a novel use for this dialog that may help some authors. The problem with non-HTML files such as .PDF is that you can't search within them using the main search engine. Here's a way to get around this problem.

You could create a special HTML page that links to (or auto-loads) a .PDF file. If you add all the unique words from the PDF to a hidden section in the HTML file (and recompile), the user can search for a PDF keyword and find the HTML file.

If you use a .HTML file extension for the output file name, then this dialog will produce an HTML file with embedded keywords. This can also help when debugging (ie. since it lists all search terms found in a readable fashion).

  1. Copy the text in the .PDF file (or from the .PDF source files) into the body of a .HTM file. Remember to set the <title> tag of your HTML file.
    If the target file is a .JPG Image file then type some appropriate search keywords into the HTML file.
     
  2. Add the .HTM file to the FAR file list.
     
  3. Open the "Build JavaScript Search" dialog and set the output file name to use a .HTM or .HTML file extension.

Build will now create an HTML file with a hidden list of keywords.

Variations

Edit the HTML file and replace "XXX.PDF" with the real target filename.

 


http://www.HelpwareGroup.com/