diff --git a/docs/demo.html b/docs/demo.html index fe1fdb62779..0c4aecaf01e 100644 --- a/docs/demo.html +++ b/docs/demo.html @@ -114,8 +114,8 @@ limitations under the License.

-This document is intended as a "getting started" guide to using and running the -Apache Lucene demos. It walks you through some basic installation and configuration. +This document is intended as a "getting started" guide to using and running the Lucene demos. +It walks you through some basic installation and configuration.

@@ -131,9 +131,8 @@ Apache Lucene demos. It walks you through some basic installation and configura

-The Lucene Demo code is a set of command line example applications that demonstrate various -functionality of Lucene and how one should go about adding it to their -applications. +The Lucene command-line demo code consists of two applications that demonstrate various +functionalities of Lucene and how one should go about adding Lucene to their applications.

@@ -143,22 +142,22 @@ applications. @@ -197,15 +196,15 @@ Tomcat.

@@ -242,8 +242,7 @@ index (or you skipped the step of creating it).

- Setting your classpath + Setting your CLASSPATH

-First, extract the latest Lucene distribution. +First, you should download the +latest Lucene distribution and then extract it to a working directory. Alternatively, you can check out the sources from +Subversion, and then run ant war-demo to generate the JARs and WARs.

-You should see the Apache Lucene jar file in the directory you created -when you extracted the archive. It should be named something like -lucene-{version}.jar. -

-

-You should also see a file called called lucene-demos-{version}.jar. -Put both of these files in your Java CLASSPATH. +You should see the Lucene JAR file in the directory you created when you extracted the archive. It +should be named something like lucene-core-{version}.jar. You should also see a file +called lucene-demos-{version}.jar. If you checked out the sources from Subversion then +the JARs are located under the build subdirectory (after running ant +successfully). Put both of these files in your Java CLASSPATH.

@@ -174,18 +173,27 @@ Put both of these files in your Java CLASSPATH.

-Once you've gotten this far you're probably itching to go. Let's build an index! -Assuming you've set your classpath correctly, just type -"java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src". This will produce -a subdirectory called "index" which will contain an index of all of the Lucene -sourcecode. +Once you've gotten this far you're probably itching to go. Let's build an index! Assuming +you've set your CLASSPATH correctly, just type: + +

+    java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
+
+ +This will produce a subdirectory called index which will contain an index of all of the +Lucene source code.

- To search the index type "java org.apache.lucene.demo.SearchFiles". You'll be prompted -for a query. Type in a swear word and press the enter key. You'll see that the Lucene -developers are very well mannered and get no results. Now try entering the word "vector". -That should return a whole bunch of documents. The results will page at every tenth -result and ask you whether you want more results. +To search the index type: + +

+    java org.apache.lucene.demo.SearchFiles
+
+ +You'll be prompted for a query. Type in a swear word and press the enter key. You'll see that the +Lucene developers are very well mannered and get no results. Now try entering the word "vector". +That should return a whole bunch of documents. The results will page at every tenth result and ask +you whether you want more results.

diff --git a/docs/demo2.html b/docs/demo2.html index 186c0d5693a..0983f595d41 100644 --- a/docs/demo2.html +++ b/docs/demo2.html @@ -34,7 +34,7 @@ limitations under the License. - Apache Lucene - Apache Lucene - Basic Demo Sources Walkthrough + Apache Lucene - Apache Lucene - Basic Demo Sources Walk-through @@ -114,9 +114,9 @@ limitations under the License.

-In this section we walk through the sources behind the basic Lucene demo such as where to -find it, its parts and their function. This section is intended for Java developers -wishing to understand how to use Apache Lucene in their applications. +In this section we walk through the sources behind the command-line Lucene demo: where to find them, +their parts and their function. This section is intended for Java developers wishing to understand +how to use Lucene in their applications.

@@ -132,14 +132,14 @@ wishing to understand how to use Apache Lucene in their applications.

-Relative to the directory created when you extracted Lucene or retreived it from Subversion, you -should see a directory called "src" which in turn contains a directory called "demo". -This is the root for all of the Lucene demos. Under this directory is org/apache/lucene/demo, -this is where all the Java sources live. +Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you +should see a directory called src which in turn contains a directory called +demo. This is the root for all of the Lucene demos. Under this directory is +org/apache/lucene/demo. This is where all the Java sources for the demos live.

-Within this directory you should see the IndexFiles class we executed earlier. Bring that -up in vi or your alternative text editor and lets take a look at it. +Within this directory you should see the IndexFiles.java class we executed earlier. +Bring it up in vi or your editor of choice and let's take a look at it.

@@ -155,43 +155,45 @@ up in vi or your alternative text editor and lets take a look at it.

-As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene Index. -Lets take a look at how it does this. +As we discussed in the previous walk-through, the IndexFiles class creates a Lucene +Index. Let's take a look at how it does this.

-The first substantial thing the main function does is instantiate an instance -of IndexWriter. It passes a string called "index" and a new instance of a class called -"StandardAnalyzer". The "index" string is the name of the directory that all index information -should be stored in. Because we're not passing any path information, one must assume this -will be created as a subdirectory of the current directory (if it does not already exist). On -some platforms this may actually result in it being created in other directories (such as -the user's home directory). +The first substantial thing the main function does is instantiate IndexWriter. It passes the string +"index" and a new instance of a class called StandardAnalyzer. +The "index" string is the name of the filesystem directory where all index information +should be stored. Because we're not passing a full path, this will be created as a subdirectory of +the current working directory (if it does not already exist). On some platforms, it may be created +in other directories (such as the user's home directory).

-The IndexWriter is the main class responsible for creating indicies. To use it you -must instantiate it with a path that it can write the index into, if this path does not -exist it will create it, otherwise it will refresh the index living at that path. You -must a also pass an instance of org.apache.lucene.analysis.Analyzer. +The IndexWriter is the main +class responsible for creating indices. To use it you must instantiate it with a path that it can +write the index into. If this path does not exist it will first create it. Otherwise it will +refresh the index at that path. You can also create an index using one of the subclasses of Directory. In any case, you must also pass an +instance of org.apache.lucene.analysis.Analyzer.

-The Analyzer, in this case, the StandardAnalyzer is little more than a standard Java -Tokenizer, converting all strings to lowercase and filtering out useless words and characters from the index. -By useless words and characters I mean common language words such as articles (a, an, the, etc.) and other -strings that would be useless for searching (e.g. 's) . It should be noted that there are different -rules for every language, and you should use the proper analyzer for each. Lucene currently -provides Analyzers for English and German, more can be found in the Lucene Sandbox. +The particular Analyzer we +are using, StandardAnalyzer, is +little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out +useless words and characters from the index. By useless words and characters I mean common language +words such as articles (a, an, the, etc.) and other strings that would be useless for searching +(e.g. 's). It should be noted that there are different rules for every language, and you +should use the proper analyzer for each. Lucene currently provides Analyzers for a number of +different languages (see the *Analyzer.java sources under contrib/analyzers/src/java/org/apache/lucene/analysis).
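The analysis step described in this hunk (lowercasing, dropping articles, discarding strings such as 's) can be illustrated with a small stand-alone sketch. This is not Lucene's StandardAnalyzer and it uses no Lucene classes; the class name `ToyAnalyzer`, its `analyze` method and the three-word stop list are hypothetical and exist only to show the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: this is NOT Lucene's StandardAnalyzer.
final class ToyAnalyzer {
    // Hypothetical stop-word list; a real analyzer ships a longer, language-specific one.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on every run of characters that is not a letter, digit or apostrophe.
        for (String raw : text.split("[^\\p{L}\\p{Nd}']+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            // Drop a trailing possessive 's, which is useless for searching.
            if (token.endsWith("'s")) {
                token = token.substring(0, token.length() - 2);
            }
            // Remove any apostrophes that are left over.
            token = token.replace("'", "");
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Vector's length is an integer"));
    }
}
```

Because the same function would be applied to documents at indexing time and to query text at search time, a query for "Vector" and a document containing "vector's" end up with the same token.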

-Looking down further in the file, you should see the indexDocs() code. This recursive function -simply crawls the directories and uses FileDocument to create Document objects. The Document -is simply a data object to represent the content in the file as well as its creation time and -location. These instances are added to the indexWriter. Take a look inside FileDocument. It's -not particularly complicated, it just adds fields to the Document. +Looking further down in the file, you should see the indexDocs() code. This recursive +function simply crawls the directories and uses FileDocument to create Document objects. The Document is simply a data object to +represent the content in the file as well as its creation time and location. These instances are +added to the indexWriter. Take a look inside FileDocument. It's not particularly +complicated. It just adds fields to the Document.
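The recursive crawl that the indexDocs() paragraph describes has roughly the following shape. This is a sketch under assumptions, not the demo's actual source: `CrawlSketch` and `collectFiles` are hypothetical names, and where the demo would build a Document and add it to the writer, the sketch only collects the file into a list.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of a recursive directory crawl; not the demo's IndexFiles code.
final class CrawlSketch {
    static void collectFiles(File file, List<File> out) {
        // Skip anything that does not exist or cannot be read.
        if (!file.canRead()) {
            return;
        }
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            // listFiles() returns null if an I/O error occurs.
            if (children != null) {
                Arrays.sort(children); // deterministic visiting order
                for (File child : children) {
                    collectFiles(child, out);
                }
            }
        } else {
            // The demo would create a Document for the file at this point.
            out.add(file);
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        List<File> files = new ArrayList<>();
        collectFiles(root, files);
        for (File f : files) {
            System.out.println(f.getPath());
        }
    }
}
```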

As you can see there isn't much to creating an index. The devil is in the details. You may also -wish to examine the other samples in this directory, particularly the IndexHTML class. It is -a bit more complex but builds upon this example. +wish to examine the other samples in this directory, particularly the IndexHTML class. It is a bit more +complex but builds upon this example.

@@ -207,12 +209,19 @@ a bit more complex but builds upon this example.

-The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer -(which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed -with an analyzer used to interperate your query in the same way the Index was interperated: finding -the end of words and removing useless words like 'a', 'an' and 'the'. The Query object contains the -results from the QueryParser which is passed to the searcher. The searcher results are returned in -a collection of Documents called "Hits" which is then iterated through and displayed to the user. +The SearchFiles class is +quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer +(which is used in the IndexFiles class as well) and a +QueryParser. The +query parser is constructed with an analyzer used to interpret your query text in the same way the +documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and +'the'. The Query object contains +the results from the QueryParser which is passed to +the searcher. Note that it's also possible to programmatically construct a rich Query object without using the query +parser. The query parser just enables decoding the Lucene query +syntax into the corresponding Query object. The searcher results are +returned in a collection of Documents called Hits which is then iterated through and +displayed to the user.

diff --git a/docs/demo3.html b/docs/demo3.html index 3c5402d398b..e0426bde4f4 100644 --- a/docs/demo3.html +++ b/docs/demo3.html @@ -114,11 +114,10 @@ limitations under the License.

-This document is intended as a "getting started" guide to installing and running the -Apache Lucene web application demo. This guide assumes that you have read the -information in the previous two examples or already know it anyhow. We'll use -Tomcat 4.0.1 as our reference web container. These demos should work with nearly -any container, but it is up to you to adapt them appropriately. +This document is intended as a "getting started" guide to installing and running the Lucene +web application demo. This guide assumes that you have read the information in the previous two +examples. We'll use Tomcat as our reference web container. These demos should work with nearly any +container, but you may have to adapt them appropriately.

@@ -134,12 +133,11 @@ any container, but it is up to you to adapt them appropriately.

-The Lucene Web Application demo is a template web application intended for deployment -on Tomcat or a similar web container. It's NOT designed as a "best practices" -implementation by ANY means. It's more of a "hello world" type Lucene Web App. -The purpose of this application is to demonstrate Lucene. With that being said, -it should be relatively simple to create a small searchable website in Tomcat or -a similar application server. +The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a +similar web container. It's NOT designed as a "best practices" implementation by ANY means. It's +more of a "hello world" type Lucene Web App. The purpose of this application is to demonstrate +Lucene. With that being said, it should be relatively simple to create a small searchable website +in Tomcat or a similar application server.

@@ -154,18 +152,19 @@ a similar application server.
-

-Once you've gotten this far you're probably itching to go. -Let's start by creating the index you'll need for the web examples. -Since you've already set your classpath in the previous examples, -all you need to do is type - "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..". -You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory (make sure you didn't leave off the ".." or you'll get a null pointer -exception). -{index-dir} -should be a directory that Tomcat has permission to read and write, but is -outside of a web accessible context. By default the webapp is configured -to look in /opt/lucene/index for this index. +

Once you've gotten this far you're probably itching to go. Let's start by creating the index +you'll need for the web examples. Since you've already set your CLASSPATH in the previous examples, +all you need to do is type: + +

+    java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
+
+ +You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory +(make sure you didn't leave off the .. or you'll get a null pointer exception). +{index-dir} should be a directory that Tomcat has permission to read and write, but is +outside of a web accessible context. By default the webapp is configured to look in +/opt/lucene/index for this index.

@@ -180,10 +179,10 @@ to look in /opt/lucene/index for this index.
-

Located in your distribution directory you should see -a war file called luceneweb.war. Copy this to your -{tomcat-home}/webapps directory. You may need to restart -Tomcat.

+

Located in your distribution directory you should see a war file called +luceneweb.war. If you're working with a Subversion checkout, this will be under the +build subdirectory. Copy this to your {tomcat-home}/webapps directory. +You may need to restart Tomcat.

-

-From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not -present, try browsing to "http://localhost:8080/luceneweb" then look again. -Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the -location you used for your index. You may also customize the appTitle and appFooter -strings as you see fit. Once you have finished altering the configuration you should -restart Tomcat. You may also wish to update the war file by typing -jar -uf luceneweb.war configuration.jsp from the luceneweb subdirectory. -(The -u option is not available in all versions of jar. In this case recreate the war file). +

From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not +present, try browsing to http://localhost:8080/luceneweb (which causes Tomcat to deploy +the webapp), then look again. Edit a file called configuration.jsp. Ensure that the +indexLocation is equal to the location you used for your index. You may also customize +the appTitle and appFooter strings as you see fit. Once you have finished +altering the configuration you may need to restart Tomcat. You may also wish to update the war file +by typing jar -uf luceneweb.war configuration.jsp from the luceneweb +subdirectory. (The -u option is not available in all versions of jar. In this case recreate the +war file).

@@ -220,14 +219,15 @@ restart Tomcat. You may also wish to update the war file by typing
-

Now you're ready to roll. In your browser set the url to "http://localhost:8080/luceneweb" -enter "test" and the number of items per page and press search.

-

You should now be looking either at a number of results (provided you didn't erase the -Tomcat examples) or nothing. Try other search terms. Depending on the number of items -per page you set and results returned, there may be a link at the bottom that says "more results>>", -clicking it goes to subsequent pages. If you get an error regarding opening the index, then you -probably set the path in "configuration" incorrectly or Tomcat doesn't have permissions to the -index (or you skipped the step of creating it).

+

Now you're ready to roll. In your browser set the URL to +http://localhost:8080/luceneweb, enter test and the number of items per +page, and press search.

+

You should now be looking either at a number of results (provided you didn't erase the Tomcat +examples) or nothing. If you get an error regarding opening the index, then you probably set the +path in configuration.jsp incorrectly or Tomcat doesn't have permissions to the index +(or you skipped the step of creating it). Try other search terms. Depending on the number of items +per page you set and results returned, there may be a link at the bottom that says More +Results>>; clicking it takes you to subsequent pages.

-If you want to know more about how this web app works or how to customize it then -read on>>>. +If you want to know more about how this web app works or how to customize it then read on>>>.

diff --git a/docs/demo4.html b/docs/demo4.html index d42286bb7ae..56fb075e6dd 100644 --- a/docs/demo4.html +++ b/docs/demo4.html @@ -114,10 +114,10 @@ limitations under the License.

-In this section we walk through the sources behind the basic Lucene Web Application demo. -Where to find it, its parts, and their function. This section is intended for Java developers -wishing to understand how to use Apache Lucene in their applications or for those involved -in deploying web applications based on Lucene. +In this section we walk through the sources behind the basic Lucene Web Application demo: where to +find them, their parts and their function. This section is intended for Java developers wishing to +understand how to use Lucene in their applications or for those involved in deploying web +applications based on Lucene.

@@ -133,13 +133,13 @@ in deploying web applications based on Lucene.

-Relative the directory created when you extracted Lucene or retreived it from Subversion, you -should see a directory called "src" which in turn contains a directory called "jsp". -This is the root for all of the Lucene web demo. +Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you +should see a directory called src which in turn contains a directory called +jsp. This is the root for all of the Lucene web demo.

-Within this directory you should see the index.jsp class. Bring this up in vi or your -editor of choice. +Within this directory you should see index.jsp. Bring this up in vi or your editor of +choice.

@@ -155,14 +155,12 @@ editor of choice.

-This jsp page is pretty boring by itself. All it does is include a header, display a form and -include a footer. If you look at the form, it has two fields: query (where you enter your -search criteria) and maxresults where you specify the number of results per page. If you look -at the form tag, you'll notice it uses the get method as opposed to the post. While this is -considered deprecated functionality by the latest w3c specs, its unlikely to go away due to the -usefulness of being able to bookmark things like searches. By the structure of this JSP it should -be easy to customize it without even editing this particular file. You could simply change the -header and footer. Let's look at the header.jsp (located in the same directory) next. +This jsp page is pretty boring by itself. All it does is include a header, display a form and +include a footer. If you look at the form, it has two fields: query (where you enter +your search criteria) and maxresults where you specify the number of results per page. +By the structure of this JSP it should be easy to customize it without even editing this particular +file. You could simply change the header and footer. Let's look at the header.jsp +(located in the same directory) next.

@@ -178,11 +176,11 @@ header and footer. Let's look at the header.jsp (located in the same directory)

-The header is also very simple by itself. The only thing it does is include the configuration.jsp -(which you looked at in the last section of this guide) and set the title and a brief header. This -would be a good place to put your own custom HTML to "pretty" things up a bit. We won't cover the -footer because all it does is display the footer and close your tags. Let's look at the results.jsp, -the meat of this application next. +The header is also very simple by itself. The only thing it does is include the +configuration.jsp (which you looked at in the last section of this guide) and set the +title and a brief header. This would be a good place to put your own custom HTML to "pretty" things +up a bit. We won't cover the footer because all it does is display the footer and close your tags. +Let's look at the results.jsp, the meat of this application, next.

@@ -198,43 +196,52 @@ the meat of this application next.

-The results.jsp had a lot more functionality. Much of it is for paging the search results we'll not -cover this as it's commented well enough. It does not perform any optimizations such as caching results, -etc. as that would make this a more complex example. The first thing in this page is the actual imports -for the Lucene classes and Lucene demo classes. These classes are loaded from the jars included in the -WEB-INF/lib directory in the final war file. +Most of the functionality lies in results.jsp. Much of it is for paging the search +results, which we'll not cover here as it's commented well enough. The first thing in this page is +the actual imports for the Lucene classes and Lucene demo classes. These classes are loaded from +the jars included in the WEB-INF/lib directory in the luceneweb.war file.

-You'll notice that this file includes the same header and footer as the "index.jsp". From there the jsp -constructs an IndexSearcher with the "indexLocation" that was specified in the "configuration.jsp". If there -is an error of any kind in opening the index, it is diplayed to the user and a boolean flag is set to tell -the rest of the sections of the jsp not to continue. +You'll notice that this file includes the same header and footer as index.jsp. From +there it constructs an IndexSearcher with the +indexLocation that was specified in configuration.jsp. If there is an +error of any kind in opening the index, it is displayed to the user and the boolean flag +error is set to tell the rest of the sections of the jsp not to continue.

-From there, this jsp attempts to get the search criteria, the start index (used for paging) and the maximum -number of results per page. If the maximum results per page is not set or not valid then it and the -start index are set to default values. If only the start index is invalid it is set to a default value. If -the criteria isn't provided then a servlet error is thrown (it is assumed that this is the result of url tampering -or some form of browser malfunction). +From there, this jsp attempts to get the search criteria, the start index (used for paging) and the +maximum number of results per page. If the maximum results per page is not set or not valid then it +and the start index are set to default values. If only the start index is invalid it is set to a +default value. If the criteria isn't provided then a servlet error is thrown (it is assumed that +this is the result of url tampering or some form of browser malfunction).
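The defaulting rules in this paragraph can be written down as a small pure function. The sketch below assumes the parameters arrive as raw strings; the class name `PagingParams`, the method `resolve` and the default of 50 results per page are hypothetical and are not taken from results.jsp.

```java
// Sketch of the defaulting rules described for the paging parameters.
final class PagingParams {
    // Hypothetical default; the value used by results.jsp is not stated here.
    static final int DEFAULT_MAX_RESULTS = 50;

    // Returns {startIndex, maxResults} after applying the defaults.
    static int[] resolve(String startParam, String maxParam) {
        int max = parseOrMinusOne(maxParam);
        if (max <= 0) {
            // Missing or invalid page size: reset it AND the start index.
            return new int[] {0, DEFAULT_MAX_RESULTS};
        }
        int start = parseOrMinusOne(startParam);
        if (start < 0) {
            // Only the start index is invalid: reset just that value.
            start = 0;
        }
        return new int[] {start, max};
    }

    private static int parseOrMinusOne(String value) {
        if (value == null) {
            return -1;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        int[] p = resolve("20", "10");
        System.out.println("start=" + p[0] + " max=" + p[1]);
    }
}
```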

-The jsp moves on to construct a StandardAnalyzer just as in the simple demo, to analyze the search critieria, it -is passed to the QueryParser along with the criteria to construct a Query object. You'll also notice the -string literal "contents" included. This is to specify the search should include the contents and not -the title, url or some other field in the indexed documents. If there is any error in constructing a Query -object an error is displayed to the user. +The jsp moves on to construct a StandardAnalyzer to +analyze the search text. This matches the analyzer used during indexing (IndexHTML), which is generally +recommended. This is passed to the QueryParser along with the +criteria to construct a Query +object. You'll also notice the string literal "contents" included. This specifies +that the search should cover the contents field and not the title, +url or some other field in the indexed documents. If there is any error in +constructing a Query object an +error is displayed to the user.

-In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are -returned in a collection called "hits". If the length property of the hits collection is 0 then an error -is displayed to the user and the error flag is set. +In the next section of the jsp the IndexSearcher is asked to search +given the query object. The results are returned in a collection called hits. If the +length property of the hits collection is 0 (meaning there were no results) then an +error is displayed to the user and the error flag is set.

-Finally the jsp iterates through the hits collection and displayed properties of the "Document" objects we talked -about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case -"IndexHTML" constructs a document with "url", "title" and "contents"). You'll notice that these results are paged -but the search is repeated every time. This is an area where optimization could improve performance for large -result sets. +Finally the jsp iterates through the hits collection, taking the current page into +account, and displays properties of the Document objects we talked about in +the first walk-through. These objects contain "known" fields specific to their indexer (in this case +IndexHTML constructs a document +with "url", "title" and "contents"). +
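Iterating over the hits while "taking the current page into account" amounts to clamping a window onto the result list. A minimal sketch, using a plain `List` in place of the hits collection; `PageWindow` and `page` are hypothetical names:

```java
import java.util.List;

// Sketch: select the slice of results that belongs to the current page.
final class PageWindow {
    static <T> List<T> page(List<T> hits, int startIndex, int maxResults) {
        // Clamp the start into [0, hits.size()].
        int from = Math.min(Math.max(startIndex, 0), hits.size());
        // Compute the end in long arithmetic so a huge page size cannot overflow.
        long wanted = (long) from + Math.max(maxResults, 0);
        int to = (int) Math.min((long) hits.size(), wanted);
        return hits.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> hits = List.of("d1", "d2", "d3", "d4", "d5");
        System.out.println(page(hits, 3, 10));
    }
}
```

A "more results" link would only be needed when `startIndex + maxResults` is still smaller than the total number of hits.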

+

+Please note that in a real deployment of Lucene, it's best to instantiate IndexSearcher and QueryParser once, and then +share them across search requests, instead of re-instantiating per search request.
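One common way to follow this advice is to create the expensive object once and hand the same instance to every request. The sketch below is generic and contains no Lucene code; `SharedResource` is a hypothetical helper, and whether a given searcher or parser class is safe to share across threads must be checked against the documentation for the Lucene version in use.

```java
import java.util.function.Supplier;

// Sketch: lazily create one instance and share it across requests.
final class SharedResource<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    // Runs at most once, even with concurrent callers.
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        // A StringBuilder stands in for an expensive-to-create object.
        SharedResource<StringBuilder> shared = new SharedResource<>(StringBuilder::new);
        System.out.println(shared.get() == shared.get());
    }
}
```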

@@ -250,10 +257,11 @@ result sets.

-There are additional sources used by the web app that were not specifically covered by either walkthrough. For -example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes -covered in the first example, however they have properties sepecific to parsing and indexing HTML. This is -beyond our scope; however, by now you should feel like you're "getting started" with Lucene. +There are additional sources used by the web app that were not specifically covered by either +walk-through, for example the HTML parser, the IndexHTML class and the HTMLDocument class. These are very +similar to the classes covered in the first example, with properties specific to parsing and +indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting +started" with Lucene.

@@ -263,24 +271,26 @@ beyond our scope; however, by now you should feel like you're "getting started"
- Where to go from here? (Everyone!) + Where to go from here? (everyone!)

-There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may +There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to -support that context or redirect to it), anywhere where the directory doesn't quite match the context mapping, -you'll have a broken link in your results. If you want to index non-local files or have some other -needs this isn't supported, plus there may be security issues with running the indexing application from -your webapps directory. There are a number of things left for you the implementor or developer to do. +support that context or redirect to it), anywhere where the directory doesn't quite match the +context mapping, you'll have a broken link in your results. If you want to index non-local files or +have some other needs this isn't supported, plus there may be security issues with running the +indexing application from your webapps directory. There are a number of things left for you the +developer to do.

-In time some of these things may be added to Lucene as features (if you've got a good idea we'd love to hear it!), -but for now: this is where you begin and the search engine/indexer ends. Lastly, one would assume you'd -want to follow the above advice and customize the application to look a little more fancy than black on -white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists! +In time some of these things may be added to Lucene as features (if you've got a good idea we'd love +to hear it!), but for now: this is where you begin and the search engine/indexer ends. Lastly, one +would assume you'd want to follow the above advice and customize the application to look a little +more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene +Users' or Developers' mailing lists!

@@ -296,11 +306,12 @@ white with "Lucene Template" at the top. We'll see you on the Lucene Users' or

-Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached). First -contact the mailing lists. That being said feedback, -and modifications to this document and samples are ever so greatly appreciated. They are just best sent to the -lists so that everyone can share in them. Certainly you'll get the most help there as well. -Thanks for understanding. +Please resist the urge to contact the authors of this document (without bribes of fame and fortune +attached). First contact the mailing lists, taking care to Ask Questions The Smart Way. +Certainly you'll get the most help that way as well. That being said, feedback and modifications +to this document and samples are ever so greatly appreciated. They are just best sent to the lists +or posted as patches, so that +everyone can share in them. Thanks for understanding!

diff --git a/docs/gettingstarted.html b/docs/gettingstarted.html index 6c6f1e1fd12..51228e175b1 100644 --- a/docs/gettingstarted.html +++ b/docs/gettingstarted.html @@ -114,40 +114,38 @@ limitations under the License.

-This document is intended as a "getting started" guide. It has three basic -audiences: novices looking to install Apache Lucene on their application or -web server, developers looking to modify or base the applications they develop -on Lucene, and developers looking to become involved in and contribute to the -development of Lucene. This document is written in tutorial and walkthrough -format. It intends to help you in "getting started", but does not go into great -depth into some of the conceptual or inner details of Apache Lucene. +This document is intended as a "getting started" guide. It has three audiences: first-time users +looking to install Apache Lucene in their application or web server; developers looking to modify or base +the applications they develop on Lucene; and developers looking to become involved in and contribute +to the development of Lucene. This document is written in tutorial and walk-through format. The +goal is to help you "get started". It does not go into great depth on some of the conceptual or +inner details of Lucene.

-Each section listed below builds on one another. That being said more advanced users may -wish to skip sections. +The sections listed below build on one another. More advanced users +may wish to skip sections.

  • -About the basic Lucene demo and its usage. - This section is intended for anyone who wants a basic background on using the provided Lucene demos.
  • +About the command-line Lucene demo and its usage. This section + is intended for anyone who wants to use the command-line Lucene demo.

  • -About the sources and implementation - for the basic Lucene demo section we walk through . This section is intended for developers.
  • +About the sources and implementation for the command-line Lucene + demo. This section walks through the implementation details (sources) of the + command-line Lucene demo. This section is intended for developers.

  • -About installing - and configuring the template web application. While this walkthrough assumes - Tomcat 4.0.x as your container of choice, there is no reason you can't (provided you have - the requisite knowledge) adapt the instructions to your container. This section is intended - for those responsible for the development or deployment of Lucene-based web applications.
  • +About installing and configuring the demo template web + application. While this walk-through assumes Tomcat as your container of choice, + there is no reason you can't (provided you have the requisite knowledge) adapt the + instructions to your container. This section is intended for those responsible for the + development or deployment of Lucene-based web applications.

  • +About the sources used to construct the demo template web + application. Please note the template application is designed to highlight features of + Lucene and is not an example of best practices. (One would hopefully use MVC + architecture such as provided by Jakarta Struts and taglibs, but showing you how to do that + would be WAY beyond the scope of this guide.) This section is intended for developers and + those wishing to customize the demo template web application to their needs.
  • -About the sources used to construct the - template web application. Please note the template application is designed to highlight - features of Lucene and is not an example of best practices. (One would hopefully - use MVC architecture such as provided by Jakarta Struts and taglibs, or better yet XML - with stylesheets, but showing you how to do that would be WAY beyond the scope of this - demonstration. Additionally one could cache results, and perform other performance - optimizations, but those are beyond the scope of this demo). - This section is intended for developers and those wishing to customize the template web - application to their needs. The sections useful to developers only are clearly delineated.

diff --git a/src/jsp/results.jsp b/src/jsp/results.jsp index 208d2d7df99..71c76adc351 100755 --- a/src/jsp/results.jsp +++ b/src/jsp/results.jsp @@ -1,4 +1,4 @@ -<%@ page import = " javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*, org.apache.lucene.demo.*, org.apache.lucene.demo.html.Entities, java.net.URLEncoder" %> +<%@ page import = " javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.analysis.standard.StandardAnalyzer, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*, org.apache.lucene.demo.*, org.apache.lucene.demo.html.Entities, java.net.URLEncoder" %> <% /* @@ -76,7 +76,7 @@ public String escapeHTML(String s) { //query string so you get the //treatment - Analyzer analyzer = new StopAnalyzer(); //construct our usual analyzer + Analyzer analyzer = new StandardAnalyzer(); //construct our usual analyzer try { QueryParser qp = new QueryParser("contents", analyzer); query = qp.parse(queryString); //parse the @@ -126,8 +126,11 @@ public String escapeHTML(String s) { <% Document doc = hits.doc(i); //get the next document String doctitle = doc.get("title"); //get its title - String url = doc.get("url"); //get its url field - if ((doctitle == null) || doctitle.equals("")) //use the url if it has no title + String url = doc.get("path"); //get its path field + if (url != null && url.startsWith("../webapps/")) { // strip off ../webapps prefix if present + url = url.substring(10); + } + if ((doctitle == null) || doctitle.equals("")) //use the path if it has no title doctitle = url; //then output! %> diff --git a/xdocs/demo.xml b/xdocs/demo.xml index 02975a5da00..f027559479d 100644 --- a/xdocs/demo.xml +++ b/xdocs/demo.xml @@ -8,49 +8,58 @@

-This document is intended as a "getting started" guide to using and running the -Apache Lucene demos. It walks you through some basic installation and configuration. +This document is intended as a "getting started" guide to using and running the Lucene demos. +It walks you through some basic installation and configuration.

-The Lucene Demo code is a set of command line example applications that demonstrate various -functionality of Lucene and how one should go about adding it to their -applications. +The Lucene command-line demo code consists of two applications that demonstrate various +functionalities of Lucene and how one should go about adding Lucene to their applications.

-
+

-First, extract the latest Lucene distribution. +First, you should download the +latest Lucene distribution and then extract it to a working directory. Alternatively, you can check out the sources from +Subversion, and then run ant war-demo to generate the JARs and WARs.

-You should see the Apache Lucene jar file in the directory you created -when you extracted the archive. It should be named something like -lucene-{version}.jar. -

-

-You should also see a file called called lucene-demos-{version}.jar. -Put both of these files in your Java CLASSPATH. +You should see the Lucene JAR file in the directory you created when you extracted the archive. It +should be named something like lucene-core-{version}.jar. You should also see a file +called lucene-demos-{version}.jar. If you checked out the sources from Subversion then +the JARs are located under the build subdirectory (after running ant +successfully). Put both of these files in your Java CLASSPATH.

-Once you've gotten this far you're probably itching to go. Let's build an index! -Assuming you've set your classpath correctly, just type -"java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src". This will produce -a subdirectory called "index" which will contain an index of all of the Lucene -sourcecode. +Once you've gotten this far you're probably itching to go. Let's build an index! Assuming +you've set your CLASSPATH correctly, just type: + +

+    java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
+
+ +This will produce a subdirectory called index which will contain an index of all of the +Lucene source code.

- To search the index type "java org.apache.lucene.demo.SearchFiles". You'll be prompted -for a query. Type in a swear word and press the enter key. You'll see that the Lucene -developers are very well mannered and get no results. Now try entering the word "vector". -That should return a whole bunch of documents. The results will page at every tenth -result and ask you whether you want more results. +To search the index type: + +

+    java org.apache.lucene.demo.SearchFiles
+
+ +You'll be prompted for a query. Type in a swear word and press the enter key. You'll see that the +Lucene developers are very well mannered and get no results. Now try entering the word "vector". +That should return a whole bunch of documents. The results will page at every tenth result and ask +you whether you want more results.

diff --git a/xdocs/demo2.xml b/xdocs/demo2.xml index 6c55d43f8b2..1d6244eecd7 100644 --- a/xdocs/demo2.xml +++ b/xdocs/demo2.xml @@ -2,89 +2,132 @@ Andrew C. Oliver -Apache Lucene - Basic Demo Sources Walkthrough +Apache Lucene - Basic Demo Sources Walk-through

-In this section we walk through the sources behind the basic Lucene demo such as where to -find it, its parts and their function. This section is intended for Java developers -wishing to understand how to use Apache Lucene in their applications. +In this section we walk through the sources behind the command-line Lucene demo: where to find them, +their parts and their function. This section is intended for Java developers wishing to understand +how to use Lucene in their applications.

+

-Relative to the directory created when you extracted Lucene or retreived it from Subversion, you -should see a directory called "src" which in turn contains a directory called "demo". -This is the root for all of the Lucene demos. Under this directory is org/apache/lucene/demo, -this is where all the Java sources live. +Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you +should see a directory called src which in turn contains a directory called +demo. This is the root for all of the Lucene demos. Under this directory is +org/apache/lucene/demo. This is where all the Java sources for the demos live.

+

-Within this directory you should see the IndexFiles class we executed earlier. Bring that -up in vi or your alternative text editor and lets take a look at it. +Within this directory you should see the IndexFiles.java class we executed earlier. +Bring it up in vi or your editor of choice and let's take a look at it.

+
+

-As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene Index. -Lets take a look at how it does this. +As we discussed in the previous walk-through, the IndexFiles class creates a Lucene +Index. Let's take a look at how it does this.

+

-The first substantial thing the main function does is instantiate an instance -of IndexWriter. It passes a string called "index" and a new instance of a class called -"StandardAnalyzer". The "index" string is the name of the directory that all index information -should be stored in. Because we're not passing any path information, one must assume this -will be created as a subdirectory of the current directory (if it does not already exist). On -some platforms this may actually result in it being created in other directories (such as -the user's home directory). +The first substantial thing the main function does is instantiate IndexWriter. It passes the string +"index" and a new instance of a class called StandardAnalyzer. +The "index" string is the name of the filesystem directory where all index information +should be stored. Because we're not passing a full path, this will be created as a subdirectory of +the current working directory (if it does not already exist). On some platforms, it may be created +in other directories (such as the user's home directory).

+

-The IndexWriter is the main class responsible for creating indicies. To use it you -must instantiate it with a path that it can write the index into, if this path does not -exist it will create it, otherwise it will refresh the index living at that path. You -must a also pass an instance of org.apache.lucene.analysis.Analyzer. +The IndexWriter is the main +class responsible for creating indices. To use it you must instantiate it with a path that it can +write the index into. If this path does not exist it will first create it. Otherwise it will +refresh the index at that path. You can also create an index using one of the subclasses of Directory. In any case, you must also pass an +instance of org.apache.lucene.analysis.Analyzer.

+

-The Analyzer, in this case, the StandardAnalyzer is little more than a standard Java -Tokenizer, converting all strings to lowercase and filtering out useless words and characters from the index. -By useless words and characters I mean common language words such as articles (a, an, the, etc.) and other -strings that would be useless for searching (e.g. 's) . It should be noted that there are different -rules for every language, and you should use the proper analyzer for each. Lucene currently -provides Analyzers for English and German, more can be found in the Lucene Sandbox. +The particular Analyzer we +are using, StandardAnalyzer, is +little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out +useless words and characters from the index. By useless words and characters I mean common language +words such as articles (a, an, the, etc.) and other strings that would be useless for searching +(e.g. 's) . It should be noted that there are different rules for every language, and you +should use the proper analyzer for each. Lucene currently provides Analyzers for a number of +different languages (see the *Analyzer.java sources under contrib/analyzers/src/java/org/apache/lucene/analysis).
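To make the idea of analysis concrete, here is a minimal sketch in plain Java of the two steps described above: lowercasing tokens and dropping common words. It is not Lucene's StandardAnalyzer and uses no Lucene classes. The class name SimpleAnalysisSketch, the analyze method and the three-word stop list are assumptions made purely for illustration, and possessives such as 's are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SimpleAnalysisSketch {
    // Hypothetical stop list for illustration only; not Lucene's actual list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    // Splits text into tokens, lowercases them and drops stop words.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Treat any run of characters that are not letters or digits as a separator.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick brown Fox"));
    }
}
```

The point of the sketch is that the same transformation has to be applied to both the indexed text and the query text, otherwise a query for "Fox" would never match an indexed "fox".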

+

-Looking down further in the file, you should see the indexDocs() code. This recursive function -simply crawls the directories and uses FileDocument to create Document objects. The Document -is simply a data object to represent the content in the file as well as its creation time and -location. These instances are added to the indexWriter. Take a look inside FileDocument. It's -not particularly complicated, it just adds fields to the Document. +Looking further down in the file, you should see the indexDocs() code. This recursive +function simply crawls the directories and uses FileDocument to create Document objects. The Document is simply a data object to +represent the content in the file as well as its creation time and location. These instances are +added to the indexWriter. Take a look inside FileDocument. It's not particularly +complicated. It just adds fields to the Document.
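The recursive crawl can be sketched in plain Java as follows. This is not the demo's indexDocs() source: the class name CrawlSketch and the collectFiles method are hypothetical, and where the demo would build a Document and add it to the IndexWriter, this sketch only collects the files it visits.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrawlSketch {
    // Recursively visits 'file'. Directories are descended into;
    // regular files are appended to 'out'.
    public static void collectFiles(File file, List<File> out) {
        if (!file.canRead()) {
            return; // skip anything we are not allowed to read
        }
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) {
                return; // an I/O error occurred while listing
            }
            Arrays.sort(children); // deterministic visiting order
            for (File child : children) {
                collectFiles(child, out);
            }
        } else {
            // The real demo would create a Document for this file here
            // and hand it to the IndexWriter.
            out.add(file);
        }
    }

    public static void main(String[] args) {
        List<File> found = new ArrayList<>();
        collectFiles(new File(args.length > 0 ? args[0] : "."), found);
        System.out.println(found.size() + " files found");
    }
}
```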

+

As you can see there isn't much to creating an index. The devil is in the details. You may also -wish to examine the other samples in this directory, particularly the IndexHTML class. It is -a bit more complex but builds upon this example. +wish to examine the other samples in this directory, particularly the IndexHTML class. It is a bit more +complex but builds upon this example.

+
+

-The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer -(which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed -with an analyzer used to interperate your query in the same way the Index was interperated: finding -the end of words and removing useless words like 'a', 'an' and 'the'. The Query object contains the -results from the QueryParser which is passed to the searcher. The searcher results are returned in -a collection of Documents called "Hits" which is then iterated through and displayed to the user. +The SearchFiles class is +quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer +(which is used in the IndexFiles class as well) and a +QueryParser. The +query parser is constructed with an analyzer used to interpret your query text in the same way the +documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and +'the'. The Query object contains +the results from the QueryParser which is passed to +the searcher. Note that it's also possible to programmatically construct a rich Query object without using the query +parser. The query parser just enables decoding the Lucene query +syntax into the corresponding Query object. The searcher results are +returned in a collection of Documents called Hits which is then iterated through and +displayed to the user.

+
+

read on>>>

+
diff --git a/xdocs/demo3.xml b/xdocs/demo3.xml index a90db494955..77b7204e624 100644 --- a/xdocs/demo3.xml +++ b/xdocs/demo3.xml @@ -9,77 +9,75 @@

-This document is intended as a "getting started" guide to installing and running the -Apache Lucene web application demo. This guide assumes that you have read the -information in the previous two examples or already know it anyhow. We'll use -Tomcat 4.0.1 as our reference web container. These demos should work with nearly -any container, but it is up to you to adapt them appropriately. +This document is intended as a "getting started" guide to installing and running the Lucene +web application demo. This guide assumes that you have read the information in the previous two +examples. We'll use Tomcat as our reference web container. These demos should work with nearly any +container, but you may have to adapt them appropriately.

-The Lucene Web Application demo is a template web application intended for deployment -on Tomcat or a similar web container. It's NOT designed as a "best practices" -implementation by ANY means. It's more of a "hello world" type Lucene Web App. -The purpose of this application is to demonstrate Lucene. With that being said, -it should be relatively simple to create a small searchable website in Tomcat or -a similar application server. +The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a +similar web container. It's NOT designed as a "best practices" implementation by ANY means. It's +more of a "hello world" type Lucene Web App. The purpose of this application is to demonstrate +Lucene. With that being said, it should be relatively simple to create a small searchable website +in Tomcat or a similar application server.

-

-Once you've gotten this far you're probably itching to go. -Let's start by creating the index you'll need for the web examples. -Since you've already set your classpath in the previous examples, -all you need to do is type - "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..". -You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory (make sure you didn't leave off the ".." or you'll get a null pointer -exception). -{index-dir} -should be a directory that Tomcat has permission to read and write, but is -outside of a web accessible context. By default the webapp is configured -to look in /opt/lucene/index for this index. +

Once you've gotten this far you're probably itching to go. Let's start by creating the index +you'll need for the web examples. Since you've already set your CLASSPATH in the previous examples, +all you need to do is type: + +

+    java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
+
+ +You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory +(make sure you didn't leave off the .. or you'll get a null pointer exception). +{index-dir} should be a directory that Tomcat has permission to read and write, but is +outside of a web accessible context. By default the webapp is configured to look in +/opt/lucene/index for this index.

-

Located in your distribution directory you should see -a war file called luceneweb.war. Copy this to your -{tomcat-home}/webapps directory. You may need to restart -Tomcat.

-
+

Located in your distribution directory you should see a war file called +luceneweb.war. If you're working with a Subversion checkout, this will be under the +build subdirectory. Copy this to your {tomcat-home}/webapps directory. +You may need to restart Tomcat.

-

-From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not -present, try browsing to "http://localhost:8080/luceneweb" then look again. -Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the -location you used for your index. You may also customize the appTitle and appFooter -strings as you see fit. Once you have finished altering the configuration you should -restart Tomcat. You may also wish to update the war file by typing -jar -uf luceneweb.war configuration.jsp from the luceneweb subdirectory. -(The -u option is not available in all versions of jar. In this case recreate the war file). +

From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not +present, try browsing to http://localhost:8080/luceneweb (which causes Tomcat to deploy +the webapp), then look again. Edit a file called configuration.jsp. Ensure that the +indexLocation is equal to the location you used for your index. You may also customize +the appTitle and appFooter strings as you see fit. Once you have finished +altering the configuration you may need to restart Tomcat. You may also wish to update the war file +by typing jar -uf luceneweb.war configuration.jsp from the luceneweb +subdirectory. (The -u option is not available in all versions of jar. In this case recreate the +war file).

-

Now you're ready to roll. In your browser set the url to "http://localhost:8080/luceneweb" -enter "test" and the number of items per page and press search.

-

You should now be looking either at a number of results (provided you didn't erase the -Tomcat examples) or nothing. Try other search terms. Depending on the number of items -per page you set and results returned, there may be a link at the bottom that says "more results>>", -clicking it goes to subsequent pages. If you get an error regarding opening the index, then you -probably set the path in "configuration" incorrectly or Tomcat doesn't have permissions to the -index (or you skipped the step of creating it).

-
+

Now you're ready to roll. In your browser set the url to +http://localhost:8080/luceneweb, enter test and the number of items per +page, and press search.

+

You should now be looking either at a number of results (provided you didn't erase the Tomcat +examples) or nothing. If you get an error regarding opening the index, then you probably set the +path in configuration.jsp incorrectly or Tomcat doesn't have permissions to the index +(or you skipped the step of creating it). Try other search terms. Depending on the number of items +per page you set and results returned, there may be a link at the bottom that says More +Results>>; clicking it takes you to subsequent pages.

-If you want to know more about how this web app works or how to customize it then -read on>>>. +If you want to know more about how this web app works or how to customize it then read on>>>.

diff --git a/xdocs/demo4.xml b/xdocs/demo4.xml index aa1d06bb777..fae35e042c5 100644 --- a/xdocs/demo4.xml +++ b/xdocs/demo4.xml @@ -8,124 +8,146 @@

-In this section we walk through the sources behind the basic Lucene Web Application demo. -Where to find it, its parts, and their function. This section is intended for Java developers -wishing to understand how to use Apache Lucene in their applications or for those involved -in deploying web applications based on Lucene. +In this section we walk through the sources behind the basic Lucene Web Application demo: where to +find them, their parts and their function. This section is intended for Java developers wishing to +understand how to use Lucene in their applications or for those involved in deploying web +applications based on Lucene.

-Relative the directory created when you extracted Lucene or retreived it from Subversion, you -should see a directory called "src" which in turn contains a directory called "jsp". -This is the root for all of the Lucene web demo. +Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you +should see a directory called src which in turn contains a directory called +jsp. This is the root for all of the Lucene web demo.

-Within this directory you should see the index.jsp class. Bring this up in vi or your -editor of choice. +Within this directory you should see index.jsp. Bring this up in vi or your editor of +choice.

-This jsp page is pretty boring by itself. All it does is include a header, display a form and -include a footer. If you look at the form, it has two fields: query (where you enter your -search criteria) and maxresults where you specify the number of results per page. If you look -at the form tag, you'll notice it uses the get method as opposed to the post. While this is -considered deprecated functionality by the latest w3c specs, its unlikely to go away due to the -usefulness of being able to bookmark things like searches. By the structure of this JSP it should -be easy to customize it without even editing this particular file. You could simply change the -header and footer. Let's look at the header.jsp (located in the same directory) next. +This jsp page is pretty boring by itself. All it does is include a header, display a form and +include a footer. If you look at the form, it has two fields: query (where you enter +your search criteria) and maxresults where you specify the number of results per page. +By the structure of this JSP it should be easy to customize it without even editing this particular +file. You could simply change the header and footer. Let's look at the header.jsp +(located in the same directory) next.

-The header is also very simple by itself. The only thing it does is include the configuration.jsp -(which you looked at in the last section of this guide) and set the title and a brief header. This -would be a good place to put your own custom HTML to "pretty" things up a bit. We won't cover the -footer because all it does is display the footer and close your tags. Let's look at the results.jsp, -the meat of this application next. +The header is also very simple by itself. The only thing it does is include the +configuration.jsp (which you looked at in the last section of this guide) and set the +title and a brief header. This would be a good place to put your own custom HTML to "pretty" things +up a bit. We won't cover the footer because all it does is display the footer and close your tags. +Let's look at the results.jsp, the meat of this application, next.

-The results.jsp had a lot more functionality. Much of it is for paging the search results we'll not -cover this as it's commented well enough. It does not perform any optimizations such as caching results, -etc. as that would make this a more complex example. The first thing in this page is the actual imports -for the Lucene classes and Lucene demo classes. These classes are loaded from the jars included in the -WEB-INF/lib directory in the final war file. +Most of the functionality lies in results.jsp. Much of it is for paging the search +results, which we'll not cover here as it's commented well enough. The first thing in this page is +the actual imports for the Lucene classes and Lucene demo classes. These classes are loaded from +the jars included in the WEB-INF/lib directory in the luceneweb.war file.

-You'll notice that this file includes the same header and footer as the "index.jsp". From there the jsp -constructs an IndexSearcher with the "indexLocation" that was specified in the "configuration.jsp". If there -is an error of any kind in opening the index, it is diplayed to the user and a boolean flag is set to tell -the rest of the sections of the jsp not to continue. +You'll notice that this file includes the same header and footer as index.jsp. From +there it constructs an IndexSearcher with the +indexLocation that was specified in configuration.jsp. If there is an +error of any kind in opening the index, it is displayed to the user and the boolean flag +error is set to tell the rest of the sections of the jsp not to continue.

-From there, this jsp attempts to get the search criteria, the start index (used for paging) and the maximum -number of results per page. If the maximum results per page is not set or not valid then it and the -start index are set to default values. If only the start index is invalid it is set to a default value. If -the criteria isn't provided then a servlet error is thrown (it is assumed that this is the result of url tampering -or some form of browser malfunction). +From there, this jsp attempts to get the search criteria, the start index (used for paging) and the +maximum number of results per page. If the maximum results per page is not set or not valid then it +and the start index are set to default values. If only the start index is invalid it is set to a +default value. If the criteria isn't provided then a servlet error is thrown (it is assumed that +this is the result of url tampering or some form of browser malfunction).
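The defaulting rules in the paragraph above can be sketched as a small stand-alone Java class. This is an illustration of the described behaviour, not the code in results.jsp: the class name PagingParamsSketch, the default values (10 results per page, start index 0) and the use of IllegalArgumentException in place of a servlet error are all assumptions.

```java
public class PagingParamsSketch {
    // Hypothetical defaults; the JSP may use different values.
    static final int DEFAULT_START = 0;
    static final int DEFAULT_MAX_RESULTS = 10;

    // Returns the parsed value, or -1 if the parameter is missing or not a number.
    private static int parseOrMinusOne(String value) {
        if (value == null) {
            return -1;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Returns {startIndex, maxResults} following the rules described above.
    public static int[] resolve(String query, String startParam, String maxParam) {
        if (query == null) {
            // Stand-in for the servlet error thrown when no criteria is provided.
            throw new IllegalArgumentException("no query specified");
        }
        int max = parseOrMinusOne(maxParam);
        if (max <= 0) {
            // Invalid page size: reset both values to their defaults.
            return new int[] {DEFAULT_START, DEFAULT_MAX_RESULTS};
        }
        int start = parseOrMinusOne(startParam);
        if (start < 0) {
            start = DEFAULT_START; // only the start index was invalid
        }
        return new int[] {start, max};
    }

    public static void main(String[] args) {
        int[] paging = resolve("vector", "20", "5");
        System.out.println("start=" + paging[0] + " max=" + paging[1]);
    }
}
```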

-The jsp moves on to construct a StandardAnalyzer just as in the simple demo, to analyze the search critieria, it -is passed to the QueryParser along with the criteria to construct a Query object. You'll also notice the -string literal "contents" included. This is to specify the search should include the contents and not -the title, url or some other field in the indexed documents. If there is any error in constructing a Query -object an error is displayed to the user. +The jsp moves on to construct a StandardAnalyzer to +analyze the search text. This matches the analyzer used during indexing (IndexHTML), which is generally +recommended. This is passed to the QueryParser along with the +criteria to construct a Query +object. You'll also notice the string literal "contents" included. This specifies +that the search should cover the contents field and not the title, +url or some other field in the indexed documents. If there is any error in +constructing a Query object an +error is displayed to the user.

-In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are -returned in a collection called "hits". If the length property of the hits collection is 0 then an error -is displayed to the user and the error flag is set. +In the next section of the jsp the IndexSearcher is asked to search +given the query object. The results are returned in a collection called hits. If the +length property of the hits collection is 0 (meaning there were no results) then an +error is displayed to the user and the error flag is set.

-Finally the jsp iterates through the hits collection and displayed properties of the "Document" objects we talked -about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case -"IndexHTML" constructs a document with "url", "title" and "contents"). You'll notice that these results are paged -but the search is repeated every time. This is an area where optimization could improve performance for large -result sets. +Finally the jsp iterates through the hits collection, taking the current page into +account, and displays properties of the Document objects we talked about in +the first walkthrough. These objects contain "known" fields specific to their indexer (in this case +IndexHTML constructs a document +with "url", "title" and "contents"). +

+

+Please note that in a real deployment of Lucene, it's best to instantiate IndexSearcher and QueryParser once, and then +share them across search requests, instead of re-instantiating per search request.
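One way to picture "instantiate once, then share" is a small holder that creates its object on first use and returns the same instance afterwards. The sketch below is generic Java with no Lucene classes. The class name SharedResourceSketch is hypothetical, and in a real web application the factory would open the searcher on the configured index location. Whether a given Lucene object is safe to use from several request threads at once should be checked against the Lucene version you deploy.

```java
import java.util.function.Supplier;

public class SharedResourceSketch<T> {
    private final Supplier<T> factory;
    private T instance;

    public SharedResourceSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    // Creates the resource on first call and returns the same instance afterwards.
    // Synchronized so that concurrent first requests do not create it twice.
    public synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        // A StringBuilder stands in for an expensive object such as a searcher.
        SharedResourceSketch<StringBuilder> shared =
                new SharedResourceSketch<>(StringBuilder::new);
        System.out.println(shared.get() == shared.get());
    }
}
```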

-There are additional sources used by the web app that were not specifically covered by either walkthrough. For -example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes -covered in the first example, however they have properties sepecific to parsing and indexing HTML. This is -beyond our scope; however, by now you should feel like you're "getting started" with Lucene. +There are additional sources used by the web app that were not specifically covered by either +walkthrough. For example the HTML parser, the IndexHTML class and HTMLDocument class. These are very +similar to the classes covered in the first example, with properties specific to parsing and +indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting +started" with Lucene.

-
+

-There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may +There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to -support that context or redirect to it), anywhere where the directory doesn't quite match the context mapping, -you'll have a broken link in your results. If you want to index non-local files or have some other -needs this isn't supported, plus there may be security issues with running the indexing application from -your webapps directory. There are a number of things left for you the implementor or developer to do. +support that context or redirect to it), anywhere where the directory doesn't quite match the +context mapping, you'll have a broken link in your results. If you want to index non-local files or +have some other needs this isn't supported, plus there may be security issues with running the +indexing application from your webapps directory. There are a number of things left for you the +developer to do.

-In time some of these things may be added to Lucene as features (if you've got a good idea we'd love to hear it!), -but for now: this is where you begin and the search engine/indexer ends. Lastly, one would assume you'd -want to follow the above advice and customize the application to look a little more fancy than black on -white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists! +In time some of these things may be added to Lucene as features (if you've got a good idea we'd love +to hear it!), but for now: this is where you begin and the search engine/indexer ends. Lastly, one +would assume you'd want to follow the above advice and customize the application to look a little +more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene +Users' or Developers' mailing lists!

-Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached). First -contact the mailing lists. That being said feedback, -and modifications to this document and samples are ever so greatly appreciated. They are just best sent to the -lists so that everyone can share in them. Certainly you'll get the most help there as well. -Thanks for understanding. +Please resist the urge to contact the authors of this document (without bribes of fame and fortune +attached). First contact the mailing lists, taking care to Ask Questions The Smart Way. +Certainly you'll get the most help that way as well. That being said, feedback on and modifications +to this document and samples are ever so greatly appreciated. They are just best sent to the lists +or posted as patches, so that +everyone can share in them. Thanks for understanding!

diff --git a/xdocs/gettingstarted.xml b/xdocs/gettingstarted.xml index 8c513da0350..af6c604e081 100644 --- a/xdocs/gettingstarted.xml +++ b/xdocs/gettingstarted.xml @@ -8,42 +8,40 @@

-This document is intended as a "getting started" guide. It has three basic -audiences: novices looking to install Apache Lucene on their application or -web server, developers looking to modify or base the applications they develop -on Lucene, and developers looking to become involved in and contribute to the -development of Lucene. This document is written in tutorial and walkthrough -format. It intends to help you in "getting started", but does not go into great -depth into some of the conceptual or inner details of Apache Lucene. +This document is intended as a "getting started" guide. It has three audiences: first-time users +looking to install Apache Lucene in their application or web server; developers looking to modify or base +the applications they develop on Lucene; and developers looking to become involved in and contribute +to the development of Lucene. This document is written in tutorial and walk-through format. The +goal is to help you "get started". It does not go into great depth on some of the conceptual or +inner details of Lucene.

-Each section listed below builds on one another. That being said more advanced users may -wish to skip sections. +Each section listed below builds on the previous one. More advanced users +may wish to skip sections.

    -
  • About the basic Lucene demo and its usage. - This section is intended for anyone who wants a basic background on using the provided Lucene demos.
  • +
  • About the command-line Lucene demo and its usage. This section + is intended for anyone who wants to use the command-line Lucene demo.
  • -

  • About the sources and implementation - for the basic Lucene demo section we walk through . This section is intended for developers.
  • +
  • About the sources and implementation for the command-line Lucene + demo. This section walks through the implementation details (sources) of the + command-line Lucene demo. This section is intended for developers.
  • -

  • About installing - and configuring the template web application. While this walkthrough assumes - Tomcat 4.0.x as your container of choice, there is no reason you can't (provided you have - the requisite knowledge) adapt the instructions to your container. This section is intended - for those responsible for the development or deployment of Lucene-based web applications.
  • +
  • About installing and configuring the demo template web + application. While this walk-through assumes Tomcat as your container of choice, + there is no reason you can't (provided you have the requisite knowledge) adapt the + instructions to your container. This section is intended for those responsible for the + development or deployment of Lucene-based web applications.
  • + +

  • About the sources used to construct the demo template web + application. Please note the template application is designed to highlight features of + Lucene and is not an example of best practices. (One would hopefully use MVC + architecture such as provided by Jakarta Struts and taglibs, but showing you how to do that + would be WAY beyond the scope of this guide.) This section is intended for developers and + those wishing to customize the demo template web application to their needs.
  • -
  • About the sources used to construct the - template web application. Please note the template application is designed to highlight - features of Lucene and is not an example of best practices. (One would hopefully - use MVC architecture such as provided by Jakarta Struts and taglibs, or better yet XML - with stylesheets, but showing you how to do that would be WAY beyond the scope of this - demonstration. Additionally one could cache results, and perform other performance - optimizations, but those are beyond the scope of this demo). - This section is intended for developers and those wishing to customize the template web - application to their needs. The sections useful to developers only are clearly delineated.