diff --git a/lucene/core/src/java/org/apache/lucene/analysis/package.html b/lucene/core/src/java/org/apache/lucene/analysis/package.html
index 342e99dfa36..0ac07e6ef32 100644
--- a/lucene/core/src/java/org/apache/lucene/analysis/package.html
+++ b/lucene/core/src/java/org/apache/lucene/analysis/package.html
@@ -149,7 +149,7 @@ and proximity searches (though sentence identification is not provided by Lucene
 {@link org.apache.lucene.document.Field}s.
 Analysis is one of the main causes of performance degradation during indexing. Simply put, the more you analyze the slower the indexing (in most cases).
- Perhaps your application would be just fine using the simple WhitespaceTokenizer combined with a StopFilter. The contrib/benchmark library can be useful
+ Perhaps your application would be just fine using the simple WhitespaceTokenizer combined with a StopFilter. The benchmark/ library can be useful
 for testing out the speed of the analysis process.
 First and foremost, a {@link org.apache.lucene.document.Document} is something created by the user application. It is your job to create Documents based on the content of the files you are working with in your application (Word, txt, PDF, Excel or any other format.) How this is done is completely up to you. That being said, there are many tools available in other projects that can make
- the process of taking a file and converting it into a Lucene {@link org.apache.lucene.document.Document}. To see an example of this,
- take a look at the Lucene demo and the associated source code
- for extracting content from HTML.
+ the process of taking a file and converting it into a Lucene {@link org.apache.lucene.document.Document}.
 The {@link org.apache.lucene.document.DateTools} is a utility class to make dates and times searchable (remember, Lucene only searches text). {@link org.apache.lucene.document.IntField}, {@link org.apache.lucene.document.LongField},
diff --git a/lucene/core/src/java/org/apache/lucene/index/package.html b/lucene/core/src/java/org/apache/lucene/index/package.html
index 1ef714e72e7..9cdef6312fa 100644
--- a/lucene/core/src/java/org/apache/lucene/index/package.html
+++ b/lucene/core/src/java/org/apache/lucene/index/package.html
@@ -21,5 +21,6 @@
 Code to maintain and access indices.
+