mirror of https://github.com/apache/lucene.git
- Fixed an incorrect statement about 0.2 boost of Fuzzy queries.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@150015 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
parent
577bfab58f
commit
b058f63c30
|
@ -19,7 +19,7 @@
|
|||
|
||||
|
||||
<title>Jakarta Lucene -
|
||||
Query Parser Syntax - Jakarta Lucene
|
||||
Query Parser Syntax - Jakarta Lucene
|
||||
</title>
|
||||
</head>
|
||||
|
||||
|
@ -126,24 +126,24 @@
|
|||
<p>
|
||||
Before choosing to use the provided Query Parser, please consider the following:
|
||||
<ol>
|
||||
<li>If you are programmatically generating a query string and then
|
||||
parsing it with the query parser then you should seriously consider building
|
||||
<li>If you are programmatically generating a query string and then
|
||||
parsing it with the query parser then you should seriously consider building
|
||||
your queries directly with the query API. In other words, the query
|
||||
parser is designed for human-entered text, not for program-generated
|
||||
parser is designed for human-entered text, not for program-generated
|
||||
text.</li>
|
||||
|
||||
<li>Untokenized fields are best added directly to queries, and not
|
||||
through the query parser. If a field's values are generated programmatically
|
||||
by the application, then so should query clauses for this field.
|
||||
Analyzers, like the query parser, are designed to convert human-entered
|
||||
text to terms. Program-generated values, like dates, keywords, etc.,
|
||||
<li>Untokenized fields are best added directly to queries, and not
|
||||
through the query parser. If a field's values are generated programmatically
|
||||
by the application, then so should query clauses for this field.
|
||||
Analyzers, like the query parser, are designed to convert human-entered
|
||||
text to terms. Program-generated values, like dates, keywords, etc.,
|
||||
should be consistently program-generated.</li>
|
||||
|
||||
<li>In a query form, fields which are general text should use the query
|
||||
parser. All others, such as date ranges, keywords, etc. are better added
|
||||
directly through the query API. A field with a limit set of values,
|
||||
that can be specified with a pull-down menu should not be added to a
|
||||
query string which is subsequently parsed, but rather added as a
|
||||
<li>In a query form, fields which are general text should use the query
|
||||
parser. All others, such as date ranges, keywords, etc. are better added
|
||||
directly through the query API. A field with a limit set of values,
|
||||
that can be specified with a pull-down menu should not be added to a
|
||||
query string which is subsequently parsed, but rather added as a
|
||||
TermQuery clause.</li>
|
||||
</ol>
|
||||
</p>
|
||||
|
@ -181,7 +181,7 @@
|
|||
<blockquote>
|
||||
<p>Lucene supports fielded data. When performing a search you can either specify a field, or use the default field. The field names and default field is implementation specific.</p>
|
||||
<p>You can search any field by typing the field name followed by a colon ":" and then the term you are looking for. </p>
|
||||
<p>As an example, let's assume a Lucene index contains two fields, title and text and text is the default field.
|
||||
<p>As an example, let's assume a Lucene index contains two fields, title and text and text is the default field.
|
||||
If you want to find the document entitled "The Right Way" which contains the text "don't go this way", you can enter: </p>
|
||||
<div align="left">
|
||||
<table cellspacing="4" cellpadding="0" border="0">
|
||||
|
@ -363,7 +363,6 @@
|
|||
</table>
|
||||
</div>
|
||||
<p>This search will find terms like foam and roams</p>
|
||||
<p>Note:Terms found by the fuzzy search will automatically get a boost factor of 0.2</p>
|
||||
</blockquote>
|
||||
</td></tr>
|
||||
<tr><td><br/></td></tr>
|
||||
|
@ -486,7 +485,7 @@
|
|||
</tr>
|
||||
</table>
|
||||
</div>
|
||||
<p>and you want the term "jakarta" to be more relevant boost it using the ^ symbol along with the boost factor next to the term.
|
||||
<p>and you want the term "jakarta" to be more relevant boost it using the ^ symbol along with the boost factor next to the term.
|
||||
You would type:</p>
|
||||
<div align="left">
|
||||
<table cellspacing="4" cellpadding="0" border="0">
|
||||
|
@ -555,7 +554,7 @@
|
|||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used.
|
||||
<p>The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used.
|
||||
The OR operator links two terms and finds a matching document if either of the terms exist in a document. This is equivalent to a union using sets.
|
||||
The symbol || can be used in place of the word OR.</p>
|
||||
<p>To search for documents that contain either "jakarta apache" or just "jakarta" use the query:</p>
|
||||
|
@ -610,7 +609,7 @@
|
|||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>The AND operator matches documents where both terms exist anywhere in the text of a single document.
|
||||
<p>The AND operator matches documents where both terms exist anywhere in the text of a single document.
|
||||
This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.</p>
|
||||
<p>To search for documents that contain "jakarta apache" and "jakarta lucene" use the query: </p>
|
||||
<div align="left">
|
||||
|
|
|
@ -3,7 +3,7 @@
|
|||
<properties>
|
||||
<author email="carlson@apache.org">Peter Carlson</author>
|
||||
<title>
|
||||
Query Parser Syntax - Jakarta Lucene
|
||||
Query Parser Syntax - Jakarta Lucene
|
||||
</title>
|
||||
</properties>
|
||||
<body>
|
||||
|
@ -16,24 +16,24 @@
|
|||
<p>
|
||||
Before choosing to use the provided Query Parser, please consider the following:
|
||||
<ol>
|
||||
<li>If you are programmatically generating a query string and then
|
||||
parsing it with the query parser then you should seriously consider building
|
||||
<li>If you are programmatically generating a query string and then
|
||||
parsing it with the query parser then you should seriously consider building
|
||||
your queries directly with the query API. In other words, the query
|
||||
parser is designed for human-entered text, not for program-generated
|
||||
parser is designed for human-entered text, not for program-generated
|
||||
text.</li>
|
||||
|
||||
<li>Untokenized fields are best added directly to queries, and not
|
||||
through the query parser. If a field's values are generated programmatically
|
||||
by the application, then so should query clauses for this field.
|
||||
Analyzers, like the query parser, are designed to convert human-entered
|
||||
text to terms. Program-generated values, like dates, keywords, etc.,
|
||||
<li>Untokenized fields are best added directly to queries, and not
|
||||
through the query parser. If a field's values are generated programmatically
|
||||
by the application, then so should query clauses for this field.
|
||||
Analyzers, like the query parser, are designed to convert human-entered
|
||||
text to terms. Program-generated values, like dates, keywords, etc.,
|
||||
should be consistently program-generated.</li>
|
||||
|
||||
<li>In a query form, fields which are general text should use the query
|
||||
parser. All others, such as date ranges, keywords, etc. are better added
|
||||
directly through the query API. A field with a limit set of values,
|
||||
that can be specified with a pull-down menu should not be added to a
|
||||
query string which is subsequently parsed, but rather added as a
|
||||
<li>In a query form, fields which are general text should use the query
|
||||
parser. All others, such as date ranges, keywords, etc. are better added
|
||||
directly through the query API. A field with a limit set of values,
|
||||
that can be specified with a pull-down menu should not be added to a
|
||||
query string which is subsequently parsed, but rather added as a
|
||||
TermQuery clause.</li>
|
||||
</ol>
|
||||
</p>
|
||||
|
@ -47,15 +47,15 @@
|
|||
<p>Note: The analyzer used to create the index will be used on the terms and phrases in the query string.
|
||||
So it is important to choose an analyzer that will not interfere with the terms used in the query string.</p>
|
||||
</section>
|
||||
|
||||
|
||||
<section name="Fields">
|
||||
<p>Lucene supports fielded data. When performing a search you can either specify a field, or use the default field. The field names and default field is implementation specific.</p>
|
||||
<p>You can search any field by typing the field name followed by a colon ":" and then the term you are looking for. </p>
|
||||
<p>As an example, let's assume a Lucene index contains two fields, title and text and text is the default field.
|
||||
<p>As an example, let's assume a Lucene index contains two fields, title and text and text is the default field.
|
||||
If you want to find the document entitled "The Right Way" which contains the text "don't go this way", you can enter: </p>
|
||||
|
||||
|
||||
<source>title:"The Right Way" AND text:go</source>
|
||||
<p>or</p>
|
||||
<p>or</p>
|
||||
<source>title:"Do it right" AND right</source>
|
||||
<p>Since text is the default field, the field indicator is not required.</p>
|
||||
|
||||
|
@ -63,11 +63,11 @@
|
|||
<source>title:Do it right</source>
|
||||
<p>Will only find "Do" in the title field. It will find "it" and "right" in the default field (in this case the text field). </p>
|
||||
</section>
|
||||
|
||||
|
||||
<section name="Term Modifiers">
|
||||
|
||||
<p>Lucene supports modifying query terms to provide a wide range of searching options.</p>
|
||||
|
||||
|
||||
<subsection name="Wildcard Searches">
|
||||
<p>Lucene supports single and multiple character wildcard searches.</p>
|
||||
<p>To perform a single character wildcard search use the "?" symbol.</p>
|
||||
|
@ -82,24 +82,23 @@
|
|||
<source>te*t</source>
|
||||
<p>Note: You cannot use a * or ? symbol as the first character of a search.</p>
|
||||
</subsection>
|
||||
|
||||
|
||||
|
||||
|
||||
<subsection name="Fuzzy Searches">
|
||||
<p>Lucene supports fuzzy searches based on the Levenshtein Distance, or Edit Distance algorithm. To do a fuzzy search use the tilde, "~", symbol at the end of a Single word Term. For example to search for a term similar in spelling to "roam" use the fuzzy search: </p>
|
||||
|
||||
<source>roam~</source>
|
||||
<p>This search will find terms like foam and roams</p>
|
||||
<p>Note:Terms found by the fuzzy search will automatically get a boost factor of 0.2</p>
|
||||
</subsection>
|
||||
|
||||
|
||||
|
||||
<subsection name="Proximity Searches">
|
||||
<p>Lucene supports finding words are a within a specific distance away. To do a proximity search use the tilde, "~", symbol at the end of a Phrase. For example to search for a "apache" and "jakarta" within 10 words of each other in a document use the search: </p>
|
||||
|
||||
<source>"jakarta apache"~10</source>
|
||||
</subsection>
|
||||
|
||||
|
||||
|
||||
<subsection name="Range Searches">
|
||||
<p>Range Queries allow one to match documents whose field(s) values
|
||||
are between the lower and upper bound specified by the Range Query.
|
||||
|
@ -112,13 +111,13 @@
|
|||
<p>This will find all documents whose titles are between Aida and Carmen.</p>
|
||||
</subsection>
|
||||
|
||||
|
||||
|
||||
<subsection name="Boosting a Term">
|
||||
<p>Lucene provides the relevance level of matching documents based on the terms found. To boost a term use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.</p>
|
||||
<p>Boosting allows you to control the relevance of a document by boosting its term. For example, if you are searching for</p>
|
||||
|
||||
<source>jakarta apache</source>
|
||||
<p>and you want the term "jakarta" to be more relevant boost it using the ^ symbol along with the boost factor next to the term.
|
||||
<p>and you want the term "jakarta" to be more relevant boost it using the ^ symbol along with the boost factor next to the term.
|
||||
You would type:</p>
|
||||
<source>jakarta^4 apache</source>
|
||||
<p>This will make documents with the term jakarta appear more relevant. You can also boost Phrase Terms as in the example: </p>
|
||||
|
@ -136,7 +135,7 @@
|
|||
Lucene supports AND, "+", OR, NOT and "-" as Boolean operators(Note: Boolean operators must be ALL CAPS).</p>
|
||||
|
||||
<subsection name="OR">
|
||||
<p>The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used.
|
||||
<p>The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used.
|
||||
The OR operator links two terms and finds a matching document if either of the terms exist in a document. This is equivalent to a union using sets.
|
||||
The symbol || can be used in place of the word OR.</p>
|
||||
<p>To search for documents that contain either "jakarta apache" or just "jakarta" use the query:</p>
|
||||
|
@ -149,13 +148,13 @@
|
|||
|
||||
</subsection>
|
||||
<subsection name="AND">
|
||||
<p>The AND operator matches documents where both terms exist anywhere in the text of a single document.
|
||||
<p>The AND operator matches documents where both terms exist anywhere in the text of a single document.
|
||||
This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.</p>
|
||||
<p>To search for documents that contain "jakarta apache" and "jakarta lucene" use the query: </p>
|
||||
|
||||
<source>"jakarta apache" AND "jakarta lucene"</source>
|
||||
</subsection>
|
||||
|
||||
|
||||
<subsection name="+">
|
||||
<p>The "+" or required operator requires that the term after the "+" symbol exist somewhere in a the field of a single document.</p>
|
||||
<p>To search for documents that must contain "jakarta" and may contain "lucene" use the query:</p>
|
||||
|
@ -170,26 +169,26 @@
|
|||
|
||||
<source>"jakarta apache" NOT "jakarta lucene"</source>
|
||||
<p>Note: The NOT operator cannot be used with just one term. For example, the following search will return no results:</p>
|
||||
|
||||
|
||||
<source>NOT "jakarta apache"</source>
|
||||
</subsection>
|
||||
|
||||
|
||||
<subsection name="-">
|
||||
<p>The "-" or prohibit operator excludes documents that contain the term after the "-" symbol.</p>
|
||||
<p>The "-" or prohibit operator excludes documents that contain the term after the "-" symbol.</p>
|
||||
<p>To search for documents that contain "jakarta apache" but not "jakarta lucene" use the query: </p>
|
||||
|
||||
<source>"jakarta apache" -"jakarta lucene"</source>
|
||||
</subsection>
|
||||
|
||||
|
||||
</section>
|
||||
|
||||
|
||||
<section name="Grouping">
|
||||
<p>Lucene supports using parentheses to group clauses to form sub queries. This can be very useful if you want to control the boolean logic for a query.</p>
|
||||
<p>To search for either "jakarta" or "apache" and "website" use the query:</p>
|
||||
<source>(jakarta OR apache) AND website</source>
|
||||
<p>This eliminates any confusion and makes sure you that website must exist and either term jakarta or apache may exist.</p>
|
||||
</section>
|
||||
|
||||
|
||||
<section name="Field Grouping">
|
||||
<p>Lucene supports using parentheses to group multiple clauses to a single field.</p>
|
||||
<p>To search for a title that contains both the word "return" and the phrase "pink panther" use the query:</p>
|
||||
|
@ -202,6 +201,6 @@
|
|||
<p>To escape these character use the \ before the character. For example to search for (1+1):2 use the query:</p>
|
||||
<source>\(1\+1\)\:2</source>
|
||||
</section>
|
||||
|
||||
|
||||
</body>
|
||||
</document>
|
||||
|
|
Loading…
Reference in New Issue