mirror of https://github.com/apache/lucene.git
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
|
<html>
|
|
<head>
|
|
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
|
|
<meta content="Apache Forrest" name="Generator">
|
|
<meta name="Forrest-version" content="0.7">
|
|
<meta name="Forrest-skin-name" content="pelt">
|
|
<title>
|
|
Apache Lucene - Query Parser Syntax
|
|
</title>
|
|
<link type="text/css" href="skin/basic.css" rel="stylesheet">
|
|
<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
|
|
<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
|
|
<link type="text/css" href="skin/profile.css" rel="stylesheet">
|
|
<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
|
|
<link rel="shortcut icon" href="images/favicon.ico">
|
|
</head>
|
|
<body onload="init()">
|
|
<script type="text/javascript">ndeSetTextSize();</script>
|
|
<div id="top">
|
|
<div class="breadtrail">
|
|
<a href="http://www.apache.org/">Apache</a> > <a href="http://lucene.apache.org/">Lucene</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
|
|
</div>
|
|
<div class="header">
|
|
<div class="grouplogo">
|
|
<a href="http://lucene.apache.org/"><img class="logoImage" alt="Lucene" src="http://www.apache.org/images/asf_logo_simple.png" title="Apache Lucene"></a>
|
|
</div>
|
|
<div class="projectlogo">
|
|
<a href="http://lucene.apache.org/java/"><img class="logoImage" alt="Lucene" src="http://lucene.apache.org/images/lucene_green_300.gif" title="Apache Lucene is a high-performance, full-featured text search engine library written entirely in
|
|
Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform."></a>
|
|
</div>
|
|
<div class="searchbox">
|
|
<form action="http://www.google.com/search" method="get" class="roundtopsmall">
|
|
<input value="lucene.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">
|
|
<input attr="value" name="Search" value="Search" type="submit">
|
|
</form>
|
|
</div>
|
|
<ul id="tabs">
|
|
<li class="current">
|
|
<a class="base-selected" href="index.html">Main</a>
|
|
</li>
|
|
<li>
|
|
<a class="base-not-selected" href="http://wiki.apache.org/jakarta-lucene">Wiki</a>
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
</div>
|
|
<div id="main">
|
|
<div id="publishedStrip">
|
|
<div id="level2tabs"></div>
|
|
<script type="text/javascript"><!--
|
|
document.write("<text>Last Published:</text> " + document.lastModified);
|
|
// --></script>
|
|
</div>
|
|
<div class="breadtrail">
|
|
|
|
|
|
</div>
|
|
<div id="menu">
|
|
<div onclick="SwitchMenu('menu_1.1', 'skin/')" id="menu_1.1Title" class="menutitle">About</div>
|
|
<div id="menu_1.1" class="menuitemgroup">
|
|
<div class="menuitem">
|
|
<a href="index.html" title="Welcome to Java Lucene">Overview</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="features.html">Features</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://wiki.apache.org/jakarta-lucene/PoweredBy">Powered by Lucene</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="whoweare.html">Who We Are</a>
|
|
</div>
|
|
</div>
|
|
<div onclick="SwitchMenu('menu_selected_1.2', 'skin/')" id="menu_selected_1.2Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
|
|
<div id="menu_selected_1.2" class="selectedmenuitemgroup" style="display: block;">
|
|
<div class="menuitem">
|
|
<a href="api/">API Docs</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="benchmarks.html">Benchmarks</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="contributions.html">Contributions</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://wiki.apache.org/jakarta-lucene/LuceneFAQ">FAQ</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="fileformats.html">File Formats</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="gettingstarted.html">Getting Started</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="lucene-sandbox/index.html">Lucene Sandbox</a>
|
|
</div>
|
|
<div class="menupage">
|
|
<div class="menupagetitle">Query Syntax</div>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="scoring.html">Scoring</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://wiki.apache.org/jakarta-lucene">Wiki</a>
|
|
</div>
|
|
</div>
|
|
<div onclick="SwitchMenu('menu_1.3', 'skin/')" id="menu_1.3Title" class="menutitle">Resources</div>
|
|
<div id="menu_1.3" class="menuitemgroup">
|
|
<div class="menuitem">
|
|
<a href="developer-resources.html">Developers</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://issues.apache.org/jira/browse/LUCENE">Issue Tracking</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="mailinglists.html">Mailing Lists</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="releases.html">Releases</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://svn.apache.org/viewcvs.cgi/lucene/java/">Version Control</a>
|
|
</div>
|
|
</div>
|
|
<div onclick="SwitchMenu('menu_1.4', 'skin/')" id="menu_1.4Title" class="menutitle">Site Versions</div>
|
|
<div id="menu_1.4" class="menuitemgroup">
|
|
<div class="menuitem">
|
|
<a href="./">Main</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://lucene.apache.org/java/1_4_3/">1.4.3</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://lucene.apache.org/java/1_9_0/">1.9.0</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://lucene.apache.org/java/1_9_1/">1.9.1</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://lucene.apache.org/java/2_0_0/">2.0.0</a>
|
|
</div>
|
|
</div>
|
|
<div onclick="SwitchMenu('menu_1.5', 'skin/')" id="menu_1.5Title" class="menutitle">Related Projects</div>
|
|
<div id="menu_1.5" class="menuitemgroup">
|
|
<div class="menuitem">
|
|
<a href="http://lucene.apache.org">Lucene (Top-Level)</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://lucene.apache.org/hadoop/">Hadoop</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://lucene.apache.org/lucy/">Lucy</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://incubator.apache.org/projects/lucene.net.html">Lucene.NET</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://lucene.apache.org/nutch/">Nutch</a>
|
|
</div>
|
|
<div class="menuitem">
|
|
<a href="http://incubator.apache.org/solr/">SOLR</a>
|
|
</div>
|
|
</div>
|
|
<div id="credit"></div>
|
|
<div id="roundbottom">
|
|
<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
|
|
<div id="credit2"></div>
|
|
</div>
|
|
<div id="content">
|
|
<div title="Portable Document Format" class="pdflink">
|
|
<a class="dida" href="queryparsersyntax.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
|
|
PDF</a>
|
|
</div>
|
|
<h1>
|
|
Apache Lucene - Query Parser Syntax
|
|
</h1>
|
|
<div id="minitoc-area">
|
|
<ul class="minitoc">
|
|
<li>
|
|
<a href="#Overview">Overview</a>
|
|
</li>
|
|
<li>
|
|
<a href="#Terms">Terms</a>
|
|
</li>
|
|
<li>
|
|
<a href="#Fields">Fields</a>
|
|
</li>
|
|
<li>
|
|
<a href="#Term Modifiers">Term Modifiers</a>
|
|
<ul class="minitoc">
|
|
<li>
|
|
<a href="#Wildcard Searches">Wildcard Searches</a>
|
|
</li>
|
|
<li>
|
|
<a href="#Fuzzy Searches">Fuzzy Searches</a>
|
|
</li>
|
|
<li>
|
|
<a href="#Proximity Searches">Proximity Searches</a>
|
|
</li>
|
|
<li>
|
|
<a href="#Range Searches">Range Searches</a>
|
|
</li>
|
|
<li>
|
|
<a href="#Boosting a Term">Boosting a Term</a>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
<li>
|
|
<a href="#Boolean operators">Boolean Operators</a>
|
|
<ul class="minitoc">
|
|
<li>
|
|
<a href="#OR">OR</a>
|
|
</li>
|
|
<li>
|
|
<a href="#AND">AND</a>
|
|
</li>
|
|
<li>
|
|
<a href="#+">+</a>
|
|
</li>
|
|
<li>
|
|
<a href="#NOT">NOT</a>
|
|
</li>
|
|
<li>
|
|
<a href="#-">-</a>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
<li>
|
|
<a href="#Grouping">Grouping</a>
|
|
</li>
|
|
<li>
|
|
<a href="#Field Grouping">Field Grouping</a>
|
|
</li>
|
|
<li>
|
|
<a href="#Escaping Special Characters">Escaping Special Characters</a>
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
|
|
<a name="N10013"></a><a name="Overview"></a>
|
|
<h2 class="boxed">Overview</h2>
|
|
<div class="section">
|
|
<p>Although Lucene provides the ability to create your own
|
|
queries through its API, it also provides a rich query
|
|
language through the Query Parser, a lexer which
|
|
interprets a string into a Lucene Query using JavaCC.
|
|
</p>
|
|
<p>This page provides the Query Parser syntax in Lucene 1.9.
|
|
If you are using a different
|
|
version of Lucene, please consult the copy of
|
|
<span class="codefrag">docs/queryparsersyntax.html</span> that was distributed
|
|
with the version you are using.
|
|
</p>
|
|
<p>
|
|
Before choosing to use the provided Query Parser, please consider the following:
|
|
<ol>
|
|
|
|
<li>If you are programmatically generating a query string and then
|
|
parsing it with the query parser then you should seriously consider building
|
|
your queries directly with the query API. In other words, the query
|
|
parser is designed for human-entered text, not for program-generated
|
|
text.</li>
|
|
|
|
|
|
<li>Untokenized fields are best added directly to queries, and not
|
|
through the query parser. If a field's values are generated programmatically
|
|
by the application, then so should query clauses for this field.
|
|
An analyzer, which the query parser uses, is designed to convert human-entered
|
|
text to terms. Program-generated values, like dates, keywords, etc.,
|
|
should be consistently program-generated.</li>
|
|
|
|
|
|
<li>In a query form, fields which are general text should use the query
|
|
parser. All others, such as date ranges, keywords, etc. are better added
|
|
directly through the query API. A field with a limited set of values,
|
|
that can be specified with a pull-down menu should not be added to a
|
|
query string which is subsequently parsed, but rather added as a
|
|
TermQuery clause.</li>
|
|
|
|
</ol>
|
|
|
|
</p>
|
|
</div>
|
|
|
|
|
|
<a name="N10032"></a><a name="Terms"></a>
|
|
<h2 class="boxed">Terms</h2>
|
|
<div class="section">
|
|
<p>A query is broken up into terms and operators. There are two types of terms: Single Terms and Phrases.</p>
|
|
<p>A Single Term is a single word such as "test" or "hello".</p>
|
|
<p>A Phrase is a group of words surrounded by double quotes such as "hello dolly".</p>
|
|
<p>Multiple terms can be combined together with Boolean operators to form a more complex query (see below).</p>
|
|
<p>Note: The analyzer used to create the index will be used on the terms and phrases in the query string.
|
|
So it is important to choose an analyzer that will not interfere with the terms used in the query string.</p>
|
|
</div>
|
|
|
|
|
|
<a name="N10048"></a><a name="Fields"></a>
|
|
<h2 class="boxed">Fields</h2>
|
|
<div class="section">
|
|
<p>Lucene supports fielded data. When performing a search you can either specify a field, or use the default field. The field names and the default field are implementation specific.</p>
|
|
<p>You can search any field by typing the field name followed by a colon ":" and then the term you are looking for. </p>
|
|
<p>As an example, let's assume a Lucene index contains two fields, title and text, and that text is the default field.
|
|
If you want to find the document entitled "The Right Way" which contains the text "don't go this way", you can enter: </p>
|
|
<pre class="code">title:"The Right Way" AND text:go</pre>
|
|
<p>or</p>
|
|
<pre class="code">title:"The Right Way" AND go</pre>
|
|
<p>Since text is the default field, the field indicator is not required.</p>
|
|
<p>Note: The field is only valid for the term that it directly precedes, so the query</p>
|
|
<pre class="code">title:Do it right</pre>
|
|
<p>will only find "Do" in the title field. It will find "it" and "right" in the default field (in this case, the text field).</p>
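<p>The scoping rule above can be illustrated with a small sketch. This is not how the Query Parser is implemented: the function name <span class="codefrag">assign_fields</span> is hypothetical, and the sketch simply splits on whitespace, ignoring phrases, operators and escaping.</p>

```python
def assign_fields(query, default_field="text"):
    """Pair each whitespace-separated term with the field it is searched in.

    Hypothetical helper for illustration only: a field prefix applies
    solely to the term it directly precedes; every other term falls
    back to the default field.
    """
    pairs = []
    for token in query.split():
        if ":" in token:
            # "title:Do" -> field "title", term "Do"
            field, term = token.split(":", 1)
        else:
            field, term = default_field, token
        pairs.append((field, term))
    return pairs


print(assign_fields("title:Do it right"))
# [('title', 'Do'), ('text', 'it'), ('text', 'right')]
```

<p>To search all three words in the title field, each term needs its own prefix, or the terms must be quoted as a phrase.</p>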
|
|
</div>
|
|
|
|
|
|
<a name="N1006D"></a><a name="Term Modifiers"></a>
|
|
<h2 class="boxed">Term Modifiers</h2>
|
|
<div class="section">
|
|
<p>Lucene supports modifying query terms to provide a wide range of searching options.</p>
|
|
<a name="N10076"></a><a name="Wildcard Searches"></a>
|
|
<h3 class="boxed">Wildcard Searches</h3>
|
|
<p>Lucene supports single and multiple character wildcard searches.</p>
|
|
<p>To perform a single character wildcard search use the "?" symbol.</p>
|
|
<p>To perform a multiple character wildcard search use the "*" symbol.</p>
|
|
<p>The single character wildcard search looks for terms that match the pattern with the "?" replaced by any single character. For example, to search for "text" or "test" you can use the search:</p>
|
|
<pre class="code">te?t</pre>
|
|
<p>Multiple character wildcard searches look for 0 or more characters. For example, to search for test, tests or tester, you can use the search: </p>
|
|
<pre class="code">test*</pre>
|
|
<p>You can also use the wildcard searches in the middle of a term.</p>
|
|
<pre class="code">te*t</pre>
|
|
<p>Note: You cannot use a * or ? symbol as the first character of a search.</p>
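<p>The wildcard rules in this section can be sketched by translating a pattern into a regular expression. This is only an illustration of the matching semantics described here, not Lucene's implementation; the names <span class="codefrag">wildcard_to_regex</span> and <span class="codefrag">matches</span> are hypothetical.</p>

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression.

    "?" stands for exactly one character and "*" for zero or more.
    A leading wildcard is rejected, mirroring the note above.
    """
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            # Escape everything else so it matches literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern, term):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(matches("te?t", "text"))     # True
print(matches("test*", "tester"))  # True
print(matches("te?t", "tet"))      # False: "?" needs exactly one character
```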
|
|
<a name="N1009B"></a><a name="Fuzzy Searches"></a>
|
|
<h3 class="boxed">Fuzzy Searches</h3>
|
|
<p>Lucene supports fuzzy searches based on the Levenshtein Distance, or Edit Distance, algorithm. To do a fuzzy search, use the tilde, "~", symbol at the end of a Single word Term. For example, to search for a term similar in spelling to "roam" use the fuzzy search: </p>
|
|
<pre class="code">roam~</pre>
|
|
<p>This search will find terms like foam and roams.</p>
|
|
<p>Starting with Lucene 1.9, an additional (optional) parameter can specify the required similarity. The value is between 0 and 1; the closer the value is to 1, the higher the similarity a term needs in order to match. For example:</p>
|
|
<pre class="code">roam~0.8</pre>
|
|
<p>The default that is used if the parameter is not given is 0.5.</p>
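<p>As a rough illustration of the Levenshtein (edit) distance that fuzzy searches are based on, the sketch below counts the single-character insertions, deletions and substitutions needed to turn one term into another. The function name <span class="codefrag">edit_distance</span> is hypothetical, and how Lucene converts this distance into the 0 to 1 similarity value is not covered here.</p>

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(edit_distance("roam", "foam"))   # 1 (one substitution)
print(edit_distance("roam", "roams"))  # 1 (one insertion)
```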
|
|
<a name="N100B4"></a><a name="Proximity Searches"></a>
|
|
<h3 class="boxed">Proximity Searches</h3>
|
|
<p>Lucene supports finding words that are within a specific distance of each other. To do a proximity search, use the tilde, "~", symbol at the end of a Phrase. For example, to search for "apache" and "jakarta" within 10 words of each other in a document, use the search: </p>
|
|
<pre class="code">"jakarta apache"~10</pre>
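<p>The idea of "within N words of each other" can be sketched as a comparison of word positions. This is a simplification under stated assumptions: the text is split on whitespace, and distance is the difference between positions. Lucene's own measurement of distance may differ in detail, and the name <span class="codefrag">within_distance</span> is hypothetical.</p>

```python
def within_distance(text, word_a, word_b, max_distance):
    """Return True if word_a and word_b occur within max_distance positions.

    Simplified illustration: positions are whitespace-separated word
    indexes, compared case-insensitively.
    """
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a
               for j in positions_b)


near = "The Apache Jakarta project"
far = "jakarta " + "filler " * 20 + "apache"
print(within_distance(near, "jakarta", "apache", 10))  # True
print(within_distance(far, "jakarta", "apache", 10))   # False
```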
|
|
<a name="N100C1"></a><a name="Range Searches"></a>
|
|
<h3 class="boxed">Range Searches</h3>
|
|
<p>Range Queries allow one to match documents whose field(s) values
|
|
are between the lower and upper bound specified by the Range Query.
|
|
Range Queries can be inclusive or exclusive of the upper and lower bounds.
|
|
Sorting is done lexicographically.</p>
|
|
<pre class="code">mod_date:[20020101 TO 20030101]</pre>
|
|
<p>This will find documents whose mod_date fields have values between 20020101 and 20030101, inclusive.
|
|
Note that Range Queries are not reserved for date fields. You could also use range queries with non-date fields:</p>
|
|
<pre class="code">title:{Aida TO Carmen}</pre>
|
|
<p>This will find all documents whose titles are between Aida and Carmen, but not including Aida and Carmen.</p>
|
|
<p>Inclusive range queries are denoted by square brackets. Exclusive range queries are denoted by
|
|
curly brackets.</p>
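<p>Because ranges are compared lexicographically, bounds behave like string comparisons rather than numeric ones. The sketch below illustrates inclusive and exclusive bounds with plain string comparison; the helper name <span class="codefrag">in_range</span> is hypothetical and is not part of Lucene.</p>

```python
def in_range(value, lower, upper, inclusive=True):
    """Lexicographic range check on strings.

    inclusive=True corresponds to [lower TO upper],
    inclusive=False to {lower TO upper}.
    """
    if inclusive:
        return lower <= value <= upper
    return lower < value < upper


print(in_range("20020615", "20020101", "20030101"))              # True
print(in_range("Aida", "Aida", "Carmen", inclusive=False))       # False
print(in_range("Boris Godunov", "Aida", "Carmen", inclusive=False))  # True
# Lexicographic order is not numeric order: "10" sorts before "9".
print("10" < "9")  # True
```

<p>One consequence is that values meant to be ordered numerically, such as the dates above, only sort as expected when they share a fixed-width format (for example, zero-padded digits).</p>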
|
|
<a name="N100DA"></a><a name="Boosting a Term"></a>
|
|
<h3 class="boxed">Boosting a Term</h3>
|
|
<p>Lucene provides the relevance level of matching documents based on the terms found. To boost a term use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.</p>
|
|
<p>Boosting allows you to control the relevance of a document by boosting its terms. For example, if you are searching for</p>
|
|
<pre class="code">jakarta apache</pre>
|
|
<p>and you want the term "jakarta" to be more relevant, boost it using the ^ symbol along with the boost factor next to the term.
|
|
You would type:</p>
|
|
<pre class="code">jakarta^4 apache</pre>
|
|
<p>This will make documents with the term jakarta appear more relevant. You can also boost Phrase Terms as in the example: </p>
|
|
<pre class="code">"jakarta apache"^4 "Apache Lucene"</pre>
|
|
<p>By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).</p>
|
|
</div>
|
|
|
|
|
|
|
|
<a name="N100FA"></a><a name="Boolean operators"></a>
|
|
<h2 class="boxed">Boolean Operators</h2>
|
|
<div class="section">
|
|
<p>Boolean operators allow terms to be combined through logic operators.
|
|
Lucene supports AND, "+", OR, NOT and "-" as Boolean operators. (Note: Boolean operators must be ALL CAPS.)</p>
|
|
<a name="N10103"></a><a name="OR"></a>
|
|
<h3 class="boxed">OR</h3>
|
|
<p>The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used.
|
|
The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets.
|
|
The symbol || can be used in place of the word OR.</p>
|
|
<p>To search for documents that contain either "jakarta apache" or just "jakarta" use the query:</p>
|
|
<pre class="code">"jakarta apache" jakarta</pre>
|
|
<p>or</p>
|
|
<pre class="code">"jakarta apache" OR jakarta</pre>
|
|
<a name="N10116"></a><a name="AND"></a>
|
|
<h3 class="boxed">AND</h3>
|
|
<p>The AND operator matches documents where both terms exist anywhere in the text of a single document.
|
|
This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.</p>
|
|
<p>To search for documents that contain "jakarta apache" and "Apache Lucene" use the query: </p>
|
|
<pre class="code">"jakarta apache" AND "Apache Lucene"</pre>
|
|
<a name="N10126"></a><a name="+"></a>
|
|
<h3 class="boxed">+</h3>
|
|
<p>The "+" or required operator requires that the term after the "+" symbol exist somewhere in a field of a single document.</p>
|
|
<p>To search for documents that must contain "jakarta" and may contain "lucene" use the query:</p>
|
|
<pre class="code">+jakarta lucene</pre>
|
|
<a name="N10136"></a><a name="NOT"></a>
|
|
<h3 class="boxed">NOT</h3>
|
|
<p>The NOT operator excludes documents that contain the term after NOT.
|
|
This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT.</p>
|
|
<p>To search for documents that contain "jakarta apache" but not "Apache Lucene" use the query: </p>
|
|
<pre class="code">"jakarta apache" NOT "Apache Lucene"</pre>
|
|
<p>Note: The NOT operator cannot be used with just one term. For example, the following search will return no results:</p>
|
|
<pre class="code">NOT "jakarta apache"</pre>
|
|
<a name="N1014C"></a><a name="-"></a>
|
|
<h3 class="boxed">-</h3>
|
|
<p>The "-" or prohibit operator excludes documents that contain the term after the "-" symbol.</p>
|
|
<p>To search for documents that contain "jakarta apache" but not "Apache Lucene" use the query: </p>
|
|
<pre class="code">"jakarta apache" -"Apache Lucene"</pre>
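<p>The set analogies used in this section (OR as union, AND as intersection, NOT as difference) can be made concrete with a small sketch. The document IDs below are made up for illustration, and real scoring and ranking are ignored.</p>

```python
# Hypothetical sets of document IDs that contain each term.
docs_with_jakarta = {1, 2, 3}
docs_with_apache = {2, 3, 4}

# jakarta OR apache: documents containing either term (union).
either = docs_with_jakarta | docs_with_apache

# jakarta AND apache: documents containing both terms (intersection).
both = docs_with_jakarta & docs_with_apache

# jakarta NOT apache: documents with jakarta but without apache (difference).
jakarta_only = docs_with_jakarta - docs_with_apache

print(sorted(either))        # [1, 2, 3, 4]
print(sorted(both))          # [2, 3]
print(sorted(jakarta_only))  # [1]
```

<p>The difference operation also shows why NOT needs something to subtract from: a query consisting only of a NOT clause has no positive set of documents to start with.</p>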
|
|
</div>
|
|
|
|
|
|
<a name="N1015D"></a><a name="Grouping"></a>
|
|
<h2 class="boxed">Grouping</h2>
|
|
<div class="section">
|
|
<p>Lucene supports using parentheses to group clauses to form sub queries. This can be very useful if you want to control the boolean logic for a query.</p>
|
|
<p>To search for either "jakarta" or "apache" and "website" use the query:</p>
|
|
<pre class="code">(jakarta OR apache) AND website</pre>
|
|
<p>This eliminates any confusion and makes sure that website must exist and that at least one of the terms jakarta or apache must exist.</p>
|
|
</div>
|
|
|
|
|
|
<a name="N10170"></a><a name="Field Grouping"></a>
|
|
<h2 class="boxed">Field Grouping</h2>
|
|
<div class="section">
|
|
<p>Lucene supports using parentheses to group multiple clauses to a single field.</p>
|
|
<p>To search for a title that contains both the word "return" and the phrase "pink panther" use the query:</p>
|
|
<pre class="code">title:(+return +"pink panther")</pre>
|
|
</div>
|
|
|
|
|
|
<a name="N10180"></a><a name="Escaping Special Characters"></a>
|
|
<h2 class="boxed">Escaping Special Characters</h2>
|
|
<div class="section">
|
|
<p>Lucene supports escaping special characters that are part of the query syntax. The current list of special characters is:</p>
|
|
<p>+ - && || ! ( ) { } [ ] ^ " ~ * ? : \</p>
|
|
<p>To escape these characters, use the \ before the character. For example, to search for (1+1):2 use the query:</p>
|
|
<pre class="code">\(1\+1\)\:2</pre>
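<p>When user-supplied text has to be searched literally, the escaping rule above can be applied mechanically. The helper below is a hypothetical sketch, not a Lucene API. As a simplifying assumption it escapes every single <span class="codefrag">&amp;</span> and <span class="codefrag">|</span> character, rather than only the two-character operators.</p>

```python
# Characters taken from the list of special characters above.
# Assumption: "&" and "|" are escaped individually, which also covers
# the two-character operators "&&" and "||".
SPECIAL_CHARACTERS = set('+-&|!(){}[]^"~*?:\\')


def escape_query_text(text):
    """Prefix every special character in text with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch
                   for ch in text)


print(escape_query_text("(1+1):2"))  # \(1\+1\)\:2
```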
|
|
</div>
|
|
|
|
|
|
</div>
|
|
<div class="clearboth"> </div>
|
|
</div>
|
|
<div id="footer">
|
|
<div class="lastmodified">
|
|
<script type="text/javascript"><!--
|
|
document.write("<text>Last Published:</text> " + document.lastModified);
|
|
// --></script>
|
|
</div>
|
|
<div class="copyright">
|
|
Copyright ©
|
|
2006 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
|
|
</div>
|
|
</div>
|
|
</body>
|
|
</html>
|