This DTD describes the XML syntax used to perform advanced searches using the core Lucene search engine. The motivation behind the XML query syntax is:
<ol>
<li>To open up Lucene functionality to clients other than Java</li>
<li>To offer a form of expressing queries that can easily be
<ul>
<li>Persisted for logging/auditing purposes</li>
<li>Changed by editing text query templates (XSLT) without requiring a recompile/redeploy of applications</li>
<li>Serialized across networks (without requiring Java bytecode for Query logic deployed on clients)</li>
</ul>
</li>
<li>To provide a shorthand way of expressing query logic which echoes the logical tree structure of query objects more closely than reading procedural Java query construction code</li>
<li>To bridge the growing gap between Lucene query/filtering functionality and the set of functionality accessible through the standard Lucene QueryParser syntax</li>
<li>To provide a simply extensible syntax that does not require complex parser skills such as knowledge of JavaCC syntax</li>
</ol></p><p><h3>Syntax overview</h3>
Search syntax consists of two types of elements:
<ul>
<li><i>Queries</i></li>
<li><i>Filters</i></li>
</ul></p><p><h4>Queries</h4>
The root of any XML search must be a <i>Query</i> type element used to select content.
Queries typically score matches on documents using a number of different factors in order to provide relevant results first.
One common example of a query tag is the <a href="#UserQuery">UserQuery</a> element which uses the standard
Lucene QueryParser to parse Google-style search syntax provided by end users.</p><p><h4>Filters</h4>
Unlike Queries, <i>Filters</i> are not used to select or score content - they are simply used to filter <i>Query</i> output (see <a href="#FilteredQuery">FilteredQuery</a> for an example use of query filtering).
Because Filters simply offer a yes/no decision for each document in the index, their output can be efficiently cached in memory as a <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/BitSet.html">BitSet</a> for
subsequent reuse (see the <a href="#CachedFilter">CachedFilter</a> tag).</p><p><h4>Nesting elements</h4>
Many of the elements can nest other elements to produce queries/filters of an arbitrary depth and complexity.
The <a href="#BooleanQuery">BooleanQuery</a> element is one such example which provides a means for combining other queries (including other BooleanQueries) using Boolean
logic to determine mandatory or optional elements.</p><p><h3>Advanced topics</h3>
The <i>SpanQuery</i> class of queries allows for complex positional tests, which look not only for certain combinations of words but also for particular
positions in relation to each other and to the documents containing them.</p><p>CoreParser.java is the Java class that encapsulates this parser behaviour.</p><br/>
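<p>The nesting described above can be made concrete with a short Java sketch. It uses only the JDK's DOM parser (not CoreParser) to walk a sample query in which one BooleanQuery nests another. The sample query text, the <code>occurs</code> attribute name, the class name <code>QueryTreeSketch</code> and its methods are assumptions made for illustration.</p>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class QueryTreeSketch {
    // Hypothetical sample: an outer BooleanQuery whose optional clause nests a second BooleanQuery.
    static final String SAMPLE =
          "<BooleanQuery fieldName=\"contents\">"
        + "<Clause occurs=\"must\"><UserQuery>bank</UserQuery></Clause>"
        + "<Clause occurs=\"should\">"
        + "<BooleanQuery>"
        + "<Clause occurs=\"should\"><UserQuery>merger</UserQuery></Clause>"
        + "<Clause occurs=\"should\"><UserQuery>takeover</UserQuery></Clause>"
        + "</BooleanQuery>"
        + "</Clause>"
        + "</BooleanQuery>";

    // Parses the XML text and returns its root element; parse failures become unchecked exceptions.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Query XML is not well-formed", e);
        }
    }

    // Element nesting depth; an element with no child elements has depth 1.
    static int depth(Element e) {
        int deepestChild = 0;
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n instanceof Element) {
                deepestChild = Math.max(deepestChild, depth((Element) n));
            }
        }
        return 1 + deepestChild;
    }

    // Counts elements with the given tag name, including the element itself.
    static int count(Element e, String tag) {
        int total = e.getTagName().equals(tag) ? 1 : 0;
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n instanceof Element) {
                total += count((Element) n, tag);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Element root = parseRoot(SAMPLE);
        System.out.println("root element: " + root.getTagName());
        System.out.println("nesting depth: " + depth(root));
        System.out.println("BooleanQuery elements: " + count(root, "BooleanQuery"));
    }
}
```

<p>Because the query is ordinary XML, any XML tooling can inspect or transform it before it is handed to the parser, which is what makes the template-based editing mentioned in the motivation practical.</p>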
Child of <a href='#Clause'>Clause</a>, <a href='#CachedFilter'>CachedFilter</a>, <a href='#Query'>Query</a>
</td></tr></table>
<p>A BooleanQuery implements Boolean logic which controls how multiple Clauses should be interpreted.
Some clauses may represent optional Query criteria while others represent mandatory criteria.</p><p><span class='inTextTitle'>Example:</span> <em>Find articles about banks, preferably talking about mergers but nothing to do with "sumitomo"</em>
Attribute of <a href='#BooleanQuery'>BooleanQuery</a>
</td></tr></table>
<p>Optional boost for matches on this query. Values > 1</p><p><span class='inTextTitle'>Default value</span>: 1.0</p><a name='BooleanQuery_fieldName'></a>
Attribute of <a href='#BooleanQuery'>BooleanQuery</a>
</td></tr></table>
<p>The "Coordination factor" rewards documents that contain more of the optional clauses in this list. This flag can be used to turn off this factor.</p><p><span class='inTextTitle'>Possible values</span>: true, false - <span class='inTextTitle'>Default value</span>: false</p><a name='BooleanQuery_minimumNumberShouldMatch'></a>
Attribute of <a href='#BooleanQuery'>BooleanQuery</a>
</td></tr></table>
<p>The minimum number of optional clauses that should be present in any one document before it is considered to be a match.</p><p><span class='inTextTitle'>Default value</span>: 0</p><a name='Clause'></a>
<p>Controls if the clause is optional (should), mandatory (must) or unacceptable (mustnot)</p><p><span class='inTextTitle'>Possible values</span>: should, must, mustnot - <span class='inTextTitle'>Default value</span>: should</p><a name='CachedFilter'></a>
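<p>As a rough illustration of how the should/must/mustnot values and minimumNumberShouldMatch combine, the following Java sketch evaluates one document against a list of clauses. It is a simplified model, not Lucene's implementation: the class <code>BooleanClauseSketch</code> and its method are hypothetical, scoring is ignored, and the rule that a query with no mandatory clause needs at least one optional match is an assumption about the engine's behaviour.</p>

```java
final class BooleanClauseSketch {
    // Mirrors the three clause values listed above.
    enum Occurs { SHOULD, MUST, MUSTNOT }

    // occurs[i] is the role of clause i; clauseMatches[i] says whether clause i matched the document.
    static boolean matches(Occurs[] occurs, boolean[] clauseMatches, int minimumNumberShouldMatch) {
        int optionalHits = 0;
        boolean hasMandatory = false;
        for (int i = 0; i < occurs.length; i++) {
            switch (occurs[i]) {
                case MUST:
                    hasMandatory = true;
                    if (!clauseMatches[i]) {
                        return false; // a mandatory clause failed
                    }
                    break;
                case MUSTNOT:
                    if (clauseMatches[i]) {
                        return false; // an unacceptable clause matched
                    }
                    break;
                default: // SHOULD
                    if (clauseMatches[i]) {
                        optionalHits++;
                    }
                    break;
            }
        }
        // Assumption: without any mandatory clause, at least one optional clause has to match.
        int required = Math.max(minimumNumberShouldMatch, hasMandatory ? 0 : 1);
        return optionalHits >= required;
    }

    public static void main(String[] args) {
        // Clauses modelled on the BooleanQuery example: bank (must), merger (should), sumitomo (mustnot).
        Occurs[] occurs = { Occurs.MUST, Occurs.SHOULD, Occurs.MUSTNOT };
        boolean[] bankAndMerger = { true, true, false };
        boolean[] bankAndSumitomo = { true, false, true };
        System.out.println("bank + merger matches: " + matches(occurs, bankAndMerger, 0));
        System.out.println("bank + sumitomo matches: " + matches(occurs, bankAndSumitomo, 0));
    }
}
```

<p>Raising minimumNumberShouldMatch to 1 in this model turns the optional "merger" clause into a requirement that at least one optional clause matches.</p>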
Child of <a href='#ConstantScoreQuery'>ConstantScoreQuery</a>, <a href='#Clause'>Clause</a>, <a href='#Filter'>Filter</a>
</td></tr></table>
<p>Caches any nested query or filter in an LRU (least recently used) cache. Cached queries, like filters, are turned into
BitSets at a cost of 1 bit per document in the index. The memory cost of a cached query/filter is therefore numberOfDocsInIndex/8 bytes.
Queries that are cached as filters retain none of the scoring information associated with results - they retain just
a Boolean yes/no record of which documents matched.</p><p><span class='inTextTitle'>Example:</span> <em>Search for documents about banks from the last 10 years - caching the commonly-used "last 10 years" filter as a BitSet in
RAM to eliminate the cost of building this filter from disk for every query</em>
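<p>The memory estimate given above (one bit per document) can be sketched in Java as follows. This is an illustration only: the class <code>CachedFilterCostSketch</code> and its methods are hypothetical and not part of Lucene, and <code>java.util.BitSet</code> allocates in 64-bit words, so real usage rounds up slightly.</p>

```java
import java.util.BitSet;

final class CachedFilterCostSketch {
    // One bit per document in the index, rounded up to whole bytes.
    static long estimatedBytes(long numberOfDocsInIndex) {
        return (numberOfDocsInIndex + 7) / 8;
    }

    // Records a yes/no decision per document id; no scoring information is kept.
    static BitSet cacheMatches(int numberOfDocsInIndex, int[] matchingDocIds) {
        BitSet bits = new BitSet(numberOfDocsInIndex);
        for (int docId : matchingDocIds) {
            bits.set(docId);
        }
        return bits;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L;
        System.out.println(docs + " documents need about " + estimatedBytes(docs) + " bytes per cached filter");
        BitSet cached = cacheMatches(10, new int[] { 1, 3 });
        System.out.println("document 3 passes the filter: " + cached.get(3));
    }
}
```

<p>The cost depends only on the size of the index, not on how many documents match, so a filter that matches very few documents is no cheaper to cache than one that matches most of them.</p>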
Child of <a href='#Clause'>Clause</a>, <a href='#CachedFilter'>CachedFilter</a>, <a href='#Query'>Query</a>
</td></tr></table>
<p>Passes content directly through to the standard Lucene query parser (see "Lucene Query Syntax")</p><p><span class='inTextTitle'>Example:</span> <em>Search for documents about John Smith or John Doe using standard Lucene query syntax</em>
</p><pre>
<UserQuery>"John Smith" OR "John Doe"</UserQuery>
</pre>
<p>fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute</p><a name='TermsQuery'></a>
Child of <a href='#Clause'>Clause</a>, <a href='#CachedFilter'>CachedFilter</a>, <a href='#Query'>Query</a>
</td></tr></table>
<p>The equivalent of a BooleanQuery with multiple optional TermQuery clauses.
Child text is analyzed using a field-specific choice of Analyzer to produce a set of terms that are ORed together in Boolean logic.
Unlike the UserQuery element, this does not parse any special characters to control fuzzy/phrase/boolean logic and as such is incapable
of producing a Query parse error given any user input</p><p><span class='inTextTitle'>Example:</span> <em>Match on text from a database description (which may contain characters that
are illegal in the standard Lucene Query syntax used in the UserQuery tag)</em>
</p><pre>
<TermsQuery fieldName="description">Smith &amp; Sons (Ltd) : incorporated 1982</TermsQuery>
</pre>
<p>fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute</p><a name='TermsQuery_disableCoord'></a>
<p>The "Coordination factor" rewards documents that contain more of the terms in this list. This flag can be used to turn off this factor.</p><p><span class='inTextTitle'>Possible values</span>: true, false - <span class='inTextTitle'>Default value</span>: false</p><a name='TermsQuery_minimumNumberShouldMatch'></a>
<p>The minimum number of terms that should be present in any one document before it is considered to be a match.</p><p><span class='inTextTitle'>Default value</span>: 0</p><a name='FilteredQuery'></a>
Child of <a href='#Clause'>Clause</a>, <a href='#CachedFilter'>CachedFilter</a>, <a href='#Query'>Query</a>
</td></tr></table>
<p>Runs a Query and filters results to only those query matches that also match the Filter element.</p><p><span class='inTextTitle'>Example:</span> <em>Find all documents about Lucene that have a status of "published"</em>
Child of <a href='#FilteredQuery'>FilteredQuery</a>
</td></tr></table>
<p>Used to identify a nested Query element inside another container element. NOT a top-level query tag</p><blockquote><table summary='element info'><tr>
Child of <a href='#ConstantScoreQuery'>ConstantScoreQuery</a>, <a href='#Clause'>Clause</a>, <a href='#CachedFilter'>CachedFilter</a>, <a href='#Filter'>Filter</a>
</td></tr></table>
<p>Filter used to limit query results to documents matching a range of field values</p><p><span class='inTextTitle'>Example:</span> <em>Search for documents about banks from the last 10 years</em>
Attribute of <a href='#RangeFilter'>RangeFilter</a>
</td></tr></table>
<p>fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute</p><a name='RangeFilter_lowerTerm'></a>
Attribute of <a href='#RangeFilter'>RangeFilter</a>
</td></tr></table>
<p>The lower-most term value for this field (must be <= upperTerm)</p><p><span class='inTextTitle'>Required</span></p><a name='RangeFilter_upperTerm'></a>
Attribute of <a href='#RangeFilter'>RangeFilter</a>
</td></tr></table>
<p>The upper-most term value for this field (must be >= lowerTerm)</p><p><span class='inTextTitle'>Required</span></p><a name='RangeFilter_includeLower'></a>
Attribute of <a href='#RangeFilter'>RangeFilter</a>
</td></tr></table>
<p>Controls if the lowerTerm in the range is part of the allowed set of values</p><p><span class='inTextTitle'>Possible values</span>: true, false - <span class='inTextTitle'>Default value</span>: true</p><a name='RangeFilter_includeUpper'></a>
Attribute of <a href='#RangeFilter'>RangeFilter</a>
</td></tr></table>
<p>Controls if the upperTerm in the range is part of the allowed set of values</p><p><span class='inTextTitle'>Possible values</span>: true, false - <span class='inTextTitle'>Default value</span>: true</p><a name='SpanTerm'></a>
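<p>The effect of lowerTerm, upperTerm, includeLower and includeUpper on a RangeFilter can be sketched as a simple membership test. The Java below is a minimal model under the assumption that term values are compared as plain strings; the class <code>RangeFilterSketch</code> and its method are hypothetical and not part of Lucene.</p>

```java
final class RangeFilterSketch {
    // Returns true if term lies between lowerTerm and upperTerm, honouring the inclusive flags.
    // Assumption: terms are ordered by String.compareTo, i.e. as text rather than as numbers.
    static boolean inRange(String term, String lowerTerm, String upperTerm,
                           boolean includeLower, boolean includeUpper) {
        int vsLower = term.compareTo(lowerTerm);
        int vsUpper = term.compareTo(upperTerm);
        boolean aboveLower = includeLower ? vsLower >= 0 : vsLower > 0;
        boolean belowUpper = includeUpper ? vsUpper <= 0 : vsUpper < 0;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        // Dates written as fixed-width yyyyMMdd strings sort correctly as text.
        System.out.println(inRange("20050615", "20050101", "20051231", true, true));
        // The lower bound itself is excluded when includeLower is false.
        System.out.println(inRange("20050101", "20050101", "20051231", false, true));
        // Unpadded numbers do not sort numerically: "100" falls between "10" and "20" as text.
        System.out.println(inRange("100", "10", "20", true, true));
    }
}
```

<p>Under this assumption, numeric or date values need a fixed-width, zero-padded encoding for a range to select what the author intends.</p>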
Child of <a href='#SpanOr'>SpanOr</a>, <a href='#SpanFirst'>SpanFirst</a>, <a href='#Exclude'>Exclude</a>, <a href='#Clause'>Clause</a>, <a href='#Include'>Include</a>, <a href='#CachedFilter'>CachedFilter</a>, <a href='#SpanNear'>SpanNear</a>, <a href='#Query'>Query</a>
</td></tr></table>
<p>A single term used in a SpanQuery. These clauses are the building blocks for more complex "span" queries which test word proximity</p><p><span class='inTextTitle'>Example:</span> <em>Find documents using terms close to each other about mining and accidents</em>
<p>fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute</p><p><span class='inTextTitle'>Required</span></p><a name='SpanOrTerms'></a>
Child of <a href='#SpanOr'>SpanOr</a>, <a href='#SpanFirst'>SpanFirst</a>, <a href='#Exclude'>Exclude</a>, <a href='#Clause'>Clause</a>, <a href='#Include'>Include</a>, <a href='#CachedFilter'>CachedFilter</a>, <a href='#SpanNear'>SpanNear</a>, <a href='#Query'>Query</a>
</td></tr></table>
<p>A field-specific analyzer is used here to parse the child text provided in this tag. The SpanTerms produced are ORed in terms of Boolean logic</p><p><span class='inTextTitle'>Example:</span> <em>Use SpanOrTerms as a more convenient/succinct way of expressing multiple choices of SpanTerms. This example looks for reports
using words describing a fatality near to references to miners</em>
Attribute of <a href='#SpanOrTerms'>SpanOrTerms</a>
</td></tr></table>
<p>fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute</p><p><span class='inTextTitle'>Required</span></p><a name='SpanOr'></a>
Child of <a href='#SpanFirst'>SpanFirst</a>, <a href='#Exclude'>Exclude</a>, <a href='#Clause'>Clause</a>, <a href='#Include'>Include</a>, <a href='#CachedFilter'>CachedFilter</a>, <a href='#SpanNear'>SpanNear</a>, <a href='#Query'>Query</a>
</td></tr></table>
<p>Takes any number of child queries from the Span family</p><p><span class='inTextTitle'>Example:</span> <em>Find documents using terms close to each other about mining and accidents</em>
<p>Defines the maximum distance between Span elements where distance is expressed as word number, not byte offset</p><p><span class='inTextTitle'>Example:</span> <em>Find documents using terms within 8 words of each other talking about mining and accidents</em>
<p>Controls if matching terms have to appear in the order listed or can be reversed</p><p><span class='inTextTitle'>Possible values</span>: true, false - <span class='inTextTitle'>Default value</span>: true</p><a name='SpanFirst'></a>
Child of <a href='#SpanOr'>SpanOr</a>, <a href='#Exclude'>Exclude</a>, <a href='#Clause'>Clause</a>, <a href='#Include'>Include</a>, <a href='#CachedFilter'>CachedFilter</a>, <a href='#SpanNear'>SpanNear</a>, <a href='#Query'>Query</a>
</td></tr></table>
<p>Looks for a SpanQuery match occurring near the beginning of a document</p><p><span class='inTextTitle'>Example:</span> <em>Find letters where the first 50 words talk about a resignation:</em>
<p>Controls the end of the region considered in a document's field (expressed in word number, not byte offset)</p><p><span class='inTextTitle'>Required</span></p><a name='SpanFirst_boost'></a>
Child of <a href='#SpanOr'>SpanOr</a>, <a href='#SpanFirst'>SpanFirst</a>, <a href='#Exclude'>Exclude</a>, <a href='#Clause'>Clause</a>, <a href='#Include'>Include</a>, <a href='#CachedFilter'>CachedFilter</a>, <a href='#SpanNear'>SpanNear</a>, <a href='#Query'>Query</a>
</td></tr></table>
<p>Finds documents matching a SpanQuery but not if matching another SpanQuery</p><p><span class='inTextTitle'>Example:</span> <em>Find documents talking about social services but not containing the word "public"</em>