LuceneCoreQuery.dtd
CoreParser.java is the Java class that encapsulates this parser behaviour.
<BooleanQuery> | Child of Query, Clause, CachedFilter |
BooleanQuery elements implement Boolean logic that controls how multiple Clauses should be interpreted. Some clauses may represent optional Query criteria while others represent mandatory criteria.
Example: Find articles about banks, preferably talking about mergers but nothing to do with "sumitomo"
<BooleanQuery fieldName="contents">
    <Clause occurs="should">
        <TermQuery>merger</TermQuery>
    </Clause>
    <Clause occurs="mustnot">
        <TermQuery>sumitomo</TermQuery>
    </Clause>
    <Clause occurs="must">
        <TermQuery>bank</TermQuery>
    </Clause>
</BooleanQuery>
Element's model: (Clause)+
<BooleanQuery>'s children:
Name | Cardinality
Clause | At least one
<BooleanQuery>'s attributes:
Name | Values | Default
boost |  | 1.0
disableCoord | true, false | false
fieldName |  |
minimumNumberShouldMatch |  | 0
@boost | Attribute of BooleanQuery |
Optional boost for matches on this query. Values greater than 1 increase the relative importance of matches.
Default value: 1.0
@fieldName | Attribute of BooleanQuery |
fieldName can optionally be defined here as a default attribute used by all child elements
@disableCoord | Attribute of BooleanQuery |
The "Coordination factor" rewards documents that contain more of the optional clauses in this list. This flag can be used to turn off this factor.
Possible values: true, false - Default value: false
@minimumNumberShouldMatch | Attribute of BooleanQuery |
The minimum number of optional clauses that should be present in any one document before it is considered to be a match.
Default value: 0
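As a rough illustration of how minimumNumberShouldMatch interacts with the occurs values, the sketch below requires "bank" and at least two of the three optional terms. The field name and terms are illustrative only, and the sketch assumes that only occurs="should" clauses count toward the minimum.

```xml
<!-- Sketch: "bank" is mandatory; at least 2 of the 3 optional terms must also match -->
<BooleanQuery fieldName="contents" minimumNumberShouldMatch="2">
    <Clause occurs="must">
        <TermQuery>bank</TermQuery>
    </Clause>
    <Clause occurs="should">
        <TermQuery>merger</TermQuery>
    </Clause>
    <Clause occurs="should">
        <TermQuery>takeover</TermQuery>
    </Clause>
    <Clause occurs="should">
        <TermQuery>acquisition</TermQuery>
    </Clause>
</BooleanQuery>
```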
<Clause> | Child of BooleanQuery |
NOTE: The "Clause" tag has two modes of use. Inside a <BooleanQuery>, only "query" types can be child elements; inside a <BooleanFilter>, only "filter" types can be contained.
Element's model: (BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | RangeFilter | CachedFilter)
<Clause>'s children:
Name | Cardinality
BooleanQuery | One or none
CachedFilter | One or none
ConstantScoreQuery | One or none
FilteredQuery | One or none
MatchAllDocsQuery | One or none
RangeFilter | One or none
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
TermQuery | One or none
TermsQuery | One or none
UserQuery | One or none
<Clause>'s attributes:
Name | Values | Default
occurs | should, must, mustnot | should
@occurs | Attribute of Clause |
Controls whether the clause is optional (should), mandatory (must) or unacceptable (mustnot)
Possible values: should, must, mustnot - Default value: should
<CachedFilter> | Child of ConstantScoreQuery, Clause, Filter |
Caches any nested query or filter in an LRU (least recently used) cache. Cached queries, like filters, are turned into BitSets at a cost of 1 bit per document in the index. The memory cost of a cached query/filter is therefore numberOfDocsinIndex/8 bytes; for example, an index of 8 million documents costs about 1 MB per cached entry. Queries that are cached as filters retain none of the scoring information associated with results; they retain just a Boolean yes/no record of which documents matched.
Example: Search for documents about banks from the last 10 years - caching the commonly-used "last 10 year" filter as a BitSet in RAM to eliminate the cost of building this filter from disk for every query
<FilteredQuery>
    <Query>
        <UserQuery>bank</UserQuery>
    </Query>
    <Filter>
        <CachedFilter>
            <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
        </CachedFilter>
    </Filter>
</FilteredQuery>
Element's model: (BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | RangeFilter | CachedFilter)
<CachedFilter>'s children:
Name | Cardinality
BooleanQuery | One or none
CachedFilter | One or none
ConstantScoreQuery | One or none
FilteredQuery | One or none
MatchAllDocsQuery | One or none
RangeFilter | One or none
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
TermQuery | One or none
TermsQuery | One or none
UserQuery | One or none
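Because the content model allows query elements as well as filters, a query can also be cached as a filter. The following is a minimal sketch of that case; the "status" field and its value are illustrative, and the cached TermQuery contributes only a yes/no match, not a score.

```xml
<!-- Sketch: cache the set of documents whose status is "published" -->
<FilteredQuery>
    <Query>
        <UserQuery>bank</UserQuery>
    </Query>
    <Filter>
        <CachedFilter>
            <TermQuery fieldName="status">published</TermQuery>
        </CachedFilter>
    </Filter>
</FilteredQuery>
```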
<UserQuery> | Child of Query, Clause, CachedFilter |
Passes content directly through to the standard Lucene query parser; see "Lucene Query Syntax".
Example: Search for documents about John Smith or John Doe using standard Lucene query syntax
<UserQuery>"John Smith" OR "John Doe"</UserQuery>
<UserQuery>'s attributes:
Name | Values | Default
boost |  | 1.0
@boost | Attribute of UserQuery |
Optional boost for matches on this query. Values greater than 1 increase the relative importance of matches.
Default value: 1.0
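A boost is mostly useful when the UserQuery is one clause among several. The sketch below, with an illustrative field name and terms, weights the user's own query text more heavily than an additional optional term.

```xml
<!-- Sketch: the user-entered query counts twice as much as the optional term -->
<BooleanQuery>
    <Clause occurs="must">
        <UserQuery boost="2.0">"John Smith" OR "John Doe"</UserQuery>
    </Clause>
    <Clause occurs="should">
        <TermQuery fieldName="contents">lawsuit</TermQuery>
    </Clause>
</BooleanQuery>
```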
<MatchAllDocsQuery/> | Child of Query, Clause, CachedFilter |
A query which is used to match all documents. This has a couple of uses:
Example: Effectively use a Filter as a query
<FilteredQuery>
    <Query>
        <MatchAllDocsQuery/>
    </Query>
    <Filter>
        <RangeFilter fieldName="date" lowerTerm="19870409" upperTerm="19870412"/>
    </Filter>
</FilteredQuery>
This element is always empty.
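Another use can be sketched as follows, assuming (as holds for core Lucene's BooleanQuery) that a query made up only of prohibited clauses matches nothing: MatchAllDocsQuery supplies the positive clause so that a mustnot clause can select every document except those matching a term. The field name and term are illustrative.

```xml
<!-- Sketch: all documents except those containing "sumitomo" -->
<BooleanQuery fieldName="contents">
    <Clause occurs="must">
        <MatchAllDocsQuery/>
    </Clause>
    <Clause occurs="mustnot">
        <TermQuery>sumitomo</TermQuery>
    </Clause>
</BooleanQuery>
```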
<TermQuery> | Child of Query, Clause, CachedFilter |
A single term query. No analysis is done of the child text.
Example: Match on a primary key
<TermQuery fieldName="primaryKey">13424</TermQuery>
<TermQuery>'s attributes:
Name | Values | Default
boost |  | 1.0
fieldName |  |
@boost | Attribute of TermQuery |
Optional boost for matches on this query. Values greater than 1 increase the relative importance of matches.
Default value: 1.0
@fieldName | Attribute of TermQuery |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
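The inheritance rule can be illustrated with a small sketch. Here the first TermQuery has no fieldName of its own and so picks up "contents" from the enclosing BooleanQuery, while the second names its field explicitly. The field names and values are illustrative.

```xml
<BooleanQuery fieldName="contents">
    <Clause occurs="must">
        <!-- no fieldName here: taken from the enclosing BooleanQuery -->
        <TermQuery>bank</TermQuery>
    </Clause>
    <Clause occurs="must">
        <!-- fieldName defined directly on the element -->
        <TermQuery fieldName="primaryKey">13424</TermQuery>
    </Clause>
</BooleanQuery>
```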
<TermsQuery> | Child of Query, Clause, CachedFilter |
The equivalent of a BooleanQuery with multiple optional TermQuery clauses. Child text is analyzed using a field-specific choice of Analyzer to produce a set of terms that are ORed together in Boolean logic. Unlike the UserQuery element, this does not parse any special characters to control fuzzy/phrase/boolean logic and as such cannot produce a Query parse error, whatever the user input.
Example: Match on text from a database description (which may contain characters that are illegal in the standard Lucene query syntax used in the UserQuery tag)
<TermsQuery fieldName="description">Smith &amp; Sons (Ltd) : incorporated 1982</TermsQuery>
<TermsQuery>'s attributes:
Name | Values | Default
boost |  | 1.0
disableCoord | true, false | false
fieldName |  |
minimumNumberShouldMatch |  | 0
@boost | Attribute of TermsQuery |
Optional boost for matches on this query. Values greater than 1 increase the relative importance of matches.
Default value: 1.0
@fieldName | Attribute of TermsQuery |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
@disableCoord | Attribute of TermsQuery |
The "Coordination factor" rewards documents that contain more of the terms in this list. This flag can be used to turn off this factor.
Possible values: true, false - Default value: false
@minimumNumberShouldMatch | Attribute of TermsQuery |
The minimum number of terms that should be present in any one document before it is considered to be a match.
Default value: 0
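As a hedged sketch of minimumNumberShouldMatch on a TermsQuery, the fragment below asks for at least two of the analyzed terms to be present. How many terms the text actually yields depends on the Analyzer configured for the field, so the threshold of 2 is only illustrative.

```xml
<!-- Sketch: a document must contain at least 2 of the terms produced by analysis -->
<TermsQuery fieldName="description" minimumNumberShouldMatch="2">Smith &amp; Sons (Ltd) : incorporated 1982</TermsQuery>
```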
<FilteredQuery> | Child of Query, Clause, CachedFilter |
Runs a Query and filters results to only those query matches that also match the Filter element.
Example: Find all documents about Lucene that have a status of "published"
<FilteredQuery>
    <Query>
        <UserQuery>Lucene</UserQuery>
    </Query>
    <Filter>
        <TermsFilter fieldName="status">published</TermsFilter>
    </Filter>
</FilteredQuery>
Element's model:
<FilteredQuery>'s children:
Name | Cardinality
Filter | Only one
Query | Only one
<FilteredQuery>'s attributes:
Name | Values | Default
boost |  | 1.0
@boost | Attribute of FilteredQuery |
Optional boost for matches on this query. Values greater than 1 increase the relative importance of matches.
Default value: 1.0
<Query> | Child of FilteredQuery |
Used to identify a nested Query element inside another container element. This is NOT a top-level query tag.
Element's model: (BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)
<Query>'s children:
Name | Cardinality
BooleanQuery | One or none
ConstantScoreQuery | One or none
FilteredQuery | One or none
MatchAllDocsQuery | One or none
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
TermQuery | One or none
TermsQuery | One or none
UserQuery | One or none
<Filter> | Child of FilteredQuery |
The choice of Filter that MUST also be matched
Element's model:
<Filter>'s children:
Name | Cardinality
CachedFilter | One or none
RangeFilter | One or none
<RangeFilter/> | Child of ConstantScoreQuery, Clause, CachedFilter, Filter |
Filter used to limit query results to documents matching a range of field values
Example: Search for documents about banks from the last 10 years
<FilteredQuery>
    <Query>
        <UserQuery>bank</UserQuery>
    </Query>
    <Filter>
        <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
    </Filter>
</FilteredQuery>
<RangeFilter>'s attributes:
Name | Values | Default
fieldName |  |
includeLower | true, false | true
includeUpper | true, false | true
lowerTerm |  |
upperTerm |  |
This element is always empty.
@fieldName | Attribute of RangeFilter |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
@lowerTerm | Attribute of RangeFilter |
The lower-most term value for this field (must be <= upperTerm)
Required
@upperTerm | Attribute of RangeFilter |
The upper-most term value for this field (must be >= lowerTerm)
Required
@includeLower | Attribute of RangeFilter |
Controls if the lowerTerm in the range is part of the allowed set of values
Possible values: true, false - Default value: true
@includeUpper | Attribute of RangeFilter |
Controls if the upperTerm in the range is part of the allowed set of values
Possible values: true, false - Default value: true
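The two include flags make half-open ranges possible. The sketch below assumes dates are indexed as yyyyMMdd strings, as in the example above, and excludes the upper bound so that 20070101 itself is not matched.

```xml
<FilteredQuery>
    <Query>
        <MatchAllDocsQuery/>
    </Query>
    <Filter>
        <!-- Sketch: lower bound included, upper bound excluded -->
        <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"
                     includeLower="true" includeUpper="false"/>
    </Filter>
</FilteredQuery>
```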
<SpanTerm> | Child of SpanNear, Include, Query, Clause, SpanOr, SpanFirst, Exclude, CachedFilter |
A single term used in a SpanQuery. These clauses are the building blocks for more complex "span" queries which test word proximity
Example: Find documents using terms close to each other about mining and accidents
<SpanNear slop="8" inOrder="false" fieldName="text">
    <SpanOr>
        <SpanTerm>killed</SpanTerm>
        <SpanTerm>died</SpanTerm>
        <SpanTerm>dead</SpanTerm>
    </SpanOr>
    <SpanOr>
        <SpanTerm>miner</SpanTerm>
        <SpanTerm>mining</SpanTerm>
        <SpanTerm>miners</SpanTerm>
    </SpanOr>
</SpanNear>
<SpanTerm>'s attributes:
Name | Values | Default
fieldName |  |
@fieldName | Attribute of SpanTerm |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
Required
<SpanOrTerms> | Child of SpanNear, Include, Query, Clause, SpanOr, SpanFirst, Exclude, CachedFilter |
A field-specific analyzer is used here to parse the child text provided in this tag. The SpanTerms produced are ORed in terms of Boolean logic
Example: Use SpanOrTerms as a more convenient/succinct way of expressing multiple choices of SpanTerms. This example looks for reports using words describing a fatality near to references to miners
<SpanNear slop="8" inOrder="false" fieldName="text">
    <SpanOrTerms>killed died death dead deaths</SpanOrTerms>
    <SpanOrTerms>miner mining miners</SpanOrTerms>
</SpanNear>
<SpanOrTerms>'s attributes:
Name | Values | Default
fieldName |  |
@fieldName | Attribute of SpanOrTerms |
fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
Required
<SpanOr> | Child of SpanNear, Include, Query, Clause, SpanFirst, Exclude, CachedFilter |
Takes any number of child queries from the Span family
Example: Find documents using terms close to each other about mining and accidents
<SpanNear slop="8" inOrder="false" fieldName="text">
    <SpanOr>
        <SpanTerm>killed</SpanTerm>
        <SpanTerm>died</SpanTerm>
        <SpanTerm>dead</SpanTerm>
    </SpanOr>
    <SpanOr>
        <SpanTerm>miner</SpanTerm>
        <SpanTerm>mining</SpanTerm>
        <SpanTerm>miners</SpanTerm>
    </SpanOr>
</SpanNear>
Element's model: (SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)*
<SpanOr>'s children:
Name | Cardinality
SpanFirst | Any number
SpanNear | Any number
SpanNot | Any number
SpanOr | Any number
SpanOrTerms | Any number
SpanTerm | Any number
<SpanNear> | Child of Include, Query, Clause, SpanOr, SpanFirst, Exclude, CachedFilter |
Takes any number of child queries from the Span family and tests for proximity
Element's model: (SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)*
<SpanNear>'s children:
Name | Cardinality
SpanFirst | Any number
SpanNear | Any number
SpanNot | Any number
SpanOr | Any number
SpanOrTerms | Any number
SpanTerm | Any number
<SpanNear>'s attributes:
Name | Values | Default
inOrder | true, false | true
slop |  |
@slop | Attribute of SpanNear |
Defines the maximum distance between Span elements, where distance is expressed in word positions, not byte offsets
Example: Find documents using terms within 8 words of each other talking about mining and accidents
<SpanNear slop="8" inOrder="false" fieldName="text">
    <SpanOr>
        <SpanTerm>killed</SpanTerm>
        <SpanTerm>died</SpanTerm>
        <SpanTerm>dead</SpanTerm>
    </SpanOr>
    <SpanOr>
        <SpanTerm>miner</SpanTerm>
        <SpanTerm>mining</SpanTerm>
        <SpanTerm>miners</SpanTerm>
    </SpanOr>
</SpanNear>
Required
@inOrder | Attribute of SpanNear |
Controls if matching terms have to appear in the order listed or can be reversed
Possible values: true, false - Default value: true
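To contrast with the unordered examples above, here is a small sketch of an ordered match. With inOrder="true", "social" has to come before "services", and the slop of 2 keeps the two terms close together. The field name is illustrative.

```xml
<!-- Sketch: "social" followed by "services", within a slop of 2 word positions -->
<SpanNear slop="2" inOrder="true" fieldName="text">
    <SpanTerm>social</SpanTerm>
    <SpanTerm>services</SpanTerm>
</SpanNear>
```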
<SpanFirst> | Child of SpanNear, Include, Query, Clause, SpanOr, Exclude, CachedFilter |
Looks for a SpanQuery match occurring near the beginning of a document
Example: Find letters where the first 50 words talk about a resignation:
<SpanFirst end="50">
    <SpanOrTerms fieldName="text">resigning resign leave</SpanOrTerms>
</SpanFirst>
Element's model: (SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)
<SpanFirst>'s children:
Name | Cardinality
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
<SpanFirst>'s attributes:
Name | Values | Default
boost |  | 1.0
end |  |
@end | Attribute of SpanFirst |
Controls the end of the region considered in a document's field (expressed in word number, not byte offset)
Required
@boost | Attribute of SpanFirst |
Optional boost for matches on this query. Values greater than 1 increase the relative importance of matches.
Default value: 1.0
<SpanNot> | Child of SpanNear, Include, Query, Clause, SpanOr, SpanFirst, Exclude, CachedFilter |
Finds documents matching one SpanQuery, but excludes matches that overlap with a match of another SpanQuery
Example: Find documents talking about social services but not containing the word "public"
<SpanNot fieldName="text">
    <Include>
        <SpanNear slop="2" inOrder="true">
            <SpanTerm>social</SpanTerm>
            <SpanTerm>services</SpanTerm>
        </SpanNear>
    </Include>
    <Exclude>
        <SpanTerm>public</SpanTerm>
    </Exclude>
</SpanNot>
Element's model:
<SpanNot>'s children:
Name | Cardinality
Exclude | Only one
Include | Only one
<Include> | Child of SpanNot |
The SpanQuery to find
Element's model: (SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)
<Include>'s children:
Name | Cardinality
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
<Exclude> | Child of SpanNot |
The SpanQuery to be avoided
Element's model: (SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm)
<Exclude>'s children:
Name | Cardinality
SpanFirst | One or none
SpanNear | One or none
SpanNot | One or none
SpanOr | One or none
SpanOrTerms | One or none
SpanTerm | One or none
<ConstantScoreQuery> | Child of Query, Clause, CachedFilter |
A utility tag to wrap any filter as a query
Example: Find all documents from the last 10 years
<ConstantScoreQuery>
    <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
</ConstantScoreQuery>
Element's model: (RangeFilter | CachedFilter)*
<ConstantScoreQuery>'s children:
Name | Cardinality
CachedFilter | Any number
RangeFilter | Any number
<ConstantScoreQuery>'s attributes:
Name | Values | Default
boost |  | 1.0
@boost | Attribute of ConstantScoreQuery |
Optional boost for matches on this query. Values greater than 1 increase the relative importance of matches.
Default value: 1.0
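Because ConstantScoreQuery is itself a query, it can sit inside a Clause. The following sketch, under the assumption that the date restriction should not influence ranking, combines a scored UserQuery with a cached date range wrapped as a constant-score clause. Field names and values are illustrative.

```xml
<BooleanQuery>
    <Clause occurs="must">
        <UserQuery>bank</UserQuery>
    </Clause>
    <Clause occurs="must">
        <!-- Sketch: every document in the date range gets the same score contribution -->
        <ConstantScoreQuery>
            <CachedFilter>
                <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
            </CachedFilter>
        </ConstantScoreQuery>
    </Clause>
</BooleanQuery>
```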