2009-06-22 18:18:56 -04:00
<!--
	<h3>Background</h3>
	This DTD describes the XML syntax used to perform advanced searches using the core Lucene search engine. The motivation behind the XML query syntax is:
	<ol>
	<li>To open up Lucene functionality to clients other than Java</li>
	<li>To offer a form of expressing queries that can easily be
		<ul>
			<li>Persisted for logging/auditing purposes</li>
			<li>Changed by editing text query templates (XSLT) without requiring a recompile/redeploy of applications</li>
			<li>Serialized across networks (without requiring Java bytecode for Query logic deployed on clients)</li>
		</ul>
	</li>
	<li>To provide a shorthand way of expressing query logic which echoes the logical tree structure of query objects more closely than reading procedural Java query construction code</li>
	<li>To bridge the growing gap between Lucene query/filtering functionality and the set of functionality accessible through the standard Lucene QueryParser syntax</li>
	<li>To provide a simply extensible syntax that does not require complex parser skills such as knowledge of JavaCC syntax</li>
	</ol>
	<h3>Syntax overview</h3>
	Search syntax consists of two types of elements:
	<ul>
	<li><i>Queries</i></li>
	<li><i>Filters</i></li>
	</ul>
	<h4>Queries</h4>
	The root of any XML search must be a <i>Query</i> type element used to select content.
	Queries typically score matches on documents using a number of different factors in order to provide relevant results first.
	One common example of a query tag is the <a href="#UserQuery">UserQuery</a> element which uses the standard
	Lucene QueryParser to parse Google-style search syntax provided by end users.
	<h4>Filters</h4>
	Unlike Queries, <i>Filters</i> are not used to select or score content - they are simply used to filter <i>Query</i> output (see <a href="#FilteredQuery">FilteredQuery</a> for an example use of query filtering).
	Because Filters simply offer a yes/no decision for each document in the index, their output can be efficiently cached in memory as a <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/BitSet.html">BitSet</a> for
	subsequent reuse (see <a href="#CachedFilter">CachedFilter</a> tag).
	<h4>Nesting elements</h4>
	Many of the elements can nest other elements to produce queries/filters of an arbitrary depth and complexity.
	The <a href="#BooleanQuery">BooleanQuery</a> element is one such example which provides a means for combining other queries (including other BooleanQueries) using Boolean
	logic to determine mandatory or optional elements.
	<h3>Advanced topics</h3>
	<h4>Advanced positional testing - span queries</h4>
	The <i>SpanQuery</i> class of queries allows for complex positional tests which look not only for certain combinations of words but also for particular
	positions of those words in relation to each other and to the documents containing them.

	CoreParser.java is the Java class that encapsulates this parser behaviour.

	@title Core Lucene
-->
<!-- @hidden Define core types of XML elements -->
<!ENTITY % coreSpanQueries "SpanOr|SpanNear|SpanOrTerms|SpanFirst|SpanNot|SpanTerm|BoostingTermQuery" >
<!ENTITY % coreQueries "BooleanQuery|UserQuery|FilteredQuery|TermQuery|TermsQuery|MatchAllDocsQuery|ConstantScoreQuery|BoostingTermQuery" >
<!ENTITY % coreFilters "RangeFilter|CachedFilter" >
<!-- @hidden Allow for extensions -->
<!ENTITY % extendedSpanQueries1 " " >
<!ENTITY % extendedQueries1 " " >
<!ENTITY % extendedFilters1 " " >
<!ENTITY % spanQueries "%coreSpanQueries;%extendedSpanQueries1;" >
<!ENTITY % queries "%coreQueries;|%spanQueries;%extendedQueries1;" >
<!ENTITY % filters "%coreFilters;%extendedFilters1;" >
<!--
	BooleanQueries implement Boolean logic which controls how multiple Clauses should be interpreted.
	Some clauses may represent optional Query criteria while others represent mandatory criteria.
	@example
	        <em>Find articles about banks, preferably talking about mergers but nothing to do with "sumitomo"</em>
	        %
	          <BooleanQuery fieldName="contents">
	             <Clause occurs="should">
	                <TermQuery>merger</TermQuery>
	             </Clause>
	             <Clause occurs="mustnot">
	                <TermQuery>sumitomo</TermQuery>
	             </Clause>
	             <Clause occurs="must">
	                <TermQuery>bank</TermQuery>
	             </Clause>
	          </BooleanQuery>
	        %
-->
<!ELEMENT BooleanQuery ( Clause ) + >
<!-- Optional boost for matches on this query. Values > 1 -->
<!ATTLIST BooleanQuery boost CDATA "1.0" >
<!-- fieldName can optionally be defined here as a default attribute used by all child elements -->
<!ATTLIST BooleanQuery fieldName CDATA #IMPLIED >
<!-- The "Coordination factor" rewards documents that contain more of the optional clauses in this list. This flag can be used to turn off this factor. -->
<!ATTLIST BooleanQuery disableCoord ( true | false ) "false" >
<!-- The minimum number of optional clauses that should be present in any one document before it is considered to be a match. -->
<!ATTLIST BooleanQuery minimumNumberShouldMatch CDATA "0" >
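<!-- Illustrative sketch, not part of the original documentation. It assumes that minimumNumberShouldMatch
	counts only the optional ("should") clauses, as the attribute description above states. The field name
	and the terms are placeholder values chosen for illustration.
	@example
	        <em>Find articles about banks that mention at least two of three optional topics</em>
	        %
	          <BooleanQuery fieldName="contents" minimumNumberShouldMatch="2">
	             <Clause occurs="must">
	                <TermQuery>bank</TermQuery>
	             </Clause>
	             <Clause occurs="should">
	                <TermQuery>merger</TermQuery>
	             </Clause>
	             <Clause occurs="should">
	                <TermQuery>takeover</TermQuery>
	             </Clause>
	             <Clause occurs="should">
	                <TermQuery>acquisition</TermQuery>
	             </Clause>
	          </BooleanQuery>
	        %
-->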
<!-- NOTE: "Clause" tag has 2 modes of use - inside <BooleanQuery> in which case only "query" types can be
	child elements - while in a <BooleanFilter> clause only "filter" types can be contained.
	@hidden TODO: Change BooleanFilterBuilder and BooleanQueryBuilder to auto-wrap choice of query or filters. This type of
	code already exists in CachedFilter so could be reused.
-->
<!ELEMENT Clause ( %queries; | %filters; ) >
<!-- Controls if the clause is optional (should), mandatory (must) or unacceptable (mustnot) -->
<!ATTLIST Clause occurs ( should | must | mustnot ) "should" >
<!-- Caches any nested query or filter in an LRU (Least recently used) Cache. Cached queries, like filters, are turned into
	Bitsets at a cost of 1 bit per document in the index. The memory cost of a cached query/filter is therefore numberOfDocsinIndex/8 bytes.
	Queries that are cached as filters obviously retain none of the scoring information associated with results - they retain just
	a Boolean yes/no record of which documents matched.
	@example
	        <em>Search for documents about banks from the last 10 years - caching the commonly-used "last 10 year" filter as a BitSet in
	        RAM to eliminate the cost of building this filter from disk for every query</em>
	        %
	          <FilteredQuery>
	             <Query>
	                <UserQuery>bank</UserQuery>
	             </Query>
	             <Filter>
	                <CachedFilter>
	                   <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
	                </CachedFilter>
	             </Filter>
	          </FilteredQuery>
	        %
-->
<!ELEMENT CachedFilter ( %queries; | %filters; ) >
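<!-- Illustrative sketch, not part of the original documentation. Because CachedFilter accepts a query as well
	as a filter, a query can be cached and reused as a plain yes/no filter. The field name "status" and the
	term "published" are placeholder values. As a worked instance of the memory cost quoted above: for a
	hypothetical index of 8,000,000 documents, each cached entry would occupy about 8,000,000 / 8 = 1,000,000 bytes.
	@example
	        <em>Search for documents about mergers, restricted by a cached TermQuery that is used only as a filter (its scores are discarded)</em>
	        %
	          <FilteredQuery>
	             <Query>
	                <UserQuery>merger</UserQuery>
	             </Query>
	             <Filter>
	                <CachedFilter>
	                   <TermQuery fieldName="status">published</TermQuery>
	                </CachedFilter>
	             </Filter>
	          </FilteredQuery>
	        %
-->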
<!--
	Passes content directly through to the standard Lucene QueryParser (see "Lucene Query Syntax").
	@example
	        <em>Search for documents about John Smith or John Doe using standard Lucene Query Syntax</em>
	        %
	          <UserQuery>"John Smith" OR "John Doe"</UserQuery>
	        %
-->
<!ELEMENT UserQuery ( #PCDATA ) >
<!-- Optional boost for matches on this query. Values > 1 -->
<!ATTLIST UserQuery boost CDATA "1.0" >
<!-- fieldName can optionally be defined here to change the default field used in the QueryParser -->
<!ATTLIST UserQuery fieldName CDATA #IMPLIED >
<!-- A query which is used to match all documents. This has a couple of uses:
	<ol>
	<li>as a Clause in a BooleanQuery whose only other clause
	is a "mustNot" match (Lucene requires at least one positive clause), and</li>
	<li>in a FilteredQuery where a Filter tag is effectively being
	used to select content rather than its usual role of filtering the results of a query.</li>
	</ol>
	@example
	        <em>Effectively use a Filter as a query</em>
	        %
	          <FilteredQuery>
	             <Query>
	                <MatchAllDocsQuery/>
	             </Query>
	             <Filter>
	                <RangeFilter fieldName="date" lowerTerm="19870409" upperTerm="19870412"/>
	             </Filter>
	          </FilteredQuery>
	        %
-->
<!ELEMENT MatchAllDocsQuery EMPTY >
<!-- a single term query - no analysis is done of the child text
	@example
	        <em>Match on a primary key</em>
	        %
	          <TermQuery fieldName="primaryKey">13424</TermQuery>
	        %
-->
<!ELEMENT TermQuery ( #PCDATA ) >
<!-- Optional boost for matches on this query. Values > 1 -->
<!ATTLIST TermQuery boost CDATA "1.0" >
<!-- fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute -->
<!ATTLIST TermQuery fieldName CDATA #IMPLIED >
<!--
	A boosted term query - no analysis is done of the child text. Also a span member.

	(Text below is copied from the javadocs of BoostingTermQuery)

	The BoostingTermQuery is very similar to the {@link org.apache.lucene.search.spans.SpanTermQuery} except
	that it factors in the value of the payload located at each of the positions where the
	{@link org.apache.lucene.index.Term} occurs.

	In order to take advantage of this, you must override {@link org.apache.lucene.search.Similarity#scorePayload(String, byte[],int,int)}
	which returns 1 by default.

	Payload scores are averaged across term occurrences in the document.

	@see org.apache.lucene.search.Similarity#scorePayload(String, byte[], int, int)
-->
<!ELEMENT BoostingTermQuery ( #PCDATA ) >
<!-- Optional boost for matches on this query. Values > 1 -->
<!ATTLIST BoostingTermQuery boost CDATA "1.0" >
<!-- fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute -->
<!ATTLIST BoostingTermQuery fieldName CDATA #IMPLIED >
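<!-- Illustrative sketch, not part of the original documentation. BoostingTermQuery is listed among the core
	query types, so it can appear wherever a query is allowed, for example inside a Clause. Here the field
	name is inherited from the enclosing BooleanQuery. The field name and terms are placeholder values, and
	the payload only influences scoring if payloads were indexed and Similarity#scorePayload is overridden
	as described above.
	@example
	        <em>Find articles that must mention "bank", preferring occurrences whose payload scores highly, and optionally mention "merger"</em>
	        %
	          <BooleanQuery fieldName="contents">
	             <Clause occurs="must">
	                <BoostingTermQuery>bank</BoostingTermQuery>
	             </Clause>
	             <Clause occurs="should">
	                <TermQuery>merger</TermQuery>
	             </Clause>
	          </BooleanQuery>
	        %
-->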
<!--
	The equivalent of a BooleanQuery with multiple optional TermQuery clauses.
	Child text is analyzed using a field-specific choice of Analyzer to produce a set of terms that are ORed together in Boolean logic.
	Unlike the UserQuery element, this does not parse any special characters to control fuzzy/phrase/boolean logic and as such is incapable
	of producing a Query parse error given any user input.
	@example
	        <em>Match on text from a database description (which may contain characters that
	        are illegal characters in the standard Lucene Query syntax used in the UserQuery tag)</em>
	        %
	          <TermsQuery fieldName="description">Smith &amp; Sons (Ltd) : incorporated 1982</TermsQuery>
	        %
-->
<!ELEMENT TermsQuery ( #PCDATA ) >
<!-- Optional boost for matches on this query. Values > 1 -->
<!ATTLIST TermsQuery boost CDATA "1.0" >
<!-- fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute -->
<!ATTLIST TermsQuery fieldName CDATA #IMPLIED >
<!-- The "Coordination factor" rewards documents that contain more of the terms in this list. This flag can be used to turn off this factor. -->
<!ATTLIST TermsQuery disableCoord ( true | false ) "false" >
<!-- The minimum number of terms that should be present in any one document before it is considered to be a match. -->
<!ATTLIST TermsQuery minimumNumberShouldMatch CDATA "0" >
<!--
	Runs a Query and filters results to only those query matches that also match the Filter element.
	@example
	        <em>Find all documents about Lucene that have a status of "published"</em>
	        %
	          <FilteredQuery>
	             <Query>
	                <UserQuery>Lucene</UserQuery>
	             </Query>
	             <Filter>
	                <TermsFilter fieldName="status">published</TermsFilter>
	             </Filter>
	          </FilteredQuery>
	        %
-->
<!ELEMENT FilteredQuery ( Query , Filter ) >
<!-- Optional boost for matches on this query. Values > 1 -->
<!ATTLIST FilteredQuery boost CDATA "1.0" >
<!-- Used to identify a nested Query element inside another container element. NOT a top-level query tag -->
<!ELEMENT Query ( %queries; ) >
<!-- The choice of Filter that MUST also be matched -->
<!ELEMENT Filter ( %filters; ) >
<!--
	Filter used to limit query results to documents matching a range of field values
	@example
	        <em>Search for documents about banks from the last 10 years</em>
	        %
	          <FilteredQuery>
	             <Query>
	                <UserQuery>bank</UserQuery>
	             </Query>
	             <Filter>
	                <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
	             </Filter>
	          </FilteredQuery>
	        %
-->
<!ELEMENT RangeFilter EMPTY >
<!-- fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute -->
<!ATTLIST RangeFilter fieldName CDATA #IMPLIED >
<!-- The lower-most term value for this field (must be <= upperTerm) -->
<!ATTLIST RangeFilter lowerTerm CDATA #REQUIRED >
<!-- The upper-most term value for this field (must be >= lowerTerm) -->
<!ATTLIST RangeFilter upperTerm CDATA #REQUIRED >
<!-- Controls if the lowerTerm in the range is part of the allowed set of values -->
<!ATTLIST RangeFilter includeLower ( true | false ) "true" >
<!-- Controls if the upperTerm in the range is part of the allowed set of values -->
<!ATTLIST RangeFilter includeUpper ( true | false ) "true" >
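<!-- Illustrative sketch, not part of the original documentation. It shows the includeUpper flag described
	above being used to make the upper bound exclusive. The field name "date" and the yyyyMMdd term format
	follow the other examples in this file and are assumptions about how the index was built.
	@example
	        <em>Search for documents about banks dated from 1 January 1997 up to, but not including, 1 January 2007</em>
	        %
	          <FilteredQuery>
	             <Query>
	                <UserQuery>bank</UserQuery>
	             </Query>
	             <Filter>
	                <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101" includeLower="true" includeUpper="false"/>
	             </Filter>
	          </FilteredQuery>
	        %
-->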
<!-- A single term used in a SpanQuery. These clauses are the building blocks for more complex "span" queries which test word proximity
	@example <em>Find documents using terms close to each other about mining and accidents</em>
	        %
	          <SpanNear slop="8" inOrder="false" fieldName="text">
	             <SpanOr>
	                <SpanTerm>killed</SpanTerm>
	                <SpanTerm>died</SpanTerm>
	                <SpanTerm>dead</SpanTerm>
	             </SpanOr>
	             <SpanOr>
	                <SpanTerm>miner</SpanTerm>
	                <SpanTerm>mining</SpanTerm>
	                <SpanTerm>miners</SpanTerm>
	             </SpanOr>
	          </SpanNear>
	        %
-->
<!ELEMENT SpanTerm ( #PCDATA ) >
<!-- fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute -->
<!ATTLIST SpanTerm fieldName CDATA #IMPLIED >
<!-- A field-specific analyzer is used here to parse the child text provided in this tag. The SpanTerms produced are ORed in terms of Boolean logic
	@example <em>Use SpanOrTerms as a more convenient/succinct way of expressing multiple choices of SpanTerms. This example looks for reports
	using words describing a fatality near to references to miners</em>
	        %
	          <SpanNear slop="8" inOrder="false" fieldName="text">
	             <SpanOrTerms>killed died death dead deaths</SpanOrTerms>
	             <SpanOrTerms>miner mining miners</SpanOrTerms>
	          </SpanNear>
	        %
-->
<!ELEMENT SpanOrTerms ( #PCDATA ) >
<!-- fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute -->
<!ATTLIST SpanOrTerms fieldName CDATA #IMPLIED >
<!-- Takes any number of child queries from the Span family
	@example <em>Find documents using terms close to each other about mining and accidents</em>
	        %
	          <SpanNear slop="8" inOrder="false" fieldName="text">
	             <SpanOr>
	                <SpanTerm>killed</SpanTerm>
	                <SpanTerm>died</SpanTerm>
	                <SpanTerm>dead</SpanTerm>
	             </SpanOr>
	             <SpanOr>
	                <SpanTerm>miner</SpanTerm>
	                <SpanTerm>mining</SpanTerm>
	                <SpanTerm>miners</SpanTerm>
	             </SpanOr>
	          </SpanNear>
	        %
-->
<!ELEMENT SpanOr ( %spanQueries; ) * >
<!-- Takes any number of child queries from the Span family and tests for proximity
	@hidden TODO SpanNear missing "boost" attr (could add to SpanBuilderBase)
-->
<!ELEMENT SpanNear ( %spanQueries; ) * >
<!-- defines the maximum distance between Span elements where distance is expressed as word number, not byte offset
	@example <em>Find documents using terms within 8 words of each other talking about mining and accidents</em>
	        %
	          <SpanNear slop="8" inOrder="false" fieldName="text">
	             <SpanOr>
	                <SpanTerm>killed</SpanTerm>
	                <SpanTerm>died</SpanTerm>
	                <SpanTerm>dead</SpanTerm>
	             </SpanOr>
	             <SpanOr>
	                <SpanTerm>miner</SpanTerm>
	                <SpanTerm>mining</SpanTerm>
	                <SpanTerm>miners</SpanTerm>
	             </SpanOr>
	          </SpanNear>
	        %
-->
<!ATTLIST SpanNear slop CDATA #REQUIRED >
<!-- Controls if matching terms have to appear in the order listed or can be reversed -->
<!ATTLIST SpanNear inOrder ( true | false ) "true" >
<!-- Looks for a SpanQuery match occurring near the beginning of a document
	@example
	        <em>Find letters where the first 50 words talk about a resignation:</em>
	        %
	          <SpanFirst end="50">
	             <SpanOrTerms fieldName="text">resigning resign leave</SpanOrTerms>
	          </SpanFirst>
	        %
-->
<!ELEMENT SpanFirst ( %spanQueries; ) >
<!-- Controls the end of the region considered in a document's field (expressed in word number, not byte offset) -->
<!ATTLIST SpanFirst end CDATA #REQUIRED >
<!-- Optional boost for matches on this query. Values > 1 -->
<!ATTLIST SpanFirst boost CDATA "1.0" >
<!-- Finds documents matching a SpanQuery but not if matching another SpanQuery
	@example <em>Find documents talking about social services but not containing the word "public"</em>
	        %
	          <SpanNot fieldName="text">
	             <Include>
	                <SpanNear slop="2" inOrder="true">
	                   <SpanTerm>social</SpanTerm>
	                   <SpanTerm>services</SpanTerm>
	                </SpanNear>
	             </Include>
	             <Exclude>
	                <SpanTerm>public</SpanTerm>
	             </Exclude>
	          </SpanNot>
	        %
-->
<!ELEMENT SpanNot ( Include , Exclude ) >
<!-- The SpanQuery to find -->
<!ELEMENT Include ( %spanQueries; ) >
<!-- The SpanQuery to be avoided -->
<!ELEMENT Exclude ( %spanQueries; ) >
<!-- a utility tag to wrap any filter as a query
	@example <em>Find all documents from the last 10 years</em>
	        %
	          <ConstantScoreQuery>
	             <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
	          </ConstantScoreQuery>
	        %
-->
<!ELEMENT ConstantScoreQuery ( %filters; ) * >
<!-- Optional boost for matches on this query. Values > 1 -->
<!ATTLIST ConstantScoreQuery boost CDATA "1.0" >