LUCENE-6224: cut over more package.htmls

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1658447 13f79535-47bb-0310-9956-ffa450edef68
Robert Muir 2015-02-09 16:12:32 +00:00
parent 1a9c584816
commit 2f7a0e7a77
48 changed files with 1044 additions and 1095 deletions

@@ -0,0 +1,24 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Uses already seen data (the indexed documents) to classify new documents.
* <p>
* Currently contains a (simplistic) Naive Bayes classifier, a k-Nearest
* Neighbor classifier and a Perceptron based classifier.
*/
package org.apache.lucene.classification;

@@ -1,22 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<body>
Uses already seen data (the indexed documents) to classify new documents.
Currently contains a (simplistic) Naive Bayes classifier, a k-Nearest Neighbor classifier and a Perceptron based classifier
</body>
</html>

@@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Utilities for evaluation, data preparation, etc.
*/
package org.apache.lucene.classification.utils;

@@ -1,23 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<body>
Utilities for evaluation, data preparation, etc.
</body>
</html>

@@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Pluggable term index / block terms dictionary implementations.
*/
package org.apache.lucene.codecs.blockterms;

@@ -1,25 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
Pluggable term index / block terms dictionary implementations.
</body>
</html>

@@ -0,0 +1,23 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Same postings format as Lucene50, except the terms dictionary also
* supports ords, i.e. returning which ord the enum is seeked to, and
* seeking by ord.
*/
package org.apache.lucene.codecs.blocktreeords;

@@ -1,27 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
Same postings format as Lucene41, except the terms dictionary also
supports ords, i.e. returning which ord the enum is seeked to, and
seeking by ord.
</body>
</html>

@@ -0,0 +1,22 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Codec PostingsFormat for fast access to low-frequency terms
* such as primary key fields.
*/
package org.apache.lucene.codecs.bloom;

@@ -1,25 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
Codec PostingsFormat for fast access to low-frequency terms such as primary key fields.
</body>
</html>

@@ -0,0 +1,22 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Term dictionary, DocValues or Postings formats that are read
* entirely into memory.
*/
package org.apache.lucene.codecs.memory;

@@ -1,25 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
Term dictionary, DocValues or Postings formats that are read entirely into memory.
</body>
</html>

@@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Simpletext Codec: writes human readable postings.
*/
package org.apache.lucene.codecs.simpletext;

@@ -1,25 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
Simpletext Codec: writes human readable postings.
</body>
</html>

@@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Facets example code.
*/
package org.apache.lucene.demo.facet;

@@ -1,22 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html><head></head>
<body>
Facets example code.
</body>
</html>

@@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Demo applications for indexing and searching.
*/
package org.apache.lucene.demo;

@@ -1,22 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html><head></head>
<body>
Demo applications for indexing and searching.
</body>
</html>

@@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Demo servlet for the XML Query Parser.
*/
package org.apache.lucene.demo.xmlparser;

@@ -1,22 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html><head></head>
<body>
Demo servlet for the XML Query Parser.
</body>
</html>

@@ -0,0 +1,42 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Javascript expressions.
* <p>A Javascript expression is a numeric expression specified using an expression syntax that's based on JavaScript expressions. You can construct expressions using:</p>
* <ul>
* <li>Integer, floating point, hex and octal literals</li>
* <li>Arithmetic operators: <code>+ - * / %</code></li>
* <li>Bitwise operators: <code>| &amp; ^ ~ &lt;&lt; &gt;&gt; &gt;&gt;&gt;</code></li>
* <li>Boolean operators (including the ternary operator): <code>&amp;&amp; || ! ?:</code></li>
* <li>Comparison operators: <code>&lt; &lt;= == &gt;= &gt;</code></li>
* <li>Common mathematic functions: <code>abs ceil exp floor ln log2 log10 logn max min sqrt pow</code></li>
* <li>Trigonometric library functions: <code>acosh acos asinh asin atanh atan atan2 cosh cos sinh sin tanh tan</code></li>
* <li>Distance functions: <code>haversin</code></li>
* <li>Miscellaneous functions: <code>min, max</code></li>
* <li>Arbitrary external variables - see {@link org.apache.lucene.expressions.Bindings}</li>
* </ul>
*
* <p>
* JavaScript order of precedence rules apply for operators. Shortcut evaluation is used for logical operators&mdash;the second argument is only evaluated if the value of the expression cannot be determined after evaluating the first argument. For example, in the expression <code>a || b</code>, <code>b</code> is only evaluated if a is not true.
* </p>
*
* <p>
* To compile an expression, use {@link org.apache.lucene.expressions.js.JavascriptCompiler}.
* </p>
*/
package org.apache.lucene.expressions.js;
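
The shortcut-evaluation rule documented in this package comment is the same rule Java applies to its own `||` and `&&` operators. The following is a minimal plain-Java sketch of that rule, not Lucene code: the class `ShortcutEvaluationDemo` and its helpers are hypothetical names introduced only for illustration, and it makes no claim about how `JavascriptCompiler` implements the behavior internally.

```java
/** Plain-Java illustration of shortcut (short-circuit) evaluation; not Lucene API. */
final class ShortcutEvaluationDemo {

  /** Number of times the right-hand operand has been evaluated since the last reset. */
  static int rightEvaluations = 0;

  /** Hypothetical helper: returns the value and records that it was evaluated. */
  static boolean right(boolean value) {
    rightEvaluations++;
    return value;
  }

  /** Evaluates {@code a || b} and returns how many times b was evaluated. */
  static int countForOr(boolean a, boolean b) {
    rightEvaluations = 0;
    boolean ignored = a || right(b); // right(b) runs only when a is false
    return rightEvaluations;
  }

  /** Evaluates {@code a && b} and returns how many times b was evaluated. */
  static int countForAnd(boolean a, boolean b) {
    rightEvaluations = 0;
    boolean ignored = a && right(b); // right(b) runs only when a is true
    return rightEvaluations;
  }

  public static void main(String[] args) {
    System.out.println("true  || b evaluates b " + countForOr(true, false) + " time(s)");
    System.out.println("false || b evaluates b " + countForOr(false, true) + " time(s)");
    System.out.println("false && b evaluates b " + countForAnd(false, true) + " time(s)");
    System.out.println("true  && b evaluates b " + countForAnd(true, true) + " time(s)");
  }
}
```

The counter makes the rule observable: the second operand is skipped exactly when the first operand already determines the result.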

@@ -1,45 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>Javascript expressions</title>
</head>
<body>
<h1>Javascript expressions</h1>
<p>A Javascript expression is a numeric expression specified using an expression syntax that's based on JavaScript expressions. You can construct expressions using:</p>
<ul>
<li>Integer, floating point, hex and octal literals</li>
<li>Arithmetic operators: <code>+ - * / %</code></li>
<li>Bitwise operators: <code>| &amp; ^ ~ &lt;&lt; &gt;&gt; &gt;&gt;&gt;</code></li>
<li>Boolean operators (including the ternary operator): <code>&& || ! ?:</code></li>
<li>Comparison operators: <code>&lt; &lt;= == &gt;= &gt;</code></li>
<li>Common mathematic functions: <code>abs ceil exp floor ln log2 log10 logn max min sqrt pow</code></li>
<li>Trigonometric library functions: <code>acosh acos asinh asin atanh atan atan2 cosh cos sinh sin tanh tan</code></li>
<li>Distance functions: <code>haversin</code></li>
<li>Miscellaneous functions: <code>min, max</code></li>
<li>Arbitrary external variables - see {@link org.apache.lucene.expressions.Bindings}</li>
</ul>
<p>
JavaScript order of precedence rules apply for operators. Shortcut evaluation is used for logical operators—the second argument is only evaluated if the value of the expression cannot be determined after evaluating the first argument. For example, in the expression <code>a || b</code>, <code>b</code> is only evaluated if a is not true.
</p>
<p>
To compile an expression, use {@link org.apache.lucene.expressions.js.JavascriptCompiler}.
</p>
</body>
</html>

@@ -0,0 +1,35 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Expressions.
* <p>
* {@link org.apache.lucene.expressions.Expression} - result of compiling an expression, which can
* evaluate it for a given document. Each expression can have external variables, which are resolved by
* {@code Bindings}.
* </p>
*
* <p>
* {@link org.apache.lucene.expressions.Bindings} - abstraction for binding external variables
* to a way to get a value for those variables for a particular document (ValueSource).
* </p>
*
* <p>
* {@link org.apache.lucene.expressions.SimpleBindings} - default implementation of bindings, which provides easy ways to bind sort fields and other expressions to external variables.
* </p>
*/
package org.apache.lucene.expressions;

@@ -1,39 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>expressions</title>
</head>
<body>
<h1>expressions</h1>
<p>
{@link org.apache.lucene.expressions.Expression} - result of compiling an expression, which can
evaluate it for a given document. Each expression can have external variables are resolved by
{@code Bindings}.
</p>
<p>
{@link org.apache.lucene.expressions.Bindings} - abstraction for binding external variables
to a way to get a value for those variables for a particular document (ValueSource).
</p>
<p>
{@link org.apache.lucene.expressions.SimpleBindings} - default implementation of bindings which provide easy ways to bind sort fields and other expressions to external variables
</p>
</body>
</html>

@@ -0,0 +1,61 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Faceted search.
* <p>
* This module provides multiple methods for computing facet counts and
* value aggregations:
* <ul>
* <li> Taxonomy-based methods rely on a separate taxonomy index to
* map hierarchical facet paths to global int ordinals for fast
* counting at search time; these methods can compute counts
* ({@link org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts}, {@link
* org.apache.lucene.facet.taxonomy.TaxonomyFacetCounts}) or aggregate long or double values ({@link
* org.apache.lucene.facet.taxonomy.TaxonomyFacetSumIntAssociations}, {@link
* org.apache.lucene.facet.taxonomy.TaxonomyFacetSumFloatAssociations}, {@link
* org.apache.lucene.facet.taxonomy.TaxonomyFacetSumValueSource}). Add {@link org.apache.lucene.facet.FacetField} or
* {@link org.apache.lucene.facet.taxonomy.AssociationFacetField} to your documents at index time
* to use taxonomy-based methods.
*
* <li> Sorted-set doc values method does not require a separate
* taxonomy index, and computes counts based on sorted set doc
* values fields ({@link org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts}). Add
* {@link org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetField} to your documents at
* index time to use sorted set facet counts.
*
* <li> Range faceting ({@link org.apache.lucene.facet.range.LongRangeFacetCounts}, {@link
* org.apache.lucene.facet.range.DoubleRangeFacetCounts}) computes counts for a dynamic numeric
* range from a provided {@link org.apache.lucene.queries.function.ValueSource} (previously indexed
* numeric field, or a dynamic expression such as distance).
* </ul>
* <p>
* At search time you first run your search, but pass a {@link
* org.apache.lucene.facet.FacetsCollector} to gather all hits (and optionally, scores for each
* hit). Then, instantiate whichever facet methods you'd like to use
* to compute aggregates. Finally, all methods implement a common
* {@link org.apache.lucene.facet.Facets} base API that you use to obtain specific facet
* counts.
* </p>
* <p>
* The various {@link org.apache.lucene.facet.FacetsCollector#search} utility methods are
* useful for doing an "ordinary" search (sorting by score, or by a
* specified Sort) but also collecting into a {@link org.apache.lucene.facet.FacetsCollector} for
* subsequent faceting.
* </p>
*/
package org.apache.lucene.facet;

@@ -1,65 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>faceted search</title>
</head>
<body>
<h1>faceted search</h1>
<p>
This module provides multiple methods for computing facet counts and
value aggregations:
<ul>
<li> Taxonomy-based methods rely on a separate taxonomy index to
map hierarchical facet paths to global int ordinals for fast
counting at search time; these methods can compute counts
(({@link org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts}, {@link
org.apache.lucene.facet.taxonomy.TaxonomyFacetCounts}) aggregate long or double values {@link
org.apache.lucene.facet.taxonomy.TaxonomyFacetSumIntAssociations}, {@link
org.apache.lucene.facet.taxonomy.TaxonomyFacetSumFloatAssociations}, {@link
org.apache.lucene.facet.taxonomy.TaxonomyFacetSumValueSource}. Add {@link org.apache.lucene.facet.FacetField} or
{@link org.apache.lucene.facet.taxonomy.AssociationFacetField} to your documents at index time
to use taxonomy-based methods.
<li> Sorted-set doc values method does not require a separate
taxonomy index, and computes counts based on sorted set doc
values fields ({@link org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts}). Add
{@link org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetField} to your documents at
index time to use sorted set facet counts.
<li> Range faceting {@link org.apache.lucene.facet.range.LongRangeFacetCounts}, {@link
org.apache.lucene.facet.range.DoubleRangeFacetCounts} compute counts for a dynamic numeric
range from a provided {@link org.apache.lucene.queries.function.ValueSource} (previously indexed
numeric field, or a dynamic expression such as distance).
</ul>
</p>
<p>
At search time you first run your search, but pass a {@link
org.apache.lucene.facet.FacetsCollector} to gather all hits (and optionally, scores for each
hit). Then, instantiate whichever facet methods you'd like to use
to compute aggregates. Finally, all methods implement a common
{@link org.apache.lucene.facet.Facets} base API that you use to obtain specific facet
counts.
</p>
<p>
The various {@link org.apache.lucene.facet.FacetsCollector#search} utility methods are
useful for doing an "ordinary" search (sorting by score, or by a
specified Sort) but also collecting into a {@link org.apache.lucene.facet.FacetsCollector} for
subsequent faceting.
</p>
</body>
</html>

View File

@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Provides range faceting capabilities.
*/
package org.apache.lucene.facet.range;

View File

@ -1,24 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>Range Facets</title>
</head>
<body>
Provides range faceting capabilities.
</body>
</html>

View File

@ -0,0 +1,22 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Provides faceting capabilities over facets that were indexed
* with {@link org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetField}.
*/
package org.apache.lucene.facet.sortedset;

View File

@ -1,25 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>SortedSet Facets</title>
</head>
<body>
Provides faceting capabilities over facets that were indexed with {@link org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetField}.
</body>
</html>

View File

@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Taxonomy index implementation on top of a Directory.
*/
package org.apache.lucene.facet.taxonomy.directory;

View File

@ -1,24 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>Taxonomy index implementation using on top of a Directory</title>
</head>
<body>
Taxonomy index implementation using on top of a Directory.
</body>
</html>

View File

@ -0,0 +1,52 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Taxonomy of Categories.
* <p>
* Facets are defined using a hierarchy of categories, known as a <i>Taxonomy</i>.
* For example, the taxonomy of a book store application might have the following structure:
*
* <ul>
* <li>Author
* <ul>
* <li>Mark Twain</li>
* <li>J. K. Rowling</li>
* </ul>
* </li>
* </ul>
*
* <ul>
* <li>Date
* <ul>
* <li>2010
* <ul>
* <li>March</li>
* <li>April</li>
* </ul>
* </li>
* <li>2009</li>
* </ul>
* </li>
* </ul>
*
* <p>
* The <i>Taxonomy</i> translates category-paths into integer identifiers (often termed <i>ordinals</i>) and vice versa.
* The category <code>Author/Mark Twain</code> adds two nodes to the taxonomy: <code>Author</code> and
* <code>Author/Mark Twain</code>, each of which is assigned a different ordinal. The taxonomy maintains the invariant that a
* node's ordinal is always smaller than the ordinals of all its children.
*/
package org.apache.lucene.facet.taxonomy;
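
The ordinal invariant described in the Javadoc above can be illustrated with a minimal, self-contained sketch. This is not the Lucene taxonomy implementation: `ToyTaxonomy` and its methods are hypothetical names invented for illustration, and the ordinals it assigns are not the ones a real taxonomy index would produce. It only shows why adding ancestors before descendants keeps every parent ordinal smaller than its children's.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory taxonomy (hypothetical; NOT the Lucene implementation). */
final class ToyTaxonomy {
  // path -> ordinal
  private final Map<String, Integer> ordinals = new HashMap<>();
  // ordinal -> path
  private final List<String> paths = new ArrayList<>();

  /** Adds a path such as "Author/Mark Twain", creating missing ancestors first. */
  int add(String path) {
    Integer existing = ordinals.get(path);
    if (existing != null) {
      return existing;
    }
    int slash = path.lastIndexOf('/');
    if (slash >= 0) {
      // The parent receives its ordinal before the child does,
      // so a parent ordinal is always smaller than its children's.
      add(path.substring(0, slash));
    }
    int ordinal = paths.size();
    ordinals.put(path, ordinal);
    paths.add(path);
    return ordinal;
  }

  /** Translates an ordinal back into its category path. */
  String path(int ordinal) {
    return paths.get(ordinal);
  }

  /** Number of nodes in the toy taxonomy. */
  int size() {
    return paths.size();
  }

  public static void main(String[] args) {
    ToyTaxonomy taxonomy = new ToyTaxonomy();
    int twain = taxonomy.add("Author/Mark Twain"); // adds two nodes
    int author = taxonomy.add("Author");           // already present
    System.out.println(author + " < " + twain + ", nodes=" + taxonomy.size());
  }
}
```

Adding `Author/Mark Twain` creates two nodes in this sketch, matching the behaviour the package documentation describes; the lookup in both directions (path to ordinal, ordinal to path) is the "and vice versa" part.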

View File

@ -1,53 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>Taxonomy of Categories</title>
</head>
<body>
<h1>Taxonomy of Categories</h1>
Facets are defined using a hierarchy of categories, known as a <i>Taxonomy</i>.
For example, the taxonomy of a book store application might have the following structure:
<ul>
<li>Author
<ul>
<li>Mark Twain</li>
<li>J. K. Rowling</li>
</ul>
</li>
</ul>
<ul>
<li>Date
<ul>
<li>2010
<ul>
<li>March</li>
<li>April</li>
</ul>
</li>
<li>2009</li>
</ul>
</li>
</ul>
The <i>Taxonomy</i> translates category-paths into interger identifiers (often termed <i>ordinals</i>) and vice versa.
The category <code>Author/Mark Twain</code> adds two nodes to the taxonomy: <code>Author</code> and
<code>Author/Mark Twain</code>, each is assigned a different ordinal. The taxonomy maintains the invariant that a
node always has an ordinal that is &lt; all its children.
</body>
</html>

View File

@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Improves indexing time by caching a map from each CategoryPath to its ordinal.
*/
package org.apache.lucene.facet.taxonomy.writercache;

View File

@ -1,24 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>Taxonomy index cache</title>
</head>
<body>
Improves indexing time by caching a map of CategoryPath to their Ordinal.
</body>
</html>

View File

@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Support for grouping by {@link org.apache.lucene.queries.function.ValueSource}.
*/
package org.apache.lucene.search.grouping.function;

View File

@ -1,21 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<body>
Support for grouping by {@link org.apache.lucene.queries.function.ValueSource}.
</body>
</html>

View File

@ -0,0 +1,200 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Grouping.
* <p>
* This module enables search result grouping with Lucene, where hits
* with the same value in the specified single-valued group field are
* grouped together. For example, if you group by the <code>author</code>
* field, then all documents with the same value in the <code>author</code>
* field fall into a single group.
* </p>
*
* <p>Grouping requires a number of inputs:</p>
*
* <ul>
* <li><code>groupField</code>: this is the field used for grouping.
* For example, if you use the <code>author</code> field then each
* group has all books by the same author. Documents that don't
* have this field are grouped under a single group with
* a <code>null</code> group value.
*
* <li><code>groupSort</code>: how the groups are sorted. For sorting
* purposes, each group is "represented" by the highest-sorted
* document according to the <code>groupSort</code> within it. For
* example, if you specify "price" (ascending) then the first group
* is the one with the lowest price book within it. Or if you
* specify relevance group sort, then the first group is the one
* containing the highest scoring book.
*
* <li><code>topNGroups</code>: how many top groups to keep. For
* example, 10 means the top 10 groups are computed.
*
* <li><code>groupOffset</code>: which "slice" of top groups you want to
* retrieve. For example, 3 means you'll get 7 groups back
* (assuming <code>topNGroups</code> is 10). This is useful for
* paging, where you might show 5 groups per page.
*
* <li><code>withinGroupSort</code>: how the documents within each group
* are sorted. This can be different from the group sort.
*
* <li><code>maxDocsPerGroup</code>: how many top documents within each
* group to keep.
*
* <li><code>withinGroupOffset</code>: which "slice" of top
* documents you want to retrieve from each group.
*
* </ul>
*
* <p>The implementation is two-pass: the first pass ({@link
* org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector})
* gathers the top groups, and the second pass ({@link
* org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector})
* gathers documents within those groups. If the search is costly to
* run you may want to use the {@link
* org.apache.lucene.search.CachingCollector} class, which
* caches hits and can (quickly) replay them for the second pass. This
* way you only run the query once, but you pay a RAM cost to (briefly)
* hold all hits. Results are returned as a {@link
* org.apache.lucene.search.grouping.TopGroups} instance.</p>
*
* <p>
* This module abstracts away what defines a group and how it is collected. All grouping collectors
* are abstract and currently have term-based implementations. One can implement
* collectors that, for example, group on multiple fields.
* </p>
*
* <p>Known limitations:</p>
* <ul>
* <li> For the two-pass grouping search, the group field must be
* indexed as a {@link org.apache.lucene.document.SortedDocValuesField}.
* <li> Although Solr supports grouping by function and this module abstracts what a group is, there are currently only
* implementations for grouping based on terms.
* <li> Sharding is not directly supported, though it is not too
* difficult to implement if you can merge the top groups and top documents per
* group yourself.
* </ul>
*
* <p>Typical usage for the generic two-pass grouping search looks like this using the grouping convenience utility
* (optionally using caching for the second pass search):</p>
*
* <pre class="prettyprint">
* GroupingSearch groupingSearch = new GroupingSearch("author");
* groupingSearch.setGroupSort(groupSort);
* groupingSearch.setFillSortFields(fillFields);
*
* if (useCache) {
* // Sets cache in MB
* groupingSearch.setCachingInMB(4.0, true);
* }
*
* if (requiredTotalGroupCount) {
* groupingSearch.setAllGroups(true);
* }
*
* TermQuery query = new TermQuery(new Term("content", searchTerm));
* TopGroups&lt;BytesRef&gt; result = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);
*
* // Render result...
* if (requiredTotalGroupCount) {
* int totalGroupCount = result.totalGroupCount;
* }
* </pre>
*
* <p>To use the single-pass <code>BlockGroupingCollector</code>,
* first, at indexing time, you must ensure all docs in each group
* are added as a block, and you have some way to find the last
* document of each group. One simple way to do this is to add a
* marker binary field:</p>
*
* <pre class="prettyprint">
* // Create Documents from your source:
* List&lt;Document&gt; oneGroup = ...;
*
* Field groupEndField = new Field("groupEnd", "x", Field.Store.NO, Field.Index.NOT_ANALYZED);
* groupEndField.setIndexOptions(IndexOptions.DOCS_ONLY);
* groupEndField.setOmitNorms(true);
* oneGroup.get(oneGroup.size()-1).add(groupEndField);
*
* // You can also use writer.updateDocuments(); just be sure you
* // replace an entire previous doc block with this new one. For
* // example, each group could have a "groupID" field, with the same
* // value for all docs in this group:
* writer.addDocuments(oneGroup);
* </pre>
*
* Then, at search time, do this up front:
*
* <pre class="prettyprint">
* // Set this once in your app &amp; save away for reusing across all queries:
* Filter groupEndDocs = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("groupEnd", "x"))));
* </pre>
*
* Finally, do this per search:
*
* <pre class="prettyprint">
* // Per search:
* BlockGroupingCollector c = new BlockGroupingCollector(groupSort, groupOffset+topNGroups, needsScores, groupEndDocs);
* s.search(new TermQuery(new Term("content", searchTerm)), c);
* TopGroups groupsResult = c.getTopGroups(withinGroupSort, groupOffset, docOffset, docOffset+docsPerGroup, fillFields);
*
* // Render groupsResult...
* </pre>
*
* Or alternatively use the <code>GroupingSearch</code> convenience utility:
*
* <pre class="prettyprint">
* // Per search:
* GroupingSearch groupingSearch = new GroupingSearch(groupEndDocs);
* groupingSearch.setGroupSort(groupSort);
* groupingSearch.setIncludeScores(needsScores);
* TermQuery query = new TermQuery(new Term("content", searchTerm));
* TopGroups groupsResult = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);
*
* // Render groupsResult...
* </pre>
*
* Note that the <code>groupValue</code> of each <code>GroupDocs</code>
* will be <code>null</code>, so if you need to present this value you'll
* have to separately retrieve it (for example using stored
* fields, <code>FieldCache</code>, etc.).
*
* <p>Another collector is the <code>TermAllGroupHeadsCollector</code>, which can be used to retrieve the most relevant
* document of each group, also known as the group heads. This can be useful when one wants to compute group-based
* facets / statistics on the complete query result. The collector can be executed during the first or second
* phase. This collector can also be used with the <code>GroupingSearch</code> convenience utility, but if one only
* wants to compute the most relevant documents per group it is better to just use the collector directly, as shown below.</p>
*
* <pre class="prettyprint">
* AbstractAllGroupHeadsCollector c = TermAllGroupHeadsCollector.create(groupField, sortWithinGroup);
* s.search(new TermQuery(new Term("content", searchTerm)), c);
* // Return all group heads as int array
* int[] groupHeadsArray = c.retrieveGroupHeads();
* // Return all group heads as FixedBitSet.
* int maxDoc = s.maxDoc();
* FixedBitSet groupHeadsBitSet = c.retrieveGroupHeads(maxDoc);
* </pre>
*
* <p>For each of the above collector types there is also a variant that works with <code>ValueSource</code> instead
* of fields. Concretely this means that these variants can work with functions. These variants are slower than
* their term-based counterparts. These implementations are located in the
* <code>org.apache.lucene.search.grouping.function</code> package, but can also be used with the
* <code>GroupingSearch</code> convenience utility.
* </p>
*/
package org.apache.lucene.search.grouping;
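
The relationship between <code>groupOffset</code> and <code>topNGroups</code> described in the Javadoc above (an offset of 3 with 10 top groups yields 7 groups) can be sketched in plain Java. This is an illustration only and does not use the grouping module: `GroupSliceSketch` and `slice` are hypothetical names, and the input list is assumed to be already ordered by the group sort.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only; not part of the Lucene grouping module. */
final class GroupSliceSketch {

  /**
   * Returns the groups a caller would see for the given offset, assuming
   * sortedGroups is already ordered by the group sort.
   */
  static <T> List<T> slice(List<T> sortedGroups, int groupOffset, int topNGroups) {
    // Only the top N groups are kept at all.
    int end = Math.min(topNGroups, sortedGroups.size());
    // The offset skips groups from the front of that top-N window.
    int start = Math.min(Math.max(groupOffset, 0), end);
    return sortedGroups.subList(start, end);
  }

  public static void main(String[] args) {
    List<String> authors = Arrays.asList(
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l");
    // 12 candidate groups, keep the top 10, skip the first 3.
    System.out.println(slice(authors, 3, 10).size()); // prints 7
  }
}
```

For paging with 5 groups per page, the second page under this sketch would be `slice(groups, 5, 10)`: the offset grows by the page size while the top-N bound grows with it.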

View File

@ -1,199 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<body>
<p>This module enables search result grouping with Lucene, where hits
with the same value in the specified single-valued group field are
grouped together. For example, if you group by the <code>author</code>
field, then all documents with the same value in the <code>author</code>
field fall into a single group.</p>
<p>Grouping requires a number of inputs:</p>
<ul>
<li> <code>groupField</code>: this is the field used for grouping.
For example, if you use the <code>author</code> field then each
group has all books by the same author. Documents that don't
have this field are grouped under a single group with
a <code>null</code> group value.
<li> <code>groupSort</code>: how the groups are sorted. For sorting
purposes, each group is "represented" by the highest-sorted
document according to the <code>groupSort</code> within it. For
example, if you specify "price" (ascending) then the first group
is the one with the lowest price book within it. Or if you
specify relevance group sort, then the first group is the one
containing the highest scoring book.
<li> <code>topNGroups</code>: how many top groups to keep. For
example, 10 means the top 10 groups are computed.
<li> <code>groupOffset</code>: which "slice" of top groups you want to
retrieve. For example, 3 means you'll get 7 groups back
(assuming <code>topNGroups</code> is 10). This is useful for
paging, where you might show 5 groups per page.
<li> <code>withinGroupSort</code>: how the documents within each group
are sorted. This can be different from the group sort.
<li> <code>maxDocsPerGroup</code>: how many top documents within each
group to keep.
<li> <code>withinGroupOffset</code>: which "slice" of top
documents you want to retrieve from each group.
</ul>
<p>The implementation is two-pass: the first pass ({@link
org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector})
gathers the top groups, and the second pass ({@link
org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector})
gathers documents within those groups. If the search is costly to
run you may want to use the {@link
org.apache.lucene.search.CachingCollector} class, which
caches hits and can (quickly) replay them for the second pass. This
way you only run the query once, but you pay a RAM cost to (briefly)
hold all hits. Results are returned as a {@link
org.apache.lucene.search.grouping.TopGroups} instance.</p>
<p>
This module abstracts away what defines group and how it is collected. All grouping collectors
are abstract and have currently term based implementations. One can implement
collectors that for example group on multiple fields.
</p>
<p>Known limitations:</p>
<ul>
<li> For the two-pass grouping search, the group field must be a
indexed as a {@link org.apache.lucene.document.SortedDocValuesField}).
<li> Although Solr support grouping by function and this module has abstraction of what a group is, there are currently only
implementations for grouping based on terms.
<li> Sharding is not directly supported, though is not too
difficult, if you can merge the top groups and top documents per
group yourself.
</ul>
<p>Typical usage for the generic two-pass grouping search looks like this using the grouping convenience utility
(optionally using caching for the second pass search):</p>
<pre class="prettyprint">
GroupingSearch groupingSearch = new GroupingSearch("author");
groupingSearch.setGroupSort(groupSort);
groupingSearch.setFillSortFields(fillFields);
if (useCache) {
// Sets cache in MB
groupingSearch.setCachingInMB(4.0, true);
}
if (requiredTotalGroupCount) {
groupingSearch.setAllGroups(true);
}
TermQuery query = new TermQuery(new Term("content", searchTerm));
TopGroups&lt;BytesRef&gt; result = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);
// Render groupsResult...
if (requiredTotalGroupCount) {
int totalGroupCount = result.totalGroupCount;
}
</pre>
<p>To use the single-pass <code>BlockGroupingCollector</code>,
first, at indexing time, you must ensure all docs in each group
are added as a block, and you have some way to find the last
document of each group. One simple way to do this is to add a
marker binary field:</p>
<pre class="prettyprint">
// Create Documents from your source:
List&lt;Document&gt; oneGroup = ...;
Field groupEndField = new Field("groupEnd", "x", Field.Store.NO, Field.Index.NOT_ANALYZED);
groupEndField.setIndexOptions(IndexOptions.DOCS_ONLY);
groupEndField.setOmitNorms(true);
oneGroup.get(oneGroup.size()-1).add(groupEndField);
// You can also use writer.updateDocuments(); just be sure you
// replace an entire previous doc block with this new one. For
// example, each group could have a "groupID" field, with the same
// value for all docs in this group:
writer.addDocuments(oneGroup);
</pre>
Then, at search time, do this up front:
<pre class="prettyprint">
// Set this once in your app & save away for reusing across all queries:
Filter groupEndDocs = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("groupEnd", "x"))));
</pre>
Finally, do this per search:
<pre class="prettyprint">
// Per search:
BlockGroupingCollector c = new BlockGroupingCollector(groupSort, groupOffset+topNGroups, needsScores, groupEndDocs);
s.search(new TermQuery(new Term("content", searchTerm)), c);
TopGroups groupsResult = c.getTopGroups(withinGroupSort, groupOffset, docOffset, docOffset+docsPerGroup, fillFields);
// Render groupsResult...
</pre>
Or alternatively use the <code>GroupingSearch</code> convenience utility:
<pre class="prettyprint">
// Per search:
GroupingSearch groupingSearch = new GroupingSearch(groupEndDocs);
groupingSearch.setGroupSort(groupSort);
groupingSearch.setIncludeScores(needsScores);
TermQuery query = new TermQuery(new Term("content", searchTerm));
TopGroups groupsResult = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);
// Render groupsResult...
</pre>
Note that the <code>groupValue</code> of each <code>GroupDocs</code>
will be <code>null</code>, so if you need to present this value you'll
have to separately retrieve it (for example using stored
fields, <code>FieldCache</code>, etc.).
<p>Another collector is the <code>TermAllGroupHeadsCollector</code> that can be used to retrieve all most relevant
documents per group. Also known as group heads. This can be useful in situations when one wants to compute group
based facets / statistics on the complete query result. The collector can be executed during the first or second
phase. This collector can also be used with the <code>GroupingSearch</code> convenience utility, but when if one only
wants to compute the most relevant documents per group it is better to just use the collector as done here below.</p>
<pre class="prettyprint">
AbstractAllGroupHeadsCollector c = TermAllGroupHeadsCollector.create(groupField, sortWithinGroup);
s.search(new TermQuery(new Term("content", searchTerm)), c);
// Return all group heads as int array
int[] groupHeadsArray = c.retrieveGroupHeads()
// Return all group heads as FixedBitSet.
int maxDoc = s.maxDoc();
FixedBitSet groupHeadsBitSet = c.retrieveGroupHeads(maxDoc)
</pre>
<p>For each of the above collector types there is also a variant that works with <code>ValueSource</code> instead of
of fields. Concretely this means that these variants can work with functions. These variants are slower than
there term based counter parts. These implementations are located in the
<code>org.apache.lucene.search.grouping.function</code> package, but can also be used with the
<code>GroupingSearch</code> convenience utility
</p>
</body>
</html>

View File

@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Support for grouping by indexed terms via {@link org.apache.lucene.index.DocValues}.
*/
package org.apache.lucene.search.grouping.term;

View File

@ -1,21 +0,0 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<body>
Support for grouping by indexed terms via {@link org.apache.lucene.index.DocValues}.
</body>
</html>

View File

@ -0,0 +1,95 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Highlighting search terms.
* <p>
* The highlight package contains classes to provide "keyword in context" features
* typically used to highlight search terms in the text of results pages.
* The Highlighter class is the central component and can be used to extract the
* most interesting sections of a piece of text and highlight them, with the help of
* Fragmenter, fragment Scorer, and Formatter classes.
*
* <h2>Example Usage</h2>
*
* <pre class="prettyprint">
* //... Above, create documents with two fields, one with term vectors (tv) and one without (notv)
* IndexSearcher searcher = new IndexSearcher(directory);
* QueryParser parser = new QueryParser("notv", analyzer);
* Query query = parser.parse("million");
*
* TopDocs hits = searcher.search(query, 10);
*
* SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
* Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(query));
* for (int i = 0; i &lt; hits.scoreDocs.length; i++) { // the search may return fewer than 10 hits
* int id = hits.scoreDocs[i].doc;
* Document doc = searcher.doc(id);
* String text = doc.get("notv");
* TokenStream tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), id, "notv", analyzer);
* TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, text, false, 10);//highlighter.getBestFragments(tokenStream, text, 3, "...");
* for (int j = 0; j &lt; frag.length; j++) {
* if ((frag[j] != null) &amp;&amp; (frag[j].getScore() &gt; 0)) {
* System.out.println((frag[j].toString()));
* }
* }
* //Term vector
* text = doc.get("tv");
* tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), hits.scoreDocs[i].doc, "tv", analyzer);
* frag = highlighter.getBestTextFragments(tokenStream, text, false, 10);
* for (int j = 0; j &lt; frag.length; j++) {
* if ((frag[j] != null) &amp;&amp; (frag[j].getScore() &gt; 0)) {
* System.out.println((frag[j].toString()));
* }
* }
* System.out.println("-------------");
* }
* </pre>
*
* <h2>New features 06/02/2005</h2>
*
* This release adds options for encoding (thanks to Nicko Cadell).
* An "Encoder" implementation such as the new SimpleHTMLEncoder class can be passed to the highlighter to encode
* all those non-xhtml standard characters such as &amp; into legal values. This simple class may not suffice for
* some languages - Commons Lang has an implementation that could be used: escapeHtml(String) in
* http://svn.apache.org/viewcvs.cgi/jakarta/commons/proper/lang/trunk/src/java/org/apache/commons/lang/StringEscapeUtils.java?rev=137958&amp;view=markup
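*
* <p>As a rough illustration of what such an encoder does, here is a minimal, standalone sketch.
* The class name <code>HtmlEncodeSketch</code> is hypothetical and the code is not the source of
* <code>SimpleHTMLEncoder</code>; it only assumes that the four characters below are the ones to escape.</p>
*
```java
// Hypothetical stand-in for an "Encoder"; not the actual SimpleHTMLEncoder source.
final class HtmlEncodeSketch {

  // Replaces the characters that are unsafe in HTML text with entity references.
  static String encode(String plainText) {
    StringBuilder out = new StringBuilder(plainText.length());
    for (int i = 0; i < plainText.length(); i++) {
      char c = plainText.charAt(i);
      switch (c) {
        case '&': out.append("&amp;"); break;
        case '<': out.append("&lt;"); break;
        case '>': out.append("&gt;"); break;
        case '"': out.append("&quot;"); break;
        default: out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Encode the raw text before highlight tags are wrapped around matching terms.
    System.out.println(encode("R&D spent <1 million"));
  }
}
```
*
* <p>Encoding has to happen on the raw text, before highlight tags are added, so that the tags
* themselves are not escaped.</p>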
*
* <h2>New features 22/12/2004</h2>
*
* This release adds some new capabilities:
* <ol>
* <li>Faster highlighting using Term vector support</li>
* <li>New formatting options to use color intensity to show informational value</li>
* <li>Options for better summarization by using term IDF scores to influence fragment selection</li>
* </ol>
*
* <p>
* The highlighter takes a TokenStream as input. Until now these streams have typically been produced
* using an Analyzer but the new class TokenSources provides helper methods for obtaining TokenStreams from
* the new TermVector position support (see latest CVS version).</p>
*
* <p>The new class GradientFormatter can use a scale of colors to highlight terms according to their score.
* A subtle use of color can help emphasise the reasons for matching (useful when doing "MoreLikeThis" queries and
* you want to see what the basis of the similarities is).</p>
*
* <p>The QueryScorer class has a new constructor which can use an IndexReader to derive the IDF (inverse document frequency)
* for each term in order to influence the score. This is useful for helping to extract the most significant sections
* of a document and in supplying scores used by the new GradientFormatter to color significant words more strongly.
* The QueryScorer.getMaxWeight method is useful when passed to the GradientFormatter constructor to define the top score
* which is associated with the top color.</p>
*/
package org.apache.lucene.search.highlight;

View File

@ -1,99 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<body>
The highlight package contains classes to provide "keyword in context" features
typically used to highlight search terms in the text of results pages.
The Highlighter class is the central component and can be used to extract the
most interesting sections of a piece of text and highlight them, with the help of
Fragmenter, fragment Scorer, and Formatter classes.
<h2>Example Usage</h2>
<pre class="prettyprint">
//... Above, create documents with two fields, one with term vectors (tv) and one without (notv)
IndexSearcher searcher = new IndexSearcher(directory);
QueryParser parser = new QueryParser("notv", analyzer);
Query query = parser.parse("million");
TopDocs hits = searcher.search(query, 10);
SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(query));
for (int i = 0; i < 10; i++) {
int id = hits.scoreDocs[i].doc;
Document doc = searcher.doc(id);
String text = doc.get("notv");
TokenStream tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), id, "notv", analyzer);
TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, text, false, 10);//highlighter.getBestFragments(tokenStream, text, 3, "...");
for (int j = 0; j < frag.length; j++) {
if ((frag[j] != null) && (frag[j].getScore() > 0)) {
System.out.println((frag[j].toString()));
}
}
//Term vector
text = doc.get("tv");
tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), hits.scoreDocs[i].doc, "tv", analyzer);
frag = highlighter.getBestTextFragments(tokenStream, text, false, 10);
for (int j = 0; j < frag.length; j++) {
if ((frag[j] != null) && (frag[j].getScore() > 0)) {
System.out.println((frag[j].toString()));
}
}
System.out.println("-------------");
}
</pre>
<h2>New features 06/02/2005</h2>
This release adds options for encoding (thanks to Nicko Cadell).
An "Encoder" implementation such as the new SimpleHTMLEncoder class can be passed to the highlighter to encode
all those non-xhtml standard characters such as &amp; into legal values. This simple class may not suffice for
some languages - Commons Lang has an implementation that could be used: escapeHtml(String) in
http://svn.apache.org/viewcvs.cgi/jakarta/commons/proper/lang/trunk/src/java/org/apache/commons/lang/StringEscapeUtils.java?rev=137958&view=markup
<h2>New features 22/12/2004</h2>
This release adds some new capabilities:
<ol>
<li>Faster highlighting using Term vector support</li>
<li>New formatting options to use color intensity to show informational value</li>
<li>Options for better summarization by using term IDF scores to influence fragment selection</li>
</ol>
<p>
The highlighter takes a TokenStream as input. Until now these streams have typically been produced
using an Analyzer but the new class TokenSources provides helper methods for obtaining TokenStreams from
the new TermVector position support (see latest CVS version).</p>
<p>The new class GradientFormatter can use a scale of colors to highlight terms according to their score.
A subtle use of color can help emphasise the reasons for matching (useful when doing "MoreLikeThis" queries and
you want to see what the basis of the similarities are).</p>
<p>The QueryScorer class has a new constructor which can use an IndexReader to derive the IDF (inverse document frequency)
for each term in order to influence the score. This is useful for helping to extract the most significant sections
of a document and in supplying scores used by the new GradientFormatter to color significant words more strongly.
The QueryScorer.getMaxWeight method is useful when passed to the GradientFormatter constructor to define the top score
which is associated with the top color.</p>
</body>
</html>

View File

@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Highlighter implementation that uses offsets from postings lists.
*/
package org.apache.lucene.search.postingshighlight;

View File

@ -1,22 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<body>
Highlighter implementation that uses offsets from postings lists.
</body>
</html>

View File

@ -0,0 +1,194 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Another highlighter implementation based on term vectors.
*
* <h2>Features</h2>
* <ul>
* <li>fast for large docs</li>
* <li>support N-gram fields</li>
* <li>support phrase-unit highlighting with slops</li>
* <li>support multi-term (includes wildcard, range, regexp, etc) queries</li>
* <li>highlight fields need to be stored with Positions and Offsets</li>
* <li>take into account query boost and/or IDF-weight to score fragments</li>
* <li>support colored highlight tags</li>
* <li>pluggable FragListBuilder / FieldFragList</li>
* <li>pluggable FragmentsBuilder</li>
* </ul>
*
* <h2>Algorithm</h2>
* <p>To explain the algorithm, let's use the following sample text
* (to be highlighted) and user query:</p>
*
* <table border=1 summary="sample document and query">
* <tr>
* <td><b>Sample Text</b></td>
* <td>Lucene is a search engine library.</td>
* </tr>
* <tr>
* <td><b>User Query</b></td>
* <td>Lucene^2 OR "search library"~1</td>
* </tr>
* </table>
*
* <p>The user query is a BooleanQuery that consists of TermQuery("Lucene")
* with boost of 2 and PhraseQuery("search library") with slop of 1.</p>
* <p>For your convenience, here are the offsets and positions of the
* sample text:</p>
*
* <pre>
* +--------+-----------------------------------+
* | | 1111111111222222222233333|
* | offset|01234567890123456789012345678901234|
* +--------+-----------------------------------+
* |document|Lucene is a search engine library. |
* +--------*-----------------------------------+
* |position|0 1 2 3 4 5 |
* +--------*-----------------------------------+
* </pre>
*
* <h3>Step 1.</h3>
* <p>In Step 1, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldQuery.QueryPhraseMap} from the user query.
* <code>QueryPhraseMap</code> consists of the following members:</p>
* <pre class="prettyprint">
* public class QueryPhraseMap {
* boolean terminal;
* int slop; // valid if terminal == true and phraseHighlight == true
* float boost; // valid if terminal == true
* Map&lt;String, QueryPhraseMap&gt; subMap;
* }
* </pre>
* <p><code>QueryPhraseMap</code> has subMap. The key of the subMap is a term
* text in the user query and the value is a subsequent <code>QueryPhraseMap</code>.
* If the query is a term (not phrase), then the subsequent <code>QueryPhraseMap</code>
* is marked as terminal. If the query is a phrase, then the subsequent <code>QueryPhraseMap</code>
* is not a terminal and it has the next term text in the phrase.</p>
*
* <p>From the sample user query, the following <code>QueryPhraseMap</code>
* will be generated:</p>
* <pre>
* QueryPhraseMap
* +--------+-+ +-------+-+
* |"Lucene"|o+-&gt;|boost=2|*| * : terminal
* +--------+-+ +-------+-+
*
* +--------+-+ +---------+-+ +-------+------+-+
* |"search"|o+-&gt;|"library"|o+-&gt;|boost=1|slop=1|*|
* +--------+-+ +---------+-+ +-------+------+-+
* </pre>
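*
* <p>The nesting shown in the diagram can be sketched with a small standalone class. This is a
* simplified model for illustration only: <code>PhraseMapSketch</code> and its <code>add</code> and
* <code>sample</code> methods are hypothetical names, not part of <code>FieldQuery.QueryPhraseMap</code>.</p>
*
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical model of the map-of-maps structure described above.
final class PhraseMapSketch {
  boolean terminal;
  int slop;     // meaningful only on terminal nodes
  float boost;  // meaningful only on terminal nodes
  final Map<String, PhraseMapSketch> subMap = new LinkedHashMap<>();

  // Walks (and creates) one node per term, then marks the last node as terminal.
  void add(float boost, int slop, String... terms) {
    PhraseMapSketch node = this;
    for (String term : terms) {
      node = node.subMap.computeIfAbsent(term, t -> new PhraseMapSketch());
    }
    node.terminal = true;
    node.boost = boost;
    node.slop = slop;
  }

  // Builds the map for the sample query: Lucene^2 OR "search library"~1
  static PhraseMapSketch sample() {
    PhraseMapSketch root = new PhraseMapSketch();
    root.add(2f, 0, "Lucene");
    root.add(1f, 1, "search", "library");
    return root;
  }

  public static void main(String[] args) {
    PhraseMapSketch root = sample();
    System.out.println("Lucene terminal: " + root.subMap.get("Lucene").terminal);
    System.out.println("search terminal: " + root.subMap.get("search").terminal);
    System.out.println("library slop: " + root.subMap.get("search").subMap.get("library").slop);
  }
}
```
*
* <p>A single term ends at a terminal node right away, while each phrase term adds one more level
* and only the last level is terminal.</p>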
*
* <h3>Step 2.</h3>
* <p>In Step 2, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldTermStack}. Fast Vector Highlighter uses term vector data
* (which must be stored with offsets and positions; see {@link org.apache.lucene.document.FieldType#setStoreTermVectorOffsets(boolean)} and {@link org.apache.lucene.document.FieldType#setStoreTermVectorPositions(boolean)})
* to generate it. <code>FieldTermStack</code> keeps the terms in the user query.
* Therefore, in this sample case, Fast Vector Highlighter generates the following <code>FieldTermStack</code>:</p>
* <pre>
* FieldTermStack
* +------------------+
* |"Lucene"(0,6,0) |
* +------------------+
* |"search"(12,18,3) |
* +------------------+
* |"library"(26,33,5)|
* +------------------+
* where : "termText"(startOffset,endOffset,position)
* </pre>
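*
* <p>The triples above can be reproduced with a short standalone sketch. It assumes a naive
* tokenizer that splits on non-letter characters, which stands in for real analysis and term
* vector data; <code>TermStackSketch</code> is a hypothetical name and not the
* <code>FieldTermStack</code> implementation.</p>
*
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: derives "term"(startOffset,endOffset,position) entries for query terms.
final class TermStackSketch {

  // Scans the text, counting every token as one position, and keeps only the query terms.
  static List<String> termStack(String text, Set<String> queryTerms) {
    List<String> stack = new ArrayList<>();
    int position = 0;
    int i = 0;
    while (i < text.length()) {
      if (!Character.isLetter(text.charAt(i))) {
        i++;
        continue;
      }
      int start = i;
      while (i < text.length() && Character.isLetter(text.charAt(i))) {
        i++;
      }
      String term = text.substring(start, i);
      if (queryTerms.contains(term)) {
        stack.add("\"" + term + "\"(" + start + "," + i + "," + position + ")");
      }
      position++;
    }
    return stack;
  }

  // The sample text and the terms of the sample query.
  static List<String> sample() {
    Set<String> queryTerms = new HashSet<>(Arrays.asList("Lucene", "search", "library"));
    return termStack("Lucene is a search engine library.", queryTerms);
  }

  public static void main(String[] args) {
    for (String entry : sample()) {
      System.out.println(entry);
    }
  }
}
```
*
* <p>Terms that are not in the query ("is", "a", "engine") still advance the position counter,
* which is why "library" ends up at position 5.</p>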
* <h3>Step 3.</h3>
* <p>In Step 3, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldPhraseList}
* by reference to <code>QueryPhraseMap</code> and <code>FieldTermStack</code>.</p>
* <pre>
* FieldPhraseList
* +----------------+-----------------+---+
* |"Lucene" |[(0,6)] |w=2|
* +----------------+-----------------+---+
* |"search library"|[(12,18),(26,33)]|w=1|
* +----------------+-----------------+---+
* </pre>
* <p>Each entry is a <code>WeightedPhraseInfo</code>, which consists of
* an array of term offsets and a weight.
* </p>
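*
* <p>To see why "search library" with a slop of 1 matches the sample text, consider the following
* sketch. It is a deliberately simplified check, assuming that each gap between consecutive phrase
* terms may contain at most <code>slop</code> extra positions; the real phrase matching logic may
* differ. <code>PhraseSlopSketch</code> is a hypothetical name.</p>
*
```java
// Hypothetical, simplified slop check; not the logic used by FieldPhraseList.
final class PhraseSlopSketch {

  // positions[i] is the position of the i-th phrase term in the text.
  static boolean matches(int[] positions, int slop) {
    for (int i = 1; i < positions.length; i++) {
      // Number of other tokens sitting between two consecutive phrase terms.
      int gap = positions[i] - positions[i - 1] - 1;
      if (gap < 0 || gap > slop) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    int[] searchLibrary = {3, 5}; // "search" at position 3, "library" at position 5
    System.out.println("slop=1: " + matches(searchLibrary, 1));
    System.out.println("slop=0: " + matches(searchLibrary, 0));
  }
}
```
*
* <p>With "engine" sitting between the two terms, the phrase is accepted with a slop of 1 and
* rejected with a slop of 0 under this simplified rule.</p>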
* <h3>Step 4.</h3>
* <p>In Step 4, Fast Vector Highlighter creates <code>FieldFragList</code> by reference to
* <code>FieldPhraseList</code>. In this sample case, the following
* <code>FieldFragList</code> will be generated:</p>
* <pre>
* FieldFragList
* +---------------------------------+
* |"Lucene"[(0,6)] |
* |"search library"[(12,18),(26,33)]|
* |totalBoost=3 |
* +---------------------------------+
* </pre>
*
* <p>
* The calculation for each <code>FieldFragList.WeightedFragInfo.totalBoost</code> (weight)
* depends on the implementation of <code>FieldFragList.add( ... )</code>:
* <pre class="prettyprint">
* public void add( int startOffset, int endOffset, List&lt;WeightedPhraseInfo&gt; phraseInfoList ) {
* float totalBoost = 0;
* List&lt;SubInfo&gt; subInfos = new ArrayList&lt;SubInfo&gt;();
* for( WeightedPhraseInfo phraseInfo : phraseInfoList ){
* subInfos.add( new SubInfo( phraseInfo.getText(), phraseInfo.getTermsOffsets(), phraseInfo.getSeqnum() ) );
* totalBoost += phraseInfo.getBoost();
* }
* getFragInfos().add( new WeightedFragInfo( startOffset, endOffset, subInfos, totalBoost ) );
* }
*
* </pre>
* The <code>FieldFragList</code> implementation that is used is chosen in <code>BaseFragListBuilder.createFieldFragList( ... )</code>:
* <pre class="prettyprint">
* public FieldFragList createFieldFragList( FieldPhraseList fieldPhraseList, int fragCharSize ){
* return createFieldFragList( fieldPhraseList, new SimpleFieldFragList( fragCharSize ), fragCharSize );
* }
* </pre>
* <p>
* Currently there are two approaches available:
* </p>
* <ul>
* <li><code>SimpleFragListBuilder using SimpleFieldFragList</code>: <i>sum-of-boosts</i>-approach. The totalBoost is calculated by summing the query-boosts per term. By default, a term is boosted by 1.0.</li>
* <li><code>WeightedFragListBuilder using WeightedFieldFragList</code>: <i>sum-of-distinct-weights</i>-approach. The totalBoost is calculated by summing the IDF-weights of distinct terms.</li>
* </ul>
* <p>Comparison of the two approaches:</p>
* <table border="1">
* <caption>
* query = das alte testament (The Old Testament)
* </caption>
* <tr><th>Terms in fragment</th><th>sum-of-distinct-weights</th><th>sum-of-boosts</th></tr>
* <tr><td>das alte testament</td><td>5.339621</td><td>3.0</td></tr>
* <tr><td>das alte testament</td><td>5.339621</td><td>3.0</td></tr>
* <tr><td>das testament alte</td><td>5.339621</td><td>3.0</td></tr>
* <tr><td>das alte testament</td><td>5.339621</td><td>3.0</td></tr>
* <tr><td>das testament</td><td>2.9455688</td><td>2.0</td></tr>
* <tr><td>das alte</td><td>2.4759595</td><td>2.0</td></tr>
* <tr><td>das das das das</td><td>1.5015357</td><td>4.0</td></tr>
* <tr><td>das das das</td><td>1.3003681</td><td>3.0</td></tr>
* <tr><td>das das</td><td>1.061746</td><td>2.0</td></tr>
* <tr><td>alte</td><td>1.0</td><td>1.0</td></tr>
* <tr><td>alte</td><td>1.0</td><td>1.0</td></tr>
* <tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
* <tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
* <tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
* <tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
* <tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
* </table>
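*
* <p>The two columns of the table can be approximated with the standalone sketch below. Two things
* are assumptions inferred from the table rather than taken from the implementation: the weight of
* "testament" is back-computed from the "das testament" row, and the sum of distinct weights is
* scaled by the square root of the number of terms in the fragment, which is what the rows with
* repeated "das" suggest. <code>FragmentBoostSketch</code> is a hypothetical name.</p>
*
```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the two scoring approaches compared in the table above.
final class FragmentBoostSketch {
  static final Map<String, Double> WEIGHTS = new HashMap<>();
  static {
    WEIGHTS.put("das", 0.7507678);  // from the single-term "das" rows
    WEIGHTS.put("alte", 1.0);       // from the single-term "alte" rows
    // Assumption: back-computed from the "das testament" row, not stated in the source.
    WEIGHTS.put("testament", 2.9455688 / Math.sqrt(2) - 0.7507678);
  }

  // sum-of-boosts: every term occurrence contributes its query boost (1.0 here).
  static double sumOfBoosts(List<String> terms) {
    return terms.size() * 1.0;
  }

  // sum-of-distinct-weights: each distinct term counts once; the sqrt scaling is inferred.
  static double sumOfDistinctWeights(List<String> terms) {
    Set<String> distinct = new HashSet<>(terms);
    double sum = 0;
    for (String term : distinct) {
      sum += WEIGHTS.get(term);
    }
    return sum * Math.sqrt(terms.size());
  }

  // Returns { sum-of-distinct-weights, sum-of-boosts } for a space-separated fragment.
  static double[] score(String fragment) {
    List<String> terms = Arrays.asList(fragment.split(" "));
    return new double[] { sumOfDistinctWeights(terms), sumOfBoosts(terms) };
  }

  public static void main(String[] args) {
    for (String fragment : new String[] {"das alte testament", "das das das das", "das"}) {
      double[] s = score(fragment);
      System.out.println(fragment + " -> " + s[0] + " / " + s[1]);
    }
  }
}
```
*
* <p>The sketch makes the practical difference visible: repeating a common term such as "das" keeps
* raising the sum of boosts, while the distinct-weights score grows much more slowly and stays
* below that of a fragment containing all three query terms.</p>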
*
* <h3>Step 5.</h3>
* <p>In Step 5, by using <code>FieldFragList</code> and the field stored data,
* Fast Vector Highlighter creates highlighted snippets!</p>
*/
package org.apache.lucene.search.vectorhighlight;

View File

@ -1,196 +0,0 @@
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<body>
This is another highlighter implementation.
<h2>Features</h2>
<ul>
<li>fast for large docs</li>
<li>support N-gram fields</li>
<li>support phrase-unit highlighting with slops</li>
<li>support multi-term (includes wildcard, range, regexp, etc) queries</li>
<li>need Java 1.5</li>
<li>highlight fields need to be stored with Positions and Offsets</li>
<li>take into account query boost and/or IDF-weight to score fragments</li>
<li>support colored highlight tags</li>
<li>pluggable FragListBuilder / FieldFragList</li>
<li>pluggable FragmentsBuilder</li>
</ul>
<h2>Algorithm</h2>
<p>To explain the algorithm, let's use the following sample text
(to be highlighted) and user query:</p>
<table border=1>
<tr>
<td><b>Sample Text</b></td>
<td>Lucene is a search engine library.</td>
</tr>
<tr>
<td><b>User Query</b></td>
<td>Lucene^2 OR "search library"~1</td>
</tr>
</table>
<p>The user query is a BooleanQuery that consists of TermQuery("Lucene")
with boost of 2 and PhraseQuery("search library") with slop of 1.</p>
<p>For your convenience, here are the offsets and positions of the
sample text:</p>
<pre>
+--------+-----------------------------------+
| | 1111111111222222222233333|
| offset|01234567890123456789012345678901234|
+--------+-----------------------------------+
|document|Lucene is a search engine library. |
+--------*-----------------------------------+
|position|0 1 2 3 4 5 |
+--------*-----------------------------------+
</pre>
<h3>Step 1.</h3>
<p>In Step 1, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldQuery.QueryPhraseMap} from the user query.
<code>QueryPhraseMap</code> consists of the following members:</p>
<pre class="prettyprint">
public class QueryPhraseMap {
boolean terminal;
int slop; // valid if terminal == true and phraseHighlight == true
float boost; // valid if terminal == true
Map&lt;String, QueryPhraseMap&gt; subMap;
}
</pre>
<p><code>QueryPhraseMap</code> has subMap. The key of the subMap is a term
text in the user query and the value is a subsequent <code>QueryPhraseMap</code>.
If the query is a term (not phrase), then the subsequent <code>QueryPhraseMap</code>
is marked as terminal. If the query is a phrase, then the subsequent <code>QueryPhraseMap</code>
is not a terminal and it has the next term text in the phrase.</p>
<p>From the sample user query, the following <code>QueryPhraseMap</code>
will be generated:</p>
<pre>
QueryPhraseMap
+--------+-+ +-------+-+
|"Lucene"|o+->|boost=2|*| * : terminal
+--------+-+ +-------+-+
+--------+-+ +---------+-+ +-------+------+-+
|"search"|o+->|"library"|o+->|boost=1|slop=1|*|
+--------+-+ +---------+-+ +-------+------+-+
</pre>
<h3>Step 2.</h3>
<p>In Step 2, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldTermStack}. Fast Vector Highlighter uses term vector data
(must be stored {@link org.apache.lucene.document.FieldType#setStoreTermVectorOffsets(boolean)} and {@link org.apache.lucene.document.FieldType#setStoreTermVectorPositions(boolean)})
to generate it. <code>FieldTermStack</code> keeps the terms in the user query.
Therefore, in this sample case, Fast Vector Highlighter generates the following <code>FieldTermStack</code>:</p>
<pre>
FieldTermStack
+------------------+
|"Lucene"(0,6,0) |
+------------------+
|"search"(12,18,3) |
+------------------+
|"library"(26,33,5)|
+------------------+
where : "termText"(startOffset,endOffset,position)
</pre>
<h3>Step 3.</h3>
<p>In Step 3, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldPhraseList}
by reference to <code>QueryPhraseMap</code> and <code>FieldTermStack</code>.</p>
<pre>
FieldPhraseList
+----------------+-----------------+---+
|"Lucene" |[(0,6)] |w=2|
+----------------+-----------------+---+
|"search library"|[(12,18),(26,33)]|w=1|
+----------------+-----------------+---+
</pre>
<p>The type of each entry is <code>WeightedPhraseInfo</code> that consists of
an array of terms offsets and weight.
</p>
<h3>Step 4.</h3>
<p>In Step 4, Fast Vector Highlighter creates <code>FieldFragList</code> by reference to
<code>FieldPhraseList</code>. In this sample case, the following
<code>FieldFragList</code> will be generated:</p>
<pre>
FieldFragList
+---------------------------------+
|"Lucene"[(0,6)] |
|"search library"[(12,18),(26,33)]|
|totalBoost=3 |
+---------------------------------+
</pre>
<p>
The calculation for each <code>FieldFragList.WeightedFragInfo.totalBoost</code> (weight)
depends on the implementation of <code>FieldFragList.add( ... )</code>:
<pre class="prettyprint">
public void add( int startOffset, int endOffset, List&lt;WeightedPhraseInfo&gt; phraseInfoList ) {
float totalBoost = 0;
List&lt;SubInfo&gt; subInfos = new ArrayList&lt;SubInfo&gt;();
for( WeightedPhraseInfo phraseInfo : phraseInfoList ){
subInfos.add( new SubInfo( phraseInfo.getText(), phraseInfo.getTermsOffsets(), phraseInfo.getSeqnum() ) );
totalBoost += phraseInfo.getBoost();
}
getFragInfos().add( new WeightedFragInfo( startOffset, endOffset, subInfos, totalBoost ) );
}
</pre>
The used implementation of <code>FieldFragList</code> is noted in <code>BaseFragListBuilder.createFieldFragList( ... )</code>:
<pre class="prettyprint">
public FieldFragList createFieldFragList( FieldPhraseList fieldPhraseList, int fragCharSize ){
return createFieldFragList( fieldPhraseList, new SimpleFieldFragList( fragCharSize ), fragCharSize );
}
</pre>
<p>
Currently there are two approaches available:
</p>
<ul>
<li><code>SimpleFragListBuilder using SimpleFieldFragList</code>: <i>sum-of-boosts</i>-approach. The totalBoost is calculated by summing the query-boosts per term. By default, a term is boosted by 1.0.</li>
<li><code>WeightedFragListBuilder using WeightedFieldFragList</code>: <i>sum-of-distinct-weights</i>-approach. The totalBoost is calculated by summing the IDF-weights of distinct terms.</li>
</ul>
<p>Comparison of the two approaches:</p>
<table border="1">
<caption>
query = das alte testament (The Old Testament)
</caption>
<tr><th>Terms in fragment</th><th>sum-of-distinct-weights</th><th>sum-of-boosts</th></tr>
<tr><td>das alte testament</td><td>5.339621</td><td>3.0</td></tr>
<tr><td>das alte testament</td><td>5.339621</td><td>3.0</td></tr>
<tr><td>das testament alte</td><td>5.339621</td><td>3.0</td></tr>
<tr><td>das alte testament</td><td>5.339621</td><td>3.0</td></tr>
<tr><td>das testament</td><td>2.9455688</td><td>2.0</td></tr>
<tr><td>das alte</td><td>2.4759595</td><td>2.0</td></tr>
<tr><td>das das das das</td><td>1.5015357</td><td>4.0</td></tr>
<tr><td>das das das</td><td>1.3003681</td><td>3.0</td></tr>
<tr><td>das das</td><td>1.061746</td><td>2.0</td></tr>
<tr><td>alte</td><td>1.0</td><td>1.0</td></tr>
<tr><td>alte</td><td>1.0</td><td>1.0</td></tr>
<tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
<tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
<tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
<tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
<tr><td>das</td><td>0.7507678</td><td>1.0</td></tr>
</table>
<h3>Step 5.</h3>
<p>In Step 5, by using <code>FieldFragList</code> and the field stored data,
Fast Vector Highlighter creates highlighted snippets!</p>
</body>
</html>