diff --git a/lucene/classification/src/java/org/apache/lucene/classification/package-info.java b/lucene/classification/src/java/org/apache/lucene/classification/package-info.java new file mode 100644 index 00000000000..a41187b9c72 --- /dev/null +++ b/lucene/classification/src/java/org/apache/lucene/classification/package-info.java @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Uses already seen data (the indexed documents) to classify new documents. + *

+ * Currently contains a (simplistic) Naive Bayes classifier, a k-Nearest + * Neighbor classifier and a Perceptron based classifier. + */ +package org.apache.lucene.classification; diff --git a/lucene/classification/src/java/org/apache/lucene/classification/package.html b/lucene/classification/src/java/org/apache/lucene/classification/package.html deleted file mode 100644 index f5141fd2c7d..00000000000 --- a/lucene/classification/src/java/org/apache/lucene/classification/package.html +++ /dev/null @@ -1,22 +0,0 @@ - - - -Uses already seen data (the indexed documents) to classify new documents. -Currently contains a (simplistic) Naive Bayes classifier, a k-Nearest Neighbor classifier and a Perceptron based classifier - - diff --git a/lucene/classification/src/java/org/apache/lucene/classification/utils/package-info.java b/lucene/classification/src/java/org/apache/lucene/classification/utils/package-info.java new file mode 100644 index 00000000000..d510e093c2c --- /dev/null +++ b/lucene/classification/src/java/org/apache/lucene/classification/utils/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Utilities for evaluation, data preparation, etc. 
+ */ +package org.apache.lucene.classification.utils; diff --git a/lucene/classification/src/java/org/apache/lucene/classification/utils/package.html b/lucene/classification/src/java/org/apache/lucene/classification/utils/package.html deleted file mode 100644 index 6ad3ea2ebac..00000000000 --- a/lucene/classification/src/java/org/apache/lucene/classification/utils/package.html +++ /dev/null @@ -1,23 +0,0 @@ - - - - - -Utilities for evaluation, data preparation, etc. - - diff --git a/lucene/codecs/src/java/org/apache/lucene/codecs/blockterms/package-info.java b/lucene/codecs/src/java/org/apache/lucene/codecs/blockterms/package-info.java new file mode 100644 index 00000000000..3317f0a09e9 --- /dev/null +++ b/lucene/codecs/src/java/org/apache/lucene/codecs/blockterms/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Pluggable term index / block terms dictionary implementations. 
+ */ +package org.apache.lucene.codecs.blockterms; diff --git a/lucene/codecs/src/java/org/apache/lucene/codecs/blockterms/package.html b/lucene/codecs/src/java/org/apache/lucene/codecs/blockterms/package.html deleted file mode 100644 index 95ebecfe3df..00000000000 --- a/lucene/codecs/src/java/org/apache/lucene/codecs/blockterms/package.html +++ /dev/null @@ -1,25 +0,0 @@ - - - - - - - -Pluggable term index / block terms dictionary implementations. - - diff --git a/lucene/codecs/src/java/org/apache/lucene/codecs/blocktreeords/package-info.java b/lucene/codecs/src/java/org/apache/lucene/codecs/blocktreeords/package-info.java new file mode 100644 index 00000000000..57f7e38649c --- /dev/null +++ b/lucene/codecs/src/java/org/apache/lucene/codecs/blocktreeords/package-info.java @@ -0,0 +1,23 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Same postings format as Lucene50, except the terms dictionary also + * supports ords, i.e. returning which ord the enum is seeked to, and + * seeking by ord. 
+ */ +package org.apache.lucene.codecs.blocktreeords; diff --git a/lucene/codecs/src/java/org/apache/lucene/codecs/blocktreeords/package.html b/lucene/codecs/src/java/org/apache/lucene/codecs/blocktreeords/package.html deleted file mode 100644 index 4af09d5c6c2..00000000000 --- a/lucene/codecs/src/java/org/apache/lucene/codecs/blocktreeords/package.html +++ /dev/null @@ -1,27 +0,0 @@ - - - - - - - -Same postings format as Lucene41, except the terms dictionary also -supports ords, i.e. returning which ord the enum is seeked to, and -seeking by ord. - - diff --git a/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/package-info.java b/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/package-info.java new file mode 100644 index 00000000000..9080ae664e7 --- /dev/null +++ b/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/package-info.java @@ -0,0 +1,22 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Codec PostingsFormat for fast access to low-frequency terms + * such as primary key fields. 
+ */ +package org.apache.lucene.codecs.bloom; diff --git a/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/package.html b/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/package.html deleted file mode 100644 index a0c591ae4e7..00000000000 --- a/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/package.html +++ /dev/null @@ -1,25 +0,0 @@ - - - - - - - -Codec PostingsFormat for fast access to low-frequency terms such as primary key fields. - - \ No newline at end of file diff --git a/lucene/codecs/src/java/org/apache/lucene/codecs/memory/package-info.java b/lucene/codecs/src/java/org/apache/lucene/codecs/memory/package-info.java new file mode 100644 index 00000000000..fadc94a391d --- /dev/null +++ b/lucene/codecs/src/java/org/apache/lucene/codecs/memory/package-info.java @@ -0,0 +1,22 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Term dictionary, DocValues or Postings formats that are read + * entirely into memory. 
+ */ +package org.apache.lucene.codecs.memory; diff --git a/lucene/codecs/src/java/org/apache/lucene/codecs/memory/package.html b/lucene/codecs/src/java/org/apache/lucene/codecs/memory/package.html deleted file mode 100644 index d752b4ddeed..00000000000 --- a/lucene/codecs/src/java/org/apache/lucene/codecs/memory/package.html +++ /dev/null @@ -1,25 +0,0 @@ - - - - - - - -Term dictionary, DocValues or Postings formats that are read entirely into memory. - - diff --git a/lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/package-info.java b/lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/package-info.java new file mode 100644 index 00000000000..812a93c0867 --- /dev/null +++ b/lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Simpletext Codec: writes human readable postings. 
+ */ +package org.apache.lucene.codecs.simpletext; diff --git a/lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/package.html b/lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/package.html deleted file mode 100644 index 88aad683412..00000000000 --- a/lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/package.html +++ /dev/null @@ -1,25 +0,0 @@ - - - - - - - -Simpletext Codec: writes human readable postings. - - diff --git a/lucene/demo/src/java/org/apache/lucene/demo/facet/package-info.java b/lucene/demo/src/java/org/apache/lucene/demo/facet/package-info.java new file mode 100644 index 00000000000..af74d23e2a1 --- /dev/null +++ b/lucene/demo/src/java/org/apache/lucene/demo/facet/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Facets example code. + */ +package org.apache.lucene.demo.facet; diff --git a/lucene/demo/src/java/org/apache/lucene/demo/facet/package.html b/lucene/demo/src/java/org/apache/lucene/demo/facet/package.html deleted file mode 100644 index cc637090e99..00000000000 --- a/lucene/demo/src/java/org/apache/lucene/demo/facet/package.html +++ /dev/null @@ -1,22 +0,0 @@ - - - - -Facets example code. 
- - diff --git a/lucene/demo/src/java/org/apache/lucene/demo/package-info.java b/lucene/demo/src/java/org/apache/lucene/demo/package-info.java new file mode 100644 index 00000000000..77f76dd3ed2 --- /dev/null +++ b/lucene/demo/src/java/org/apache/lucene/demo/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Demo applications for indexing and searching. + */ +package org.apache.lucene.demo; diff --git a/lucene/demo/src/java/org/apache/lucene/demo/package.html b/lucene/demo/src/java/org/apache/lucene/demo/package.html deleted file mode 100644 index 7c4715ffb74..00000000000 --- a/lucene/demo/src/java/org/apache/lucene/demo/package.html +++ /dev/null @@ -1,22 +0,0 @@ - - - - -Demo applications for indexing and searching. - - diff --git a/lucene/demo/src/java/org/apache/lucene/demo/xmlparser/package-info.java b/lucene/demo/src/java/org/apache/lucene/demo/xmlparser/package-info.java new file mode 100644 index 00000000000..0b569dcff1c --- /dev/null +++ b/lucene/demo/src/java/org/apache/lucene/demo/xmlparser/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Demo servlet for the XML Query Parser. + */ +package org.apache.lucene.demo.xmlparser; diff --git a/lucene/demo/src/java/org/apache/lucene/demo/xmlparser/package.html b/lucene/demo/src/java/org/apache/lucene/demo/xmlparser/package.html deleted file mode 100644 index 55a6cb41c9f..00000000000 --- a/lucene/demo/src/java/org/apache/lucene/demo/xmlparser/package.html +++ /dev/null @@ -1,22 +0,0 @@ - - - - -Demo servlet for the XML Query Parser. - - diff --git a/lucene/expressions/src/java/org/apache/lucene/expressions/js/package-info.java b/lucene/expressions/src/java/org/apache/lucene/expressions/js/package-info.java new file mode 100644 index 00000000000..152cd80d64a --- /dev/null +++ b/lucene/expressions/src/java/org/apache/lucene/expressions/js/package-info.java @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Javascript expressions. + *

A Javascript expression is a numeric expression specified using an expression syntax that's based on JavaScript expressions. You can construct expressions using:

+ * + * + *

+ * JavaScript order of precedence rules apply for operators. Shortcut evaluation is used for logical operators—the second argument is only evaluated if the value of the expression cannot be determined after evaluating the first argument. For example, in the expression a || b, b is only evaluated if a is not true. + *

+ * + *

+ * To compile an expression, use {@link org.apache.lucene.expressions.js.JavascriptCompiler}. + *

+ */ +package org.apache.lucene.expressions.js; diff --git a/lucene/expressions/src/java/org/apache/lucene/expressions/js/package.html b/lucene/expressions/src/java/org/apache/lucene/expressions/js/package.html deleted file mode 100644 index 27015902d58..00000000000 --- a/lucene/expressions/src/java/org/apache/lucene/expressions/js/package.html +++ /dev/null @@ -1,45 +0,0 @@ - - - - Javascript expressions - - -

Javascript expressions

-

A Javascript expression is a numeric expression specified using an expression syntax that's based on JavaScript expressions. You can construct expressions using:

- - -

-JavaScript order of precedence rules apply for operators. Shortcut evaluation is used for logical operators—the second argument is only evaluated if the value of the expression cannot be determined after evaluating the first argument. For example, in the expression a || b, b is only evaluated if a is not true. -

- -

- To compile an expression, use {@link org.apache.lucene.expressions.js.JavascriptCompiler}. -

- - \ No newline at end of file diff --git a/lucene/expressions/src/java/org/apache/lucene/expressions/package-info.java b/lucene/expressions/src/java/org/apache/lucene/expressions/package-info.java new file mode 100644 index 00000000000..62a519b4c20 --- /dev/null +++ b/lucene/expressions/src/java/org/apache/lucene/expressions/package-info.java @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Expressions. + *

+ * {@link org.apache.lucene.expressions.Expression} - result of compiling an expression, which can + * evaluate it for a given document. Each expression can have external variables that are resolved by + * {@code Bindings}. + *

+ * + *

+ * {@link org.apache.lucene.expressions.Bindings} - abstraction for binding external variables + * to a way to get a value for those variables for a particular document (ValueSource). + *

+ * + *

+ * {@link org.apache.lucene.expressions.SimpleBindings} - default implementation of bindings, which provides easy ways to bind sort fields and other expressions to external variables. + *

+ */ +package org.apache.lucene.expressions; diff --git a/lucene/expressions/src/java/org/apache/lucene/expressions/package.html b/lucene/expressions/src/java/org/apache/lucene/expressions/package.html deleted file mode 100644 index d4de75db31b..00000000000 --- a/lucene/expressions/src/java/org/apache/lucene/expressions/package.html +++ /dev/null @@ -1,39 +0,0 @@ - - - - expressions - - -

expressions

-

-{@link org.apache.lucene.expressions.Expression} - result of compiling an expression, which can -evaluate it for a given document. Each expression can have external variables are resolved by -{@code Bindings}. -

- -

-{@link org.apache.lucene.expressions.Bindings} - abstraction for binding external variables -to a way to get a value for those variables for a particular document (ValueSource). -

- -

-{@link org.apache.lucene.expressions.SimpleBindings} - default implementation of bindings which provide easy ways to bind sort fields and other expressions to external variables -

- - - \ No newline at end of file diff --git a/lucene/facet/src/java/org/apache/lucene/facet/package-info.java b/lucene/facet/src/java/org/apache/lucene/facet/package-info.java new file mode 100644 index 00000000000..0501d6ae862 --- /dev/null +++ b/lucene/facet/src/java/org/apache/lucene/facet/package-info.java @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Faceted search. + *

+ * This module provides multiple methods for computing facet counts and + * value aggregations: + *

+ *

+ * At search time you first run your search, but pass a {@link + * org.apache.lucene.facet.FacetsCollector} to gather all hits (and optionally, scores for each + * hit). Then, instantiate whichever facet methods you'd like to use + * to compute aggregates. Finally, all methods implement a common + * {@link org.apache.lucene.facet.Facets} base API that you use to obtain specific facet + * counts. + *

+ *

+ * The various {@link org.apache.lucene.facet.FacetsCollector#search} utility methods are + * useful for doing an "ordinary" search (sorting by score, or by a + * specified Sort) but also collecting into a {@link org.apache.lucene.facet.FacetsCollector} for + * subsequent faceting. + *

+ */ +package org.apache.lucene.facet; diff --git a/lucene/facet/src/java/org/apache/lucene/facet/package.html b/lucene/facet/src/java/org/apache/lucene/facet/package.html deleted file mode 100644 index be1de9a0c29..00000000000 --- a/lucene/facet/src/java/org/apache/lucene/facet/package.html +++ /dev/null @@ -1,65 +0,0 @@ - - - - faceted search - - -

faceted search

-

- This module provides multiple methods for computing facet counts and - value aggregations: -

-

-

- At search time you first run your search, but pass a {@link - org.apache.lucene.facet.FacetsCollector} to gather all hits (and optionally, scores for each - hit). Then, instantiate whichever facet methods you'd like to use - to compute aggregates. Finally, all methods implement a common - {@link org.apache.lucene.facet.Facets} base API that you use to obtain specific facet - counts. -

-

- The various {@link org.apache.lucene.facet.FacetsCollector#search} utility methods are - useful for doing an "ordinary" search (sorting by score, or by a - specified Sort) but also collecting into a {@link org.apache.lucene.facet.FacetsCollector} for - subsequent faceting. -

- - diff --git a/lucene/facet/src/java/org/apache/lucene/facet/range/package-info.java b/lucene/facet/src/java/org/apache/lucene/facet/range/package-info.java new file mode 100644 index 00000000000..aa41077fc3e --- /dev/null +++ b/lucene/facet/src/java/org/apache/lucene/facet/range/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Provides range faceting capabilities. + */ +package org.apache.lucene.facet.range; diff --git a/lucene/facet/src/java/org/apache/lucene/facet/range/package.html b/lucene/facet/src/java/org/apache/lucene/facet/range/package.html deleted file mode 100644 index fc2ba1071a0..00000000000 --- a/lucene/facet/src/java/org/apache/lucene/facet/range/package.html +++ /dev/null @@ -1,24 +0,0 @@ - - - -Range Facets - - -Provides range faceting capabilities. 
- - \ No newline at end of file diff --git a/lucene/facet/src/java/org/apache/lucene/facet/sortedset/package-info.java b/lucene/facet/src/java/org/apache/lucene/facet/sortedset/package-info.java new file mode 100644 index 00000000000..9cb254195a0 --- /dev/null +++ b/lucene/facet/src/java/org/apache/lucene/facet/sortedset/package-info.java @@ -0,0 +1,22 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Provides faceting capabilities over facets that were indexed + * with {@link org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetField}. + */ +package org.apache.lucene.facet.sortedset; diff --git a/lucene/facet/src/java/org/apache/lucene/facet/sortedset/package.html b/lucene/facet/src/java/org/apache/lucene/facet/sortedset/package.html deleted file mode 100644 index 08a4363aab8..00000000000 --- a/lucene/facet/src/java/org/apache/lucene/facet/sortedset/package.html +++ /dev/null @@ -1,25 +0,0 @@ - - - -SortedSet Facets - - -Provides faceting capabilities over facets that were indexed with {@link org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetField}. 
- - - \ No newline at end of file diff --git a/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/package-info.java b/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/package-info.java new file mode 100644 index 00000000000..9867f248489 --- /dev/null +++ b/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Taxonomy index implementation on top of a Directory. + */ +package org.apache.lucene.facet.taxonomy.directory; diff --git a/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/package.html b/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/package.html deleted file mode 100644 index edbec74f587..00000000000 --- a/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/package.html +++ /dev/null @@ -1,24 +0,0 @@ - - - -Taxonomy index implementation using on top of a Directory - - -Taxonomy index implementation using on top of a Directory. 
- - \ No newline at end of file diff --git a/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/package-info.java b/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/package-info.java new file mode 100644 index 00000000000..ce097540e9c --- /dev/null +++ b/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/package-info.java @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Taxonomy of Categories. + *

+ * Facets are defined using a hierarchy of categories, known as a Taxonomy. + * For example, the taxonomy of a book store application might have the following structure: + * + *

+ * + * + * + *

+ * The Taxonomy translates category-paths into integer identifiers (often termed ordinals) and vice versa. + * The category Author/Mark Twain adds two nodes to the taxonomy: Author and + * Author/Mark Twain, each is assigned a different ordinal. The taxonomy maintains the invariant that a + * node always has an ordinal that is < all its children. + */ +package org.apache.lucene.facet.taxonomy; \ No newline at end of file diff --git a/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/package.html b/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/package.html deleted file mode 100644 index 8713c672ca3..00000000000 --- a/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/package.html +++ /dev/null @@ -1,53 +0,0 @@ - - - -Taxonomy of Categories - - -

Taxonomy of Categories

- - Facets are defined using a hierarchy of categories, known as a Taxonomy. - For example, the taxonomy of a book store application might have the following structure: - - - - The Taxonomy translates category-paths into interger identifiers (often termed ordinals) and vice versa. - The category Author/Mark Twain adds two nodes to the taxonomy: Author and - Author/Mark Twain, each is assigned a different ordinal. The taxonomy maintains the invariant that a - node always has an ordinal that is < all its children. - - \ No newline at end of file diff --git a/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/package-info.java b/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/package-info.java new file mode 100644 index 00000000000..263e9ea2b4c --- /dev/null +++ b/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Improves indexing time by caching a map of CategoryPath to their Ordinal. 
+ */ +package org.apache.lucene.facet.taxonomy.writercache; diff --git a/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/package.html b/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/package.html deleted file mode 100644 index 72bae6f6a73..00000000000 --- a/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/package.html +++ /dev/null @@ -1,24 +0,0 @@ - - - -Taxonomy index cache - - -Improves indexing time by caching a map of CategoryPath to their Ordinal. - - \ No newline at end of file diff --git a/lucene/grouping/src/java/org/apache/lucene/search/grouping/function/package-info.java b/lucene/grouping/src/java/org/apache/lucene/search/grouping/function/package-info.java new file mode 100644 index 00000000000..73588ce2463 --- /dev/null +++ b/lucene/grouping/src/java/org/apache/lucene/search/grouping/function/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Support for grouping by {@link org.apache.lucene.queries.function.ValueSource}. 
+ */ +package org.apache.lucene.search.grouping.function; diff --git a/lucene/grouping/src/java/org/apache/lucene/search/grouping/function/package.html b/lucene/grouping/src/java/org/apache/lucene/search/grouping/function/package.html deleted file mode 100644 index bd03b879c12..00000000000 --- a/lucene/grouping/src/java/org/apache/lucene/search/grouping/function/package.html +++ /dev/null @@ -1,21 +0,0 @@ - - - -Support for grouping by {@link org.apache.lucene.queries.function.ValueSource}. - - diff --git a/lucene/grouping/src/java/org/apache/lucene/search/grouping/package-info.java b/lucene/grouping/src/java/org/apache/lucene/search/grouping/package-info.java new file mode 100644 index 00000000000..824a98e31bf --- /dev/null +++ b/lucene/grouping/src/java/org/apache/lucene/search/grouping/package-info.java @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Grouping. + *

+ * This module enables search result grouping with Lucene, where hits + * with the same value in the specified single-valued group field are + * grouped together. For example, if you group by the author + * field, then all documents with the same value in the author + * field fall into a single group. + *

+ * + *

Grouping requires a number of inputs:

+ * + * + * + *

The implementation is two-pass: the first pass ({@link + * org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector}) + * gathers the top groups, and the second pass ({@link + * org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector}) + * gathers documents within those groups. If the search is costly to + * run you may want to use the {@link + * org.apache.lucene.search.CachingCollector} class, which + * caches hits and can (quickly) replay them for the second pass. This + * way you only run the query once, but you pay a RAM cost to (briefly) + * hold all hits. Results are returned as a {@link + * org.apache.lucene.search.grouping.TopGroups} instance.
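The two-pass flow described above can be sketched in plain Java. This is only an illustration under stated assumptions: `Hit`, `firstPass` and `secondPass` are hypothetical names, not Lucene classes, and the in-memory hit list merely stands in for hits cached and replayed for the second pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of two-pass grouping; none of these types are Lucene classes.
class TwoPassGroupingSketch {

  /** Hypothetical stand-in for a scored hit with a single-valued group field. */
  static final class Hit {
    final int doc;
    final String group;
    final float score;

    Hit(int doc, String group, float score) {
      this.doc = doc;
      this.group = group;
      this.score = score;
    }
  }

  /** First pass: pick the top N group values, ranking each group by its best hit. */
  static List<String> firstPass(List<Hit> hits, int topNGroups) {
    Map<String, Float> bestScore = new LinkedHashMap<>();
    for (Hit hit : hits) {
      bestScore.merge(hit.group, hit.score, Math::max);
    }
    List<String> groups = new ArrayList<>(bestScore.keySet());
    groups.sort(Comparator.comparing((String g) -> bestScore.get(g)).reversed());
    return new ArrayList<>(groups.subList(0, Math.min(topNGroups, groups.size())));
  }

  /** Second pass: replay the hits and keep the best documents of each top group. */
  static Map<String, List<Hit>> secondPass(List<Hit> hits, List<String> topGroups, int docsPerGroup) {
    Map<String, List<Hit>> result = new LinkedHashMap<>();
    for (String group : topGroups) {
      result.put(group, new ArrayList<>());
    }
    for (Hit hit : hits) {
      List<Hit> bucket = result.get(hit.group);
      if (bucket != null) { // hits outside the top groups are ignored
        bucket.add(hit);
      }
    }
    for (Map.Entry<String, List<Hit>> entry : result.entrySet()) {
      List<Hit> bucket = entry.getValue();
      bucket.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
      entry.setValue(new ArrayList<>(bucket.subList(0, Math.min(docsPerGroup, bucket.size()))));
    }
    return result;
  }

  public static void main(String[] args) {
    // The same list is read twice, as if the hits had been cached after one query run.
    List<Hit> cachedHits = Arrays.asList(
        new Hit(0, "twain", 0.9f),
        new Hit(1, "austen", 0.5f),
        new Hit(2, "twain", 0.4f),
        new Hit(3, "dickens", 0.7f),
        new Hit(4, "austen", 0.8f));

    List<String> topGroups = firstPass(cachedHits, 2);
    Map<String, List<Hit>> grouped = secondPass(cachedHits, topGroups, 1);
    for (Map.Entry<String, List<Hit>> entry : grouped.entrySet()) {
      System.out.println(entry.getKey() + " -> doc " + entry.getValue().get(0).doc);
    }
  }
}
```

Keeping the hits in memory trades RAM for a second query execution, which is the same trade-off the caching option makes.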

+ * + *

+ * This module abstracts away what defines a group and how it is collected. All grouping collectors + * are abstract and currently have term-based implementations. One can implement + * collectors that, for example, group on multiple fields. + *

+ * + *

Known limitations:

+ * + * + *

Typical usage for the generic two-pass grouping search looks like this using the grouping convenience utility + * (optionally using caching for the second pass search):

+ * + *
+ *   GroupingSearch groupingSearch = new GroupingSearch("author");
+ *   groupingSearch.setGroupSort(groupSort);
+ *   groupingSearch.setFillSortFields(fillFields);
+ * 
+ *   if (useCache) {
+ *     // Sets cache in MB
+ *     groupingSearch.setCachingInMB(4.0, true);
+ *   }
+ * 
+ *   if (requiredTotalGroupCount) {
+ *     groupingSearch.setAllGroups(true);
+ *   }
+ * 
+ *   TermQuery query = new TermQuery(new Term("content", searchTerm));
+ *   TopGroups<BytesRef> result = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);
+ * 
+ *   // Render result...
+ *   if (requiredTotalGroupCount) {
+ *     int totalGroupCount = result.totalGroupCount;
+ *   }
+ * 
+ * + *

To use the single-pass BlockGroupingCollector, + * first, at indexing time, you must ensure all docs in each group + * are added as a block, and you have some way to find the last + * document of each group. One simple way to do this is to add a + * marker binary field:

+ * + *
+ *   // Create Documents from your source:
+ *   List<Document> oneGroup = ...;
+ *   
+ *   Field groupEndField = new Field("groupEnd", "x", Field.Store.NO, Field.Index.NOT_ANALYZED);
+ *   groupEndField.setIndexOptions(IndexOptions.DOCS_ONLY);
+ *   groupEndField.setOmitNorms(true);
+ *   oneGroup.get(oneGroup.size()-1).add(groupEndField);
+ * 
+ *   // You can also use writer.updateDocuments(); just be sure you
+ *   // replace an entire previous doc block with this new one.  For
+ *   // example, each group could have a "groupID" field, with the same
+ *   // value for all docs in this group:
+ *   writer.addDocuments(oneGroup);
+ * 
+ * + * Then, at search time, do this up front: + * + *
+ *   // Set this once in your app & save away for reusing across all queries:
+ *   Filter groupEndDocs = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("groupEnd", "x"))));
+ * 
+ * + * Finally, do this per search: + * + *
+ *   // Per search:
+ *   BlockGroupingCollector c = new BlockGroupingCollector(groupSort, groupOffset+topNGroups, needsScores, groupEndDocs);
+ *   s.search(new TermQuery(new Term("content", searchTerm)), c);
+ *   TopGroups groupsResult = c.getTopGroups(withinGroupSort, groupOffset, docOffset, docOffset+docsPerGroup, fillFields);
+ * 
+ *   // Render groupsResult...
+ * 
+ * + * Or alternatively use the GroupingSearch convenience utility: + * + *
+ *   // Per search:
+ *   GroupingSearch groupingSearch = new GroupingSearch(groupEndDocs);
+ *   groupingSearch.setGroupSort(groupSort);
+ *   groupingSearch.setIncludeScores(needsScores);
+ *   TermQuery query = new TermQuery(new Term("content", searchTerm));
+ *   TopGroups groupsResult = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);
+ *
+ *   // Render groupsResult...
+ * 
+ * + * Note that the groupValue of each GroupDocs + * will be null, so if you need to present this value you'll + * have to separately retrieve it (for example using stored + * fields, FieldCache, etc.). + * + *

Another collector is the TermAllGroupHeadsCollector, which can be used to retrieve the most relevant + * document of each group, also known as the group heads. This can be useful in situations when one wants to compute group + * based facets / statistics on the complete query result. The collector can be executed during the first or second + * phase. This collector can also be used with the GroupingSearch convenience utility, but if one only + * wants to compute the most relevant documents per group it is better to just use the collector as done here below.

+ * + *
+ *   AbstractAllGroupHeadsCollector c = TermAllGroupHeadsCollector.create(groupField, sortWithinGroup);
+ *   s.search(new TermQuery(new Term("content", searchTerm)), c);
+ *   // Return all group heads as int array
+ *   int[] groupHeadsArray = c.retrieveGroupHeads();
+ *   // Return all group heads as FixedBitSet.
+ *   int maxDoc = s.maxDoc();
+ *   FixedBitSet groupHeadsBitSet = c.retrieveGroupHeads(maxDoc);
+ * 
+ * + *

For each of the above collector types there is also a variant that works with ValueSource instead + * of fields. Concretely this means that these variants can work with functions. These variants are slower than + * their term-based counterparts. These implementations are located in the + * org.apache.lucene.search.grouping.function package, but can also be used with the + * GroupingSearch convenience utility. + *

+ */ +package org.apache.lucene.search.grouping; diff --git a/lucene/grouping/src/java/org/apache/lucene/search/grouping/package.html b/lucene/grouping/src/java/org/apache/lucene/search/grouping/package.html deleted file mode 100644 index c346c717c7d..00000000000 --- a/lucene/grouping/src/java/org/apache/lucene/search/grouping/package.html +++ /dev/null @@ -1,199 +0,0 @@ - - - - -

This module enables search result grouping with Lucene, where hits -with the same value in the specified single-valued group field are -grouped together. For example, if you group by the author -field, then all documents with the same value in the author -field fall into a single group.

- -

Grouping requires a number of inputs:

- - - -

The implementation is two-pass: the first pass ({@link - org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector}) - gathers the top groups, and the second pass ({@link - org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector}) - gathers documents within those groups. If the search is costly to - run you may want to use the {@link - org.apache.lucene.search.CachingCollector} class, which - caches hits and can (quickly) replay them for the second pass. This - way you only run the query once, but you pay a RAM cost to (briefly) - hold all hits. Results are returned as a {@link - org.apache.lucene.search.grouping.TopGroups} instance.

- -

- This module abstracts away what defines group and how it is collected. All grouping collectors - are abstract and have currently term based implementations. One can implement - collectors that for example group on multiple fields. -

- -

Known limitations:

- - -

Typical usage for the generic two-pass grouping search looks like this using the grouping convenience utility - (optionally using caching for the second pass search):

- -
-  GroupingSearch groupingSearch = new GroupingSearch("author");
-  groupingSearch.setGroupSort(groupSort);
-  groupingSearch.setFillSortFields(fillFields);
-
-  if (useCache) {
-    // Sets cache in MB
-    groupingSearch.setCachingInMB(4.0, true);
-  }
-
-  if (requiredTotalGroupCount) {
-    groupingSearch.setAllGroups(true);
-  }
-
-  TermQuery query = new TermQuery(new Term("content", searchTerm));
-  TopGroups<BytesRef> result = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);
-
-  // Render groupsResult...
-  if (requiredTotalGroupCount) {
-    int totalGroupCount = result.totalGroupCount;
-  }
-
- -

To use the single-pass BlockGroupingCollector, - first, at indexing time, you must ensure all docs in each group - are added as a block, and you have some way to find the last - document of each group. One simple way to do this is to add a - marker binary field:

- -
-  // Create Documents from your source:
-  List<Document> oneGroup = ...;
-  
-  Field groupEndField = new Field("groupEnd", "x", Field.Store.NO, Field.Index.NOT_ANALYZED);
-  groupEndField.setIndexOptions(IndexOptions.DOCS_ONLY);
-  groupEndField.setOmitNorms(true);
-  oneGroup.get(oneGroup.size()-1).add(groupEndField);
-
-  // You can also use writer.updateDocuments(); just be sure you
-  // replace an entire previous doc block with this new one.  For
-  // example, each group could have a "groupID" field, with the same
-  // value for all docs in this group:
-  writer.addDocuments(oneGroup);
-
- -Then, at search time, do this up front: - -
-  // Set this once in your app & save away for reusing across all queries:
-  Filter groupEndDocs = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("groupEnd", "x"))));
-
- -Finally, do this per search: - -
-  // Per search:
-  BlockGroupingCollector c = new BlockGroupingCollector(groupSort, groupOffset+topNGroups, needsScores, groupEndDocs);
-  s.search(new TermQuery(new Term("content", searchTerm)), c);
-  TopGroups groupsResult = c.getTopGroups(withinGroupSort, groupOffset, docOffset, docOffset+docsPerGroup, fillFields);
-
-  // Render groupsResult...
-
- -Or alternatively use the GroupingSearch convenience utility: - -
-  // Per search:
-  GroupingSearch groupingSearch = new GroupingSearch(groupEndDocs);
-  groupingSearch.setGroupSort(groupSort);
-  groupingSearch.setIncludeScores(needsScores);
-  TermQuery query = new TermQuery(new Term("content", searchTerm));
-  TopGroups groupsResult = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);
-
-  // Render groupsResult...
-
- -Note that the groupValue of each GroupDocs -will be null, so if you need to present this value you'll -have to separately retrieve it (for example using stored -fields, FieldCache, etc.). - -

Another collector is the TermAllGroupHeadsCollector that can be used to retrieve all most relevant - documents per group. Also known as group heads. This can be useful in situations when one wants to compute group - based facets / statistics on the complete query result. The collector can be executed during the first or second - phase. This collector can also be used with the GroupingSearch convenience utility, but when if one only - wants to compute the most relevant documents per group it is better to just use the collector as done here below.

- -
-  AbstractAllGroupHeadsCollector c = TermAllGroupHeadsCollector.create(groupField, sortWithinGroup);
-  s.search(new TermQuery(new Term("content", searchTerm)), c);
-  // Return all group heads as int array
-  int[] groupHeadsArray = c.retrieveGroupHeads()
-  // Return all group heads as FixedBitSet.
-  int maxDoc = s.maxDoc();
-  FixedBitSet groupHeadsBitSet = c.retrieveGroupHeads(maxDoc)
-
- -

For each of the above collector types there is also a variant that works with ValueSource instead of - of fields. Concretely this means that these variants can work with functions. These variants are slower than - there term based counter parts. These implementations are located in the - org.apache.lucene.search.grouping.function package, but can also be used with the - GroupingSearch convenience utility -

- - - diff --git a/lucene/grouping/src/java/org/apache/lucene/search/grouping/term/package-info.java b/lucene/grouping/src/java/org/apache/lucene/search/grouping/term/package-info.java new file mode 100644 index 00000000000..27320118d7c --- /dev/null +++ b/lucene/grouping/src/java/org/apache/lucene/search/grouping/term/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Support for grouping by indexed terms via {@link org.apache.lucene.index.DocValues}. + */ +package org.apache.lucene.search.grouping.term; diff --git a/lucene/grouping/src/java/org/apache/lucene/search/grouping/term/package.html b/lucene/grouping/src/java/org/apache/lucene/search/grouping/term/package.html deleted file mode 100644 index 29b44c5a6bc..00000000000 --- a/lucene/grouping/src/java/org/apache/lucene/search/grouping/term/package.html +++ /dev/null @@ -1,21 +0,0 @@ - - - -Support for grouping by indexed terms via {@link org.apache.lucene.index.DocValues}. 
- - diff --git a/lucene/highlighter/src/java/org/apache/lucene/search/highlight/package-info.java b/lucene/highlighter/src/java/org/apache/lucene/search/highlight/package-info.java new file mode 100755 index 00000000000..a435ff5111e --- /dev/null +++ b/lucene/highlighter/src/java/org/apache/lucene/search/highlight/package-info.java @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Highlighting search terms. + *

+ * The highlight package contains classes to provide "keyword in context" features + * typically used to highlight search terms in the text of results pages. + * The Highlighter class is the central component and can be used to extract the + * most interesting sections of a piece of text and highlight them, with the help of + * Fragmenter, fragment Scorer, and Formatter classes. + * + *

Example Usage

+ * + *
+ * //... Above, create documents with two fields, one with term vectors (tv) and one without (notv)
+ * IndexSearcher searcher = new IndexSearcher(directory);
+ * QueryParser parser = new QueryParser("notv", analyzer);
+ * Query query = parser.parse("million");
+ * 
+ *   TopDocs hits = searcher.search(query, 10);
+ * 
+ *   SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
+ *   Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(query));
+ *   for (int i = 0; i < hits.scoreDocs.length; i++) {
+ *     int id = hits.scoreDocs[i].doc;
+ *     Document doc = searcher.doc(id);
+ *     String text = doc.get("notv");
+ *     TokenStream tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), id, "notv", analyzer);
+ *     TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, text, false, 10);//highlighter.getBestFragments(tokenStream, text, 3, "...");
+ *     for (int j = 0; j < frag.length; j++) {
+ *       if ((frag[j] != null) && (frag[j].getScore() > 0)) {
+ *         System.out.println((frag[j].toString()));
+ *       }
+ *     }
+ *     //Term vector
+ *     text = doc.get("tv");
+ *     tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), hits.scoreDocs[i].doc, "tv", analyzer);
+ *     frag = highlighter.getBestTextFragments(tokenStream, text, false, 10);
+ *     for (int j = 0; j < frag.length; j++) {
+ *       if ((frag[j] != null) && (frag[j].getScore() > 0)) {
+ *         System.out.println((frag[j].toString()));
+ *       }
+ *     }
+ *     System.out.println("-------------");
+ *   }
+ * 
+ * + *

New features 06/02/2005

+ * + * This release adds options for encoding (thanks to Nicko Cadell). + * An "Encoder" implementation such as the new SimpleHTMLEncoder class can be passed to the highlighter to encode + * all those non-xhtml standard characters such as & into legal values. This simple class may not suffice for + * some languages - Commons Lang has an implementation that could be used: escapeHtml(String) in + * http://svn.apache.org/viewcvs.cgi/jakarta/commons/proper/lang/trunk/src/java/org/apache/commons/lang/StringEscapeUtils.java?rev=137958&view=markup + * + *

New features 22/12/2004

+ * + * This release adds some new capabilities: + *
    + *
  1. Faster highlighting using Term vector support
  2. + *
  3. New formatting options to use color intensity to show informational value
  4. + *
  5. Options for better summarization by using term IDF scores to influence fragment selection
  6. + *
+ * + *

+ * The highlighter takes a TokenStream as input. Until now these streams have typically been produced + * using an Analyzer but the new class TokenSources provides helper methods for obtaining TokenStreams from + * the new TermVector position support (see latest CVS version).

+ * + *

The new class GradientFormatter can use a scale of colors to highlight terms according to their score. + * A subtle use of color can help emphasise the reasons for matching (useful when doing "MoreLikeThis" queries and + * you want to see what the basis of the similarities is).

+ * + *

The QueryScorer class has a new constructor which can use an IndexReader to derive the IDF (inverse document frequency) + * for each term in order to influence the score. This is useful for extracting the most significant sections + * of a document and in supplying scores used by the new GradientFormatter to color significant words more strongly. + * The QueryScorer.getMaxWeight method is useful when passed to the GradientFormatter constructor to define the top score + * which is associated with the top color.
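As a rough illustration of how an IDF-based weight could drive color intensity, here is a minimal plain-Java sketch. The smoothed formula `1 + ln(numDocs / (docFreq + 1))` is a textbook variant assumed for this example and is not claimed to be what QueryScorer computes. The names `idf` and `intensity` are hypothetical, and the linear mapping to a 0..255 channel value is only one possible gradient.

```java
// Sketch only: a textbook IDF weight and a linear score-to-intensity mapping.
class IdfGradientSketch {

  /** Smoothed inverse document frequency: rarer terms receive larger weights. */
  static double idf(int numDocs, int docFreq) {
    return 1.0 + Math.log((double) numDocs / (docFreq + 1));
  }

  /** Maps a score onto a 0..255 color channel; maxWeight plays the role of the top score. */
  static int intensity(float score, float maxWeight) {
    if (maxWeight <= 0f) {
      return 0;
    }
    float clamped = Math.max(0f, Math.min(score, maxWeight));
    return Math.round(255f * clamped / maxWeight);
  }

  public static void main(String[] args) {
    double rare = idf(1000, 9);      // term found in 9 of 1000 documents
    double common = idf(1000, 499);  // term found in 499 of 1000 documents
    float maxWeight = (float) rare;  // the strongest term defines the top color
    System.out.println("rare term intensity:   " + intensity((float) rare, maxWeight));
    System.out.println("common term intensity: " + intensity((float) common, maxWeight));
  }
}
```

Clamping against the maximum weight keeps every term inside the gradient, so the top score always maps to the strongest color.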

+ */ +package org.apache.lucene.search.highlight; diff --git a/lucene/highlighter/src/java/org/apache/lucene/search/highlight/package.html b/lucene/highlighter/src/java/org/apache/lucene/search/highlight/package.html deleted file mode 100755 index b890f250946..00000000000 --- a/lucene/highlighter/src/java/org/apache/lucene/search/highlight/package.html +++ /dev/null @@ -1,99 +0,0 @@ - - - - - -The highlight package contains classes to provide "keyword in context" features -typically used to highlight search terms in the text of results pages. -The Highlighter class is the central component and can be used to extract the -most interesting sections of a piece of text and highlight them, with the help of -Fragmenter, fragment Scorer, and Formatter classes. - -

Example Usage

- -
-  //... Above, create documents with two fields, one with term vectors (tv) and one without (notv)
-  IndexSearcher searcher = new IndexSearcher(directory);
-  QueryParser parser = new QueryParser("notv", analyzer);
-  Query query = parser.parse("million");
-
-  TopDocs hits = searcher.search(query, 10);
-
-  SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
-  Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(query));
-  for (int i = 0; i < 10; i++) {
-    int id = hits.scoreDocs[i].doc;
-    Document doc = searcher.doc(id);
-    String text = doc.get("notv");
-    TokenStream tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), id, "notv", analyzer);
-    TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, text, false, 10);//highlighter.getBestFragments(tokenStream, text, 3, "...");
-    for (int j = 0; j < frag.length; j++) {
-      if ((frag[j] != null) && (frag[j].getScore() > 0)) {
-        System.out.println((frag[j].toString()));
-      }
-    }
-    //Term vector
-    text = doc.get("tv");
-    tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), hits.scoreDocs[i].doc, "tv", analyzer);
-    frag = highlighter.getBestTextFragments(tokenStream, text, false, 10);
-    for (int j = 0; j < frag.length; j++) {
-      if ((frag[j] != null) && (frag[j].getScore() > 0)) {
-        System.out.println((frag[j].toString()));
-      }
-    }
-    System.out.println("-------------");
-  }
-
- -

New features 06/02/2005

- -This release adds options for encoding (thanks to Nicko Cadell). -An "Encoder" implementation such as the new SimpleHTMLEncoder class can be passed to the highlighter to encode -all those non-xhtml standard characters such as & into legal values. This simple class may not suffice for -some languages - Commons Lang has an implementation that could be used: escapeHtml(String) in -http://svn.apache.org/viewcvs.cgi/jakarta/commons/proper/lang/trunk/src/java/org/apache/commons/lang/StringEscapeUtils.java?rev=137958&view=markup - -

New features 22/12/2004

- -This release adds some new capabilities: -
    -
  1. Faster highlighting using Term vector support
  2. -
  3. New formatting options to use color intensity to show informational value
  4. -
  5. Options for better summarization by using term IDF scores to influence fragment selection
  6. -
- -

-The highlighter takes a TokenStream as input. Until now these streams have typically been produced -using an Analyzer but the new class TokenSources provides helper methods for obtaining TokenStreams from -the new TermVector position support (see latest CVS version).

- -

The new class GradientFormatter can use a scale of colors to highlight terms according to their score. -A subtle use of color can help emphasise the reasons for matching (useful when doing "MoreLikeThis" queries and -you want to see what the basis of the similarities are).

- -

The QueryScorer class has a new constructor which can use an IndexReader to derive the IDF (inverse document frequency) -for each term in order to influence the score. This is useful for helping to extracting the most significant sections -of a document and in supplying scores used by the new GradientFormatter to color significant words more strongly. -The QueryScorer.getMaxWeight method is useful when passed to the GradientFormatter constructor to define the top score -which is associated with the top color.

- - - - - - \ No newline at end of file diff --git a/lucene/highlighter/src/java/org/apache/lucene/search/postingshighlight/package-info.java b/lucene/highlighter/src/java/org/apache/lucene/search/postingshighlight/package-info.java new file mode 100644 index 00000000000..10013c2cd6c --- /dev/null +++ b/lucene/highlighter/src/java/org/apache/lucene/search/postingshighlight/package-info.java @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Highlighter implementation that uses offsets from postings lists. + */ +package org.apache.lucene.search.postingshighlight; diff --git a/lucene/highlighter/src/java/org/apache/lucene/search/postingshighlight/package.html b/lucene/highlighter/src/java/org/apache/lucene/search/postingshighlight/package.html deleted file mode 100644 index d6b4663d77a..00000000000 --- a/lucene/highlighter/src/java/org/apache/lucene/search/postingshighlight/package.html +++ /dev/null @@ -1,22 +0,0 @@ - - - - -Highlighter implementation that uses offsets from postings lists. 
- - \ No newline at end of file diff --git a/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/package-info.java b/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/package-info.java new file mode 100644 index 00000000000..39d0f5d5441 --- /dev/null +++ b/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/package-info.java @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Another highlighter implementation based on term vectors. + * + *

Features

+ * + * + *

Algorithm

+ *

To explain the algorithm, let's use the following sample text + * (to be highlighted) and user query:

+ * + * + * + * + * + * + * + * + * + * + *
Sample Text | Lucene is a search engine library.
User Query | Lucene^2 OR "search library"~1
+ * + *

The user query is a BooleanQuery that consists of TermQuery("Lucene") + * with boost of 2 and PhraseQuery("search library") with slop of 1.

+ *

For your convenience, here are the offsets and positions of the + * sample text.

+ * + *
+ * +--------+-----------------------------------+
+ * |        |          1111111111222222222233333|
+ * |  offset|01234567890123456789012345678901234|
+ * +--------+-----------------------------------+
+ * |document|Lucene is a search engine library. |
+ * +--------*-----------------------------------+
+ * |position|0      1  2 3      4      5        |
+ * +--------*-----------------------------------+
+ * 
+ * + *

Step 1.

+ *

In Step 1, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldQuery.QueryPhraseMap} from the user query. + * QueryPhraseMap consists of the following members:

+ *
+ * public class QueryPhraseMap {
+ *   boolean terminal;
+ *   int slop;   // valid if terminal == true and phraseHighlight == true
+ *   float boost;  // valid if terminal == true
+ *   Map<String, QueryPhraseMap> subMap;
+ * } 
+ * 
+ *

QueryPhraseMap has subMap. The key of the subMap is a term + * text in the user query and the value is a subsequent QueryPhraseMap. + * If the query is a term (not phrase), then the subsequent QueryPhraseMap + * is marked as terminal. If the query is a phrase, then the subsequent QueryPhraseMap + * is not a terminal and it has the next term text in the phrase.

+ * + *

From the sample user query, the following QueryPhraseMap + * will be generated:

+ *
+ * QueryPhraseMap
+ * +--------+-+  +-------+-+
+ * |"Lucene"|o+->|boost=2|*|  * : terminal
+ * +--------+-+  +-------+-+
+ * 
+ * +--------+-+  +---------+-+  +-------+------+-+
+ * |"search"|o+->|"library"|o+->|boost=1|slop=1|*|
+ * +--------+-+  +---------+-+  +-------+------+-+
+ * 
+ * + *
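The nested-map structure described in Step 1 can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `PhraseMapSketch`, `add` and `lookup` are hypothetical names, not the Lucene `FieldQuery.QueryPhraseMap` API, and no query parsing or analysis is performed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for QueryPhraseMap; not the Lucene class. */
final class PhraseMapSketch {
  boolean terminal;
  int slop;     // meaningful only when terminal == true
  float boost;  // meaningful only when terminal == true
  final Map<String, PhraseMapSketch> subMap = new LinkedHashMap<>();

  /** Registers a single term (one-element list) or a phrase (several terms). */
  void add(List<String> terms, int slop, float boost) {
    PhraseMapSketch node = this;
    for (String term : terms) {
      // Each term text maps to the "subsequent" map for the next term.
      node = node.subMap.computeIfAbsent(term, t -> new PhraseMapSketch());
    }
    // Only the node reached by the last term is marked as terminal.
    node.terminal = true;
    node.slop = slop;
    node.boost = boost;
  }

  /** Follows the given terms from the root; returns null if the path is absent. */
  PhraseMapSketch lookup(List<String> terms) {
    PhraseMapSketch node = this;
    for (String term : terms) {
      node = node.subMap.get(term);
      if (node == null) {
        return null;
      }
    }
    return node;
  }

  public static void main(String[] args) {
    // Sample query: Lucene^2 OR "search library"~1
    PhraseMapSketch root = new PhraseMapSketch();
    root.add(List.of("Lucene"), 0, 2f);
    root.add(List.of("search", "library"), 1, 1f);
    System.out.println(root.lookup(List.of("Lucene")).terminal);
    System.out.println(root.lookup(List.of("search")).terminal);
  }
}
```

In this sketch the node reached through "search" alone is not terminal, while the node reached through "search" then "library" is terminal with slop 1 and boost 1, mirroring the diagram.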

Step 2.

+ *

In Step 2, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldTermStack}. Fast Vector Highlighter uses term vector data + * (must be stored {@link org.apache.lucene.document.FieldType#setStoreTermVectorOffsets(boolean)} and {@link org.apache.lucene.document.FieldType#setStoreTermVectorPositions(boolean)}) + * to generate it. FieldTermStack keeps the terms in the user query. + * Therefore, in this sample case, Fast Vector Highlighter generates the following FieldTermStack:

+ *
+ * FieldTermStack
+ * +------------------+
+ * |"Lucene"(0,6,0)   |
+ * +------------------+
+ * |"search"(12,18,3) |
+ * +------------------+
+ * |"library"(26,33,5)|
+ * +------------------+
+ * where : "termText"(startOffset,endOffset,position)
+ * 
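As a rough illustration of Step 2, the sketch below filters a list of term vector entries down to the terms that occur in the query and orders them by position. `TermStackSketch` and `TermInfo` here are hypothetical helper types, not the Lucene `FieldTermStack` classes, and the term vector is passed in as a plain list rather than read from an index. Terms are compared as-is, with no analysis.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for FieldTermStack; not the Lucene class. */
final class TermStackSketch {
  /** One entry: "termText"(startOffset,endOffset,position). */
  static final class TermInfo {
    final String text;
    final int startOffset;
    final int endOffset;
    final int position;

    TermInfo(String text, int startOffset, int endOffset, int position) {
      this.text = text;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
      this.position = position;
    }

    @Override
    public String toString() {
      return "\"" + text + "\"(" + startOffset + "," + endOffset + "," + position + ")";
    }
  }

  /** Keeps only the entries whose text is a query term, sorted by position. */
  static List<TermInfo> build(List<TermInfo> termVector, Set<String> queryTerms) {
    List<TermInfo> stack = new ArrayList<>();
    for (TermInfo info : termVector) {
      if (queryTerms.contains(info.text)) {
        stack.add(info);
      }
    }
    stack.sort(Comparator.comparingInt((TermInfo t) -> t.position));
    return stack;
  }

  public static void main(String[] args) {
    // Offsets and positions taken from the sample text table.
    List<TermInfo> termVector = List.of(
        new TermInfo("Lucene", 0, 6, 0),
        new TermInfo("is", 7, 9, 1),
        new TermInfo("a", 10, 11, 2),
        new TermInfo("search", 12, 18, 3),
        new TermInfo("engine", 19, 25, 4),
        new TermInfo("library", 26, 33, 5));
    System.out.println(build(termVector, Set.of("Lucene", "search", "library")));
  }
}
```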
+ *

Step 3.

+ *

In Step 3, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldPhraseList} + * by reference to QueryPhraseMap and FieldTermStack.

+ *
+ * FieldPhraseList
+ * +----------------+-----------------+---+
+ * |"Lucene"        |[(0,6)]          |w=2|
+ * +----------------+-----------------+---+
+ * |"search library"|[(12,18),(26,33)]|w=1|
+ * +----------------+-----------------+---+
+ * 
+ *

Each entry is a WeightedPhraseInfo, which consists of + * an array of term offsets and a weight. + *

+ *
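To make Step 3 more concrete, here is a deliberately simplified phrase matcher over a term stack. It is a sketch only: `PhraseMatchSketch` and its `Term` type are hypothetical, and the slop check (number of skipped positions between two consecutive phrase terms) is an assumption made for illustration, not the matching logic actually used by `FieldPhraseList`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified phrase matching; not the Lucene implementation. */
final class PhraseMatchSketch {
  static final class Term {
    final String text;
    final int start;
    final int end;
    final int position;

    Term(String text, int start, int end, int position) {
      this.text = text;
      this.start = start;
      this.end = end;
      this.position = position;
    }
  }

  /**
   * Returns the {start, end} offsets of the first occurrence of the phrase in a
   * position-sorted stack, or an empty list if there is none. Assumption: the gap
   * between two consecutive phrase terms may skip at most `slop` positions.
   */
  static List<int[]> match(List<Term> stack, List<String> phrase, int slop) {
    for (int i = 0; i < stack.size(); i++) {
      Term first = stack.get(i);
      if (!first.text.equals(phrase.get(0))) {
        continue;
      }
      List<int[]> offsets = new ArrayList<>();
      offsets.add(new int[] {first.start, first.end});
      int prevPosition = first.position;
      int next = 1; // index of the next phrase term to find
      for (int j = i + 1; j < stack.size() && next < phrase.size(); j++) {
        Term candidate = stack.get(j);
        if (candidate.position - prevPosition - 1 > slop) {
          break; // too far away; later entries are even further
        }
        if (candidate.text.equals(phrase.get(next))) {
          offsets.add(new int[] {candidate.start, candidate.end});
          prevPosition = candidate.position;
          next++;
        }
      }
      if (next == phrase.size()) {
        return offsets;
      }
    }
    return new ArrayList<>();
  }

  public static void main(String[] args) {
    List<Term> stack = List.of(
        new Term("Lucene", 0, 6, 0),
        new Term("search", 12, 18, 3),
        new Term("library", 26, 33, 5));
    List<int[]> hit = match(stack, List.of("search", "library"), 1);
    for (int[] o : hit) {
      System.out.println("(" + o[0] + "," + o[1] + ")");
    }
  }
}
```

With the sample stack, "search" (position 3) and "library" (position 5) are one position apart, so the phrase matches with slop 1 and would not match with slop 0 in this sketch.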

Step 4.

+ *

In Step 4, Fast Vector Highlighter creates FieldFragList by reference to + * FieldPhraseList. In this sample case, the following + * FieldFragList will be generated:

+ *
+ * FieldFragList
+ * +---------------------------------+
+ * |"Lucene"[(0,6)]                  |
+ * |"search library"[(12,18),(26,33)]|
+ * |totalBoost=3                     |
+ * +---------------------------------+
+ * 
+ * + *

+ * The calculation for each FieldFragList.WeightedFragInfo.totalBoost (weight) + * depends on the implementation of FieldFragList.add( ... ): + *

+ *   public void add( int startOffset, int endOffset, List<WeightedPhraseInfo> phraseInfoList ) {
+ *     float totalBoost = 0;
+ *     List<SubInfo> subInfos = new ArrayList<SubInfo>();
+ *     for( WeightedPhraseInfo phraseInfo : phraseInfoList ){
+ *       subInfos.add( new SubInfo( phraseInfo.getText(), phraseInfo.getTermsOffsets(), phraseInfo.getSeqnum() ) );
+ *       totalBoost += phraseInfo.getBoost();
+ *     }
+ *     getFragInfos().add( new WeightedFragInfo( startOffset, endOffset, subInfos, totalBoost ) );
+ *   }
+ *   
+ * 
+ * The FieldFragList implementation in use is set in BaseFragListBuilder.createFieldFragList( ... ): + *
+ *   public FieldFragList createFieldFragList( FieldPhraseList fieldPhraseList, int fragCharSize ){
+ *     return createFieldFragList( fieldPhraseList, new SimpleFieldFragList( fragCharSize ), fragCharSize );
+ *   }
+ * 
+ *

+ * Currently there are basically two approaches available: + *

+ * + *

Comparison of the two approaches:

+ * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + * + *
+ * query = das alte testament (The Old Testament) + *
Terms in fragment | sum-of-distinct-weights | sum-of-boosts
das alte testament | 5.339621 | 3.0
das alte testament | 5.339621 | 3.0
das testament alte | 5.339621 | 3.0
das alte testament | 5.339621 | 3.0
das testament | 2.9455688 | 2.0
das alte | 2.4759595 | 2.0
das das das das | 1.5015357 | 4.0
das das das | 1.3003681 | 3.0
das das | 1.061746 | 2.0
alte | 1.0 | 1.0
alte | 1.0 | 1.0
das | 0.7507678 | 1.0
das | 0.7507678 | 1.0
das | 0.7507678 | 1.0
das | 0.7507678 | 1.0
das | 0.7507678 | 1.0
+ * + *
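The sum-of-boosts column can be reproduced with a very small sketch that mirrors the accumulation loop in the add( ... ) method shown earlier. `SumOfBoostsSketch` is a hypothetical helper, and it assumes every matched term or phrase carries its query boost (1.0 unless boosted). The sum-of-distinct-weights column depends on term weights that are not given in this section, so it is not reproduced here.

```java
import java.util.List;

/** Hypothetical helper mirroring the totalBoost accumulation; not a Lucene class. */
final class SumOfBoostsSketch {
  /** Adds up the boost of every matched phrase in a fragment. */
  static float totalBoost(List<Float> phraseBoosts) {
    float total = 0f;
    for (float boost : phraseBoosts) {
      total += boost;
    }
    return total;
  }

  public static void main(String[] args) {
    // Sample fragment from Step 4: "Lucene" (boost 2) and "search library" (boost 1).
    System.out.println(totalBoost(List.of(2f, 1f)));
    // A fragment containing "das das das das", each occurrence with the default boost.
    System.out.println(totalBoost(List.of(1f, 1f, 1f, 1f)));
  }
}
```

Under these assumptions a fragment that repeats one term four times scores 4.0, which is higher than a fragment containing all three distinct query terms (3.0). The comparison table shows the same effect.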

Step 5.

+ *

In Step 5, by using FieldFragList and the field stored data, + * Fast Vector Highlighter creates highlighted snippets!

+ */ +package org.apache.lucene.search.vectorhighlight; diff --git a/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/package.html b/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/package.html deleted file mode 100644 index f8f17414f8c..00000000000 --- a/lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/package.html +++ /dev/null @@ -1,196 +0,0 @@ - - - - -This is an another highlighter implementation. - -

Features

- - -

Algorithm

-

To explain the algorithm, let's use the following sample text - (to be highlighted) and user query:

- - - - - - - - - - -
Sample Text | Lucene is a search engine library.
User Query | Lucene^2 OR "search library"~1
- -

The user query is a BooleanQuery that consists of TermQuery("Lucene") -with boost of 2 and PhraseQuery("search library") with slop of 1.

-

For your convenience, here is the offsets and positions info of the -sample text.

- -
-+--------+-----------------------------------+
-|        |          1111111111222222222233333|
-|  offset|01234567890123456789012345678901234|
-+--------+-----------------------------------+
-|document|Lucene is a search engine library. |
-+--------*-----------------------------------+
-|position|0      1  2 3      4      5        |
-+--------*-----------------------------------+
-
- -

Step 1.

-

In Step 1, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldQuery.QueryPhraseMap} from the user query. -QueryPhraseMap consists of the following members:

-
-public class QueryPhraseMap {
-  boolean terminal;
-  int slop;   // valid if terminal == true and phraseHighlight == true
-  float boost;  // valid if terminal == true
-  Map<String, QueryPhraseMap> subMap;
-} 
-
-

QueryPhraseMap has subMap. The key of the subMap is a term -text in the user query and the value is a subsequent QueryPhraseMap. -If the query is a term (not phrase), then the subsequent QueryPhraseMap -is marked as terminal. If the query is a phrase, then the subsequent QueryPhraseMap -is not a terminal and it has the next term text in the phrase.

- -

From the sample user query, the following QueryPhraseMap -will be generated:

-
-   QueryPhraseMap
-+--------+-+  +-------+-+
-|"Lucene"|o+->|boost=2|*|  * : terminal
-+--------+-+  +-------+-+
-
-+--------+-+  +---------+-+  +-------+------+-+
-|"search"|o+->|"library"|o+->|boost=1|slop=1|*|
-+--------+-+  +---------+-+  +-------+------+-+
-
- -

Step 2.

-

In Step 2, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldTermStack}. Fast Vector Highlighter uses term vector data -(must be stored {@link org.apache.lucene.document.FieldType#setStoreTermVectorOffsets(boolean)} and {@link org.apache.lucene.document.FieldType#setStoreTermVectorPositions(boolean)}) -to generate it. FieldTermStack keeps the terms in the user query. -Therefore, in this sample case, Fast Vector Highlighter generates the following FieldTermStack:

-
-   FieldTermStack
-+------------------+
-|"Lucene"(0,6,0)   |
-+------------------+
-|"search"(12,18,3) |
-+------------------+
-|"library"(26,33,5)|
-+------------------+
-where : "termText"(startOffset,endOffset,position)
-
-

Step 3.

-

In Step 3, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldPhraseList} -by reference to QueryPhraseMap and FieldTermStack.

-
-   FieldPhraseList
-+----------------+-----------------+---+
-|"Lucene"        |[(0,6)]          |w=2|
-+----------------+-----------------+---+
-|"search library"|[(12,18),(26,33)]|w=1|
-+----------------+-----------------+---+
-
-

The type of each entry is WeightedPhraseInfo that consists of -an array of terms offsets and weight. -

-

Step 4.

-

In Step 4, Fast Vector Highlighter creates FieldFragList by reference to -FieldPhraseList. In this sample case, the following -FieldFragList will be generated:

-
-   FieldFragList
-+---------------------------------+
-|"Lucene"[(0,6)]                  |
-|"search library"[(12,18),(26,33)]|
-|totalBoost=3                     |
-+---------------------------------+
-
- -

-The calculation for each FieldFragList.WeightedFragInfo.totalBoost (weight) -depends on the implementation of FieldFragList.add( ... ): -

-  public void add( int startOffset, int endOffset, List<WeightedPhraseInfo> phraseInfoList ) {
-    float totalBoost = 0;
-    List<SubInfo> subInfos = new ArrayList<SubInfo>();
-    for( WeightedPhraseInfo phraseInfo : phraseInfoList ){
-      subInfos.add( new SubInfo( phraseInfo.getText(), phraseInfo.getTermsOffsets(), phraseInfo.getSeqnum() ) );
-      totalBoost += phraseInfo.getBoost();
-    }
-    getFragInfos().add( new WeightedFragInfo( startOffset, endOffset, subInfos, totalBoost ) );
-  }
-  
-
-The used implementation of FieldFragList is noted in BaseFragListBuilder.createFieldFragList( ... ): -
-  public FieldFragList createFieldFragList( FieldPhraseList fieldPhraseList, int fragCharSize ){
-    return createFieldFragList( fieldPhraseList, new SimpleFieldFragList( fragCharSize ), fragCharSize );
-  }
-
-

-Currently there are basically to approaches available: -

- -

Comparison of the two approaches:

- - - - - - - - - - - - - - - - - - - -
- query = das alte testament (The Old Testament) -
Terms in fragment | sum-of-distinct-weights | sum-of-boosts
das alte testament | 5.339621 | 3.0
das alte testament | 5.339621 | 3.0
das testament alte | 5.339621 | 3.0
das alte testament | 5.339621 | 3.0
das testament | 2.9455688 | 2.0
das alte | 2.4759595 | 2.0
das das das das | 1.5015357 | 4.0
das das das | 1.3003681 | 3.0
das das | 1.061746 | 2.0
alte | 1.0 | 1.0
alte | 1.0 | 1.0
das | 0.7507678 | 1.0
das | 0.7507678 | 1.0
das | 0.7507678 | 1.0
das | 0.7507678 | 1.0
das | 0.7507678 | 1.0
- -

Step 5.

-

In Step 5, by using FieldFragList and the field stored data, -Fast Vector Highlighter creates highlighted snippets!

- -