* Remove hppc dependency
* Change fork version to 0.10.0
* Add @lucene.internal
* Move hppc classes to oal.internal.hppc but export it.
* Delete hppc license since it's no longer a dependency.
---------
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
* Skip document by docValues
* When the queue is full and there is only one Comparator, we can tune maxValueAsBytes/minValueAsBytes more tightly. For instance, if the sort is ascending and the bottom value is 5, we will use the range [MIN_VALUE, 4].
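
To make the range arithmetic concrete, here is a minimal sketch, not Lucene's actual comparator code. `CompetitiveRange` and `strictlyBetterThan` are hypothetical names. It assumes long-valued sort keys and that a value equal to the bottom cannot displace it, which is the single-comparator case.

```java
final class CompetitiveRange {
    /**
     * Returns the inclusive {min, max} range of values that can still beat the
     * bottom of a full queue, or null when no value can.
     * reverse == false means an ascending sort (smaller values win).
     */
    static long[] strictlyBetterThan(long bottom, boolean reverse) {
        if (!reverse) {
            // Ascending: only values strictly below the bottom are competitive.
            if (bottom == Long.MIN_VALUE) {
                return null;
            }
            return new long[] {Long.MIN_VALUE, bottom - 1};
        }
        // Descending: only values strictly above the bottom are competitive.
        if (bottom == Long.MAX_VALUE) {
            return null;
        }
        return new long[] {bottom + 1, Long.MAX_VALUE};
    }

    public static void main(String[] args) {
        long[] range = strictlyBetterThan(5L, false);
        // Prints [-9223372036854775808, 4]
        System.out.println("[" + range[0] + ", " + range[1] + "]");
    }
}
```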
---------
Co-authored-by: Adrien Grand <jpountz@gmail.com>
This adds `LeafCollector#finish` as a per-segment post-collection hook. While
it was already possible to do this sort of thing on top of the collector API
before, a downside is that the last leaf would need to be post-collected in the
current thread instead of using the executor, which is a missed opportunity for
making queries concurrent.
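
The benefit of a per-segment hook can be illustrated with a simplified sketch. `SegmentCollector` below is a hypothetical stand-in for `LeafCollector`, not the Lucene interface, and segments are modelled as plain `int[]` arrays of doc IDs. The point is only that `finish()` runs inside the per-segment task, so the last segment's post-collection work also happens on the executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerSegmentFinishSketch {
    /** Hypothetical stand-in for LeafCollector. */
    interface SegmentCollector {
        void collect(int doc);

        void finish(); // per-segment post-collection hook
    }

    /** Buffers docs during collection and does its post-processing in finish(). */
    static final class CountingCollector implements SegmentCollector {
        private final List<Integer> buffered = new ArrayList<>();
        int result = -1;

        @Override
        public void collect(int doc) {
            buffered.add(doc);
        }

        @Override
        public void finish() {
            result = buffered.size();
        }
    }

    /** Collects every segment on the executor and sums the per-segment results. */
    static int search(int[][] segments, ExecutorService executor) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int[] segment : segments) {
            futures.add(executor.submit(() -> {
                CountingCollector collector = new CountingCollector();
                for (int doc : segment) {
                    collector.collect(doc);
                }
                collector.finish(); // runs on the worker thread for every segment
                return collector.result;
            }));
        }
        int total = 0;
        try {
            for (Future<Integer> future : futures) {
                total += future.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            // Prints 5
            System.out.println(search(new int[][] {{0, 1, 2}, {0, 1}}, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```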
Add new stored fields and term vectors interfaces: IndexReader.storedFields()
and IndexReader.termVectors(). Deprecate IndexReader.document() and IndexReader.getTermVector().
The new APIs do not rely upon ThreadLocal storage for each index segment, which can greatly
reduce RAM requirements when there are many threads and/or segments.
Co-authored-by: Adrien Grand <jpountz@gmail.com>
The sort position parameter in SortField.getComparator() is only ever used
to determine whether or not skipping should be enabled on a given comparator,
so the parameter name should reflect that. This commit also explicitly disables
skipping in a number of cases where it is never used, in particular CheckIndex
and the grouping collectors.
Require consistency between data-structures on a per-field basis
A field must be indexed with the same index options and data-structures across
all documents. Thus, for example, it is not allowed to have one document
where a certain field is indexed with doc values and points, and another document
where the same field is indexed only with points.
But it is allowed for a document not to have a certain field at all.
As a consequence of this, doc values updates are
only applicable for fields that are indexed with doc values only.
Detects common cases of unreachable/dead code.
For generated javacc code, the check is disabled via
SuppressWarnings("unused") because javacc generates strange/bad code such as:
if ("" == null)
For TestStressNRTReplication's startNode() method, the check is also
disabled because analysis folds the "test evilness controls" which are
static final constants. This itself is a WTF: shouldn't we instead
randomize these evil things in our tests rather than hardcoding them to
specific values?
* Enable ecj unused local variable, private instance and method detection.
* Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests).
* Fix gradlew regenerate for python 3.9.
* Add SuppressWarnings("unused") for generated javacc and jflex code.
* Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types.
Co-authored-by: Mike McCandless <mikemccand@apache.org>
SortedDocValues do not have a per-document binary value, they have a
per-document numeric `ordValue()`. The ordinal can then be dereferenced
to its binary form with `lookupOrd()`, but it was a performance trap to
implement a `binaryValue()` on the SortedDocValues api that does this
behind-the-scenes on every document.
You can replace calls of `binaryValue()` with `lookupOrd(ordValue())`
as a "quick fix", but it is better to use the ordinal alone
(integer-based datastructures) for per-document access, and only call
lookupOrd() a few times at the end (e.g. for the hits you want to display).
Otherwise, if you really don't want per-document ordinals, but instead a
per-document `byte[]`, use a BinaryDocValues field.
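
As a rough illustration of the ordinal-first pattern, here is a self-contained sketch that does not use the Lucene API. The `ordPerDoc` array stands in for per-document `ordValue()` results and `termsByOrd` stands in for what `lookupOrd()` would return; both names, and the class itself, are hypothetical. Per-document work touches only integers, and the term is resolved once at the end.

```java
final class OrdinalCountingSketch {
    /**
     * Counts documents per ordinal and resolves only the winning ordinal to
     * its term. Assumes termsByOrd is non-empty and every ordinal is in range.
     */
    static String mostFrequentTerm(int[] ordPerDoc, String[] termsByOrd) {
        int[] counts = new int[termsByOrd.length];
        for (int ord : ordPerDoc) {
            counts[ord]++; // per-document work uses the int ordinal only
        }
        int best = 0;
        for (int ord = 1; ord < counts.length; ord++) {
            if (counts[ord] > counts[best]) {
                best = ord;
            }
        }
        return termsByOrd[best]; // the single dereference happens here
    }

    public static void main(String[] args) {
        int[] ordPerDoc = {0, 2, 2, 1, 2};
        String[] termsByOrd = {"a", "b", "c"};
        // Prints c
        System.out.println(mostFrequentTerm(ordPerDoc, termsByOrd));
    }
}
```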
This change only addresses the API (slow `binaryValue()` trap), but
doesn't yet fix any slow algorithms that were discovered in the process,
so it doesn't yield any performance improvements.
This has the same logic as the previous Python checker, but no longer relies
upon parsing HTML output, instead using Java's doclet processor.
The errors are reported like "normal" javadoc errors with source file
name and line number, and happen when running "gradlew javadoc".
Although the "rules" are the same as in the previous Python checker, it had
some bugs where it didn't quite do exactly what we wanted, so
some fixes were applied throughout.
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Co-authored-by: Uwe Schindler <uschindler@apache.org>
The grouping module currently allows grouping on a SortedDocValues field, or on
a ValueSource. The latter groups only on exact values, and so will not perform well
on numeric-valued fields. This commit adds the ability to group by defined ranges
from a Long or DoubleValuesSource.
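
The idea of grouping by range rather than by exact value can be sketched as follows. This is not the grouping module's implementation: `RangeGroupingSketch`, `bucketStart` and `groupCounts` are hypothetical names, buckets are assumed to be fixed-width starting at `min`, and arithmetic overflow near the limits of `long` is ignored.

```java
import java.util.Map;
import java.util.TreeMap;

final class RangeGroupingSketch {
    /**
     * Returns the inclusive lower bound of the fixed-width bucket
     * [min + k * width, min + (k + 1) * width) that contains value.
     */
    static long bucketStart(long value, long min, long width) {
        return min + Math.floorDiv(value - min, width) * width;
    }

    /** Counts how many values fall into each bucket, keyed by bucket start. */
    static Map<Long, Integer> groupCounts(long[] values, long min, long width) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long value : values) {
            counts.merge(bucketStart(value, min, width), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {0=2, 10=3}
        System.out.println(groupCounts(new long[] {3, 7, 12, 15, 19}, 0L, 10L));
    }
}
```

With exact-value grouping the five values above would form five groups; with ranges of width 10 they collapse into two.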
The grouping module tests currently all try to test both grouping by term and
grouping by ValueSource. They are quite difficult to follow, however, and it is not
at all easy to add tests for a new grouping type. This commit adds a new
BaseGroupSelectorTestCase class which can be extended to test particular
GroupSelector implementations, and adds tests for TermGroupSelector and
ValueSourceGroupSelector. It also adds a separate test for Block grouping,
so that the distinct grouping types are tested separately.