
35097 Commits

Author SHA1 Message Date
Dawid Weiss f8040c0ecf LUCENE-9650: errorprone plugin doesn't work on jdk16. A different workaround that keeps the dependency. 2021-03-15 10:19:27 +01:00
Peter Gromov cdff0accaa
Hunspell suggestions: speed up for some non-Latin scripts (#19) 2021-03-15 05:02:45 -04:00
Peter Gromov 8913a98379
LUCENE-9830: Hunspell: store word length for faster dictionary lookup/enumeration (#3) 2021-03-15 00:35:25 -04:00
Peter Gromov 42c6f780bf
LUCENE-9831: Hunspell GeneratingSuggester: faster flag & case checks, fewer allocations (#4) 2021-03-15 00:32:08 -04:00
Robert Muir d48193e8cf
LUCENE-9837: try to improve performance of VectorUtil.dotProduct (#17)
More loop unrolling for VectorUtil.dotProduct to eke out a bit more short-term performance.
2021-03-14 23:16:08 -04:00
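For readers unfamiliar with the technique named in LUCENE-9837, here is a minimal sketch of loop unrolling applied to a dot product. It is not the code from `VectorUtil`: the class name `DotProductSketch`, both method names, and the unroll factor of 4 are assumptions chosen for illustration. Because the unrolled version sums in a different order, its result can differ from the simple loop by floating-point rounding on general inputs.

```java
public final class DotProductSketch {
    private DotProductSketch() {}

    /** Reference version: one multiply-add per iteration. */
    public static float dotProductSimple(float[] a, float[] b) {
        checkSameLength(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Hypothetical 4-way unrolled version. Four independent accumulators
     * let the CPU overlap the multiply-adds instead of waiting on one sum.
     */
    public static float dotProductUnrolled(float[] a, float[] b) {
        checkSameLength(a, b);
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        int limit = a.length & ~3; // largest multiple of 4 <= length
        int i = 0;
        for (; i < limit; i += 4) {
            s0 += a[i] * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        float sum = (s0 + s1) + (s2 + s3);
        // Tail loop handles the 0..3 elements left over.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    private static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "vector lengths differ: " + a.length + " != " + b.length);
        }
    }
}
```

Whether unrolling by hand helps at all depends on the JIT and the hardware, so any such change should be measured with a benchmark rather than assumed.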
Robert Muir f3a284ad83
LUCENE-9796: Fix SortedDocValues to no longer extend BinaryDocValues
SortedDocValues do not have a per-document binary value, they have a
per-document numeric `ordValue()`. The ordinal can then be dereferenced
to its binary form with `lookupOrd()`, but it was a performance trap to
implement a `binaryValue()` on the SortedDocValues api that does this
behind-the-scenes on every document.

You can replace calls of `binaryValue()` with `lookupOrd(ordValue())`
as a "quick fix", but it is better to use the ordinal alone
(integer-based datastructures) for per-document access, and only call
lookupOrd() a few times at the end (e.g. for the hits you want to display).
Otherwise, if you really don't want per-document ordinals, but instead a
per-document `byte[]`, use a BinaryDocValues field.

This change only addresses the API (slow `binaryValue()` trap), but
doesn't yet fix any slow algorithms that were discovered in the process,
so it doesn't yield any performance improvements.
2021-03-14 23:07:48 -04:00
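The advice in the LUCENE-9796 message (work with ordinals per document, and dereference to bytes only at the end) can be sketched as follows. This is a toy model, not the Lucene API: `ToySortedValues` is a hypothetical stand-in that takes a document id as an argument, whereas the message only names `ordValue()` and `lookupOrd()` without showing signatures. The `lookups` counter exists only to show how rarely the dictionary is touched.

```java
import java.nio.charset.StandardCharsets;

public final class OrdinalFacetSketch {
    private OrdinalFacetSketch() {}

    /** Hypothetical stand-in for a sorted doc-values field; NOT the Lucene class. */
    public static final class ToySortedValues {
        private final byte[][] dictionary; // sorted unique values; index = ordinal
        private final int[] docToOrd;      // one ordinal per document
        public int lookups = 0;            // counts dictionary dereferences

        public ToySortedValues(String[] sortedValues, int[] docToOrd) {
            this.dictionary = new byte[sortedValues.length][];
            for (int i = 0; i < sortedValues.length; i++) {
                this.dictionary[i] = sortedValues[i].getBytes(StandardCharsets.UTF_8);
            }
            this.docToOrd = docToOrd.clone();
        }

        public int docCount() { return docToOrd.length; }
        public int valueCount() { return dictionary.length; }
        public int ordValue(int doc) { return docToOrd[doc]; }
        public byte[] lookupOrd(int ord) { lookups++; return dictionary[ord]; }
    }

    /** Counts by integer ordinal, then dereferences only the winning ordinal. */
    public static String mostFrequentValue(ToySortedValues values) {
        if (values.valueCount() == 0) {
            throw new IllegalArgumentException("no values");
        }
        int[] counts = new int[values.valueCount()];
        for (int doc = 0; doc < values.docCount(); doc++) {
            counts[values.ordValue(doc)]++; // integer-only per-document work
        }
        int best = 0;
        for (int ord = 1; ord < counts.length; ord++) {
            if (counts[ord] > counts[best]) {
                best = ord;
            }
        }
        // A single lookup at the end, instead of one per document.
        return new String(values.lookupOrd(best), StandardCharsets.UTF_8);
    }
}
```

With five documents and three distinct values, the per-document loop never touches the dictionary; the "quick fix" of calling `lookupOrd(ordValue())` for every document would have performed five lookups instead of one.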
Tomoko Uchida 471f38c031 LUCENE-9834: goodbye old friend - the classic luke logo 2021-03-12 23:17:32 +09:00
Dawid Weiss 4f5389bfa8 Flush output on javadoc emitting a failure. 2021-03-12 11:39:40 +01:00
Tomoko Uchida 7478b3fc17 LUCENE-9834: Adjust logo/colors in the Luke About dialog 2021-03-12 11:00:10 +09:00
Dawid Weiss 8bbcc39583 Always include errorprone dependency, even if we're not checking. This ensures consistent use patterns across JVMs. 2021-03-11 22:27:25 +01:00
Peter Gromov e784721e69
LUCENE-9833: Hunspell: AssertionError in WordStorage.lookupWord (#13) 2021-03-11 10:10:57 -05:00
Peter Gromov efa88a1790
LUCENE-9832: Hunspell: SIOOBE in GeneratingSuggester.expandRoot (#12) 2021-03-11 10:09:11 -05:00
Robert Muir 1b36406ec4
LUCENE-9827: Speed up merging of stored fields and term vectors for small segments
Stored Fields and Term Vectors are block-compressed. Decompressing and
recompressing all the documents on every merge is too slow, so we try to
avoid doing it unless it will actually improve the compression ratio. If
we can get away with it, we just bulk-copy existing compressed blocks to
the new segment.

Previously, small segments would always be considered dirty and
recompressed... the special optimized bulk merge wouldn't kick in until
segments were relatively large. But as block size and ratio (shared
dictionaries etc) have increased, "relatively large" has become a much
bigger number.

So try to avoid doing wasted work: if there's only 1 dirty chunk
(incompletely filled compression block), then don't recompress: it will
likely only give us 1 dirty chunk as a result, at the expense of cpu.

Require at least 2 dirty chunks to recompress: this way the
recompression actually buys us something (reduces 2 to 1).

The change also means that bulk merge will now happen often in
the unit test suite, increasing coverage.
2021-03-11 09:30:40 -05:00
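The rule stated in the LUCENE-9827 message (bulk-copy unless at least 2 dirty chunks exist) can be written down as a small decision function. This sketch models only that one sentence; the names `DirtyChunkSketch`, `MergeStrategy`, and `choose` are hypothetical, and the actual Lucene condition may combine this with other thresholds that the message does not describe.

```java
public final class DirtyChunkSketch {
    private DirtyChunkSketch() {}

    /** The two merge paths the commit message distinguishes. */
    public enum MergeStrategy {
        BULK_COPY,   // copy existing compressed blocks as-is
        RECOMPRESS   // decompress and recompress all documents
    }

    /** Assumed threshold, taken from "Require at least 2 dirty chunks". */
    public static final int MIN_DIRTY_CHUNKS_TO_RECOMPRESS = 2;

    /**
     * Recompressing a segment with a single dirty (incompletely filled)
     * chunk would likely leave one dirty chunk again, so it only costs CPU.
     * With two or more, recompression can reduce the dirty count.
     */
    public static MergeStrategy choose(int numDirtyChunks) {
        if (numDirtyChunks < 0) {
            throw new IllegalArgumentException("negative chunk count: " + numDirtyChunks);
        }
        return numDirtyChunks >= MIN_DIRTY_CHUNKS_TO_RECOMPRESS
            ? MergeStrategy.RECOMPRESS
            : MergeStrategy.BULK_COPY;
    }
}
```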
Mike McCandless 12999d30f2 LUCENE-9791: add CHANGES.txt entry 2021-03-11 08:09:23 -05:00
pawel-bugalski-dynatrace 6367cd1b74
LUCENE-9791 Allow calling BytesRefHash#find concurrently (#8)
Removes `scratch1` field in `BytesRefHash` by accessing underlying bytes pool directly
in `equals` method. As a result it is now possible to call `BytesRefHash#find`
concurrently as long as there are no concurrent modifications to BytesRefHash instance
and it is correctly published.

This addresses the concurrency issue with Monitor (aka Luwak) since it
is using `BytesRefHash#find` concurrently without additional synchronization.
2021-03-11 08:06:03 -05:00
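The idea behind LUCENE-9791 (compare against the pooled bytes directly rather than copying into a shared scratch field) can be illustrated with a toy structure. This is not `BytesRefHash`: the class `ScratchFreeFindSketch` is hypothetical and uses a linear scan instead of hashing to keep the example short. The point it shows is that `find` reads only final, never-modified state, so concurrent calls do not interfere as long as nothing modifies the instance and it has been safely published.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class ScratchFreeFindSketch {
    private final byte[] pool;   // all stored values, concatenated
    private final int[] offsets; // value i occupies pool[offsets[i] .. offsets[i + 1])

    public ScratchFreeFindSketch(String... values) {
        offsets = new int[values.length + 1];
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            offsets[i + 1] = offsets[i] + encoded[i].length;
        }
        pool = new byte[offsets[values.length]];
        for (int i = 0; i < values.length; i++) {
            System.arraycopy(encoded[i], 0, pool, offsets[i], encoded[i].length);
        }
    }

    /**
     * Returns the id of the stored value equal to key, or -1 if absent.
     * The comparison uses a range-based Arrays.equals on the pool itself,
     * so no shared mutable scratch buffer is written during the call.
     */
    public int find(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int id = 0; id < offsets.length - 1; id++) {
            if (Arrays.equals(pool, offsets[id], offsets[id + 1], k, 0, k.length)) {
                return id;
            }
        }
        return -1;
    }
}
```

A version that first copied each candidate into a single instance-level scratch buffer would return the same answers single-threaded, but two threads calling `find` at once could overwrite each other's scratch contents.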
Uwe Schindler dcb52acd7d Fix badge URL after Jenkins job rename 2021-03-10 23:52:29 +01:00
Dawid Weiss 7f5e660395
LUCENE-9375: some build file cleanups. (#10) 2021-03-10 21:47:37 +01:00
Robert Muir 2892ef4ca0
LUCENE-9802: switch to new logo (#9)
Replace logo used in generated documentation
Replace logo used by luke (about box)
Add logo to README.md
2021-03-10 15:28:52 -05:00
Uwe Schindler 5ade66059f
Cleanup readme file, doaps and copy build instructions from lucene subfolder (#6) 2021-03-10 16:10:06 +01:00
Dawid Weiss ee4871f24a make gradlew mavenToLocalRepo work. 2021-03-10 13:03:48 +01:00
Dawid Weiss 2b9f5bb537 titles for github 2021-03-10 12:51:06 +01:00
Dawid Weiss f5012a4cda
Merge pull request #2 from dweiss/revTrie
LUCENE-9825: Hunspell: reverse the "words" trie for faster word lookup/suggestions (lucene repo)
2021-03-10 12:30:03 +01:00
Dawid Weiss 44833dc575
Merge pull request #1 from dweiss/LUCENE-9375
LUCENE-9375: check github actions (merge PR).
2021-03-10 11:59:13 +01:00
Dawid Weiss 4ab4ab1e67 LUCENE-9375: check gh actions 2021-03-10 11:33:47 +01:00
Dawid Weiss fdf486ba54 LUCENE-9375: post-repo-split removal of solr counterpart. 2021-03-10 11:20:08 +01:00
Andrzej Bialecki 7ada403218 SOLR-14749: Make sure the plugin config is reloaded on Overseer. 2021-03-09 16:58:29 +01:00
Ignacio Vera 578b2aea8f
LUCENE-9580: Fix bug in the polygon tessellator when introducing collinear edges during polygon splitting (#2452) 2021-03-09 08:50:58 +01:00
Dawid Weiss 8969225bd2 LUCENE-8626: correct test suite name. 2021-03-09 08:33:05 +01:00
Ignacio Vera 144ef2a0c0
LUCENE-9705: Create Lucene90StoredFieldsFormat (#2444) 2021-03-09 08:11:59 +01:00
David Smiley cf1025e576
SOLR-2852: SolrJ: remove Woodstox dependency (#2461)
It was never truly required there.
Pervasive use of "javabin" reduces the need to care about client-side XML speed.  Better to reduce dependencies and let clients use the libs they want.
2021-03-09 00:27:03 -05:00
Christine Poerschke 419db23041
LUCENE-8626: enforce name standardisation for org.apache.lucene tests (#2441)
Co-authored-by: Dawid Weiss <dweiss@apache.org>
2021-03-08 15:30:59 +00:00
Jan Høydahl 605d3a00bb
SOLR-15163 Update DOAP file for solr TLP (#2464) 2021-03-08 15:48:17 +01:00
Dawid Weiss b591daad38 SOLR-14759: correct build logic. 2021-03-08 15:04:20 +01:00
Dawid Weiss 409bc37c13
SOLR-14759: a few initial changes so that Lucene can be built independently while Solr code is still in place. (#2448) 2021-03-08 14:59:08 +01:00
Mike Drob 408b3775dd
SOLR-14759 fix tests that depend on lucene test-src (#2462)
Rewrite one, ignore the other two.
2021-03-08 14:32:40 +01:00
Christine Poerschke d53b3da0ea
LUCENE-8626: standardise 3 more Lucene test names (#2440) 2021-03-08 13:21:52 +00:00
Peter Gromov 4959886c25 fix minor review comments 2021-03-08 11:10:23 +01:00
Peter Gromov f9cd8e5c80 fix processAllWords, add a test 2021-03-08 11:09:24 +01:00
Peter Gromov e28b50bae8 add/fix WordStorage comments 2021-03-08 11:09:19 +01:00
Peter Gromov e69390b268 don't lookup empty stems after stripping the whole word 2021-03-08 11:09:12 +01:00
Peter Gromov 469cfc67d4 fix lookupWord false positive 2021-03-08 11:09:05 +01:00
Joel Bernstein e9ddaaca51 SOLR-15193: Fix typo 2021-03-07 20:37:07 -05:00
Joel Bernstein 140c37eb0f SOLR-15193: Improve maxDocFreq docs 2021-03-07 20:31:21 -05:00
Tomoko Uchida 606cea94d7 LUCENE-9322: trivial fix in documentation. 2021-03-07 21:54:58 +09:00
Tomas Fernandez Lobbe 03aec55f1e SOLR-15154: Close Reader used for credentials file 2021-03-05 21:38:33 -08:00
Houston Putman 895deb89e6
Install ACL package for Solr Docker tests Github action (#2463)
ACL is no longer provided by default in Ubuntu 20.04

Other changes:
- Made tests easier to debug
- Removed two inconsequential lines from the Dockerfile
2021-03-05 18:29:39 -05:00
Tomas Fernandez Lobbe fe33a436a0
SOLR-15154: Let Http2SolrClient pass Basic Auth credentials to all requests (#2445)
Credentials can now be set explicitly at the client level, or can be read from System properties like in the previous version of the client when using PreemptiveBasicAuthClientBuilderFactory. Other implementations of HttpClientBuilderFactory can now also be used.
2021-03-05 10:51:22 -08:00
David Smiley f36a867bd0
SOLR-15219: Fix TestPointFields integer overflow (#2460)
And also restore its getRandomInts(..,..,bound) semantics to what it was -- positive or negative random values.
2021-03-05 13:42:13 -05:00
Peter Gromov 7b048c5610 make fields private 2021-03-05 16:21:19 +01:00
Peter Gromov fb9805f4d3 update comment 2021-03-05 16:17:04 +01:00