Commit Graph

34940 Commits

Author SHA1 Message Date
Dawid Weiss 4f5389bfa8 Flush output on javadoc emitting a failure. 2021-03-12 11:39:40 +01:00
Tomoko Uchida 7478b3fc17 LUCENE-9834: Adjust logo/colors in the Luke About dialog 2021-03-12 11:00:10 +09:00
Dawid Weiss 8bbcc39583 Always include errorprone dependency, even if we're not checking. This ensures consistent use patterns across JVMs. 2021-03-11 22:27:25 +01:00
Peter Gromov e784721e69
LUCENE-9833: Hunspell: AssertionError in WordStorage.lookupWord (#13) 2021-03-11 10:10:57 -05:00
Peter Gromov efa88a1790
LUCENE-9832: Hunspell: SIOOBE in GeneratingSuggester.expandRoot (#12) 2021-03-11 10:09:11 -05:00
Robert Muir 1b36406ec4
LUCENE-9827: Speed up merging of stored fields and term vectors for small segments
Stored Fields and Term Vectors are block-compressed. Decompressing and
recompressing all the documents on every merge is too slow, so we try to
avoid doing it unless it will actually improve the compression ratio. If
we can get away with it, we just bulk-copy existing compressed blocks to
the new segment.

Previously, small segments would always be considered dirty and
recompressed... the special optimized bulk merge wouldn't kick in until
segments were relatively large. But as block size and ratio (shared
dictionaries etc) have increased, "relatively large" has become a much
bigger number.

So try to avoid doing wasted work: if there's only 1 dirty chunk
(incompletely filled compression block), then don't recompress: it will
likely only give us 1 dirty chunk as a result, at the expense of cpu.

Require at least 2 dirty chunks to recompress: this way the
recompression actually buys us something (reduces 2 to 1).

The change also means that bulk merge will now happen often in
the unit test suite, increasing coverage.
2021-03-11 09:30:40 -05:00
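The recompression rule in the LUCENE-9827 message above can be sketched as a one-line predicate. This is only an illustration of the stated rule, not Lucene's merge code: the class `DirtyChunkPolicy` and the method `shouldRecompress` are hypothetical names, and the real check may weigh further conditions that the message does not mention.

```java
/** Hypothetical illustration of the "at least 2 dirty chunks" rule; not Lucene code. */
public final class DirtyChunkPolicy {
    private DirtyChunkPolicy() {}

    /**
     * Decides whether a segment is worth recompressing during a merge.
     *
     * With 0 dirty chunks there is nothing to gain. With exactly 1, recompressing
     * would most likely still leave 1 dirty chunk, so the CPU is wasted. With 2 or
     * more, recompressing can reduce the count (for example from 2 to 1).
     */
    public static boolean shouldRecompress(int numDirtyChunks) {
        if (numDirtyChunks < 0) {
            throw new IllegalArgumentException("numDirtyChunks must be >= 0");
        }
        return numDirtyChunks >= 2;
    }
}
```

When `shouldRecompress` returns false, the merge in this sketch would fall back to bulk-copying the existing compressed blocks, as the message describes.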
Mike McCandless 12999d30f2 LUCENE-9791: add CHANGES.txt entry 2021-03-11 08:09:23 -05:00
pawel-bugalski-dynatrace 6367cd1b74
LUCENE-9791 Allow calling BytesRefHash#find concurrently (#8)
Removes `scratch1` field in `BytesRefHash` by accessing underlying bytes pool directly
in `equals` method. As a result it is now possible to call `BytesRefHash#find`
concurrently as long as there are no concurrent modifications to BytesRefHash instance
and it is correctly published.

This addresses the concurrency issue with Monitor (aka Luwak) since it
is using `BytesRefHash#find` concurrently without additional synchronization.
2021-03-11 08:06:03 -05:00
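The idea behind the LUCENE-9791 change above, comparing a key against the underlying bytes pool directly instead of copying into a shared scratch field, can be sketched roughly as follows. This is a simplified stand-in, not `BytesRefHash`: the class `PooledBytesSet` is hypothetical, and it uses a linear scan rather than hashing to stay short. The point is only that `find` touches no mutable shared state, so concurrent readers do not interfere as long as the instance is not modified and is safely published.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in for a pooled byte store; not Lucene's BytesRefHash. */
public final class PooledBytesSet {
    // All stored values, concatenated into one array.
    private final byte[] pool;
    // starts[i] is the offset of value i in the pool; the last entry is the end offset.
    private final int[] starts;

    public PooledBytesSet(byte[][] values) {
        int total = 0;
        for (byte[] v : values) {
            total += v.length;
        }
        pool = new byte[total];
        starts = new int[values.length + 1];
        int offset = 0;
        for (int i = 0; i < values.length; i++) {
            starts[i] = offset;
            System.arraycopy(values[i], 0, pool, offset, values[i].length);
            offset += values[i].length;
        }
        starts[values.length] = offset;
    }

    /**
     * Returns the id of the stored value equal to key, or -1 if absent.
     * The comparison reads the pool in place, so no per-instance scratch
     * buffer is written and read-only callers can run concurrently.
     */
    public int find(byte[] key) {
        for (int id = 0; id + 1 < starts.length; id++) {
            if (Arrays.equals(pool, starts[id], starts[id + 1], key, 0, key.length)) {
                return id;
            }
        }
        return -1;
    }
}
```

Because both fields are `final` and written only in the constructor, an instance handed to other threads after construction is safely published in this sketch.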
Uwe Schindler dcb52acd7d Fix badge URL after Jenkins job rename 2021-03-10 23:52:29 +01:00
Dawid Weiss 7f5e660395
LUCENE-9375: some build file cleanups. (#10) 2021-03-10 21:47:37 +01:00
Robert Muir 2892ef4ca0
LUCENE-9802: switch to new logo (#9)
Replace logo used in generated documentation
Replace logo used by luke (about box)
Add logo to README.md
2021-03-10 15:28:52 -05:00
Uwe Schindler 5ade66059f
Cleanup readme file, doaps and copy build instructions from lucene subfolder (#6) 2021-03-10 16:10:06 +01:00
Dawid Weiss ee4871f24a make gradlew mavenToLocalRepo work. 2021-03-10 13:03:48 +01:00
Dawid Weiss 2b9f5bb537 titles for github 2021-03-10 12:51:06 +01:00
Dawid Weiss f5012a4cda
Merge pull request #2 from dweiss/revTrie
LUCENE-9825: Hunspell: reverse the "words" trie for faster word lookup/suggestions (lucene repo)
2021-03-10 12:30:03 +01:00
Dawid Weiss 44833dc575
Merge pull request #1 from dweiss/LUCENE-9375
LUCENE-9375: check github actions (merge PR).
2021-03-10 11:59:13 +01:00
Dawid Weiss 4ab4ab1e67 LUCENE-9375: check gh actions 2021-03-10 11:33:47 +01:00
Dawid Weiss fdf486ba54 LUCENE-9375: post-repo-split removal of solr counterpart. 2021-03-10 11:20:08 +01:00
Andrzej Bialecki 7ada403218 SOLR-14749: Make sure the plugin config is reloaded on Overseer. 2021-03-09 16:58:29 +01:00
Ignacio Vera 578b2aea8f
LUCENE-9580: Fix bug in the polygon tessellator when introducing collinear edges during polygon splitting (#2452) 2021-03-09 08:50:58 +01:00
Dawid Weiss 8969225bd2 LUCENE-8626: correct test suite name. 2021-03-09 08:33:05 +01:00
Ignacio Vera 144ef2a0c0
LUCENE-9705: Create Lucene90StoredFieldsFormat (#2444) 2021-03-09 08:11:59 +01:00
David Smiley cf1025e576
SOLR-2852: SolrJ: remove Woodstox dependency (#2461)
It was never truly required there.
Pervasive use of "javabin" reduces the need to care about client-side XML speed.  Better to reduce dependencies and let clients use the libs they want.
2021-03-09 00:27:03 -05:00
Christine Poerschke 419db23041
LUCENE-8626: enforce name standardisation for org.apache.lucene tests (#2441)
Co-authored-by: Dawid Weiss <dweiss@apache.org>
2021-03-08 15:30:59 +00:00
Jan Høydahl 605d3a00bb
SOLR-15163 Update DOAP file for solr TLP (#2464) 2021-03-08 15:48:17 +01:00
Dawid Weiss b591daad38 SOLR-14759: correct build logic. 2021-03-08 15:04:20 +01:00
Dawid Weiss 409bc37c13
SOLR-14759: a few initial changes so that Lucene can be built independently while Solr code is still in place. (#2448) 2021-03-08 14:59:08 +01:00
Mike Drob 408b3775dd
SOLR-14759 fix tests that depend on lucene test-src (#2462)
Rewrite one, ignore the other two.
2021-03-08 14:32:40 +01:00
Christine Poerschke d53b3da0ea
LUCENE-8626: standardise 3 more Lucene test names (#2440) 2021-03-08 13:21:52 +00:00
Peter Gromov 4959886c25 fix minor review comments 2021-03-08 11:10:23 +01:00
Peter Gromov f9cd8e5c80 fix processAllWords, add a test 2021-03-08 11:09:24 +01:00
Peter Gromov e28b50bae8 add/fix WordStorage comments 2021-03-08 11:09:19 +01:00
Peter Gromov e69390b268 don't lookup empty stems after stripping the whole word 2021-03-08 11:09:12 +01:00
Peter Gromov 469cfc67d4 fix lookupWord false positive 2021-03-08 11:09:05 +01:00
Joel Bernstein e9ddaaca51 SOLR-15193: Fix typo 2021-03-07 20:37:07 -05:00
Joel Bernstein 140c37eb0f SOLR-15193: Improve maxDocFreq docs 2021-03-07 20:31:21 -05:00
Tomoko Uchida 606cea94d7 LUCENE-9322: trivial fix in documentation. 2021-03-07 21:54:58 +09:00
Tomas Fernandez Lobbe 03aec55f1e SOLR-15154: Close Reader used for credentials file 2021-03-05 21:38:33 -08:00
Houston Putman 895deb89e6
Install ACL package for Solr Docker tests Github action (#2463)
ACL is no longer provided by default in Ubuntu 20.04

Other changes:
- Made tests easier to debug
- Removed two inconsequential lines from the Dockerfile
2021-03-05 18:29:39 -05:00
Tomas Fernandez Lobbe fe33a436a0
SOLR-15154: Let Http2SolrClient pass Basic Auth credentials to all requests (#2445)
Credentials can now be set explicitly at the client level, or can be read from System properties like in the previous version of the client when using PreemptiveBasicAuthClientBuilderFactory. Other implementations of HttpClientBuilderFactory can now also be used.
2021-03-05 10:51:22 -08:00
David Smiley f36a867bd0
SOLR-15219: Fix TestPointFields integer overflow (#2460)
And also restore its getRandomInts(..,..,bound) semantics to what it was -- positive or negative random values.
2021-03-05 13:42:13 -05:00
Peter Gromov 7b048c5610 make fields private 2021-03-05 16:21:19 +01:00
Peter Gromov fb9805f4d3 update comment 2021-03-05 16:17:04 +01:00
Peter Gromov 4842e0c9ca LUCENE-9825: Hunspell: reverse the "words" trie for faster word lookup/suggestions 2021-03-05 16:06:48 +01:00
Peter Gromov 99a4bbf3a0
LUCENE-9824: Hunspell suggestions: speed up ngram score calculation for each dictionary entry (#2457) 2021-03-05 10:00:02 -05:00
Joel Bernstein 6e67b9f959 SOLR-15193: Fix wording 2021-03-05 09:17:44 -05:00
Joel Bernstein 36386f4832 SOLR-15193: Fix wording 2021-03-05 09:14:48 -05:00
Joel Bernstein eb0c04b752 SOLR-15193: Add maxDocFreq docs 2021-03-05 09:11:59 -05:00
David Smiley ddbd3b88ec
SOLR-15185: Optimize Hash QParser (#1524)
used in parallel() streaming expression.  Hash algorithm is different.
* Simpler
* Don't use Filter (to be removed)
* Do use TwoPhaseIterator, not PostFilter
* Don't pre-compute matching docs (wasteful)
* Support more fields, and more field types
* Faster hash on Strings (avoid Char conversion)
* Stronger hash when using multiple fields
2021-03-04 23:43:16 -05:00
Robert Muir 8e337ab63f
LUCENE-9822: Assert that ForUtil.BLOCK_SIZE can be PFOR-encoded in a single byte
For/PFor code has BLOCK_SIZE=128 as a static final constant, with a lot
of assumptions and optimizations for that case. For example it will
encode 3 exceptions at most and optimizes the exception encoding with a
single byte.

This would not work at all if you changed the constant in the code to
something like 512, but an assertion at an early stage helps make
experimentation less painful, and better "documents" the assumption of how
the exception encoding currently works.
2021-03-04 18:59:03 -05:00
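An early check of the kind the LUCENE-9822 message above describes might look like the sketch below. It assumes that the single byte holds a position inside the block, so the block size may not exceed 256; the message itself does not spell this out. The class `BlockSizeCheck` and the method `fitsInOneByte` are hypothetical names, and an explicit `AssertionError` is thrown here so the check runs even when JVM assertions are disabled.

```java
/** Hypothetical illustration of a fail-fast block size check; not Lucene's ForUtil/PForUtil. */
public final class BlockSizeCheck {
    /** Block size used by the encoder in this sketch (128, as in the commit message). */
    public static final int BLOCK_SIZE = 128;

    static {
        // Fail at class initialization, long before any encoding happens.
        if (!fitsInOneByte(BLOCK_SIZE)) {
            throw new AssertionError("block size must fit in one byte, got " + BLOCK_SIZE);
        }
    }

    private BlockSizeCheck() {}

    /** True if every position 0..blockSize-1 is representable in one unsigned byte. */
    public static boolean fitsInOneByte(int blockSize) {
        return blockSize > 0 && blockSize <= 256;
    }
}
```

Changing `BLOCK_SIZE` to 512 in this sketch would make the class fail to initialize, which is the kind of early failure the message says makes experimentation less painful.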