* Build Lucene binary distribution using Gradle
* Generate SHA-512 checksums for all release artifacts
* Update documentation artifacts included in binaries
* Delete some additional Ant relics
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Co-authored-by: Uwe Schindler <uschindler@apache.org>
PR #1351 introduced a sort optimization where documents can be skipped.
But iteration over competitive iterators was not properly organized:
they did not store the current doc ID, so when a competitive iterator
was updated, the current doc ID was lost.
This patch fixes it.
Relates to #1351
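A minimal sketch of the fix idea, with hypothetical names (Lucene's actual classes differ): the wrapper remembers the current doc ID so that swapping in an updated competitive iterator does not lose the position.

    import java.io.IOException;
    import org.apache.lucene.search.DocIdSetIterator;

    // Hypothetical wrapper; remembers the current doc across updates.
    final class CompetitiveIteratorSketch extends DocIdSetIterator {
      private DocIdSetIterator in; // the current competitive iterator
      private int doc = -1;        // remembered current doc ID

      CompetitiveIteratorSketch(DocIdSetIterator in) { this.in = in; }

      // Called when the comparator narrows the competitive set: the new
      // iterator is caught up to the remembered doc instead of restarting.
      void update(DocIdSetIterator newIterator) throws IOException {
        in = newIterator;
        if (doc >= 0 && in.docID() < doc) {
          in.advance(doc);
        }
      }

      @Override public int docID() { return doc; }

      @Override public int nextDoc() throws IOException {
        // update() may already have moved the underlying iterator past doc
        return doc = (in.docID() > doc) ? in.docID() : in.nextDoc();
      }

      @Override public int advance(int target) throws IOException {
        return doc = (in.docID() >= target) ? in.docID() : in.advance(target);
      }

      @Override public long cost() { return in.cost(); }
    }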
PR #1351 introduced a sort optimization where documents can be skipped.
But there was a bug when a two-phase approximation was used: we would
advance the approximation without advancing the overall conjunction
iterator.
This patch fixes it.
Relates to #1351
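A hedged sketch of the rule the fix enforces (names are illustrative, not Lucene's internals): non-competitive docs must be skipped through the conjunction's shared approximation, never by advancing the two-phase's inner approximation alone.

    import java.io.IOException;
    import org.apache.lucene.search.DocIdSetIterator;
    import org.apache.lucene.search.TwoPhaseIterator;

    final class ConjunctionAdvanceSketch {
      // Advance via the iterator shared by all clauses so that every clause
      // stays in sync, then confirm candidates with the two-phase check.
      static int advance(DocIdSetIterator conjunctionApproximation,
                         TwoPhaseIterator twoPhase,
                         int target) throws IOException {
        int doc = conjunctionApproximation.advance(target);
        while (doc != DocIdSetIterator.NO_MORE_DOCS
            && twoPhase.matches() == false) {
          doc = conjunctionApproximation.nextDoc();
        }
        return doc;
      }
    }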
Some applications use pendingNumDocs from IndexWriter to detect that the
number of documents in an index is very close to the hard limit, so that
they can reject writes without constructing Lucene documents.
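For illustration, the kind of early rejection this enables (a sketch, not an API recommendation):

    import org.apache.lucene.index.IndexWriter;

    final class WriteAdmission {
      // Reject a batch up front instead of building Lucene documents that
      // would only fail against the hard limit later.
      static boolean canAccept(IndexWriter writer, int batchSize) {
        return writer.getPendingNumDocs() + batchSize <= IndexWriter.MAX_DOCS;
      }
    }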
This can happen with catenateAll when only one of word parts and number
parts is generated and the input ends with a sub-token of the
non-generated kind.
Fuzzing revealed that only start & end offsets are needed to order sub-tokens.
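A configuration that can hit this code path, as a sketch (the flag choice is illustrative):

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilter;

    final class CatenateAllSketch {
      // With word parts generated but number parts not, an input such as
      // "abc-123" ends with a sub-token (the number) that is only ever
      // emitted as part of the catenated token "abc123".
      static TokenStream wrap(TokenStream in) {
        int flags = WordDelimiterGraphFilter.GENERATE_WORD_PARTS
            | WordDelimiterGraphFilter.CATENATE_ALL;
        return new WordDelimiterGraphFilter(in, flags, null);
      }
    }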
LUCENE-9444: add utility class to retrieve facet labels from the taxonomy index for a facet field so such fields do not also have to be redundantly stored in the index.
Co-authored-by: Ankur Goel <goankur@amazon.com>
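The underlying idea, sketched against the stable taxonomy API (the new utility class's exact surface is not shown here): a facet ordinal can be resolved back to its label, so the label need not be stored redundantly.

    import java.io.IOException;
    import org.apache.lucene.facet.taxonomy.FacetLabel;
    import org.apache.lucene.facet.taxonomy.TaxonomyReader;

    final class FacetLabelSketch {
      // Resolve an ordinal recorded for a document back to its label.
      static FacetLabel labelOf(TaxonomyReader taxoReader, int ordinal)
          throws IOException {
        return taxoReader.getPath(ordinal);
      }
    }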
Currently we calculate the ramBytesUsed of the DWPT under the flushControl
lock. We can safely do this calculation outside of the lock without any
downside. The flushControl lock should be used with care since it's a
central part of indexing and might block all indexing.
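A generic illustration of the change (names hypothetical, not the actual FlushControl code): compute outside the lock, publish under it.

    final class LockScopeSketch {
      private final Object flushControlLock = new Object();
      private long activeBytes;

      interface MemoryAccountable { long ramBytesUsed(); }

      void updateAccounting(MemoryAccountable perThread) {
        long bytes = perThread.ramBytesUsed(); // pure read: no lock needed
        synchronized (flushControlLock) {
          activeBytes += bytes;                // shared state: lock needed
        }
      }
    }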
SortingCodecReader keeps all doc values loaded from it in memory. Yet this
reader should only be used for merging, which happens sequentially. This
makes caching doc values unnecessary.
Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
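For context, the intended merge-only usage pattern, sketched:

    import java.io.IOException;
    import org.apache.lucene.index.CodecReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.SortingCodecReader;
    import org.apache.lucene.search.Sort;

    final class SortedAddIndexes {
      // addIndexes consumes each doc-values field once and sequentially,
      // so there is nothing to gain from caching them in the reader.
      static void addSorted(IndexWriter writer, CodecReader unsorted, Sort sort)
          throws IOException {
        writer.addIndexes(SortingCodecReader.wrap(unsorted, sort));
      }
    }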
The problem with tracking dirtiness via the number of chunks is that
larger chunks make stored fields readers more likely to be considered
dirty, so I'm trying to work around it by tracking the number of docs
instead.
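A sketch of the changed heuristic (the field and threshold are hypothetical):

    final class DirtinessSketch {
      long numDirtyDocs; // docs in partially filled ("dirty") chunks

      // Counting docs rather than chunks keeps the signal proportional to
      // the actual amount of poorly compressed data, whatever the chunk size.
      boolean tooDirty(long numDocs) {
        return numDirtyDocs * 100 > numDocs; // hypothetical 1% threshold
      }
    }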
The increase of the maximum number of docs per chunk done in previous
issues was mostly arbitrary. I'd like to provide users with a trade-off
similar to what the old versions of BEST_SPEED and BEST_COMPRESSION used
to offer. Since BEST_SPEED used to compress at most 128 docs at once, we
should roughly make it 128*10 now that there are 10 sub blocks. I made it
1024 to account for the fact that there is a preset dict as well that
needs decompressing. Similarly, BEST_COMPRESSION used to allow 4x more
docs than BEST_SPEED, so I made it 4096.
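The arithmetic, spelled out (constant names are illustrative):

    final class ChunkLimits {
      // BEST_SPEED: 128 docs/chunk before * 10 sub blocks = 1280, rounded
      // down to 1024 to leave headroom for decompressing the preset dict.
      static final int BEST_SPEED_MAX_DOCS_PER_CHUNK = 1024;
      // BEST_COMPRESSION: 4x BEST_SPEED, as in the old formats.
      static final int BEST_COMPRESSION_MAX_DOCS_PER_CHUNK = 4 * 1024; // 4096
    }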
With such larger numbers of docs per chunk, the decoding of metadata
became a bottleneck for stored field access, so I made it a bit faster
by doing bulk decoding of the packed longs.
Instead of configuring a dictionary size and a block size, the format
now tries to have 10 sub blocks per bigger block, and adapts the sizes
of the dictionary and of the sub blocks to this overall block size.
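A hypothetical sketch of the sizing rule (the real format's exact ratios may differ):

    final class BlockSizingSketch {
      static final int NUM_SUB_BLOCKS = 10;

      // Assumption for illustration: the dictionary is sized like one more
      // sub block, and the remainder is split into 10 equal sub blocks.
      static int dictLength(int blockLength) {
        return blockLength / (NUM_SUB_BLOCKS + 1);
      }

      static int subBlockLength(int blockLength) {
        return (blockLength - dictLength(blockLength)) / NUM_SUB_BLOCKS;
      }
    }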
When index sorting is enabled, stored fields and term vectors can't be
written on the fly as in the normal case, so they are written to
temporary files that are subsequently sorted. For these temporary files,
disabling compression speeds up indexing significantly.
On a synthetic test that indexes stored fields and a doc value field
populated with random values that is used for index sorting, this
resulted in a 3x indexing speedup.
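The benchmark setup described above looks roughly like this (the field name is hypothetical):

    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;

    final class SortedIndexConfig {
      // Enabling an index sort routes stored fields and term vectors through
      // the temporary files whose compression this change disables.
      static IndexWriterConfig newConfig() {
        IndexWriterConfig config = new IndexWriterConfig();
        config.setIndexSort(
            new Sort(new SortField("sort_key", SortField.Type.LONG)));
        return config;
      }
    }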
* Restore lucene/version.properties
* Switch release wizard commands from ant to gradle equivalents
* Remove remaining checks for ant
* Remove checks for Java 8
* Update copyright year
* Minor bug fixes around determining next version for a major release