lucene

Commit Graph

Author	SHA1	Message	Date
Dawid Weiss	3aa0676194	LUCENE-9713: apply source validation to txt files outside of src/* folders. Fix offenders. (#339 )	2021-09-30 15:13:42 +02:00
Dawid Weiss	1bb4554832	LUCENE-10135: Correct passage selector behavior for long matching snippets (#334 )	2021-09-30 15:05:41 +02:00
Chris Hegarty	797cfbf477	LUCENE-10118: Improve CMS infostream messages (#337 ) Expand the log message when CMS.MergeThread completes its merge operation, to include addition useful diagnostic information, like the total-bytes-written, the time taken, as well as rate limiter information. Also, while here, unify the thread start and end log output to help improve tracing.	2021-09-30 11:43:45 +01:00
Alan Woodward	ca810e732d	LUCENE-10138: Use maven central to resolve third-party gradle plugins (#336 ) The gradle plugin portal uses jcenter to resolve third-party plugins, which can be flaky. This commit instructs gradle to look first in maven central, and only use the plugin portal for gradle's own plugins.	2021-09-30 11:41:05 +01:00
Chris Hegarty	3e568b911f	Support addition of diagnostics by custom merge policies (#329 ) This commit adds a new `addDiagnostics` method to `SegmentInfo` that allows custom merge policies to add new diagnostic information to the segment's diagnostic map.	2021-09-30 09:50:22 +01:00
Dawid Weiss	0c13a52df5	Correct test error that allowed an empty array.	2021-09-30 09:17:29 +02:00
Dawid Weiss	d2b88b7a0b	LUCENE-10134: clean up the test from leaking threads and resources if an error occurs somewhere - this obscures the original cause of the problem.	2021-09-30 09:14:58 +02:00
Mayya Sharipova	88b264a368	LUCENE-10126 Add extra test on _doc sort (#326 ) Add extra test on _doc sort to test that search with after collects all documents	2021-09-29 14:49:16 -04:00
Adrien Grand	84e4050269	LUCENE-10125: Speed up DirectWriter. (#327 ) There was a regression introduced in https://github.com/apache/lucene/pull/107/files#diff-49b11ced76acedf749c5a5a0ff6e7fe93b8fb64caf8697e487a56f4f7adbb510 where we moved from write logic that was optimized for every number of bits per value to more general logic that had to work for every number of bits per value. This PR doesn't restore as much specialization, but some middle ground that makes flushes and merges of doc values noticeably faster (though not much faster).	2021-09-29 19:21:14 +02:00
Nhat Nguyen	e56995d85e	LUCENE-10126: Remove chunk scoring in AssertingBulkScorer Many tests are failing due to the newly introduced chunk scoring in AssertingBulkScorer. This commit reverts that change and will reintroduce it later.	2021-09-28 22:21:17 -04:00
Nhat Nguyen	cb153886eb	LUCENE-10126: Fix AssertingBulkScorer AssertingBulkScorer can generate a backward sub-range.	2021-09-28 19:50:40 -04:00
Timothy Potter	a73848cfab	DOAP changes for release 8.10.0	2021-09-28 13:28:45 -06:00
Nhat Nguyen	5ab900e10b	LUCENE-10126: Fix competitiveIterator wrongly skip documents (#324 ) The competitive iterator can wrongly skip a document that is advanced but not collected in the previous scoreRange.	2021-09-28 15:26:30 -04:00
Adrien Grand	9f80b4d8fb	LUCENE-10125: Speed up computation of exceptions. (#322 ) Even though it was not the driver for the slowdown, in LUCENE-10125 we identified that the move to PFOR had slowed down indexing significantly for fields indexed with indexOptions=DOCS. This patch gets some of the peformance back by using the `LongHeap` that we introduced for vectors instead of sorting the same array over and over again. On the NYC Taxis benchmark, I observed ~8% faster merges of postings with this change.	2021-09-28 17:25:56 +02:00
Adrien Grand	8f3f2ea4ab	LUCENE-10127: Minor speedup to doc values writes. (#325 ) This reduces a bit the overhead of writing doc values. On the NYC Taxis benchmark this resulted in ~10% faster merges for doc values.	2021-09-28 17:23:09 +02:00
Robert Muir	6ac311068f	LUCENE-10128: avoid costly reflection in SparseFixedBitSet ctor Seems that VectorFormat merge creates A LOT of these bitsets. We don't need to do any fancy reflection here via shallowSizeOf(Object), when we can call sizeOf(long[]) which is fast. We may want to revisit this RAMUsageEstimator api in the future to prevent traps like this.	2021-09-28 09:39:36 -04:00
Adrien Grand	7357bdc272	LUCENE-10123: Handling of singletons in DocValuesConsumer. (#320 ) This avoids double wrapping of doc values in `Lucene90DocValuesConsumer`.	2021-09-28 08:54:46 +02:00
Greg Miller	1ebd193fbe	Move CHANGES entry for LUCENE-10070 under 8.11 after backport (#323 )	2021-09-27 12:15:52 -07:00
Uwe Schindler	849d5fc1ac	LUCENE-10125: Optimize primitive writes in OutputStreamIndexOutput (#321 )	2021-09-27 19:04:03 +02:00
Julie Tibshirani	eaa421094d	LUCENE-10109: Bump default beam width for HNSW (#312 ) Lucene90HnswVectorsFormat has a default 'beam width' of 16. This is quite low and produces poor recall on typical-sized datasets. This commit bumps it to 100. This new default tries to balance good search performance with indexing speed. Most runs in ann-benchmarks set the parameter between ~400 and 800, but they are heavily optimizing search over index speed.	2021-09-24 18:02:34 -07:00
Greg Miller	eb44d1e6ad	Add slightly more language in the README Contributing section (#318 )	2021-09-24 12:06:06 -07:00
Nhat Nguyen	7390d1af51	LUCENE-10119: Do not set single sort with search after (#317 ) We should not set single sort when the search_after is non-null; otherwise, we will incorrectly skip documents whose values are equal to the value from the search_after and docIDs are greater than the docID from the search_after.	2021-09-23 13:10:17 -04:00
Uwe Schindler	fc475360a8	Only pass "--illegal-access=deny" up to JDK-15, later versions deprecate the option and default to "deny"	2021-09-22 19:41:59 +02:00
Lu Xugang	ed7fb8dea0	LUCENE-10116: Missing calculating the bytes used of DocsWithFieldSet and currentValues in SortedSetDocValuesWriter (#316 )	2021-09-22 14:25:40 +02:00
Lu Xugang	a7bddfaacc	LUCENE-10111: Missing calculating the bytes used of DocsWithFieldSet in NormValuesWriter (#307 )	2021-09-22 07:44:30 +02:00
Chris Hegarty	a7578709a6	LUCENE-10115: Add a fuzzy parsing extension point for custom query parsers This commit adds the QueryParserBase::getFuzzyDistance protected method, which can be overridden by subclasses to provide customisation of how the similarity distance is determined. The default implementation retains the current behaviour.	2021-09-21 13:25:09 +01:00
Julie Tibshirani	b2a04a4bb4	LUCENE-10069: Adjust TestKnnVectorQuery#testRandom to stop failures The test fails randomly because HNSW can sometimes miss results when k is close to the number of total docs. While we wait for a fix, this commit decreases k to prevent failures.	2021-09-20 14:16:47 -07:00
Uwe Schindler	5871ea7972	LUCENE-10112: Improve LZ4 Compression performance with direct primitive read/writes (#310 ) Co-authored-by: Tim Brooks <tim@timbrooks.org>	2021-09-20 19:12:38 +02:00
Christine Poerschke	57524c6a5e	LUCENE-9809: replace 'master' with 'main' in release wizard (#305 )	2021-09-20 17:51:41 +01:00
Uwe Schindler	c57d6e5f8c	LUCENE-10113: Use VarHandles to access int/long/short types in byte arrays (e.g. ByteArrayDataInput) (#308 ) Co-authored-by: Robert Muir <rmuir@apache.org>	2021-09-20 15:37:33 +02:00
Adrien Grand	4bcd64c5ed	LUCENE-9620: Fix test bug.	2021-09-20 09:49:13 +02:00
Uwe Schindler	075d801abe	LUCENE-10114: Remove unused byte order mark in Lucene90PostingsWriter (#309 ) Co-authored-by: Robert Muir <rmuir@apache.org>	2021-09-20 08:37:05 +02:00
Jim Ferenczi	ccf0d5404d	LUCENE-10110: MultiCollector should conditionally wrap single leaf collector (#303 ) MultiCollector should wrap single leaf collector that wants to skip low-scoring hits but the combined score mode doesn't allow it.	2021-09-20 07:26:51 +02:00
Tomoko Uchida	6c1e5920d8	LUCENE-10102: do not call incrementToken() against already consumed input stream.	2021-09-20 10:58:39 +09:00
Robert Muir	8b95e51d70	Add additional docs refs (nightly, build system help/) to README.md (#302 )	2021-09-19 20:24:13 -04:00
Uwe Schindler	f3c3b90e35	LUCENE-9047: fix typo in javadocs (still referred to big endian)	2021-09-19 13:51:51 +02:00
Tomoko Uchida	5dfbef313c	LUCENE-10102: Fix JapaneseCompletionFilter javadoc	2021-09-18 14:43:03 +09:00
goankur	deff5a1f5a	LUCENE-10070: Skip deleted documents during facet counting for all documents (#293 )	2021-09-17 10:35:44 -07:00
Tomoko Uchida	4e86df96c0	LUCENE-10102: Add JapaneseCompletionFilter for Input Method-aware auto-completion (#297 ) Co-authored-by: Robert Muir <rmuir@apache.org>	2021-09-17 22:37:12 +09:00
Dawid Weiss	de45b68c90	LUCENE-9448, LUCENE-9990: fix Luke's launcher task.	2021-09-16 08:49:26 +02:00
Nhat Nguyen	b7a286dd69	LUCENE-10106: Sort optimization wrongly skip first docs (#300 ) The first documents of subsequent segments are mistakenly skipped when sort optimization is enabled. We should initialize maxDocVisited in NumericComparator to -1 instead of 0.	2021-09-15 09:21:59 -04:00
Uwe Schindler	1586933b18	Merge branch 'main' of https://gitbox.apache.org/repos/asf/lucene into main	2021-09-15 01:19:42 +02:00
Uwe Schindler	3c6d4a00cd	LUCENE-10104, SOLR-15631: Upgrade forbiddenapis to version 3.2	2021-09-15 01:19:17 +02:00
Alan Woodward	26093735cc	LUCENE-8638: Expressions haversin() method should continue to return its value in km (#299 ) SloppyMath had a deprecated haversin() function that returned its values in km, which has been replaced by a haversinMeters() function that is explicit about its units. As part of removing this function, we changed the expressions module haversin function to point instead to haversinMeters. However, this may silently change the behaviour of expressions on upgrade. This commit instead adds a haversinKilometers method to the expressions module and maps the haversin function to it. It also adds a new haversinMeters expression function to be more explicit for future users.	2021-09-14 14:01:10 +01:00
Uwe Schindler	3802bdc686	LUCENE-10101: Use getField() instead of getDeclaredField() to minimize security impact by analysis SPI discovery (#298 )	2021-09-14 10:31:46 +02:00
Jim Ferenczi	19537578dd	LUCENE-10089: Disable numeric sort optimization early (#291 ) This commit moves the responsibility to disable the numeric sort optimization on comparators to the SortField. This way we don't need to apply the logic on every top field collectors.	2021-09-13 07:31:43 +02:00
Robert Muir	56968b762a	LUCENE-10098: add note/link to GermanAnalyzer for decompounding nouns. (#294 ) LUCENE-10098: add note/link to GermanAnalyzer for decompounding nouns. We can't do this out of box with the analyzer, due to incompatible licenses. But we can make it easy on the user to do this, by linking to repo that has sample code, documentation, and the required data files.	2021-09-12 12:55:51 -04:00
Robert Muir	24aa45dc3e	LUCENE-10096: Tamil Analyzer (#292 ) Add Tamil analyzer based on snowball stemmer and TamilNLP stopwords	2021-09-10 21:02:11 -04:00
Robert Muir	8bce765218	LUCENE-10095: Nepali Analyzer (#290 ) Add Nepali analyzer based on snowball stemmer and NLTK stopwords	2021-09-10 20:45:23 -04:00
Alan Woodward	cc8c4283dd	LUCENE-10094: Fix test bug	2021-09-10 16:32:33 +01:00

1 2 3 4 5 ...

35347 Commits All Branches Search

35347 Commits

All Branches