diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt index c31adcd95f0..9c4890d2c7b 100644 --- a/lucene/CHANGES.txt +++ b/lucene/CHANGES.txt @@ -20,6 +20,9 @@ New Features * LUCENE-7621: Add CoveringQuery, a query whose required number of matching clauses can be defined per document. (Adrien Grand) +* LUCENE-7927: Add LongValueFacetCounts, to compute facet counts for individual + numeric values (Mike McCandless) + Optimizations * LUCENE-7905: Optimize how OrdinalMap (used by @@ -84,6 +87,9 @@ New Features * LUCENE-7838: Knn classifier based on fuzzified term queries (Tommaso Teofili) +* LUCENE-7855: Added advanced options of the Wikipedia tokenizer to its factory. + (Juan Pedro via Adrien Grand) + API Changes * LUCENE-2605: Classic QueryParser no longer splits on whitespace by default. @@ -175,6 +181,9 @@ Bug Fixes functions (Operations.isFinite and Operations.topsortState) to prevent large automaton to overflow the stack (Robert Muir, Adrien Grand, Jim Ferenczi) +* LUCENE-7864: IndexMergeTool is not using intermediate hard links (even + if possible). (Dawid Weiss) + Improvements * LUCENE-7489: Better storage of sparse doc-values fields with the default @@ -186,6 +195,8 @@ Improvements * LUCENE-7901: Original Highlighter now eagerly throws an exception if you provide components that are null. (Jason Gerlowski, David Smiley) +* LUCENE-7841: Normalize ґ to г in Ukrainian analyzer. (Andriy Rysin via Dawid Weiss) + Optimizations * LUCENE-7416: BooleanQuery optimizes queries that have queries that occur both @@ -210,6 +221,10 @@ Optimizations * LUCENE-7874: DisjunctionMaxQuery rewrites to a BooleanQuery when tiebreaker is set to 1. (Jim Ferenczi) +* LUCENE-7828: Speed up range queries on range fields by improving how we + compute the relation between the query and inner nodes of the BKD tree. + (Adrien Grand) + Other * LUCENE-7923: Removed FST.Arc.node field (unused). (Dawid Weiss) @@ -247,27 +262,7 @@ Other * LUCENE-7773: Remove unused/deprecated token types from StandardTokenizer. (Ahmet Arslan via Steve Rowe) -======================= Lucene 6.7.0 ======================= - -New Features - -* LUCENE-7855: Added advanced options of the Wikipedia tokenizer to its factory. - (Juan Pedro via Adrien Grand) - -Bug Fixes - -* LUCENE-7864: IndexMergeTool is not using intermediate hard links (even - if possible). (Dawid Weiss) - -* LUCENE-7869: Changed MemoryIndex to sort 1d points. In case of 1d points, the PointInSetQuery.MergePointVisitor expects - that these points are visited in ascending order. The memory index doesn't do this and this can result in document - with multiple points that should match to not match. (Martijn van Groningen) - -* LUCENE-7878: Fix query builder to keep the SHOULD clause that wraps multi-word synonyms. (Jim Ferenczi) - -Other - -* LUCENE-7800: Remove code that potentially rethrows checked exceptions +* LUCENE-7800: Remove code that potentially rethrows checked exceptions from methods that don't declare them ("sneaky throw" hack). (Robert Muir, Uwe Schindler, Dawid Weiss) @@ -275,15 +270,15 @@ Other that are trivially replaced by LeafReader.terms() and MultiFields.getTerms() (David Smiley) -Improvements +======================= Lucene 6.6.1 ======================= -* LUCENE-7841: Normalize ґ to г in Ukrainian analyzer. (Andriy Rysin via Dawid Weiss) +Bug Fixes -Optimizations +* LUCENE-7869: Changed MemoryIndex to sort 1d points. In case of 1d points, the PointInSetQuery.MergePointVisitor expects + that these points are visited in ascending order. 
The memory index doesn't do this and this can result in document + with multiple points that should match to not match. (Martijn van Groningen) -* LUCENE-7828: Speed up range queries on range fields by improving how we - compute the relation between the query and inner nodes of the BKD tree. - (Adrien Grand) +* LUCENE-7878: Fix query builder to keep the SHOULD clause that wraps multi-word synonyms. (Jim Ferenczi) ======================= Lucene 6.6.0 ======================= diff --git a/lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java b/lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java new file mode 100644 index 00000000000..3f7dbf4da2c --- /dev/null +++ b/lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java @@ -0,0 +1,499 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.facet; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; + +import org.apache.lucene.facet.FacetResult; +import org.apache.lucene.facet.Facets; +import org.apache.lucene.facet.FacetsCollector.MatchingDocs; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.LabelAndValue; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.index.IndexReader; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.index.NumericDocValues; +import org.apache.lucene.index.SortedNumericDocValues; +import org.apache.lucene.search.ConjunctionDISI; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.search.LongValues; +import org.apache.lucene.search.LongValuesSource; +import org.apache.lucene.util.InPlaceMergeSorter; +import org.apache.lucene.util.PriorityQueue; + +/** {@link Facets} implementation that computes counts for + * all uniqute long values, more efficiently counting small values (0-1023) using an int array, + * and switching to a HashMap for values above 1023. + * Retrieve all facet counts, in value order, with {@link #getAllChildrenSortByValue}, + * or get the topN values sorted by count with {@link #getTopChildrenSortByCount}. + * + * @lucene.experimental */ +public class LongValueFacetCounts extends Facets { + + /** Used for all values that are < 1K. */ + private final int[] counts = new int[1024]; + + /** Used for all values that are >= 1K. */ + private final HashTable hashCounts = new HashTable(); + + private final String field; + + /** Total number of values counted, which is the subset of hits that had a value for this field. 
*/ + private int totCount; + + /** Create {@code LongValueFacetCounts}, using either single-valued {@link + * NumericDocValues} or multi-valued {@link SortedNumericDocValues} from the + * specified field. */ + public LongValueFacetCounts(String field, FacetsCollector hits, boolean multiValued) throws IOException { + this(field, null, hits, multiValued); + } + + /** Create {@code LongValueFacetCounts}, using the provided + * {@link org.apache.lucene.queries.function.ValueSource}. If hits is + * null then all facets are counted. */ + public LongValueFacetCounts(String field, LongValuesSource valueSource, FacetsCollector hits) throws IOException { + this(field, valueSource, hits, false); + } + + /** Create {@code LongValueFacetCounts}, using the provided + * {@link org.apache.lucene.queries.function.ValueSource}. + * random access (implement {@link org.apache.lucene.search.DocIdSet#bits}). */ + public LongValueFacetCounts(String field, LongValuesSource valueSource, FacetsCollector hits, + boolean multiValued) throws IOException { + this.field = field; + if (valueSource == null) { + if (multiValued) { + countMultiValued(field, hits.getMatchingDocs()); + } else { + count(field, hits.getMatchingDocs()); + } + } else { + // value source is always single valued + if (multiValued) { + throw new IllegalArgumentException("can only compute multi-valued facets directly from doc values (when valueSource is null)"); + } + count(valueSource, hits.getMatchingDocs()); + } + } + + /** Counts all facet values for this reader. This produces the same result as computing + * facets on a {@link org.apache.lucene.search.MatchAllDocsQuery}, but is more efficient. */ + public LongValueFacetCounts(String field, IndexReader reader, boolean multiValued) throws IOException { + this.field = field; + if (multiValued) { + countAllMultiValued(reader, field); + } else { + countAll(reader, field); + } + } + + /** Counts all facet values for the provided {@link LongValuesSource}. This produces the same result as computing + * facets on a {@link org.apache.lucene.search.MatchAllDocsQuery}, but is more efficient. 
*/ + public LongValueFacetCounts(String field, LongValuesSource valueSource, IndexReader reader) throws IOException { + this.field = field; + countAll(valueSource, field, reader); + } + + private void count(LongValuesSource valueSource, List matchingDocs) throws IOException { + + for (MatchingDocs hits : matchingDocs) { + LongValues fv = valueSource.getValues(hits.context, null); + + // NOTE: this is not as efficient as working directly with the doc values APIs in the sparse case + // because we are doing a linear scan across all hits, but this API is more flexible since a + // LongValuesSource can compute interesting values at query time + + DocIdSetIterator docs = hits.bits.iterator(); + for (int doc = docs.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS;) { + // Skip missing docs: + if (fv.advanceExact(doc)) { + increment(fv.longValue()); + totCount++; + } + + doc = docs.nextDoc(); + } + } + } + + private void count(String field, List matchingDocs) throws IOException { + for (MatchingDocs hits : matchingDocs) { + NumericDocValues fv = hits.context.reader().getNumericDocValues(field); + if (fv == null) { + continue; + } + countOneSegment(fv, hits); + } + } + + private void countOneSegment(NumericDocValues values, MatchingDocs hits) throws IOException { + DocIdSetIterator it = ConjunctionDISI.intersectIterators( + Arrays.asList(hits.bits.iterator(), values)); + + for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) { + increment(values.longValue()); + totCount++; + } + } + + /** Counts directly from SortedNumericDocValues. */ + private void countMultiValued(String field, List matchingDocs) throws IOException { + + for (MatchingDocs hits : matchingDocs) { + SortedNumericDocValues values = hits.context.reader().getSortedNumericDocValues(field); + if (values == null) { + // this field has no doc values for this segment + continue; + } + + NumericDocValues singleValues = DocValues.unwrapSingleton(values); + + if (singleValues != null) { + countOneSegment(singleValues, hits); + } else { + + DocIdSetIterator it = ConjunctionDISI.intersectIterators( + Arrays.asList(hits.bits.iterator(), values)); + + for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) { + int limit = values.docValueCount(); + totCount += limit; + for (int i = 0; i < limit; i++) { + increment(values.nextValue()); + } + } + } + } + } + + /** Optimized version that directly counts all doc values. 
*/ + private void countAll(IndexReader reader, String field) throws IOException { + + for (LeafReaderContext context : reader.leaves()) { + + NumericDocValues values = context.reader().getNumericDocValues(field); + if (values == null) { + // this field has no doc values for this segment + continue; + } + + countAllOneSegment(values); + } + } + + private void countAllOneSegment(NumericDocValues values) throws IOException { + int doc; + while ((doc = values.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) { + totCount++; + increment(values.longValue()); + } + } + + private void countAll(LongValuesSource valueSource, String field, IndexReader reader) throws IOException { + + for (LeafReaderContext context : reader.leaves()) { + LongValues fv = valueSource.getValues(context, null); + int maxDoc = context.reader().maxDoc(); + + for (int doc = 0; doc < maxDoc; doc++) { + // Skip missing docs: + if (fv.advanceExact(doc)) { + increment(fv.longValue()); + totCount++; + } + } + } + } + + private void countAllMultiValued(IndexReader reader, String field) throws IOException { + + for (LeafReaderContext context : reader.leaves()) { + + SortedNumericDocValues values = context.reader().getSortedNumericDocValues(field); + if (values == null) { + // this field has no doc values for this segment + continue; + } + NumericDocValues singleValues = DocValues.unwrapSingleton(values); + if (singleValues != null) { + countAllOneSegment(singleValues); + } else { + int doc; + while ((doc = values.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) { + int limit = values.docValueCount(); + totCount += limit; + for (int i = 0; i < limit; i++) { + increment(values.nextValue()); + } + } + } + } + } + + private void increment(long value) { + /* + if (value >= 0 && value < counts.length) { + counts[(int) value]++; + } else { + hashCounts.add(value, 1); + } + */ + hashCounts.add(value, 1); + } + + @Override + public FacetResult getTopChildren(int topN, String dim, String... path) { + if (dim.equals(field) == false) { + throw new IllegalArgumentException("invalid dim \"" + dim + "\"; should be \"" + field + "\""); + } + if (path.length != 0) { + throw new IllegalArgumentException("path.length should be 0"); + } + return getTopChildrenSortByCount(topN); + } + + /** Reusable hash entry to hold long facet value and int count. */ + private static class Entry { + int count; + long value; + } + + /** Returns the specified top number of facets, sorted by count. 
*/ + public FacetResult getTopChildrenSortByCount(int topN) { + PriorityQueue pq = new PriorityQueue(Math.min(topN, counts.length + hashCounts.size)) { + @Override + protected boolean lessThan(Entry a, Entry b) { + // sort by count descending, breaking ties by value ascending: + return a.count < b.count || (a.count == b.count && a.value > b.value); + } + }; + + int childCount = 0; + Entry e = null; + for (int i = 0; i < counts.length; i++) { + if (counts[i] != 0) { + childCount++; + if (e == null) { + e = new Entry(); + } + e.value = i; + e.count = counts[i]; + e = pq.insertWithOverflow(e); + } + } + + if (hashCounts.size != 0) { + childCount += hashCounts.size; + for (int i = 0; i < hashCounts.values.length; i++) { + int count = hashCounts.counts[i]; + if (count != 0) { + if (e == null) { + e = new Entry(); + } + e.value = hashCounts.values[i]; + e.count = count; + e = pq.insertWithOverflow(e); + } + } + } + + LabelAndValue[] results = new LabelAndValue[pq.size()]; + while (pq.size() != 0) { + Entry entry = pq.pop(); + results[pq.size()] = new LabelAndValue(Long.toString(entry.value), entry.count); + } + + return new FacetResult(field, new String[0], totCount, results, childCount); + } + + + /** Returns all unique values seen, sorted by value. */ + public FacetResult getAllChildrenSortByValue() { + List labelValues = new ArrayList<>(); + + // compact & sort hash table's arrays by value + int upto = 0; + for (int i = 0; i < hashCounts.values.length; i++) { + if (hashCounts.counts[i] != 0) { + hashCounts.counts[upto] = hashCounts.counts[i]; + hashCounts.values[upto] = hashCounts.values[i]; + upto++; + } + } + + // zero fill all remaining counts so if we are called again we don't mistake these as real values + Arrays.fill(hashCounts.counts, upto, hashCounts.counts.length, 0); + + assert upto == hashCounts.size : "upto=" + upto + " hashCounts.size=" + hashCounts.size; + + new InPlaceMergeSorter() { + @Override + public int compare(int i, int j) { + return Long.compare(hashCounts.values[i], hashCounts.values[j]); + } + + @Override + public void swap(int i, int j) { + int x = hashCounts.counts[i]; + hashCounts.counts[i] = hashCounts.counts[j]; + hashCounts.counts[j] = x; + + long y = hashCounts.values[j]; + hashCounts.values[j] = hashCounts.values[i]; + hashCounts.values[i] = y; + } + }.sort(0, upto); + + boolean countsAdded = false; + for (int i = 0; i < upto; i++) { + /* + if (countsAdded == false && hashCounts.values[i] >= counts.length) { + countsAdded = true; + appendCounts(labelValues); + } + */ + + labelValues.add(new LabelAndValue(Long.toString(hashCounts.values[i]), + hashCounts.counts[i])); + } + + /* + if (countsAdded == false) { + appendCounts(labelValues); + } + */ + + return new FacetResult(field, new String[0], totCount, labelValues.toArray(new LabelAndValue[0]), labelValues.size()); + } + + private void appendCounts(List labelValues) { + for (int i = 0; i < counts.length; i++) { + if (counts[i] != 0) { + labelValues.add(new LabelAndValue(Long.toString(i), counts[i])); + } + } + } + + @Override + public Number getSpecificValue(String dim, String... path) throws IOException { + // TODO: should we impl this? 
+ throw new UnsupportedOperationException(); + } + + @Override + public List getAllDims(int topN) throws IOException { + return Collections.singletonList(getTopChildren(topN, field)); + } + + @Override + public String toString() { + StringBuilder b = new StringBuilder(); + b.append("LongValueFacetCounts totCount="); + b.append(totCount); + b.append(":\n"); + for (int i = 0; i < counts.length; i++) { + if (counts[i] != 0) { + b.append(" "); + b.append(i); + b.append(" -> count="); + b.append(counts[i]); + b.append('\n'); + } + } + + if (hashCounts.size != 0) { + for (int i = 0; i < hashCounts.values.length; i++) { + if (hashCounts.counts[i] != 0) { + b.append(" "); + b.append(hashCounts.values[i]); + b.append(" -> count="); + b.append(hashCounts.counts[i]); + b.append('\n'); + } + } + } + + return b.toString(); + } + + /** Native typed hash table. */ + static class HashTable { + + static final float LOAD_FACTOR = 0.7f; + + long[] values; // values identifying a value + int[] counts; + int mask; + int size; + int threshold; + + HashTable() { + int capacity = 64; // must be a power of 2 + values = new long[capacity]; + counts = new int[capacity]; + mask = capacity - 1; + size = 0; + threshold = (int) (capacity * LOAD_FACTOR); + } + + private int hash(long v) { + int h = (int) (v ^ (v >>> 32)); + h = (31 * h) & mask; // * 31 to try to use the whole table, even if values are dense + return h; + } + + void add(long value, int inc) { + if (size >= threshold) { + rehash(); + } + final int h = hash(value); + for (int slot = h;; slot = (slot + 1) & mask) { + if (counts[slot] == 0) { + values[slot] = value; + ++size; + } else if (values[slot] != value) { + continue; + } + counts[slot] += inc; + break; + } + } + + private void rehash() { + final long[] oldValues = values; + final int[] oldCounts = counts; + + final int newCapacity = values.length * 2; + values = new long[newCapacity]; + counts = new int[newCapacity]; + mask = newCapacity - 1; + threshold = (int) (LOAD_FACTOR * newCapacity); + size = 0; + + for (int i = 0; i < oldValues.length; ++i) { + if (oldCounts[i] > 0) { + add(oldValues[i], oldCounts[i]); + } + } + } + } +} diff --git a/lucene/facet/src/java/org/apache/lucene/facet/range/LongRangeFacetCounts.java b/lucene/facet/src/java/org/apache/lucene/facet/range/LongRangeFacetCounts.java index c9c42a3fa8c..8303a32a06f 100644 --- a/lucene/facet/src/java/org/apache/lucene/facet/range/LongRangeFacetCounts.java +++ b/lucene/facet/src/java/org/apache/lucene/facet/range/LongRangeFacetCounts.java @@ -51,16 +51,18 @@ public class LongRangeFacetCounts extends RangeFacetCounts { this(field, LongValuesSource.fromLongField(field), hits, ranges); } - /** Create {@code RangeFacetCounts}, using the provided + /** Create {@code LongRangeFacetCounts}, using the provided * {@link ValueSource}. */ public LongRangeFacetCounts(String field, LongValuesSource valueSource, FacetsCollector hits, LongRange... ranges) throws IOException { this(field, valueSource, hits, null, ranges); } - /** Create {@code RangeFacetCounts}, using the provided + /** Create {@code LongRangeFacetCounts}, using the provided * {@link ValueSource}, and using the provided Filter as * a fastmatch: only documents passing the filter are - * checked for the matching ranges. The filter must be + * checked for the matching ranges, which is helpful when + * the provided {@link LongValuesSource} is costly per-document, + * such as a geo distance. The filter must be * random access (implement {@link DocIdSet#bits}). 
*/ public LongRangeFacetCounts(String field, LongValuesSource valueSource, FacetsCollector hits, Query fastMatchQuery, LongRange... ranges) throws IOException { super(field, ranges, fastMatchQuery); @@ -121,7 +123,7 @@ public class LongRangeFacetCounts extends RangeFacetCounts { missingCount += x; - //System.out.println("totCount " + totCount + " missingCount " + counter.missingCount); + //System.out.println("totCount " + totCount + " x " + x + " missingCount " + missingCount); totCount -= missingCount; } } diff --git a/lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java b/lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java index 32a9c56c774..2687fa3567d 100644 --- a/lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java +++ b/lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java @@ -78,7 +78,7 @@ abstract class RangeFacetCounts extends Facets { @Override public List getAllDims(int topN) throws IOException { - return Collections.singletonList(getTopChildren(topN, null)); + return Collections.singletonList(getTopChildren(topN, field)); } @Override diff --git a/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/CharBlockArray.java b/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/CharBlockArray.java index 017641a8953..84a0652b3e6 100644 --- a/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/CharBlockArray.java +++ b/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/writercache/CharBlockArray.java @@ -65,6 +65,9 @@ class CharBlockArray implements Appendable, Serializable, CharSequence { } private void addBlock() { + if (blockSize * (long) (blocks.size() + 1) > Integer.MAX_VALUE) { + throw new IllegalStateException("cannot store more than 2 GB in CharBlockArray"); + } this.current = new Block(this.blockSize); this.blocks.add(this.current); } diff --git a/lucene/facet/src/test/org/apache/lucene/facet/TestLongValueFacetCounts.java b/lucene/facet/src/test/org/apache/lucene/facet/TestLongValueFacetCounts.java new file mode 100644 index 00000000000..18808c2cdb1 --- /dev/null +++ b/lucene/facet/src/test/org/apache/lucene/facet/TestLongValueFacetCounts.java @@ -0,0 +1,543 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.facet; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.lucene.document.Document; +import org.apache.lucene.document.IntPoint; +import org.apache.lucene.document.NumericDocValuesField; +import org.apache.lucene.document.SortedNumericDocValuesField; +import org.apache.lucene.facet.FacetResult; +import org.apache.lucene.facet.Facets; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.index.IndexReader; +import org.apache.lucene.index.RandomIndexWriter; +import org.apache.lucene.search.IndexSearcher; +import org.apache.lucene.search.LongValuesSource; +import org.apache.lucene.search.MatchAllDocsQuery; +import org.apache.lucene.store.Directory; +import org.apache.lucene.util.LuceneTestCase; +import org.apache.lucene.util.TestUtil; + +/** Tests long value facets. */ +public class TestLongValueFacetCounts extends LuceneTestCase { + + public void testBasic() throws Exception { + Directory d = newDirectory(); + RandomIndexWriter w = new RandomIndexWriter(random(), d); + for (long l = 0; l < 100; l++) { + Document doc = new Document(); + doc.add(new NumericDocValuesField("field", l % 5)); + w.addDocument(doc); + } + + // Also add Long.MAX_VALUE + Document doc = new Document(); + doc.add(new NumericDocValuesField("field", Long.MAX_VALUE)); + w.addDocument(doc); + + IndexReader r = w.getReader(); + w.close(); + + FacetsCollector fc = new FacetsCollector(); + IndexSearcher s = newSearcher(r); + s.search(new MatchAllDocsQuery(), fc); + + LongValueFacetCounts facets = new LongValueFacetCounts("field", fc, false); + + FacetResult result = facets.getAllChildrenSortByValue(); + assertEquals("dim=field path=[] value=101 childCount=6\n 0 (20)\n 1 (20)\n 2 (20)\n 3 (20)\n " + + "4 (20)\n 9223372036854775807 (1)\n", + result.toString()); + r.close(); + d.close(); + } + + public void testOnlyBigLongs() throws Exception { + Directory d = newDirectory(); + RandomIndexWriter w = new RandomIndexWriter(random(), d); + for (long l = 0; l < 3; l++) { + Document doc = new Document(); + doc.add(new NumericDocValuesField("field", Long.MAX_VALUE - l)); + w.addDocument(doc); + } + + IndexReader r = w.getReader(); + w.close(); + + FacetsCollector fc = new FacetsCollector(); + IndexSearcher s = newSearcher(r); + s.search(new MatchAllDocsQuery(), fc); + + LongValueFacetCounts facets = new LongValueFacetCounts("field", fc, false); + + FacetResult result = facets.getAllChildrenSortByValue(); + assertEquals("dim=field path=[] value=3 childCount=3\n 9223372036854775805 (1)\n " + + "9223372036854775806 (1)\n 9223372036854775807 (1)\n", + result.toString()); + r.close(); + d.close(); + } + + public void testGetAllDims() throws Exception { + Directory d = newDirectory(); + RandomIndexWriter w = new RandomIndexWriter(random(), d); + for (long l = 0; l < 100; l++) { + Document doc = new Document(); + doc.add(new NumericDocValuesField("field", l % 5)); + w.addDocument(doc); + } + + // Also add Long.MAX_VALUE + Document doc = new Document(); + doc.add(new NumericDocValuesField("field", Long.MAX_VALUE)); + w.addDocument(doc); + + IndexReader r = w.getReader(); + w.close(); + + FacetsCollector fc = new FacetsCollector(); + IndexSearcher s = newSearcher(r); + s.search(new MatchAllDocsQuery(), fc); + + Facets facets = new LongValueFacetCounts("field", fc, false); + + List result = facets.getAllDims(10); + assertEquals(1, result.size()); + 
assertEquals("dim=field path=[] value=101 childCount=6\n 0 (20)\n 1 (20)\n 2 (20)\n " + + "3 (20)\n 4 (20)\n 9223372036854775807 (1)\n", + result.get(0).toString()); + r.close(); + d.close(); + } + + public void testRandom() throws Exception { + Directory dir = newDirectory(); + RandomIndexWriter w = new RandomIndexWriter(random(), dir); + + int valueCount = atLeast(1000); + double missingChance = random().nextDouble(); + long maxValue; + if (random().nextBoolean()) { + maxValue = random().nextLong() & Long.MAX_VALUE; + } else { + maxValue = random().nextInt(1000); + } + if (VERBOSE) { + System.out.println("TEST: valueCount=" + valueCount + " valueRange=-" + maxValue + + "-" + maxValue + " missingChance=" + missingChance); + } + Long[] values = new Long[valueCount]; + int missingCount = 0; + for (int i = 0; i < valueCount; i++) { + Document doc = new Document(); + doc.add(new IntPoint("id", i)); + if (random().nextDouble() > missingChance) { + long value = TestUtil.nextLong(random(), -maxValue, maxValue); + doc.add(new NumericDocValuesField("field", value)); + values[i] = value; + } else { + missingCount++; + } + w.addDocument(doc); + } + + IndexReader r = w.getReader(); + w.close(); + + IndexSearcher s = newSearcher(r); + + int iters = atLeast(100); + for (int iter = 0; iter < iters; iter++) { + FacetsCollector fc = new FacetsCollector(); + if (VERBOSE) { + System.out.println("\nTEST: iter=" + iter); + System.out.println(" test all docs"); + } + + // all docs + Map expected = new HashMap<>(); + int expectedChildCount = 0; + for (int i = 0; i < valueCount; i++) { + if (values[i] != null) { + Integer curCount = expected.get(values[i]); + if (curCount == null) { + curCount = 0; + expectedChildCount++; + } + expected.put(values[i], curCount + 1); + } + } + + List> expectedCounts = new ArrayList<>(expected.entrySet()); + + // sort by value + Collections.sort(expectedCounts, + (a, b) -> (Long.compare(a.getKey(), b.getKey()))); + + LongValueFacetCounts facetCounts; + if (random().nextBoolean()) { + s.search(new MatchAllDocsQuery(), fc); + if (random().nextBoolean()) { + if (VERBOSE) { + System.out.println(" use value source"); + } + facetCounts = new LongValueFacetCounts("field", LongValuesSource.fromLongField("field"), fc); + } else { + if (VERBOSE) { + System.out.println(" use doc values"); + } + facetCounts = new LongValueFacetCounts("field", fc, false); + } + } else { + // optimized count all: + if (random().nextBoolean()) { + if (VERBOSE) { + System.out.println(" count all value source"); + } + facetCounts = new LongValueFacetCounts("field", LongValuesSource.fromLongField("field"), r); + } else { + if (VERBOSE) { + System.out.println(" count all doc values"); + } + facetCounts = new LongValueFacetCounts("field", r, false); + } + } + + FacetResult actual = facetCounts.getAllChildrenSortByValue(); + assertSame("all docs, sort facets by value", expectedCounts, expectedChildCount, + valueCount - missingCount, actual, Integer.MAX_VALUE); + + // sort by count + Collections.sort(expectedCounts, + (a, b) -> { + int cmp = -Integer.compare(a.getValue(), b.getValue()); + if (cmp == 0) { + // tie break by value + cmp = Long.compare(a.getKey(), b.getKey()); + } + return cmp; + }); + int topN; + if (random().nextBoolean()) { + topN = valueCount; + } else { + topN = random().nextInt(valueCount); + } + if (VERBOSE) { + System.out.println(" topN=" + topN); + } + actual = facetCounts.getTopChildrenSortByCount(topN); + assertSame("all docs, sort facets by count", expectedCounts, expectedChildCount, valueCount 
- missingCount, actual, topN); + + // subset of docs + int minId = random().nextInt(valueCount); + int maxId = random().nextInt(valueCount); + if (minId > maxId) { + int tmp = minId; + minId = maxId; + maxId = tmp; + } + if (VERBOSE) { + System.out.println(" test id range " + minId + "-" + maxId); + } + + fc = new FacetsCollector(); + s.search(IntPoint.newRangeQuery("id", minId, maxId), fc); + if (random().nextBoolean()) { + if (VERBOSE) { + System.out.println(" use doc values"); + } + facetCounts = new LongValueFacetCounts("field", fc, false); + } else { + if (VERBOSE) { + System.out.println(" use value source"); + } + facetCounts = new LongValueFacetCounts("field", LongValuesSource.fromLongField("field"), fc); + } + + expected = new HashMap<>(); + expectedChildCount = 0; + int totCount = 0; + for (int i = minId; i <= maxId; i++) { + if (values[i] != null) { + totCount++; + Integer curCount = expected.get(values[i]); + if (curCount == null) { + expectedChildCount++; + curCount = 0; + } + expected.put(values[i], curCount + 1); + } + } + expectedCounts = new ArrayList<>(expected.entrySet()); + + // sort by value + Collections.sort(expectedCounts, + (a, b) -> (Long.compare(a.getKey(), b.getKey()))); + actual = facetCounts.getAllChildrenSortByValue(); + assertSame("id " + minId + "-" + maxId + ", sort facets by value", expectedCounts, + expectedChildCount, totCount, actual, Integer.MAX_VALUE); + + // sort by count + Collections.sort(expectedCounts, + (a, b) -> { + int cmp = -Integer.compare(a.getValue(), b.getValue()); + if (cmp == 0) { + // tie break by value + cmp = Long.compare(a.getKey(), b.getKey()); + } + return cmp; + }); + if (random().nextBoolean()) { + topN = valueCount; + } else { + topN = random().nextInt(valueCount); + } + actual = facetCounts.getTopChildrenSortByCount(topN); + assertSame("id " + minId + "-" + maxId + ", sort facets by count", expectedCounts, expectedChildCount, totCount, actual, topN); + } + r.close(); + dir.close(); + } + + public void testRandomMultiValued() throws Exception { + Directory dir = newDirectory(); + RandomIndexWriter w = new RandomIndexWriter(random(), dir); + + int valueCount = atLeast(1000); + double missingChance = random().nextDouble(); + + // sometimes exercise codec optimizations when a claimed multi valued field is in fact single valued: + boolean allSingleValued = rarely(); + long maxValue; + + if (random().nextBoolean()) { + maxValue = random().nextLong() & Long.MAX_VALUE; + } else { + maxValue = random().nextInt(1000); + } + if (VERBOSE) { + System.out.println("TEST: valueCount=" + valueCount + " valueRange=-" + maxValue + + "-" + maxValue + " missingChance=" + missingChance + " allSingleValued=" + allSingleValued); + } + + long[][] values = new long[valueCount][]; + int missingCount = 0; + for (int i = 0; i < valueCount; i++) { + Document doc = new Document(); + doc.add(new IntPoint("id", i)); + if (random().nextDouble() > missingChance) { + if (allSingleValued) { + values[i] = new long[1]; + } else { + values[i] = new long[TestUtil.nextInt(random(), 1, 5)]; + } + + for (int j = 0; j < values[i].length; j++) { + long value = TestUtil.nextLong(random(), -maxValue, maxValue); + values[i][j] = value; + doc.add(new SortedNumericDocValuesField("field", value)); + } + + if (VERBOSE) { + System.out.println(" doc=" + i + " values=" + Arrays.toString(values[i])); + } + + } else { + missingCount++; + + if (VERBOSE) { + System.out.println(" doc=" + i + " missing values"); + } + } + w.addDocument(doc); + } + + IndexReader r = w.getReader(); + 
w.close(); + + IndexSearcher s = newSearcher(r); + + int iters = atLeast(100); + for (int iter = 0; iter < iters; iter++) { + FacetsCollector fc = new FacetsCollector(); + if (VERBOSE) { + System.out.println("\nTEST: iter=" + iter); + System.out.println(" test all docs"); + } + + // all docs + Map expected = new HashMap<>(); + int expectedChildCount = 0; + int expectedTotalCount = 0; + for (int i = 0; i < valueCount; i++) { + if (values[i] != null) { + for (long value : values[i]) { + Integer curCount = expected.get(value); + if (curCount == null) { + curCount = 0; + expectedChildCount++; + } + expected.put(value, curCount + 1); + expectedTotalCount++; + } + } + } + + List> expectedCounts = new ArrayList<>(expected.entrySet()); + + // sort by value + Collections.sort(expectedCounts, + (a, b) -> (Long.compare(a.getKey(), b.getKey()))); + + LongValueFacetCounts facetCounts; + if (random().nextBoolean()) { + s.search(new MatchAllDocsQuery(), fc); + if (VERBOSE) { + System.out.println(" use doc values"); + } + facetCounts = new LongValueFacetCounts("field", fc, true); + } else { + // optimized count all: + if (VERBOSE) { + System.out.println(" count all doc values"); + } + facetCounts = new LongValueFacetCounts("field", r, true); + } + + FacetResult actual = facetCounts.getAllChildrenSortByValue(); + assertSame("all docs, sort facets by value", expectedCounts, expectedChildCount, + expectedTotalCount, actual, Integer.MAX_VALUE); + + // sort by count + Collections.sort(expectedCounts, + (a, b) -> { + int cmp = -Integer.compare(a.getValue(), b.getValue()); + if (cmp == 0) { + // tie break by value + cmp = Long.compare(a.getKey(), b.getKey()); + } + return cmp; + }); + int topN; + if (random().nextBoolean()) { + topN = valueCount; + } else { + topN = random().nextInt(valueCount); + } + if (VERBOSE) { + System.out.println(" topN=" + topN); + } + actual = facetCounts.getTopChildrenSortByCount(topN); + assertSame("all docs, sort facets by count", expectedCounts, expectedChildCount, expectedTotalCount, actual, topN); + + // subset of docs + int minId = random().nextInt(valueCount); + int maxId = random().nextInt(valueCount); + if (minId > maxId) { + int tmp = minId; + minId = maxId; + maxId = tmp; + } + if (VERBOSE) { + System.out.println(" test id range " + minId + "-" + maxId); + } + + fc = new FacetsCollector(); + s.search(IntPoint.newRangeQuery("id", minId, maxId), fc); + // cannot use value source here because we are multi valued + facetCounts = new LongValueFacetCounts("field", fc, true); + + expected = new HashMap<>(); + expectedChildCount = 0; + int totCount = 0; + for (int i = minId; i <= maxId; i++) { + if (values[i] != null) { + for (long value : values[i]) { + totCount++; + Integer curCount = expected.get(value); + if (curCount == null) { + expectedChildCount++; + curCount = 0; + } + expected.put(value, curCount + 1); + } + } + } + expectedCounts = new ArrayList<>(expected.entrySet()); + + // sort by value + Collections.sort(expectedCounts, + (a, b) -> (Long.compare(a.getKey(), b.getKey()))); + actual = facetCounts.getAllChildrenSortByValue(); + assertSame("id " + minId + "-" + maxId + ", sort facets by value", expectedCounts, + expectedChildCount, totCount, actual, Integer.MAX_VALUE); + + // sort by count + Collections.sort(expectedCounts, + (a, b) -> { + int cmp = -Integer.compare(a.getValue(), b.getValue()); + if (cmp == 0) { + // tie break by value + cmp = Long.compare(a.getKey(), b.getKey()); + } + return cmp; + }); + if (random().nextBoolean()) { + topN = valueCount; + } else { + 
topN = random().nextInt(valueCount); + } + actual = facetCounts.getTopChildrenSortByCount(topN); + assertSame("id " + minId + "-" + maxId + ", sort facets by count", expectedCounts, expectedChildCount, totCount, actual, topN); + } + r.close(); + dir.close(); + } + + private static void assertSame(String desc, List> expectedCounts, + int expectedChildCount, int expectedTotalCount, FacetResult actual, int topN) { + int expectedTopN = Math.min(topN, expectedCounts.size()); + if (VERBOSE) { + System.out.println(" expected topN=" + expectedTopN); + for (int i = 0; i < expectedTopN; i++) { + System.out.println(" " + i + ": value=" + expectedCounts.get(i).getKey() + " count=" + expectedCounts.get(i).getValue()); + } + System.out.println(" actual topN=" + actual.labelValues.length); + for (int i = 0; i < actual.labelValues.length; i++) { + System.out.println(" " + i + ": value=" + actual.labelValues[i].label + " count=" + actual.labelValues[i].value); + } + } + assertEquals(desc + ": topN", expectedTopN, actual.labelValues.length); + assertEquals(desc + ": childCount", expectedChildCount, actual.childCount); + assertEquals(desc + ": totCount", expectedTotalCount, actual.value.intValue()); + assertTrue(actual.labelValues.length <= topN); + + for (int i = 0; i < expectedTopN; i++) { + assertEquals(desc + ": label[" + i + "]", Long.toString(expectedCounts.get(i).getKey()), actual.labelValues[i].label); + assertEquals(desc + ": counts[" + i + "]", expectedCounts.get(i).getValue().intValue(), actual.labelValues[i].value.intValue()); + } + } +} diff --git a/lucene/facet/src/test/org/apache/lucene/facet/range/TestRangeFacetCounts.java b/lucene/facet/src/test/org/apache/lucene/facet/range/TestRangeFacetCounts.java index 5a80c3b1a70..233738ceff0 100644 --- a/lucene/facet/src/test/org/apache/lucene/facet/range/TestRangeFacetCounts.java +++ b/lucene/facet/src/test/org/apache/lucene/facet/range/TestRangeFacetCounts.java @@ -18,6 +18,7 @@ package org.apache.lucene.facet.range; import java.io.IOException; import java.util.HashMap; +import java.util.List; import java.util.Map; import java.util.Set; import java.util.concurrent.atomic.AtomicBoolean; @@ -28,8 +29,8 @@ import org.apache.lucene.document.DoublePoint; import org.apache.lucene.document.LongPoint; import org.apache.lucene.document.NumericDocValuesField; import org.apache.lucene.facet.DrillDownQuery; -import org.apache.lucene.facet.DrillSideways; import org.apache.lucene.facet.DrillSideways.DrillSidewaysResult; +import org.apache.lucene.facet.DrillSideways; import org.apache.lucene.facet.FacetField; import org.apache.lucene.facet.FacetResult; import org.apache.lucene.facet.FacetTestCase; @@ -99,6 +100,44 @@ public class TestRangeFacetCounts extends FacetTestCase { d.close(); } + public void testLongGetAllDims() throws Exception { + Directory d = newDirectory(); + RandomIndexWriter w = new RandomIndexWriter(random(), d); + Document doc = new Document(); + NumericDocValuesField field = new NumericDocValuesField("field", 0L); + doc.add(field); + for(long l=0;l<100;l++) { + field.setLongValue(l); + w.addDocument(doc); + } + + // Also add Long.MAX_VALUE + field.setLongValue(Long.MAX_VALUE); + w.addDocument(doc); + + IndexReader r = w.getReader(); + w.close(); + + FacetsCollector fc = new FacetsCollector(); + IndexSearcher s = newSearcher(r); + s.search(new MatchAllDocsQuery(), fc); + + Facets facets = new LongRangeFacetCounts("field", fc, + new LongRange("less than 10", 0L, true, 10L, false), + new LongRange("less than or equal to 10", 0L, true, 10L, true), + 
new LongRange("over 90", 90L, false, 100L, false), + new LongRange("90 or above", 90L, true, 100L, false), + new LongRange("over 1000", 1000L, false, Long.MAX_VALUE, true)); + + List result = facets.getAllDims(10); + assertEquals(1, result.size()); + assertEquals("dim=field path=[] value=22 childCount=5\n less than 10 (10)\n less than or equal to 10 (11)\n over 90 (9)\n 90 or above (10)\n over 1000 (1)\n", + result.get(0).toString()); + + r.close(); + d.close(); + } + public void testUselessRange() { expectThrows(IllegalArgumentException.class, () -> { new LongRange("useless", 7, true, 6, true); diff --git a/lucene/facet/src/test/org/apache/lucene/facet/taxonomy/writercache/Test2GBCharBlockArray.java b/lucene/facet/src/test/org/apache/lucene/facet/taxonomy/writercache/Test2GBCharBlockArray.java new file mode 100644 index 00000000000..ffd232ffe29 --- /dev/null +++ b/lucene/facet/src/test/org/apache/lucene/facet/taxonomy/writercache/Test2GBCharBlockArray.java @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.facet.taxonomy.writercache; + +import org.apache.lucene.util.LuceneTestCase.Monster; +import org.apache.lucene.util.LuceneTestCase; +import org.apache.lucene.util.TestUtil; + +@Monster("uses lots of space and takes a few minutes") +public class Test2GBCharBlockArray extends LuceneTestCase { + + public void test2GBChars() throws Exception { + int blockSize = 32768; + CharBlockArray array = new CharBlockArray(blockSize); + + int size = TestUtil.nextInt(random(), 20000, 40000); + + char[] chars = new char[size]; + int count = 0; + while (true) { + count++; + try { + array.append(chars, 0, size); + } catch (IllegalStateException ise) { + assertTrue(count * (long) size + blockSize > Integer.MAX_VALUE); + break; + } + assertFalse("appended " + (count * (long) size - Integer.MAX_VALUE) + " characters beyond Integer.MAX_VALUE!", + count * (long) size > Integer.MAX_VALUE); + } + } +} diff --git a/lucene/facet/src/test/org/apache/lucene/facet/taxonomy/writercache/TestCharBlockArray.java b/lucene/facet/src/test/org/apache/lucene/facet/taxonomy/writercache/TestCharBlockArray.java index 4914156ca40..f03a54a6f1d 100644 --- a/lucene/facet/src/test/org/apache/lucene/facet/taxonomy/writercache/TestCharBlockArray.java +++ b/lucene/facet/src/test/org/apache/lucene/facet/taxonomy/writercache/TestCharBlockArray.java @@ -27,11 +27,9 @@ import java.nio.file.Path; import org.apache.lucene.facet.FacetTestCase; -import org.junit.Test; - public class TestCharBlockArray extends FacetTestCase { - @Test public void testArray() throws Exception { + public void testArray() throws Exception { CharBlockArray array = new CharBlockArray(); StringBuilder builder = new StringBuilder(); diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt index d3a95a2d0b5..6b342fc6437 100644 --- a/solr/CHANGES.txt +++ b/solr/CHANGES.txt @@ -325,6 +325,14 @@ Upgrading from Solr 6.x detect Java 9 correctly and setup Garbage Collector logging. If the configuration file contains logging options that are no longer supported with Java 9, startup will fail. +* SOLR-10307: If starting Jetty without the Solr start script, you must now pass keystore and truststore + passwords via the env variables SOLR_SSL_KEY_STORE_PASSWORD and SOLR_SSL_TRUST_STORE_PASSWORD rather + than system properties. + +* SOLR-10379: ManagedSynonymFilterFactory has been deprecated in favor of ManagedSynonymGraphFilterFactory. + +* SOLR-10503: CurrencyField has been deprecated in favor of new CurrencyFieldType. + New Features ---------------------- * SOLR-9857, SOLR-9858: Collect aggregated metrics from nodes and shard leaders in overseer. (ab) @@ -406,6 +414,59 @@ New Features * SOLR-11173 TermsComponent support for Points fields. (yonik) +* SOLR-10849: MoreLikeThisComponent should expose setMaxDocFreqPct (maxDoc + frequency percentage). (Dawid Weiss) + +* SOLR-10307: Allow Passing SSL passwords through environment variables. (Mano Kovacs, Michael Suzuki via Mark Miller) + +* SOLR-10379: Add ManagedSynonymGraphFilterFactory, deprecate ManagedSynonymFilterFactory. (Steve Rowe) + +* SOLR-10479: Adds support for HttpShardHandlerFactory.loadBalancerRequests(MinimumAbsolute|MaximumFraction) + configuration. (Ramsey Haddad, Daniel Collins, Christine Poerschke) + +* SOLR-3702: concat(...) 
function query (Andrey Kudryavtsev via Mikhail Khludnev) + +* SOLR-10767: Add movingAvg Stream Evaluator (Joel Bernstein) + +* SOLR-10813: Add arraySort Stream Evaluator (Joel Bernstein) + +* SOLR-10696: Add cumulative probability function (Joel Bernstein) + +* SOLR-10765: Add anova Stream Evaluator (Joel Bernstein) + +* SOLR-10754: Add hist Stream Evaluator (Joel Bernstein) + +* SOLR-10753: Add array Stream Evaluator (Joel Bernstein) + +* SOLR-10747: Allow /stream handler to execute Stream Evaluators directly (Joel Bernstein) + +* SOLR-10743: Add sequence StreamEvaluator (Joel Bernstein) + +* SOLR-10684: Add finddelay Stream Evaluator (Joel Bernstein) + +* SOLR-10731: Add knn Streaming Expression (Joel Bernstein) + +* SOLR-10724: Add describe Stream Evaluator (Joel Bernstein) + +* SOLR-10693: Add copyOfRange Stream Evaluator (Joel Bernstein) + +* SOLR-10623: Add sql Streaming Expression (Joel Bernstein) + +* SOLR-10661: Add copyOf Stream Evaluator (Joel Bernstein) + +* SOLR-10663: Add distance Stream Evaluator (Joel Bernstein) + +* SOLR-10664: Add scale Stream Evaluator (Joel Bernstein) + +* SOLR-10666: Add rank transformation Stream Evaluator (Joel Bernstein) + +* SOLR-10662: Add length Stream Evaluator (Joel Bernstein) + +* SOLR-10660: Add reverse Stream Evaluator (Joel Bernstein) + +* SOLR-9910: Add solr/solr.cmd parameter to append jetty parameters to the start script. + (Mano Kovacs via Mark Miller) + Bug Fixes ---------------------- * SOLR-9262: Connection and read timeouts are being ignored by UpdateShardHandler after SOLR-4509. @@ -504,6 +565,49 @@ Bug Fixes * SOLR-8689: Fix bin/solr.cmd so it can run properly on Java 9 (Uwe Schindler, hossman) +* SOLR-10723 JSON Facet API: resize() implemented incorrectly for CountSlotAcc, HllAgg.NumericAcc + resulting in exceptions when using a hashing faceting method and sorting by hll(numeric_field). + (yonik) + +* SOLR-10719: Creating a core.properties fails if the parent of core.properties is a symlinked dierctory + (Erick Erickson) + +* SOLR-10360: Solr HDFS snapshot export fails due to FileNotFoundException error when using MR1 instead of + yarn. (Hrishikesh via Mark Miller) + +* SOLR-10137: Ensure that ConfigSets created via API are mutable. (Hrishikesh via Mark Miller) + +* SOLR-10829: Fixed IndexSchema to enforce that uniqueKey can not be Points based for correctness (hossman) + +* SOLR-10836: The query parsers igain, significantTerms, and tlogit (used by streaming expressions by + the same name) might throw a NullPointerException if the referenced field had no indexed data in some + shards. The fix included an optimization to use Solr's cached AtomicReader instead of re-calculating. + (David Smiley) + +* SOLR-10715: /v2/ should not be an alias for /v2/collections (Cao Manh Dat) + +* SOLR-10835: Add support for point fields in Export Handler (Tomás Fernández Löbbe) + +* SOLR-10704: REPLACENODE may cause data loss when replicationFactor is 1. (ab, shalin) + +* SOLR-10833: Point numeric fields should throw SolrException(BAD_REQUEST) for malformed numbers in queries. + Trie numeric fields should throw SolrException(BAD_REQUEST) for malformed docValues range queries. 
+ (hossman, Tomás Fernández Löbbe) + +* SOLR-10832: Fixed VersionInfo.getMaxVersionFromIndex when using PointsField with indexed="true" (hossman) + +* SOLR-10763: Admin UI replication tab sometimes empty when failed replications (janhoy, Bojan Vitnik) + +* SOLR-10824: fix NPE ExactSharedStatsCache, fixing maxdocs skew for terms which are absent at one of shards +when using one of Exact*StatsCache (Mikhail Khludnev) + +* SOLR-10963: Fix example json in MultipleAdditiveTreesModel javadocs. + (Stefan Langenmaier via Christine Poerschke) + +* SOLR-10914: RecoveryStrategy's sendPrepRecoveryCmd can get stuck for 5 minutes if leader is unloaded. (shalin) + +* SOLR-11198: downconfig downloads empty file as folder (Erick Erickson) + Optimizations ---------------------- @@ -527,6 +631,13 @@ Optimizations * SOLR-11070: Make docValues range queries behave the same as Trie/Point fields for Double/Float Infinity cases (Tomás Fernández Löbbe, Andrey Kudryavtsev) +* SOLR-10634: JSON Facet API: When a field/terms facet will retrieve all buckets (i.e. limit:-1) + and there are no nested facets, aggregations are computed in the first collection phase + so that the second phase which would normally involve calculating the domain for the bucket + can be skipped entirely, leading to large performance improvements. (yonik) + +* SOLR-10722: Speed up Solr's use of the UnifiedHighlighter be re-using FieldInfos. (David Smiley) + Other Changes ---------------------- * SOLR-10236: Removed FieldType.getNumericType(). Use getNumberType() instead. (Tomás Fernández Löbbe) @@ -715,155 +826,15 @@ Other Changes * SOLR-11183: V2 APIs are now available at /api endpoint. (Ishan Chattopadhyaya) -================== 6.7.0 ================== - -Consult the LUCENE_CHANGES.txt file for additional, low level, changes in this release. - -Versions of Major Components ---------------------- -Apache Tika 1.13 -Carrot2 3.15.0 -Velocity 1.7 and Velocity Tools 2.0 -Apache UIMA 2.3.1 -Apache ZooKeeper 3.4.10 -Jetty 9.3.14.v20161028 - -Detailed Change List ----------------------- - -Upgrade Notes ----------------------- - -* SOLR-10307: If starting Jetty without the Solr start script, you must now pass keystore and truststore - passwords via the env variables SOLR_SSL_KEY_STORE_PASSWORD and SOLR_SSL_TRUST_STORE_PASSWORD rather - than system properties. - -* SOLR-10379: ManagedSynonymFilterFactory has been deprecated in favor of ManagedSynonymGraphFilterFactory. - -* SOLR-10503: CurrencyField has been deprecated in favor of new CurrencyFieldType. - -New Features ----------------------- - -* SOLR-10849: MoreLikeThisComponent should expose setMaxDocFreqPct (maxDoc - frequency percentage). (Dawid Weiss) - -* SOLR-10307: Allow Passing SSL passwords through environment variables. (Mano Kovacs, Michael Suzuki via Mark Miller) - -* SOLR-10379: Add ManagedSynonymGraphFilterFactory, deprecate ManagedSynonymFilterFactory. (Steve Rowe) - -* SOLR-10479: Adds support for HttpShardHandlerFactory.loadBalancerRequests(MinimumAbsolute|MaximumFraction) - configuration. (Ramsey Haddad, Daniel Collins, Christine Poerschke) - -* SOLR-3702: concat(...) 
function query (Andrey Kudryavtsev via Mikhail Khludnev) - -* SOLR-10767: Add movingAvg Stream Evaluator (Joel Bernstein) - -* SOLR-10813: Add arraySort Stream Evaluator (Joel Bernstein) - -* SOLR-10696: Add cumulative probability function (Joel Bernstein) - -* SOLR-10765: Add anova Stream Evaluator (Joel Bernstein) - -* SOLR-10754: Add hist Stream Evaluator (Joel Bernstein) - -* SOLR-10753: Add array Stream Evaluator (Joel Bernstein) - -* SOLR-10747: Allow /stream handler to execute Stream Evaluators directly (Joel Bernstein) - -* SOLR-10743: Add sequence StreamEvaluator (Joel Bernstein) - -* SOLR-10684: Add finddelay Stream Evaluator (Joel Bernstein) - -* SOLR-10731: Add knn Streaming Expression (Joel Bernstein) - -* SOLR-10724: Add describe Stream Evaluator (Joel Bernstein) - -* SOLR-10693: Add copyOfRange Stream Evaluator (Joel Bernstein) - -* SOLR-10623: Add sql Streaming Expression (Joel Bernstein) - -* SOLR-10661: Add copyOf Stream Evaluator (Joel Bernstein) - -* SOLR-10663: Add distance Stream Evaluator (Joel Bernstein) - -* SOLR-10664: Add scale Stream Evaluator (Joel Bernstein) - -* SOLR-10666: Add rank transformation Stream Evaluator (Joel Bernstein) - -* SOLR-10662: Add length Stream Evaluator (Joel Bernstein) - -* SOLR-10660: Add reverse Stream Evaluator (Joel Bernstein) - -* SOLR-9910: Add solr/solr.cmd parameter to append jetty parameters to the start script. - (Mano Kovacs via Mark Miller) - -Bug Fixes ----------------------- -* SOLR-10723 JSON Facet API: resize() implemented incorrectly for CountSlotAcc, HllAgg.NumericAcc - resulting in exceptions when using a hashing faceting method and sorting by hll(numeric_field). - (yonik) - -* SOLR-10719: Creating a core.properties fails if the parent of core.properties is a symlinked dierctory - (Erick Erickson) - -* SOLR-10360: Solr HDFS snapshot export fails due to FileNotFoundException error when using MR1 instead of - yarn. (Hrishikesh via Mark Miller) - -* SOLR-10137: Ensure that ConfigSets created via API are mutable. (Hrishikesh via Mark Miller) - -* SOLR-10829: Fixed IndexSchema to enforce that uniqueKey can not be Points based for correctness (hossman) - -* SOLR-10836: The query parsers igain, significantTerms, and tlogit (used by streaming expressions by - the same name) might throw a NullPointerException if the referenced field had no indexed data in some - shards. The fix included an optimization to use Solr's cached AtomicReader instead of re-calculating. - (David Smiley) - -* SOLR-10715: /v2/ should not be an alias for /v2/collections (Cao Manh Dat) - -* SOLR-10835: Add support for point fields in Export Handler (Tomás Fernández Löbbe) - -* SOLR-10704: REPLACENODE may cause data loss when replicationFactor is 1. (ab, shalin) - -* SOLR-10833: Point numeric fields should throw SolrException(BAD_REQUEST) for malformed numbers in queries. - Trie numeric fields should throw SolrException(BAD_REQUEST) for malformed docValues range queries. - (hossman, Tomás Fernández Löbbe) - -* SOLR-10832: Fixed VersionInfo.getMaxVersionFromIndex when using PointsField with indexed="true" (hossman) - -* SOLR-10763: Admin UI replication tab sometimes empty when failed replications (janhoy, Bojan Vitnik) - -* SOLR-10824: fix NPE ExactSharedStatsCache, fixing maxdocs skew for terms which are absent at one of shards -when using one of Exact*StatsCache (Mikhail Khludnev) - -* SOLR-10963: Fix example json in MultipleAdditiveTreesModel javadocs. 
- (Stefan Langenmaier via Christine Poerschke) - -* SOLR-10914: RecoveryStrategy's sendPrepRecoveryCmd can get stuck for 5 minutes if leader is unloaded. (shalin) - -* SOLR-11198: downconfig downloads empty file as folder (Erick Erickson) - -Optimizations ----------------------- -* SOLR-10634: JSON Facet API: When a field/terms facet will retrieve all buckets (i.e. limit:-1) - and there are no nested facets, aggregations are computed in the first collection phase - so that the second phase which would normally involve calculating the domain for the bucket - can be skipped entirely, leading to large performance improvements. (yonik) - -* SOLR-10722: Speed up Solr's use of the UnifiedHighlighter be re-using FieldInfos. (David Smiley) - -Other Changes ----------------------- - * SOLR-10617: JDBCStream accepts columns of type TIME, DATE & TIMESTAMP as well as CLOBs and decimal numeric types (James Dyer) * SOLR-10400: Replace (instanceof TrieFooField || instanceof FooPointField) constructs with FieldType.getNumberType() or SchemaField.getSortField() where appropriate. (hossman, Steve Rowe) - -* SOLR-10438: Assign explicit useDocValuesAsStored values to all points field types in + +* SOLR-10438: Assign explicit useDocValuesAsStored values to all points field types in schema-point.xml/TestPointFields. (hossman, Steve Rowe) - + * LUCENE-7705: Allow CharTokenizer-derived tokenizers and KeywordTokenizer to configure the max token length. (Amrit Sarkar via Erick Erickson) @@ -885,9 +856,9 @@ Other Changes * SOLR-10761: Switch trie numeric/date fields to points in data-driven-enabled example and test schemas. (Steve Rowe) -* SOLR-10851: SolrClients should clarify expectations for solrServerUrl parameter (Jason Gerlowski +* SOLR-10851: SolrClients should clarify expectations for solrServerUrl parameter (Jason Gerlowski via Tomás Fernández Löbbe) - + * SOLR-10891: BBoxField should support point-based number sub-fields. (Steve Rowe) * SOLR-10834: Fixed tests and test configs to stop using numeric uniqueKey fields (hossman) @@ -915,7 +886,7 @@ Bug Fixes * SOLR-10908: CloudSolrStream.toExpression incorrectly handles fq clauses (Rohit Singh via Erick Erickson) * SOLR-11177: CoreContainer.load needs to send lazily loaded core descriptors to the proper list rather than send - them all to the transient lists. (Erick Erickson) (note, not in 7.0, is in 7.1) + them all to the transient lists. (Erick Erickson) * SOLR-11122: Creating a core should write a core.properties file first and clean up on failure (Erick Erickson)
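
Usage sketch for the new LongValueFacetCounts (LUCENE-7927), the main feature in this patch: a minimal example distilled from testBasic in TestLongValueFacetCounts above. The constructor LongValueFacetCounts(String, FacetsCollector, boolean) and the getAllChildrenSortByValue() / getTopChildrenSortByCount(int) calls are the API introduced by this change; the "price" field name, the RAMDirectory, and the default IndexWriterConfig are illustrative assumptions only, not taken from the patch.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.facet.FacetResult;
    import org.apache.lucene.facet.FacetsCollector;
    import org.apache.lucene.facet.LongValueFacetCounts;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class LongValueFacetExample {
      public static void main(String[] args) throws Exception {
        // Illustrative setup: any Directory and IndexWriterConfig will do.
        Directory dir = new RAMDirectory();
        IndexWriter w = new IndexWriter(dir, new IndexWriterConfig());
        for (long l = 0; l < 100; l++) {
          Document doc = new Document();
          // Single-valued numeric doc values field; each distinct long value becomes a facet label.
          doc.add(new NumericDocValuesField("price", l % 5));
          w.addDocument(doc);
        }
        w.close();

        IndexReader r = DirectoryReader.open(dir);
        IndexSearcher s = new IndexSearcher(r);

        // Collect the matching docs, then count every unique long value in "price".
        FacetsCollector fc = new FacetsCollector();
        s.search(new MatchAllDocsQuery(), fc);
        LongValueFacetCounts facets = new LongValueFacetCounts("price", fc, false /* multiValued */);

        FacetResult byValue = facets.getAllChildrenSortByValue();  // all values, ascending by value
        FacetResult byCount = facets.getTopChildrenSortByCount(3); // top-N values by count
        System.out.println(byValue);
        System.out.println(byCount);

        r.close();
        dir.close();
      }
    }

For a multi-valued field, index SortedNumericDocValuesField values instead and pass multiValued=true; to facet on a value computed at query time rather than on stored doc values, pass a LongValuesSource (for example LongValuesSource.fromLongField("price")), as the random tests above exercise.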