mirror of https://github.com/apache/lucene.git
LUCENE-7322: Remove CompressionTools.
parent bd9005d562
commit 894a8b275c

lucene/MIGRATE.txt
@@ -1,90 +1,16 @@
# Apache Lucene Migration Guide

## Query.hashCode and Query.equals are now abstract methods (LUCENE-7277)

Any custom query subclasses should redeclare the equivalence relationship according
to the subclass's details. See the code patterns used in existing core Lucene query
classes for details.
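
For instance, a minimal sketch of the usual pattern (MyTermQuery and its term
field are hypothetical, not Lucene classes):

  @Override
  public boolean equals(Object other) {
    return other != null &&
           getClass() == other.getClass() &&
           term.equals(((MyTermQuery) other).term);
  }

  @Override
  public int hashCode() {
    return 31 * getClass().hashCode() + term.hashCode();
  }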

## The way the number of documents is calculated has changed (LUCENE-6711)

The number of documents (numDocs) is used to calculate term specificity (idf) and average document length (avdl).
Prior to LUCENE-6711, collectionStats.maxDoc() was used for these statistics.
Now, collectionStats.docCount() is used whenever possible; if it is not available, maxDoc() is used.

Assume that a collection contains 100 documents, and 50 of them have a "keywords" field.
In this example, maxDoc is 100 while docCount is 50 for the "keywords" field.
The total number of tokens for the "keywords" field is divided by docCount to obtain avdl.
Therefore docCount, which is the total number of documents that have at least one term for the field, is a more precise metric for optional fields.

DefaultSimilarity does not leverage avdl, so this change has a relatively minor effect on its result lists, because the relative idf values of terms remain the same.
However, when combined with other factors such as term frequency, the relative ranking of documents can change.
Some Similarity implementations (such as the ones instantiated with NormalizationH2, and BM25) take avdl into account and show a notable change in their ranked lists, especially on collections of documents with varying lengths, because NormalizationH2 tends to punish documents longer than avdl.
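
As a purely illustrative reading of the two statistics, assuming an open
IndexSearcher and the "keywords" field from the example above:

  CollectionStatistics stats = searcher.collectionStatistics("keywords");
  long maxDoc = stats.maxDoc();      // all documents in the index: 100
  long docCount = stats.docCount();  // documents with at least one "keywords" term: 50
  // avdl as computed since LUCENE-6711: total tokens divided by docCount
  double avdl = (double) stats.sumTotalTermFreq() / docCount;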

## CompressionTools removed (LUCENE-7322)

Per-field compression has been superseded by codec-level compression, which has
the benefit of being able to compress several fields, or even documents at once,
yielding better compression ratios. In case you would still like to compress on
top of the codec, you can do it on the application side by using the utility
classes from the java.util.zip package.
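
If you migrate this way, a rough sketch could look like the following (the
"body" field and surrounding variables are illustrative, not part of any
Lucene API):

  // compress on the application side before storing
  ByteArrayOutputStream bos = new ByteArrayOutputStream();
  try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
    out.write(text.getBytes(StandardCharsets.UTF_8));
  }
  doc.add(new StoredField("body", bos.toByteArray()));

  // decompress after loading the stored document
  BytesRef ref = storedDoc.getBinaryValue("body");
  InflaterInputStream in = new InflaterInputStream(
      new ByteArrayInputStream(ref.bytes, ref.offset, ref.length));
  ByteArrayOutputStream plain = new ByteArrayOutputStream();
  byte[] buf = new byte[1024];
  for (int n = in.read(buf); n != -1; n = in.read(buf)) {
    plain.write(buf, 0, n);
  }
  String restored = new String(plain.toByteArray(), StandardCharsets.UTF_8);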

## FunctionValues.exist() Behavior Changes due to ValueSource bug fixes (LUCENE-5961)

Bugs fixed in several ValueSource functions may result in different behavior in
situations where some documents do not have values for fields wrapped in other
ValueSources. Users who want to preserve the previous behavior may need to wrap
their ValueSources in a "DefFunction" along with a ConstValueSource of "0.0".
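
For example, the wrapping might look like this (DefFunction and
ConstValueSource are in the queries module; myValueSource is a placeholder):

  ValueSource wrapped = new DefFunction(
      Arrays.asList(myValueSource, new ConstValueSource(0.0f)));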

## Removal of Filter and FilteredQuery (LUCENE-6301, LUCENE-6583)

Filter and FilteredQuery have been removed. Regular queries can be used instead
of filters, as they have been optimized for the filtering case. To get behaviour
similar to FilteredQuery, you can construct a BooleanQuery with one MUST clause
for the query and one FILTER clause for the filter.
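
For instance (q standing for the former query and f for the former filter,
both plain Queries here):

  Query filtered = new BooleanQuery.Builder()
      .add(q, Occur.MUST)     // matches and contributes to the score
      .add(f, Occur.FILTER)   // must match, but is not scored
      .build();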

## PhraseQuery and BooleanQuery made immutable (LUCENE-6531, LUCENE-6570)

PhraseQuery and BooleanQuery are now immutable and have a builder API to help
construct them. For instance, a BooleanQuery that used to be constructed like
this:

  BooleanQuery bq = new BooleanQuery();
  bq.add(q1, Occur.SHOULD);
  bq.add(q2, Occur.SHOULD);
  bq.add(q3, Occur.MUST);
  bq.setMinimumNumberShouldMatch(1);

can now be constructed this way using its builder:

  BooleanQuery bq = new BooleanQuery.Builder()
      .add(q1, Occur.SHOULD)
      .add(q2, Occur.SHOULD)
      .add(q3, Occur.MUST)
      .setMinimumNumberShouldMatch(1)
      .build();

## AttributeImpl now requires that reflectWith() is implemented (LUCENE-6651)

AttributeImpl removed the default, reflection-based implementation of
reflectWith(AttributeReflector). The method was made abstract. If you have
implemented your own attribute, make sure to add the required method signature.
See the Javadocs for an example.
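
For a hypothetical attribute implementation exposing a single "level"
property, the required method might look like (names are illustrative):

  @Override
  public void reflectWith(AttributeReflector reflector) {
    reflector.reflect(MyLevelAttribute.class, "level", level);
  }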

## Query.setBoost() and Query.clone() are removed (LUCENE-6590)

Query.setBoost has been removed. In order to apply a boost to a Query, you now
need to wrap it inside a BoostQuery. For instance,

  Query q = ...;
  float boost = ...;
  q = new BoostQuery(q, boost);

would be equivalent to the following code with the old setBoost API:

  Query q = ...;
  float boost = ...;
  q.setBoost(q.getBoost() * boost);

## PointValues replaces NumericField (LUCENE-6917)

PointValues provides faster indexing and searching, a smaller
index size, and less heap used at search time. See org.apache.lucene.index.PointValues
for an introduction.

Legacy numeric encodings from previous versions of Lucene are
deprecated as LegacyIntField, LegacyFloatField, LegacyLongField, and LegacyDoubleField,
and can be searched with LegacyNumericRangeQuery.
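
As a small illustration of the point API (the "year" field is hypothetical):

  Document doc = new Document();
  doc.add(new IntPoint("year", 1999));      // indexed as a point for fast range queries
  doc.add(new StoredField("year", 1999));   // stored separately if retrieval is needed

  Query q = IntPoint.newRangeQuery("year", 1990, 2000);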

lucene/core/src/java/org/apache/lucene/document/CompressionTools.java
@@ -1,150 +0,0 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.document;

import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.DataFormatException;
import java.io.ByteArrayOutputStream;

import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.UnicodeUtil;

/** Simple utility class providing static methods to
 *  compress and decompress binary data for stored fields.
 *  This class uses the java.util.zip.Deflater and Inflater
 *  classes to compress and decompress.
 */
public class CompressionTools {

  // Export only static methods
  private CompressionTools() {}

  /** Compresses the specified byte range using the
   *  specified compressionLevel (constants are defined in
   *  java.util.zip.Deflater). */
  public static byte[] compress(byte[] value, int offset, int length, int compressionLevel) {
    /* Create an expandable byte array to hold the compressed data.
     * You cannot use an array that's the same size as the original because
     * there is no guarantee that the compressed data will be smaller than
     * the uncompressed data. */
    ByteArrayOutputStream bos = new ByteArrayOutputStream(length);

    Deflater compressor = new Deflater();

    try {
      compressor.setLevel(compressionLevel);
      compressor.setInput(value, offset, length);
      compressor.finish();

      // Compress the data
      final byte[] buf = new byte[1024];
      while (!compressor.finished()) {
        int count = compressor.deflate(buf);
        bos.write(buf, 0, count);
      }
    } finally {
      compressor.end();
    }

    return bos.toByteArray();
  }

  /** Compresses the specified byte range, with default BEST_COMPRESSION level */
  public static byte[] compress(byte[] value, int offset, int length) {
    return compress(value, offset, length, Deflater.BEST_COMPRESSION);
  }

  /** Compresses all bytes in the array, with default BEST_COMPRESSION level */
  public static byte[] compress(byte[] value) {
    return compress(value, 0, value.length, Deflater.BEST_COMPRESSION);
  }

  /** Compresses the String value, with default BEST_COMPRESSION level */
  public static byte[] compressString(String value) {
    return compressString(value, Deflater.BEST_COMPRESSION);
  }

  /** Compresses the String value using the specified
   *  compressionLevel (constants are defined in
   *  java.util.zip.Deflater). */
  public static byte[] compressString(String value, int compressionLevel) {
    byte[] b = new byte[UnicodeUtil.MAX_UTF8_BYTES_PER_CHAR * value.length()];
    final int len = UnicodeUtil.UTF16toUTF8(value, 0, value.length(), b);
    return compress(b, 0, len, compressionLevel);
  }

  /** Decompress the byte array previously returned by
   *  compress (referenced by the provided BytesRef) */
  public static byte[] decompress(BytesRef bytes) throws DataFormatException {
    return decompress(bytes.bytes, bytes.offset, bytes.length);
  }

  /** Decompress the byte array previously returned by
   *  compress */
  public static byte[] decompress(byte[] value) throws DataFormatException {
    return decompress(value, 0, value.length);
  }

  /** Decompress the byte array previously returned by
   *  compress */
  public static byte[] decompress(byte[] value, int offset, int length) throws DataFormatException {
    // Create an expandable byte array to hold the decompressed data
    ByteArrayOutputStream bos = new ByteArrayOutputStream(length);

    Inflater decompressor = new Inflater();

    try {
      decompressor.setInput(value, offset, length);

      // Decompress the data
      final byte[] buf = new byte[1024];
      while (!decompressor.finished()) {
        int count = decompressor.inflate(buf);
        bos.write(buf, 0, count);
      }
    } finally {
      decompressor.end();
    }

    return bos.toByteArray();
  }

  /** Decompress the byte array previously returned by
   *  compressString back into a String */
  public static String decompressString(byte[] value) throws DataFormatException {
    return decompressString(value, 0, value.length);
  }

  /** Decompress the byte array previously returned by
   *  compressString back into a String */
  public static String decompressString(byte[] value, int offset, int length) throws DataFormatException {
    final byte[] bytes = decompress(value, offset, length);
    final char[] result = new char[bytes.length];
    final int len = UnicodeUtil.UTF8toUTF16(bytes, 0, bytes.length, result);
    return new String(result, 0, len);
  }

  /** Decompress the byte array (referenced by the provided BytesRef)
   *  previously returned by compressString back into a String */
  public static String decompressString(BytesRef bytes) throws DataFormatException {
    return decompressString(bytes.bytes, bytes.offset, bytes.length);
  }
}

lucene/core/src/test/org/apache/lucene/document/TestBinaryDocument.java
@@ -74,33 +74,4 @@ public class TestBinaryDocument extends LuceneTestCase {
    reader.close();
    dir.close();
  }

  public void testCompressionTools() throws Exception {
    StoredField binaryFldCompressed = new StoredField("binaryCompressed", CompressionTools.compress(binaryValCompressed.getBytes(StandardCharsets.UTF_8)));
    StoredField stringFldCompressed = new StoredField("stringCompressed", CompressionTools.compressString(binaryValCompressed));

    Document doc = new Document();
    doc.add(binaryFldCompressed);
    doc.add(stringFldCompressed);

    // add the doc to a ram index
    Directory dir = newDirectory();
    RandomIndexWriter writer = new RandomIndexWriter(random(), dir);
    writer.addDocument(doc);

    // open a reader and fetch the document
    IndexReader reader = writer.getReader();
    Document docFromReader = reader.document(0);
    assertTrue(docFromReader != null);

    // fetch the binary compressed field and compare its content with the original one
    String binaryFldCompressedTest = new String(CompressionTools.decompress(docFromReader.getBinaryValue("binaryCompressed")), StandardCharsets.UTF_8);
    assertTrue(binaryFldCompressedTest.equals(binaryValCompressed));
    assertTrue(CompressionTools.decompressString(docFromReader.getBinaryValue("stringCompressed")).equals(binaryValCompressed));

    writer.close();
    reader.close();
    dir.close();
  }
}