mirror of https://github.com/apache/lucene.git
LUCENE-2370: Reintegrate flex_1458 branch into trunk (revision 931101)
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@931278 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in: parent 3e509789f8, commit b679816a70
@@ -1,5 +1,79 @@
Lucene Change Log

======================= Flexible Indexing Branch =======================

Changes in backwards compatibility policy

* LUCENE-1458, LUCENE-2111, LUCENE-2354: Changes from flexible indexing:

  - MultiReader ctor now throws IOException

  - Directory.copy/Directory.copyTo now copies all files (not just
    index files), since what is and isn't an index file is now
    dependent on the codecs used. (Mike McCandless)

  - UnicodeUtil now uses BytesRef for UTF-8 output, and some method
    signatures have changed to CharSequence.  These are internal APIs
    and subject to change suddenly.  (Robert Muir, Mike McCandless)

  - Positional queries (PhraseQuery, *SpanQuery) will now throw an
    exception if you use them on a field that omits positions during
    indexing (previously they silently returned no results).

  - FieldCache.{Byte,Short,Int,Long,Float,Double}Parser's API has
    changed -- each parse method now takes a BytesRef instead of a
    String.  If you have an existing Parser, a simple way to fix it is
    to invoke BytesRef.utf8ToString and pass that String to your
    existing parser.  This will work, but performance would be better
    if you could fix your parser to instead operate directly on the
    byte[] in the BytesRef (see the adapter sketch after this list).

  - The internal (experimental) API of NumericUtils changed completely
    from String to BytesRef.  Client code should never use this class,
    so the change would normally not affect you.  If you used some of
    the methods to inspect terms or create TermQueries out of
    prefix encoded terms, change to use BytesRef.  Please note:
    do not use TermQueries to search for single numeric terms.
    The recommended way is to create a corresponding NumericRangeQuery
    with upper and lower bound equal and included.  TermQueries do not
    score correctly, so the constant score mode of NRQ is the only
    correct way to handle single value queries (see the example after
    this list).

  - NumericTokenStream now works directly on byte[] terms.  If you
    plug a TokenFilter on top of this stream, you will likely get
    an IllegalArgumentException, because the NTS does not support
    TermAttribute/CharTermAttribute.  If you want to further filter
    or attach Payloads to NTS, use the new NumericTermAttribute.
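To make the two migration notes above concrete, here is a minimal, hypothetical sketch. The class, constant, and field names are invented; FieldCache.IntParser and NumericRangeQuery.newIntRange are used as the entries describe them, so treat the exact interface shapes as assumptions rather than as the canonical API.

    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.NumericRangeQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.BytesRef;

    public class FlexMigrationExamples {

      // Quick fix for an existing String-based parser: decode the BytesRef
      // back to a String.  Correct, but slower than parsing the byte[] directly.
      public static final FieldCache.IntParser LEGACY_INT_PARSER = new FieldCache.IntParser() {
        public int parseInt(BytesRef term) {
          return Integer.parseInt(term.utf8ToString());
        }
      };

      // Single numeric value: equal, inclusive bounds on a NumericRangeQuery
      // instead of a TermQuery on the prefix-encoded term.
      public static Query singleValueQuery(String field, int value) {
        return NumericRangeQuery.newIntRange(field, value, value, true, true);
      }
    }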
Bug Fixes

* LUCENE-2222: FixedIntBlockIndexInput incorrectly read one block of
  0s before the actual data.  (Renaud Delbru via Mike McCandless)

* LUCENE-2344: PostingsConsumer.merge was failing to call finishDoc,
  which caused corruption for sep codec.  Also fixed several tests to
  test all 4 core codecs.  (Renaud Delbru via Mike McCandless)

New features

* LUCENE-1606, LUCENE-2089: Adds AutomatonQuery, a MultiTermQuery that
  matches terms against a finite-state machine.  Implement WildcardQuery
  and FuzzyQuery with finite-state methods.  Adds RegexpQuery (see the
  example after this list).
  (Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller)

* LUCENE-1990: Adds internal packed ints implementation, to be used
  for more efficient storage of int arrays when the values are
  bounded, for example for storing the terms dict index.  (Toke
  Eskildsen via Mike McCandless)

* LUCENE-2321: Cutover to a more RAM efficient packed-ints based
  representation for the in-memory terms dict index.  (Mike
  McCandless)

* LUCENE-2126: Add new classes for data (de)serialization: DataInput
  and DataOutput.  IndexInput and IndexOutput extend these new classes.
  (Michael Busch)
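For illustration only, a small sketch of the finite-state queries added by LUCENE-1606/LUCENE-2089. The field names and patterns are made up; the constructor shapes follow the entry above and should be checked against the actual javadocs.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.RegexpQuery;
    import org.apache.lucene.search.WildcardQuery;

    public class FsmQueryExamples {
      // Regular-expression query; the pattern is compiled to an automaton up front.
      public static Query regexp() {
        return new RegexpQuery(new Term("body", "http.*"));
      }

      // WildcardQuery keeps its old API but now runs on the same automaton machinery.
      public static Query wildcard() {
        return new WildcardQuery(new Term("body", "luc*ne"));
      }
    }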
======================= Trunk (not yet released) =======================

Changes in backwards compatibility policy
@@ -297,8 +371,8 @@ Optimizations
Build

* LUCENE-2124: Moved the JDK-based collation support from contrib/collation
  into core, and moved the ICU-based collation support into contrib/icu.
  (Robert Muir)

* LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards branch
  is now included in the svn repository using "svn copy" after release.
@@ -237,4 +237,60 @@ http://www.python.org. Full license is here:

    http://www.python.org/download/releases/2.4.2/license/

Some code in src/java/org/apache/lucene/util/automaton was
derived from Brics automaton sources available at
www.brics.dk/automaton/. Here is the copyright from those sources:

/*
 * Copyright (c) 2001-2009 Anders Moeller
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. The name of the author may not be used to endorse or promote products
 *    derived from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

The levenshtein automata tables in src/java/org/apache/lucene/util/automaton
were automatically generated with the moman/finenight FSA package.
Here is the copyright for those sources:

# Copyright (c) 2010, Jean-Philippe Barrette-LaPierre, <jpb@rrette.com>
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
@@ -46,3 +46,12 @@ provided by Xiaoping Gao and copyright 2009 by www.imdict.net.

ICU4J, (under contrib/icu) is licensed under an MIT styles license
(contrib/icu/lib/ICU-LICENSE.txt) and Copyright (c) 1995-2008
International Business Machines Corporation and others

Brics Automaton (under src/java/org/apache/lucene/util/automaton) is
BSD-licensed, created by Anders Møller. See http://www.brics.dk/automaton/

The levenshtein automata tables (under src/java/org/apache/lucene/util/automaton) were
automatically generated with the moman/finenight FSA library, created by
Jean-Philippe Barrette-LaPierre. This library is available under an MIT license,
see http://sites.google.com/site/rrettesite/moman and
http://bitbucket.org/jpbarrette/moman/overview/
@@ -21,6 +21,7 @@ import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.util.BitVector;
import org.apache.lucene.index.codecs.Codec;
import java.io.IOException;
import java.util.List;
import java.util.Map;

@@ -129,6 +130,12 @@ public final class SegmentInfo {
    assert docStoreOffset == -1 || docStoreSegment != null: "dso=" + docStoreOffset + " dss=" + docStoreSegment + " docCount=" + docCount;
  }

  // stub
  public SegmentInfo(String name, int docCount, Directory dir, boolean isCompoundFile, boolean hasSingleNormFile,
                     int docStoreOffset, String docStoreSegment, boolean docStoreIsCompoundFile, boolean hasProx,
                     Codec codec) {
  }

  /**
   * Copy everything from src SegmentInfo into our instance.
   */

@@ -29,6 +29,8 @@ import org.apache.lucene.index.MergePolicy.MergeAbortedException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.index.codecs.Codec;
import org.apache.lucene.index.codecs.CodecProvider;

/**
 * The SegmentMerger class combines two or more Segments, represented by an IndexReader ({@link #add},

@@ -99,6 +101,11 @@ final class SegmentMerger {
    termIndexInterval = writer.getTermIndexInterval();
  }

  // stub
  SegmentMerger(Directory dir, int termIndexInterval, String name, MergePolicy.OneMerge merge, CodecProvider codecs) {
    checkAbort = null;
  }

  boolean hasProx() {
    return fieldInfos.hasProx();
  }

@@ -171,6 +178,11 @@ final class SegmentMerger {
    }
  }

  // stub
  final List<String> createCompoundFile(String fileName, SegmentInfo info) {
    return null;
  }

  final List<String> createCompoundFile(String fileName)
          throws IOException {
    CompoundFileWriter cfsWriter =

@@ -553,6 +565,11 @@ final class SegmentMerger {
    }
  }

  // stub
  Codec getCodec() {
    return null;
  }

  private SegmentMergeQueue queue = null;

  private final void mergeTerms() throws CorruptIndexException, IOException {

@@ -37,6 +37,7 @@ import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.util.BitVector;
import org.apache.lucene.util.CloseableThreadLocal;
import org.apache.lucene.index.codecs.CodecProvider;

/** @version $Id */
/**

@@ -594,6 +595,17 @@ public class SegmentReader extends IndexReader implements Cloneable {
    return instance;
  }

  // stub
  public static SegmentReader get(boolean readOnly,
                                  Directory dir,
                                  SegmentInfo si,
                                  int readBufferSize,
                                  boolean doOpenStores,
                                  int termInfosIndexDivisor,
                                  CodecProvider codecs) {
    return null;
  }

  void openDocStores() throws IOException {
    core.openDocStores(si);
  }

@@ -1,4 +1,4 @@
package org.apache.lucene.index;
package org.apache.lucene.index.codecs;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more

@@ -17,15 +17,7 @@ package org.apache.lucene.index;
 * limitations under the License.
 */

import java.io.IOException;
// stub
public class Codec {


abstract class FormatPostingsPositionsConsumer {

  /** Add a new position & payload.  If payloadLength > 0
   *  you must read those bytes from the IndexInput. */
  abstract void addPosition(int position, byte[] payload, int payloadOffset, int payloadLength) throws IOException;

  /** Called when we are done adding positions & payloads */
  abstract void finish() throws IOException;
}

@@ -0,0 +1,25 @@
package org.apache.lucene.index.codecs;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// stub
public class CodecProvider {
  public static CodecProvider getDefault() {
    return null;
  }
}
@@ -0,0 +1,234 @@
package org.apache.lucene.store;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Abstract base class for performing read operations of Lucene's low-level
 * data types.
 */
public abstract class DataInput implements Cloneable {
  private byte[] bytes;                  // used by readString()
  private char[] chars;                  // used by readModifiedUTF8String()
  private boolean preUTF8Strings;        // true if we are reading old (modified UTF8) string format

  /** Reads and returns a single byte.
   * @see DataOutput#writeByte(byte)
   */
  public abstract byte readByte() throws IOException;

  /** Reads a specified number of bytes into an array at the specified offset.
   * @param b the array to read bytes into
   * @param offset the offset in the array to start storing bytes
   * @param len the number of bytes to read
   * @see DataOutput#writeBytes(byte[],int)
   */
  public abstract void readBytes(byte[] b, int offset, int len)
    throws IOException;

  /** Reads a specified number of bytes into an array at the
   * specified offset with control over whether the read
   * should be buffered (callers who have their own buffer
   * should pass in "false" for useBuffer).  Currently only
   * {@link BufferedIndexInput} respects this parameter.
   * @param b the array to read bytes into
   * @param offset the offset in the array to start storing bytes
   * @param len the number of bytes to read
   * @param useBuffer set to false if the caller will handle
   * buffering.
   * @see DataOutput#writeBytes(byte[],int)
   */
  public void readBytes(byte[] b, int offset, int len, boolean useBuffer)
    throws IOException
  {
    // Default to ignoring useBuffer entirely
    readBytes(b, offset, len);
  }

  /** Reads two bytes and returns a short.
   * @see DataOutput#writeByte(byte)
   */
  public short readShort() throws IOException {
    return (short) (((readByte() & 0xFF) <<  8) | (readByte() & 0xFF));
  }

  /** Reads four bytes and returns an int.
   * @see DataOutput#writeInt(int)
   */
  public int readInt() throws IOException {
    return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
         | ((readByte() & 0xFF) <<  8) |  (readByte() & 0xFF);
  }

  /** Reads an int stored in variable-length format.  Reads between one and
   * five bytes.  Smaller values take fewer bytes.  Negative numbers are not
   * supported.
   * @see DataOutput#writeVInt(int)
   */
  public int readVInt() throws IOException {
    byte b = readByte();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }

  /** Reads eight bytes and returns a long.
   * @see DataOutput#writeLong(long)
   */
  public long readLong() throws IOException {
    return (((long)readInt()) << 32) | (readInt() & 0xFFFFFFFFL);
  }

  /** Reads a long stored in variable-length format.  Reads between one and
   * nine bytes.  Smaller values take fewer bytes.  Negative numbers are not
   * supported. */
  public long readVLong() throws IOException {
    byte b = readByte();
    long i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      i |= (b & 0x7FL) << shift;
    }
    return i;
  }

  /** Call this if readString should read characters stored
   *  in the old modified UTF8 format (length in java chars
   *  and java's modified UTF8 encoding).  This is used for
   *  indices written pre-2.4 See LUCENE-510 for details. */
  public void setModifiedUTF8StringsMode() {
    preUTF8Strings = true;
  }

  /** Reads a string.
   * @see DataOutput#writeString(String)
   */
  public String readString() throws IOException {
    if (preUTF8Strings)
      return readModifiedUTF8String();
    int length = readVInt();
    if (bytes == null || length > bytes.length)
      bytes = new byte[(int) (length*1.25)];
    readBytes(bytes, 0, length);
    return new String(bytes, 0, length, "UTF-8");
  }

  private String readModifiedUTF8String() throws IOException {
    int length = readVInt();
    if (chars == null || length > chars.length)
      chars = new char[length];
    readChars(chars, 0, length);
    return new String(chars, 0, length);
  }

  /** Reads Lucene's old "modified UTF-8" encoded
   *  characters into an array.
   * @param buffer the array to read characters into
   * @param start the offset in the array to start storing characters
   * @param length the number of characters to read
   * @see DataOutput#writeChars(String,int,int)
   * @deprecated -- please use readString or readBytes
   *                instead, and construct the string
   *                from those utf8 bytes
   */
  @Deprecated
  public void readChars(char[] buffer, int start, int length)
       throws IOException {
    final int end = start + length;
    for (int i = start; i < end; i++) {
      byte b = readByte();
      if ((b & 0x80) == 0)
        buffer[i] = (char)(b & 0x7F);
      else if ((b & 0xE0) != 0xE0) {
        buffer[i] = (char)(((b & 0x1F) << 6)
                 | (readByte() & 0x3F));
      } else {
        buffer[i] = (char)(((b & 0x0F) << 12)
                | ((readByte() & 0x3F) << 6)
                |  (readByte() & 0x3F));
      }
    }
  }

  /**
   * Expert
   *
   * Similar to {@link #readChars(char[], int, int)} but does not do any conversion operations on the bytes it is reading in.  It still
   * has to invoke {@link #readByte()} just as {@link #readChars(char[], int, int)} does, but it does not need a buffer to store anything
   * and it does not have to do any of the bitwise operations, since we don't actually care what is in the byte except to determine
   * how many more bytes to read
   * @param length The number of chars to read
   * @deprecated this method operates on old "modified utf8" encoded
   *             strings
   */
  @Deprecated
  public void skipChars(int length) throws IOException {
    for (int i = 0; i < length; i++) {
      byte b = readByte();
      if ((b & 0x80) == 0) {
        // do nothing, we only need one byte
      } else if ((b & 0xE0) != 0xE0) {
        readByte();  // read an additional byte
      } else {
        // read two additional bytes
        readByte();
        readByte();
      }
    }
  }

  /** Returns a clone of this stream.
   *
   * <p>Clones of a stream access the same data, and are positioned at the same
   * point as the stream they were cloned from.
   *
   * <p>Expert: Subclasses must ensure that clones may be positioned at
   * different points in the input from each other and from the stream they
   * were cloned from.
   */
  @Override
  public Object clone() {
    DataInput clone = null;
    try {
      clone = (DataInput)super.clone();
    } catch (CloneNotSupportedException e) {}

    clone.bytes = null;
    clone.chars = null;

    return clone;
  }

  public Map<String,String> readStringStringMap() throws IOException {
    final Map<String,String> map = new HashMap<String,String>();
    final int count = readInt();
    for(int i=0;i<count;i++) {
      final String key = readString();
      final String val = readString();
      map.put(key, val);
    }

    return map;
  }
}
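The variable-length vInt format read by readVInt above packs 7 data bits per byte and sets the high bit of every byte except the last. A small stand-alone illustration (class name and value are made up, independent of the Lucene API):

    public class VIntDemo {
      public static void main(String[] args) {
        int value = 300;                            // binary 1_0010_1100
        byte b0 = (byte) ((value & 0x7F) | 0x80);   // 0xAC: low 7 bits plus continuation bit
        byte b1 = (byte) (value >>> 7);             // 0x02: remaining bits, no continuation
        int decoded = (b0 & 0x7F) | ((b1 & 0x7F) << 7);
        System.out.println(decoded);                // prints 300
      }
    }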
@@ -0,0 +1,194 @@
package org.apache.lucene.store;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.UnicodeUtil;

/**
 * Abstract base class for performing write operations of Lucene's low-level
 * data types.
 */
public abstract class DataOutput {

  private BytesRef utf8Result = new BytesRef(10);

  /** Writes a single byte.
   * @see IndexInput#readByte()
   */
  public abstract void writeByte(byte b) throws IOException;

  /** Writes an array of bytes.
   * @param b the bytes to write
   * @param length the number of bytes to write
   * @see DataInput#readBytes(byte[],int,int)
   */
  public void writeBytes(byte[] b, int length) throws IOException {
    writeBytes(b, 0, length);
  }

  /** Writes an array of bytes.
   * @param b the bytes to write
   * @param offset the offset in the byte array
   * @param length the number of bytes to write
   * @see DataInput#readBytes(byte[],int,int)
   */
  public abstract void writeBytes(byte[] b, int offset, int length) throws IOException;

  /** Writes an int as four bytes.
   * @see DataInput#readInt()
   */
  public void writeInt(int i) throws IOException {
    writeByte((byte)(i >> 24));
    writeByte((byte)(i >> 16));
    writeByte((byte)(i >>  8));
    writeByte((byte) i);
  }

  /** Writes an int in a variable-length format.  Writes between one and
   * five bytes.  Smaller values take fewer bytes.  Negative numbers are not
   * supported.
   * @see DataInput#readVInt()
   */
  public void writeVInt(int i) throws IOException {
    while ((i & ~0x7F) != 0) {
      writeByte((byte)((i & 0x7f) | 0x80));
      i >>>= 7;
    }
    writeByte((byte)i);
  }

  /** Writes a long as eight bytes.
   * @see DataInput#readLong()
   */
  public void writeLong(long i) throws IOException {
    writeInt((int) (i >> 32));
    writeInt((int) i);
  }

  /** Writes a long in a variable-length format.  Writes between one and nine
   * bytes.  Smaller values take fewer bytes.  Negative numbers are not
   * supported.
   * @see DataInput#readVLong()
   */
  public void writeVLong(long i) throws IOException {
    while ((i & ~0x7F) != 0) {
      writeByte((byte)((i & 0x7f) | 0x80));
      i >>>= 7;
    }
    writeByte((byte)i);
  }

  /** Writes a string.
   * @see DataInput#readString()
   */
  public void writeString(String s) throws IOException {
    UnicodeUtil.UTF16toUTF8(s, 0, s.length(), utf8Result);
    writeVInt(utf8Result.length);
    writeBytes(utf8Result.bytes, 0, utf8Result.length);
  }

  /** Writes a sub sequence of characters from s as the old
   *  format (modified UTF-8 encoded bytes).
   * @param s the source of the characters
   * @param start the first character in the sequence
   * @param length the number of characters in the sequence
   * @deprecated -- please pre-convert to utf8 bytes
   * instead or use {@link #writeString}
   */
  @Deprecated
  public void writeChars(String s, int start, int length)
       throws IOException {
    final int end = start + length;
    for (int i = start; i < end; i++) {
      final int code = s.charAt(i);
      if (code >= 0x01 && code <= 0x7F)
        writeByte((byte)code);
      else if (((code >= 0x80) && (code <= 0x7FF)) || code == 0) {
        writeByte((byte)(0xC0 | (code >> 6)));
        writeByte((byte)(0x80 | (code & 0x3F)));
      } else {
        writeByte((byte)(0xE0 | (code >>> 12)));
        writeByte((byte)(0x80 | ((code >> 6) & 0x3F)));
        writeByte((byte)(0x80 | (code & 0x3F)));
      }
    }
  }

  /** Writes a sub sequence of characters from char[] as
   *  the old format (modified UTF-8 encoded bytes).
   * @param s the source of the characters
   * @param start the first character in the sequence
   * @param length the number of characters in the sequence
   * @deprecated -- please pre-convert to utf8 bytes instead or use {@link #writeString}
   */
  @Deprecated
  public void writeChars(char[] s, int start, int length)
    throws IOException {
    final int end = start + length;
    for (int i = start; i < end; i++) {
      final int code = s[i];
      if (code >= 0x01 && code <= 0x7F)
        writeByte((byte)code);
      else if (((code >= 0x80) && (code <= 0x7FF)) || code == 0) {
        writeByte((byte)(0xC0 | (code >> 6)));
        writeByte((byte)(0x80 | (code & 0x3F)));
      } else {
        writeByte((byte)(0xE0 | (code >>> 12)));
        writeByte((byte)(0x80 | ((code >> 6) & 0x3F)));
        writeByte((byte)(0x80 | (code & 0x3F)));
      }
    }
  }

  private static int COPY_BUFFER_SIZE = 16384;
  private byte[] copyBuffer;

  /** Copy numBytes bytes from input to ourself. */
  public void copyBytes(DataInput input, long numBytes) throws IOException {
    assert numBytes >= 0: "numBytes=" + numBytes;
    long left = numBytes;
    if (copyBuffer == null)
      copyBuffer = new byte[COPY_BUFFER_SIZE];
    while(left > 0) {
      final int toCopy;
      if (left > COPY_BUFFER_SIZE)
        toCopy = COPY_BUFFER_SIZE;
      else
        toCopy = (int) left;
      input.readBytes(copyBuffer, 0, toCopy);
      writeBytes(copyBuffer, 0, toCopy);
      left -= toCopy;
    }
  }

  public void writeStringStringMap(Map<String,String> map) throws IOException {
    if (map == null) {
      writeInt(0);
    } else {
      writeInt(map.size());
      for(final Map.Entry<String, String> entry: map.entrySet()) {
        writeString(entry.getKey());
        writeString(entry.getValue());
      }
    }
  }
}
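A minimal round-trip sketch of the new base classes, exercised through the concrete IndexOutput/IndexInput subclasses reachable via RAMDirectory at this revision. The file name "demo.bin" is arbitrary; treat the exact createOutput/openInput signatures as an assumption about this snapshot of the API.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.store.IndexInput;
    import org.apache.lucene.store.IndexOutput;
    import org.apache.lucene.store.RAMDirectory;

    public class DataIoRoundTrip {
      public static void main(String[] args) throws IOException {
        RAMDirectory dir = new RAMDirectory();

        IndexOutput out = dir.createOutput("demo.bin");  // IndexOutput extends DataOutput
        out.writeVInt(300);                              // 300 -> 0xAC 0x02 (two vInt bytes)
        out.writeString("hello");                        // length as vInt, then UTF-8 bytes
        Map<String,String> m = new HashMap<String,String>();
        m.put("k", "v");
        out.writeStringStringMap(m);
        out.close();

        IndexInput in = dir.openInput("demo.bin");       // IndexInput extends DataInput
        System.out.println(in.readVInt());               // 300
        System.out.println(in.readString());             // hello
        System.out.println(in.readStringStringMap());    // {k=v}
        in.close();
      }
    }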
@@ -17,180 +17,14 @@ package org.apache.lucene.store;
 * limitations under the License.
 */

import java.io.IOException;
import java.io.Closeable;
import java.util.Map;
import java.util.HashMap;
import java.io.IOException;

/** Abstract base class for input from a file in a {@link Directory}.  A
 * random-access input stream.  Used for all Lucene index input operations.
 * @see Directory
 */
public abstract class IndexInput implements Cloneable,Closeable {
  private byte[] bytes;                  // used by readString()
  private char[] chars;                  // used by readModifiedUTF8String()
  private boolean preUTF8Strings;        // true if we are reading old (modified UTF8) string format

  /** Reads and returns a single byte.
   * @see IndexOutput#writeByte(byte)
   */
  public abstract byte readByte() throws IOException;

  /** Reads a specified number of bytes into an array at the specified offset.
   * @param b the array to read bytes into
   * @param offset the offset in the array to start storing bytes
   * @param len the number of bytes to read
   * @see IndexOutput#writeBytes(byte[],int)
   */
  public abstract void readBytes(byte[] b, int offset, int len)
    throws IOException;

  /** Reads a specified number of bytes into an array at the
   * specified offset with control over whether the read
   * should be buffered (callers who have their own buffer
   * should pass in "false" for useBuffer).  Currently only
   * {@link BufferedIndexInput} respects this parameter.
   * @param b the array to read bytes into
   * @param offset the offset in the array to start storing bytes
   * @param len the number of bytes to read
   * @param useBuffer set to false if the caller will handle
   * buffering.
   * @see IndexOutput#writeBytes(byte[],int)
   */
  public void readBytes(byte[] b, int offset, int len, boolean useBuffer)
    throws IOException
  {
    // Default to ignoring useBuffer entirely
    readBytes(b, offset, len);
  }

  /** Reads four bytes and returns an int.
   * @see IndexOutput#writeInt(int)
   */
  public int readInt() throws IOException {
    return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
         | ((readByte() & 0xFF) <<  8) |  (readByte() & 0xFF);
  }

  /** Reads an int stored in variable-length format.  Reads between one and
   * five bytes.  Smaller values take fewer bytes.  Negative numbers are not
   * supported.
   * @see IndexOutput#writeVInt(int)
   */
  public int readVInt() throws IOException {
    byte b = readByte();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }

  /** Reads eight bytes and returns a long.
   * @see IndexOutput#writeLong(long)
   */
  public long readLong() throws IOException {
    return (((long)readInt()) << 32) | (readInt() & 0xFFFFFFFFL);
  }

  /** Reads a long stored in variable-length format.  Reads between one and
   * nine bytes.  Smaller values take fewer bytes.  Negative numbers are not
   * supported. */
  public long readVLong() throws IOException {
    byte b = readByte();
    long i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      i |= (b & 0x7FL) << shift;
    }
    return i;
  }

  /** Call this if readString should read characters stored
   *  in the old modified UTF8 format (length in java chars
   *  and java's modified UTF8 encoding).  This is used for
   *  indices written pre-2.4 See LUCENE-510 for details. */
  public void setModifiedUTF8StringsMode() {
    preUTF8Strings = true;
  }

  /** Reads a string.
   * @see IndexOutput#writeString(String)
   */
  public String readString() throws IOException {
    if (preUTF8Strings)
      return readModifiedUTF8String();
    int length = readVInt();
    if (bytes == null || length > bytes.length)
      bytes = new byte[(int) (length*1.25)];
    readBytes(bytes, 0, length);
    return new String(bytes, 0, length, "UTF-8");
  }

  private String readModifiedUTF8String() throws IOException {
    int length = readVInt();
    if (chars == null || length > chars.length)
      chars = new char[length];
    readChars(chars, 0, length);
    return new String(chars, 0, length);
  }

  /** Reads Lucene's old "modified UTF-8" encoded
   *  characters into an array.
   * @param buffer the array to read characters into
   * @param start the offset in the array to start storing characters
   * @param length the number of characters to read
   * @see IndexOutput#writeChars(String,int,int)
   * @deprecated -- please use readString or readBytes
   *                instead, and construct the string
   *                from those utf8 bytes
   */
  public void readChars(char[] buffer, int start, int length)
       throws IOException {
    final int end = start + length;
    for (int i = start; i < end; i++) {
      byte b = readByte();
      if ((b & 0x80) == 0)
        buffer[i] = (char)(b & 0x7F);
      else if ((b & 0xE0) != 0xE0) {
        buffer[i] = (char)(((b & 0x1F) << 6)
                 | (readByte() & 0x3F));
      } else
        buffer[i] = (char)(((b & 0x0F) << 12)
                | ((readByte() & 0x3F) << 6)
                |  (readByte() & 0x3F));
    }
  }

  /**
   * Expert
   *
   * Similar to {@link #readChars(char[], int, int)} but does not do any conversion operations on the bytes it is reading in.  It still
   * has to invoke {@link #readByte()} just as {@link #readChars(char[], int, int)} does, but it does not need a buffer to store anything
   * and it does not have to do any of the bitwise operations, since we don't actually care what is in the byte except to determine
   * how many more bytes to read
   * @param length The number of chars to read
   * @deprecated this method operates on old "modified utf8" encoded
   *             strings
   */
  public void skipChars(int length) throws IOException {
    for (int i = 0; i < length; i++) {
      byte b = readByte();
      if ((b & 0x80) == 0) {
        // do nothing, we only need one byte
      } else if ((b & 0xE0) != 0xE0) {
        readByte();  // read an additional byte
      } else {
        // read two additional bytes
        readByte();
        readByte();
      }
    }
  }


public abstract class IndexInput extends DataInput implements Cloneable,Closeable {
  /** Closes the stream to further operations. */
  public abstract void close() throws IOException;

@@ -207,38 +41,4 @@ public abstract class IndexInput implements Cloneable,Closeable {

  /** The number of bytes in the file. */
  public abstract long length();

  /** Returns a clone of this stream.
   *
   * <p>Clones of a stream access the same data, and are positioned at the same
   * point as the stream they were cloned from.
   *
   * <p>Expert: Subclasses must ensure that clones may be positioned at
   * different points in the input from each other and from the stream they
   * were cloned from.
   */
  @Override
  public Object clone() {
    IndexInput clone = null;
    try {
      clone = (IndexInput)super.clone();
    } catch (CloneNotSupportedException e) {}

    clone.bytes = null;
    clone.chars = null;

    return clone;
  }

  public Map<String,String> readStringStringMap() throws IOException {
    final Map<String,String> map = new HashMap<String,String>();
    final int count = readInt();
    for(int i=0;i<count;i++) {
      final String key = readString();
      final String val = readString();
      map.put(key, val);
    }

    return map;
  }
}
@@ -17,166 +17,15 @@ package org.apache.lucene.store;
 * limitations under the License.
 */

import java.io.IOException;
import java.io.Closeable;
import java.util.Map;
import org.apache.lucene.util.UnicodeUtil;
import java.io.IOException;

/** Abstract base class for output to a file in a Directory.  A random-access
 * output stream.  Used for all Lucene index output operations.
 * @see Directory
 * @see IndexInput
 */
public abstract class IndexOutput implements Closeable {

  private UnicodeUtil.UTF8Result utf8Result = new UnicodeUtil.UTF8Result();

  /** Writes a single byte.
   * @see IndexInput#readByte()
   */
  public abstract void writeByte(byte b) throws IOException;

  /** Writes an array of bytes.
   * @param b the bytes to write
   * @param length the number of bytes to write
   * @see IndexInput#readBytes(byte[],int,int)
   */
  public void writeBytes(byte[] b, int length) throws IOException {
    writeBytes(b, 0, length);
  }

  /** Writes an array of bytes.
   * @param b the bytes to write
   * @param offset the offset in the byte array
   * @param length the number of bytes to write
   * @see IndexInput#readBytes(byte[],int,int)
   */
  public abstract void writeBytes(byte[] b, int offset, int length) throws IOException;

  /** Writes an int as four bytes.
   * @see IndexInput#readInt()
   */
  public void writeInt(int i) throws IOException {
    writeByte((byte)(i >> 24));
    writeByte((byte)(i >> 16));
    writeByte((byte)(i >>  8));
    writeByte((byte) i);
  }

  /** Writes an int in a variable-length format.  Writes between one and
   * five bytes.  Smaller values take fewer bytes.  Negative numbers are not
   * supported.
   * @see IndexInput#readVInt()
   */
  public void writeVInt(int i) throws IOException {
    while ((i & ~0x7F) != 0) {
      writeByte((byte)((i & 0x7f) | 0x80));
      i >>>= 7;
    }
    writeByte((byte)i);
  }

  /** Writes a long as eight bytes.
   * @see IndexInput#readLong()
   */
  public void writeLong(long i) throws IOException {
    writeInt((int) (i >> 32));
    writeInt((int) i);
  }

  /** Writes an long in a variable-length format.  Writes between one and five
   * bytes.  Smaller values take fewer bytes.  Negative numbers are not
   * supported.
   * @see IndexInput#readVLong()
   */
  public void writeVLong(long i) throws IOException {
    while ((i & ~0x7F) != 0) {
      writeByte((byte)((i & 0x7f) | 0x80));
      i >>>= 7;
    }
    writeByte((byte)i);
  }

  /** Writes a string.
   * @see IndexInput#readString()
   */
  public void writeString(String s) throws IOException {
    UnicodeUtil.UTF16toUTF8(s, 0, s.length(), utf8Result);
    writeVInt(utf8Result.length);
    writeBytes(utf8Result.result, 0, utf8Result.length);
  }

  /** Writes a sub sequence of characters from s as the old
   *  format (modified UTF-8 encoded bytes).
   * @param s the source of the characters
   * @param start the first character in the sequence
   * @param length the number of characters in the sequence
   * @deprecated -- please pre-convert to utf8 bytes
   * instead or use {@link #writeString}
   */
  public void writeChars(String s, int start, int length)
       throws IOException {
    final int end = start + length;
    for (int i = start; i < end; i++) {
      final int code = (int)s.charAt(i);
      if (code >= 0x01 && code <= 0x7F)
        writeByte((byte)code);
      else if (((code >= 0x80) && (code <= 0x7FF)) || code == 0) {
        writeByte((byte)(0xC0 | (code >> 6)));
        writeByte((byte)(0x80 | (code & 0x3F)));
      } else {
        writeByte((byte)(0xE0 | (code >>> 12)));
        writeByte((byte)(0x80 | ((code >> 6) & 0x3F)));
        writeByte((byte)(0x80 | (code & 0x3F)));
      }
    }
  }

  /** Writes a sub sequence of characters from char[] as
   *  the old format (modified UTF-8 encoded bytes).
   * @param s the source of the characters
   * @param start the first character in the sequence
   * @param length the number of characters in the sequence
   * @deprecated -- please pre-convert to utf8 bytes instead or use {@link #writeString}
   */
  public void writeChars(char[] s, int start, int length)
    throws IOException {
    final int end = start + length;
    for (int i = start; i < end; i++) {
      final int code = (int)s[i];
      if (code >= 0x01 && code <= 0x7F)
        writeByte((byte)code);
      else if (((code >= 0x80) && (code <= 0x7FF)) || code == 0) {
        writeByte((byte)(0xC0 | (code >> 6)));
        writeByte((byte)(0x80 | (code & 0x3F)));
      } else {
        writeByte((byte)(0xE0 | (code >>> 12)));
        writeByte((byte)(0x80 | ((code >> 6) & 0x3F)));
        writeByte((byte)(0x80 | (code & 0x3F)));
      }
    }
  }

  private static int COPY_BUFFER_SIZE = 16384;
  private byte[] copyBuffer;

  /** Copy numBytes bytes from input to ourself. */
  public void copyBytes(IndexInput input, long numBytes) throws IOException {
    assert numBytes >= 0: "numBytes=" + numBytes;
    long left = numBytes;
    if (copyBuffer == null)
      copyBuffer = new byte[COPY_BUFFER_SIZE];
    while(left > 0) {
      final int toCopy;
      if (left > COPY_BUFFER_SIZE)
        toCopy = COPY_BUFFER_SIZE;
      else
        toCopy = (int) left;
      input.readBytes(copyBuffer, 0, toCopy);
      writeBytes(copyBuffer, 0, toCopy);
      left -= toCopy;
    }
  }
public abstract class IndexOutput extends DataOutput implements Closeable {

  /** Forces any buffered output to be written. */
  public abstract void flush() throws IOException;

@@ -208,17 +57,5 @@ public abstract class IndexOutput implements Closeable {
   * undefined. Otherwise the file is truncated.
   * @param length file length
   */
  public void setLength(long length) throws IOException {};

  public void writeStringStringMap(Map<String,String> map) throws IOException {
    if (map == null) {
      writeInt(0);
    } else {
      writeInt(map.size());
      for(final Map.Entry<String, String> entry: map.entrySet()) {
        writeString(entry.getKey());
        writeString(entry.getValue());
      }
    }
  }
  public void setLength(long length) throws IOException {}
}
@@ -0,0 +1,27 @@
package org.apache.lucene.util;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// stub for tests only
public class BytesRef {
  public BytesRef(int capacity) {}
  public BytesRef() {}
  public byte[] bytes;
  public int offset;
  public int length;
}
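The stub above only exposes the three public fields these back-compat tests touch. As a rough illustration of what a BytesRef represents -- a (bytes, offset, length) slice over an existing byte[] -- the following made-up example compiles against the stub itself:

    import java.io.UnsupportedEncodingException;

    import org.apache.lucene.util.BytesRef;

    public class BytesRefViewExample {
      public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] data = "hello world".getBytes("UTF-8");
        BytesRef ref = new BytesRef();
        ref.bytes = data;
        ref.offset = 6;   // start of "world"
        ref.length = 5;   // five UTF-8 bytes
        System.out.println(new String(ref.bytes, ref.offset, ref.length, "UTF-8"));  // world
      }
    }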
@@ -106,6 +106,10 @@ final public class UnicodeUtil {
    }
  }

  // stubs for tests only
  public static void UTF16toUTF8(char[] source, int offset, int length, BytesRef result) {}
  public static void UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result) {}

  /** Encode characters from a char[] source, starting at
   *  offset and stopping when the character 0xffff is seen.
   *  Returns the number of bytes written to bytesOut. */

@@ -223,7 +227,7 @@ final public class UnicodeUtil {
  /** Encode characters from this String, starting at offset
   *  for length characters.  Returns the number of bytes
   *  written to bytesOut. */
  public static void UTF16toUTF8(final String s, final int offset, final int length, UTF8Result result) {
  public static void UTF16toUTF8(final CharSequence s, final int offset, final int length, UTF8Result result) {
    final int end = offset + length;

    byte[] out = result.result;
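A small, hypothetical illustration of the signature change in the second hunk: with a CharSequence parameter, a StringBuilder (or any other CharSequence) can now be encoded without first building a String. UTF8Result is the existing result holder shown above; the class name and values below are made up.

    import org.apache.lucene.util.UnicodeUtil;

    public class CharSequenceEncodeExample {
      public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("doc-").append(42);
        UnicodeUtil.UTF8Result utf8 = new UnicodeUtil.UTF8Result();
        // The old signature required a String; the CharSequence overload accepts sb directly.
        UnicodeUtil.UTF16toUTF8(sb, 0, sb.length(), utf8);
        System.out.println(utf8.length + " UTF-8 bytes in utf8.result");
      }
    }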
@@ -1,73 +0,0 @@
package org.apache.lucene.analysis;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import org.apache.lucene.util.NumericUtils;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class TestNumericTokenStream extends BaseTokenStreamTestCase {

  static final long lvalue = 4573245871874382L;
  static final int ivalue = 123456;

  public void testLongStream() throws Exception {
    final NumericTokenStream stream = new NumericTokenStream().setLongValue(lvalue);
    // use getAttribute to test if attributes really exist, if not an IAE will be thrown
    final TermAttribute termAtt = stream.getAttribute(TermAttribute.class);
    final TypeAttribute typeAtt = stream.getAttribute(TypeAttribute.class);
    for (int shift=0; shift<64; shift+=NumericUtils.PRECISION_STEP_DEFAULT) {
      assertTrue("New token is available", stream.incrementToken());
      assertEquals("Term is correctly encoded", NumericUtils.longToPrefixCoded(lvalue, shift), termAtt.term());
      assertEquals("Type correct", (shift == 0) ? NumericTokenStream.TOKEN_TYPE_FULL_PREC : NumericTokenStream.TOKEN_TYPE_LOWER_PREC, typeAtt.type());
    }
    assertFalse("No more tokens available", stream.incrementToken());
  }

  public void testIntStream() throws Exception {
    final NumericTokenStream stream = new NumericTokenStream().setIntValue(ivalue);
    // use getAttribute to test if attributes really exist, if not an IAE will be thrown
    final TermAttribute termAtt = stream.getAttribute(TermAttribute.class);
    final TypeAttribute typeAtt = stream.getAttribute(TypeAttribute.class);
    for (int shift=0; shift<32; shift+=NumericUtils.PRECISION_STEP_DEFAULT) {
      assertTrue("New token is available", stream.incrementToken());
      assertEquals("Term is correctly encoded", NumericUtils.intToPrefixCoded(ivalue, shift), termAtt.term());
      assertEquals("Type correct", (shift == 0) ? NumericTokenStream.TOKEN_TYPE_FULL_PREC : NumericTokenStream.TOKEN_TYPE_LOWER_PREC, typeAtt.type());
    }
    assertFalse("No more tokens available", stream.incrementToken());
  }

  public void testNotInitialized() throws Exception {
    final NumericTokenStream stream = new NumericTokenStream();

    try {
      stream.reset();
      fail("reset() should not succeed.");
    } catch (IllegalStateException e) {
      // pass
    }

    try {
      stream.incrementToken();
      fail("incrementToken() should not succeed.");
    } catch (IllegalStateException e) {
      // pass
    }
  }

}
@@ -107,10 +107,10 @@ public class TestTermAttributeImpl extends LuceneTestCase {
    char[] b = {'a', 'l', 'o', 'h', 'a'};
    TermAttributeImpl t = new TermAttributeImpl();
    t.setTermBuffer(b, 0, 5);
    assertEquals("term=aloha", t.toString());
    assertEquals("aloha", t.toString());

    t.setTermBuffer("hi there");
    assertEquals("term=hi there", t.toString());
    assertEquals("hi there", t.toString());
  }

  public void testMixedStringArray() throws Exception {
@@ -35,6 +35,7 @@ import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.index.codecs.CodecProvider;


/** JUnit adaptation of an older test case DocTest. */

@@ -180,20 +181,24 @@ public class TestDoc extends LuceneTestCase {
      SegmentReader r1 = SegmentReader.get(true, si1, IndexReader.DEFAULT_TERMS_INDEX_DIVISOR);
      SegmentReader r2 = SegmentReader.get(true, si2, IndexReader.DEFAULT_TERMS_INDEX_DIVISOR);

      SegmentMerger merger = new SegmentMerger(si1.dir, merged);
      SegmentMerger merger = new SegmentMerger(si1.dir, IndexWriter.DEFAULT_TERM_INDEX_INTERVAL, merged, null, CodecProvider.getDefault());

      merger.add(r1);
      merger.add(r2);
      merger.merge();
      merger.closeReaders();

      final SegmentInfo info = new SegmentInfo(merged, si1.docCount + si2.docCount, si1.dir,
                                               useCompoundFile, true, -1, null, false, merger.hasProx(),
                                               merger.getCodec());

      if (useCompoundFile) {
        List filesToDelete = merger.createCompoundFile(merged + ".cfs");
        List filesToDelete = merger.createCompoundFile(merged + ".cfs", info);
        for (Iterator iter = filesToDelete.iterator(); iter.hasNext();)
          si1.dir.deleteFile((String) iter.next());
      }

      return new SegmentInfo(merged, si1.docCount + si2.docCount, si1.dir, useCompoundFile, true);
      return info;
   }
@@ -986,29 +986,7 @@ public class TestIndexReader extends LuceneTestCase
      // new IndexFileDeleter, have it delete
      // unreferenced files, then verify that in fact
      // no files were deleted:
      String[] startFiles = dir.listAll();
      SegmentInfos infos = new SegmentInfos();
      infos.read(dir);
      new IndexFileDeleter(dir, new KeepOnlyLastCommitDeletionPolicy(), infos, null, null);
      String[] endFiles = dir.listAll();

      Arrays.sort(startFiles);
      Arrays.sort(endFiles);

      //for(int i=0;i<startFiles.length;i++) {
      //  System.out.println("  startFiles: " + i + ": " + startFiles[i]);
      //}

      if (!Arrays.equals(startFiles, endFiles)) {
        String successStr;
        if (success) {
          successStr = "success";
        } else {
          successStr = "IOException";
          err.printStackTrace();
        }
        fail("reader.close() failed to delete unreferenced files after " + successStr + " (" + diskFree + " bytes): before delete:\n    " + arrayToString(startFiles) + "\n  after delete:\n    " + arrayToString(endFiles));
      }
      TestIndexWriter.assertNoUnreferencedFiles(dir, "reader.close() failed to delete unreferenced files");

      // Finally, verify index is not corrupt, and, if
      //  we succeeded, we see all docs changed, and if

@@ -1760,7 +1738,6 @@ public class TestIndexReader extends LuceneTestCase
    } catch (IllegalStateException ise) {
      // expected
    }
    assertFalse(((SegmentReader) r.getSequentialSubReaders()[0]).termsIndexLoaded());

    assertEquals(-1, ((SegmentReader) r.getSequentialSubReaders()[0]).getTermInfosIndexDivisor());
    writer = new IndexWriter(dir, new WhitespaceAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);

@@ -1773,7 +1750,12 @@ public class TestIndexReader extends LuceneTestCase
    IndexReader[] subReaders = r2.getSequentialSubReaders();
    assertEquals(2, subReaders.length);
    for(int i=0;i<2;i++) {
      assertFalse(((SegmentReader) subReaders[i]).termsIndexLoaded());
      try {
        subReaders[i].docFreq(new Term("field", "f"));
        fail("did not hit expected exception");
      } catch (IllegalStateException ise) {
        // expected
      }
    }
    r2.close();
    dir.close();
@@ -61,8 +61,10 @@ import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.Lock;
import org.apache.lucene.store.LockFactory;
import org.apache.lucene.store.MockRAMDirectory;
import org.apache.lucene.store.NoLockFactory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.store.SingleInstanceLockFactory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.UnicodeUtil;
import org.apache.lucene.util._TestUtil;
import org.apache.lucene.util.Version;

@@ -524,10 +526,15 @@ public class TestIndexWriter extends LuceneTestCase {
}

public static void assertNoUnreferencedFiles(Directory dir, String message) throws IOException {
String[] startFiles = dir.listAll();
SegmentInfos infos = new SegmentInfos();
infos.read(dir);
new IndexFileDeleter(dir, new KeepOnlyLastCommitDeletionPolicy(), infos, null, null);
final LockFactory lf = dir.getLockFactory();
String[] startFiles;
try {
dir.setLockFactory(new NoLockFactory());
startFiles = dir.listAll();
new IndexWriter(dir, new WhitespaceAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED).close();
} finally {
dir.setLockFactory(lf);
}
String[] endFiles = dir.listAll();

Arrays.sort(startFiles);

@@ -3309,7 +3316,7 @@ public class TestIndexWriter extends LuceneTestCase {
// LUCENE-510
public void testAllUnicodeChars() throws Throwable {

UnicodeUtil.UTF8Result utf8 = new UnicodeUtil.UTF8Result();
BytesRef utf8 = new BytesRef(10);
UnicodeUtil.UTF16Result utf16 = new UnicodeUtil.UTF16Result();
char[] chars = new char[2];
for(int ch=0;ch<0x0010FFFF;ch++) {

@@ -3329,16 +3336,16 @@ public class TestIndexWriter extends LuceneTestCase {
UnicodeUtil.UTF16toUTF8(chars, 0, len, utf8);

String s1 = new String(chars, 0, len);
String s2 = new String(utf8.result, 0, utf8.length, "UTF-8");
String s2 = new String(utf8.bytes, 0, utf8.length, "UTF-8");
assertEquals("codepoint " + ch, s1, s2);

UnicodeUtil.UTF8toUTF16(utf8.result, 0, utf8.length, utf16);
UnicodeUtil.UTF8toUTF16(utf8.bytes, 0, utf8.length, utf16);
assertEquals("codepoint " + ch, s1, new String(utf16.result, 0, utf16.length));

byte[] b = s1.getBytes("UTF-8");
assertEquals(utf8.length, b.length);
for(int j=0;j<utf8.length;j++)
assertEquals(utf8.result[j], b[j]);
assertEquals(utf8.bytes[j], b[j]);
}
}

@@ -3403,7 +3410,7 @@ public class TestIndexWriter extends LuceneTestCase {
char[] buffer = new char[20];
char[] expected = new char[20];

UnicodeUtil.UTF8Result utf8 = new UnicodeUtil.UTF8Result();
BytesRef utf8 = new BytesRef(20);
UnicodeUtil.UTF16Result utf16 = new UnicodeUtil.UTF16Result();

for(int iter=0;iter<100000;iter++) {

@@ -3414,10 +3421,10 @@ public class TestIndexWriter extends LuceneTestCase {
byte[] b = new String(buffer, 0, 20).getBytes("UTF-8");
assertEquals(b.length, utf8.length);
for(int i=0;i<b.length;i++)
assertEquals(b[i], utf8.result[i]);
assertEquals(b[i], utf8.bytes[i]);
}

UnicodeUtil.UTF8toUTF16(utf8.result, 0, utf8.length, utf16);
UnicodeUtil.UTF8toUTF16(utf8.bytes, 0, utf8.length, utf16);
assertEquals(utf16.length, 20);
for(int i=0;i<20;i++)
assertEquals(expected[i], utf16.result[i]);

@@ -3430,7 +3437,7 @@ public class TestIndexWriter extends LuceneTestCase {
char[] buffer = new char[20];
char[] expected = new char[20];

UnicodeUtil.UTF8Result utf8 = new UnicodeUtil.UTF8Result();
BytesRef utf8 = new BytesRef(20);
UnicodeUtil.UTF16Result utf16 = new UnicodeUtil.UTF16Result();
UnicodeUtil.UTF16Result utf16a = new UnicodeUtil.UTF16Result();

@@ -3453,7 +3460,7 @@ public class TestIndexWriter extends LuceneTestCase {
byte[] b = new String(buffer, 0, 20).getBytes("UTF-8");
assertEquals(b.length, utf8.length);
for(int i=0;i<b.length;i++)
assertEquals(b[i], utf8.result[i]);
assertEquals(b[i], utf8.bytes[i]);
}

int bytePrefix = 20;

@@ -3461,18 +3468,18 @@ public class TestIndexWriter extends LuceneTestCase {
bytePrefix = 0;
else
for(int i=0;i<20;i++)
if (last[i] != utf8.result[i]) {
if (last[i] != utf8.bytes[i]) {
bytePrefix = i;
break;
}
System.arraycopy(utf8.result, 0, last, 0, utf8.length);
System.arraycopy(utf8.bytes, 0, last, 0, utf8.length);

UnicodeUtil.UTF8toUTF16(utf8.result, bytePrefix, utf8.length-bytePrefix, utf16);
UnicodeUtil.UTF8toUTF16(utf8.bytes, bytePrefix, utf8.length-bytePrefix, utf16);
assertEquals(20, utf16.length);
for(int i=0;i<20;i++)
assertEquals(expected[i], utf16.result[i]);

UnicodeUtil.UTF8toUTF16(utf8.result, 0, utf8.length, utf16a);
UnicodeUtil.UTF8toUTF16(utf8.bytes, 0, utf8.length, utf16a);
assertEquals(20, utf16a.length);
for(int i=0;i<20;i++)
assertEquals(expected[i], utf16a.result[i]);

@@ -4331,10 +4338,6 @@ public class TestIndexWriter extends LuceneTestCase {

assertTrue(dir.fileExists("myrandomfile"));

// Make sure this does not copy myrandomfile:
Directory dir2 = new RAMDirectory(dir);
assertTrue(!dir2.fileExists("myrandomfile"));

} finally {
dir.close();
_TestUtil.rmDir(indexDir);
@@ -784,20 +784,8 @@ public class TestIndexWriterDelete extends LuceneTestCase {
}
}

String[] startFiles = dir.listAll();
SegmentInfos infos = new SegmentInfos();
infos.read(dir);
new IndexFileDeleter(dir, new KeepOnlyLastCommitDeletionPolicy(), infos, null, null);
String[] endFiles = dir.listAll();

if (!Arrays.equals(startFiles, endFiles)) {
fail("docswriter abort() failed to delete unreferenced files:\n before delete:\n "
+ arrayToString(startFiles) + "\n after delete:\n "
+ arrayToString(endFiles));
}

modifier.close();

TestIndexWriter.assertNoUnreferencedFiles(dir, "docsWriter.abort() failed to delete unreferenced files");
modifier.close();
}

private String arrayToString(String[] l) {
@@ -86,7 +86,7 @@ public class TestIndexWriterReader extends LuceneTestCase {

// get a reader
IndexReader r1 = writer.getReader();
assertTrue(r1.isCurrent());
//assertTrue(r1.isCurrent());

String id10 = r1.document(10).getField("id").stringValue();

@@ -94,7 +94,7 @@ public class TestIndexWriterReader extends LuceneTestCase {
newDoc.removeField("id");
newDoc.add(new Field("id", Integer.toString(8000), Store.YES, Index.NOT_ANALYZED));
writer.updateDocument(new Term("id", id10), newDoc);
assertFalse(r1.isCurrent());
//assertFalse(r1.isCurrent());

IndexReader r2 = writer.getReader();
assertTrue(r2.isCurrent());

@@ -157,7 +157,7 @@ public class TestIndexWriterReader extends LuceneTestCase {
IndexReader r0 = writer.getReader();
assertTrue(r0.isCurrent());
writer.addIndexesNoOptimize(new Directory[] { dir2 });
assertFalse(r0.isCurrent());
//assertFalse(r0.isCurrent());
r0.close();

IndexReader r1 = writer.getReader();
@@ -48,7 +48,7 @@ public class TestLazyProxSkipping extends LuceneTestCase {
@Override
public IndexInput openInput(String name) throws IOException {
IndexInput ii = super.openInput(name);
if (name.endsWith(".prx")) {
if (name.endsWith(".prx") || name.endsWith(".pos")) {
// we decorate the proxStream with a wrapper class that allows to count the number of calls of seek()
ii = new SeeksCountingStream(ii);
}
@@ -30,6 +30,7 @@ import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.MockRAMDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.LuceneTestCase;

@@ -42,6 +43,16 @@ import org.apache.lucene.util.LuceneTestCase;
*
*/
public class TestMultiLevelSkipList extends LuceneTestCase {

class CountingRAMDirectory extends MockRAMDirectory {
public IndexInput openInput(String fileName) throws IOException {
IndexInput in = super.openInput(fileName);
if (fileName.endsWith(".frq"))
in = new CountingStream(in);
return in;
}
}

public void testSimpleSkip() throws IOException {
RAMDirectory dir = new RAMDirectory();
IndexWriter writer = new IndexWriter(dir, new PayloadAnalyzer(), true,

@@ -57,8 +68,7 @@ public class TestMultiLevelSkipList extends LuceneTestCase {
writer.close();

IndexReader reader = SegmentReader.getOnlySegmentReader(dir);
SegmentTermPositions tp = (SegmentTermPositions) reader.termPositions();
tp.freqStream = new CountingStream(tp.freqStream);
TermPositions tp = reader.termPositions();

for (int i = 0; i < 2; i++) {
counter = 0;
@@ -39,6 +39,7 @@ import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.UnicodeUtil;
import org.apache.lucene.util._TestUtil;

@@ -257,10 +258,12 @@ public class TestPayloads extends LuceneTestCase {
tp.next();
tp.nextPosition();
// now we don't read this payload
tp.next();
tp.nextPosition();
assertEquals("Wrong payload length.", 1, tp.getPayloadLength());
byte[] payload = tp.getPayload(null, 0);
assertEquals(payload[0], payloadData[numTerms]);
tp.next();
tp.nextPosition();

// we don't read this payload and skip to a different document

@@ -559,13 +562,13 @@ public class TestPayloads extends LuceneTestCase {
}
}

private UnicodeUtil.UTF8Result utf8Result = new UnicodeUtil.UTF8Result();
private BytesRef utf8Result = new BytesRef(10);

synchronized String bytesToString(byte[] bytes) {
String s = new String(bytes);
UnicodeUtil.UTF16toUTF8(s, 0, s.length(), utf8Result);
try {
return new String(utf8Result.result, 0, utf8Result.length, "UTF-8");
return new String(utf8Result.bytes, 0, utf8Result.length, "UTF-8");
} catch (UnsupportedEncodingException uee) {
return null;
}
@@ -18,9 +18,11 @@ package org.apache.lucene.index;
*/

import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.store.BufferedIndexInput;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.codecs.CodecProvider;

import java.io.IOException;
import java.util.Collection;

@@ -63,14 +65,16 @@ public class TestSegmentMerger extends LuceneTestCase {
}

public void testMerge() throws IOException {
SegmentMerger merger = new SegmentMerger(mergedDir, mergedSegment);
SegmentMerger merger = new SegmentMerger(mergedDir, IndexWriter.DEFAULT_TERM_INDEX_INTERVAL, mergedSegment, null, CodecProvider.getDefault());
merger.add(reader1);
merger.add(reader2);
int docsMerged = merger.merge();
merger.closeReaders();
assertTrue(docsMerged == 2);
//Should be able to open a new SegmentReader against the new directory
SegmentReader mergedReader = SegmentReader.get(true, new SegmentInfo(mergedSegment, docsMerged, mergedDir, false, true), IndexReader.DEFAULT_TERMS_INDEX_DIVISOR);
SegmentReader mergedReader = SegmentReader.get(false, mergedDir, new SegmentInfo(mergedSegment, docsMerged, mergedDir, false, true,
-1, null, false, merger.hasProx(), merger.getCodec()), BufferedIndexInput.BUFFER_SIZE, true, IndexReader.DEFAULT_TERMS_INDEX_DIVISOR, null);

assertTrue(mergedReader != null);
assertTrue(mergedReader.numDocs() == 2);
Document newDoc1 = mergedReader.document(0);
@@ -137,6 +137,7 @@ public class TestSegmentReader extends LuceneTestCase {
TermPositions positions = reader.termPositions();
positions.seek(new Term(DocHelper.TEXT_FIELD_1_KEY, "field"));
assertTrue(positions != null);
assertTrue(positions.next());
assertTrue(positions.doc() == 0);
assertTrue(positions.nextPosition() >= 0);
}
@@ -56,14 +56,13 @@ public class TestSegmentTermDocs extends LuceneTestCase {
SegmentReader reader = SegmentReader.get(true, info, indexDivisor);
assertTrue(reader != null);
assertEquals(indexDivisor, reader.getTermInfosIndexDivisor());
SegmentTermDocs segTermDocs = new SegmentTermDocs(reader);
assertTrue(segTermDocs != null);
segTermDocs.seek(new Term(DocHelper.TEXT_FIELD_2_KEY, "field"));
if (segTermDocs.next() == true)
{
int docId = segTermDocs.doc();
TermDocs termDocs = reader.termDocs();
assertTrue(termDocs != null);
termDocs.seek(new Term(DocHelper.TEXT_FIELD_2_KEY, "field"));
if (termDocs.next() == true) {
int docId = termDocs.doc();
assertTrue(docId == 0);
int freq = segTermDocs.freq();
int freq = termDocs.freq();
assertTrue(freq == 3);
}
reader.close();

@@ -78,20 +77,20 @@ public class TestSegmentTermDocs extends LuceneTestCase {
//After adding the document, we should be able to read it back in
SegmentReader reader = SegmentReader.get(true, info, indexDivisor);
assertTrue(reader != null);
SegmentTermDocs segTermDocs = new SegmentTermDocs(reader);
assertTrue(segTermDocs != null);
segTermDocs.seek(new Term("textField2", "bad"));
assertTrue(segTermDocs.next() == false);
TermDocs termDocs = reader.termDocs();
assertTrue(termDocs != null);
termDocs.seek(new Term("textField2", "bad"));
assertTrue(termDocs.next() == false);
reader.close();
}
{
//After adding the document, we should be able to read it back in
SegmentReader reader = SegmentReader.get(true, info, indexDivisor);
assertTrue(reader != null);
SegmentTermDocs segTermDocs = new SegmentTermDocs(reader);
assertTrue(segTermDocs != null);
segTermDocs.seek(new Term("junk", "bad"));
assertTrue(segTermDocs.next() == false);
TermDocs termDocs = reader.termDocs();
assertTrue(termDocs != null);
termDocs.seek(new Term("junk", "bad"));
assertTrue(termDocs.next() == false);
reader.close();
}
}
@@ -61,23 +61,6 @@ public class TestSegmentTermEnum extends LuceneTestCase
verifyDocFreq();
}

public void testPrevTermAtEnd() throws IOException
{
Directory dir = new MockRAMDirectory();
IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
addDoc(writer, "aaa bbb");
writer.close();
SegmentReader reader = SegmentReader.getOnlySegmentReader(dir);
SegmentTermEnum termEnum = (SegmentTermEnum) reader.terms();
assertTrue(termEnum.next());
assertEquals("aaa", termEnum.term().text());
assertTrue(termEnum.next());
assertEquals("aaa", termEnum.prev().text());
assertEquals("bbb", termEnum.term().text());
assertFalse(termEnum.next());
assertEquals("bbb", termEnum.prev().text());
}

private void verifyDocFreq()
throws IOException
{
@@ -352,7 +352,7 @@ public class TestStressIndexing2 extends LuceneTestCase {
if (!termEnum1.next()) break;
}

// iterate until we get some docs
// iterate until we get some docs
int len2;
for(;;) {
len2=0;

@@ -369,12 +369,12 @@ public class TestStressIndexing2 extends LuceneTestCase {
if (!termEnum2.next()) break;
}

if (!hasDeletes)
assertEquals(termEnum1.docFreq(), termEnum2.docFreq());

assertEquals(len1, len2);
if (len1==0) break; // no more terms

if (!hasDeletes)
assertEquals(termEnum1.docFreq(), termEnum2.docFreq());

assertEquals(term1, term2);

// sort info2 to get it into ascending docid
@@ -33,7 +33,7 @@ public class CheckHits {
* different order of operations from the actual scoring method ...
* this allows for a small amount of variation
*/
public static float EXPLAIN_SCORE_TOLERANCE_DELTA = 0.00005f;
public static float EXPLAIN_SCORE_TOLERANCE_DELTA = 0.0002f;

/**
* Tests that all documents up to maxDoc which are *not* in the
@@ -65,7 +65,7 @@ public class TestCachingWrapperFilter extends LuceneTestCase {
if (originalSet.isCacheable()) {
assertEquals("Cached DocIdSet must be of same class like uncached, if cacheable", originalSet.getClass(), cachedSet.getClass());
} else {
assertTrue("Cached DocIdSet must be an OpenBitSet if the original one was not cacheable", cachedSet instanceof OpenBitSetDISI);
assertTrue("Cached DocIdSet must be an OpenBitSet if the original one was not cacheable", cachedSet instanceof OpenBitSetDISI || cachedSet == DocIdSet.EMPTY_DOCIDSET);
}
}
@@ -230,6 +230,8 @@ public class TestNumericRangeQuery32 extends LuceneTestCase {
testRightOpenRange(2);
}

/* TESTs disabled, because incompatible API change in 3.1/flex:

private void testRandomTrieAndClassicRangeQuery(int precisionStep) throws Exception {
final Random rnd=newRandom();
String field="field"+precisionStep;

@@ -298,6 +300,8 @@ public class TestNumericRangeQuery32 extends LuceneTestCase {
testRandomTrieAndClassicRangeQuery(Integer.MAX_VALUE);
}

*/

private void testRangeSplit(int precisionStep) throws Exception {
final Random rnd=newRandom();
String field="ascfield"+precisionStep;

@@ -443,37 +447,39 @@ public class TestNumericRangeQuery32 extends LuceneTestCase {
assertFalse(q2.equals(q1));
}

private void testEnum(int lower, int upper) throws Exception {
NumericRangeQuery<Integer> q = NumericRangeQuery.newIntRange("field4", 4, lower, upper, true, true);
FilteredTermEnum termEnum = q.getEnum(searcher.getIndexReader());
try {
int count = 0;
do {
final Term t = termEnum.term();
if (t != null) {
final int val = NumericUtils.prefixCodedToInt(t.text());
assertTrue("value not in bounds", val >= lower && val <= upper);
count++;
} else break;
} while (termEnum.next());
assertFalse(termEnum.next());
System.out.println("TermEnum on 'field4' for range [" + lower + "," + upper + "] contained " + count + " terms.");
} finally {
termEnum.close();
}
}
// Removed for now - NumericRangeQuery does not currently implement getEnum

public void testEnum() throws Exception {
int count=3000;
int lower=(distance*3/2)+startOffset, upper=lower + count*distance + (distance/3);
// test enum with values
testEnum(lower, upper);
// test empty enum
testEnum(upper, lower);
// test empty enum outside of bounds
lower = distance*noDocs+startOffset;
upper = 2 * lower;
testEnum(lower, upper);
}
// private void testEnum(int lower, int upper) throws Exception {
// NumericRangeQuery<Integer> q = NumericRangeQuery.newIntRange("field4", 4, lower, upper, true, true);
// FilteredTermEnum termEnum = q.getEnum(searcher.getIndexReader());
// try {
// int count = 0;
// do {
// final Term t = termEnum.term();
// if (t != null) {
// final int val = NumericUtils.prefixCodedToInt(t.text());
// assertTrue("value not in bounds", val >= lower && val <= upper);
// count++;
// } else break;
// } while (termEnum.next());
// assertFalse(termEnum.next());
// System.out.println("TermEnum on 'field4' for range [" + lower + "," + upper + "] contained " + count + " terms.");
// } finally {
// termEnum.close();
// }
// }
//
// public void testEnum() throws Exception {
// int count=3000;
// int lower=(distance*3/2)+startOffset, upper=lower + count*distance + (distance/3);
// // test enum with values
// testEnum(lower, upper);
// // test empty enum
// testEnum(upper, lower);
// // test empty enum outside of bounds
// lower = distance*noDocs+startOffset;
// upper = 2 * lower;
// testEnum(lower, upper);
// }

}
@@ -245,6 +245,8 @@ public class TestNumericRangeQuery64 extends LuceneTestCase {
testRightOpenRange(2);
}

/* TESTs disabled, because incompatible API change in 3.1/flex:

private void testRandomTrieAndClassicRangeQuery(int precisionStep) throws Exception {
final Random rnd=newRandom();
String field="field"+precisionStep;

@@ -317,6 +319,8 @@ public class TestNumericRangeQuery64 extends LuceneTestCase {
testRandomTrieAndClassicRangeQuery(Integer.MAX_VALUE);
}

*/

private void testRangeSplit(int precisionStep) throws Exception {
final Random rnd=newRandom();
String field="ascfield"+precisionStep;
@@ -35,6 +35,7 @@ import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.store.LockObtainFailedException;

@@ -332,20 +333,28 @@ public class TestSort extends LuceneTestCase implements Serializable {
FieldCache fc = FieldCache.DEFAULT;


sort.setSort (new SortField ("parser", new FieldCache.IntParser(){
public final int parseInt(final String val) {
return (val.charAt(0)-'A') * 123456;
sort.setSort ( new SortField ("parser", new FieldCache.IntParser(){
public final int parseInt(final String term) {
// dummy
return 0;
}
}), SortField.FIELD_DOC );
public final int parseInt(final BytesRef term) {
return (term.bytes[term.offset]-'A') * 123456;
}
}), SortField.FIELD_DOC);
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " IntParser");
fc.purgeAllCaches();

sort.setSort (new SortField ("parser", new FieldCache.FloatParser(){
public final float parseFloat(final String val) {
return (float) Math.sqrt( val.charAt(0) );
sort.setSort (new SortField[] { new SortField ("parser", new FieldCache.FloatParser(){
public final float parseFloat(final String term) {
// dummy
return 0;
}
}), SortField.FIELD_DOC );
public final float parseFloat(final BytesRef term) {
return (float) Math.sqrt( term.bytes[term.offset] );
}
}), SortField.FIELD_DOC });
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " FloatParser");
fc.purgeAllCaches();

@@ -354,34 +363,49 @@ public class TestSort extends LuceneTestCase implements Serializable {
public final long parseLong(final String val) {
return (val.charAt(0)-'A') * 1234567890L;
}
}), SortField.FIELD_DOC );
public final long parseLong(final BytesRef term) {
return (term.bytes[term.offset]-'A') * 1234567890L;
}
}), SortField.FIELD_DOC);
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " LongParser");
fc.purgeAllCaches();

sort.setSort (new SortField ("parser", new FieldCache.DoubleParser(){
public final double parseDouble(final String val) {
return Math.pow( val.charAt(0), (val.charAt(0)-'A') );
sort.setSort (new SortField[] { new SortField ("parser", new FieldCache.DoubleParser(){
public final double parseDouble(final String term) {
// dummy
return 0;
}
}), SortField.FIELD_DOC );
public final double parseDouble(final BytesRef term) {
return Math.pow( term.bytes[term.offset], (term.bytes[term.offset]-'A') );
}
}), SortField.FIELD_DOC });
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " DoubleParser");
fc.purgeAllCaches();

sort.setSort (new SortField ("parser", new FieldCache.ByteParser(){
public final byte parseByte(final String val) {
return (byte) (val.charAt(0)-'A');
sort.setSort (new SortField[] { new SortField ("parser", new FieldCache.ByteParser(){
public final byte parseByte(final String term) {
// dummy
return 0;
}
}), SortField.FIELD_DOC );
public final byte parseByte(final BytesRef term) {
return (byte) (term.bytes[term.offset]-'A');
}
}), SortField.FIELD_DOC });
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " ByteParser");
fc.purgeAllCaches();

sort.setSort (new SortField ("parser", new FieldCache.ShortParser(){
public final short parseShort(final String val) {
return (short) (val.charAt(0)-'A');
sort.setSort (new SortField[] { new SortField ("parser", new FieldCache.ShortParser(){
public final short parseShort(final String term) {
// dummy
return 0;
}
}), SortField.FIELD_DOC );
public final short parseShort(final BytesRef term) {
return (short) (term.bytes[term.offset]-'A');
}
}), SortField.FIELD_DOC });
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " ShortParser");
fc.purgeAllCaches();

@@ -439,8 +463,12 @@ public class TestSort extends LuceneTestCase implements Serializable {
@Override
public void setNextReader(IndexReader reader, int docBase) throws IOException {
docValues = FieldCache.DEFAULT.getInts(reader, "parser", new FieldCache.IntParser() {
public final int parseInt(final String val) {
return (val.charAt(0)-'A') * 123456;
public final int parseInt(final String term) {
// dummy
return 0;
}
public final int parseInt(final BytesRef term) {
return (term.bytes[term.offset]-'A') * 123456;
}
});
}
@@ -72,9 +72,9 @@ public class TestTermScorer extends LuceneTestCase

Weight weight = termQuery.weight(indexSearcher);

TermScorer ts = new TermScorer(weight,
indexReader.termDocs(allTerm), indexSearcher.getSimilarity(),
indexReader.norms(FIELD));
Scorer ts = weight.scorer(indexSearcher.getIndexReader(),
true, true);

//we have 2 documents with the term all in them, one document for all the other values
final List docs = new ArrayList();
//must call next first

@@ -138,9 +138,9 @@ public class TestTermScorer extends LuceneTestCase

Weight weight = termQuery.weight(indexSearcher);

TermScorer ts = new TermScorer(weight,
indexReader.termDocs(allTerm), indexSearcher.getSimilarity(),
indexReader.norms(FIELD));
Scorer ts = weight.scorer(indexSearcher.getIndexReader(),
true, true);

assertTrue("next did not return a doc", ts.nextDoc() != DocIdSetIterator.NO_MORE_DOCS);
assertTrue("score is not correct", ts.score() == 1.6931472f);
assertTrue("next did not return a doc", ts.nextDoc() != DocIdSetIterator.NO_MORE_DOCS);

@@ -155,9 +155,9 @@ public class TestTermScorer extends LuceneTestCase

Weight weight = termQuery.weight(indexSearcher);

TermScorer ts = new TermScorer(weight,
indexReader.termDocs(allTerm), indexSearcher.getSimilarity(),
indexReader.norms(FIELD));
Scorer ts = weight.scorer(indexSearcher.getIndexReader(),
true, true);

assertTrue("Didn't skip", ts.advance(3) != DocIdSetIterator.NO_MORE_DOCS);
//The next doc should be doc 5
assertTrue("doc should be number 5", ts.docID() == 5);
@@ -114,6 +114,7 @@ public class TestWildcard
* rewritten to a single PrefixQuery. The boost and rewriteMethod should be
* preserved.
*/
/* disable because rewrites changed in flex/trunk
public void testPrefixTerm() throws IOException {
RAMDirectory indexStore = getIndexStore("field", new String[]{"prefix", "prefixx"});
IndexSearcher searcher = new IndexSearcher(indexStore, true);

@@ -145,7 +146,7 @@ public class TestWildcard
expected.setRewriteMethod(wq.getRewriteMethod());
expected.setBoost(wq.getBoost());
assertEquals(searcher.rewrite(expected), searcher.rewrite(wq));
}
}*/

/**
* Tests Wildcard queries with an asterisk.
@@ -78,22 +78,22 @@ public class TestAttributeSource extends LuceneTestCase {

public void testCloneAttributes() {
final AttributeSource src = new AttributeSource();
final TermAttribute termAtt = src.addAttribute(TermAttribute.class);
final FlagsAttribute flagsAtt = src.addAttribute(FlagsAttribute.class);
final TypeAttribute typeAtt = src.addAttribute(TypeAttribute.class);
termAtt.setTermBuffer("TestTerm");
flagsAtt.setFlags(1234);
typeAtt.setType("TestType");

final AttributeSource clone = src.cloneAttributes();
final Iterator<Class<? extends Attribute>> it = clone.getAttributeClassesIterator();
assertEquals("TermAttribute must be the first attribute", TermAttribute.class, it.next());
assertEquals("FlagsAttribute must be the first attribute", FlagsAttribute.class, it.next());
assertEquals("TypeAttribute must be the second attribute", TypeAttribute.class, it.next());
assertFalse("No more attributes", it.hasNext());

final TermAttribute termAtt2 = clone.getAttribute(TermAttribute.class);
final FlagsAttribute flagsAtt2 = clone.getAttribute(FlagsAttribute.class);
final TypeAttribute typeAtt2 = clone.getAttribute(TypeAttribute.class);
assertNotSame("TermAttribute of original and clone must be different instances", termAtt2, termAtt);
assertNotSame("FlagsAttribute of original and clone must be different instances", flagsAtt2, flagsAtt);
assertNotSame("TypeAttribute of original and clone must be different instances", typeAtt2, typeAtt);
assertEquals("TermAttribute of original and clone must be equal", termAtt2, termAtt);
assertEquals("FlagsAttribute of original and clone must be equal", flagsAtt2, flagsAtt);
assertEquals("TypeAttribute of original and clone must be equal", typeAtt2, typeAtt);
}
@@ -26,6 +26,8 @@ import java.util.Iterator;

public class TestNumericUtils extends LuceneTestCase {

/* TESTs disabled, because incompatible API change in 3.1/flex:

public void testLongConversionAndOrdering() throws Exception {
// generate a series of encoded longs, each numerical one bigger than the one before
String last=null;

@@ -132,6 +134,8 @@ public class TestNumericUtils extends LuceneTestCase {
}
}

*/

public void testDoubles() throws Exception {
double[] vals=new double[]{
Double.NEGATIVE_INFINITY, -2.3E25, -1.0E15, -1.0, -1.0E-1, -1.0E-2, -0.0,
@@ -104,24 +104,24 @@ The source distribution does not contain sources of the previous Lucene Java ver

<target name="compile-backwards" depends="compile-core, jar-core, test-backwards-message"
description="Runs tests of a previous Lucene version." if="backwards.available">
<sequential>
<sequential>
<mkdir dir="${build.dir.backwards}"/>

<!-- first compile branch classes -->
<compile
<!-- first compile branch classes -->
<compile
srcdir="${backwards.dir}/src/java"
destdir="${build.dir.backwards}/classes/java"
javac.source="${javac.source.backwards}" javac.target="${javac.target.backwards}"
>
<classpath refid="backwards.compile.classpath"/>
</compile>
</compile>

<!-- compile branch tests against branch classpath -->
<compile-test-macro srcdir="${backwards.dir}/src/test" destdir="${build.dir.backwards}/classes/test"
test.classpath="backwards.test.compile.classpath" javac.source="${javac.source.backwards}" javac.target="${javac.target.backwards}"/>


</sequential>
</sequential>
</target>

<target name="test-backwards" depends="compile-backwards, junit-backwards-mkdir, junit-backwards-sequential, junit-backwards-parallel"/>

@@ -715,6 +715,41 @@ The source distribution does not contain sources of the previous Lucene Java ver
</delete>
</target>

<macrodef name="createLevAutomaton">
<attribute name="n"/>
<sequential>
<exec dir="src/java/org/apache/lucene/util/automaton"
executable="${python.exe}" failonerror="true">
<arg line="createLevAutomata.py @{n}"/>
</exec>
</sequential>
</macrodef>

<target name="createLevAutomata" depends="check-moman,clone-moman,pull-moman">
<createLevAutomaton n="1"/>
<createLevAutomaton n="2"/>
</target>

<target name="check-moman">
<condition property="moman.cloned">
<available file="src/java/org/apache/lucene/util/automaton/moman"/>
</condition>
</target>

<target name="clone-moman" unless="moman.cloned">
<exec dir="src/java/org/apache/lucene/util/automaton"
executable="${hg.exe}" failonerror="true">
<arg line="clone -r ${moman.rev} ${moman.url} moman"/>
</exec>
</target>

<target name="pull-moman" if="moman.cloned">
<exec dir="src/java/org/apache/lucene/util/automaton/moman"
executable="${hg.exe}" failonerror="true">
<arg line="pull -f -r ${moman.rev}"/>
</exec>
</target>

<macrodef name="contrib-crawl">
<attribute name="target" default=""/>
<attribute name="failonerror" default="true"/>
@@ -119,6 +119,11 @@
<property name="svnversion.exe" value="svnversion" />
<property name="svn.exe" value="svn" />

<property name="hg.exe" value="hg" />
<property name="moman.url" value="https://bitbucket.org/jpbarrette/moman" />
<property name="moman.rev" value="115" />
<property name="python.exe" value="python" />

<property name="gpg.exe" value="gpg" />
<property name="gpg.key" value="CODE SIGNING KEY" />
@ -0,0 +1,553 @@
|
|||
import types
|
||||
import re
|
||||
import time
|
||||
import os
|
||||
import shutil
|
||||
import sys
|
||||
import cPickle
|
||||
import datetime
|
||||
|
||||
# TODO
|
||||
# - build wiki/random index as needed (balanced or not, varying # segs, docs)
|
||||
# - verify step
|
||||
# - run searches
|
||||
# - get all docs query in here
|
||||
|
||||
if sys.platform.lower().find('darwin') != -1:
|
||||
osName = 'osx'
|
||||
elif sys.platform.lower().find('win') != -1:
|
||||
osName = 'windows'
|
||||
elif sys.platform.lower().find('linux') != -1:
|
||||
osName = 'linux'
|
||||
else:
|
||||
osName = 'unix'
|
||||
|
||||
TRUNK_DIR = '/lucene/clean'
|
||||
FLEX_DIR = '/lucene/flex.branch'
|
||||
|
||||
DEBUG = False
|
||||
|
||||
# let shell find it:
|
||||
JAVA_COMMAND = 'java -Xms2048M -Xmx2048M -Xbatch -server'
|
||||
#JAVA_COMMAND = 'java -Xms1024M -Xmx1024M -Xbatch -server -XX:+AggressiveOpts -XX:CompileThreshold=100 -XX:+UseFastAccessorMethods'
|
||||
|
||||
INDEX_NUM_THREADS = 1
|
||||
|
||||
INDEX_NUM_DOCS = 5000000
|
||||
|
||||
LOG_DIR = 'logs'
|
||||
|
||||
DO_BALANCED = False
|
||||
|
||||
if osName == 'osx':
|
||||
WIKI_FILE = '/x/lucene/enwiki-20090724-pages-articles.xml.bz2'
|
||||
INDEX_DIR_BASE = '/lucene'
|
||||
else:
|
||||
WIKI_FILE = '/x/lucene/enwiki-20090724-pages-articles.xml.bz2'
|
||||
INDEX_DIR_BASE = '/x/lucene'
|
||||
|
||||
if DEBUG:
|
||||
NUM_ROUND = 0
|
||||
else:
|
||||
NUM_ROUND = 7
|
||||
|
||||
if 0:
|
||||
print 'compile...'
|
||||
if '-nocompile' not in sys.argv:
|
||||
if os.system('ant compile > compile.log 2>&1') != 0:
|
||||
raise RuntimeError('compile failed (see compile.log)')
|
||||
|
||||
BASE_SEARCH_ALG = '''
|
||||
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
|
||||
directory=FSDirectory
|
||||
work.dir = $INDEX$
|
||||
search.num.hits = $NUM_HITS$
|
||||
query.maker=org.apache.lucene.benchmark.byTask.feeds.FileBasedQueryMaker
|
||||
file.query.maker.file = queries.txt
|
||||
print.hits.field = $PRINT_FIELD$
|
||||
log.queries=true
|
||||
log.step=100000
|
||||
|
||||
$OPENREADER$
|
||||
{"XSearchWarm" $SEARCH$}
|
||||
|
||||
# Turn off printing, after warming:
|
||||
SetProp(print.hits.field,)
|
||||
|
||||
$ROUNDS$
|
||||
CloseReader
|
||||
RepSumByPrefRound XSearch
|
||||
'''
|
||||
|
||||
BASE_INDEX_ALG = '''
|
||||
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
|
||||
|
||||
$OTHER$
|
||||
deletion.policy = org.apache.lucene.benchmark.utils.NoDeletionPolicy
|
||||
doc.tokenized = false
|
||||
doc.body.tokenized = true
|
||||
doc.stored = true
|
||||
doc.body.stored = false
|
||||
doc.term.vector = false
|
||||
log.step.AddDoc=10000
|
||||
|
||||
directory=FSDirectory
|
||||
autocommit=false
|
||||
compound=false
|
||||
|
||||
work.dir=$WORKDIR$
|
||||
|
||||
{ "BuildIndex"
|
||||
- CreateIndex
|
||||
$INDEX_LINE$
|
||||
- CommitIndex(dp0)
|
||||
- CloseIndex
|
||||
$DELETIONS$
|
||||
}
|
||||
|
||||
RepSumByPrefRound BuildIndex
|
||||
'''
|
||||
|
||||
class RunAlgs:
|
||||
|
||||
def __init__(self, resultsPrefix):
|
||||
self.counter = 0
|
||||
self.results = []
|
||||
self.fOut = open('%s.txt' % resultsPrefix, 'wb')
|
||||
|
||||
def makeIndex(self, label, dir, source, numDocs, balancedNumSegs=None, deletePcts=None):
|
||||
|
||||
if source not in ('wiki', 'random'):
|
||||
raise RuntimeError('source must be wiki or random')
|
||||
|
||||
if dir is not None:
|
||||
fullDir = '%s/contrib/benchmark' % dir
|
||||
if DEBUG:
|
||||
print ' chdir %s' % fullDir
|
||||
os.chdir(fullDir)
|
||||
|
||||
indexName = '%s.%s.nd%gM' % (source, label, numDocs/1000000.0)
|
||||
if balancedNumSegs is not None:
|
||||
indexName += '_balanced%d' % balancedNumSegs
|
||||
fullIndexPath = '%s/%s' % (INDEX_DIR_BASE, indexName)
|
||||
|
||||
if os.path.exists(fullIndexPath):
|
||||
print 'Index %s already exists...' % fullIndexPath
|
||||
return indexName
|
||||
|
||||
print 'Now create index %s...' % fullIndexPath
|
||||
|
||||
s = BASE_INDEX_ALG
|
||||
|
||||
if source == 'wiki':
|
||||
other = '''doc.index.props = true
|
||||
content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
|
||||
docs.file=%s
|
||||
''' % WIKI_FILE
|
||||
#addDoc = 'AddDoc(1024)'
|
||||
addDoc = 'AddDoc'
|
||||
else:
|
||||
other = '''doc.index.props = true
|
||||
content.source=org.apache.lucene.benchmark.byTask.feeds.SortableSingleDocSource
|
||||
'''
|
||||
addDoc = 'AddDoc'
|
||||
if INDEX_NUM_THREADS > 1:
|
||||
#other += 'doc.reuse.fields=false\n'
|
||||
s = s.replace('$INDEX_LINE$', '[ { "AddDocs" %s > : %s } : %s' % \
|
||||
(addDoc, numDocs/INDEX_NUM_THREADS, INDEX_NUM_THREADS))
|
||||
else:
|
||||
s = s.replace('$INDEX_LINE$', '{ "AddDocs" %s > : %s' % \
|
||||
(addDoc, numDocs))
|
||||
|
||||
s = s.replace('$WORKDIR$', fullIndexPath)
|
||||
|
||||
if deletePcts is not None:
|
||||
dp = '# Do deletions\n'
|
||||
dp += 'OpenReader(false)\n'
|
||||
for pct in deletePcts:
|
||||
if pct != 0:
|
||||
dp += 'DeleteByPercent(%g)\n' % pct
|
||||
dp += 'CommitIndex(dp%g)\n' % pct
|
||||
dp += 'CloseReader()\n'
|
||||
else:
|
||||
dp = ''
|
||||
|
||||
s = s.replace('$DELETIONS$', dp)
|
||||
|
||||
if balancedNumSegs is not None:
|
||||
other += ''' merge.factor=1000
|
||||
max.buffered=%d
|
||||
ram.flush.mb=2000
|
||||
''' % (numDocs/balancedNumSegs)
|
||||
else:
|
||||
if source == 'random':
|
||||
other += 'ram.flush.mb=1.0\n'
|
||||
else:
|
||||
other += 'ram.flush.mb=32.0\n'
|
||||
|
||||
s = s.replace('$OTHER$', other)
|
||||
|
||||
try:
|
||||
self.runOne(dir, s, 'index_%s' % indexName, isIndex=True)
|
||||
except:
|
||||
if os.path.exists(fullIndexPath):
|
||||
shutil.rmtree(fullIndexPath)
|
||||
raise
|
||||
return indexName
|
||||
|
||||
def getLogPrefix(self, **dArgs):
|
||||
l = dArgs.items()
|
||||
l.sort()
|
||||
s = '_'.join(['%s=%s' % tup for tup in l])
|
||||
s = s.replace(' ', '_')
|
||||
s = s.replace('"', '_')
|
||||
return s
|
||||
|
||||
def runOne(self, dir, alg, logFileName, expectedMaxDocs=None, expectedNumDocs=None, queries=None, verify=False, isIndex=False):
|
||||
|
||||
fullDir = '%s/contrib/benchmark' % dir
|
||||
if DEBUG:
|
||||
print ' chdir %s' % fullDir
|
||||
os.chdir(fullDir)
|
||||
|
||||
if queries is not None:
|
||||
if type(queries) in types.StringTypes:
|
||||
queries = [queries]
|
||||
open('queries.txt', 'wb').write('\n'.join(queries))
|
||||
|
||||
if DEBUG:
|
||||
algFile = 'tmp.alg'
|
||||
else:
|
||||
algFile = 'tmp.%s.alg' % os.getpid()
|
||||
open(algFile, 'wb').write(alg)
|
||||
|
||||
fullLogFileName = '%s/contrib/benchmark/%s/%s' % (dir, LOG_DIR, logFileName)
|
||||
print ' log: %s' % fullLogFileName
|
||||
if not os.path.exists(LOG_DIR):
|
||||
print ' mkdir %s' % LOG_DIR
|
||||
os.makedirs(LOG_DIR)
|
||||
|
||||
command = '%s -classpath ../../build/classes/java:../../build/classes/demo:../../build/contrib/highlighter/classes/java:lib/commons-digester-1.7.jar:lib/commons-collections-3.1.jar:lib/commons-compress-1.0.jar:lib/commons-logging-1.0.4.jar:lib/commons-beanutils-1.7.0.jar:lib/xerces-2.9.0.jar:lib/xml-apis-2.9.0.jar:../../build/contrib/benchmark/classes/java org.apache.lucene.benchmark.byTask.Benchmark %s > "%s" 2>&1' % (JAVA_COMMAND, algFile, fullLogFileName)
|
||||
|
||||
if DEBUG:
|
||||
print 'command=%s' % command
|
||||
|
||||
try:
|
||||
t0 = time.time()
|
||||
if os.system(command) != 0:
|
||||
raise RuntimeError('FAILED')
|
||||
t1 = time.time()
|
||||
finally:
|
||||
if not DEBUG:
|
||||
os.remove(algFile)
|
||||
|
||||
if isIndex:
|
||||
s = open(fullLogFileName, 'rb').read()
|
||||
if s.find('Exception in thread "') != -1 or s.find('at org.apache.lucene') != -1:
|
||||
raise RuntimeError('alg hit exceptions')
|
||||
return
|
||||
|
||||
else:
|
||||
|
||||
# Parse results:
|
||||
bestQPS = None
|
||||
count = 0
|
||||
nhits = None
|
||||
numDocs = None
|
||||
maxDocs = None
|
||||
warmTime = None
|
||||
r = re.compile('^ ([0-9]+): (.*)$')
|
||||
topN = []
|
||||
|
||||
for line in open(fullLogFileName, 'rb').readlines():
|
||||
m = r.match(line.rstrip())
|
||||
if m is not None:
|
||||
topN.append(m.group(2))
|
||||
if line.startswith('totalHits = '):
|
||||
nhits = int(line[12:].strip())
|
||||
if line.startswith('maxDoc() = '):
|
||||
maxDocs = int(line[12:].strip())
|
||||
if line.startswith('numDocs() = '):
|
||||
numDocs = int(line[12:].strip())
|
||||
if line.startswith('XSearchWarm'):
|
||||
v = line.strip().split()
|
||||
warmTime = float(v[5])
|
||||
if line.startswith('XSearchReal'):
|
||||
v = line.strip().split()
|
||||
# print len(v), v
|
||||
upto = 0
|
||||
i = 0
|
||||
qps = None
|
||||
while i < len(v):
|
||||
if v[i] == '-':
|
||||
i += 1
|
||||
continue
|
||||
else:
|
||||
upto += 1
|
||||
i += 1
|
||||
if upto == 5:
|
||||
qps = float(v[i-1].replace(',', ''))
|
||||
break
|
||||
|
||||
if qps is None:
|
||||
raise RuntimeError('did not find qps')
|
||||
|
||||
count += 1
|
||||
if bestQPS is None or qps > bestQPS:
|
||||
bestQPS = qps
|
||||
|
||||
if not verify:
|
||||
if count != NUM_ROUND:
|
||||
raise RuntimeError('did not find %s rounds (got %s)' % (NUM_ROUND, count))
|
||||
if warmTime is None:
|
||||
raise RuntimeError('did not find warm time')
|
||||
else:
|
||||
bestQPS = 1.0
|
||||
warmTime = None
|
||||
|
||||
if nhits is None:
|
||||
raise RuntimeError('did not see "totalHits = XXX"')
|
||||
|
||||
if maxDocs is None:
|
||||
raise RuntimeError('did not see "maxDoc() = XXX"')
|
||||
|
||||
if maxDocs != expectedMaxDocs:
|
||||
raise RuntimeError('maxDocs() mismatch: expected %s but got %s' % (expectedMaxDocs, maxDocs))
|
||||
|
||||
if numDocs is None:
|
||||
raise RuntimeError('did not see "numDocs() = XXX"')
|
||||
|
||||
if numDocs != expectedNumDocs:
|
||||
raise RuntimeError('numDocs() mismatch: expected %s but got %s' % (expectedNumDocs, numDocs))
|
||||
|
||||
return nhits, warmTime, bestQPS, topN
|
||||
|
||||
def getAlg(self, indexPath, searchTask, numHits, deletes=None, verify=False, printField=''):
|
||||
|
||||
s = BASE_SEARCH_ALG
|
||||
s = s.replace('$PRINT_FIELD$', 'doctitle')
|
||||
|
||||
if not verify:
|
||||
s = s.replace('$ROUNDS$',
|
||||
'''
|
||||
{ "Rounds"
|
||||
{ "Run"
|
||||
{ "TestSearchSpeed"
|
||||
{ "XSearchReal" $SEARCH$ > : 3.0s
|
||||
}
|
||||
NewRound
|
||||
} : %d
|
||||
}
|
||||
''' % NUM_ROUND)
|
||||
else:
|
||||
s = s.replace('$ROUNDS$', '')
|
||||
|
||||
if deletes is None:
|
||||
s = s.replace('$OPENREADER$', 'OpenReader')
|
||||
else:
|
||||
s = s.replace('$OPENREADER$', 'OpenReader(true,dp%g)' % deletes)
|
||||
s = s.replace('$INDEX$', indexPath)
|
||||
s = s.replace('$SEARCH$', searchTask)
|
||||
s = s.replace('$NUM_HITS$', str(numHits))
|
||||
|
||||
return s
|
||||
|
||||
def compare(self, baseline, new, *params):
|
||||
|
||||
if new[0] != baseline[0]:
|
||||
raise RuntimeError('baseline found %d hits but new found %d hits' % (baseline[0], new[0]))
|
||||
|
||||
qpsOld = baseline[2]
|
||||
qpsNew = new[2]
|
||||
pct = 100.0*(qpsNew-qpsOld)/qpsOld
|
||||
print ' diff: %.1f%%' % pct
|
||||
self.results.append((qpsOld, qpsNew, params))
|
||||
|
||||
self.fOut.write('|%s|%.2f|%.2f|%.1f%%|\n' % \
|
||||
('|'.join(str(x) for x in params),
|
||||
qpsOld, qpsNew, pct))
|
||||
self.fOut.flush()
|
||||
|
||||
def save(self, name):
|
||||
f = open('%s.pk' % name, 'wb')
|
||||
cPickle.dump(self.results, f)
|
||||
f.close()
|
||||
|
||||
def verify(r1, r2):
|
||||
if r1[0] != r2[0]:
|
||||
raise RuntimeError('different total hits: %s vs %s' % (r1[0], r2[0]))
|
||||
|
||||
h1 = r1[3]
|
||||
h2 = r2[3]
|
||||
if len(h1) != len(h2):
|
||||
raise RuntimeError('different number of results')
|
||||
else:
|
||||
for i in range(len(h1)):
|
||||
s1 = h1[i].replace('score=NaN', 'score=na').replace('score=0.0', 'score=na')
|
||||
s2 = h2[i].replace('score=NaN', 'score=na').replace('score=0.0', 'score=na')
|
||||
if s1 != s2:
|
||||
raise RuntimeError('hit %s differs: %s vs %s' % (i, s1 ,s2))
|
||||
|
||||
def usage():
|
||||
print
|
||||
print 'Usage: python -u %s -run <name> | -report <name>' % sys.argv[0]
|
||||
print
|
||||
print ' -run <name> runs all tests, saving results to file <name>.pk'
|
||||
print ' -report <name> opens <name>.pk and prints Jira table'
|
||||
print ' -verify confirm old & new produce identical results'
|
||||
print
|
||||
sys.exit(1)
|
||||
|
||||
def main():
|
||||
|
||||
if not os.path.exists(LOG_DIR):
|
||||
os.makedirs(LOG_DIR)
|
||||
|
||||
if '-run' in sys.argv:
|
||||
i = sys.argv.index('-run')
|
||||
mode = 'run'
|
||||
if i < len(sys.argv)-1:
|
||||
name = sys.argv[1+i]
|
||||
else:
|
||||
usage()
|
||||
elif '-report' in sys.argv:
|
||||
i = sys.argv.index('-report')
|
||||
mode = 'report'
|
||||
if i < len(sys.argv)-1:
|
||||
name = sys.argv[1+i]
|
||||
else:
|
||||
usage()
|
||||
elif '-verify' in sys.argv:
|
||||
mode = 'verify'
|
||||
name = None
|
||||
else:
|
||||
usage()
|
||||
|
||||
if mode in ('run', 'verify'):
|
||||
run(mode, name)
|
||||
else:
|
||||
report(name)
|
||||
|
||||
def report(name):
|
||||
|
||||
print '||Query||Deletes %||Tot hits||QPS old||QPS new||Pct change||'
|
||||
|
||||
results = cPickle.load(open('%s.pk' % name))
|
||||
for qpsOld, qpsNew, params in results:
|
||||
pct = 100.0*(qpsNew-qpsOld)/qpsOld
|
||||
if pct < 0.0:
|
||||
c = 'red'
|
||||
else:
|
||||
c = 'green'
|
||||
|
||||
params = list(params)
|
||||
|
||||
query = params[0]
|
||||
if query == '*:*':
|
||||
query = '<all>'
|
||||
params[0] = query
|
||||
|
||||
pct = '{color:%s}%.1f%%{color}' % (c, pct)
|
||||
print '|%s|%.2f|%.2f|%s|' % \
|
||||
('|'.join(str(x) for x in params),
|
||||
qpsOld, qpsNew, pct)
|
||||
|
||||
def run(mode, name):
|
||||
|
||||
for dir in (TRUNK_DIR, FLEX_DIR):
|
||||
dir = '%s/contrib/benchmark' % dir
|
||||
print '"ant compile" in %s...' % dir
|
||||
os.chdir(dir)
|
||||
if os.system('ant compile') != 0:
|
||||
raise RuntimeError('ant compile failed')
|
||||
|
||||
r = RunAlgs(name)
|
||||
|
||||
if not os.path.exists(WIKI_FILE):
|
||||
print
|
||||
print 'ERROR: wiki source file "%s" does not exist' % WIKI_FILE
|
||||
print
|
||||
sys.exit(1)
|
||||
|
||||
print
|
||||
print 'JAVA:\n%s' % os.popen('java -version 2>&1').read()
|
||||
|
||||
print
|
||||
if osName != 'windows':
|
||||
print 'OS:\n%s' % os.popen('uname -a 2>&1').read()
|
||||
else:
|
||||
print 'OS:\n%s' % sys.platform
|
||||
|
||||
deletePcts = (0.0, 0.1, 1.0, 10)
|
||||
|
||||
indexes = {}
|
||||
for rev in ('baseline', 'flex'):
|
||||
if rev == 'baseline':
|
||||
dir = TRUNK_DIR
|
||||
else:
|
||||
dir = FLEX_DIR
|
||||
source = 'wiki'
|
||||
indexes[rev] = r.makeIndex(rev, dir, source, INDEX_NUM_DOCS, deletePcts=deletePcts)
|
||||
|
||||
doVerify = mode == 'verify'
|
||||
source = 'wiki'
|
||||
numHits = 10
|
||||
|
||||
queries = (
|
||||
'body:[tec TO tet]',
|
||||
'real*',
|
||||
'1',
|
||||
'2',
|
||||
'+1 +2',
|
||||
'+1 -2',
|
||||
'1 2 3 -4',
|
||||
'"world economy"')
|
||||
|
||||
for query in queries:
|
||||
|
||||
for deletePct in deletePcts:
|
||||
|
||||
print '\nRUN: query=%s deletes=%g%% nhits=%d' % \
|
||||
(query, deletePct, numHits)
|
||||
|
||||
maxDocs = INDEX_NUM_DOCS
|
||||
numDocs = int(INDEX_NUM_DOCS * (1.0-deletePct/100.))
|
||||
|
||||
prefix = r.getLogPrefix(query=query, deletePct=deletePct)
|
||||
indexPath = '%s/%s' % (INDEX_DIR_BASE, indexes['baseline'])
|
||||
|
||||
# baseline (trunk)
|
||||
s = r.getAlg(indexPath,
|
||||
'Search',
|
||||
numHits,
|
||||
deletes=deletePct,
|
||||
verify=doVerify,
|
||||
printField='doctitle')
|
||||
baseline = r.runOne(TRUNK_DIR, s, 'baseline_%s' % prefix, maxDocs, numDocs, query, verify=doVerify)
|
||||
|
||||
# flex
|
||||
indexPath = '%s/%s' % (INDEX_DIR_BASE, indexes['flex'])
|
||||
s = r.getAlg(indexPath,
|
||||
'Search',
|
||||
numHits,
|
||||
deletes=deletePct,
|
||||
verify=doVerify,
|
||||
printField='doctitle')
|
||||
flex = r.runOne(FLEX_DIR, s, 'flex_%s' % prefix, maxDocs, numDocs, query, verify=doVerify)
|
||||
|
||||
print ' %d hits' % flex[0]
|
||||
|
||||
verify(baseline, flex)
|
||||
|
||||
if mode == 'run' and not DEBUG:
|
||||
r.compare(baseline, flex,
|
||||
query, deletePct, baseline[0])
|
||||
r.save(name)
|
||||
|
||||
def cleanScores(l):
|
||||
for i in range(len(l)):
|
||||
pos = l[i].find(' score=')
|
||||
l[i] = l[i][:pos].strip()
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
|
@@ -0,0 +1,38 @@
package org.apache.lucene.benchmark.byTask.feeds;

/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

import org.apache.lucene.benchmark.byTask.utils.Config;

/**
* A {@link DocMaker} which reads the English Wikipedia dump. Uses
* {@link EnwikiContentSource} as its content source, regardless if a different
* content source was defined in the configuration.
* @deprecated Please use {@link DocMaker} instead, with content.source=EnwikiContentSource
*/
@Deprecated
public class EnwikiDocMaker extends DocMaker {
@Override
public void setConfig(Config config) {
super.setConfig(config);
// Override whatever content source was set in the config
source = new EnwikiContentSource();
source.setConfig(config);
System.out.println("NOTE: EnwikiDocMaker is deprecated; please use DocMaker instead (which is the default if you don't specify doc.maker) with content.source=EnwikiContentSource");
}
}
@@ -0,0 +1,50 @@
package org.apache.lucene.benchmark.byTask.feeds;

/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

import org.apache.lucene.benchmark.byTask.utils.Config;

/**
* A DocMaker reading one line at a time as a Document from a single file. This
* saves IO cost (over DirContentSource) of recursing through a directory and
* opening a new file for every document. It also re-uses its Document and Field
* instance to improve indexing speed.<br>
* The expected format of each line is (arguments are separated by <TAB>):
* <i>title, date, body</i>. If a line is read in a different format, a
* {@link RuntimeException} will be thrown. In general, you should use this doc
* maker with files that were created with
* {@link org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTask}.<br>
* <br>
* Config properties:
* <ul>
* <li>doc.random.id.limit=N (default -1) -- create random docid in the range
* 0..N; this is useful with UpdateDoc to test updating random documents; if
* this is unspecified or -1, then docid is sequentially assigned
* </ul>
* @deprecated Please use {@link DocMaker} instead, with content.source=LineDocSource
*/
@Deprecated
public class LineDocMaker extends DocMaker {
@Override
public void setConfig(Config config) {
super.setConfig(config);
source = new LineDocSource();
source.setConfig(config);
System.out.println("NOTE: LineDocMaker is deprecated; please use DocMaker instead (which is the default if you don't specify doc.maker) with content.source=LineDocSource");
}
}
@ -37,11 +37,12 @@ import org.apache.lucene.benchmark.byTask.stats.TaskStats;
|
|||
import org.apache.lucene.collation.CollationKeyAnalyzer;
|
||||
import org.apache.lucene.index.IndexReader;
|
||||
import org.apache.lucene.index.IndexWriter;
|
||||
import org.apache.lucene.index.TermsEnum;
|
||||
import org.apache.lucene.index.MultiFields;
|
||||
import org.apache.lucene.index.FieldsEnum;
|
||||
import org.apache.lucene.index.DocsEnum;
|
||||
import org.apache.lucene.index.IndexWriterConfig;
|
||||
import org.apache.lucene.index.LogMergePolicy;
|
||||
import org.apache.lucene.index.Term;
|
||||
import org.apache.lucene.index.TermEnum;
|
||||
import org.apache.lucene.index.TermDocs;
|
||||
import org.apache.lucene.index.SerialMergeScheduler;
|
||||
import org.apache.lucene.index.LogDocMergePolicy;
|
||||
import org.apache.lucene.index.TermFreqVector;
|
||||
|
@ -474,16 +475,20 @@ public class TestPerfTasksLogic extends LuceneTestCase {
|
|||
IndexReader reader = IndexReader.open(benchmark.getRunData().getDirectory(), true);
|
||||
assertEquals(NUM_DOCS, reader.numDocs());
|
||||
|
||||
TermEnum terms = reader.terms();
|
||||
TermDocs termDocs = reader.termDocs();
|
||||
int totalTokenCount2 = 0;
|
||||
while(terms.next()) {
|
||||
Term term = terms.term();
|
||||
/* not-tokenized, but indexed field */
|
||||
if (term != null && term.field() != DocMaker.ID_FIELD) {
|
||||
termDocs.seek(terms.term());
|
||||
while (termDocs.next())
|
||||
totalTokenCount2 += termDocs.freq();
|
||||
|
||||
FieldsEnum fields = MultiFields.getFields(reader).iterator();
|
||||
String fieldName = null;
|
||||
while((fieldName = fields.next()) != null) {
|
||||
if (fieldName == DocMaker.ID_FIELD)
|
||||
continue;
|
||||
TermsEnum terms = fields.terms();
|
||||
DocsEnum docs = null;
|
||||
while(terms.next() != null) {
|
||||
docs = terms.docs(MultiFields.getDeletedDocs(reader), docs);
|
||||
while(docs.nextDoc() != docs.NO_MORE_DOCS) {
|
||||
totalTokenCount2 += docs.freq();
|
||||
}
|
||||
}
|
||||
}
|
||||
reader.close();
|
||||
|
|
|
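The hunk above swaps the old TermEnum/TermDocs loop for the flex enumerators. Below is a minimal standalone sketch of that counting pattern, using only the calls shown in the hunk; the class and method names are illustrative, not part of the patch.

import java.io.IOException;

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.FieldsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.Bits;

public class TokenCountSketch {
  /** Sums freq() over all postings of every field except skipField. */
  public static int countTokens(IndexReader reader, String skipField) throws IOException {
    int total = 0;
    final Bits delDocs = MultiFields.getDeletedDocs(reader);
    Fields fields = MultiFields.getFields(reader);
    if (fields == null) {
      return 0;                                  // empty reader
    }
    FieldsEnum fieldsEnum = fields.iterator();
    String fieldName;
    while ((fieldName = fieldsEnum.next()) != null) {
      if (fieldName.equals(skipField)) {
        continue;
      }
      TermsEnum terms = fieldsEnum.terms();
      DocsEnum docs = null;
      while (terms.next() != null) {
        docs = terms.docs(delDocs, docs);        // reuse the DocsEnum across terms
        while (docs.nextDoc() != DocsEnum.NO_MORE_DOCS) {
          total += docs.freq();
        }
      }
    }
    return total;
  }
}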
@ -150,11 +150,16 @@ public class WeightedSpanTermExtractor {
|
|||
mtq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
|
||||
query = mtq;
|
||||
}
|
||||
FakeReader fReader = new FakeReader();
|
||||
MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE.rewrite(fReader, mtq);
|
||||
if (fReader.field != null) {
|
||||
IndexReader ir = getReaderForField(fReader.field);
|
||||
if (mtq.getField() != null) {
|
||||
IndexReader ir = getReaderForField(mtq.getField());
|
||||
extract(query.rewrite(ir), terms);
|
||||
} else {
|
||||
FakeReader fReader = new FakeReader();
|
||||
MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE.rewrite(fReader, mtq);
|
||||
if (fReader.field != null) {
|
||||
IndexReader ir = getReaderForField(fReader.field);
|
||||
extract(query.rewrite(ir), terms);
|
||||
}
|
||||
}
|
||||
} else if (query instanceof MultiPhraseQuery) {
|
||||
final MultiPhraseQuery mpq = (MultiPhraseQuery) query;
|
||||
|
|
|
@@ -19,11 +19,15 @@ package org.apache.lucene.index;
import java.io.IOException;
import java.io.File;
import java.util.Date;
import java.util.List;
import java.util.ArrayList;

import org.apache.lucene.search.Similarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.StringHelper;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.ReaderUtil;

/**
* Given a directory and a list of fields, updates the fieldNorms in place for every document.
@ -104,46 +108,46 @@ public class FieldNormModifier {
|
|||
*/
|
||||
public void reSetNorms(String field) throws IOException {
|
||||
String fieldName = StringHelper.intern(field);
|
||||
int[] termCounts = new int[0];
|
||||
|
||||
IndexReader reader = null;
|
||||
TermEnum termEnum = null;
|
||||
TermDocs termDocs = null;
|
||||
try {
|
||||
reader = IndexReader.open(dir, true);
|
||||
termCounts = new int[reader.maxDoc()];
|
||||
try {
|
||||
termEnum = reader.terms(new Term(field));
|
||||
try {
|
||||
termDocs = reader.termDocs();
|
||||
do {
|
||||
Term term = termEnum.term();
|
||||
if (term != null && term.field().equals(fieldName)) {
|
||||
termDocs.seek(termEnum.term());
|
||||
while (termDocs.next()) {
|
||||
termCounts[termDocs.doc()] += termDocs.freq();
|
||||
}
|
||||
}
|
||||
} while (termEnum.next());
|
||||
|
||||
} finally {
|
||||
if (null != termDocs) termDocs.close();
|
||||
}
|
||||
} finally {
|
||||
if (null != termEnum) termEnum.close();
|
||||
}
|
||||
} finally {
|
||||
if (null != reader) reader.close();
|
||||
}
|
||||
|
||||
try {
|
||||
reader = IndexReader.open(dir, false);
|
||||
for (int d = 0; d < termCounts.length; d++) {
|
||||
if (! reader.isDeleted(d)) {
|
||||
if (sim == null)
|
||||
reader.setNorm(d, fieldName, Similarity.encodeNorm(1.0f));
|
||||
else
|
||||
reader.setNorm(d, fieldName, sim.encodeNormValue(sim.lengthNorm(fieldName, termCounts[d])));
|
||||
|
||||
final List<IndexReader> subReaders = new ArrayList<IndexReader>();
|
||||
ReaderUtil.gatherSubReaders(subReaders, reader);
|
||||
|
||||
for(IndexReader subReader : subReaders) {
|
||||
final Bits delDocs = subReader.getDeletedDocs();
|
||||
|
||||
int[] termCounts = new int[subReader.maxDoc()];
|
||||
Fields fields = subReader.fields();
|
||||
if (fields != null) {
|
||||
Terms terms = fields.terms(field);
|
||||
if (terms != null) {
|
||||
TermsEnum termsEnum = terms.iterator();
|
||||
DocsEnum docs = null;
|
||||
while(termsEnum.next() != null) {
|
||||
docs = termsEnum.docs(delDocs, docs);
|
||||
while(true) {
|
||||
int docID = docs.nextDoc();
|
||||
if (docID != docs.NO_MORE_DOCS) {
|
||||
termCounts[docID] += docs.freq();
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
for (int d = 0; d < termCounts.length; d++) {
|
||||
if (delDocs == null || !delDocs.get(d)) {
|
||||
if (sim == null) {
|
||||
subReader.setNorm(d, fieldName, Similarity.encodeNorm(1.0f));
|
||||
} else {
|
||||
subReader.setNorm(d, fieldName, sim.encodeNormValue(sim.lengthNorm(fieldName, termCounts[d])));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -151,5 +155,4 @@ public class FieldNormModifier {
|
|||
if (null != reader) reader.close();
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
|
|
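FieldNormModifier now counts terms per sub-reader and rewrites norms segment by segment. A minimal sketch of that loop, assuming a non-null Similarity and omitting the field-name interning and null-Similarity branch the real class keeps; names here are illustrative.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.Similarity;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.ReaderUtil;

public class NormResetSketch {
  /** Recomputes norms for one field from per-segment term counts. */
  public static void resetNorms(IndexReader reader, String field, Similarity sim) throws IOException {
    final List<IndexReader> subReaders = new ArrayList<IndexReader>();
    ReaderUtil.gatherSubReaders(subReaders, reader);
    for (IndexReader subReader : subReaders) {
      final Bits delDocs = subReader.getDeletedDocs();
      final int[] termCounts = new int[subReader.maxDoc()];
      Fields fields = subReader.fields();
      if (fields != null) {
        Terms terms = fields.terms(field);
        if (terms != null) {
          TermsEnum termsEnum = terms.iterator();
          DocsEnum docs = null;
          while (termsEnum.next() != null) {
            docs = termsEnum.docs(delDocs, docs);
            int docID;
            while ((docID = docs.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
              termCounts[docID] += docs.freq();  // tokens of this field in docID
            }
          }
        }
      }
      for (int d = 0; d < termCounts.length; d++) {
        if (delDocs == null || !delDocs.get(d)) {
          subReader.setNorm(d, field, sim.encodeNormValue(sim.lengthNorm(field, termCounts[d])));
        }
      }
    }
  }
}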
@ -26,6 +26,7 @@ import org.apache.lucene.index.IndexWriterConfig.OpenMode;
|
|||
import org.apache.lucene.store.Directory;
|
||||
import org.apache.lucene.store.FSDirectory;
|
||||
import org.apache.lucene.util.OpenBitSet;
|
||||
import org.apache.lucene.util.Bits;
|
||||
import org.apache.lucene.util.Version;
|
||||
|
||||
/**
|
||||
|
@ -172,6 +173,8 @@ public class MultiPassIndexSplitter {
|
|||
* list of deletions.
|
||||
*/
|
||||
public static class FakeDeleteIndexReader extends FilterIndexReader {
|
||||
// TODO: switch to flex api, here
|
||||
|
||||
OpenBitSet dels;
|
||||
OpenBitSet oldDels = null;
|
||||
|
||||
|
@ -202,6 +205,7 @@ public class MultiPassIndexSplitter {
|
|||
if (oldDels != null) {
|
||||
dels.or(oldDels);
|
||||
}
|
||||
storeDelDocs(null);
|
||||
}
|
||||
|
||||
@Override
|
||||
|
@ -214,6 +218,16 @@ public class MultiPassIndexSplitter {
|
|||
return !dels.isEmpty();
|
||||
}
|
||||
|
||||
@Override
|
||||
public IndexReader[] getSequentialSubReaders() {
|
||||
return null;
|
||||
}
|
||||
|
||||
@Override
|
||||
public Bits getDeletedDocs() {
|
||||
return dels;
|
||||
}
|
||||
|
||||
@Override
|
||||
public boolean isDeleted(int n) {
|
||||
return dels.get(n);
|
||||
|
@ -235,5 +249,29 @@ public class MultiPassIndexSplitter {
|
|||
}
|
||||
};
|
||||
}
|
||||
|
||||
@Override
|
||||
public TermDocs termDocs() throws IOException {
|
||||
return new FilterTermDocs(in.termDocs()) {
|
||||
|
||||
@Override
|
||||
public boolean next() throws IOException {
|
||||
boolean res;
|
||||
while ((res = super.next())) {
|
||||
if (!dels.get(doc())) {
|
||||
break;
|
||||
}
|
||||
}
|
||||
return res;
|
||||
}
|
||||
};
|
||||
}
|
||||
|
||||
@Override
|
||||
public TermDocs termDocs(Term term) throws IOException {
|
||||
TermDocs termDocs = termDocs();
|
||||
termDocs.seek(term);
|
||||
return termDocs;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
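The FakeDeleteIndexReader changes expose its OpenBitSet directly as the reader's deleted-docs Bits and flatten getSequentialSubReaders(). A sketch of that overlay idea as its own FilterIndexReader, under the assumption (as in the patch) that OpenBitSet implements Bits on this branch; the class name is hypothetical.

import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.OpenBitSet;

/** Overlays an in-memory deletion set on top of a wrapped reader. */
public class FakeDeletesReaderSketch extends FilterIndexReader {
  private final OpenBitSet dels;                 // extra deletions, never written to the index

  public FakeDeletesReaderSketch(IndexReader in) {
    super(in);
    dels = new OpenBitSet(in.maxDoc());
  }

  public void fakeDelete(int docID) {
    dels.set(docID);
  }

  @Override
  public Bits getDeletedDocs() {
    return dels;                                 // flex consumers see the fake deletions
  }

  @Override
  public boolean isDeleted(int n) {
    return dels.get(n);
  }

  @Override
  public boolean hasDeletions() {
    return !dels.isEmpty();
  }

  @Override
  public IndexReader[] getSequentialSubReaders() {
    return null;                                 // treat this reader as atomic, as in the patch
  }
}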
@ -1,10 +1,5 @@
|
|||
package org.apache.lucene.index;
|
||||
|
||||
import org.apache.lucene.util.StringHelper;
|
||||
|
||||
import java.io.IOException;
|
||||
import java.util.ArrayList;
|
||||
import java.util.List;
|
||||
/*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
|
@ -20,6 +15,14 @@ import java.util.List;
|
|||
*
|
||||
*/
|
||||
|
||||
import org.apache.lucene.util.StringHelper;
|
||||
import org.apache.lucene.util.Bits;
|
||||
import org.apache.lucene.util.BytesRef;
|
||||
|
||||
import java.io.IOException;
|
||||
import java.util.ArrayList;
|
||||
import java.util.List;
|
||||
|
||||
|
||||
/**
|
||||
* Transparent access to the vector space model,
|
||||
|
@ -97,40 +100,53 @@ public class TermVectorAccessor {
|
|||
positions.clear();
|
||||
}
|
||||
|
||||
TermEnum termEnum = indexReader.terms(new Term(field, ""));
|
||||
if (termEnum.term() != null) {
|
||||
while (termEnum.term().field() == field) {
|
||||
TermPositions termPositions = indexReader.termPositions(termEnum.term());
|
||||
if (termPositions.skipTo(documentNumber)) {
|
||||
|
||||
frequencies.add(Integer.valueOf(termPositions.freq()));
|
||||
tokens.add(termEnum.term().text());
|
||||
|
||||
final Bits delDocs = MultiFields.getDeletedDocs(indexReader);
|
||||
|
||||
Terms terms = MultiFields.getTerms(indexReader, field);
|
||||
boolean anyTerms = false;
|
||||
if (terms != null) {
|
||||
TermsEnum termsEnum = terms.iterator();
|
||||
DocsEnum docs = null;
|
||||
DocsAndPositionsEnum postings = null;
|
||||
while(true) {
|
||||
BytesRef text = termsEnum.next();
|
||||
if (text != null) {
|
||||
anyTerms = true;
|
||||
if (!mapper.isIgnoringPositions()) {
|
||||
int[] positions = new int[termPositions.freq()];
|
||||
for (int i = 0; i < positions.length; i++) {
|
||||
positions[i] = termPositions.nextPosition();
|
||||
}
|
||||
this.positions.add(positions);
|
||||
docs = postings = termsEnum.docsAndPositions(delDocs, postings);
|
||||
} else {
|
||||
positions.add(null);
|
||||
docs = termsEnum.docs(delDocs, docs);
|
||||
}
|
||||
}
|
||||
termPositions.close();
|
||||
if (!termEnum.next()) {
|
||||
|
||||
int docID = docs.advance(documentNumber);
|
||||
if (docID == documentNumber) {
|
||||
|
||||
frequencies.add(Integer.valueOf(docs.freq()));
|
||||
tokens.add(text.utf8ToString());
|
||||
|
||||
if (!mapper.isIgnoringPositions()) {
|
||||
int[] positions = new int[docs.freq()];
|
||||
for (int i = 0; i < positions.length; i++) {
|
||||
positions[i] = postings.nextPosition();
|
||||
}
|
||||
this.positions.add(positions);
|
||||
} else {
|
||||
positions.add(null);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
mapper.setDocumentNumber(documentNumber);
|
||||
mapper.setExpectations(field, tokens.size(), false, !mapper.isIgnoringPositions());
|
||||
for (int i = 0; i < tokens.size(); i++) {
|
||||
mapper.map(tokens.get(i), frequencies.get(i).intValue(), (TermVectorOffsetInfo[]) null, positions.get(i));
|
||||
|
||||
if (anyTerms) {
|
||||
mapper.setDocumentNumber(documentNumber);
|
||||
mapper.setExpectations(field, tokens.size(), false, !mapper.isIgnoringPositions());
|
||||
for (int i = 0; i < tokens.size(); i++) {
|
||||
mapper.map(tokens.get(i), frequencies.get(i).intValue(), (TermVectorOffsetInfo[]) null, positions.get(i));
|
||||
}
|
||||
}
|
||||
}
|
||||
termEnum.close();
|
||||
|
||||
|
||||
}
|
||||
|
||||
|
||||
|
|
|
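TermVectorAccessor now rebuilds positions from the inverted index with DocsAndPositionsEnum instead of TermPositions. A minimal sketch of that access pattern for a single document and field, using only calls that appear in the hunk; the class and method names are illustrative.

import java.io.IOException;

import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;

public class PositionsSketch {
  /** Prints term, freq and positions for one document of one field. */
  public static void dump(IndexReader reader, String field, int docID) throws IOException {
    final Bits delDocs = MultiFields.getDeletedDocs(reader);
    Terms terms = MultiFields.getTerms(reader, field);
    if (terms == null) {
      return;                                    // field not indexed
    }
    TermsEnum termsEnum = terms.iterator();
    DocsAndPositionsEnum postings = null;
    BytesRef text;
    while ((text = termsEnum.next()) != null) {
      postings = termsEnum.docsAndPositions(delDocs, postings);   // reuse across terms
      if (postings == null) {
        continue;                                // positions were omitted at index time
      }
      if (postings.advance(docID) == docID) {    // term occurs in this document
        int freq = postings.freq();
        StringBuilder sb = new StringBuilder(text.utf8ToString() + " freq=" + freq + " pos=");
        for (int i = 0; i < freq; i++) {
          sb.append(postings.nextPosition()).append(' ');
        }
        System.out.println(sb);
      }
    }
  }
}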
@@ -18,7 +18,10 @@ package org.apache.lucene.misc;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.index.FieldsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.PriorityQueue;

@ -50,20 +53,40 @@ public class HighFreqTerms {
|
|||
}
|
||||
|
||||
TermInfoQueue tiq = new TermInfoQueue(numTerms);
|
||||
TermEnum terms = reader.terms();
|
||||
|
||||
if (field != null) {
|
||||
while (terms.next()) {
|
||||
if (terms.term().field().equals(field)) {
|
||||
tiq.insertWithOverflow(new TermInfo(terms.term(), terms.docFreq()));
|
||||
Terms terms = reader.fields().terms(field);
|
||||
if (terms != null) {
|
||||
TermsEnum termsEnum = terms.iterator();
|
||||
while(true) {
|
||||
BytesRef term = termsEnum.next();
|
||||
if (term != null) {
|
||||
tiq.insertWithOverflow(new TermInfo(new Term(field, term.utf8ToString()), termsEnum.docFreq()));
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
} else {
|
||||
FieldsEnum fields = reader.fields().iterator();
|
||||
while(true) {
|
||||
field = fields.next();
|
||||
if (field != null) {
|
||||
TermsEnum terms = fields.terms();
|
||||
while(true) {
|
||||
BytesRef term = terms.next();
|
||||
if (term != null) {
|
||||
tiq.insertWithOverflow(new TermInfo(new Term(field, term.toString()), terms.docFreq()));
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
else {
|
||||
while (terms.next()) {
|
||||
tiq.insertWithOverflow(new TermInfo(terms.term(), terms.docFreq()));
|
||||
}
|
||||
}
|
||||
|
||||
while (tiq.size() != 0) {
|
||||
TermInfo termInfo = tiq.pop();
|
||||
System.out.println(termInfo.term + " " + termInfo.docFreq);
|
||||
|
|
|
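HighFreqTerms now walks a TermsEnum and reads docFreq() per term instead of using TermEnum. A minimal sketch of that walk, simplified to track a single top term so it does not depend on Lucene's internal PriorityQueue; names are illustrative.

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public class TopTermsSketch {
  /** Prints the single highest-docFreq term of one field. */
  public static void printTopTerm(IndexReader reader, String field) throws IOException {
    Terms terms = MultiFields.getTerms(reader, field);
    if (terms == null) {
      return;                                    // field has no terms
    }
    TermsEnum termsEnum = terms.iterator();
    String bestTerm = null;
    int bestDf = -1;
    BytesRef text;
    while ((text = termsEnum.next()) != null) {
      int df = termsEnum.docFreq();              // docFreq of the term the enum is positioned on
      if (df > bestDf) {
        bestDf = df;
        bestTerm = text.utf8ToString();          // BytesRef terms must be decoded for display
      }
    }
    System.out.println(field + ":" + bestTerm + " docFreq=" + bestDf);
  }
}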
@ -0,0 +1,154 @@
|
|||
package org.apache.lucene.misc;
|
||||
|
||||
/**
|
||||
* Copyright 2006 The Apache Software Foundation
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
|
||||
import org.apache.lucene.index.Term;
|
||||
import org.apache.lucene.index.TermEnum;
|
||||
import org.apache.lucene.index.TermDocs;
|
||||
import org.apache.lucene.index.IndexReader;
|
||||
import org.apache.lucene.search.Similarity;
|
||||
import org.apache.lucene.store.Directory;
|
||||
import org.apache.lucene.store.FSDirectory;
|
||||
import org.apache.lucene.util.StringHelper;
|
||||
|
||||
import java.io.File;
|
||||
import java.io.IOException;
|
||||
import java.util.Date;
|
||||
|
||||
/**
|
||||
* Given a directory, a Similarity, and a list of fields, updates the
|
||||
* fieldNorms in place for every document using the Similarity.lengthNorm.
|
||||
*
|
||||
* <p>
|
||||
* NOTE: This only works if you do <b>not</b> use field/document boosts in your
|
||||
* index.
|
||||
* </p>
|
||||
*
|
||||
* @version $Id$
|
||||
* @deprecated Use {@link org.apache.lucene.index.FieldNormModifier}
|
||||
*/
|
||||
@Deprecated
|
||||
public class LengthNormModifier {
|
||||
|
||||
/**
|
||||
* Command Line Execution method.
|
||||
*
|
||||
* <pre>
|
||||
* Usage: LengthNormModifier /path/index package.SimilarityClassName field1 field2 ...
|
||||
* </pre>
|
||||
*/
|
||||
public static void main(String[] args) throws IOException {
|
||||
if (args.length < 3) {
|
||||
System.err.println("Usage: LengthNormModifier <index> <package.SimilarityClassName> <field1> [field2] ...");
|
||||
System.exit(1);
|
||||
}
|
||||
|
||||
Similarity s = null;
|
||||
try {
|
||||
s = Class.forName(args[1]).asSubclass(Similarity.class).newInstance();
|
||||
} catch (Exception e) {
|
||||
System.err.println("Couldn't instantiate similarity with empty constructor: " + args[1]);
|
||||
e.printStackTrace(System.err);
|
||||
}
|
||||
|
||||
File index = new File(args[0]);
|
||||
Directory d = FSDirectory.open(index);
|
||||
|
||||
LengthNormModifier lnm = new LengthNormModifier(d, s);
|
||||
|
||||
for (int i = 2; i < args.length; i++) {
|
||||
System.out.print("Updating field: " + args[i] + " " + (new Date()).toString() + " ... ");
|
||||
lnm.reSetNorms(args[i]);
|
||||
System.out.println(new Date().toString());
|
||||
}
|
||||
|
||||
d.close();
|
||||
}
|
||||
|
||||
|
||||
private Directory dir;
|
||||
private Similarity sim;
|
||||
|
||||
/**
|
||||
* Constructor for code that wishes to use this class programmatically.
|
||||
*
|
||||
* @param d The Directory to modify
|
||||
* @param s The Similarity to use in <code>reSetNorms</code>
|
||||
*/
|
||||
public LengthNormModifier(Directory d, Similarity s) {
|
||||
dir = d;
|
||||
sim = s;
|
||||
}
|
||||
|
||||
/**
|
||||
* Resets the norms for the specified field.
|
||||
*
|
||||
* <p>
|
||||
* Opens a new IndexReader on the Directory given to this instance,
|
||||
* modifies the norms using the Similarity given to this instance,
|
||||
* and closes the IndexReader.
|
||||
* </p>
|
||||
*
|
||||
* @param field the field whose norms should be reset
|
||||
*/
|
||||
public void reSetNorms(String field) throws IOException {
|
||||
String fieldName = StringHelper.intern(field);
|
||||
int[] termCounts = new int[0];
|
||||
|
||||
IndexReader reader = null;
|
||||
TermEnum termEnum = null;
|
||||
TermDocs termDocs = null;
|
||||
try {
|
||||
reader = IndexReader.open(dir, false);
|
||||
termCounts = new int[reader.maxDoc()];
|
||||
try {
|
||||
termEnum = reader.terms(new Term(field));
|
||||
try {
|
||||
termDocs = reader.termDocs();
|
||||
do {
|
||||
Term term = termEnum.term();
|
||||
if (term != null && term.field().equals(fieldName)) {
|
||||
termDocs.seek(termEnum.term());
|
||||
while (termDocs.next()) {
|
||||
termCounts[termDocs.doc()] += termDocs.freq();
|
||||
}
|
||||
}
|
||||
} while (termEnum.next());
|
||||
} finally {
|
||||
if (null != termDocs) termDocs.close();
|
||||
}
|
||||
} finally {
|
||||
if (null != termEnum) termEnum.close();
|
||||
}
|
||||
} finally {
|
||||
if (null != reader) reader.close();
|
||||
}
|
||||
|
||||
try {
|
||||
reader = IndexReader.open(dir, false);
|
||||
for (int d = 0; d < termCounts.length; d++) {
|
||||
if (! reader.isDeleted(d)) {
|
||||
byte norm = Similarity.encodeNorm(sim.lengthNorm(fieldName, termCounts[d]));
|
||||
reader.setNorm(d, fieldName, norm);
|
||||
}
|
||||
}
|
||||
} finally {
|
||||
if (null != reader) reader.close();
|
||||
}
|
||||
}
|
||||
|
||||
}
|
|
@ -76,13 +76,9 @@ public class TestFieldNormModifier extends LuceneTestCase {
|
|||
writer.close();
|
||||
}
|
||||
|
||||
public void testMissingField() {
|
||||
public void testMissingField() throws Exception {
|
||||
FieldNormModifier fnm = new FieldNormModifier(store, s);
|
||||
try {
|
||||
fnm.reSetNorms("nobodyherebutuschickens");
|
||||
} catch (Exception e) {
|
||||
assertNull("caught something", e);
|
||||
}
|
||||
fnm.reSetNorms("nobodyherebutuschickens");
|
||||
}
|
||||
|
||||
public void testFieldWithNoNorm() throws Exception {
|
||||
|
@ -97,11 +93,7 @@ public class TestFieldNormModifier extends LuceneTestCase {
|
|||
r.close();
|
||||
|
||||
FieldNormModifier fnm = new FieldNormModifier(store, s);
|
||||
try {
|
||||
fnm.reSetNorms("nonorm");
|
||||
} catch (Exception e) {
|
||||
assertNull("caught something", e);
|
||||
}
|
||||
fnm.reSetNorms("nonorm");
|
||||
|
||||
// nothing should have changed
|
||||
r = IndexReader.open(store, false);
|
||||
|
|
|
@@ -18,10 +18,13 @@ package org.apache.lucene.search;
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.util.OpenBitSet;
import org.apache.lucene.util.Bits;

public class DuplicateFilter extends Filter
{
@ -79,88 +82,87 @@ public class DuplicateFilter extends Filter
|
|||
}
|
||||
}
|
||||
|
||||
private OpenBitSet correctBits(IndexReader reader) throws IOException
|
||||
{
|
||||
|
||||
OpenBitSet bits=new OpenBitSet(reader.maxDoc()); //assume all are INvalid
|
||||
Term startTerm=new Term(fieldName);
|
||||
TermEnum te = reader.terms(startTerm);
|
||||
if(te!=null)
|
||||
{
|
||||
Term currTerm=te.term();
|
||||
while((currTerm!=null)&&(currTerm.field()==startTerm.field())) //term fieldnames are interned
|
||||
{
|
||||
int lastDoc=-1;
|
||||
//set non duplicates
|
||||
TermDocs td = reader.termDocs(currTerm);
|
||||
if(td.next())
|
||||
{
|
||||
if(keepMode==KM_USE_FIRST_OCCURRENCE)
|
||||
{
|
||||
bits.set(td.doc());
|
||||
}
|
||||
else
|
||||
{
|
||||
do
|
||||
{
|
||||
lastDoc=td.doc();
|
||||
}while(td.next());
|
||||
bits.set(lastDoc);
|
||||
}
|
||||
}
|
||||
if(!te.next())
|
||||
{
|
||||
break;
|
||||
}
|
||||
currTerm=te.term();
|
||||
}
|
||||
}
|
||||
return bits;
|
||||
}
|
||||
private OpenBitSet correctBits(IndexReader reader) throws IOException {
|
||||
OpenBitSet bits = new OpenBitSet(reader.maxDoc()); //assume all are INvalid
|
||||
final Bits delDocs = MultiFields.getDeletedDocs(reader);
|
||||
Terms terms = reader.fields().terms(fieldName);
|
||||
if (terms != null) {
|
||||
TermsEnum termsEnum = terms.iterator();
|
||||
DocsEnum docs = null;
|
||||
while(true) {
|
||||
BytesRef currTerm = termsEnum.next();
|
||||
if (currTerm == null) {
|
||||
break;
|
||||
} else {
|
||||
docs = termsEnum.docs(delDocs, docs);
|
||||
int doc = docs.nextDoc();
|
||||
if (doc != docs.NO_MORE_DOCS) {
|
||||
if (keepMode == KM_USE_FIRST_OCCURRENCE) {
|
||||
bits.set(doc);
|
||||
} else {
|
||||
int lastDoc = doc;
|
||||
while (true) {
|
||||
lastDoc = doc;
|
||||
doc = docs.nextDoc();
|
||||
if (doc == docs.NO_MORE_DOCS) {
|
||||
break;
|
||||
}
|
||||
}
|
||||
bits.set(lastDoc);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return bits;
|
||||
}
|
||||
|
||||
private OpenBitSet fastBits(IndexReader reader) throws IOException
|
||||
{
|
||||
{
|
||||
|
||||
OpenBitSet bits=new OpenBitSet(reader.maxDoc());
|
||||
bits.set(0,reader.maxDoc()); //assume all are valid
|
||||
Term startTerm=new Term(fieldName);
|
||||
TermEnum te = reader.terms(startTerm);
|
||||
if(te!=null)
|
||||
{
|
||||
Term currTerm=te.term();
|
||||
bits.set(0,reader.maxDoc()); //assume all are valid
|
||||
final Bits delDocs = MultiFields.getDeletedDocs(reader);
|
||||
Terms terms = reader.fields().terms(fieldName);
|
||||
if (terms != null) {
|
||||
TermsEnum termsEnum = terms.iterator();
|
||||
DocsEnum docs = null;
|
||||
while(true) {
|
||||
BytesRef currTerm = termsEnum.next();
|
||||
if (currTerm == null) {
|
||||
break;
|
||||
} else {
|
||||
if (termsEnum.docFreq() > 1) {
|
||||
// unset potential duplicates
|
||||
docs = termsEnum.docs(delDocs, docs);
|
||||
int doc = docs.nextDoc();
|
||||
if (doc != docs.NO_MORE_DOCS) {
|
||||
if (keepMode == KM_USE_FIRST_OCCURRENCE) {
|
||||
doc = docs.nextDoc();
|
||||
}
|
||||
}
|
||||
|
||||
while((currTerm!=null)&&(currTerm.field()==startTerm.field())) //term fieldnames are interned
|
||||
{
|
||||
if(te.docFreq()>1)
|
||||
{
|
||||
int lastDoc=-1;
|
||||
//unset potential duplicates
|
||||
TermDocs td = reader.termDocs(currTerm);
|
||||
td.next();
|
||||
if(keepMode==KM_USE_FIRST_OCCURRENCE)
|
||||
{
|
||||
td.next();
|
||||
}
|
||||
do
|
||||
{
|
||||
lastDoc=td.doc();
|
||||
bits.clear(lastDoc);
|
||||
}while(td.next());
|
||||
if(keepMode==KM_USE_LAST_OCCURRENCE)
|
||||
{
|
||||
//restore the last bit
|
||||
bits.set(lastDoc);
|
||||
}
|
||||
}
|
||||
if(!te.next())
|
||||
{
|
||||
break;
|
||||
}
|
||||
currTerm=te.term();
|
||||
}
|
||||
}
|
||||
return bits;
|
||||
}
|
||||
int lastDoc = -1;
|
||||
while (true) {
|
||||
lastDoc = doc;
|
||||
bits.clear(lastDoc);
|
||||
doc = docs.nextDoc();
|
||||
if (doc == docs.NO_MORE_DOCS) {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (keepMode==KM_USE_LAST_OCCURRENCE) {
|
||||
// restore the last bit
|
||||
bits.set(lastDoc);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return bits;
|
||||
}
|
||||
|
||||
public String getFieldName()
|
||||
{
|
||||
|
|
|
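DuplicateFilter's correctBits/fastBits now drive a DocsEnum per term to decide which occurrence of each key to keep. A minimal sketch of the keep-last-occurrence case, built only from calls shown in the hunk; names are illustrative.

import java.io.IOException;

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.OpenBitSet;

public class LastOccurrenceSketch {
  /** Marks, for every term of the key field, only the last document containing it. */
  public static OpenBitSet lastOccurrences(IndexReader reader, String keyField) throws IOException {
    OpenBitSet bits = new OpenBitSet(reader.maxDoc());   // all bits start cleared
    final Bits delDocs = MultiFields.getDeletedDocs(reader);
    Terms terms = MultiFields.getTerms(reader, keyField);
    if (terms == null) {
      return bits;
    }
    TermsEnum termsEnum = terms.iterator();
    DocsEnum docs = null;
    while (termsEnum.next() != null) {
      docs = termsEnum.docs(delDocs, docs);
      int doc = docs.nextDoc();
      if (doc == DocsEnum.NO_MORE_DOCS) {
        continue;                                        // term occurs only in deleted docs
      }
      int lastDoc = doc;
      while ((doc = docs.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
        lastDoc = doc;                                   // remember the final match
      }
      bits.set(lastDoc);
    }
    return bits;
  }
}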
@ -29,7 +29,7 @@ import org.apache.lucene.analysis.TokenStream;
|
|||
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
|
||||
import org.apache.lucene.index.IndexReader;
|
||||
import org.apache.lucene.index.Term;
|
||||
import org.apache.lucene.index.TermEnum;
|
||||
import org.apache.lucene.util.BytesRef;
|
||||
import org.apache.lucene.util.PriorityQueue;
|
||||
|
||||
/**
|
||||
|
@ -172,8 +172,8 @@ public class FuzzyLikeThisQuery extends Query
|
|||
* Adds user input for "fuzzification"
|
||||
* @param queryString The string which will be parsed by the analyzer and for which fuzzy variants will be parsed
|
||||
* @param fieldName
|
||||
* @param minSimilarity The minimum similarity of the term variants (see FuzzyTermEnum)
|
||||
* @param prefixLength Length of required common prefix on variant terms (see FuzzyTermEnum)
|
||||
* @param minSimilarity The minimum similarity of the term variants (see FuzzyTermsEnum)
|
||||
* @param prefixLength Length of required common prefix on variant terms (see FuzzyTermsEnum)
|
||||
*/
|
||||
public void addTerms(String queryString, String fieldName,float minSimilarity, int prefixLength)
|
||||
{
|
||||
|
@ -195,48 +195,44 @@ public class FuzzyLikeThisQuery extends Query
|
|||
String term = termAtt.term();
|
||||
if(!processedTerms.contains(term))
|
||||
{
|
||||
processedTerms.add(term);
|
||||
ScoreTermQueue variantsQ=new ScoreTermQueue(MAX_VARIANTS_PER_TERM); //maxNum variants considered for any one term
|
||||
float minScore=0;
|
||||
Term startTerm=internSavingTemplateTerm.createTerm(term);
|
||||
FuzzyTermEnum fe=new FuzzyTermEnum(reader,startTerm,f.minSimilarity,f.prefixLength);
|
||||
TermEnum origEnum = reader.terms(startTerm);
|
||||
int df=0;
|
||||
if(startTerm.equals(origEnum.term()))
|
||||
{
|
||||
df=origEnum.docFreq(); //store the df so all variants use same idf
|
||||
}
|
||||
int numVariants=0;
|
||||
int totalVariantDocFreqs=0;
|
||||
do
|
||||
{
|
||||
Term possibleMatch=fe.term();
|
||||
if(possibleMatch!=null)
|
||||
{
|
||||
numVariants++;
|
||||
totalVariantDocFreqs+=fe.docFreq();
|
||||
float score=fe.difference();
|
||||
if(variantsQ.size() < MAX_VARIANTS_PER_TERM || score > minScore){
|
||||
ScoreTerm st=new ScoreTerm(possibleMatch,score,startTerm);
|
||||
variantsQ.insertWithOverflow(st);
|
||||
minScore = variantsQ.top().score; // maintain minScore
|
||||
}
|
||||
processedTerms.add(term);
|
||||
ScoreTermQueue variantsQ=new ScoreTermQueue(MAX_VARIANTS_PER_TERM); //maxNum variants considered for any one term
|
||||
float minScore=0;
|
||||
Term startTerm=internSavingTemplateTerm.createTerm(term);
|
||||
FuzzyTermsEnum fe = new FuzzyTermsEnum(reader, startTerm, f.minSimilarity, f.prefixLength);
|
||||
//store the df so all variants use same idf
|
||||
int df = reader.docFreq(startTerm);
|
||||
int numVariants=0;
|
||||
int totalVariantDocFreqs=0;
|
||||
BytesRef possibleMatch;
|
||||
MultiTermQuery.BoostAttribute boostAtt =
|
||||
fe.attributes().addAttribute(MultiTermQuery.BoostAttribute.class);
|
||||
while ((possibleMatch = fe.next()) != null) {
|
||||
if (possibleMatch!=null) {
|
||||
numVariants++;
|
||||
totalVariantDocFreqs+=fe.docFreq();
|
||||
float score=boostAtt.getBoost();
|
||||
if (variantsQ.size() < MAX_VARIANTS_PER_TERM || score > minScore){
|
||||
ScoreTerm st=new ScoreTerm(new Term(startTerm.field(), possibleMatch.utf8ToString()),score,startTerm);
|
||||
variantsQ.insertWithOverflow(st);
|
||||
minScore = variantsQ.top().score; // maintain minScore
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
while(fe.next());
|
||||
if(numVariants>0)
|
||||
{
|
||||
int avgDf=totalVariantDocFreqs/numVariants;
|
||||
if(df==0)//no direct match we can use as df for all variants
|
||||
|
||||
if(numVariants>0)
|
||||
{
|
||||
int avgDf=totalVariantDocFreqs/numVariants;
|
||||
if(df==0)//no direct match we can use as df for all variants
|
||||
{
|
||||
df=avgDf; //use avg df of all variants
|
||||
}
|
||||
|
||||
// take the top variants (scored by edit distance) and reset the score
|
||||
// to include an IDF factor then add to the global queue for ranking
|
||||
// overall top query terms
|
||||
int size = variantsQ.size();
|
||||
for(int i = 0; i < size; i++)
|
||||
// take the top variants (scored by edit distance) and reset the score
|
||||
// to include an IDF factor then add to the global queue for ranking
|
||||
// overall top query terms
|
||||
int size = variantsQ.size();
|
||||
for(int i = 0; i < size; i++)
|
||||
{
|
||||
ScoreTerm st = variantsQ.pop();
|
||||
st.score=(st.score*st.score)*sim.idf(df,corpusNumDocs);
|
||||
|
|
|
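FuzzyLikeThisQuery now pulls fuzzy variants from a FuzzyTermsEnum and reads each variant's score from the BoostAttribute on the enum's attribute source. A minimal sketch of that loop; the constructor and attribute calls mirror the hunk, while the import package and the class/method names are assumptions for illustration.

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyTermsEnum;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.util.BytesRef;

public class FuzzyVariantsSketch {
  /** Prints each fuzzy variant of the seed term with its edit-distance boost and docFreq. */
  public static void printVariants(IndexReader reader, Term seed,
                                   float minSimilarity, int prefixLength) throws IOException {
    FuzzyTermsEnum fe = new FuzzyTermsEnum(reader, seed, minSimilarity, prefixLength);
    MultiTermQuery.BoostAttribute boostAtt =
        fe.attributes().addAttribute(MultiTermQuery.BoostAttribute.class);
    BytesRef variant;
    while ((variant = fe.next()) != null) {
      System.out.println(variant.utf8ToString()
          + " boost=" + boostAtt.getBoost()      // per-variant similarity score
          + " docFreq=" + fe.docFreq());
    }
  }
}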
@ -38,6 +38,7 @@ import org.apache.lucene.index.IndexWriter;
|
|||
import org.apache.lucene.index.IndexWriterConfig;
|
||||
import org.apache.lucene.index.LogMergePolicy;
|
||||
import org.apache.lucene.index.Term;
|
||||
import org.apache.lucene.util.BytesRef;
|
||||
import org.apache.lucene.store.RAMDirectory;
|
||||
import org.apache.lucene.util.LuceneTestCase;
|
||||
import org.apache.lucene.util._TestUtil;
|
||||
|
@ -219,8 +220,8 @@ public class TestRemoteSort extends LuceneTestCase implements Serializable {
|
|||
@Override
|
||||
public void setNextReader(IndexReader reader, int docBase) throws IOException {
|
||||
docValues = FieldCache.DEFAULT.getInts(reader, "parser", new FieldCache.IntParser() {
|
||||
public final int parseInt(final String val) {
|
||||
return (val.charAt(0)-'A') * 123456;
|
||||
public final int parseInt(BytesRef termRef) {
|
||||
return (termRef.utf8ToString().charAt(0)-'A') * 123456;
|
||||
}
|
||||
});
|
||||
}
|
||||
|
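The FieldCache parser in this test now receives a BytesRef rather than a String. A minimal standalone sketch of such a parser, matching the anonymous class in the hunk; the wrapper class and field name are illustrative.

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.BytesRef;

public class BytesRefParserSketch {
  /** Loads an int[] for a field whose terms are single letters 'A'..'Z'. */
  public static int[] loadInts(IndexReader reader, String field) throws IOException {
    return FieldCache.DEFAULT.getInts(reader, field, new FieldCache.IntParser() {
      public int parseInt(BytesRef termRef) {
        // parsers now see the raw term bytes; decode only what is needed
        return (termRef.utf8ToString().charAt(0) - 'A') * 123456;
      }
    });
  }
}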
@ -245,6 +246,29 @@ public class TestRemoteSort extends LuceneTestCase implements Serializable {
|
|||
runMultiSorts(multi, true); // this runs on the full index
|
||||
}
|
||||
|
||||
// test custom search when remote
|
||||
/* rewrite with new API
|
||||
public void testRemoteCustomSort() throws Exception {
|
||||
Searchable searcher = getRemote();
|
||||
MultiSearcher multi = new MultiSearcher (new Searchable[] { searcher });
|
||||
sort.setSort (new SortField ("custom", SampleComparable.getComparatorSource()));
|
||||
assertMatches (multi, queryX, sort, "CAIEG");
|
||||
sort.setSort (new SortField ("custom", SampleComparable.getComparatorSource(), true));
|
||||
assertMatches (multi, queryY, sort, "HJDBF");
|
||||
|
||||
assertSaneFieldCaches(getName() + " ComparatorSource");
|
||||
FieldCache.DEFAULT.purgeAllCaches();
|
||||
|
||||
SortComparator custom = SampleComparable.getComparator();
|
||||
sort.setSort (new SortField ("custom", custom));
|
||||
assertMatches (multi, queryX, sort, "CAIEG");
|
||||
sort.setSort (new SortField ("custom", custom, true));
|
||||
assertMatches (multi, queryY, sort, "HJDBF");
|
||||
|
||||
assertSaneFieldCaches(getName() + " Comparator");
|
||||
FieldCache.DEFAULT.purgeAllCaches();
|
||||
}*/
|
||||
|
||||
// test that the relevancy scores are the same even if
|
||||
// hits are sorted
|
||||
public void testNormalizedScores() throws Exception {
|
||||
|
@ -294,7 +318,7 @@ public class TestRemoteSort extends LuceneTestCase implements Serializable {
|
|||
assertSameValues (scoresY, getScores (remote.search (queryY, null, 1000, sort).scoreDocs, remote));
|
||||
assertSameValues (scoresA, getScores (remote.search (queryA, null, 1000, sort).scoreDocs, remote));
|
||||
|
||||
sort.setSort (new SortField("float", SortField.FLOAT), new SortField("string", SortField.STRING));
|
||||
sort.setSort (new SortField("float", SortField.FLOAT));
|
||||
assertSameValues (scoresX, getScores (remote.search (queryX, null, 1000, sort).scoreDocs, remote));
|
||||
assertSameValues (scoresY, getScores (remote.search (queryY, null, 1000, sort).scoreDocs, remote));
|
||||
assertSameValues (scoresA, getScores (remote.search (queryA, null, 1000, sort).scoreDocs, remote));
|
||||
|
@ -314,6 +338,10 @@ public class TestRemoteSort extends LuceneTestCase implements Serializable {
|
|||
expected = isFull ? "IDHFGJABEC" : "IDHFGJAEBC";
|
||||
assertMatches(multi, queryA, sort, expected);
|
||||
|
||||
sort.setSort(new SortField ("int", SortField.INT));
|
||||
expected = isFull ? "IDHFGJABEC" : "IDHFGJAEBC";
|
||||
assertMatches(multi, queryA, sort, expected);
|
||||
|
||||
sort.setSort(new SortField ("float", SortField.FLOAT), SortField.FIELD_DOC);
|
||||
assertMatches(multi, queryA, sort, "GDHJCIEFAB");
|
||||
|
||||
|
|
|
@@ -19,12 +19,15 @@ package org.apache.lucene.spatial.tier;
import java.io.IOException;
import java.util.List;

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.NumericUtils;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.OpenBitSet;

/**
@ -44,22 +47,41 @@ public class CartesianShapeFilter extends Filter {
|
|||
|
||||
@Override
|
||||
public DocIdSet getDocIdSet(final IndexReader reader) throws IOException {
|
||||
final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
|
||||
final TermDocs termDocs = reader.termDocs();
|
||||
final Bits delDocs = MultiFields.getDeletedDocs(reader);
|
||||
final List<Double> area = shape.getArea();
|
||||
int sz = area.size();
|
||||
final int sz = area.size();
|
||||
|
||||
final Term term = new Term(fieldName);
|
||||
// iterate through each boxid
|
||||
for (int i =0; i< sz; i++) {
|
||||
double boxId = area.get(i).doubleValue();
|
||||
termDocs.seek(term.createTerm(NumericUtils.doubleToPrefixCoded(boxId)));
|
||||
// iterate through all documents
|
||||
// which have this boxId
|
||||
while (termDocs.next()) {
|
||||
bits.fastSet(termDocs.doc());
|
||||
final BytesRef bytesRef = new BytesRef(NumericUtils.BUF_SIZE_LONG);
|
||||
if (sz == 1) {
|
||||
double boxId = area.get(0).doubleValue();
|
||||
NumericUtils.longToPrefixCoded(NumericUtils.doubleToSortableLong(boxId), 0, bytesRef);
|
||||
return new DocIdSet() {
|
||||
@Override
|
||||
public DocIdSetIterator iterator() throws IOException {
|
||||
return MultiFields.getTermDocsEnum(reader, delDocs, fieldName, bytesRef);
|
||||
}
|
||||
|
||||
@Override
|
||||
public boolean isCacheable() {
|
||||
return false;
|
||||
}
|
||||
};
|
||||
} else {
|
||||
final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
|
||||
for (int i =0; i< sz; i++) {
|
||||
double boxId = area.get(i).doubleValue();
|
||||
NumericUtils.longToPrefixCoded(NumericUtils.doubleToSortableLong(boxId), 0, bytesRef);
|
||||
final DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, delDocs, fieldName, bytesRef);
|
||||
if (docsEnum == null) continue;
|
||||
// iterate through all documents
|
||||
// which have this boxId
|
||||
int doc;
|
||||
while ((doc = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
|
||||
bits.fastSet(doc);
|
||||
}
|
||||
}
|
||||
return bits;
|
||||
}
|
||||
return bits;
|
||||
}
|
||||
}
|
||||
|
|
|
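CartesianShapeFilter now prefix-codes each box id into a BytesRef and pulls the matching postings through MultiFields.getTermDocsEnum. A minimal sketch of the multi-box branch for a single box id, using only calls shown in the hunk; names are illustrative.

import java.io.IOException;

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;
import org.apache.lucene.util.OpenBitSet;

public class BoxIdMatchSketch {
  /** Sets a bit for every document indexed under the given numeric box id. */
  public static OpenBitSet matchBoxId(IndexReader reader, String field, double boxId) throws IOException {
    final Bits delDocs = MultiFields.getDeletedDocs(reader);
    final BytesRef bytesRef = new BytesRef(NumericUtils.BUF_SIZE_LONG);
    // encode the double the same way NumericField does, at shift 0 (full precision)
    NumericUtils.longToPrefixCoded(NumericUtils.doubleToSortableLong(boxId), 0, bytesRef);
    OpenBitSet bits = new OpenBitSet(reader.maxDoc());
    DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, delDocs, field, bytesRef);
    if (docsEnum == null) {
      return bits;                               // term not present in the index
    }
    int doc;
    while ((doc = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
      bits.fastSet(doc);
    }
    return bits;
  }
}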
@ -24,6 +24,7 @@ import java.util.Map;
|
|||
import org.apache.lucene.analysis.WhitespaceAnalyzer;
|
||||
import org.apache.lucene.document.Document;
|
||||
import org.apache.lucene.document.Field;
|
||||
import org.apache.lucene.document.NumericField;
|
||||
import org.apache.lucene.index.IndexWriter;
|
||||
import org.apache.lucene.index.IndexReader;
|
||||
import org.apache.lucene.index.IndexWriterConfig;
|
||||
|
@ -49,7 +50,6 @@ import org.apache.lucene.spatial.tier.projections.SinusoidalProjector;
|
|||
import org.apache.lucene.store.Directory;
|
||||
import org.apache.lucene.store.RAMDirectory;
|
||||
import org.apache.lucene.util.LuceneTestCase;
|
||||
import org.apache.lucene.util.NumericUtils;
|
||||
|
||||
public class TestCartesian extends LuceneTestCase {
|
||||
|
||||
|
@ -96,8 +96,8 @@ public class TestCartesian extends LuceneTestCase {
|
|||
doc.add(new Field("name", name,Field.Store.YES, Field.Index.ANALYZED));
|
||||
|
||||
// convert the lat / long to lucene fields
|
||||
doc.add(new Field(latField, NumericUtils.doubleToPrefixCoded(lat),Field.Store.YES, Field.Index.NOT_ANALYZED));
|
||||
doc.add(new Field(lngField, NumericUtils.doubleToPrefixCoded(lng),Field.Store.YES, Field.Index.NOT_ANALYZED));
|
||||
doc.add(new NumericField(latField, Integer.MAX_VALUE, Field.Store.YES, true).setDoubleValue(lat));
|
||||
doc.add(new NumericField(lngField, Integer.MAX_VALUE, Field.Store.YES, true).setDoubleValue(lng));
|
||||
|
||||
// add a default meta field to make searching all documents easy
|
||||
doc.add(new Field("metafile", "doc",Field.Store.YES, Field.Index.ANALYZED));
|
||||
|
@ -105,10 +105,9 @@ public class TestCartesian extends LuceneTestCase {
|
|||
int ctpsize = ctps.size();
|
||||
for (int i =0; i < ctpsize; i++){
|
||||
CartesianTierPlotter ctp = ctps.get(i);
|
||||
doc.add(new Field(ctp.getTierFieldName(),
|
||||
NumericUtils.doubleToPrefixCoded(ctp.getTierBoxId(lat,lng)),
|
||||
doc.add(new NumericField(ctp.getTierFieldName(), Integer.MAX_VALUE,
|
||||
Field.Store.YES,
|
||||
Field.Index.NOT_ANALYZED_NO_NORMS));
|
||||
true).setDoubleValue(ctp.getTierBoxId(lat,lng)));
|
||||
|
||||
doc.add(new Field(geoHashPrefix, GeoHashUtils.encode(lat,lng),
|
||||
Field.Store.YES,
|
||||
|
@ -275,8 +274,8 @@ public class TestCartesian extends LuceneTestCase {
|
|||
Document d = searcher.doc(scoreDocs[i].doc);
|
||||
|
||||
String name = d.get("name");
|
||||
double rsLat = NumericUtils.prefixCodedToDouble(d.get(latField));
|
||||
double rsLng = NumericUtils.prefixCodedToDouble(d.get(lngField));
|
||||
double rsLat = Double.parseDouble(d.get(latField));
|
||||
double rsLng = Double.parseDouble(d.get(lngField));
|
||||
Double geo_distance = distances.get(scoreDocs[i].doc);
|
||||
|
||||
double distance = DistanceUtils.getInstance().getDistanceMi(lat, lng, rsLat, rsLng);
|
||||
|
@ -369,8 +368,8 @@ public class TestCartesian extends LuceneTestCase {
|
|||
for(int i =0 ; i < results; i++){
|
||||
Document d = searcher.doc(scoreDocs[i].doc);
|
||||
String name = d.get("name");
|
||||
double rsLat = NumericUtils.prefixCodedToDouble(d.get(latField));
|
||||
double rsLng = NumericUtils.prefixCodedToDouble(d.get(lngField));
|
||||
double rsLat = Double.parseDouble(d.get(latField));
|
||||
double rsLng = Double.parseDouble(d.get(lngField));
|
||||
Double geo_distance = distances.get(scoreDocs[i].doc);
|
||||
|
||||
double distance = DistanceUtils.getInstance().getDistanceMi(lat, lng, rsLat, rsLng);
|
||||
|
@ -464,8 +463,8 @@ public class TestCartesian extends LuceneTestCase {
|
|||
Document d = searcher.doc(scoreDocs[i].doc);
|
||||
|
||||
String name = d.get("name");
|
||||
double rsLat = NumericUtils.prefixCodedToDouble(d.get(latField));
|
||||
double rsLng = NumericUtils.prefixCodedToDouble(d.get(lngField));
|
||||
double rsLat = Double.parseDouble(d.get(latField));
|
||||
double rsLng = Double.parseDouble(d.get(lngField));
|
||||
Double geo_distance = distances.get(scoreDocs[i].doc);
|
||||
|
||||
double distance = DistanceUtils.getInstance().getDistanceMi(lat, lng, rsLat, rsLng);
|
||||
|
@ -558,8 +557,8 @@ public class TestCartesian extends LuceneTestCase {
|
|||
Document d = searcher.doc(scoreDocs[i].doc);
|
||||
|
||||
String name = d.get("name");
|
||||
double rsLat = NumericUtils.prefixCodedToDouble(d.get(latField));
|
||||
double rsLng = NumericUtils.prefixCodedToDouble(d.get(lngField));
|
||||
double rsLat = Double.parseDouble(d.get(latField));
|
||||
double rsLng = Double.parseDouble(d.get(lngField));
|
||||
Double geo_distance = distances.get(scoreDocs[i].doc);
|
||||
|
||||
double distance = DistanceUtils.getInstance().getDistanceMi(lat, lng, rsLat, rsLng);
|
||||
|
|
|
@ -21,6 +21,7 @@ import java.io.IOException;
|
|||
import org.apache.lucene.analysis.WhitespaceAnalyzer;
|
||||
import org.apache.lucene.document.Document;
|
||||
import org.apache.lucene.document.Field;
|
||||
import org.apache.lucene.document.NumericField;
|
||||
import org.apache.lucene.index.IndexWriter;
|
||||
import org.apache.lucene.index.IndexWriterConfig;
|
||||
import org.apache.lucene.index.Term;
|
||||
|
@ -28,7 +29,6 @@ import org.apache.lucene.index.IndexReader;
|
|||
import org.apache.lucene.search.QueryWrapperFilter;
|
||||
import org.apache.lucene.search.MatchAllDocsQuery;
|
||||
import org.apache.lucene.util.LuceneTestCase;
|
||||
import org.apache.lucene.util.NumericUtils;
|
||||
import org.apache.lucene.store.RAMDirectory;
|
||||
|
||||
public class TestDistance extends LuceneTestCase {
|
||||
|
@ -63,8 +63,8 @@ public class TestDistance extends LuceneTestCase {
|
|||
doc.add(new Field("name", name,Field.Store.YES, Field.Index.ANALYZED));
|
||||
|
||||
// convert the lat / long to lucene fields
|
||||
doc.add(new Field(latField, NumericUtils.doubleToPrefixCoded(lat),Field.Store.YES, Field.Index.NOT_ANALYZED));
|
||||
doc.add(new Field(lngField, NumericUtils.doubleToPrefixCoded(lng),Field.Store.YES, Field.Index.NOT_ANALYZED));
|
||||
doc.add(new NumericField(latField, Integer.MAX_VALUE, Field.Store.YES, true).setDoubleValue(lat));
|
||||
doc.add(new NumericField(lngField, Integer.MAX_VALUE,Field.Store.YES, true).setDoubleValue(lng));
|
||||
|
||||
// add a default meta field to make searching all documents easy
|
||||
doc.add(new Field("metafile", "doc",Field.Store.YES, Field.Index.ANALYZED));
|
||||
|
|
|
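The spatial tests above switch from hand prefix-coded string fields to NumericField, and read the stored value back with Double.parseDouble. A minimal sketch of that indexing/reading pattern, with hypothetical field names; it assumes, as the tests do, that a very large precisionStep yields effectively one full-precision token per value.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;

public class NumericFieldSketch {
  /** Builds a document that stores lat/lng as NumericFields instead of prefix-coded strings. */
  public static Document makeDoc(String name, double lat, double lng) {
    Document doc = new Document();
    doc.add(new Field("name", name, Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new NumericField("lat", Integer.MAX_VALUE, Field.Store.YES, true).setDoubleValue(lat));
    doc.add(new NumericField("lng", Integer.MAX_VALUE, Field.Store.YES, true).setDoubleValue(lng));
    return doc;
  }

  /** The stored value comes back as a plain numeric string. */
  public static double readLat(Document stored) {
    return Double.parseDouble(stored.get("lat"));
  }
}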
@@ -21,8 +21,10 @@ import org.apache.lucene.index.IndexReader;

import java.util.Iterator;

import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.util.StringHelper;

import java.io.*;
@ -52,55 +54,39 @@ public class LuceneDictionary implements Dictionary {
|
|||
|
||||
|
||||
final class LuceneIterator implements Iterator<String> {
|
||||
private TermEnum termEnum;
|
||||
private Term actualTerm;
|
||||
private boolean hasNextCalled;
|
||||
private TermsEnum termsEnum;
|
||||
private BytesRef pendingTerm;
|
||||
|
||||
LuceneIterator() {
|
||||
try {
|
||||
termEnum = reader.terms(new Term(field));
|
||||
final Terms terms = MultiFields.getTerms(reader, field);
|
||||
if (terms != null) {
|
||||
termsEnum = terms.iterator();
|
||||
pendingTerm = termsEnum.next();
|
||||
}
|
||||
} catch (IOException e) {
|
||||
throw new RuntimeException(e);
|
||||
}
|
||||
}
|
||||
|
||||
public String next() {
|
||||
if (!hasNextCalled) {
|
||||
hasNext();
|
||||
if (pendingTerm == null) {
|
||||
return null;
|
||||
}
|
||||
hasNextCalled = false;
|
||||
|
||||
String result = pendingTerm.utf8ToString();
|
||||
|
||||
try {
|
||||
termEnum.next();
|
||||
pendingTerm = termsEnum.next();
|
||||
} catch (IOException e) {
|
||||
throw new RuntimeException(e);
|
||||
}
|
||||
|
||||
return (actualTerm != null) ? actualTerm.text() : null;
|
||||
return result;
|
||||
}
|
||||
|
||||
public boolean hasNext() {
|
||||
if (hasNextCalled) {
|
||||
return actualTerm != null;
|
||||
}
|
||||
hasNextCalled = true;
|
||||
|
||||
actualTerm = termEnum.term();
|
||||
|
||||
// if there are no words return false
|
||||
if (actualTerm == null) {
|
||||
return false;
|
||||
}
|
||||
|
||||
String currentField = actualTerm.field();
|
||||
|
||||
// if the next word doesn't have the same field return false
|
||||
if (currentField != field) {
|
||||
actualTerm = null;
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
return pendingTerm != null;
|
||||
}
|
||||
|
||||
public void remove() {
|
||||
|
|
|
@ -17,16 +17,21 @@ package org.apache.lucene.queryParser.surround.query;
|
|||
*/
|
||||
|
||||
import org.apache.lucene.index.Term;
|
||||
import org.apache.lucene.index.TermEnum;
|
||||
import org.apache.lucene.index.Terms;
|
||||
import org.apache.lucene.util.BytesRef;
|
||||
import org.apache.lucene.index.TermsEnum;
|
||||
import org.apache.lucene.index.IndexReader;
|
||||
import org.apache.lucene.index.MultiFields;
|
||||
|
||||
import java.io.IOException;
|
||||
|
||||
|
||||
public class SrndPrefixQuery extends SimpleTerm {
|
||||
private final BytesRef prefixRef;
|
||||
public SrndPrefixQuery(String prefix, boolean quoted, char truncator) {
|
||||
super(quoted);
|
||||
this.prefix = prefix;
|
||||
prefixRef = new BytesRef(prefix);
|
||||
this.truncator = truncator;
|
||||
}
|
||||
|
||||
|
@ -53,20 +58,35 @@ public class SrndPrefixQuery extends SimpleTerm {
|
|||
MatchingTermVisitor mtv) throws IOException
|
||||
{
|
||||
/* inspired by PrefixQuery.rewrite(): */
|
||||
TermEnum enumerator = reader.terms(getLucenePrefixTerm(fieldName));
|
||||
try {
|
||||
do {
|
||||
Term term = enumerator.term();
|
||||
if ((term != null)
|
||||
&& term.text().startsWith(getPrefix())
|
||||
&& term.field().equals(fieldName)) {
|
||||
mtv.visitMatchingTerm(term);
|
||||
Terms terms = MultiFields.getTerms(reader, fieldName);
|
||||
if (terms != null) {
|
||||
TermsEnum termsEnum = terms.iterator();
|
||||
|
||||
boolean skip = false;
|
||||
TermsEnum.SeekStatus status = termsEnum.seek(new BytesRef(getPrefix()));
|
||||
if (status == TermsEnum.SeekStatus.FOUND) {
|
||||
mtv.visitMatchingTerm(getLucenePrefixTerm(fieldName));
|
||||
} else if (status == TermsEnum.SeekStatus.NOT_FOUND) {
|
||||
if (termsEnum.term().startsWith(prefixRef)) {
|
||||
mtv.visitMatchingTerm(new Term(fieldName, termsEnum.term().utf8ToString()));
|
||||
} else {
|
||||
break;
|
||||
skip = true;
|
||||
}
|
||||
} while (enumerator.next());
|
||||
} finally {
|
||||
enumerator.close();
|
||||
} else {
|
||||
// EOF
|
||||
skip = true;
|
||||
}
|
||||
|
||||
if (!skip) {
|
||||
while(true) {
|
||||
BytesRef text = termsEnum.next();
|
||||
if (text != null && text.startsWith(prefixRef)) {
|
||||
mtv.visitMatchingTerm(new Term(fieldName, text.utf8ToString()));
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
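The surround queries above (SrndPrefixQuery and, later, SrndTruncQuery) all follow the same pattern: seek a TermsEnum to the prefix, then walk forward while the terms still start with it. A minimal sketch of that walk, built from the calls in the hunk; names are illustrative.

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public class PrefixWalkSketch {
  /** Visits every term of the field that starts with the given prefix. */
  public static void visitPrefix(IndexReader reader, String field, String prefix) throws IOException {
    Terms terms = MultiFields.getTerms(reader, field);
    if (terms == null) {
      return;
    }
    TermsEnum termsEnum = terms.iterator();
    final BytesRef prefixRef = new BytesRef(prefix);
    TermsEnum.SeekStatus status = termsEnum.seek(prefixRef);   // position at or after the prefix
    if (status == TermsEnum.SeekStatus.END) {
      return;                                                  // prefix sorts past the last term
    }
    BytesRef text = termsEnum.term();
    while (text != null && text.startsWith(prefixRef)) {
      System.out.println(field + ":" + text.utf8ToString());
      text = termsEnum.next();
    }
  }
}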
@ -20,7 +20,10 @@ import java.io.IOException;
|
|||
|
||||
import org.apache.lucene.index.IndexReader;
|
||||
import org.apache.lucene.index.Term;
|
||||
import org.apache.lucene.index.TermEnum;
|
||||
import org.apache.lucene.index.TermsEnum;
|
||||
import org.apache.lucene.index.Terms;
|
||||
import org.apache.lucene.index.MultiFields;
|
||||
import org.apache.lucene.util.BytesRef;
|
||||
|
||||
|
||||
public class SrndTermQuery extends SimpleTerm {
|
||||
|
@ -46,16 +49,14 @@ public class SrndTermQuery extends SimpleTerm {
|
|||
MatchingTermVisitor mtv) throws IOException
|
||||
{
|
||||
/* check term presence in index here for symmetry with other SimpleTerm's */
|
||||
TermEnum enumerator = reader.terms(getLuceneTerm(fieldName));
|
||||
try {
|
||||
Term it= enumerator.term(); /* same or following index term */
|
||||
if ((it != null)
|
||||
&& it.text().equals(getTermText())
|
||||
&& it.field().equals(fieldName)) {
|
||||
mtv.visitMatchingTerm(it);
|
||||
Terms terms = MultiFields.getTerms(reader, fieldName);
|
||||
if (terms != null) {
|
||||
TermsEnum termsEnum = terms.iterator();
|
||||
|
||||
TermsEnum.SeekStatus status = termsEnum.seek(new BytesRef(getTermText()));
|
||||
if (status == TermsEnum.SeekStatus.FOUND) {
|
||||
mtv.visitMatchingTerm(getLuceneTerm(fieldName));
|
||||
}
|
||||
} finally {
|
||||
enumerator.close();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
@ -17,8 +17,11 @@ package org.apache.lucene.queryParser.surround.query;
|
|||
*/
|
||||
|
||||
import org.apache.lucene.index.Term;
|
||||
import org.apache.lucene.index.TermEnum;
|
||||
import org.apache.lucene.index.TermsEnum;
|
||||
import org.apache.lucene.index.Terms;
|
||||
import org.apache.lucene.util.BytesRef;
|
||||
import org.apache.lucene.index.IndexReader;
|
||||
import org.apache.lucene.index.MultiFields;
|
||||
|
||||
import java.io.IOException;
|
||||
|
||||
|
@ -40,6 +43,7 @@ public class SrndTruncQuery extends SimpleTerm {
|
|||
private final char mask;
|
||||
|
||||
private String prefix;
|
||||
private BytesRef prefixRef;
|
||||
private Pattern pattern;
|
||||
|
||||
|
||||
|
@ -68,6 +72,7 @@ public class SrndTruncQuery extends SimpleTerm {
|
|||
i++;
|
||||
}
|
||||
prefix = truncated.substring(0, i);
|
||||
prefixRef = new BytesRef(prefix);
|
||||
|
||||
StringBuilder re = new StringBuilder();
|
||||
while (i < truncated.length()) {
|
||||
|
@ -84,26 +89,37 @@ public class SrndTruncQuery extends SimpleTerm {
|
|||
MatchingTermVisitor mtv) throws IOException
|
||||
{
|
||||
int prefixLength = prefix.length();
|
||||
TermEnum enumerator = reader.terms(new Term(fieldName, prefix));
|
||||
Matcher matcher = pattern.matcher("");
|
||||
try {
|
||||
do {
|
||||
Term term = enumerator.term();
|
||||
if (term != null) {
|
||||
String text = term.text();
|
||||
if ((! text.startsWith(prefix)) || (! term.field().equals(fieldName))) {
|
||||
break;
|
||||
} else {
|
||||
matcher.reset( text.substring(prefixLength));
|
||||
if (matcher.matches()) {
|
||||
mtv.visitMatchingTerm(term);
|
||||
}
|
||||
}
|
||||
Terms terms = MultiFields.getTerms(reader, fieldName);
|
||||
if (terms != null) {
|
||||
Matcher matcher = pattern.matcher("");
|
||||
try {
|
||||
TermsEnum termsEnum = terms.iterator();
|
||||
|
||||
TermsEnum.SeekStatus status = termsEnum.seek(prefixRef);
|
||||
BytesRef text;
|
||||
if (status == TermsEnum.SeekStatus.FOUND) {
|
||||
text = prefixRef;
|
||||
} else if (status == TermsEnum.SeekStatus.NOT_FOUND) {
|
||||
text = termsEnum.term();
|
||||
} else {
|
||||
text = null;
|
||||
}
|
||||
} while (enumerator.next());
|
||||
} finally {
|
||||
enumerator.close();
|
||||
matcher.reset();
|
||||
|
||||
while(text != null) {
|
||||
if (text != null && text.startsWith(prefixRef)) {
|
||||
String textString = text.utf8ToString();
|
||||
matcher.reset(textString.substring(prefixLength));
|
||||
if (matcher.matches()) {
|
||||
mtv.visitMatchingTerm(new Term(fieldName, textString));
|
||||
}
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
text = termsEnum.next();
|
||||
}
|
||||
} finally {
|
||||
matcher.reset();
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
@@ -17,12 +17,17 @@ package org.apache.lucene.analysis;
* limitations under the License.
*/

import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;
import org.apache.lucene.document.NumericField; // for javadocs
import org.apache.lucene.search.NumericRangeQuery; // for javadocs
import org.apache.lucene.search.NumericRangeFilter; // for javadocs
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

@ -92,6 +97,88 @@ public final class NumericTokenStream extends TokenStream {
|
|||
/** The lower precision tokens gets this token type assigned. */
|
||||
public static final String TOKEN_TYPE_LOWER_PREC = "lowerPrecNumeric";
|
||||
|
||||
/** <b>Expert:</b> Use this attribute to get the details of the currently generated token
|
||||
* @lucene.experimental
|
||||
* @since 3.1
|
||||
*/
|
||||
public interface NumericTermAttribute extends Attribute {
|
||||
/** Returns current shift value, undefined before first token */
|
||||
int getShift();
|
||||
/** Returns {@link NumericTokenStream}'s raw value as {@code long} */
|
||||
long getRawValue();
|
||||
/** Returns value size in bits (32 for {@code float}, {@code int}; 64 for {@code double}, {@code long}) */
|
||||
int getValueSize();
|
||||
}
|
||||
|
||||
private static final class NumericAttributeFactory extends AttributeFactory {
|
||||
private final AttributeFactory delegate;
|
||||
private NumericTokenStream ts = null;
|
||||
|
||||
NumericAttributeFactory(AttributeFactory delegate) {
|
||||
this.delegate = delegate;
|
||||
}
|
||||
|
||||
@Override
|
||||
public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) {
|
||||
if (attClass == NumericTermAttribute.class)
|
||||
return new NumericTermAttributeImpl(ts);
|
||||
if (attClass.isAssignableFrom(CharTermAttribute.class) || attClass.isAssignableFrom(TermAttribute.class))
|
||||
throw new IllegalArgumentException("NumericTokenStream does not support CharTermAttribute/TermAttribute.");
|
||||
return delegate.createAttributeInstance(attClass);
|
||||
}
|
||||
}
|
||||
|
||||
private static final class NumericTermAttributeImpl extends AttributeImpl implements NumericTermAttribute,TermToBytesRefAttribute {
|
||||
private final NumericTokenStream ts;
|
||||
|
||||
public NumericTermAttributeImpl(NumericTokenStream ts) {
|
||||
this.ts = ts;
|
||||
}
|
||||
|
||||
public int toBytesRef(BytesRef bytes) {
|
||||
try {
|
||||
assert ts.valSize == 64 || ts.valSize == 32;
|
||||
return (ts.valSize == 64) ?
|
||||
NumericUtils.longToPrefixCoded(ts.value, ts.shift, bytes) :
|
||||
NumericUtils.intToPrefixCoded((int) ts.value, ts.shift, bytes);
|
||||
} catch (IllegalArgumentException iae) {
|
||||
// return empty token before first
|
||||
bytes.length = 0;
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
public int getShift() { return ts.shift; }
|
||||
public long getRawValue() { return ts.value; }
|
||||
public int getValueSize() { return ts.valSize; }
|
||||
|
||||
@Override
|
||||
public void clear() {
|
||||
// this attribute has no contents to clear
|
||||
}
|
||||
|
||||
@Override
|
||||
public boolean equals(Object other) {
|
||||
return other == this;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int hashCode() {
|
||||
return System.identityHashCode(this);
|
||||
}
|
||||
|
||||
@Override
|
||||
public void copyTo(AttributeImpl target) {
|
||||
// this attribute has no contents to copy
|
||||
}
|
||||
|
||||
@Override
|
||||
public Object clone() {
|
||||
// cannot throw CloneNotSupportedException (checked)
|
||||
throw new UnsupportedOperationException();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Creates a token stream for numeric values using the default <code>precisionStep</code>
|
||||
* {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The stream is not yet initialized,
|
||||
|
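With this change NumericTokenStream rejects TermAttribute/CharTermAttribute and exposes its state through the new NumericTermAttribute. A minimal consumption sketch under that assumption, using only the accessors declared in the hunk; the wrapper class name is hypothetical and exact behavior may differ at this revision.

import org.apache.lucene.analysis.NumericTokenStream;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class NumericTokenStreamSketch {
  /** Walks the tokens produced for one long value and prints shift/raw value per token. */
  public static void dumpTokens(long value, int precisionStep) throws Exception {
    NumericTokenStream stream = new NumericTokenStream(precisionStep).setLongValue(value);
    // TermAttribute/CharTermAttribute would throw; use NumericTermAttribute instead
    NumericTokenStream.NumericTermAttribute numAtt =
        stream.addAttribute(NumericTokenStream.NumericTermAttribute.class);
    TypeAttribute typeAtt = stream.addAttribute(TypeAttribute.class);
    stream.reset();                              // requires a value to have been set
    while (stream.incrementToken()) {
      System.out.println("shift=" + numAtt.getShift()
          + " raw=" + numAtt.getRawValue()
          + " valueSize=" + numAtt.getValueSize()
          + " type=" + typeAtt.type());
    }
  }
}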
@ -107,23 +194,15 @@ public final class NumericTokenStream extends TokenStream {
|
|||
* before using set a value using the various set<em>???</em>Value() methods.
|
||||
*/
|
||||
public NumericTokenStream(final int precisionStep) {
|
||||
super();
|
||||
this.precisionStep = precisionStep;
|
||||
if (precisionStep < 1)
|
||||
throw new IllegalArgumentException("precisionStep must be >=1");
|
||||
}
|
||||
super(new NumericAttributeFactory(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY));
|
||||
// we must do this after the super call :(
|
||||
((NumericAttributeFactory) getAttributeFactory()).ts = this;
|
||||
addAttribute(NumericTermAttribute.class);
|
||||
|
||||
/**
|
||||
* Expert: Creates a token stream for numeric values with the specified
|
||||
* <code>precisionStep</code> using the given {@link AttributeSource}.
|
||||
* The stream is not yet initialized,
|
||||
* before using set a value using the various set<em>???</em>Value() methods.
|
||||
*/
|
||||
public NumericTokenStream(AttributeSource source, final int precisionStep) {
|
||||
super(source);
|
||||
this.precisionStep = precisionStep;
|
||||
if (precisionStep < 1)
|
||||
throw new IllegalArgumentException("precisionStep must be >=1");
|
||||
shift = -precisionStep;
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -134,10 +213,15 @@ public final class NumericTokenStream extends TokenStream {
|
|||
* before using set a value using the various set<em>???</em>Value() methods.
|
||||
*/
|
||||
public NumericTokenStream(AttributeFactory factory, final int precisionStep) {
|
||||
super(factory);
|
||||
super(new NumericAttributeFactory(factory));
|
||||
// we must do this after the super call :(
|
||||
((NumericAttributeFactory) getAttributeFactory()).ts = this;
|
||||
addAttribute(NumericTermAttribute.class);
|
||||
|
||||
this.precisionStep = precisionStep;
|
||||
if (precisionStep < 1)
|
||||
throw new IllegalArgumentException("precisionStep must be >=1");
|
||||
shift = -precisionStep;
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -149,7 +233,7 @@ public final class NumericTokenStream extends TokenStream {
|
|||
public NumericTokenStream setLongValue(final long value) {
|
||||
this.value = value;
|
||||
valSize = 64;
|
||||
shift = 0;
|
||||
shift = -precisionStep;
|
||||
return this;
|
||||
}
|
||||
|
||||
|
@ -162,7 +246,7 @@ public final class NumericTokenStream extends TokenStream {
|
|||
public NumericTokenStream setIntValue(final int value) {
|
||||
this.value = value;
|
||||
valSize = 32;
|
||||
shift = 0;
|
||||
shift = -precisionStep;
|
||||
return this;
|
||||
}
|
||||
|
||||
|
@ -175,7 +259,7 @@ public final class NumericTokenStream extends TokenStream {
|
|||
public NumericTokenStream setDoubleValue(final double value) {
|
||||
this.value = NumericUtils.doubleToSortableLong(value);
|
||||
valSize = 64;
|
||||
shift = 0;
|
||||
shift = -precisionStep;
|
||||
return this;
|
||||
}
|
||||
|
||||
|
@ -188,7 +272,7 @@ public final class NumericTokenStream extends TokenStream {
|
|||
public NumericTokenStream setFloatValue(final float value) {
|
||||
this.value = NumericUtils.floatToSortableInt(value);
|
||||
valSize = 32;
|
||||
shift = 0;
|
||||
shift = -precisionStep;
|
||||
return this;
|
||||
}
|
||||
|
||||
|
@@ -196,37 +280,24 @@ public final class NumericTokenStream extends TokenStream {
  public void reset() {
    if (valSize == 0)
      throw new IllegalStateException("call set???Value() before usage");
    shift = 0;
    shift = -precisionStep;
  }

  @Override
  public boolean incrementToken() {
    if (valSize == 0)
      throw new IllegalStateException("call set???Value() before usage");
    if (shift >= valSize)
    shift += precisionStep;
    if (shift >= valSize) {
      // reset so the attribute still works after exhausted stream
      shift -= precisionStep;
      return false;

    clearAttributes();
    final char[] buffer;
    switch (valSize) {
      case 64:
        buffer = termAtt.resizeTermBuffer(NumericUtils.BUF_SIZE_LONG);
        termAtt.setTermLength(NumericUtils.longToPrefixCoded(value, shift, buffer));
        break;

      case 32:
        buffer = termAtt.resizeTermBuffer(NumericUtils.BUF_SIZE_INT);
        termAtt.setTermLength(NumericUtils.intToPrefixCoded((int) value, shift, buffer));
        break;

      default:
        // should not happen
        throw new IllegalArgumentException("valSize must be 32 or 64");
    }

    clearAttributes();
    // the TermToBytesRefAttribute is directly accessing shift & value.
    typeAtt.setType((shift == 0) ? TOKEN_TYPE_FULL_PREC : TOKEN_TYPE_LOWER_PREC);
    posIncrAtt.setPositionIncrement((shift == 0) ? 1 : 0);
    shift += precisionStep;
    return true;
  }

@ -238,12 +309,11 @@ public final class NumericTokenStream extends TokenStream {
|
|||
}
|
||||
|
||||
// members
|
||||
private final TermAttribute termAtt = addAttribute(TermAttribute.class);
|
||||
private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
|
||||
private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
|
||||
|
||||
private int shift = 0, valSize = 0; // valSize==0 means not initialized
|
||||
int shift, valSize = 0; // valSize==0 means not initialized
|
||||
private final int precisionStep;
|
||||
|
||||
private long value = 0L;
|
||||
long value = 0L;
|
||||
}
|
||||
|
|
|
@@ -64,14 +64,14 @@ import org.apache.lucene.util.AttributeImpl;
  implementing the {@link TokenStream#incrementToken()} API.
  Failing that, to create a new Token you should first use
  one of the constructors that starts with null text. To load
  the token from a char[] use {@link #setTermBuffer(char[], int, int)}.
  To load from a String use {@link #setTermBuffer(String)} or {@link #setTermBuffer(String, int, int)}.
  Alternatively you can get the Token's termBuffer by calling either {@link #termBuffer()},
  the token from a char[] use {@link #copyBuffer(char[], int, int)}.
  To load from a String use {@link #setEmpty} followed by {@link #append(CharSequence)} or {@link #append(CharSequence, int, int)}.
  Alternatively you can get the Token's termBuffer by calling either {@link #buffer()},
  if you know that your text is shorter than the capacity of the termBuffer
  or {@link #resizeTermBuffer(int)}, if there is any possibility
  or {@link #resizeBuffer(int)}, if there is any possibility
  that you may need to grow the buffer. Fill in the characters of your term into this
  buffer, with {@link String#getChars(int, int, char[], int)} if loading from a string,
  or with {@link System#arraycopy(Object, int, Object, int, int)}, and finally call {@link #setTermLength(int)} to
  or with {@link System#arraycopy(Object, int, Object, int, int)}, and finally call {@link #setLength(int)} to
  set the length of the term text. See <a target="_top"
  href="https://issues.apache.org/jira/browse/LUCENE-969">LUCENE-969</a>
  for details.</p>

@@ -100,7 +100,7 @@ import org.apache.lucene.util.AttributeImpl;
  </li>
  <li> Copying from one one Token to another (type is reset to {@link #DEFAULT_TYPE} if not specified):<br/>
  <pre>
  return reusableToken.reinit(source.termBuffer(), 0, source.termLength(), source.startOffset(), source.endOffset()[, source.type()]);
  return reusableToken.reinit(source.buffer(), 0, source.length(), source.startOffset(), source.endOffset()[, source.type()]);
  </pre>
  </li>
  </ul>

@ -115,6 +115,7 @@ import org.apache.lucene.util.AttributeImpl;
|
|||
|
||||
@see org.apache.lucene.index.Payload
|
||||
*/
|
||||
// TODO: change superclass to CharTermAttribute in 4.0!
|
||||
public class Token extends TermAttributeImpl
|
||||
implements TypeAttribute, PositionIncrementAttribute,
|
||||
FlagsAttribute, OffsetAttribute, PayloadAttribute {
|
||||
|
@ -172,7 +173,7 @@ public class Token extends TermAttributeImpl
|
|||
* @param end end offset
|
||||
*/
|
||||
public Token(String text, int start, int end) {
|
||||
setTermBuffer(text);
|
||||
append(text);
|
||||
startOffset = start;
|
||||
endOffset = end;
|
||||
}
|
||||
|
@ -187,7 +188,7 @@ public class Token extends TermAttributeImpl
|
|||
* @param typ token type
|
||||
*/
|
||||
public Token(String text, int start, int end, String typ) {
|
||||
setTermBuffer(text);
|
||||
append(text);
|
||||
startOffset = start;
|
||||
endOffset = end;
|
||||
type = typ;
|
||||
|
@ -204,7 +205,7 @@ public class Token extends TermAttributeImpl
|
|||
* @param flags token type bits
|
||||
*/
|
||||
public Token(String text, int start, int end, int flags) {
|
||||
setTermBuffer(text);
|
||||
append(text);
|
||||
startOffset = start;
|
||||
endOffset = end;
|
||||
this.flags = flags;
|
||||
|
@ -221,7 +222,7 @@ public class Token extends TermAttributeImpl
|
|||
* @param end
|
||||
*/
|
||||
public Token(char[] startTermBuffer, int termBufferOffset, int termBufferLength, int start, int end) {
|
||||
setTermBuffer(startTermBuffer, termBufferOffset, termBufferLength);
|
||||
copyBuffer(startTermBuffer, termBufferOffset, termBufferLength);
|
||||
startOffset = start;
|
||||
endOffset = end;
|
||||
}
|
||||
|
@ -270,7 +271,7 @@ public class Token extends TermAttributeImpl
|
|||
corresponding to this token in the source text.
|
||||
|
||||
Note that the difference between endOffset() and startOffset() may not be
|
||||
equal to {@link #termLength}, as the term text may have been altered by a
|
||||
equal to {@link #length}, as the term text may have been altered by a
|
||||
stemmer or some other filter. */
|
||||
public final int startOffset() {
|
||||
return startOffset;
|
||||
|
@ -351,7 +352,7 @@ public class Token extends TermAttributeImpl
|
|||
@Override
|
||||
public String toString() {
|
||||
final StringBuilder sb = new StringBuilder();
|
||||
sb.append('(').append(term()).append(',')
|
||||
sb.append('(').append(super.toString()).append(',')
|
||||
.append(startOffset).append(',').append(endOffset);
|
||||
if (!"word".equals(type))
|
||||
sb.append(",type=").append(type);
|
||||
|
@ -387,7 +388,7 @@ public class Token extends TermAttributeImpl
|
|||
/** Makes a clone, but replaces the term buffer &
|
||||
* start/end offset in the process. This is more
|
||||
* efficient than doing a full clone (and then calling
|
||||
* setTermBuffer) because it saves a wasted copy of the old
|
||||
* {@link #copyBuffer}) because it saves a wasted copy of the old
|
||||
* termBuffer. */
|
||||
public Token clone(char[] newTermBuffer, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset) {
|
||||
final Token t = new Token(newTermBuffer, newTermOffset, newTermLength, newStartOffset, newEndOffset);
|
||||
|
@ -442,16 +443,16 @@ public class Token extends TermAttributeImpl
|
|||
}
|
||||
|
||||
/** Shorthand for calling {@link #clear},
|
||||
* {@link #setTermBuffer(char[], int, int)},
|
||||
* {@link #copyBuffer(char[], int, int)},
|
||||
* {@link #setStartOffset},
|
||||
* {@link #setEndOffset},
|
||||
* {@link #setType}
|
||||
* @return this Token instance */
|
||||
public Token reinit(char[] newTermBuffer, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset, String newType) {
|
||||
clearNoTermBuffer();
|
||||
copyBuffer(newTermBuffer, newTermOffset, newTermLength);
|
||||
payload = null;
|
||||
positionIncrement = 1;
|
||||
setTermBuffer(newTermBuffer, newTermOffset, newTermLength);
|
||||
startOffset = newStartOffset;
|
||||
endOffset = newEndOffset;
|
||||
type = newType;
|
||||
|
@ -459,14 +460,14 @@ public class Token extends TermAttributeImpl
|
|||
}
|
||||
|
||||
/** Shorthand for calling {@link #clear},
|
||||
* {@link #setTermBuffer(char[], int, int)},
|
||||
* {@link #copyBuffer(char[], int, int)},
|
||||
* {@link #setStartOffset},
|
||||
* {@link #setEndOffset}
|
||||
* {@link #setType} on Token.DEFAULT_TYPE
|
||||
* @return this Token instance */
|
||||
public Token reinit(char[] newTermBuffer, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset) {
|
||||
clearNoTermBuffer();
|
||||
setTermBuffer(newTermBuffer, newTermOffset, newTermLength);
|
||||
copyBuffer(newTermBuffer, newTermOffset, newTermLength);
|
||||
startOffset = newStartOffset;
|
||||
endOffset = newEndOffset;
|
||||
type = DEFAULT_TYPE;
|
||||
|
@ -474,14 +475,14 @@ public class Token extends TermAttributeImpl
|
|||
}
|
||||
|
||||
/** Shorthand for calling {@link #clear},
|
||||
* {@link #setTermBuffer(String)},
|
||||
* {@link #append(CharSequence)},
|
||||
* {@link #setStartOffset},
|
||||
* {@link #setEndOffset}
|
||||
* {@link #setType}
|
||||
* @return this Token instance */
|
||||
public Token reinit(String newTerm, int newStartOffset, int newEndOffset, String newType) {
|
||||
clearNoTermBuffer();
|
||||
setTermBuffer(newTerm);
|
||||
clear();
|
||||
append(newTerm);
|
||||
startOffset = newStartOffset;
|
||||
endOffset = newEndOffset;
|
||||
type = newType;
|
||||
|
@ -489,14 +490,14 @@ public class Token extends TermAttributeImpl
|
|||
}
|
||||
|
||||
/** Shorthand for calling {@link #clear},
|
||||
* {@link #setTermBuffer(String, int, int)},
|
||||
* {@link #append(CharSequence, int, int)},
|
||||
* {@link #setStartOffset},
|
||||
* {@link #setEndOffset}
|
||||
* {@link #setType}
|
||||
* @return this Token instance */
|
||||
public Token reinit(String newTerm, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset, String newType) {
|
||||
clearNoTermBuffer();
|
||||
setTermBuffer(newTerm, newTermOffset, newTermLength);
|
||||
clear();
|
||||
append(newTerm, newTermOffset, newTermOffset + newTermLength);
|
||||
startOffset = newStartOffset;
|
||||
endOffset = newEndOffset;
|
||||
type = newType;
|
||||
|
@ -504,14 +505,14 @@ public class Token extends TermAttributeImpl
|
|||
}
|
||||
|
||||
/** Shorthand for calling {@link #clear},
|
||||
* {@link #setTermBuffer(String)},
|
||||
* {@link #append(CharSequence)},
|
||||
* {@link #setStartOffset},
|
||||
* {@link #setEndOffset}
|
||||
* {@link #setType} on Token.DEFAULT_TYPE
|
||||
* @return this Token instance */
|
||||
public Token reinit(String newTerm, int newStartOffset, int newEndOffset) {
|
||||
clearNoTermBuffer();
|
||||
setTermBuffer(newTerm);
|
||||
clear();
|
||||
append(newTerm);
|
||||
startOffset = newStartOffset;
|
||||
endOffset = newEndOffset;
|
||||
type = DEFAULT_TYPE;
|
||||
|
@ -519,14 +520,14 @@ public class Token extends TermAttributeImpl
|
|||
}
|
||||
|
||||
/** Shorthand for calling {@link #clear},
|
||||
* {@link #setTermBuffer(String, int, int)},
|
||||
* {@link #append(CharSequence, int, int)},
|
||||
* {@link #setStartOffset},
|
||||
* {@link #setEndOffset}
|
||||
* {@link #setType} on Token.DEFAULT_TYPE
|
||||
* @return this Token instance */
|
||||
public Token reinit(String newTerm, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset) {
|
||||
clearNoTermBuffer();
|
||||
setTermBuffer(newTerm, newTermOffset, newTermLength);
|
||||
clear();
|
||||
append(newTerm, newTermOffset, newTermOffset + newTermLength);
|
||||
startOffset = newStartOffset;
|
||||
endOffset = newEndOffset;
|
||||
type = DEFAULT_TYPE;
|
||||
|
@ -538,7 +539,7 @@ public class Token extends TermAttributeImpl
|
|||
* @param prototype
|
||||
*/
|
||||
public void reinit(Token prototype) {
|
||||
setTermBuffer(prototype.termBuffer(), 0, prototype.termLength());
|
||||
copyBuffer(prototype.buffer(), 0, prototype.length());
|
||||
positionIncrement = prototype.positionIncrement;
|
||||
flags = prototype.flags;
|
||||
startOffset = prototype.startOffset;
|
||||
|
@ -553,7 +554,7 @@ public class Token extends TermAttributeImpl
|
|||
* @param newTerm
|
||||
*/
|
||||
public void reinit(Token prototype, String newTerm) {
|
||||
setTermBuffer(newTerm);
|
||||
setEmpty().append(newTerm);
|
||||
positionIncrement = prototype.positionIncrement;
|
||||
flags = prototype.flags;
|
||||
startOffset = prototype.startOffset;
|
||||
|
@ -570,7 +571,7 @@ public class Token extends TermAttributeImpl
|
|||
* @param length
|
||||
*/
|
||||
public void reinit(Token prototype, char[] newTermBuffer, int offset, int length) {
|
||||
setTermBuffer(newTermBuffer, offset, length);
|
||||
copyBuffer(newTermBuffer, offset, length);
|
||||
positionIncrement = prototype.positionIncrement;
|
||||
flags = prototype.flags;
|
||||
startOffset = prototype.startOffset;
|
||||
|
|
|
@@ -0,0 +1,71 @@
package org.apache.lucene.analysis.tokenattributes;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import org.apache.lucene.util.Attribute;

/**
 * The term text of a Token.
 */
public interface CharTermAttribute extends Attribute, CharSequence, Appendable {

  /** Copies the contents of buffer, starting at offset for
   * length characters, into the termBuffer array.
   * @param buffer the buffer to copy
   * @param offset the index in the buffer of the first character to copy
   * @param length the number of characters to copy
   */
  public void copyBuffer(char[] buffer, int offset, int length);

  /** Returns the internal termBuffer character array which
   * you can then directly alter. If the array is too
   * small for your token, use {@link
   * #resizeBuffer(int)} to increase it. After
   * altering the buffer be sure to call {@link
   * #setLength} to record the number of valid
   * characters that were placed into the termBuffer. */
  public char[] buffer();

  /** Grows the termBuffer to at least size newSize, preserving the
   * existing content.
   * @param newSize minimum size of the new termBuffer
   * @return newly created termBuffer with length >= newSize
   */
  public char[] resizeBuffer(int newSize);

  /** Set number of valid characters (length of the term) in
   * the termBuffer array. Use this to truncate the termBuffer
   * or to synchronize with external manipulation of the termBuffer.
   * Note: to grow the size of the array,
   * use {@link #resizeBuffer(int)} first.
   * @param length the truncated length
   */
  public CharTermAttribute setLength(int length);

  /** Sets the length of the termBuffer to zero.
   * Use this method before appending contents
   * using the {@link Appendable} interface.
   */
  public CharTermAttribute setEmpty();

  // the following methods are redefined to get rid of IOException declaration:
  public CharTermAttribute append(CharSequence csq);
  public CharTermAttribute append(CharSequence csq, int start, int end);
  public CharTermAttribute append(char c);

}
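A hedged usage sketch for the interface added above: a consumer asks the stream for CharTermAttribute and reads the term text through the CharSequence view (the TokenStream passed in is a placeholder; any analyzer-produced stream would do):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    // Hedged sketch: prints every term of a TokenStream via CharTermAttribute.
    public class CharTermAttributeDemo {
      static void printTerms(TokenStream ts) throws IOException {
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          // CharTermAttribute is a CharSequence, so length()/toString() work directly
          System.out.println(termAtt.toString() + " (length=" + termAtt.length() + ")");
        }
        ts.end();
        ts.close();
      }
    }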
@ -0,0 +1,255 @@
|
|||
package org.apache.lucene.analysis.tokenattributes;
|
||||
|
||||
/**
|
||||
* Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
* contributor license agreements. See the NOTICE file distributed with
|
||||
* this work for additional information regarding copyright ownership.
|
||||
* The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
* (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
|
||||
import java.io.Serializable;
|
||||
import java.nio.CharBuffer;
|
||||
|
||||
import org.apache.lucene.util.ArrayUtil;
|
||||
import org.apache.lucene.util.AttributeImpl;
|
||||
import org.apache.lucene.util.BytesRef;
|
||||
import org.apache.lucene.util.RamUsageEstimator;
|
||||
import org.apache.lucene.util.UnicodeUtil;
|
||||
|
||||
/**
|
||||
* The term text of a Token.
|
||||
*/
|
||||
public class CharTermAttributeImpl extends AttributeImpl implements CharTermAttribute, TermAttribute, TermToBytesRefAttribute, Cloneable, Serializable {
|
||||
private static int MIN_BUFFER_SIZE = 10;
|
||||
|
||||
private char[] termBuffer = new char[ArrayUtil.oversize(MIN_BUFFER_SIZE, RamUsageEstimator.NUM_BYTES_CHAR)];
|
||||
private int termLength = 0;
|
||||
|
||||
@Deprecated
|
||||
public String term() {
|
||||
// don't delegate to toString() here!
|
||||
return new String(termBuffer, 0, termLength);
|
||||
}
|
||||
|
||||
public void copyBuffer(char[] buffer, int offset, int length) {
|
||||
growTermBuffer(length);
|
||||
System.arraycopy(buffer, offset, termBuffer, 0, length);
|
||||
termLength = length;
|
||||
}
|
||||
|
||||
@Deprecated
|
||||
public void setTermBuffer(char[] buffer, int offset, int length) {
|
||||
copyBuffer(buffer, offset, length);
|
||||
}
|
||||
|
||||
@Deprecated
|
||||
public void setTermBuffer(String buffer) {
|
||||
int length = buffer.length();
|
||||
growTermBuffer(length);
|
||||
buffer.getChars(0, length, termBuffer, 0);
|
||||
termLength = length;
|
||||
}
|
||||
|
||||
@Deprecated
|
||||
public void setTermBuffer(String buffer, int offset, int length) {
|
||||
assert offset <= buffer.length();
|
||||
assert offset + length <= buffer.length();
|
||||
growTermBuffer(length);
|
||||
buffer.getChars(offset, offset + length, termBuffer, 0);
|
||||
termLength = length;
|
||||
}
|
||||
|
||||
public char[] buffer() {
|
||||
return termBuffer;
|
||||
}
|
||||
|
||||
@Deprecated
|
||||
public char[] termBuffer() {
|
||||
return termBuffer;
|
||||
}
|
||||
|
||||
public char[] resizeBuffer(int newSize) {
|
||||
if (termBuffer == null) {
|
||||
// The buffer is always at least MIN_BUFFER_SIZE
|
||||
termBuffer = new char[ArrayUtil.oversize(newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
|
||||
} else {
|
||||
if(termBuffer.length < newSize){
|
||||
// Not big enough; create a new array with slight
|
||||
// over allocation and preserve content
|
||||
final char[] newCharBuffer = new char[ArrayUtil.oversize(newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
|
||||
System.arraycopy(termBuffer, 0, newCharBuffer, 0, termBuffer.length);
|
||||
termBuffer = newCharBuffer;
|
||||
}
|
||||
}
|
||||
return termBuffer;
|
||||
}
|
||||
|
||||
@Deprecated
|
||||
public char[] resizeTermBuffer(int newSize) {
|
||||
return resizeBuffer(newSize);
|
||||
}
|
||||
|
||||
private void growTermBuffer(int newSize) {
|
||||
if (termBuffer == null) {
|
||||
// The buffer is always at least MIN_BUFFER_SIZE
|
||||
termBuffer = new char[ArrayUtil.oversize(newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
|
||||
} else {
|
||||
if(termBuffer.length < newSize){
|
||||
// Not big enough; create a new array with slight
|
||||
// over allocation:
|
||||
termBuffer = new char[ArrayUtil.oversize(newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@Deprecated
|
||||
public int termLength() {
|
||||
return termLength;
|
||||
}
|
||||
|
||||
public CharTermAttribute setLength(int length) {
|
||||
if (length > termBuffer.length)
|
||||
throw new IllegalArgumentException("length " + length + " exceeds the size of the termBuffer (" + termBuffer.length + ")");
|
||||
termLength = length;
|
||||
return this;
|
||||
}
|
||||
|
||||
public CharTermAttribute setEmpty() {
|
||||
termLength = 0;
|
||||
return this;
|
||||
}
|
||||
|
||||
@Deprecated
|
||||
public void setTermLength(int length) {
|
||||
setLength(length);
|
||||
}
|
||||
|
||||
// *** TermToBytesRefAttribute interface ***
|
||||
public int toBytesRef(BytesRef target) {
|
||||
// TODO: Maybe require that bytes is already initialized? TermsHashPerField ensures this.
|
||||
if (target.bytes == null) {
|
||||
target.bytes = new byte[termLength * 4];
|
||||
}
|
||||
return UnicodeUtil.UTF16toUTF8WithHash(termBuffer, 0, termLength, target);
|
||||
}
|
||||
|
||||
// *** CharSequence interface ***
|
||||
public int length() {
|
||||
return termLength;
|
||||
}
|
||||
|
||||
public char charAt(int index) {
|
||||
if (index >= termLength)
|
||||
throw new IndexOutOfBoundsException();
|
||||
return termBuffer[index];
|
||||
}
|
||||
|
||||
public CharSequence subSequence(final int start, final int end) {
|
||||
if (start > termLength || end > termLength)
|
||||
throw new IndexOutOfBoundsException();
|
||||
return new String(termBuffer, start, end - start);
|
||||
}
|
||||
|
||||
// *** Appendable interface ***
|
||||
public CharTermAttribute append(CharSequence csq) {
|
||||
return append(csq, 0, csq.length());
|
||||
}
|
||||
|
||||
public CharTermAttribute append(CharSequence csq, int start, int end) {
|
||||
resizeBuffer(termLength + end - start);
|
||||
if (csq instanceof String) {
|
||||
((String) csq).getChars(start, end, termBuffer, termLength);
|
||||
} else if (csq instanceof StringBuilder) {
|
||||
((StringBuilder) csq).getChars(start, end, termBuffer, termLength);
|
||||
} else if (csq instanceof StringBuffer) {
|
||||
((StringBuffer) csq).getChars(start, end, termBuffer, termLength);
|
||||
} else if (csq instanceof CharBuffer && ((CharBuffer) csq).hasArray()) {
|
||||
final CharBuffer cb = (CharBuffer) csq;
|
||||
System.arraycopy(cb.array(), cb.arrayOffset() + cb.position() + start, termBuffer, termLength, end - start);
|
||||
} else {
|
||||
while (start < end)
|
||||
termBuffer[termLength++] = csq.charAt(start++);
|
||||
// no fall-through here, as termLength is updated!
|
||||
return this;
|
||||
}
|
||||
termLength += end - start;
|
||||
return this;
|
||||
}
|
||||
|
||||
public CharTermAttribute append(char c) {
|
||||
resizeBuffer(termLength + 1)[termLength++] = c;
|
||||
return this;
|
||||
}
|
||||
|
||||
// *** AttributeImpl ***
|
||||
|
||||
@Override
|
||||
public int hashCode() {
|
||||
int code = termLength;
|
||||
code = code * 31 + ArrayUtil.hashCode(termBuffer, 0, termLength);
|
||||
return code;
|
||||
}
|
||||
|
||||
@Override
|
||||
public void clear() {
|
||||
termLength = 0;
|
||||
}
|
||||
|
||||
@Override
|
||||
public Object clone() {
|
||||
CharTermAttributeImpl t = (CharTermAttributeImpl)super.clone();
|
||||
// Do a deep clone
|
||||
if (termBuffer != null) {
|
||||
t.termBuffer = termBuffer.clone();
|
||||
}
|
||||
return t;
|
||||
}
|
||||
|
||||
@Override
|
||||
public boolean equals(Object other) {
|
||||
if (other == this) {
|
||||
return true;
|
||||
}
|
||||
|
||||
if (other instanceof CharTermAttributeImpl) {
|
||||
final CharTermAttributeImpl o = ((CharTermAttributeImpl) other);
|
||||
if (termLength != o.termLength)
|
||||
return false;
|
||||
for(int i=0;i<termLength;i++) {
|
||||
if (termBuffer[i] != o.termBuffer[i]) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
@Override
|
||||
public String toString() {
|
||||
return new String(termBuffer, 0, termLength);
|
||||
}
|
||||
|
||||
@Override
|
||||
public void copyTo(AttributeImpl target) {
|
||||
if (target instanceof CharTermAttribute) {
|
||||
CharTermAttribute t = (CharTermAttribute) target;
|
||||
t.copyBuffer(termBuffer, 0, termLength);
|
||||
} else {
|
||||
TermAttribute t = (TermAttribute) target;
|
||||
t.setTermBuffer(termBuffer, 0, termLength);
|
||||
}
|
||||
}
|
||||
|
||||
}
|
|
@ -21,7 +21,9 @@ import org.apache.lucene.util.Attribute;
|
|||
|
||||
/**
|
||||
* The term text of a Token.
|
||||
* @deprecated Use {@link CharTermAttribute} instead.
|
||||
*/
|
||||
@Deprecated
|
||||
public interface TermAttribute extends Attribute {
|
||||
/** Returns the Token's term text.
|
||||
*
|
||||
|
|
|
@ -17,211 +17,11 @@ package org.apache.lucene.analysis.tokenattributes;
|
|||
* limitations under the License.
|
||||
*/
|
||||
|
||||
import java.io.Serializable;
|
||||
|
||||
import org.apache.lucene.util.ArrayUtil;
|
||||
import org.apache.lucene.util.AttributeImpl;
|
||||
import org.apache.lucene.util.RamUsageEstimator;
|
||||
|
||||
/**
|
||||
* The term text of a Token.
|
||||
* @deprecated This class is only available for AttributeSource
|
||||
* to be able to load an old TermAttribute implementation class.
|
||||
*/
|
||||
public class TermAttributeImpl extends AttributeImpl implements TermAttribute, Cloneable, Serializable {
|
||||
private static int MIN_BUFFER_SIZE = 10;
|
||||
|
||||
private char[] termBuffer;
|
||||
private int termLength;
|
||||
|
||||
/** Returns the Token's term text.
|
||||
*
|
||||
* This method has a performance penalty
|
||||
* because the text is stored internally in a char[]. If
|
||||
* possible, use {@link #termBuffer()} and {@link
|
||||
* #termLength()} directly instead. If you really need a
|
||||
* String, use this method, which is nothing more than
|
||||
* a convenience call to <b>new String(token.termBuffer(), 0, token.termLength())</b>
|
||||
*/
|
||||
public String term() {
|
||||
initTermBuffer();
|
||||
return new String(termBuffer, 0, termLength);
|
||||
}
|
||||
|
||||
/** Copies the contents of buffer, starting at offset for
|
||||
* length characters, into the termBuffer array.
|
||||
* @param buffer the buffer to copy
|
||||
* @param offset the index in the buffer of the first character to copy
|
||||
* @param length the number of characters to copy
|
||||
*/
|
||||
public void setTermBuffer(char[] buffer, int offset, int length) {
|
||||
growTermBuffer(length);
|
||||
System.arraycopy(buffer, offset, termBuffer, 0, length);
|
||||
termLength = length;
|
||||
}
|
||||
|
||||
/** Copies the contents of buffer into the termBuffer array.
|
||||
* @param buffer the buffer to copy
|
||||
*/
|
||||
public void setTermBuffer(String buffer) {
|
||||
int length = buffer.length();
|
||||
growTermBuffer(length);
|
||||
buffer.getChars(0, length, termBuffer, 0);
|
||||
termLength = length;
|
||||
}
|
||||
|
||||
/** Copies the contents of buffer, starting at offset and continuing
|
||||
* for length characters, into the termBuffer array.
|
||||
* @param buffer the buffer to copy
|
||||
* @param offset the index in the buffer of the first character to copy
|
||||
* @param length the number of characters to copy
|
||||
*/
|
||||
public void setTermBuffer(String buffer, int offset, int length) {
|
||||
assert offset <= buffer.length();
|
||||
assert offset + length <= buffer.length();
|
||||
growTermBuffer(length);
|
||||
buffer.getChars(offset, offset + length, termBuffer, 0);
|
||||
termLength = length;
|
||||
}
|
||||
|
||||
/** Returns the internal termBuffer character array which
|
||||
* you can then directly alter. If the array is too
|
||||
* small for your token, use {@link
|
||||
* #resizeTermBuffer(int)} to increase it. After
|
||||
* altering the buffer be sure to call {@link
|
||||
* #setTermLength} to record the number of valid
|
||||
* characters that were placed into the termBuffer. */
|
||||
public char[] termBuffer() {
|
||||
initTermBuffer();
|
||||
return termBuffer;
|
||||
}
|
||||
|
||||
/** Grows the termBuffer to at least size newSize, preserving the
|
||||
* existing content. Note: If the next operation is to change
|
||||
* the contents of the term buffer use
|
||||
* {@link #setTermBuffer(char[], int, int)},
|
||||
* {@link #setTermBuffer(String)}, or
|
||||
* {@link #setTermBuffer(String, int, int)}
|
||||
* to optimally combine the resize with the setting of the termBuffer.
|
||||
* @param newSize minimum size of the new termBuffer
|
||||
* @return newly created termBuffer with length >= newSize
|
||||
*/
|
||||
public char[] resizeTermBuffer(int newSize) {
|
||||
if (termBuffer == null) {
|
||||
// The buffer is always at least MIN_BUFFER_SIZE
|
||||
termBuffer = new char[ArrayUtil.oversize(newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
|
||||
} else {
|
||||
if(termBuffer.length < newSize){
|
||||
// Not big enough; create a new array with slight
|
||||
// over allocation and preserve content
|
||||
final char[] newCharBuffer = new char[ArrayUtil.oversize(newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
|
||||
System.arraycopy(termBuffer, 0, newCharBuffer, 0, termBuffer.length);
|
||||
termBuffer = newCharBuffer;
|
||||
}
|
||||
}
|
||||
return termBuffer;
|
||||
}
|
||||
|
||||
|
||||
/** Allocates a buffer char[] of at least newSize, without preserving the existing content.
|
||||
* its always used in places that set the content
|
||||
* @param newSize minimum size of the buffer
|
||||
*/
|
||||
private void growTermBuffer(int newSize) {
|
||||
if (termBuffer == null) {
|
||||
// The buffer is always at least MIN_BUFFER_SIZE
|
||||
termBuffer = new char[ArrayUtil.oversize(newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
|
||||
} else {
|
||||
if(termBuffer.length < newSize){
|
||||
// Not big enough; create a new array with slight
|
||||
// over allocation:
|
||||
termBuffer = new char[ArrayUtil.oversize(newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
private void initTermBuffer() {
|
||||
if (termBuffer == null) {
|
||||
termBuffer = new char[ArrayUtil.oversize(MIN_BUFFER_SIZE, RamUsageEstimator.NUM_BYTES_CHAR)];
|
||||
termLength = 0;
|
||||
}
|
||||
}
|
||||
|
||||
/** Return number of valid characters (length of the term)
|
||||
* in the termBuffer array. */
|
||||
public int termLength() {
|
||||
return termLength;
|
||||
}
|
||||
|
||||
/** Set number of valid characters (length of the term) in
|
||||
* the termBuffer array. Use this to truncate the termBuffer
|
||||
* or to synchronize with external manipulation of the termBuffer.
|
||||
* Note: to grow the size of the array,
|
||||
* use {@link #resizeTermBuffer(int)} first.
|
||||
* @param length the truncated length
|
||||
*/
|
||||
public void setTermLength(int length) {
|
||||
initTermBuffer();
|
||||
if (length > termBuffer.length)
|
||||
throw new IllegalArgumentException("length " + length + " exceeds the size of the termBuffer (" + termBuffer.length + ")");
|
||||
termLength = length;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int hashCode() {
|
||||
initTermBuffer();
|
||||
int code = termLength;
|
||||
code = code * 31 + ArrayUtil.hashCode(termBuffer, 0, termLength);
|
||||
return code;
|
||||
}
|
||||
|
||||
@Override
|
||||
public void clear() {
|
||||
termLength = 0;
|
||||
}
|
||||
|
||||
@Override
|
||||
public Object clone() {
|
||||
TermAttributeImpl t = (TermAttributeImpl)super.clone();
|
||||
// Do a deep clone
|
||||
if (termBuffer != null) {
|
||||
t.termBuffer = termBuffer.clone();
|
||||
}
|
||||
return t;
|
||||
}
|
||||
|
||||
@Override
|
||||
public boolean equals(Object other) {
|
||||
if (other == this) {
|
||||
return true;
|
||||
}
|
||||
|
||||
if (other instanceof TermAttributeImpl) {
|
||||
initTermBuffer();
|
||||
TermAttributeImpl o = ((TermAttributeImpl) other);
|
||||
o.initTermBuffer();
|
||||
|
||||
if (termLength != o.termLength)
|
||||
return false;
|
||||
for(int i=0;i<termLength;i++) {
|
||||
if (termBuffer[i] != o.termBuffer[i]) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
@Override
|
||||
public String toString() {
|
||||
initTermBuffer();
|
||||
return "term=" + new String(termBuffer, 0, termLength);
|
||||
}
|
||||
|
||||
@Override
|
||||
public void copyTo(AttributeImpl target) {
|
||||
initTermBuffer();
|
||||
TermAttribute t = (TermAttribute) target;
|
||||
t.setTermBuffer(termBuffer, 0, termLength);
|
||||
}
|
||||
@Deprecated
|
||||
public class TermAttributeImpl extends CharTermAttributeImpl {
|
||||
}
|
||||
|
|
|
@@ -0,0 +1,47 @@
package org.apache.lucene.analysis.tokenattributes;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.BytesRef;

/**
 * This attribute is requested by TermsHashPerField to index the contents.
 * This attribute has no real state, it should be implemented in addition to
 * {@link CharTermAttribute}, to support indexing the term text as
 * UTF-8 bytes.
 * @lucene.experimental This is a very expert API, please use
 * {@link CharTermAttributeImpl} and its implementation of this method
 * for UTF-8 terms.
 */
public interface TermToBytesRefAttribute extends Attribute {
  /** Copies the token's term text into the given {@link BytesRef}.
   * @param termBytes destination to write the bytes to (UTF-8 for text terms).
   * @return the hashcode as defined by {@link BytesRef#hashCode}:
   * <pre>
   *  int hash = 0;
   *  for (int i = termBytes.offset; i < termBytes.offset+termBytes.length; i++) {
   *    hash = 31*hash + termBytes.bytes[i];
   *  }
   * </pre>
   * Implement this for performance reasons, if your code can calculate
   * the hash on-the-fly. If this is not the case, just return
   * {@code termBytes.hashCode()}.
   */
  public int toBytesRef(BytesRef termBytes);
}
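The hash contract quoted in the javadoc above, written out as a plain helper (a sketch; an implementation that cannot compute the hash on the fly may simply return termBytes.hashCode() instead):

    import org.apache.lucene.util.BytesRef;

    // Sketch of the hash contract from the javadoc above.
    public class BytesRefHashDemo {
      static int hash(BytesRef termBytes) {
        int hash = 0;
        for (int i = termBytes.offset; i < termBytes.offset + termBytes.length; i++) {
          hash = 31 * hash + termBytes.bytes[i];
        }
        return hash;
      }
    }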
@@ -21,6 +21,8 @@ import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.DataFormatException;
import java.io.ByteArrayOutputStream;

import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.UnicodeUtil;

/** Simple utility class providing static methods to

@@ -84,9 +86,9 @@ public class CompressionTools {
   * compressionLevel (constants are defined in
   * java.util.zip.Deflater). */
  public static byte[] compressString(String value, int compressionLevel) {
    UnicodeUtil.UTF8Result result = new UnicodeUtil.UTF8Result();
    BytesRef result = new BytesRef(10);
    UnicodeUtil.UTF16toUTF8(value, 0, value.length(), result);
    return compress(result.result, 0, result.length, compressionLevel);
    return compress(result.bytes, 0, result.length, compressionLevel);
  }

  /** Decompress the byte array previously returned by

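compressString above now builds its UTF-8 bytes in a BytesRef instead of UnicodeUtil.UTF8Result; callers are unchanged. A hedged round-trip sketch (assuming the usual org.apache.lucene.document location of CompressionTools and its existing decompressString counterpart; the literal string is a placeholder):

    import java.util.zip.DataFormatException;
    import java.util.zip.Deflater;
    import org.apache.lucene.document.CompressionTools;

    // Hedged usage sketch: the internal switch to BytesRef does not change this API.
    public class CompressDemo {
      public static void main(String[] args) throws DataFormatException {
        byte[] packed = CompressionTools.compressString("some stored field text", Deflater.BEST_COMPRESSION);
        String restored = CompressionTools.decompressString(packed);
        System.out.println(restored);
      }
    }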
@ -26,6 +26,7 @@ import java.io.IOException;
|
|||
* packages. This means the API is freely subject to
|
||||
* change, and, the class could be removed entirely, in any
|
||||
* Lucene release. Use directly at your own risk! */
|
||||
@Deprecated
|
||||
public abstract class AbstractAllTermDocs implements TermDocs {
|
||||
|
||||
protected int maxDoc;
|
||||
|
|
|
@ -0,0 +1,78 @@
|
|||
/**
|
||||
* Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
* contributor license agreements. See the NOTICE file distributed with
|
||||
* this work for additional information regarding copyright ownership.
|
||||
* The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
* (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
|
||||
package org.apache.lucene.index;
|
||||
|
||||
import org.apache.lucene.util.Bits;
|
||||
import java.io.IOException;
|
||||
|
||||
class AllDocsEnum extends DocsEnum {
|
||||
protected final Bits skipDocs;
|
||||
protected final int maxDoc;
|
||||
protected final IndexReader reader;
|
||||
protected int doc = -1;
|
||||
|
||||
protected AllDocsEnum(IndexReader reader, Bits skipDocs) {
|
||||
this.skipDocs = skipDocs;
|
||||
this.maxDoc = reader.maxDoc();
|
||||
this.reader = reader;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int freq() {
|
||||
return 1;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int docID() {
|
||||
return doc;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int nextDoc() throws IOException {
|
||||
return advance(doc+1);
|
||||
}
|
||||
|
||||
@Override
|
||||
public int read() throws IOException {
|
||||
final int[] docs = bulkResult.docs.ints;
|
||||
final int[] freqs = bulkResult.freqs.ints;
|
||||
int i = 0;
|
||||
while (i < docs.length && doc < maxDoc) {
|
||||
if (skipDocs == null || !skipDocs.get(doc)) {
|
||||
docs[i] = doc;
|
||||
freqs[i] = 1;
|
||||
++i;
|
||||
}
|
||||
doc++;
|
||||
}
|
||||
return i;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int advance(int target) throws IOException {
|
||||
doc = target;
|
||||
while (doc < maxDoc) {
|
||||
if (skipDocs == null || !skipDocs.get(doc)) {
|
||||
return doc;
|
||||
}
|
||||
doc++;
|
||||
}
|
||||
doc = NO_MORE_DOCS;
|
||||
return doc;
|
||||
}
|
||||
}
|
|
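AllDocsEnum above follows the standard DocsEnum iteration contract: callers pull doc ids with nextDoc() until NO_MORE_DOCS is returned. A hedged consumer sketch (counting surviving docs; the enum passed in is a placeholder):

    import java.io.IOException;
    import org.apache.lucene.index.DocsEnum;

    // Hedged sketch of the iteration contract used by AllDocsEnum above.
    public class DocsEnumDemo {
      static int countDocs(DocsEnum docsEnum) throws IOException {
        int count = 0;
        while (docsEnum.nextDoc() != DocsEnum.NO_MORE_DOCS) {
          count++;
        }
        return count;
      }
    }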
@ -19,6 +19,8 @@ package org.apache.lucene.index;
|
|||
|
||||
import org.apache.lucene.util.BitVector;
|
||||
|
||||
/** @deprecated Switch to AllDocsEnum */
|
||||
@Deprecated
|
||||
class AllTermDocs extends AbstractAllTermDocs {
|
||||
|
||||
protected BitVector deletedDocs;
|
||||
|
|
|
@ -34,11 +34,11 @@ package org.apache.lucene.index;
|
|||
* hit a non-zero byte. */
|
||||
|
||||
import java.util.Arrays;
|
||||
import org.apache.lucene.util.BytesRef;
|
||||
import java.util.List;
|
||||
import static org.apache.lucene.util.RamUsageEstimator.NUM_BYTES_OBJECT_REF;
|
||||
import org.apache.lucene.util.ArrayUtil;
|
||||
|
||||
|
||||
final class ByteBlockPool {
|
||||
|
||||
abstract static class Allocator {
|
||||
|
@@ -149,5 +149,23 @@ final class ByteBlockPool {

    return newUpto+3;
  }

  // Fill in a BytesRef from term's length & bytes encoded in
  // byte block
  final BytesRef setBytesRef(BytesRef term, int textStart) {
    final byte[] bytes = term.bytes = buffers[textStart >> DocumentsWriter.BYTE_BLOCK_SHIFT];
    int pos = textStart & DocumentsWriter.BYTE_BLOCK_MASK;
    if ((bytes[pos] & 0x80) == 0) {
      // length is 1 byte
      term.length = bytes[pos];
      term.offset = pos+1;
    } else {
      // length is 2 bytes
      term.length = (bytes[pos]&0x7f) + ((bytes[pos+1]&0xff)<<7);
      term.offset = pos+2;
    }
    assert term.length >= 0;
    return term;
  }
}

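setBytesRef above decodes a one- or two-byte length prefix: lengths below 128 fit in a single byte; longer lengths store the low 7 bits (with the high bit set) in the first byte and the remaining bits in the second. A standalone sketch of the same decoding (illustration only; length 300 is sample data, no Lucene classes involved):

    // Standalone sketch of the length prefix decoded by setBytesRef above.
    public class LengthPrefixDemo {
      public static void main(String[] args) {
        byte[] block = new byte[] { (byte) ((300 & 0x7f) | 0x80), (byte) (300 >>> 7) };
        int pos = 0;
        final int length;
        if ((block[pos] & 0x80) == 0) {
          length = block[pos];                                          // 1-byte length
        } else {
          length = (block[pos] & 0x7f) + ((block[pos + 1] & 0xff) << 7); // 2-byte length
        }
        System.out.println(length); // prints 300
      }
    }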
@ -17,16 +17,17 @@ package org.apache.lucene.index;
|
|||
* limitations under the License.
|
||||
*/
|
||||
|
||||
import org.apache.lucene.store.IndexInput;
|
||||
import org.apache.lucene.store.IndexOutput;
|
||||
import java.io.IOException;
|
||||
|
||||
import org.apache.lucene.store.DataInput;
|
||||
import org.apache.lucene.store.DataOutput;
|
||||
|
||||
/* IndexInput that knows how to read the byte slices written
|
||||
* by Posting and PostingVector. We read the bytes in
|
||||
* each slice until we hit the end of that slice at which
|
||||
* point we read the forwarding address of the next slice
|
||||
* and then jump to it.*/
|
||||
final class ByteSliceReader extends IndexInput {
|
||||
final class ByteSliceReader extends DataInput {
|
||||
ByteBlockPool pool;
|
||||
int bufferUpto;
|
||||
byte[] buffer;
|
||||
|
@ -75,7 +76,7 @@ final class ByteSliceReader extends IndexInput {
|
|||
return buffer[upto++];
|
||||
}
|
||||
|
||||
public long writeTo(IndexOutput out) throws IOException {
|
||||
public long writeTo(DataOutput out) throws IOException {
|
||||
long size = 0;
|
||||
while(true) {
|
||||
if (limit + bufferOffset == endIndex) {
|
||||
|
@ -136,14 +137,4 @@ final class ByteSliceReader extends IndexInput {
|
|||
}
|
||||
}
|
||||
}
|
||||
|
||||
@Override
|
||||
public long getFilePointer() {throw new RuntimeException("not implemented");}
|
||||
@Override
|
||||
public long length() {throw new RuntimeException("not implemented");}
|
||||
@Override
|
||||
public void seek(long pos) {throw new RuntimeException("not implemented");}
|
||||
@Override
|
||||
public void close() {throw new RuntimeException("not implemented");}
|
||||
}
|
||||
|
||||
|
|
|
@ -1,5 +1,7 @@
|
|||
package org.apache.lucene.index;
|
||||
|
||||
import org.apache.lucene.store.DataOutput;
|
||||
|
||||
/**
|
||||
* Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
* contributor license agreements. See the NOTICE file distributed with
|
||||
|
@ -24,7 +26,7 @@ package org.apache.lucene.index;
|
|||
* posting list for many terms in RAM.
|
||||
*/
|
||||
|
||||
final class ByteSliceWriter {
|
||||
final class ByteSliceWriter extends DataOutput {
|
||||
|
||||
private byte[] slice;
|
||||
private int upto;
|
||||
|
@ -48,6 +50,7 @@ final class ByteSliceWriter {
|
|||
}
|
||||
|
||||
/** Write byte into byte slice stream */
|
||||
@Override
|
||||
public void writeByte(byte b) {
|
||||
assert slice != null;
|
||||
if (slice[upto] != 0) {
|
||||
|
@ -60,6 +63,7 @@ final class ByteSliceWriter {
|
|||
assert upto != slice.length;
|
||||
}
|
||||
|
||||
@Override
|
||||
public void writeBytes(final byte[] b, int offset, final int len) {
|
||||
final int offsetEnd = offset + len;
|
||||
while(offset < offsetEnd) {
|
||||
|
@ -78,12 +82,4 @@ final class ByteSliceWriter {
|
|||
public int getAddress() {
|
||||
return upto + (offset0 & DocumentsWriter.BYTE_BLOCK_NOT_MASK);
|
||||
}
|
||||
|
||||
public void writeVInt(int i) {
|
||||
while ((i & ~0x7F) != 0) {
|
||||
writeByte((byte)((i & 0x7f) | 0x80));
|
||||
i >>>= 7;
|
||||
}
|
||||
writeByte((byte) i);
|
||||
}
|
||||
}
|
|
@ -1,60 +0,0 @@
|
|||
package org.apache.lucene.index;
|
||||
|
||||
/**
|
||||
* Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
* contributor license agreements. See the NOTICE file distributed with
|
||||
* this work for additional information regarding copyright ownership.
|
||||
* The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
* (the "License"); you may not use this file except in compliance with
|
||||
* the License. You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
|
||||
import static org.apache.lucene.util.RamUsageEstimator.NUM_BYTES_OBJECT_REF;
|
||||
import org.apache.lucene.util.ArrayUtil;
|
||||
|
||||
final class CharBlockPool {
|
||||
|
||||
public char[][] buffers = new char[10][];
|
||||
int numBuffer;
|
||||
|
||||
int bufferUpto = -1; // Which buffer we are upto
|
||||
public int charUpto = DocumentsWriter.CHAR_BLOCK_SIZE; // Where we are in head buffer
|
||||
|
||||
public char[] buffer; // Current head buffer
|
||||
public int charOffset = -DocumentsWriter.CHAR_BLOCK_SIZE; // Current head offset
|
||||
final private DocumentsWriter docWriter;
|
||||
|
||||
public CharBlockPool(DocumentsWriter docWriter) {
|
||||
this.docWriter = docWriter;
|
||||
}
|
||||
|
||||
public void reset() {
|
||||
docWriter.recycleCharBlocks(buffers, 1+bufferUpto);
|
||||
bufferUpto = -1;
|
||||
charUpto = DocumentsWriter.CHAR_BLOCK_SIZE;
|
||||
charOffset = -DocumentsWriter.CHAR_BLOCK_SIZE;
|
||||
}
|
||||
|
||||
public void nextBuffer() {
|
||||
if (1+bufferUpto == buffers.length) {
|
||||
char[][] newBuffers = new char[ArrayUtil.oversize(buffers.length+1,
|
||||
NUM_BYTES_OBJECT_REF)][];
|
||||
System.arraycopy(buffers, 0, newBuffers, 0, buffers.length);
|
||||
buffers = newBuffers;
|
||||
}
|
||||
buffer = buffers[1+bufferUpto] = docWriter.getCharBlock();
|
||||
bufferUpto++;
|
||||
|
||||
charUpto = 0;
|
||||
charOffset += DocumentsWriter.CHAR_BLOCK_SIZE;
|
||||
}
|
||||
}
|
||||
|
|
@ -22,6 +22,9 @@ import org.apache.lucene.store.Directory;
|
|||
import org.apache.lucene.store.IndexInput;
|
||||
import org.apache.lucene.document.AbstractField; // for javadocs
|
||||
import org.apache.lucene.document.Document;
|
||||
import org.apache.lucene.index.codecs.CodecProvider;
|
||||
import org.apache.lucene.util.Bits;
|
||||
import org.apache.lucene.util.BytesRef;
|
||||
|
||||
import java.text.NumberFormat;
|
||||
import java.io.PrintStream;
|
||||
|
@ -122,6 +125,9 @@ public class CheckIndex {
|
|||
/** Name of the segment. */
|
||||
public String name;
|
||||
|
||||
/** Name of codec used to read this segment. */
|
||||
public String codec;
|
||||
|
||||
/** Document count (does not take deletions into account). */
|
||||
public int docCount;
|
||||
|
||||
|
@ -263,26 +269,6 @@ public class CheckIndex {
|
|||
infoStream.println(msg);
|
||||
}
|
||||
|
||||
private static class MySegmentTermDocs extends SegmentTermDocs {
|
||||
|
||||
int delCount;
|
||||
|
||||
MySegmentTermDocs(SegmentReader p) {
|
||||
super(p);
|
||||
}
|
||||
|
||||
@Override
|
||||
public void seek(Term term) throws IOException {
|
||||
super.seek(term);
|
||||
delCount = 0;
|
||||
}
|
||||
|
||||
@Override
|
||||
protected void skippingDoc() throws IOException {
|
||||
delCount++;
|
||||
}
|
||||
}
|
||||
|
||||
/** Returns a {@link Status} instance detailing
|
||||
* the state of the index.
|
||||
*
|
||||
|
@ -296,6 +282,10 @@ public class CheckIndex {
|
|||
return checkIndex(null);
|
||||
}
|
||||
|
||||
protected Status checkIndex(List<String> onlySegments) throws IOException {
|
||||
return checkIndex(onlySegments, CodecProvider.getDefault());
|
||||
}
|
||||
|
||||
/** Returns a {@link Status} instance detailing
|
||||
* the state of the index.
|
||||
*
|
||||
|
@ -308,13 +298,13 @@ public class CheckIndex {
|
|||
* <p><b>WARNING</b>: make sure
|
||||
* you only call this when the index is not opened by any
|
||||
* writer. */
|
||||
public Status checkIndex(List<String> onlySegments) throws IOException {
|
||||
protected Status checkIndex(List<String> onlySegments, CodecProvider codecs) throws IOException {
|
||||
NumberFormat nf = NumberFormat.getInstance();
|
||||
SegmentInfos sis = new SegmentInfos();
|
||||
Status result = new Status();
|
||||
result.dir = dir;
|
||||
try {
|
||||
sis.read(dir);
|
||||
sis.read(dir, codecs);
|
||||
} catch (Throwable t) {
|
||||
msg("ERROR: could not read any segments file in directory");
|
||||
result.missingSegments = true;
|
||||
|
@ -371,6 +361,8 @@ public class CheckIndex {
|
|||
sFormat = "FORMAT_USER_DATA [Lucene 2.9]";
|
||||
else if (format == SegmentInfos.FORMAT_DIAGNOSTICS)
|
||||
sFormat = "FORMAT_DIAGNOSTICS [Lucene 2.9]";
|
||||
else if (format == SegmentInfos.FORMAT_FLEX_POSTINGS)
|
||||
sFormat = "FORMAT_FLEX_POSTINGS [Lucene 3.1]";
|
||||
else if (format < SegmentInfos.CURRENT_FORMAT) {
|
||||
sFormat = "int=" + format + " [newer version of Lucene than this tool]";
|
||||
skip = true;
|
||||
|
@ -429,6 +421,9 @@ public class CheckIndex {
|
|||
SegmentReader reader = null;
|
||||
|
||||
try {
|
||||
final String codec = info.getCodec().name;
|
||||
msg(" codec=" + codec);
|
||||
segInfoStat.codec = codec;
|
||||
msg(" compound=" + info.getUseCompoundFile());
|
||||
segInfoStat.compound = info.getUseCompoundFile();
|
||||
msg(" hasProx=" + info.getHasProx());
|
||||
|
@ -452,6 +447,7 @@ public class CheckIndex {
|
|||
msg(" docStoreIsCompoundFile=" + info.getDocStoreIsCompoundFile());
|
||||
segInfoStat.docStoreCompoundFile = info.getDocStoreIsCompoundFile();
|
||||
}
|
||||
|
||||
final String delFileName = info.getDelFileName();
|
||||
if (delFileName == null){
|
||||
msg(" no deletions");
|
||||
|
@ -503,7 +499,7 @@ public class CheckIndex {
|
|||
segInfoStat.fieldNormStatus = testFieldNorms(fieldNames, reader);
|
||||
|
||||
// Test the Term Index
|
||||
segInfoStat.termIndexStatus = testTermIndex(info, reader);
|
||||
segInfoStat.termIndexStatus = testTermIndex(reader);
|
||||
|
||||
// Test Stored Fields
|
||||
segInfoStat.storedFieldStatus = testStoredFields(info, reader, nf);
|
||||
|
@ -586,69 +582,129 @@ public class CheckIndex {
|
|||
/**
|
||||
* Test the term index.
|
||||
*/
|
||||
private Status.TermIndexStatus testTermIndex(SegmentInfo info, SegmentReader reader) {
|
||||
private Status.TermIndexStatus testTermIndex(SegmentReader reader) {
|
||||
final Status.TermIndexStatus status = new Status.TermIndexStatus();
|
||||
|
||||
final int maxDoc = reader.maxDoc();
|
||||
final Bits delDocs = reader.getDeletedDocs();
|
||||
|
||||
try {
|
||||
|
||||
if (infoStream != null) {
|
||||
infoStream.print(" test: terms, freq, prox...");
|
||||
}
|
||||
|
||||
final TermEnum termEnum = reader.terms();
|
||||
final TermPositions termPositions = reader.termPositions();
|
||||
final Fields fields = reader.fields();
|
||||
if (fields == null) {
|
||||
msg("OK [no fields/terms]");
|
||||
return status;
|
||||
}
|
||||
|
||||
// Used only to count up # deleted docs for this term
|
||||
final MySegmentTermDocs myTermDocs = new MySegmentTermDocs(reader);
|
||||
final FieldsEnum fieldsEnum = fields.iterator();
|
||||
while(true) {
|
||||
final String field = fieldsEnum.next();
|
||||
if (field == null) {
|
||||
break;
|
||||
}
|
||||
|
||||
final int maxDoc = reader.maxDoc();
|
||||
final TermsEnum terms = fieldsEnum.terms();
|
||||
|
||||
while (termEnum.next()) {
|
||||
status.termCount++;
|
||||
final Term term = termEnum.term();
|
||||
final int docFreq = termEnum.docFreq();
|
||||
termPositions.seek(term);
|
||||
int lastDoc = -1;
|
||||
int freq0 = 0;
|
||||
status.totFreq += docFreq;
|
||||
while (termPositions.next()) {
|
||||
freq0++;
|
||||
final int doc = termPositions.doc();
|
||||
final int freq = termPositions.freq();
|
||||
if (doc <= lastDoc)
|
||||
throw new RuntimeException("term " + term + ": doc " + doc + " <= lastDoc " + lastDoc);
|
||||
if (doc >= maxDoc)
|
||||
throw new RuntimeException("term " + term + ": doc " + doc + " >= maxDoc " + maxDoc);
|
||||
DocsEnum docs = null;
|
||||
DocsAndPositionsEnum postings = null;
|
||||
|
||||
lastDoc = doc;
|
||||
if (freq <= 0)
|
||||
throw new RuntimeException("term " + term + ": doc " + doc + ": freq " + freq + " is out of bounds");
|
||||
boolean hasOrd = true;
|
||||
final long termCountStart = status.termCount;
|
||||
|
||||
int lastPos = -1;
|
||||
status.totPos += freq;
|
||||
for(int j=0;j<freq;j++) {
|
||||
final int pos = termPositions.nextPosition();
|
||||
if (pos < -1)
|
||||
throw new RuntimeException("term " + term + ": doc " + doc + ": pos " + pos + " is out of bounds");
|
||||
if (pos < lastPos)
|
||||
throw new RuntimeException("term " + term + ": doc " + doc + ": pos " + pos + " < lastPos " + lastPos);
|
||||
lastPos = pos;
|
||||
while(true) {
|
||||
|
||||
final BytesRef term = terms.next();
|
||||
if (term == null) {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// Now count how many deleted docs occurred in
|
||||
// this term:
|
||||
final int delCount;
|
||||
if (reader.hasDeletions()) {
|
||||
myTermDocs.seek(term);
|
||||
while(myTermDocs.next()) { }
|
||||
delCount = myTermDocs.delCount;
|
||||
} else {
|
||||
delCount = 0;
|
||||
}
|
||||
final int docFreq = terms.docFreq();
|
||||
status.totFreq += docFreq;
|
||||
|
||||
if (freq0 + delCount != docFreq) {
|
||||
throw new RuntimeException("term " + term + " docFreq=" +
|
||||
docFreq + " != num docs seen " + freq0 + " + num docs deleted " + delCount);
|
||||
docs = terms.docs(delDocs, docs);
|
||||
postings = terms.docsAndPositions(delDocs, postings);
|
||||
|
||||
if (hasOrd) {
|
||||
long ord = -1;
|
||||
try {
|
||||
ord = terms.ord();
|
||||
} catch (UnsupportedOperationException uoe) {
|
||||
hasOrd = false;
|
||||
}
|
||||
|
||||
if (hasOrd) {
|
||||
final long ordExpected = status.termCount - termCountStart;
|
||||
if (ord != ordExpected) {
|
||||
throw new RuntimeException("ord mismatch: TermsEnum has ord=" + ord + " vs actual=" + ordExpected);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
status.termCount++;
|
||||
|
||||
final DocsEnum docs2;
|
||||
if (postings != null) {
|
||||
docs2 = postings;
|
||||
} else {
|
||||
docs2 = docs;
|
||||
}
|
||||
|
||||
int lastDoc = -1;
|
||||
while(true) {
|
||||
final int doc = docs2.nextDoc();
|
||||
if (doc == DocsEnum.NO_MORE_DOCS) {
|
||||
break;
|
||||
}
|
||||
final int freq = docs2.freq();
|
||||
status.totPos += freq;
|
||||
|
||||
if (doc <= lastDoc) {
|
||||
throw new RuntimeException("term " + term + ": doc " + doc + " <= lastDoc " + lastDoc);
|
||||
}
|
||||
if (doc >= maxDoc) {
|
||||
throw new RuntimeException("term " + term + ": doc " + doc + " >= maxDoc " + maxDoc);
|
||||
}
|
||||
|
||||
lastDoc = doc;
|
||||
if (freq <= 0) {
|
||||
throw new RuntimeException("term " + term + ": doc " + doc + ": freq " + freq + " is out of bounds");
|
||||
}
|
||||
|
||||
int lastPos = -1;
|
||||
if (postings != null) {
|
||||
for(int j=0;j<freq;j++) {
|
||||
final int pos = postings.nextPosition();
|
||||
if (pos < -1) {
|
||||
throw new RuntimeException("term " + term + ": doc " + doc + ": pos " + pos + " is out of bounds");
|
||||
}
|
||||
if (pos < lastPos) {
|
||||
throw new RuntimeException("term " + term + ": doc " + doc + ": pos " + pos + " < lastPos " + lastPos);
|
||||
}
|
||||
lastPos = pos;
|
||||
if (postings.getPayloadLength() != 0) {
|
||||
postings.getPayload();
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Now count how many deleted docs occurred in
|
||||
// this term:
|
||||
|
||||
if (reader.hasDeletions()) {
|
||||
final DocsEnum docsNoDel = terms.docs(null, docs);
|
||||
int count = 0;
|
||||
while(docsNoDel.nextDoc() != DocsEnum.NO_MORE_DOCS) {
|
||||
count++;
|
||||
}
|
||||
if (count != docFreq) {
|
||||
throw new RuntimeException("term " + term + " docFreq=" + docFreq + " != tot docs w/o deletions " + count);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
|
|
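The hunk above walks the flex postings chain (TermsEnum, then DocsEnum / DocsAndPositionsEnum) while verifying an index. As a rough orientation only, not the checker itself, here is a minimal sketch of that same walk over one field. The index path, the field name "body", and the MultiFields helpers (named only in exception messages elsewhere in this patch) are assumptions, and exact signatures may differ at this revision.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;

// Hypothetical walk over one field's postings, mirroring the order of checks above.
public class WalkPostings {
  public static void main(String[] args) throws IOException {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File("/tmp/index")), true);
    try {
      Bits delDocs = MultiFields.getDeletedDocs(reader);           // assumed helper, see later hunks
      Fields fields = MultiFields.getFields(reader);               // top-level view over all segments
      Terms terms = fields == null ? null : fields.terms("body");  // "body" is an assumed field
      if (terms == null) {
        return;                                                    // field not indexed
      }
      TermsEnum termsEnum = terms.iterator();
      DocsAndPositionsEnum postings = null;
      long totPositions = 0;
      BytesRef term;
      while ((term = termsEnum.next()) != null) {                  // terms arrive in sorted order
        postings = termsEnum.docsAndPositions(delDocs, postings);
        if (postings == null) {
          continue;                                                // field was indexed without positions
        }
        int doc;
        while ((doc = postings.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
          final int freq = postings.freq();
          for (int i = 0; i < freq; i++) {
            postings.nextPosition();                               // at most freq() calls per doc
            totPositions++;
          }
        }
      }
      System.out.println("positions visited: " + totPositions);
    } finally {
      reader.close();
    }
  }
}
```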
@@ -31,8 +31,9 @@ import java.io.IOException;
 * Class for accessing a compound stream.
 * This class implements a directory, but is limited to only read operations.
 * Directory methods that would normally modify data throw an exception.
 * @lucene.experimental
 */
class CompoundFileReader extends Directory {
public class CompoundFileReader extends Directory {

  private int readBufferSize;

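With the class now public, callers outside org.apache.lucene.index can open a compound file directly as a read-only Directory. A minimal sketch is below; the index location, the segment file name "_0.cfs", and the (Directory, String) constructor shape are assumptions.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.CompoundFileReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IndexInput;

// Hypothetical peek inside a compound file via the now-public CompoundFileReader.
public class ListCompoundFile {
  public static void main(String[] args) throws IOException {
    Directory dir = FSDirectory.open(new File("/tmp/index"));        // assumed index location
    CompoundFileReader cfs = new CompoundFileReader(dir, "_0.cfs");  // assumed segment name
    try {
      for (String name : cfs.listAll()) {                            // read-only Directory view
        IndexInput in = cfs.openInput(name);
        System.out.println(name + ": " + in.length() + " bytes");
        in.close();
      }
    } finally {
      cfs.close();
      dir.close();
    }
  }
}
```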
@ -25,7 +25,7 @@ import java.util.Collection;
|
|||
import java.util.Collections;
|
||||
import java.util.HashMap;
|
||||
import java.util.HashSet;
|
||||
|
||||
import java.util.List;
|
||||
import java.util.Map;
|
||||
import java.util.Set;
|
||||
|
||||
|
@ -35,6 +35,11 @@ import org.apache.lucene.search.Similarity;
|
|||
import org.apache.lucene.store.Directory;
|
||||
import org.apache.lucene.store.Lock;
|
||||
import org.apache.lucene.store.LockObtainFailedException;
|
||||
import org.apache.lucene.index.codecs.CodecProvider;
|
||||
import org.apache.lucene.util.Bits;
|
||||
import org.apache.lucene.util.ReaderUtil;
|
||||
import org.apache.lucene.util.BytesRef;
|
||||
|
||||
import org.apache.lucene.search.FieldCache; // not great (circular); used only to purge FieldCache entry on close
|
||||
|
||||
/**
|
||||
|
@ -44,12 +49,13 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
protected Directory directory;
|
||||
protected boolean readOnly;
|
||||
|
||||
protected CodecProvider codecs;
|
||||
|
||||
IndexWriter writer;
|
||||
|
||||
private IndexDeletionPolicy deletionPolicy;
|
||||
private Lock writeLock;
|
||||
private SegmentInfos segmentInfos;
|
||||
private SegmentInfos segmentInfosStart;
|
||||
private boolean stale;
|
||||
private final int termInfosIndexDivisor;
|
||||
|
||||
|
@ -58,34 +64,57 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
|
||||
private SegmentReader[] subReaders;
|
||||
private int[] starts; // 1st docno for each segment
|
||||
private final Map<SegmentReader,ReaderUtil.Slice> subReaderToSlice = new HashMap<SegmentReader,ReaderUtil.Slice>();
|
||||
private Map<String,byte[]> normsCache = new HashMap<String,byte[]>();
|
||||
private int maxDoc = 0;
|
||||
private int numDocs = -1;
|
||||
private boolean hasDeletions = false;
|
||||
|
||||
// static IndexReader open(final Directory directory, final IndexDeletionPolicy deletionPolicy, final IndexCommit commit, final boolean readOnly,
|
||||
// final int termInfosIndexDivisor) throws CorruptIndexException, IOException {
|
||||
// return open(directory, deletionPolicy, commit, readOnly, termInfosIndexDivisor, null);
|
||||
// }
|
||||
|
||||
static IndexReader open(final Directory directory, final IndexDeletionPolicy deletionPolicy, final IndexCommit commit, final boolean readOnly,
|
||||
final int termInfosIndexDivisor) throws CorruptIndexException, IOException {
|
||||
final int termInfosIndexDivisor, CodecProvider codecs) throws CorruptIndexException, IOException {
|
||||
final CodecProvider codecs2;
|
||||
if (codecs == null) {
|
||||
codecs2 = CodecProvider.getDefault();
|
||||
} else {
|
||||
codecs2 = codecs;
|
||||
}
|
||||
return (IndexReader) new SegmentInfos.FindSegmentsFile(directory) {
|
||||
@Override
|
||||
protected Object doBody(String segmentFileName) throws CorruptIndexException, IOException {
|
||||
SegmentInfos infos = new SegmentInfos();
|
||||
infos.read(directory, segmentFileName);
|
||||
infos.read(directory, segmentFileName, codecs2);
|
||||
if (readOnly)
|
||||
return new ReadOnlyDirectoryReader(directory, infos, deletionPolicy, termInfosIndexDivisor);
|
||||
return new ReadOnlyDirectoryReader(directory, infos, deletionPolicy, termInfosIndexDivisor, codecs2);
|
||||
else
|
||||
return new DirectoryReader(directory, infos, deletionPolicy, false, termInfosIndexDivisor);
|
||||
return new DirectoryReader(directory, infos, deletionPolicy, false, termInfosIndexDivisor, codecs2);
|
||||
}
|
||||
}.run(commit);
|
||||
}
|
||||
|
||||
/** Construct reading the named set of readers. */
|
||||
DirectoryReader(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, boolean readOnly, int termInfosIndexDivisor) throws IOException {
|
||||
// DirectoryReader(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, boolean readOnly, int termInfosIndexDivisor) throws IOException {
|
||||
// this(directory, sis, deletionPolicy, readOnly, termInfosIndexDivisor, null);
|
||||
// }
|
||||
|
||||
/** Construct reading the named set of readers. */
|
||||
DirectoryReader(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, boolean readOnly, int termInfosIndexDivisor, CodecProvider codecs) throws IOException {
|
||||
this.directory = directory;
|
||||
this.readOnly = readOnly;
|
||||
this.segmentInfos = sis;
|
||||
this.deletionPolicy = deletionPolicy;
|
||||
this.termInfosIndexDivisor = termInfosIndexDivisor;
|
||||
|
||||
if (codecs == null) {
|
||||
this.codecs = CodecProvider.getDefault();
|
||||
} else {
|
||||
this.codecs = codecs;
|
||||
}
|
||||
|
||||
// To reduce the chance of hitting FileNotFound
|
||||
// (and having to retry), we open segments in
|
||||
// reverse because IndexWriter merges & deletes
|
||||
|
@ -115,12 +144,16 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
}
|
||||
|
||||
// Used by near real-time search
|
||||
DirectoryReader(IndexWriter writer, SegmentInfos infos, int termInfosIndexDivisor) throws IOException {
|
||||
DirectoryReader(IndexWriter writer, SegmentInfos infos, int termInfosIndexDivisor, CodecProvider codecs) throws IOException {
|
||||
this.directory = writer.getDirectory();
|
||||
this.readOnly = true;
|
||||
segmentInfos = infos;
|
||||
segmentInfosStart = (SegmentInfos) infos.clone();
|
||||
this.termInfosIndexDivisor = termInfosIndexDivisor;
|
||||
if (codecs == null) {
|
||||
this.codecs = CodecProvider.getDefault();
|
||||
} else {
|
||||
this.codecs = codecs;
|
||||
}
|
||||
|
||||
// IndexWriter synchronizes externally before calling
|
||||
// us, which ensures infos will not change; so there's
|
||||
|
@ -166,11 +199,17 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
|
||||
/** This constructor is only used for {@link #reopen()} */
|
||||
DirectoryReader(Directory directory, SegmentInfos infos, SegmentReader[] oldReaders, int[] oldStarts,
|
||||
Map<String,byte[]> oldNormsCache, boolean readOnly, boolean doClone, int termInfosIndexDivisor) throws IOException {
|
||||
Map<String,byte[]> oldNormsCache, boolean readOnly, boolean doClone, int termInfosIndexDivisor, CodecProvider codecs) throws IOException {
|
||||
this.directory = directory;
|
||||
this.readOnly = readOnly;
|
||||
this.segmentInfos = infos;
|
||||
this.termInfosIndexDivisor = termInfosIndexDivisor;
|
||||
if (codecs == null) {
|
||||
this.codecs = CodecProvider.getDefault();
|
||||
} else {
|
||||
this.codecs = codecs;
|
||||
}
|
||||
|
||||
|
||||
// we put the old SegmentReaders in a map, that allows us
|
||||
// to lookup a reader using its segment name
|
||||
|
@ -296,24 +335,44 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
buffer.append(' ');
|
||||
}
|
||||
buffer.append(subReaders[i]);
|
||||
buffer.append(' ');
|
||||
}
|
||||
buffer.append(')');
|
||||
return buffer.toString();
|
||||
}
|
||||
|
||||
private void initialize(SegmentReader[] subReaders) {
|
||||
private void initialize(SegmentReader[] subReaders) throws IOException {
|
||||
this.subReaders = subReaders;
|
||||
starts = new int[subReaders.length + 1]; // build starts array
|
||||
|
||||
final List<Fields> subFields = new ArrayList<Fields>();
|
||||
final List<ReaderUtil.Slice> fieldSlices = new ArrayList<ReaderUtil.Slice>();
|
||||
|
||||
for (int i = 0; i < subReaders.length; i++) {
|
||||
starts[i] = maxDoc;
|
||||
maxDoc += subReaders[i].maxDoc(); // compute maxDocs
|
||||
|
||||
if (subReaders[i].hasDeletions())
|
||||
if (subReaders[i].hasDeletions()) {
|
||||
hasDeletions = true;
|
||||
}
|
||||
|
||||
final ReaderUtil.Slice slice = new ReaderUtil.Slice(starts[i], subReaders[i].maxDoc(), i);
|
||||
subReaderToSlice.put(subReaders[i], slice);
|
||||
|
||||
final Fields f = subReaders[i].fields();
|
||||
if (f != null) {
|
||||
subFields.add(f);
|
||||
fieldSlices.add(slice);
|
||||
}
|
||||
}
|
||||
starts[subReaders.length] = maxDoc;
|
||||
}
|
||||
|
||||
@Override
|
||||
public Bits getDeletedDocs() {
|
||||
throw new UnsupportedOperationException("please use MultiFields.getDeletedDocs if you really need a top level Bits deletedDocs (NOTE that it's usually better to work per segment instead)");
|
||||
}
|
||||
|
||||
@Override
|
||||
public final synchronized Object clone() {
|
||||
try {
|
||||
|
@ -435,7 +494,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
@Override
|
||||
protected Object doBody(String segmentFileName) throws CorruptIndexException, IOException {
|
||||
SegmentInfos infos = new SegmentInfos();
|
||||
infos.read(directory, segmentFileName);
|
||||
infos.read(directory, segmentFileName, codecs);
|
||||
return doReopen(infos, false, openReadOnly);
|
||||
}
|
||||
}.run(commit);
|
||||
|
@ -444,9 +503,9 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
private synchronized DirectoryReader doReopen(SegmentInfos infos, boolean doClone, boolean openReadOnly) throws CorruptIndexException, IOException {
|
||||
DirectoryReader reader;
|
||||
if (openReadOnly) {
|
||||
reader = new ReadOnlyDirectoryReader(directory, infos, subReaders, starts, normsCache, doClone, termInfosIndexDivisor);
|
||||
reader = new ReadOnlyDirectoryReader(directory, infos, subReaders, starts, normsCache, doClone, termInfosIndexDivisor, null);
|
||||
} else {
|
||||
reader = new DirectoryReader(directory, infos, subReaders, starts, normsCache, false, doClone, termInfosIndexDivisor);
|
||||
reader = new DirectoryReader(directory, infos, subReaders, starts, normsCache, false, doClone, termInfosIndexDivisor, null);
|
||||
}
|
||||
return reader;
|
||||
}
|
||||
|
@ -640,7 +699,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
// Optimize single segment case:
|
||||
return subReaders[0].terms();
|
||||
} else {
|
||||
return new MultiTermEnum(this, subReaders, starts, null);
|
||||
return new MultiTermEnum(this, subReaders, starts, null);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -664,6 +723,16 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
return total;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int docFreq(String field, BytesRef term) throws IOException {
|
||||
ensureOpen();
|
||||
int total = 0; // sum freqs in segments
|
||||
for (int i = 0; i < subReaders.length; i++) {
|
||||
total += subReaders[i].docFreq(field, term);
|
||||
}
|
||||
return total;
|
||||
}
|
||||
|
||||
@Override
|
||||
public TermDocs termDocs() throws IOException {
|
||||
ensureOpen();
|
||||
|
@ -686,6 +755,11 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
}
|
||||
}
|
||||
|
||||
@Override
|
||||
public Fields fields() throws IOException {
|
||||
throw new UnsupportedOperationException("please use MultiFields.getFields if you really need a top level Fields (NOTE that it's usually better to work per segment instead)");
|
||||
}
|
||||
|
||||
@Override
|
||||
public TermPositions termPositions() throws IOException {
|
||||
ensureOpen();
|
||||
|
@ -731,7 +805,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
|
||||
// we have to check whether index has changed since this reader was opened.
|
||||
// if so, this reader is no longer valid for deletion
|
||||
if (SegmentInfos.readCurrentVersion(directory) > segmentInfos.getVersion()) {
|
||||
if (SegmentInfos.readCurrentVersion(directory, codecs) > segmentInfos.getVersion()) {
|
||||
stale = true;
|
||||
this.writeLock.release();
|
||||
this.writeLock = null;
|
||||
|
@ -751,13 +825,18 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
*/
|
||||
@Override
|
||||
protected void doCommit(Map<String,String> commitUserData) throws IOException {
|
||||
// poll subreaders for changes
|
||||
for (int i = 0; !hasChanges && i < subReaders.length; i++) {
|
||||
hasChanges |= subReaders[i].hasChanges;
|
||||
}
|
||||
|
||||
if (hasChanges) {
|
||||
segmentInfos.setUserData(commitUserData);
|
||||
// Default deleter (for backwards compatibility) is
|
||||
// KeepOnlyLastCommitDeleter:
|
||||
IndexFileDeleter deleter = new IndexFileDeleter(directory,
|
||||
deletionPolicy == null ? new KeepOnlyLastCommitDeletionPolicy() : deletionPolicy,
|
||||
segmentInfos, null, null);
|
||||
segmentInfos, null, null, codecs);
|
||||
|
||||
// Checkpoint the state we are about to change, in
|
||||
// case we have to roll back:
|
||||
|
@ -827,21 +906,31 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
}
|
||||
}
|
||||
|
||||
@Override
|
||||
public long getUniqueTermCount() throws IOException {
|
||||
throw new UnsupportedOperationException("");
|
||||
}
|
||||
|
||||
@Override
|
||||
public Map<String,String> getCommitUserData() {
|
||||
ensureOpen();
|
||||
return segmentInfos.getUserData();
|
||||
}
|
||||
|
||||
/**
|
||||
* Check whether this IndexReader is still using the current (i.e., most recently committed) version of the index. If
|
||||
* a writer has committed any changes to the index since this reader was opened, this will return <code>false</code>,
|
||||
* in which case you must open a new IndexReader in order
|
||||
* to see the changes. Use {@link IndexWriter#commit} to
|
||||
* commit changes to the index.
|
||||
*
|
||||
* @throws CorruptIndexException if the index is corrupt
|
||||
* @throws IOException if there is a low-level IO error
|
||||
*/
|
||||
@Override
|
||||
public boolean isCurrent() throws CorruptIndexException, IOException {
|
||||
ensureOpen();
|
||||
if (writer == null || writer.isClosed()) {
|
||||
// we loaded SegmentInfos from the directory
|
||||
return SegmentInfos.readCurrentVersion(directory) == segmentInfos.getVersion();
|
||||
} else {
|
||||
return writer.nrtIsCurrent(segmentInfosStart);
|
||||
}
|
||||
return SegmentInfos.readCurrentVersion(directory, codecs) == segmentInfos.getVersion();
|
||||
}
|
||||
|
||||
@Override
|
||||
|
@ -893,6 +982,11 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
return subReaders;
|
||||
}
|
||||
|
||||
@Override
|
||||
public int getSubReaderDocBase(IndexReader subReader) {
|
||||
return subReaderToSlice.get(subReader).start;
|
||||
}
|
||||
|
||||
/** Returns the directory this index resides in. */
|
||||
@Override
|
||||
public Directory directory() {
|
||||
|
@ -919,12 +1013,17 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
|
||||
/** @see org.apache.lucene.index.IndexReader#listCommits */
|
||||
public static Collection<IndexCommit> listCommits(Directory dir) throws IOException {
|
||||
return listCommits(dir, CodecProvider.getDefault());
|
||||
}
|
||||
|
||||
/** @see org.apache.lucene.index.IndexReader#listCommits */
|
||||
public static Collection<IndexCommit> listCommits(Directory dir, CodecProvider codecs) throws IOException {
|
||||
final String[] files = dir.listAll();
|
||||
|
||||
Collection<IndexCommit> commits = new ArrayList<IndexCommit>();
|
||||
|
||||
SegmentInfos latest = new SegmentInfos();
|
||||
latest.read(dir);
|
||||
latest.read(dir, codecs);
|
||||
final long currentGen = latest.getGeneration();
|
||||
|
||||
commits.add(new ReaderCommit(latest, dir));
|
||||
|
@ -941,7 +1040,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
try {
|
||||
// IOException allowed to throw there, in case
|
||||
// segments_N is corrupt
|
||||
sis.read(dir, fileName);
|
||||
sis.read(dir, fileName, codecs);
|
||||
} catch (FileNotFoundException fnfe) {
|
||||
// LUCENE-948: on NFS (and maybe others), if
|
||||
// you have writers switching back and forth
|
||||
|
@ -1021,29 +1120,33 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
}
|
||||
}
|
||||
|
||||
// @deprecated This is pre-flex API
|
||||
// Exposes pre-flex API by doing on-the-fly merging
|
||||
// pre-flex API to each segment
|
||||
static class MultiTermEnum extends TermEnum {
|
||||
IndexReader topReader; // used for matching TermEnum to TermDocs
|
||||
private SegmentMergeQueue queue;
|
||||
private LegacySegmentMergeQueue queue;
|
||||
|
||||
private Term term;
|
||||
private int docFreq;
|
||||
final SegmentMergeInfo[] matchingSegments; // null terminated array of matching segments
|
||||
final LegacySegmentMergeInfo[] matchingSegments; // null terminated array of matching segments
|
||||
|
||||
public MultiTermEnum(IndexReader topReader, IndexReader[] readers, int[] starts, Term t)
|
||||
throws IOException {
|
||||
this.topReader = topReader;
|
||||
queue = new SegmentMergeQueue(readers.length);
|
||||
matchingSegments = new SegmentMergeInfo[readers.length+1];
|
||||
queue = new LegacySegmentMergeQueue(readers.length);
|
||||
matchingSegments = new LegacySegmentMergeInfo[readers.length+1];
|
||||
for (int i = 0; i < readers.length; i++) {
|
||||
IndexReader reader = readers[i];
|
||||
TermEnum termEnum;
|
||||
|
||||
if (t != null) {
|
||||
termEnum = reader.terms(t);
|
||||
} else
|
||||
} else {
|
||||
termEnum = reader.terms();
|
||||
}
|
||||
|
||||
SegmentMergeInfo smi = new SegmentMergeInfo(starts[i], termEnum, reader);
|
||||
LegacySegmentMergeInfo smi = new LegacySegmentMergeInfo(starts[i], termEnum, reader);
|
||||
smi.ord = i;
|
||||
if (t == null ? smi.next() : termEnum.term() != null)
|
||||
queue.add(smi); // initialize queue
|
||||
|
@ -1059,7 +1162,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
@Override
|
||||
public boolean next() throws IOException {
|
||||
for (int i=0; i<matchingSegments.length; i++) {
|
||||
SegmentMergeInfo smi = matchingSegments[i];
|
||||
LegacySegmentMergeInfo smi = matchingSegments[i];
|
||||
if (smi==null) break;
|
||||
if (smi.next())
|
||||
queue.add(smi);
|
||||
|
@ -1070,7 +1173,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
int numMatchingSegments = 0;
|
||||
matchingSegments[0] = null;
|
||||
|
||||
SegmentMergeInfo top = queue.top();
|
||||
LegacySegmentMergeInfo top = queue.top();
|
||||
|
||||
if (top == null) {
|
||||
term = null;
|
||||
|
@ -1107,6 +1210,9 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
}
|
||||
}
|
||||
|
||||
// @deprecated This is pre-flex API
|
||||
// Exposes pre-flex API by doing on-the-fly merging
|
||||
// pre-flex API to each segment
|
||||
static class MultiTermDocs implements TermDocs {
|
||||
IndexReader topReader; // used for matching TermEnum to TermDocs
|
||||
protected IndexReader[] readers;
|
||||
|
@ -1121,7 +1227,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
|
||||
private MultiTermEnum tenum; // the term enum used for seeking... can be null
|
||||
int matchingSegmentPos; // position into the matching segments from tenum
|
||||
SegmentMergeInfo smi; // current segment mere info... can be null
|
||||
LegacySegmentMergeInfo smi; // current segment mere info... can be null
|
||||
|
||||
public MultiTermDocs(IndexReader topReader, IndexReader[] r, int[] s) {
|
||||
this.topReader = topReader;
|
||||
|
@ -1217,7 +1323,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
return true;
|
||||
} else if (pointer < readers.length) {
|
||||
if (tenum != null) {
|
||||
SegmentMergeInfo smi = tenum.matchingSegments[matchingSegmentPos++];
|
||||
LegacySegmentMergeInfo smi = tenum.matchingSegments[matchingSegmentPos++];
|
||||
if (smi==null) {
|
||||
pointer = readers.length;
|
||||
return false;
|
||||
|
@ -1258,6 +1364,9 @@ class DirectoryReader extends IndexReader implements Cloneable {
|
|||
}
|
||||
}
|
||||
|
||||
// @deprecated This is pre-flex API
|
||||
// Exposes pre-flex API by doing on-the-fly merging
|
||||
// pre-flex API to each segment
|
||||
static class MultiTermPositions extends MultiTermDocs implements TermPositions {
|
||||
public MultiTermPositions(IndexReader topReader, IndexReader[] r, int[] s) {
|
||||
super(topReader,r,s);
|
||||
|
|
|
@@ -67,7 +67,7 @@ final class DocFieldProcessor extends DocConsumer {
    // consumer can alter the FieldInfo* if necessary. EG,
    // FreqProxTermsWriter does this with
    // FieldInfo.storePayload.
    final String fileName = state.segmentFileName(IndexFileNames.FIELD_INFOS_EXTENSION);
    final String fileName = IndexFileNames.segmentFileName(state.segmentName, IndexFileNames.FIELD_INFOS_EXTENSION);
    fieldInfos.write(state.directory, fileName);
    state.flushedFiles.add(fileName);
  }

@@ -113,8 +113,9 @@ final class DocFieldProcessorPerThread extends DocConsumerPerThread {
      else
        lastPerField.next = perField.next;

      if (state.docWriter.infoStream != null)
        state.docWriter.infoStream.println(" purge field=" + perField.fieldInfo.name);
      if (state.infoStream != null) {
        state.infoStream.println(" purge field=" + perField.fieldInfo.name);
      }

      totalFieldCount--;

@@ -247,7 +248,7 @@ final class DocFieldProcessorPerThread extends DocConsumerPerThread {
      fields[i].consumer.processFields(fields[i].fields, fields[i].fieldCount);

    if (docState.maxTermPrefix != null && docState.infoStream != null) {
      docState.infoStream.println("WARNING: document contains at least one immense term (longer than the max length " + DocumentsWriter.MAX_TERM_LENGTH + "), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '" + docState.maxTermPrefix + "...'");
      docState.infoStream.println("WARNING: document contains at least one immense term (whose UTF8 encoding is longer than the max length " + DocumentsWriter.MAX_TERM_LENGTH_UTF8 + "), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '" + docState.maxTermPrefix + "...'");
      docState.maxTermPrefix = null;
    }

@@ -116,8 +116,9 @@ final class DocInverterPerField extends DocFieldConsumerPerField {
        reader = readerValue;
      else {
        String stringValue = field.stringValue();
        if (stringValue == null)
        if (stringValue == null) {
          throw new IllegalArgumentException("field must have either TokenStream, String or Reader value");
        }
        perThread.stringReader.init(stringValue);
        reader = perThread.stringReader;
      }

@@ -21,7 +21,7 @@ import java.io.IOException;

import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** This is a DocFieldConsumer that inverts each field,
 *  separately, from a Document, and accepts a

@@ -34,16 +34,16 @@ final class DocInverterPerThread extends DocFieldConsumerPerThread {
  final SingleTokenAttributeSource singleToken = new SingleTokenAttributeSource();

  static class SingleTokenAttributeSource extends AttributeSource {
    final TermAttribute termAttribute;
    final CharTermAttribute termAttribute;
    final OffsetAttribute offsetAttribute;

    private SingleTokenAttributeSource() {
      termAttribute = addAttribute(TermAttribute.class);
      termAttribute = addAttribute(CharTermAttribute.class);
      offsetAttribute = addAttribute(OffsetAttribute.class);
    }

    public void reinit(String stringValue, int startOffset, int endOffset) {
      termAttribute.setTermBuffer(stringValue);
      termAttribute.setEmpty().append(stringValue);
      offsetAttribute.setOffset(startOffset, endOffset);
    }
  }

@@ -0,0 +1,44 @@
package org.apache.lucene.index;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;

import org.apache.lucene.util.BytesRef;

/** Also iterates through positions. */
public abstract class DocsAndPositionsEnum extends DocsEnum {

  /** Returns the next position.  You should only call this
   *  up to {@link DocsEnum#freq()} times else
   *  the behavior is not defined. */
  public abstract int nextPosition() throws IOException;

  /** Returns length of payload at current position */
  public abstract int getPayloadLength();

  /** Returns the payload at this position, or null if no
   *  payload was indexed. */
  public abstract BytesRef getPayload() throws IOException;

  public abstract boolean hasPayload();

  public final int read(int[] docs, int[] freqs) {
    throw new UnsupportedOperationException();
  }
}

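The contract of this new class is that nextPosition() is called at most freq() times per document, and that the payload is only fetched after checking for one. A small hedged helper built only on the methods declared above, with the enum assumed to be freshly obtained from a TermsEnum:

```java
import java.io.IOException;

import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper: counts positions that carry a payload in one postings enum.
final class PayloadCounter {
  static int countPayloads(DocsAndPositionsEnum postings) throws IOException {
    int withPayload = 0;
    int doc;
    while ((doc = postings.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
      final int freq = postings.freq();
      for (int i = 0; i < freq; i++) {          // never more than freq() calls per doc
        postings.nextPosition();
        if (postings.hasPayload()) {
          BytesRef payload = postings.getPayload();  // may be null if nothing was indexed
          if (payload != null) {
            withPayload++;
          }
        }
      }
    }
    return withPayload;
  }
}
```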
@@ -0,0 +1,93 @@
package org.apache.lucene.index;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.IntsRef;

/** Iterates through the documents, term freq and positions.
 *  NOTE: you must first call {@link #nextDoc}.
 *
 *  @lucene.experimental */
public abstract class DocsEnum extends DocIdSetIterator {

  private AttributeSource atts = null;

  /** Returns term frequency in the current document.  Do
   *  not call this before {@link #nextDoc} is first called,
   *  nor after {@link #nextDoc} returns NO_MORE_DOCS. */
  public abstract int freq();

  /** Returns the related attributes. */
  public AttributeSource attributes() {
    if (atts == null) atts = new AttributeSource();
    return atts;
  }

  // TODO: maybe add bulk read only docIDs (for eventual
  // match-only scoring)

  public static class BulkReadResult {
    public final IntsRef docs = new IntsRef();
    public final IntsRef freqs = new IntsRef();
  }

  protected BulkReadResult bulkResult;

  protected final void initBulkResult() {
    if (bulkResult == null) {
      bulkResult = new BulkReadResult();
      bulkResult.docs.ints = new int[64];
      bulkResult.freqs.ints = new int[64];
    }
  }

  public BulkReadResult getBulkResult() {
    initBulkResult();
    return bulkResult;
  }

  /** Bulk read (docs and freqs).  After this is called,
   *  {@link #docID()} and {@link #freq} are undefined.  This
   *  returns the count read, or 0 if the end is reached.
   *  The IntsRef for docs and freqs will not have their
   *  length set.
   *
   *  <p>NOTE: the default impl simply delegates to {@link
   *  #nextDoc}, but subclasses may do this more
   *  efficiently. */
  public int read() throws IOException {
    int count = 0;
    final int[] docs = bulkResult.docs.ints;
    final int[] freqs = bulkResult.freqs.ints;
    while(count < docs.length) {
      final int doc = nextDoc();
      if (doc != NO_MORE_DOCS) {
        docs[count] = doc;
        freqs[count] = freq();
        count++;
      } else {
        break;
      }
    }
    return count;
  }
}
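The bulk API above fills the reusable BulkReadResult arrays and returns how many entries were read, with 0 signalling exhaustion. A minimal sketch of a caller, assuming the DocsEnum was obtained from a TermsEnum as elsewhere in this patch, and calling getBulkResult() first so the buffers are initialized:

```java
import java.io.IOException;

import org.apache.lucene.index.DocsEnum;

// Hypothetical helper: sums term frequencies using the bulk read() API.
final class FreqSummer {
  static long sumFreqs(DocsEnum docsEnum) throws IOException {
    long total = 0;
    final DocsEnum.BulkReadResult bulk = docsEnum.getBulkResult();  // initializes the buffers
    while (true) {
      final int count = docsEnum.read();        // fills bulk.docs.ints / bulk.freqs.ints
      if (count == 0) {
        break;                                  // 0 means the enum is exhausted
      }
      for (int i = 0; i < count; i++) {
        total += bulk.freqs.ints[i];
      }
    }
    return total;
  }
}
```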
@ -30,6 +30,7 @@ import java.util.Map.Entry;
|
|||
|
||||
import org.apache.lucene.analysis.Analyzer;
|
||||
import org.apache.lucene.document.Document;
|
||||
import org.apache.lucene.index.codecs.Codec;
|
||||
import org.apache.lucene.search.IndexSearcher;
|
||||
import org.apache.lucene.search.Query;
|
||||
import org.apache.lucene.search.Scorer;
|
||||
|
@ -41,6 +42,7 @@ import org.apache.lucene.store.RAMFile;
|
|||
import org.apache.lucene.util.ArrayUtil;
|
||||
import org.apache.lucene.util.Constants;
|
||||
import org.apache.lucene.util.ThreadInterruptedException;
|
||||
import org.apache.lucene.util.BytesRef;
|
||||
import org.apache.lucene.util.RamUsageEstimator;
|
||||
|
||||
/**
|
||||
|
@ -282,7 +284,6 @@ final class DocumentsWriter {
|
|||
|
||||
// If we've allocated 5% over our RAM budget, we then
|
||||
// free down to 95%
|
||||
private long freeTrigger = (long) (IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB*1024*1024*1.05);
|
||||
private long freeLevel = (long) (IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB*1024*1024*0.95);
|
||||
|
||||
// Flush @ this number of docs. If ramBufferSize is
|
||||
|
@ -353,7 +354,6 @@ final class DocumentsWriter {
|
|||
ramBufferSize = (long) (mb*1024*1024);
|
||||
waitQueuePauseBytes = (long) (ramBufferSize*0.1);
|
||||
waitQueueResumeBytes = (long) (ramBufferSize*0.05);
|
||||
freeTrigger = (long) (1.05 * ramBufferSize);
|
||||
freeLevel = (long) (0.95 * ramBufferSize);
|
||||
}
|
||||
}
|
||||
|
@ -550,7 +550,6 @@ final class DocumentsWriter {
|
|||
flushPending = false;
|
||||
for(int i=0;i<threadStates.length;i++)
|
||||
threadStates[i].doAfterFlush();
|
||||
numBytesUsed = 0;
|
||||
}
|
||||
|
||||
// Returns true if an abort is in progress
|
||||
|
@ -590,7 +589,14 @@ final class DocumentsWriter {
|
|||
|
||||
synchronized private void initFlushState(boolean onlyDocStore) {
|
||||
initSegmentName(onlyDocStore);
|
||||
flushState = new SegmentWriteState(this, directory, segment, docStoreSegment, numDocsInRAM, numDocsInStore, writer.getConfig().getTermIndexInterval());
|
||||
flushState = new SegmentWriteState(infoStream, directory, segment, docFieldProcessor.fieldInfos,
|
||||
docStoreSegment, numDocsInRAM, numDocsInStore, writer.getConfig().getTermIndexInterval(),
|
||||
writer.codecs);
|
||||
}
|
||||
|
||||
/** Returns the codec used to flush the last segment */
|
||||
Codec getCodec() {
|
||||
return flushState.codec;
|
||||
}
|
||||
|
||||
/** Flush all pending docs to a new segment */
|
||||
|
@ -628,9 +634,10 @@ final class DocumentsWriter {
|
|||
consumer.flush(threads, flushState);
|
||||
|
||||
if (infoStream != null) {
|
||||
SegmentInfo si = new SegmentInfo(flushState.segmentName, flushState.numDocs, directory);
|
||||
SegmentInfo si = new SegmentInfo(flushState.segmentName, flushState.numDocs, directory, flushState.codec);
|
||||
si.setHasProx(hasProx());
|
||||
final long newSegmentSize = si.sizeInBytes();
|
||||
String message = " oldRAMSize=" + numBytesUsed +
|
||||
String message = " ramUsed=" + nf.format(((double) numBytesUsed)/1024./1024.) + " MB" +
|
||||
" newFlushedSize=" + newSegmentSize +
|
||||
" docs/MB=" + nf.format(numDocsInRAM/(newSegmentSize/1024./1024.)) +
|
||||
" new/old=" + nf.format(100.0*newSegmentSize/numBytesUsed) + "%";
|
||||
|
@ -659,8 +666,9 @@ final class DocumentsWriter {
|
|||
|
||||
CompoundFileWriter cfsWriter = new CompoundFileWriter(directory,
|
||||
IndexFileNames.segmentFileName(segment, IndexFileNames.COMPOUND_FILE_EXTENSION));
|
||||
for (final String flushedFile : flushState.flushedFiles)
|
||||
cfsWriter.addFile(flushedFile);
|
||||
for(String fileName : flushState.flushedFiles) {
|
||||
cfsWriter.addFile(fileName);
|
||||
}
|
||||
|
||||
// Perform the merge
|
||||
cfsWriter.close();
|
||||
|
@ -1032,28 +1040,58 @@ final class DocumentsWriter {
|
|||
|
||||
// Delete by term
|
||||
if (deletesFlushed.terms.size() > 0) {
|
||||
TermDocs docs = reader.termDocs();
|
||||
try {
|
||||
Fields fields = reader.fields();
|
||||
TermsEnum termsEnum = null;
|
||||
|
||||
String currentField = null;
|
||||
BytesRef termRef = new BytesRef();
|
||||
DocsEnum docs = null;
|
||||
|
||||
for (Entry<Term, BufferedDeletes.Num> entry: deletesFlushed.terms.entrySet()) {
|
||||
Term term = entry.getKey();
|
||||
// LUCENE-2086: we should be iterating a TreeMap,
|
||||
// here, so terms better be in order:
|
||||
// Since we visit terms sorted, we gain performance
|
||||
// by re-using the same TermsEnum and seeking only
|
||||
// forwards
|
||||
if (term.field() != currentField) {
|
||||
assert currentField == null || currentField.compareTo(term.field()) < 0;
|
||||
currentField = term.field();
|
||||
Terms terms = fields.terms(currentField);
|
||||
if (terms != null) {
|
||||
termsEnum = terms.iterator();
|
||||
} else {
|
||||
termsEnum = null;
|
||||
}
|
||||
}
|
||||
|
||||
if (termsEnum == null) {
|
||||
continue;
|
||||
}
|
||||
assert checkDeleteTerm(term);
|
||||
docs.seek(term);
|
||||
int limit = entry.getValue().getNum();
|
||||
while (docs.next()) {
|
||||
int docID = docs.doc();
|
||||
if (docIDStart+docID >= limit)
|
||||
break;
|
||||
reader.deleteDocument(docID);
|
||||
any = true;
|
||||
|
||||
termRef.copy(term.text());
|
||||
|
||||
if (termsEnum.seek(termRef, false) == TermsEnum.SeekStatus.FOUND) {
|
||||
DocsEnum docsEnum = termsEnum.docs(reader.getDeletedDocs(), docs);
|
||||
|
||||
if (docsEnum != null) {
|
||||
docs = docsEnum;
|
||||
int limit = entry.getValue().getNum();
|
||||
while (true) {
|
||||
final int docID = docs.nextDoc();
|
||||
if (docID == DocsEnum.NO_MORE_DOCS || docIDStart+docID >= limit) {
|
||||
break;
|
||||
}
|
||||
reader.deleteDocument(docID);
|
||||
any = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
} finally {
|
||||
docs.close();
|
||||
//docs.close();
|
||||
}
|
||||
}
|
||||
|
||||
// Delete by docID
|
||||
for (Integer docIdInt : deletesFlushed.docIDs) {
|
||||
int docID = docIdInt.intValue();
|
||||
|
@ -1118,7 +1156,7 @@ final class DocumentsWriter {
|
|||
}
|
||||
|
||||
synchronized boolean doBalanceRAM() {
|
||||
return ramBufferSize != IndexWriterConfig.DISABLE_AUTO_FLUSH && !bufferIsFull && (numBytesUsed+deletesInRAM.bytesUsed+deletesFlushed.bytesUsed >= ramBufferSize || numBytesAlloc >= freeTrigger);
|
||||
return ramBufferSize != IndexWriterConfig.DISABLE_AUTO_FLUSH && !bufferIsFull && (numBytesUsed+deletesInRAM.bytesUsed+deletesFlushed.bytesUsed >= ramBufferSize);
|
||||
}
|
||||
|
||||
/** Does the synchronized work to finish/flush the
|
||||
|
@ -1201,7 +1239,6 @@ final class DocumentsWriter {
|
|||
return numBytesUsed + deletesInRAM.bytesUsed + deletesFlushed.bytesUsed;
|
||||
}
|
||||
|
||||
long numBytesAlloc;
|
||||
long numBytesUsed;
|
||||
|
||||
NumberFormat nf = NumberFormat.getInstance();
|
||||
|
@ -1243,6 +1280,8 @@ final class DocumentsWriter {
|
|||
final static int BYTE_BLOCK_MASK = BYTE_BLOCK_SIZE - 1;
|
||||
final static int BYTE_BLOCK_NOT_MASK = ~BYTE_BLOCK_MASK;
|
||||
|
||||
final static int MAX_TERM_LENGTH_UTF8 = BYTE_BLOCK_SIZE-2;
|
||||
|
||||
private class ByteBlockAllocator extends ByteBlockPool.Allocator {
|
||||
final int blockSize;
|
||||
|
||||
|
@ -1259,19 +1298,16 @@ final class DocumentsWriter {
|
|||
final int size = freeByteBlocks.size();
|
||||
final byte[] b;
|
||||
if (0 == size) {
|
||||
b = new byte[blockSize];
|
||||
// Always record a block allocated, even if
|
||||
// trackAllocations is false. This is necessary
|
||||
// because this block will be shared between
|
||||
// things that don't track allocations (term
|
||||
// vectors) and things that do (freq/prox
|
||||
// postings).
|
||||
numBytesAlloc += blockSize;
|
||||
b = new byte[blockSize];
|
||||
numBytesUsed += blockSize;
|
||||
} else
|
||||
b = freeByteBlocks.remove(size-1);
|
||||
if (trackAllocations)
|
||||
numBytesUsed += blockSize;
|
||||
assert numBytesUsed <= numBytesAlloc;
|
||||
return b;
|
||||
}
|
||||
}
|
||||
|
@ -1291,7 +1327,7 @@ final class DocumentsWriter {
|
|||
final int size = blocks.size();
|
||||
for(int i=0;i<size;i++)
|
||||
freeByteBlocks.add(blocks.get(i));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -1308,30 +1344,21 @@ final class DocumentsWriter {
|
|||
final int size = freeIntBlocks.size();
|
||||
final int[] b;
|
||||
if (0 == size) {
|
||||
b = new int[INT_BLOCK_SIZE];
|
||||
// Always record a block allocated, even if
|
||||
// trackAllocations is false. This is necessary
|
||||
// because this block will be shared between
|
||||
// things that don't track allocations (term
|
||||
// vectors) and things that do (freq/prox
|
||||
// postings).
|
||||
numBytesAlloc += INT_BLOCK_SIZE*INT_NUM_BYTE;
|
||||
b = new int[INT_BLOCK_SIZE];
|
||||
numBytesUsed += INT_BLOCK_SIZE*INT_NUM_BYTE;
|
||||
} else
|
||||
b = freeIntBlocks.remove(size-1);
|
||||
if (trackAllocations)
|
||||
numBytesUsed += INT_BLOCK_SIZE*INT_NUM_BYTE;
|
||||
assert numBytesUsed <= numBytesAlloc;
|
||||
return b;
|
||||
}
|
||||
|
||||
synchronized void bytesAllocated(long numBytes) {
|
||||
numBytesAlloc += numBytes;
|
||||
assert numBytesUsed <= numBytesAlloc;
|
||||
}
|
||||
|
||||
synchronized void bytesUsed(long numBytes) {
|
||||
numBytesUsed += numBytes;
|
||||
assert numBytesUsed <= numBytesAlloc;
|
||||
}
|
||||
|
||||
/* Return int[]s to the pool */
|
||||
|
@ -1346,78 +1373,34 @@ final class DocumentsWriter {
|
|||
|
||||
final ByteBlockAllocator perDocAllocator = new ByteBlockAllocator(PER_DOC_BLOCK_SIZE);
|
||||
|
||||
|
||||
/* Initial chunk size of the shared char[] blocks used to
|
||||
store term text */
|
||||
final static int CHAR_BLOCK_SHIFT = 14;
|
||||
final static int CHAR_BLOCK_SIZE = 1 << CHAR_BLOCK_SHIFT;
|
||||
final static int CHAR_BLOCK_MASK = CHAR_BLOCK_SIZE - 1;
|
||||
|
||||
final static int MAX_TERM_LENGTH = CHAR_BLOCK_SIZE-1;
|
||||
|
||||
private ArrayList<char[]> freeCharBlocks = new ArrayList<char[]>();
|
||||
|
||||
/* Allocate another char[] from the shared pool */
|
||||
synchronized char[] getCharBlock() {
|
||||
final int size = freeCharBlocks.size();
|
||||
final char[] c;
|
||||
if (0 == size) {
|
||||
numBytesAlloc += CHAR_BLOCK_SIZE * CHAR_NUM_BYTE;
|
||||
c = new char[CHAR_BLOCK_SIZE];
|
||||
} else
|
||||
c = freeCharBlocks.remove(size-1);
|
||||
// We always track allocations of char blocks, for now,
|
||||
// because nothing that skips allocation tracking
|
||||
// (currently only term vectors) uses its own char
|
||||
// blocks.
|
||||
numBytesUsed += CHAR_BLOCK_SIZE * CHAR_NUM_BYTE;
|
||||
assert numBytesUsed <= numBytesAlloc;
|
||||
return c;
|
||||
}
|
||||
|
||||
/* Return char[]s to the pool */
|
||||
synchronized void recycleCharBlocks(char[][] blocks, int numBlocks) {
|
||||
for(int i=0;i<numBlocks;i++)
|
||||
freeCharBlocks.add(blocks[i]);
|
||||
}
|
||||
|
||||
String toMB(long v) {
|
||||
return nf.format(v/1024./1024.);
|
||||
}
|
||||
|
||||
/* We have four pools of RAM: Postings, byte blocks
|
||||
* (holds freq/prox posting data), char blocks (holds
|
||||
* characters in the term) and per-doc buffers (stored fields/term vectors).
|
||||
* Different docs require varying amount of storage from
|
||||
* these four classes.
|
||||
*
|
||||
* For example, docs with many unique single-occurrence
|
||||
* short terms will use up the Postings RAM and hardly any
|
||||
* of the other two. Whereas docs with very large terms
|
||||
* will use alot of char blocks RAM and relatively less of
|
||||
* the other two. This method just frees allocations from
|
||||
* the pools once we are over-budget, which balances the
|
||||
* pools to match the current docs. */
|
||||
/* We have three pools of RAM: Postings, byte blocks
|
||||
* (holds freq/prox posting data) and per-doc buffers
|
||||
* (stored fields/term vectors). Different docs require
|
||||
* varying amount of storage from these classes. For
|
||||
* example, docs with many unique single-occurrence short
|
||||
* terms will use up the Postings RAM and hardly any of
|
||||
* the other two. Whereas docs with very large terms will
|
||||
* use alot of byte blocks RAM. This method just frees
|
||||
* allocations from the pools once we are over-budget,
|
||||
* which balances the pools to match the current docs. */
|
||||
void balanceRAM() {
|
||||
|
||||
// We flush when we've used our target usage
|
||||
final long flushTrigger = ramBufferSize;
|
||||
|
||||
final long deletesRAMUsed = deletesInRAM.bytesUsed+deletesFlushed.bytesUsed;
|
||||
|
||||
if (numBytesAlloc+deletesRAMUsed > freeTrigger) {
|
||||
if (numBytesUsed+deletesRAMUsed > ramBufferSize) {
|
||||
|
||||
if (infoStream != null)
|
||||
message(" RAM: now balance allocations: usedMB=" + toMB(numBytesUsed) +
|
||||
" vs trigger=" + toMB(flushTrigger) +
|
||||
" allocMB=" + toMB(numBytesAlloc) +
|
||||
" vs trigger=" + toMB(ramBufferSize) +
|
||||
" deletesMB=" + toMB(deletesRAMUsed) +
|
||||
" vs trigger=" + toMB(freeTrigger) +
|
||||
" byteBlockFree=" + toMB(byteBlockAllocator.freeByteBlocks.size()*BYTE_BLOCK_SIZE) +
|
||||
" perDocFree=" + toMB(perDocAllocator.freeByteBlocks.size()*PER_DOC_BLOCK_SIZE) +
|
||||
" charBlockFree=" + toMB(freeCharBlocks.size()*CHAR_BLOCK_SIZE*CHAR_NUM_BYTE));
|
||||
" perDocFree=" + toMB(perDocAllocator.freeByteBlocks.size()*PER_DOC_BLOCK_SIZE));
|
||||
|
||||
final long startBytesAlloc = numBytesAlloc + deletesRAMUsed;
|
||||
final long startBytesUsed = numBytesUsed + deletesRAMUsed;
|
||||
|
||||
int iter = 0;
|
||||
|
||||
|
@ -1427,46 +1410,38 @@ final class DocumentsWriter {
|
|||
|
||||
boolean any = true;
|
||||
|
||||
while(numBytesAlloc+deletesRAMUsed > freeLevel) {
|
||||
while(numBytesUsed+deletesRAMUsed > freeLevel) {
|
||||
|
||||
synchronized(this) {
|
||||
if (0 == perDocAllocator.freeByteBlocks.size()
|
||||
&& 0 == byteBlockAllocator.freeByteBlocks.size()
|
||||
&& 0 == freeCharBlocks.size()
|
||||
&& 0 == freeIntBlocks.size()
|
||||
&& !any) {
|
||||
if (0 == perDocAllocator.freeByteBlocks.size() &&
|
||||
0 == byteBlockAllocator.freeByteBlocks.size() &&
|
||||
0 == freeIntBlocks.size() && !any) {
|
||||
// Nothing else to free -- must flush now.
|
||||
bufferIsFull = numBytesUsed+deletesRAMUsed > flushTrigger;
|
||||
bufferIsFull = numBytesUsed+deletesRAMUsed > ramBufferSize;
|
||||
if (infoStream != null) {
|
||||
if (numBytesUsed > flushTrigger)
|
||||
if (numBytesUsed+deletesRAMUsed > ramBufferSize)
|
||||
message(" nothing to free; now set bufferIsFull");
|
||||
else
|
||||
message(" nothing to free");
|
||||
}
|
||||
assert numBytesUsed <= numBytesAlloc;
|
||||
break;
|
||||
}
|
||||
|
||||
if ((0 == iter % 5) && byteBlockAllocator.freeByteBlocks.size() > 0) {
|
||||
if ((0 == iter % 4) && byteBlockAllocator.freeByteBlocks.size() > 0) {
|
||||
byteBlockAllocator.freeByteBlocks.remove(byteBlockAllocator.freeByteBlocks.size()-1);
|
||||
numBytesAlloc -= BYTE_BLOCK_SIZE;
|
||||
numBytesUsed -= BYTE_BLOCK_SIZE;
|
||||
}
|
||||
|
||||
if ((1 == iter % 5) && freeCharBlocks.size() > 0) {
|
||||
freeCharBlocks.remove(freeCharBlocks.size()-1);
|
||||
numBytesAlloc -= CHAR_BLOCK_SIZE * CHAR_NUM_BYTE;
|
||||
}
|
||||
|
||||
if ((2 == iter % 5) && freeIntBlocks.size() > 0) {
|
||||
if ((1 == iter % 4) && freeIntBlocks.size() > 0) {
|
||||
freeIntBlocks.remove(freeIntBlocks.size()-1);
|
||||
numBytesAlloc -= INT_BLOCK_SIZE * INT_NUM_BYTE;
|
||||
numBytesUsed -= INT_BLOCK_SIZE * INT_NUM_BYTE;
|
||||
}
|
||||
|
||||
if ((3 == iter % 5) && perDocAllocator.freeByteBlocks.size() > 0) {
|
||||
if ((2 == iter % 4) && perDocAllocator.freeByteBlocks.size() > 0) {
|
||||
// Remove upwards of 32 blocks (each block is 1K)
|
||||
for (int i = 0; i < 32; ++i) {
|
||||
perDocAllocator.freeByteBlocks.remove(perDocAllocator.freeByteBlocks.size() - 1);
|
||||
numBytesAlloc -= PER_DOC_BLOCK_SIZE;
|
||||
numBytesUsed -= PER_DOC_BLOCK_SIZE;
|
||||
if (perDocAllocator.freeByteBlocks.size() == 0) {
|
||||
break;
|
||||
}
|
||||
|
@ -1474,7 +1449,7 @@ final class DocumentsWriter {
|
|||
}
|
||||
}
|
||||
|
||||
if ((4 == iter % 5) && any)
|
||||
if ((3 == iter % 4) && any)
|
||||
// Ask consumer to free any recycled state
|
||||
any = consumer.freeRAM();
|
||||
|
||||
|
@ -1482,26 +1457,7 @@ final class DocumentsWriter {
|
|||
}
|
||||
|
||||
if (infoStream != null)
|
||||
message(" after free: freedMB=" + nf.format((startBytesAlloc-numBytesAlloc-deletesRAMUsed)/1024./1024.) + " usedMB=" + nf.format((numBytesUsed+deletesRAMUsed)/1024./1024.) + " allocMB=" + nf.format(numBytesAlloc/1024./1024.));
|
||||
|
||||
} else {
|
||||
// If we have not crossed the 100% mark, but have
|
||||
// crossed the 95% mark of RAM we are actually
|
||||
// using, go ahead and flush. This prevents
|
||||
// over-allocating and then freeing, with every
|
||||
// flush.
|
||||
synchronized(this) {
|
||||
|
||||
if (numBytesUsed+deletesRAMUsed > flushTrigger) {
|
||||
if (infoStream != null)
|
||||
message(" RAM: now flush @ usedMB=" + nf.format(numBytesUsed/1024./1024.) +
|
||||
" allocMB=" + nf.format(numBytesAlloc/1024./1024.) +
|
||||
" deletesMB=" + nf.format(deletesRAMUsed/1024./1024.) +
|
||||
" triggerMB=" + nf.format(flushTrigger/1024./1024.));
|
||||
|
||||
bufferIsFull = true;
|
||||
}
|
||||
}
|
||||
message(" after free: freedMB=" + nf.format((startBytesUsed-numBytesUsed-deletesRAMUsed)/1024./1024.) + " usedMB=" + nf.format((numBytesUsed+deletesRAMUsed)/1024./1024.));
|
||||
}
|
||||
}
|
||||
|
||||
|
|
|
@@ -17,20 +17,21 @@ package org.apache.lucene.index;
 * limitations under the License.
 */

final class FieldInfo {
  String name;
  boolean isIndexed;
  int number;
/** @lucene.experimental */
public final class FieldInfo {
  public String name;
  public boolean isIndexed;
  public int number;

  // true if term vector for this field should be stored
  boolean storeTermVector;
  boolean storeOffsetWithTermVector;
  boolean storePositionWithTermVector;

  boolean omitNorms; // omit norms associated with indexed fields
  boolean omitTermFreqAndPositions;
  public boolean omitNorms; // omit norms associated with indexed fields
  public boolean omitTermFreqAndPositions;

  boolean storePayloads; // whether this field stores payloads together with term positions
  public boolean storePayloads; // whether this field stores payloads together with term positions

  FieldInfo(String na, boolean tk, int nu, boolean storeTermVector,
            boolean storePositionWithTermVector, boolean storeOffsetWithTermVector,

@@ -32,8 +32,9 @@ import java.util.*;
 * of this class are thread-safe for multiple readers, but only one thread can
 * be adding documents at a time, with no other reader or writer threads
 * accessing this object.
 * @lucene.experimental
 */
final class FieldInfos {
public final class FieldInfos {

  // Used internally (ie not written to *.fnm files) for pre-2.9 files
  public static final int FORMAT_PRE = -1;

@@ -120,7 +121,7 @@ final class FieldInfos {
  }

  /** Returns true if any fields do not omitTermFreqAndPositions */
  boolean hasProx() {
  public boolean hasProx() {
    final int numFields = byNumber.size();
    for(int i=0;i<numFields;i++) {
      final FieldInfo fi = fieldInfo(i);

@@ -0,0 +1,36 @@
package org.apache.lucene.index;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;

/** Flex API for access to fields and terms
 *  @lucene.experimental */

public abstract class Fields {

  /** Returns an iterator that will step through all fields
   *  names.  This will not return null. */
  public abstract FieldsEnum iterator() throws IOException;

  /** Get the {@link Terms} for this field.  This may return
   *  null if the field does not exist. */
  public abstract Terms terms(String field) throws IOException;

  public final static Fields[] EMPTY_ARRAY = new Fields[0];
}

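Point lookups against this abstraction must tolerate two nulls: a reader may expose no Fields at all, and terms(String) returns null for a field that was never indexed. A tiny hedged helper capturing that pattern (the helper itself is illustrative, not part of the patch):

```java
import java.io.IOException;

import org.apache.lucene.index.Fields;
import org.apache.lucene.index.Terms;

// Hypothetical null-safe lookup against the Fields abstraction declared above.
final class FieldLookup {
  static Terms termsOrNull(Fields fields, String field) throws IOException {
    if (fields == null) {
      return null;               // reader with no postings hands out no Fields
    }
    return fields.terms(field);  // null when the field was never indexed
  }
}
```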
@@ -0,0 +1,74 @@
package org.apache.lucene.index;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;

import org.apache.lucene.util.AttributeSource;

/** Enumerates indexed fields.  You must first call {@link
 *  #next} before calling {@link #terms}.
 *
 *  @lucene.experimental */

public abstract class FieldsEnum {

  // TODO: maybe allow retrieving FieldInfo for current
  // field, as optional method?

  private AttributeSource atts = null;

  /**
   * Returns the related attributes.
   */
  public AttributeSource attributes() {
    if (atts == null) {
      atts = new AttributeSource();
    }
    return atts;
  }

  /** Increments the enumeration to the next field. The
   *  returned field is always interned, so simple ==
   *  comparison is allowed. Returns null when there are no
   *  more fields.*/
  public abstract String next() throws IOException;

  /** Get {@link TermsEnum} for the current field. You
   *  should not call {@link #next} until you're done using
   *  this {@link TermsEnum}. After {@link #next} returns
   *  null this method should not be called. This method
   *  will not return null. */
  public abstract TermsEnum terms() throws IOException;

  public final static FieldsEnum[] EMPTY_ARRAY = new FieldsEnum[0];

  /** Provides zero fields */
  public final static FieldsEnum EMPTY = new FieldsEnum() {

    @Override
    public String next() {
      return null;
    }

    @Override
    public TermsEnum terms() {
      throw new IllegalStateException("this method should never be called");
    }
  };
}

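Sequential access pairs the two new classes: Fields.iterator() hands out a FieldsEnum, next() advances to a field, and terms() is only valid after a non-null next(). A minimal sketch of that iteration order, using only the APIs declared in the two files above:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.Fields;
import org.apache.lucene.index.FieldsEnum;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Hypothetical walk over every field, counting terms per field.
final class FieldTermCounter {
  static List<String> describe(Fields fields) throws IOException {
    final List<String> lines = new ArrayList<String>();
    if (fields == null) {
      return lines;
    }
    final FieldsEnum fieldsEnum = fields.iterator();
    String field;
    while ((field = fieldsEnum.next()) != null) {   // next() before terms(), as documented
      final TermsEnum termsEnum = fieldsEnum.terms();
      long count = 0;
      BytesRef term;
      while ((term = termsEnum.next()) != null) {
        count++;
      }
      lines.add(field + ": " + count + " terms");
    }
    return lines;
  }
}
```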
@@ -20,7 +20,9 @@ package org.apache.lucene.index;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Bits;
import org.apache.lucene.search.FieldCache; // not great (circular); used only to purge FieldCache entry on close
import org.apache.lucene.util.BytesRef;

import java.io.IOException;
import java.util.Collection;

@@ -115,6 +117,11 @@ public class FilterIndexReader extends IndexReader {
    return in.directory();
  }

  @Override
  public Bits getDeletedDocs() throws IOException {
    return in.getDeletedDocs();
  }

  @Override
  public TermFreqVector[] getTermFreqVectors(int docNumber)
          throws IOException {

@@ -217,6 +224,12 @@ public class FilterIndexReader extends IndexReader {
    return in.docFreq(t);
  }

  @Override
  public int docFreq(String field, BytesRef t) throws IOException {
    ensureOpen();
    return in.docFreq(field, t);
  }

  @Override
  public TermDocs termDocs() throws IOException {
    ensureOpen();

@@ -1,129 +0,0 @@
package org.apache.lucene.index;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/** Consumes doc & freq, writing them using the current
 *  index file format */

import java.io.IOException;

import org.apache.lucene.util.UnicodeUtil;
import org.apache.lucene.store.IndexOutput;

final class FormatPostingsDocsWriter extends FormatPostingsDocsConsumer {

  final IndexOutput out;
  final FormatPostingsTermsWriter parent;
  final FormatPostingsPositionsWriter posWriter;
  final DefaultSkipListWriter skipListWriter;
  final int skipInterval;
  final int totalNumDocs;

  boolean omitTermFreqAndPositions;
  boolean storePayloads;
  long freqStart;
  FieldInfo fieldInfo;

  FormatPostingsDocsWriter(SegmentWriteState state, FormatPostingsTermsWriter parent) throws IOException {
    super();
    this.parent = parent;
    final String fileName = IndexFileNames.segmentFileName(parent.parent.segment, IndexFileNames.FREQ_EXTENSION);
    state.flushedFiles.add(fileName);
    out = parent.parent.dir.createOutput(fileName);
    totalNumDocs = parent.parent.totalNumDocs;

    // TODO: abstraction violation
    skipInterval = parent.parent.termsOut.skipInterval;
    skipListWriter = parent.parent.skipListWriter;
    skipListWriter.setFreqOutput(out);

    posWriter = new FormatPostingsPositionsWriter(state, this);
  }

  void setField(FieldInfo fieldInfo) {
    this.fieldInfo = fieldInfo;
    omitTermFreqAndPositions = fieldInfo.omitTermFreqAndPositions;
    storePayloads = fieldInfo.storePayloads;
    posWriter.setField(fieldInfo);
  }

  int lastDocID;
  int df;

  /** Adds a new doc in this term.  If this returns null
   *  then we just skip consuming positions/payloads. */
  @Override
  FormatPostingsPositionsConsumer addDoc(int docID, int termDocFreq) throws IOException {

    final int delta = docID - lastDocID;

    if (docID < 0 || (df > 0 && delta <= 0))
      throw new CorruptIndexException("docs out of order (" + docID + " <= " + lastDocID + " )");

    if ((++df % skipInterval) == 0) {
      // TODO: abstraction violation
      skipListWriter.setSkipData(lastDocID, storePayloads, posWriter.lastPayloadLength);
      skipListWriter.bufferSkip(df);
    }

    assert docID < totalNumDocs: "docID=" + docID + " totalNumDocs=" + totalNumDocs;

    lastDocID = docID;
    if (omitTermFreqAndPositions)
      out.writeVInt(delta);
    else if (1 == termDocFreq)
      out.writeVInt((delta<<1) | 1);
    else {
      out.writeVInt(delta<<1);
      out.writeVInt(termDocFreq);
    }

    return posWriter;
  }

  private final TermInfo termInfo = new TermInfo(); // minimize consing
  final UnicodeUtil.UTF8Result utf8 = new UnicodeUtil.UTF8Result();

  /** Called when we are done adding docs to this term */
  @Override
  void finish() throws IOException {
    long skipPointer = skipListWriter.writeSkip(out);

    // TODO: this is abstraction violation -- we should not
    // peek up into parents terms encoding format
    termInfo.set(df, parent.freqStart, parent.proxStart, (int) (skipPointer - parent.freqStart));

    // TODO: we could do this incrementally
    UnicodeUtil.UTF16toUTF8(parent.currentTerm, parent.currentTermStart, utf8);

    if (df > 0) {
      parent.termsOut.add(fieldInfo.number,
                          utf8.result,
                          utf8.length,
                          termInfo);
    }

    lastDocID = 0;
    df = 0;
  }

  void close() throws IOException {
    out.close();
    posWriter.close();
  }
}
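The addDoc method above packs each posting into the freq stream as a delta from the previous docID, with the low bit flagging an implicit freq of 1. A minimal standalone sketch of that packing (hypothetical helper, not part of this patch; it only assumes IndexOutput.writeVInt as used above):

package org.apache.lucene.index;

import java.io.IOException;

import org.apache.lucene.store.IndexOutput;

final class FreqPackingSketch {
  // Mirrors the delta/freq encoding addDoc writes to the freq file.
  static void writePosting(IndexOutput out, int docDelta, int termDocFreq,
                           boolean omitTermFreqAndPositions) throws IOException {
    if (omitTermFreqAndPositions) {
      out.writeVInt(docDelta);              // doc delta only; freq is never stored
    } else if (termDocFreq == 1) {
      out.writeVInt((docDelta << 1) | 1);   // low bit set: freq == 1 is implied
    } else {
      out.writeVInt(docDelta << 1);         // low bit clear: explicit freq follows
      out.writeVInt(termDocFreq);
    }
  }
}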
@@ -1,75 +0,0 @@
package org.apache.lucene.index;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;

import org.apache.lucene.store.Directory;

final class FormatPostingsFieldsWriter extends FormatPostingsFieldsConsumer {

  final Directory dir;
  final String segment;
  final TermInfosWriter termsOut;
  final FieldInfos fieldInfos;
  final FormatPostingsTermsWriter termsWriter;
  final DefaultSkipListWriter skipListWriter;
  final int totalNumDocs;

  public FormatPostingsFieldsWriter(SegmentWriteState state, FieldInfos fieldInfos) throws IOException {
    super();

    dir = state.directory;
    segment = state.segmentName;
    totalNumDocs = state.numDocs;
    this.fieldInfos = fieldInfos;
    termsOut = new TermInfosWriter(dir,
                                   segment,
                                   fieldInfos,
                                   state.termIndexInterval);

    // TODO: this is a nasty abstraction violation (that we
    // peek down to find freqOut/proxOut) -- we need a
    // better abstraction here whereby these child consumers
    // can provide skip data or not
    skipListWriter = new DefaultSkipListWriter(termsOut.skipInterval,
                                               termsOut.maxSkipLevels,
                                               totalNumDocs,
                                               null,
                                               null);

    state.flushedFiles.add(state.segmentFileName(IndexFileNames.TERMS_EXTENSION));
    state.flushedFiles.add(state.segmentFileName(IndexFileNames.TERMS_INDEX_EXTENSION));

    termsWriter = new FormatPostingsTermsWriter(state, this);
  }

  /** Add a new field */
  @Override
  FormatPostingsTermsConsumer addField(FieldInfo field) {
    termsWriter.setField(field);
    return termsWriter;
  }

  /** Called when we are done adding everything. */
  @Override
  void finish() throws IOException {
    termsOut.close();
    termsWriter.close();
  }
}
@@ -1,89 +0,0 @@
package org.apache.lucene.index;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import org.apache.lucene.store.IndexOutput;


import java.io.IOException;

final class FormatPostingsPositionsWriter extends FormatPostingsPositionsConsumer {

  final FormatPostingsDocsWriter parent;
  final IndexOutput out;

  boolean omitTermFreqAndPositions;
  boolean storePayloads;
  int lastPayloadLength = -1;

  FormatPostingsPositionsWriter(SegmentWriteState state, FormatPostingsDocsWriter parent) throws IOException {
    this.parent = parent;
    omitTermFreqAndPositions = parent.omitTermFreqAndPositions;
    if (parent.parent.parent.fieldInfos.hasProx()) {
      // At least one field does not omit TF, so create the
      // prox file
      final String fileName = IndexFileNames.segmentFileName(parent.parent.parent.segment, IndexFileNames.PROX_EXTENSION);
      state.flushedFiles.add(fileName);
      out = parent.parent.parent.dir.createOutput(fileName);
      parent.skipListWriter.setProxOutput(out);
    } else
      // Every field omits TF so we will write no prox file
      out = null;
  }

  int lastPosition;

  /** Add a new position & payload */
  @Override
  void addPosition(int position, byte[] payload, int payloadOffset, int payloadLength) throws IOException {
    assert !omitTermFreqAndPositions: "omitTermFreqAndPositions is true";
    assert out != null;

    final int delta = position - lastPosition;
    lastPosition = position;

    if (storePayloads) {
      if (payloadLength != lastPayloadLength) {
        lastPayloadLength = payloadLength;
        out.writeVInt((delta<<1)|1);
        out.writeVInt(payloadLength);
      } else
        out.writeVInt(delta << 1);
      if (payloadLength > 0)
        out.writeBytes(payload, payloadLength);
    } else
      out.writeVInt(delta);
  }

  void setField(FieldInfo fieldInfo) {
    omitTermFreqAndPositions = fieldInfo.omitTermFreqAndPositions;
    storePayloads = omitTermFreqAndPositions ? false : fieldInfo.storePayloads;
  }

  /** Called when we are done adding positions & payloads */
  @Override
  void finish() {
    lastPosition = 0;
    lastPayloadLength = -1;
  }

  void close() throws IOException {
    if (out != null)
      out.close();
  }
}
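Likewise, addPosition above delta-encodes positions into the prox stream and uses the low bit to flag a change in payload length, so the length is only re-written when it differs from the previous position. A standalone sketch of that packing (hypothetical helper, not part of this patch):

package org.apache.lucene.index;

import java.io.IOException;

import org.apache.lucene.store.IndexOutput;

final class ProxPackingSketch {
  private int lastPayloadLength = -1;

  void writePosition(IndexOutput out, int posDelta, byte[] payload,
                     int payloadLength, boolean storePayloads) throws IOException {
    if (!storePayloads) {
      out.writeVInt(posDelta);                // no payloads for this field
      return;
    }
    if (payloadLength != lastPayloadLength) {
      lastPayloadLength = payloadLength;
      out.writeVInt((posDelta << 1) | 1);     // flag: new payload length follows
      out.writeVInt(payloadLength);
    } else {
      out.writeVInt(posDelta << 1);           // same payload length as before
    }
    if (payloadLength > 0) {
      out.writeBytes(payload, payloadLength); // payload bytes follow inline
    }
  }
}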
@@ -1,47 +0,0 @@
package org.apache.lucene.index;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;

import org.apache.lucene.util.ArrayUtil;
import org.apache.lucene.util.RamUsageEstimator;

/**
 * @lucene.experimental
 */

abstract class FormatPostingsTermsConsumer {

  /** Adds a new term in this field; term ends with U+FFFF
   *  char */
  abstract FormatPostingsDocsConsumer addTerm(char[] text, int start) throws IOException;

  char[] termBuffer;
  FormatPostingsDocsConsumer addTerm(String text) throws IOException {
    final int len = text.length();
    if (termBuffer == null || termBuffer.length < 1+len)
      termBuffer = new char[ArrayUtil.oversize(1+len, RamUsageEstimator.NUM_BYTES_CHAR)];
    text.getChars(0, len, termBuffer, 0);
    termBuffer[len] = 0xffff;
    return addTerm(termBuffer, 0);
  }

  /** Called when we are done adding terms to this field */
  abstract void finish() throws IOException;
}
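The String overload above terminates the copied term with a U+FFFF sentinel instead of passing an explicit length, so downstream consumers scan for that sentinel. A small sketch of recovering the term length from such a buffer (hypothetical helper, not part of this patch):

final class SentinelTermSketch {
  // Pre-flex term buffers end with U+FFFF rather than carrying a length.
  static int termLength(char[] termBuffer, int start) {
    int upto = start;
    while (termBuffer[upto] != 0xffff) {
      upto++;
    }
    return upto - start;
  }
}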
@@ -1,73 +0,0 @@
package org.apache.lucene.index;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;

final class FormatPostingsTermsWriter extends FormatPostingsTermsConsumer {

  final FormatPostingsFieldsWriter parent;
  final FormatPostingsDocsWriter docsWriter;
  final TermInfosWriter termsOut;
  FieldInfo fieldInfo;

  FormatPostingsTermsWriter(SegmentWriteState state, FormatPostingsFieldsWriter parent) throws IOException {
    super();
    this.parent = parent;
    termsOut = parent.termsOut;
    docsWriter = new FormatPostingsDocsWriter(state, this);
  }

  void setField(FieldInfo fieldInfo) {
    this.fieldInfo = fieldInfo;
    docsWriter.setField(fieldInfo);
  }

  char[] currentTerm;
  int currentTermStart;

  long freqStart;
  long proxStart;

  /** Adds a new term in this field */
  @Override
  FormatPostingsDocsConsumer addTerm(char[] text, int start) {
    currentTerm = text;
    currentTermStart = start;

    // TODO: this is abstraction violation -- ideally this
    // terms writer is not so "invasive", looking for file
    // pointers in its child consumers.
    freqStart = docsWriter.out.getFilePointer();
    if (docsWriter.posWriter.out != null)
      proxStart = docsWriter.posWriter.out.getFilePointer();

    parent.skipListWriter.resetSkip();

    return docsWriter;
  }

  /** Called when we are done adding terms to this field */
  @Override
  void finish() {
  }

  void close() throws IOException {
    docsWriter.close();
  }
}
@@ -18,6 +18,8 @@ package org.apache.lucene.index;
 */

import java.io.IOException;
import java.util.Comparator;
import org.apache.lucene.util.BytesRef;

import org.apache.lucene.index.FreqProxTermsWriterPerField.FreqProxPostingsArray;

@@ -31,13 +33,12 @@ final class FreqProxFieldMergeState {

  final FreqProxTermsWriterPerField field;
  final int numPostings;
  final CharBlockPool charPool;
  private final ByteBlockPool bytePool;
  final int[] termIDs;
  final FreqProxPostingsArray postings;
  int currentTermID;

  char[] text;
  int textOffset;
  final BytesRef text = new BytesRef();

  private int postingUpto = -1;

@@ -47,29 +48,31 @@ final class FreqProxFieldMergeState {
  int docID;
  int termFreq;

  public FreqProxFieldMergeState(FreqProxTermsWriterPerField field) {
  public FreqProxFieldMergeState(FreqProxTermsWriterPerField field, Comparator<BytesRef> termComp) {
    this.field = field;
    this.charPool = field.perThread.termsHashPerThread.charPool;
    this.numPostings = field.termsHashPerField.numPostings;
    this.termIDs = field.termsHashPerField.sortPostings();
    this.bytePool = field.perThread.termsHashPerThread.bytePool;
    this.termIDs = field.termsHashPerField.sortPostings(termComp);
    this.postings = (FreqProxPostingsArray) field.termsHashPerField.postingsArray;
  }

  boolean nextTerm() throws IOException {
    postingUpto++;
    if (postingUpto == numPostings)
    if (postingUpto == numPostings) {
      return false;
    }

    currentTermID = termIDs[postingUpto];
    docID = 0;

    // Get BytesRef
    final int textStart = postings.textStarts[currentTermID];
    text = charPool.buffers[textStart >> DocumentsWriter.CHAR_BLOCK_SHIFT];
    textOffset = textStart & DocumentsWriter.CHAR_BLOCK_MASK;
    bytePool.setBytesRef(text, textStart);

    field.termsHashPerField.initReader(freq, currentTermID, 0);
    if (!field.fieldInfo.omitTermFreqAndPositions)
    if (!field.fieldInfo.omitTermFreqAndPositions) {
      field.termsHashPerField.initReader(prox, currentTermID, 1);
    }

    // Should always be true
    boolean result = nextDoc();
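With this change the merge state no longer copies term text per posting: nextTerm points a single reused BytesRef at the pooled bytes via bytePool.setBytesRef(text, textStart). The pattern, reduced to its essentials (hypothetical helper, not part of this patch):

import org.apache.lucene.util.BytesRef;

final class BytesRefViewSketch {
  // Shallow view: re-point an existing BytesRef instead of copying term bytes.
  static void pointAt(BytesRef view, byte[] block, int offset, int length) {
    view.bytes = block;
    view.offset = offset;
    view.length = length;
  }
}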
@@ -17,14 +17,19 @@ package org.apache.lucene.index;
 * limitations under the License.
 */

import org.apache.lucene.util.UnicodeUtil;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Comparator;

import org.apache.lucene.index.codecs.PostingsConsumer;
import org.apache.lucene.index.codecs.FieldsConsumer;
import org.apache.lucene.index.codecs.TermsConsumer;
import org.apache.lucene.util.BytesRef;

final class FreqProxTermsWriter extends TermsHashConsumer {

@@ -33,27 +38,13 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
    return new FreqProxTermsWriterPerThread(perThread);
  }

  private static int compareText(final char[] text1, int pos1, final char[] text2, int pos2) {
    while(true) {
      final char c1 = text1[pos1++];
      final char c2 = text2[pos2++];
      if (c1 != c2) {
        if (0xffff == c2)
          return 1;
        else if (0xffff == c1)
          return -1;
        else
          return c1-c2;
      } else if (0xffff == c1)
        return 0;
    }
  }

  @Override
  void closeDocStore(SegmentWriteState state) {}

  @Override
  void abort() {}

  private int flushedDocCount;

  // TODO: would be nice to factor out more of this, eg the
  // FreqProxFieldMergeState, and code to visit all Fields

@@ -67,6 +58,8 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
    // ThreadStates
    List<FreqProxTermsWriterPerField> allFields = new ArrayList<FreqProxTermsWriterPerField>();

    flushedDocCount = state.numDocs;

    for (Map.Entry<TermsHashConsumerPerThread,Collection<TermsHashConsumerPerField>> entry : threadsAndFields.entrySet()) {

      Collection<TermsHashConsumerPerField> fields = entry.getValue();

@@ -79,21 +72,23 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
      }
    }

    // Sort by field name
    Collections.sort(allFields);
    final int numAllFields = allFields.size();

    // TODO: allow Lucene user to customize this consumer:
    final FormatPostingsFieldsConsumer consumer = new FormatPostingsFieldsWriter(state, fieldInfos);
    // Sort by field name
    Collections.sort(allFields);

    // TODO: allow Lucene user to customize this codec:
    final FieldsConsumer consumer = state.codec.fieldsConsumer(state);

    /*
    Current writer chain:
      FormatPostingsFieldsConsumer
        -> IMPL: FormatPostingsFieldsWriter
          -> FormatPostingsTermsConsumer
            -> IMPL: FormatPostingsTermsWriter
              -> FormatPostingsDocConsumer
                -> IMPL: FormatPostingsDocWriter
                  -> FormatPostingsPositionsConsumer
      FieldsConsumer
        -> IMPL: FormatPostingsTermsDictWriter
          -> TermsConsumer
            -> IMPL: FormatPostingsTermsDictWriter.TermsWriter
              -> DocsConsumer
                -> IMPL: FormatPostingsDocsWriter
                  -> PositionsConsumer
                    -> IMPL: FormatPostingsPositionsWriter
    */

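The new chain is driven field by field: the codec's FieldsConsumer hands out a TermsConsumer per field, each term yields a PostingsConsumer, and docs/positions are pushed into that. A condensed sketch of the call sequence used by appendPostings in the hunks that follow (hypothetical helper with dummy single-term data, not part of this patch; signatures as they appear in this diff):

package org.apache.lucene.index;

import java.io.IOException;

import org.apache.lucene.index.codecs.FieldsConsumer;
import org.apache.lucene.index.codecs.PostingsConsumer;
import org.apache.lucene.index.codecs.TermsConsumer;
import org.apache.lucene.util.BytesRef;

final class ConsumerChainSketch {
  static void writeOneTerm(FieldsConsumer fields, FieldInfo fieldInfo,
                           BytesRef term, int docID, int position) throws IOException {
    TermsConsumer terms = fields.addField(fieldInfo);      // one TermsConsumer per field
    PostingsConsumer postings = terms.startTerm(term);     // one PostingsConsumer per term
    postings.startDoc(docID, 1);                           // freq 1 in this doc
    postings.addPosition(position, null);                  // no payload at this position
    postings.finishDoc();
    terms.finishTerm(term, 1);                             // one doc seen for this term
    terms.finish();
  }
}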
@@ -134,25 +129,29 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
      FreqProxTermsWriterPerThread perThread = (FreqProxTermsWriterPerThread) entry.getKey();
      perThread.termsHashPerThread.reset(true);
    }

    consumer.finish();
    consumer.close();
  }

  private byte[] payloadBuffer;
  BytesRef payload;

  /* Walk through all unique text tokens (Posting
   * instances) found in this field and serialize them
   * into a single RAM segment. */
  void appendPostings(FreqProxTermsWriterPerField[] fields,
                      FormatPostingsFieldsConsumer consumer)
                      FieldsConsumer consumer)
    throws CorruptIndexException, IOException {

    int numFields = fields.length;

    final BytesRef text = new BytesRef();

    final FreqProxFieldMergeState[] mergeStates = new FreqProxFieldMergeState[numFields];

    final TermsConsumer termsConsumer = consumer.addField(fields[0].fieldInfo);
    final Comparator<BytesRef> termComp = termsConsumer.getComparator();

    for(int i=0;i<numFields;i++) {
      FreqProxFieldMergeState fms = mergeStates[i] = new FreqProxFieldMergeState(fields[i]);
      FreqProxFieldMergeState fms = mergeStates[i] = new FreqProxFieldMergeState(fields[i], termComp);

      assert fms.field.fieldInfo == fields[0].fieldInfo;

@@ -161,45 +160,63 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
      assert result;
    }

    final FormatPostingsTermsConsumer termsConsumer = consumer.addField(fields[0].fieldInfo);

    FreqProxFieldMergeState[] termStates = new FreqProxFieldMergeState[numFields];

    final boolean currentFieldOmitTermFreqAndPositions = fields[0].fieldInfo.omitTermFreqAndPositions;
    //System.out.println("flush terms field=" + fields[0].fieldInfo.name);

    // TODO: really TermsHashPerField should take over most
    // of this loop, including merge sort of terms from
    // multiple threads and interacting with the
    // TermsConsumer, only calling out to us (passing us the
    // DocsConsumer) to handle delivery of docs/positions
    while(numFields > 0) {

      // Get the next term to merge
      termStates[0] = mergeStates[0];
      int numToMerge = 1;

      // TODO: pqueue
      for(int i=1;i<numFields;i++) {
        final char[] text = mergeStates[i].text;
        final int textOffset = mergeStates[i].textOffset;
        final int cmp = compareText(text, textOffset, termStates[0].text, termStates[0].textOffset);

        final int cmp = termComp.compare(mergeStates[i].text, termStates[0].text);
        if (cmp < 0) {
          termStates[0] = mergeStates[i];
          numToMerge = 1;
        } else if (cmp == 0)
        } else if (cmp == 0) {
          termStates[numToMerge++] = mergeStates[i];
        }
      }

      final FormatPostingsDocsConsumer docConsumer = termsConsumer.addTerm(termStates[0].text, termStates[0].textOffset);
      // Need shallow copy here because termStates[0].text
      // changes by the time we call finishTerm
      text.bytes = termStates[0].text.bytes;
      text.offset = termStates[0].text.offset;
      text.length = termStates[0].text.length;

      //System.out.println("  term=" + text.toUnicodeString());
      //System.out.println("  term=" + text.toString());

      final PostingsConsumer postingsConsumer = termsConsumer.startTerm(text);

      // Now termStates has numToMerge FieldMergeStates
      // which all share the same term.  Now we must
      // interleave the docID streams.
      int numDocs = 0;
      while(numToMerge > 0) {

        FreqProxFieldMergeState minState = termStates[0];
        for(int i=1;i<numToMerge;i++)
          if (termStates[i].docID < minState.docID)
        for(int i=1;i<numToMerge;i++) {
          if (termStates[i].docID < minState.docID) {
            minState = termStates[i];
          }
        }

        final int termDocFreq = minState.termFreq;
        numDocs++;

        final FormatPostingsPositionsConsumer posConsumer = docConsumer.addDoc(minState.docID, termDocFreq);
        assert minState.docID < flushedDocCount: "doc=" + minState.docID + " maxDoc=" + flushedDocCount;

        postingsConsumer.startDoc(minState.docID, termDocFreq);

        final ByteSliceReader prox = minState.prox;

@@ -213,33 +230,48 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
          for(int j=0;j<termDocFreq;j++) {
            final int code = prox.readVInt();
            position += code >> 1;
            //System.out.println("    pos=" + position);

            final int payloadLength;
            final BytesRef thisPayload;

            if ((code & 1) != 0) {
              // This position has a payload
              payloadLength = prox.readVInt();

              if (payloadBuffer == null || payloadBuffer.length < payloadLength)
                payloadBuffer = new byte[payloadLength];
              if (payload == null) {
                payload = new BytesRef();
                payload.bytes = new byte[payloadLength];
              } else if (payload.bytes.length < payloadLength) {
                payload.grow(payloadLength);
              }

              prox.readBytes(payloadBuffer, 0, payloadLength);
              prox.readBytes(payload.bytes, 0, payloadLength);
              payload.length = payloadLength;
              thisPayload = payload;

            } else
            } else {
              payloadLength = 0;
              thisPayload = null;
            }

            posConsumer.addPosition(position, payloadBuffer, 0, payloadLength);
            postingsConsumer.addPosition(position, thisPayload);
          } //End for

          posConsumer.finish();
          postingsConsumer.finishDoc();
        }

        if (!minState.nextDoc()) {

          // Remove from termStates
          int upto = 0;
          for(int i=0;i<numToMerge;i++)
            if (termStates[i] != minState)
          // TODO: inefficient O(N) where N = number of
          // threads that had seen this term:
          for(int i=0;i<numToMerge;i++) {
            if (termStates[i] != minState) {
              termStates[upto++] = termStates[i];
            }
          }
          numToMerge--;
          assert upto == numToMerge;

@@ -258,11 +290,10 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
        }
      }

      docConsumer.finish();
      assert numDocs > 0;
      termsConsumer.finishTerm(text, numDocs);
    }

    termsConsumer.finish();
  }

  final UnicodeUtil.UTF8Result termsUTF8 = new UnicodeUtil.UTF8Result();
}
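Inside appendPostings, the per-term doc interleaving repeatedly scans the active merge states for the smallest docID; the "TODO: pqueue" comment refers to replacing that O(N) scan with a priority queue. The selection step in isolation (hypothetical types, not part of this patch):

final class MinDocScanSketch {
  interface DocState {
    int docID();
  }

  // Linear scan for the state currently positioned on the smallest docID.
  static int indexOfMinDoc(DocState[] states, int numActive) {
    int min = 0;
    for (int i = 1; i < numActive; i++) {
      if (states[i].docID() < states[min].docID()) {
        min = i;
      }
    }
    return min;
  }
}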
@@ -187,25 +187,26 @@ final class FreqProxTermsWriterPerField extends TermsHashConsumerPerField implem
    int lastPositions[];                                   // Last position where this term occurred

    @Override
    ParallelPostingsArray resize(int newSize) {
      FreqProxPostingsArray newArray = new FreqProxPostingsArray(newSize);
      copy(this, newArray);
      return newArray;
    ParallelPostingsArray newInstance(int size) {
      return new FreqProxPostingsArray(size);
    }

    void copy(FreqProxPostingsArray fromArray, FreqProxPostingsArray toArray) {
      super.copy(fromArray, toArray);
      System.arraycopy(fromArray.docFreqs, 0, toArray.docFreqs, 0, fromArray.docFreqs.length);
      System.arraycopy(fromArray.lastDocIDs, 0, toArray.lastDocIDs, 0, fromArray.lastDocIDs.length);
      System.arraycopy(fromArray.lastDocCodes, 0, toArray.lastDocCodes, 0, fromArray.lastDocCodes.length);
      System.arraycopy(fromArray.lastPositions, 0, toArray.lastPositions, 0, fromArray.lastPositions.length);
    void copyTo(ParallelPostingsArray toArray, int numToCopy) {
      assert toArray instanceof FreqProxPostingsArray;
      FreqProxPostingsArray to = (FreqProxPostingsArray) toArray;

      super.copyTo(toArray, numToCopy);

      System.arraycopy(docFreqs, 0, to.docFreqs, 0, numToCopy);
      System.arraycopy(lastDocIDs, 0, to.lastDocIDs, 0, numToCopy);
      System.arraycopy(lastDocCodes, 0, to.lastDocCodes, 0, numToCopy);
      System.arraycopy(lastPositions, 0, to.lastPositions, 0, numToCopy);
    }

  }

  @Override
  int bytesPerPosting() {
    return ParallelPostingsArray.BYTES_PER_POSTING + 4 * DocumentsWriter.INT_NUM_BYTE;
  @Override
  int bytesPerPosting() {
    return ParallelPostingsArray.BYTES_PER_POSTING + 4 * DocumentsWriter.INT_NUM_BYTE;
  }
  }

  public void abort() {}
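The resize/copy pair above is replaced by newInstance(size) plus copyTo(to, numToCopy), which lets growth be expressed once against the base type while subclasses only describe their own parallel arrays. A sketch of how the two hooks compose (hypothetical base class, not the actual ParallelPostingsArray):

abstract class PostingsArraySketch {
  final int size;

  PostingsArraySketch(int size) {
    this.size = size;
  }

  abstract PostingsArraySketch newInstance(int size);

  abstract void copyTo(PostingsArraySketch to, int numToCopy);

  // Generic grow: allocate a larger instance, copy only the live entries.
  PostingsArraySketch grow(int numLive) {
    PostingsArraySketch bigger = newInstance(size * 2);
    copyTo(bigger, numLive);
    return bigger;
  }
}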
@@ -17,18 +17,20 @@ package org.apache.lucene.index;
 * limitations under the License.
 */

import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.io.FileNotFoundException;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Map;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;

import java.util.List;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Collection;
import java.util.Map;

import org.apache.lucene.index.codecs.CodecProvider;
import org.apache.lucene.store.Directory;

/*
 * This class keeps track of each SegmentInfos instance that

@@ -114,6 +116,8 @@ final class IndexFileDeleter {
    infoStream.println("IFD [" + Thread.currentThread().getName() + "]: " + message);
  }

  private final FilenameFilter indexFilenameFilter;

  /**
   * Initialize the deleter: find all previous commits in
   * the Directory, incref the files they reference, call

@@ -122,7 +126,8 @@ final class IndexFileDeleter {
   * @throws CorruptIndexException if the index is corrupt
   * @throws IOException if there is a low-level IO error
   */
  public IndexFileDeleter(Directory directory, IndexDeletionPolicy policy, SegmentInfos segmentInfos, PrintStream infoStream, DocumentsWriter docWriter)
  public IndexFileDeleter(Directory directory, IndexDeletionPolicy policy, SegmentInfos segmentInfos, PrintStream infoStream, DocumentsWriter docWriter,
                          CodecProvider codecs)
    throws CorruptIndexException, IOException {

    this.docWriter = docWriter;

@@ -137,7 +142,7 @@ final class IndexFileDeleter {
    // First pass: walk the files and initialize our ref
    // counts:
    long currentGen = segmentInfos.getGeneration();
    IndexFileNameFilter filter = IndexFileNameFilter.getFilter();
    indexFilenameFilter = new IndexFileNameFilter(codecs);

    String[] files = directory.listAll();

@@ -147,7 +152,7 @@ final class IndexFileDeleter {

      String fileName = files[i];

      if (filter.accept(null, fileName) && !fileName.equals(IndexFileNames.SEGMENTS_GEN)) {
      if ((indexFilenameFilter.accept(null, fileName)) && !fileName.endsWith("write.lock") && !fileName.equals(IndexFileNames.SEGMENTS_GEN)) {

        // Add this file to refCounts with initial count 0:
        getRefCount(fileName);

@@ -163,7 +168,7 @@ final class IndexFileDeleter {
        }
        SegmentInfos sis = new SegmentInfos();
        try {
          sis.read(directory, fileName);
          sis.read(directory, fileName, codecs);
        } catch (FileNotFoundException e) {
          // LUCENE-948: on NFS (and maybe others), if
          // you have writers switching back and forth

@@ -200,7 +205,7 @@ final class IndexFileDeleter {
      // try now to explicitly open this commit point:
      SegmentInfos sis = new SegmentInfos();
      try {
        sis.read(directory, segmentInfos.getCurrentSegmentFileName());
        sis.read(directory, segmentInfos.getCurrentSegmentFileName(), codecs);
      } catch (IOException e) {
        throw new CorruptIndexException("failed to locate current segments_N file");
      }

@@ -296,7 +301,6 @@ final class IndexFileDeleter {
   */
  public void refresh(String segmentName) throws IOException {
    String[] files = directory.listAll();
    IndexFileNameFilter filter = IndexFileNameFilter.getFilter();
    String segmentPrefix1;
    String segmentPrefix2;
    if (segmentName != null) {

@@ -309,8 +313,8 @@ final class IndexFileDeleter {

    for(int i=0;i<files.length;i++) {
      String fileName = files[i];
      if (filter.accept(null, fileName) &&
          (segmentName == null || fileName.startsWith(segmentPrefix1) || fileName.startsWith(segmentPrefix2)) &&
      if ((segmentName == null || fileName.startsWith(segmentPrefix1) || fileName.startsWith(segmentPrefix2)) &&
          indexFilenameFilter.accept(null, fileName) &&
          !refCounts.containsKey(fileName) &&
          !fileName.equals(IndexFileNames.SEGMENTS_GEN)) {
        // Unreferenced file, so remove it
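IndexFileDeleter now asks a per-instance, codec-aware IndexFileNameFilter (built from the CodecProvider) which directory entries are index files, instead of the old static filter. Applying such a filter to a raw listing looks roughly like this (hypothetical helper, not part of this patch; the null directory argument matches the accept(null, fileName) calls above):

import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.List;

final class ListingFilterSketch {
  static List<String> indexFiles(String[] allFiles, FilenameFilter indexFilenameFilter) {
    List<String> kept = new ArrayList<String>();
    for (String fileName : allFiles) {
      if (indexFilenameFilter.accept(null, fileName)) {
        kept.add(fileName);   // recognized by the codec-aware filter
      }
    }
    return kept;
  }
}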