LUCENE-2370: Reintegrate flex_1458 branch into trunk (revision 931101)

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@931278 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Uwe Schindler 2010-04-06 19:19:27 +00:00
parent 3e509789f8
commit b679816a70
361 changed files with 35338 additions and 4869 deletions

View File

@ -1,5 +1,79 @@
Lucene Change Log
======================= Flexible Indexing Branch =======================
Changes in backwards compatibility policy
* LUCENE-1458, LUCENE-2111, LUCENE-2354: Changes from flexible indexing:
- MultiReader ctor now throws IOException
- Directory.copy/Directory.copyTo now copies all files (not just
index files), since what is and isn't an index file now depends
on the codecs used. (Mike McCandless)
- UnicodeUtil now uses BytesRef for UTF-8 output, and some method
signatures have changed to CharSequence. These are internal APIs
and subject to change suddenly. (Robert Muir, Mike McCandless)
- Positional queries (PhraseQuery, *SpanQuery) will now throw an
exception if you use them on a field that omits positions during
indexing (previously they silently returned no results).
- FieldCache.{Byte,Short,Int,Long,Float,Double}Parser's API has
changed -- each parse method now takes a BytesRef instead of a
String. If you have an existing Parser, a simple way to fix it is
to invoke BytesRef.utf8ToString and pass that String to your
existing parser. This will work, but performance would be better
if you could fix your parser to instead operate directly on the
byte[] in the BytesRef (see the parser sketch after this list).
- The internal (experimental) API of NumericUtils changed completely
from String to BytesRef. Client code should never use this class,
so the change will normally not affect you. If you used some of
its methods to inspect terms or to create TermQueries out of
prefix-encoded terms, switch to BytesRef. Please note:
do not use TermQueries to search for single numeric terms.
The recommended way is to create a corresponding NumericRangeQuery
with upper and lower bounds equal and included (see the sketch
after this list). TermQueries do not score correctly, so the
constant score mode of NRQ is the only correct way to handle
single-value queries.
- NumericTokenStream now works directly on byte[] terms. If you
plug a TokenFilter on top of this stream, you will likely get
an IllegalArgumentException, because the NTS does not support
TermAttribute/CharTermAttribute. If you want to further filter
or attach Payloads to NTS, use the new NumericTermAttribute.
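
For the FieldCache parser change above, here is a minimal sketch of a bridged
parser. It is not part of this patch; it assumes FieldCache.IntParser now
declares parseInt(BytesRef), as the entry describes, and that
BytesRef.utf8ToString() is available. Operating directly on term.bytes,
term.offset and term.length would be faster.

import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.BytesRef;

// Sketch: a BytesRef-based IntParser that decodes the term to a String
// first and reuses ordinary String parsing, as suggested in the entry.
public class StringBridgeIntParser implements FieldCache.IntParser {
  public int parseInt(BytesRef term) {
    // Simple but slower path; a faster parser would read term.bytes directly.
    return Integer.parseInt(term.utf8ToString());
  }
}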
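
And for the NumericUtils note above, a sketch of the recommended way to match
a single numeric value: a NumericRangeQuery whose lower and upper bounds are
both the value and both inclusive. The field name and value are illustrative.

import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

// Sketch: single-value numeric match via an equal, inclusive range.
public class SingleNumericValueQuery {
  public static Query forInt(String field, int value) {
    return NumericRangeQuery.newIntRange(field,
        Integer.valueOf(value),  // lower bound == value
        Integer.valueOf(value),  // upper bound == value
        true, true);             // both bounds included
  }
}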
Bug Fixes
* LUCENE-2222: FixedIntBlockIndexInput incorrectly read one block of
0s before the actual data. (Renaud Delbru via Mike McCandless)
* LUCENE-2344: PostingsConsumer.merge was failing to call finishDoc,
which caused corruption for the sep codec. Also fixed several tests to
test all 4 core codecs. (Renaud Delbru via Mike McCandless)
New features
* LUCENE-1606, LUCENE-2089: Adds AutomatonQuery, a MultiTermQuery that
matches terms against a finite-state machine. Implements WildcardQuery
and FuzzyQuery with finite-state methods. Adds RegexpQuery (see the
example after this list).
(Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller)
* LUCENE-1990: Adds an internal packed ints implementation, to be used
for more efficient storage of int arrays when the values are
bounded, for example for storing the terms dict index. (Toke
Eskildsen via Mike McCandless)
* LUCENE-2321: Cutover to a more RAM efficient packed-ints based
representation for the in-memory terms dict index. (Mike
McCandless)
* LUCENE-2126: Add new classes for data (de)serialization: DataInput
and DataOutput. IndexInput and IndexOutput extend these new classes.
(Michael Busch)
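
A sketch of the new automaton-backed queries from LUCENE-1606/LUCENE-2089.
Field names and patterns are illustrative; RegexpQuery(Term) is assumed to be
the basic constructor added by this change, and WildcardQuery keeps its
existing constructor while now running on a finite-state machine.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RegexpQuery;
import org.apache.lucene.search.WildcardQuery;

public class AutomatonQueryExamples {
  public static Query regexp() {
    // Matches terms such as "lucene" in the "body" field.
    return new RegexpQuery(new Term("body", "lu.*ne"));
  }
  public static Query wildcard() {
    // Same API as before, now implemented with finite-state methods.
    return new WildcardQuery(new Term("body", "lu?ene*"));
  }
}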
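
And a sketch of the internal packed ints API from LUCENE-1990/LUCENE-2321.
This is an internal, experimental API; the entry points below assume the
org.apache.lucene.util.packed.PackedInts class as committed on the flex
branch.

import org.apache.lucene.util.packed.PackedInts;

public class PackedIntsExample {
  public static void main(String[] args) {
    // Store 1000 values, each known to fit in the bits required for 999.
    int bitsPerValue = PackedInts.bitsRequired(999);
    PackedInts.Mutable buf = PackedInts.getMutable(1000, bitsPerValue);
    buf.set(0, 42);
    buf.set(999, 7);
    System.out.println(buf.get(0) + " " + buf.get(999));
  }
}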
======================= Trunk (not yet released) =======================
Changes in backwards compatibility policy
@ -297,8 +371,8 @@ Optimizations
Build
* LUCENE-2124: Moved the JDK-based collation support from contrib/collation
into core, and moved the ICU-based collation support into contrib/icu.
(Robert Muir)
* LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards branch
is now included in the svn repository using "svn copy" after release.

View File

@ -237,4 +237,60 @@ http://www.python.org. Full license is here:
http://www.python.org/download/releases/2.4.2/license/
Some code in src/java/org/apache/lucene/util/automaton was
derived from Brics automaton sources available at
www.brics.dk/automaton/. Here is the copyright from those sources:
/*
* Copyright (c) 2001-2009 Anders Moeller
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. The name of the author may not be used to endorse or promote products
* derived from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
* IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
* THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
The levenshtein automata tables in src/java/org/apache/lucene/util/automaton
were automatically generated with the moman/finenight FSA package.
Here is the copyright for those sources:
# Copyright (c) 2010, Jean-Philippe Barrette-LaPierre, <jpb@rrette.com>
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.

View File

@ -46,3 +46,12 @@ provided by Xiaoping Gao and copyright 2009 by www.imdict.net.
ICU4J (under contrib/icu) is licensed under an MIT-style license
(contrib/icu/lib/ICU-LICENSE.txt) and Copyright (c) 1995-2008
International Business Machines Corporation and others
Brics Automaton (under src/java/org/apache/lucene/util/automaton) is
BSD-licensed, created by Anders Møller. See http://www.brics.dk/automaton/
The levenshtein automata tables (under src/java/org/apache/lucene/util/automaton) were
automatically generated with the moman/finenight FSA library, created by
Jean-Philippe Barrette-LaPierre. This library is available under an MIT license,
see http://sites.google.com/site/rrettesite/moman and
http://bitbucket.org/jpbarrette/moman/overview/

View File

@ -21,6 +21,7 @@ import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.util.BitVector;
import org.apache.lucene.index.codecs.Codec;
import java.io.IOException;
import java.util.List;
import java.util.Map;
@ -129,6 +130,12 @@ public final class SegmentInfo {
assert docStoreOffset == -1 || docStoreSegment != null: "dso=" + docStoreOffset + " dss=" + docStoreSegment + " docCount=" + docCount;
}
// stub
public SegmentInfo(String name, int docCount, Directory dir, boolean isCompoundFile, boolean hasSingleNormFile,
int docStoreOffset, String docStoreSegment, boolean docStoreIsCompoundFile, boolean hasProx,
Codec codec) {
}
/**
* Copy everything from src SegmentInfo into our instance.
*/

View File

@ -29,6 +29,8 @@ import org.apache.lucene.index.MergePolicy.MergeAbortedException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.index.codecs.Codec;
import org.apache.lucene.index.codecs.CodecProvider;
/**
* The SegmentMerger class combines two or more Segments, represented by an IndexReader ({@link #add},
@ -99,6 +101,11 @@ final class SegmentMerger {
termIndexInterval = writer.getTermIndexInterval();
}
// stub
SegmentMerger(Directory dir, int termIndexInterval, String name, MergePolicy.OneMerge merge, CodecProvider codecs) {
checkAbort = null;
}
boolean hasProx() {
return fieldInfos.hasProx();
}
@ -171,6 +178,11 @@ final class SegmentMerger {
}
}
// stub
final List<String> createCompoundFile(String fileName, SegmentInfo info) {
return null;
}
final List<String> createCompoundFile(String fileName)
throws IOException {
CompoundFileWriter cfsWriter =
@ -553,6 +565,11 @@ final class SegmentMerger {
}
}
// stub
Codec getCodec() {
return null;
}
private SegmentMergeQueue queue = null;
private final void mergeTerms() throws CorruptIndexException, IOException {

View File

@ -37,6 +37,7 @@ import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.util.BitVector;
import org.apache.lucene.util.CloseableThreadLocal;
import org.apache.lucene.index.codecs.CodecProvider;
/** @version $Id */
/**
@ -594,6 +595,17 @@ public class SegmentReader extends IndexReader implements Cloneable {
return instance;
}
// stub
public static SegmentReader get(boolean readOnly,
Directory dir,
SegmentInfo si,
int readBufferSize,
boolean doOpenStores,
int termInfosIndexDivisor,
CodecProvider codecs) {
return null;
}
void openDocStores() throws IOException {
core.openDocStores(si);
}

View File

@ -1,4 +1,4 @@
package org.apache.lucene.index;
package org.apache.lucene.index.codecs;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
@ -17,15 +17,7 @@ package org.apache.lucene.index;
* limitations under the License.
*/
import java.io.IOException;
// stub
public class Codec {
abstract class FormatPostingsPositionsConsumer {
/** Add a new position & payload. If payloadLength > 0
* you must read those bytes from the IndexInput. */
abstract void addPosition(int position, byte[] payload, int payloadOffset, int payloadLength) throws IOException;
/** Called when we are done adding positions & payloads */
abstract void finish() throws IOException;
}

View File

@ -0,0 +1,25 @@
package org.apache.lucene.index.codecs;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
// stub
public class CodecProvider {
public static CodecProvider getDefault() {
return null;
}
}

View File

@ -0,0 +1,234 @@
package org.apache.lucene.store;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
/**
* Abstract base class for performing read operations of Lucene's low-level
* data types.
*/
public abstract class DataInput implements Cloneable {
private byte[] bytes; // used by readString()
private char[] chars; // used by readModifiedUTF8String()
private boolean preUTF8Strings; // true if we are reading old (modified UTF8) string format
/** Reads and returns a single byte.
* @see DataOutput#writeByte(byte)
*/
public abstract byte readByte() throws IOException;
/** Reads a specified number of bytes into an array at the specified offset.
* @param b the array to read bytes into
* @param offset the offset in the array to start storing bytes
* @param len the number of bytes to read
* @see DataOutput#writeBytes(byte[],int)
*/
public abstract void readBytes(byte[] b, int offset, int len)
throws IOException;
/** Reads a specified number of bytes into an array at the
* specified offset with control over whether the read
* should be buffered (callers who have their own buffer
* should pass in "false" for useBuffer). Currently only
* {@link BufferedIndexInput} respects this parameter.
* @param b the array to read bytes into
* @param offset the offset in the array to start storing bytes
* @param len the number of bytes to read
* @param useBuffer set to false if the caller will handle
* buffering.
* @see DataOutput#writeBytes(byte[],int)
*/
public void readBytes(byte[] b, int offset, int len, boolean useBuffer)
throws IOException
{
// Default to ignoring useBuffer entirely
readBytes(b, offset, len);
}
/** Reads two bytes and returns a short.
* @see DataOutput#writeByte(byte)
*/
public short readShort() throws IOException {
return (short) (((readByte() & 0xFF) << 8) | (readByte() & 0xFF));
}
/** Reads four bytes and returns an int.
* @see DataOutput#writeInt(int)
*/
public int readInt() throws IOException {
return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
| ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
}
/** Reads an int stored in variable-length format. Reads between one and
* five bytes. Smaller values take fewer bytes. Negative numbers are not
* supported.
* @see DataOutput#writeVInt(int)
*/
public int readVInt() throws IOException {
byte b = readByte();
int i = b & 0x7F;
for (int shift = 7; (b & 0x80) != 0; shift += 7) {
b = readByte();
i |= (b & 0x7F) << shift;
}
return i;
}
/** Reads eight bytes and returns a long.
* @see DataOutput#writeLong(long)
*/
public long readLong() throws IOException {
return (((long)readInt()) << 32) | (readInt() & 0xFFFFFFFFL);
}
/** Reads a long stored in variable-length format. Reads between one and
* nine bytes. Smaller values take fewer bytes. Negative numbers are not
* supported. */
public long readVLong() throws IOException {
byte b = readByte();
long i = b & 0x7F;
for (int shift = 7; (b & 0x80) != 0; shift += 7) {
b = readByte();
i |= (b & 0x7FL) << shift;
}
return i;
}
/** Call this if readString should read characters stored
* in the old modified UTF8 format (length in java chars
* and java's modified UTF8 encoding). This is used for
* indices written pre-2.4 See LUCENE-510 for details. */
public void setModifiedUTF8StringsMode() {
preUTF8Strings = true;
}
/** Reads a string.
* @see DataOutput#writeString(String)
*/
public String readString() throws IOException {
if (preUTF8Strings)
return readModifiedUTF8String();
int length = readVInt();
if (bytes == null || length > bytes.length)
bytes = new byte[(int) (length*1.25)];
readBytes(bytes, 0, length);
return new String(bytes, 0, length, "UTF-8");
}
private String readModifiedUTF8String() throws IOException {
int length = readVInt();
if (chars == null || length > chars.length)
chars = new char[length];
readChars(chars, 0, length);
return new String(chars, 0, length);
}
/** Reads Lucene's old "modified UTF-8" encoded
* characters into an array.
* @param buffer the array to read characters into
* @param start the offset in the array to start storing characters
* @param length the number of characters to read
* @see DataOutput#writeChars(String,int,int)
* @deprecated -- please use readString or readBytes
* instead, and construct the string
* from those utf8 bytes
*/
@Deprecated
public void readChars(char[] buffer, int start, int length)
throws IOException {
final int end = start + length;
for (int i = start; i < end; i++) {
byte b = readByte();
if ((b & 0x80) == 0)
buffer[i] = (char)(b & 0x7F);
else if ((b & 0xE0) != 0xE0) {
buffer[i] = (char)(((b & 0x1F) << 6)
| (readByte() & 0x3F));
} else {
buffer[i] = (char)(((b & 0x0F) << 12)
| ((readByte() & 0x3F) << 6)
| (readByte() & 0x3F));
}
}
}
/**
* Expert
*
* Similar to {@link #readChars(char[], int, int)} but does not do any conversion operations on the bytes it is reading in. It still
* has to invoke {@link #readByte()} just as {@link #readChars(char[], int, int)} does, but it does not need a buffer to store anything
* and it does not have to do any of the bitwise operations, since we don't actually care what is in the byte except to determine
* how many more bytes to read
* @param length The number of chars to read
* @deprecated this method operates on old "modified utf8" encoded
* strings
*/
@Deprecated
public void skipChars(int length) throws IOException{
for (int i = 0; i < length; i++) {
byte b = readByte();
if ((b & 0x80) == 0){
//do nothing, we only need one byte
} else if ((b & 0xE0) != 0xE0) {
readByte();//read an additional byte
} else {
//read two additional bytes.
readByte();
readByte();
}
}
}
/** Returns a clone of this stream.
*
* <p>Clones of a stream access the same data, and are positioned at the same
* point as the stream they were cloned from.
*
* <p>Expert: Subclasses must ensure that clones may be positioned at
* different points in the input from each other and from the stream they
* were cloned from.
*/
@Override
public Object clone() {
DataInput clone = null;
try {
clone = (DataInput)super.clone();
} catch (CloneNotSupportedException e) {}
clone.bytes = null;
clone.chars = null;
return clone;
}
public Map<String,String> readStringStringMap() throws IOException {
final Map<String,String> map = new HashMap<String,String>();
final int count = readInt();
for(int i=0;i<count;i++) {
final String key = readString();
final String val = readString();
map.put(key, val);
}
return map;
}
}
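
A minimal sketch (not part of this patch) of a concrete DataInput backed by a
plain byte[], implementing only the two abstract methods above. With it,
calling readVInt() on the bytes {0x85, 0x01} yields 133, showing how the
variable-length readers build on readByte/readBytes. The class name is
hypothetical.

import java.io.IOException;
import org.apache.lucene.store.DataInput;

public class ByteArrayDataInputSketch extends DataInput {
  private final byte[] data;
  private int pos;

  public ByteArrayDataInputSketch(byte[] data) {
    this.data = data;
  }

  @Override
  public byte readByte() throws IOException {
    // Return the next byte and advance the position.
    return data[pos++];
  }

  @Override
  public void readBytes(byte[] b, int offset, int len) throws IOException {
    // Bulk copy from the backing array and advance the position.
    System.arraycopy(data, pos, b, offset, len);
    pos += len;
  }
}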

View File

@ -0,0 +1,194 @@
package org.apache.lucene.store;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
import java.util.Map;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.UnicodeUtil;
/**
* Abstract base class for performing write operations of Lucene's low-level
* data types.
*/
public abstract class DataOutput {
private BytesRef utf8Result = new BytesRef(10);
/** Writes a single byte.
* @see IndexInput#readByte()
*/
public abstract void writeByte(byte b) throws IOException;
/** Writes an array of bytes.
* @param b the bytes to write
* @param length the number of bytes to write
* @see DataInput#readBytes(byte[],int,int)
*/
public void writeBytes(byte[] b, int length) throws IOException {
writeBytes(b, 0, length);
}
/** Writes an array of bytes.
* @param b the bytes to write
* @param offset the offset in the byte array
* @param length the number of bytes to write
* @see DataInput#readBytes(byte[],int,int)
*/
public abstract void writeBytes(byte[] b, int offset, int length) throws IOException;
/** Writes an int as four bytes.
* @see DataInput#readInt()
*/
public void writeInt(int i) throws IOException {
writeByte((byte)(i >> 24));
writeByte((byte)(i >> 16));
writeByte((byte)(i >> 8));
writeByte((byte) i);
}
/** Writes an int in a variable-length format. Writes between one and
* five bytes. Smaller values take fewer bytes. Negative numbers are not
* supported.
* @see DataInput#readVInt()
*/
public void writeVInt(int i) throws IOException {
while ((i & ~0x7F) != 0) {
writeByte((byte)((i & 0x7f) | 0x80));
i >>>= 7;
}
writeByte((byte)i);
}
/** Writes a long as eight bytes.
* @see DataInput#readLong()
*/
public void writeLong(long i) throws IOException {
writeInt((int) (i >> 32));
writeInt((int) i);
}
/** Writes a long in a variable-length format. Writes between one and
* nine bytes. Smaller values take fewer bytes. Negative numbers are not
* supported.
* @see DataInput#readVLong()
*/
public void writeVLong(long i) throws IOException {
while ((i & ~0x7F) != 0) {
writeByte((byte)((i & 0x7f) | 0x80));
i >>>= 7;
}
writeByte((byte)i);
}
/** Writes a string.
* @see DataInput#readString()
*/
public void writeString(String s) throws IOException {
UnicodeUtil.UTF16toUTF8(s, 0, s.length(), utf8Result);
writeVInt(utf8Result.length);
writeBytes(utf8Result.bytes, 0, utf8Result.length);
}
/** Writes a sub sequence of characters from s as the old
* format (modified UTF-8 encoded bytes).
* @param s the source of the characters
* @param start the first character in the sequence
* @param length the number of characters in the sequence
* @deprecated -- please pre-convert to utf8 bytes
* instead or use {@link #writeString}
*/
@Deprecated
public void writeChars(String s, int start, int length)
throws IOException {
final int end = start + length;
for (int i = start; i < end; i++) {
final int code = s.charAt(i);
if (code >= 0x01 && code <= 0x7F)
writeByte((byte)code);
else if (((code >= 0x80) && (code <= 0x7FF)) || code == 0) {
writeByte((byte)(0xC0 | (code >> 6)));
writeByte((byte)(0x80 | (code & 0x3F)));
} else {
writeByte((byte)(0xE0 | (code >>> 12)));
writeByte((byte)(0x80 | ((code >> 6) & 0x3F)));
writeByte((byte)(0x80 | (code & 0x3F)));
}
}
}
/** Writes a sub sequence of characters from char[] as
* the old format (modified UTF-8 encoded bytes).
* @param s the source of the characters
* @param start the first character in the sequence
* @param length the number of characters in the sequence
* @deprecated -- please pre-convert to utf8 bytes instead or use {@link #writeString}
*/
@Deprecated
public void writeChars(char[] s, int start, int length)
throws IOException {
final int end = start + length;
for (int i = start; i < end; i++) {
final int code = s[i];
if (code >= 0x01 && code <= 0x7F)
writeByte((byte)code);
else if (((code >= 0x80) && (code <= 0x7FF)) || code == 0) {
writeByte((byte)(0xC0 | (code >> 6)));
writeByte((byte)(0x80 | (code & 0x3F)));
} else {
writeByte((byte)(0xE0 | (code >>> 12)));
writeByte((byte)(0x80 | ((code >> 6) & 0x3F)));
writeByte((byte)(0x80 | (code & 0x3F)));
}
}
}
private static int COPY_BUFFER_SIZE = 16384;
private byte[] copyBuffer;
/** Copy numBytes bytes from input to ourself. */
public void copyBytes(DataInput input, long numBytes) throws IOException {
assert numBytes >= 0: "numBytes=" + numBytes;
long left = numBytes;
if (copyBuffer == null)
copyBuffer = new byte[COPY_BUFFER_SIZE];
while(left > 0) {
final int toCopy;
if (left > COPY_BUFFER_SIZE)
toCopy = COPY_BUFFER_SIZE;
else
toCopy = (int) left;
input.readBytes(copyBuffer, 0, toCopy);
writeBytes(copyBuffer, 0, toCopy);
left -= toCopy;
}
}
public void writeStringStringMap(Map<String,String> map) throws IOException {
if (map == null) {
writeInt(0);
} else {
writeInt(map.size());
for(final Map.Entry<String, String> entry: map.entrySet()) {
writeString(entry.getKey());
writeString(entry.getValue());
}
}
}
}
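
Correspondingly, a minimal sketch (not part of this patch) of a DataOutput
that appends to an in-memory buffer, implementing only the two abstract
methods and showing how writeVInt reduces to writeByte. The class name is
hypothetical.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

public class ByteArrayDataOutputSketch extends DataOutput {
  private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

  @Override
  public void writeByte(byte b) throws IOException {
    buf.write(b);
  }

  @Override
  public void writeBytes(byte[] b, int offset, int length) throws IOException {
    buf.write(b, offset, length);
  }

  public byte[] toByteArray() {
    return buf.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    ByteArrayDataOutputSketch out = new ByteArrayDataOutputSketch();
    out.writeVInt(133);                            // two bytes: 0x85 0x01
    out.writeVInt(5);                              // one byte:  0x05
    System.out.println(out.toByteArray().length);  // prints 3
  }
}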

View File

@ -17,180 +17,14 @@ package org.apache.lucene.store;
* limitations under the License.
*/
import java.io.IOException;
import java.io.Closeable;
import java.util.Map;
import java.util.HashMap;
/** Abstract base class for input from a file in a {@link Directory}. A
* random-access input stream. Used for all Lucene index input operations.
* @see Directory
*/
public abstract class IndexInput implements Cloneable,Closeable {
private byte[] bytes; // used by readString()
private char[] chars; // used by readModifiedUTF8String()
private boolean preUTF8Strings; // true if we are reading old (modified UTF8) string format
/** Reads and returns a single byte.
* @see IndexOutput#writeByte(byte)
*/
public abstract byte readByte() throws IOException;
/** Reads a specified number of bytes into an array at the specified offset.
* @param b the array to read bytes into
* @param offset the offset in the array to start storing bytes
* @param len the number of bytes to read
* @see IndexOutput#writeBytes(byte[],int)
*/
public abstract void readBytes(byte[] b, int offset, int len)
throws IOException;
/** Reads a specified number of bytes into an array at the
* specified offset with control over whether the read
* should be buffered (callers who have their own buffer
* should pass in "false" for useBuffer). Currently only
* {@link BufferedIndexInput} respects this parameter.
* @param b the array to read bytes into
* @param offset the offset in the array to start storing bytes
* @param len the number of bytes to read
* @param useBuffer set to false if the caller will handle
* buffering.
* @see IndexOutput#writeBytes(byte[],int)
*/
public void readBytes(byte[] b, int offset, int len, boolean useBuffer)
throws IOException
{
// Default to ignoring useBuffer entirely
readBytes(b, offset, len);
}
/** Reads four bytes and returns an int.
* @see IndexOutput#writeInt(int)
*/
public int readInt() throws IOException {
return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
| ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
}
/** Reads an int stored in variable-length format. Reads between one and
* five bytes. Smaller values take fewer bytes. Negative numbers are not
* supported.
* @see IndexOutput#writeVInt(int)
*/
public int readVInt() throws IOException {
byte b = readByte();
int i = b & 0x7F;
for (int shift = 7; (b & 0x80) != 0; shift += 7) {
b = readByte();
i |= (b & 0x7F) << shift;
}
return i;
}
/** Reads eight bytes and returns a long.
* @see IndexOutput#writeLong(long)
*/
public long readLong() throws IOException {
return (((long)readInt()) << 32) | (readInt() & 0xFFFFFFFFL);
}
/** Reads a long stored in variable-length format. Reads between one and
* nine bytes. Smaller values take fewer bytes. Negative numbers are not
* supported. */
public long readVLong() throws IOException {
byte b = readByte();
long i = b & 0x7F;
for (int shift = 7; (b & 0x80) != 0; shift += 7) {
b = readByte();
i |= (b & 0x7FL) << shift;
}
return i;
}
/** Call this if readString should read characters stored
* in the old modified UTF8 format (length in java chars
* and java's modified UTF8 encoding). This is used for
* indices written pre-2.4 See LUCENE-510 for details. */
public void setModifiedUTF8StringsMode() {
preUTF8Strings = true;
}
/** Reads a string.
* @see IndexOutput#writeString(String)
*/
public String readString() throws IOException {
if (preUTF8Strings)
return readModifiedUTF8String();
int length = readVInt();
if (bytes == null || length > bytes.length)
bytes = new byte[(int) (length*1.25)];
readBytes(bytes, 0, length);
return new String(bytes, 0, length, "UTF-8");
}
private String readModifiedUTF8String() throws IOException {
int length = readVInt();
if (chars == null || length > chars.length)
chars = new char[length];
readChars(chars, 0, length);
return new String(chars, 0, length);
}
/** Reads Lucene's old "modified UTF-8" encoded
* characters into an array.
* @param buffer the array to read characters into
* @param start the offset in the array to start storing characters
* @param length the number of characters to read
* @see IndexOutput#writeChars(String,int,int)
* @deprecated -- please use readString or readBytes
* instead, and construct the string
* from those utf8 bytes
*/
public void readChars(char[] buffer, int start, int length)
throws IOException {
final int end = start + length;
for (int i = start; i < end; i++) {
byte b = readByte();
if ((b & 0x80) == 0)
buffer[i] = (char)(b & 0x7F);
else if ((b & 0xE0) != 0xE0) {
buffer[i] = (char)(((b & 0x1F) << 6)
| (readByte() & 0x3F));
} else
buffer[i] = (char)(((b & 0x0F) << 12)
| ((readByte() & 0x3F) << 6)
| (readByte() & 0x3F));
}
}
/**
* Expert
*
* Similar to {@link #readChars(char[], int, int)} but does not do any conversion operations on the bytes it is reading in. It still
* has to invoke {@link #readByte()} just as {@link #readChars(char[], int, int)} does, but it does not need a buffer to store anything
* and it does not have to do any of the bitwise operations, since we don't actually care what is in the byte except to determine
* how many more bytes to read
* @param length The number of chars to read
* @deprecated this method operates on old "modified utf8" encoded
* strings
*/
public void skipChars(int length) throws IOException{
for (int i = 0; i < length; i++) {
byte b = readByte();
if ((b & 0x80) == 0){
//do nothing, we only need one byte
}
else if ((b & 0xE0) != 0xE0) {
readByte();//read an additional byte
} else{
//read two additional bytes.
readByte();
readByte();
}
}
}
public abstract class IndexInput extends DataInput implements Cloneable,Closeable {
/** Closes the stream to further operations. */
public abstract void close() throws IOException;
@ -207,38 +41,4 @@ public abstract class IndexInput implements Cloneable,Closeable {
/** The number of bytes in the file. */
public abstract long length();
/** Returns a clone of this stream.
*
* <p>Clones of a stream access the same data, and are positioned at the same
* point as the stream they were cloned from.
*
* <p>Expert: Subclasses must ensure that clones may be positioned at
* different points in the input from each other and from the stream they
* were cloned from.
*/
@Override
public Object clone() {
IndexInput clone = null;
try {
clone = (IndexInput)super.clone();
} catch (CloneNotSupportedException e) {}
clone.bytes = null;
clone.chars = null;
return clone;
}
public Map<String,String> readStringStringMap() throws IOException {
final Map<String,String> map = new HashMap<String,String>();
final int count = readInt();
for(int i=0;i<count;i++) {
final String key = readString();
final String val = readString();
map.put(key, val);
}
return map;
}
}

View File

@ -17,166 +17,15 @@ package org.apache.lucene.store;
* limitations under the License.
*/
import java.io.IOException;
import java.io.Closeable;
import java.util.Map;
import org.apache.lucene.util.UnicodeUtil;
/** Abstract base class for output to a file in a Directory. A random-access
* output stream. Used for all Lucene index output operations.
* @see Directory
* @see IndexInput
*/
public abstract class IndexOutput implements Closeable {
private UnicodeUtil.UTF8Result utf8Result = new UnicodeUtil.UTF8Result();
/** Writes a single byte.
* @see IndexInput#readByte()
*/
public abstract void writeByte(byte b) throws IOException;
/** Writes an array of bytes.
* @param b the bytes to write
* @param length the number of bytes to write
* @see IndexInput#readBytes(byte[],int,int)
*/
public void writeBytes(byte[] b, int length) throws IOException {
writeBytes(b, 0, length);
}
/** Writes an array of bytes.
* @param b the bytes to write
* @param offset the offset in the byte array
* @param length the number of bytes to write
* @see IndexInput#readBytes(byte[],int,int)
*/
public abstract void writeBytes(byte[] b, int offset, int length) throws IOException;
/** Writes an int as four bytes.
* @see IndexInput#readInt()
*/
public void writeInt(int i) throws IOException {
writeByte((byte)(i >> 24));
writeByte((byte)(i >> 16));
writeByte((byte)(i >> 8));
writeByte((byte) i);
}
/** Writes an int in a variable-length format. Writes between one and
* five bytes. Smaller values take fewer bytes. Negative numbers are not
* supported.
* @see IndexInput#readVInt()
*/
public void writeVInt(int i) throws IOException {
while ((i & ~0x7F) != 0) {
writeByte((byte)((i & 0x7f) | 0x80));
i >>>= 7;
}
writeByte((byte)i);
}
/** Writes a long as eight bytes.
* @see IndexInput#readLong()
*/
public void writeLong(long i) throws IOException {
writeInt((int) (i >> 32));
writeInt((int) i);
}
/** Writes an long in a variable-length format. Writes between one and five
* bytes. Smaller values take fewer bytes. Negative numbers are not
* supported.
* @see IndexInput#readVLong()
*/
public void writeVLong(long i) throws IOException {
while ((i & ~0x7F) != 0) {
writeByte((byte)((i & 0x7f) | 0x80));
i >>>= 7;
}
writeByte((byte)i);
}
/** Writes a string.
* @see IndexInput#readString()
*/
public void writeString(String s) throws IOException {
UnicodeUtil.UTF16toUTF8(s, 0, s.length(), utf8Result);
writeVInt(utf8Result.length);
writeBytes(utf8Result.result, 0, utf8Result.length);
}
/** Writes a sub sequence of characters from s as the old
* format (modified UTF-8 encoded bytes).
* @param s the source of the characters
* @param start the first character in the sequence
* @param length the number of characters in the sequence
* @deprecated -- please pre-convert to utf8 bytes
* instead or use {@link #writeString}
*/
public void writeChars(String s, int start, int length)
throws IOException {
final int end = start + length;
for (int i = start; i < end; i++) {
final int code = (int)s.charAt(i);
if (code >= 0x01 && code <= 0x7F)
writeByte((byte)code);
else if (((code >= 0x80) && (code <= 0x7FF)) || code == 0) {
writeByte((byte)(0xC0 | (code >> 6)));
writeByte((byte)(0x80 | (code & 0x3F)));
} else {
writeByte((byte)(0xE0 | (code >>> 12)));
writeByte((byte)(0x80 | ((code >> 6) & 0x3F)));
writeByte((byte)(0x80 | (code & 0x3F)));
}
}
}
/** Writes a sub sequence of characters from char[] as
* the old format (modified UTF-8 encoded bytes).
* @param s the source of the characters
* @param start the first character in the sequence
* @param length the number of characters in the sequence
* @deprecated -- please pre-convert to utf8 bytes instead or use {@link #writeString}
*/
public void writeChars(char[] s, int start, int length)
throws IOException {
final int end = start + length;
for (int i = start; i < end; i++) {
final int code = (int)s[i];
if (code >= 0x01 && code <= 0x7F)
writeByte((byte)code);
else if (((code >= 0x80) && (code <= 0x7FF)) || code == 0) {
writeByte((byte)(0xC0 | (code >> 6)));
writeByte((byte)(0x80 | (code & 0x3F)));
} else {
writeByte((byte)(0xE0 | (code >>> 12)));
writeByte((byte)(0x80 | ((code >> 6) & 0x3F)));
writeByte((byte)(0x80 | (code & 0x3F)));
}
}
}
private static int COPY_BUFFER_SIZE = 16384;
private byte[] copyBuffer;
/** Copy numBytes bytes from input to ourself. */
public void copyBytes(IndexInput input, long numBytes) throws IOException {
assert numBytes >= 0: "numBytes=" + numBytes;
long left = numBytes;
if (copyBuffer == null)
copyBuffer = new byte[COPY_BUFFER_SIZE];
while(left > 0) {
final int toCopy;
if (left > COPY_BUFFER_SIZE)
toCopy = COPY_BUFFER_SIZE;
else
toCopy = (int) left;
input.readBytes(copyBuffer, 0, toCopy);
writeBytes(copyBuffer, 0, toCopy);
left -= toCopy;
}
}
public abstract class IndexOutput extends DataOutput implements Closeable {
/** Forces any buffered output to be written. */
public abstract void flush() throws IOException;
@ -208,17 +57,5 @@ public abstract class IndexOutput implements Closeable {
* undefined. Otherwise the file is truncated.
* @param length file length
*/
public void setLength(long length) throws IOException {};
public void writeStringStringMap(Map<String,String> map) throws IOException {
if (map == null) {
writeInt(0);
} else {
writeInt(map.size());
for(final Map.Entry<String, String> entry: map.entrySet()) {
writeString(entry.getKey());
writeString(entry.getValue());
}
}
}
public void setLength(long length) throws IOException {}
}

View File

@ -0,0 +1,27 @@
package org.apache.lucene.util;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
// stub for tests only
public class BytesRef {
public BytesRef(int capacity) {}
public BytesRef() {}
public byte[] bytes;
public int offset;
public int length;
};

View File

@ -106,6 +106,10 @@ final public class UnicodeUtil {
}
}
// stubs for tests only
public static void UTF16toUTF8(char[] source, int offset, int length, BytesRef result) {}
public static void UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result) {}
/** Encode characters from a char[] source, starting at
* offset and stopping when the character 0xffff is seen.
* Returns the number of bytes written to bytesOut. */
@ -223,7 +227,7 @@ final public class UnicodeUtil {
/** Encode characters from this String, starting at offset
* for length characters. Returns the number of bytes
* written to bytesOut. */
public static void UTF16toUTF8(final String s, final int offset, final int length, UTF8Result result) {
public static void UTF16toUTF8(final CharSequence s, final int offset, final int length, UTF8Result result) {
final int end = offset + length;
byte[] out = result.result;

View File

@ -1,73 +0,0 @@
package org.apache.lucene.analysis;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.util.NumericUtils;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
public class TestNumericTokenStream extends BaseTokenStreamTestCase {
static final long lvalue = 4573245871874382L;
static final int ivalue = 123456;
public void testLongStream() throws Exception {
final NumericTokenStream stream=new NumericTokenStream().setLongValue(lvalue);
// use getAttribute to test if attributes really exist, if not an IAE will be throwed
final TermAttribute termAtt = stream.getAttribute(TermAttribute.class);
final TypeAttribute typeAtt = stream.getAttribute(TypeAttribute.class);
for (int shift=0; shift<64; shift+=NumericUtils.PRECISION_STEP_DEFAULT) {
assertTrue("New token is available", stream.incrementToken());
assertEquals("Term is correctly encoded", NumericUtils.longToPrefixCoded(lvalue, shift), termAtt.term());
assertEquals("Type correct", (shift == 0) ? NumericTokenStream.TOKEN_TYPE_FULL_PREC : NumericTokenStream.TOKEN_TYPE_LOWER_PREC, typeAtt.type());
}
assertFalse("No more tokens available", stream.incrementToken());
}
public void testIntStream() throws Exception {
final NumericTokenStream stream=new NumericTokenStream().setIntValue(ivalue);
// use getAttribute to test if attributes really exist, if not an IAE will be throwed
final TermAttribute termAtt = stream.getAttribute(TermAttribute.class);
final TypeAttribute typeAtt = stream.getAttribute(TypeAttribute.class);
for (int shift=0; shift<32; shift+=NumericUtils.PRECISION_STEP_DEFAULT) {
assertTrue("New token is available", stream.incrementToken());
assertEquals("Term is correctly encoded", NumericUtils.intToPrefixCoded(ivalue, shift), termAtt.term());
assertEquals("Type correct", (shift == 0) ? NumericTokenStream.TOKEN_TYPE_FULL_PREC : NumericTokenStream.TOKEN_TYPE_LOWER_PREC, typeAtt.type());
}
assertFalse("No more tokens available", stream.incrementToken());
}
public void testNotInitialized() throws Exception {
final NumericTokenStream stream=new NumericTokenStream();
try {
stream.reset();
fail("reset() should not succeed.");
} catch (IllegalStateException e) {
// pass
}
try {
stream.incrementToken();
fail("incrementToken() should not succeed.");
} catch (IllegalStateException e) {
// pass
}
}
}

View File

@ -107,10 +107,10 @@ public class TestTermAttributeImpl extends LuceneTestCase {
char[] b = {'a', 'l', 'o', 'h', 'a'};
TermAttributeImpl t = new TermAttributeImpl();
t.setTermBuffer(b, 0, 5);
assertEquals("term=aloha", t.toString());
assertEquals("aloha", t.toString());
t.setTermBuffer("hi there");
assertEquals("term=hi there", t.toString());
assertEquals("hi there", t.toString());
}
public void testMixedStringArray() throws Exception {

View File

@ -35,6 +35,7 @@ import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.index.codecs.CodecProvider;
/** JUnit adaptation of an older test case DocTest. */
@ -180,20 +181,24 @@ public class TestDoc extends LuceneTestCase {
SegmentReader r1 = SegmentReader.get(true, si1, IndexReader.DEFAULT_TERMS_INDEX_DIVISOR);
SegmentReader r2 = SegmentReader.get(true, si2, IndexReader.DEFAULT_TERMS_INDEX_DIVISOR);
SegmentMerger merger = new SegmentMerger(si1.dir, merged);
SegmentMerger merger = new SegmentMerger(si1.dir, IndexWriter.DEFAULT_TERM_INDEX_INTERVAL, merged, null, CodecProvider.getDefault());
merger.add(r1);
merger.add(r2);
merger.merge();
merger.closeReaders();
final SegmentInfo info = new SegmentInfo(merged, si1.docCount + si2.docCount, si1.dir,
useCompoundFile, true, -1, null, false, merger.hasProx(),
merger.getCodec());
if (useCompoundFile) {
List filesToDelete = merger.createCompoundFile(merged + ".cfs");
List filesToDelete = merger.createCompoundFile(merged + ".cfs", info);
for (Iterator iter = filesToDelete.iterator(); iter.hasNext();)
si1.dir.deleteFile((String) iter.next());
}
return new SegmentInfo(merged, si1.docCount + si2.docCount, si1.dir, useCompoundFile, true);
return info;
}

View File

@ -986,29 +986,7 @@ public class TestIndexReader extends LuceneTestCase
// new IndexFileDeleter, have it delete
// unreferenced files, then verify that in fact
// no files were deleted:
String[] startFiles = dir.listAll();
SegmentInfos infos = new SegmentInfos();
infos.read(dir);
new IndexFileDeleter(dir, new KeepOnlyLastCommitDeletionPolicy(), infos, null, null);
String[] endFiles = dir.listAll();
Arrays.sort(startFiles);
Arrays.sort(endFiles);
//for(int i=0;i<startFiles.length;i++) {
// System.out.println(" startFiles: " + i + ": " + startFiles[i]);
//}
if (!Arrays.equals(startFiles, endFiles)) {
String successStr;
if (success) {
successStr = "success";
} else {
successStr = "IOException";
err.printStackTrace();
}
fail("reader.close() failed to delete unreferenced files after " + successStr + " (" + diskFree + " bytes): before delete:\n " + arrayToString(startFiles) + "\n after delete:\n " + arrayToString(endFiles));
}
TestIndexWriter.assertNoUnreferencedFiles(dir, "reader.close() failed to delete unreferenced files");
// Finally, verify index is not corrupt, and, if
// we succeeded, we see all docs changed, and if
@ -1760,7 +1738,6 @@ public class TestIndexReader extends LuceneTestCase
} catch (IllegalStateException ise) {
// expected
}
assertFalse(((SegmentReader) r.getSequentialSubReaders()[0]).termsIndexLoaded());
assertEquals(-1, ((SegmentReader) r.getSequentialSubReaders()[0]).getTermInfosIndexDivisor());
writer = new IndexWriter(dir, new WhitespaceAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
@ -1773,7 +1750,12 @@ public class TestIndexReader extends LuceneTestCase
IndexReader[] subReaders = r2.getSequentialSubReaders();
assertEquals(2, subReaders.length);
for(int i=0;i<2;i++) {
assertFalse(((SegmentReader) subReaders[i]).termsIndexLoaded());
try {
subReaders[i].docFreq(new Term("field", "f"));
fail("did not hit expected exception");
} catch (IllegalStateException ise) {
// expected
}
}
r2.close();
dir.close();

View File

@ -61,8 +61,10 @@ import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.Lock;
import org.apache.lucene.store.LockFactory;
import org.apache.lucene.store.MockRAMDirectory;
import org.apache.lucene.store.NoLockFactory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.store.SingleInstanceLockFactory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.UnicodeUtil;
import org.apache.lucene.util._TestUtil;
import org.apache.lucene.util.Version;
@ -524,10 +526,15 @@ public class TestIndexWriter extends LuceneTestCase {
}
public static void assertNoUnreferencedFiles(Directory dir, String message) throws IOException {
String[] startFiles = dir.listAll();
SegmentInfos infos = new SegmentInfos();
infos.read(dir);
new IndexFileDeleter(dir, new KeepOnlyLastCommitDeletionPolicy(), infos, null, null);
final LockFactory lf = dir.getLockFactory();
String[] startFiles;
try {
dir.setLockFactory(new NoLockFactory());
startFiles = dir.listAll();
new IndexWriter(dir, new WhitespaceAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED).close();
} finally {
dir.setLockFactory(lf);
}
String[] endFiles = dir.listAll();
Arrays.sort(startFiles);
@ -3309,7 +3316,7 @@ public class TestIndexWriter extends LuceneTestCase {
// LUCENE-510
public void testAllUnicodeChars() throws Throwable {
UnicodeUtil.UTF8Result utf8 = new UnicodeUtil.UTF8Result();
BytesRef utf8 = new BytesRef(10);
UnicodeUtil.UTF16Result utf16 = new UnicodeUtil.UTF16Result();
char[] chars = new char[2];
for(int ch=0;ch<0x0010FFFF;ch++) {
@ -3329,16 +3336,16 @@ public class TestIndexWriter extends LuceneTestCase {
UnicodeUtil.UTF16toUTF8(chars, 0, len, utf8);
String s1 = new String(chars, 0, len);
String s2 = new String(utf8.result, 0, utf8.length, "UTF-8");
String s2 = new String(utf8.bytes, 0, utf8.length, "UTF-8");
assertEquals("codepoint " + ch, s1, s2);
UnicodeUtil.UTF8toUTF16(utf8.result, 0, utf8.length, utf16);
UnicodeUtil.UTF8toUTF16(utf8.bytes, 0, utf8.length, utf16);
assertEquals("codepoint " + ch, s1, new String(utf16.result, 0, utf16.length));
byte[] b = s1.getBytes("UTF-8");
assertEquals(utf8.length, b.length);
for(int j=0;j<utf8.length;j++)
assertEquals(utf8.result[j], b[j]);
assertEquals(utf8.bytes[j], b[j]);
}
}
@ -3403,7 +3410,7 @@ public class TestIndexWriter extends LuceneTestCase {
char[] buffer = new char[20];
char[] expected = new char[20];
UnicodeUtil.UTF8Result utf8 = new UnicodeUtil.UTF8Result();
BytesRef utf8 = new BytesRef(20);
UnicodeUtil.UTF16Result utf16 = new UnicodeUtil.UTF16Result();
for(int iter=0;iter<100000;iter++) {
@ -3414,10 +3421,10 @@ public class TestIndexWriter extends LuceneTestCase {
byte[] b = new String(buffer, 0, 20).getBytes("UTF-8");
assertEquals(b.length, utf8.length);
for(int i=0;i<b.length;i++)
assertEquals(b[i], utf8.result[i]);
assertEquals(b[i], utf8.bytes[i]);
}
UnicodeUtil.UTF8toUTF16(utf8.result, 0, utf8.length, utf16);
UnicodeUtil.UTF8toUTF16(utf8.bytes, 0, utf8.length, utf16);
assertEquals(utf16.length, 20);
for(int i=0;i<20;i++)
assertEquals(expected[i], utf16.result[i]);
@ -3430,7 +3437,7 @@ public class TestIndexWriter extends LuceneTestCase {
char[] buffer = new char[20];
char[] expected = new char[20];
UnicodeUtil.UTF8Result utf8 = new UnicodeUtil.UTF8Result();
BytesRef utf8 = new BytesRef(20);
UnicodeUtil.UTF16Result utf16 = new UnicodeUtil.UTF16Result();
UnicodeUtil.UTF16Result utf16a = new UnicodeUtil.UTF16Result();
@ -3453,7 +3460,7 @@ public class TestIndexWriter extends LuceneTestCase {
byte[] b = new String(buffer, 0, 20).getBytes("UTF-8");
assertEquals(b.length, utf8.length);
for(int i=0;i<b.length;i++)
assertEquals(b[i], utf8.result[i]);
assertEquals(b[i], utf8.bytes[i]);
}
int bytePrefix = 20;
@ -3461,18 +3468,18 @@ public class TestIndexWriter extends LuceneTestCase {
bytePrefix = 0;
else
for(int i=0;i<20;i++)
if (last[i] != utf8.result[i]) {
if (last[i] != utf8.bytes[i]) {
bytePrefix = i;
break;
}
System.arraycopy(utf8.result, 0, last, 0, utf8.length);
System.arraycopy(utf8.bytes, 0, last, 0, utf8.length);
UnicodeUtil.UTF8toUTF16(utf8.result, bytePrefix, utf8.length-bytePrefix, utf16);
UnicodeUtil.UTF8toUTF16(utf8.bytes, bytePrefix, utf8.length-bytePrefix, utf16);
assertEquals(20, utf16.length);
for(int i=0;i<20;i++)
assertEquals(expected[i], utf16.result[i]);
UnicodeUtil.UTF8toUTF16(utf8.result, 0, utf8.length, utf16a);
UnicodeUtil.UTF8toUTF16(utf8.bytes, 0, utf8.length, utf16a);
assertEquals(20, utf16a.length);
for(int i=0;i<20;i++)
assertEquals(expected[i], utf16a.result[i]);
@ -4331,10 +4338,6 @@ public class TestIndexWriter extends LuceneTestCase {
assertTrue(dir.fileExists("myrandomfile"));
// Make sure this does not copy myrandomfile:
Directory dir2 = new RAMDirectory(dir);
assertTrue(!dir2.fileExists("myrandomfile"));
} finally {
dir.close();
_TestUtil.rmDir(indexDir);

View File

@ -784,20 +784,8 @@ public class TestIndexWriterDelete extends LuceneTestCase {
}
}
String[] startFiles = dir.listAll();
SegmentInfos infos = new SegmentInfos();
infos.read(dir);
new IndexFileDeleter(dir, new KeepOnlyLastCommitDeletionPolicy(), infos, null, null);
String[] endFiles = dir.listAll();
if (!Arrays.equals(startFiles, endFiles)) {
fail("docswriter abort() failed to delete unreferenced files:\n before delete:\n "
+ arrayToString(startFiles) + "\n after delete:\n "
+ arrayToString(endFiles));
}
modifier.close();
TestIndexWriter.assertNoUnreferencedFiles(dir, "docsWriter.abort() failed to delete unreferenced files");
modifier.close();
}
private String arrayToString(String[] l) {

View File

@ -86,7 +86,7 @@ public class TestIndexWriterReader extends LuceneTestCase {
// get a reader
IndexReader r1 = writer.getReader();
assertTrue(r1.isCurrent());
//assertTrue(r1.isCurrent());
String id10 = r1.document(10).getField("id").stringValue();
@ -94,7 +94,7 @@ public class TestIndexWriterReader extends LuceneTestCase {
newDoc.removeField("id");
newDoc.add(new Field("id", Integer.toString(8000), Store.YES, Index.NOT_ANALYZED));
writer.updateDocument(new Term("id", id10), newDoc);
assertFalse(r1.isCurrent());
//assertFalse(r1.isCurrent());
IndexReader r2 = writer.getReader();
assertTrue(r2.isCurrent());
@ -157,7 +157,7 @@ public class TestIndexWriterReader extends LuceneTestCase {
IndexReader r0 = writer.getReader();
assertTrue(r0.isCurrent());
writer.addIndexesNoOptimize(new Directory[] { dir2 });
assertFalse(r0.isCurrent());
//assertFalse(r0.isCurrent());
r0.close();
IndexReader r1 = writer.getReader();

View File

@ -48,7 +48,7 @@ public class TestLazyProxSkipping extends LuceneTestCase {
@Override
public IndexInput openInput(String name) throws IOException {
IndexInput ii = super.openInput(name);
if (name.endsWith(".prx")) {
if (name.endsWith(".prx") || name.endsWith(".pos")) {
// we decorate the proxStream with a wrapper class that allows to count the number of calls of seek()
ii = new SeeksCountingStream(ii);
}

View File

@ -30,6 +30,7 @@ import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.MockRAMDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.LuceneTestCase;
@ -42,6 +43,16 @@ import org.apache.lucene.util.LuceneTestCase;
*
*/
public class TestMultiLevelSkipList extends LuceneTestCase {
class CountingRAMDirectory extends MockRAMDirectory {
public IndexInput openInput(String fileName) throws IOException {
IndexInput in = super.openInput(fileName);
if (fileName.endsWith(".frq"))
in = new CountingStream(in);
return in;
}
}
public void testSimpleSkip() throws IOException {
RAMDirectory dir = new RAMDirectory();
IndexWriter writer = new IndexWriter(dir, new PayloadAnalyzer(), true,
@ -57,8 +68,7 @@ public class TestMultiLevelSkipList extends LuceneTestCase {
writer.close();
IndexReader reader = SegmentReader.getOnlySegmentReader(dir);
SegmentTermPositions tp = (SegmentTermPositions) reader.termPositions();
tp.freqStream = new CountingStream(tp.freqStream);
TermPositions tp = reader.termPositions();
for (int i = 0; i < 2; i++) {
counter = 0;

View File

@ -39,6 +39,7 @@ import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.UnicodeUtil;
import org.apache.lucene.util._TestUtil;
@ -257,10 +258,12 @@ public class TestPayloads extends LuceneTestCase {
tp.next();
tp.nextPosition();
// now we don't read this payload
tp.next();
tp.nextPosition();
assertEquals("Wrong payload length.", 1, tp.getPayloadLength());
byte[] payload = tp.getPayload(null, 0);
assertEquals(payload[0], payloadData[numTerms]);
tp.next();
tp.nextPosition();
// we don't read this payload and skip to a different document
@ -559,13 +562,13 @@ public class TestPayloads extends LuceneTestCase {
}
}
private UnicodeUtil.UTF8Result utf8Result = new UnicodeUtil.UTF8Result();
private BytesRef utf8Result = new BytesRef(10);
synchronized String bytesToString(byte[] bytes) {
String s = new String(bytes);
UnicodeUtil.UTF16toUTF8(s, 0, s.length(), utf8Result);
try {
return new String(utf8Result.result, 0, utf8Result.length, "UTF-8");
return new String(utf8Result.bytes, 0, utf8Result.length, "UTF-8");
} catch (UnsupportedEncodingException uee) {
return null;
}

View File

@ -18,9 +18,11 @@ package org.apache.lucene.index;
*/
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.store.BufferedIndexInput;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.codecs.CodecProvider;
import java.io.IOException;
import java.util.Collection;
@ -63,14 +65,16 @@ public class TestSegmentMerger extends LuceneTestCase {
}
public void testMerge() throws IOException {
SegmentMerger merger = new SegmentMerger(mergedDir, mergedSegment);
SegmentMerger merger = new SegmentMerger(mergedDir, IndexWriter.DEFAULT_TERM_INDEX_INTERVAL, mergedSegment, null, CodecProvider.getDefault());
merger.add(reader1);
merger.add(reader2);
int docsMerged = merger.merge();
merger.closeReaders();
assertTrue(docsMerged == 2);
//Should be able to open a new SegmentReader against the new directory
SegmentReader mergedReader = SegmentReader.get(true, new SegmentInfo(mergedSegment, docsMerged, mergedDir, false, true), IndexReader.DEFAULT_TERMS_INDEX_DIVISOR);
SegmentReader mergedReader = SegmentReader.get(false, mergedDir, new SegmentInfo(mergedSegment, docsMerged, mergedDir, false, true,
-1, null, false, merger.hasProx(), merger.getCodec()), BufferedIndexInput.BUFFER_SIZE, true, IndexReader.DEFAULT_TERMS_INDEX_DIVISOR, null);
assertTrue(mergedReader != null);
assertTrue(mergedReader.numDocs() == 2);
Document newDoc1 = mergedReader.document(0);

View File

@ -137,6 +137,7 @@ public class TestSegmentReader extends LuceneTestCase {
TermPositions positions = reader.termPositions();
positions.seek(new Term(DocHelper.TEXT_FIELD_1_KEY, "field"));
assertTrue(positions != null);
assertTrue(positions.next());
assertTrue(positions.doc() == 0);
assertTrue(positions.nextPosition() >= 0);
}

View File

@ -56,14 +56,13 @@ public class TestSegmentTermDocs extends LuceneTestCase {
SegmentReader reader = SegmentReader.get(true, info, indexDivisor);
assertTrue(reader != null);
assertEquals(indexDivisor, reader.getTermInfosIndexDivisor());
SegmentTermDocs segTermDocs = new SegmentTermDocs(reader);
assertTrue(segTermDocs != null);
segTermDocs.seek(new Term(DocHelper.TEXT_FIELD_2_KEY, "field"));
if (segTermDocs.next() == true)
{
int docId = segTermDocs.doc();
TermDocs termDocs = reader.termDocs();
assertTrue(termDocs != null);
termDocs.seek(new Term(DocHelper.TEXT_FIELD_2_KEY, "field"));
if (termDocs.next() == true) {
int docId = termDocs.doc();
assertTrue(docId == 0);
int freq = segTermDocs.freq();
int freq = termDocs.freq();
assertTrue(freq == 3);
}
reader.close();
@ -78,20 +77,20 @@ public class TestSegmentTermDocs extends LuceneTestCase {
//After adding the document, we should be able to read it back in
SegmentReader reader = SegmentReader.get(true, info, indexDivisor);
assertTrue(reader != null);
SegmentTermDocs segTermDocs = new SegmentTermDocs(reader);
assertTrue(segTermDocs != null);
segTermDocs.seek(new Term("textField2", "bad"));
assertTrue(segTermDocs.next() == false);
TermDocs termDocs = reader.termDocs();
assertTrue(termDocs != null);
termDocs.seek(new Term("textField2", "bad"));
assertTrue(termDocs.next() == false);
reader.close();
}
{
//After adding the document, we should be able to read it back in
SegmentReader reader = SegmentReader.get(true, info, indexDivisor);
assertTrue(reader != null);
SegmentTermDocs segTermDocs = new SegmentTermDocs(reader);
assertTrue(segTermDocs != null);
segTermDocs.seek(new Term("junk", "bad"));
assertTrue(segTermDocs.next() == false);
TermDocs termDocs = reader.termDocs();
assertTrue(termDocs != null);
termDocs.seek(new Term("junk", "bad"));
assertTrue(termDocs.next() == false);
reader.close();
}
}

View File

@ -61,23 +61,6 @@ public class TestSegmentTermEnum extends LuceneTestCase
verifyDocFreq();
}
public void testPrevTermAtEnd() throws IOException
{
Directory dir = new MockRAMDirectory();
IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
addDoc(writer, "aaa bbb");
writer.close();
SegmentReader reader = SegmentReader.getOnlySegmentReader(dir);
SegmentTermEnum termEnum = (SegmentTermEnum) reader.terms();
assertTrue(termEnum.next());
assertEquals("aaa", termEnum.term().text());
assertTrue(termEnum.next());
assertEquals("aaa", termEnum.prev().text());
assertEquals("bbb", termEnum.term().text());
assertFalse(termEnum.next());
assertEquals("bbb", termEnum.prev().text());
}
private void verifyDocFreq()
throws IOException
{

View File

@ -352,7 +352,7 @@ public class TestStressIndexing2 extends LuceneTestCase {
if (!termEnum1.next()) break;
}
// iterate until we get some docs
// iterate until we get some docs
int len2;
for(;;) {
len2=0;
@ -369,12 +369,12 @@ public class TestStressIndexing2 extends LuceneTestCase {
if (!termEnum2.next()) break;
}
if (!hasDeletes)
assertEquals(termEnum1.docFreq(), termEnum2.docFreq());
assertEquals(len1, len2);
if (len1==0) break; // no more terms
if (!hasDeletes)
assertEquals(termEnum1.docFreq(), termEnum2.docFreq());
assertEquals(term1, term2);
// sort info2 to get it into ascending docid

View File

@ -33,7 +33,7 @@ public class CheckHits {
* different order of operations from the actual scoring method ...
* this allows for a small amount of variation
*/
public static float EXPLAIN_SCORE_TOLERANCE_DELTA = 0.00005f;
public static float EXPLAIN_SCORE_TOLERANCE_DELTA = 0.0002f;
/**
* Tests that all documents up to maxDoc which are *not* in the

View File

@ -65,7 +65,7 @@ public class TestCachingWrapperFilter extends LuceneTestCase {
if (originalSet.isCacheable()) {
assertEquals("Cached DocIdSet must be of same class like uncached, if cacheable", originalSet.getClass(), cachedSet.getClass());
} else {
assertTrue("Cached DocIdSet must be an OpenBitSet if the original one was not cacheable", cachedSet instanceof OpenBitSetDISI);
assertTrue("Cached DocIdSet must be an OpenBitSet if the original one was not cacheable", cachedSet instanceof OpenBitSetDISI || cachedSet == DocIdSet.EMPTY_DOCIDSET);
}
}

View File

@ -230,6 +230,8 @@ public class TestNumericRangeQuery32 extends LuceneTestCase {
testRightOpenRange(2);
}
/* Tests disabled because of an incompatible API change in 3.1/flex:
private void testRandomTrieAndClassicRangeQuery(int precisionStep) throws Exception {
final Random rnd=newRandom();
String field="field"+precisionStep;
@ -298,6 +300,8 @@ public class TestNumericRangeQuery32 extends LuceneTestCase {
testRandomTrieAndClassicRangeQuery(Integer.MAX_VALUE);
}
*/
private void testRangeSplit(int precisionStep) throws Exception {
final Random rnd=newRandom();
String field="ascfield"+precisionStep;
@ -443,37 +447,39 @@ public class TestNumericRangeQuery32 extends LuceneTestCase {
assertFalse(q2.equals(q1));
}
private void testEnum(int lower, int upper) throws Exception {
NumericRangeQuery<Integer> q = NumericRangeQuery.newIntRange("field4", 4, lower, upper, true, true);
FilteredTermEnum termEnum = q.getEnum(searcher.getIndexReader());
try {
int count = 0;
do {
final Term t = termEnum.term();
if (t != null) {
final int val = NumericUtils.prefixCodedToInt(t.text());
assertTrue("value not in bounds", val >= lower && val <= upper);
count++;
} else break;
} while (termEnum.next());
assertFalse(termEnum.next());
System.out.println("TermEnum on 'field4' for range [" + lower + "," + upper + "] contained " + count + " terms.");
} finally {
termEnum.close();
}
}
// Removed for now - NumericRangeQuery does not currently implement getEnum
public void testEnum() throws Exception {
int count=3000;
int lower=(distance*3/2)+startOffset, upper=lower + count*distance + (distance/3);
// test enum with values
testEnum(lower, upper);
// test empty enum
testEnum(upper, lower);
// test empty enum outside of bounds
lower = distance*noDocs+startOffset;
upper = 2 * lower;
testEnum(lower, upper);
}
// private void testEnum(int lower, int upper) throws Exception {
// NumericRangeQuery<Integer> q = NumericRangeQuery.newIntRange("field4", 4, lower, upper, true, true);
// FilteredTermEnum termEnum = q.getEnum(searcher.getIndexReader());
// try {
// int count = 0;
// do {
// final Term t = termEnum.term();
// if (t != null) {
// final int val = NumericUtils.prefixCodedToInt(t.text());
// assertTrue("value not in bounds", val >= lower && val <= upper);
// count++;
// } else break;
// } while (termEnum.next());
// assertFalse(termEnum.next());
// System.out.println("TermEnum on 'field4' for range [" + lower + "," + upper + "] contained " + count + " terms.");
// } finally {
// termEnum.close();
// }
// }
//
// public void testEnum() throws Exception {
// int count=3000;
// int lower=(distance*3/2)+startOffset, upper=lower + count*distance + (distance/3);
// // test enum with values
// testEnum(lower, upper);
// // test empty enum
// testEnum(upper, lower);
// // test empty enum outside of bounds
// lower = distance*noDocs+startOffset;
// upper = 2 * lower;
// testEnum(lower, upper);
// }
}

View File

@ -245,6 +245,8 @@ public class TestNumericRangeQuery64 extends LuceneTestCase {
testRightOpenRange(2);
}
/* Tests disabled because of an incompatible API change in 3.1/flex:
private void testRandomTrieAndClassicRangeQuery(int precisionStep) throws Exception {
final Random rnd=newRandom();
String field="field"+precisionStep;
@ -317,6 +319,8 @@ public class TestNumericRangeQuery64 extends LuceneTestCase {
testRandomTrieAndClassicRangeQuery(Integer.MAX_VALUE);
}
*/
private void testRangeSplit(int precisionStep) throws Exception {
final Random rnd=newRandom();
String field="ascfield"+precisionStep;

View File

@ -35,6 +35,7 @@ import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.store.LockObtainFailedException;
@ -332,20 +333,28 @@ public class TestSort extends LuceneTestCase implements Serializable {
FieldCache fc = FieldCache.DEFAULT;
sort.setSort (new SortField ("parser", new FieldCache.IntParser(){
public final int parseInt(final String val) {
return (val.charAt(0)-'A') * 123456;
sort.setSort ( new SortField ("parser", new FieldCache.IntParser(){
public final int parseInt(final String term) {
// dummy
return 0;
}
}), SortField.FIELD_DOC );
public final int parseInt(final BytesRef term) {
return (term.bytes[term.offset]-'A') * 123456;
}
}), SortField.FIELD_DOC);
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " IntParser");
fc.purgeAllCaches();
sort.setSort (new SortField ("parser", new FieldCache.FloatParser(){
public final float parseFloat(final String val) {
return (float) Math.sqrt( val.charAt(0) );
sort.setSort (new SortField[] { new SortField ("parser", new FieldCache.FloatParser(){
public final float parseFloat(final String term) {
// dummy
return 0;
}
}), SortField.FIELD_DOC );
public final float parseFloat(final BytesRef term) {
return (float) Math.sqrt( term.bytes[term.offset] );
}
}), SortField.FIELD_DOC });
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " FloatParser");
fc.purgeAllCaches();
@ -354,34 +363,49 @@ public class TestSort extends LuceneTestCase implements Serializable {
public final long parseLong(final String val) {
return (val.charAt(0)-'A') * 1234567890L;
}
}), SortField.FIELD_DOC );
public final long parseLong(final BytesRef term) {
return (term.bytes[term.offset]-'A') * 1234567890L;
}
}), SortField.FIELD_DOC);
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " LongParser");
fc.purgeAllCaches();
sort.setSort (new SortField ("parser", new FieldCache.DoubleParser(){
public final double parseDouble(final String val) {
return Math.pow( val.charAt(0), (val.charAt(0)-'A') );
sort.setSort (new SortField[] { new SortField ("parser", new FieldCache.DoubleParser(){
public final double parseDouble(final String term) {
// dummy
return 0;
}
}), SortField.FIELD_DOC );
public final double parseDouble(final BytesRef term) {
return Math.pow( term.bytes[term.offset], (term.bytes[term.offset]-'A') );
}
}), SortField.FIELD_DOC });
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " DoubleParser");
fc.purgeAllCaches();
sort.setSort (new SortField ("parser", new FieldCache.ByteParser(){
public final byte parseByte(final String val) {
return (byte) (val.charAt(0)-'A');
sort.setSort (new SortField[] { new SortField ("parser", new FieldCache.ByteParser(){
public final byte parseByte(final String term) {
// dummy
return 0;
}
}), SortField.FIELD_DOC );
public final byte parseByte(final BytesRef term) {
return (byte) (term.bytes[term.offset]-'A');
}
}), SortField.FIELD_DOC });
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " ByteParser");
fc.purgeAllCaches();
sort.setSort (new SortField ("parser", new FieldCache.ShortParser(){
public final short parseShort(final String val) {
return (short) (val.charAt(0)-'A');
sort.setSort (new SortField[] { new SortField ("parser", new FieldCache.ShortParser(){
public final short parseShort(final String term) {
// dummy
return 0;
}
}), SortField.FIELD_DOC );
public final short parseShort(final BytesRef term) {
return (short) (term.bytes[term.offset]-'A');
}
}), SortField.FIELD_DOC });
assertMatches (full, queryA, sort, "JIHGFEDCBA");
assertSaneFieldCaches(getName() + " ShortParser");
fc.purgeAllCaches();
@ -439,8 +463,12 @@ public class TestSort extends LuceneTestCase implements Serializable {
@Override
public void setNextReader(IndexReader reader, int docBase) throws IOException {
docValues = FieldCache.DEFAULT.getInts(reader, "parser", new FieldCache.IntParser() {
public final int parseInt(final String val) {
return (val.charAt(0)-'A') * 123456;
public final int parseInt(final String term) {
// dummy
return 0;
}
public final int parseInt(final BytesRef term) {
return (term.bytes[term.offset]-'A') * 123456;
}
});
}
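The parsers above all follow the same pattern: the legacy String-based parse method is left as a dummy and the real logic moves into the BytesRef-based overload. A minimal standalone sketch of that pattern for a custom sort, assuming (as the test above does) that the transitional FieldCache.IntParser declares both a String and a BytesRef parseInt; the class and field names here are illustrative only:
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.util.BytesRef;
public class FirstCharIntParserExample {
  // Sorts by interpreting the first byte of each term as the key.
  public static Sort firstCharSort(String field) {
    return new Sort(new SortField(field, new FieldCache.IntParser() {
      public int parseInt(String term) {
        return 0; // dummy; only the BytesRef overload below is exercised
      }
      public int parseInt(BytesRef term) {
        return term.bytes[term.offset] - 'A';
      }
    }));
  }
}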

View File

@ -72,9 +72,9 @@ public class TestTermScorer extends LuceneTestCase
Weight weight = termQuery.weight(indexSearcher);
TermScorer ts = new TermScorer(weight,
indexReader.termDocs(allTerm), indexSearcher.getSimilarity(),
indexReader.norms(FIELD));
Scorer ts = weight.scorer(indexSearcher.getIndexReader(),
true, true);
//we have 2 documents with the term all in them, one document for all the other values
final List docs = new ArrayList();
//must call next first
@ -138,9 +138,9 @@ public class TestTermScorer extends LuceneTestCase
Weight weight = termQuery.weight(indexSearcher);
TermScorer ts = new TermScorer(weight,
indexReader.termDocs(allTerm), indexSearcher.getSimilarity(),
indexReader.norms(FIELD));
Scorer ts = weight.scorer(indexSearcher.getIndexReader(),
true, true);
assertTrue("next did not return a doc", ts.nextDoc() != DocIdSetIterator.NO_MORE_DOCS);
assertTrue("score is not correct", ts.score() == 1.6931472f);
assertTrue("next did not return a doc", ts.nextDoc() != DocIdSetIterator.NO_MORE_DOCS);
@ -155,9 +155,9 @@ public class TestTermScorer extends LuceneTestCase
Weight weight = termQuery.weight(indexSearcher);
TermScorer ts = new TermScorer(weight,
indexReader.termDocs(allTerm), indexSearcher.getSimilarity(),
indexReader.norms(FIELD));
Scorer ts = weight.scorer(indexSearcher.getIndexReader(),
true, true);
assertTrue("Didn't skip", ts.advance(3) != DocIdSetIterator.NO_MORE_DOCS);
//The next doc should be doc 5
assertTrue("doc should be number 5", ts.docID() == 5);

View File

@ -114,6 +114,7 @@ public class TestWildcard
* rewritten to a single PrefixQuery. The boost and rewriteMethod should be
* preserved.
*/
/* disable because rewrites changed in flex/trunk
public void testPrefixTerm() throws IOException {
RAMDirectory indexStore = getIndexStore("field", new String[]{"prefix", "prefixx"});
IndexSearcher searcher = new IndexSearcher(indexStore, true);
@ -145,7 +146,7 @@ public class TestWildcard
expected.setRewriteMethod(wq.getRewriteMethod());
expected.setBoost(wq.getBoost());
assertEquals(searcher.rewrite(expected), searcher.rewrite(wq));
}
}*/
/**
* Tests Wildcard queries with an asterisk.

View File

@ -78,22 +78,22 @@ public class TestAttributeSource extends LuceneTestCase {
public void testCloneAttributes() {
final AttributeSource src = new AttributeSource();
final TermAttribute termAtt = src.addAttribute(TermAttribute.class);
final FlagsAttribute flagsAtt = src.addAttribute(FlagsAttribute.class);
final TypeAttribute typeAtt = src.addAttribute(TypeAttribute.class);
termAtt.setTermBuffer("TestTerm");
flagsAtt.setFlags(1234);
typeAtt.setType("TestType");
final AttributeSource clone = src.cloneAttributes();
final Iterator<Class<? extends Attribute>> it = clone.getAttributeClassesIterator();
assertEquals("TermAttribute must be the first attribute", TermAttribute.class, it.next());
assertEquals("FlagsAttribute must be the first attribute", FlagsAttribute.class, it.next());
assertEquals("TypeAttribute must be the second attribute", TypeAttribute.class, it.next());
assertFalse("No more attributes", it.hasNext());
final TermAttribute termAtt2 = clone.getAttribute(TermAttribute.class);
final FlagsAttribute flagsAtt2 = clone.getAttribute(FlagsAttribute.class);
final TypeAttribute typeAtt2 = clone.getAttribute(TypeAttribute.class);
assertNotSame("TermAttribute of original and clone must be different instances", termAtt2, termAtt);
assertNotSame("FlagsAttribute of original and clone must be different instances", flagsAtt2, flagsAtt);
assertNotSame("TypeAttribute of original and clone must be different instances", typeAtt2, typeAtt);
assertEquals("TermAttribute of original and clone must be equal", termAtt2, termAtt);
assertEquals("FlagsAttribute of original and clone must be equal", flagsAtt2, flagsAtt);
assertEquals("TypeAttribute of original and clone must be equal", typeAtt2, typeAtt);
}

View File

@ -26,6 +26,8 @@ import java.util.Iterator;
public class TestNumericUtils extends LuceneTestCase {
/* Tests disabled because of an incompatible API change in 3.1/flex:
public void testLongConversionAndOrdering() throws Exception {
// generate a series of encoded longs, each numerical one bigger than the one before
String last=null;
@ -132,6 +134,8 @@ public class TestNumericUtils extends LuceneTestCase {
}
}
*/
public void testDoubles() throws Exception {
double[] vals=new double[]{
Double.NEGATIVE_INFINITY, -2.3E25, -1.0E15, -1.0, -1.0E-1, -1.0E-2, -0.0,

View File

@ -104,24 +104,24 @@ The source distribution does not contain sources of the previous Lucene Java ver
<target name="compile-backwards" depends="compile-core, jar-core, test-backwards-message"
description="Runs tests of a previous Lucene version." if="backwards.available">
<sequential>
<sequential>
<mkdir dir="${build.dir.backwards}"/>
<!-- first compile branch classes -->
<compile
<!-- first compile branch classes -->
<compile
srcdir="${backwards.dir}/src/java"
destdir="${build.dir.backwards}/classes/java"
javac.source="${javac.source.backwards}" javac.target="${javac.target.backwards}"
>
<classpath refid="backwards.compile.classpath"/>
</compile>
</compile>
<!-- compile branch tests against branch classpath -->
<compile-test-macro srcdir="${backwards.dir}/src/test" destdir="${build.dir.backwards}/classes/test"
test.classpath="backwards.test.compile.classpath" javac.source="${javac.source.backwards}" javac.target="${javac.target.backwards}"/>
</sequential>
</sequential>
</target>
<target name="test-backwards" depends="compile-backwards, junit-backwards-mkdir, junit-backwards-sequential, junit-backwards-parallel"/>
@ -715,6 +715,41 @@ The source distribution does not contain sources of the previous Lucene Java ver
</delete>
</target>
<macrodef name="createLevAutomaton">
<attribute name="n"/>
<sequential>
<exec dir="src/java/org/apache/lucene/util/automaton"
executable="${python.exe}" failonerror="true">
<arg line="createLevAutomata.py @{n}"/>
</exec>
</sequential>
</macrodef>
<target name="createLevAutomata" depends="check-moman,clone-moman,pull-moman">
<createLevAutomaton n="1"/>
<createLevAutomaton n="2"/>
</target>
<target name="check-moman">
<condition property="moman.cloned">
<available file="src/java/org/apache/lucene/util/automaton/moman"/>
</condition>
</target>
<target name="clone-moman" unless="moman.cloned">
<exec dir="src/java/org/apache/lucene/util/automaton"
executable="${hg.exe}" failonerror="true">
<arg line="clone -r ${moman.rev} ${moman.url} moman"/>
</exec>
</target>
<target name="pull-moman" if="moman.cloned">
<exec dir="src/java/org/apache/lucene/util/automaton/moman"
executable="${hg.exe}" failonerror="true">
<arg line="pull -f -r ${moman.rev}"/>
</exec>
</target>
<macrodef name="contrib-crawl">
<attribute name="target" default=""/>
<attribute name="failonerror" default="true"/>

View File

@ -119,6 +119,11 @@
<property name="svnversion.exe" value="svnversion" />
<property name="svn.exe" value="svn" />
<property name="hg.exe" value="hg" />
<property name="moman.url" value="https://bitbucket.org/jpbarrette/moman" />
<property name="moman.rev" value="115" />
<property name="python.exe" value="python" />
<property name="gpg.exe" value="gpg" />
<property name="gpg.key" value="CODE SIGNING KEY" />

View File

@ -0,0 +1,553 @@
import types
import re
import time
import os
import shutil
import sys
import cPickle
import datetime
# TODO
# - build wiki/random index as needed (balanced or not, varying # segs, docs)
# - verify step
# - run searches
# - get all docs query in here
if sys.platform.lower().find('darwin') != -1:
osName = 'osx'
elif sys.platform.lower().find('win') != -1:
osName = 'windows'
elif sys.platform.lower().find('linux') != -1:
osName = 'linux'
else:
osName = 'unix'
TRUNK_DIR = '/lucene/clean'
FLEX_DIR = '/lucene/flex.branch'
DEBUG = False
# let shell find it:
JAVA_COMMAND = 'java -Xms2048M -Xmx2048M -Xbatch -server'
#JAVA_COMMAND = 'java -Xms1024M -Xmx1024M -Xbatch -server -XX:+AggressiveOpts -XX:CompileThreshold=100 -XX:+UseFastAccessorMethods'
INDEX_NUM_THREADS = 1
INDEX_NUM_DOCS = 5000000
LOG_DIR = 'logs'
DO_BALANCED = False
if osName == 'osx':
WIKI_FILE = '/x/lucene/enwiki-20090724-pages-articles.xml.bz2'
INDEX_DIR_BASE = '/lucene'
else:
WIKI_FILE = '/x/lucene/enwiki-20090724-pages-articles.xml.bz2'
INDEX_DIR_BASE = '/x/lucene'
if DEBUG:
NUM_ROUND = 0
else:
NUM_ROUND = 7
if 0:
print 'compile...'
if '-nocompile' not in sys.argv:
if os.system('ant compile > compile.log 2>&1') != 0:
raise RuntimeError('compile failed (see compile.log)')
BASE_SEARCH_ALG = '''
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
work.dir = $INDEX$
search.num.hits = $NUM_HITS$
query.maker=org.apache.lucene.benchmark.byTask.feeds.FileBasedQueryMaker
file.query.maker.file = queries.txt
print.hits.field = $PRINT_FIELD$
log.queries=true
log.step=100000
$OPENREADER$
{"XSearchWarm" $SEARCH$}
# Turn off printing, after warming:
SetProp(print.hits.field,)
$ROUNDS$
CloseReader
RepSumByPrefRound XSearch
'''
BASE_INDEX_ALG = '''
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
$OTHER$
deletion.policy = org.apache.lucene.benchmark.utils.NoDeletionPolicy
doc.tokenized = false
doc.body.tokenized = true
doc.stored = true
doc.body.stored = false
doc.term.vector = false
log.step.AddDoc=10000
directory=FSDirectory
autocommit=false
compound=false
work.dir=$WORKDIR$
{ "BuildIndex"
- CreateIndex
$INDEX_LINE$
- CommitIndex(dp0)
- CloseIndex
$DELETIONS$
}
RepSumByPrefRound BuildIndex
'''
class RunAlgs:
def __init__(self, resultsPrefix):
self.counter = 0
self.results = []
self.fOut = open('%s.txt' % resultsPrefix, 'wb')
def makeIndex(self, label, dir, source, numDocs, balancedNumSegs=None, deletePcts=None):
if source not in ('wiki', 'random'):
raise RuntimeError('source must be wiki or random')
if dir is not None:
fullDir = '%s/contrib/benchmark' % dir
if DEBUG:
print ' chdir %s' % fullDir
os.chdir(fullDir)
indexName = '%s.%s.nd%gM' % (source, label, numDocs/1000000.0)
if balancedNumSegs is not None:
indexName += '_balanced%d' % balancedNumSegs
fullIndexPath = '%s/%s' % (INDEX_DIR_BASE, indexName)
if os.path.exists(fullIndexPath):
print 'Index %s already exists...' % fullIndexPath
return indexName
print 'Now create index %s...' % fullIndexPath
s = BASE_INDEX_ALG
if source == 'wiki':
other = '''doc.index.props = true
content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
docs.file=%s
''' % WIKI_FILE
#addDoc = 'AddDoc(1024)'
addDoc = 'AddDoc'
else:
other = '''doc.index.props = true
content.source=org.apache.lucene.benchmark.byTask.feeds.SortableSingleDocSource
'''
addDoc = 'AddDoc'
if INDEX_NUM_THREADS > 1:
#other += 'doc.reuse.fields=false\n'
s = s.replace('$INDEX_LINE$', '[ { "AddDocs" %s > : %s } : %s' % \
(addDoc, numDocs/INDEX_NUM_THREADS, INDEX_NUM_THREADS))
else:
s = s.replace('$INDEX_LINE$', '{ "AddDocs" %s > : %s' % \
(addDoc, numDocs))
s = s.replace('$WORKDIR$', fullIndexPath)
if deletePcts is not None:
dp = '# Do deletions\n'
dp += 'OpenReader(false)\n'
for pct in deletePcts:
if pct != 0:
dp += 'DeleteByPercent(%g)\n' % pct
dp += 'CommitIndex(dp%g)\n' % pct
dp += 'CloseReader()\n'
else:
dp = ''
s = s.replace('$DELETIONS$', dp)
if balancedNumSegs is not None:
other += ''' merge.factor=1000
max.buffered=%d
ram.flush.mb=2000
''' % (numDocs/balancedNumSegs)
else:
if source == 'random':
other += 'ram.flush.mb=1.0\n'
else:
other += 'ram.flush.mb=32.0\n'
s = s.replace('$OTHER$', other)
try:
self.runOne(dir, s, 'index_%s' % indexName, isIndex=True)
except:
if os.path.exists(fullIndexPath):
shutil.rmtree(fullIndexPath)
raise
return indexName
def getLogPrefix(self, **dArgs):
l = dArgs.items()
l.sort()
s = '_'.join(['%s=%s' % tup for tup in l])
s = s.replace(' ', '_')
s = s.replace('"', '_')
return s
def runOne(self, dir, alg, logFileName, expectedMaxDocs=None, expectedNumDocs=None, queries=None, verify=False, isIndex=False):
fullDir = '%s/contrib/benchmark' % dir
if DEBUG:
print ' chdir %s' % fullDir
os.chdir(fullDir)
if queries is not None:
if type(queries) in types.StringTypes:
queries = [queries]
open('queries.txt', 'wb').write('\n'.join(queries))
if DEBUG:
algFile = 'tmp.alg'
else:
algFile = 'tmp.%s.alg' % os.getpid()
open(algFile, 'wb').write(alg)
fullLogFileName = '%s/contrib/benchmark/%s/%s' % (dir, LOG_DIR, logFileName)
print ' log: %s' % fullLogFileName
if not os.path.exists(LOG_DIR):
print ' mkdir %s' % LOG_DIR
os.makedirs(LOG_DIR)
command = '%s -classpath ../../build/classes/java:../../build/classes/demo:../../build/contrib/highlighter/classes/java:lib/commons-digester-1.7.jar:lib/commons-collections-3.1.jar:lib/commons-compress-1.0.jar:lib/commons-logging-1.0.4.jar:lib/commons-beanutils-1.7.0.jar:lib/xerces-2.9.0.jar:lib/xml-apis-2.9.0.jar:../../build/contrib/benchmark/classes/java org.apache.lucene.benchmark.byTask.Benchmark %s > "%s" 2>&1' % (JAVA_COMMAND, algFile, fullLogFileName)
if DEBUG:
print 'command=%s' % command
try:
t0 = time.time()
if os.system(command) != 0:
raise RuntimeError('FAILED')
t1 = time.time()
finally:
if not DEBUG:
os.remove(algFile)
if isIndex:
s = open(fullLogFileName, 'rb').read()
if s.find('Exception in thread "') != -1 or s.find('at org.apache.lucene') != -1:
raise RuntimeError('alg hit exceptions')
return
else:
# Parse results:
bestQPS = None
count = 0
nhits = None
numDocs = None
maxDocs = None
warmTime = None
r = re.compile('^ ([0-9]+): (.*)$')
topN = []
for line in open(fullLogFileName, 'rb').readlines():
m = r.match(line.rstrip())
if m is not None:
topN.append(m.group(2))
if line.startswith('totalHits = '):
nhits = int(line[12:].strip())
if line.startswith('maxDoc() = '):
maxDocs = int(line[12:].strip())
if line.startswith('numDocs() = '):
numDocs = int(line[12:].strip())
if line.startswith('XSearchWarm'):
v = line.strip().split()
warmTime = float(v[5])
if line.startswith('XSearchReal'):
v = line.strip().split()
# print len(v), v
upto = 0
i = 0
qps = None
while i < len(v):
if v[i] == '-':
i += 1
continue
else:
upto += 1
i += 1
if upto == 5:
qps = float(v[i-1].replace(',', ''))
break
if qps is None:
raise RuntimeError('did not find qps')
count += 1
if bestQPS is None or qps > bestQPS:
bestQPS = qps
if not verify:
if count != NUM_ROUND:
raise RuntimeError('did not find %s rounds (got %s)' % (NUM_ROUND, count))
if warmTime is None:
raise RuntimeError('did not find warm time')
else:
bestQPS = 1.0
warmTime = None
if nhits is None:
raise RuntimeError('did not see "totalHits = XXX"')
if maxDocs is None:
raise RuntimeError('did not see "maxDoc() = XXX"')
if maxDocs != expectedMaxDocs:
raise RuntimeError('maxDocs() mismatch: expected %s but got %s' % (expectedMaxDocs, maxDocs))
if numDocs is None:
raise RuntimeError('did not see "numDocs() = XXX"')
if numDocs != expectedNumDocs:
raise RuntimeError('numDocs() mismatch: expected %s but got %s' % (expectedNumDocs, numDocs))
return nhits, warmTime, bestQPS, topN
def getAlg(self, indexPath, searchTask, numHits, deletes=None, verify=False, printField=''):
s = BASE_SEARCH_ALG
s = s.replace('$PRINT_FIELD$', 'doctitle')
if not verify:
s = s.replace('$ROUNDS$',
'''
{ "Rounds"
{ "Run"
{ "TestSearchSpeed"
{ "XSearchReal" $SEARCH$ > : 3.0s
}
NewRound
} : %d
}
''' % NUM_ROUND)
else:
s = s.replace('$ROUNDS$', '')
if deletes is None:
s = s.replace('$OPENREADER$', 'OpenReader')
else:
s = s.replace('$OPENREADER$', 'OpenReader(true,dp%g)' % deletes)
s = s.replace('$INDEX$', indexPath)
s = s.replace('$SEARCH$', searchTask)
s = s.replace('$NUM_HITS$', str(numHits))
return s
def compare(self, baseline, new, *params):
if new[0] != baseline[0]:
raise RuntimeError('baseline found %d hits but new found %d hits' % (baseline[0], new[0]))
qpsOld = baseline[2]
qpsNew = new[2]
pct = 100.0*(qpsNew-qpsOld)/qpsOld
print ' diff: %.1f%%' % pct
self.results.append((qpsOld, qpsNew, params))
self.fOut.write('|%s|%.2f|%.2f|%.1f%%|\n' % \
('|'.join(str(x) for x in params),
qpsOld, qpsNew, pct))
self.fOut.flush()
def save(self, name):
f = open('%s.pk' % name, 'wb')
cPickle.dump(self.results, f)
f.close()
def verify(r1, r2):
if r1[0] != r2[0]:
raise RuntimeError('different total hits: %s vs %s' % (r1[0], r2[0]))
h1 = r1[3]
h2 = r2[3]
if len(h1) != len(h2):
raise RuntimeError('different number of results')
else:
for i in range(len(h1)):
s1 = h1[i].replace('score=NaN', 'score=na').replace('score=0.0', 'score=na')
s2 = h2[i].replace('score=NaN', 'score=na').replace('score=0.0', 'score=na')
if s1 != s2:
raise RuntimeError('hit %s differs: %s vs %s' % (i, s1 ,s2))
def usage():
print
print 'Usage: python -u %s -run <name> | -report <name>' % sys.argv[0]
print
print ' -run <name> runs all tests, saving results to file <name>.pk'
print ' -report <name> opens <name>.pk and prints Jira table'
print ' -verify confirm old & new produce identical results'
print
sys.exit(1)
def main():
if not os.path.exists(LOG_DIR):
os.makedirs(LOG_DIR)
if '-run' in sys.argv:
i = sys.argv.index('-run')
mode = 'run'
if i < len(sys.argv)-1:
name = sys.argv[1+i]
else:
usage()
elif '-report' in sys.argv:
i = sys.argv.index('-report')
mode = 'report'
if i < len(sys.argv)-1:
name = sys.argv[1+i]
else:
usage()
elif '-verify' in sys.argv:
mode = 'verify'
name = None
else:
usage()
if mode in ('run', 'verify'):
run(mode, name)
else:
report(name)
def report(name):
print '||Query||Deletes %||Tot hits||QPS old||QPS new||Pct change||'
results = cPickle.load(open('%s.pk' % name))
for qpsOld, qpsNew, params in results:
pct = 100.0*(qpsNew-qpsOld)/qpsOld
if pct < 0.0:
c = 'red'
else:
c = 'green'
params = list(params)
query = params[0]
if query == '*:*':
query = '<all>'
params[0] = query
pct = '{color:%s}%.1f%%{color}' % (c, pct)
print '|%s|%.2f|%.2f|%s|' % \
('|'.join(str(x) for x in params),
qpsOld, qpsNew, pct)
def run(mode, name):
for dir in (TRUNK_DIR, FLEX_DIR):
dir = '%s/contrib/benchmark' % dir
print '"ant compile" in %s...' % dir
os.chdir(dir)
if os.system('ant compile') != 0:
raise RuntimeError('ant compile failed')
r = RunAlgs(name)
if not os.path.exists(WIKI_FILE):
print
print 'ERROR: wiki source file "%s" does not exist' % WIKI_FILE
print
sys.exit(1)
print
print 'JAVA:\n%s' % os.popen('java -version 2>&1').read()
print
if osName != 'windows':
print 'OS:\n%s' % os.popen('uname -a 2>&1').read()
else:
print 'OS:\n%s' % sys.platform
deletePcts = (0.0, 0.1, 1.0, 10)
indexes = {}
for rev in ('baseline', 'flex'):
if rev == 'baseline':
dir = TRUNK_DIR
else:
dir = FLEX_DIR
source = 'wiki'
indexes[rev] = r.makeIndex(rev, dir, source, INDEX_NUM_DOCS, deletePcts=deletePcts)
doVerify = mode == 'verify'
source = 'wiki'
numHits = 10
queries = (
'body:[tec TO tet]',
'real*',
'1',
'2',
'+1 +2',
'+1 -2',
'1 2 3 -4',
'"world economy"')
for query in queries:
for deletePct in deletePcts:
print '\nRUN: query=%s deletes=%g%% nhits=%d' % \
(query, deletePct, numHits)
maxDocs = INDEX_NUM_DOCS
numDocs = int(INDEX_NUM_DOCS * (1.0-deletePct/100.))
prefix = r.getLogPrefix(query=query, deletePct=deletePct)
indexPath = '%s/%s' % (INDEX_DIR_BASE, indexes['baseline'])
# baseline (trunk)
s = r.getAlg(indexPath,
'Search',
numHits,
deletes=deletePct,
verify=doVerify,
printField='doctitle')
baseline = r.runOne(TRUNK_DIR, s, 'baseline_%s' % prefix, maxDocs, numDocs, query, verify=doVerify)
# flex
indexPath = '%s/%s' % (INDEX_DIR_BASE, indexes['flex'])
s = r.getAlg(indexPath,
'Search',
numHits,
deletes=deletePct,
verify=doVerify,
printField='doctitle')
flex = r.runOne(FLEX_DIR, s, 'flex_%s' % prefix, maxDocs, numDocs, query, verify=doVerify)
print ' %d hits' % flex[0]
verify(baseline, flex)
if mode == 'run' and not DEBUG:
r.compare(baseline, flex,
query, deletePct, baseline[0])
r.save(name)
def cleanScores(l):
for i in range(len(l)):
pos = l[i].find(' score=')
l[i] = l[i][:pos].strip()
if __name__ == '__main__':
main()

View File

@ -0,0 +1,38 @@
package org.apache.lucene.benchmark.byTask.feeds;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.benchmark.byTask.utils.Config;
/**
* A {@link DocMaker} which reads the English Wikipedia dump. Uses
* {@link EnwikiContentSource} as its content source, regardless of whether a different
* content source was defined in the configuration.
* @deprecated Please use {@link DocMaker} instead, with content.source=EnwikiContentSource
*/
@Deprecated
public class EnwikiDocMaker extends DocMaker {
@Override
public void setConfig(Config config) {
super.setConfig(config);
// Override whatever content source was set in the config
source = new EnwikiContentSource();
source.setConfig(config);
System.out.println("NOTE: EnwikiDocMaker is deprecated; please use DocMaker instead (which is the default if you don't specify doc.maker) with content.source=EnwikiContentSource");
}
}
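As the deprecation note above says, the replacement is DocMaker configured with an EnwikiContentSource. A hedged sketch of that configuration expressed as plain java.util.Properties; the property names content.source and docs.file also appear in the benchmark algs elsewhere in this commit, while the dump path below is a placeholder:
import java.util.Properties;
public class EnwikiConfigExample {
  // Properties equivalent to what EnwikiDocMaker used to hard-wire.
  public static Properties enwikiDocMakerReplacement() {
    Properties p = new Properties();
    p.setProperty("content.source",
        "org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource");
    p.setProperty("docs.file", "/path/to/enwiki-pages-articles.xml.bz2"); // placeholder path
    return p;
  }
}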

View File

@ -0,0 +1,50 @@
package org.apache.lucene.benchmark.byTask.feeds;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.benchmark.byTask.utils.Config;
/**
* A DocMaker reading one line at a time as a Document from a single file. This
* saves IO cost (over DirContentSource) of recursing through a directory and
* opening a new file for every document. It also re-uses its Document and Field
* instance to improve indexing speed.<br>
* The expected format of each line is (arguments are separated by &lt;TAB&gt;):
* <i>title, date, body</i>. If a line is read in a different format, a
* {@link RuntimeException} will be thrown. In general, you should use this doc
* maker with files that were created with
* {@link org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTask}.<br>
* <br>
* Config properties:
* <ul>
* <li>doc.random.id.limit=N (default -1) -- create random docid in the range
* 0..N; this is useful with UpdateDoc to test updating random documents; if
* this is unspecified or -1, then docid is sequentially assigned
* </ul>
* @deprecated Please use {@link DocMaker} instead, with content.source=LineDocSource
*/
@Deprecated
public class LineDocMaker extends DocMaker {
@Override
public void setConfig(Config config) {
super.setConfig(config);
source = new LineDocSource();
source.setConfig(config);
System.out.println("NOTE: LineDocMaker is deprecated; please use DocMaker instead (which is the default if you don't specify doc.maker) with content.source=LineDocSource");
}
}
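The Javadoc above describes the one-document-per-line format that LineDocSource consumes: title, date and body separated by TAB characters. A small sketch that appends one such line to a file; the field values are made up for illustration and the date format is not checked here:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
public class LineDocFormatExample {
  // Writes a single "title<TAB>date<TAB>body" line, the format described above.
  public static void appendOneLine(String path) throws IOException {
    BufferedWriter out = new BufferedWriter(new FileWriter(path, true));
    try {
      out.write("Example Title" + '\t' + "01-JAN-2010" + '\t' + "Example body text.");
      out.newLine();
    } finally {
      out.close();
    }
  }
}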

View File

@ -37,11 +37,12 @@ import org.apache.lucene.benchmark.byTask.stats.TaskStats;
import org.apache.lucene.collation.CollationKeyAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.FieldsEnum;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogMergePolicy;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.SerialMergeScheduler;
import org.apache.lucene.index.LogDocMergePolicy;
import org.apache.lucene.index.TermFreqVector;
@ -474,16 +475,20 @@ public class TestPerfTasksLogic extends LuceneTestCase {
IndexReader reader = IndexReader.open(benchmark.getRunData().getDirectory(), true);
assertEquals(NUM_DOCS, reader.numDocs());
TermEnum terms = reader.terms();
TermDocs termDocs = reader.termDocs();
int totalTokenCount2 = 0;
while(terms.next()) {
Term term = terms.term();
/* not-tokenized, but indexed field */
if (term != null && term.field() != DocMaker.ID_FIELD) {
termDocs.seek(terms.term());
while (termDocs.next())
totalTokenCount2 += termDocs.freq();
FieldsEnum fields = MultiFields.getFields(reader).iterator();
String fieldName = null;
while((fieldName = fields.next()) != null) {
if (fieldName == DocMaker.ID_FIELD)
continue;
TermsEnum terms = fields.terms();
DocsEnum docs = null;
while(terms.next() != null) {
docs = terms.docs(MultiFields.getDeletedDocs(reader), docs);
while(docs.nextDoc() != docs.NO_MORE_DOCS) {
totalTokenCount2 += docs.freq();
}
}
}
reader.close();

View File

@ -150,11 +150,16 @@ public class WeightedSpanTermExtractor {
mtq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
query = mtq;
}
FakeReader fReader = new FakeReader();
MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE.rewrite(fReader, mtq);
if (fReader.field != null) {
IndexReader ir = getReaderForField(fReader.field);
if (mtq.getField() != null) {
IndexReader ir = getReaderForField(mtq.getField());
extract(query.rewrite(ir), terms);
} else {
FakeReader fReader = new FakeReader();
MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE.rewrite(fReader, mtq);
if (fReader.field != null) {
IndexReader ir = getReaderForField(fReader.field);
extract(query.rewrite(ir), terms);
}
}
} else if (query instanceof MultiPhraseQuery) {
final MultiPhraseQuery mpq = (MultiPhraseQuery) query;

View File

@ -19,11 +19,15 @@ package org.apache.lucene.index;
import java.io.IOException;
import java.io.File;
import java.util.Date;
import java.util.List;
import java.util.ArrayList;
import org.apache.lucene.search.Similarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.StringHelper;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.ReaderUtil;
/**
* Given a directory and a list of fields, updates the fieldNorms in place for every document.
@ -104,46 +108,46 @@ public class FieldNormModifier {
*/
public void reSetNorms(String field) throws IOException {
String fieldName = StringHelper.intern(field);
int[] termCounts = new int[0];
IndexReader reader = null;
TermEnum termEnum = null;
TermDocs termDocs = null;
try {
reader = IndexReader.open(dir, true);
termCounts = new int[reader.maxDoc()];
try {
termEnum = reader.terms(new Term(field));
try {
termDocs = reader.termDocs();
do {
Term term = termEnum.term();
if (term != null && term.field().equals(fieldName)) {
termDocs.seek(termEnum.term());
while (termDocs.next()) {
termCounts[termDocs.doc()] += termDocs.freq();
}
}
} while (termEnum.next());
} finally {
if (null != termDocs) termDocs.close();
}
} finally {
if (null != termEnum) termEnum.close();
}
} finally {
if (null != reader) reader.close();
}
try {
reader = IndexReader.open(dir, false);
for (int d = 0; d < termCounts.length; d++) {
if (! reader.isDeleted(d)) {
if (sim == null)
reader.setNorm(d, fieldName, Similarity.encodeNorm(1.0f));
else
reader.setNorm(d, fieldName, sim.encodeNormValue(sim.lengthNorm(fieldName, termCounts[d])));
final List<IndexReader> subReaders = new ArrayList<IndexReader>();
ReaderUtil.gatherSubReaders(subReaders, reader);
for(IndexReader subReader : subReaders) {
final Bits delDocs = subReader.getDeletedDocs();
int[] termCounts = new int[subReader.maxDoc()];
Fields fields = subReader.fields();
if (fields != null) {
Terms terms = fields.terms(field);
if (terms != null) {
TermsEnum termsEnum = terms.iterator();
DocsEnum docs = null;
while(termsEnum.next() != null) {
docs = termsEnum.docs(delDocs, docs);
while(true) {
int docID = docs.nextDoc();
if (docID != docs.NO_MORE_DOCS) {
termCounts[docID] += docs.freq();
} else {
break;
}
}
}
}
}
for (int d = 0; d < termCounts.length; d++) {
if (delDocs == null || !delDocs.get(d)) {
if (sim == null) {
subReader.setNorm(d, fieldName, Similarity.encodeNorm(1.0f));
} else {
subReader.setNorm(d, fieldName, sim.encodeNormValue(sim.lengthNorm(fieldName, termCounts[d])));
}
}
}
}
@ -151,5 +155,4 @@ public class FieldNormModifier {
if (null != reader) reader.close();
}
}
}
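The rewrite above replaces the old TermEnum/TermDocs loop with the flex chain Fields -> Terms -> TermsEnum -> DocsEnum, skipping deleted documents through a Bits instance. A condensed sketch of just that per-field counting idiom, using the same calls as the code above; the reader and field are assumed to be supplied by the caller:
import java.io.IOException;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.Bits;
public class FlexTermCountExample {
  // Returns, for each doc in subReader, the summed freq of all terms in field.
  public static int[] countTerms(IndexReader subReader, String field) throws IOException {
    final Bits delDocs = subReader.getDeletedDocs();
    final int[] counts = new int[subReader.maxDoc()];
    Fields fields = subReader.fields();
    if (fields == null) return counts;        // reader has no postings
    Terms terms = fields.terms(field);
    if (terms == null) return counts;         // field not indexed
    TermsEnum termsEnum = terms.iterator();
    DocsEnum docs = null;
    while (termsEnum.next() != null) {
      docs = termsEnum.docs(delDocs, docs);   // reuse the DocsEnum across terms
      int docID;
      while ((docID = docs.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
        counts[docID] += docs.freq();
      }
    }
    return counts;
  }
}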

View File

@ -26,6 +26,7 @@ import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.OpenBitSet;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.Version;
/**
@ -172,6 +173,8 @@ public class MultiPassIndexSplitter {
* list of deletions.
*/
public static class FakeDeleteIndexReader extends FilterIndexReader {
// TODO: switch to flex api, here
OpenBitSet dels;
OpenBitSet oldDels = null;
@ -202,6 +205,7 @@ public class MultiPassIndexSplitter {
if (oldDels != null) {
dels.or(oldDels);
}
storeDelDocs(null);
}
@Override
@ -214,6 +218,16 @@ public class MultiPassIndexSplitter {
return !dels.isEmpty();
}
@Override
public IndexReader[] getSequentialSubReaders() {
return null;
}
@Override
public Bits getDeletedDocs() {
return dels;
}
@Override
public boolean isDeleted(int n) {
return dels.get(n);
@ -235,5 +249,29 @@ public class MultiPassIndexSplitter {
}
};
}
@Override
public TermDocs termDocs() throws IOException {
return new FilterTermDocs(in.termDocs()) {
@Override
public boolean next() throws IOException {
boolean res;
while ((res = super.next())) {
if (!dels.get(doc())) {
break;
}
}
return res;
}
};
}
@Override
public TermDocs termDocs(Term term) throws IOException {
TermDocs termDocs = termDocs();
termDocs.seek(term);
return termDocs;
}
}
}

View File

@ -1,10 +1,5 @@
package org.apache.lucene.index;
import org.apache.lucene.util.StringHelper;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -20,6 +15,14 @@ import java.util.List;
*
*/
import org.apache.lucene.util.StringHelper;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
/**
* Transparent access to the vector space model,
@ -97,40 +100,53 @@ public class TermVectorAccessor {
positions.clear();
}
TermEnum termEnum = indexReader.terms(new Term(field, ""));
if (termEnum.term() != null) {
while (termEnum.term().field() == field) {
TermPositions termPositions = indexReader.termPositions(termEnum.term());
if (termPositions.skipTo(documentNumber)) {
frequencies.add(Integer.valueOf(termPositions.freq()));
tokens.add(termEnum.term().text());
final Bits delDocs = MultiFields.getDeletedDocs(indexReader);
Terms terms = MultiFields.getTerms(indexReader, field);
boolean anyTerms = false;
if (terms != null) {
TermsEnum termsEnum = terms.iterator();
DocsEnum docs = null;
DocsAndPositionsEnum postings = null;
while(true) {
BytesRef text = termsEnum.next();
if (text != null) {
anyTerms = true;
if (!mapper.isIgnoringPositions()) {
int[] positions = new int[termPositions.freq()];
for (int i = 0; i < positions.length; i++) {
positions[i] = termPositions.nextPosition();
}
this.positions.add(positions);
docs = postings = termsEnum.docsAndPositions(delDocs, postings);
} else {
positions.add(null);
docs = termsEnum.docs(delDocs, docs);
}
}
termPositions.close();
if (!termEnum.next()) {
int docID = docs.advance(documentNumber);
if (docID == documentNumber) {
frequencies.add(Integer.valueOf(docs.freq()));
tokens.add(text.utf8ToString());
if (!mapper.isIgnoringPositions()) {
int[] positions = new int[docs.freq()];
for (int i = 0; i < positions.length; i++) {
positions[i] = postings.nextPosition();
}
this.positions.add(positions);
} else {
positions.add(null);
}
}
} else {
break;
}
}
mapper.setDocumentNumber(documentNumber);
mapper.setExpectations(field, tokens.size(), false, !mapper.isIgnoringPositions());
for (int i = 0; i < tokens.size(); i++) {
mapper.map(tokens.get(i), frequencies.get(i).intValue(), (TermVectorOffsetInfo[]) null, positions.get(i));
if (anyTerms) {
mapper.setDocumentNumber(documentNumber);
mapper.setExpectations(field, tokens.size(), false, !mapper.isIgnoringPositions());
for (int i = 0; i < tokens.size(); i++) {
mapper.map(tokens.get(i), frequencies.get(i).intValue(), (TermVectorOffsetInfo[]) null, positions.get(i));
}
}
}
termEnum.close();
}

View File

@ -18,7 +18,10 @@ package org.apache.lucene.misc;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.index.FieldsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.PriorityQueue;
@ -50,20 +53,40 @@ public class HighFreqTerms {
}
TermInfoQueue tiq = new TermInfoQueue(numTerms);
TermEnum terms = reader.terms();
if (field != null) {
while (terms.next()) {
if (terms.term().field().equals(field)) {
tiq.insertWithOverflow(new TermInfo(terms.term(), terms.docFreq()));
Terms terms = reader.fields().terms(field);
if (terms != null) {
TermsEnum termsEnum = terms.iterator();
while(true) {
BytesRef term = termsEnum.next();
if (term != null) {
tiq.insertWithOverflow(new TermInfo(new Term(field, term.utf8ToString()), termsEnum.docFreq()));
} else {
break;
}
}
}
} else {
FieldsEnum fields = reader.fields().iterator();
while(true) {
field = fields.next();
if (field != null) {
TermsEnum terms = fields.terms();
while(true) {
BytesRef term = terms.next();
if (term != null) {
tiq.insertWithOverflow(new TermInfo(new Term(field, term.toString()), terms.docFreq()));
} else {
break;
}
}
} else {
break;
}
}
}
else {
while (terms.next()) {
tiq.insertWithOverflow(new TermInfo(terms.term(), terms.docFreq()));
}
}
while (tiq.size() != 0) {
TermInfo termInfo = tiq.pop();
System.out.println(termInfo.term + " " + termInfo.docFreq);

View File

@ -0,0 +1,154 @@
package org.apache.lucene.misc;
/**
* Copyright 2006 The Apache Software Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Similarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.StringHelper;
import java.io.File;
import java.io.IOException;
import java.util.Date;
/**
* Given a directory, a Similarity, and a list of fields, updates the
* fieldNorms in place for every document using the Similarity.lengthNorm.
*
* <p>
* NOTE: This only works if you do <b>not</b> use field/document boosts in your
* index.
* </p>
*
* @version $Id$
* @deprecated Use {@link org.apache.lucene.index.FieldNormModifier}
*/
@Deprecated
public class LengthNormModifier {
/**
* Command Line Execution method.
*
* <pre>
* Usage: LengthNormModifier /path/index package.SimilarityClassName field1 field2 ...
* </pre>
*/
public static void main(String[] args) throws IOException {
if (args.length < 3) {
System.err.println("Usage: LengthNormModifier <index> <package.SimilarityClassName> <field1> [field2] ...");
System.exit(1);
}
Similarity s = null;
try {
s = Class.forName(args[1]).asSubclass(Similarity.class).newInstance();
} catch (Exception e) {
System.err.println("Couldn't instantiate similarity with empty constructor: " + args[1]);
e.printStackTrace(System.err);
}
File index = new File(args[0]);
Directory d = FSDirectory.open(index);
LengthNormModifier lnm = new LengthNormModifier(d, s);
for (int i = 2; i < args.length; i++) {
System.out.print("Updating field: " + args[i] + " " + (new Date()).toString() + " ... ");
lnm.reSetNorms(args[i]);
System.out.println(new Date().toString());
}
d.close();
}
private Directory dir;
private Similarity sim;
/**
* Constructor for code that wishes to use this class programmatically.
*
* @param d The Directory to modify
* @param s The Similarity to use in <code>reSetNorms</code>
*/
public LengthNormModifier(Directory d, Similarity s) {
dir = d;
sim = s;
}
/**
* Resets the norms for the specified field.
*
* <p>
* Opens a new IndexReader on the Directory given to this instance,
* modifies the norms using the Similarity given to this instance,
* and closes the IndexReader.
* </p>
*
* @param field the field whose norms should be reset
*/
public void reSetNorms(String field) throws IOException {
String fieldName = StringHelper.intern(field);
int[] termCounts = new int[0];
IndexReader reader = null;
TermEnum termEnum = null;
TermDocs termDocs = null;
try {
reader = IndexReader.open(dir, false);
termCounts = new int[reader.maxDoc()];
try {
termEnum = reader.terms(new Term(field));
try {
termDocs = reader.termDocs();
do {
Term term = termEnum.term();
if (term != null && term.field().equals(fieldName)) {
termDocs.seek(termEnum.term());
while (termDocs.next()) {
termCounts[termDocs.doc()] += termDocs.freq();
}
}
} while (termEnum.next());
} finally {
if (null != termDocs) termDocs.close();
}
} finally {
if (null != termEnum) termEnum.close();
}
} finally {
if (null != reader) reader.close();
}
try {
reader = IndexReader.open(dir, false);
for (int d = 0; d < termCounts.length; d++) {
if (! reader.isDeleted(d)) {
byte norm = Similarity.encodeNorm(sim.lengthNorm(fieldName, termCounts[d]));
reader.setNorm(d, fieldName, norm);
}
}
} finally {
if (null != reader) reader.close();
}
}
}

View File

@ -76,13 +76,9 @@ public class TestFieldNormModifier extends LuceneTestCase {
writer.close();
}
public void testMissingField() {
public void testMissingField() throws Exception {
FieldNormModifier fnm = new FieldNormModifier(store, s);
try {
fnm.reSetNorms("nobodyherebutuschickens");
} catch (Exception e) {
assertNull("caught something", e);
}
fnm.reSetNorms("nobodyherebutuschickens");
}
public void testFieldWithNoNorm() throws Exception {
@ -97,11 +93,7 @@ public class TestFieldNormModifier extends LuceneTestCase {
r.close();
FieldNormModifier fnm = new FieldNormModifier(store, s);
try {
fnm.reSetNorms("nonorm");
} catch (Exception e) {
assertNull("caught something", e);
}
fnm.reSetNorms("nonorm");
// nothing should have changed
r = IndexReader.open(store, false);

View File

@ -18,10 +18,13 @@ package org.apache.lucene.search;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.util.OpenBitSet;
import org.apache.lucene.util.Bits;
public class DuplicateFilter extends Filter
{
@ -79,88 +82,87 @@ public class DuplicateFilter extends Filter
}
}
private OpenBitSet correctBits(IndexReader reader) throws IOException
{
OpenBitSet bits=new OpenBitSet(reader.maxDoc()); //assume all are INvalid
Term startTerm=new Term(fieldName);
TermEnum te = reader.terms(startTerm);
if(te!=null)
{
Term currTerm=te.term();
while((currTerm!=null)&&(currTerm.field()==startTerm.field())) //term fieldnames are interned
{
int lastDoc=-1;
//set non duplicates
TermDocs td = reader.termDocs(currTerm);
if(td.next())
{
if(keepMode==KM_USE_FIRST_OCCURRENCE)
{
bits.set(td.doc());
}
else
{
do
{
lastDoc=td.doc();
}while(td.next());
bits.set(lastDoc);
}
}
if(!te.next())
{
break;
}
currTerm=te.term();
}
}
return bits;
}
private OpenBitSet correctBits(IndexReader reader) throws IOException {
OpenBitSet bits = new OpenBitSet(reader.maxDoc()); //assume all are INvalid
final Bits delDocs = MultiFields.getDeletedDocs(reader);
Terms terms = reader.fields().terms(fieldName);
if (terms != null) {
TermsEnum termsEnum = terms.iterator();
DocsEnum docs = null;
while(true) {
BytesRef currTerm = termsEnum.next();
if (currTerm == null) {
break;
} else {
docs = termsEnum.docs(delDocs, docs);
int doc = docs.nextDoc();
if (doc != docs.NO_MORE_DOCS) {
if (keepMode == KM_USE_FIRST_OCCURRENCE) {
bits.set(doc);
} else {
int lastDoc = doc;
while (true) {
lastDoc = doc;
doc = docs.nextDoc();
if (doc == docs.NO_MORE_DOCS) {
break;
}
}
bits.set(lastDoc);
}
}
}
}
}
return bits;
}
private OpenBitSet fastBits(IndexReader reader) throws IOException
{
{
OpenBitSet bits=new OpenBitSet(reader.maxDoc());
bits.set(0,reader.maxDoc()); //assume all are valid
Term startTerm=new Term(fieldName);
TermEnum te = reader.terms(startTerm);
if(te!=null)
{
Term currTerm=te.term();
bits.set(0,reader.maxDoc()); //assume all are valid
final Bits delDocs = MultiFields.getDeletedDocs(reader);
Terms terms = reader.fields().terms(fieldName);
if (terms != null) {
TermsEnum termsEnum = terms.iterator();
DocsEnum docs = null;
while(true) {
BytesRef currTerm = termsEnum.next();
if (currTerm == null) {
break;
} else {
if (termsEnum.docFreq() > 1) {
// unset potential duplicates
docs = termsEnum.docs(delDocs, docs);
int doc = docs.nextDoc();
if (doc != docs.NO_MORE_DOCS) {
if (keepMode == KM_USE_FIRST_OCCURRENCE) {
doc = docs.nextDoc();
}
}
while((currTerm!=null)&&(currTerm.field()==startTerm.field())) //term fieldnames are interned
{
if(te.docFreq()>1)
{
int lastDoc=-1;
//unset potential duplicates
TermDocs td = reader.termDocs(currTerm);
td.next();
if(keepMode==KM_USE_FIRST_OCCURRENCE)
{
td.next();
}
do
{
lastDoc=td.doc();
bits.clear(lastDoc);
}while(td.next());
if(keepMode==KM_USE_LAST_OCCURRENCE)
{
//restore the last bit
bits.set(lastDoc);
}
}
if(!te.next())
{
break;
}
currTerm=te.term();
}
}
return bits;
}
int lastDoc = -1;
while (true) {
lastDoc = doc;
bits.clear(lastDoc);
doc = docs.nextDoc();
if (doc == docs.NO_MORE_DOCS) {
break;
}
}
if (keepMode==KM_USE_LAST_OCCURRENCE) {
// restore the last bit
bits.set(lastDoc);
}
}
}
}
}
return bits;
}
public String getFieldName()
{

View File

@ -29,7 +29,7 @@ import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.PriorityQueue;
/**
@ -172,8 +172,8 @@ public class FuzzyLikeThisQuery extends Query
* Adds user input for "fuzzification"
* @param queryString The string which will be parsed by the analyzer and for which fuzzy variants will be parsed
* @param fieldName
* @param minSimilarity The minimum similarity of the term variants (see FuzzyTermEnum)
* @param prefixLength Length of required common prefix on variant terms (see FuzzyTermEnum)
* @param minSimilarity The minimum similarity of the term variants (see FuzzyTermsEnum)
* @param prefixLength Length of required common prefix on variant terms (see FuzzyTermsEnum)
*/
public void addTerms(String queryString, String fieldName,float minSimilarity, int prefixLength)
{
@ -195,48 +195,44 @@ public class FuzzyLikeThisQuery extends Query
String term = termAtt.term();
if(!processedTerms.contains(term))
{
processedTerms.add(term);
ScoreTermQueue variantsQ=new ScoreTermQueue(MAX_VARIANTS_PER_TERM); //maxNum variants considered for any one term
float minScore=0;
Term startTerm=internSavingTemplateTerm.createTerm(term);
FuzzyTermEnum fe=new FuzzyTermEnum(reader,startTerm,f.minSimilarity,f.prefixLength);
TermEnum origEnum = reader.terms(startTerm);
int df=0;
if(startTerm.equals(origEnum.term()))
{
df=origEnum.docFreq(); //store the df so all variants use same idf
}
int numVariants=0;
int totalVariantDocFreqs=0;
do
{
Term possibleMatch=fe.term();
if(possibleMatch!=null)
{
numVariants++;
totalVariantDocFreqs+=fe.docFreq();
float score=fe.difference();
if(variantsQ.size() < MAX_VARIANTS_PER_TERM || score > minScore){
ScoreTerm st=new ScoreTerm(possibleMatch,score,startTerm);
variantsQ.insertWithOverflow(st);
minScore = variantsQ.top().score; // maintain minScore
}
processedTerms.add(term);
ScoreTermQueue variantsQ=new ScoreTermQueue(MAX_VARIANTS_PER_TERM); //maxNum variants considered for any one term
float minScore=0;
Term startTerm=internSavingTemplateTerm.createTerm(term);
FuzzyTermsEnum fe = new FuzzyTermsEnum(reader, startTerm, f.minSimilarity, f.prefixLength);
//store the df so all variants use same idf
int df = reader.docFreq(startTerm);
int numVariants=0;
int totalVariantDocFreqs=0;
BytesRef possibleMatch;
MultiTermQuery.BoostAttribute boostAtt =
fe.attributes().addAttribute(MultiTermQuery.BoostAttribute.class);
while ((possibleMatch = fe.next()) != null) {
if (possibleMatch!=null) {
numVariants++;
totalVariantDocFreqs+=fe.docFreq();
float score=boostAtt.getBoost();
if (variantsQ.size() < MAX_VARIANTS_PER_TERM || score > minScore){
ScoreTerm st=new ScoreTerm(new Term(startTerm.field(), possibleMatch.utf8ToString()),score,startTerm);
variantsQ.insertWithOverflow(st);
minScore = variantsQ.top().score; // maintain minScore
}
}
}
}
while(fe.next());
if(numVariants>0)
{
int avgDf=totalVariantDocFreqs/numVariants;
if(df==0)//no direct match we can use as df for all variants
if(numVariants>0)
{
int avgDf=totalVariantDocFreqs/numVariants;
if(df==0)//no direct match we can use as df for all variants
{
df=avgDf; //use avg df of all variants
}
// take the top variants (scored by edit distance) and reset the score
// to include an IDF factor then add to the global queue for ranking
// overall top query terms
int size = variantsQ.size();
for(int i = 0; i < size; i++)
// take the top variants (scored by edit distance) and reset the score
// to include an IDF factor then add to the global queue for ranking
// overall top query terms
int size = variantsQ.size();
for(int i = 0; i < size; i++)
{
ScoreTerm st = variantsQ.pop();
st.score=(st.score*st.score)*sim.idf(df,corpusNumDocs);
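For orientation, a hedged usage sketch of the public API this hunk feeds into; the constructor signature FuzzyLikeThisQuery(int maxNumTerms, Analyzer analyzer), the analyzer, the field name and the searcher are assumptions, not taken from this patch:

  FuzzyLikeThisQuery flt = new FuzzyLikeThisQuery(32, new WhitespaceAnalyzer());
  flt.addTerms("lucene flexibel indexng", "contents", 0.5f, 2);  // misspellings get "fuzzified"
  TopDocs hits = searcher.search(flt, 10);                       // rewrite picks the top variant terms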

View File

@ -38,6 +38,7 @@ import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogMergePolicy;
import org.apache.lucene.index.Term;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util._TestUtil;
@ -219,8 +220,8 @@ public class TestRemoteSort extends LuceneTestCase implements Serializable {
@Override
public void setNextReader(IndexReader reader, int docBase) throws IOException {
docValues = FieldCache.DEFAULT.getInts(reader, "parser", new FieldCache.IntParser() {
public final int parseInt(final String val) {
return (val.charAt(0)-'A') * 123456;
public final int parseInt(BytesRef termRef) {
return (termRef.utf8ToString().charAt(0)-'A') * 123456;
}
});
}
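Because the test terms are single-byte ASCII, the parser above could also skip the utf8ToString round trip and read the first byte of the term directly; a hedged alternative:

  docValues = FieldCache.DEFAULT.getInts(reader, "parser", new FieldCache.IntParser() {
    public final int parseInt(BytesRef termRef) {
      // ASCII letters occupy one UTF-8 byte each, so the byte equals the char
      return (termRef.bytes[termRef.offset] - 'A') * 123456;
    }
  });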
@ -245,6 +246,29 @@ public class TestRemoteSort extends LuceneTestCase implements Serializable {
runMultiSorts(multi, true); // this runs on the full index
}
// test custom search when remote
/* rewrite with new API
public void testRemoteCustomSort() throws Exception {
Searchable searcher = getRemote();
MultiSearcher multi = new MultiSearcher (new Searchable[] { searcher });
sort.setSort (new SortField ("custom", SampleComparable.getComparatorSource()));
assertMatches (multi, queryX, sort, "CAIEG");
sort.setSort (new SortField ("custom", SampleComparable.getComparatorSource(), true));
assertMatches (multi, queryY, sort, "HJDBF");
assertSaneFieldCaches(getName() + " ComparatorSource");
FieldCache.DEFAULT.purgeAllCaches();
SortComparator custom = SampleComparable.getComparator();
sort.setSort (new SortField ("custom", custom));
assertMatches (multi, queryX, sort, "CAIEG");
sort.setSort (new SortField ("custom", custom, true));
assertMatches (multi, queryY, sort, "HJDBF");
assertSaneFieldCaches(getName() + " Comparator");
FieldCache.DEFAULT.purgeAllCaches();
}*/
// test that the relevancy scores are the same even if
// hits are sorted
public void testNormalizedScores() throws Exception {
@ -294,7 +318,7 @@ public class TestRemoteSort extends LuceneTestCase implements Serializable {
assertSameValues (scoresY, getScores (remote.search (queryY, null, 1000, sort).scoreDocs, remote));
assertSameValues (scoresA, getScores (remote.search (queryA, null, 1000, sort).scoreDocs, remote));
sort.setSort (new SortField("float", SortField.FLOAT), new SortField("string", SortField.STRING));
sort.setSort (new SortField("float", SortField.FLOAT));
assertSameValues (scoresX, getScores (remote.search (queryX, null, 1000, sort).scoreDocs, remote));
assertSameValues (scoresY, getScores (remote.search (queryY, null, 1000, sort).scoreDocs, remote));
assertSameValues (scoresA, getScores (remote.search (queryA, null, 1000, sort).scoreDocs, remote));
@ -314,6 +338,10 @@ public class TestRemoteSort extends LuceneTestCase implements Serializable {
expected = isFull ? "IDHFGJABEC" : "IDHFGJAEBC";
assertMatches(multi, queryA, sort, expected);
sort.setSort(new SortField ("int", SortField.INT));
expected = isFull ? "IDHFGJABEC" : "IDHFGJAEBC";
assertMatches(multi, queryA, sort, expected);
sort.setSort(new SortField ("float", SortField.FLOAT), SortField.FIELD_DOC);
assertMatches(multi, queryA, sort, "GDHJCIEFAB");

View File

@ -19,12 +19,15 @@ package org.apache.lucene.spatial.tier;
import java.io.IOException;
import java.util.List;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.NumericUtils;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.OpenBitSet;
/**
@ -44,22 +47,41 @@ public class CartesianShapeFilter extends Filter {
@Override
public DocIdSet getDocIdSet(final IndexReader reader) throws IOException {
final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
final TermDocs termDocs = reader.termDocs();
final Bits delDocs = MultiFields.getDeletedDocs(reader);
final List<Double> area = shape.getArea();
int sz = area.size();
final int sz = area.size();
final Term term = new Term(fieldName);
// iterate through each boxid
for (int i =0; i< sz; i++) {
double boxId = area.get(i).doubleValue();
termDocs.seek(term.createTerm(NumericUtils.doubleToPrefixCoded(boxId)));
// iterate through all documents
// which have this boxId
while (termDocs.next()) {
bits.fastSet(termDocs.doc());
final BytesRef bytesRef = new BytesRef(NumericUtils.BUF_SIZE_LONG);
if (sz == 1) {
double boxId = area.get(0).doubleValue();
NumericUtils.longToPrefixCoded(NumericUtils.doubleToSortableLong(boxId), 0, bytesRef);
return new DocIdSet() {
@Override
public DocIdSetIterator iterator() throws IOException {
return MultiFields.getTermDocsEnum(reader, delDocs, fieldName, bytesRef);
}
@Override
public boolean isCacheable() {
return false;
}
};
} else {
final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
for (int i =0; i< sz; i++) {
double boxId = area.get(i).doubleValue();
NumericUtils.longToPrefixCoded(NumericUtils.doubleToSortableLong(boxId), 0, bytesRef);
final DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, delDocs, fieldName, bytesRef);
if (docsEnum == null) continue;
// iterate through all documents
// which have this boxId
int doc;
while ((doc = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
bits.fastSet(doc);
}
}
return bits;
}
return bits;
}
}
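Either branch hands back a standard DocIdSet; a short sketch of how a caller walks it (the null check matters for the single-box case, where the docs enum may be absent):

  DocIdSet docIdSet = filter.getDocIdSet(reader);
  DocIdSetIterator iterator = docIdSet.iterator();
  if (iterator != null) {
    int doc;
    while ((doc = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
      // doc falls inside one of the bounding boxes
    }
  }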

View File

@ -24,6 +24,7 @@ import java.util.Map;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriterConfig;
@ -49,7 +50,6 @@ import org.apache.lucene.spatial.tier.projections.SinusoidalProjector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.NumericUtils;
public class TestCartesian extends LuceneTestCase {
@ -96,8 +96,8 @@ public class TestCartesian extends LuceneTestCase {
doc.add(new Field("name", name,Field.Store.YES, Field.Index.ANALYZED));
// convert the lat / long to lucene fields
doc.add(new Field(latField, NumericUtils.doubleToPrefixCoded(lat),Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field(lngField, NumericUtils.doubleToPrefixCoded(lng),Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new NumericField(latField, Integer.MAX_VALUE, Field.Store.YES, true).setDoubleValue(lat));
doc.add(new NumericField(lngField, Integer.MAX_VALUE, Field.Store.YES, true).setDoubleValue(lng));
// add a default meta field to make searching all documents easy
doc.add(new Field("metafile", "doc",Field.Store.YES, Field.Index.ANALYZED));
@ -105,10 +105,9 @@ public class TestCartesian extends LuceneTestCase {
int ctpsize = ctps.size();
for (int i =0; i < ctpsize; i++){
CartesianTierPlotter ctp = ctps.get(i);
doc.add(new Field(ctp.getTierFieldName(),
NumericUtils.doubleToPrefixCoded(ctp.getTierBoxId(lat,lng)),
doc.add(new NumericField(ctp.getTierFieldName(), Integer.MAX_VALUE,
Field.Store.YES,
Field.Index.NOT_ANALYZED_NO_NORMS));
true).setDoubleValue(ctp.getTierBoxId(lat,lng)));
doc.add(new Field(geoHashPrefix, GeoHashUtils.encode(lat,lng),
Field.Store.YES,
@ -275,8 +274,8 @@ public class TestCartesian extends LuceneTestCase {
Document d = searcher.doc(scoreDocs[i].doc);
String name = d.get("name");
double rsLat = NumericUtils.prefixCodedToDouble(d.get(latField));
double rsLng = NumericUtils.prefixCodedToDouble(d.get(lngField));
double rsLat = Double.parseDouble(d.get(latField));
double rsLng = Double.parseDouble(d.get(lngField));
Double geo_distance = distances.get(scoreDocs[i].doc);
double distance = DistanceUtils.getInstance().getDistanceMi(lat, lng, rsLat, rsLng);
@ -369,8 +368,8 @@ public class TestCartesian extends LuceneTestCase {
for(int i =0 ; i < results; i++){
Document d = searcher.doc(scoreDocs[i].doc);
String name = d.get("name");
double rsLat = NumericUtils.prefixCodedToDouble(d.get(latField));
double rsLng = NumericUtils.prefixCodedToDouble(d.get(lngField));
double rsLat = Double.parseDouble(d.get(latField));
double rsLng = Double.parseDouble(d.get(lngField));
Double geo_distance = distances.get(scoreDocs[i].doc);
double distance = DistanceUtils.getInstance().getDistanceMi(lat, lng, rsLat, rsLng);
@ -464,8 +463,8 @@ public class TestCartesian extends LuceneTestCase {
Document d = searcher.doc(scoreDocs[i].doc);
String name = d.get("name");
double rsLat = NumericUtils.prefixCodedToDouble(d.get(latField));
double rsLng = NumericUtils.prefixCodedToDouble(d.get(lngField));
double rsLat = Double.parseDouble(d.get(latField));
double rsLng = Double.parseDouble(d.get(lngField));
Double geo_distance = distances.get(scoreDocs[i].doc);
double distance = DistanceUtils.getInstance().getDistanceMi(lat, lng, rsLat, rsLng);
@ -558,8 +557,8 @@ public class TestCartesian extends LuceneTestCase {
Document d = searcher.doc(scoreDocs[i].doc);
String name = d.get("name");
double rsLat = NumericUtils.prefixCodedToDouble(d.get(latField));
double rsLng = NumericUtils.prefixCodedToDouble(d.get(lngField));
double rsLat = Double.parseDouble(d.get(latField));
double rsLng = Double.parseDouble(d.get(lngField));
Double geo_distance = distances.get(scoreDocs[i].doc);
double distance = DistanceUtils.getInstance().getDistanceMi(lat, lng, rsLat, rsLng);

View File

@ -21,6 +21,7 @@ import java.io.IOException;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
@ -28,7 +29,6 @@ import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.NumericUtils;
import org.apache.lucene.store.RAMDirectory;
public class TestDistance extends LuceneTestCase {
@ -63,8 +63,8 @@ public class TestDistance extends LuceneTestCase {
doc.add(new Field("name", name,Field.Store.YES, Field.Index.ANALYZED));
// convert the lat / long to lucene fields
doc.add(new Field(latField, NumericUtils.doubleToPrefixCoded(lat),Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field(lngField, NumericUtils.doubleToPrefixCoded(lng),Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new NumericField(latField, Integer.MAX_VALUE, Field.Store.YES, true).setDoubleValue(lat));
doc.add(new NumericField(lngField, Integer.MAX_VALUE,Field.Store.YES, true).setDoubleValue(lng));
// add a default meta field to make searching all documents easy
doc.add(new Field("metafile", "doc",Field.Store.YES, Field.Index.ANALYZED));

View File

@ -21,8 +21,10 @@ import org.apache.lucene.index.IndexReader;
import java.util.Iterator;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.util.StringHelper;
import java.io.*;
@ -52,55 +54,39 @@ public class LuceneDictionary implements Dictionary {
final class LuceneIterator implements Iterator<String> {
private TermEnum termEnum;
private Term actualTerm;
private boolean hasNextCalled;
private TermsEnum termsEnum;
private BytesRef pendingTerm;
LuceneIterator() {
try {
termEnum = reader.terms(new Term(field));
final Terms terms = MultiFields.getTerms(reader, field);
if (terms != null) {
termsEnum = terms.iterator();
pendingTerm = termsEnum.next();
}
} catch (IOException e) {
throw new RuntimeException(e);
}
}
public String next() {
if (!hasNextCalled) {
hasNext();
if (pendingTerm == null) {
return null;
}
hasNextCalled = false;
String result = pendingTerm.utf8ToString();
try {
termEnum.next();
pendingTerm = termsEnum.next();
} catch (IOException e) {
throw new RuntimeException(e);
}
return (actualTerm != null) ? actualTerm.text() : null;
return result;
}
public boolean hasNext() {
if (hasNextCalled) {
return actualTerm != null;
}
hasNextCalled = true;
actualTerm = termEnum.term();
// if there are no words return false
if (actualTerm == null) {
return false;
}
String currentField = actualTerm.field();
// if the next word doesn't have the same field return false
if (currentField != field) {
actualTerm = null;
return false;
}
return true;
return pendingTerm != null;
}
public void remove() {

View File

@ -17,16 +17,21 @@ package org.apache.lucene.queryParser.surround.query;
*/
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import java.io.IOException;
public class SrndPrefixQuery extends SimpleTerm {
private final BytesRef prefixRef;
public SrndPrefixQuery(String prefix, boolean quoted, char truncator) {
super(quoted);
this.prefix = prefix;
prefixRef = new BytesRef(prefix);
this.truncator = truncator;
}
@ -53,20 +58,35 @@ public class SrndPrefixQuery extends SimpleTerm {
MatchingTermVisitor mtv) throws IOException
{
/* inspired by PrefixQuery.rewrite(): */
TermEnum enumerator = reader.terms(getLucenePrefixTerm(fieldName));
try {
do {
Term term = enumerator.term();
if ((term != null)
&& term.text().startsWith(getPrefix())
&& term.field().equals(fieldName)) {
mtv.visitMatchingTerm(term);
Terms terms = MultiFields.getTerms(reader, fieldName);
if (terms != null) {
TermsEnum termsEnum = terms.iterator();
boolean skip = false;
TermsEnum.SeekStatus status = termsEnum.seek(new BytesRef(getPrefix()));
if (status == TermsEnum.SeekStatus.FOUND) {
mtv.visitMatchingTerm(getLucenePrefixTerm(fieldName));
} else if (status == TermsEnum.SeekStatus.NOT_FOUND) {
if (termsEnum.term().startsWith(prefixRef)) {
mtv.visitMatchingTerm(new Term(fieldName, termsEnum.term().utf8ToString()));
} else {
break;
skip = true;
}
} while (enumerator.next());
} finally {
enumerator.close();
} else {
// EOF
skip = true;
}
if (!skip) {
while(true) {
BytesRef text = termsEnum.next();
if (text != null && text.startsWith(prefixRef)) {
mtv.visitMatchingTerm(new Term(fieldName, text.utf8ToString()));
} else {
break;
}
}
}
}
}
}
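The seek-then-scan idiom above generalizes to any prefix walk over the flex API; a minimal standalone sketch, with the field name and prefix as placeholders:

  Terms terms = MultiFields.getTerms(reader, "body");
  if (terms != null) {
    TermsEnum termsEnum = terms.iterator();
    BytesRef prefix = new BytesRef("luc");
    if (termsEnum.seek(prefix) != TermsEnum.SeekStatus.END) {
      BytesRef term = termsEnum.term();
      while (term != null && term.startsWith(prefix)) {
        // visit the matching term ...
        term = termsEnum.next();
      }
    }
  }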

View File

@ -20,7 +20,10 @@ import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.util.BytesRef;
public class SrndTermQuery extends SimpleTerm {
@ -46,16 +49,14 @@ public class SrndTermQuery extends SimpleTerm {
MatchingTermVisitor mtv) throws IOException
{
/* check term presence in index here for symmetry with other SimpleTerm's */
TermEnum enumerator = reader.terms(getLuceneTerm(fieldName));
try {
Term it= enumerator.term(); /* same or following index term */
if ((it != null)
&& it.text().equals(getTermText())
&& it.field().equals(fieldName)) {
mtv.visitMatchingTerm(it);
Terms terms = MultiFields.getTerms(reader, fieldName);
if (terms != null) {
TermsEnum termsEnum = terms.iterator();
TermsEnum.SeekStatus status = termsEnum.seek(new BytesRef(getTermText()));
if (status == TermsEnum.SeekStatus.FOUND) {
mtv.visitMatchingTerm(getLuceneTerm(fieldName));
}
} finally {
enumerator.close();
}
}
}

View File

@ -17,8 +17,11 @@ package org.apache.lucene.queryParser.surround.query;
*/
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import java.io.IOException;
@ -40,6 +43,7 @@ public class SrndTruncQuery extends SimpleTerm {
private final char mask;
private String prefix;
private BytesRef prefixRef;
private Pattern pattern;
@ -68,6 +72,7 @@ public class SrndTruncQuery extends SimpleTerm {
i++;
}
prefix = truncated.substring(0, i);
prefixRef = new BytesRef(prefix);
StringBuilder re = new StringBuilder();
while (i < truncated.length()) {
@ -84,26 +89,37 @@ public class SrndTruncQuery extends SimpleTerm {
MatchingTermVisitor mtv) throws IOException
{
int prefixLength = prefix.length();
TermEnum enumerator = reader.terms(new Term(fieldName, prefix));
Matcher matcher = pattern.matcher("");
try {
do {
Term term = enumerator.term();
if (term != null) {
String text = term.text();
if ((! text.startsWith(prefix)) || (! term.field().equals(fieldName))) {
break;
} else {
matcher.reset( text.substring(prefixLength));
if (matcher.matches()) {
mtv.visitMatchingTerm(term);
}
}
Terms terms = MultiFields.getTerms(reader, fieldName);
if (terms != null) {
Matcher matcher = pattern.matcher("");
try {
TermsEnum termsEnum = terms.iterator();
TermsEnum.SeekStatus status = termsEnum.seek(prefixRef);
BytesRef text;
if (status == TermsEnum.SeekStatus.FOUND) {
text = prefixRef;
} else if (status == TermsEnum.SeekStatus.NOT_FOUND) {
text = termsEnum.term();
} else {
text = null;
}
} while (enumerator.next());
} finally {
enumerator.close();
matcher.reset();
while(text != null) {
if (text != null && text.startsWith(prefixRef)) {
String textString = text.utf8ToString();
matcher.reset(textString.substring(prefixLength));
if (matcher.matches()) {
mtv.visitMatchingTerm(new Term(fieldName, textString));
}
} else {
break;
}
text = termsEnum.next();
}
} finally {
matcher.reset();
}
}
}
}

View File

@ -17,12 +17,17 @@ package org.apache.lucene.analysis;
* limitations under the License.
*/
import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;
import org.apache.lucene.document.NumericField; // for javadocs
import org.apache.lucene.search.NumericRangeQuery; // for javadocs
import org.apache.lucene.search.NumericRangeFilter; // for javadocs
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
@ -92,6 +97,88 @@ public final class NumericTokenStream extends TokenStream {
/** The lower precision tokens get this token type assigned. */
public static final String TOKEN_TYPE_LOWER_PREC = "lowerPrecNumeric";
/** <b>Expert:</b> Use this attribute to get the details of the currently generated token
* @lucene.experimental
* @since 3.1
*/
public interface NumericTermAttribute extends Attribute {
/** Returns current shift value, undefined before first token */
int getShift();
/** Returns {@link NumericTokenStream}'s raw value as {@code long} */
long getRawValue();
/** Returns value size in bits (32 for {@code float}, {@code int}; 64 for {@code double}, {@code long}) */
int getValueSize();
}
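A hedged usage sketch of the new attribute; the concrete values and the println are illustrative only:

  NumericTokenStream stream = new NumericTokenStream(4).setLongValue(1234567890L);
  NumericTokenStream.NumericTermAttribute numAtt =
    stream.addAttribute(NumericTokenStream.NumericTermAttribute.class);
  stream.reset();
  while (stream.incrementToken()) {
    // one token per precision step: the raw value plus the current shift
    System.out.println("shift=" + numAtt.getShift() + " raw=" + numAtt.getRawValue());
  }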
private static final class NumericAttributeFactory extends AttributeFactory {
private final AttributeFactory delegate;
private NumericTokenStream ts = null;
NumericAttributeFactory(AttributeFactory delegate) {
this.delegate = delegate;
}
@Override
public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) {
if (attClass == NumericTermAttribute.class)
return new NumericTermAttributeImpl(ts);
if (attClass.isAssignableFrom(CharTermAttribute.class) || attClass.isAssignableFrom(TermAttribute.class))
throw new IllegalArgumentException("NumericTokenStream does not support CharTermAttribute/TermAttribute.");
return delegate.createAttributeInstance(attClass);
}
}
private static final class NumericTermAttributeImpl extends AttributeImpl implements NumericTermAttribute,TermToBytesRefAttribute {
private final NumericTokenStream ts;
public NumericTermAttributeImpl(NumericTokenStream ts) {
this.ts = ts;
}
public int toBytesRef(BytesRef bytes) {
try {
assert ts.valSize == 64 || ts.valSize == 32;
return (ts.valSize == 64) ?
NumericUtils.longToPrefixCoded(ts.value, ts.shift, bytes) :
NumericUtils.intToPrefixCoded((int) ts.value, ts.shift, bytes);
} catch (IllegalArgumentException iae) {
// return empty token before first
bytes.length = 0;
return 0;
}
}
public int getShift() { return ts.shift; }
public long getRawValue() { return ts.value; }
public int getValueSize() { return ts.valSize; }
@Override
public void clear() {
// this attribute has no contents to clear
}
@Override
public boolean equals(Object other) {
return other == this;
}
@Override
public int hashCode() {
return System.identityHashCode(this);
}
@Override
public void copyTo(AttributeImpl target) {
// this attribute has no contents to copy
}
@Override
public Object clone() {
// cannot throw CloneNotSupportedException (checked)
throw new UnsupportedOperationException();
}
}
/**
* Creates a token stream for numeric values using the default <code>precisionStep</code>
* {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The stream is not yet initialized,
@ -107,23 +194,15 @@ public final class NumericTokenStream extends TokenStream {
* before using set a value using the various set<em>???</em>Value() methods.
*/
public NumericTokenStream(final int precisionStep) {
super();
this.precisionStep = precisionStep;
if (precisionStep < 1)
throw new IllegalArgumentException("precisionStep must be >=1");
}
super(new NumericAttributeFactory(AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY));
// we must do this after the super call :(
((NumericAttributeFactory) getAttributeFactory()).ts = this;
addAttribute(NumericTermAttribute.class);
/**
* Expert: Creates a token stream for numeric values with the specified
* <code>precisionStep</code> using the given {@link AttributeSource}.
* The stream is not yet initialized,
* before using set a value using the various set<em>???</em>Value() methods.
*/
public NumericTokenStream(AttributeSource source, final int precisionStep) {
super(source);
this.precisionStep = precisionStep;
if (precisionStep < 1)
throw new IllegalArgumentException("precisionStep must be >=1");
shift = -precisionStep;
}
/**
@ -134,10 +213,15 @@ public final class NumericTokenStream extends TokenStream {
* before using set a value using the various set<em>???</em>Value() methods.
*/
public NumericTokenStream(AttributeFactory factory, final int precisionStep) {
super(factory);
super(new NumericAttributeFactory(factory));
// we must do this after the super call :(
((NumericAttributeFactory) getAttributeFactory()).ts = this;
addAttribute(NumericTermAttribute.class);
this.precisionStep = precisionStep;
if (precisionStep < 1)
throw new IllegalArgumentException("precisionStep must be >=1");
shift = -precisionStep;
}
/**
@ -149,7 +233,7 @@ public final class NumericTokenStream extends TokenStream {
public NumericTokenStream setLongValue(final long value) {
this.value = value;
valSize = 64;
shift = 0;
shift = -precisionStep;
return this;
}
@ -162,7 +246,7 @@ public final class NumericTokenStream extends TokenStream {
public NumericTokenStream setIntValue(final int value) {
this.value = value;
valSize = 32;
shift = 0;
shift = -precisionStep;
return this;
}
@ -175,7 +259,7 @@ public final class NumericTokenStream extends TokenStream {
public NumericTokenStream setDoubleValue(final double value) {
this.value = NumericUtils.doubleToSortableLong(value);
valSize = 64;
shift = 0;
shift = -precisionStep;
return this;
}
@ -188,7 +272,7 @@ public final class NumericTokenStream extends TokenStream {
public NumericTokenStream setFloatValue(final float value) {
this.value = NumericUtils.floatToSortableInt(value);
valSize = 32;
shift = 0;
shift = -precisionStep;
return this;
}
@ -196,37 +280,24 @@ public final class NumericTokenStream extends TokenStream {
public void reset() {
if (valSize == 0)
throw new IllegalStateException("call set???Value() before usage");
shift = 0;
shift = -precisionStep;
}
@Override
public boolean incrementToken() {
if (valSize == 0)
throw new IllegalStateException("call set???Value() before usage");
if (shift >= valSize)
shift += precisionStep;
if (shift >= valSize) {
// reset so the attribute still works after exhausted stream
shift -= precisionStep;
return false;
clearAttributes();
final char[] buffer;
switch (valSize) {
case 64:
buffer = termAtt.resizeTermBuffer(NumericUtils.BUF_SIZE_LONG);
termAtt.setTermLength(NumericUtils.longToPrefixCoded(value, shift, buffer));
break;
case 32:
buffer = termAtt.resizeTermBuffer(NumericUtils.BUF_SIZE_INT);
termAtt.setTermLength(NumericUtils.intToPrefixCoded((int) value, shift, buffer));
break;
default:
// should not happen
throw new IllegalArgumentException("valSize must be 32 or 64");
}
clearAttributes();
// the TermToBytesRefAttribute is directly accessing shift & value.
typeAtt.setType((shift == 0) ? TOKEN_TYPE_FULL_PREC : TOKEN_TYPE_LOWER_PREC);
posIncrAtt.setPositionIncrement((shift == 0) ? 1 : 0);
shift += precisionStep;
return true;
}
@ -238,12 +309,11 @@ public final class NumericTokenStream extends TokenStream {
}
// members
private final TermAttribute termAtt = addAttribute(TermAttribute.class);
private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
private int shift = 0, valSize = 0; // valSize==0 means not initialized
int shift, valSize = 0; // valSize==0 means not initialized
private final int precisionStep;
private long value = 0L;
long value = 0L;
}

View File

@ -64,14 +64,14 @@ import org.apache.lucene.util.AttributeImpl;
implementing the {@link TokenStream#incrementToken()} API.
Failing that, to create a new Token you should first use
one of the constructors that starts with null text. To load
the token from a char[] use {@link #setTermBuffer(char[], int, int)}.
To load from a String use {@link #setTermBuffer(String)} or {@link #setTermBuffer(String, int, int)}.
Alternatively you can get the Token's termBuffer by calling either {@link #termBuffer()},
the token from a char[] use {@link #copyBuffer(char[], int, int)}.
To load from a String use {@link #setEmpty} followed by {@link #append(CharSequence)} or {@link #append(CharSequence, int, int)}.
Alternatively you can get the Token's termBuffer by calling either {@link #buffer()},
if you know that your text is shorter than the capacity of the termBuffer
or {@link #resizeTermBuffer(int)}, if there is any possibility
or {@link #resizeBuffer(int)}, if there is any possibility
that you may need to grow the buffer. Fill in the characters of your term into this
buffer, with {@link String#getChars(int, int, char[], int)} if loading from a string,
or with {@link System#arraycopy(Object, int, Object, int, int)}, and finally call {@link #setTermLength(int)} to
or with {@link System#arraycopy(Object, int, Object, int, int)}, and finally call {@link #setLength(int)} to
set the length of the term text. See <a target="_top"
href="https://issues.apache.org/jira/browse/LUCENE-969">LUCENE-969</a>
for details.</p>
@ -100,7 +100,7 @@ import org.apache.lucene.util.AttributeImpl;
</li>
<li> Copying from one one Token to another (type is reset to {@link #DEFAULT_TYPE} if not specified):<br/>
<pre>
return reusableToken.reinit(source.termBuffer(), 0, source.termLength(), source.startOffset(), source.endOffset()[, source.type()]);
return reusableToken.reinit(source.buffer(), 0, source.length(), source.startOffset(), source.endOffset()[, source.type()]);
</pre>
</li>
</ul>
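A compact sketch of the reuse pattern described above, outside of any particular TokenStream:

  Token reusableToken = new Token();

  // load from a char[]:
  char[] source = {'f', 'l', 'e', 'x'};
  reusableToken.copyBuffer(source, 0, source.length);

  // load from a String:
  reusableToken.setEmpty().append("flex");

  // or fill the buffer directly and record the length:
  char[] buffer = reusableToken.resizeBuffer(4);
  "flex".getChars(0, 4, buffer, 0);
  reusableToken.setLength(4);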
@ -115,6 +115,7 @@ import org.apache.lucene.util.AttributeImpl;
@see org.apache.lucene.index.Payload
*/
// TODO: change superclass to CharTermAttribute in 4.0!
public class Token extends TermAttributeImpl
implements TypeAttribute, PositionIncrementAttribute,
FlagsAttribute, OffsetAttribute, PayloadAttribute {
@ -172,7 +173,7 @@ public class Token extends TermAttributeImpl
* @param end end offset
*/
public Token(String text, int start, int end) {
setTermBuffer(text);
append(text);
startOffset = start;
endOffset = end;
}
@ -187,7 +188,7 @@ public class Token extends TermAttributeImpl
* @param typ token type
*/
public Token(String text, int start, int end, String typ) {
setTermBuffer(text);
append(text);
startOffset = start;
endOffset = end;
type = typ;
@ -204,7 +205,7 @@ public class Token extends TermAttributeImpl
* @param flags token type bits
*/
public Token(String text, int start, int end, int flags) {
setTermBuffer(text);
append(text);
startOffset = start;
endOffset = end;
this.flags = flags;
@ -221,7 +222,7 @@ public class Token extends TermAttributeImpl
* @param end
*/
public Token(char[] startTermBuffer, int termBufferOffset, int termBufferLength, int start, int end) {
setTermBuffer(startTermBuffer, termBufferOffset, termBufferLength);
copyBuffer(startTermBuffer, termBufferOffset, termBufferLength);
startOffset = start;
endOffset = end;
}
@ -270,7 +271,7 @@ public class Token extends TermAttributeImpl
corresponding to this token in the source text.
Note that the difference between endOffset() and startOffset() may not be
equal to {@link #termLength}, as the term text may have been altered by a
equal to {@link #length}, as the term text may have been altered by a
stemmer or some other filter. */
public final int startOffset() {
return startOffset;
@ -351,7 +352,7 @@ public class Token extends TermAttributeImpl
@Override
public String toString() {
final StringBuilder sb = new StringBuilder();
sb.append('(').append(term()).append(',')
sb.append('(').append(super.toString()).append(',')
.append(startOffset).append(',').append(endOffset);
if (!"word".equals(type))
sb.append(",type=").append(type);
@ -387,7 +388,7 @@ public class Token extends TermAttributeImpl
/** Makes a clone, but replaces the term buffer &
* start/end offset in the process. This is more
* efficient than doing a full clone (and then calling
* setTermBuffer) because it saves a wasted copy of the old
* {@link #copyBuffer}) because it saves a wasted copy of the old
* termBuffer. */
public Token clone(char[] newTermBuffer, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset) {
final Token t = new Token(newTermBuffer, newTermOffset, newTermLength, newStartOffset, newEndOffset);
@ -442,16 +443,16 @@ public class Token extends TermAttributeImpl
}
/** Shorthand for calling {@link #clear},
* {@link #setTermBuffer(char[], int, int)},
* {@link #copyBuffer(char[], int, int)},
* {@link #setStartOffset},
* {@link #setEndOffset},
* {@link #setType}
* @return this Token instance */
public Token reinit(char[] newTermBuffer, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset, String newType) {
clearNoTermBuffer();
copyBuffer(newTermBuffer, newTermOffset, newTermLength);
payload = null;
positionIncrement = 1;
setTermBuffer(newTermBuffer, newTermOffset, newTermLength);
startOffset = newStartOffset;
endOffset = newEndOffset;
type = newType;
@ -459,14 +460,14 @@ public class Token extends TermAttributeImpl
}
/** Shorthand for calling {@link #clear},
* {@link #setTermBuffer(char[], int, int)},
* {@link #copyBuffer(char[], int, int)},
* {@link #setStartOffset},
* {@link #setEndOffset}
* {@link #setType} on Token.DEFAULT_TYPE
* @return this Token instance */
public Token reinit(char[] newTermBuffer, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset) {
clearNoTermBuffer();
setTermBuffer(newTermBuffer, newTermOffset, newTermLength);
copyBuffer(newTermBuffer, newTermOffset, newTermLength);
startOffset = newStartOffset;
endOffset = newEndOffset;
type = DEFAULT_TYPE;
@ -474,14 +475,14 @@ public class Token extends TermAttributeImpl
}
/** Shorthand for calling {@link #clear},
* {@link #setTermBuffer(String)},
* {@link #append(CharSequence)},
* {@link #setStartOffset},
* {@link #setEndOffset}
* {@link #setType}
* @return this Token instance */
public Token reinit(String newTerm, int newStartOffset, int newEndOffset, String newType) {
clearNoTermBuffer();
setTermBuffer(newTerm);
clear();
append(newTerm);
startOffset = newStartOffset;
endOffset = newEndOffset;
type = newType;
@ -489,14 +490,14 @@ public class Token extends TermAttributeImpl
}
/** Shorthand for calling {@link #clear},
* {@link #setTermBuffer(String, int, int)},
* {@link #append(CharSequence, int, int)},
* {@link #setStartOffset},
* {@link #setEndOffset}
* {@link #setType}
* @return this Token instance */
public Token reinit(String newTerm, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset, String newType) {
clearNoTermBuffer();
setTermBuffer(newTerm, newTermOffset, newTermLength);
clear();
append(newTerm, newTermOffset, newTermOffset + newTermLength);
startOffset = newStartOffset;
endOffset = newEndOffset;
type = newType;
@ -504,14 +505,14 @@ public class Token extends TermAttributeImpl
}
/** Shorthand for calling {@link #clear},
* {@link #setTermBuffer(String)},
* {@link #append(CharSequence)},
* {@link #setStartOffset},
* {@link #setEndOffset}
* {@link #setType} on Token.DEFAULT_TYPE
* @return this Token instance */
public Token reinit(String newTerm, int newStartOffset, int newEndOffset) {
clearNoTermBuffer();
setTermBuffer(newTerm);
clear();
append(newTerm);
startOffset = newStartOffset;
endOffset = newEndOffset;
type = DEFAULT_TYPE;
@ -519,14 +520,14 @@ public class Token extends TermAttributeImpl
}
/** Shorthand for calling {@link #clear},
* {@link #setTermBuffer(String, int, int)},
* {@link #append(CharSequence, int, int)},
* {@link #setStartOffset},
* {@link #setEndOffset}
* {@link #setType} on Token.DEFAULT_TYPE
* @return this Token instance */
public Token reinit(String newTerm, int newTermOffset, int newTermLength, int newStartOffset, int newEndOffset) {
clearNoTermBuffer();
setTermBuffer(newTerm, newTermOffset, newTermLength);
clear();
append(newTerm, newTermOffset, newTermOffset + newTermLength);
startOffset = newStartOffset;
endOffset = newEndOffset;
type = DEFAULT_TYPE;
@ -538,7 +539,7 @@ public class Token extends TermAttributeImpl
* @param prototype
*/
public void reinit(Token prototype) {
setTermBuffer(prototype.termBuffer(), 0, prototype.termLength());
copyBuffer(prototype.buffer(), 0, prototype.length());
positionIncrement = prototype.positionIncrement;
flags = prototype.flags;
startOffset = prototype.startOffset;
@ -553,7 +554,7 @@ public class Token extends TermAttributeImpl
* @param newTerm
*/
public void reinit(Token prototype, String newTerm) {
setTermBuffer(newTerm);
setEmpty().append(newTerm);
positionIncrement = prototype.positionIncrement;
flags = prototype.flags;
startOffset = prototype.startOffset;
@ -570,7 +571,7 @@ public class Token extends TermAttributeImpl
* @param length
*/
public void reinit(Token prototype, char[] newTermBuffer, int offset, int length) {
setTermBuffer(newTermBuffer, offset, length);
copyBuffer(newTermBuffer, offset, length);
positionIncrement = prototype.positionIncrement;
flags = prototype.flags;
startOffset = prototype.startOffset;

View File

@ -0,0 +1,71 @@
package org.apache.lucene.analysis.tokenattributes;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.util.Attribute;
/**
* The term text of a Token.
*/
public interface CharTermAttribute extends Attribute, CharSequence, Appendable {
/** Copies the contents of buffer, starting at offset for
* length characters, into the termBuffer array.
* @param buffer the buffer to copy
* @param offset the index in the buffer of the first character to copy
* @param length the number of characters to copy
*/
public void copyBuffer(char[] buffer, int offset, int length);
/** Returns the internal termBuffer character array which
* you can then directly alter. If the array is too
* small for your token, use {@link
* #resizeBuffer(int)} to increase it. After
* altering the buffer be sure to call {@link
* #setLength} to record the number of valid
* characters that were placed into the termBuffer. */
public char[] buffer();
/** Grows the termBuffer to at least size newSize, preserving the
* existing content.
* @param newSize minimum size of the new termBuffer
* @return newly created termBuffer with length >= newSize
*/
public char[] resizeBuffer(int newSize);
/** Set number of valid characters (length of the term) in
* the termBuffer array. Use this to truncate the termBuffer
* or to synchronize with external manipulation of the termBuffer.
* Note: to grow the size of the array,
* use {@link #resizeBuffer(int)} first.
* @param length the truncated length
*/
public CharTermAttribute setLength(int length);
/** Sets the length of the termBuffer to zero.
* Use this method before appending contents
* using the {@link Appendable} interface.
*/
public CharTermAttribute setEmpty();
// the following methods are redefined to get rid of IOException declaration:
public CharTermAttribute append(CharSequence csq);
public CharTermAttribute append(CharSequence csq, int start, int end);
public CharTermAttribute append(char c);
}
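A hedged sketch of a TokenFilter written against this interface; the filter itself is hypothetical and not part of this patch:

  import java.io.IOException;
  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

  public final class UpperCaseFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    public UpperCaseFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (!input.incrementToken()) {
        return false;
      }
      final char[] buffer = termAtt.buffer();
      final int length = termAtt.length();
      for (int i = 0; i < length; i++) {
        buffer[i] = Character.toUpperCase(buffer[i]);  // edits the term buffer in place
      }
      return true;
    }
  }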

View File

@ -0,0 +1,255 @@
package org.apache.lucene.analysis.tokenattributes;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.Serializable;
import java.nio.CharBuffer;
import org.apache.lucene.util.ArrayUtil;
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.RamUsageEstimator;
import org.apache.lucene.util.UnicodeUtil;
/**
* The term text of a Token.
*/
public class CharTermAttributeImpl extends AttributeImpl implements CharTermAttribute, TermAttribute, TermToBytesRefAttribute, Cloneable, Serializable {
private static int MIN_BUFFER_SIZE = 10;
private char[] termBuffer = new char[ArrayUtil.oversize(MIN_BUFFER_SIZE, RamUsageEstimator.NUM_BYTES_CHAR)];
private int termLength = 0;
@Deprecated
public String term() {
// don't delegate to toString() here!
return new String(termBuffer, 0, termLength);
}
public void copyBuffer(char[] buffer, int offset, int length) {
growTermBuffer(length);
System.arraycopy(buffer, offset, termBuffer, 0, length);
termLength = length;
}
@Deprecated
public void setTermBuffer(char[] buffer, int offset, int length) {
copyBuffer(buffer, offset, length);
}
@Deprecated
public void setTermBuffer(String buffer) {
int length = buffer.length();
growTermBuffer(length);
buffer.getChars(0, length, termBuffer, 0);
termLength = length;
}
@Deprecated
public void setTermBuffer(String buffer, int offset, int length) {
assert offset <= buffer.length();
assert offset + length <= buffer.length();
growTermBuffer(length);
buffer.getChars(offset, offset + length, termBuffer, 0);
termLength = length;
}
public char[] buffer() {
return termBuffer;
}
@Deprecated
public char[] termBuffer() {
return termBuffer;
}
public char[] resizeBuffer(int newSize) {
if (termBuffer == null) {
// The buffer is always at least MIN_BUFFER_SIZE
termBuffer = new char[ArrayUtil.oversize(newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
} else {
if(termBuffer.length < newSize){
// Not big enough; create a new array with slight
// over allocation and preserve content
final char[] newCharBuffer = new char[ArrayUtil.oversize(newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
System.arraycopy(termBuffer, 0, newCharBuffer, 0, termBuffer.length);
termBuffer = newCharBuffer;
}
}
return termBuffer;
}
@Deprecated
public char[] resizeTermBuffer(int newSize) {
return resizeBuffer(newSize);
}
private void growTermBuffer(int newSize) {
if (termBuffer == null) {
// The buffer is always at least MIN_BUFFER_SIZE
termBuffer = new char[ArrayUtil.oversize(newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
} else {
if(termBuffer.length < newSize){
// Not big enough; create a new array with slight
// over allocation:
termBuffer = new char[ArrayUtil.oversize(newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
}
}
}
@Deprecated
public int termLength() {
return termLength;
}
public CharTermAttribute setLength(int length) {
if (length > termBuffer.length)
throw new IllegalArgumentException("length " + length + " exceeds the size of the termBuffer (" + termBuffer.length + ")");
termLength = length;
return this;
}
public CharTermAttribute setEmpty() {
termLength = 0;
return this;
}
@Deprecated
public void setTermLength(int length) {
setLength(length);
}
// *** TermToBytesRefAttribute interface ***
public int toBytesRef(BytesRef target) {
// TODO: Maybe require that bytes is already initialized? TermsHashPerField ensures this.
if (target.bytes == null) {
target.bytes = new byte[termLength * 4];
}
return UnicodeUtil.UTF16toUTF8WithHash(termBuffer, 0, termLength, target);
}
// *** CharSequence interface ***
public int length() {
return termLength;
}
public char charAt(int index) {
if (index >= termLength)
throw new IndexOutOfBoundsException();
return termBuffer[index];
}
public CharSequence subSequence(final int start, final int end) {
if (start > termLength || end > termLength)
throw new IndexOutOfBoundsException();
return new String(termBuffer, start, end - start);
}
// *** Appendable interface ***
public CharTermAttribute append(CharSequence csq) {
return append(csq, 0, csq.length());
}
public CharTermAttribute append(CharSequence csq, int start, int end) {
resizeBuffer(termLength + end - start);
if (csq instanceof String) {
((String) csq).getChars(start, end, termBuffer, termLength);
} else if (csq instanceof StringBuilder) {
((StringBuilder) csq).getChars(start, end, termBuffer, termLength);
} else if (csq instanceof StringBuffer) {
((StringBuffer) csq).getChars(start, end, termBuffer, termLength);
} else if (csq instanceof CharBuffer && ((CharBuffer) csq).hasArray()) {
final CharBuffer cb = (CharBuffer) csq;
System.arraycopy(cb.array(), cb.arrayOffset() + cb.position() + start, termBuffer, termLength, end - start);
} else {
while (start < end)
termBuffer[termLength++] = csq.charAt(start++);
// no fall-through here, as termLength is updated!
return this;
}
termLength += end - start;
return this;
}
public CharTermAttribute append(char c) {
resizeBuffer(termLength + 1)[termLength++] = c;
return this;
}
// *** AttributeImpl ***
@Override
public int hashCode() {
int code = termLength;
code = code * 31 + ArrayUtil.hashCode(termBuffer, 0, termLength);
return code;
}
@Override
public void clear() {
termLength = 0;
}
@Override
public Object clone() {
CharTermAttributeImpl t = (CharTermAttributeImpl)super.clone();
// Do a deep clone
if (termBuffer != null) {
t.termBuffer = termBuffer.clone();
}
return t;
}
@Override
public boolean equals(Object other) {
if (other == this) {
return true;
}
if (other instanceof CharTermAttributeImpl) {
final CharTermAttributeImpl o = ((CharTermAttributeImpl) other);
if (termLength != o.termLength)
return false;
for(int i=0;i<termLength;i++) {
if (termBuffer[i] != o.termBuffer[i]) {
return false;
}
}
return true;
}
return false;
}
@Override
public String toString() {
return new String(termBuffer, 0, termLength);
}
@Override
public void copyTo(AttributeImpl target) {
if (target instanceof CharTermAttribute) {
CharTermAttribute t = (CharTermAttribute) target;
t.copyBuffer(termBuffer, 0, termLength);
} else {
TermAttribute t = (TermAttribute) target;
t.setTermBuffer(termBuffer, 0, termLength);
}
}
}

View File

@ -21,7 +21,9 @@ import org.apache.lucene.util.Attribute;
/**
* The term text of a Token.
* @deprecated Use {@link CharTermAttribute} instead.
*/
@Deprecated
public interface TermAttribute extends Attribute {
/** Returns the Token's term text.
*

View File

@ -17,211 +17,11 @@ package org.apache.lucene.analysis.tokenattributes;
* limitations under the License.
*/
import java.io.Serializable;
import org.apache.lucene.util.ArrayUtil;
import org.apache.lucene.util.AttributeImpl;
import org.apache.lucene.util.RamUsageEstimator;
/**
* The term text of a Token.
* @deprecated This class is only available for AttributeSource
* to be able to load an old TermAttribute implementation class.
*/
public class TermAttributeImpl extends AttributeImpl implements TermAttribute, Cloneable, Serializable {
private static int MIN_BUFFER_SIZE = 10;
private char[] termBuffer;
private int termLength;
/** Returns the Token's term text.
*
* This method has a performance penalty
* because the text is stored internally in a char[]. If
* possible, use {@link #termBuffer()} and {@link
* #termLength()} directly instead. If you really need a
* String, use this method, which is nothing more than
* a convenience call to <b>new String(token.termBuffer(), 0, token.termLength())</b>
*/
public String term() {
initTermBuffer();
return new String(termBuffer, 0, termLength);
}
/** Copies the contents of buffer, starting at offset for
* length characters, into the termBuffer array.
* @param buffer the buffer to copy
* @param offset the index in the buffer of the first character to copy
* @param length the number of characters to copy
*/
public void setTermBuffer(char[] buffer, int offset, int length) {
growTermBuffer(length);
System.arraycopy(buffer, offset, termBuffer, 0, length);
termLength = length;
}
/** Copies the contents of buffer into the termBuffer array.
* @param buffer the buffer to copy
*/
public void setTermBuffer(String buffer) {
int length = buffer.length();
growTermBuffer(length);
buffer.getChars(0, length, termBuffer, 0);
termLength = length;
}
/** Copies the contents of buffer, starting at offset and continuing
* for length characters, into the termBuffer array.
* @param buffer the buffer to copy
* @param offset the index in the buffer of the first character to copy
* @param length the number of characters to copy
*/
public void setTermBuffer(String buffer, int offset, int length) {
assert offset <= buffer.length();
assert offset + length <= buffer.length();
growTermBuffer(length);
buffer.getChars(offset, offset + length, termBuffer, 0);
termLength = length;
}
/** Returns the internal termBuffer character array which
* you can then directly alter. If the array is too
* small for your token, use {@link
* #resizeTermBuffer(int)} to increase it. After
* altering the buffer be sure to call {@link
* #setTermLength} to record the number of valid
* characters that were placed into the termBuffer. */
public char[] termBuffer() {
initTermBuffer();
return termBuffer;
}
/** Grows the termBuffer to at least size newSize, preserving the
* existing content. Note: If the next operation is to change
* the contents of the term buffer use
* {@link #setTermBuffer(char[], int, int)},
* {@link #setTermBuffer(String)}, or
* {@link #setTermBuffer(String, int, int)}
* to optimally combine the resize with the setting of the termBuffer.
* @param newSize minimum size of the new termBuffer
* @return newly created termBuffer with length >= newSize
*/
public char[] resizeTermBuffer(int newSize) {
if (termBuffer == null) {
// The buffer is always at least MIN_BUFFER_SIZE
termBuffer = new char[ArrayUtil.oversize(newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
} else {
if(termBuffer.length < newSize){
// Not big enough; create a new array with slight
// over allocation and preserve content
final char[] newCharBuffer = new char[ArrayUtil.oversize(newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
System.arraycopy(termBuffer, 0, newCharBuffer, 0, termBuffer.length);
termBuffer = newCharBuffer;
}
}
return termBuffer;
}
/** Allocates a buffer char[] of at least newSize, without preserving the existing content.
* it's always used in places that set the content
* @param newSize minimum size of the buffer
*/
private void growTermBuffer(int newSize) {
if (termBuffer == null) {
// The buffer is always at least MIN_BUFFER_SIZE
termBuffer = new char[ArrayUtil.oversize(newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
} else {
if(termBuffer.length < newSize){
// Not big enough; create a new array with slight
// over allocation:
termBuffer = new char[ArrayUtil.oversize(newSize, RamUsageEstimator.NUM_BYTES_CHAR)];
}
}
}
private void initTermBuffer() {
if (termBuffer == null) {
termBuffer = new char[ArrayUtil.oversize(MIN_BUFFER_SIZE, RamUsageEstimator.NUM_BYTES_CHAR)];
termLength = 0;
}
}
/** Return number of valid characters (length of the term)
* in the termBuffer array. */
public int termLength() {
return termLength;
}
/** Set number of valid characters (length of the term) in
* the termBuffer array. Use this to truncate the termBuffer
* or to synchronize with external manipulation of the termBuffer.
* Note: to grow the size of the array,
* use {@link #resizeTermBuffer(int)} first.
* @param length the truncated length
*/
public void setTermLength(int length) {
initTermBuffer();
if (length > termBuffer.length)
throw new IllegalArgumentException("length " + length + " exceeds the size of the termBuffer (" + termBuffer.length + ")");
termLength = length;
}
@Override
public int hashCode() {
initTermBuffer();
int code = termLength;
code = code * 31 + ArrayUtil.hashCode(termBuffer, 0, termLength);
return code;
}
@Override
public void clear() {
termLength = 0;
}
@Override
public Object clone() {
TermAttributeImpl t = (TermAttributeImpl)super.clone();
// Do a deep clone
if (termBuffer != null) {
t.termBuffer = termBuffer.clone();
}
return t;
}
@Override
public boolean equals(Object other) {
if (other == this) {
return true;
}
if (other instanceof TermAttributeImpl) {
initTermBuffer();
TermAttributeImpl o = ((TermAttributeImpl) other);
o.initTermBuffer();
if (termLength != o.termLength)
return false;
for(int i=0;i<termLength;i++) {
if (termBuffer[i] != o.termBuffer[i]) {
return false;
}
}
return true;
}
return false;
}
@Override
public String toString() {
initTermBuffer();
return "term=" + new String(termBuffer, 0, termLength);
}
@Override
public void copyTo(AttributeImpl target) {
initTermBuffer();
TermAttribute t = (TermAttribute) target;
t.setTermBuffer(termBuffer, 0, termLength);
}
@Deprecated
public class TermAttributeImpl extends CharTermAttributeImpl {
}

View File

@ -0,0 +1,47 @@
package org.apache.lucene.analysis.tokenattributes;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.BytesRef;
/**
* This attribute is requested by TermsHashPerField to index the contents.
* This attribute has no real state; it should be implemented in addition to
* {@link CharTermAttribute}, to support indexing the term text as
* UTF-8 bytes.
* @lucene.experimental This is a very expert API; please use
* {@link CharTermAttributeImpl} and its implementation of this method
* for UTF-8 terms.
*/
public interface TermToBytesRefAttribute extends Attribute {
/** Copies the token's term text into the given {@link BytesRef}.
* @param termBytes destination to write the bytes to (UTF-8 for text terms).
* @return the hashcode as defined by {@link BytesRef#hashCode}:
* <pre>
* int hash = 0;
* for (int i = termBytes.offset; i &lt; termBytes.offset+termBytes.length; i++) {
* hash = 31*hash + termBytes.bytes[i];
* }
* </pre>
* Implement this for performance reasons, if your code can calculate
* the hash on-the-fly. If this is not the case, just return
* {@code termBytes.hashCode()}.
*/
public int toBytesRef(BytesRef termBytes);
}
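A minimal consumer-side sketch; it assumes the stream's term attribute implementation (for example CharTermAttributeImpl) is already registered, so the lookup returns that same instance rather than trying to instantiate a nonexistent Impl class:

  TermToBytesRefAttribute bytesAtt = stream.addAttribute(TermToBytesRefAttribute.class);
  BytesRef termBytes = new BytesRef(10);
  stream.reset();
  while (stream.incrementToken()) {
    int hash = bytesAtt.toBytesRef(termBytes);  // fills termBytes and returns its BytesRef-style hash
    // use 'hash' to bucket termBytes, e.g. in a terms hash ...
  }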

View File

@ -21,6 +21,8 @@ import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.DataFormatException;
import java.io.ByteArrayOutputStream;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.UnicodeUtil;
/** Simple utility class providing static methods to
@ -84,9 +86,9 @@ public class CompressionTools {
* compressionLevel (constants are defined in
* java.util.zip.Deflater). */
public static byte[] compressString(String value, int compressionLevel) {
UnicodeUtil.UTF8Result result = new UnicodeUtil.UTF8Result();
BytesRef result = new BytesRef(10);
UnicodeUtil.UTF16toUTF8(value, 0, value.length(), result);
return compress(result.result, 0, result.length, compressionLevel);
return compress(result.bytes, 0, result.length, compressionLevel);
}
/** Decompress the byte array previously returned by

View File

@ -26,6 +26,7 @@ import java.io.IOException;
* packages. This means the API is freely subject to
* change, and, the class could be removed entirely, in any
* Lucene release. Use directly at your own risk! */
@Deprecated
public abstract class AbstractAllTermDocs implements TermDocs {
protected int maxDoc;

View File

@ -0,0 +1,78 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.lucene.index;
import org.apache.lucene.util.Bits;
import java.io.IOException;
class AllDocsEnum extends DocsEnum {
protected final Bits skipDocs;
protected final int maxDoc;
protected final IndexReader reader;
protected int doc = -1;
protected AllDocsEnum(IndexReader reader, Bits skipDocs) {
this.skipDocs = skipDocs;
this.maxDoc = reader.maxDoc();
this.reader = reader;
}
@Override
public int freq() {
return 1;
}
@Override
public int docID() {
return doc;
}
@Override
public int nextDoc() throws IOException {
return advance(doc+1);
}
@Override
public int read() throws IOException {
final int[] docs = bulkResult.docs.ints;
final int[] freqs = bulkResult.freqs.ints;
int i = 0;
while (i < docs.length && doc < maxDoc) {
if (skipDocs == null || !skipDocs.get(doc)) {
docs[i] = doc;
freqs[i] = 1;
++i;
}
doc++;
}
return i;
}
@Override
public int advance(int target) throws IOException {
doc = target;
while (doc < maxDoc) {
if (skipDocs == null || !skipDocs.get(doc)) {
return doc;
}
doc++;
}
doc = NO_MORE_DOCS;
return doc;
}
}

View File

@ -19,6 +19,8 @@ package org.apache.lucene.index;
import org.apache.lucene.util.BitVector;
/** @deprecated Switch to AllDocsEnum */
@Deprecated
class AllTermDocs extends AbstractAllTermDocs {
protected BitVector deletedDocs;

View File

@ -34,11 +34,11 @@ package org.apache.lucene.index;
* hit a non-zero byte. */
import java.util.Arrays;
import org.apache.lucene.util.BytesRef;
import java.util.List;
import static org.apache.lucene.util.RamUsageEstimator.NUM_BYTES_OBJECT_REF;
import org.apache.lucene.util.ArrayUtil;
final class ByteBlockPool {
abstract static class Allocator {
@ -149,5 +149,23 @@ final class ByteBlockPool {
return newUpto+3;
}
// Fill in a BytesRef from term's length & bytes encoded in
// byte block
final BytesRef setBytesRef(BytesRef term, int textStart) {
final byte[] bytes = term.bytes = buffers[textStart >> DocumentsWriter.BYTE_BLOCK_SHIFT];
int pos = textStart & DocumentsWriter.BYTE_BLOCK_MASK;
if ((bytes[pos] & 0x80) == 0) {
// length is 1 byte
term.length = bytes[pos];
term.offset = pos+1;
} else {
// length is 2 bytes
term.length = (bytes[pos]&0x7f) + ((bytes[pos+1]&0xff)<<7);
term.offset = pos+2;
}
assert term.length >= 0;
return term;
}
}
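(Illustrative sketch, not part of this commit: setBytesRef above decodes a term length stored either as one byte, when the length is below 0x80, or as two bytes with the high bit of the first byte set. The standalone class below shows the matching encoding under that assumption; it is not the writer code used by the indexer, and its names are made up.)
final class TermLengthEncodeSketch {
  // Hypothetical encoder matching the layout decoded by setBytesRef above:
  // lengths below 0x80 take one byte; longer lengths set the high bit of the
  // first byte and put the remaining bits (length >>> 7) in the second byte.
  static int writeTermLength(byte[] block, int pos, int length) {
    if (length < 0x80) {
      block[pos] = (byte) length;
      return pos + 1;          // term bytes start right after the 1-byte length
    } else {
      block[pos] = (byte) (0x80 | (length & 0x7f));
      block[pos + 1] = (byte) (length >>> 7);
      return pos + 2;          // term bytes start after the 2-byte length
    }
  }
}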

View File

@ -17,16 +17,17 @@ package org.apache.lucene.index;
* limitations under the License.
*/
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import java.io.IOException;
import org.apache.lucene.store.DataInput;
import org.apache.lucene.store.DataOutput;
/* DataInput that knows how to read the byte slices written
* by Posting and PostingVector. We read the bytes in
* each slice until we hit the end of that slice at which
* point we read the forwarding address of the next slice
* and then jump to it.*/
final class ByteSliceReader extends IndexInput {
final class ByteSliceReader extends DataInput {
ByteBlockPool pool;
int bufferUpto;
byte[] buffer;
@ -75,7 +76,7 @@ final class ByteSliceReader extends IndexInput {
return buffer[upto++];
}
public long writeTo(IndexOutput out) throws IOException {
public long writeTo(DataOutput out) throws IOException {
long size = 0;
while(true) {
if (limit + bufferOffset == endIndex) {
@ -136,14 +137,4 @@ final class ByteSliceReader extends IndexInput {
}
}
}
@Override
public long getFilePointer() {throw new RuntimeException("not implemented");}
@Override
public long length() {throw new RuntimeException("not implemented");}
@Override
public void seek(long pos) {throw new RuntimeException("not implemented");}
@Override
public void close() {throw new RuntimeException("not implemented");}
}

View File

@ -1,5 +1,7 @@
package org.apache.lucene.index;
import org.apache.lucene.store.DataOutput;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
@ -24,7 +26,7 @@ package org.apache.lucene.index;
* posting list for many terms in RAM.
*/
final class ByteSliceWriter {
final class ByteSliceWriter extends DataOutput {
private byte[] slice;
private int upto;
@ -48,6 +50,7 @@ final class ByteSliceWriter {
}
/** Write byte into byte slice stream */
@Override
public void writeByte(byte b) {
assert slice != null;
if (slice[upto] != 0) {
@ -60,6 +63,7 @@ final class ByteSliceWriter {
assert upto != slice.length;
}
@Override
public void writeBytes(final byte[] b, int offset, final int len) {
final int offsetEnd = offset + len;
while(offset < offsetEnd) {
@ -78,12 +82,4 @@ final class ByteSliceWriter {
public int getAddress() {
return upto + (offset0 & DocumentsWriter.BYTE_BLOCK_NOT_MASK);
}
public void writeVInt(int i) {
while ((i & ~0x7F) != 0) {
writeByte((byte)((i & 0x7f) | 0x80));
i >>>= 7;
}
writeByte((byte) i);
}
}
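(Illustrative sketch, not part of this commit: the writeVInt removed above, now inherited from DataOutput, emits 7 payload bits per byte and uses the high bit as a continuation flag. A matching decoder, written only against that assumption and with hypothetical names, looks like this.)
final class VIntDecodeSketch {
  // Hypothetical decoder for the VInt layout written by writeVInt above:
  // each byte carries 7 value bits; a set high bit means another byte follows.
  static int readVInt(byte[] buf, int pos) {
    byte b = buf[pos++];
    int value = b & 0x7f;
    int shift = 7;
    while ((b & 0x80) != 0) {
      b = buf[pos++];
      value |= (b & 0x7f) << shift;
      shift += 7;
    }
    return value;
  }
}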

View File

@ -1,60 +0,0 @@
package org.apache.lucene.index;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import static org.apache.lucene.util.RamUsageEstimator.NUM_BYTES_OBJECT_REF;
import org.apache.lucene.util.ArrayUtil;
final class CharBlockPool {
public char[][] buffers = new char[10][];
int numBuffer;
int bufferUpto = -1; // Which buffer we are upto
public int charUpto = DocumentsWriter.CHAR_BLOCK_SIZE; // Where we are in head buffer
public char[] buffer; // Current head buffer
public int charOffset = -DocumentsWriter.CHAR_BLOCK_SIZE; // Current head offset
final private DocumentsWriter docWriter;
public CharBlockPool(DocumentsWriter docWriter) {
this.docWriter = docWriter;
}
public void reset() {
docWriter.recycleCharBlocks(buffers, 1+bufferUpto);
bufferUpto = -1;
charUpto = DocumentsWriter.CHAR_BLOCK_SIZE;
charOffset = -DocumentsWriter.CHAR_BLOCK_SIZE;
}
public void nextBuffer() {
if (1+bufferUpto == buffers.length) {
char[][] newBuffers = new char[ArrayUtil.oversize(buffers.length+1,
NUM_BYTES_OBJECT_REF)][];
System.arraycopy(buffers, 0, newBuffers, 0, buffers.length);
buffers = newBuffers;
}
buffer = buffers[1+bufferUpto] = docWriter.getCharBlock();
bufferUpto++;
charUpto = 0;
charOffset += DocumentsWriter.CHAR_BLOCK_SIZE;
}
}

View File

@ -22,6 +22,9 @@ import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.document.AbstractField; // for javadocs
import org.apache.lucene.document.Document;
import org.apache.lucene.index.codecs.CodecProvider;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;
import java.text.NumberFormat;
import java.io.PrintStream;
@ -122,6 +125,9 @@ public class CheckIndex {
/** Name of the segment. */
public String name;
/** Name of codec used to read this segment. */
public String codec;
/** Document count (does not take deletions into account). */
public int docCount;
@ -263,26 +269,6 @@ public class CheckIndex {
infoStream.println(msg);
}
private static class MySegmentTermDocs extends SegmentTermDocs {
int delCount;
MySegmentTermDocs(SegmentReader p) {
super(p);
}
@Override
public void seek(Term term) throws IOException {
super.seek(term);
delCount = 0;
}
@Override
protected void skippingDoc() throws IOException {
delCount++;
}
}
/** Returns a {@link Status} instance detailing
* the state of the index.
*
@ -296,6 +282,10 @@ public class CheckIndex {
return checkIndex(null);
}
protected Status checkIndex(List<String> onlySegments) throws IOException {
return checkIndex(onlySegments, CodecProvider.getDefault());
}
/** Returns a {@link Status} instance detailing
* the state of the index.
*
@ -308,13 +298,13 @@ public class CheckIndex {
* <p><b>WARNING</b>: make sure
* you only call this when the index is not opened by any
* writer. */
public Status checkIndex(List<String> onlySegments) throws IOException {
protected Status checkIndex(List<String> onlySegments, CodecProvider codecs) throws IOException {
NumberFormat nf = NumberFormat.getInstance();
SegmentInfos sis = new SegmentInfos();
Status result = new Status();
result.dir = dir;
try {
sis.read(dir);
sis.read(dir, codecs);
} catch (Throwable t) {
msg("ERROR: could not read any segments file in directory");
result.missingSegments = true;
@ -371,6 +361,8 @@ public class CheckIndex {
sFormat = "FORMAT_USER_DATA [Lucene 2.9]";
else if (format == SegmentInfos.FORMAT_DIAGNOSTICS)
sFormat = "FORMAT_DIAGNOSTICS [Lucene 2.9]";
else if (format == SegmentInfos.FORMAT_FLEX_POSTINGS)
sFormat = "FORMAT_FLEX_POSTINGS [Lucene 3.1]";
else if (format < SegmentInfos.CURRENT_FORMAT) {
sFormat = "int=" + format + " [newer version of Lucene than this tool]";
skip = true;
@ -429,6 +421,9 @@ public class CheckIndex {
SegmentReader reader = null;
try {
final String codec = info.getCodec().name;
msg(" codec=" + codec);
segInfoStat.codec = codec;
msg(" compound=" + info.getUseCompoundFile());
segInfoStat.compound = info.getUseCompoundFile();
msg(" hasProx=" + info.getHasProx());
@ -452,6 +447,7 @@ public class CheckIndex {
msg(" docStoreIsCompoundFile=" + info.getDocStoreIsCompoundFile());
segInfoStat.docStoreCompoundFile = info.getDocStoreIsCompoundFile();
}
final String delFileName = info.getDelFileName();
if (delFileName == null){
msg(" no deletions");
@ -503,7 +499,7 @@ public class CheckIndex {
segInfoStat.fieldNormStatus = testFieldNorms(fieldNames, reader);
// Test the Term Index
segInfoStat.termIndexStatus = testTermIndex(info, reader);
segInfoStat.termIndexStatus = testTermIndex(reader);
// Test Stored Fields
segInfoStat.storedFieldStatus = testStoredFields(info, reader, nf);
@ -586,69 +582,129 @@ public class CheckIndex {
/**
* Test the term index.
*/
private Status.TermIndexStatus testTermIndex(SegmentInfo info, SegmentReader reader) {
private Status.TermIndexStatus testTermIndex(SegmentReader reader) {
final Status.TermIndexStatus status = new Status.TermIndexStatus();
final int maxDoc = reader.maxDoc();
final Bits delDocs = reader.getDeletedDocs();
try {
if (infoStream != null) {
infoStream.print(" test: terms, freq, prox...");
}
final TermEnum termEnum = reader.terms();
final TermPositions termPositions = reader.termPositions();
final Fields fields = reader.fields();
if (fields == null) {
msg("OK [no fields/terms]");
return status;
}
// Used only to count up # deleted docs for this term
final MySegmentTermDocs myTermDocs = new MySegmentTermDocs(reader);
final FieldsEnum fieldsEnum = fields.iterator();
while(true) {
final String field = fieldsEnum.next();
if (field == null) {
break;
}
final int maxDoc = reader.maxDoc();
final TermsEnum terms = fieldsEnum.terms();
while (termEnum.next()) {
status.termCount++;
final Term term = termEnum.term();
final int docFreq = termEnum.docFreq();
termPositions.seek(term);
int lastDoc = -1;
int freq0 = 0;
status.totFreq += docFreq;
while (termPositions.next()) {
freq0++;
final int doc = termPositions.doc();
final int freq = termPositions.freq();
if (doc <= lastDoc)
throw new RuntimeException("term " + term + ": doc " + doc + " <= lastDoc " + lastDoc);
if (doc >= maxDoc)
throw new RuntimeException("term " + term + ": doc " + doc + " >= maxDoc " + maxDoc);
DocsEnum docs = null;
DocsAndPositionsEnum postings = null;
lastDoc = doc;
if (freq <= 0)
throw new RuntimeException("term " + term + ": doc " + doc + ": freq " + freq + " is out of bounds");
boolean hasOrd = true;
final long termCountStart = status.termCount;
int lastPos = -1;
status.totPos += freq;
for(int j=0;j<freq;j++) {
final int pos = termPositions.nextPosition();
if (pos < -1)
throw new RuntimeException("term " + term + ": doc " + doc + ": pos " + pos + " is out of bounds");
if (pos < lastPos)
throw new RuntimeException("term " + term + ": doc " + doc + ": pos " + pos + " < lastPos " + lastPos);
lastPos = pos;
while(true) {
final BytesRef term = terms.next();
if (term == null) {
break;
}
}
// Now count how many deleted docs occurred in
// this term:
final int delCount;
if (reader.hasDeletions()) {
myTermDocs.seek(term);
while(myTermDocs.next()) { }
delCount = myTermDocs.delCount;
} else {
delCount = 0;
}
final int docFreq = terms.docFreq();
status.totFreq += docFreq;
if (freq0 + delCount != docFreq) {
throw new RuntimeException("term " + term + " docFreq=" +
docFreq + " != num docs seen " + freq0 + " + num docs deleted " + delCount);
docs = terms.docs(delDocs, docs);
postings = terms.docsAndPositions(delDocs, postings);
if (hasOrd) {
long ord = -1;
try {
ord = terms.ord();
} catch (UnsupportedOperationException uoe) {
hasOrd = false;
}
if (hasOrd) {
final long ordExpected = status.termCount - termCountStart;
if (ord != ordExpected) {
throw new RuntimeException("ord mismatch: TermsEnum has ord=" + ord + " vs actual=" + ordExpected);
}
}
}
status.termCount++;
final DocsEnum docs2;
if (postings != null) {
docs2 = postings;
} else {
docs2 = docs;
}
int lastDoc = -1;
while(true) {
final int doc = docs2.nextDoc();
if (doc == DocsEnum.NO_MORE_DOCS) {
break;
}
final int freq = docs2.freq();
status.totPos += freq;
if (doc <= lastDoc) {
throw new RuntimeException("term " + term + ": doc " + doc + " <= lastDoc " + lastDoc);
}
if (doc >= maxDoc) {
throw new RuntimeException("term " + term + ": doc " + doc + " >= maxDoc " + maxDoc);
}
lastDoc = doc;
if (freq <= 0) {
throw new RuntimeException("term " + term + ": doc " + doc + ": freq " + freq + " is out of bounds");
}
int lastPos = -1;
if (postings != null) {
for(int j=0;j<freq;j++) {
final int pos = postings.nextPosition();
if (pos < -1) {
throw new RuntimeException("term " + term + ": doc " + doc + ": pos " + pos + " is out of bounds");
}
if (pos < lastPos) {
throw new RuntimeException("term " + term + ": doc " + doc + ": pos " + pos + " < lastPos " + lastPos);
}
lastPos = pos;
if (postings.getPayloadLength() != 0) {
postings.getPayload();
}
}
}
}
// Now count how many deleted docs occurred in
// this term:
if (reader.hasDeletions()) {
final DocsEnum docsNoDel = terms.docs(null, docs);
int count = 0;
while(docsNoDel.nextDoc() != DocsEnum.NO_MORE_DOCS) {
count++;
}
if (count != docFreq) {
throw new RuntimeException("term " + term + " docFreq=" + docFreq + " != tot docs w/o deletions " + count);
}
}
}
}
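(Aside, not part of this commit: from client code the checker is normally driven through the public no-argument checkIndex(), which per the diff above now delegates to CodecProvider.getDefault(). A rough usage sketch follows; the Status.clean field and the FSDirectory/CheckIndex calls are assumptions about the surrounding 3.x API, not shown in this hunk.)
import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
public class CheckIndexSketch {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File(args[0]));
    CheckIndex checker = new CheckIndex(dir);
    CheckIndex.Status status = checker.checkIndex();  // uses CodecProvider.getDefault()
    System.out.println("index clean? " + status.clean);
    dir.close();
  }
}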

View File

@ -31,8 +31,9 @@ import java.io.IOException;
* Class for accessing a compound stream.
* This class implements a directory, but is limited to only read operations.
* Directory methods that would normally modify data throw an exception.
* @lucene.experimental
*/
class CompoundFileReader extends Directory {
public class CompoundFileReader extends Directory {
private int readBufferSize;

View File

@ -25,7 +25,7 @@ import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
@ -35,6 +35,11 @@ import org.apache.lucene.search.Similarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.Lock;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.index.codecs.CodecProvider;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.ReaderUtil;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.search.FieldCache; // not great (circular); used only to purge FieldCache entry on close
/**
@ -44,12 +49,13 @@ class DirectoryReader extends IndexReader implements Cloneable {
protected Directory directory;
protected boolean readOnly;
protected CodecProvider codecs;
IndexWriter writer;
private IndexDeletionPolicy deletionPolicy;
private Lock writeLock;
private SegmentInfos segmentInfos;
private SegmentInfos segmentInfosStart;
private boolean stale;
private final int termInfosIndexDivisor;
@ -58,34 +64,57 @@ class DirectoryReader extends IndexReader implements Cloneable {
private SegmentReader[] subReaders;
private int[] starts; // 1st docno for each segment
private final Map<SegmentReader,ReaderUtil.Slice> subReaderToSlice = new HashMap<SegmentReader,ReaderUtil.Slice>();
private Map<String,byte[]> normsCache = new HashMap<String,byte[]>();
private int maxDoc = 0;
private int numDocs = -1;
private boolean hasDeletions = false;
// static IndexReader open(final Directory directory, final IndexDeletionPolicy deletionPolicy, final IndexCommit commit, final boolean readOnly,
// final int termInfosIndexDivisor) throws CorruptIndexException, IOException {
// return open(directory, deletionPolicy, commit, readOnly, termInfosIndexDivisor, null);
// }
static IndexReader open(final Directory directory, final IndexDeletionPolicy deletionPolicy, final IndexCommit commit, final boolean readOnly,
final int termInfosIndexDivisor) throws CorruptIndexException, IOException {
final int termInfosIndexDivisor, CodecProvider codecs) throws CorruptIndexException, IOException {
final CodecProvider codecs2;
if (codecs == null) {
codecs2 = CodecProvider.getDefault();
} else {
codecs2 = codecs;
}
return (IndexReader) new SegmentInfos.FindSegmentsFile(directory) {
@Override
protected Object doBody(String segmentFileName) throws CorruptIndexException, IOException {
SegmentInfos infos = new SegmentInfos();
infos.read(directory, segmentFileName);
infos.read(directory, segmentFileName, codecs2);
if (readOnly)
return new ReadOnlyDirectoryReader(directory, infos, deletionPolicy, termInfosIndexDivisor);
return new ReadOnlyDirectoryReader(directory, infos, deletionPolicy, termInfosIndexDivisor, codecs2);
else
return new DirectoryReader(directory, infos, deletionPolicy, false, termInfosIndexDivisor);
return new DirectoryReader(directory, infos, deletionPolicy, false, termInfosIndexDivisor, codecs2);
}
}.run(commit);
}
/** Construct reading the named set of readers. */
DirectoryReader(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, boolean readOnly, int termInfosIndexDivisor) throws IOException {
// DirectoryReader(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, boolean readOnly, int termInfosIndexDivisor) throws IOException {
// this(directory, sis, deletionPolicy, readOnly, termInfosIndexDivisor, null);
// }
/** Construct reading the named set of readers. */
DirectoryReader(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, boolean readOnly, int termInfosIndexDivisor, CodecProvider codecs) throws IOException {
this.directory = directory;
this.readOnly = readOnly;
this.segmentInfos = sis;
this.deletionPolicy = deletionPolicy;
this.termInfosIndexDivisor = termInfosIndexDivisor;
if (codecs == null) {
this.codecs = CodecProvider.getDefault();
} else {
this.codecs = codecs;
}
// To reduce the chance of hitting FileNotFound
// (and having to retry), we open segments in
// reverse because IndexWriter merges & deletes
@ -115,12 +144,16 @@ class DirectoryReader extends IndexReader implements Cloneable {
}
// Used by near real-time search
DirectoryReader(IndexWriter writer, SegmentInfos infos, int termInfosIndexDivisor) throws IOException {
DirectoryReader(IndexWriter writer, SegmentInfos infos, int termInfosIndexDivisor, CodecProvider codecs) throws IOException {
this.directory = writer.getDirectory();
this.readOnly = true;
segmentInfos = infos;
segmentInfosStart = (SegmentInfos) infos.clone();
this.termInfosIndexDivisor = termInfosIndexDivisor;
if (codecs == null) {
this.codecs = CodecProvider.getDefault();
} else {
this.codecs = codecs;
}
// IndexWriter synchronizes externally before calling
// us, which ensures infos will not change; so there's
@ -166,11 +199,17 @@ class DirectoryReader extends IndexReader implements Cloneable {
/** This constructor is only used for {@link #reopen()} */
DirectoryReader(Directory directory, SegmentInfos infos, SegmentReader[] oldReaders, int[] oldStarts,
Map<String,byte[]> oldNormsCache, boolean readOnly, boolean doClone, int termInfosIndexDivisor) throws IOException {
Map<String,byte[]> oldNormsCache, boolean readOnly, boolean doClone, int termInfosIndexDivisor, CodecProvider codecs) throws IOException {
this.directory = directory;
this.readOnly = readOnly;
this.segmentInfos = infos;
this.termInfosIndexDivisor = termInfosIndexDivisor;
if (codecs == null) {
this.codecs = CodecProvider.getDefault();
} else {
this.codecs = codecs;
}
// we put the old SegmentReaders in a map, that allows us
// to lookup a reader using its segment name
@ -296,24 +335,44 @@ class DirectoryReader extends IndexReader implements Cloneable {
buffer.append(' ');
}
buffer.append(subReaders[i]);
buffer.append(' ');
}
buffer.append(')');
return buffer.toString();
}
private void initialize(SegmentReader[] subReaders) {
private void initialize(SegmentReader[] subReaders) throws IOException {
this.subReaders = subReaders;
starts = new int[subReaders.length + 1]; // build starts array
final List<Fields> subFields = new ArrayList<Fields>();
final List<ReaderUtil.Slice> fieldSlices = new ArrayList<ReaderUtil.Slice>();
for (int i = 0; i < subReaders.length; i++) {
starts[i] = maxDoc;
maxDoc += subReaders[i].maxDoc(); // compute maxDocs
if (subReaders[i].hasDeletions())
if (subReaders[i].hasDeletions()) {
hasDeletions = true;
}
final ReaderUtil.Slice slice = new ReaderUtil.Slice(starts[i], subReaders[i].maxDoc(), i);
subReaderToSlice.put(subReaders[i], slice);
final Fields f = subReaders[i].fields();
if (f != null) {
subFields.add(f);
fieldSlices.add(slice);
}
}
starts[subReaders.length] = maxDoc;
}
@Override
public Bits getDeletedDocs() {
throw new UnsupportedOperationException("please use MultiFields.getDeletedDocs if you really need a top level Bits deletedDocs (NOTE that it's usually better to work per segment instead)");
}
@Override
public final synchronized Object clone() {
try {
@ -435,7 +494,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
@Override
protected Object doBody(String segmentFileName) throws CorruptIndexException, IOException {
SegmentInfos infos = new SegmentInfos();
infos.read(directory, segmentFileName);
infos.read(directory, segmentFileName, codecs);
return doReopen(infos, false, openReadOnly);
}
}.run(commit);
@ -444,9 +503,9 @@ class DirectoryReader extends IndexReader implements Cloneable {
private synchronized DirectoryReader doReopen(SegmentInfos infos, boolean doClone, boolean openReadOnly) throws CorruptIndexException, IOException {
DirectoryReader reader;
if (openReadOnly) {
reader = new ReadOnlyDirectoryReader(directory, infos, subReaders, starts, normsCache, doClone, termInfosIndexDivisor);
reader = new ReadOnlyDirectoryReader(directory, infos, subReaders, starts, normsCache, doClone, termInfosIndexDivisor, null);
} else {
reader = new DirectoryReader(directory, infos, subReaders, starts, normsCache, false, doClone, termInfosIndexDivisor);
reader = new DirectoryReader(directory, infos, subReaders, starts, normsCache, false, doClone, termInfosIndexDivisor, null);
}
return reader;
}
@ -640,7 +699,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
// Optimize single segment case:
return subReaders[0].terms();
} else {
return new MultiTermEnum(this, subReaders, starts, null);
return new MultiTermEnum(this, subReaders, starts, null);
}
}
@ -664,6 +723,16 @@ class DirectoryReader extends IndexReader implements Cloneable {
return total;
}
@Override
public int docFreq(String field, BytesRef term) throws IOException {
ensureOpen();
int total = 0; // sum freqs in segments
for (int i = 0; i < subReaders.length; i++) {
total += subReaders[i].docFreq(field, term);
}
return total;
}
@Override
public TermDocs termDocs() throws IOException {
ensureOpen();
@ -686,6 +755,11 @@ class DirectoryReader extends IndexReader implements Cloneable {
}
}
@Override
public Fields fields() throws IOException {
throw new UnsupportedOperationException("please use MultiFields.getFields if you really need a top level Fields (NOTE that it's usually better to work per segment instead)");
}
@Override
public TermPositions termPositions() throws IOException {
ensureOpen();
@ -731,7 +805,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
// we have to check whether index has changed since this reader was opened.
// if so, this reader is no longer valid for deletion
if (SegmentInfos.readCurrentVersion(directory) > segmentInfos.getVersion()) {
if (SegmentInfos.readCurrentVersion(directory, codecs) > segmentInfos.getVersion()) {
stale = true;
this.writeLock.release();
this.writeLock = null;
@ -751,13 +825,18 @@ class DirectoryReader extends IndexReader implements Cloneable {
*/
@Override
protected void doCommit(Map<String,String> commitUserData) throws IOException {
// poll subreaders for changes
for (int i = 0; !hasChanges && i < subReaders.length; i++) {
hasChanges |= subReaders[i].hasChanges;
}
if (hasChanges) {
segmentInfos.setUserData(commitUserData);
// Default deleter (for backwards compatibility) is
// KeepOnlyLastCommitDeleter:
IndexFileDeleter deleter = new IndexFileDeleter(directory,
deletionPolicy == null ? new KeepOnlyLastCommitDeletionPolicy() : deletionPolicy,
segmentInfos, null, null);
segmentInfos, null, null, codecs);
// Checkpoint the state we are about to change, in
// case we have to roll back:
@ -827,21 +906,31 @@ class DirectoryReader extends IndexReader implements Cloneable {
}
}
@Override
public long getUniqueTermCount() throws IOException {
throw new UnsupportedOperationException("");
}
@Override
public Map<String,String> getCommitUserData() {
ensureOpen();
return segmentInfos.getUserData();
}
/**
* Check whether this IndexReader is still using the current (i.e., most recently committed) version of the index. If
* a writer has committed any changes to the index since this reader was opened, this will return <code>false</code>,
* in which case you must open a new IndexReader in order
* to see the changes. Use {@link IndexWriter#commit} to
* commit changes to the index.
*
* @throws CorruptIndexException if the index is corrupt
* @throws IOException if there is a low-level IO error
*/
@Override
public boolean isCurrent() throws CorruptIndexException, IOException {
ensureOpen();
if (writer == null || writer.isClosed()) {
// we loaded SegmentInfos from the directory
return SegmentInfos.readCurrentVersion(directory) == segmentInfos.getVersion();
} else {
return writer.nrtIsCurrent(segmentInfosStart);
}
return SegmentInfos.readCurrentVersion(directory, codecs) == segmentInfos.getVersion();
}
@Override
@ -893,6 +982,11 @@ class DirectoryReader extends IndexReader implements Cloneable {
return subReaders;
}
@Override
public int getSubReaderDocBase(IndexReader subReader) {
return subReaderToSlice.get(subReader).start;
}
/** Returns the directory this index resides in. */
@Override
public Directory directory() {
@ -919,12 +1013,17 @@ class DirectoryReader extends IndexReader implements Cloneable {
/** @see org.apache.lucene.index.IndexReader#listCommits */
public static Collection<IndexCommit> listCommits(Directory dir) throws IOException {
return listCommits(dir, CodecProvider.getDefault());
}
/** @see org.apache.lucene.index.IndexReader#listCommits */
public static Collection<IndexCommit> listCommits(Directory dir, CodecProvider codecs) throws IOException {
final String[] files = dir.listAll();
Collection<IndexCommit> commits = new ArrayList<IndexCommit>();
SegmentInfos latest = new SegmentInfos();
latest.read(dir);
latest.read(dir, codecs);
final long currentGen = latest.getGeneration();
commits.add(new ReaderCommit(latest, dir));
@ -941,7 +1040,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
try {
// IOException allowed to throw there, in case
// segments_N is corrupt
sis.read(dir, fileName);
sis.read(dir, fileName, codecs);
} catch (FileNotFoundException fnfe) {
// LUCENE-948: on NFS (and maybe others), if
// you have writers switching back and forth
@ -1021,29 +1120,33 @@ class DirectoryReader extends IndexReader implements Cloneable {
}
}
// @deprecated This is pre-flex API
// Exposes the pre-flex API by doing on-the-fly merging of the
// per-segment pre-flex APIs
static class MultiTermEnum extends TermEnum {
IndexReader topReader; // used for matching TermEnum to TermDocs
private SegmentMergeQueue queue;
private LegacySegmentMergeQueue queue;
private Term term;
private int docFreq;
final SegmentMergeInfo[] matchingSegments; // null terminated array of matching segments
final LegacySegmentMergeInfo[] matchingSegments; // null terminated array of matching segments
public MultiTermEnum(IndexReader topReader, IndexReader[] readers, int[] starts, Term t)
throws IOException {
this.topReader = topReader;
queue = new SegmentMergeQueue(readers.length);
matchingSegments = new SegmentMergeInfo[readers.length+1];
queue = new LegacySegmentMergeQueue(readers.length);
matchingSegments = new LegacySegmentMergeInfo[readers.length+1];
for (int i = 0; i < readers.length; i++) {
IndexReader reader = readers[i];
TermEnum termEnum;
if (t != null) {
termEnum = reader.terms(t);
} else
} else {
termEnum = reader.terms();
}
SegmentMergeInfo smi = new SegmentMergeInfo(starts[i], termEnum, reader);
LegacySegmentMergeInfo smi = new LegacySegmentMergeInfo(starts[i], termEnum, reader);
smi.ord = i;
if (t == null ? smi.next() : termEnum.term() != null)
queue.add(smi); // initialize queue
@ -1059,7 +1162,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
@Override
public boolean next() throws IOException {
for (int i=0; i<matchingSegments.length; i++) {
SegmentMergeInfo smi = matchingSegments[i];
LegacySegmentMergeInfo smi = matchingSegments[i];
if (smi==null) break;
if (smi.next())
queue.add(smi);
@ -1070,7 +1173,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
int numMatchingSegments = 0;
matchingSegments[0] = null;
SegmentMergeInfo top = queue.top();
LegacySegmentMergeInfo top = queue.top();
if (top == null) {
term = null;
@ -1107,6 +1210,9 @@ class DirectoryReader extends IndexReader implements Cloneable {
}
}
// @deprecated This is pre-flex API
// Exposes the pre-flex API by doing on-the-fly merging of the
// per-segment pre-flex APIs
static class MultiTermDocs implements TermDocs {
IndexReader topReader; // used for matching TermEnum to TermDocs
protected IndexReader[] readers;
@ -1121,7 +1227,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
private MultiTermEnum tenum; // the term enum used for seeking... can be null
int matchingSegmentPos; // position into the matching segments from tenum
SegmentMergeInfo smi; // current segment mere info... can be null
LegacySegmentMergeInfo smi; // current segment mere info... can be null
public MultiTermDocs(IndexReader topReader, IndexReader[] r, int[] s) {
this.topReader = topReader;
@ -1217,7 +1323,7 @@ class DirectoryReader extends IndexReader implements Cloneable {
return true;
} else if (pointer < readers.length) {
if (tenum != null) {
SegmentMergeInfo smi = tenum.matchingSegments[matchingSegmentPos++];
LegacySegmentMergeInfo smi = tenum.matchingSegments[matchingSegmentPos++];
if (smi==null) {
pointer = readers.length;
return false;
@ -1258,6 +1364,9 @@ class DirectoryReader extends IndexReader implements Cloneable {
}
}
// @deprecated This is pre-flex API
// Exposes the pre-flex API by doing on-the-fly merging of the
// per-segment pre-flex APIs
static class MultiTermPositions extends MultiTermDocs implements TermPositions {
public MultiTermPositions(IndexReader topReader, IndexReader[] r, int[] s) {
super(topReader,r,s);

View File

@ -67,7 +67,7 @@ final class DocFieldProcessor extends DocConsumer {
// consumer can alter the FieldInfo* if necessary. EG,
// FreqProxTermsWriter does this with
// FieldInfo.storePayload.
final String fileName = state.segmentFileName(IndexFileNames.FIELD_INFOS_EXTENSION);
final String fileName = IndexFileNames.segmentFileName(state.segmentName, IndexFileNames.FIELD_INFOS_EXTENSION);
fieldInfos.write(state.directory, fileName);
state.flushedFiles.add(fileName);
}

View File

@ -113,8 +113,9 @@ final class DocFieldProcessorPerThread extends DocConsumerPerThread {
else
lastPerField.next = perField.next;
if (state.docWriter.infoStream != null)
state.docWriter.infoStream.println(" purge field=" + perField.fieldInfo.name);
if (state.infoStream != null) {
state.infoStream.println(" purge field=" + perField.fieldInfo.name);
}
totalFieldCount--;
@ -247,7 +248,7 @@ final class DocFieldProcessorPerThread extends DocConsumerPerThread {
fields[i].consumer.processFields(fields[i].fields, fields[i].fieldCount);
if (docState.maxTermPrefix != null && docState.infoStream != null) {
docState.infoStream.println("WARNING: document contains at least one immense term (longer than the max length " + DocumentsWriter.MAX_TERM_LENGTH + "), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '" + docState.maxTermPrefix + "...'");
docState.infoStream.println("WARNING: document contains at least one immense term (whose UTF8 encoding is longer than the max length " + DocumentsWriter.MAX_TERM_LENGTH_UTF8 + "), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '" + docState.maxTermPrefix + "...'");
docState.maxTermPrefix = null;
}

View File

@ -116,8 +116,9 @@ final class DocInverterPerField extends DocFieldConsumerPerField {
reader = readerValue;
else {
String stringValue = field.stringValue();
if (stringValue == null)
if (stringValue == null) {
throw new IllegalArgumentException("field must have either TokenStream, String or Reader value");
}
perThread.stringReader.init(stringValue);
reader = perThread.stringReader;
}

View File

@ -21,7 +21,7 @@ import java.io.IOException;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
/** This is a DocFieldConsumer that inverts each field,
* separately, from a Document, and accepts a
@ -34,16 +34,16 @@ final class DocInverterPerThread extends DocFieldConsumerPerThread {
final SingleTokenAttributeSource singleToken = new SingleTokenAttributeSource();
static class SingleTokenAttributeSource extends AttributeSource {
final TermAttribute termAttribute;
final CharTermAttribute termAttribute;
final OffsetAttribute offsetAttribute;
private SingleTokenAttributeSource() {
termAttribute = addAttribute(TermAttribute.class);
termAttribute = addAttribute(CharTermAttribute.class);
offsetAttribute = addAttribute(OffsetAttribute.class);
}
public void reinit(String stringValue, int startOffset, int endOffset) {
termAttribute.setTermBuffer(stringValue);
termAttribute.setEmpty().append(stringValue);
offsetAttribute.setOffset(startOffset, endOffset);
}
}
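(Illustrative sketch, not part of this commit: the change above replaces TermAttribute.setTermBuffer(String) with the CharTermAttribute pattern of clearing and then appending. From any AttributeSource the migration looks roughly like this; the class and variable names are hypothetical.)
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.AttributeSource;
final class CharTermAttributeSketch {
  // Hypothetical migration example: set a term's text via CharTermAttribute.
  static void setTerm(AttributeSource source, String text) {
    CharTermAttribute termAtt = source.addAttribute(CharTermAttribute.class);
    // old: termAtt.setTermBuffer(text);   (TermAttribute)
    termAtt.setEmpty().append(text);       // new: clear, then append the chars
  }
}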

View File

@ -0,0 +1,44 @@
package org.apache.lucene.index;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
import org.apache.lucene.util.BytesRef;
/** Also iterates through positions. */
public abstract class DocsAndPositionsEnum extends DocsEnum {
/** Returns the next position. You should only call this
 * up to {@link DocsEnum#freq()} times; otherwise
* the behavior is not defined. */
public abstract int nextPosition() throws IOException;
/** Returns length of payload at current position */
public abstract int getPayloadLength();
/** Returns the payload at this position, or null if no
* payload was indexed. */
public abstract BytesRef getPayload() throws IOException;
public abstract boolean hasPayload();
public final int read(int[] docs, int[] freqs) {
throw new UnsupportedOperationException();
}
}
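(Illustrative sketch, not part of this commit: a consumer of this API advances documents via nextDoc(), then pulls at most freq() positions per document, checking hasPayload() before reading a payload. The class and variable names are hypothetical.)
import java.io.IOException;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.util.BytesRef;
final class PositionsSketch {
  static void consume(DocsAndPositionsEnum postings) throws IOException {
    int doc;
    while ((doc = postings.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
      final int freq = postings.freq();
      for (int i = 0; i < freq; i++) {     // at most freq() calls, per the contract above
        int pos = postings.nextPosition();
        if (postings.hasPayload()) {
          BytesRef payload = postings.getPayload();
          // ... use doc, pos and payload here ...
        }
      }
    }
  }
}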

View File

@ -0,0 +1,93 @@
package org.apache.lucene.index;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.IntsRef;
/** Iterates through the documents and term freqs.
* NOTE: you must first call {@link #nextDoc}.
*
* @lucene.experimental */
public abstract class DocsEnum extends DocIdSetIterator {
private AttributeSource atts = null;
/** Returns term frequency in the current document. Do
* not call this before {@link #nextDoc} is first called,
* nor after {@link #nextDoc} returns NO_MORE_DOCS. */
public abstract int freq();
/** Returns the related attributes. */
public AttributeSource attributes() {
if (atts == null) atts = new AttributeSource();
return atts;
}
// TODO: maybe add bulk read only docIDs (for eventual
// match-only scoring)
public static class BulkReadResult {
public final IntsRef docs = new IntsRef();
public final IntsRef freqs = new IntsRef();
}
protected BulkReadResult bulkResult;
protected final void initBulkResult() {
if (bulkResult == null) {
bulkResult = new BulkReadResult();
bulkResult.docs.ints = new int[64];
bulkResult.freqs.ints = new int[64];
}
}
public BulkReadResult getBulkResult() {
initBulkResult();
return bulkResult;
}
/** Bulk read (docs and freqs). After this is called,
* {@link #docID()} and {@link #freq} are undefined. This
* returns the count read, or 0 if the end is reached.
* The IntsRef for docs and freqs will not have their
* length set.
*
* <p>NOTE: the default impl simply delegates to {@link
* #nextDoc}, but subclasses may do this more
* efficiently. */
public int read() throws IOException {
int count = 0;
final int[] docs = bulkResult.docs.ints;
final int[] freqs = bulkResult.freqs.ints;
while(count < docs.length) {
final int doc = nextDoc();
if (doc != NO_MORE_DOCS) {
docs[count] = doc;
freqs[count] = freq();
count++;
} else {
break;
}
}
return count;
}
}
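(Illustrative sketch, not part of this commit: the bulk API above is driven by first obtaining the shared BulkReadResult via getBulkResult(), then calling read() until it returns 0. The class and variable names are hypothetical.)
import java.io.IOException;
import org.apache.lucene.index.DocsEnum;
final class BulkReadSketch {
  static long sumFreqs(DocsEnum docsEnum) throws IOException {
    final DocsEnum.BulkReadResult bulk = docsEnum.getBulkResult(); // allocates the int buffers
    long total = 0;
    int count;
    while ((count = docsEnum.read()) != 0) {  // 0 means the enum is exhausted
      for (int i = 0; i < count; i++) {
        total += bulk.freqs.ints[i];          // matching docIDs are in bulk.docs.ints[0..count)
      }
    }
    return total;
  }
}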

View File

@ -30,6 +30,7 @@ import java.util.Map.Entry;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.codecs.Codec;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Scorer;
@ -41,6 +42,7 @@ import org.apache.lucene.store.RAMFile;
import org.apache.lucene.util.ArrayUtil;
import org.apache.lucene.util.Constants;
import org.apache.lucene.util.ThreadInterruptedException;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.RamUsageEstimator;
/**
@ -282,7 +284,6 @@ final class DocumentsWriter {
// If we've allocated 5% over our RAM budget, we then
// free down to 95%
private long freeTrigger = (long) (IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB*1024*1024*1.05);
private long freeLevel = (long) (IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB*1024*1024*0.95);
// Flush @ this number of docs. If ramBufferSize is
@ -353,7 +354,6 @@ final class DocumentsWriter {
ramBufferSize = (long) (mb*1024*1024);
waitQueuePauseBytes = (long) (ramBufferSize*0.1);
waitQueueResumeBytes = (long) (ramBufferSize*0.05);
freeTrigger = (long) (1.05 * ramBufferSize);
freeLevel = (long) (0.95 * ramBufferSize);
}
}
@ -550,7 +550,6 @@ final class DocumentsWriter {
flushPending = false;
for(int i=0;i<threadStates.length;i++)
threadStates[i].doAfterFlush();
numBytesUsed = 0;
}
// Returns true if an abort is in progress
@ -590,7 +589,14 @@ final class DocumentsWriter {
synchronized private void initFlushState(boolean onlyDocStore) {
initSegmentName(onlyDocStore);
flushState = new SegmentWriteState(this, directory, segment, docStoreSegment, numDocsInRAM, numDocsInStore, writer.getConfig().getTermIndexInterval());
flushState = new SegmentWriteState(infoStream, directory, segment, docFieldProcessor.fieldInfos,
docStoreSegment, numDocsInRAM, numDocsInStore, writer.getConfig().getTermIndexInterval(),
writer.codecs);
}
/** Returns the codec used to flush the last segment */
Codec getCodec() {
return flushState.codec;
}
/** Flush all pending docs to a new segment */
@ -628,9 +634,10 @@ final class DocumentsWriter {
consumer.flush(threads, flushState);
if (infoStream != null) {
SegmentInfo si = new SegmentInfo(flushState.segmentName, flushState.numDocs, directory);
SegmentInfo si = new SegmentInfo(flushState.segmentName, flushState.numDocs, directory, flushState.codec);
si.setHasProx(hasProx());
final long newSegmentSize = si.sizeInBytes();
String message = " oldRAMSize=" + numBytesUsed +
String message = " ramUsed=" + nf.format(((double) numBytesUsed)/1024./1024.) + " MB" +
" newFlushedSize=" + newSegmentSize +
" docs/MB=" + nf.format(numDocsInRAM/(newSegmentSize/1024./1024.)) +
" new/old=" + nf.format(100.0*newSegmentSize/numBytesUsed) + "%";
@ -659,8 +666,9 @@ final class DocumentsWriter {
CompoundFileWriter cfsWriter = new CompoundFileWriter(directory,
IndexFileNames.segmentFileName(segment, IndexFileNames.COMPOUND_FILE_EXTENSION));
for (final String flushedFile : flushState.flushedFiles)
cfsWriter.addFile(flushedFile);
for(String fileName : flushState.flushedFiles) {
cfsWriter.addFile(fileName);
}
// Perform the merge
cfsWriter.close();
@ -1032,28 +1040,58 @@ final class DocumentsWriter {
// Delete by term
if (deletesFlushed.terms.size() > 0) {
TermDocs docs = reader.termDocs();
try {
Fields fields = reader.fields();
TermsEnum termsEnum = null;
String currentField = null;
BytesRef termRef = new BytesRef();
DocsEnum docs = null;
for (Entry<Term, BufferedDeletes.Num> entry: deletesFlushed.terms.entrySet()) {
Term term = entry.getKey();
// LUCENE-2086: we should be iterating a TreeMap,
// here, so terms better be in order:
// Since we visit terms sorted, we gain performance
// by re-using the same TermsEnum and seeking only
// forwards
if (term.field() != currentField) {
assert currentField == null || currentField.compareTo(term.field()) < 0;
currentField = term.field();
Terms terms = fields.terms(currentField);
if (terms != null) {
termsEnum = terms.iterator();
} else {
termsEnum = null;
}
}
if (termsEnum == null) {
continue;
}
assert checkDeleteTerm(term);
docs.seek(term);
int limit = entry.getValue().getNum();
while (docs.next()) {
int docID = docs.doc();
if (docIDStart+docID >= limit)
break;
reader.deleteDocument(docID);
any = true;
termRef.copy(term.text());
if (termsEnum.seek(termRef, false) == TermsEnum.SeekStatus.FOUND) {
DocsEnum docsEnum = termsEnum.docs(reader.getDeletedDocs(), docs);
if (docsEnum != null) {
docs = docsEnum;
int limit = entry.getValue().getNum();
while (true) {
final int docID = docs.nextDoc();
if (docID == DocsEnum.NO_MORE_DOCS || docIDStart+docID >= limit) {
break;
}
reader.deleteDocument(docID);
any = true;
}
}
}
}
} finally {
docs.close();
//docs.close();
}
}
// Delete by docID
for (Integer docIdInt : deletesFlushed.docIDs) {
int docID = docIdInt.intValue();
@ -1118,7 +1156,7 @@ final class DocumentsWriter {
}
synchronized boolean doBalanceRAM() {
return ramBufferSize != IndexWriterConfig.DISABLE_AUTO_FLUSH && !bufferIsFull && (numBytesUsed+deletesInRAM.bytesUsed+deletesFlushed.bytesUsed >= ramBufferSize || numBytesAlloc >= freeTrigger);
return ramBufferSize != IndexWriterConfig.DISABLE_AUTO_FLUSH && !bufferIsFull && (numBytesUsed+deletesInRAM.bytesUsed+deletesFlushed.bytesUsed >= ramBufferSize);
}
/** Does the synchronized work to finish/flush the
@ -1201,7 +1239,6 @@ final class DocumentsWriter {
return numBytesUsed + deletesInRAM.bytesUsed + deletesFlushed.bytesUsed;
}
long numBytesAlloc;
long numBytesUsed;
NumberFormat nf = NumberFormat.getInstance();
@ -1243,6 +1280,8 @@ final class DocumentsWriter {
final static int BYTE_BLOCK_MASK = BYTE_BLOCK_SIZE - 1;
final static int BYTE_BLOCK_NOT_MASK = ~BYTE_BLOCK_MASK;
final static int MAX_TERM_LENGTH_UTF8 = BYTE_BLOCK_SIZE-2;
private class ByteBlockAllocator extends ByteBlockPool.Allocator {
final int blockSize;
@ -1259,19 +1298,16 @@ final class DocumentsWriter {
final int size = freeByteBlocks.size();
final byte[] b;
if (0 == size) {
b = new byte[blockSize];
// Always record a block allocated, even if
// trackAllocations is false. This is necessary
// because this block will be shared between
// things that don't track allocations (term
// vectors) and things that do (freq/prox
// postings).
numBytesAlloc += blockSize;
b = new byte[blockSize];
numBytesUsed += blockSize;
} else
b = freeByteBlocks.remove(size-1);
if (trackAllocations)
numBytesUsed += blockSize;
assert numBytesUsed <= numBytesAlloc;
return b;
}
}
@ -1291,7 +1327,7 @@ final class DocumentsWriter {
final int size = blocks.size();
for(int i=0;i<size;i++)
freeByteBlocks.add(blocks.get(i));
}
}
}
}
@ -1308,30 +1344,21 @@ final class DocumentsWriter {
final int size = freeIntBlocks.size();
final int[] b;
if (0 == size) {
b = new int[INT_BLOCK_SIZE];
// Always record a block allocated, even if
// trackAllocations is false. This is necessary
// because this block will be shared between
// things that don't track allocations (term
// vectors) and things that do (freq/prox
// postings).
numBytesAlloc += INT_BLOCK_SIZE*INT_NUM_BYTE;
b = new int[INT_BLOCK_SIZE];
numBytesUsed += INT_BLOCK_SIZE*INT_NUM_BYTE;
} else
b = freeIntBlocks.remove(size-1);
if (trackAllocations)
numBytesUsed += INT_BLOCK_SIZE*INT_NUM_BYTE;
assert numBytesUsed <= numBytesAlloc;
return b;
}
synchronized void bytesAllocated(long numBytes) {
numBytesAlloc += numBytes;
assert numBytesUsed <= numBytesAlloc;
}
synchronized void bytesUsed(long numBytes) {
numBytesUsed += numBytes;
assert numBytesUsed <= numBytesAlloc;
}
/* Return int[]s to the pool */
@ -1346,78 +1373,34 @@ final class DocumentsWriter {
final ByteBlockAllocator perDocAllocator = new ByteBlockAllocator(PER_DOC_BLOCK_SIZE);
/* Initial chunk size of the shared char[] blocks used to
store term text */
final static int CHAR_BLOCK_SHIFT = 14;
final static int CHAR_BLOCK_SIZE = 1 << CHAR_BLOCK_SHIFT;
final static int CHAR_BLOCK_MASK = CHAR_BLOCK_SIZE - 1;
final static int MAX_TERM_LENGTH = CHAR_BLOCK_SIZE-1;
private ArrayList<char[]> freeCharBlocks = new ArrayList<char[]>();
/* Allocate another char[] from the shared pool */
synchronized char[] getCharBlock() {
final int size = freeCharBlocks.size();
final char[] c;
if (0 == size) {
numBytesAlloc += CHAR_BLOCK_SIZE * CHAR_NUM_BYTE;
c = new char[CHAR_BLOCK_SIZE];
} else
c = freeCharBlocks.remove(size-1);
// We always track allocations of char blocks, for now,
// because nothing that skips allocation tracking
// (currently only term vectors) uses its own char
// blocks.
numBytesUsed += CHAR_BLOCK_SIZE * CHAR_NUM_BYTE;
assert numBytesUsed <= numBytesAlloc;
return c;
}
/* Return char[]s to the pool */
synchronized void recycleCharBlocks(char[][] blocks, int numBlocks) {
for(int i=0;i<numBlocks;i++)
freeCharBlocks.add(blocks[i]);
}
String toMB(long v) {
return nf.format(v/1024./1024.);
}
/* We have four pools of RAM: Postings, byte blocks
* (holds freq/prox posting data), char blocks (holds
* characters in the term) and per-doc buffers (stored fields/term vectors).
* Different docs require varying amount of storage from
* these four classes.
*
* For example, docs with many unique single-occurrence
* short terms will use up the Postings RAM and hardly any
* of the other two. Whereas docs with very large terms
* will use a lot of char blocks RAM and relatively less of
* the other two. This method just frees allocations from
* the pools once we are over-budget, which balances the
* pools to match the current docs. */
/* We have three pools of RAM: Postings, byte blocks
* (holds freq/prox posting data) and per-doc buffers
* (stored fields/term vectors). Different docs require
* varying amount of storage from these classes. For
* example, docs with many unique single-occurrence short
* terms will use up the Postings RAM and hardly any of
* the other two. Whereas docs with very large terms will
* use a lot of byte blocks RAM. This method just frees
* allocations from the pools once we are over-budget,
* which balances the pools to match the current docs. */
void balanceRAM() {
// We flush when we've used our target usage
final long flushTrigger = ramBufferSize;
final long deletesRAMUsed = deletesInRAM.bytesUsed+deletesFlushed.bytesUsed;
if (numBytesAlloc+deletesRAMUsed > freeTrigger) {
if (numBytesUsed+deletesRAMUsed > ramBufferSize) {
if (infoStream != null)
message(" RAM: now balance allocations: usedMB=" + toMB(numBytesUsed) +
" vs trigger=" + toMB(flushTrigger) +
" allocMB=" + toMB(numBytesAlloc) +
" vs trigger=" + toMB(ramBufferSize) +
" deletesMB=" + toMB(deletesRAMUsed) +
" vs trigger=" + toMB(freeTrigger) +
" byteBlockFree=" + toMB(byteBlockAllocator.freeByteBlocks.size()*BYTE_BLOCK_SIZE) +
" perDocFree=" + toMB(perDocAllocator.freeByteBlocks.size()*PER_DOC_BLOCK_SIZE) +
" charBlockFree=" + toMB(freeCharBlocks.size()*CHAR_BLOCK_SIZE*CHAR_NUM_BYTE));
" perDocFree=" + toMB(perDocAllocator.freeByteBlocks.size()*PER_DOC_BLOCK_SIZE));
final long startBytesAlloc = numBytesAlloc + deletesRAMUsed;
final long startBytesUsed = numBytesUsed + deletesRAMUsed;
int iter = 0;
@ -1427,46 +1410,38 @@ final class DocumentsWriter {
boolean any = true;
while(numBytesAlloc+deletesRAMUsed > freeLevel) {
while(numBytesUsed+deletesRAMUsed > freeLevel) {
synchronized(this) {
if (0 == perDocAllocator.freeByteBlocks.size()
&& 0 == byteBlockAllocator.freeByteBlocks.size()
&& 0 == freeCharBlocks.size()
&& 0 == freeIntBlocks.size()
&& !any) {
if (0 == perDocAllocator.freeByteBlocks.size() &&
0 == byteBlockAllocator.freeByteBlocks.size() &&
0 == freeIntBlocks.size() && !any) {
// Nothing else to free -- must flush now.
bufferIsFull = numBytesUsed+deletesRAMUsed > flushTrigger;
bufferIsFull = numBytesUsed+deletesRAMUsed > ramBufferSize;
if (infoStream != null) {
if (numBytesUsed > flushTrigger)
if (numBytesUsed+deletesRAMUsed > ramBufferSize)
message(" nothing to free; now set bufferIsFull");
else
message(" nothing to free");
}
assert numBytesUsed <= numBytesAlloc;
break;
}
if ((0 == iter % 5) && byteBlockAllocator.freeByteBlocks.size() > 0) {
if ((0 == iter % 4) && byteBlockAllocator.freeByteBlocks.size() > 0) {
byteBlockAllocator.freeByteBlocks.remove(byteBlockAllocator.freeByteBlocks.size()-1);
numBytesAlloc -= BYTE_BLOCK_SIZE;
numBytesUsed -= BYTE_BLOCK_SIZE;
}
if ((1 == iter % 5) && freeCharBlocks.size() > 0) {
freeCharBlocks.remove(freeCharBlocks.size()-1);
numBytesAlloc -= CHAR_BLOCK_SIZE * CHAR_NUM_BYTE;
}
if ((2 == iter % 5) && freeIntBlocks.size() > 0) {
if ((1 == iter % 4) && freeIntBlocks.size() > 0) {
freeIntBlocks.remove(freeIntBlocks.size()-1);
numBytesAlloc -= INT_BLOCK_SIZE * INT_NUM_BYTE;
numBytesUsed -= INT_BLOCK_SIZE * INT_NUM_BYTE;
}
if ((3 == iter % 5) && perDocAllocator.freeByteBlocks.size() > 0) {
if ((2 == iter % 4) && perDocAllocator.freeByteBlocks.size() > 0) {
// Remove upwards of 32 blocks (each block is 1K)
for (int i = 0; i < 32; ++i) {
perDocAllocator.freeByteBlocks.remove(perDocAllocator.freeByteBlocks.size() - 1);
numBytesAlloc -= PER_DOC_BLOCK_SIZE;
numBytesUsed -= PER_DOC_BLOCK_SIZE;
if (perDocAllocator.freeByteBlocks.size() == 0) {
break;
}
@ -1474,7 +1449,7 @@ final class DocumentsWriter {
}
}
if ((4 == iter % 5) && any)
if ((3 == iter % 4) && any)
// Ask consumer to free any recycled state
any = consumer.freeRAM();
@ -1482,26 +1457,7 @@ final class DocumentsWriter {
}
if (infoStream != null)
message(" after free: freedMB=" + nf.format((startBytesAlloc-numBytesAlloc-deletesRAMUsed)/1024./1024.) + " usedMB=" + nf.format((numBytesUsed+deletesRAMUsed)/1024./1024.) + " allocMB=" + nf.format(numBytesAlloc/1024./1024.));
} else {
// If we have not crossed the 100% mark, but have
// crossed the 95% mark of RAM we are actually
// using, go ahead and flush. This prevents
// over-allocating and then freeing, with every
// flush.
synchronized(this) {
if (numBytesUsed+deletesRAMUsed > flushTrigger) {
if (infoStream != null)
message(" RAM: now flush @ usedMB=" + nf.format(numBytesUsed/1024./1024.) +
" allocMB=" + nf.format(numBytesAlloc/1024./1024.) +
" deletesMB=" + nf.format(deletesRAMUsed/1024./1024.) +
" triggerMB=" + nf.format(flushTrigger/1024./1024.));
bufferIsFull = true;
}
}
message(" after free: freedMB=" + nf.format((startBytesUsed-numBytesUsed-deletesRAMUsed)/1024./1024.) + " usedMB=" + nf.format((numBytesUsed+deletesRAMUsed)/1024./1024.));
}
}
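(Illustrative sketch, not part of this commit: the loop above frees one kind of pooled buffer per iteration, cycling through the byte-block, int-block and per-doc pools via iter % 4 until usage drops below freeLevel. A stripped-down standalone version of that round-robin schedule follows; all names are made up and the consumer-freeing branch is only indicated by a comment.)
import java.util.List;
final class BalanceSketch {
  // Toy round-robin freeing schedule mirroring balanceRAM above (names are made up).
  static void balance(List<byte[]> bytePool, List<int[]> intPool, List<byte[]> perDocPool,
                      long[] bytesUsed, long freeLevel) {
    int iter = 0;
    while (bytesUsed[0] > freeLevel) {
      if (bytePool.isEmpty() && intPool.isEmpty() && perDocPool.isEmpty()) {
        break;                               // nothing left to free; the caller must flush
      }
      if (iter % 4 == 0 && !bytePool.isEmpty()) {
        bytesUsed[0] -= bytePool.remove(bytePool.size() - 1).length;
      } else if (iter % 4 == 1 && !intPool.isEmpty()) {
        bytesUsed[0] -= 4L * intPool.remove(intPool.size() - 1).length;
      } else if (iter % 4 == 2 && !perDocPool.isEmpty()) {
        bytesUsed[0] -= perDocPool.remove(perDocPool.size() - 1).length;
      }
      // iter % 4 == 3 would ask the consumer to free any recycled state, as above
      iter++;
    }
  }
}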

View File

@ -17,20 +17,21 @@ package org.apache.lucene.index;
* limitations under the License.
*/
final class FieldInfo {
String name;
boolean isIndexed;
int number;
/** @lucene.experimental */
public final class FieldInfo {
public String name;
public boolean isIndexed;
public int number;
// true if term vector for this field should be stored
boolean storeTermVector;
boolean storeOffsetWithTermVector;
boolean storePositionWithTermVector;
boolean omitNorms; // omit norms associated with indexed fields
boolean omitTermFreqAndPositions;
public boolean omitNorms; // omit norms associated with indexed fields
public boolean omitTermFreqAndPositions;
boolean storePayloads; // whether this field stores payloads together with term positions
public boolean storePayloads; // whether this field stores payloads together with term positions
FieldInfo(String na, boolean tk, int nu, boolean storeTermVector,
boolean storePositionWithTermVector, boolean storeOffsetWithTermVector,

View File

@ -32,8 +32,9 @@ import java.util.*;
* of this class are thread-safe for multiple readers, but only one thread can
* be adding documents at a time, with no other reader or writer threads
* accessing this object.
* @lucene.experimental
*/
final class FieldInfos {
public final class FieldInfos {
// Used internally (ie not written to *.fnm files) for pre-2.9 files
public static final int FORMAT_PRE = -1;
@ -120,7 +121,7 @@ final class FieldInfos {
}
/** Returns true if any fields do not omitTermFreqAndPositions */
boolean hasProx() {
public boolean hasProx() {
final int numFields = byNumber.size();
for(int i=0;i<numFields;i++) {
final FieldInfo fi = fieldInfo(i);

View File

@@ -0,0 +1,36 @@
package org.apache.lucene.index;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
/** Flex API for access to fields and terms
* @lucene.experimental */
public abstract class Fields {
/** Returns an iterator that will step through all field
* names. This will not return null. */
public abstract FieldsEnum iterator() throws IOException;
/** Get the {@link Terms} for this field. This may return
* null if the field does not exist. */
public abstract Terms terms(String field) throws IOException;
public final static Fields[] EMPTY_ARRAY = new Fields[0];
}
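A hedged usage sketch for the contract documented above: terms() may return null for a missing field, while iterator() never does (the helper class and field-name parameter are illustrative, not part of this commit):
import java.io.IOException;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.Terms;
final class FieldsUsageSketch {
  // Returns true if the field exists in this Fields instance.
  static boolean hasField(Fields fields, String field) throws IOException {
    Terms terms = fields.terms(field); // may be null per the Javadoc above
    return terms != null;
  }
}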

View File

@@ -0,0 +1,74 @@
package org.apache.lucene.index;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
import org.apache.lucene.util.AttributeSource;
/** Enumerates indexed fields. You must first call {@link
* #next} before calling {@link #terms}.
*
* @lucene.experimental */
public abstract class FieldsEnum {
// TODO: maybe allow retrieving FieldInfo for current
// field, as optional method?
private AttributeSource atts = null;
/**
* Returns the related attributes.
*/
public AttributeSource attributes() {
if (atts == null) {
atts = new AttributeSource();
}
return atts;
}
/** Increments the enumeration to the next field. The
* returned field is always interned, so simple ==
* comparison is allowed. Returns null when there are no
* more fields.*/
public abstract String next() throws IOException;
/** Get {@link TermsEnum} for the current field. You
* should not call {@link #next} until you're done using
* this {@link TermsEnum}. After {@link #next} returns
* null this method should not be called. This method
* will not return null. */
public abstract TermsEnum terms() throws IOException;
public final static FieldsEnum[] EMPTY_ARRAY = new FieldsEnum[0];
/** Provides zero fields */
public final static FieldsEnum EMPTY = new FieldsEnum() {
@Override
public String next() {
return null;
}
@Override
public TermsEnum terms() {
throw new IllegalStateException("this method should never be called");
}
};
}
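A hedged sketch of the call order required above: next() must be called before terms(), and terms() must not be called once next() has returned null (what is done with each TermsEnum is left out here):
import java.io.IOException;
import org.apache.lucene.index.FieldsEnum;
import org.apache.lucene.index.TermsEnum;
final class FieldsEnumUsageSketch {
  // Visit every field, pulling the TermsEnum for each one before advancing.
  static int visitAllFields(FieldsEnum fieldsEnum) throws IOException {
    int fieldCount = 0;
    String field;
    while ((field = fieldsEnum.next()) != null) { // next() first, per the contract
      TermsEnum termsEnum = fieldsEnum.terms();   // non-null while a field is current
      // ... consume termsEnum fully before the next call to next() ...
      fieldCount++;
    }
    return fieldCount;
  }
}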

View File

@@ -20,7 +20,9 @@ package org.apache.lucene.index;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Bits;
import org.apache.lucene.search.FieldCache; // not great (circular); used only to purge FieldCache entry on close
import org.apache.lucene.util.BytesRef;
import java.io.IOException;
import java.util.Collection;
@@ -115,6 +117,11 @@ public class FilterIndexReader extends IndexReader {
return in.directory();
}
@Override
public Bits getDeletedDocs() throws IOException {
return in.getDeletedDocs();
}
@Override
public TermFreqVector[] getTermFreqVectors(int docNumber)
throws IOException {
@@ -217,6 +224,12 @@ public class FilterIndexReader extends IndexReader {
return in.docFreq(t);
}
@Override
public int docFreq(String field, BytesRef t) throws IOException {
ensureOpen();
return in.docFreq(field, t);
}
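A hedged example of the BytesRef-based docFreq overload added above (the reader, field name, and term text are placeholders; it is assumed here that BytesRef can be built directly from a String, performing the UTF-8 conversion):
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.util.BytesRef;
final class DocFreqSketch {
  // How many documents contain the given term in the given field?
  static int docFreqOf(IndexReader reader, String field, String termText) throws IOException {
    BytesRef term = new BytesRef(termText); // assumed String -> UTF-8 constructor
    return reader.docFreq(field, term);
  }
}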
@Override
public TermDocs termDocs() throws IOException {
ensureOpen();

View File

@@ -1,129 +0,0 @@
package org.apache.lucene.index;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/** Consumes doc & freq, writing them using the current
* index file format */
import java.io.IOException;
import org.apache.lucene.util.UnicodeUtil;
import org.apache.lucene.store.IndexOutput;
final class FormatPostingsDocsWriter extends FormatPostingsDocsConsumer {
final IndexOutput out;
final FormatPostingsTermsWriter parent;
final FormatPostingsPositionsWriter posWriter;
final DefaultSkipListWriter skipListWriter;
final int skipInterval;
final int totalNumDocs;
boolean omitTermFreqAndPositions;
boolean storePayloads;
long freqStart;
FieldInfo fieldInfo;
FormatPostingsDocsWriter(SegmentWriteState state, FormatPostingsTermsWriter parent) throws IOException {
super();
this.parent = parent;
final String fileName = IndexFileNames.segmentFileName(parent.parent.segment, IndexFileNames.FREQ_EXTENSION);
state.flushedFiles.add(fileName);
out = parent.parent.dir.createOutput(fileName);
totalNumDocs = parent.parent.totalNumDocs;
// TODO: abstraction violation
skipInterval = parent.parent.termsOut.skipInterval;
skipListWriter = parent.parent.skipListWriter;
skipListWriter.setFreqOutput(out);
posWriter = new FormatPostingsPositionsWriter(state, this);
}
void setField(FieldInfo fieldInfo) {
this.fieldInfo = fieldInfo;
omitTermFreqAndPositions = fieldInfo.omitTermFreqAndPositions;
storePayloads = fieldInfo.storePayloads;
posWriter.setField(fieldInfo);
}
int lastDocID;
int df;
/** Adds a new doc in this term. If this returns null
* then we just skip consuming positions/payloads. */
@Override
FormatPostingsPositionsConsumer addDoc(int docID, int termDocFreq) throws IOException {
final int delta = docID - lastDocID;
if (docID < 0 || (df > 0 && delta <= 0))
throw new CorruptIndexException("docs out of order (" + docID + " <= " + lastDocID + " )");
if ((++df % skipInterval) == 0) {
// TODO: abstraction violation
skipListWriter.setSkipData(lastDocID, storePayloads, posWriter.lastPayloadLength);
skipListWriter.bufferSkip(df);
}
assert docID < totalNumDocs: "docID=" + docID + " totalNumDocs=" + totalNumDocs;
lastDocID = docID;
if (omitTermFreqAndPositions)
out.writeVInt(delta);
else if (1 == termDocFreq)
out.writeVInt((delta<<1) | 1);
else {
out.writeVInt(delta<<1);
out.writeVInt(termDocFreq);
}
return posWriter;
}
private final TermInfo termInfo = new TermInfo(); // minimize consing
final UnicodeUtil.UTF8Result utf8 = new UnicodeUtil.UTF8Result();
/** Called when we are done adding docs to this term */
@Override
void finish() throws IOException {
long skipPointer = skipListWriter.writeSkip(out);
// TODO: this is abstraction violation -- we should not
// peek up into parents terms encoding format
termInfo.set(df, parent.freqStart, parent.proxStart, (int) (skipPointer - parent.freqStart));
// TODO: we could do this incrementally
UnicodeUtil.UTF16toUTF8(parent.currentTerm, parent.currentTermStart, utf8);
if (df > 0) {
parent.termsOut.add(fieldInfo.number,
utf8.result,
utf8.length,
termInfo);
}
lastDocID = 0;
df = 0;
}
void close() throws IOException {
out.close();
posWriter.close();
}
}
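For reference, the freq-file encoding written by addDoc above boils down to a small amount of bit twiddling; a hedged, self-contained worked example (the helper is hypothetical and returns the VInt values that would be written):
final class FreqEncodingSketch {
  // The doc delta is shifted left by one; the low bit set means termDocFreq == 1,
  // otherwise the frequency follows as its own VInt.
  static int[] encodeDoc(int delta, int termDocFreq, boolean omitTermFreqAndPositions) {
    if (omitTermFreqAndPositions) {
      return new int[] { delta };                   // frequency not stored at all
    } else if (termDocFreq == 1) {
      return new int[] { (delta << 1) | 1 };        // e.g. delta=3, freq=1 -> 7
    } else {
      return new int[] { delta << 1, termDocFreq }; // e.g. delta=3, freq=5 -> 6, 5
    }
  }
}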

View File

@@ -1,75 +0,0 @@
package org.apache.lucene.index;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
import org.apache.lucene.store.Directory;
final class FormatPostingsFieldsWriter extends FormatPostingsFieldsConsumer {
final Directory dir;
final String segment;
final TermInfosWriter termsOut;
final FieldInfos fieldInfos;
final FormatPostingsTermsWriter termsWriter;
final DefaultSkipListWriter skipListWriter;
final int totalNumDocs;
public FormatPostingsFieldsWriter(SegmentWriteState state, FieldInfos fieldInfos) throws IOException {
super();
dir = state.directory;
segment = state.segmentName;
totalNumDocs = state.numDocs;
this.fieldInfos = fieldInfos;
termsOut = new TermInfosWriter(dir,
segment,
fieldInfos,
state.termIndexInterval);
// TODO: this is a nasty abstraction violation (that we
// peek down to find freqOut/proxOut) -- we need a
// better abstraction here whereby these child consumers
// can provide skip data or not
skipListWriter = new DefaultSkipListWriter(termsOut.skipInterval,
termsOut.maxSkipLevels,
totalNumDocs,
null,
null);
state.flushedFiles.add(state.segmentFileName(IndexFileNames.TERMS_EXTENSION));
state.flushedFiles.add(state.segmentFileName(IndexFileNames.TERMS_INDEX_EXTENSION));
termsWriter = new FormatPostingsTermsWriter(state, this);
}
/** Add a new field */
@Override
FormatPostingsTermsConsumer addField(FieldInfo field) {
termsWriter.setField(field);
return termsWriter;
}
/** Called when we are done adding everything. */
@Override
void finish() throws IOException {
termsOut.close();
termsWriter.close();
}
}

View File

@@ -1,89 +0,0 @@
package org.apache.lucene.index;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.store.IndexOutput;
import java.io.IOException;
final class FormatPostingsPositionsWriter extends FormatPostingsPositionsConsumer {
final FormatPostingsDocsWriter parent;
final IndexOutput out;
boolean omitTermFreqAndPositions;
boolean storePayloads;
int lastPayloadLength = -1;
FormatPostingsPositionsWriter(SegmentWriteState state, FormatPostingsDocsWriter parent) throws IOException {
this.parent = parent;
omitTermFreqAndPositions = parent.omitTermFreqAndPositions;
if (parent.parent.parent.fieldInfos.hasProx()) {
// At least one field does not omit TF, so create the
// prox file
final String fileName = IndexFileNames.segmentFileName(parent.parent.parent.segment, IndexFileNames.PROX_EXTENSION);
state.flushedFiles.add(fileName);
out = parent.parent.parent.dir.createOutput(fileName);
parent.skipListWriter.setProxOutput(out);
} else
// Every field omits TF so we will write no prox file
out = null;
}
int lastPosition;
/** Add a new position & payload */
@Override
void addPosition(int position, byte[] payload, int payloadOffset, int payloadLength) throws IOException {
assert !omitTermFreqAndPositions: "omitTermFreqAndPositions is true";
assert out != null;
final int delta = position - lastPosition;
lastPosition = position;
if (storePayloads) {
if (payloadLength != lastPayloadLength) {
lastPayloadLength = payloadLength;
out.writeVInt((delta<<1)|1);
out.writeVInt(payloadLength);
} else
out.writeVInt(delta << 1);
if (payloadLength > 0)
out.writeBytes(payload, payloadLength);
} else
out.writeVInt(delta);
}
void setField(FieldInfo fieldInfo) {
omitTermFreqAndPositions = fieldInfo.omitTermFreqAndPositions;
storePayloads = omitTermFreqAndPositions ? false : fieldInfo.storePayloads;
}
/** Called when we are done adding positions & payloads */
@Override
void finish() {
lastPosition = 0;
lastPayloadLength = -1;
}
void close() throws IOException {
if (out != null)
out.close();
}
}
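The prox-file encoding in addPosition above follows the same shift-and-flag idea: the position delta carries a low bit that says whether a payload length follows (written only when the length changed). A hedged worked example, ignoring the payload bytes themselves:
final class ProxEncodingSketch {
  // Returns the VInt values addPosition would write for one position.
  static int[] encodePosition(int delta, int payloadLength, int lastPayloadLength,
                              boolean storePayloads) {
    if (!storePayloads) {
      return new int[] { delta };                           // bare position delta
    } else if (payloadLength != lastPayloadLength) {
      return new int[] { (delta << 1) | 1, payloadLength }; // payload bytes follow
    } else {
      return new int[] { delta << 1 };                      // same payload length as before
    }
  }
}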

View File

@@ -1,47 +0,0 @@
package org.apache.lucene.index;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
import org.apache.lucene.util.ArrayUtil;
import org.apache.lucene.util.RamUsageEstimator;
/**
* @lucene.experimental
*/
abstract class FormatPostingsTermsConsumer {
/** Adds a new term in this field; term ends with U+FFFF
* char */
abstract FormatPostingsDocsConsumer addTerm(char[] text, int start) throws IOException;
char[] termBuffer;
FormatPostingsDocsConsumer addTerm(String text) throws IOException {
final int len = text.length();
if (termBuffer == null || termBuffer.length < 1+len)
termBuffer = new char[ArrayUtil.oversize(1+len, RamUsageEstimator.NUM_BYTES_CHAR)];
text.getChars(0, len, termBuffer, 0);
termBuffer[len] = 0xffff;
return addTerm(termBuffer, 0);
}
/** Called when we are done adding terms to this field */
abstract void finish() throws IOException;
}

View File

@@ -1,73 +0,0 @@
package org.apache.lucene.index;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
final class FormatPostingsTermsWriter extends FormatPostingsTermsConsumer {
final FormatPostingsFieldsWriter parent;
final FormatPostingsDocsWriter docsWriter;
final TermInfosWriter termsOut;
FieldInfo fieldInfo;
FormatPostingsTermsWriter(SegmentWriteState state, FormatPostingsFieldsWriter parent) throws IOException {
super();
this.parent = parent;
termsOut = parent.termsOut;
docsWriter = new FormatPostingsDocsWriter(state, this);
}
void setField(FieldInfo fieldInfo) {
this.fieldInfo = fieldInfo;
docsWriter.setField(fieldInfo);
}
char[] currentTerm;
int currentTermStart;
long freqStart;
long proxStart;
/** Adds a new term in this field */
@Override
FormatPostingsDocsConsumer addTerm(char[] text, int start) {
currentTerm = text;
currentTermStart = start;
// TODO: this is abstraction violation -- ideally this
// terms writer is not so "invasive", looking for file
// pointers in its child consumers.
freqStart = docsWriter.out.getFilePointer();
if (docsWriter.posWriter.out != null)
proxStart = docsWriter.posWriter.out.getFilePointer();
parent.skipListWriter.resetSkip();
return docsWriter;
}
/** Called when we are done adding terms to this field */
@Override
void finish() {
}
void close() throws IOException {
docsWriter.close();
}
}

View File

@@ -18,6 +18,8 @@ package org.apache.lucene.index;
*/
import java.io.IOException;
import java.util.Comparator;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.index.FreqProxTermsWriterPerField.FreqProxPostingsArray;
@@ -31,13 +33,12 @@ final class FreqProxFieldMergeState {
final FreqProxTermsWriterPerField field;
final int numPostings;
final CharBlockPool charPool;
private final ByteBlockPool bytePool;
final int[] termIDs;
final FreqProxPostingsArray postings;
int currentTermID;
char[] text;
int textOffset;
final BytesRef text = new BytesRef();
private int postingUpto = -1;
@@ -47,29 +48,31 @@ final class FreqProxFieldMergeState {
int docID;
int termFreq;
public FreqProxFieldMergeState(FreqProxTermsWriterPerField field) {
public FreqProxFieldMergeState(FreqProxTermsWriterPerField field, Comparator<BytesRef> termComp) {
this.field = field;
this.charPool = field.perThread.termsHashPerThread.charPool;
this.numPostings = field.termsHashPerField.numPostings;
this.termIDs = field.termsHashPerField.sortPostings();
this.bytePool = field.perThread.termsHashPerThread.bytePool;
this.termIDs = field.termsHashPerField.sortPostings(termComp);
this.postings = (FreqProxPostingsArray) field.termsHashPerField.postingsArray;
}
boolean nextTerm() throws IOException {
postingUpto++;
if (postingUpto == numPostings)
if (postingUpto == numPostings) {
return false;
}
currentTermID = termIDs[postingUpto];
docID = 0;
// Get BytesRef
final int textStart = postings.textStarts[currentTermID];
text = charPool.buffers[textStart >> DocumentsWriter.CHAR_BLOCK_SHIFT];
textOffset = textStart & DocumentsWriter.CHAR_BLOCK_MASK;
bytePool.setBytesRef(text, textStart);
field.termsHashPerField.initReader(freq, currentTermID, 0);
if (!field.fieldInfo.omitTermFreqAndPositions)
if (!field.fieldInfo.omitTermFreqAndPositions) {
field.termsHashPerField.initReader(prox, currentTermID, 1);
}
// Should always be true
boolean result = nextDoc();

View File

@@ -17,14 +17,19 @@ package org.apache.lucene.index;
* limitations under the License.
*/
import org.apache.lucene.util.UnicodeUtil;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Comparator;
import org.apache.lucene.index.codecs.PostingsConsumer;
import org.apache.lucene.index.codecs.FieldsConsumer;
import org.apache.lucene.index.codecs.TermsConsumer;
import org.apache.lucene.util.BytesRef;
final class FreqProxTermsWriter extends TermsHashConsumer {
@@ -33,27 +38,13 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
return new FreqProxTermsWriterPerThread(perThread);
}
private static int compareText(final char[] text1, int pos1, final char[] text2, int pos2) {
while(true) {
final char c1 = text1[pos1++];
final char c2 = text2[pos2++];
if (c1 != c2) {
if (0xffff == c2)
return 1;
else if (0xffff == c1)
return -1;
else
return c1-c2;
} else if (0xffff == c1)
return 0;
}
}
@Override
void closeDocStore(SegmentWriteState state) {}
@Override
void abort() {}
private int flushedDocCount;
// TODO: would be nice to factor out more of this, eg the
// FreqProxFieldMergeState, and code to visit all Fields
@@ -67,6 +58,8 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
// ThreadStates
List<FreqProxTermsWriterPerField> allFields = new ArrayList<FreqProxTermsWriterPerField>();
flushedDocCount = state.numDocs;
for (Map.Entry<TermsHashConsumerPerThread,Collection<TermsHashConsumerPerField>> entry : threadsAndFields.entrySet()) {
Collection<TermsHashConsumerPerField> fields = entry.getValue();
@@ -79,21 +72,23 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
}
}
// Sort by field name
Collections.sort(allFields);
final int numAllFields = allFields.size();
// TODO: allow Lucene user to customize this consumer:
final FormatPostingsFieldsConsumer consumer = new FormatPostingsFieldsWriter(state, fieldInfos);
// Sort by field name
Collections.sort(allFields);
// TODO: allow Lucene user to customize this codec:
final FieldsConsumer consumer = state.codec.fieldsConsumer(state);
/*
Current writer chain:
FormatPostingsFieldsConsumer
-> IMPL: FormatPostingsFieldsWriter
-> FormatPostingsTermsConsumer
-> IMPL: FormatPostingsTermsWriter
-> FormatPostingsDocConsumer
-> IMPL: FormatPostingsDocWriter
-> FormatPostingsPositionsConsumer
FieldsConsumer
-> IMPL: FormatPostingsTermsDictWriter
-> TermsConsumer
-> IMPL: FormatPostingsTermsDictWriter.TermsWriter
-> DocsConsumer
-> IMPL: FormatPostingsDocsWriter
-> PositionsConsumer
-> IMPL: FormatPostingsPositionsWriter
*/
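A hedged sketch of how that chain is driven end to end for a single field, term, and document; every call below appears in this diff, but the wrapper class, argument names, and the single-doc simplification are illustrative only:
import java.io.IOException;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.codecs.FieldsConsumer;
import org.apache.lucene.index.codecs.PostingsConsumer;
import org.apache.lucene.index.codecs.TermsConsumer;
import org.apache.lucene.util.BytesRef;
final class CodecChainSketch {
  // Write one term with one posting through the codec's consumer chain, then close it.
  static void writeSingleTerm(FieldsConsumer consumer, FieldInfo fieldInfo,
                              BytesRef term, int docID, int freq) throws IOException {
    TermsConsumer termsConsumer = consumer.addField(fieldInfo);
    PostingsConsumer postings = termsConsumer.startTerm(term);
    postings.startDoc(docID, freq);
    // positions/payloads would be delivered here via postings.addPosition(...)
    postings.finishDoc();
    termsConsumer.finishTerm(term, 1); // 1 = number of documents seen for this term
    termsConsumer.finish();
    consumer.close();
  }
}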
@@ -134,25 +129,29 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
FreqProxTermsWriterPerThread perThread = (FreqProxTermsWriterPerThread) entry.getKey();
perThread.termsHashPerThread.reset(true);
}
consumer.finish();
consumer.close();
}
private byte[] payloadBuffer;
BytesRef payload;
/* Walk through all unique text tokens (Posting
* instances) found in this field and serialize them
* into a single RAM segment. */
void appendPostings(FreqProxTermsWriterPerField[] fields,
FormatPostingsFieldsConsumer consumer)
FieldsConsumer consumer)
throws CorruptIndexException, IOException {
int numFields = fields.length;
final BytesRef text = new BytesRef();
final FreqProxFieldMergeState[] mergeStates = new FreqProxFieldMergeState[numFields];
final TermsConsumer termsConsumer = consumer.addField(fields[0].fieldInfo);
final Comparator<BytesRef> termComp = termsConsumer.getComparator();
for(int i=0;i<numFields;i++) {
FreqProxFieldMergeState fms = mergeStates[i] = new FreqProxFieldMergeState(fields[i]);
FreqProxFieldMergeState fms = mergeStates[i] = new FreqProxFieldMergeState(fields[i], termComp);
assert fms.field.fieldInfo == fields[0].fieldInfo;
@@ -161,45 +160,63 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
assert result;
}
final FormatPostingsTermsConsumer termsConsumer = consumer.addField(fields[0].fieldInfo);
FreqProxFieldMergeState[] termStates = new FreqProxFieldMergeState[numFields];
final boolean currentFieldOmitTermFreqAndPositions = fields[0].fieldInfo.omitTermFreqAndPositions;
//System.out.println("flush terms field=" + fields[0].fieldInfo.name);
// TODO: really TermsHashPerField should take over most
// of this loop, including merge sort of terms from
// multiple threads and interacting with the
// TermsConsumer, only calling out to us (passing us the
// DocsConsumer) to handle delivery of docs/positions
while(numFields > 0) {
// Get the next term to merge
termStates[0] = mergeStates[0];
int numToMerge = 1;
// TODO: pqueue
for(int i=1;i<numFields;i++) {
final char[] text = mergeStates[i].text;
final int textOffset = mergeStates[i].textOffset;
final int cmp = compareText(text, textOffset, termStates[0].text, termStates[0].textOffset);
final int cmp = termComp.compare(mergeStates[i].text, termStates[0].text);
if (cmp < 0) {
termStates[0] = mergeStates[i];
numToMerge = 1;
} else if (cmp == 0)
} else if (cmp == 0) {
termStates[numToMerge++] = mergeStates[i];
}
}
final FormatPostingsDocsConsumer docConsumer = termsConsumer.addTerm(termStates[0].text, termStates[0].textOffset);
// Need shallow copy here because termStates[0].text
// changes by the time we call finishTerm
text.bytes = termStates[0].text.bytes;
text.offset = termStates[0].text.offset;
text.length = termStates[0].text.length;
//System.out.println(" term=" + text.toUnicodeString());
//System.out.println(" term=" + text.toString());
final PostingsConsumer postingsConsumer = termsConsumer.startTerm(text);
// Now termStates has numToMerge FieldMergeStates
// which all share the same term. Now we must
// interleave the docID streams.
int numDocs = 0;
while(numToMerge > 0) {
FreqProxFieldMergeState minState = termStates[0];
for(int i=1;i<numToMerge;i++)
if (termStates[i].docID < minState.docID)
for(int i=1;i<numToMerge;i++) {
if (termStates[i].docID < minState.docID) {
minState = termStates[i];
}
}
final int termDocFreq = minState.termFreq;
numDocs++;
final FormatPostingsPositionsConsumer posConsumer = docConsumer.addDoc(minState.docID, termDocFreq);
assert minState.docID < flushedDocCount: "doc=" + minState.docID + " maxDoc=" + flushedDocCount;
postingsConsumer.startDoc(minState.docID, termDocFreq);
final ByteSliceReader prox = minState.prox;
@@ -213,33 +230,48 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
for(int j=0;j<termDocFreq;j++) {
final int code = prox.readVInt();
position += code >> 1;
//System.out.println(" pos=" + position);
final int payloadLength;
final BytesRef thisPayload;
if ((code & 1) != 0) {
// This position has a payload
payloadLength = prox.readVInt();
if (payloadBuffer == null || payloadBuffer.length < payloadLength)
payloadBuffer = new byte[payloadLength];
if (payload == null) {
payload = new BytesRef();
payload.bytes = new byte[payloadLength];
} else if (payload.bytes.length < payloadLength) {
payload.grow(payloadLength);
}
prox.readBytes(payloadBuffer, 0, payloadLength);
prox.readBytes(payload.bytes, 0, payloadLength);
payload.length = payloadLength;
thisPayload = payload;
} else
} else {
payloadLength = 0;
thisPayload = null;
}
posConsumer.addPosition(position, payloadBuffer, 0, payloadLength);
postingsConsumer.addPosition(position, thisPayload);
} //End for
posConsumer.finish();
postingsConsumer.finishDoc();
}
if (!minState.nextDoc()) {
// Remove from termStates
int upto = 0;
for(int i=0;i<numToMerge;i++)
if (termStates[i] != minState)
// TODO: inefficient O(N) where N = number of
// threads that had seen this term:
for(int i=0;i<numToMerge;i++) {
if (termStates[i] != minState) {
termStates[upto++] = termStates[i];
}
}
numToMerge--;
assert upto == numToMerge;
@@ -258,11 +290,10 @@ final class FreqProxTermsWriter extends TermsHashConsumer {
}
}
docConsumer.finish();
assert numDocs > 0;
termsConsumer.finishTerm(text, numDocs);
}
termsConsumer.finish();
}
final UnicodeUtil.UTF8Result termsUTF8 = new UnicodeUtil.UTF8Result();
}
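The payload handling above reuses a single BytesRef across positions, growing its backing array only when a larger payload shows up; a hedged, standalone illustration of that reuse pattern (the class and method are hypothetical):
import org.apache.lucene.util.BytesRef;
final class PayloadReuseSketch {
  private BytesRef payload; // reused across calls to avoid a fresh allocation per position
  // Copy payloadLength bytes from src into the reusable BytesRef and return it.
  BytesRef fill(byte[] src, int payloadLength) {
    if (payload == null) {
      payload = new BytesRef();
      payload.bytes = new byte[payloadLength];
    } else if (payload.bytes.length < payloadLength) {
      payload.grow(payloadLength); // grows the byte[] while keeping the same instance
    }
    System.arraycopy(src, 0, payload.bytes, 0, payloadLength);
    payload.length = payloadLength;
    return payload;
  }
}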

View File

@@ -187,25 +187,26 @@ final class FreqProxTermsWriterPerField extends TermsHashConsumerPerField implem
int lastPositions[]; // Last position where this term occurred
@Override
ParallelPostingsArray resize(int newSize) {
FreqProxPostingsArray newArray = new FreqProxPostingsArray(newSize);
copy(this, newArray);
return newArray;
ParallelPostingsArray newInstance(int size) {
return new FreqProxPostingsArray(size);
}
void copy(FreqProxPostingsArray fromArray, FreqProxPostingsArray toArray) {
super.copy(fromArray, toArray);
System.arraycopy(fromArray.docFreqs, 0, toArray.docFreqs, 0, fromArray.docFreqs.length);
System.arraycopy(fromArray.lastDocIDs, 0, toArray.lastDocIDs, 0, fromArray.lastDocIDs.length);
System.arraycopy(fromArray.lastDocCodes, 0, toArray.lastDocCodes, 0, fromArray.lastDocCodes.length);
System.arraycopy(fromArray.lastPositions, 0, toArray.lastPositions, 0, fromArray.lastPositions.length);
void copyTo(ParallelPostingsArray toArray, int numToCopy) {
assert toArray instanceof FreqProxPostingsArray;
FreqProxPostingsArray to = (FreqProxPostingsArray) toArray;
super.copyTo(toArray, numToCopy);
System.arraycopy(docFreqs, 0, to.docFreqs, 0, numToCopy);
System.arraycopy(lastDocIDs, 0, to.lastDocIDs, 0, numToCopy);
System.arraycopy(lastDocCodes, 0, to.lastDocCodes, 0, numToCopy);
System.arraycopy(lastPositions, 0, to.lastPositions, 0, numToCopy);
}
}
@Override
int bytesPerPosting() {
return ParallelPostingsArray.BYTES_PER_POSTING + 4 * DocumentsWriter.INT_NUM_BYTE;
@Override
int bytesPerPosting() {
return ParallelPostingsArray.BYTES_PER_POSTING + 4 * DocumentsWriter.INT_NUM_BYTE;
}
}
public void abort() {}
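A hedged sketch of how a caller would typically use the newInstance/copyTo pair introduced above to grow a postings array (the grow helper itself is not part of this diff and assumes package-level access to ParallelPostingsArray):
// Hypothetical grow step: allocate a larger parallel array and copy over the live entries.
static ParallelPostingsArray grow(ParallelPostingsArray current, int newSize, int numLivePostings) {
  ParallelPostingsArray larger = current.newInstance(newSize);
  current.copyTo(larger, numLivePostings); // copy only the postings that are in use
  return larger;
}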

View File

@@ -17,18 +17,20 @@ package org.apache.lucene.index;
* limitations under the License.
*/
import org.apache.lucene.store.Directory;
import java.io.IOException;
import java.io.FileNotFoundException;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Map;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Collection;
import java.util.Map;
import org.apache.lucene.index.codecs.CodecProvider;
import org.apache.lucene.store.Directory;
/*
* This class keeps track of each SegmentInfos instance that
@@ -114,6 +116,8 @@ final class IndexFileDeleter {
infoStream.println("IFD [" + Thread.currentThread().getName() + "]: " + message);
}
private final FilenameFilter indexFilenameFilter;
/**
* Initialize the deleter: find all previous commits in
* the Directory, incref the files they reference, call
@@ -122,7 +126,8 @@ final class IndexFileDeleter {
* @throws CorruptIndexException if the index is corrupt
* @throws IOException if there is a low-level IO error
*/
public IndexFileDeleter(Directory directory, IndexDeletionPolicy policy, SegmentInfos segmentInfos, PrintStream infoStream, DocumentsWriter docWriter)
public IndexFileDeleter(Directory directory, IndexDeletionPolicy policy, SegmentInfos segmentInfos, PrintStream infoStream, DocumentsWriter docWriter,
CodecProvider codecs)
throws CorruptIndexException, IOException {
this.docWriter = docWriter;
@@ -137,7 +142,7 @@ final class IndexFileDeleter {
// First pass: walk the files and initialize our ref
// counts:
long currentGen = segmentInfos.getGeneration();
IndexFileNameFilter filter = IndexFileNameFilter.getFilter();
indexFilenameFilter = new IndexFileNameFilter(codecs);
String[] files = directory.listAll();
@@ -147,7 +152,7 @@ final class IndexFileDeleter {
String fileName = files[i];
if (filter.accept(null, fileName) && !fileName.equals(IndexFileNames.SEGMENTS_GEN)) {
if ((indexFilenameFilter.accept(null, fileName)) && !fileName.endsWith("write.lock") && !fileName.equals(IndexFileNames.SEGMENTS_GEN)) {
// Add this file to refCounts with initial count 0:
getRefCount(fileName);
@@ -163,7 +168,7 @@ final class IndexFileDeleter {
}
SegmentInfos sis = new SegmentInfos();
try {
sis.read(directory, fileName);
sis.read(directory, fileName, codecs);
} catch (FileNotFoundException e) {
// LUCENE-948: on NFS (and maybe others), if
// you have writers switching back and forth
@@ -200,7 +205,7 @@ final class IndexFileDeleter {
// try now to explicitly open this commit point:
SegmentInfos sis = new SegmentInfos();
try {
sis.read(directory, segmentInfos.getCurrentSegmentFileName());
sis.read(directory, segmentInfos.getCurrentSegmentFileName(), codecs);
} catch (IOException e) {
throw new CorruptIndexException("failed to locate current segments_N file");
}
@@ -296,7 +301,6 @@ final class IndexFileDeleter {
*/
public void refresh(String segmentName) throws IOException {
String[] files = directory.listAll();
IndexFileNameFilter filter = IndexFileNameFilter.getFilter();
String segmentPrefix1;
String segmentPrefix2;
if (segmentName != null) {
@@ -309,8 +313,8 @@ final class IndexFileDeleter {
for(int i=0;i<files.length;i++) {
String fileName = files[i];
if (filter.accept(null, fileName) &&
(segmentName == null || fileName.startsWith(segmentPrefix1) || fileName.startsWith(segmentPrefix2)) &&
if ((segmentName == null || fileName.startsWith(segmentPrefix1) || fileName.startsWith(segmentPrefix2)) &&
indexFilenameFilter.accept(null, fileName) &&
!refCounts.containsKey(fileName) &&
!fileName.equals(IndexFileNames.SEGMENTS_GEN)) {
// Unreferenced file, so remove it

Some files were not shown because too many files have changed in this diff.