Lockless commits

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@476359 13f79535-47bb-0310-9956-ffa450edef68
Michael McCandless 2006-11-17 23:18:47 +00:00
parent bd6f012511
commit d634ccf4e9
20 changed files with 1956 additions and 493 deletions

View File

@@ -104,6 +104,15 @@ API Changes
9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected.
(Steven Parkes via Otis Gospodnetic)
10. LUCENE-701: Lockless commits: a commit lock is no longer required
when a writer commits and a reader opens the index. This includes
a change to the index file format (see docs/fileformats.html for
details). It also removes all APIs associated with the commit
lock & its timeout. Readers are now truly read-only and do not
block one another on startup. This is the first step to getting
Lucene to work correctly over NFS (second step is
LUCENE-710). (Mike McCandless)
Bug fixes
1. Fixed the web application demo (built with "ant war-demo") which

View File

@@ -118,7 +118,7 @@ limitations under the License.
<blockquote>
<p>
This document defines the index file formats used
in Lucene version 2.0. If you are using a different
in Lucene version 2.1. If you are using a different
version of Lucene, please consult the copy of
<code>docs/fileformats.html</code> that was distributed
with the version you are using.
@@ -143,6 +143,17 @@ limitations under the License.
Compatibility notes are provided in this document,
describing how file formats have changed from prior versions.
</p>
<p>
In version 2.1, the file format was changed to allow
lock-less commits (i.e., no more commit lock). The
change is fully backwards compatible: you can open a
pre-2.1 index for searching or for adding/deleting
documents. When the new segments file is saved
(committed), it will be written in the new file format
(meaning no specific "upgrade" process is needed).
But note that once a commit has occurred, pre-2.1
Lucene will not be able to read the index.
</p>
</blockquote>
</p>
</td></tr>
@@ -403,6 +414,17 @@ limitations under the License.
Typically, all segments
in an index are stored in a single directory, although this is not
required.
</p>
<p>
As of version 2.1 (lock-less commits), file names are
never re-used (there is one exception, "segments.gen",
see below). That is, when any file is saved to the
Directory it is given a never-before-used filename.
This is achieved using a simple generations approach.
For example, the first segments file is segments_1,
then segments_2, etc. The generation is a sequential
long integer represented in alphanumeric (base 36)
form.
</p>
</blockquote>
</p>
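To make the generation-to-filename mapping above concrete, here is a minimal sketch (an illustrative helper, not Lucene API; it mirrors the base-36 encoding that IndexFileNames.fileNameFromGeneration, shown later in this commit, performs):

public class GenNameSketch {
  // Generation 0 maps to the legacy "segments" file; higher
  // generations append the base-36 form of the generation.
  static String segmentsFileName(long gen) {
    if (gen == 0) return "segments";
    return "segments_" + Long.toString(gen, Character.MAX_RADIX);
  }
  public static void main(String[] args) {
    System.out.println(segmentsFileName(1));   // segments_1
    System.out.println(segmentsFileName(35));  // segments_z
    System.out.println(segmentsFileName(36));  // segments_10
  }
}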
@@ -1080,25 +1102,53 @@ limitations under the License.
<blockquote>
<p>
The active segments in the index are stored in the
segment info file. An index only has
a single file in this format, and it is named "segments".
This lists each segment by name, and also contains the size of each
segment.
segment info file, <tt>segments_N</tt>. There may
be one or more <tt>segments_N</tt> files in the
index; however, the one with the largest
generation is the active one (older
segments_N files are present either because they
temporarily cannot be deleted or because a writer is in
the process of committing). This file lists each
segment by name, has details about the separate
norms and deletion files, and also contains the
size of each segment.
</p>
<p>
As of 2.1, there is also a file
<tt>segments.gen</tt>. This file contains the
current generation (the <tt>_N</tt> in
<tt>segments_N</tt>) of the index. This is
used only as a fallback in case the current
generation cannot be accurately determined by
directory listing alone (as is the case for some
NFS clients with time-based directory cache
expiration). This file simply contains an Int32
version header (SegmentInfos.FORMAT_LOCKLESS =
-2), followed by the generation recorded as Int64,
written twice.
</p>
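A sketch of reading this file through Lucene's Directory API follows (assumes a Directory named dir; it returns -1 when the header or the two generation copies are inconsistent, and an IOException propagates if the file is absent):

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexInput;
import java.io.IOException;

class SegmentsGenSketch {
  static long readSegmentsGen(Directory dir) throws IOException {
    IndexInput in = dir.openInput("segments.gen");
    try {
      int format = in.readInt();         // expect FORMAT_LOCKLESS (-2)
      if (format != -2) return -1;
      long gen0 = in.readLong();         // generation, written twice
      long gen1 = in.readLong();
      return (gen0 == gen1) ? gen0 : -1; // trust it only if consistent
    } finally {
      in.close();
    }
  }
}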
<p>
<b>Pre-2.1:</b>
Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize&gt;<sup>SegCount</sup>
</p>
<p>
Format, NameCounter, SegCount, SegSize --&gt; UInt32
<b>2.1 and above:</b>
Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize, DelGen, NumField, NormGen<sup>NumField</sup> &gt;<sup>SegCount</sup>, IsCompoundFile
</p>
<p>
Version --&gt; UInt64
Format, NameCounter, SegCount, SegSize, NumField --&gt; Int32
</p>
<p>
Version, DelGen, NormGen --&gt; Int64
</p>
<p>
SegName --&gt; String
</p>
<p>
Format is -1 in Lucene 1.4.
IsCompoundFile --&gt; Int8
</p>
<p>
Format is -1 as of Lucene 1.4 and -2 as of Lucene 2.1.
</p>
<p>
Version counts how often the index has been
@@ -1113,6 +1163,35 @@ limitations under the License.
</p>
<p>
SegSize is the number of documents contained in the segment index.
</p>
<p>
DelGen is the generation count of the separate
deletes file. If this is -1, there are no
separate deletes. If it is 0, this is a pre-2.1
segment and you must check the filesystem for the
existence of _X.del. Anything above zero means
there are separate deletes (_X_N.del).
</p>
<p>
NumField is the size of the array for NormGen, or
-1 if there are no NormGens stored.
</p>
<p>
NormGen records the generation of each field's
separate norms file. If NumField is -1, no
NormGens are stored; each is then assumed to be 0
if the segment was written pre-2.1, and -1 if the
segments file is 2.1 or above. The generation has
the same meaning as DelGen (above).
</p>
<p>
IsCompoundFile records whether the segment is
written as a compound file or not. If this is -1,
the segment is not a compound file. If it is 1,
the segment is a compound file. Otherwise it is 0,
which means the filesystem is checked to see
whether _X.cfs exists.
</p>
</blockquote>
</td></tr>
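The -1/0/1 convention for IsCompoundFile (and, analogously, the -1/0/>0 convention for DelGen) can be resolved with a tiny helper; this sketch matches the logic that SegmentInfo.getUseCompoundFile implements later in this commit:

import org.apache.lucene.store.Directory;
import java.io.IOException;

class CompoundFlagSketch {
  // 1 = compound file; -1 = not compound; 0 = pre-2.1 segment,
  // so fall back to checking the filesystem for <name>.cfs.
  static boolean usesCompoundFile(Directory dir, String name,
                                  byte isCompoundFile) throws IOException {
    if (isCompoundFile == 1) return true;
    if (isCompoundFile == -1) return false;
    return dir.fileExists(name + ".cfs");
  }
}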
@@ -1121,42 +1200,31 @@ limitations under the License.
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#828DA6">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Lock Files"><strong>Lock Files</strong></a>
<a name="Lock File"><strong>Lock File</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
Several files are used to indicate that another
process is using an index. Note that these files are not
A write lock is used to indicate that another
process is writing to the index. Note that this file is not
stored in the index directory itself, but rather in the
system's temporary directory, as indicated in the Java
system property "java.io.tmpdir".
</p>
<ul>
<li>
<p>
When a file named "commit.lock"
is present, a process is currently re-writing the "segments"
file and deleting outdated segment index files, or a process is
reading the "segments"
file and opening the files of the segments it names. This lock file
prevents files from being deleted by another process after a process
has read the "segments"
file but before it has managed to open all of the files of the
segments named therein.
</p>
</li>
<li>
<p>
When a file named "write.lock"
is present, a process is currently adding documents to an index, or
removing files from that index. This lock file prevents several
processes from attempting to modify an index at the same time.
</p>
</li>
</ul>
<p>
The write lock is named "XXXX-write.lock" where
XXXX is typically a unique prefix computed by the
directory path to the index. When this file is
present, a process is currently adding documents
to an index, or removing files from that index.
This lock file prevents several processes from
attempting to modify an index at the same time.
</p>
<p>
Note that prior to version 2.1, Lucene also used a
commit lock. This was removed in 2.1.
</p>
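The exact prefix computation lives in FSDirectory; purely as an illustration (the digest scheme below is an assumption, not the verbatim implementation), a per-directory lock name could be derived like this:

import java.security.MessageDigest;

class LockNameSketch {
  // Hypothetical: hash the canonical index path so that two indexes
  // never share a lock file in the common temp directory.
  static String writeLockName(String canonicalIndexPath) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    byte[] digest = md5.digest(canonicalIndexPath.getBytes("UTF-8"));
    StringBuilder buf = new StringBuilder("lucene-");
    for (int i = 0; i < digest.length; i++) {
      int b = digest[i] & 0xff;
      if (b < 0x10) buf.append('0');
      buf.append(Integer.toHexString(b));
    }
    return buf.append("-write.lock").toString();
  }
}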
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
@@ -1170,20 +1238,11 @@ limitations under the License.
<tr><td>
<blockquote>
<p>
A file named "deletable"
contains the names of files that are no longer used by the index, but
which could not be deleted. This is only used on Win32, where a
file may not be deleted while it is still open. On other platforms
the file contains only null bytes.
</p>
<p>
Deletable --&gt; DeletableCount,
&lt;DelableName&gt;<sup>DeletableCount</sup>
</p>
<p>DeletableCount --&gt; UInt32
</p>
<p>DeletableName --&gt;
String
Prior to Lucene 2.1 there was a file "deletable"
that contained details about files that need to be
deleted. As of 2.1, a writer dynamically computes
the files that are deletable, instead, so no file
is written.
</p>
</blockquote>
</td></tr>

View File

@@ -0,0 +1,219 @@
package org.apache.lucene.index;
import org.apache.lucene.index.IndexFileNames;
import org.apache.lucene.index.IndexFileNameFilter;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Vector;
import java.util.HashMap;
/**
* A utility class (used by both IndexReader and
* IndexWriter) to keep track of files that need to be
* deleted because they are no longer referenced by the
* index.
*/
public class IndexFileDeleter {
private Vector deletable;
private Vector pending;
private Directory directory;
private SegmentInfos segmentInfos;
private PrintStream infoStream;
public IndexFileDeleter(SegmentInfos segmentInfos, Directory directory)
throws IOException {
this.segmentInfos = segmentInfos;
this.directory = directory;
}
void setInfoStream(PrintStream infoStream) {
this.infoStream = infoStream;
}
/** Determine index files that are no longer referenced
* and therefore should be deleted. This is called once
* (by the writer), and then subsequently we add onto
* deletable any files that are no longer needed at the
* point that we create the unused file (eg when merging
* segments), and we only remove from deletable when a
* file is successfully deleted.
*/
public void findDeletableFiles() throws IOException {
// Gather all "current" segments:
HashMap current = new HashMap();
for(int j=0;j<segmentInfos.size();j++) {
SegmentInfo segmentInfo = (SegmentInfo) segmentInfos.elementAt(j);
current.put(segmentInfo.name, segmentInfo);
}
// Then go through all files in the Directory that are
// Lucene index files, and add to deletable if they are
// not referenced by the current segments info:
String segmentsInfosFileName = segmentInfos.getCurrentSegmentFileName();
IndexFileNameFilter filter = IndexFileNameFilter.getFilter();
String[] files = directory.list();
for (int i = 0; i < files.length; i++) {
if (filter.accept(null, files[i]) && !files[i].equals(segmentsInfosFileName) && !files[i].equals(IndexFileNames.SEGMENTS_GEN)) {
String segmentName;
String extension;
// First remove any extension:
int loc = files[i].indexOf('.');
if (loc != -1) {
extension = files[i].substring(1+loc);
segmentName = files[i].substring(0, loc);
} else {
extension = null;
segmentName = files[i];
}
// Then, remove any generation count:
loc = segmentName.indexOf('_', 1);
if (loc != -1) {
segmentName = segmentName.substring(0, loc);
}
// Delete this file if it's not a "current" segment,
// or, it is a single index file but there is now a
// corresponding compound file:
boolean doDelete = false;
if (!current.containsKey(segmentName)) {
// Delete if segment is not referenced:
doDelete = true;
} else {
// OK, segment is referenced, but file may still
// be orphan'd:
SegmentInfo info = (SegmentInfo) current.get(segmentName);
if (filter.isCFSFile(files[i]) && info.getUseCompoundFile()) {
// This file is in fact stored in a CFS file for
// this segment:
doDelete = true;
} else {
if ("del".equals(extension)) {
// This is a _segmentName_N.del file:
if (!files[i].equals(info.getDelFileName())) {
// If this is a separate .del file, but it
// doesn't match the current del filename for
// this segment, then delete it:
doDelete = true;
}
} else if (extension != null && extension.startsWith("s") && extension.matches("s\\d+")) {
int field = Integer.parseInt(extension.substring(1));
// This is a _segmentName_N.sX file:
if (!files[i].equals(info.getNormFileName(field))) {
// This is an orphan'd separate norms file:
doDelete = true;
}
}
}
}
if (doDelete) {
addDeletableFile(files[i]);
if (infoStream != null) {
infoStream.println("IndexFileDeleter: file \"" + files[i] + "\" is unreferenced in index and will be deleted on next commit");
}
}
}
}
}
/*
* Some operating systems (e.g. Windows) don't permit a file to be deleted
* while it is opened for read (e.g. by another process or thread). So we
* assume that when a delete fails it is because the file is open in another
* process, and queue the file for subsequent deletion.
*/
public final void deleteSegments(Vector segments) throws IOException {
deleteFiles(); // try to delete files that we couldn't before
for (int i = 0; i < segments.size(); i++) {
SegmentReader reader = (SegmentReader)segments.elementAt(i);
if (reader.directory() == this.directory)
deleteFiles(reader.files()); // try to delete our files
else
deleteFiles(reader.files(), reader.directory()); // delete other files
}
}
public final void deleteFiles(Vector files, Directory directory)
throws IOException {
for (int i = 0; i < files.size(); i++)
directory.deleteFile((String)files.elementAt(i));
}
public final void deleteFiles(Vector files)
throws IOException {
deleteFiles(); // try to delete files that we couldn't before
for (int i = 0; i < files.size(); i++) {
deleteFile((String) files.elementAt(i));
}
}
public final void deleteFile(String file)
throws IOException {
try {
directory.deleteFile(file); // try to delete each file
} catch (IOException e) { // if delete fails
if (directory.fileExists(file)) {
if (infoStream != null)
infoStream.println("IndexFileDeleter: unable to remove file \"" + file + "\": " + e.toString() + "; Will re-try later.");
addDeletableFile(file); // add to deletable
}
}
}
final void clearPendingFiles() {
pending = null;
}
final void addPendingFile(String fileName) {
if (pending == null) {
pending = new Vector();
}
pending.addElement(fileName);
}
final void commitPendingFiles() {
if (pending != null) {
if (deletable == null) {
deletable = pending;
pending = null;
} else {
deletable.addAll(pending);
pending = null;
}
}
}
public final void addDeletableFile(String fileName) {
if (deletable == null) {
deletable = new Vector();
}
deletable.addElement(fileName);
}
public final void deleteFiles()
throws IOException {
if (deletable != null) {
Vector oldDeletable = deletable;
deletable = null;
deleteFiles(oldDeletable); // try to delete deletable
}
}
}

View File

@@ -19,6 +19,7 @@ package org.apache.lucene.index;
import java.io.File;
import java.io.FilenameFilter;
import java.util.HashSet;
/**
* Filename filter that accepts only filenames and extensions created by Lucene.
@@ -28,18 +29,64 @@ import java.io.FilenameFilter;
*/
public class IndexFileNameFilter implements FilenameFilter {
static IndexFileNameFilter singleton = new IndexFileNameFilter();
private HashSet extensions;
public IndexFileNameFilter() {
extensions = new HashSet();
for (int i = 0; i < IndexFileNames.INDEX_EXTENSIONS.length; i++) {
extensions.add(IndexFileNames.INDEX_EXTENSIONS[i]);
}
}
/* (non-Javadoc)
* @see java.io.FilenameFilter#accept(java.io.File, java.lang.String)
*/
public boolean accept(File dir, String name) {
for (int i = 0; i < IndexFileNames.INDEX_EXTENSIONS.length; i++) {
if (name.endsWith("."+IndexFileNames.INDEX_EXTENSIONS[i]))
int i = name.lastIndexOf('.');
if (i != -1) {
String extension = name.substring(1+i);
if (extensions.contains(extension)) {
return true;
} else if (extension.startsWith("f") &&
extension.matches("f\\d+")) {
return true;
} else if (extension.startsWith("s") &&
extension.matches("s\\d+")) {
return true;
}
} else {
if (name.equals(IndexFileNames.DELETABLE)) return true;
else if (name.startsWith(IndexFileNames.SEGMENTS)) return true;
}
if (name.equals(IndexFileNames.DELETABLE)) return true;
else if (name.equals(IndexFileNames.SEGMENTS)) return true;
else if (name.matches(".+\\.f\\d+")) return true;
return false;
}
/**
* Returns true if this is a file that would be contained
* in a CFS file. This function should only be called on
* files that pass the above "accept" (ie, are already
* known to be a Lucene index file).
*/
public boolean isCFSFile(String name) {
int i = name.lastIndexOf('.');
if (i != -1) {
String extension = name.substring(1+i);
if (extensions.contains(extension) &&
!extension.equals("del") &&
!extension.equals("gen") &&
!extension.equals("cfs")) {
return true;
}
if (extension.startsWith("f") &&
extension.matches("f\\d+")) {
return true;
}
}
return false;
}
public static IndexFileNameFilter getFilter() {
return singleton;
}
}

View File

@@ -27,19 +27,25 @@ final class IndexFileNames {
/** Name of the index segment file */
static final String SEGMENTS = "segments";
/** Name of the generation reference file */
static final String SEGMENTS_GEN = "segments.gen";
/** Name of the index deletable file */
/** Name of the index deletable file (only used in
* pre-lockless indices) */
static final String DELETABLE = "deletable";
/**
* This array contains all filename extensions used by Lucene's index files, with
* one exception, namely the extension made up from <code>.f</code> + a number.
* Also note that two of Lucene's files (<code>deletable</code> and
* <code>segments</code>) don't have any filename extension.
* This array contains all filename extensions used by
* Lucene's index files, with two exceptions, namely the
* extension made up from <code>.f</code> + a number and
* from <code>.s</code> + a number. Also note that
* Lucene's <code>segments_N</code> files do not have any
* filename extension.
*/
static final String INDEX_EXTENSIONS[] = new String[] {
"cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq", "prx", "del",
"tvx", "tvd", "tvf", "tvp" };
"tvx", "tvd", "tvf", "tvp", "gen"};
/** File extensions of old-style index files */
static final String COMPOUND_EXTENSIONS[] = new String[] {
@@ -50,5 +56,24 @@ final class IndexFileNames {
static final String VECTOR_EXTENSIONS[] = new String[] {
"tvx", "tvd", "tvf"
};
/**
* Computes the full file name from base, extension and
* generation. If the generation is -1, the file name is
* null. If it's 0, the file name is <base><extension>.
* If it's > 0, the file name is <base>_<generation><extension>.
*
* @param base -- main part of the file name
* @param extension -- extension of the filename (including .)
* @param gen -- generation
*/
public static final String fileNameFromGeneration(String base, String extension, long gen) {
if (gen == -1) {
return null;
} else if (gen == 0) {
return base + extension;
} else {
return base + "_" + Long.toString(gen, Character.MAX_RADIX) + extension;
}
}
}
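A few example values, following directly from the method's three branches (callable only from within the org.apache.lucene.index package, since the class is package-private):

IndexFileNames.fileNameFromGeneration("segments", "", -1); // null (no file)
IndexFileNames.fileNameFromGeneration("segments", "", 0);  // "segments"
IndexFileNames.fileNameFromGeneration("segments", "", 37); // "segments_11" (base 36)
IndexFileNames.fileNameFromGeneration("_a", ".del", 3);    // "_a_3.del"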

View File

@@ -113,6 +113,7 @@ public abstract class IndexReader {
private Directory directory;
private boolean directoryOwner;
private boolean closeDirectory;
protected IndexFileDeleter deleter;
private SegmentInfos segmentInfos;
private Lock writeLock;
@@ -138,24 +139,40 @@
}
private static IndexReader open(final Directory directory, final boolean closeDirectory) throws IOException {
synchronized (directory) { // in- & inter-process sync
return (IndexReader)new Lock.With(
directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
IndexWriter.COMMIT_LOCK_TIMEOUT) {
public Object doBody() throws IOException {
SegmentInfos infos = new SegmentInfos();
infos.read(directory);
if (infos.size() == 1) { // index is optimized
return SegmentReader.get(infos, infos.info(0), closeDirectory);
}
IndexReader[] readers = new IndexReader[infos.size()];
for (int i = 0; i < infos.size(); i++)
readers[i] = SegmentReader.get(infos.info(i));
return new MultiReader(directory, infos, closeDirectory, readers);
return (IndexReader) new SegmentInfos.FindSegmentsFile(directory) {
public Object doBody(String segmentFileName) throws IOException {
SegmentInfos infos = new SegmentInfos();
infos.read(directory, segmentFileName);
if (infos.size() == 1) { // index is optimized
return SegmentReader.get(infos, infos.info(0), closeDirectory);
} else {
// To reduce the chance of hitting FileNotFound
// (and having to retry), we open segments in
// reverse because IndexWriter merges & deletes
// the newest segments first.
IndexReader[] readers = new IndexReader[infos.size()];
for (int i = infos.size()-1; i >= 0; i--) {
try {
readers[i] = SegmentReader.get(infos.info(i));
} catch (IOException e) {
// Close all readers we had opened:
for(i++;i<infos.size();i++) {
readers[i].close();
}
throw e;
}
}
}.run();
}
return new MultiReader(directory, infos, closeDirectory, readers);
}
}
}.run();
}
/** Returns the directory this index resides in. */
@@ -175,8 +192,12 @@
* Do not use this to check whether the reader is still up-to-date, use
* {@link #isCurrent()} instead.
*/
public static long lastModified(File directory) throws IOException {
return FSDirectory.fileModified(directory, IndexFileNames.SEGMENTS);
public static long lastModified(File fileDirectory) throws IOException {
return ((Long) new SegmentInfos.FindSegmentsFile(fileDirectory) {
public Object doBody(String segmentFileName) {
return new Long(FSDirectory.fileModified(fileDirectory, segmentFileName));
}
}.run()).longValue();
}
/**
@ -184,8 +205,12 @@ public abstract class IndexReader {
* Do not use this to check whether the reader is still up-to-date, use
* {@link #isCurrent()} instead.
*/
public static long lastModified(Directory directory) throws IOException {
return directory.fileModified(IndexFileNames.SEGMENTS);
public static long lastModified(final Directory directory2) throws IOException {
return ((Long) new SegmentInfos.FindSegmentsFile(directory2) {
public Object doBody(String segmentFileName) throws IOException {
return new Long(directory2.fileModified(segmentFileName));
}
}.run()).longValue();
}
/**
@@ -227,21 +252,7 @@
* @throws IOException if segments file cannot be read.
*/
public static long getCurrentVersion(Directory directory) throws IOException {
synchronized (directory) { // in- & inter-process sync
Lock commitLock=directory.makeLock(IndexWriter.COMMIT_LOCK_NAME);
boolean locked=false;
try {
locked=commitLock.obtain(IndexWriter.COMMIT_LOCK_TIMEOUT);
return SegmentInfos.readCurrentVersion(directory);
} finally {
if (locked) {
commitLock.release();
}
}
}
return SegmentInfos.readCurrentVersion(directory);
}
/**
@@ -259,21 +270,7 @@
* @throws IOException
*/
public boolean isCurrent() throws IOException {
synchronized (directory) { // in- & inter-process sync
Lock commitLock=directory.makeLock(IndexWriter.COMMIT_LOCK_NAME);
boolean locked=false;
try {
locked=commitLock.obtain(IndexWriter.COMMIT_LOCK_TIMEOUT);
return SegmentInfos.readCurrentVersion(directory) == segmentInfos.getVersion();
} finally {
if (locked) {
commitLock.release();
}
}
}
return SegmentInfos.readCurrentVersion(directory) == segmentInfos.getVersion();
}
/**
@@ -319,7 +316,7 @@
* @return <code>true</code> if an index exists; <code>false</code> otherwise
*/
public static boolean indexExists(String directory) {
return (new File(directory, IndexFileNames.SEGMENTS)).exists();
return indexExists(new File(directory));
}
/**
@@ -328,8 +325,9 @@
* @param directory the directory to check for an index
* @return <code>true</code> if an index exists; <code>false</code> otherwise
*/
public static boolean indexExists(File directory) {
return (new File(directory, IndexFileNames.SEGMENTS)).exists();
return SegmentInfos.getCurrentSegmentGeneration(directory.list()) != -1;
}
/**
@@ -340,7 +338,7 @@
* @throws IOException if there is a problem with accessing the index
*/
public static boolean indexExists(Directory directory) throws IOException {
return directory.fileExists(IndexFileNames.SEGMENTS);
return SegmentInfos.getCurrentSegmentGeneration(directory) != -1;
}
/** Returns the number of documents in this index. */
@@ -592,17 +590,22 @@
*/
protected final synchronized void commit() throws IOException{
if(hasChanges){
if (deleter == null) {
// In the MultiReader case, we share this deleter
// across all SegmentReaders:
setDeleter(new IndexFileDeleter(segmentInfos, directory));
deleter.deleteFiles();
}
if(directoryOwner){
synchronized (directory) { // in- & inter-process sync
new Lock.With(directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
IndexWriter.COMMIT_LOCK_TIMEOUT) {
public Object doBody() throws IOException {
doCommit();
segmentInfos.write(directory);
return null;
}
}.run();
}
deleter.clearPendingFiles();
doCommit();
String oldInfoFileName = segmentInfos.getCurrentSegmentFileName();
segmentInfos.write(directory);
// Attempt to delete all files we just obsoleted:
deleter.deleteFile(oldInfoFileName);
deleter.commitPendingFiles();
deleter.deleteFiles();
if (writeLock != null) {
writeLock.release(); // release write lock
writeLock = null;
@@ -614,6 +617,13 @@
hasChanges = false;
}
protected void setDeleter(IndexFileDeleter deleter) {
this.deleter = deleter;
}
protected IndexFileDeleter getDeleter() {
return deleter;
}
/** Implements commit. */
protected abstract void doCommit() throws IOException;
@@ -658,8 +668,7 @@
*/
public static boolean isLocked(Directory directory) throws IOException {
return
directory.makeLock(IndexWriter.WRITE_LOCK_NAME).isLocked() ||
directory.makeLock(IndexWriter.COMMIT_LOCK_NAME).isLocked();
directory.makeLock(IndexWriter.WRITE_LOCK_NAME).isLocked();
}
/**
@@ -684,7 +693,6 @@
*/
public static void unlock(Directory directory) throws IOException {
directory.makeLock(IndexWriter.WRITE_LOCK_NAME).release();
directory.makeLock(IndexWriter.COMMIT_LOCK_NAME).release();
}
/**

View File

@@ -67,16 +67,7 @@ public class IndexWriter {
private long writeLockTimeout = WRITE_LOCK_TIMEOUT;
/**
* Default value for the commit lock timeout (10,000).
* @see #setDefaultCommitLockTimeout
*/
public static long COMMIT_LOCK_TIMEOUT = 10000;
private long commitLockTimeout = COMMIT_LOCK_TIMEOUT;
public static final String WRITE_LOCK_NAME = "write.lock";
public static final String COMMIT_LOCK_NAME = "commit.lock";
/**
* Default value is 10. Change using {@link #setMergeFactor(int)}.
@@ -111,6 +102,7 @@ public class IndexWriter {
private SegmentInfos segmentInfos = new SegmentInfos(); // the segments
private SegmentInfos ramSegmentInfos = new SegmentInfos(); // the segments in ramDirectory
private final Directory ramDirectory = new RAMDirectory(); // for temp segs
private IndexFileDeleter deleter;
private Lock writeLock;
@@ -260,19 +252,30 @@
this.writeLock = writeLock; // save it
try {
synchronized (directory) { // in- & inter-process sync
new Lock.With(directory.makeLock(IndexWriter.COMMIT_LOCK_NAME), commitLockTimeout) {
public Object doBody() throws IOException {
if (create)
segmentInfos.write(directory);
else
segmentInfos.read(directory);
return null;
}
}.run();
if (create) {
// Try to read first. This is to allow create
// against an index that's currently open for
// searching. In this case we write the next
// segments_N file with no segments:
try {
segmentInfos.read(directory);
segmentInfos.clear();
} catch (IOException e) {
// Likely this means it's a fresh directory
}
segmentInfos.write(directory);
} else {
segmentInfos.read(directory);
}
// Create a deleter to keep track of which files can
// be deleted:
deleter = new IndexFileDeleter(segmentInfos, directory);
deleter.setInfoStream(infoStream);
deleter.findDeletableFiles();
deleter.deleteFiles();
} catch (IOException e) {
// the doBody method failed
this.writeLock.release();
this.writeLock = null;
throw e;
@@ -380,35 +383,6 @@
return infoStream;
}
/**
* Sets the maximum time to wait for a commit lock (in milliseconds) for this instance of IndexWriter. @see
* @see #setDefaultCommitLockTimeout to change the default value for all instances of IndexWriter.
*/
public void setCommitLockTimeout(long commitLockTimeout) {
this.commitLockTimeout = commitLockTimeout;
}
/**
* @see #setCommitLockTimeout
*/
public long getCommitLockTimeout() {
return commitLockTimeout;
}
/**
* Sets the default (for any instance of IndexWriter) maximum time to wait for a commit lock (in milliseconds)
*/
public static void setDefaultCommitLockTimeout(long commitLockTimeout) {
IndexWriter.COMMIT_LOCK_TIMEOUT = commitLockTimeout;
}
/**
* @see #setDefaultCommitLockTimeout
*/
public static long getDefaultCommitLockTimeout() {
return IndexWriter.COMMIT_LOCK_TIMEOUT;
}
/**
* Sets the maximum time to wait for a write lock (in milliseconds) for this instance of IndexWriter. @see
* @see #setDefaultWriteLockTimeout to change the default value for all instances of IndexWriter.
@@ -517,7 +491,7 @@
String segmentName = newRAMSegmentName();
dw.addDocument(segmentName, doc);
synchronized (this) {
ramSegmentInfos.addElement(new SegmentInfo(segmentName, 1, ramDirectory));
ramSegmentInfos.addElement(new SegmentInfo(segmentName, 1, ramDirectory, false));
maybeFlushRamSegments();
}
}
@@ -790,36 +764,26 @@
int docCount = merger.merge(); // merge 'em
segmentInfos.setSize(0); // pop old infos & add new
segmentInfos.addElement(new SegmentInfo(mergedName, docCount, directory));
SegmentInfo info = new SegmentInfo(mergedName, docCount, directory, false);
segmentInfos.addElement(info);
if(sReader != null)
sReader.close();
synchronized (directory) { // in- & inter-process sync
new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), commitLockTimeout) {
public Object doBody() throws IOException {
segmentInfos.write(directory); // commit changes
return null;
}
}.run();
}
String segmentsInfosFileName = segmentInfos.getCurrentSegmentFileName();
segmentInfos.write(directory); // commit changes
deleteSegments(segmentsToDelete); // delete now-unused segments
deleter.deleteFile(segmentsInfosFileName); // delete old segments_N file
deleter.deleteSegments(segmentsToDelete); // delete now-unused segments
if (useCompoundFile) {
final Vector filesToDelete = merger.createCompoundFile(mergedName + ".tmp");
synchronized (directory) { // in- & inter-process sync
new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), commitLockTimeout) {
public Object doBody() throws IOException {
// make compound file visible for SegmentReaders
directory.renameFile(mergedName + ".tmp", mergedName + ".cfs");
return null;
}
}.run();
}
Vector filesToDelete = merger.createCompoundFile(mergedName + ".cfs");
segmentsInfosFileName = segmentInfos.getCurrentSegmentFileName();
info.setUseCompoundFile(true);
segmentInfos.write(directory); // commit again so readers know we've switched this segment to a compound file
// delete now unused files of segment
deleteFiles(filesToDelete);
deleter.deleteFile(segmentsInfosFileName); // delete old segments_N file
deleter.deleteFiles(filesToDelete); // delete now unused files of segment
}
}
@@ -937,10 +901,11 @@
*/
private final int mergeSegments(SegmentInfos sourceSegments, int minSegment, int end)
throws IOException {
final String mergedName = newSegmentName();
if (infoStream != null) infoStream.print("merging segments");
SegmentMerger merger = new SegmentMerger(this, mergedName);
final Vector segmentsToDelete = new Vector();
for (int i = minSegment; i < end; i++) {
SegmentInfo si = sourceSegments.info(i);
@@ -960,7 +925,7 @@
}
SegmentInfo newSegment = new SegmentInfo(mergedName, mergedDocCount,
directory);
directory, false);
if (sourceSegments == ramSegmentInfos) {
sourceSegments.removeAllElements();
segmentInfos.addElement(newSegment);
@@ -973,115 +938,26 @@
// close readers before we attempt to delete now-obsolete segments
merger.closeReaders();
synchronized (directory) { // in- & inter-process sync
new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), commitLockTimeout) {
public Object doBody() throws IOException {
segmentInfos.write(directory); // commit before deleting
return null;
}
}.run();
}
deleteSegments(segmentsToDelete); // delete now-unused segments
String segmentsInfosFileName = segmentInfos.getCurrentSegmentFileName();
segmentInfos.write(directory); // commit before deleting
deleter.deleteFile(segmentsInfosFileName); // delete old segments_N file
deleter.deleteSegments(segmentsToDelete); // delete now-unused segments
if (useCompoundFile) {
final Vector filesToDelete = merger.createCompoundFile(mergedName + ".tmp");
synchronized (directory) { // in- & inter-process sync
new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), commitLockTimeout) {
public Object doBody() throws IOException {
// make compound file visible for SegmentReaders
directory.renameFile(mergedName + ".tmp", mergedName + ".cfs");
return null;
}
}.run();
}
Vector filesToDelete = merger.createCompoundFile(mergedName + ".cfs");
// delete now unused files of segment
deleteFiles(filesToDelete);
segmentsInfosFileName = segmentInfos.getCurrentSegmentFileName();
newSegment.setUseCompoundFile(true);
segmentInfos.write(directory); // commit again so readers know we've switched this segment to a compound file
deleter.deleteFile(segmentsInfosFileName); // delete old segments_N file
deleter.deleteFiles(filesToDelete); // delete now-unused segments
}
return mergedDocCount;
}
/*
* Some operating systems (e.g. Windows) don't permit a file to be deleted
* while it is opened for read (e.g. by another process or thread). So we
* assume that when a delete fails it is because the file is open in another
* process, and queue the file for subsequent deletion.
*/
private final void deleteSegments(Vector segments) throws IOException {
Vector deletable = new Vector();
deleteFiles(readDeleteableFiles(), deletable); // try to delete deleteable
for (int i = 0; i < segments.size(); i++) {
SegmentReader reader = (SegmentReader)segments.elementAt(i);
if (reader.directory() == this.directory)
deleteFiles(reader.files(), deletable); // try to delete our files
else
deleteFiles(reader.files(), reader.directory()); // delete other files
}
writeDeleteableFiles(deletable); // note files we can't delete
}
private final void deleteFiles(Vector files) throws IOException {
Vector deletable = new Vector();
deleteFiles(readDeleteableFiles(), deletable); // try to delete deleteable
deleteFiles(files, deletable); // try to delete our files
writeDeleteableFiles(deletable); // note files we can't delete
}
private final void deleteFiles(Vector files, Directory directory)
throws IOException {
for (int i = 0; i < files.size(); i++)
directory.deleteFile((String)files.elementAt(i));
}
private final void deleteFiles(Vector files, Vector deletable)
throws IOException {
for (int i = 0; i < files.size(); i++) {
String file = (String)files.elementAt(i);
try {
directory.deleteFile(file); // try to delete each file
} catch (IOException e) { // if delete fails
if (directory.fileExists(file)) {
if (infoStream != null)
infoStream.println(e.toString() + "; Will re-try later.");
deletable.addElement(file); // add to deletable
}
}
}
}
private final Vector readDeleteableFiles() throws IOException {
Vector result = new Vector();
if (!directory.fileExists(IndexFileNames.DELETABLE))
return result;
IndexInput input = directory.openInput(IndexFileNames.DELETABLE);
try {
for (int i = input.readInt(); i > 0; i--) // read file names
result.addElement(input.readString());
} finally {
input.close();
}
return result;
}
private final void writeDeleteableFiles(Vector files) throws IOException {
IndexOutput output = directory.createOutput("deleteable.new");
try {
output.writeInt(files.size());
for (int i = 0; i < files.size(); i++)
output.writeString((String)files.elementAt(i));
} finally {
output.close();
}
directory.renameFile("deleteable.new", IndexFileNames.DELETABLE);
}
private final boolean checkNonDecreasingLevels(int start) {
int lowerBound = -1;
int upperBound = minMergeDocs;

View File

@@ -218,6 +218,13 @@ public class MultiReader extends IndexReader {
return new MultiTermPositions(subReaders, starts);
}
protected void setDeleter(IndexFileDeleter deleter) {
// Share deleter to our SegmentReaders:
this.deleter = deleter;
for (int i = 0; i < subReaders.length; i++)
subReaders[i].setDeleter(deleter);
}
protected void doCommit() throws IOException {
for (int i = 0; i < subReaders.length; i++)
subReaders[i].commit();

View File

@@ -18,15 +18,302 @@ package org.apache.lucene.index;
*/
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.IndexInput;
import java.io.IOException;
final class SegmentInfo {
public String name; // unique name in dir
public int docCount; // number of docs in seg
public Directory dir; // where segment resides
private boolean preLockless; // true if this is a segments file written before
// lock-less commits (XXX)
private long delGen; // current generation of del file; -1 if there
// are no deletes; 0 if it's a pre-XXX segment
// (and we must check filesystem); 1 or higher if
// there are deletes at generation N
private long[] normGen; // current generations of each field's norm file.
// If this array is null, we must check filesystem
// when preLockLess is true. Else,
// there are no separate norms
private byte isCompoundFile; // -1 if it is not; 1 if it is; 0 if it's
// pre-XXX (ie, must check file system to see
// if <name>.cfs exists)
public SegmentInfo(String name, int docCount, Directory dir) {
this.name = name;
this.docCount = docCount;
this.dir = dir;
delGen = -1;
isCompoundFile = 0;
preLockless = true;
}
public SegmentInfo(String name, int docCount, Directory dir, boolean isCompoundFile) {
this(name, docCount, dir);
if (isCompoundFile) {
this.isCompoundFile = 1;
} else {
this.isCompoundFile = -1;
}
preLockless = false;
}
/**
* Construct a new SegmentInfo instance by reading a
* previously saved SegmentInfo from input.
*
* @param dir directory to load from
* @param format format of the segments info file
* @param input input handle to read segment info from
*/
public SegmentInfo(Directory dir, int format, IndexInput input) throws IOException {
this.dir = dir;
name = input.readString();
docCount = input.readInt();
if (format <= SegmentInfos.FORMAT_LOCKLESS) {
delGen = input.readLong();
int numNormGen = input.readInt();
if (numNormGen == -1) {
normGen = null;
} else {
normGen = new long[numNormGen];
for(int j=0;j<numNormGen;j++) {
normGen[j] = input.readLong();
}
}
isCompoundFile = input.readByte();
preLockless = isCompoundFile == 0;
} else {
delGen = 0;
normGen = null;
isCompoundFile = 0;
preLockless = true;
}
}
void setNumField(int numField) {
if (normGen == null) {
// normGen is null if we loaded a pre-XXX segment
// file, or, if this segments file hasn't had any
// norms set against it yet:
normGen = new long[numField];
if (!preLockless) {
// This is a FORMAT_LOCKLESS segment, which means
// there are no norms:
for(int i=0;i<numField;i++) {
normGen[i] = -1;
}
}
}
}
boolean hasDeletions()
throws IOException {
// Cases:
//
// delGen == -1: this means this segment was written
// by the LOCKLESS code and for certain does not have
// deletions yet
//
// delGen == 0: this means this segment was written by
// pre-LOCKLESS code which means we must check
// directory to see if .del file exists
//
// delGen > 0: this means this segment was written by
// the LOCKLESS code and for certain has
// deletions
//
if (delGen == -1) {
return false;
} else if (delGen > 0) {
return true;
} else {
return dir.fileExists(getDelFileName());
}
}
void advanceDelGen() {
// delGen 0 is reserved for pre-LOCKLESS format
if (delGen == -1) {
delGen = 1;
} else {
delGen++;
}
}
void clearDelGen() {
delGen = -1;
}
String getDelFileName() {
if (delGen == -1) {
// In this case we know there is no deletion filename
// against this segment
return null;
} else {
// If delGen is 0, it's the pre-lockless-commit file format
return IndexFileNames.fileNameFromGeneration(name, ".del", delGen);
}
}
/**
* Returns true if this field for this segment has saved a separate norms file (_<segment>_N.sX).
*
* @param fieldNumber the field index to check
*/
boolean hasSeparateNorms(int fieldNumber)
throws IOException {
if ((normGen == null && preLockless) || (normGen != null && normGen[fieldNumber] == 0)) {
// Must fallback to directory file exists check:
String fileName = name + ".s" + fieldNumber;
return dir.fileExists(fileName);
} else if (normGen == null || normGen[fieldNumber] == -1) {
return false;
} else {
return true;
}
}
/**
* Returns true if any fields in this segment have separate norms.
*/
boolean hasSeparateNorms()
throws IOException {
if (normGen == null) {
if (!preLockless) {
// This means we were created w/ LOCKLESS code and no
// norms are written yet:
return false;
} else {
// This means this segment was saved with pre-LOCKLESS
// code. So we must fallback to the original
// directory list check:
String[] result = dir.list();
String pattern;
pattern = name + ".s";
int patternLength = pattern.length();
for(int i = 0; i < result.length; i++){
if(result[i].startsWith(pattern) && Character.isDigit(result[i].charAt(patternLength)))
return true;
}
return false;
}
} else {
// This means this segment was saved with LOCKLESS
// code so we first check whether any normGen's are >
// 0 (meaning they definitely have separate norms):
for(int i=0;i<normGen.length;i++) {
if (normGen[i] > 0) {
return true;
}
}
// Next we look for any == 0. These cases were
// pre-LOCKLESS and must be checked in directory:
for(int i=0;i<normGen.length;i++) {
if (normGen[i] == 0) {
if (dir.fileExists(getNormFileName(i))) {
return true;
}
}
}
}
return false;
}
/**
* Increment the generation count for the norms file for
* this field.
*
* @param fieldIndex field whose norm file will be rewritten
*/
void advanceNormGen(int fieldIndex) {
if (normGen[fieldIndex] == -1) {
normGen[fieldIndex] = 1;
} else {
normGen[fieldIndex]++;
}
}
/**
* Get the file name for the norms file for this field.
*
* @param number field index
*/
String getNormFileName(int number) throws IOException {
String prefix;
long gen;
if (normGen == null) {
gen = 0;
} else {
gen = normGen[number];
}
if (hasSeparateNorms(number)) {
prefix = ".s";
return IndexFileNames.fileNameFromGeneration(name, prefix + number, gen);
} else {
prefix = ".f";
return IndexFileNames.fileNameFromGeneration(name, prefix + number, 0);
}
}
/**
* Mark whether this segment is stored as a compound file.
*
* @param isCompoundFile true if this is a compound file;
* else, false
*/
void setUseCompoundFile(boolean isCompoundFile) {
if (isCompoundFile) {
this.isCompoundFile = 1;
} else {
this.isCompoundFile = -1;
}
}
/**
* Returns true if this segment is stored as a compound
* file; else, false.
*
* @param directory directory to check. This parameter is
* only used when the segment was written before version
* XXX (at which point compound file or not became stored
* in the segments info file).
*/
boolean getUseCompoundFile() throws IOException {
if (isCompoundFile == -1) {
return false;
} else if (isCompoundFile == 1) {
return true;
} else {
return dir.fileExists(name + ".cfs");
}
}
/**
* Save this segment's info.
*/
void write(IndexOutput output)
throws IOException {
output.writeString(name);
output.writeInt(docCount);
output.writeLong(delGen);
if (normGen == null) {
output.writeInt(-1);
} else {
output.writeInt(normGen.length);
for(int j=0;j<normGen.length;j++) {
output.writeLong(normGen[j]);
}
}
output.writeByte(isCompoundFile);
}
}

View File

@@ -19,36 +19,151 @@ package org.apache.lucene.index;
import java.util.Vector;
import java.io.IOException;
import java.io.PrintStream;
import java.io.File;
import java.io.FileNotFoundException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.util.Constants;
final class SegmentInfos extends Vector {
public final class SegmentInfos extends Vector {
/** The file format version, a negative number. */
/* Works since counter, the old 1st entry, is always >= 0 */
public static final int FORMAT = -1;
/** This is the current file format written. It differs
* slightly from the previous format in that file names
* are never re-used (write once). Instead, each file is
* written to the next generation. For example,
* segments_1, segments_2, etc. This allows us to not use
* a commit lock. See <a
* href="http://lucene.apache.org/java/docs/fileformats.html">file
* formats</a> for details.
*/
public static final int FORMAT_LOCKLESS = -2;
public int counter = 0; // used to name new segments
/**
Counts how often the index has been changed by adding or deleting docs.
Starting with the current time in milliseconds forces the creation of unique version numbers.
*/
private long version = System.currentTimeMillis();
private long generation = 0; // generation of the "segments_N" file we read
/**
* If non-null, information about loading segments_N files
* will be printed here. @see #setInfoStream.
*/
private static PrintStream infoStream;
public final SegmentInfo info(int i) {
return (SegmentInfo) elementAt(i);
}
public final void read(Directory directory) throws IOException {
IndexInput input = directory.openInput(IndexFileNames.SEGMENTS);
/**
* Get the generation (N) of the current segments_N file
* from a list of files.
*
* @param files -- array of file names to check
*/
public static long getCurrentSegmentGeneration(String[] files) {
if (files == null) {
return -1;
}
long max = -1;
int prefixLen = IndexFileNames.SEGMENTS.length()+1;
for (int i = 0; i < files.length; i++) {
String file = files[i];
if (file.startsWith(IndexFileNames.SEGMENTS) && !file.equals(IndexFileNames.SEGMENTS_GEN)) {
if (file.equals(IndexFileNames.SEGMENTS)) {
// Pre lock-less commits:
if (max == -1) {
max = 0;
}
} else {
long v = Long.parseLong(file.substring(prefixLen), Character.MAX_RADIX);
if (v > max) {
max = v;
}
}
}
}
return max;
}
/**
* Get the generation (N) of the current segments_N file
* in the directory.
*
* @param directory -- directory to search for the latest segments_N file
*/
public static long getCurrentSegmentGeneration(Directory directory) throws IOException {
String[] files = directory.list();
if (files == null)
throw new IOException("Cannot read directory " + directory);
return getCurrentSegmentGeneration(files);
}
/**
* Get the filename of the current segments_N file
* from a list of files.
*
* @param files -- array of file names to check
*/
public static String getCurrentSegmentFileName(String[] files) throws IOException {
return IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS,
"",
getCurrentSegmentGeneration(files));
}
/**
* Get the filename of the current segments_N file
* in the directory.
*
* @param directory -- directory to search for the latest segments_N file
*/
public static String getCurrentSegmentFileName(Directory directory) throws IOException {
return IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS,
"",
getCurrentSegmentGeneration(directory));
}
/**
* Get the segments_N filename in use by this SegmentInfos.
*/
public String getCurrentSegmentFileName() {
return IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS,
"",
generation);
}
/**
* Read a particular segmentFileName. Note that this may
* throw an IOException if a commit is in process.
*
* @param directory -- directory containing the segments file
* @param segmentFileName -- segment file to load
*/
public final void read(Directory directory, String segmentFileName) throws IOException {
boolean success = false;
IndexInput input = directory.openInput(segmentFileName);
if (segmentFileName.equals(IndexFileNames.SEGMENTS)) {
generation = 0;
} else {
generation = Long.parseLong(segmentFileName.substring(1+IndexFileNames.SEGMENTS.length()),
Character.MAX_RADIX);
}
try {
int format = input.readInt();
if(format < 0){ // file contains explicit format info
// check that it is a format we can understand
if (format < FORMAT)
if (format < FORMAT_LOCKLESS)
throw new IOException("Unknown format version: " + format);
version = input.readLong(); // read version
counter = input.readInt(); // read counter
@@ -58,9 +173,7 @@ final class SegmentInfos extends Vector {
}
for (int i = input.readInt(); i > 0; i--) { // read segmentInfos
SegmentInfo si =
new SegmentInfo(input.readString(), input.readInt(), directory);
addElement(si);
addElement(new SegmentInfo(directory, format, input));
}
if(format >= 0){ // in old format the version number may be at the end of the file
@@ -69,31 +182,71 @@ final class SegmentInfos extends Vector {
else
version = input.readLong(); // read version
}
success = true;
}
finally {
input.close();
if (!success) {
// Clear any segment infos we had loaded so we
// have a clean slate on retry:
clear();
}
}
}
/**
* This version of read uses the retry logic (for lock-less
* commits) to find the right segments file to load.
*/
public final void read(Directory directory) throws IOException {
generation = -1;
new FindSegmentsFile(directory) {
public Object doBody(String segmentFileName) throws IOException {
read(directory, segmentFileName);
return null;
}
}.run();
}
public final void write(Directory directory) throws IOException {
IndexOutput output = directory.createOutput("segments.new");
// Always advance the generation on write:
if (generation == -1) {
generation = 1;
} else {
generation++;
}
String segmentFileName = getCurrentSegmentFileName();
IndexOutput output = directory.createOutput(segmentFileName);
try {
output.writeInt(FORMAT); // write FORMAT
output.writeLong(++version); // every write changes the index
output.writeInt(FORMAT_LOCKLESS); // write FORMAT
output.writeLong(++version); // every write changes
// the index
output.writeInt(counter); // write counter
output.writeInt(size()); // write infos
for (int i = 0; i < size(); i++) {
SegmentInfo si = info(i);
output.writeString(si.name);
output.writeInt(si.docCount);
si.write(output);
}
}
finally {
output.close();
}
// install new segment info
directory.renameFile("segments.new", IndexFileNames.SEGMENTS);
try {
output = directory.createOutput(IndexFileNames.SEGMENTS_GEN);
output.writeInt(FORMAT_LOCKLESS);
output.writeLong(generation);
output.writeLong(generation);
output.close();
} catch (IOException e) {
// It's OK if we fail to write this file since it's
// used only as one of the retry fallbacks.
}
}
/**
@@ -108,30 +261,322 @@
*/
public static long readCurrentVersion(Directory directory)
throws IOException {
return ((Long) new FindSegmentsFile(directory) {
public Object doBody(String segmentFileName) throws IOException {
IndexInput input = directory.openInput(segmentFileName);
int format = 0;
long version = 0;
try {
format = input.readInt();
if(format < 0){
if (format < FORMAT_LOCKLESS)
throw new IOException("Unknown format version: " + format);
version = input.readLong(); // read version
}
}
finally {
input.close();
}
if(format < 0)
return new Long(version);
// We cannot be sure about the format of the file.
// Therefore we have to read the whole file and cannot simply seek to the version entry.
SegmentInfos sis = new SegmentInfos();
sis.read(directory, segmentFileName);
return new Long(sis.getVersion());
}
}.run()).longValue();
}
/** If non-null, information about retries when loading
* the segments file will be printed to this.
*/
public static void setInfoStream(PrintStream infoStream) {
SegmentInfos.infoStream = infoStream;
}
/* Advanced configuration of retry logic in loading
segments_N file */
private static int defaultGenFileRetryCount = 10;
private static int defaultGenFileRetryPauseMsec = 50;
private static int defaultGenLookaheadCount = 10;
/**
* Advanced: set how many times to try loading the
* segments.gen file contents to determine current segment
* generation. This file is only referenced when the
* primary method (listing the directory) fails.
*/
public static void setDefaultGenFileRetryCount(int count) {
defaultGenFileRetryCount = count;
}
/**
* @see #setDefaultGenFileRetryCount
*/
public static int getDefaultGenFileRetryCount() {
return defaultGenFileRetryCount;
}
/**
* Advanced: set how many milliseconds to pause in between
* attempts to load the segments.gen file.
*/
public static void setDefaultGenFileRetryPauseMsec(int msec) {
defaultGenFileRetryPauseMsec = msec;
}
/**
* @see #setDefaultGenFileRetryPauseMsec
*/
public static int getDefaultGenFileRetryPauseMsec() {
return defaultGenFileRetryPauseMsec;
}
/**
* Advanced: set how many times to try incrementing the
* gen when loading the segments file. This only runs if
* the primary (listing directory) and secondary (opening
* segments.gen file) methods fail to find the segments
* file.
*/
public static void setDefaultGenLookaheadCount(int count) {
defaultGenLookaheadCount = count;
}
/**
* @see #setDefaultGenLookaheadCount
*/
public static int getDefaultGenLookahedCount() {
return defaultGenLookaheadCount;
}
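For example, an application whose index lives on a filesystem with aggressive caching might tune these knobs before opening a reader (a sketch; the values are arbitrary):

SegmentInfos.setInfoStream(System.err);             // log retry activity
SegmentInfos.setDefaultGenFileRetryCount(20);       // try segments.gen up to 20 times
SegmentInfos.setDefaultGenFileRetryPauseMsec(100);  // pause 100 msec between tries
SegmentInfos.setDefaultGenLookaheadCount(20);       // probe up to 20 generations ahead
IndexReader reader = IndexReader.open("/path/to/index");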
/**
* @see #setInfoStream
*/
public static PrintStream getInfoStream() {
return infoStream;
}
private static void message(String message) {
if (infoStream != null) {
infoStream.println(Thread.currentThread().getName() + ": " + message);
}
}
/**
* Utility class for executing code that needs to do
* something with the current segments file. This is
* necessary with lock-less commits because from the time
* you locate the current segments file name, until you
* actually open it, read its contents, or check modified
* time, etc., it could have been deleted due to a writer
* commit finishing.
*/
public abstract static class FindSegmentsFile {
File fileDirectory;
Directory directory;
public FindSegmentsFile(File directory) {
this.fileDirectory = directory;
}
public FindSegmentsFile(Directory directory) {
this.directory = directory;
}
public Object run() throws IOException {
String segmentFileName = null;
long lastGen = -1;
long gen = 0;
int genLookaheadCount = 0;
IOException exc = null;
boolean retry = false;
int method = 0;
// Loop until we succeed in calling doBody() without
// hitting an IOException. An IOException most likely
// means a commit was in process and has finished, in
// the time it took us to load the now-old infos files
// (and segments files). It's also possible it's a
// true error (corrupt index). To distinguish these,
// on each retry we must see "forward progress" on
// which generation we are trying to load. If we
// don't, then the original error is real and we throw
// it.
IndexInput input = directory.openInput(IndexFileNames.SEGMENTS);
int format = 0;
long version = 0;
try {
format = input.readInt();
if(format < 0){
if (format < FORMAT)
throw new IOException("Unknown format version: " + format);
version = input.readLong(); // read version
// We have three methods for determining the current
// generation. We try each in sequence.
while(true) {
// Method 1: list the directory and use the highest
// segments_N file. This method works well as long
// as there is no stale caching on the directory
// contents:
String[] files = null;
if (0 == method) {
if (directory != null) {
files = directory.list();
} else {
files = fileDirectory.list();
}
gen = getCurrentSegmentGeneration(files);
if (gen == -1) {
String s = "";
for(int i=0;i<files.length;i++) {
s += " " + files[i];
}
throw new FileNotFoundException("no segments* file found: files:" + s);
}
}
// Method 2 (fallback if Method 1 isn't reliable):
// if the directory listing seems to be stale, then
// try loading the "segments.gen" file.
if (1 == method || (0 == method && lastGen == gen && retry)) {
method = 1;
for(int i=0;i<defaultGenFileRetryCount;i++) {
IndexInput genInput = null;
try {
genInput = directory.openInput(IndexFileNames.SEGMENTS_GEN);
} catch (IOException e) {
message("segments.gen open: IOException " + e);
}
if (genInput != null) {
try {
int version = genInput.readInt();
if (version == FORMAT_LOCKLESS) {
long gen0 = genInput.readLong();
long gen1 = genInput.readLong();
message("fallback check: " + gen0 + "; " + gen1);
if (gen0 == gen1) {
// The file is consistent.
if (gen0 > gen) {
message("fallback to '" + IndexFileNames.SEGMENTS_GEN + "' check: now try generation " + gen0 + " > " + gen);
gen = gen0;
}
break;
}
}
} catch (IOException err2) {
// will retry
} finally {
genInput.close();
}
}
try {
Thread.sleep(defaultGenFileRetryPauseMsec);
} catch (InterruptedException e) {
// will retry
}
}
}
// Method 3 (fallback if Methods 2 & 3 are not
// reliabel): since both directory cache and file
// contents cache seem to be stale, just advance the
// generation.
if (2 == method || (1 == method && lastGen == gen && retry)) {
method = 2;
if (genLookaheadCount < defaultGenLookaheadCount) {
gen++;
genLookaheadCount++;
message("look ahead incremenent gen to " + gen);
}
}
if (lastGen == gen) {
// This means we're about to try the same
// segments_N we last tried. This is allowed,
// exactly once, because the writer could have been in
// the process of writing segments_N last time.
if (retry) {
// OK, we've tried the same segments_N file
// twice in a row, so this must be a real
// error. We throw the original exception we
// got.
throw exc;
} else {
retry = true;
}
} else {
// Segment file has advanced since our last loop, so
// reset retry:
retry = false;
}
lastGen = gen;
segmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS,
"",
gen);
try {
Object v = doBody(segmentFileName);
if (exc != null) {
message("success on " + segmentFileName);
}
return v;
} catch (IOException err) {
// Save the original root cause:
if (exc == null) {
exc = err;
}
message("primary Exception on '" + segmentFileName + "': " + err + "'; will retry: retry=" + retry + "; gen = " + gen);
if (!retry && gen > 1) {
// This is our first time trying this segments
// file (because retry is false), and, there is
// possibly a segments_(N-1) (because gen > 1).
// So, check if the segments_(N-1) exists and
// try it if so:
String prevSegmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS,
"",
gen-1);
if (directory.fileExists(prevSegmentFileName)) {
message("fallback to prior segment file '" + prevSegmentFileName + "'");
try {
Object v = doBody(prevSegmentFileName);
if (exc != null) {
message("success on fallback " + prevSegmentFileName);
}
return v;
} catch (IOException err2) {
message("secondary Exception on '" + prevSegmentFileName + "': " + err2 + "'; will retry");
}
}
}
}
}
}
finally {
input.close();
}
if(format < 0)
return version;
// We cannot be sure about the format of the file.
// Therefore we have to read the whole file and cannot simply seek to the version entry.
SegmentInfos sis = new SegmentInfos();
sis.read(directory);
return sis.getVersion();
}
/**
* Subclass must implement this. The assumption is an
* IOException will be thrown if something goes wrong
* during the processing that could have been caused by
* a writer committing.
*/
protected abstract Object doBody(String segmentFileName) throws IOException;
}
}
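A minimal usage sketch (not part of this commit's diff): subclass FindSegmentsFile, put the real work in doBody(), and let run() drive the retry logic. The read(Directory, String) overload used below is an assumption.

  // Hypothetical caller; everything outside FindSegmentsFile/doBody/run is assumed.
  final Directory dir = FSDirectory.getDirectory("/path/to/index", false);
  SegmentInfos.FindSegmentsFile finder = new SegmentInfos.FindSegmentsFile(dir) {
    protected Object doBody(String segmentFileName) throws IOException {
      // If a commit finishes while we read, this may throw IOException;
      // run() then retries against the next generation.
      SegmentInfos infos = new SegmentInfos();
      infos.read(dir, segmentFileName);   // assumed overload taking an explicit file name
      return infos;
    }
  };
  SegmentInfos infos = (SegmentInfos) finder.run();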

View File

@ -33,6 +33,7 @@ import java.util.*;
*/
class SegmentReader extends IndexReader {
private String segment;
private SegmentInfo si;
FieldInfos fieldInfos;
private FieldsReader fieldsReader;
@ -64,22 +65,24 @@ class SegmentReader extends IndexReader {
private boolean dirty;
private int number;
private void reWrite() throws IOException {
private void reWrite(SegmentInfo si) throws IOException {
// NOTE: norms are re-written in regular directory, not cfs
IndexOutput out = directory().createOutput(segment + ".tmp");
String oldFileName = si.getNormFileName(this.number);
if (oldFileName != null) {
// Mark this file for deletion. Note that we don't
// actually try to delete it until the new segments file is
// successfully written:
deleter.addPendingFile(oldFileName);
}
si.advanceNormGen(this.number);
IndexOutput out = directory().createOutput(si.getNormFileName(this.number));
try {
out.writeBytes(bytes, maxDoc());
} finally {
out.close();
}
String fileName;
if(cfsReader == null)
fileName = segment + ".f" + number;
else{
// use a different file name if we have compound format
fileName = segment + ".s" + number;
}
directory().renameFile(segment + ".tmp", fileName);
this.dirty = false;
}
}
@ -131,57 +134,94 @@ class SegmentReader extends IndexReader {
return instance;
}
private void initialize(SegmentInfo si) throws IOException {
private void initialize(SegmentInfo si) throws IOException {
segment = si.name;
this.si = si;
// Use compound file directory for some files, if it exists
Directory cfsDir = directory();
if (directory().fileExists(segment + ".cfs")) {
cfsReader = new CompoundFileReader(directory(), segment + ".cfs");
cfsDir = cfsReader;
}
boolean success = false;
// No compound file exists - use the multi-file format
fieldInfos = new FieldInfos(cfsDir, segment + ".fnm");
fieldsReader = new FieldsReader(cfsDir, segment, fieldInfos);
try {
// Use compound file directory for some files, if it exists
Directory cfsDir = directory();
if (si.getUseCompoundFile()) {
cfsReader = new CompoundFileReader(directory(), segment + ".cfs");
cfsDir = cfsReader;
}
tis = new TermInfosReader(cfsDir, segment, fieldInfos);
// No compound file exists - use the multi-file format
fieldInfos = new FieldInfos(cfsDir, segment + ".fnm");
fieldsReader = new FieldsReader(cfsDir, segment, fieldInfos);
// NOTE: the bitvector is stored using the regular directory, not cfs
if (hasDeletions(si))
deletedDocs = new BitVector(directory(), segment + ".del");
tis = new TermInfosReader(cfsDir, segment, fieldInfos);
// NOTE: the bitvector is stored using the regular directory, not cfs
if (hasDeletions(si)) {
deletedDocs = new BitVector(directory(), si.getDelFileName());
}
// make sure that all index files have been read or are kept open
// so that if an index update removes them we'll still have them
freqStream = cfsDir.openInput(segment + ".frq");
proxStream = cfsDir.openInput(segment + ".prx");
openNorms(cfsDir);
// make sure that all index files have been read or are kept open
// so that if an index update removes them we'll still have them
freqStream = cfsDir.openInput(segment + ".frq");
proxStream = cfsDir.openInput(segment + ".prx");
openNorms(cfsDir);
if (fieldInfos.hasVectors()) { // open term vector files only as needed
termVectorsReaderOrig = new TermVectorsReader(cfsDir, segment, fieldInfos);
if (fieldInfos.hasVectors()) { // open term vector files only as needed
termVectorsReaderOrig = new TermVectorsReader(cfsDir, segment, fieldInfos);
}
success = true;
} finally {
// With lock-less commits, it's entirely possible (and
// fine) to hit a FileNotFound exception above. In
// this case, we want to explicitly close any subset
// of things that were opened so that we don't have to
// wait for a GC to do so.
if (!success) {
doClose();
}
}
}
protected void finalize() {
protected void finalize() {
// patch for pre-1.4.2 JVMs, whose ThreadLocals leak
termVectorsLocal.set(null);
super.finalize();
}
}
protected void doCommit() throws IOException {
if (deletedDocsDirty) { // re-write deleted
deletedDocs.write(directory(), segment + ".tmp");
directory().renameFile(segment + ".tmp", segment + ".del");
String oldDelFileName = si.getDelFileName();
if (oldDelFileName != null) {
// Mark this file for deletion. Note that we don't
// actually try to delete it until the new segments file is
// successfully written:
deleter.addPendingFile(oldDelFileName);
}
si.advanceDelGen();
// We can write directly to the actual name (vs to a
// .tmp & renaming it) because the file is not live
// until segments file is written:
deletedDocs.write(directory(), si.getDelFileName());
}
if(undeleteAll && directory().fileExists(segment + ".del")){
directory().deleteFile(segment + ".del");
if (undeleteAll && si.hasDeletions()) {
String oldDelFileName = si.getDelFileName();
if (oldDelFileName != null) {
// Mark this file for deletion. Note that we don't
// actually try to delete it until the new segments file is
// successfully written:
deleter.addPendingFile(oldDelFileName);
}
si.clearDelGen();
}
if (normsDirty) { // re-write norms
si.setNumField(fieldInfos.size());
Enumeration values = norms.elements();
while (values.hasMoreElements()) {
Norm norm = (Norm) values.nextElement();
if (norm.dirty) {
norm.reWrite();
norm.reWrite(si);
}
}
}
@ -191,8 +231,12 @@ class SegmentReader extends IndexReader {
}
protected void doClose() throws IOException {
fieldsReader.close();
tis.close();
if (fieldsReader != null) {
fieldsReader.close();
}
if (tis != null) {
tis.close();
}
if (freqStream != null)
freqStream.close();
@ -209,27 +253,19 @@ class SegmentReader extends IndexReader {
}
static boolean hasDeletions(SegmentInfo si) throws IOException {
return si.dir.fileExists(si.name + ".del");
return si.hasDeletions();
}
public boolean hasDeletions() {
return deletedDocs != null;
}
static boolean usesCompoundFile(SegmentInfo si) throws IOException {
return si.dir.fileExists(si.name + ".cfs");
return si.getUseCompoundFile();
}
static boolean hasSeparateNorms(SegmentInfo si) throws IOException {
String[] result = si.dir.list();
String pattern = si.name + ".s";
int patternLength = pattern.length();
for(int i = 0; i < result.length; i++){
if(result[i].startsWith(pattern) && Character.isDigit(result[i].charAt(patternLength)))
return true;
}
return false;
return si.hasSeparateNorms();
}
protected void doDelete(int docNum) {
@ -249,23 +285,27 @@ class SegmentReader extends IndexReader {
Vector files() throws IOException {
Vector files = new Vector(16);
for (int i = 0; i < IndexFileNames.INDEX_EXTENSIONS.length; i++) {
String name = segment + "." + IndexFileNames.INDEX_EXTENSIONS[i];
if (directory().fileExists(name))
if (si.getUseCompoundFile()) {
String name = segment + ".cfs";
if (directory().fileExists(name)) {
files.addElement(name);
}
} else {
for (int i = 0; i < IndexFileNames.INDEX_EXTENSIONS.length; i++) {
String name = segment + "." + IndexFileNames.INDEX_EXTENSIONS[i];
if (directory().fileExists(name))
files.addElement(name);
}
}
if (si.hasDeletions()) {
files.addElement(si.getDelFileName());
}
for (int i = 0; i < fieldInfos.size(); i++) {
FieldInfo fi = fieldInfos.fieldInfo(i);
if (fi.isIndexed && !fi.omitNorms){
String name;
if(cfsReader == null)
name = segment + ".f" + i;
else
name = segment + ".s" + i;
if (directory().fileExists(name))
String name = si.getNormFileName(i);
if (name != null && directory().fileExists(name))
files.addElement(name);
}
}
return files;
}
@ -380,7 +420,6 @@ class SegmentReader extends IndexReader {
protected synchronized byte[] getNorms(String field) throws IOException {
Norm norm = (Norm) norms.get(field);
if (norm == null) return null; // not indexed, or norms not stored
if (norm.bytes == null) { // value not yet read
byte[] bytes = new byte[maxDoc()];
norms(field, bytes, 0);
@ -436,12 +475,10 @@ class SegmentReader extends IndexReader {
for (int i = 0; i < fieldInfos.size(); i++) {
FieldInfo fi = fieldInfos.fieldInfo(i);
if (fi.isIndexed && !fi.omitNorms) {
// look first if there are separate norms in compound format
String fileName = segment + ".s" + fi.number;
Directory d = directory();
if(!d.fileExists(fileName)){
fileName = segment + ".f" + fi.number;
d = cfsDir;
String fileName = si.getNormFileName(fi.number);
if (!si.hasSeparateNorms(fi.number)) {
d = cfsDir;
}
norms.put(fi.name, new Norm(d.openInput(fileName), fi.number));
}

View File

@ -128,7 +128,7 @@ public class FSDirectory extends Directory {
* @return the FSDirectory for the named file. */
public static FSDirectory getDirectory(String path, boolean create)
throws IOException {
return getDirectory(path, create, null);
return getDirectory(new File(path), create, null, true);
}
/** Returns the directory instance for the named location, using the
@ -143,10 +143,16 @@ public class FSDirectory extends Directory {
* @param lockFactory instance of {@link LockFactory} providing the
* locking implementation.
* @return the FSDirectory for the named file. */
public static FSDirectory getDirectory(String path, boolean create,
LockFactory lockFactory, boolean doRemoveOldFiles)
throws IOException {
return getDirectory(new File(path), create, lockFactory, doRemoveOldFiles);
}
public static FSDirectory getDirectory(String path, boolean create,
LockFactory lockFactory)
throws IOException {
return getDirectory(new File(path), create, lockFactory);
return getDirectory(new File(path), create, lockFactory, true);
}
/** Returns the directory instance for the named location.
@ -158,9 +164,9 @@ public class FSDirectory extends Directory {
* @param file the path to the directory.
* @param create if true, create, or erase any existing contents.
* @return the FSDirectory for the named file. */
public static FSDirectory getDirectory(File file, boolean create)
public static FSDirectory getDirectory(File file, boolean create, boolean doRemoveOldFiles)
throws IOException {
return getDirectory(file, create, null);
return getDirectory(file, create, null, doRemoveOldFiles);
}
/** Returns the directory instance for the named location, using the
@ -176,7 +182,7 @@ public class FSDirectory extends Directory {
* locking implementation.
* @return the FSDirectory for the named file. */
public static FSDirectory getDirectory(File file, boolean create,
LockFactory lockFactory)
LockFactory lockFactory, boolean doRemoveOldFiles)
throws IOException {
file = new File(file.getCanonicalPath());
FSDirectory dir;
@ -188,7 +194,7 @@ public class FSDirectory extends Directory {
} catch (Exception e) {
throw new RuntimeException("cannot load FSDirectory class: " + e.toString(), e);
}
dir.init(file, create, lockFactory);
dir.init(file, create, lockFactory, doRemoveOldFiles);
DIRECTORIES.put(file, dir);
} else {
@ -199,7 +205,7 @@ public class FSDirectory extends Directory {
}
if (create) {
dir.create();
dir.create(doRemoveOldFiles);
}
}
}
@ -209,23 +215,35 @@ public class FSDirectory extends Directory {
return dir;
}
public static FSDirectory getDirectory(File file, boolean create,
LockFactory lockFactory)
throws IOException
{
return getDirectory(file, create, lockFactory, true);
}
public static FSDirectory getDirectory(File file, boolean create)
throws IOException {
return getDirectory(file, create, true);
}
private File directory = null;
private int refCount;
protected FSDirectory() {}; // permit subclassing
private void init(File path, boolean create) throws IOException {
private void init(File path, boolean create, boolean doRemoveOldFiles) throws IOException {
directory = path;
if (create) {
create();
create(doRemoveOldFiles);
}
if (!directory.isDirectory())
throw new IOException(path + " not a directory");
}
private void init(File path, boolean create, LockFactory lockFactory) throws IOException {
private void init(File path, boolean create, LockFactory lockFactory, boolean doRemoveOldFiles) throws IOException {
// Set up lockFactory with cascaded defaults: if an instance was passed in,
// use that; else if locks are disabled, use NoLockFactory; else if the
@ -280,10 +298,10 @@ public class FSDirectory extends Directory {
setLockFactory(lockFactory);
init(path, create);
init(path, create, doRemoveOldFiles);
}
private synchronized void create() throws IOException {
private synchronized void create(boolean doRemoveOldFiles) throws IOException {
if (!directory.exists())
if (!directory.mkdirs())
throw new IOException("Cannot create directory: " + directory);
@ -291,13 +309,15 @@ public class FSDirectory extends Directory {
if (!directory.isDirectory())
throw new IOException(directory + " not a directory");
String[] files = directory.list(new IndexFileNameFilter()); // clear old files
if (files == null)
throw new IOException("Cannot read directory " + directory.getAbsolutePath());
for (int i = 0; i < files.length; i++) {
File file = new File(directory, files[i]);
if (!file.delete())
throw new IOException("Cannot delete " + file);
if (doRemoveOldFiles) {
String[] files = directory.list(IndexFileNameFilter.getFilter()); // clear old files
if (files == null)
throw new IOException("Cannot read directory " + directory.getAbsolutePath());
for (int i = 0; i < files.length; i++) {
File file = new File(directory, files[i]);
if (!file.delete())
throw new IOException("Cannot delete " + file);
}
}
lockFactory.clearAllLocks();
@ -305,7 +325,7 @@ public class FSDirectory extends Directory {
/** Returns an array of strings, one for each Lucene index file in the directory. */
public String[] list() {
return directory.list(new IndexFileNameFilter());
return directory.list(IndexFileNameFilter.getFilter());
}
/** Returns true iff a file with the given name exists. */

View File

@ -18,6 +18,7 @@ package org.apache.lucene.store;
*/
import java.io.IOException;
import java.io.FileNotFoundException;
import java.io.File;
import java.io.Serializable;
import java.util.Hashtable;
@ -105,7 +106,7 @@ public final class RAMDirectory extends Directory implements Serializable {
}
/** Returns an array of strings, one for each file in the directory. */
public final String[] list() {
public synchronized final String[] list() {
String[] result = new String[files.size()];
int i = 0;
Enumeration names = files.keys();
@ -129,7 +130,7 @@ public final class RAMDirectory extends Directory implements Serializable {
/** Set the modified time of an existing file to now. */
public void touchFile(String name) {
// final boolean MONITOR = false;
RAMFile file = (RAMFile)files.get(name);
long ts2, ts1 = System.currentTimeMillis();
do {
@ -175,8 +176,11 @@ public final class RAMDirectory extends Directory implements Serializable {
}
/** Returns a stream reading an existing file. */
public final IndexInput openInput(String name) {
public final IndexInput openInput(String name) throws IOException {
RAMFile file = (RAMFile)files.get(name);
if (file == null) {
throw new FileNotFoundException(name);
}
return new RAMInputStream(file);
}

View File

@ -32,6 +32,7 @@ import org.apache.lucene.document.Field;
import java.util.Collection;
import java.io.IOException;
import java.io.FileNotFoundException;
import java.io.File;
public class TestIndexReader extends TestCase
@ -222,6 +223,11 @@ public class TestIndexReader extends TestCase
assertEquals("deleted count", 100, deleted);
assertEquals("deleted docFreq", 100, reader.docFreq(searchTerm));
assertTermDocsCount("deleted termDocs", reader, searchTerm, 0);
// open a 2nd reader to make sure first reader can
// commit its changes (.del) while second reader
// is open:
IndexReader reader2 = IndexReader.open(dir);
reader.close();
// CREATE A NEW READER and re-test
@ -231,10 +237,73 @@ public class TestIndexReader extends TestCase
reader.close();
}
// Make sure you can set norms & commit even if a reader
// is open against the index:
public void testWritingNorms() throws IOException
{
String tempDir = System.getProperty("tempDir");
if (tempDir == null)
throw new IOException("tempDir undefined, cannot run test");
File indexDir = new File(tempDir, "lucenetestnormwriter");
Directory dir = FSDirectory.getDirectory(indexDir, true);
IndexWriter writer = null;
IndexReader reader = null;
Term searchTerm = new Term("content", "aaa");
// add 1 document with term: aaa
writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
addDoc(writer, searchTerm.text());
writer.close();
// now open reader & set norm for doc 0
reader = IndexReader.open(dir);
reader.setNorm(0, "content", (float) 2.0);
// we should be holding the write lock now:
assertTrue("locked", IndexReader.isLocked(dir));
reader.commit();
// we should not be holding the write lock now:
assertTrue("not locked", !IndexReader.isLocked(dir));
// open a 2nd reader:
IndexReader reader2 = IndexReader.open(dir);
// set norm again for doc 0
reader.setNorm(0, "content", (float) 3.0);
assertTrue("locked", IndexReader.isLocked(dir));
reader.close();
// we should not be holding the write lock now:
assertTrue("not locked", !IndexReader.isLocked(dir));
reader2.close();
dir.close();
rmDir(indexDir);
}
public void testDeleteReaderWriterConflictUnoptimized() throws IOException{
deleteReaderWriterConflict(false);
}
public void testOpenEmptyDirectory() throws IOException{
String dirName = "test.empty";
File fileDirName = new File(dirName);
if (!fileDirName.exists()) {
fileDirName.mkdir();
}
try {
IndexReader reader = IndexReader.open(fileDirName);
fail("opening IndexReader on empty directory failed to produce FileNotFoundException");
} catch (FileNotFoundException e) {
// GOOD
}
}
public void testDeleteReaderWriterConflictOptimized() throws IOException{
deleteReaderWriterConflict(true);
@ -368,12 +437,36 @@ public class TestIndexReader extends TestCase
assertFalse(IndexReader.isLocked(dir)); // reader only, no lock
long version = IndexReader.lastModified(dir);
reader.close();
// modify index and check version has been incremented:
// modify index and check version has been
// incremented:
writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
addDocumentWithFields(writer);
writer.close();
reader = IndexReader.open(dir);
assertTrue(version < IndexReader.getCurrentVersion(dir));
assertTrue("old lastModified is " + version + "; new lastModified is " + IndexReader.lastModified(dir), version <= IndexReader.lastModified(dir));
reader.close();
}
public void testVersion() throws IOException {
assertFalse(IndexReader.indexExists("there_is_no_such_index"));
Directory dir = new RAMDirectory();
assertFalse(IndexReader.indexExists(dir));
IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
addDocumentWithFields(writer);
assertTrue(IndexReader.isLocked(dir)); // writer open, so dir is locked
writer.close();
assertTrue(IndexReader.indexExists(dir));
IndexReader reader = IndexReader.open(dir);
assertFalse(IndexReader.isLocked(dir)); // reader only, no lock
long version = IndexReader.getCurrentVersion(dir);
reader.close();
// modify index and check version has been
// incremented:
writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
addDocumentWithFields(writer);
writer.close();
reader = IndexReader.open(dir);
assertTrue("old version is " + version + "; new version is " + IndexReader.getCurrentVersion(dir), version < IndexReader.getCurrentVersion(dir));
reader.close();
}
@ -412,6 +505,40 @@ public class TestIndexReader extends TestCase
reader.close();
}
public void testUndeleteAllAfterClose() throws IOException {
Directory dir = new RAMDirectory();
IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
addDocumentWithFields(writer);
addDocumentWithFields(writer);
writer.close();
IndexReader reader = IndexReader.open(dir);
reader.deleteDocument(0);
reader.deleteDocument(1);
reader.close();
reader = IndexReader.open(dir);
reader.undeleteAll();
assertEquals(2, reader.numDocs()); // nothing has really been deleted thanks to undeleteAll()
reader.close();
}
public void testUndeleteAllAfterCloseThenReopen() throws IOException {
Directory dir = new RAMDirectory();
IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
addDocumentWithFields(writer);
addDocumentWithFields(writer);
writer.close();
IndexReader reader = IndexReader.open(dir);
reader.deleteDocument(0);
reader.deleteDocument(1);
reader.close();
reader = IndexReader.open(dir);
reader.undeleteAll();
reader.close();
reader = IndexReader.open(dir);
assertEquals(2, reader.numDocs()); // nothing has really been deleted thanks to undeleteAll()
reader.close();
}
public void testDeleteReaderReaderConflictUnoptimized() throws IOException{
deleteReaderReaderConflict(false);
}
@ -562,4 +689,11 @@ public class TestIndexReader extends TestCase
doc.add(new Field("content", value, Field.Store.NO, Field.Index.TOKENIZED));
writer.addDocument(doc);
}
private void rmDir(File dir) {
File[] files = dir.listFiles();
for (int i = 0; i < files.length; i++) {
files[i].delete();
}
dir.delete();
}
}

View File

@ -1,6 +1,7 @@
package org.apache.lucene.index;
import java.io.IOException;
import java.io.File;
import junit.framework.TestCase;
@ -10,7 +11,10 @@ import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
/**
@ -28,14 +32,11 @@ public class TestIndexWriter extends TestCase
int i;
IndexWriter.setDefaultWriteLockTimeout(2000);
IndexWriter.setDefaultCommitLockTimeout(2000);
assertEquals(2000, IndexWriter.getDefaultWriteLockTimeout());
assertEquals(2000, IndexWriter.getDefaultCommitLockTimeout());
writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
IndexWriter.setDefaultWriteLockTimeout(1000);
IndexWriter.setDefaultCommitLockTimeout(1000);
// add 100 documents
for (i = 0; i < 100; i++) {
@ -72,6 +73,12 @@ public class TestIndexWriter extends TestCase
assertEquals(60, reader.maxDoc());
assertEquals(60, reader.numDocs());
reader.close();
// make sure opening a new index for create over
// this existing one works correctly:
writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
assertEquals(0, writer.docCount());
writer.close();
}
private void addDoc(IndexWriter writer) throws IOException
@ -80,4 +87,192 @@ public class TestIndexWriter extends TestCase
doc.add(new Field("content", "aaa", Field.Store.NO, Field.Index.TOKENIZED));
writer.addDocument(doc);
}
// Make sure we can open an index for create even when a
// reader holds it open (this fails pre lock-less
// commits on windows):
public void testCreateWithReader() throws IOException {
String tempDir = System.getProperty("java.io.tmpdir");
if (tempDir == null)
throw new IOException("java.io.tmpdir undefined, cannot run test");
File indexDir = new File(tempDir, "lucenetestindexwriter");
Directory dir = FSDirectory.getDirectory(indexDir, true);
// add one document & close writer
IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
addDoc(writer);
writer.close();
// now open reader:
IndexReader reader = IndexReader.open(dir);
assertEquals("should be one document", reader.numDocs(), 1);
// now open index for create:
writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
assertEquals("should be zero documents", writer.docCount(), 0);
addDoc(writer);
writer.close();
assertEquals("should be one document", reader.numDocs(), 1);
IndexReader reader2 = IndexReader.open(dir);
assertEquals("should be one document", reader2.numDocs(), 1);
reader.close();
reader2.close();
rmDir(indexDir);
}
// Simulate a writer that crashed while writing segments
// file: make sure we can still open the index (ie,
// gracefully fallback to the previous segments file),
// and that we can add to the index:
public void testSimulatedCrashedWriter() throws IOException {
Directory dir = new RAMDirectory();
IndexWriter writer = null;
writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
// add 100 documents
for (int i = 0; i < 100; i++) {
addDoc(writer);
}
// close
writer.close();
long gen = SegmentInfos.getCurrentSegmentGeneration(dir);
assertTrue("segment generation should be > 1 but got " + gen, gen > 1);
// Make the next segments file, with last byte
// missing, to simulate a writer that crashed while
// writing segments file:
String fileNameIn = SegmentInfos.getCurrentSegmentFileName(dir);
String fileNameOut = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS,
"",
1+gen);
IndexInput in = dir.openInput(fileNameIn);
IndexOutput out = dir.createOutput(fileNameOut);
long length = in.length();
for(int i=0;i<length-1;i++) {
out.writeByte(in.readByte());
}
in.close();
out.close();
IndexReader reader = null;
try {
reader = IndexReader.open(dir);
} catch (Exception e) {
fail("reader failed to open on a crashed index");
}
reader.close();
try {
writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
} catch (Exception e) {
fail("writer failed to open on a crashed index");
}
// add 100 documents
for (int i = 0; i < 100; i++) {
addDoc(writer);
}
// close
writer.close();
}
// Simulate a corrupt index by removing last byte of
// latest segments file and make sure we get an
// IOException trying to open the index:
public void testSimulatedCorruptIndex1() throws IOException {
Directory dir = new RAMDirectory();
IndexWriter writer = null;
writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
// add 100 documents
for (int i = 0; i < 100; i++) {
addDoc(writer);
}
// close
writer.close();
long gen = SegmentInfos.getCurrentSegmentGeneration(dir);
assertTrue("segment generation should be > 1 but got " + gen, gen > 1);
String fileNameIn = SegmentInfos.getCurrentSegmentFileName(dir);
String fileNameOut = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS,
"",
1+gen);
IndexInput in = dir.openInput(fileNameIn);
IndexOutput out = dir.createOutput(fileNameOut);
long length = in.length();
for(int i=0;i<length-1;i++) {
out.writeByte(in.readByte());
}
in.close();
out.close();
dir.deleteFile(fileNameIn);
IndexReader reader = null;
try {
reader = IndexReader.open(dir);
fail("reader did not hit IOException on opening a corrupt index");
} catch (Exception e) {
}
if (reader != null) {
reader.close();
}
}
// Simulate a corrupt index by removing one of the cfs
// files and make sure we get an IOException trying to
// open the index:
public void testSimulatedCorruptIndex2() throws IOException {
Directory dir = new RAMDirectory();
IndexWriter writer = null;
writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
// add 100 documents
for (int i = 0; i < 100; i++) {
addDoc(writer);
}
// close
writer.close();
long gen = SegmentInfos.getCurrentSegmentGeneration(dir);
assertTrue("segment generation should be > 1 but got " + gen, gen > 1);
String[] files = dir.list();
for(int i=0;i<files.length;i++) {
if (files[i].endsWith(".cfs")) {
dir.deleteFile(files[i]);
break;
}
}
IndexReader reader = null;
try {
reader = IndexReader.open(dir);
fail("reader did not hit IOException on opening a corrupt index");
} catch (Exception e) {
}
if (reader != null) {
reader.close();
}
}
private void rmDir(File dir) {
File[] files = dir.listFiles();
for (int i = 0; i < files.length; i++) {
files[i].delete();
}
dir.delete();
}
}

View File

@ -80,6 +80,21 @@ public class TestMultiReader extends TestCase {
assertEquals( 1, reader.numDocs() );
reader.undeleteAll();
assertEquals( 2, reader.numDocs() );
// Ensure undeleteAll survives commit/close/reopen:
reader.commit();
reader.close();
sis.read(dir);
reader = new MultiReader(dir, sis, false, readers);
assertEquals( 2, reader.numDocs() );
reader.deleteDocument(0);
assertEquals( 1, reader.numDocs() );
reader.commit();
reader.close();
sis.read(dir);
reader = new MultiReader(dir, sis, false, readers);
assertEquals( 1, reader.numDocs() );
}

View File

@ -58,9 +58,9 @@ public class TestLockFactory extends TestCase {
// Both write lock and commit lock should have been created:
assertEquals("# of unique locks created (after instantiating IndexWriter)",
2, lf.locksCreated.size());
assertTrue("# calls to makeLock <= 2 (after instantiating IndexWriter)",
lf.makeLockCount > 2);
1, lf.locksCreated.size());
assertTrue("# calls to makeLock is 0 (after instantiating IndexWriter)",
lf.makeLockCount >= 1);
for(Enumeration e = lf.locksCreated.keys(); e.hasMoreElements();) {
String lockName = (String) e.nextElement();
@ -90,6 +90,7 @@ public class TestLockFactory extends TestCase {
try {
writer2 = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
} catch (Exception e) {
e.printStackTrace(System.out);
fail("Should not have hit an IOException with no locking");
}
@ -234,6 +235,7 @@ public class TestLockFactory extends TestCase {
try {
writer2 = new IndexWriter(indexDirName, new WhitespaceAnalyzer(), false);
} catch (IOException e) {
e.printStackTrace(System.out);
fail("Should not have hit an IOException with locking disabled");
}
@ -266,6 +268,7 @@ public class TestLockFactory extends TestCase {
try {
fs2 = FSDirectory.getDirectory(indexDirName, true, lf);
} catch (IOException e) {
e.printStackTrace(System.out);
fail("Should not have hit an IOException because LockFactory instances are the same");
}
@ -294,7 +297,6 @@ public class TestLockFactory extends TestCase {
public void _testStressLocks(LockFactory lockFactory, String indexDirName) throws IOException {
FSDirectory fs1 = FSDirectory.getDirectory(indexDirName, true, lockFactory);
// fs1.setLockFactory(NoLockFactory.getNoLockFactory());
// First create a 1 doc index:
IndexWriter w = new IndexWriter(fs1, new WhitespaceAnalyzer(), true);
@ -405,6 +407,7 @@ public class TestLockFactory extends TestCase {
hitException = true;
System.out.println("Stress Test Index Writer: creation hit unexpected exception: " + e.toString());
e.printStackTrace(System.out);
break;
}
if (writer != null) {
try {
@ -413,6 +416,7 @@ public class TestLockFactory extends TestCase {
hitException = true;
System.out.println("Stress Test Index Writer: addDoc hit unexpected exception: " + e.toString());
e.printStackTrace(System.out);
break;
}
try {
writer.close();
@ -420,6 +424,7 @@ public class TestLockFactory extends TestCase {
hitException = true;
System.out.println("Stress Test Index Writer: close hit unexpected exception: " + e.toString());
e.printStackTrace(System.out);
break;
}
writer = null;
}
@ -446,6 +451,7 @@ public class TestLockFactory extends TestCase {
hitException = true;
System.out.println("Stress Test Index Searcher: create hit unexpected exception: " + e.toString());
e.printStackTrace(System.out);
break;
}
if (searcher != null) {
Hits hits = null;
@ -455,6 +461,7 @@ public class TestLockFactory extends TestCase {
hitException = true;
System.out.println("Stress Test Index Searcher: search hit unexpected exception: " + e.toString());
e.printStackTrace(System.out);
break;
}
// System.out.println(hits.length() + " total results");
try {
@ -463,6 +470,7 @@ public class TestLockFactory extends TestCase {
hitException = true;
System.out.println("Stress Test Index Searcher: close hit unexpected exception: " + e.toString());
e.printStackTrace(System.out);
break;
}
searcher = null;
}

View File

@ -14,7 +14,7 @@
<p>
This document defines the index file formats used
in Lucene version 2.0. If you are using a different
in Lucene version 2.1. If you are using a different
version of Lucene, please consult the copy of
<code>docs/fileformats.html</code> that was distributed
with the version you are using.
@ -43,6 +43,18 @@
describing how file formats have changed from prior versions.
</p>
<p>
In version 2.1, the file format was changed to allow
lock-less commits (ie, no more commit lock). The
change is fully backwards compatible: you can open a
pre-2.1 index for searching or adding/deleting of
docs. When the new segments file is saved
(committed), it will be written in the new file format
(meaning no specific "upgrade" process is needed).
But note that once a commit has occurred, pre-2.1
Lucene will not be able to read the index.
</p>
</section>
<section name="Definitions">
@ -260,6 +272,18 @@
required.
</p>
<p>
As of version 2.1 (lock-less commits), file names are
never re-used (there is one exception, "segments.gen",
see below). That is, when any file is saved to the
Directory it is given a never before used filename.
This is achieved using a simple generations approach.
For example, the first segments file is segments_1,
then segments_2, etc. The generation is a sequential
long integer represented in alpha-numeric (base 36)
form.
</p>
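<p>
For illustration, a minimal sketch of this mapping (the shipped
logic lives in IndexFileNames.fileNameFromGeneration(); Java's
built-in base-36 Long.toString is assumed):
</p>
  // Sketch only; see IndexFileNames.fileNameFromGeneration().
  static String segmentsFileName(long gen) {
    if (gen == 0)
      return "segments";   // generation not yet used (pre-2.1 name)
    // Character.MAX_RADIX is 36, giving the alpha-numeric (base 36) form:
    return "segments_" + Long.toString(gen, Character.MAX_RADIX);
  }
  // segmentsFileName(1) -> "segments_1", segmentsFileName(36) -> "segments_10"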
</section>
<section name="Primitive Types">
@ -696,22 +720,48 @@
<p>
The active segments in the index are stored in the
segment info file. An index only has
a single file in this format, and it is named "segments".
This lists each segment by name, and also contains the size of each
segment.
segment info file, <tt>segments_N</tt>. There may
be one or more <tt>segments_N</tt> files in the
index; however, the one with the largest
generation is the active one (when older
segments_N files are present it's because they
temporarily cannot be deleted, or, a writer is in
the process of committing). This file lists each
segment by name, has details about the separate
norms and deletion files, and also contains the
size of each segment.
</p>
<p>
As of 2.1, there is also a file
<tt>segments.gen</tt>. This file contains the
current generation (the <tt>_N</tt> in
<tt>segments_N</tt>) of the index. This is
used only as a fallback in case the current
generation cannot be accurately determined by
directory listing alone (as is the case for some
NFS clients with time-based directory cache
expiration). This file simply contains an Int32
version header (SegmentInfos.FORMAT_LOCKLESS =
-2), followed by the generation recorded as Int64,
written twice.
</p>
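<p>
A sketch of the fallback read just described, mirroring the retry
loop shown in SegmentInfos.FindSegmentsFile above:
</p>
  long gen = -1;                          // best guess so far (from listing)
  IndexInput genInput = directory.openInput("segments.gen");
  try {
    int format = genInput.readInt();      // SegmentInfos.FORMAT_LOCKLESS == -2
    if (format == -2) {
      long gen0 = genInput.readLong();
      long gen1 = genInput.readLong();
      if (gen0 == gen1) {
        // The two copies agree, so the file was fully written; take it
        // if it is newer than what the directory listing said:
        gen = Math.max(gen, gen0);
      }
      // If they disagree, a writer died mid-write: ignore the file.
    }
  } finally {
    genInput.close();
  }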
<p>
<b>Pre-2.1:</b>
Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize&gt;<sup>SegCount</sup>
</p>
<p>
Format, NameCounter, SegCount, SegSize --&gt; UInt32
<p>
<b>2.1 and above:</b>
Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize, DelGen, NumField, NormGen<sup>NumField</sup>, IsCompoundFile&gt;<sup>SegCount</sup>
</p>
<p>
Version --&gt; UInt64
Format, NameCounter, SegCount, SegSize, NumField --&gt; Int32
</p>
<p>
Version, DelGen, NormGen --&gt; Int64
</p>
<p>
@ -719,7 +769,11 @@
</p>
<p>
Format is -1 in Lucene 1.4.
IsCompoundFile --&gt; Int8
</p>
<p>
Format is -1 as of Lucene 1.4 and -2 as of Lucene 2.1.
</p>
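<p>
The following sketch (illustrative only; the actual decoding is
done by SegmentInfos.read()) shows the 2.1 field order as listed
above:
</p>
  IndexInput in = directory.openInput(segmentFileName);
  try {
    int format = in.readInt();        // FORMAT_LOCKLESS (-2) for 2.1 indexes
    long version = in.readLong();     // Version
    int nameCounter = in.readInt();   // NameCounter
    int segCount = in.readInt();      // SegCount
    for (int i = 0; i < segCount; i++) {
      String segName = in.readString();
      int segSize = in.readInt();
      long delGen = in.readLong();
      int numField = in.readInt();    // -1 means no NormGens stored
      for (int j = 0; j < numField; j++) {
        long normGen = in.readLong(); // NormGen, one per field
      }
      byte isCompoundFile = in.readByte();
    }
  } finally {
    in.close();
  }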
<p>
@ -740,65 +794,79 @@
SegSize is the number of documents contained in the segment index.
</p>
<p>
DelGen is the generation count of the separate
deletes file. If this is -1, there are no
separate deletes. If it is 0, this is a pre-2.1
segment and you must check the filesystem for the
existence of _X.del. Anything above zero means
there are separate deletes (_X_N.del).
</p>
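<p>
A hedged sketch of these rules (the shipped logic is
SegmentInfo.getDelFileName(), used elsewhere in this commit):
</p>
  static String delFileName(String segName, long delGen) {
    if (delGen == -1)
      return null;                    // no separate deletes at all
    if (delGen == 0)
      return segName + ".del";        // pre-2.1: check the filesystem
    // 2.1 and above: the generation is encoded in the name, e.g. _X_N.del
    return segName + "_" + Long.toString(delGen, Character.MAX_RADIX) + ".del";
  }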
<p>
NumField is the size of the array for NormGen, or
-1 if there are no NormGens stored.
</p>
<p>
NormGen records the generation of the separate
norms files. If NumField is -1, no NormGens are
stored; in that case they are all assumed to be 0
if the segments file was written pre-2.1, and all
assumed to be -1 if the segments file is 2.1 or
above. The generation then has the same meaning
as DelGen (above).
</p>
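<p>
A hedged sketch of these rules for field number f (the shipped
logic is SegmentInfo.getNormFileName(f), used elsewhere in this
commit):
</p>
  static String normFileName(String segName, long normGen, int f) {
    if (normGen > 0)    // separate norms with a generation: _X_N.sF
      return segName + "_" + Long.toString(normGen, Character.MAX_RADIX) + ".s" + f;
    if (normGen == 0)   // pre-2.1 separate norms: existence must be checked on disk
      return segName + ".s" + f;
    // -1: no separate norms; norms live in the .fF file (or inside the .cfs)
    return segName + ".f" + f;
  }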
<p>
IsCompoundFile records whether the segment is
written as a compound file or not. If this is -1,
the segment is not a compound file. If it is 1,
the segment is a compound file. Otherwise it is
0, which means we check the filesystem to see if
_X.cfs exists.
</p>
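<p>
A sketch of this tri-state (cf. SegmentInfo.getUseCompoundFile(),
used elsewhere in this commit):
</p>
  static boolean usesCompoundFile(byte isCompoundFile, Directory dir, String segName)
      throws IOException {
    if (isCompoundFile == 1)
      return true;                           // definitely compound
    if (isCompoundFile == -1)
      return false;                          // definitely not compound
    return dir.fileExists(segName + ".cfs"); // 0: pre-2.1, ask the filesystem
  }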
</subsection>
<subsection name="Lock Files">
<subsection name="Lock File">
<p>
Several files are used to indicate that another
process is using an index. Note that these files are not
A write lock is used to indicate that another
process is writing to the index. Note that this file is not
stored in the index directory itself, but rather in the
system's temporary directory, as indicated in the Java
system property "java.io.tmpdir".
</p>
<ul>
<li>
<p>
When a file named "commit.lock"
is present, a process is currently re-writing the "segments"
file and deleting outdated segment index files, or a process is
reading the "segments"
file and opening the files of the segments it names. This lock file
prevents files from being deleted by another process after a process
has read the "segments"
file but before it has managed to open all of the files of the
segments named therein.
</p>
</li>
<p>
The write lock is named "XXXX-write.lock" where
XXXX is typically a unique prefix computed by the
directory path to the index. When this file is
present, a process is currently adding documents
to an index, or removing files from that index.
This lock file prevents several processes from
attempting to modify an index at the same time.
</p>
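<p>
Purely illustrative: one way such a unique "XXXX" prefix could be
derived from the directory path. The digest algorithm and naming
here are assumptions, not a description of FSDirectory's actual
implementation.
</p>
  import java.io.File;
  import java.security.MessageDigest;

  static String lockPrefix(File indexDir) throws Exception {
    // Hash the canonical path so that the same index always maps
    // to the same lock name (assumed approach):
    byte[] digest = MessageDigest.getInstance("MD5")
        .digest(indexDir.getCanonicalPath().getBytes());
    StringBuffer buf = new StringBuffer("lucene-");
    for (int i = 0; i < digest.length; i++) {
      buf.append(Integer.toHexString((digest[i] >> 4) & 0xF));
      buf.append(Integer.toHexString(digest[i] & 0xF));
    }
    return buf.toString();   // lock file would then be <prefix>-write.lock
  }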
<p>
Note that prior to version 2.1, Lucene also used a
commit lock. This was removed in 2.1.
</p>
<li>
<p>
When a file named "write.lock"
is present, a process is currently adding documents to an index, or
removing files from that index. This lock file prevents several
processes from attempting to modify an index at the same time.
</p>
</li>
</ul>
</subsection>
<subsection name="Deletable File">
<p>
A file named "deletable"
contains the names of files that are no longer used by the index, but
which could not be deleted. This is only used on Win32, where a
file may not be deleted while it is still open. On other platforms
the file contains only null bytes.
Prior to Lucene 2.1 there was a file "deletable"
that contained details about files that need to be
deleted. As of 2.1, a writer dynamically computes
the files that are deletable, instead, so no file
is written.
</p>
<p>
Deletable --&gt; DeletableCount,
&lt;DelableName&gt;<sup>DeletableCount</sup>
</p>
<p>DeletableCount --&gt; UInt32
</p>
<p>DeletableName --&gt;
String
</p>
</subsection>
<subsection name="Compound Files">