From d634ccf4e933374ea0cba98a986fd602d8a25c72 Mon Sep 17 00:00:00 2001 From: Michael McCandless Date: Fri, 17 Nov 2006 23:18:47 +0000 Subject: [PATCH] Lockless commits git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@476359 13f79535-47bb-0310-9956-ffa450edef68 --- CHANGES.txt | 9 + docs/fileformats.html | 157 ++++-- .../apache/lucene/index/IndexFileDeleter.java | 219 ++++++++ .../lucene/index/IndexFileNameFilter.java | 57 +- .../apache/lucene/index/IndexFileNames.java | 41 +- .../org/apache/lucene/index/IndexReader.java | 140 ++--- .../org/apache/lucene/index/IndexWriter.java | 226 ++------ .../org/apache/lucene/index/MultiReader.java | 7 + .../org/apache/lucene/index/SegmentInfo.java | 287 ++++++++++ .../org/apache/lucene/index/SegmentInfos.java | 521 ++++++++++++++++-- .../apache/lucene/index/SegmentReader.java | 175 +++--- .../org/apache/lucene/store/FSDirectory.java | 60 +- .../org/apache/lucene/store/RAMDirectory.java | 10 +- .../apache/lucene/index/TestIndexReader.java | 138 ++++- .../apache/lucene/index/TestIndexWriter.java | 201 ++++++- .../apache/lucene/index/TestMultiReader.java | 15 + .../lucene/index/index.prelockless.cfs.zip | Bin 0 -> 3837 bytes .../lucene/index/index.prelockless.nocfs.zip | Bin 0 -> 11147 bytes .../apache/lucene/store/TestLockFactory.java | 16 +- xdocs/fileformats.xml | 170 ++++-- 20 files changed, 1956 insertions(+), 493 deletions(-) create mode 100644 src/java/org/apache/lucene/index/IndexFileDeleter.java create mode 100644 src/test/org/apache/lucene/index/index.prelockless.cfs.zip create mode 100644 src/test/org/apache/lucene/index/index.prelockless.nocfs.zip diff --git a/CHANGES.txt b/CHANGES.txt index 973e2d053b0..e0eafb88049 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -104,6 +104,15 @@ API Changes 9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected. (Steven Parkes via Otis Gospodnetic) +10. 
LUCENE-701: Lockless commits: a commit lock is no longer required + when a writer commits and a reader opens the index. This includes + a change to the index file format (see docs/fileformats.html for + details). It also removes all APIs associated with the commit + lock & its timeout. Readers are now truly read-only and do not + block one another on startup. This is the first step to getting + Lucene to work correctly over NFS (second step is + LUCENE-710). (Mike McCandless) + Bug fixes 1. Fixed the web application demo (built with "ant war-demo") which diff --git a/docs/fileformats.html b/docs/fileformats.html index 26158092bf8..e45f9a102f2 100644 --- a/docs/fileformats.html +++ b/docs/fileformats.html @@ -118,7 +118,7 @@ limitations under the License.

This document defines the index file formats used - in Lucene version 2.0. If you are using a different + in Lucene version 2.1. If you are using a different version of Lucene, please consult the copy of docs/fileformats.html that was distributed with the version you are using. @@ -143,6 +143,17 @@ limitations under the License. Compatibility notes are provided in this document, describing how file formats have changed from prior versions.

+

In version 2.1, the file format was changed to allow + lock-less commits (ie, no more commit lock). The + change is fully backwards compatible: you can open a + pre-2.1 index for searching or adding/deleting + docs. When the new segments file is saved + (committed), it will be written in the new file format + (meaning no specific "upgrade" process is needed). + But note that once a commit has occurred, pre-2.1 + Lucene will not be able to read the index. +

@@ -403,6 +414,17 @@ limitations under the License. Typically, all segments in an index are stored in a single directory, although this is not required. +

+

As of version 2.1 (lock-less commits), file names are + never re-used (there is one exception, "segments.gen", + see below). That is, when any file is saved to the + Directory it is given a never-before-used filename. + This is achieved using a simple generations approach. + For example, the first segments file is segments_1, + then segments_2, etc. The generation is a sequential + long integer represented in alpha-numeric (base 36) + form.
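The generations approach above can be sketched as follows. This is a hypothetical standalone illustration (the class name GenerationNames is not part of the patch); it only mirrors the base-36 naming rule the text describes:

```java
// Hypothetical sketch of the generation-based file naming described above.
// Generations are sequential longs rendered in base 36:
// segments_1, segments_2, ..., segments_a, ..., segments_10, ...
public class GenerationNames {
    static final int BASE_36 = Character.MAX_RADIX; // 36

    static String segmentsFileName(long generation) {
        if (generation == 0) {
            return "segments"; // pre-2.1 index uses the unsuffixed name
        }
        return "segments_" + Long.toString(generation, BASE_36);
    }

    public static void main(String[] args) {
        System.out.println(segmentsFileName(1));   // segments_1
        System.out.println(segmentsFileName(10));  // segments_a
        System.out.println(segmentsFileName(36));  // segments_10
    }
}
```

Because the generation only ever increases, a writer never overwrites a file a concurrent reader may be holding open, which is what makes the commit lock unnecessary.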

@@ -1080,25 +1102,53 @@ limitations under the License.

The active segments in the index are stored in the - segment info file. An index only has - a single file in this format, and it is named "segments". - This lists each segment by name, and also contains the size of each - segment. + segment info file, segments_N. There may + be one or more segments_N files in the + index; however, the one with the largest + generation is the active one (when older + segments_N files are present it's because they + temporarily cannot be deleted, or a writer is in + the process of committing). This file lists each + segment by name, has details about the separate + norms and deletion files, and also contains the + size of each segment.

+ As of 2.1, there is also a file + segments.gen. This file contains the + current generation (the _N in + segments_N) of the index. This is + used only as a fallback in case the current + generation cannot be accurately determined by + directory listing alone (as is the case for some + NFS clients with time-based directory cache + expiration). This file simply contains an Int32 + version header (SegmentInfos.FORMAT_LOCKLESS = + -2), followed by the generation recorded as Int64, + written twice. +
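The segments.gen layout above is small enough to sketch end to end. This is a hypothetical illustration (SegmentsGenSketch and its method names are not part of the patch), assuming the plain big-endian Int32/Int64 encoding that java.io.DataOutputStream shares with Lucene's IndexOutput:

```java
// Hypothetical sketch: the segments.gen layout described above is an
// Int32 format header (FORMAT_LOCKLESS = -2) followed by the current
// generation written twice as Int64.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SegmentsGenSketch {
    static final int FORMAT_LOCKLESS = -2;

    static byte[] writeSegmentsGen(long gen) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(FORMAT_LOCKLESS);
        out.writeLong(gen); // generation, written twice so a torn
        out.writeLong(gen); // write can be detected on read
        out.close();
        return bytes.toByteArray();
    }

    // Returns the generation, or -1 if the contents are unreadable;
    // callers would fall back to the directory listing in that case.
    static long readSegmentsGen(byte[] contents) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(contents));
            if (in.readInt() != FORMAT_LOCKLESS) {
                return -1;
            }
            long gen0 = in.readLong();
            long gen1 = in.readLong();
            // A mismatch means a partially written file: ignore it.
            return (gen0 == gen1) ? gen0 : -1;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readSegmentsGen(writeSegmentsGen(42))); // 42
    }
}
```

Writing the generation twice is the cheap integrity check that lets a reader discard a segments.gen caught mid-write without any locking.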

+

+ Pre-2.1: Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize>SegCount

- Format, NameCounter, SegCount, SegSize --> UInt32 + 2.1 and above: + Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, NumField, NormGenNumField >SegCount, IsCompoundFile

- Version --> UInt64 + Format, NameCounter, SegCount, SegSize, NumField --> Int32 +

+

+ Version, DelGen, NormGen --> Int64

SegName --> String

- Format is -1 in Lucene 1.4. + IsCompoundFile --> Int8 +

+

+ Format is -1 as of Lucene 1.4 and -2 as of Lucene 2.1.

Version counts how often the index has been @@ -1113,6 +1163,35 @@ limitations under the License.

SegSize is the number of documents contained in the segment index. +

+

+ DelGen is the generation count of the separate + deletes file. If this is -1, there are no + separate deletes. If it is 0, this is a pre-2.1 + segment and you must check the filesystem for the + existence of _X.del. Anything above zero means + there are separate deletes (_X_N.del). +

+

+ NumField is the size of the array for NormGen, or + -1 if there are no NormGens stored. +

+

+ NormGen records the generation of the separate + norms files. If NumField is -1, there are no + normGens stored: they are all assumed to be 0 if + the segments file was written pre-2.1, and all + assumed to be -1 if the segments file is 2.1 or + above. The generation then has the same meaning + as delGen (above). +

+

+ IsCompoundFile records whether the segment is + written as a compound file or not. If this is -1, + the segment is not a compound file. If it is 1, + the segment is a compound file. Otherwise it is 0, + which means we must check the filesystem to see + whether _X.cfs exists.
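The DelGen rules described above reduce to a small naming function. This is a hypothetical sketch (DelGenSketch is not part of the patch); it mirrors the three cases the text gives, with the generation again rendered in base 36:

```java
// Hypothetical sketch of the DelGen interpretation described above:
//   -1    -> the segment definitely has no separate deletes file
//    0    -> pre-2.1 segment: the caller must probe the filesystem for _X.del
//    N>0  -> separate deletes live in _X_N.del, N in base 36
public class DelGenSketch {
    static String delFileName(String segName, long delGen) {
        if (delGen == -1) {
            return null; // no deletes: nothing to look for
        }
        if (delGen == 0) {
            return segName + ".del"; // pre-2.1 name; existence must be checked
        }
        return segName + "_" + Long.toString(delGen, Character.MAX_RADIX) + ".del";
    }

    public static void main(String[] args) {
        System.out.println(delFileName("_3", 2)); // _3_2.del
    }
}
```

Encoding "must check the filesystem" as the reserved generation 0 is what lets a 2.1 reader open a pre-2.1 segment without any upgrade step.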

@@ -1121,42 +1200,31 @@ limitations under the License. @@ -1170,20 +1238,11 @@ limitations under the License. diff --git a/src/java/org/apache/lucene/index/IndexFileDeleter.java b/src/java/org/apache/lucene/index/IndexFileDeleter.java new file mode 100644 index 00000000000..e25a144059c --- /dev/null +++ b/src/java/org/apache/lucene/index/IndexFileDeleter.java @@ -0,0 +1,219 @@ +package org.apache.lucene.index; + +import org.apache.lucene.index.IndexFileNames; +import org.apache.lucene.index.IndexFileNameFilter; +import org.apache.lucene.index.SegmentInfos; +import org.apache.lucene.store.Directory; + +import java.io.IOException; +import java.io.PrintStream; +import java.util.Vector; +import java.util.HashMap; + +/** + * A utility class (used by both IndexReader and + * IndexWriter) to keep track of files that need to be + * deleted because they are no longer referenced by the + * index. + */ +public class IndexFileDeleter { + private Vector deletable; + private Vector pending; + private Directory directory; + private SegmentInfos segmentInfos; + private PrintStream infoStream; + + public IndexFileDeleter(SegmentInfos segmentInfos, Directory directory) + throws IOException { + this.segmentInfos = segmentInfos; + this.directory = directory; + } + + void setInfoStream(PrintStream infoStream) { + this.infoStream = infoStream; + } + + /** Determine index files that are no longer referenced + * and therefore should be deleted. This is called once + * (by the writer), and then subsequently we add onto + * deletable any files that are no longer needed at the + * point that we create the unused file (eg when merging + * segments), and we only remove from deletable when a + * file is successfully deleted. + */ + + public void findDeletableFiles() throws IOException { + + // Gather all "current" segments: + HashMap current = new HashMap(); + for(int j=0;j.f + a number. - * Also note that two of Lucene's files (deletable and - * segments) don't have any filename extension. 
+ * This array contains all filename extensions used by + * Lucene's index files, with two exceptions, namely the + * extension made up from .f + a number and + * from .s + a number. Also note that + * Lucene's segments_N files do not have any + * filename extension. */ static final String INDEX_EXTENSIONS[] = new String[] { "cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq", "prx", "del", - "tvx", "tvd", "tvf", "tvp" }; + "tvx", "tvd", "tvf", "tvp", "gen"}; /** File extensions of old-style index files */ static final String COMPOUND_EXTENSIONS[] = new String[] { @@ -50,5 +56,24 @@ final class IndexFileNames { static final String VECTOR_EXTENSIONS[] = new String[] { "tvx", "tvd", "tvf" }; - + + /** + * Computes the full file name from base, extension and + * generation. If the generation is -1, the file name is + * null. If it's 0, the file name is . + * If it's > 0, the file name is _. + * + * @param base -- main part of the file name + * @param extension -- extension of the filename (including .) 
+ * @param gen -- generation + */ + public static final String fileNameFromGeneration(String base, String extension, long gen) { + if (gen == -1) { + return null; + } else if (gen == 0) { + return base + extension; + } else { + return base + "_" + Long.toString(gen, Character.MAX_RADIX) + extension; + } + } } diff --git a/src/java/org/apache/lucene/index/IndexReader.java b/src/java/org/apache/lucene/index/IndexReader.java index 56df53d4bd3..017671bbeb3 100644 --- a/src/java/org/apache/lucene/index/IndexReader.java +++ b/src/java/org/apache/lucene/index/IndexReader.java @@ -113,6 +113,7 @@ public abstract class IndexReader { private Directory directory; private boolean directoryOwner; private boolean closeDirectory; + protected IndexFileDeleter deleter; private SegmentInfos segmentInfos; private Lock writeLock; @@ -138,24 +139,40 @@ public abstract class IndexReader { } private static IndexReader open(final Directory directory, final boolean closeDirectory) throws IOException { - synchronized (directory) { // in- & inter-process sync - return (IndexReader)new Lock.With( - directory.makeLock(IndexWriter.COMMIT_LOCK_NAME), - IndexWriter.COMMIT_LOCK_TIMEOUT) { - public Object doBody() throws IOException { - SegmentInfos infos = new SegmentInfos(); - infos.read(directory); - if (infos.size() == 1) { // index is optimized - return SegmentReader.get(infos, infos.info(0), closeDirectory); - } - IndexReader[] readers = new IndexReader[infos.size()]; - for (int i = 0; i < infos.size(); i++) - readers[i] = SegmentReader.get(infos.info(i)); - return new MultiReader(directory, infos, closeDirectory, readers); + return (IndexReader) new SegmentInfos.FindSegmentsFile(directory) { + + public Object doBody(String segmentFileName) throws IOException { + + SegmentInfos infos = new SegmentInfos(); + infos.read(directory, segmentFileName); + + if (infos.size() == 1) { // index is optimized + return SegmentReader.get(infos, infos.info(0), closeDirectory); + } else { + + // To reduce the 
chance of hitting FileNotFound + // (and having to retry), we open segments in + // reverse because IndexWriter merges & deletes + // the newest segments first. + + IndexReader[] readers = new IndexReader[infos.size()]; + for (int i = infos.size()-1; i >= 0; i--) { + try { + readers[i] = SegmentReader.get(infos.info(i)); + } catch (IOException e) { + // Close all readers we had opened: + for(i++;itrue if an index exists; false otherwise */ public static boolean indexExists(String directory) { - return (new File(directory, IndexFileNames.SEGMENTS)).exists(); + return indexExists(new File(directory)); } /** @@ -328,8 +325,9 @@ public abstract class IndexReader { * @param directory the directory to check for an index * @return true if an index exists; false otherwise */ + public static boolean indexExists(File directory) { - return (new File(directory, IndexFileNames.SEGMENTS)).exists(); + return SegmentInfos.getCurrentSegmentGeneration(directory.list()) != -1; } /** @@ -340,7 +338,7 @@ public abstract class IndexReader { * @throws IOException if there is a problem with accessing the index */ public static boolean indexExists(Directory directory) throws IOException { - return directory.fileExists(IndexFileNames.SEGMENTS); + return SegmentInfos.getCurrentSegmentGeneration(directory) != -1; } /** Returns the number of documents in this index. 
*/ @@ -592,17 +590,22 @@ public abstract class IndexReader { */ protected final synchronized void commit() throws IOException{ if(hasChanges){ + if (deleter == null) { + // In the MultiReader case, we share this deleter + // across all SegmentReaders: + setDeleter(new IndexFileDeleter(segmentInfos, directory)); + deleter.deleteFiles(); + } if(directoryOwner){ - synchronized (directory) { // in- & inter-process sync - new Lock.With(directory.makeLock(IndexWriter.COMMIT_LOCK_NAME), - IndexWriter.COMMIT_LOCK_TIMEOUT) { - public Object doBody() throws IOException { - doCommit(); - segmentInfos.write(directory); - return null; - } - }.run(); - } + deleter.clearPendingFiles(); + doCommit(); + String oldInfoFileName = segmentInfos.getCurrentSegmentFileName(); + segmentInfos.write(directory); + // Attempt to delete all files we just obsoleted: + + deleter.deleteFile(oldInfoFileName); + deleter.commitPendingFiles(); + deleter.deleteFiles(); if (writeLock != null) { writeLock.release(); // release write lock writeLock = null; @@ -614,6 +617,13 @@ public abstract class IndexReader { hasChanges = false; } + protected void setDeleter(IndexFileDeleter deleter) { + this.deleter = deleter; + } + protected IndexFileDeleter getDeleter() { + return deleter; + } + /** Implements commit. 
*/ protected abstract void doCommit() throws IOException; @@ -658,8 +668,7 @@ public abstract class IndexReader { */ public static boolean isLocked(Directory directory) throws IOException { return - directory.makeLock(IndexWriter.WRITE_LOCK_NAME).isLocked() || - directory.makeLock(IndexWriter.COMMIT_LOCK_NAME).isLocked(); + directory.makeLock(IndexWriter.WRITE_LOCK_NAME).isLocked(); } /** @@ -684,7 +693,6 @@ public abstract class IndexReader { */ public static void unlock(Directory directory) throws IOException { directory.makeLock(IndexWriter.WRITE_LOCK_NAME).release(); - directory.makeLock(IndexWriter.COMMIT_LOCK_NAME).release(); } /** diff --git a/src/java/org/apache/lucene/index/IndexWriter.java b/src/java/org/apache/lucene/index/IndexWriter.java index 9533aebb6f6..d02c9435a68 100644 --- a/src/java/org/apache/lucene/index/IndexWriter.java +++ b/src/java/org/apache/lucene/index/IndexWriter.java @@ -67,16 +67,7 @@ public class IndexWriter { private long writeLockTimeout = WRITE_LOCK_TIMEOUT; - /** - * Default value for the commit lock timeout (10,000). - * @see #setDefaultCommitLockTimeout - */ - public static long COMMIT_LOCK_TIMEOUT = 10000; - - private long commitLockTimeout = COMMIT_LOCK_TIMEOUT; - public static final String WRITE_LOCK_NAME = "write.lock"; - public static final String COMMIT_LOCK_NAME = "commit.lock"; /** * Default value is 10. Change using {@link #setMergeFactor(int)}. 
@@ -111,6 +102,7 @@ public class IndexWriter { private SegmentInfos segmentInfos = new SegmentInfos(); // the segments private SegmentInfos ramSegmentInfos = new SegmentInfos(); // the segments in ramDirectory private final Directory ramDirectory = new RAMDirectory(); // for temp segs + private IndexFileDeleter deleter; private Lock writeLock; @@ -260,19 +252,30 @@ public class IndexWriter { this.writeLock = writeLock; // save it try { - synchronized (directory) { // in- & inter-process sync - new Lock.With(directory.makeLock(IndexWriter.COMMIT_LOCK_NAME), commitLockTimeout) { - public Object doBody() throws IOException { - if (create) - segmentInfos.write(directory); - else - segmentInfos.read(directory); - return null; - } - }.run(); + if (create) { + // Try to read first. This is to allow create + // against an index that's currently open for + // searching. In this case we write the next + // segments_N file with no segments: + try { + segmentInfos.read(directory); + segmentInfos.clear(); + } catch (IOException e) { + // Likely this means it's a fresh directory + } + segmentInfos.write(directory); + } else { + segmentInfos.read(directory); } + + // Create a deleter to keep track of which files can + // be deleted: + deleter = new IndexFileDeleter(segmentInfos, directory); + deleter.setInfoStream(infoStream); + deleter.findDeletableFiles(); + deleter.deleteFiles(); + } catch (IOException e) { - // the doBody method failed this.writeLock.release(); this.writeLock = null; throw e; @@ -380,35 +383,6 @@ public class IndexWriter { return infoStream; } - /** - * Sets the maximum time to wait for a commit lock (in milliseconds) for this instance of IndexWriter. @see - * @see #setDefaultCommitLockTimeout to change the default value for all instances of IndexWriter. 
- */ - public void setCommitLockTimeout(long commitLockTimeout) { - this.commitLockTimeout = commitLockTimeout; - } - - /** - * @see #setCommitLockTimeout - */ - public long getCommitLockTimeout() { - return commitLockTimeout; - } - - /** - * Sets the default (for any instance of IndexWriter) maximum time to wait for a commit lock (in milliseconds) - */ - public static void setDefaultCommitLockTimeout(long commitLockTimeout) { - IndexWriter.COMMIT_LOCK_TIMEOUT = commitLockTimeout; - } - - /** - * @see #setDefaultCommitLockTimeout - */ - public static long getDefaultCommitLockTimeout() { - return IndexWriter.COMMIT_LOCK_TIMEOUT; - } - /** * Sets the maximum time to wait for a write lock (in milliseconds) for this instance of IndexWriter. @see * @see #setDefaultWriteLockTimeout to change the default value for all instances of IndexWriter. @@ -517,7 +491,7 @@ public class IndexWriter { String segmentName = newRAMSegmentName(); dw.addDocument(segmentName, doc); synchronized (this) { - ramSegmentInfos.addElement(new SegmentInfo(segmentName, 1, ramDirectory)); + ramSegmentInfos.addElement(new SegmentInfo(segmentName, 1, ramDirectory, false)); maybeFlushRamSegments(); } } @@ -790,36 +764,26 @@ public class IndexWriter { int docCount = merger.merge(); // merge 'em segmentInfos.setSize(0); // pop old infos & add new - segmentInfos.addElement(new SegmentInfo(mergedName, docCount, directory)); + SegmentInfo info = new SegmentInfo(mergedName, docCount, directory, false); + segmentInfos.addElement(info); if(sReader != null) sReader.close(); - synchronized (directory) { // in- & inter-process sync - new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), commitLockTimeout) { - public Object doBody() throws IOException { - segmentInfos.write(directory); // commit changes - return null; - } - }.run(); - } + String segmentsInfosFileName = segmentInfos.getCurrentSegmentFileName(); + segmentInfos.write(directory); // commit changes - deleteSegments(segmentsToDelete); // delete now-unused 
segments + deleter.deleteFile(segmentsInfosFileName); // delete old segments_N file + deleter.deleteSegments(segmentsToDelete); // delete now-unused segments if (useCompoundFile) { - final Vector filesToDelete = merger.createCompoundFile(mergedName + ".tmp"); - synchronized (directory) { // in- & inter-process sync - new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), commitLockTimeout) { - public Object doBody() throws IOException { - // make compound file visible for SegmentReaders - directory.renameFile(mergedName + ".tmp", mergedName + ".cfs"); - return null; - } - }.run(); - } + Vector filesToDelete = merger.createCompoundFile(mergedName + ".cfs"); + segmentsInfosFileName = segmentInfos.getCurrentSegmentFileName(); + info.setUseCompoundFile(true); + segmentInfos.write(directory); // commit again so readers know we've switched this segment to a compound file - // delete now unused files of segment - deleteFiles(filesToDelete); + deleter.deleteFile(segmentsInfosFileName); // delete old segments_N file + deleter.deleteFiles(filesToDelete); // delete now unused files of segment } } @@ -937,10 +901,11 @@ public class IndexWriter { */ private final int mergeSegments(SegmentInfos sourceSegments, int minSegment, int end) throws IOException { + final String mergedName = newSegmentName(); if (infoStream != null) infoStream.print("merging segments"); SegmentMerger merger = new SegmentMerger(this, mergedName); - + final Vector segmentsToDelete = new Vector(); for (int i = minSegment; i < end; i++) { SegmentInfo si = sourceSegments.info(i); @@ -960,7 +925,7 @@ public class IndexWriter { } SegmentInfo newSegment = new SegmentInfo(mergedName, mergedDocCount, - directory); + directory, false); if (sourceSegments == ramSegmentInfos) { sourceSegments.removeAllElements(); segmentInfos.addElement(newSegment); @@ -973,115 +938,26 @@ public class IndexWriter { // close readers before we attempt to delete now-obsolete segments merger.closeReaders(); - synchronized (directory) { // 
in- & inter-process sync - new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), commitLockTimeout) { - public Object doBody() throws IOException { - segmentInfos.write(directory); // commit before deleting - return null; - } - }.run(); - } - - deleteSegments(segmentsToDelete); // delete now-unused segments + String segmentsInfosFileName = segmentInfos.getCurrentSegmentFileName(); + segmentInfos.write(directory); // commit before deleting + + deleter.deleteFile(segmentsInfosFileName); // delete old segments_N file + deleter.deleteSegments(segmentsToDelete); // delete now-unused segments if (useCompoundFile) { - final Vector filesToDelete = merger.createCompoundFile(mergedName + ".tmp"); - synchronized (directory) { // in- & inter-process sync - new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), commitLockTimeout) { - public Object doBody() throws IOException { - // make compound file visible for SegmentReaders - directory.renameFile(mergedName + ".tmp", mergedName + ".cfs"); - return null; - } - }.run(); - } + Vector filesToDelete = merger.createCompoundFile(mergedName + ".cfs"); - // delete now unused files of segment - deleteFiles(filesToDelete); + segmentsInfosFileName = segmentInfos.getCurrentSegmentFileName(); + newSegment.setUseCompoundFile(true); + segmentInfos.write(directory); // commit again so readers know we've switched this segment to a compound file + + deleter.deleteFile(segmentsInfosFileName); // delete old segments_N file + deleter.deleteFiles(filesToDelete); // delete now-unused segments } return mergedDocCount; } - /* - * Some operating systems (e.g. Windows) don't permit a file to be deleted - * while it is opened for read (e.g. by another process or thread). So we - * assume that when a delete fails it is because the file is open in another - * process, and queue the file for subsequent deletion. 
- */ - - private final void deleteSegments(Vector segments) throws IOException { - Vector deletable = new Vector(); - - deleteFiles(readDeleteableFiles(), deletable); // try to delete deleteable - - for (int i = 0; i < segments.size(); i++) { - SegmentReader reader = (SegmentReader)segments.elementAt(i); - if (reader.directory() == this.directory) - deleteFiles(reader.files(), deletable); // try to delete our files - else - deleteFiles(reader.files(), reader.directory()); // delete other files - } - - writeDeleteableFiles(deletable); // note files we can't delete - } - - private final void deleteFiles(Vector files) throws IOException { - Vector deletable = new Vector(); - deleteFiles(readDeleteableFiles(), deletable); // try to delete deleteable - deleteFiles(files, deletable); // try to delete our files - writeDeleteableFiles(deletable); // note files we can't delete - } - - private final void deleteFiles(Vector files, Directory directory) - throws IOException { - for (int i = 0; i < files.size(); i++) - directory.deleteFile((String)files.elementAt(i)); - } - - private final void deleteFiles(Vector files, Vector deletable) - throws IOException { - for (int i = 0; i < files.size(); i++) { - String file = (String)files.elementAt(i); - try { - directory.deleteFile(file); // try to delete each file - } catch (IOException e) { // if delete fails - if (directory.fileExists(file)) { - if (infoStream != null) - infoStream.println(e.toString() + "; Will re-try later."); - deletable.addElement(file); // add to deletable - } - } - } - } - - private final Vector readDeleteableFiles() throws IOException { - Vector result = new Vector(); - if (!directory.fileExists(IndexFileNames.DELETABLE)) - return result; - - IndexInput input = directory.openInput(IndexFileNames.DELETABLE); - try { - for (int i = input.readInt(); i > 0; i--) // read file names - result.addElement(input.readString()); - } finally { - input.close(); - } - return result; - } - - private final void 
writeDeleteableFiles(Vector files) throws IOException { - IndexOutput output = directory.createOutput("deleteable.new"); - try { - output.writeInt(files.size()); - for (int i = 0; i < files.size(); i++) - output.writeString((String)files.elementAt(i)); - } finally { - output.close(); - } - directory.renameFile("deleteable.new", IndexFileNames.DELETABLE); - } - private final boolean checkNonDecreasingLevels(int start) { int lowerBound = -1; int upperBound = minMergeDocs; diff --git a/src/java/org/apache/lucene/index/MultiReader.java b/src/java/org/apache/lucene/index/MultiReader.java index 15295fd9bee..91c4c562e4e 100644 --- a/src/java/org/apache/lucene/index/MultiReader.java +++ b/src/java/org/apache/lucene/index/MultiReader.java @@ -218,6 +218,13 @@ public class MultiReader extends IndexReader { return new MultiTermPositions(subReaders, starts); } + protected void setDeleter(IndexFileDeleter deleter) { + // Share deleter to our SegmentReaders: + this.deleter = deleter; + for (int i = 0; i < subReaders.length; i++) + subReaders[i].setDeleter(deleter); + } + protected void doCommit() throws IOException { for (int i = 0; i < subReaders.length; i++) subReaders[i].commit(); diff --git a/src/java/org/apache/lucene/index/SegmentInfo.java b/src/java/org/apache/lucene/index/SegmentInfo.java index 9f6b6d96334..8e47977dd64 100644 --- a/src/java/org/apache/lucene/index/SegmentInfo.java +++ b/src/java/org/apache/lucene/index/SegmentInfo.java @@ -18,15 +18,302 @@ package org.apache.lucene.index; */ import org.apache.lucene.store.Directory; +import org.apache.lucene.store.IndexOutput; +import org.apache.lucene.store.IndexInput; +import java.io.IOException; final class SegmentInfo { public String name; // unique name in dir public int docCount; // number of docs in seg public Directory dir; // where segment resides + private boolean preLockless; // true if this is a segments file written before + // lock-less commits (XXX) + + private long delGen; // current generation of del 
file; -1 if there + // are no deletes; 0 if it's a pre-XXX segment + // (and we must check filesystem); 1 or higher if + // there are deletes at generation N + + private long[] normGen; // current generations of each field's norm file. + // If this array is null, we must check filesystem + // when preLockLess is true. Else, + // there are no separate norms + + private byte isCompoundFile; // -1 if it is not; 1 if it is; 0 if it's + // pre-XXX (ie, must check file system to see + // if .cfs exists) + public SegmentInfo(String name, int docCount, Directory dir) { this.name = name; this.docCount = docCount; this.dir = dir; + delGen = -1; + isCompoundFile = 0; + preLockless = true; + } + public SegmentInfo(String name, int docCount, Directory dir, boolean isCompoundFile) { + this(name, docCount, dir); + if (isCompoundFile) { + this.isCompoundFile = 1; + } else { + this.isCompoundFile = -1; + } + preLockless = false; + } + + + /** + * Construct a new SegmentInfo instance by reading a + * previously saved SegmentInfo from input. 
+ * + * @param dir directory to load from + * @param format format of the segments info file + * @param input input handle to read segment info from + */ + public SegmentInfo(Directory dir, int format, IndexInput input) throws IOException { + this.dir = dir; + name = input.readString(); + docCount = input.readInt(); + if (format <= SegmentInfos.FORMAT_LOCKLESS) { + delGen = input.readLong(); + int numNormGen = input.readInt(); + if (numNormGen == -1) { + normGen = null; + } else { + normGen = new long[numNormGen]; + for(int j=0;j 0: this means this segment was written by + // the LOCKLESS code and for certain has + // deletions + // + if (delGen == -1) { + return false; + } else if (delGen > 0) { + return true; + } else { + return dir.fileExists(getDelFileName()); + } + } + + void advanceDelGen() { + // delGen 0 is reserved for pre-LOCKLESS format + if (delGen == -1) { + delGen = 1; + } else { + delGen++; + } + } + + void clearDelGen() { + delGen = -1; + } + + String getDelFileName() { + if (delGen == -1) { + // In this case we know there is no deletion filename + // against this segment + return null; + } else { + // If delGen is 0, it's the pre-lockless-commit file format + return IndexFileNames.fileNameFromGeneration(name, ".del", delGen); + } + } + + /** + * Returns true if this field for this segment has saved a separate norms file (__N.sX). + * + * @param fieldNumber the field index to check + */ + boolean hasSeparateNorms(int fieldNumber) + throws IOException { + if ((normGen == null && preLockless) || (normGen != null && normGen[fieldNumber] == 0)) { + // Must fallback to directory file exists check: + String fileName = name + ".s" + fieldNumber; + return dir.fileExists(fileName); + } else if (normGen == null || normGen[fieldNumber] == -1) { + return false; + } else { + return true; + } + } + + /** + * Returns true if any fields in this segment have separate norms. 
+ */ + boolean hasSeparateNorms() + throws IOException { + if (normGen == null) { + if (!preLockless) { + // This means we were created w/ LOCKLESS code and no + // norms are written yet: + return false; + } else { + // This means this segment was saved with pre-LOCKLESS + // code. So we must fallback to the original + // directory list check: + String[] result = dir.list(); + String pattern; + pattern = name + ".s"; + int patternLength = pattern.length(); + for(int i = 0; i < result.length; i++){ + if(result[i].startsWith(pattern) && Character.isDigit(result[i].charAt(patternLength))) + return true; + } + return false; + } + } else { + // This means this segment was saved with LOCKLESS + // code so we first check whether any normGen's are > + // 0 (meaning they definitely have separate norms): + for(int i=0;i 0) { + return true; + } + } + // Next we look for any == 0. These cases were + // pre-LOCKLESS and must be checked in directory: + for(int i=0;i= 0 */ public static final int FORMAT = -1; - + + /** This is the current file format written. It differs + * slightly from the previous format in that file names + * are never re-used (write once). Instead, each file is + * written to the next generation. For example, + * segments_1, segments_2, etc. This allows us to not use + * a commit lock. See file + * formats for details. + */ + public static final int FORMAT_LOCKLESS = -2; + public int counter = 0; // used to name new segments /** * counts how often the index has been changed by adding or deleting docs. * starting with the current time in milliseconds forces to create unique version numbers. */ private long version = System.currentTimeMillis(); + private long generation = 0; // generation of the "segments_N" file we read + + /** + * If non-null, information about loading segments_N files + * will be printed here. @see #setInfoStream. 
+ */ + private static PrintStream infoStream; public final SegmentInfo info(int i) { return (SegmentInfo) elementAt(i); } - public final void read(Directory directory) throws IOException { - - IndexInput input = directory.openInput(IndexFileNames.SEGMENTS); + /** + * Get the generation (N) of the current segments_N file + * from a list of files. + * + * @param files -- array of file names to check + */ + public static long getCurrentSegmentGeneration(String[] files) { + if (files == null) { + return -1; + } + long max = -1; + int prefixLen = IndexFileNames.SEGMENTS.length()+1; + for (int i = 0; i < files.length; i++) { + String file = files[i]; + if (file.startsWith(IndexFileNames.SEGMENTS) && !file.equals(IndexFileNames.SEGMENTS_GEN)) { + if (file.equals(IndexFileNames.SEGMENTS)) { + // Pre lock-less commits: + if (max == -1) { + max = 0; + } + } else { + long v = Long.parseLong(file.substring(prefixLen), Character.MAX_RADIX); + if (v > max) { + max = v; + } + } + } + } + return max; + } + + /** + * Get the generation (N) of the current segments_N file + * in the directory. + * + * @param directory -- directory to search for the latest segments_N file + */ + public static long getCurrentSegmentGeneration(Directory directory) throws IOException { + String[] files = directory.list(); + if (files == null) + throw new IOException("Cannot read directory " + directory); + return getCurrentSegmentGeneration(files); + } + + /** + * Get the filename of the current segments_N file + * from a list of files. + * + * @param files -- array of file names to check + */ + + public static String getCurrentSegmentFileName(String[] files) throws IOException { + return IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, + "", + getCurrentSegmentGeneration(files)); + } + + /** + * Get the filename of the current segments_N file + * in the directory. 
+ * + * @param directory -- directory to search for the latest segments_N file + */ + public static String getCurrentSegmentFileName(Directory directory) throws IOException { + return IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, + "", + getCurrentSegmentGeneration(directory)); + } + + /** + * Get the segment_N filename in use by this segment infos. + */ + public String getCurrentSegmentFileName() { + return IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, + "", + generation); + } + + /** + * Read a particular segmentFileName. Note that this may + * throw an IOException if a commit is in process. + * + * @param directory -- directory containing the segments file + * @param segmentFileName -- segment file to load + */ + public final void read(Directory directory, String segmentFileName) throws IOException { + boolean success = false; + + IndexInput input = directory.openInput(segmentFileName); + + if (segmentFileName.equals(IndexFileNames.SEGMENTS)) { + generation = 0; + } else { + generation = Long.parseLong(segmentFileName.substring(1+IndexFileNames.SEGMENTS.length()), + Character.MAX_RADIX); + } + try { int format = input.readInt(); if(format < 0){ // file contains explicit format info // check that it is a format we can understand - if (format < FORMAT) + if (format < FORMAT_LOCKLESS) throw new IOException("Unknown format version: " + format); version = input.readLong(); // read version counter = input.readInt(); // read counter @@ -58,9 +173,7 @@ final class SegmentInfos extends Vector { } for (int i = input.readInt(); i > 0; i--) { // read segmentInfos - SegmentInfo si = - new SegmentInfo(input.readString(), input.readInt(), directory); - addElement(si); + addElement(new SegmentInfo(directory, format, input)); } if(format >= 0){ // in old format the version number may be at the end of the file @@ -69,31 +182,71 @@ final class SegmentInfos extends Vector { else version = input.readLong(); // read version } + success = true; } 
finally { input.close(); + if (!success) { + // Clear any segment infos we had loaded so we + // have a clean slate on retry: + clear(); + } } } + /** + * This version of read uses the retry logic (for lock-less + * commits) to find the right segments file to load. + */ + public final void read(Directory directory) throws IOException { + + generation = -1; + + new FindSegmentsFile(directory) { + + public Object doBody(String segmentFileName) throws IOException { + read(directory, segmentFileName); + return null; + } + }.run(); + } public final void write(Directory directory) throws IOException { - IndexOutput output = directory.createOutput("segments.new"); + + // Always advance the generation on write: + if (generation == -1) { + generation = 1; + } else { + generation++; + } + + String segmentFileName = getCurrentSegmentFileName(); + IndexOutput output = directory.createOutput(segmentFileName); + try { - output.writeInt(FORMAT); // write FORMAT - output.writeLong(++version); // every write changes the index + output.writeInt(FORMAT_LOCKLESS); // write FORMAT + output.writeLong(++version); // every write changes + // the index output.writeInt(counter); // write counter output.writeInt(size()); // write infos for (int i = 0; i < size(); i++) { SegmentInfo si = info(i); - output.writeString(si.name); - output.writeInt(si.docCount); + si.write(output); } } finally { output.close(); } - // install new segment info - directory.renameFile("segments.new", IndexFileNames.SEGMENTS); + try { + output = directory.createOutput(IndexFileNames.SEGMENTS_GEN); + output.writeInt(FORMAT_LOCKLESS); + output.writeLong(generation); + output.writeLong(generation); + output.close(); + } catch (IOException e) { + // It's OK if we fail to write this file since it's + // used only as one of the retry fallbacks. 
+ } } /** @@ -108,30 +261,322 @@ final class SegmentInfos extends Vector { */ public static long readCurrentVersion(Directory directory) throws IOException { + + return ((Long) new FindSegmentsFile(directory) { + public Object doBody(String segmentFileName) throws IOException { + + IndexInput input = directory.openInput(segmentFileName); + + int format = 0; + long version = 0; + try { + format = input.readInt(); + if(format < 0){ + if (format < FORMAT_LOCKLESS) + throw new IOException("Unknown format version: " + format); + version = input.readLong(); // read version + } + } + finally { + input.close(); + } + + if(format < 0) + return new Long(version); + + // We cannot be sure about the format of the file. + // Therefore we have to read the whole file and cannot simply seek to the version entry. + SegmentInfos sis = new SegmentInfos(); + sis.read(directory, segmentFileName); + return new Long(sis.getVersion()); + } + }.run()).longValue(); + } + + /** If non-null, information about retries when loading + * the segments file will be printed to this. + */ + public static void setInfoStream(PrintStream infoStream) { + SegmentInfos.infoStream = infoStream; + } + + /* Advanced configuration of retry logic in loading + segments_N file */ + private static int defaultGenFileRetryCount = 10; + private static int defaultGenFileRetryPauseMsec = 50; + private static int defaultGenLookaheadCount = 10; + + /** + * Advanced: set how many times to try loading the + * segments.gen file contents to determine current segment + * generation. This file is only referenced when the + * primary method (listing the directory) fails. + */ + public static void setDefaultGenFileRetryCount(int count) { + defaultGenFileRetryCount = count; + } + + /** + * @see #setDefaultGenFileRetryCount + */ + public static int getDefaultGenFileRetryCount() { + return defaultGenFileRetryCount; + } + + /** + * Advanced: set how many milliseconds to pause in between + * attempts to load the segments.gen file. 
+   */
+  public static void setDefaultGenFileRetryPauseMsec(int msec) {
+    defaultGenFileRetryPauseMsec = msec;
+  }
+
+  /**
+   * @see #setDefaultGenFileRetryPauseMsec
+   */
+  public static int getDefaultGenFileRetryPauseMsec() {
+    return defaultGenFileRetryPauseMsec;
+  }
+
+  /**
+   * Advanced: set how many times to try incrementing the
+   * gen when loading the segments file.  This only runs if
+   * the primary (listing directory) and secondary (opening
+   * segments.gen file) methods fail to find the segments
+   * file.
+   */
+  public static void setDefaultGenLookaheadCount(int count) {
+    defaultGenLookaheadCount = count;
+  }
+
+  /**
+   * @see #setDefaultGenLookaheadCount
+   */
+  public static int getDefaultGenLookaheadCount() {
+    return defaultGenLookaheadCount;
+  }
+
+  /**
+   * @see #setInfoStream
+   */
+  public static PrintStream getInfoStream() {
+    return infoStream;
+  }
+
+  private static void message(String message) {
+    if (infoStream != null) {
+      infoStream.println(Thread.currentThread().getName() + ": " + message);
+    }
+  }
+
+  /**
+   * Utility class for executing code that needs to do
+   * something with the current segments file.  This is
+   * necessary with lock-less commits because from the time
+   * you locate the current segments file name, until you
+   * actually open it, read its contents, or check modified
+   * time, etc., it could have been deleted due to a writer
+   * commit finishing.
+   */
+  public abstract static class FindSegmentsFile {
+
+    File fileDirectory;
+    Directory directory;
+
+    public FindSegmentsFile(File directory) {
+      this.fileDirectory = directory;
+    }
+
+    public FindSegmentsFile(Directory directory) {
+      this.directory = directory;
+    }
+
+    public Object run() throws IOException {
+      String segmentFileName = null;
+      long lastGen = -1;
+      long gen = 0;
+      int genLookaheadCount = 0;
+      IOException exc = null;
+      boolean retry = false;
+
+      int method = 0;
+
+      // Loop until we succeed in calling doBody() without
+      // hitting an IOException.  An IOException most likely
+      // means a commit was in process and has finished, in
+      // the time it took us to load the now-old infos files
+      // (and segments files).  It's also possible it's a
+      // true error (corrupt index).  To distinguish these,
+      // on each retry we must see "forward progress" on
+      // which generation we are trying to load.  If we
+      // don't, then the original error is real and we throw
+      // it.
-    IndexInput input = directory.openInput(IndexFileNames.SEGMENTS);
-    int format = 0;
-    long version = 0;
-    try {
-      format = input.readInt();
-      if(format < 0){
-        if (format < FORMAT)
-          throw new IOException("Unknown format version: " + format);
-        version = input.readLong(); // read version
+
+      // We have three methods for determining the current
+      // generation.  We try each in sequence.
+
+      while(true) {
+
+        // Method 1: list the directory and use the highest
+        // segments_N file.  This method works well as long
+        // as there is no stale caching on the directory
+        // contents:
+        String[] files = null;
+
+        if (0 == method) {
+          if (directory != null) {
+            files = directory.list();
+          } else {
+            files = fileDirectory.list();
+          }
+
+          gen = getCurrentSegmentGeneration(files);
+
+          if (gen == -1) {
+            String s = "";
+            for(int i=0;i<files.length;i++) {
+              s += " " + files[i];
+            }
+            throw new FileNotFoundException("no segments* file found: files:" + s);
+          }
+        }
+
+        // Method 2 (fallback if Method 1 isn't reliable):
+        // if the directory listing seems to be stale, then
+        // try loading the "segments.gen" file:
+        if (1 == method || (0 == method && lastGen == gen && retry)) {
+
+          method = 1;
+
+          for(int i=0;i<defaultGenFileRetryCount;i++) {
+            IndexInput genInput = null;
+            try {
+              genInput = directory.openInput(IndexFileNames.SEGMENTS_GEN);
+            } catch (IOException e) {
+              // will retry
+            }
+
+            if (genInput != null) {
+              try {
+                int version = genInput.readInt();
+                if (version == FORMAT_LOCKLESS) {
+                  long gen0 = genInput.readLong();
+                  long gen1 = genInput.readLong();
+                  message("fallback check: " + gen0 + "; " + gen1);
+                  if (gen0 == gen1) {
+                    // The file is consistent.
+                    if (gen0 > gen) {
+                      message("fallback to '" + IndexFileNames.SEGMENTS_GEN + "' check: now try generation " + gen0 + " > " + gen);
+                      gen = gen0;
+                    }
+                    break;
+                  }
+                }
+              } catch (IOException err2) {
+                // will retry
+              } finally {
+                genInput.close();
+              }
+            }
+            try {
+              Thread.sleep(defaultGenFileRetryPauseMsec);
+            } catch (InterruptedException e) {
+              // will retry
+            }
+          }
+        }
+
+        // Method 3 (fallback if Methods 1 & 2 are not
+        // reliable): since both the directory cache and
+        // file contents cache seem to be stale, just
+        // advance the generation.
+        if (2 == method || (1 == method && lastGen == gen && retry)) {
+
+          method = 2;
+
+          if (genLookaheadCount < defaultGenLookaheadCount) {
+            gen++;
+            genLookaheadCount++;
+            message("look ahead increment gen to " + gen);
+          }
+        }
+
+        if (lastGen == gen) {
+
+          // This means we're about to try the same
+          // segments_N we last tried.  This is allowed,
+          // exactly once, because the writer could have been
+          // in the process of writing segments_N last time.
+
+          if (retry) {
+            // OK, we've tried the same segments_N file
+            // twice in a row, so this must be a real
+            // error.  We throw the original exception we
+            // got.
+            throw exc;
+          } else {
+            retry = true;
+          }
+
+        } else {
+          // Segment file has advanced since our last loop, so
+          // reset retry:
+          retry = false;
+        }
+
+        lastGen = gen;
+
+        segmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS,
+                                                                "",
+                                                                gen);
+
+        try {
+          Object v = doBody(segmentFileName);
+          if (exc != null) {
+            message("success on " + segmentFileName);
+          }
+          return v;
+        } catch (IOException err) {
+
+          // Save the original root cause:
+          if (exc == null) {
+            exc = err;
+          }
+
+          message("primary Exception on '" + segmentFileName + "': " + err + "'; will retry: retry=" + retry + "; gen = " + gen);
+
+          if (!retry && gen > 1) {
+
+            // This is our first time trying this segments
+            // file (because retry is false), and, there is
+            // possibly a segments_(N-1) (because gen > 1).
+ // So, check if the segments_(N-1) exists and + // try it if so: + String prevSegmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, + "", + gen-1); + + if (directory.fileExists(prevSegmentFileName)) { + message("fallback to prior segment file '" + prevSegmentFileName + "'"); + try { + Object v = doBody(prevSegmentFileName); + if (exc != null) { + message("success on fallback " + prevSegmentFileName); + } + return v; + } catch (IOException err2) { + message("secondary Exception on '" + prevSegmentFileName + "': " + err2 + "'; will retry"); + } + } + } + } } } - finally { - input.close(); - } - - if(format < 0) - return version; - // We cannot be sure about the format of the file. - // Therefore we have to read the whole file and cannot simply seek to the version entry. - - SegmentInfos sis = new SegmentInfos(); - sis.read(directory); - return sis.getVersion(); - } + /** + * Subclass must implement this. The assumption is an + * IOException will be thrown if something goes wrong + * during the processing that could have been caused by + * a writer committing. 
+ */ + protected abstract Object doBody(String segmentFileName) throws IOException;} } diff --git a/src/java/org/apache/lucene/index/SegmentReader.java b/src/java/org/apache/lucene/index/SegmentReader.java index 19b4edb0c96..091e789db86 100644 --- a/src/java/org/apache/lucene/index/SegmentReader.java +++ b/src/java/org/apache/lucene/index/SegmentReader.java @@ -33,6 +33,7 @@ import java.util.*; */ class SegmentReader extends IndexReader { private String segment; + private SegmentInfo si; FieldInfos fieldInfos; private FieldsReader fieldsReader; @@ -64,22 +65,24 @@ class SegmentReader extends IndexReader { private boolean dirty; private int number; - private void reWrite() throws IOException { + private void reWrite(SegmentInfo si) throws IOException { // NOTE: norms are re-written in regular directory, not cfs - IndexOutput out = directory().createOutput(segment + ".tmp"); + + String oldFileName = si.getNormFileName(this.number); + if (oldFileName != null) { + // Mark this file for deletion. 
Note that we don't
+      // actually try to delete it until the new segments file is
+      // successfully written:
+      deleter.addPendingFile(oldFileName);
+    }
+
+    si.advanceNormGen(this.number);
+    IndexOutput out = directory().createOutput(si.getNormFileName(this.number));
     try {
       out.writeBytes(bytes, maxDoc());
     } finally {
       out.close();
     }
-    String fileName;
-    if(cfsReader == null)
-      fileName = segment + ".f" + number;
-    else{
-      // use a different file name if we have compound format
-      fileName = segment + ".s" + number;
-    }
-    directory().renameFile(segment + ".tmp", fileName);
     this.dirty = false;
   }
 }
@@ -131,57 +134,94 @@ class SegmentReader extends IndexReader {
     return instance;
   }
 
-  private void initialize(SegmentInfo si) throws IOException {
+  private void initialize(SegmentInfo si) throws IOException {
     segment = si.name;
+    this.si = si;
 
-    // Use compound file directory for some files, if it exists
-    Directory cfsDir = directory();
-    if (directory().fileExists(segment + ".cfs")) {
-      cfsReader = new CompoundFileReader(directory(), segment + ".cfs");
-      cfsDir = cfsReader;
-    }
+    boolean success = false;
 
-    // No compound file exists - use the multi-file format
-    fieldInfos = new FieldInfos(cfsDir, segment + ".fnm");
-    fieldsReader = new FieldsReader(cfsDir, segment, fieldInfos);
+    try {
+      // Use compound file directory for some files, if it exists
+      Directory cfsDir = directory();
+      if (si.getUseCompoundFile()) {
+        cfsReader = new CompoundFileReader(directory(), segment + ".cfs");
+        cfsDir = cfsReader;
+      }
 
-    tis = new TermInfosReader(cfsDir, segment, fieldInfos);
+      // No compound file exists - use the multi-file format
+      fieldInfos = new FieldInfos(cfsDir, segment + ".fnm");
+      fieldsReader = new FieldsReader(cfsDir, segment, fieldInfos);
 
-    // NOTE: the bitvector is stored using the regular directory, not cfs
-    if (hasDeletions(si))
-      deletedDocs = new BitVector(directory(), segment + ".del");
+      tis = new TermInfosReader(cfsDir, segment, fieldInfos);
+
+      // NOTE: the
bitvector is stored using the regular directory, not cfs + if (hasDeletions(si)) { + deletedDocs = new BitVector(directory(), si.getDelFileName()); + } - // make sure that all index files have been read or are kept open - // so that if an index update removes them we'll still have them - freqStream = cfsDir.openInput(segment + ".frq"); - proxStream = cfsDir.openInput(segment + ".prx"); - openNorms(cfsDir); + // make sure that all index files have been read or are kept open + // so that if an index update removes them we'll still have them + freqStream = cfsDir.openInput(segment + ".frq"); + proxStream = cfsDir.openInput(segment + ".prx"); + openNorms(cfsDir); - if (fieldInfos.hasVectors()) { // open term vector files only as needed - termVectorsReaderOrig = new TermVectorsReader(cfsDir, segment, fieldInfos); + if (fieldInfos.hasVectors()) { // open term vector files only as needed + termVectorsReaderOrig = new TermVectorsReader(cfsDir, segment, fieldInfos); + } + success = true; + } finally { + + // With lock-less commits, it's entirely possible (and + // fine) to hit a FileNotFound exception above. In + // this case, we want to explicitly close any subset + // of things that were opened so that we don't have to + // wait for a GC to do so. + if (!success) { + doClose(); + } } } - protected void finalize() { + protected void finalize() { // patch for pre-1.4.2 JVMs, whose ThreadLocals leak termVectorsLocal.set(null); super.finalize(); - } + } protected void doCommit() throws IOException { if (deletedDocsDirty) { // re-write deleted - deletedDocs.write(directory(), segment + ".tmp"); - directory().renameFile(segment + ".tmp", segment + ".del"); + String oldDelFileName = si.getDelFileName(); + if (oldDelFileName != null) { + // Mark this file for deletion. 
Note that we don't
+        // actually try to delete it until the new segments file is
+        // successfully written:
+        deleter.addPendingFile(oldDelFileName);
+      }
+
+      si.advanceDelGen();
+
+      // We can write directly to the actual name (vs to a
+      // .tmp & renaming it) because the file is not live
+      // until segments file is written:
+      deletedDocs.write(directory(), si.getDelFileName());
     }
-    if(undeleteAll && directory().fileExists(segment + ".del")){
-      directory().deleteFile(segment + ".del");
+    if (undeleteAll && si.hasDeletions()) {
+      String oldDelFileName = si.getDelFileName();
+      if (oldDelFileName != null) {
+        // Mark this file for deletion.  Note that we don't
+        // actually try to delete it until the new segments file is
+        // successfully written:
+        deleter.addPendingFile(oldDelFileName);
+      }
+      si.clearDelGen();
     }
     if (normsDirty) {               // re-write norms
+      si.setNumField(fieldInfos.size());
       Enumeration values = norms.elements();
       while (values.hasMoreElements()) {
         Norm norm = (Norm) values.nextElement();
         if (norm.dirty) {
-          norm.reWrite();
+          norm.reWrite(si);
         }
       }
     }
@@ -191,8 +231,12 @@ class SegmentReader extends IndexReader {
   }
 
   protected void doClose() throws IOException {
-    fieldsReader.close();
-    tis.close();
+    if (fieldsReader != null) {
+      fieldsReader.close();
+    }
+    if (tis != null) {
+      tis.close();
+    }
 
     if (freqStream != null)
       freqStream.close();
@@ -209,27 +253,19 @@ class SegmentReader extends IndexReader {
   }
 
   static boolean hasDeletions(SegmentInfo si) throws IOException {
-    return si.dir.fileExists(si.name + ".del");
+    return si.hasDeletions();
   }
 
   public boolean hasDeletions() {
     return deletedDocs != null;
   }
 
-
   static boolean usesCompoundFile(SegmentInfo si) throws IOException {
-    return si.dir.fileExists(si.name + ".cfs");
+    return si.getUseCompoundFile();
  }
 
   static boolean hasSeparateNorms(SegmentInfo si) throws IOException {
-    String[] result = si.dir.list();
-    String pattern = si.name + ".s";
-    int patternLength = pattern.length();
-    for(int i = 0; i <
result.length; i++){ - if(result[i].startsWith(pattern) && Character.isDigit(result[i].charAt(patternLength))) - return true; - } - return false; + return si.hasSeparateNorms(); } protected void doDelete(int docNum) { @@ -249,23 +285,27 @@ class SegmentReader extends IndexReader { Vector files() throws IOException { Vector files = new Vector(16); - for (int i = 0; i < IndexFileNames.INDEX_EXTENSIONS.length; i++) { - String name = segment + "." + IndexFileNames.INDEX_EXTENSIONS[i]; - if (directory().fileExists(name)) + if (si.getUseCompoundFile()) { + String name = segment + ".cfs"; + if (directory().fileExists(name)) { files.addElement(name); + } + } else { + for (int i = 0; i < IndexFileNames.INDEX_EXTENSIONS.length; i++) { + String name = segment + "." + IndexFileNames.INDEX_EXTENSIONS[i]; + if (directory().fileExists(name)) + files.addElement(name); + } + } + + if (si.hasDeletions()) { + files.addElement(si.getDelFileName()); } for (int i = 0; i < fieldInfos.size(); i++) { - FieldInfo fi = fieldInfos.fieldInfo(i); - if (fi.isIndexed && !fi.omitNorms){ - String name; - if(cfsReader == null) - name = segment + ".f" + i; - else - name = segment + ".s" + i; - if (directory().fileExists(name)) + String name = si.getNormFileName(i); + if (name != null && directory().fileExists(name)) files.addElement(name); - } } return files; } @@ -380,7 +420,6 @@ class SegmentReader extends IndexReader { protected synchronized byte[] getNorms(String field) throws IOException { Norm norm = (Norm) norms.get(field); if (norm == null) return null; // not indexed, or norms not stored - if (norm.bytes == null) { // value not yet read byte[] bytes = new byte[maxDoc()]; norms(field, bytes, 0); @@ -436,12 +475,10 @@ class SegmentReader extends IndexReader { for (int i = 0; i < fieldInfos.size(); i++) { FieldInfo fi = fieldInfos.fieldInfo(i); if (fi.isIndexed && !fi.omitNorms) { - // look first if there are separate norms in compound format - String fileName = segment + ".s" + fi.number; 
Directory d = directory(); - if(!d.fileExists(fileName)){ - fileName = segment + ".f" + fi.number; - d = cfsDir; + String fileName = si.getNormFileName(fi.number); + if (!si.hasSeparateNorms(fi.number)) { + d = cfsDir; } norms.put(fi.name, new Norm(d.openInput(fileName), fi.number)); } diff --git a/src/java/org/apache/lucene/store/FSDirectory.java b/src/java/org/apache/lucene/store/FSDirectory.java index 5ccec1a4428..62345015cae 100644 --- a/src/java/org/apache/lucene/store/FSDirectory.java +++ b/src/java/org/apache/lucene/store/FSDirectory.java @@ -128,7 +128,7 @@ public class FSDirectory extends Directory { * @return the FSDirectory for the named file. */ public static FSDirectory getDirectory(String path, boolean create) throws IOException { - return getDirectory(path, create, null); + return getDirectory(new File(path), create, null, true); } /** Returns the directory instance for the named location, using the @@ -143,10 +143,16 @@ public class FSDirectory extends Directory { * @param lockFactory instance of {@link LockFactory} providing the * locking implementation. * @return the FSDirectory for the named file. */ + public static FSDirectory getDirectory(String path, boolean create, + LockFactory lockFactory, boolean doRemoveOldFiles) + throws IOException { + return getDirectory(new File(path), create, lockFactory, doRemoveOldFiles); + } + public static FSDirectory getDirectory(String path, boolean create, LockFactory lockFactory) throws IOException { - return getDirectory(new File(path), create, lockFactory); + return getDirectory(new File(path), create, lockFactory, true); } /** Returns the directory instance for the named location. @@ -158,9 +164,9 @@ public class FSDirectory extends Directory { * @param file the path to the directory. * @param create if true, create, or erase any existing contents. * @return the FSDirectory for the named file. 
*/ - public static FSDirectory getDirectory(File file, boolean create) + public static FSDirectory getDirectory(File file, boolean create, boolean doRemoveOldFiles) throws IOException { - return getDirectory(file, create, null); + return getDirectory(file, create, null, doRemoveOldFiles); } /** Returns the directory instance for the named location, using the @@ -176,7 +182,7 @@ public class FSDirectory extends Directory { * locking implementation. * @return the FSDirectory for the named file. */ public static FSDirectory getDirectory(File file, boolean create, - LockFactory lockFactory) + LockFactory lockFactory, boolean doRemoveOldFiles) throws IOException { file = new File(file.getCanonicalPath()); FSDirectory dir; @@ -188,7 +194,7 @@ public class FSDirectory extends Directory { } catch (Exception e) { throw new RuntimeException("cannot load FSDirectory class: " + e.toString(), e); } - dir.init(file, create, lockFactory); + dir.init(file, create, lockFactory, doRemoveOldFiles); DIRECTORIES.put(file, dir); } else { @@ -199,7 +205,7 @@ public class FSDirectory extends Directory { } if (create) { - dir.create(); + dir.create(doRemoveOldFiles); } } } @@ -209,23 +215,35 @@ public class FSDirectory extends Directory { return dir; } + public static FSDirectory getDirectory(File file, boolean create, + LockFactory lockFactory) + throws IOException + { + return getDirectory(file, create, lockFactory, true); + } + + public static FSDirectory getDirectory(File file, boolean create) + throws IOException { + return getDirectory(file, create, true); + } + private File directory = null; private int refCount; protected FSDirectory() {}; // permit subclassing - private void init(File path, boolean create) throws IOException { + private void init(File path, boolean create, boolean doRemoveOldFiles) throws IOException { directory = path; if (create) { - create(); + create(doRemoveOldFiles); } if (!directory.isDirectory()) throw new IOException(path + " not a directory"); } - 
private void init(File path, boolean create, LockFactory lockFactory) throws IOException { + private void init(File path, boolean create, LockFactory lockFactory, boolean doRemoveOldFiles) throws IOException { // Set up lockFactory with cascaded defaults: if an instance was passed in, // use that; else if locks are disabled, use NoLockFactory; else if the @@ -280,10 +298,10 @@ public class FSDirectory extends Directory { setLockFactory(lockFactory); - init(path, create); + init(path, create, doRemoveOldFiles); } - private synchronized void create() throws IOException { + private synchronized void create(boolean doRemoveOldFiles) throws IOException { if (!directory.exists()) if (!directory.mkdirs()) throw new IOException("Cannot create directory: " + directory); @@ -291,13 +309,15 @@ public class FSDirectory extends Directory { if (!directory.isDirectory()) throw new IOException(directory + " not a directory"); - String[] files = directory.list(new IndexFileNameFilter()); // clear old files - if (files == null) - throw new IOException("Cannot read directory " + directory.getAbsolutePath()); - for (int i = 0; i < files.length; i++) { - File file = new File(directory, files[i]); - if (!file.delete()) - throw new IOException("Cannot delete " + file); + if (doRemoveOldFiles) { + String[] files = directory.list(IndexFileNameFilter.getFilter()); // clear old files + if (files == null) + throw new IOException("Cannot read directory " + directory.getAbsolutePath()); + for (int i = 0; i < files.length; i++) { + File file = new File(directory, files[i]); + if (!file.delete()) + throw new IOException("Cannot delete " + file); + } } lockFactory.clearAllLocks(); @@ -305,7 +325,7 @@ public class FSDirectory extends Directory { /** Returns an array of strings, one for each Lucene index file in the directory. 
*/ public String[] list() { - return directory.list(new IndexFileNameFilter()); + return directory.list(IndexFileNameFilter.getFilter()); } /** Returns true iff a file with the given name exists. */ diff --git a/src/java/org/apache/lucene/store/RAMDirectory.java b/src/java/org/apache/lucene/store/RAMDirectory.java index 97b4ba2d42f..9d5aa94ca37 100644 --- a/src/java/org/apache/lucene/store/RAMDirectory.java +++ b/src/java/org/apache/lucene/store/RAMDirectory.java @@ -18,6 +18,7 @@ package org.apache.lucene.store; */ import java.io.IOException; +import java.io.FileNotFoundException; import java.io.File; import java.io.Serializable; import java.util.Hashtable; @@ -105,7 +106,7 @@ public final class RAMDirectory extends Directory implements Serializable { } /** Returns an array of strings, one for each file in the directory. */ - public final String[] list() { + public synchronized final String[] list() { String[] result = new String[files.size()]; int i = 0; Enumeration names = files.keys(); @@ -129,7 +130,7 @@ public final class RAMDirectory extends Directory implements Serializable { /** Set the modified time of an existing file to now. */ public void touchFile(String name) { // final boolean MONITOR = false; - + RAMFile file = (RAMFile)files.get(name); long ts2, ts1 = System.currentTimeMillis(); do { @@ -175,8 +176,11 @@ public final class RAMDirectory extends Directory implements Serializable { } /** Returns a stream reading an existing file. 
*/ - public final IndexInput openInput(String name) { + public final IndexInput openInput(String name) throws IOException { RAMFile file = (RAMFile)files.get(name); + if (file == null) { + throw new FileNotFoundException(name); + } return new RAMInputStream(file); } diff --git a/src/test/org/apache/lucene/index/TestIndexReader.java b/src/test/org/apache/lucene/index/TestIndexReader.java index 4144a975c70..efe6ce0d930 100644 --- a/src/test/org/apache/lucene/index/TestIndexReader.java +++ b/src/test/org/apache/lucene/index/TestIndexReader.java @@ -32,6 +32,7 @@ import org.apache.lucene.document.Field; import java.util.Collection; import java.io.IOException; +import java.io.FileNotFoundException; import java.io.File; public class TestIndexReader extends TestCase @@ -222,6 +223,11 @@ public class TestIndexReader extends TestCase assertEquals("deleted count", 100, deleted); assertEquals("deleted docFreq", 100, reader.docFreq(searchTerm)); assertTermDocsCount("deleted termDocs", reader, searchTerm, 0); + + // open a 2nd reader to make sure first reader can + // commit its changes (.del) while second reader + // is open: + IndexReader reader2 = IndexReader.open(dir); reader.close(); // CREATE A NEW READER and re-test @@ -231,10 +237,73 @@ public class TestIndexReader extends TestCase reader.close(); } + // Make sure you can set norms & commit even if a reader + // is open against the index: + public void testWritingNorms() throws IOException + { + String tempDir = System.getProperty("tempDir"); + if (tempDir == null) + throw new IOException("tempDir undefined, cannot run test"); + + File indexDir = new File(tempDir, "lucenetestnormwriter"); + Directory dir = FSDirectory.getDirectory(indexDir, true); + IndexWriter writer = null; + IndexReader reader = null; + Term searchTerm = new Term("content", "aaa"); + + // add 1 documents with term : aaa + writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true); + addDoc(writer, searchTerm.text()); + writer.close(); + + // now 
open reader & set norm for doc 0 + reader = IndexReader.open(dir); + reader.setNorm(0, "content", (float) 2.0); + + // we should be holding the write lock now: + assertTrue("locked", IndexReader.isLocked(dir)); + + reader.commit(); + + // we should not be holding the write lock now: + assertTrue("not locked", !IndexReader.isLocked(dir)); + + // open a 2nd reader: + IndexReader reader2 = IndexReader.open(dir); + + // set norm again for doc 0 + reader.setNorm(0, "content", (float) 3.0); + assertTrue("locked", IndexReader.isLocked(dir)); + + reader.close(); + + // we should not be holding the write lock now: + assertTrue("not locked", !IndexReader.isLocked(dir)); + + reader2.close(); + dir.close(); + + rmDir(indexDir); + } + public void testDeleteReaderWriterConflictUnoptimized() throws IOException{ deleteReaderWriterConflict(false); } + + public void testOpenEmptyDirectory() throws IOException{ + String dirName = "test.empty"; + File fileDirName = new File(dirName); + if (!fileDirName.exists()) { + fileDirName.mkdir(); + } + try { + IndexReader reader = IndexReader.open(fileDirName); + fail("opening IndexReader on empty directory failed to produce FileNotFoundException"); + } catch (FileNotFoundException e) { + // GOOD + } + } public void testDeleteReaderWriterConflictOptimized() throws IOException{ deleteReaderWriterConflict(true); @@ -368,12 +437,36 @@ public class TestIndexReader extends TestCase assertFalse(IndexReader.isLocked(dir)); // reader only, no lock long version = IndexReader.lastModified(dir); reader.close(); - // modify index and check version has been incremented: + // modify index and check version has been + // incremented: writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true); addDocumentWithFields(writer); writer.close(); reader = IndexReader.open(dir); - assertTrue(version < IndexReader.getCurrentVersion(dir)); + assertTrue("old lastModified is " + version + "; new lastModified is " + IndexReader.lastModified(dir), version <= 
IndexReader.lastModified(dir)); + reader.close(); + } + + public void testVersion() throws IOException { + assertFalse(IndexReader.indexExists("there_is_no_such_index")); + Directory dir = new RAMDirectory(); + assertFalse(IndexReader.indexExists(dir)); + IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true); + addDocumentWithFields(writer); + assertTrue(IndexReader.isLocked(dir)); // writer open, so dir is locked + writer.close(); + assertTrue(IndexReader.indexExists(dir)); + IndexReader reader = IndexReader.open(dir); + assertFalse(IndexReader.isLocked(dir)); // reader only, no lock + long version = IndexReader.getCurrentVersion(dir); + reader.close(); + // modify index and check version has been + // incremented: + writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true); + addDocumentWithFields(writer); + writer.close(); + reader = IndexReader.open(dir); + assertTrue("old version is " + version + "; new version is " + IndexReader.getCurrentVersion(dir), version < IndexReader.getCurrentVersion(dir)); reader.close(); } @@ -412,6 +505,40 @@ public class TestIndexReader extends TestCase reader.close(); } + public void testUndeleteAllAfterClose() throws IOException { + Directory dir = new RAMDirectory(); + IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true); + addDocumentWithFields(writer); + addDocumentWithFields(writer); + writer.close(); + IndexReader reader = IndexReader.open(dir); + reader.deleteDocument(0); + reader.deleteDocument(1); + reader.close(); + reader = IndexReader.open(dir); + reader.undeleteAll(); + assertEquals(2, reader.numDocs()); // nothing has really been deleted thanks to undeleteAll() + reader.close(); + } + + public void testUndeleteAllAfterCloseThenReopen() throws IOException { + Directory dir = new RAMDirectory(); + IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true); + addDocumentWithFields(writer); + addDocumentWithFields(writer); + writer.close(); + IndexReader reader 
= IndexReader.open(dir); + reader.deleteDocument(0); + reader.deleteDocument(1); + reader.close(); + reader = IndexReader.open(dir); + reader.undeleteAll(); + reader.close(); + reader = IndexReader.open(dir); + assertEquals(2, reader.numDocs()); // nothing has really been deleted thanks to undeleteAll() + reader.close(); + } + public void testDeleteReaderReaderConflictUnoptimized() throws IOException{ deleteReaderReaderConflict(false); } @@ -562,4 +689,11 @@ public class TestIndexReader extends TestCase doc.add(new Field("content", value, Field.Store.NO, Field.Index.TOKENIZED)); writer.addDocument(doc); } + private void rmDir(File dir) { + File[] files = dir.listFiles(); + for (int i = 0; i < files.length; i++) { + files[i].delete(); + } + dir.delete(); + } } diff --git a/src/test/org/apache/lucene/index/TestIndexWriter.java b/src/test/org/apache/lucene/index/TestIndexWriter.java index e167b695beb..6ee6c14804f 100644 --- a/src/test/org/apache/lucene/index/TestIndexWriter.java +++ b/src/test/org/apache/lucene/index/TestIndexWriter.java @@ -1,6 +1,7 @@ package org.apache.lucene.index; import java.io.IOException; +import java.io.File; import junit.framework.TestCase; @@ -10,7 +11,10 @@ import org.apache.lucene.document.Field; import org.apache.lucene.index.IndexReader; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FSDirectory; import org.apache.lucene.store.RAMDirectory; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.store.IndexOutput; /** @@ -28,14 +32,11 @@ public class TestIndexWriter extends TestCase int i; IndexWriter.setDefaultWriteLockTimeout(2000); - IndexWriter.setDefaultCommitLockTimeout(2000); assertEquals(2000, IndexWriter.getDefaultWriteLockTimeout()); - assertEquals(2000, IndexWriter.getDefaultCommitLockTimeout()); writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true); IndexWriter.setDefaultWriteLockTimeout(1000); - 
IndexWriter.setDefaultCommitLockTimeout(1000); // add 100 documents for (i = 0; i < 100; i++) { @@ -72,6 +73,12 @@ public class TestIndexWriter extends TestCase assertEquals(60, reader.maxDoc()); assertEquals(60, reader.numDocs()); reader.close(); + + // make sure opening a new index for create over + // this existing one works correctly: + writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true); + assertEquals(0, writer.docCount()); + writer.close(); } private void addDoc(IndexWriter writer) throws IOException @@ -80,4 +87,192 @@ public class TestIndexWriter extends TestCase doc.add(new Field("content", "aaa", Field.Store.NO, Field.Index.TOKENIZED)); writer.addDocument(doc); } + + // Make sure we can open an index for create even when a + // reader holds it open (this fails pre lock-less + // commits on windows): + public void testCreateWithReader() throws IOException { + String tempDir = System.getProperty("java.io.tmpdir"); + if (tempDir == null) + throw new IOException("java.io.tmpdir undefined, cannot run test"); + File indexDir = new File(tempDir, "lucenetestindexwriter"); + Directory dir = FSDirectory.getDirectory(indexDir, true); + + // add one document & close writer + IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true); + addDoc(writer); + writer.close(); + + // now open reader: + IndexReader reader = IndexReader.open(dir); + assertEquals("should be one document", reader.numDocs(), 1); + + // now open index for create: + writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true); + assertEquals("should be zero documents", writer.docCount(), 0); + addDoc(writer); + writer.close(); + + assertEquals("should be one document", reader.numDocs(), 1); + IndexReader reader2 = IndexReader.open(dir); + assertEquals("should be one document", reader2.numDocs(), 1); + reader.close(); + reader2.close(); + rmDir(indexDir); + } + + // Simulate a writer that crashed while writing segments + // file: make sure we can still open the index (ie, 
+ // gracefully fallback to the previous segments file), + // and that we can add to the index: + public void testSimulatedCrashedWriter() throws IOException { + Directory dir = new RAMDirectory(); + + IndexWriter writer = null; + + writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true); + + // add 100 documents + for (int i = 0; i < 100; i++) { + addDoc(writer); + } + + // close + writer.close(); + + long gen = SegmentInfos.getCurrentSegmentGeneration(dir); + assertTrue("segment generation should be > 1 but got " + gen, gen > 1); + + // Make the next segments file, with last byte + // missing, to simulate a writer that crashed while + // writing segments file: + String fileNameIn = SegmentInfos.getCurrentSegmentFileName(dir); + String fileNameOut = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, + "", + 1+gen); + IndexInput in = dir.openInput(fileNameIn); + IndexOutput out = dir.createOutput(fileNameOut); + long length = in.length(); + for(int i=0;i 1 but got " + gen, gen > 1); + + String fileNameIn = SegmentInfos.getCurrentSegmentFileName(dir); + String fileNameOut = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, + "", + 1+gen); + IndexInput in = dir.openInput(fileNameIn); + IndexOutput out = dir.createOutput(fileNameOut); + long length = in.length(); + for(int i=0;i 1 but got " + gen, gen > 1); + + String[] files = dir.list(); + for(int i=0;iT|eOTONJ)080Af7{;+JD;)E_suLn~C-2@A!J(jC5 z|B=l9CE!W+vn1iusXCtBPD+1mmq;&rI^~X6=pT9J+{Ljays_@)7ko>&@2UwsDf}T5 z+b8EfKly;+6N$neOSQ~db|1}CpHG>)N#JJgsZY1A-}<@cs{+Rr>6@vyPqE4O6ddBx zKU1J>ef{I{Jq!WBFyI3Ent`Do7zQp!ch`b=APfzIl++w>7~}%|55ll8fU06_02_e_ zMy?gw&o2Wd@56%85Kk~#@*Oe|aJB7ZP;H3d*3#1Aa=EdSal3;c_zYn}6kbh{3wPW_+u`+HK42Tm%kdbL&TB3D)7-Q0!Yi)Y_uta_Zuuyjra&s>S# z!vecR?mEBMEZ)Iey`eHJZCl@?N|DK;Y%brdQaJnN`-oE?v-N)~WPoGr$-cdbA<)rZU332LsyS!em_jJr(9sl@Edd}~C>tpsl z4Ue7fVV`|Tu!PmtG2TFchgJFDjVmAB<3PcQ2#c%T>n5Otg%La~a)3bz!m!|M?ZXHQ zmc;$k4U5t>XL4kZe%4Ua=IMA-e!?$xPR_o7RV)5`Hd}7sxpu(n2E%NHf3Cb8Q=V?! 
zX$=gG>AR2LeLiin^E<)s7M_#T-zo0(s}h}hDSmR(y|DeqZl1sKtu}sV)#>%S^d{C_ zJ0grbEI>(1FE4|zZp+SyxVfb%B&4LJK2JHodSihJ{}i#NOF}(8JU++R z5-th&NHuyU8)Sa^|Nnn|z|Tw7e&C-$juH=tU6b1O^Z=Zc@gDm0pvi`@{t$h0x($82%=!kP*w(JNJ$BD zAIJbuA;G`^G61!3K{kK|SX}}Q0Jq_Q27o*pS^+Zv;R4j61=#?1kO9yZ93BHu^CPkW z!Aw8{pv@i}24E&)hyg&W8J0B0GlLBPwH1K+Az_W_1Jrbk?0`8S{m|9|h64~`h?+`} z4X6iJ$VhcDh8d7x#7rs3X6y%=fvAsB{J{!p6ag!24h9}zP|1OEGBkn_?MR|i2&9t& zGXUZ{%yvDp%g|B?q=gSN0O10nQwXXH5C#yPLLiM|R2N{jA0b}AmO^m(0Cx(Ri<&}E z9e@Z!+$p31BZXjd1ZEozIUWw6rw~*lAkM&Shanpw4@^7oRv9pjfm&sd+zv@&oWN$K MA25y08I07;vahX4Qo literal 0 HcmV?d00001 diff --git a/src/test/org/apache/lucene/index/index.prelockless.nocfs.zip b/src/test/org/apache/lucene/index/index.prelockless.nocfs.zip new file mode 100644 index 0000000000000000000000000000000000000000..881f212a13da6711e0362cdb23d6d5c16640115c GIT binary patch literal 11147 zcmb`Nd2Ccw6o+SNr_<63ShfntKme=2FbvXIBrGk|u%r%#5{N_xI#ZkkT3QOBiqHa9 zWfPEy>H%K=Dc~+w&ulDd%kz? zeZPC|H%n-`J0r(|KU+d`#!No*_1pLxULHqJ$Lj%qB)X_#R<^^vr2qIScvY;(aNKg= za`V4(HabFp=mEca9UlPVld=ONGx`6_WQeLjoKOwbco~F42b-KWibvTkes}Y#V*}vM zLAdzft0MJDt1Bas@blq}C!2lWdnfl<>&@>rKWET|iR+5rJL>4ZT^ZDX;+y~V1!hR( zu1L-ePjns*?dme^erc?G;fU=0zwbF7Y`M~S>8m)CEi|sugdKO!J^?(ujImZPPRev; zRn}D3N2}|du2{q=FLqu-zml$iz~whEb+t(`m62v=S=H?2<<+mPnlxXT;-=k~yW&F= zI7{lNDW1wus%LIJKCL%Y2;NAc+KOtg+aY$oxAw+=8M0~ zZ)n(36^?p}YS;VywHsdgWto4N%Xh(b58DOhGiF4t zyMjA&^ZFE=Jkh5hui*44Z%ILEVZJw-v-g6prFh-3#?W@3OV}sat^L&@rw8taZ6)4n zgGOrAW_{@3p(eFj%RE%G*w#Rk#*1jwc(HRW}7m$$kduGFnMfMCe>^aqcvN=@Yt#tve_a>Yqr5eY&A|;YPMLdHQQh!wrX|C zW{bSpyC2m^u|*z>Ex}V`yD`^JY}wg|cl57*aULEv>;lOcZNyf+;Y$3$PHkPBJhh2p z<|cUOoMVGa^5M=;!xcZM;LcX5EfQ0!Y`#CvFKforyHI4=2&&LhaY$M=VMw;KX)eo# zv0volZ+ICqBeLmq!;jOQnU!sxfjE-oZ>;~#5I2r6x~ckHD#MMUxnZ>y%h4T+<>?=` zt{V=MnPg5VMJ&T^vzvx9gF81uH=R4TKvwJPzznznUfJNP;JVTJ`3n|B zJ1Yn&diG{PM+FCG1GV~w$OO>*74L*8jmasjn`OWPz6;(jMh<2jaM;x zHqB+R(0CP-OyhAR78^*y!1342W_qK>U}_V1&zo0fB#u_ zfYf-k9H-4Re(sl7NsV{9db`cXD(iS2jv)Ti+Jyfy-ek{oxbbEcD{y=WFcU9(4Ffo# 
zw6&>}0r0$ldOV=gt8e7>u=yT#j0Ok_P8%zYm zH~^0V4PY*ztk@#pC*OaNfswi$LmwXNS0v{aNYpGTQJ%R*on~qcoP6;(C!#j zaX`Ccbc6xgAEP@AkSa#9;EIbl4&MKPHHg2_qAjAimFO59K>+v^i&6UB>^84wF{0{a zYZWI@2ychu{dtTi03D+y1HfV=1MI};G6(P&$pCW+>2B07i%o&Yhyu{vD8c|nj3@xz zjrK8s5hDs98>jdb?im2qjbwnih_XO30CZ9~)FNJ31NN>k#4!E(y}aH!MzThV6ZnY&HRW$zw4C`cSLofTS@pmr(XaaRetgAVrL1TNDA07@++zDu!XgLrfJT z*$~AMyl()kV1&C!dtr9gD|(FY=9!e-^C%ktt%li@v}F_Gw{z%SgTF&hlaSZ z2ZV2n+crP;)d~Z^My7wAf{l!A650IN7c0`KZ-n8Zc(F}F)!Q~d3Sis(WB^-2{i_n< zvr)`X2AE4o_gcmLC;;7S74xG2bgxy+j{?{>KMFt}*|PbiUoV>FjWjb!sn*3Gbk971{h`0E-b$f$m2AW%Hu|bc~|1`B4Bm zMjspE24CKSqmBa5=6Bx!up?Urn2Si8UsN)`^y~LY<|o&i&ww_+YqI%K0NVV9aqr^h zCj-nCpkq|W0o?p30BwFJIe?p=3^3P#HorCoFwBnv&@mbd^AyaF0?_8S-T+v~h&I2w z0Fcd(>PYfO_@Z5mRLxIDuxF`%OfMof10wOEJ!~la(4M_L$K0_HzCAJ&KratOcl;l^ C3FFNG literal 0 HcmV?d00001 diff --git a/src/test/org/apache/lucene/store/TestLockFactory.java b/src/test/org/apache/lucene/store/TestLockFactory.java index 8494f23d955..d4a5ae4f9c7 100755 --- a/src/test/org/apache/lucene/store/TestLockFactory.java +++ b/src/test/org/apache/lucene/store/TestLockFactory.java @@ -58,9 +58,9 @@ public class TestLockFactory extends TestCase { // Both write lock and commit lock should have been created: assertEquals("# of unique locks created (after instantiating IndexWriter)", - 2, lf.locksCreated.size()); - assertTrue("# calls to makeLock <= 2 (after instantiating IndexWriter)", - lf.makeLockCount > 2); + 1, lf.locksCreated.size()); + assertTrue("# calls to makeLock is 0 (after instantiating IndexWriter)", + lf.makeLockCount >= 1); for(Enumeration e = lf.locksCreated.keys(); e.hasMoreElements();) { String lockName = (String) e.nextElement(); @@ -90,6 +90,7 @@ public class TestLockFactory extends TestCase { try { writer2 = new IndexWriter(dir, new WhitespaceAnalyzer(), false); } catch (Exception e) { + e.printStackTrace(System.out); fail("Should not have hit an IOException with no locking"); } @@ -234,6 +235,7 @@ public class TestLockFactory extends TestCase { try { 
writer2 = new IndexWriter(indexDirName, new WhitespaceAnalyzer(), false); } catch (IOException e) { + e.printStackTrace(System.out); fail("Should not have hit an IOException with locking disabled"); } @@ -266,6 +268,7 @@ public class TestLockFactory extends TestCase { try { fs2 = FSDirectory.getDirectory(indexDirName, true, lf); } catch (IOException e) { + e.printStackTrace(System.out); fail("Should not have hit an IOException because LockFactory instances are the same"); } @@ -294,7 +297,6 @@ public class TestLockFactory extends TestCase { public void _testStressLocks(LockFactory lockFactory, String indexDirName) throws IOException { FSDirectory fs1 = FSDirectory.getDirectory(indexDirName, true, lockFactory); - // fs1.setLockFactory(NoLockFactory.getNoLockFactory()); // First create a 1 doc index: IndexWriter w = new IndexWriter(fs1, new WhitespaceAnalyzer(), true); @@ -405,6 +407,7 @@ public class TestLockFactory extends TestCase { hitException = true; System.out.println("Stress Test Index Writer: creation hit unexpected exception: " + e.toString()); e.printStackTrace(System.out); + break; } if (writer != null) { try { @@ -413,6 +416,7 @@ public class TestLockFactory extends TestCase { hitException = true; System.out.println("Stress Test Index Writer: addDoc hit unexpected exception: " + e.toString()); e.printStackTrace(System.out); + break; } try { writer.close(); @@ -420,6 +424,7 @@ public class TestLockFactory extends TestCase { hitException = true; System.out.println("Stress Test Index Writer: close hit unexpected exception: " + e.toString()); e.printStackTrace(System.out); + break; } writer = null; } @@ -446,6 +451,7 @@ public class TestLockFactory extends TestCase { hitException = true; System.out.println("Stress Test Index Searcher: create hit unexpected exception: " + e.toString()); e.printStackTrace(System.out); + break; } if (searcher != null) { Hits hits = null; @@ -455,6 +461,7 @@ public class TestLockFactory extends TestCase { hitException = true; 
System.out.println("Stress Test Index Searcher: search hit unexpected exception: " + e.toString()); e.printStackTrace(System.out); + break; } // System.out.println(hits.length() + " total results"); try { @@ -463,6 +470,7 @@ public class TestLockFactory extends TestCase { hitException = true; System.out.println("Stress Test Index Searcher: close hit unexpected exception: " + e.toString()); e.printStackTrace(System.out); + break; } searcher = null; } diff --git a/xdocs/fileformats.xml b/xdocs/fileformats.xml index c3a1f34c1fa..ad7aa8c7588 100644 --- a/xdocs/fileformats.xml +++ b/xdocs/fileformats.xml @@ -14,7 +14,7 @@

This document defines the index file formats used - in Lucene version 2.0. If you are using a different + in Lucene version 2.1. If you are using a different version of Lucene, please consult the copy of docs/fileformats.html that was distributed with the version you are using. @@ -43,6 +43,18 @@ describing how file formats have changed from prior versions.

+

+ In version 2.1, the file format was changed to allow + lock-less commits (ie, no more commit lock). The + change is fully backwards compatible: you can open a + pre-2.1 index for searching or adding/deleting of + docs. When the new segments file is saved + (committed), it will be written in the new file format + (meaning no specific "upgrade" process is needed). + But note that once a commit has occurred, pre-2.1 + Lucene will not be able to read the index. +

+
@@ -260,6 +272,18 @@ required.

+

+ As of version 2.1 (lock-less commits), file names are + never re-used (there is one exception, "segments.gen", + see below). That is, when any file is saved to the + Directory it is given a never before used filename. + This is achieved using a simple generations approach. + For example, the first segments file is segments_1, + then segments_2, etc. The generation is a sequential + long integer represented in alpha-numeric (base 36) + form. +
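The generation-to-filename scheme above can be sketched in a few lines of Java. This is an illustrative stand-in for `IndexFileNames.fileNameFromGeneration` (whose internals may differ); the base-36 rendering corresponds to `Long.toString(gen, Character.MAX_RADIX)`:

```java
// Sketch of how lock-less commit file names encode the generation.
// The real implementation lives in org.apache.lucene.index.IndexFileNames.
public class FileNameGeneration {
    static String fileNameFromGeneration(String base, String extension, long gen) {
        if (gen == 0) {
            return base + extension;  // pre-lockless name, e.g. "segments"
        }
        // Generation is a sequential long written in base 36:
        return base + "_" + Long.toString(gen, Character.MAX_RADIX) + extension;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFromGeneration("segments", "", 1));   // segments_1
        System.out.println(fileNameFromGeneration("segments", "", 36));  // segments_10
    }
}
```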

+
@@ -696,22 +720,48 @@

The active segments in the index are stored in the
- segment info file. An index only has
- a single file in this format, and it is named "segments".
- This lists each segment by name, and also contains the size of each
- segment.
+ segment info file, segments_N. There may
+ be one or more segments_N files in the
+ index; however, the one with the largest
+ generation is the active one (when older
+ segments_N files are present it's because they
+ temporarily cannot be deleted, or a writer is in
+ the process of committing). This file lists each
+ segment by name, has details about the separate
+ norms and deletion files, and also contains the
+ size of each segment.
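The "largest generation wins" rule might be sketched as follows. The helper names here are hypothetical, and the sketch ignores the segments.gen fallback and all error handling:

```java
public class LatestCommit {
    // Parse the base-36 generation out of a "segments_N" file name;
    // plain "segments" (a pre-2.1 index) is treated as generation 0.
    static long generationFromName(String name) {
        if (name.equals("segments")) return 0;
        return Long.parseLong(name.substring("segments_".length()),
                              Character.MAX_RADIX);
    }

    // Scan a directory listing and pick the segments file with the
    // largest generation -- that is the active commit point.
    static String latestSegmentsFile(String[] files) {
        String latest = null;
        long maxGen = -1;
        for (String f : files) {
            // "segments.gen" starts with "segments." not "segments_",
            // so it is skipped by this check:
            if (f.equals("segments") || f.startsWith("segments_")) {
                long gen = generationFromName(f);
                if (gen > maxGen) { maxGen = gen; latest = f; }
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        String[] listing = { "_0.cfs", "segments_1", "segments_2", "segments.gen" };
        System.out.println(latestSegmentsFile(listing));  // segments_2
    }
}
```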

+

+ As of 2.1, there is also a file
+ segments.gen. This file contains the
+ current generation (the _N in
+ segments_N) of the index. This is
+ used only as a fallback in case the current
+ generation cannot be accurately determined by
+ directory listing alone (as is the case for some
+ NFS clients with time-based directory cache
+ expiration). This file simply contains an Int32
+ version header (SegmentInfos.FORMAT_LOCKLESS =
+ -2), followed by the generation recorded as Int64,
+ written twice.
+
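A sketch of reading this layout, using `java.io.DataInput` as a stand-in for Lucene's IndexInput (both encode Int32/Int64 big-endian). Treating disagreement between the two Int64 copies as "generation unknown" is an assumption of this sketch, not something the format text above mandates:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SegmentsGenFile {
    static final int FORMAT_LOCKLESS = -2;

    // Read the layout described above: an Int32 format header followed
    // by the generation written twice as Int64.  Returns -1 if the file
    // is truncated or not self-consistent (e.g. a partially written copy).
    static long readGeneration(byte[] fileBytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes));
            if (in.readInt() != FORMAT_LOCKLESS) return -1;
            long gen0 = in.readLong();
            long gen1 = in.readLong();
            return gen0 == gen1 ? gen0 : -1;  // the two copies must agree
        } catch (IOException e) {
            return -1;  // truncated file: treat generation as unknown
        }
    }

    // Build the byte layout of a segments.gen file for generation gen.
    static byte[] genFileBytes(long gen) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(FORMAT_LOCKLESS);
            out.writeLong(gen);
            out.writeLong(gen);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);  // cannot happen for in-memory streams
        }
    }

    public static void main(String[] args) {
        System.out.println(readGeneration(genFileBytes(42)));  // 42
    }
}
```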

+

+ Pre-2.1:
+ Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize>^SegCount

- -

- Format, NameCounter, SegCount, SegSize --> UInt32 +

+ 2.1 and above:
+ Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, NumField, NormGen^NumField>^SegCount, IsCompoundFile

- Version --> UInt64 + Format, NameCounter, SegCount, SegSize, NumField --> Int32 +

+ +

+ Version, DelGen, NormGen --> Int64

@@ -719,7 +769,11 @@

- Format is -1 in Lucene 1.4. + IsCompoundFile --> Int8 +

+ +

+ Format is -1 as of Lucene 1.4 and -2 as of Lucene 2.1.
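A hypothetical helper illustrating why the format header is negative: a pre-1.4 segments file began directly with a non-negative counter rather than a format header, so any negative leading Int32 can safely be interpreted as a format version:

```java
public class SegmentsFormatCheck {
    // Classify a segments file by its leading Int32 (illustrative only).
    static String describe(int firstInt32) {
        if (firstInt32 >= 0) return "pre-1.4 (no format header)";
        if (firstInt32 == -1) return "Lucene 1.4 format";
        if (firstInt32 == -2) return "Lucene 2.1 lock-less format";
        return "unknown (newer?) format";
    }

    public static void main(String[] args) {
        System.out.println(describe(-2));  // Lucene 2.1 lock-less format
    }
}
```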

@@ -740,65 +794,79 @@ SegSize is the number of documents contained in the segment index.

+

+ DelGen is the generation count of the separate
+ deletes file. If this is -1, there are no
+ separate deletes. If it is 0, this is a pre-2.1
+ segment and you must check the filesystem for the
+ existence of _X.del. Anything above zero means
+ there are separate deletes (_X_N.del).
+
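The DelGen rules can be illustrated with a hypothetical helper (not a Lucene API). The base-36 rendering of the generation is assumed here, following the general file-naming scheme described earlier:

```java
public class DeletesFileName {
    // Interpret the per-segment DelGen field.  Returns the deletions
    // file name for segment segName, or null when there are no
    // separate deletes.
    static String delFileName(String segName, long delGen) {
        if (delGen == -1) return null;             // no separate deletes
        if (delGen == 0) return segName + ".del";  // pre-2.1: caller must probe for existence
        // lock-less: generation is appended in base 36
        return segName + "_" + Long.toString(delGen, Character.MAX_RADIX) + ".del";
    }

    public static void main(String[] args) {
        System.out.println(delFileName("_3", 2));   // _3_2.del
        System.out.println(delFileName("_3", 0));   // _3.del
    }
}
```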

+ +

+ NumField is the size of the array for NormGen, or + -1 if there are no NormGens stored. +

+ +

+ NormGen records the generation of the separate + norms files. If NumField is -1, there are no + normGens stored and they are all assumed to be 0 + when the segment file was written pre-2.1 and all + assumed to be -1 when the segments file is 2.1 or + above. The generation then has the same meaning + as delGen (above). +

+ +

+ IsCompoundFile records whether the segment is
+ written as a compound file or not. If this is -1,
+ the segment is not a compound file. If it is 1,
+ the segment is a compound file. Otherwise it is 0,
+ which means we check the filesystem to see if _X.cfs
+ exists.
+

+ - +

- Several files are used to indicate that another - process is using an index. Note that these files are not + A write lock is used to indicate that another + process is writing to the index. Note that this file is not stored in the index directory itself, but rather in the system's temporary directory, as indicated in the Java system property "java.io.tmpdir".

-
    -
  • -

    - When a file named "commit.lock" - is present, a process is currently re-writing the "segments" - file and deleting outdated segment index files, or a process is - reading the "segments" - file and opening the files of the segments it names. This lock file - prevents files from being deleted by another process after a process - has read the "segments" - file but before it has managed to open all of the files of the - segments named therein. -

    -
  • +

+ The write lock is named "XXXX-write.lock" where
+ XXXX is typically a unique prefix computed from the
+ directory path to the index. When this file is
+ present, a process is currently adding documents
+ to an index, or removing files from that index.
+ This lock file prevents several processes from
+ attempting to modify an index at the same time.
+

    + +

    + Note that prior to version 2.1, Lucene also used a + commit lock. This was removed in 2.1. +

    -
  • -

    - When a file named "write.lock" - is present, a process is currently adding documents to an index, or - removing files from that index. This lock file prevents several - processes from attempting to modify an index at the same time. -

    -
  • -

- A file named "deletable"
- contains the names of files that are no longer used by the index, but
- which could not be deleted. This is only used on Win32, where a
- file may not be deleted while it is still open. On other platforms
- the file contains only null bytes.
+ Prior to Lucene 2.1 there was a file "deletable"
+ that contained details about files that needed to be
+ deleted. As of 2.1, a writer dynamically computes
+ the files that are deletable instead, so no file
+ is written.

-

- Deletable --> DeletableCount, - <DelableName>DeletableCount -

- -

DeletableCount --> UInt32 -

-

DeletableName --> - String -

- Lock Files + Lock File

- Several files are used to indicate that another - process is using an index. Note that these files are not + A write lock is used to indicate that another + process is writing to the index. Note that this file is not stored in the index directory itself, but rather in the system's temporary directory, as indicated in the Java system property "java.io.tmpdir".

-
    -
  • -

    - When a file named "commit.lock" - is present, a process is currently re-writing the "segments" - file and deleting outdated segment index files, or a process is - reading the "segments" - file and opening the files of the segments it names. This lock file - prevents files from being deleted by another process after a process - has read the "segments" - file but before it has managed to open all of the files of the - segments named therein. -

    -
  • - -
  • -

    - When a file named "write.lock" - is present, a process is currently adding documents to an index, or - removing files from that index. This lock file prevents several - processes from attempting to modify an index at the same time. -

    -
  • -
+

+ The write lock is named "XXXX-write.lock" where
+ XXXX is typically a unique prefix computed from the
+ directory path to the index. When this file is
+ present, a process is currently adding documents
+ to an index, or removing files from that index.
+ This lock file prevents several processes from
+ attempting to modify an index at the same time.
+

+

+ Note that prior to version 2.1, Lucene also used a + commit lock. This was removed in 2.1. +


- A file named "deletable" - contains the names of files that are no longer used by the index, but - which could not be deleted. This is only used on Win32, where a - file may not be deleted while it is still open. On other platforms - the file contains only null bytes. -

-

- Deletable --> DeletableCount, - <DelableName>DeletableCount -

-

DeletableCount --> UInt32 -

-

DeletableName --> - String
+ Prior to Lucene 2.1 there was a file "deletable"
+ that contained details about files that needed to be
+ deleted. As of 2.1, a writer dynamically computes
+ the files that are deletable instead, so no file
+ is written.