LUCENE-2517: improve changes-to-html

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@960374 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Michael McCandless 2010-07-04 18:14:31 +00:00
parent e6e18a81f6
commit 0e631b8031
2 changed files with 35 additions and 18 deletions

View File

@ -23,17 +23,17 @@ Changes in backwards compatibility policy
- Directory.copy/Directory.copyTo now copies all files (not just
index files), since what is and isn't and index file is now
dependent on the codecs used. (Mike McCandless)
dependent on the codecs used.
- UnicodeUtil now uses BytesRef for UTF-8 output, and some method
signatures have changed to CharSequence. These are internal APIs
and subject to change suddenly. (Robert Muir, Mike McCandless)
and subject to change suddenly.
- Positional queries (PhraseQuery, *SpanQuery) will now throw an
exception if use them on a field that omits positions during
indexing (previously they silently returned no results).
- FieldCache.(Byte,Short,Int,Long,Float,Double}Parser's API has
- FieldCache.{Byte,Short,Int,Long,Float,Double}Parser's API has
changed -- each parse method now takes a BytesRef instead of a
String. If you have an existing Parser, a simple way to fix it is
invoke BytesRef.utf8ToString, and pass that String to your
@ -57,6 +57,8 @@ Changes in backwards compatibility policy
an IllegalArgumentException, because the NTS does not support
TermAttribute/CharTermAttribute. If you want to further filter
or attach Payloads to NTS, use the new NumericTermAttribute.
(Mike McCandless, Robert Muir, Uwe Schindler, Mark Miller, Michael Busch)
* LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
creation. Previously, if you passed an empty Directory and set OpenMode to
@ -119,7 +121,7 @@ New features
* LUCENE-1990: Adds internal packed ints implementation, to be used
for more efficient storage of int arrays when the values are
bounded, for example for storing the terms dict index Toke Toke
bounded, for example for storing the terms dict index (Toke
Eskildsen via Mike McCandless)
* LUCENE-2321: Cutover to a more RAM efficient packed-ints based
@ -215,7 +217,8 @@ Changes in backwards compatibility policy
the IndexWriter for a MergePolicy exactly once. You can change references to
'writer' from <code>writer.doXYZ()</code> to <code>writer.get().doXYZ()</code>
(it is also advisable to add an <code>assert writer != null;</code> before you
access the wrapped IndexWriter.
access the wrapped IndexWriter.)
In addition, MergePolicy only exposes a default constructor, and the one that
took IndexWriter as argument has been removed from all MergePolicy extensions.
(Shai Erera via Mike McCandless)
@ -265,6 +268,7 @@ Changes in runtime behavior
invokes a merge on the incoming and target segments, but instead copies the
segments to the target index. You can call maybeMerge or optimize after this
method completes, if you need to.
In addition, Directory.copyTo* were removed in favor of copy which takes the
target Directory, source and target files as arguments, and copies the source
file to the target Directory under the target file name. (Shai Erera)
@ -353,9 +357,9 @@ API Changes
next commit). (Shai Erera)
* LUCENE-2455: IndexWriter.addIndexesNoOptimize was renamed to addIndexes.
IndexFileNames.segmentFileName now takes another parameter to accomodate
IndexFileNames.segmentFileName now takes another parameter to accommodate
custom file names. You should use this method to name all your files.
(Shai Erera)
(Shai Erera)
* LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced
with equivalent ones that take a String (id) as argument. You can pass

View File

@ -62,8 +62,8 @@ for (my $line_num = 0 ; $line_num <= $#lines ; ++$line_num) {
if (/\s*===+\s*(.*?)\s*===+\s*/) { # New-style release headings
$release = $1;
$release =~ s/^release\s*//i; # Trim "Release " prefix
($release, $relinfo) = ($release =~ /^(\d+(?:\.\d+)*|Trunk)\s*(.*)/i);
$release =~ s/^(?:release|lucene)\s*//i; # Trim "Release " or "Lucene " prefix
($release, $relinfo) = ($release =~ /^(\d+(?:\.(?:\d+|[xyz]))*|Trunk)\s*(.*)/i);
$relinfo =~ s/\s*:\s*$//; # Trim trailing colon
$relinfo =~ s/^\s*,\s*//; # Trim leading comma
($reldate, $relinfo) = get_release_date($release, $relinfo);
@ -164,6 +164,14 @@ for (my $line_num = 0 ; $line_num <= $#lines ; ++$line_num) {
}
}
# Recognize IDs of top level nodes of the most recent two releases,
# escaping JavaScript regex metacharacters, e.g.: "^(?:trunk|2\\\\.4\\\\.0)"
my $first_relid_regex = $first_relid;
$first_relid_regex =~ s!([.+*?{}()|^$/\[\]\\])!\\\\\\\\$1!g;
my $second_relid_regex = $second_relid;
$second_relid_regex =~ s!([.+*?{}()|^$/\[\]\\])!\\\\\\\\$1!g;
my $newer_version_regex = "^(?:$first_relid_regex|$second_relid_regex)";
#
# Print HTML-ified version to STDOUT
#
@ -258,7 +266,7 @@ print<<"__HTML_HEADER__";
}
var newerRegex = new RegExp("^(?:trunk|2\\\\.4\\\\.0)");
var newerRegex = new RegExp("$newer_version_regex");
function isOlder(listId) {
return ! newerRegex.test(listId);
}
@ -388,16 +396,21 @@ for my $rel (@releases) {
for my $itemnum (1..$#{$items}) {
my $item = $items->[$itemnum];
$item =~ s:&:&amp;:g; # Escape HTML metachars,
$item =~ s:<(?!/?code>):&lt;:gi; # but leave <code> tags intact
$item =~ s:(?<!code)>:&gt;:gi; # and add <pre> tags so that
$item =~ s:<code>:<code><pre>:gi; # whitespace is preserved in the
$item =~ s:\s*</code>:</pre></code>:gi; # output.
$item =~ s:&:&amp;:g; # Escape HTML metachars, but leave
$item =~ s:<(?!/?code>):&lt;:gi; # <code> tags intact and add <pre>
$item =~ s:(?<!code)>:&gt;:gi; # wrappers for non-inline sections
$item =~ s{((?:^|.*\n)\s*)<code>(?!</code>.+)(.+)</code>(?![ \t]*\S)}
{
my $prefix = $1;
my $code = $2;
$code =~ s/\s+$//;
"$prefix<code><pre>$code</pre></code>"
}gise;
# Put attributions on their own lines.
# Check for trailing parenthesized attribution with no following period.
# Exclude things like "(see #3 above)" and "(use the bug number instead of xxxx)"
unless ($item =~ s:\s*(\((?!see #|use the bug number)[^)"]+?\))\s*$:\n<br /><span class="attrib">$1</span>:) {
unless ($item =~ s:\s*(\((?!see #|use the bug number)[^()"]+?\))\s*$:\n<br /><span class="attrib">$1</span>:) {
# If attribution is not found, then look for attribution with a
# trailing period, but try not to include trailing parenthesized things
# that are not attributions.
@ -405,9 +418,9 @@ for my $rel (@releases) {
# Rule of thumb: if a trailing parenthesized expression with a following
# period does not contain "LUCENE-XXX", and it either has three or
# fewer words or it includes the word "via" or the phrase "updates from",
# then it is considered to be an attribution.
# then it is considered to be an attribution.
$item =~ s{(\s*(\((?!see \#|use the bug number)[^)"]+?\)))
$item =~ s{(\s*(\((?!see \#|use the bug number)[^()"]+?\)))
((?:\.|(?i:\.?\s*Issue\s+\d{3,}|LUCENE-\d+)\.?)\s*)$}
{
my $subst = $1; # default: no change