LUCENE-4008: Use pegdown to transform MIGRATE.txt and other text-only files to readable HTML. Please also run ant documentation when you have changed anything in those files to check the output.

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1328978 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Uwe Schindler 2012-04-22 21:15:27 +00:00
parent b534190141
commit a20aa3e0c9
6 changed files with 584 additions and 540 deletions


@@ -1,36 +1,37 @@
+# JRE Version Migration Guide
 If possible, use the same JRE major version at both index and search time.
 When upgrading to a different JRE major version, consider re-indexing.
 Different JRE major versions may implement different versions of Unicode,
 which will change the way some parts of Lucene treat your text.
-For example: with Java 1.4, LetterTokenizer will split around the character U+02C6,
+For example: with Java 1.4, `LetterTokenizer` will split around the character U+02C6,
 but with Java 5 it will not.
 This is because Java 1.4 implements Unicode 3, but Java 5 implements Unicode 4.
 For reference, JRE major versions with their corresponding Unicode versions:
-Java 1.4, Unicode 3.0
-Java 5, Unicode 4.0
-Java 6, Unicode 4.0
-Java 7, Unicode 6.0
+* Java 1.4, Unicode 3.0
+* Java 5, Unicode 4.0
+* Java 6, Unicode 4.0
+* Java 7, Unicode 6.0
 In general, whether or not you need to re-index largely depends upon the data that
 you are searching, and what was changed in any given Unicode version. For example,
 if you are completely sure that your content is limited to the "Basic Latin" range
 of Unicode, you can safely ignore this.
-Special Notes:
+## Special Notes: LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION
-LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION
-* StandardAnalyzer will return the same results under Java 5 as it did under
+* `StandardAnalyzer` will return the same results under Java 5 as it did under
 Java 1.4. This is because it is largely independent of the runtime JRE for
 Unicode support, (with the exception of lowercasing). However, no changes to
 casing have occurred in Unicode 4.0 that affect StandardAnalyzer, so if you are
 using this Analyzer you are NOT affected.
-* SimpleAnalyzer, StopAnalyzer, LetterTokenizer, LowerCaseFilter, and
-LowerCaseTokenizer may return different results, along with many other Analyzers
-and TokenStreams in Lucene's analysis modules. If you are using one of these
+* `SimpleAnalyzer`, `StopAnalyzer`, `LetterTokenizer`, `LowerCaseFilter`, and
+`LowerCaseTokenizer` may return different results, along with many other `Analyzer`s
+and `TokenStream`s in Lucene's analysis modules. If you are using one of these
 components, you may be affected.
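The tokenizer behavior described in the guide comes down to `Character.isLetter()`, whose answer tracks the JRE's Unicode tables (`LetterTokenizer` emits maximal runs of adjacent letters by that test). A minimal JDK-only sketch, no Lucene dependency needed, showing why U+02C6 stopped being a split point once the JRE moved to Unicode 4.0:

```java
public class UnicodeLetterCheck {
    public static void main(String[] args) {
        char c = '\u02C6'; // MODIFIER LETTER CIRCUMFLEX ACCENT
        // Under Unicode 3.0 (Java 1.4) this character was not a letter,
        // so LetterTokenizer split around it; from Unicode 4.0 on it is
        // a letter (general category Lm), so no split occurs.
        System.out.println(Character.isLetter(c));                        // true on Java 5+
        System.out.println(Character.getType(c) == Character.MODIFIER_LETTER); // true
    }
}
```

Running the same check on a Java 1.4 JVM would print `false` for both lines, which is exactly the cross-version difference that can make re-indexing necessary.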

File diff suppressed because it is too large


@@ -1,52 +1,21 @@
-Apache Lucene README file
+# Apache Lucene README file
-INTRODUCTION
+## Introduction
 Lucene is a Java full-text search engine. Lucene is not a complete
 application, but rather a code library and API that can easily be used
 to add search capabilities to applications.
-The Lucene web site is at:
-http://lucene.apache.org/
+* The Lucene web site is at: http://lucene.apache.org/
+* Please join the Lucene-User mailing list by sending a message to:
+  java-user-subscribe@lucene.apache.org
-Please join the Lucene-User mailing list by sending a message to:
-java-user-subscribe@lucene.apache.org
-Files in a binary distribution:
+## Files in a binary distribution
 Files are organized by module, for example in core/:
-core/lucene-core-XX.jar
+* `core/lucene-core-XX.jar`:
 The compiled core Lucene library.
-Additional modules contain the same structure:
-analysis/common/: Analyzers for indexing content in different languages and domains
-analysis/icu/: Analysis integration with ICU (International Components for Unicode)
-analysis/kuromoji/: Analyzer for indexing Japanese
-analysis/morfologik/: Analyzer for indexing Polish
-analysis/phonetic/: Analyzer for indexing phonetic signatures (for sounds-alike search)
-analysis/smartcn/: Analyzer for indexing Chinese
-analysis/stempel/: Analyzer for indexing Polish
-analysis/uima/: Analysis integration with Apache UIMA
-benchmark/: System for benchmarking Lucene
-demo/: Simple example code
-facet/: Faceted indexing and search capabilities
-grouping/: Search result grouping
-highlighter/: Highlights search keywords in results
-join/: Index-time and Query-time joins for normalized content
-memory/: Single-document in memory index implementation
-misc/: Index tools and other miscellaneous code
-queries/: Filters and Queries that add to core Lucene
-queryparser/: Query parsers and parsing framework
-sandbox/: Various third party contributions and new ideas.
-spatial/: Geospatial search
-suggest/: Auto-suggest and Spellchecking support
-test-framework/: Test Framework for testing Lucene-based applications
-docs/index.html
-The contents of the Lucene website.
-docs/api/index.html
-The Javadoc Lucene API documentation. This includes the core library,
-the test framework, and the demo, as well as all other modules.
+To review the documentation, read the main documentation page, located at:
+`docs/index.html`


@@ -184,11 +184,11 @@
 </target>
 <target name="documentation" description="Generate all documentation"
-  depends="javadocs,changes-to-html,doc-index"/>
+  depends="javadocs,changes-to-html,process-webpages"/>
 <target name="javadoc" depends="javadocs"/>
 <target name="javadocs" description="Generate javadoc" depends="javadocs-lucene-core, javadocs-modules, javadocs-test-framework"/>
-<target name="doc-index">
+<target name="process-webpages" depends="resolve-pegdown">
 <pathconvert pathsep="|" dirsep="/" property="buildfiles">
 <fileset dir="." includes="**/build.xml" excludes="build.xml,analysis/*,build/**,tools/**,backwards/**,site/**"/>
 </pathconvert>
@@ -205,6 +205,12 @@
 <param name="buildfiles" expression="${buildfiles}"/>
 <param name="version" expression="${version}"/>
 </xslt>
+<pegdown todir="${javadoc.dir}">
+<fileset dir="." includes="MIGRATE.txt,JRE_VERSION_MIGRATION.txt"/>
+<globmapper from="*.txt" to="*.html"/>
+</pegdown>
 <copy todir="${javadoc.dir}">
 <fileset dir="site/html" includes="**/*"/>
 </copy>


@@ -1506,4 +1506,60 @@ ${tests-output}/junit4-*.suites - per-JVM executed suites
 </scp>
 </sequential>
 </macrodef>
+<!-- PEGDOWN macro: Before using depend on the target "resolve-pegdown" -->
+<target name="resolve-pegdown" unless="pegdown.loaded">
+<ivy:cachepath organisation="org.pegdown" module="pegdown" revision="1.1.0"
+inline="true" conf="default" type="jar" transitive="true" pathid="pegdown.classpath"/>
+<property name="pegdown.loaded" value="true"/>
+</target>
+<macrodef name="pegdown">
+<attribute name="todir"/>
+<attribute name="flatten" default="false"/>
+<attribute name="overwrite" default="false"/>
+<element name="nested" optional="false" implicit="true"/>
+<sequential>
+<copy todir="@{todir}" flatten="@{flatten}" overwrite="@{overwrite}" verbose="true"
+preservelastmodified="false" encoding="UTF-8" outputencoding="UTF-8"
+>
+<filterchain>
+<tokenfilter>
+<filetokenizer/>
+<replaceregex pattern="\b(LUCENE|SOLR)\-\d+\b" replace="[\0](https://issues.apache.org/jira/browse/\0)" flags="gs"/>
+<scriptfilter language="javascript" classpathref="pegdown.classpath"><![CDATA[
+importClass(java.lang.StringBuilder);
+importClass(org.pegdown.PegDownProcessor);
+importClass(org.pegdown.Extensions);
+importClass(org.pegdown.FastEncoder);
+var markdownSource = self.getToken();
+var title = undefined;
+if (markdownSource.search(/^(#+\s*)?(.+)[\n\r]/) == 0) {
+title = RegExp.$2;
+// Convert the first line into a markdown heading, if it is not already:
+if (RegExp.$1 == '') {
+markdownSource = '# ' + markdownSource;
+}
+}
+var processor = new PegDownProcessor(
+Extensions.ABBREVIATIONS | Extensions.AUTOLINKS |
+Extensions.FENCED_CODE_BLOCKS | Extensions.SMARTS
+);
+var html = new StringBuilder('<html>\n<head>\n');
+if (title) {
+html.append('<title>').append(FastEncoder.encode(title)).append('</title>\n');
+}
+html.append('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n')
+.append('</head>\n<body>\n')
+.append(processor.markdownToHtml(markdownSource))
+.append('\n</body>\n</html>\n');
+self.setToken(html.toString());
+]]></scriptfilter>
+</tokenfilter>
+</filterchain>
+<nested/>
+</copy>
+</sequential>
+</macrodef>
 </project>
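Before pegdown runs, the `replaceregex` filter in the chain rewrites bare JIRA issue IDs (LUCENE-nnnn, SOLR-nnnn) into Markdown links, which pegdown then turns into HTML anchors. In `java.util.regex` terms the same substitution looks roughly like this (Ant's `\0`, the whole match, corresponds to `$0`):

```java
import java.util.regex.Pattern;

public class IssueLinker {
    public static void main(String[] args) {
        // \b...\b keeps the match to a whole issue ID token.
        Pattern p = Pattern.compile("\\b(LUCENE|SOLR)-\\d+\\b");
        String out = p.matcher("Fixed in LUCENE-4008.")
                .replaceAll("[$0](https://issues.apache.org/jira/browse/$0)");
        System.out.println(out);
        // Fixed in [LUCENE-4008](https://issues.apache.org/jira/browse/LUCENE-4008).
    }
}
```

Doing this as a plain text filter before Markdown processing is what lets issue references in CHANGES-style files become clickable without touching the source files themselves.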


@@ -37,11 +37,14 @@
 <body>
 <div><img src="lucene_green_300.gif"/></div>
 <h1><xsl:text>Apache Lucene </xsl:text><xsl:value-of select="$version"/><xsl:text> Documentation</xsl:text></h1>
+<p>Lucene is a Java full-text search engine. Lucene is not a complete application,
+but rather a code library and API that can easily be used to add search capabilities
+to applications.</p>
 <p>
 This is the official documentation for <b><xsl:text>Apache Lucene </xsl:text>
 <xsl:value-of select="$version"/></b>. Additional documentation is available in the
 <a href="http://wiki.apache.org/lucene-java">Wiki</a>.
-</p>
+</p>
 <h2>Getting Started</h2>
 <p>The following section is intended as a "getting started" guide. It has three
 audiences: first-time users looking to install Apache Lucene in their
@@ -60,6 +63,8 @@
 <h2>Reference Documents</h2>
 <ul>
 <li><a href="changes/Changes.html">Changes</a>: List of changes in this release.</li>
+<li><a href="MIGRATE.html">Migration Guide</a>: What changed in Lucene 4; how to migrate code from Lucene 3.x.</li>
+<li><a href="JRE_VERSION_MIGRATION.html">JRE Version Migration</a>: Information about upgrading between major JRE versions.</li>
 <li><a href="fileformats.html">File Formats</a>: Guide to the index format used by Lucene.</li>
 <li><a href="core/org/apache/lucene/search/package-summary.html#package_description">Search and Scoring in Lucene</a>: Introduction to how Lucene scores documents.</li>
 <li><a href="core/org/apache/lucene/search/similarities/TFIDFSimilarity.html">Classic Scoring Formula</a>: Formula of Lucene's classic <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space</a> implementation. (look <a href="core/org/apache/lucene/search/similarities/package-summary.html#package_description">here</a> for other models)</li>