LUCENE-4008: Use pegdown to transform MIGRATE.txt and other text-only files to readable HTML. Please also run ant documentation when you have changed anything in those files to check the output.

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1328978 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Uwe Schindler 2012-04-22 21:15:27 +00:00
parent b534190141
commit a20aa3e0c9
6 changed files with 584 additions and 540 deletions


@@ -1,36 +1,37 @@
+# JRE Version Migration Guide
 If possible, use the same JRE major version at both index and search time.
 When upgrading to a different JRE major version, consider re-indexing.
 Different JRE major versions may implement different versions of Unicode,
 which will change the way some parts of Lucene treat your text.
-For example: with Java 1.4, LetterTokenizer will split around the character U+02C6,
+For example: with Java 1.4, `LetterTokenizer` will split around the character U+02C6,
 but with Java 5 it will not.
 This is because Java 1.4 implements Unicode 3, but Java 5 implements Unicode 4.
 For reference, JRE major versions with their corresponding Unicode versions:
-Java 1.4, Unicode 3.0
-Java 5, Unicode 4.0
-Java 6, Unicode 4.0
-Java 7, Unicode 6.0
+* Java 1.4, Unicode 3.0
+* Java 5, Unicode 4.0
+* Java 6, Unicode 4.0
+* Java 7, Unicode 6.0
 In general, whether or not you need to re-index largely depends upon the data that
 you are searching, and what was changed in any given Unicode version. For example,
 if you are completely sure that your content is limited to the "Basic Latin" range
 of Unicode, you can safely ignore this.
-Special Notes:
+## Special Notes: LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION
-LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION
-* StandardAnalyzer will return the same results under Java 5 as it did under
+* `StandardAnalyzer` will return the same results under Java 5 as it did under
 Java 1.4. This is because it is largely independent of the runtime JRE for
 Unicode support, (with the exception of lowercasing). However, no changes to
 casing have occurred in Unicode 4.0 that affect StandardAnalyzer, so if you are
 using this Analyzer you are NOT affected.
-* SimpleAnalyzer, StopAnalyzer, LetterTokenizer, LowerCaseFilter, and
-LowerCaseTokenizer may return different results, along with many other Analyzers
-and TokenStreams in Lucene's analysis modules. If you are using one of these
+* `SimpleAnalyzer`, `StopAnalyzer`, `LetterTokenizer`, `LowerCaseFilter`, and
+`LowerCaseTokenizer` may return different results, along with many other `Analyzer`s
+and `TokenStream`s in Lucene's analysis modules. If you are using one of these
 components, you may be affected.
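The tokenizer behavior described in the guide comes down to `Character.isLetter()`, whose answer tracks the JRE's Unicode tables (`LetterTokenizer` emits maximal runs of adjacent letters by that test). A minimal JDK-only sketch, no Lucene dependency needed, showing why U+02C6 stopped being a split point once the JRE moved to Unicode 4.0:

```java
public class UnicodeLetterCheck {
    public static void main(String[] args) {
        char c = '\u02C6'; // MODIFIER LETTER CIRCUMFLEX ACCENT
        // Under Unicode 3.0 (Java 1.4) this character was not a letter,
        // so LetterTokenizer split around it; from Unicode 4.0 on it is
        // a letter (general category Lm), so no split occurs.
        System.out.println(Character.isLetter(c));                        // true on Java 5+
        System.out.println(Character.getType(c) == Character.MODIFIER_LETTER); // true
    }
}
```

Running the same check on a Java 1.4 JVM would print `false` for both lines, which is exactly the cross-version difference that can make re-indexing necessary.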

File diff suppressed because it is too large


@@ -1,52 +1,21 @@
-Apache Lucene README file
+# Apache Lucene README file
-INTRODUCTION
+## Introduction
 Lucene is a Java full-text search engine. Lucene is not a complete
 application, but rather a code library and API that can easily be used
 to add search capabilities to applications.
-The Lucene web site is at:
-http://lucene.apache.org/
+* The Lucene web site is at: http://lucene.apache.org/
+* Please join the Lucene-User mailing list by sending a message to:
+  java-user-subscribe@lucene.apache.org
-Please join the Lucene-User mailing list by sending a message to:
-java-user-subscribe@lucene.apache.org
-Files in a binary distribution:
+## Files in a binary distribution
 Files are organized by module, for example in core/:
-core/lucene-core-XX.jar
+* `core/lucene-core-XX.jar`:
 The compiled core Lucene library.
-Additional modules contain the same structure:
-analysis/common/: Analyzers for indexing content in different languages and domains
-analysis/icu/: Analysis integration with ICU (International Components for Unicode)
-analysis/kuromoji/: Analyzer for indexing Japanese
-analysis/morfologik/: Analyzer for indexing Polish
-analysis/phonetic/: Analyzer for indexing phonetic signatures (for sounds-alike search)
-analysis/smartcn/: Analyzer for indexing Chinese
-analysis/stempel/: Analyzer for indexing Polish
-analysis/uima/: Analysis integration with Apache UIMA
-benchmark/: System for benchmarking Lucene
-demo/: Simple example code
-facet/: Faceted indexing and search capabilities
-grouping/: Search result grouping
-highlighter/: Highlights search keywords in results
-join/: Index-time and Query-time joins for normalized content
-memory/: Single-document in memory index implementation
-misc/: Index tools and other miscellaneous code
-queries/: Filters and Queries that add to core Lucene
-queryparser/: Query parsers and parsing framework
-sandbox/: Various third party contributions and new ideas.
-spatial/: Geospatial search
-suggest/: Auto-suggest and Spellchecking support
-test-framework/: Test Framework for testing Lucene-based applications
-docs/index.html
-The contents of the Lucene website.
-docs/api/index.html
-The Javadoc Lucene API documentation. This includes the core library,
-the test framework, and the demo, as well as all other modules.
+To review the documentation, read the main documentation page, located at:
+`docs/index.html`


@@ -184,11 +184,11 @@
 </target>
 <target name="documentation" description="Generate all documentation"
-  depends="javadocs,changes-to-html,doc-index"/>
+  depends="javadocs,changes-to-html,process-webpages"/>
 <target name="javadoc" depends="javadocs"/>
 <target name="javadocs" description="Generate javadoc" depends="javadocs-lucene-core, javadocs-modules, javadocs-test-framework"/>
-<target name="doc-index">
+<target name="process-webpages" depends="resolve-pegdown">
 <pathconvert pathsep="|" dirsep="/" property="buildfiles">
 <fileset dir="." includes="**/build.xml" excludes="build.xml,analysis/*,build/**,tools/**,backwards/**,site/**"/>
 </pathconvert>
@@ -205,6 +205,12 @@
 <param name="buildfiles" expression="${buildfiles}"/>
 <param name="version" expression="${version}"/>
 </xslt>
+<pegdown todir="${javadoc.dir}">
+<fileset dir="." includes="MIGRATE.txt,JRE_VERSION_MIGRATION.txt"/>
+<globmapper from="*.txt" to="*.html"/>
+</pegdown>
 <copy todir="${javadoc.dir}">
 <fileset dir="site/html" includes="**/*"/>
 </copy>


@@ -1506,4 +1506,60 @@ ${tests-output}/junit4-*.suites - per-JVM executed suites
 </scp>
 </sequential>
 </macrodef>
+<!-- PEGDOWN macro: Before using depend on the target "resolve-pegdown" -->
+<target name="resolve-pegdown" unless="pegdown.loaded">
+<ivy:cachepath organisation="org.pegdown" module="pegdown" revision="1.1.0"
+inline="true" conf="default" type="jar" transitive="true" pathid="pegdown.classpath"/>
+<property name="pegdown.loaded" value="true"/>
+</target>
+<macrodef name="pegdown">
+<attribute name="todir"/>
+<attribute name="flatten" default="false"/>
+<attribute name="overwrite" default="false"/>
+<element name="nested" optional="false" implicit="true"/>
+<sequential>
+<copy todir="@{todir}" flatten="@{flatten}" overwrite="@{overwrite}" verbose="true"
+preservelastmodified="false" encoding="UTF-8" outputencoding="UTF-8"
+>
+<filterchain>
+<tokenfilter>
+<filetokenizer/>
+<replaceregex pattern="\b(LUCENE|SOLR)\-\d+\b" replace="[\0](https://issues.apache.org/jira/browse/\0)" flags="gs"/>
+<scriptfilter language="javascript" classpathref="pegdown.classpath"><![CDATA[
+importClass(java.lang.StringBuilder);
+importClass(org.pegdown.PegDownProcessor);
+importClass(org.pegdown.Extensions);
+importClass(org.pegdown.FastEncoder);
+var markdownSource = self.getToken();
+var title = undefined;
+if (markdownSource.search(/^(#+\s*)?(.+)[\n\r]/) == 0) {
+title = RegExp.$2;
+// Convert the first line into a markdown heading, if it is not already:
+if (RegExp.$1 == '') {
+markdownSource = '# ' + markdownSource;
+}
+}
+var processor = new PegDownProcessor(
+Extensions.ABBREVIATIONS | Extensions.AUTOLINKS |
+Extensions.FENCED_CODE_BLOCKS | Extensions.SMARTS
+);
+var html = new StringBuilder('<html>\n<head>\n');
+if (title) {
+html.append('<title>').append(FastEncoder.encode(title)).append('</title>\n');
+}
+html.append('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n')
+.append('</head>\n<body>\n')
+.append(processor.markdownToHtml(markdownSource))
+.append('\n</body>\n</html>\n');
+self.setToken(html.toString());
+]]></scriptfilter>
+</tokenfilter>
+</filterchain>
+<nested/>
+</copy>
+</sequential>
+</macrodef>
 </project>
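Before pegdown runs, the `replaceregex` filter in the chain rewrites bare JIRA issue IDs (LUCENE-nnnn, SOLR-nnnn) into Markdown links, which pegdown then turns into HTML anchors. In `java.util.regex` terms the same substitution looks roughly like this (Ant's `\0`, the whole match, corresponds to `$0`):

```java
import java.util.regex.Pattern;

public class IssueLinker {
    public static void main(String[] args) {
        // \b...\b keeps the match to a whole issue ID token.
        Pattern p = Pattern.compile("\\b(LUCENE|SOLR)-\\d+\\b");
        String out = p.matcher("Fixed in LUCENE-4008.")
                .replaceAll("[$0](https://issues.apache.org/jira/browse/$0)");
        System.out.println(out);
        // Fixed in [LUCENE-4008](https://issues.apache.org/jira/browse/LUCENE-4008).
    }
}
```

Doing this as a plain text filter before Markdown processing is what lets issue references in CHANGES-style files become clickable without touching the source files themselves.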


@@ -37,11 +37,14 @@
 <body>
 <div><img src="lucene_green_300.gif"/></div>
 <h1><xsl:text>Apache Lucene </xsl:text><xsl:value-of select="$version"/><xsl:text> Documentation</xsl:text></h1>
+<p>Lucene is a Java full-text search engine. Lucene is not a complete application,
+but rather a code library and API that can easily be used to add search capabilities
+to applications.</p>
 <p>
 This is the official documentation for <b><xsl:text>Apache Lucene </xsl:text>
 <xsl:value-of select="$version"/></b>. Additional documentation is available in the
 <a href="http://wiki.apache.org/lucene-java">Wiki</a>.
-</p>
+</p>
 <h2>Getting Started</h2>
 <p>The following section is intended as a "getting started" guide. It has three
 audiences: first-time users looking to install Apache Lucene in their
@@ -60,6 +63,8 @@
 <h2>Reference Documents</h2>
 <ul>
 <li><a href="changes/Changes.html">Changes</a>: List of changes in this release.</li>
+<li><a href="MIGRATE.html">Migration Guide</a>: What changed in Lucene 4; how to migrate code from Lucene 3.x.</li>
+<li><a href="JRE_VERSION_MIGRATION.html">JRE Version Migration</a>: Information about upgrading between major JRE versions.</li>
 <li><a href="fileformats.html">File Formats</a>: Guide to the index format used by Lucene.</li>
 <li><a href="core/org/apache/lucene/search/package-summary.html#package_description">Search and Scoring in Lucene</a>: Introduction to how Lucene scores documents.</li>
 <li><a href="core/org/apache/lucene/search/similarities/TFIDFSimilarity.html">Classic Scoring Formula</a>: Formula of Lucene's classic <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space</a> implementation. (look <a href="core/org/apache/lucene/search/similarities/package-summary.html#package_description">here</a> for other models)</li>