mirror of https://github.com/apache/lucene.git
PR 19468, though not exactly as done in the provided patches. JavaCC is no longer required to build Lucene, but can optionally be run to regenerate the parsers.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@150017 13f79535-47bb-0310-9956-ffa450edef68
parent 798fc0f0ef
commit 2af2d85877
41
BUILD.txt
|
@ -3,15 +3,15 @@ Lucene Build Instructions
|
||||||
$Id$
|
$Id$
|
||||||
|
|
||||||
Basic steps:
|
Basic steps:
|
||||||
0) Install JDK 1.2 (or greater), Ant 1.4 (or greater), and the Ant
|
0) Install JDK 1.2 (or greater), Ant 1.5 (or greater), and the Ant
|
||||||
optional.jar
|
optional.jar
|
||||||
1) Download Lucene from Apache and unpack it
|
1) Download Lucene from Apache and unpack it
|
||||||
2) Connect to the top-level of your Lucene installation
|
2) Connect to the top-level of your Lucene installation
|
||||||
3) Install JavaCC
|
3) Install JavaCC (optional)
|
||||||
4) Run ant
|
4) Run ant
|
||||||
|
|
||||||
Step 0) Set up your development environment (JDK 1.2 or greater,
|
Step 0) Set up your development environment (JDK 1.2 or greater,
|
||||||
Ant 1.4 or greater)
|
Ant 1.5 or greater)
|
||||||
|
|
||||||
We'll assume that you know how to get and set up the JDK - if you
|
We'll assume that you know how to get and set up the JDK - if you
|
||||||
don't, then we suggest starting at http://java.sun.com and learning
|
don't, then we suggest starting at http://java.sun.com and learning
|
||||||
|
@ -22,26 +22,22 @@ with the development version of Lucene, we recommend you stick with
|
||||||
the most current version of Java (at the time of this writing, JDK
|
the most current version of Java (at the time of this writing, JDK
|
||||||
1.4). Also, note that if you're working with the Lucene source,
|
1.4). Also, note that if you're working with the Lucene source,
|
||||||
you'll need to use Ant (see below) and Ant requires at least JDK 1.1
|
you'll need to use Ant (see below) and Ant requires at least JDK 1.1
|
||||||
(and in the future will likely move to requiring JDK 1.2, according to
|
(and in the future will move to requiring JDK 1.2, according to
|
||||||
the Ant install docs).
|
the Ant install docs).
|
||||||
|
|
||||||
Like most of the Jakarta projects, Lucene uses Apache Ant for build
|
Like most of the Jakarta projects, Lucene uses Apache Ant for build
|
||||||
control. Specifically, you MUST use Ant version 1.4 or greater.
|
control. Specifically, you MUST use Ant version 1.5 or greater.
|
||||||
|
|
||||||
Ant is "kind of like make without make's wrinkles". Ant is
|
Ant is "kind of like make without make's wrinkles". Ant is
|
||||||
implemented in java and uses XML-based configuration files. You can
|
implemented in java and uses XML-based configuration files. You can
|
||||||
get it at:
|
get it at:
|
||||||
|
|
||||||
http://jakarta.apache.org/ant
|
http://ant.apache.org
|
||||||
|
|
||||||
Specifically, you can get the binary distributions at:
|
|
||||||
|
|
||||||
http://jakarta.apache.org/builds/jakarta-ant/release/
|
|
||||||
|
|
||||||
You'll need to download both the Ant binary distribution and the
|
You'll need to download both the Ant binary distribution and the
|
||||||
"optional" jar file. Install these according to the instructions at:
|
"optional" jar file. Install these according to the instructions at:
|
||||||
|
|
||||||
http://jakarta.apache.org/ant/manual
|
http://ant.apache.org/manual
|
||||||
|
|
||||||
Step 1) Download Lucene from Apache
|
Step 1) Download Lucene from Apache
|
||||||
|
|
||||||
|
@ -79,21 +75,16 @@ NOTE: the ~ character represents your user account home directory.
|
||||||
|
|
||||||
Step 3) Install JavaCC
|
Step 3) Install JavaCC
|
||||||
|
|
||||||
Building the Lucene distribution from the source requires the JavaCC
|
Building the Lucene distribution from the source does not require the JavaCC
|
||||||
parser generator. This software has a separate license agreement that
|
parser generator, but if you wish to regenerate any of the pre-generated
|
||||||
must be agreed to before you can use it. The web page for JavaCC is here:
|
parser pieces, you will need to install JavaCC.
|
||||||
|
|
||||||
http://www.experimentalstuff.com/Technologies/JavaCC/
|
http://javacc.dev.java.net
|
||||||
|
|
||||||
Follow the download links and download the zip file to a temporary
|
Follow the download links and download the zip file to a temporary
|
||||||
location on your file system. Unzip the file and run the large class file
|
location on your file system.
|
||||||
in the directory. On windows, use this command from the temp directory:
|
|
||||||
|
|
||||||
java -cp . JavaCC2_1
|
After JavaCC is installed, edit your build.properties
|
||||||
|
|
||||||
This will launch a Java GUI installer. There is also a command line
|
|
||||||
installer available, and the installation class will give you those
|
|
||||||
directions. After JavaCC is installed, edit your build properties
|
|
||||||
(as in step 2), and add the line
|
(as in step 2), and add the line
|
||||||
|
|
||||||
javacc.home=/javacc/bin
|
javacc.home=/javacc/bin
|
||||||
|
@ -107,14 +98,16 @@ location of your ant installation, typing "ant" at the shell prompt
|
||||||
and command prompt should run ant. Ant will by default look for the
|
and command prompt should run ant. Ant will by default look for the
|
||||||
"build.xml" file in your current directory, and compile Lucene.
|
"build.xml" file in your current directory, and compile Lucene.
|
||||||
|
|
||||||
|
To rebuild any of the JavaCC-based parsers, run "ant javacc".
|
||||||
|
|
||||||
For further information on Lucene, go to:
|
For further information on Lucene, go to:
|
||||||
http://jakarta.apache.org/lucene/
|
http://jakarta.apache.org/lucene/
|
||||||
|
|
||||||
Please join the Lucene-User mailing list by visiting this site:
|
Please join the Lucene-User mailing list by visiting this site:
|
||||||
http://jakarta.apache.org/site/mail.html
|
http://jakarta.apache.org/site/mail.html
|
||||||
|
|
||||||
Please post suggestions, questions, corrections or additions to this
|
Please post suggestions, questions, corrections or additions to this
|
||||||
document to the Lucene-User mailing list.
|
document to the lucene-user mailing list.
|
||||||
|
|
||||||
This file was originally written by Steven J. Owens <puff@darksleep.com>.
|
This file was originally written by Steven J. Owens <puff@darksleep.com>.
|
||||||
This file was modified by Jon S. Stevens <jon@latchkey.com>.
|
This file was modified by Jon S. Stevens <jon@latchkey.com>.
|
||||||
|
|
120
build.xml
|
@ -9,6 +9,8 @@
|
||||||
<property file="${basedir}/build.properties" />
|
<property file="${basedir}/build.properties" />
|
||||||
<property file="${basedir}/default.properties" />
|
<property file="${basedir}/default.properties" />
|
||||||
|
|
||||||
|
<property name="javacc.main.class" value="org.javacc.parser.Main"/>
|
||||||
|
|
||||||
<!-- Build classpath -->
|
<!-- Build classpath -->
|
||||||
<path id="classpath">
|
<path id="classpath">
|
||||||
<pathelement location="${build.classes}"/>
|
<pathelement location="${build.classes}"/>
|
||||||
|
@ -52,8 +54,8 @@
|
||||||
|
|
||||||
<available
|
<available
|
||||||
property="javacc.present"
|
property="javacc.present"
|
||||||
classname="COM.sun.labs.javacc.Main"
|
classname="${javacc.main.class}"
|
||||||
classpath="${javacc.zip}"
|
classpath="${javacc.jar}"
|
||||||
/>
|
/>
|
||||||
|
|
||||||
<available
|
<available
|
||||||
|
@ -67,21 +69,21 @@
|
||||||
</tstamp>
|
</tstamp>
|
||||||
</target>
|
</target>
|
||||||
|
|
||||||
<target name="javacc_check" depends="init" unless="javacc.present">
|
<target name="javacc-check" depends="init">
|
||||||
<echo>
|
<fail unless="javacc.present">
|
||||||
##################################################################
|
##################################################################
|
||||||
JavaCC not found.
|
JavaCC not found.
|
||||||
JavaCC Home: ${javacc.home}
|
JavaCC Home: ${javacc.home}
|
||||||
JavaCC Zip: ${javacc.zip}
|
JavaCC Zip: ${javacc.jar}
|
||||||
|
|
||||||
Please download and install JavaCC 2.0 from:
|
Please download and install JavaCC from:
|
||||||
|
|
||||||
<http://www.experimentalstuff.com/Technologies/JavaCC/>
|
<http://javacc.dev.java.net>
|
||||||
|
|
||||||
Then, create a build.properties file either in your home
|
Then, create a build.properties file either in your home
|
||||||
directory, or within the Lucene directory and set the javacc.home
|
directory, or within the Lucene directory and set the javacc.home
|
||||||
property to the path where JavaCC.zip is located. For example,
|
property to the path where JavaCC.zip is located. For example,
|
||||||
if you installed JavaCC in /usr/local/java/javacc2.0, then set the
|
if you installed JavaCC in /usr/local/java/javacc3.2, then set the
|
||||||
javacc.home property to:
|
javacc.home property to:
|
||||||
|
|
||||||
javacc.home=/usr/local/java/javacc2.0/bin
|
javacc.home=/usr/local/java/javacc2.0/bin
|
||||||
|
@ -89,9 +91,10 @@
|
||||||
If you get an error like the one below, then you have not installed
|
If you get an error like the one below, then you have not installed
|
||||||
things correctly. Please check all your paths and try again.
|
things correctly. Please check all your paths and try again.
|
||||||
|
|
||||||
java.lang.NoClassDefFoundError: COM/sun/labs/javacc/Main
|
java.lang.NoClassDefFoundError: org.javacc.parser.Main
|
||||||
##################################################################
|
##################################################################
|
||||||
</echo>
|
</fail>
|
||||||
|
|
||||||
</target>
|
</target>
|
||||||
|
|
||||||
<!-- ================================================================== -->
|
<!-- ================================================================== -->
|
||||||
|
@ -99,25 +102,10 @@
|
||||||
<!-- ================================================================== -->
|
<!-- ================================================================== -->
|
||||||
<!-- -->
|
<!-- -->
|
||||||
<!-- ================================================================== -->
|
<!-- ================================================================== -->
|
||||||
<target name="compile" depends="init,javacc_check" if="javacc.present">
|
<target name="compile" depends="init">
|
||||||
<mkdir dir="${build.src}/org/apache/lucene/analysis/standard"/>
|
|
||||||
<javacc
|
|
||||||
target="${src.dir}/org/apache/lucene/analysis/standard/StandardTokenizer.jj"
|
|
||||||
javacchome="${javacc.zip.dir}"
|
|
||||||
outputdirectory="${build.src}/org/apache/lucene/analysis/standard"
|
|
||||||
/>
|
|
||||||
|
|
||||||
<delete file="${build.src}/org/apache/lucene/analysis/standard/ParseException.java"/>
|
|
||||||
<mkdir dir="${build.src}/org/apache/lucene/queryParser"/>
|
|
||||||
<javacc
|
|
||||||
target="${src.dir}/org/apache/lucene/queryParser/QueryParser.jj"
|
|
||||||
javacchome="${javacc.zip.dir}"
|
|
||||||
outputdirectory="${build.src}/org/apache/lucene/queryParser"
|
|
||||||
/>
|
|
||||||
|
|
||||||
<javac
|
<javac
|
||||||
encoding="${build.encoding}"
|
encoding="${build.encoding}"
|
||||||
srcdir="${src.dir}:${build.src}"
|
srcdir="${src.dir}"
|
||||||
includes="org/**/*.java"
|
includes="org/**/*.java"
|
||||||
destdir="${build.classes}"
|
destdir="${build.classes}"
|
||||||
debug="${debug}">
|
debug="${debug}">
|
||||||
|
@ -135,7 +123,7 @@
|
||||||
<!-- ================================================================== -->
|
<!-- ================================================================== -->
|
||||||
<!-- -->
|
<!-- -->
|
||||||
<!-- ================================================================== -->
|
<!-- ================================================================== -->
|
||||||
<target name="jar" depends="compile" if="javacc.present">
|
<target name="jar" depends="compile">
|
||||||
|
|
||||||
<!-- Create Jar MANIFEST file -->
|
<!-- Create Jar MANIFEST file -->
|
||||||
<echo file="${build.manifest}">Manifest-Version: 1.0
|
<echo file="${build.manifest}">Manifest-Version: 1.0
|
||||||
|
@ -158,7 +146,7 @@ Implementation-Vendor: Lucene
|
||||||
/>
|
/>
|
||||||
</target>
|
</target>
|
||||||
|
|
||||||
<target name="jardemo" depends="compile,demo" if="javacc.present">
|
<target name="jardemo" depends="compile,demo">
|
||||||
<jar
|
<jar
|
||||||
jarfile="${build.demo}/${build.demo.name}.jar"
|
jarfile="${build.demo}/${build.demo.name}.jar"
|
||||||
basedir="${build.demo.classes}"
|
basedir="${build.demo.classes}"
|
||||||
|
@ -166,7 +154,7 @@ Implementation-Vendor: Lucene
|
||||||
/>
|
/>
|
||||||
</target>
|
</target>
|
||||||
|
|
||||||
<target name="wardemo" depends="compile,demo,jar,jardemo" if="javacc.present">
|
<target name="wardemo" depends="compile,demo,jar,jardemo">
|
||||||
<mkdir dir="${build.demo}/${build.demo.war.name}"/>
|
<mkdir dir="${build.demo}/${build.demo.war.name}"/>
|
||||||
<mkdir dir="${build.demo}/${build.demo.war.name}/WEB-INF"/>
|
<mkdir dir="${build.demo}/${build.demo.war.name}/WEB-INF"/>
|
||||||
<mkdir dir="${build.demo}/${build.demo.war.name}/WEB-INF/lib"/>
|
<mkdir dir="${build.demo}/${build.demo.war.name}/WEB-INF/lib"/>
|
||||||
|
@ -202,22 +190,8 @@ Implementation-Vendor: Lucene
|
||||||
<!-- ================================================================== -->
|
<!-- ================================================================== -->
|
||||||
<!-- -->
|
<!-- -->
|
||||||
<!-- ================================================================== -->
|
<!-- ================================================================== -->
|
||||||
<target name="jar-src" depends="init,javacc_check" if="javacc.present">
|
<target name="jar-src" depends="init">
|
||||||
<mkdir dir="${build.src}/org/apache/lucene/analysis/standard"/>
|
<mkdir dir="${build.src}/org/apache/lucene/analysis/standard"/>
|
||||||
<javacc
|
|
||||||
target="${src.dir}/org/apache/lucene/analysis/standard/StandardTokenizer.jj"
|
|
||||||
javacchome="${javacc.zip.dir}"
|
|
||||||
outputdirectory="${build.src}/org/apache/lucene/analysis/standard"
|
|
||||||
/>
|
|
||||||
|
|
||||||
<delete file="${build.src}/org/apache/lucene/analysis/standard/ParseException.java"/>
|
|
||||||
<mkdir dir="${build.src}/org/apache/lucene/queryParser"/>
|
|
||||||
<javacc
|
|
||||||
target="${src.dir}/org/apache/lucene/queryParser/QueryParser.jj"
|
|
||||||
javacchome="${javacc.zip.dir}"
|
|
||||||
outputdirectory="${build.src}/org/apache/lucene/queryParser"
|
|
||||||
/>
|
|
||||||
|
|
||||||
<jar jarfile="${build.dir}/${final.name}-src.jar">
|
<jar jarfile="${build.dir}/${final.name}-src.jar">
|
||||||
<fileset dir="${build.dir}" includes="**/*.java"/>
|
<fileset dir="${build.dir}" includes="**/*.java"/>
|
||||||
</jar>
|
</jar>
|
||||||
|
@ -228,7 +202,7 @@ Implementation-Vendor: Lucene
|
||||||
<!-- ================================================================== -->
|
<!-- ================================================================== -->
|
||||||
<!-- -->
|
<!-- -->
|
||||||
<!-- ================================================================== -->
|
<!-- ================================================================== -->
|
||||||
<target name="demo" depends="compile" if="javacc.present">
|
<target name="demo" depends="compile">
|
||||||
<mkdir dir="${build.demo}"/>
|
<mkdir dir="${build.demo}"/>
|
||||||
<mkdir dir="${build.demo.src}" />
|
<mkdir dir="${build.demo.src}" />
|
||||||
|
|
||||||
|
@ -239,11 +213,6 @@ Implementation-Vendor: Lucene
|
||||||
</fileset>
|
</fileset>
|
||||||
</copy>
|
</copy>
|
||||||
|
|
||||||
<javacc
|
|
||||||
target="${build.demo.src}/org/apache/lucene/demo/html/HTMLParser.jj"
|
|
||||||
javacchome="${javacc.zip.dir}"
|
|
||||||
outputdirectory="${build.demo.src}/org/apache/lucene/demo/html"
|
|
||||||
/>
|
|
||||||
<mkdir dir="${build.demo.classes}"/>
|
<mkdir dir="${build.demo.classes}"/>
|
||||||
|
|
||||||
<javac
|
<javac
|
||||||
|
@ -355,7 +324,7 @@ Implementation-Vendor: Lucene
|
||||||
<!-- ================================================================== -->
|
<!-- ================================================================== -->
|
||||||
<!-- -->
|
<!-- -->
|
||||||
<!-- ================================================================== -->
|
<!-- ================================================================== -->
|
||||||
<target name="javadocs" depends="compile" if="javacc.present">
|
<target name="javadocs" depends="compile">
|
||||||
<mkdir dir="${build.javadocs}"/>
|
<mkdir dir="${build.javadocs}"/>
|
||||||
<javadoc
|
<javadoc
|
||||||
sourcepath="${src.dir}:${build.src}"
|
sourcepath="${src.dir}:${build.src}"
|
||||||
|
@ -619,4 +588,51 @@ Implementation-Vendor: Lucene
|
||||||
</war>
|
</war>
|
||||||
</target>
|
</target>
|
||||||
-->
|
-->
|
||||||
|
|
||||||
|
|
||||||
|
<!-- ================================================================== -->
|
||||||
|
<!-- Build the JavaCC files into the source tree -->
|
||||||
|
<!-- ================================================================== -->
|
||||||
|
<target name="javacc" depends="javacc-StandardAnalyzer,javacc-QueryParser,javacc-HTMLParser"/>
|
||||||
|
|
||||||
|
<target name="javacc-StandardAnalyzer" depends="init,javacc-check" if="javacc.present">
|
||||||
|
<!-- generate this in a build directory so we can exclude ParseException -->
|
||||||
|
<mkdir dir="${build.src}/org/apache/lucene/analysis/standard"/>
|
||||||
|
<antcall target="invoke-javacc">
|
||||||
|
<param name="target" location="${src.dir}/org/apache/lucene/analysis/standard/StandardTokenizer.jj"/>
|
||||||
|
<param name="output.dir" location="${build.src}/org/apache/lucene/analysis/standard"/>
|
||||||
|
</antcall>
|
||||||
|
<copy todir="${src.dir}/org/apache/lucene/analysis/standard">
|
||||||
|
<fileset dir="${build.src}/org/apache/lucene/analysis/standard">
|
||||||
|
<include name="*.java"/>
|
||||||
|
<exclude name="ParseException.java"/>
|
||||||
|
</fileset>
|
||||||
|
</copy>
|
||||||
|
</target>
|
||||||
|
|
||||||
|
<target name="javacc-QueryParser" depends="init,javacc-check" if="javacc.present">
|
||||||
|
<antcall target="invoke-javacc">
|
||||||
|
<param name="target" location="${src.dir}/org/apache/lucene/queryParser/QueryParser.jj"/>
|
||||||
|
<param name="output.dir" location="${src.dir}/org/apache/lucene/queryParser"/>
|
||||||
|
</antcall>
|
||||||
|
</target>
|
||||||
|
|
||||||
|
<target name="javacc-HTMLParser" depends="init,javacc-check" if="javacc.present">
|
||||||
|
<antcall target="invoke-javacc">
|
||||||
|
<param name="target" location="${demo.src}/org/apache/lucene/demo/html/HTMLParser.jj"/>
|
||||||
|
<param name="output.dir" location="${demo.src}/org/apache/lucene/demo/html"/>
|
||||||
|
</antcall>
|
||||||
|
</target>
|
||||||
|
|
||||||
|
<target name="invoke-javacc">
|
||||||
|
<java classname="${javacc.main.class}" fork="true">
|
||||||
|
<classpath path="${javacc.jar}"/>
|
||||||
|
|
||||||
|
<sysproperty key="install.root" file="${javacc.home}"/>
|
||||||
|
|
||||||
|
<arg value="-OUTPUT_DIRECTORY:${output.dir}"/>
|
||||||
|
<arg value="${target}"/>
|
||||||
|
</java>
|
||||||
|
</target>
|
||||||
|
|
||||||
</project>
|
</project>
|
||||||
|
|
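Reviewer note: the new `invoke-javacc` target forks a JVM with `${javacc.jar}` on the classpath, sets the `install.root` system property to `${javacc.home}`, and passes `-OUTPUT_DIRECTORY:${output.dir}` followed by the grammar file. The sketch below only assembles the equivalent command line so the argument order is easy to see. The class name `InvokeJavaccSketch`, the method `buildCommand`, and every path in `main` are hypothetical; Ant additionally resolves `location` and `file` attributes to absolute paths, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

public class InvokeJavaccSketch {

    /**
     * Assembles the command line that roughly corresponds to the
     * invoke-javacc target. All four arguments are caller-supplied paths;
     * nothing here is read from build.properties.
     */
    static List<String> buildCommand(String javaccHome, String javaccJar,
                                     String outputDir, String grammarFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        // <sysproperty key="install.root" file="${javacc.home}"/>
        cmd.add("-Dinstall.root=" + javaccHome);
        // <classpath path="${javacc.jar}"/>
        cmd.add("-cp");
        cmd.add(javaccJar);
        // ${javacc.main.class} as defined at the top of build.xml
        cmd.add("org.javacc.parser.Main");
        // <arg value="-OUTPUT_DIRECTORY:${output.dir}"/>
        cmd.add("-OUTPUT_DIRECTORY:" + outputDir);
        // <arg value="${target}"/>
        cmd.add(grammarFile);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        List<String> cmd = buildCommand(
                "/opt/javacc",
                "/opt/javacc/bin/lib/javacc.jar",
                "src/java/org/apache/lucene/queryParser",
                "src/java/org/apache/lucene/queryParser/QueryParser.jj");
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the list to `ProcessBuilder` would be one way to actually run it; that step is left out here because it needs a real JavaCC installation.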
|
@ -58,8 +58,8 @@ junit.reports = ${build.dir}/unit-reports
|
||||||
|
|
||||||
# Home directory of JavaCC
|
# Home directory of JavaCC
|
||||||
javacc.home = .
|
javacc.home = .
|
||||||
javacc.zip.dir = ${javacc.home}/lib
|
javacc.zip.dir = ${javacc.home}/bin/lib
|
||||||
javacc.zip = ${javacc.zip.dir}/JavaCC.zip
|
javacc.jar = ${javacc.zip.dir}/javacc.jar
|
||||||
|
|
||||||
# Home directory of jakarta-site2
|
# Home directory of jakarta-site2
|
||||||
jakarta.site2.home = ../jakarta-site2
|
jakarta.site2.home = ../jakarta-site2
|
||||||
|
|
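Reviewer note: with the new defaults, `javacc.jar` is derived from `javacc.home` in two steps (`javacc.zip.dir = ${javacc.home}/bin/lib`, then `javacc.jar = ${javacc.zip.dir}/javacc.jar`). The sketch below mimics that substitution so the resulting path can be checked for a given `javacc.home`. It is a simplification, not Ant's property resolver: the class `JavaccPathSketch` and its methods are hypothetical, unknown references are left as-is, and circular references are not guarded against.

```java
import java.util.Properties;

public class JavaccPathSketch {

    /** Expands ${name} references in value using props (simplified, recursive). */
    static String expand(String value, Properties props) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < value.length()) {
            int start = value.indexOf("${", i);
            int end = (start < 0) ? -1 : value.indexOf('}', start);
            if (start < 0 || end < 0) {
                // No further complete reference: copy the rest verbatim.
                out.append(value.substring(i));
                break;
            }
            out.append(value, i, start);
            String name = value.substring(start + 2, end);
            String ref = props.getProperty(name);
            // Leave unknown references untouched, expand known ones.
            out.append(ref == null ? value.substring(start, end + 1) : expand(ref, props));
            i = end + 1;
        }
        return out.toString();
    }

    /** Resolves javacc.jar for a given javacc.home using the defaults from this diff. */
    static String resolveJavaccJar(String javaccHome) {
        Properties p = new Properties();
        p.setProperty("javacc.home", javaccHome);
        p.setProperty("javacc.zip.dir", "${javacc.home}/bin/lib");
        p.setProperty("javacc.jar", "${javacc.zip.dir}/javacc.jar");
        return expand(p.getProperty("javacc.jar"), p);
    }

    public static void main(String[] args) {
        // Hypothetical install locations, for illustration only.
        System.out.println(resolveJavaccJar("/usr/local/java/javacc3.2"));
        System.out.println(resolveJavaccJar("/usr/local/java/javacc3.2/bin"));
    }
}
```

Under these assumptions, a `javacc.home` that already ends in `/bin` (as in the examples in BUILD.txt and in the `javacc-check` message) would resolve to a path containing `bin/bin/lib`, so it may be worth checking which form matches the actual JavaCC installation layout.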
|
@ -0,0 +1,688 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. HTMLParser.java */
|
||||||
|
package org.apache.lucene.demo.html;
|
||||||
|
|
||||||
|
import java.io.*;
|
||||||
|
import java.util.Properties;
|
||||||
|
|
||||||
|
public class HTMLParser implements HTMLParserConstants {
|
||||||
|
public static int SUMMARY_LENGTH = 200;
|
||||||
|
|
||||||
|
StringBuffer title = new StringBuffer(SUMMARY_LENGTH);
|
||||||
|
StringBuffer summary = new StringBuffer(SUMMARY_LENGTH * 2);
|
||||||
|
Properties metaTags=new Properties();
|
||||||
|
String currentMetaTag="";
|
||||||
|
int length = 0;
|
||||||
|
boolean titleComplete = false;
|
||||||
|
boolean inTitle = false;
|
||||||
|
boolean inMetaTag = false;
|
||||||
|
boolean inStyle = false;
|
||||||
|
boolean inScript = false;
|
||||||
|
boolean afterTag = false;
|
||||||
|
boolean afterSpace = false;
|
||||||
|
String eol = System.getProperty("line.separator");
|
||||||
|
PipedReader pipeIn = null;
|
||||||
|
PipedWriter pipeOut;
|
||||||
|
|
||||||
|
public HTMLParser(File file) throws FileNotFoundException {
|
||||||
|
this(new FileInputStream(file));
|
||||||
|
}
|
||||||
|
|
||||||
|
public String getTitle() throws IOException, InterruptedException {
|
||||||
|
if (pipeIn == null)
|
||||||
|
getReader(); // spawn parsing thread
|
||||||
|
while (true) {
|
||||||
|
synchronized(this) {
|
||||||
|
if (titleComplete || (length > SUMMARY_LENGTH))
|
||||||
|
break;
|
||||||
|
wait(10);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return title.toString().trim();
|
||||||
|
}
|
||||||
|
|
||||||
|
public Properties getMetaTags() throws IOException,
|
||||||
|
InterruptedException {
|
||||||
|
if (pipeIn == null)
|
||||||
|
getReader(); // spawn parsing thread
|
||||||
|
while (true) {
|
||||||
|
synchronized(this) {
|
||||||
|
if (titleComplete || (length > SUMMARY_LENGTH))
|
||||||
|
break;
|
||||||
|
wait(10);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return metaTags;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
public String getSummary() throws IOException, InterruptedException {
|
||||||
|
if (pipeIn == null)
|
||||||
|
getReader(); // spawn parsing thread
|
||||||
|
while (true) {
|
||||||
|
synchronized(this) {
|
||||||
|
if (summary.length() >= SUMMARY_LENGTH)
|
||||||
|
break;
|
||||||
|
wait(10);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (summary.length() > SUMMARY_LENGTH)
|
||||||
|
summary.setLength(SUMMARY_LENGTH);
|
||||||
|
|
||||||
|
String sum = summary.toString().trim();
|
||||||
|
String tit = getTitle();
|
||||||
|
if (sum.startsWith(tit))
|
||||||
|
return sum.substring(tit.length());
|
||||||
|
else
|
||||||
|
return sum;
|
||||||
|
}
|
||||||
|
|
||||||
|
public Reader getReader() throws IOException {
|
||||||
|
if (pipeIn == null) {
|
||||||
|
pipeIn = new PipedReader();
|
||||||
|
pipeOut = new PipedWriter(pipeIn);
|
||||||
|
|
||||||
|
Thread thread = new ParserThread(this);
|
||||||
|
thread.start(); // start parsing
|
||||||
|
}
|
||||||
|
|
||||||
|
return pipeIn;
|
||||||
|
}
|
||||||
|
|
||||||
|
void addToSummary(String text) {
|
||||||
|
if (summary.length() < SUMMARY_LENGTH) {
|
||||||
|
summary.append(text);
|
||||||
|
if (summary.length() >= SUMMARY_LENGTH) {
|
||||||
|
synchronized(this) {
|
||||||
|
notifyAll();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void addText(String text) throws IOException {
|
||||||
|
if (inScript)
|
||||||
|
return;
|
||||||
|
if (inStyle)
|
||||||
|
return;
|
||||||
|
if (inMetaTag)
|
||||||
|
{
|
||||||
|
metaTags.setProperty(currentMetaTag, text);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
if (inTitle)
|
||||||
|
title.append(text);
|
||||||
|
else {
|
||||||
|
addToSummary(text);
|
||||||
|
if (!titleComplete && !title.equals("")) { // finished title
|
||||||
|
synchronized(this) {
|
||||||
|
titleComplete = true; // tell waiting threads
|
||||||
|
notifyAll();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
length += text.length();
|
||||||
|
pipeOut.write(text);
|
||||||
|
|
||||||
|
afterSpace = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
void addSpace() throws IOException {
|
||||||
|
if (inScript)
|
||||||
|
return;
|
||||||
|
if (!afterSpace) {
|
||||||
|
if (inTitle)
|
||||||
|
title.append(" ");
|
||||||
|
else
|
||||||
|
addToSummary(" ");
|
||||||
|
|
||||||
|
String space = afterTag ? eol : " ";
|
||||||
|
length += space.length();
|
||||||
|
pipeOut.write(space);
|
||||||
|
afterSpace = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
final public void HTMLDocument() throws ParseException, IOException {
|
||||||
|
Token t;
|
||||||
|
label_1:
|
||||||
|
while (true) {
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case TagName:
|
||||||
|
case DeclName:
|
||||||
|
case Comment1:
|
||||||
|
case Comment2:
|
||||||
|
case Word:
|
||||||
|
case Entity:
|
||||||
|
case Space:
|
||||||
|
case Punct:
|
||||||
|
;
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[0] = jj_gen;
|
||||||
|
break label_1;
|
||||||
|
}
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case TagName:
|
||||||
|
Tag();
|
||||||
|
afterTag = true;
|
||||||
|
break;
|
||||||
|
case DeclName:
|
||||||
|
t = Decl();
|
||||||
|
afterTag = true;
|
||||||
|
break;
|
||||||
|
case Comment1:
|
||||||
|
case Comment2:
|
||||||
|
CommentTag();
|
||||||
|
afterTag = true;
|
||||||
|
break;
|
||||||
|
case Word:
|
||||||
|
t = jj_consume_token(Word);
|
||||||
|
addText(t.image); afterTag = false;
|
||||||
|
break;
|
||||||
|
case Entity:
|
||||||
|
t = jj_consume_token(Entity);
|
||||||
|
addText(Entities.decode(t.image)); afterTag = false;
|
||||||
|
break;
|
||||||
|
case Punct:
|
||||||
|
t = jj_consume_token(Punct);
|
||||||
|
addText(t.image); afterTag = false;
|
||||||
|
break;
|
||||||
|
case Space:
|
||||||
|
jj_consume_token(Space);
|
||||||
|
addSpace(); afterTag = false;
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[1] = jj_gen;
|
||||||
|
jj_consume_token(-1);
|
||||||
|
throw new ParseException();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
jj_consume_token(0);
|
||||||
|
}
|
||||||
|
|
||||||
|
final public void Tag() throws ParseException, IOException {
|
||||||
|
Token t1, t2;
|
||||||
|
boolean inImg = false;
|
||||||
|
t1 = jj_consume_token(TagName);
|
||||||
|
inTitle = t1.image.equalsIgnoreCase("<title"); // keep track if in <TITLE>
|
||||||
|
inMetaTag = t1.image.equalsIgnoreCase("<META"); // keep track if in <META>
|
||||||
|
inStyle = t1.image.equalsIgnoreCase("<STYLE"); // keep track if in <STYLE>
|
||||||
|
inImg = t1.image.equalsIgnoreCase("<img"); // keep track if in <IMG>
|
||||||
|
if (inScript) { // keep track if in <SCRIPT>
|
||||||
|
inScript = !t1.image.equalsIgnoreCase("</script");
|
||||||
|
} else {
|
||||||
|
inScript = t1.image.equalsIgnoreCase("<script");
|
||||||
|
}
|
||||||
|
label_2:
|
||||||
|
while (true) {
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case ArgName:
|
||||||
|
;
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[2] = jj_gen;
|
||||||
|
break label_2;
|
||||||
|
}
|
||||||
|
t1 = jj_consume_token(ArgName);
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case ArgEquals:
|
||||||
|
jj_consume_token(ArgEquals);
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case ArgValue:
|
||||||
|
case ArgQuote1:
|
||||||
|
case ArgQuote2:
|
||||||
|
t2 = ArgValue();
|
||||||
|
if (inImg && t1.image.equalsIgnoreCase("alt") && t2 != null)
|
||||||
|
addText("[" + t2.image + "]");
|
||||||
|
|
||||||
|
if(inMetaTag &&
|
||||||
|
( t1.image.equalsIgnoreCase("name") ||
|
||||||
|
t1.image.equalsIgnoreCase("HTTP-EQUIV")
|
||||||
|
)
|
||||||
|
&& t2 != null)
|
||||||
|
{
|
||||||
|
currentMetaTag=t2.image.toLowerCase();
|
||||||
|
}
|
||||||
|
if(inMetaTag && t1.image.equalsIgnoreCase("content") && t2 !=
|
||||||
|
null)
|
||||||
|
{
|
||||||
|
addText(t2.image);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[3] = jj_gen;
|
||||||
|
;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[4] = jj_gen;
|
||||||
|
;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
jj_consume_token(TagEnd);
|
||||||
|
}
|
||||||
|
|
||||||
|
final public Token ArgValue() throws ParseException {
|
||||||
|
Token t = null;
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case ArgValue:
|
||||||
|
t = jj_consume_token(ArgValue);
|
||||||
|
{if (true) return t;}
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[5] = jj_gen;
|
||||||
|
if (jj_2_1(2)) {
|
||||||
|
jj_consume_token(ArgQuote1);
|
||||||
|
jj_consume_token(CloseQuote1);
|
||||||
|
{if (true) return t;}
|
||||||
|
} else {
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case ArgQuote1:
|
||||||
|
jj_consume_token(ArgQuote1);
|
||||||
|
t = jj_consume_token(Quote1Text);
|
||||||
|
jj_consume_token(CloseQuote1);
|
||||||
|
{if (true) return t;}
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[6] = jj_gen;
|
||||||
|
if (jj_2_2(2)) {
|
||||||
|
jj_consume_token(ArgQuote2);
|
||||||
|
jj_consume_token(CloseQuote2);
|
||||||
|
{if (true) return t;}
|
||||||
|
} else {
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case ArgQuote2:
|
||||||
|
jj_consume_token(ArgQuote2);
|
||||||
|
t = jj_consume_token(Quote2Text);
|
||||||
|
jj_consume_token(CloseQuote2);
|
||||||
|
{if (true) return t;}
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[7] = jj_gen;
|
||||||
|
jj_consume_token(-1);
|
||||||
|
throw new ParseException();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
throw new Error("Missing return statement in function");
|
||||||
|
}
|
||||||
|
|
||||||
|
final public Token Decl() throws ParseException {
|
||||||
|
Token t;
|
||||||
|
t = jj_consume_token(DeclName);
|
||||||
|
label_3:
|
||||||
|
while (true) {
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case ArgName:
|
||||||
|
case ArgEquals:
|
||||||
|
case ArgValue:
|
||||||
|
case ArgQuote1:
|
||||||
|
case ArgQuote2:
|
||||||
|
;
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[8] = jj_gen;
|
||||||
|
break label_3;
|
||||||
|
}
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case ArgName:
|
||||||
|
jj_consume_token(ArgName);
|
||||||
|
break;
|
||||||
|
case ArgValue:
|
||||||
|
case ArgQuote1:
|
||||||
|
case ArgQuote2:
|
||||||
|
ArgValue();
|
||||||
|
break;
|
||||||
|
case ArgEquals:
|
||||||
|
jj_consume_token(ArgEquals);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[9] = jj_gen;
|
||||||
|
jj_consume_token(-1);
|
||||||
|
throw new ParseException();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
jj_consume_token(TagEnd);
|
||||||
|
{if (true) return t;}
|
||||||
|
throw new Error("Missing return statement in function");
|
||||||
|
}
|
||||||
|
|
||||||
|
final public void CommentTag() throws ParseException {
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case Comment1:
|
||||||
|
jj_consume_token(Comment1);
|
||||||
|
label_4:
|
||||||
|
while (true) {
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case CommentText1:
|
||||||
|
;
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[10] = jj_gen;
|
||||||
|
break label_4;
|
||||||
|
}
|
||||||
|
jj_consume_token(CommentText1);
|
||||||
|
}
|
||||||
|
jj_consume_token(CommentEnd1);
|
||||||
|
break;
|
||||||
|
case Comment2:
|
||||||
|
jj_consume_token(Comment2);
|
||||||
|
label_5:
|
||||||
|
while (true) {
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case CommentText2:
|
||||||
|
;
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[11] = jj_gen;
|
||||||
|
break label_5;
|
||||||
|
}
|
||||||
|
jj_consume_token(CommentText2);
|
||||||
|
}
|
||||||
|
jj_consume_token(CommentEnd2);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[12] = jj_gen;
|
||||||
|
jj_consume_token(-1);
|
||||||
|
throw new ParseException();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
final private boolean jj_2_1(int xla) {
|
||||||
|
jj_la = xla; jj_lastpos = jj_scanpos = token;
|
||||||
|
try { return !jj_3_1(); }
|
||||||
|
catch(LookaheadSuccess ls) { return true; }
|
||||||
|
finally { jj_save(0, xla); }
|
||||||
|
}
|
||||||
|
|
||||||
|
final private boolean jj_2_2(int xla) {
|
||||||
|
jj_la = xla; jj_lastpos = jj_scanpos = token;
|
||||||
|
try { return !jj_3_2(); }
|
||||||
|
catch(LookaheadSuccess ls) { return true; }
|
||||||
|
finally { jj_save(1, xla); }
|
||||||
|
}
|
||||||
|
|
||||||
|
final private boolean jj_3_1() {
|
||||||
|
if (jj_scan_token(ArgQuote1)) return true;
|
||||||
|
if (jj_scan_token(CloseQuote1)) return true;
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
final private boolean jj_3_2() {
|
||||||
|
if (jj_scan_token(ArgQuote2)) return true;
|
||||||
|
if (jj_scan_token(CloseQuote2)) return true;
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
public HTMLParserTokenManager token_source;
|
||||||
|
SimpleCharStream jj_input_stream;
|
||||||
|
public Token token, jj_nt;
|
||||||
|
private int jj_ntk;
|
||||||
|
private Token jj_scanpos, jj_lastpos;
|
||||||
|
private int jj_la;
|
||||||
|
public boolean lookingAhead = false;
|
||||||
|
private boolean jj_semLA;
|
||||||
|
private int jj_gen;
|
||||||
|
final private int[] jj_la1 = new int[13];
|
||||||
|
static private int[] jj_la1_0;
|
||||||
|
static {
|
||||||
|
jj_la1_0();
|
||||||
|
}
|
||||||
|
private static void jj_la1_0() {
|
||||||
|
jj_la1_0 = new int[] {0xb3e,0xb3e,0x1000,0x38000,0x2000,0x8000,0x10000,0x20000,0x3b000,0x3b000,0x800000,0x2000000,0x18,};
|
||||||
|
}
|
||||||
|
final private JJCalls[] jj_2_rtns = new JJCalls[2];
|
||||||
|
private boolean jj_rescan = false;
|
||||||
|
private int jj_gc = 0;
|
||||||
|
|
||||||
|
public HTMLParser(java.io.InputStream stream) {
|
||||||
|
jj_input_stream = new SimpleCharStream(stream, 1, 1);
|
||||||
|
token_source = new HTMLParserTokenManager(jj_input_stream);
|
||||||
|
token = new Token();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen = 0;
|
||||||
|
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
|
||||||
|
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
|
||||||
|
}
|
||||||
|
|
||||||
|
public void ReInit(java.io.InputStream stream) {
|
||||||
|
jj_input_stream.ReInit(stream, 1, 1);
|
||||||
|
token_source.ReInit(jj_input_stream);
|
||||||
|
token = new Token();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen = 0;
|
||||||
|
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
|
||||||
|
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
|
||||||
|
}
|
||||||
|
|
||||||
|
public HTMLParser(java.io.Reader stream) {
|
||||||
|
jj_input_stream = new SimpleCharStream(stream, 1, 1);
|
||||||
|
token_source = new HTMLParserTokenManager(jj_input_stream);
|
||||||
|
token = new Token();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen = 0;
|
||||||
|
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
|
||||||
|
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
|
||||||
|
}
|
||||||
|
|
||||||
|
public void ReInit(java.io.Reader stream) {
|
||||||
|
jj_input_stream.ReInit(stream, 1, 1);
|
||||||
|
token_source.ReInit(jj_input_stream);
|
||||||
|
token = new Token();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen = 0;
|
||||||
|
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
|
||||||
|
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
|
||||||
|
}
|
||||||
|
|
||||||
|
public HTMLParser(HTMLParserTokenManager tm) {
|
||||||
|
token_source = tm;
|
||||||
|
token = new Token();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen = 0;
|
||||||
|
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
|
||||||
|
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
|
||||||
|
}
|
||||||
|
|
||||||
|
public void ReInit(HTMLParserTokenManager tm) {
|
||||||
|
token_source = tm;
|
||||||
|
token = new Token();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen = 0;
|
||||||
|
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
|
||||||
|
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
|
||||||
|
}
|
||||||
|
|
||||||
|
final private Token jj_consume_token(int kind) throws ParseException {
|
||||||
|
Token oldToken;
|
||||||
|
if ((oldToken = token).next != null) token = token.next;
|
||||||
|
else token = token.next = token_source.getNextToken();
|
||||||
|
jj_ntk = -1;
|
||||||
|
if (token.kind == kind) {
|
||||||
|
jj_gen++;
|
||||||
|
if (++jj_gc > 100) {
|
||||||
|
jj_gc = 0;
|
||||||
|
for (int i = 0; i < jj_2_rtns.length; i++) {
|
||||||
|
JJCalls c = jj_2_rtns[i];
|
||||||
|
while (c != null) {
|
||||||
|
if (c.gen < jj_gen) c.first = null;
|
||||||
|
c = c.next;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return token;
|
||||||
|
}
|
||||||
|
token = oldToken;
|
||||||
|
jj_kind = kind;
|
||||||
|
throw generateParseException();
|
||||||
|
}
|
||||||
|
|
||||||
|
static private final class LookaheadSuccess extends java.lang.Error { }
|
||||||
|
final private LookaheadSuccess jj_ls = new LookaheadSuccess();
|
||||||
|
final private boolean jj_scan_token(int kind) {
|
||||||
|
if (jj_scanpos == jj_lastpos) {
|
||||||
|
jj_la--;
|
||||||
|
if (jj_scanpos.next == null) {
|
||||||
|
jj_lastpos = jj_scanpos = jj_scanpos.next = token_source.getNextToken();
|
||||||
|
} else {
|
||||||
|
jj_lastpos = jj_scanpos = jj_scanpos.next;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
jj_scanpos = jj_scanpos.next;
|
||||||
|
}
|
||||||
|
if (jj_rescan) {
|
||||||
|
int i = 0; Token tok = token;
|
||||||
|
while (tok != null && tok != jj_scanpos) { i++; tok = tok.next; }
|
||||||
|
if (tok != null) jj_add_error_token(kind, i);
|
||||||
|
}
|
||||||
|
if (jj_scanpos.kind != kind) return true;
|
||||||
|
if (jj_la == 0 && jj_scanpos == jj_lastpos) throw jj_ls;
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
final public Token getNextToken() {
|
||||||
|
if (token.next != null) token = token.next;
|
||||||
|
else token = token.next = token_source.getNextToken();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen++;
|
||||||
|
return token;
|
||||||
|
}
|
||||||
|
|
||||||
|
final public Token getToken(int index) {
|
||||||
|
Token t = lookingAhead ? jj_scanpos : token;
|
||||||
|
for (int i = 0; i < index; i++) {
|
||||||
|
if (t.next != null) t = t.next;
|
||||||
|
else t = t.next = token_source.getNextToken();
|
||||||
|
}
|
||||||
|
return t;
|
||||||
|
}
|
||||||
|
|
||||||
|
final private int jj_ntk() {
|
||||||
|
if ((jj_nt=token.next) == null)
|
||||||
|
return (jj_ntk = (token.next=token_source.getNextToken()).kind);
|
||||||
|
else
|
||||||
|
return (jj_ntk = jj_nt.kind);
|
||||||
|
}
|
||||||
|
|
||||||
|
private java.util.Vector jj_expentries = new java.util.Vector();
|
||||||
|
private int[] jj_expentry;
|
||||||
|
private int jj_kind = -1;
|
||||||
|
private int[] jj_lasttokens = new int[100];
|
||||||
|
private int jj_endpos;
|
||||||
|
|
||||||
|
private void jj_add_error_token(int kind, int pos) {
|
||||||
|
if (pos >= 100) return;
|
||||||
|
if (pos == jj_endpos + 1) {
|
||||||
|
jj_lasttokens[jj_endpos++] = kind;
|
||||||
|
} else if (jj_endpos != 0) {
|
||||||
|
jj_expentry = new int[jj_endpos];
|
||||||
|
for (int i = 0; i < jj_endpos; i++) {
|
||||||
|
jj_expentry[i] = jj_lasttokens[i];
|
||||||
|
}
|
||||||
|
boolean exists = false;
|
||||||
|
for (java.util.Enumeration e = jj_expentries.elements(); e.hasMoreElements();) {
|
||||||
|
int[] oldentry = (int[])(e.nextElement());
|
||||||
|
if (oldentry.length == jj_expentry.length) {
|
||||||
|
exists = true;
|
||||||
|
for (int i = 0; i < jj_expentry.length; i++) {
|
||||||
|
if (oldentry[i] != jj_expentry[i]) {
|
||||||
|
exists = false;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (exists) break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (!exists) jj_expentries.addElement(jj_expentry);
|
||||||
|
if (pos != 0) jj_lasttokens[(jj_endpos = pos) - 1] = kind;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
public ParseException generateParseException() {
|
||||||
|
jj_expentries.removeAllElements();
|
||||||
|
boolean[] la1tokens = new boolean[27];
|
||||||
|
for (int i = 0; i < 27; i++) {
|
||||||
|
la1tokens[i] = false;
|
||||||
|
}
|
||||||
|
if (jj_kind >= 0) {
|
||||||
|
la1tokens[jj_kind] = true;
|
||||||
|
jj_kind = -1;
|
||||||
|
}
|
||||||
|
for (int i = 0; i < 13; i++) {
|
||||||
|
if (jj_la1[i] == jj_gen) {
|
||||||
|
for (int j = 0; j < 32; j++) {
|
||||||
|
if ((jj_la1_0[i] & (1<<j)) != 0) {
|
||||||
|
la1tokens[j] = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for (int i = 0; i < 27; i++) {
|
||||||
|
if (la1tokens[i]) {
|
||||||
|
jj_expentry = new int[1];
|
||||||
|
jj_expentry[0] = i;
|
||||||
|
jj_expentries.addElement(jj_expentry);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
jj_endpos = 0;
|
||||||
|
jj_rescan_token();
|
||||||
|
jj_add_error_token(0, 0);
|
||||||
|
int[][] exptokseq = new int[jj_expentries.size()][];
|
||||||
|
for (int i = 0; i < jj_expentries.size(); i++) {
|
||||||
|
exptokseq[i] = (int[])jj_expentries.elementAt(i);
|
||||||
|
}
|
||||||
|
return new ParseException(token, exptokseq, tokenImage);
|
||||||
|
}
|
||||||
|
|
||||||
|
final public void enable_tracing() {
|
||||||
|
}
|
||||||
|
|
||||||
|
final public void disable_tracing() {
|
||||||
|
}
|
||||||
|
|
||||||
|
final private void jj_rescan_token() {
|
||||||
|
jj_rescan = true;
|
||||||
|
for (int i = 0; i < 2; i++) {
|
||||||
|
JJCalls p = jj_2_rtns[i];
|
||||||
|
do {
|
||||||
|
if (p.gen > jj_gen) {
|
||||||
|
jj_la = p.arg; jj_lastpos = jj_scanpos = p.first;
|
||||||
|
switch (i) {
|
||||||
|
case 0: jj_3_1(); break;
|
||||||
|
case 1: jj_3_2(); break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
p = p.next;
|
||||||
|
} while (p != null);
|
||||||
|
}
|
||||||
|
jj_rescan = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
final private void jj_save(int index, int xla) {
|
||||||
|
JJCalls p = jj_2_rtns[index];
|
||||||
|
while (p.gen > jj_gen) {
|
||||||
|
if (p.next == null) { p = p.next = new JJCalls(); break; }
|
||||||
|
p = p.next;
|
||||||
|
}
|
||||||
|
p.gen = jj_gen + xla - jj_la; p.first = token; p.arg = xla;
|
||||||
|
}
|
||||||
|
|
||||||
|
static final class JJCalls {
|
||||||
|
int gen;
|
||||||
|
Token first;
|
||||||
|
int arg;
|
||||||
|
JJCalls next;
|
||||||
|
}
|
||||||
|
|
||||||
|
// void handleException(Exception e) {
|
||||||
|
// System.out.println(e.toString()); // print the error message
|
||||||
|
// System.out.println("Skipping...");
|
||||||
|
// Token t;
|
||||||
|
// do {
|
||||||
|
// t = getNextToken();
|
||||||
|
// } while (t.kind != TagEnd);
|
||||||
|
// }
|
||||||
|
}
|
|
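Note on the `jj_la1_0` table used by `generateParseException` above: each entry appears to be a bitmask in which bit `j` marks token kind `j` as acceptable at one choice point of the grammar. The following is a minimal sketch of that decoding step only, not part of the commit; the class name `ExpectedTokenSketch` and the method `decode` are hypothetical, and the two masks are copied from the table above.

```java
import java.util.ArrayList;
import java.util.List;

class ExpectedTokenSketch {
    // Returns the token kinds whose bits are set in one jj_la1_0-style mask,
    // using the same (mask & (1 << j)) test as generateParseException.
    public static List<Integer> decode(int mask) {
        List<Integer> kinds = new ArrayList<>();
        for (int j = 0; j < 32; j++) {
            if ((mask & (1 << j)) != 0) {
                kinds.add(j);
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        // 0x18 is entry 12 of jj_la1_0, recorded by the default branch of CommentTag().
        System.out.println(decode(0x18));    // prints [3, 4]
        // 0x3b000 is entry 9, recorded by the default branch of the tag-argument switch.
        System.out.println(decode(0x3b000)); // prints [12, 13, 15, 16, 17]
    }
}
```

Read against the constants in HTMLParserConstants, kinds 3 and 4 are Comment1 and Comment2, and kinds 12, 13, 15, 16 and 17 are ArgName, ArgEquals, ArgValue, ArgQuote1 and ArgQuote2, which matches the case labels of the corresponding switch statements.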
@ -0,0 +1,71 @@
/* Generated By:JavaCC: Do not edit this line. HTMLParserConstants.java */
package org.apache.lucene.demo.html;

public interface HTMLParserConstants {

  int EOF = 0;
  int TagName = 1;
  int DeclName = 2;
  int Comment1 = 3;
  int Comment2 = 4;
  int Word = 5;
  int LET = 6;
  int NUM = 7;
  int Entity = 8;
  int Space = 9;
  int SP = 10;
  int Punct = 11;
  int ArgName = 12;
  int ArgEquals = 13;
  int TagEnd = 14;
  int ArgValue = 15;
  int ArgQuote1 = 16;
  int ArgQuote2 = 17;
  int Quote1Text = 19;
  int CloseQuote1 = 20;
  int Quote2Text = 21;
  int CloseQuote2 = 22;
  int CommentText1 = 23;
  int CommentEnd1 = 24;
  int CommentText2 = 25;
  int CommentEnd2 = 26;

  int DEFAULT = 0;
  int WithinTag = 1;
  int AfterEquals = 2;
  int WithinQuote1 = 3;
  int WithinQuote2 = 4;
  int WithinComment1 = 5;
  int WithinComment2 = 6;

  String[] tokenImage = {
    "<EOF>",
    "<TagName>",
    "<DeclName>",
    "\"<!--\"",
    "\"<!\"",
    "<Word>",
    "<LET>",
    "<NUM>",
    "<Entity>",
    "<Space>",
    "<SP>",
    "<Punct>",
    "<ArgName>",
    "\"=\"",
    "<TagEnd>",
    "<ArgValue>",
    "\"\\\'\"",
    "\"\\\"\"",
    "<token of kind 18>",
    "<Quote1Text>",
    "<CloseQuote1>",
    "<Quote2Text>",
    "<CloseQuote2>",
    "<CommentText1>",
    "\"-->\"",
    "<CommentText2>",
    "\">\"",
  };

}
File diff suppressed because it is too large
|
@ -0,0 +1,192 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. ParseException.java Version 3.0 */
|
||||||
|
package org.apache.lucene.demo.html;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This exception is thrown when parse errors are encountered.
|
||||||
|
* You can explicitly create objects of this exception type by
|
||||||
|
* calling the method generateParseException in the generated
|
||||||
|
* parser.
|
||||||
|
*
|
||||||
|
* You can modify this class to customize your error reporting
|
||||||
|
* mechanisms so long as you retain the public fields.
|
||||||
|
*/
|
||||||
|
public class ParseException extends Exception {
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This constructor is used by the method "generateParseException"
|
||||||
|
* in the generated parser. Calling this constructor generates
|
||||||
|
* a new object of this type with the fields "currentToken",
|
||||||
|
* "expectedTokenSequences", and "tokenImage" set. The boolean
|
||||||
|
* flag "specialConstructor" is also set to true to indicate that
|
||||||
|
* this constructor was used to create this object.
|
||||||
|
* This constructor calls its super class with the empty string
|
||||||
|
* to force the "toString" method of parent class "Throwable" to
|
||||||
|
* print the error message in the form:
|
||||||
|
* ParseException: <result of getMessage>
|
||||||
|
*/
|
||||||
|
public ParseException(Token currentTokenVal,
|
||||||
|
int[][] expectedTokenSequencesVal,
|
||||||
|
String[] tokenImageVal
|
||||||
|
)
|
||||||
|
{
|
||||||
|
super("");
|
||||||
|
specialConstructor = true;
|
||||||
|
currentToken = currentTokenVal;
|
||||||
|
expectedTokenSequences = expectedTokenSequencesVal;
|
||||||
|
tokenImage = tokenImageVal;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The following constructors are for use by you for whatever
|
||||||
|
* purpose you can think of. Constructing the exception in this
|
||||||
|
* manner makes the exception behave in the normal way - i.e., as
|
||||||
|
* documented in the class "Throwable". The fields "currentToken",
|
||||||
|
* "expectedTokenSequences", and "tokenImage" do not contain
|
||||||
|
* relevant information. The JavaCC generated code does not use
|
||||||
|
* these constructors.
|
||||||
|
*/
|
||||||
|
|
||||||
|
public ParseException() {
|
||||||
|
super();
|
||||||
|
specialConstructor = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
public ParseException(String message) {
|
||||||
|
super(message);
|
||||||
|
specialConstructor = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This variable determines which constructor was used to create
|
||||||
|
* this object and thereby affects the semantics of the
|
||||||
|
* "getMessage" method (see below).
|
||||||
|
*/
|
||||||
|
protected boolean specialConstructor;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This is the last token that has been consumed successfully. If
|
||||||
|
* this object has been created due to a parse error, the token
|
||||||
|
* following this token will (therefore) be the first error token.
|
||||||
|
*/
|
||||||
|
public Token currentToken;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Each entry in this array is an array of integers. Each array
|
||||||
|
* of integers represents a sequence of tokens (by their ordinal
|
||||||
|
* values) that is expected at this point of the parse.
|
||||||
|
*/
|
||||||
|
public int[][] expectedTokenSequences;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This is a reference to the "tokenImage" array of the generated
|
||||||
|
* parser within which the parse error occurred. This array is
|
||||||
|
* defined in the generated ...Constants interface.
|
||||||
|
*/
|
||||||
|
public String[] tokenImage;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This method has the standard behavior when this object has been
|
||||||
|
* created using the standard constructors. Otherwise, it uses
|
||||||
|
* "currentToken" and "expectedTokenSequences" to generate a parse
|
||||||
|
* error message and returns it. If this object has been created
|
||||||
|
* due to a parse error, and you do not catch it (it gets thrown
|
||||||
|
* from the parser), then this method is called during the printing
|
||||||
|
* of the final stack trace, and hence the correct error message
|
||||||
|
* gets displayed.
|
||||||
|
*/
|
||||||
|
public String getMessage() {
|
||||||
|
if (!specialConstructor) {
|
||||||
|
return super.getMessage();
|
||||||
|
}
|
||||||
|
String expected = "";
|
||||||
|
int maxSize = 0;
|
||||||
|
for (int i = 0; i < expectedTokenSequences.length; i++) {
|
||||||
|
if (maxSize < expectedTokenSequences[i].length) {
|
||||||
|
maxSize = expectedTokenSequences[i].length;
|
||||||
|
}
|
||||||
|
for (int j = 0; j < expectedTokenSequences[i].length; j++) {
|
||||||
|
expected += tokenImage[expectedTokenSequences[i][j]] + " ";
|
||||||
|
}
|
||||||
|
if (expectedTokenSequences[i][expectedTokenSequences[i].length - 1] != 0) {
|
||||||
|
expected += "...";
|
||||||
|
}
|
||||||
|
expected += eol + " ";
|
||||||
|
}
|
||||||
|
String retval = "Encountered \"";
|
||||||
|
Token tok = currentToken.next;
|
||||||
|
for (int i = 0; i < maxSize; i++) {
|
||||||
|
if (i != 0) retval += " ";
|
||||||
|
if (tok.kind == 0) {
|
||||||
|
retval += tokenImage[0];
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
retval += add_escapes(tok.image);
|
||||||
|
tok = tok.next;
|
||||||
|
}
|
||||||
|
retval += "\" at line " + currentToken.next.beginLine + ", column " + currentToken.next.beginColumn;
|
||||||
|
retval += "." + eol;
|
||||||
|
if (expectedTokenSequences.length == 1) {
|
||||||
|
retval += "Was expecting:" + eol + " ";
|
||||||
|
} else {
|
||||||
|
retval += "Was expecting one of:" + eol + " ";
|
||||||
|
}
|
||||||
|
retval += expected;
|
||||||
|
return retval;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The end of line string for this machine.
|
||||||
|
*/
|
||||||
|
protected String eol = System.getProperty("line.separator", "\n");
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Used to convert raw characters to their escaped version
|
||||||
|
* when these raw versions cannot be used as part of an ASCII
|
||||||
|
* string literal.
|
||||||
|
*/
|
||||||
|
protected String add_escapes(String str) {
|
||||||
|
StringBuffer retval = new StringBuffer();
|
||||||
|
char ch;
|
||||||
|
for (int i = 0; i < str.length(); i++) {
|
||||||
|
switch (str.charAt(i))
|
||||||
|
{
|
||||||
|
case 0 :
|
||||||
|
continue;
|
||||||
|
case '\b':
|
||||||
|
retval.append("\\b");
|
||||||
|
continue;
|
||||||
|
case '\t':
|
||||||
|
retval.append("\\t");
|
||||||
|
continue;
|
||||||
|
case '\n':
|
||||||
|
retval.append("\\n");
|
||||||
|
continue;
|
||||||
|
case '\f':
|
||||||
|
retval.append("\\f");
|
||||||
|
continue;
|
||||||
|
case '\r':
|
||||||
|
retval.append("\\r");
|
||||||
|
continue;
|
||||||
|
case '\"':
|
||||||
|
retval.append("\\\"");
|
||||||
|
continue;
|
||||||
|
case '\'':
|
||||||
|
retval.append("\\\'");
|
||||||
|
continue;
|
||||||
|
case '\\':
|
||||||
|
retval.append("\\\\");
|
||||||
|
continue;
|
||||||
|
default:
|
||||||
|
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
|
||||||
|
String s = "0000" + Integer.toString(ch, 16);
|
||||||
|
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
|
||||||
|
} else {
|
||||||
|
retval.append(ch);
|
||||||
|
}
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return retval.toString();
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
|
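The escaping done by `add_escapes` above can be tried in isolation. The following is a minimal standalone sketch, assuming the same rules as the generated method (named escapes for the usual control characters and quotes, a four-digit hexadecimal escape for anything outside 0x20..0x7e, NUL characters dropped). The class name `EscapeSketch` and the method name `addEscapes` are hypothetical, and `StringBuilder` is used here in place of the `StringBuffer` in the generated code.

```java
class EscapeSketch {
    // Converts raw characters to a form that is safe inside an ASCII string
    // literal, following the same cases as the generated add_escapes method.
    public static String addEscapes(String str) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            switch (ch) {
                case 0:    break;                        // NUL is silently dropped
                case '\b': out.append("\\b");  break;
                case '\t': out.append("\\t");  break;
                case '\n': out.append("\\n");  break;
                case '\f': out.append("\\f");  break;
                case '\r': out.append("\\r");  break;
                case '\"': out.append("\\\""); break;
                case '\'': out.append("\\'");  break;
                case '\\': out.append("\\\\"); break;
                default:
                    if (ch < 0x20 || ch > 0x7e) {
                        // Left-pad the hex code to four digits.
                        String hex = "0000" + Integer.toString(ch, 16);
                        out.append("\\u").append(hex.substring(hex.length() - 4));
                    } else {
                        out.append(ch);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A tab and a non-ASCII character (code 0xe9) become visible escapes.
        System.out.println(addEscapes("a\tb" + (char) 0xe9));
    }
}
```

`getMessage` applies this to each offending token image, so that control characters in the input show up as readable escapes in the "Encountered ..." line instead of corrupting the message layout.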
@ -0,0 +1,401 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. SimpleCharStream.java Version 3.0 */
|
||||||
|
package org.apache.lucene.demo.html;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* An implementation of interface CharStream, where the stream is assumed to
|
||||||
|
* contain only ASCII characters (without unicode processing).
|
||||||
|
*/
|
||||||
|
|
||||||
|
public class SimpleCharStream
|
||||||
|
{
|
||||||
|
public static final boolean staticFlag = false;
|
||||||
|
int bufsize;
|
||||||
|
int available;
|
||||||
|
int tokenBegin;
|
||||||
|
public int bufpos = -1;
|
||||||
|
protected int bufline[];
|
||||||
|
protected int bufcolumn[];
|
||||||
|
|
||||||
|
protected int column = 0;
|
||||||
|
protected int line = 1;
|
||||||
|
|
||||||
|
protected boolean prevCharIsCR = false;
|
||||||
|
protected boolean prevCharIsLF = false;
|
||||||
|
|
||||||
|
protected java.io.Reader inputStream;
|
||||||
|
|
||||||
|
protected char[] buffer;
|
||||||
|
protected int maxNextCharInd = 0;
|
||||||
|
protected int inBuf = 0;
|
||||||
|
|
||||||
|
protected void ExpandBuff(boolean wrapAround)
|
||||||
|
{
|
||||||
|
char[] newbuffer = new char[bufsize + 2048];
|
||||||
|
int newbufline[] = new int[bufsize + 2048];
|
||||||
|
int newbufcolumn[] = new int[bufsize + 2048];
|
||||||
|
|
||||||
|
try
|
||||||
|
{
|
||||||
|
if (wrapAround)
|
||||||
|
{
|
||||||
|
System.arraycopy(buffer, tokenBegin, newbuffer, 0, bufsize - tokenBegin);
|
||||||
|
System.arraycopy(buffer, 0, newbuffer,
|
||||||
|
bufsize - tokenBegin, bufpos);
|
||||||
|
buffer = newbuffer;
|
||||||
|
|
||||||
|
System.arraycopy(bufline, tokenBegin, newbufline, 0, bufsize - tokenBegin);
|
||||||
|
System.arraycopy(bufline, 0, newbufline, bufsize - tokenBegin, bufpos);
|
||||||
|
bufline = newbufline;
|
||||||
|
|
||||||
|
System.arraycopy(bufcolumn, tokenBegin, newbufcolumn, 0, bufsize - tokenBegin);
|
||||||
|
System.arraycopy(bufcolumn, 0, newbufcolumn, bufsize - tokenBegin, bufpos);
|
||||||
|
bufcolumn = newbufcolumn;
|
||||||
|
|
||||||
|
maxNextCharInd = (bufpos += (bufsize - tokenBegin));
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
System.arraycopy(buffer, tokenBegin, newbuffer, 0, bufsize - tokenBegin);
|
||||||
|
buffer = newbuffer;
|
||||||
|
|
||||||
|
System.arraycopy(bufline, tokenBegin, newbufline, 0, bufsize - tokenBegin);
|
||||||
|
bufline = newbufline;
|
||||||
|
|
||||||
|
System.arraycopy(bufcolumn, tokenBegin, newbufcolumn, 0, bufsize - tokenBegin);
|
||||||
|
bufcolumn = newbufcolumn;
|
||||||
|
|
||||||
|
maxNextCharInd = (bufpos -= tokenBegin);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
catch (Throwable t)
|
||||||
|
{
|
||||||
|
throw new Error(t.getMessage());
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
bufsize += 2048;
|
||||||
|
available = bufsize;
|
||||||
|
tokenBegin = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
protected void FillBuff() throws java.io.IOException
|
||||||
|
{
|
||||||
|
if (maxNextCharInd == available)
|
||||||
|
{
|
||||||
|
if (available == bufsize)
|
||||||
|
{
|
||||||
|
if (tokenBegin > 2048)
|
||||||
|
{
|
||||||
|
bufpos = maxNextCharInd = 0;
|
||||||
|
available = tokenBegin;
|
||||||
|
}
|
||||||
|
else if (tokenBegin < 0)
|
||||||
|
bufpos = maxNextCharInd = 0;
|
||||||
|
else
|
||||||
|
ExpandBuff(false);
|
||||||
|
}
|
||||||
|
else if (available > tokenBegin)
|
||||||
|
available = bufsize;
|
||||||
|
else if ((tokenBegin - available) < 2048)
|
||||||
|
ExpandBuff(true);
|
||||||
|
else
|
||||||
|
available = tokenBegin;
|
||||||
|
}
|
||||||
|
|
||||||
|
int i;
|
||||||
|
try {
|
||||||
|
if ((i = inputStream.read(buffer, maxNextCharInd,
|
||||||
|
available - maxNextCharInd)) == -1)
|
||||||
|
{
|
||||||
|
inputStream.close();
|
||||||
|
throw new java.io.IOException();
|
||||||
|
}
|
||||||
|
else
|
||||||
|
maxNextCharInd += i;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
catch(java.io.IOException e) {
|
||||||
|
--bufpos;
|
||||||
|
backup(0);
|
||||||
|
if (tokenBegin == -1)
|
||||||
|
tokenBegin = bufpos;
|
||||||
|
throw e;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
public char BeginToken() throws java.io.IOException
|
||||||
|
{
|
||||||
|
tokenBegin = -1;
|
||||||
|
char c = readChar();
|
||||||
|
tokenBegin = bufpos;
|
||||||
|
|
||||||
|
return c;
|
||||||
|
}
|
||||||
|
|
||||||
|
protected void UpdateLineColumn(char c)
|
||||||
|
{
|
||||||
|
column++;
|
||||||
|
|
||||||
|
if (prevCharIsLF)
|
||||||
|
{
|
||||||
|
prevCharIsLF = false;
|
||||||
|
line += (column = 1);
|
||||||
|
}
|
||||||
|
else if (prevCharIsCR)
|
||||||
|
{
|
||||||
|
prevCharIsCR = false;
|
||||||
|
if (c == '\n')
|
||||||
|
{
|
||||||
|
prevCharIsLF = true;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
line += (column = 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
switch (c)
|
||||||
|
{
|
||||||
|
case '\r' :
|
||||||
|
prevCharIsCR = true;
|
||||||
|
break;
|
||||||
|
case '\n' :
|
||||||
|
prevCharIsLF = true;
|
||||||
|
break;
|
||||||
|
case '\t' :
|
||||||
|
column--;
|
||||||
|
column += (8 - (column & 07));
|
||||||
|
break;
|
||||||
|
default :
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
bufline[bufpos] = line;
|
||||||
|
bufcolumn[bufpos] = column;
|
||||||
|
}
|
||||||
|
|
||||||
|
public char readChar() throws java.io.IOException
|
||||||
|
{
|
||||||
|
if (inBuf > 0)
|
||||||
|
{
|
||||||
|
--inBuf;
|
||||||
|
|
||||||
|
if (++bufpos == bufsize)
|
||||||
|
bufpos = 0;
|
||||||
|
|
||||||
|
return buffer[bufpos];
|
||||||
|
}
|
||||||
|
|
||||||
|
if (++bufpos >= maxNextCharInd)
|
||||||
|
FillBuff();
|
||||||
|
|
||||||
|
char c = buffer[bufpos];
|
||||||
|
|
||||||
|
UpdateLineColumn(c);
|
||||||
|
return (c);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @deprecated
|
||||||
|
* @see #getEndColumn
|
||||||
|
*/
|
||||||
|
|
||||||
|
public int getColumn() {
|
||||||
|
return bufcolumn[bufpos];
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @deprecated
|
||||||
|
* @see #getEndLine
|
||||||
|
*/
|
||||||
|
|
||||||
|
public int getLine() {
|
||||||
|
return bufline[bufpos];
|
||||||
|
}
|
||||||
|
|
||||||
|
public int getEndColumn() {
|
||||||
|
return bufcolumn[bufpos];
|
||||||
|
}
|
||||||
|
|
||||||
|
public int getEndLine() {
|
||||||
|
return bufline[bufpos];
|
||||||
|
}
|
||||||
|
|
||||||
|
public int getBeginColumn() {
|
||||||
|
return bufcolumn[tokenBegin];
|
||||||
|
}
|
||||||
|
|
||||||
|
public int getBeginLine() {
|
||||||
|
return bufline[tokenBegin];
|
||||||
|
}
|
||||||
|
|
||||||
|
public void backup(int amount) {
|
||||||
|
|
||||||
|
inBuf += amount;
|
||||||
|
if ((bufpos -= amount) < 0)
|
||||||
|
bufpos += bufsize;
|
||||||
|
}
|
||||||
|
|
||||||
|
public SimpleCharStream(java.io.Reader dstream, int startline,
|
||||||
|
int startcolumn, int buffersize)
|
||||||
|
{
|
||||||
|
inputStream = dstream;
|
||||||
|
line = startline;
|
||||||
|
column = startcolumn - 1;
|
||||||
|
|
||||||
|
available = bufsize = buffersize;
|
||||||
|
buffer = new char[buffersize];
|
||||||
|
bufline = new int[buffersize];
|
||||||
|
bufcolumn = new int[buffersize];
|
||||||
|
}
|
||||||
|
|
||||||
|
public SimpleCharStream(java.io.Reader dstream, int startline,
|
||||||
|
int startcolumn)
|
||||||
|
{
|
||||||
|
this(dstream, startline, startcolumn, 4096);
|
||||||
|
}
|
||||||
|
|
||||||
|
public SimpleCharStream(java.io.Reader dstream)
|
||||||
|
{
|
||||||
|
this(dstream, 1, 1, 4096);
|
||||||
|
}
|
||||||
|
public void ReInit(java.io.Reader dstream, int startline,
|
||||||
|
int startcolumn, int buffersize)
|
||||||
|
{
|
||||||
|
inputStream = dstream;
|
||||||
|
line = startline;
|
||||||
|
column = startcolumn - 1;
|
||||||
|
|
||||||
|
if (buffer == null || buffersize != buffer.length)
|
||||||
|
{
|
||||||
|
available = bufsize = buffersize;
|
||||||
|
buffer = new char[buffersize];
|
||||||
|
bufline = new int[buffersize];
|
||||||
|
bufcolumn = new int[buffersize];
|
||||||
|
}
|
||||||
|
prevCharIsLF = prevCharIsCR = false;
|
||||||
|
tokenBegin = inBuf = maxNextCharInd = 0;
|
||||||
|
bufpos = -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
public void ReInit(java.io.Reader dstream, int startline,
|
||||||
|
int startcolumn)
|
||||||
|
{
|
||||||
|
ReInit(dstream, startline, startcolumn, 4096);
|
||||||
|
}
|
||||||
|
|
||||||
|
public void ReInit(java.io.Reader dstream)
|
||||||
|
{
|
||||||
|
ReInit(dstream, 1, 1, 4096);
|
||||||
|
}
|
||||||
|
public SimpleCharStream(java.io.InputStream dstream, int startline,
|
||||||
|
int startcolumn, int buffersize)
|
||||||
|
{
|
||||||
|
this(new java.io.InputStreamReader(dstream), startline, startcolumn, buffersize);
|
||||||
|
}
|
||||||
|
|
||||||
|
public SimpleCharStream(java.io.InputStream dstream, int startline,
|
||||||
|
int startcolumn)
|
||||||
|
{
|
||||||
|
this(dstream, startline, startcolumn, 4096);
|
||||||
|
}
|
||||||
|
|
||||||
|
public SimpleCharStream(java.io.InputStream dstream)
|
||||||
|
{
|
||||||
|
this(dstream, 1, 1, 4096);
|
||||||
|
}
|
||||||
|
|
||||||
|
public void ReInit(java.io.InputStream dstream, int startline,
|
||||||
|
int startcolumn, int buffersize)
|
||||||
|
{
|
||||||
|
ReInit(new java.io.InputStreamReader(dstream), startline, startcolumn, buffersize);
|
||||||
|
}
|
||||||
|
|
||||||
|
public void ReInit(java.io.InputStream dstream)
|
||||||
|
{
|
||||||
|
ReInit(dstream, 1, 1, 4096);
|
||||||
|
}
|
||||||
|
public void ReInit(java.io.InputStream dstream, int startline,
|
||||||
|
int startcolumn)
|
||||||
|
{
|
||||||
|
ReInit(dstream, startline, startcolumn, 4096);
|
||||||
|
}
|
||||||
|
public String GetImage()
|
||||||
|
{
|
||||||
|
if (bufpos >= tokenBegin)
|
||||||
|
return new String(buffer, tokenBegin, bufpos - tokenBegin + 1);
|
||||||
|
else
|
||||||
|
return new String(buffer, tokenBegin, bufsize - tokenBegin) +
|
||||||
|
new String(buffer, 0, bufpos + 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
public char[] GetSuffix(int len)
|
||||||
|
{
|
||||||
|
char[] ret = new char[len];
|
||||||
|
|
||||||
|
if ((bufpos + 1) >= len)
|
||||||
|
System.arraycopy(buffer, bufpos - len + 1, ret, 0, len);
|
||||||
|
else
|
||||||
|
{
|
||||||
|
System.arraycopy(buffer, bufsize - (len - bufpos - 1), ret, 0,
|
||||||
|
len - bufpos - 1);
|
||||||
|
System.arraycopy(buffer, 0, ret, len - bufpos - 1, bufpos + 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
public void Done()
|
||||||
|
{
|
||||||
|
buffer = null;
|
||||||
|
bufline = null;
|
||||||
|
bufcolumn = null;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Method to adjust line and column numbers for the start of a token.
|
||||||
|
*/
|
||||||
|
public void adjustBeginLineColumn(int newLine, int newCol)
|
||||||
|
{
|
||||||
|
int start = tokenBegin;
|
||||||
|
int len;
|
||||||
|
|
||||||
|
if (bufpos >= tokenBegin)
|
||||||
|
{
|
||||||
|
len = bufpos - tokenBegin + inBuf + 1;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
len = bufsize - tokenBegin + bufpos + 1 + inBuf;
|
||||||
|
}
|
||||||
|
|
||||||
|
int i = 0, j = 0, k = 0;
|
||||||
|
int nextColDiff = 0, columnDiff = 0;
|
||||||
|
|
||||||
|
while (i < len &&
|
||||||
|
bufline[j = start % bufsize] == bufline[k = ++start % bufsize])
|
||||||
|
{
|
||||||
|
bufline[j] = newLine;
|
||||||
|
nextColDiff = columnDiff + bufcolumn[k] - bufcolumn[j];
|
||||||
|
bufcolumn[j] = newCol + columnDiff;
|
||||||
|
columnDiff = nextColDiff;
|
||||||
|
i++;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (i < len)
|
||||||
|
{
|
||||||
|
bufline[j] = newLine++;
|
||||||
|
bufcolumn[j] = newCol + columnDiff;
|
||||||
|
|
||||||
|
while (i++ < len)
|
||||||
|
{
|
||||||
|
if (bufline[j = start % bufsize] != bufline[++start % bufsize])
|
||||||
|
bufline[j] = newLine++;
|
||||||
|
else
|
||||||
|
bufline[j] = newLine;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
line = bufline[j];
|
||||||
|
column = bufcolumn[j];
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
|
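The tab handling in `UpdateLineColumn` above is compact enough to misread. The following is a minimal sketch of just that arithmetic, assuming no pending line break (`prevCharIsCR` and `prevCharIsLF` both false); the class name `TabColumnSketch` and the method `columnAfterTab` are hypothetical.

```java
class TabColumnSketch {
    // Returns the column recorded for a tab character when the previous
    // character on the line was recorded at 'column' (0 = start of line).
    public static int columnAfterTab(int column) {
        column++;                       // every character first advances one column
        column--;                       // the '\t' case undoes that increment ...
        column += (8 - (column & 07));  // ... and jumps to the next multiple of 8
        return column;
    }

    public static void main(String[] args) {
        System.out.println(columnAfterTab(0)); // prints 8
        System.out.println(columnAfterTab(3)); // prints 8
        System.out.println(columnAfterTab(8)); // prints 16
    }
}
```

Under these assumptions a tab always moves the recorded column to the next multiple of 8, so columns reported in parse errors follow 8-column tab stops regardless of how an editor displays the file.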
@ -0,0 +1,81 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. Token.java Version 3.0 */
|
||||||
|
package org.apache.lucene.demo.html;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Describes the input token stream.
|
||||||
|
*/
|
||||||
|
|
||||||
|
public class Token {
|
||||||
|
|
||||||
|
/**
|
||||||
|
* An integer that describes the kind of this token. This numbering
|
||||||
|
* system is determined by JavaCCParser, and a table of these numbers is
|
||||||
|
* stored in the file ...Constants.java.
|
||||||
|
*/
|
||||||
|
public int kind;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* beginLine and beginColumn describe the position of the first character
|
||||||
|
* of this token; endLine and endColumn describe the position of the
|
||||||
|
* last character of this token.
|
||||||
|
*/
|
||||||
|
public int beginLine, beginColumn, endLine, endColumn;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The string image of the token.
|
||||||
|
*/
|
||||||
|
public String image;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A reference to the next regular (non-special) token from the input
|
||||||
|
* stream. If this is the last token from the input stream, or if the
|
||||||
|
* token manager has not read tokens beyond this one, this field is
|
||||||
|
* set to null. This is true only if this token is also a regular
|
||||||
|
* token. Otherwise, see below for a description of the contents of
|
||||||
|
* this field.
|
||||||
|
*/
|
||||||
|
public Token next;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This field is used to access special tokens that occur prior to this
|
||||||
|
* token, but after the immediately preceding regular (non-special) token.
|
||||||
|
* If there are no such special tokens, this field is set to null.
|
||||||
|
* When there is more than one such special token, this field refers
|
||||||
|
* to the last of these special tokens, which in turn refers to the next
|
||||||
|
* previous special token through its specialToken field, and so on
|
||||||
|
* until the first special token (whose specialToken field is null).
|
||||||
|
* The next fields of special tokens refer to other special tokens that
|
||||||
|
* immediately follow it (without an intervening regular token). If there
|
||||||
|
* is no such token, this field is null.
|
||||||
|
*/
|
||||||
|
public Token specialToken;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the image.
|
||||||
|
*/
|
||||||
|
public String toString()
|
||||||
|
{
|
||||||
|
return image;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns a new Token object, by default. However, if you want, you
|
||||||
|
* can create and return subclass objects based on the value of ofKind.
|
||||||
|
* Simply add the cases to the switch for all those special cases.
|
||||||
|
* For example, if you have a subclass of Token called IDToken that
|
||||||
|
* you want to create if ofKind is ID, simply add something like :
|
||||||
|
*
|
||||||
|
* case MyParserConstants.ID : return new IDToken();
|
||||||
|
*
|
||||||
|
* to the following switch statement. Then you can cast matchedToken
|
||||||
|
* variable to the appropriate type and use it in your lexical actions.
|
||||||
|
*/
|
||||||
|
public static final Token newToken(int ofKind)
|
||||||
|
{
|
||||||
|
switch(ofKind)
|
||||||
|
{
|
||||||
|
default : return new Token();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
}
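The newToken comment above describes returning subclass instances for particular token kinds. A minimal, standalone sketch of that pattern follows. SketchToken, IdSketchToken and the constant ID are hypothetical stand-ins (the comment's own IDToken and MyParserConstants.ID are likewise only examples), so this illustrates the switch and is not code from the commit.

```java
// Hypothetical stand-in for the generated Token class.
class SketchToken {
    // Assumed kind number for identifiers; a real grammar defines this
    // in its ...Constants interface.
    static final int ID = 1;

    String image;

    /** Factory in the style of Token.newToken: pick a subclass per kind. */
    static SketchToken newToken(int ofKind) {
        switch (ofKind) {
            case ID:
                return new IdSketchToken();
            default:
                return new SketchToken();
        }
    }

    public static void main(String[] args) {
        System.out.println(newToken(ID).getClass().getSimpleName());
        System.out.println(newToken(0).getClass().getSimpleName());
    }
}

// Hypothetical subclass that a lexical action could cast to.
class IdSketchToken extends SketchToken {
}
```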
|
|
@ -0,0 +1,133 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. TokenMgrError.java Version 3.0 */
|
||||||
|
package org.apache.lucene.demo.html;
|
||||||
|
|
||||||
|
public class TokenMgrError extends Error
|
||||||
|
{
|
||||||
|
/*
|
||||||
|
* Ordinals for various reasons why an Error of this type can be thrown.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Lexical error occurred.
|
||||||
|
*/
|
||||||
|
static final int LEXICAL_ERROR = 0;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* An attempt was made to create a second instance of a static token manager.
|
||||||
|
*/
|
||||||
|
static final int STATIC_LEXER_ERROR = 1;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Tried to change to an invalid lexical state.
|
||||||
|
*/
|
||||||
|
static final int INVALID_LEXICAL_STATE = 2;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Detected (and bailed out of) an infinite loop in the token manager.
|
||||||
|
*/
|
||||||
|
static final int LOOP_DETECTED = 3;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Indicates the reason why the exception is thrown. It will have
|
||||||
|
* one of the above 4 values.
|
||||||
|
*/
|
||||||
|
int errorCode;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Replaces unprintable characters by their escaped (or unicode escaped)
|
||||||
|
* equivalents in the given string
|
||||||
|
*/
|
||||||
|
protected static final String addEscapes(String str) {
|
||||||
|
StringBuffer retval = new StringBuffer();
|
||||||
|
char ch;
|
||||||
|
for (int i = 0; i < str.length(); i++) {
|
||||||
|
switch (str.charAt(i))
|
||||||
|
{
|
||||||
|
case 0 :
|
||||||
|
continue;
|
||||||
|
case '\b':
|
||||||
|
retval.append("\\b");
|
||||||
|
continue;
|
||||||
|
case '\t':
|
||||||
|
retval.append("\\t");
|
||||||
|
continue;
|
||||||
|
case '\n':
|
||||||
|
retval.append("\\n");
|
||||||
|
continue;
|
||||||
|
case '\f':
|
||||||
|
retval.append("\\f");
|
||||||
|
continue;
|
||||||
|
case '\r':
|
||||||
|
retval.append("\\r");
|
||||||
|
continue;
|
||||||
|
case '\"':
|
||||||
|
retval.append("\\\"");
|
||||||
|
continue;
|
||||||
|
case '\'':
|
||||||
|
retval.append("\\\'");
|
||||||
|
continue;
|
||||||
|
case '\\':
|
||||||
|
retval.append("\\\\");
|
||||||
|
continue;
|
||||||
|
default:
|
||||||
|
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
|
||||||
|
String s = "0000" + Integer.toString(ch, 16);
|
||||||
|
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
|
||||||
|
} else {
|
||||||
|
retval.append(ch);
|
||||||
|
}
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return retval.toString();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns a detailed message for the Error when it is thrown by the
|
||||||
|
* token manager to indicate a lexical error.
|
||||||
|
* Parameters :
|
||||||
|
* EOFSeen : indicates if EOF caused the lexical error
|
||||||
|
* lexState : lexical state in which this error occurred
|
||||||
|
* errorLine : line number when the error occurred
|
||||||
|
* errorColumn : column number when the error occurred
|
||||||
|
* errorAfter : prefix that was seen before this error occurred
|
||||||
|
* curChar : the offending character
|
||||||
|
* Note: You can customize the lexical error message by modifying this method.
|
||||||
|
*/
|
||||||
|
protected static String LexicalError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar) {
|
||||||
|
return("Lexical error at line " +
|
||||||
|
errorLine + ", column " +
|
||||||
|
errorColumn + ". Encountered: " +
|
||||||
|
(EOFSeen ? "<EOF> " : ("\"" + addEscapes(String.valueOf(curChar)) + "\"") + " (" + (int)curChar + "), ") +
|
||||||
|
"after : \"" + addEscapes(errorAfter) + "\"");
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* You can also modify the body of this method to customize your error messages.
|
||||||
|
* For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not
|
||||||
|
* of concern to end users, so you can return something like :
|
||||||
|
*
|
||||||
|
* "Internal Error : Please file a bug report .... "
|
||||||
|
*
|
||||||
|
* from this method for such cases in the release version of your parser.
|
||||||
|
*/
|
||||||
|
public String getMessage() {
|
||||||
|
return super.getMessage();
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Constructors of various flavors follow.
|
||||||
|
*/
|
||||||
|
|
||||||
|
public TokenMgrError() {
|
||||||
|
}
|
||||||
|
|
||||||
|
public TokenMgrError(String message, int reason) {
|
||||||
|
super(message);
|
||||||
|
errorCode = reason;
|
||||||
|
}
|
||||||
|
|
||||||
|
public TokenMgrError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar, int reason) {
|
||||||
|
this(LexicalError(EOFSeen, lexState, errorLine, errorColumn, errorAfter, curChar), reason);
|
||||||
|
}
|
||||||
|
}
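The addEscapes method in the file above turns control and non-ASCII characters into printable escape sequences before they are placed in an error message. The sketch below restates that logic in a standalone class so it can be run on its own. EscapeSketch is a hypothetical name, and StringBuilder is swapped in for the StringBuffer used in the generated code, so it needs a newer JDK than the one this commit targets.

```java
// Hypothetical standalone restatement of TokenMgrError.addEscapes.
final class EscapeSketch {

    static String addEscapes(String str) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            switch (ch) {
                case 0:
                    break; // NUL characters are dropped
                case '\b': out.append("\\b"); break;
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\f': out.append("\\f"); break;
                case '\r': out.append("\\r"); break;
                case '\"': out.append("\\\""); break;
                case '\'': out.append("\\\'"); break;
                case '\\': out.append("\\\\"); break;
                default:
                    if (ch < 0x20 || ch > 0x7e) {
                        // Outside printable ASCII: emit a 4-digit hex escape.
                        String s = "0000" + Integer.toString(ch, 16);
                        out.append("\\u").append(s.substring(s.length() - 4));
                    } else {
                        out.append(ch);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addEscapes("a\tb"));
        System.out.println(addEscapes("caf\u00e9"));
    }
}
```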
|
|
@ -1,6 +0,0 @@
|
||||||
Token.java
|
|
||||||
StandardTokenizer.java
|
|
||||||
StandardTokenizerTokenManager.java
|
|
||||||
TokenMgrError.java
|
|
||||||
CharStream.java
|
|
||||||
StandardTokenizerConstants.java
|
|
|
@ -0,0 +1,110 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. CharStream.java Version 3.0 */
|
||||||
|
package org.apache.lucene.analysis.standard;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This interface describes a character stream that maintains line and
|
||||||
|
* column number positions of the characters. It also has the capability
|
||||||
|
* to backup the stream to some extent. An implementation of this
|
||||||
|
* interface is used in the TokenManager implementation generated by
|
||||||
|
* JavaCCParser.
|
||||||
|
*
|
||||||
|
* All the methods except backup can be implemented in any fashion. backup
|
||||||
|
* needs to be implemented correctly for the correct operation of the lexer.
|
||||||
|
* The rest of the methods are all used to get information like line number,
|
||||||
|
* column number and the String that constitutes a token and are not used
|
||||||
|
* by the lexer. Hence their implementation won't affect the generated lexer's
|
||||||
|
* operation.
|
||||||
|
*/
|
||||||
|
|
||||||
|
public interface CharStream {
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the next character from the selected input. The method
|
||||||
|
* of selecting the input is the responsibility of the class
|
||||||
|
* implementing this interface. Can throw any java.io.IOException.
|
||||||
|
*/
|
||||||
|
char readChar() throws java.io.IOException;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the column position of the character last read.
|
||||||
|
* @deprecated
|
||||||
|
* @see #getEndColumn
|
||||||
|
*/
|
||||||
|
int getColumn();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the line number of the character last read.
|
||||||
|
* @deprecated
|
||||||
|
* @see #getEndLine
|
||||||
|
*/
|
||||||
|
int getLine();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the column number of the last character for current token (being
|
||||||
|
* matched after the last call to BeginToken).
|
||||||
|
*/
|
||||||
|
int getEndColumn();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the line number of the last character for current token (being
|
||||||
|
* matched after the last call to BeginToken).
|
||||||
|
*/
|
||||||
|
int getEndLine();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the column number of the first character for current token (being
|
||||||
|
* matched after the last call to BeginToken).
|
||||||
|
*/
|
||||||
|
int getBeginColumn();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the line number of the first character for current token (being
|
||||||
|
* matched after the last call to BeginToken).
|
||||||
|
*/
|
||||||
|
int getBeginLine();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Backs up the input stream by amount steps. Lexer calls this method if it
|
||||||
|
* had already read some characters, but could not use them to match a
|
||||||
|
* (longer) token. So, they will be used again as the prefix of the next
|
||||||
|
* token and it is the implementation's responsibility to do this right.
|
||||||
|
*/
|
||||||
|
void backup(int amount);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the next character that marks the beginning of the next token.
|
||||||
|
* All characters must remain in the buffer between two successive calls
|
||||||
|
* to this method to implement backup correctly.
|
||||||
|
*/
|
||||||
|
char BeginToken() throws java.io.IOException;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns a string made up of characters from the marked token beginning
|
||||||
|
* to the current buffer position. Implementations have the choice of returning
|
||||||
|
* anything that they want to. For example, for efficiency, one might decide
|
||||||
|
* to just return null, which is a valid implementation.
|
||||||
|
*/
|
||||||
|
String GetImage();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns an array of characters that make up the suffix of length 'len' for
|
||||||
|
* the currently matched token. This is used to build up the matched string
|
||||||
|
* for use in actions in the case of MORE. A simple and inefficient
|
||||||
|
* implementation of this is as follows :
|
||||||
|
*
|
||||||
|
* {
|
||||||
|
* String t = GetImage();
|
||||||
|
* return t.substring(t.length() - len, t.length()).toCharArray();
|
||||||
|
* }
|
||||||
|
*/
|
||||||
|
char[] GetSuffix(int len);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The lexer calls this function to indicate that it is done with the stream
|
||||||
|
* and hence implementations can free any resources held by this class.
|
||||||
|
* Again, the body of this function can be just empty and it will not
|
||||||
|
* affect the lexer's operation.
|
||||||
|
*/
|
||||||
|
void Done();
|
||||||
|
|
||||||
|
}
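The CharStream comments above say that backup must be implemented correctly, and they outline a simple GetSuffix built on GetImage. A minimal sketch of a String-backed stream along those lines follows. StringStreamSketch is a hypothetical class that mirrors a few of the interface's methods (with lower-case names) and does not implement the generated interface; line and column tracking is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical String-backed stream mirroring part of CharStream.
final class StringStreamSketch {
    private final String input;
    private int pos = 0;        // index of the next character to read
    private int tokenStart = 0; // index where the current token began

    StringStreamSketch(String input) {
        this.input = input;
    }

    /** Marks the start of a new token and returns its first character. */
    char beginToken() throws IOException {
        tokenStart = pos;
        return readChar();
    }

    char readChar() throws IOException {
        if (pos >= input.length()) {
            throw new IOException("read past end of input");
        }
        return input.charAt(pos++);
    }

    /** Un-reads the last 'amount' characters so the next token can reuse them. */
    void backup(int amount) {
        pos -= amount;
    }

    String getImage() {
        return input.substring(tokenStart, pos);
    }

    /** The simple, inefficient suffix implementation described in the comment. */
    char[] getSuffix(int len) {
        String t = getImage();
        return t.substring(t.length() - len, t.length()).toCharArray();
    }

    /** Reads "abc " from "abc def", backs up over the space, reports image and suffix. */
    static String demo() {
        try {
            StringStreamSketch s = new StringStreamSketch("abc def");
            s.beginToken();
            s.readChar();
            s.readChar();
            s.readChar(); // the space, which is not part of the token
            s.backup(1);
            return s.getImage() + "|" + new String(s.getSuffix(2));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the whole input stays in memory, backup here is just a pointer decrement; a buffered implementation has to keep every character since the last BeginToken call available, as the interface comment requires.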
|
|
@ -0,0 +1,195 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. StandardTokenizer.java */
|
||||||
|
package org.apache.lucene.analysis.standard;
|
||||||
|
|
||||||
|
import java.io.*;
|
||||||
|
|
||||||
|
/** A grammar-based tokenizer constructed with JavaCC.
|
||||||
|
*
|
||||||
|
* <p> This should be a good tokenizer for most European-language documents.
|
||||||
|
*
|
||||||
|
* <p>Many applications have specific tokenizer needs. If this tokenizer does
|
||||||
|
* not suit your application, please consider copying this source code
|
||||||
|
* directory to your project and maintaining your own grammar-based tokenizer.
|
||||||
|
*/
|
||||||
|
public class StandardTokenizer extends org.apache.lucene.analysis.Tokenizer implements StandardTokenizerConstants {
|
||||||
|
|
||||||
|
/** Constructs a tokenizer for this Reader. */
|
||||||
|
public StandardTokenizer(Reader reader) {
|
||||||
|
this(new FastCharStream(reader));
|
||||||
|
this.input = reader;
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Returns the next token in the stream, or null at EOS.
|
||||||
|
* <p>The returned token's type is set to an element of {@link
|
||||||
|
* StandardTokenizerConstants#tokenImage}.
|
||||||
|
*/
|
||||||
|
final public org.apache.lucene.analysis.Token next() throws ParseException, IOException {
|
||||||
|
Token token = null;
|
||||||
|
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||||
|
case ALPHANUM:
|
||||||
|
token = jj_consume_token(ALPHANUM);
|
||||||
|
break;
|
||||||
|
case APOSTROPHE:
|
||||||
|
token = jj_consume_token(APOSTROPHE);
|
||||||
|
break;
|
||||||
|
case ACRONYM:
|
||||||
|
token = jj_consume_token(ACRONYM);
|
||||||
|
break;
|
||||||
|
case COMPANY:
|
||||||
|
token = jj_consume_token(COMPANY);
|
||||||
|
break;
|
||||||
|
case EMAIL:
|
||||||
|
token = jj_consume_token(EMAIL);
|
||||||
|
break;
|
||||||
|
case HOST:
|
||||||
|
token = jj_consume_token(HOST);
|
||||||
|
break;
|
||||||
|
case NUM:
|
||||||
|
token = jj_consume_token(NUM);
|
||||||
|
break;
|
||||||
|
case 0:
|
||||||
|
token = jj_consume_token(0);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
jj_la1[0] = jj_gen;
|
||||||
|
jj_consume_token(-1);
|
||||||
|
throw new ParseException();
|
||||||
|
}
|
||||||
|
if (token.kind == EOF) {
|
||||||
|
{if (true) return null;}
|
||||||
|
} else {
|
||||||
|
{if (true) return
|
||||||
|
new org.apache.lucene.analysis.Token(token.image,
|
||||||
|
token.beginColumn,token.endColumn,
|
||||||
|
tokenImage[token.kind]);}
|
||||||
|
}
|
||||||
|
throw new Error("Missing return statement in function");
|
||||||
|
}
|
||||||
|
|
||||||
|
public StandardTokenizerTokenManager token_source;
|
||||||
|
public Token token, jj_nt;
|
||||||
|
private int jj_ntk;
|
||||||
|
private int jj_gen;
|
||||||
|
final private int[] jj_la1 = new int[1];
|
||||||
|
static private int[] jj_la1_0;
|
||||||
|
static {
|
||||||
|
jj_la1_0();
|
||||||
|
}
|
||||||
|
private static void jj_la1_0() {
|
||||||
|
jj_la1_0 = new int[] {0xff,};
|
||||||
|
}
|
||||||
|
|
||||||
|
public StandardTokenizer(CharStream stream) {
|
||||||
|
token_source = new StandardTokenizerTokenManager(stream);
|
||||||
|
token = new Token();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen = 0;
|
||||||
|
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
public void ReInit(CharStream stream) {
|
||||||
|
token_source.ReInit(stream);
|
||||||
|
token = new Token();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen = 0;
|
||||||
|
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
public StandardTokenizer(StandardTokenizerTokenManager tm) {
|
||||||
|
token_source = tm;
|
||||||
|
token = new Token();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen = 0;
|
||||||
|
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
public void ReInit(StandardTokenizerTokenManager tm) {
|
||||||
|
token_source = tm;
|
||||||
|
token = new Token();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen = 0;
|
||||||
|
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
final private Token jj_consume_token(int kind) throws ParseException {
|
||||||
|
Token oldToken;
|
||||||
|
if ((oldToken = token).next != null) token = token.next;
|
||||||
|
else token = token.next = token_source.getNextToken();
|
||||||
|
jj_ntk = -1;
|
||||||
|
if (token.kind == kind) {
|
||||||
|
jj_gen++;
|
||||||
|
return token;
|
||||||
|
}
|
||||||
|
token = oldToken;
|
||||||
|
jj_kind = kind;
|
||||||
|
throw generateParseException();
|
||||||
|
}
|
||||||
|
|
||||||
|
final public Token getNextToken() {
|
||||||
|
if (token.next != null) token = token.next;
|
||||||
|
else token = token.next = token_source.getNextToken();
|
||||||
|
jj_ntk = -1;
|
||||||
|
jj_gen++;
|
||||||
|
return token;
|
||||||
|
}
|
||||||
|
|
||||||
|
final public Token getToken(int index) {
|
||||||
|
Token t = token;
|
||||||
|
for (int i = 0; i < index; i++) {
|
||||||
|
if (t.next != null) t = t.next;
|
||||||
|
else t = t.next = token_source.getNextToken();
|
||||||
|
}
|
||||||
|
return t;
|
||||||
|
}
|
||||||
|
|
||||||
|
final private int jj_ntk() {
|
||||||
|
if ((jj_nt=token.next) == null)
|
||||||
|
return (jj_ntk = (token.next=token_source.getNextToken()).kind);
|
||||||
|
else
|
||||||
|
return (jj_ntk = jj_nt.kind);
|
||||||
|
}
|
||||||
|
|
||||||
|
private java.util.Vector jj_expentries = new java.util.Vector();
|
||||||
|
private int[] jj_expentry;
|
||||||
|
private int jj_kind = -1;
|
||||||
|
|
||||||
|
public ParseException generateParseException() {
|
||||||
|
jj_expentries.removeAllElements();
|
||||||
|
boolean[] la1tokens = new boolean[14];
|
||||||
|
for (int i = 0; i < 14; i++) {
|
||||||
|
la1tokens[i] = false;
|
||||||
|
}
|
||||||
|
if (jj_kind >= 0) {
|
||||||
|
la1tokens[jj_kind] = true;
|
||||||
|
jj_kind = -1;
|
||||||
|
}
|
||||||
|
for (int i = 0; i < 1; i++) {
|
||||||
|
if (jj_la1[i] == jj_gen) {
|
||||||
|
for (int j = 0; j < 32; j++) {
|
||||||
|
if ((jj_la1_0[i] & (1<<j)) != 0) {
|
||||||
|
la1tokens[j] = true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for (int i = 0; i < 14; i++) {
|
||||||
|
if (la1tokens[i]) {
|
||||||
|
jj_expentry = new int[1];
|
||||||
|
jj_expentry[0] = i;
|
||||||
|
jj_expentries.addElement(jj_expentry);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
int[][] exptokseq = new int[jj_expentries.size()][];
|
||||||
|
for (int i = 0; i < jj_expentries.size(); i++) {
|
||||||
|
exptokseq[i] = (int[])jj_expentries.elementAt(i);
|
||||||
|
}
|
||||||
|
return new ParseException(token, exptokseq, tokenImage);
|
||||||
|
}
|
||||||
|
|
||||||
|
final public void enable_tracing() {
|
||||||
|
}
|
||||||
|
|
||||||
|
final public void disable_tracing() {
|
||||||
|
}
|
||||||
|
|
||||||
|
}
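generateParseException above treats jj_la1_0[0] (0xff in this file) as a bitmask in which bit j set means token kind j was expected at the choice point. The sketch below decodes such a mask into a list of kinds. ExpectedKindsSketch and expectedKinds are hypothetical names, and unlike the generated loop, which scans all 32 bits, this version also bounds the scan by the number of token kinds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing how a lookahead bitmask maps to token kinds.
final class ExpectedKindsSketch {

    /** Returns every kind j below kindCount whose bit is set in mask. */
    static List<Integer> expectedKinds(int mask, int kindCount) {
        List<Integer> kinds = new ArrayList<>();
        for (int j = 0; j < 32 && j < kindCount; j++) {
            if ((mask & (1 << j)) != 0) {
                kinds.add(j);
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        // 0xff with 14 kinds: kinds 0 through 7, i.e. EOF through NUM
        // in StandardTokenizerConstants.
        System.out.println(expectedKinds(0xff, 14));
    }
}
```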
|
|
@ -0,0 +1,40 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. StandardTokenizerConstants.java */
|
||||||
|
package org.apache.lucene.analysis.standard;
|
||||||
|
|
||||||
|
public interface StandardTokenizerConstants {
|
||||||
|
|
||||||
|
int EOF = 0;
|
||||||
|
int ALPHANUM = 1;
|
||||||
|
int APOSTROPHE = 2;
|
||||||
|
int ACRONYM = 3;
|
||||||
|
int COMPANY = 4;
|
||||||
|
int EMAIL = 5;
|
||||||
|
int HOST = 6;
|
||||||
|
int NUM = 7;
|
||||||
|
int P = 8;
|
||||||
|
int HAS_DIGIT = 9;
|
||||||
|
int ALPHA = 10;
|
||||||
|
int LETTER = 11;
|
||||||
|
int DIGIT = 12;
|
||||||
|
int NOISE = 13;
|
||||||
|
|
||||||
|
int DEFAULT = 0;
|
||||||
|
|
||||||
|
String[] tokenImage = {
|
||||||
|
"<EOF>",
|
||||||
|
"<ALPHANUM>",
|
||||||
|
"<APOSTROPHE>",
|
||||||
|
"<ACRONYM>",
|
||||||
|
"<COMPANY>",
|
||||||
|
"<EMAIL>",
|
||||||
|
"<HOST>",
|
||||||
|
"<NUM>",
|
||||||
|
"<P>",
|
||||||
|
"<HAS_DIGIT>",
|
||||||
|
"<ALPHA>",
|
||||||
|
"<LETTER>",
|
||||||
|
"<DIGIT>",
|
||||||
|
"<NOISE>",
|
||||||
|
};
|
||||||
|
|
||||||
|
}
|
File diff suppressed because it is too large
|
@ -0,0 +1,81 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. Token.java Version 3.0 */
|
||||||
|
package org.apache.lucene.analysis.standard;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Describes the input token stream.
|
||||||
|
*/
|
||||||
|
|
||||||
|
public class Token {
|
||||||
|
|
||||||
|
/**
|
||||||
|
* An integer that describes the kind of this token. This numbering
|
||||||
|
* system is determined by JavaCCParser, and a table of these numbers is
|
||||||
|
* stored in the file ...Constants.java.
|
||||||
|
*/
|
||||||
|
public int kind;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* beginLine and beginColumn describe the position of the first character
|
||||||
|
* of this token; endLine and endColumn describe the position of the
|
||||||
|
* last character of this token.
|
||||||
|
*/
|
||||||
|
public int beginLine, beginColumn, endLine, endColumn;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The string image of the token.
|
||||||
|
*/
|
||||||
|
public String image;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A reference to the next regular (non-special) token from the input
|
||||||
|
* stream. If this is the last token from the input stream, or if the
|
||||||
|
* token manager has not read tokens beyond this one, this field is
|
||||||
|
* set to null. This is true only if this token is also a regular
|
||||||
|
* token. Otherwise, see below for a description of the contents of
|
||||||
|
* this field.
|
||||||
|
*/
|
||||||
|
public Token next;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This field is used to access special tokens that occur prior to this
|
||||||
|
* token, but after the immediately preceding regular (non-special) token.
|
||||||
|
* If there are no such special tokens, this field is set to null.
|
||||||
|
* When there is more than one such special token, this field refers
|
||||||
|
* to the last of these special tokens, which in turn refers to the next
|
||||||
|
* previous special token through its specialToken field, and so on
|
||||||
|
* until the first special token (whose specialToken field is null).
|
||||||
|
* The next fields of special tokens refer to other special tokens that
|
||||||
|
* immediately follow it (without an intervening regular token). If there
|
||||||
|
* is no such token, this field is null.
|
||||||
|
*/
|
||||||
|
public Token specialToken;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the image.
|
||||||
|
*/
|
||||||
|
public String toString()
|
||||||
|
{
|
||||||
|
return image;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns a new Token object, by default. However, if you want, you
|
||||||
|
* can create and return subclass objects based on the value of ofKind.
|
||||||
|
* Simply add the cases to the switch for all those special cases.
|
||||||
|
* For example, if you have a subclass of Token called IDToken that
|
||||||
|
* you want to create if ofKind is ID, simply add something like :
|
||||||
|
*
|
||||||
|
* case MyParserConstants.ID : return new IDToken();
|
||||||
|
*
|
||||||
|
* to the following switch statement. Then you can cast matchedToken
|
||||||
|
* variable to the appropriate type and use it in your lexical actions.
|
||||||
|
*/
|
||||||
|
public static final Token newToken(int ofKind)
|
||||||
|
{
|
||||||
|
switch(ofKind)
|
||||||
|
{
|
||||||
|
default : return new Token();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
|
@ -0,0 +1,133 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. TokenMgrError.java Version 3.0 */
|
||||||
|
package org.apache.lucene.analysis.standard;
|
||||||
|
|
||||||
|
public class TokenMgrError extends Error
|
||||||
|
{
|
||||||
|
/*
|
||||||
|
* Ordinals for various reasons why an Error of this type can be thrown.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Lexical error occurred.
|
||||||
|
*/
|
||||||
|
static final int LEXICAL_ERROR = 0;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* An attempt was made to create a second instance of a static token manager.
|
||||||
|
*/
|
||||||
|
static final int STATIC_LEXER_ERROR = 1;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Tried to change to an invalid lexical state.
|
||||||
|
*/
|
||||||
|
static final int INVALID_LEXICAL_STATE = 2;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Detected (and bailed out of) an infinite loop in the token manager.
|
||||||
|
*/
|
||||||
|
static final int LOOP_DETECTED = 3;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Indicates the reason why the exception is thrown. It will have
|
||||||
|
* one of the above 4 values.
|
||||||
|
*/
|
||||||
|
int errorCode;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Replaces unprintable characters by their escaped (or unicode escaped)
|
||||||
|
* equivalents in the given string
|
||||||
|
*/
|
||||||
|
protected static final String addEscapes(String str) {
|
||||||
|
StringBuffer retval = new StringBuffer();
|
||||||
|
char ch;
|
||||||
|
for (int i = 0; i < str.length(); i++) {
|
||||||
|
switch (str.charAt(i))
|
||||||
|
{
|
||||||
|
case 0 :
|
||||||
|
continue;
|
||||||
|
case '\b':
|
||||||
|
retval.append("\\b");
|
||||||
|
continue;
|
||||||
|
case '\t':
|
||||||
|
retval.append("\\t");
|
||||||
|
continue;
|
||||||
|
case '\n':
|
||||||
|
retval.append("\\n");
|
||||||
|
continue;
|
||||||
|
case '\f':
|
||||||
|
retval.append("\\f");
|
||||||
|
continue;
|
||||||
|
case '\r':
|
||||||
|
retval.append("\\r");
|
||||||
|
continue;
|
||||||
|
case '\"':
|
||||||
|
retval.append("\\\"");
|
||||||
|
continue;
|
||||||
|
case '\'':
|
||||||
|
retval.append("\\\'");
|
||||||
|
continue;
|
||||||
|
case '\\':
|
||||||
|
retval.append("\\\\");
|
||||||
|
continue;
|
||||||
|
default:
|
||||||
|
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
|
||||||
|
String s = "0000" + Integer.toString(ch, 16);
|
||||||
|
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
|
||||||
|
} else {
|
||||||
|
retval.append(ch);
|
||||||
|
}
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return retval.toString();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns a detailed message for the Error when it is thrown by the
|
||||||
|
* token manager to indicate a lexical error.
|
||||||
|
* Parameters :
|
||||||
|
* EOFSeen : indicates if EOF caused the lexical error
|
||||||
|
* lexState : lexical state in which this error occurred
|
||||||
|
* errorLine : line number when the error occurred
|
||||||
|
* errorColumn : column number when the error occurred
|
||||||
|
* errorAfter : prefix that was seen before this error occurred
|
||||||
|
* curChar : the offending character
|
||||||
|
* Note: You can customize the lexical error message by modifying this method.
|
||||||
|
*/
|
||||||
|
protected static String LexicalError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar) {
|
||||||
|
return("Lexical error at line " +
|
||||||
|
errorLine + ", column " +
|
||||||
|
errorColumn + ". Encountered: " +
|
||||||
|
(EOFSeen ? "<EOF> " : ("\"" + addEscapes(String.valueOf(curChar)) + "\"") + " (" + (int)curChar + "), ") +
|
||||||
|
"after : \"" + addEscapes(errorAfter) + "\"");
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* You can also modify the body of this method to customize your error messages.
|
||||||
|
* For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not
|
||||||
|
* of concern to end users, so you can return something like :
|
||||||
|
*
|
||||||
|
* "Internal Error : Please file a bug report .... "
|
||||||
|
*
|
||||||
|
* from this method for such cases in the release version of your parser.
|
||||||
|
*/
|
||||||
|
public String getMessage() {
|
||||||
|
return super.getMessage();
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Constructors of various flavors follow.
|
||||||
|
*/
|
||||||
|
|
||||||
|
public TokenMgrError() {
|
||||||
|
}
|
||||||
|
|
||||||
|
public TokenMgrError(String message, int reason) {
|
||||||
|
super(message);
|
||||||
|
errorCode = reason;
|
||||||
|
}
|
||||||
|
|
||||||
|
public TokenMgrError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar, int reason) {
|
||||||
|
this(LexicalError(EOFSeen, lexState, errorLine, errorColumn, errorAfter, curChar), reason);
|
||||||
|
}
|
||||||
|
}
|
|
@ -1,6 +0,0 @@
|
||||||
QueryParser.java
|
|
||||||
TokenMgrError.java
|
|
||||||
ParseException.java
|
|
||||||
Token.java
|
|
||||||
TokenManager.java
|
|
||||||
QueryParserConstants.java
|
|
|
@ -0,0 +1,110 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. CharStream.java Version 3.0 */
|
||||||
|
package org.apache.lucene.queryParser;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This interface describes a character stream that maintains line and
|
||||||
|
* column number positions of the characters. It also has the capability
|
||||||
|
* to backup the stream to some extent. An implementation of this
|
||||||
|
* interface is used in the TokenManager implementation generated by
|
||||||
|
* JavaCCParser.
|
||||||
|
*
|
||||||
|
* All the methods except backup can be implemented in any fashion. backup
|
||||||
|
* needs to be implemented correctly for the correct operation of the lexer.
|
||||||
|
* The rest of the methods are all used to get information like line number,
|
||||||
|
* column number and the String that constitutes a token and are not used
|
||||||
|
* by the lexer. Hence their implementation won't affect the generated lexer's
|
||||||
|
* operation.
|
||||||
|
*/
|
||||||
|
|
||||||
|
public interface CharStream {
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the next character from the selected input. The method
|
||||||
|
* of selecting the input is the responsibility of the class
|
||||||
|
* implementing this interface. Can throw any java.io.IOException.
|
||||||
|
*/
|
||||||
|
char readChar() throws java.io.IOException;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the column position of the character last read.
|
||||||
|
* @deprecated
|
||||||
|
* @see #getEndColumn
|
||||||
|
*/
|
||||||
|
int getColumn();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the line number of the character last read.
|
||||||
|
* @deprecated
|
||||||
|
* @see #getEndLine
|
||||||
|
*/
|
||||||
|
int getLine();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the column number of the last character for current token (being
|
||||||
|
* matched after the last call to BeginToken).
|
||||||
|
*/
|
||||||
|
int getEndColumn();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the line number of the last character for current token (being
|
||||||
|
* matched after the last call to BeginToken).
|
||||||
|
*/
|
||||||
|
int getEndLine();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the column number of the first character for current token (being
|
||||||
|
* matched after the last call to BeginToken).
|
||||||
|
*/
|
||||||
|
int getBeginColumn();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the line number of the first character for current token (being
|
||||||
|
* matched after the last call to BeginToken).
|
||||||
|
*/
|
||||||
|
int getBeginLine();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Backs up the input stream by amount steps. Lexer calls this method if it
|
||||||
|
* had already read some characters, but could not use them to match a
|
||||||
|
* (longer) token. So, they will be used again as the prefix of the next
|
||||||
|
* token and it is the implementation's responsibility to do this right.
|
||||||
|
*/
|
||||||
|
void backup(int amount);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the next character that marks the beginning of the next token.
|
||||||
|
* All characters must remain in the buffer between two successive calls
|
||||||
|
* to this method to implement backup correctly.
|
||||||
|
*/
|
||||||
|
char BeginToken() throws java.io.IOException;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns a string made up of characters from the marked token beginning
|
||||||
|
* to the current buffer position. Implementations have the choice of returning
|
||||||
|
* anything that they want to. For example, for efficiency, one might decide
|
||||||
|
* to just return null, which is a valid implementation.
|
||||||
|
*/
|
||||||
|
String GetImage();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns an array of characters that make up the suffix of length 'len' for
|
||||||
|
* the currently matched token. This is used to build up the matched string
|
||||||
|
* for use in actions in the case of MORE. A simple and inefficient
|
||||||
|
* implementation of this is as follows :
|
||||||
|
*
|
||||||
|
* {
|
||||||
|
* String t = GetImage();
|
||||||
|
* return t.substring(t.length() - len, t.length()).toCharArray();
|
||||||
|
* }
|
||||||
|
*/
|
||||||
|
char[] GetSuffix(int len);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The lexer calls this function to indicate that it is done with the stream
|
||||||
|
* and hence implementations can free any resources held by this class.
|
||||||
|
* Again, the body of this function can be just empty and it will not
|
||||||
|
* affect the lexer's operation.
|
||||||
|
*/
|
||||||
|
void Done();
|
||||||
|
|
||||||
|
}
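
The CharStream contract above (BeginToken marks a token start, readChar advances, backup un-reads, GetImage returns the marked span) can be illustrated with a minimal sketch. This is not Lucene code: the class name StringCharStream is hypothetical, it does not declare `implements CharStream` so that it compiles on its own, and it assumes the whole input is a single line held in memory.

```java
// Hypothetical sketch mirroring the CharStream methods over an in-memory String.
// Assumption: single-line input, so line numbers are always 1.
class StringCharStream {
    private final String input;
    private int pos = 0;         // index of the next character to read
    private int tokenStart = 0;  // index where the current token begins

    StringCharStream(String input) {
        this.input = input;
    }

    // Returns the next character, or throws IOException at end of input.
    char readChar() throws java.io.IOException {
        if (pos >= input.length()) {
            throw new java.io.IOException("end of input");
        }
        return input.charAt(pos++);
    }

    // Marks the start of a new token and returns its first character.
    char BeginToken() throws java.io.IOException {
        tokenStart = pos;
        return readChar();
    }

    // Un-reads 'amount' characters so the next token sees them again.
    void backup(int amount) {
        pos -= amount;
    }

    // Characters from the marked token start up to the current position.
    String GetImage() {
        return input.substring(tokenStart, pos);
    }

    // The last 'len' characters read so far for the current token.
    char[] GetSuffix(int len) {
        return input.substring(pos - len, pos).toCharArray();
    }

    // Columns are 1-based in this sketch.
    int getBeginColumn() { return tokenStart + 1; }
    int getEndColumn()   { return pos; }
    int getBeginLine()   { return 1; }
    int getEndLine()     { return 1; }

    // Nothing to release for an in-memory string.
    void Done() { }
}
```

Because the string stays in memory, backup only has to move an index. A stream backed by a Reader would instead need to buffer every character read since the last BeginToken call, which is the requirement the BeginToken comment describes.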
|
|
@ -0,0 +1,192 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. ParseException.java Version 3.0 */
|
||||||
|
package org.apache.lucene.queryParser;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This exception is thrown when parse errors are encountered.
|
||||||
|
* You can explicitly create objects of this exception type by
|
||||||
|
* calling the method generateParseException in the generated
|
||||||
|
* parser.
|
||||||
|
*
|
||||||
|
* You can modify this class to customize your error reporting
|
||||||
|
* mechanisms so long as you retain the public fields.
|
||||||
|
*/
|
||||||
|
public class ParseException extends Exception {
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This constructor is used by the method "generateParseException"
|
||||||
|
* in the generated parser. Calling this constructor generates
|
||||||
|
* a new object of this type with the fields "currentToken",
|
||||||
|
* "expectedTokenSequences", and "tokenImage" set. The boolean
|
||||||
|
* flag "specialConstructor" is also set to true to indicate that
|
||||||
|
* this constructor was used to create this object.
|
||||||
|
* This constructor calls its super class with the empty string
|
||||||
|
* to force the "toString" method of parent class "Throwable" to
|
||||||
|
* print the error message in the form:
|
||||||
|
* ParseException: <result of getMessage>
|
||||||
|
*/
|
||||||
|
public ParseException(Token currentTokenVal,
|
||||||
|
int[][] expectedTokenSequencesVal,
|
||||||
|
String[] tokenImageVal
|
||||||
|
)
|
||||||
|
{
|
||||||
|
super("");
|
||||||
|
specialConstructor = true;
|
||||||
|
currentToken = currentTokenVal;
|
||||||
|
expectedTokenSequences = expectedTokenSequencesVal;
|
||||||
|
tokenImage = tokenImageVal;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The following constructors are for use by you for whatever
|
||||||
|
* purpose you can think of. Constructing the exception in this
|
||||||
|
* manner makes the exception behave in the normal way - i.e., as
|
||||||
|
* documented in the class "Throwable". The fields "currentToken",
|
||||||
|
* "expectedTokenSequences", and "tokenImage" do not contain
|
||||||
|
* relevant information. The JavaCC generated code does not use
|
||||||
|
* these constructors.
|
||||||
|
*/
|
||||||
|
|
||||||
|
public ParseException() {
|
||||||
|
super();
|
||||||
|
specialConstructor = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
public ParseException(String message) {
|
||||||
|
super(message);
|
||||||
|
specialConstructor = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This variable determines which constructor was used to create
|
||||||
|
* this object and thereby affects the semantics of the
|
||||||
|
* "getMessage" method (see below).
|
||||||
|
*/
|
||||||
|
protected boolean specialConstructor;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This is the last token that has been consumed successfully. If
|
||||||
|
* this object has been created due to a parse error, the token
|
||||||
|
* following this token will (therefore) be the first error token.
|
||||||
|
*/
|
||||||
|
public Token currentToken;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Each entry in this array is an array of integers. Each array
|
||||||
|
* of integers represents a sequence of tokens (by their ordinal
|
||||||
|
* values) that is expected at this point of the parse.
|
||||||
|
*/
|
||||||
|
public int[][] expectedTokenSequences;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This is a reference to the "tokenImage" array of the generated
|
||||||
|
* parser within which the parse error occurred. This array is
|
||||||
|
* defined in the generated ...Constants interface.
|
||||||
|
*/
|
||||||
|
public String[] tokenImage;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This method has the standard behavior when this object has been
|
||||||
|
* created using the standard constructors. Otherwise, it uses
|
||||||
|
* "currentToken" and "expectedTokenSequences" to generate a parse
|
||||||
|
* error message and returns it. If this object has been created
|
||||||
|
* due to a parse error, and you do not catch it (it gets thrown
|
||||||
|
* from the parser), then this method is called during the printing
|
||||||
|
* of the final stack trace, and hence the correct error message
|
||||||
|
* gets displayed.
|
||||||
|
*/
|
||||||
|
public String getMessage() {
|
||||||
|
if (!specialConstructor) {
|
||||||
|
return super.getMessage();
|
||||||
|
}
|
||||||
|
String expected = "";
|
||||||
|
int maxSize = 0;
|
||||||
|
for (int i = 0; i < expectedTokenSequences.length; i++) {
|
||||||
|
if (maxSize < expectedTokenSequences[i].length) {
|
||||||
|
maxSize = expectedTokenSequences[i].length;
|
||||||
|
}
|
||||||
|
for (int j = 0; j < expectedTokenSequences[i].length; j++) {
|
||||||
|
expected += tokenImage[expectedTokenSequences[i][j]] + " ";
|
||||||
|
}
|
||||||
|
if (expectedTokenSequences[i][expectedTokenSequences[i].length - 1] != 0) {
|
||||||
|
expected += "...";
|
||||||
|
}
|
||||||
|
expected += eol + " ";
|
||||||
|
}
|
||||||
|
String retval = "Encountered \"";
|
||||||
|
Token tok = currentToken.next;
|
||||||
|
for (int i = 0; i < maxSize; i++) {
|
||||||
|
if (i != 0) retval += " ";
|
||||||
|
if (tok.kind == 0) {
|
||||||
|
retval += tokenImage[0];
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
retval += add_escapes(tok.image);
|
||||||
|
tok = tok.next;
|
||||||
|
}
|
||||||
|
retval += "\" at line " + currentToken.next.beginLine + ", column " + currentToken.next.beginColumn;
|
||||||
|
retval += "." + eol;
|
||||||
|
if (expectedTokenSequences.length == 1) {
|
||||||
|
retval += "Was expecting:" + eol + " ";
|
||||||
|
} else {
|
||||||
|
retval += "Was expecting one of:" + eol + " ";
|
||||||
|
}
|
||||||
|
retval += expected;
|
||||||
|
return retval;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The end of line string for this machine.
|
||||||
|
*/
|
||||||
|
protected String eol = System.getProperty("line.separator", "\n");
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Used to convert raw characters to their escaped version
|
||||||
|
* when these raw versions cannot be used as part of an ASCII
|
||||||
|
* string literal.
|
||||||
|
*/
|
||||||
|
protected String add_escapes(String str) {
|
||||||
|
StringBuffer retval = new StringBuffer();
|
||||||
|
char ch;
|
||||||
|
for (int i = 0; i < str.length(); i++) {
|
||||||
|
switch (str.charAt(i))
|
||||||
|
{
|
||||||
|
case 0 :
|
||||||
|
continue;
|
||||||
|
case '\b':
|
||||||
|
retval.append("\\b");
|
||||||
|
continue;
|
||||||
|
case '\t':
|
||||||
|
retval.append("\\t");
|
||||||
|
continue;
|
||||||
|
case '\n':
|
||||||
|
retval.append("\\n");
|
||||||
|
continue;
|
||||||
|
case '\f':
|
||||||
|
retval.append("\\f");
|
||||||
|
continue;
|
||||||
|
case '\r':
|
||||||
|
retval.append("\\r");
|
||||||
|
continue;
|
||||||
|
case '\"':
|
||||||
|
retval.append("\\\"");
|
||||||
|
continue;
|
||||||
|
case '\'':
|
||||||
|
retval.append("\\\'");
|
||||||
|
continue;
|
||||||
|
case '\\':
|
||||||
|
retval.append("\\\\");
|
||||||
|
continue;
|
||||||
|
default:
|
||||||
|
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
|
||||||
|
String s = "0000" + Integer.toString(ch, 16);
|
||||||
|
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
|
||||||
|
} else {
|
||||||
|
retval.append(ch);
|
||||||
|
}
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return retval.toString();
|
||||||
|
}
|
||||||
|
|
||||||
|
}
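
The "Was expecting" part of getMessage above is built by looking up each expected token kind in tokenImage. The sketch below isolates that step. It is an illustration only: the class name, the four-entry table, and the plain line break between sequences are assumptions, not the generated code.

```java
// Hypothetical sketch of how expected token sequences are rendered as text.
class ExpectedTokensSketch {
    // Tiny stand-in for the generated tokenImage table (assumed subset).
    static final String[] TOKEN_IMAGE = { "<EOF>", "<AND>", "<OR>", "\"(\"" };

    // One line per expected sequence; "..." marks a sequence that does not
    // end in EOF (kind 0), i.e. more input could follow it.
    static String describeExpected(int[][] sequences, String eol) {
        StringBuilder sb = new StringBuilder();
        for (int[] seq : sequences) {
            for (int kind : seq) {
                sb.append(TOKEN_IMAGE[kind]).append(' ');
            }
            if (seq[seq.length - 1] != 0) {
                sb.append("...");
            }
            sb.append(eol);
        }
        return sb.toString();
    }
}
```

For the sequences {1} and {3, 0} this yields one line reading `<AND> ...` and one reading `"(" <EOF> `.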
|
File diff suppressed because it is too large
|
@ -0,0 +1,80 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. QueryParserConstants.java */
|
||||||
|
package org.apache.lucene.queryParser;
|
||||||
|
|
||||||
|
public interface QueryParserConstants {
|
||||||
|
|
||||||
|
int EOF = 0;
|
||||||
|
int _NUM_CHAR = 1;
|
||||||
|
int _ESCAPED_CHAR = 2;
|
||||||
|
int _TERM_START_CHAR = 3;
|
||||||
|
int _TERM_CHAR = 4;
|
||||||
|
int _WHITESPACE = 5;
|
||||||
|
int AND = 7;
|
||||||
|
int OR = 8;
|
||||||
|
int NOT = 9;
|
||||||
|
int PLUS = 10;
|
||||||
|
int MINUS = 11;
|
||||||
|
int LPAREN = 12;
|
||||||
|
int RPAREN = 13;
|
||||||
|
int COLON = 14;
|
||||||
|
int CARAT = 15;
|
||||||
|
int QUOTED = 16;
|
||||||
|
int TERM = 17;
|
||||||
|
int FUZZY = 18;
|
||||||
|
int SLOP = 19;
|
||||||
|
int PREFIXTERM = 20;
|
||||||
|
int WILDTERM = 21;
|
||||||
|
int RANGEIN_START = 22;
|
||||||
|
int RANGEEX_START = 23;
|
||||||
|
int NUMBER = 24;
|
||||||
|
int RANGEIN_TO = 25;
|
||||||
|
int RANGEIN_END = 26;
|
||||||
|
int RANGEIN_QUOTED = 27;
|
||||||
|
int RANGEIN_GOOP = 28;
|
||||||
|
int RANGEEX_TO = 29;
|
||||||
|
int RANGEEX_END = 30;
|
||||||
|
int RANGEEX_QUOTED = 31;
|
||||||
|
int RANGEEX_GOOP = 32;
|
||||||
|
|
||||||
|
int Boost = 0;
|
||||||
|
int RangeEx = 1;
|
||||||
|
int RangeIn = 2;
|
||||||
|
int DEFAULT = 3;
|
||||||
|
|
||||||
|
String[] tokenImage = {
|
||||||
|
"<EOF>",
|
||||||
|
"<_NUM_CHAR>",
|
||||||
|
"<_ESCAPED_CHAR>",
|
||||||
|
"<_TERM_START_CHAR>",
|
||||||
|
"<_TERM_CHAR>",
|
||||||
|
"<_WHITESPACE>",
|
||||||
|
"<token of kind 6>",
|
||||||
|
"<AND>",
|
||||||
|
"<OR>",
|
||||||
|
"<NOT>",
|
||||||
|
"\"+\"",
|
||||||
|
"\"-\"",
|
||||||
|
"\"(\"",
|
||||||
|
"\")\"",
|
||||||
|
"\":\"",
|
||||||
|
"\"^\"",
|
||||||
|
"<QUOTED>",
|
||||||
|
"<TERM>",
|
||||||
|
"\"~\"",
|
||||||
|
"<SLOP>",
|
||||||
|
"<PREFIXTERM>",
|
||||||
|
"<WILDTERM>",
|
||||||
|
"\"[\"",
|
||||||
|
"\"{\"",
|
||||||
|
"<NUMBER>",
|
||||||
|
"\"TO\"",
|
||||||
|
"\"]\"",
|
||||||
|
"<RANGEIN_QUOTED>",
|
||||||
|
"<RANGEIN_GOOP>",
|
||||||
|
"\"TO\"",
|
||||||
|
"\"}\"",
|
||||||
|
"<RANGEEX_QUOTED>",
|
||||||
|
"<RANGEEX_GOOP>",
|
||||||
|
};
|
||||||
|
|
||||||
|
}
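
The constants above are indices into tokenImage, so a token kind can be turned into a readable label with an array lookup. The following sketch shows the idea with a hand-copied subset of the table. The class name and the describe helper are hypothetical; the fallback text is modelled on the "<token of kind 6>" entry in the listing.

```java
// Hypothetical sketch: mapping a token kind to its display image.
class TokenImageLookupSketch {
    // Values copied from the listing above; only a subset is reproduced.
    static final int EOF = 0;
    static final int AND = 7;
    static final int PLUS = 10;

    // Sparse stand-in for tokenImage: only the indices used here are filled.
    static final String[] TOKEN_IMAGE = new String[11];
    static {
        TOKEN_IMAGE[EOF] = "<EOF>";
        TOKEN_IMAGE[AND] = "<AND>";
        TOKEN_IMAGE[PLUS] = "\"+\"";
    }

    // Returns the image for a kind, or a generic label for unnamed kinds.
    static String describe(int kind) {
        if (kind < 0 || kind >= TOKEN_IMAGE.length || TOKEN_IMAGE[kind] == null) {
            return "<token of kind " + kind + ">";
        }
        return TOKEN_IMAGE[kind];
    }
}
```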
|
File diff suppressed because it is too large
|
@ -0,0 +1,81 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. Token.java Version 3.0 */
|
||||||
|
package org.apache.lucene.queryParser;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Describes the input token stream.
|
||||||
|
*/
|
||||||
|
|
||||||
|
public class Token {
|
||||||
|
|
||||||
|
/**
|
||||||
|
* An integer that describes the kind of this token. This numbering
|
||||||
|
* system is determined by JavaCCParser, and a table of these numbers is
|
||||||
|
* stored in the file ...Constants.java.
|
||||||
|
*/
|
||||||
|
public int kind;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* beginLine and beginColumn describe the position of the first character
|
||||||
|
* of this token; endLine and endColumn describe the position of the
|
||||||
|
* last character of this token.
|
||||||
|
*/
|
||||||
|
public int beginLine, beginColumn, endLine, endColumn;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The string image of the token.
|
||||||
|
*/
|
||||||
|
public String image;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A reference to the next regular (non-special) token from the input
|
||||||
|
* stream. If this is the last token from the input stream, or if the
|
||||||
|
* token manager has not read tokens beyond this one, this field is
|
||||||
|
* set to null. This is true only if this token is also a regular
|
||||||
|
* token. Otherwise, see below for a description of the contents of
|
||||||
|
* this field.
|
||||||
|
*/
|
||||||
|
public Token next;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This field is used to access special tokens that occur prior to this
|
||||||
|
* token, but after the immediately preceding regular (non-special) token.
|
||||||
|
* If there are no such special tokens, this field is set to null.
|
||||||
|
* When there is more than one such special token, this field refers
|
||||||
|
* to the last of these special tokens, which in turn refers to the next
|
||||||
|
* previous special token through its specialToken field, and so on
|
||||||
|
* until the first special token (whose specialToken field is null).
|
||||||
|
* The next fields of special tokens refer to other special tokens that
|
||||||
|
* immediately follow it (without an intervening regular token). If there
|
||||||
|
* is no such token, this field is null.
|
||||||
|
*/
|
||||||
|
public Token specialToken;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns the image.
|
||||||
|
*/
|
||||||
|
public String toString()
|
||||||
|
{
|
||||||
|
return image;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns a new Token object, by default. However, if you want, you
|
||||||
|
* can create and return subclass objects based on the value of ofKind.
|
||||||
|
* Simply add the cases to the switch for all those special cases.
|
||||||
|
* For example, if you have a subclass of Token called IDToken that
|
||||||
|
* you want to create if ofKind is ID, simply add something like :
|
||||||
|
*
|
||||||
|
* case MyParserConstants.ID : return new IDToken();
|
||||||
|
*
|
||||||
|
* to the following switch statement. Then you can cast matchedToken
|
||||||
|
* variable to the appropriate type and use it in your lexical actions.
|
||||||
|
*/
|
||||||
|
public static final Token newToken(int ofKind)
|
||||||
|
{
|
||||||
|
switch(ofKind)
|
||||||
|
{
|
||||||
|
default : return new Token();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
}
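
The newToken comment above describes returning a Token subclass for selected kinds. A minimal sketch of that customization follows, using stand-in classes so it compiles on its own. SketchToken, IdToken, and SketchKinds.ID are hypothetical names; a real grammar would take the constant from its generated ...Constants interface.

```java
// Hypothetical stand-in for a generated constants interface.
class SketchKinds {
    static final int ID = 1;
}

// Hypothetical stand-in for the generated Token class.
class SketchToken {
    int kind;
    String image;

    // Factory in the style of Token.newToken: one case per special kind,
    // with the plain token as the default.
    static SketchToken newToken(int ofKind) {
        switch (ofKind) {
            case SketchKinds.ID:
                return new IdToken();
            default:
                return new SketchToken();
        }
    }
}

// Subclass created only for tokens of kind ID.
class IdToken extends SketchToken {
}
```

A lexical action could then cast the matched token to IdToken when its kind is ID, as the comment suggests.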
|
|
@ -0,0 +1,133 @@
|
||||||
|
/* Generated By:JavaCC: Do not edit this line. TokenMgrError.java Version 3.0 */
|
||||||
|
package org.apache.lucene.queryParser;
|
||||||
|
|
||||||
|
public class TokenMgrError extends Error
|
||||||
|
{
|
||||||
|
/*
|
||||||
|
* Ordinals for various reasons why an Error of this type can be thrown.
|
||||||
|
*/
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Lexical error occurred.
|
||||||
|
*/
|
||||||
|
static final int LEXICAL_ERROR = 0;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* An attempt was made to create a second instance of a static token manager.
|
||||||
|
*/
|
||||||
|
static final int STATIC_LEXER_ERROR = 1;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Tried to change to an invalid lexical state.
|
||||||
|
*/
|
||||||
|
static final int INVALID_LEXICAL_STATE = 2;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Detected (and bailed out of) an infinite loop in the token manager.
|
||||||
|
*/
|
||||||
|
static final int LOOP_DETECTED = 3;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Indicates the reason why the exception is thrown. It will have
|
||||||
|
* one of the above 4 values.
|
||||||
|
*/
|
||||||
|
int errorCode;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Replaces unprintable characters by their escaped (or unicode escaped)
|
||||||
|
* equivalents in the given string
|
||||||
|
*/
|
||||||
|
protected static final String addEscapes(String str) {
|
||||||
|
StringBuffer retval = new StringBuffer();
|
||||||
|
char ch;
|
||||||
|
for (int i = 0; i < str.length(); i++) {
|
||||||
|
switch (str.charAt(i))
|
||||||
|
{
|
||||||
|
case 0 :
|
||||||
|
continue;
|
||||||
|
case '\b':
|
||||||
|
retval.append("\\b");
|
||||||
|
continue;
|
||||||
|
case '\t':
|
||||||
|
retval.append("\\t");
|
||||||
|
continue;
|
||||||
|
case '\n':
|
||||||
|
retval.append("\\n");
|
||||||
|
continue;
|
||||||
|
case '\f':
|
||||||
|
retval.append("\\f");
|
||||||
|
continue;
|
||||||
|
case '\r':
|
||||||
|
retval.append("\\r");
|
||||||
|
continue;
|
||||||
|
case '\"':
|
||||||
|
retval.append("\\\"");
|
||||||
|
continue;
|
||||||
|
case '\'':
|
||||||
|
retval.append("\\\'");
|
||||||
|
continue;
|
||||||
|
case '\\':
|
||||||
|
retval.append("\\\\");
|
||||||
|
continue;
|
||||||
|
default:
|
||||||
|
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
|
||||||
|
String s = "0000" + Integer.toString(ch, 16);
|
||||||
|
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
|
||||||
|
} else {
|
||||||
|
retval.append(ch);
|
||||||
|
}
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return retval.toString();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Returns a detailed message for the Error when it is thrown by the
|
||||||
|
* token manager to indicate a lexical error.
|
||||||
|
* Parameters :
|
||||||
|
* EOFSeen : indicates if EOF caused the lexical error
|
||||||
|
* lexState : lexical state in which this error occurred
|
||||||
|
* errorLine : line number when the error occurred
|
||||||
|
* errorColumn : column number when the error occurred
|
||||||
|
* errorAfter : prefix that was seen before this error occurred
|
||||||
|
* curChar : the offending character
|
||||||
|
* Note: You can customize the lexical error message by modifying this method.
|
||||||
|
*/
|
||||||
|
protected static String LexicalError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar) {
|
||||||
|
return("Lexical error at line " +
|
||||||
|
errorLine + ", column " +
|
||||||
|
errorColumn + ". Encountered: " +
|
||||||
|
(EOFSeen ? "<EOF> " : ("\"" + addEscapes(String.valueOf(curChar)) + "\"") + " (" + (int)curChar + "), ") +
|
||||||
|
"after : \"" + addEscapes(errorAfter) + "\"");
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* You can also modify the body of this method to customize your error messages.
|
||||||
|
* For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not
|
||||||
|
* of concern to end users, so you can return something like :
|
||||||
|
*
|
||||||
|
* "Internal Error : Please file a bug report .... "
|
||||||
|
*
|
||||||
|
* from this method for such cases in the release version of your parser.
|
||||||
|
*/
|
||||||
|
public String getMessage() {
|
||||||
|
return super.getMessage();
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Constructors of various flavors follow.
|
||||||
|
*/
|
||||||
|
|
||||||
|
public TokenMgrError() {
|
||||||
|
}
|
||||||
|
|
||||||
|
public TokenMgrError(String message, int reason) {
|
||||||
|
super(message);
|
||||||
|
errorCode = reason;
|
||||||
|
}
|
||||||
|
|
||||||
|
public TokenMgrError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar, int reason) {
|
||||||
|
this(LexicalError(EOFSeen, lexState, errorLine, errorColumn, errorAfter, curChar), reason);
|
||||||
|
}
|
||||||
|
}
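
To make the message format produced by LexicalError concrete, the sketch below rebuilds it in a standalone class. It is a simplified illustration, not the generated code: the class and method names are hypothetical, the escape helper covers only some of the cases handled by addEscapes, and the unused lexState parameter is left out.

```java
// Hypothetical sketch of the lexical error message format.
class LexicalMessageSketch {
    // Escapes control and non-ASCII characters (simplified from addEscapes).
    static String escape(String str) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            switch (ch) {
                case 0:    break;                        // NUL is dropped
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                default:
                    if (ch < 0x20 || ch > 0x7e) {
                        // Four-digit hex escape, zero-padded on the left.
                        String hex = "0000" + Integer.toString(ch, 16);
                        out.append("\\u").append(hex.substring(hex.length() - 4));
                    } else {
                        out.append(ch);
                    }
            }
        }
        return out.toString();
    }

    // Builds the same sentence shape as LexicalError: position, offending
    // character (or <EOF>), then the prefix seen before the error.
    static String lexicalError(boolean eofSeen, int line, int column,
                               String errorAfter, char curChar) {
        String found = eofSeen
            ? "<EOF> "
            : "\"" + escape(String.valueOf(curChar)) + "\" (" + (int) curChar + "), ";
        return "Lexical error at line " + line + ", column " + column
            + ". Encountered: " + found + "after : \"" + escape(errorAfter) + "\"";
    }
}
```

With a tab as the offending character at line 1, column 5, after the prefix foo, the result reads: Lexical error at line 1, column 5. Encountered: "\t" (9), after : "foo"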
|