mirror of https://github.com/apache/lucene.git
PR 19468, but not exactly as it was done in the provided patches. JavaCC is no longer required to build Lucene, but can be run optionally
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@150017 13f79535-47bb-0310-9956-ffa450edef68
parent 798fc0f0ef
commit 2af2d85877
39
BUILD.txt
@@ -3,15 +3,15 @@ Lucene Build Instructions
 $Id$

 Basic steps:
-0) Install JDK 1.2 (or greater), Ant 1.4 (or greater), and the Ant
+0) Install JDK 1.2 (or greater), Ant 1.5 (or greater), and the Ant
 optional.jar
 1) Download Lucene from Apache and unpack it
 2) Connect to the top-level of your Lucene installation
-3) Install JavaCC
+3) Install JavaCC (optional)
 4) Run ant

 Step 0) Set up your development environment (JDK 1.2 or greater,
-Ant 1.4 or greater)
+Ant 1.5 or greater)

 We'll assume that you know how to get and set up the JDK - if you
 don't, then we suggest starting at http://java.sun.com and learning
@@ -22,26 +22,22 @@ with the development version of Lucene, we recommend you stick with
 the most current version of Java (at the time of this writing, JDK
 1.4). Also, note that if you're working with the Lucene source,
 you'll need to use Ant (see below) and Ant requires at least JDK 1.1
-(and in the future will likely move to requiring JDK 1.2, according to
+(and in the future will move to requiring JDK 1.2, according to
 the Ant install docs).

 Like most of the Jakarta projects, Lucene uses Apache Ant for build
-control. Specifically, you MUST use Ant version 1.4 or greater.
+control. Specifically, you MUST use Ant version 1.5 or greater.

 Ant is "kind of like make without make's wrinkles". Ant is
 implemented in java and uses XML-based configuration files. You can
 get it at:

-http://jakarta.apache.org/ant
-
-Specifically, you can get the binary distributions at:
-
-http://jakarta.apache.org/builds/jakarta-ant/release/
+http://ant.apache.org

 You'll need to download both the Ant binary distribution and the
 "optional" jar file. Install these according to the instructions at:

-http://jakarta.apache.org/ant/manual
+http://ant.apache.org/manual

 Step 1) Download Lucene from Apache

@@ -79,21 +75,16 @@ NOTE: the ~ character represents your user account home directory.

 Step 3) Install JavaCC

-Building the Lucene distribution from the source requires the JavaCC
-parser generator. This software has a separate license agreement that
-must be agreed to before you can use it. The web page for JavaCC is here:
+Building the Lucene distribution from the source does not require the JavaCC
+parser generator, but if you wish to regenerate any of the pre-generated
+parser pieces, you will need to install JavaCC.

-http://www.experimentalstuff.com/Technologies/JavaCC/
+http://javacc.dev.java.net

 Follow the download links and download the zip file to a temporary
-location on your file system. Unzip the file and run the large class file
-in the directory. On windows, use this command from the temp directory:
+location on your file system.

-java -cp . JavaCC2_1
-
-This will launch a Java GUI installer. There is also a command line
-installer available, and the installation class will give you those
-directions. After JavaCC is installed, edit your build properties
+After JavaCC is installed, edit your build.properties
 (as in step 2), and add the line

 javacc.home=/javacc/bin
@@ -107,6 +98,8 @@ location of your ant installation, typing "ant" at the shell prompt
 and command prompt should run ant. Ant will by default look for the
 "build.xml" file in your current directory, and compile Lucene.

+To rebuild any of the JavaCC-based parsers, run "ant javacc".
+
 For further information on Lucene, go to:
 http://jakarta.apache.org/lucene/

@@ -114,7 +107,7 @@ Please join the Lucene-User mailing list by visiting this site:
 http://jakarta.apache.org/site/mail.html

 Please post suggestions, questions, corrections or additions to this
-document to the Lucene-User mailing list.
+document to the lucene-user mailing list.

 This file was originally written by Steven J. Owens <puff@darksleep.com>.
 This file was modified by Jon S. Stevens <jon@latchkey.com>.
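As a rough illustration of what "ant javacc" ends up doing for each grammar, the sketch below assembles a command line that is assumed to be approximately equivalent to the forked `<java>` call in the `invoke-javacc` target of build.xml. The class `JavaccCommand` and its method are hypothetical names used only for this example; Ant itself resolves paths and launches the JVM differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of Lucene or Ant: builds a command
// line that approximates the forked <java> call in the invoke-javacc target.
class JavaccCommand {

  // javaccHome, javaccJar, outputDir and grammar stand in for the Ant
  // properties ${javacc.home}, ${javacc.jar}, ${output.dir} and ${target}.
  static List<String> build(String javaccHome, String javaccJar,
                            String outputDir, String grammar) {
    List<String> cmd = new ArrayList<>();
    cmd.add("java");
    cmd.add("-Dinstall.root=" + javaccHome);    // <sysproperty key="install.root" .../>
    cmd.add("-cp");
    cmd.add(javaccJar);                         // <classpath path="${javacc.jar}"/>
    cmd.add("org.javacc.parser.Main");          // value of ${javacc.main.class}
    cmd.add("-OUTPUT_DIRECTORY:" + outputDir);  // first <arg>
    cmd.add(grammar);                           // second <arg>: the .jj file
    return cmd;
  }

  public static void main(String[] args) {
    // Example values only; adjust to the local JavaCC installation.
    List<String> cmd = build("/usr/local/java/javacc3.2",
        "/usr/local/java/javacc3.2/bin/lib/javacc.jar",
        "src/java/org/apache/lucene/queryParser",
        "src/java/org/apache/lucene/queryParser/QueryParser.jj");
    System.out.println(String.join(" ", cmd));
  }
}
```

The list form keeps each argument separate, so paths containing spaces would not need extra quoting if the list were handed to a process launcher.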
120
build.xml
@@ -9,6 +9,8 @@
   <property file="${basedir}/build.properties" />
   <property file="${basedir}/default.properties" />

+  <property name="javacc.main.class" value="org.javacc.parser.Main"/>
+
   <!-- Build classpath -->
   <path id="classpath">
     <pathelement location="${build.classes}"/>
@@ -52,8 +54,8 @@

     <available
       property="javacc.present"
-      classname="COM.sun.labs.javacc.Main"
-      classpath="${javacc.zip}"
+      classname="${javacc.main.class}"
+      classpath="${javacc.jar}"
     />

     <available
@@ -67,21 +69,21 @@
     </tstamp>
   </target>

-  <target name="javacc_check" depends="init" unless="javacc.present">
-    <echo>
+  <target name="javacc-check" depends="init">
+    <fail unless="javacc.present">
       ##################################################################
       JavaCC not found.
       JavaCC Home: ${javacc.home}
-      JavaCC Zip: ${javacc.zip}
+      JavaCC Zip: ${javacc.jar}

-      Please download and install JavaCC 2.0 from:
+      Please download and install JavaCC from:

-      <http://www.experimentalstuff.com/Technologies/JavaCC/>
+      <http://javacc.dev.java.net>

       Then, create a build.properties file either in your home
       directory, or within the Lucene directory and set the javacc.home
       property to the path where JavaCC.zip is located. For example,
-      if you installed JavaCC in /usr/local/java/javacc2.0, then set the
+      if you installed JavaCC in /usr/local/java/javacc3.2, then set the
       javacc.home property to:

       javacc.home=/usr/local/java/javacc2.0/bin
@@ -89,9 +91,10 @@
       If you get an error like the one below, then you have not installed
       things correctly. Please check all your paths and try again.

-      java.lang.NoClassDefFoundError: COM/sun/labs/javacc/Main
+      java.lang.NoClassDefFoundError: org.javacc.parser.Main
       ##################################################################
-    </echo>
+    </fail>
+
   </target>

   <!-- ================================================================== -->
@@ -99,25 +102,10 @@
   <!-- ================================================================== -->
   <!-- -->
   <!-- ================================================================== -->
-  <target name="compile" depends="init,javacc_check" if="javacc.present">
-    <mkdir dir="${build.src}/org/apache/lucene/analysis/standard"/>
-    <javacc
-      target="${src.dir}/org/apache/lucene/analysis/standard/StandardTokenizer.jj"
-      javacchome="${javacc.zip.dir}"
-      outputdirectory="${build.src}/org/apache/lucene/analysis/standard"
-    />
-
-    <delete file="${build.src}/org/apache/lucene/analysis/standard/ParseException.java"/>
-    <mkdir dir="${build.src}/org/apache/lucene/queryParser"/>
-    <javacc
-      target="${src.dir}/org/apache/lucene/queryParser/QueryParser.jj"
-      javacchome="${javacc.zip.dir}"
-      outputdirectory="${build.src}/org/apache/lucene/queryParser"
-    />
-
+  <target name="compile" depends="init">
     <javac
       encoding="${build.encoding}"
-      srcdir="${src.dir}:${build.src}"
+      srcdir="${src.dir}"
       includes="org/**/*.java"
       destdir="${build.classes}"
       debug="${debug}">
@@ -135,7 +123,7 @@
   <!-- ================================================================== -->
   <!-- -->
   <!-- ================================================================== -->
-  <target name="jar" depends="compile" if="javacc.present">
+  <target name="jar" depends="compile">

     <!-- Create Jar MANIFEST file -->
     <echo file="${build.manifest}">Manifest-Version: 1.0
@@ -158,7 +146,7 @@ Implementation-Vendor: Lucene
     />
   </target>

-  <target name="jardemo" depends="compile,demo" if="javacc.present">
+  <target name="jardemo" depends="compile,demo">
     <jar
       jarfile="${build.demo}/${build.demo.name}.jar"
       basedir="${build.demo.classes}"
@@ -166,7 +154,7 @@ Implementation-Vendor: Lucene
     />
   </target>

-  <target name="wardemo" depends="compile,demo,jar,jardemo" if="javacc.present">
+  <target name="wardemo" depends="compile,demo,jar,jardemo">
     <mkdir dir="${build.demo}/${build.demo.war.name}"/>
     <mkdir dir="${build.demo}/${build.demo.war.name}/WEB-INF"/>
     <mkdir dir="${build.demo}/${build.demo.war.name}/WEB-INF/lib"/>
@@ -202,22 +190,8 @@ Implementation-Vendor: Lucene
   <!-- ================================================================== -->
   <!-- -->
   <!-- ================================================================== -->
-  <target name="jar-src" depends="init,javacc_check" if="javacc.present">
+  <target name="jar-src" depends="init">
-    <mkdir dir="${build.src}/org/apache/lucene/analysis/standard"/>
-    <javacc
-      target="${src.dir}/org/apache/lucene/analysis/standard/StandardTokenizer.jj"
-      javacchome="${javacc.zip.dir}"
-      outputdirectory="${build.src}/org/apache/lucene/analysis/standard"
-    />
-
-    <delete file="${build.src}/org/apache/lucene/analysis/standard/ParseException.java"/>
-    <mkdir dir="${build.src}/org/apache/lucene/queryParser"/>
-    <javacc
-      target="${src.dir}/org/apache/lucene/queryParser/QueryParser.jj"
-      javacchome="${javacc.zip.dir}"
-      outputdirectory="${build.src}/org/apache/lucene/queryParser"
-    />

     <jar jarfile="${build.dir}/${final.name}-src.jar">
       <fileset dir="${build.dir}" includes="**/*.java"/>
     </jar>
@@ -228,7 +202,7 @@ Implementation-Vendor: Lucene
   <!-- ================================================================== -->
   <!-- -->
   <!-- ================================================================== -->
-  <target name="demo" depends="compile" if="javacc.present">
+  <target name="demo" depends="compile">
     <mkdir dir="${build.demo}"/>
     <mkdir dir="${build.demo.src}" />

@@ -239,11 +213,6 @@ Implementation-Vendor: Lucene
       </fileset>
     </copy>

-    <javacc
-      target="${build.demo.src}/org/apache/lucene/demo/html/HTMLParser.jj"
-      javacchome="${javacc.zip.dir}"
-      outputdirectory="${build.demo.src}/org/apache/lucene/demo/html"
-    />
     <mkdir dir="${build.demo.classes}"/>

     <javac
@@ -355,7 +324,7 @@ Implementation-Vendor: Lucene
   <!-- ================================================================== -->
   <!-- -->
   <!-- ================================================================== -->
-  <target name="javadocs" depends="compile" if="javacc.present">
+  <target name="javadocs" depends="compile">
     <mkdir dir="${build.javadocs}"/>
     <javadoc
       sourcepath="${src.dir}:${build.src}"
@@ -619,4 +588,51 @@ Implementation-Vendor: Lucene
     </war>
   </target>
   -->
+
+
+  <!-- ================================================================== -->
+  <!-- Build the JavaCC files into the source tree -->
+  <!-- ================================================================== -->
+  <target name="javacc" depends="javacc-StandardAnalyzer,javacc-QueryParser,javacc-HTMLParser"/>
+
+  <target name="javacc-StandardAnalyzer" depends="init,javacc-check" if="javacc.present">
+    <!-- generate this in a build directory so we can exclude ParseException -->
+    <mkdir dir="${build.src}/org/apache/lucene/analysis/standard"/>
+    <antcall target="invoke-javacc">
+      <param name="target" location="${src.dir}/org/apache/lucene/analysis/standard/StandardTokenizer.jj"/>
+      <param name="output.dir" location="${build.src}/org/apache/lucene/analysis/standard"/>
+    </antcall>
+    <copy todir="${src.dir}/org/apache/lucene/analysis/standard">
+      <fileset dir="${build.src}/org/apache/lucene/analysis/standard">
+        <include name="*.java"/>
+        <exclude name="ParseException.java"/>
+      </fileset>
+    </copy>
+  </target>
+
+  <target name="javacc-QueryParser" depends="init,javacc-check" if="javacc.present">
+    <antcall target="invoke-javacc">
+      <param name="target" location="${src.dir}/org/apache/lucene/queryParser/QueryParser.jj"/>
+      <param name="output.dir" location="${src.dir}/org/apache/lucene/queryParser"/>
+    </antcall>
+  </target>
+
+  <target name="javacc-HTMLParser" depends="init,javacc-check" if="javacc.present">
+    <antcall target="invoke-javacc">
+      <param name="target" location="${demo.src}/org/apache/lucene/demo/html/HTMLParser.jj"/>
+      <param name="output.dir" location="${demo.src}/org/apache/lucene/demo/html"/>
+    </antcall>
+  </target>
+
+  <target name="invoke-javacc">
+    <java classname="${javacc.main.class}" fork="true">
+      <classpath path="${javacc.jar}"/>
+
+      <sysproperty key="install.root" file="${javacc.home}"/>
+
+      <arg value="-OUTPUT_DIRECTORY:${output.dir}"/>
+      <arg value="${target}"/>
+    </java>
+  </target>
+
 </project>
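The `javacc-check` target in the build.xml changes above depends on the `javacc.present` property, which the `<available>` task sets only when `${javacc.main.class}` can be loaded from `${javacc.jar}`. The following is a minimal Java sketch of a comparable check. The class `JavaccAvailability` is a hypothetical helper, not part of Lucene or Ant, and falling back to the parent class loader is an assumption of this sketch rather than a statement about Ant's internals.

```java
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper, not part of Lucene or Ant: approximates an
// <available classname="..." classpath="..."/> style check.
class JavaccAvailability {

  // Returns true if className can be loaded from the given jar (or from
  // the parent class loader), false otherwise.
  static boolean isPresent(String className, File jar) {
    URL[] urls;
    try {
      urls = new URL[] { jar.toURI().toURL() };
    } catch (MalformedURLException e) {
      return false;
    }
    URLClassLoader loader = new URLClassLoader(urls);
    try {
      // initialize=false: only test for existence, do not run static init.
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    } finally {
      try {
        loader.close();
      } catch (IOException ignored) {
        // Nothing useful to do if closing the loader fails.
      }
    }
  }

  public static void main(String[] args) {
    // Default path is an example only; pass the real jar as the first argument.
    File jar = new File(args.length > 0 ? args[0] : "bin/lib/javacc.jar");
    System.out.println("javacc.present = "
        + isPresent("org.javacc.parser.Main", jar));
  }
}
```

A missing or misplaced jar simply yields `false` here, which corresponds to the case where the build prints the "JavaCC not found." message and fails.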
@@ -58,8 +58,8 @@ junit.reports = ${build.dir}/unit-reports

 # Home directory of JavaCC
 javacc.home = .
-javacc.zip.dir = ${javacc.home}/lib
-javacc.zip = ${javacc.zip.dir}/JavaCC.zip
+javacc.zip.dir = ${javacc.home}/bin/lib
+javacc.jar = ${javacc.zip.dir}/javacc.jar

 # Home directory of jakarta-site2
 jakarta.site2.home = ../jakarta-site2
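Because build.xml loads build.properties before default.properties and Ant properties keep their first value, a `javacc.home` set by the user takes precedence over the `.` default, and the two derived properties above follow from it. The sketch below mimics that `${...}` expansion in plain Java to show the resulting paths. `PropertyExpander` and its methods are hypothetical names for illustration and do not reflect Ant's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${name} expansion, not Ant's implementation.
class PropertyExpander {
  private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");

  // Replaces each ${name} with its value from 'defined'.
  // References to unknown names are left untouched.
  static String expand(String value, Map<String, String> defined) {
    Matcher m = REF.matcher(value);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String replacement = defined.getOrDefault(m.group(1), m.group(0));
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  // Applies the new default.properties definitions for a given javacc.home.
  static Map<String, String> javaccDefaults(String javaccHome) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("javacc.home", javaccHome);
    props.put("javacc.zip.dir", expand("${javacc.home}/bin/lib", props));
    props.put("javacc.jar", expand("${javacc.zip.dir}/javacc.jar", props));
    return props;
  }

  public static void main(String[] args) {
    javaccDefaults(args.length > 0 ? args[0] : ".")
        .forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```

Under these definitions, the BUILD.txt example value `javacc.home=/javacc/bin` would resolve `javacc.jar` to `/javacc/bin/bin/lib/javacc.jar`, so pointing `javacc.home` at the JavaCC installation root may be what is intended.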
@@ -0,0 +1,688 @@
+/* Generated By:JavaCC: Do not edit this line. HTMLParser.java */
+package org.apache.lucene.demo.html;
+
+import java.io.*;
+import java.util.Properties;
+
+public class HTMLParser implements HTMLParserConstants {
+  public static int SUMMARY_LENGTH = 200;
+
+  StringBuffer title = new StringBuffer(SUMMARY_LENGTH);
+  StringBuffer summary = new StringBuffer(SUMMARY_LENGTH * 2);
+  Properties metaTags=new Properties();
+  String currentMetaTag="";
+  int length = 0;
+  boolean titleComplete = false;
+  boolean inTitle = false;
+  boolean inMetaTag = false;
+  boolean inStyle = false;
+  boolean inScript = false;
+  boolean afterTag = false;
+  boolean afterSpace = false;
+  String eol = System.getProperty("line.separator");
+  PipedReader pipeIn = null;
+  PipedWriter pipeOut;
+
+  public HTMLParser(File file) throws FileNotFoundException {
+    this(new FileInputStream(file));
+  }
+
+  public String getTitle() throws IOException, InterruptedException {
+    if (pipeIn == null)
+      getReader(); // spawn parsing thread
+    while (true) {
+      synchronized(this) {
+        if (titleComplete || (length > SUMMARY_LENGTH))
+          break;
+        wait(10);
+      }
+    }
+    return title.toString().trim();
+  }
+
+  public Properties getMetaTags() throws IOException,
+      InterruptedException {
+    if (pipeIn == null)
+      getReader(); // spawn parsing thread
+    while (true) {
+      synchronized(this) {
+        if (titleComplete || (length > SUMMARY_LENGTH))
+          break;
+        wait(10);
+      }
+    }
+    return metaTags;
+  }
+
+
+  public String getSummary() throws IOException, InterruptedException {
+    if (pipeIn == null)
+      getReader(); // spawn parsing thread
+    while (true) {
+      synchronized(this) {
+        if (summary.length() >= SUMMARY_LENGTH)
+          break;
+        wait(10);
+      }
+    }
+    if (summary.length() > SUMMARY_LENGTH)
+      summary.setLength(SUMMARY_LENGTH);
+
+    String sum = summary.toString().trim();
+    String tit = getTitle();
+    if (sum.startsWith(tit))
+      return sum.substring(tit.length());
+    else
+      return sum;
+  }
+
+  public Reader getReader() throws IOException {
+    if (pipeIn == null) {
+      pipeIn = new PipedReader();
+      pipeOut = new PipedWriter(pipeIn);
+
+      Thread thread = new ParserThread(this);
+      thread.start(); // start parsing
+    }
+
+    return pipeIn;
+  }
+
+  void addToSummary(String text) {
+    if (summary.length() < SUMMARY_LENGTH) {
+      summary.append(text);
+      if (summary.length() >= SUMMARY_LENGTH) {
+        synchronized(this) {
+          notifyAll();
+        }
+      }
+    }
+  }
+
+  void addText(String text) throws IOException {
+    if (inScript)
+      return;
+    if (inStyle)
+      return;
+    if (inMetaTag)
+    {
+      metaTags.setProperty(currentMetaTag, text);
+      return;
+    }
+    if (inTitle)
+      title.append(text);
+    else {
+      addToSummary(text);
+      if (!titleComplete && !title.equals("")) { // finished title
+        synchronized(this) {
+          titleComplete = true; // tell waiting threads
+          notifyAll();
+        }
+      }
+    }
+
+    length += text.length();
+    pipeOut.write(text);
+
+    afterSpace = false;
+  }
+
+  void addSpace() throws IOException {
+    if (inScript)
+      return;
+    if (!afterSpace) {
+      if (inTitle)
+        title.append(" ");
+      else
+        addToSummary(" ");
+
+      String space = afterTag ? eol : " ";
+      length += space.length();
+      pipeOut.write(space);
+      afterSpace = true;
+    }
+  }
+
+  final public void HTMLDocument() throws ParseException, IOException {
+    Token t;
+    label_1:
+    while (true) {
+      switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+      case TagName:
+      case DeclName:
+      case Comment1:
+      case Comment2:
+      case Word:
+      case Entity:
+      case Space:
+      case Punct:
+        ;
+        break;
+      default:
+        jj_la1[0] = jj_gen;
+        break label_1;
+      }
+      switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+      case TagName:
+        Tag();
+        afterTag = true;
+        break;
+      case DeclName:
+        t = Decl();
+        afterTag = true;
+        break;
+      case Comment1:
+      case Comment2:
+        CommentTag();
+        afterTag = true;
+        break;
+      case Word:
+        t = jj_consume_token(Word);
+        addText(t.image); afterTag = false;
+        break;
+      case Entity:
+        t = jj_consume_token(Entity);
+        addText(Entities.decode(t.image)); afterTag = false;
+        break;
+      case Punct:
+        t = jj_consume_token(Punct);
+        addText(t.image); afterTag = false;
+        break;
+      case Space:
+        jj_consume_token(Space);
+        addSpace(); afterTag = false;
+        break;
+      default:
+        jj_la1[1] = jj_gen;
+        jj_consume_token(-1);
+        throw new ParseException();
+      }
+    }
+    jj_consume_token(0);
+  }
+
+  final public void Tag() throws ParseException, IOException {
+    Token t1, t2;
+    boolean inImg = false;
+    t1 = jj_consume_token(TagName);
+    inTitle = t1.image.equalsIgnoreCase("<title"); // keep track if in <TITLE>
+    inMetaTag = t1.image.equalsIgnoreCase("<META"); // keep track if in <META>
+    inStyle = t1.image.equalsIgnoreCase("<STYLE"); // keep track if in <STYLE>
+    inImg = t1.image.equalsIgnoreCase("<img"); // keep track if in <IMG>
+    if (inScript) { // keep track if in <SCRIPT>
+      inScript = !t1.image.equalsIgnoreCase("</script");
+    } else {
+      inScript = t1.image.equalsIgnoreCase("<script");
+    }
+    label_2:
+    while (true) {
+      switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+      case ArgName:
+        ;
+        break;
+      default:
+        jj_la1[2] = jj_gen;
+        break label_2;
+      }
+      t1 = jj_consume_token(ArgName);
+      switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+      case ArgEquals:
+        jj_consume_token(ArgEquals);
+        switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+        case ArgValue:
+        case ArgQuote1:
+        case ArgQuote2:
+          t2 = ArgValue();
+          if (inImg && t1.image.equalsIgnoreCase("alt") && t2 != null)
+            addText("[" + t2.image + "]");
+
+          if(inMetaTag &&
+             ( t1.image.equalsIgnoreCase("name") ||
+               t1.image.equalsIgnoreCase("HTTP-EQUIV")
+             )
+             && t2 != null)
+          {
+            currentMetaTag=t2.image.toLowerCase();
+          }
+          if(inMetaTag && t1.image.equalsIgnoreCase("content") && t2 !=
+             null)
+          {
+            addText(t2.image);
+          }
+          break;
+        default:
+          jj_la1[3] = jj_gen;
+          ;
+        }
+        break;
+      default:
+        jj_la1[4] = jj_gen;
+        ;
+      }
+    }
+    jj_consume_token(TagEnd);
+  }
+
+  final public Token ArgValue() throws ParseException {
+    Token t = null;
+    switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+    case ArgValue:
+      t = jj_consume_token(ArgValue);
+      {if (true) return t;}
+      break;
+    default:
+      jj_la1[5] = jj_gen;
+      if (jj_2_1(2)) {
+        jj_consume_token(ArgQuote1);
+        jj_consume_token(CloseQuote1);
+        {if (true) return t;}
+      } else {
+        switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+        case ArgQuote1:
+          jj_consume_token(ArgQuote1);
+          t = jj_consume_token(Quote1Text);
+          jj_consume_token(CloseQuote1);
+          {if (true) return t;}
+          break;
+        default:
+          jj_la1[6] = jj_gen;
+          if (jj_2_2(2)) {
+            jj_consume_token(ArgQuote2);
+            jj_consume_token(CloseQuote2);
+            {if (true) return t;}
+          } else {
+            switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+            case ArgQuote2:
+              jj_consume_token(ArgQuote2);
+              t = jj_consume_token(Quote2Text);
+              jj_consume_token(CloseQuote2);
+              {if (true) return t;}
+              break;
+            default:
+              jj_la1[7] = jj_gen;
+              jj_consume_token(-1);
+              throw new ParseException();
+            }
+          }
+        }
+      }
+    }
+    throw new Error("Missing return statement in function");
+  }
+
+  final public Token Decl() throws ParseException {
+    Token t;
+    t = jj_consume_token(DeclName);
+    label_3:
+    while (true) {
+      switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+      case ArgName:
+      case ArgEquals:
+      case ArgValue:
+      case ArgQuote1:
+      case ArgQuote2:
+        ;
+        break;
+      default:
+        jj_la1[8] = jj_gen;
+        break label_3;
+      }
+      switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+      case ArgName:
+        jj_consume_token(ArgName);
+        break;
+      case ArgValue:
+      case ArgQuote1:
+      case ArgQuote2:
+        ArgValue();
+        break;
+      case ArgEquals:
+        jj_consume_token(ArgEquals);
+        break;
+      default:
+        jj_la1[9] = jj_gen;
+        jj_consume_token(-1);
+        throw new ParseException();
+      }
+    }
+    jj_consume_token(TagEnd);
+    {if (true) return t;}
+    throw new Error("Missing return statement in function");
+  }
+
+  final public void CommentTag() throws ParseException {
+    switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+    case Comment1:
+      jj_consume_token(Comment1);
+      label_4:
+      while (true) {
+        switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+        case CommentText1:
+          ;
+          break;
+        default:
+          jj_la1[10] = jj_gen;
+          break label_4;
+        }
+        jj_consume_token(CommentText1);
+      }
+      jj_consume_token(CommentEnd1);
+      break;
+    case Comment2:
+      jj_consume_token(Comment2);
+      label_5:
+      while (true) {
+        switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
+        case CommentText2:
+          ;
+          break;
+        default:
+          jj_la1[11] = jj_gen;
+          break label_5;
+        }
+        jj_consume_token(CommentText2);
+      }
+      jj_consume_token(CommentEnd2);
+      break;
+    default:
+      jj_la1[12] = jj_gen;
+      jj_consume_token(-1);
+      throw new ParseException();
+    }
+  }
+
+  final private boolean jj_2_1(int xla) {
+    jj_la = xla; jj_lastpos = jj_scanpos = token;
+    try { return !jj_3_1(); }
+    catch(LookaheadSuccess ls) { return true; }
+    finally { jj_save(0, xla); }
+  }
+
+  final private boolean jj_2_2(int xla) {
+    jj_la = xla; jj_lastpos = jj_scanpos = token;
+    try { return !jj_3_2(); }
+    catch(LookaheadSuccess ls) { return true; }
+    finally { jj_save(1, xla); }
+  }
+
+  final private boolean jj_3_1() {
+    if (jj_scan_token(ArgQuote1)) return true;
+    if (jj_scan_token(CloseQuote1)) return true;
+    return false;
+  }
+
+  final private boolean jj_3_2() {
+    if (jj_scan_token(ArgQuote2)) return true;
+    if (jj_scan_token(CloseQuote2)) return true;
+    return false;
+  }
+
+  public HTMLParserTokenManager token_source;
+  SimpleCharStream jj_input_stream;
+  public Token token, jj_nt;
+  private int jj_ntk;
+  private Token jj_scanpos, jj_lastpos;
+  private int jj_la;
+  public boolean lookingAhead = false;
+  private boolean jj_semLA;
+  private int jj_gen;
+  final private int[] jj_la1 = new int[13];
+  static private int[] jj_la1_0;
+  static {
+    jj_la1_0();
+  }
+  private static void jj_la1_0() {
+    jj_la1_0 = new int[] {0xb3e,0xb3e,0x1000,0x38000,0x2000,0x8000,0x10000,0x20000,0x3b000,0x3b000,0x800000,0x2000000,0x18,};
+  }
+  final private JJCalls[] jj_2_rtns = new JJCalls[2];
+  private boolean jj_rescan = false;
+  private int jj_gc = 0;
+
+  public HTMLParser(java.io.InputStream stream) {
+    jj_input_stream = new SimpleCharStream(stream, 1, 1);
+    token_source = new HTMLParserTokenManager(jj_input_stream);
+    token = new Token();
+    jj_ntk = -1;
+    jj_gen = 0;
+    for (int i = 0; i < 13; i++) jj_la1[i] = -1;
+    for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
+  }
+
+  public void ReInit(java.io.InputStream stream) {
+    jj_input_stream.ReInit(stream, 1, 1);
+    token_source.ReInit(jj_input_stream);
+    token = new Token();
+    jj_ntk = -1;
+    jj_gen = 0;
+    for (int i = 0; i < 13; i++) jj_la1[i] = -1;
+    for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
+  }
+
+  public HTMLParser(java.io.Reader stream) {
+    jj_input_stream = new SimpleCharStream(stream, 1, 1);
+    token_source = new HTMLParserTokenManager(jj_input_stream);
+    token = new Token();
+    jj_ntk = -1;
+    jj_gen = 0;
+    for (int i = 0; i < 13; i++) jj_la1[i] = -1;
+    for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
+  }
+
+  public void ReInit(java.io.Reader stream) {
+    jj_input_stream.ReInit(stream, 1, 1);
+    token_source.ReInit(jj_input_stream);
+    token = new Token();
+    jj_ntk = -1;
+    jj_gen = 0;
+    for (int i = 0; i < 13; i++) jj_la1[i] = -1;
+    for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
+  }
+
+  public HTMLParser(HTMLParserTokenManager tm) {
+    token_source = tm;
+    token = new Token();
+    jj_ntk = -1;
+    jj_gen = 0;
+    for (int i = 0; i < 13; i++) jj_la1[i] = -1;
+    for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
+  }
+
+  public void ReInit(HTMLParserTokenManager tm) {
+    token_source = tm;
+    token = new Token();
+    jj_ntk = -1;
+    jj_gen = 0;
+    for (int i = 0; i < 13; i++) jj_la1[i] = -1;
+    for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
+  }
+
+  final private Token jj_consume_token(int kind) throws ParseException {
+    Token oldToken;
+    if ((oldToken = token).next != null) token = token.next;
+    else token = token.next = token_source.getNextToken();
+    jj_ntk = -1;
+    if (token.kind == kind) {
+      jj_gen++;
+      if (++jj_gc > 100) {
+        jj_gc = 0;
+        for (int i = 0; i < jj_2_rtns.length; i++) {
+          JJCalls c = jj_2_rtns[i];
+          while (c != null) {
+            if (c.gen < jj_gen) c.first = null;
+            c = c.next;
+          }
+        }
+      }
+      return token;
+    }
+    token = oldToken;
+    jj_kind = kind;
+    throw generateParseException();
+  }
+
+  static private final class LookaheadSuccess extends java.lang.Error { }
+  final private LookaheadSuccess jj_ls = new LookaheadSuccess();
+  final private boolean jj_scan_token(int kind) {
+    if (jj_scanpos == jj_lastpos) {
+      jj_la--;
+      if (jj_scanpos.next == null) {
+        jj_lastpos = jj_scanpos = jj_scanpos.next = token_source.getNextToken();
+      } else {
+        jj_lastpos = jj_scanpos = jj_scanpos.next;
+      }
+    } else {
+      jj_scanpos = jj_scanpos.next;
+    }
+    if (jj_rescan) {
+      int i = 0; Token tok = token;
+      while (tok != null && tok != jj_scanpos) { i++; tok = tok.next; }
+      if (tok != null) jj_add_error_token(kind, i);
+    }
+    if (jj_scanpos.kind != kind) return true;
+    if (jj_la == 0 && jj_scanpos == jj_lastpos) throw jj_ls;
+    return false;
+  }
+
+  final public Token getNextToken() {
+    if (token.next != null) token = token.next;
+    else token = token.next = token_source.getNextToken();
+    jj_ntk = -1;
+    jj_gen++;
+    return token;
+  }
+
+  final public Token getToken(int index) {
+    Token t = lookingAhead ? jj_scanpos : token;
+    for (int i = 0; i < index; i++) {
+      if (t.next != null) t = t.next;
+      else t = t.next = token_source.getNextToken();
+    }
+    return t;
+  }
+
+  final private int jj_ntk() {
+    if ((jj_nt=token.next) == null)
+      return (jj_ntk = (token.next=token_source.getNextToken()).kind);
+    else
+      return (jj_ntk = jj_nt.kind);
+  }
+
+  private java.util.Vector jj_expentries = new java.util.Vector();
+  private int[] jj_expentry;
+  private int jj_kind = -1;
+  private int[] jj_lasttokens = new int[100];
+  private int jj_endpos;
+
+  private void jj_add_error_token(int kind, int pos) {
+    if (pos >= 100) return;
+    if (pos == jj_endpos + 1) {
+      jj_lasttokens[jj_endpos++] = kind;
+    } else if (jj_endpos != 0) {
+      jj_expentry = new int[jj_endpos];
+      for (int i = 0; i < jj_endpos; i++) {
+        jj_expentry[i] = jj_lasttokens[i];
+      }
+      boolean exists = false;
+      for (java.util.Enumeration e = jj_expentries.elements(); e.hasMoreElements();) {
+        int[] oldentry = (int[])(e.nextElement());
+        if (oldentry.length == jj_expentry.length) {
+          exists = true;
+          for (int i = 0; i < jj_expentry.length; i++) {
+            if (oldentry[i] != jj_expentry[i]) {
+              exists = false;
+              break;
+            }
+          }
+          if (exists) break;
+        }
+      }
+      if (!exists) jj_expentries.addElement(jj_expentry);
+      if (pos != 0) jj_lasttokens[(jj_endpos = pos) - 1] = kind;
+    }
+  }
|
||||
|
||||
public ParseException generateParseException() {
|
||||
jj_expentries.removeAllElements();
|
||||
boolean[] la1tokens = new boolean[27];
|
||||
for (int i = 0; i < 27; i++) {
|
||||
la1tokens[i] = false;
|
||||
}
|
||||
if (jj_kind >= 0) {
|
||||
la1tokens[jj_kind] = true;
|
||||
jj_kind = -1;
|
||||
}
|
||||
for (int i = 0; i < 13; i++) {
|
||||
if (jj_la1[i] == jj_gen) {
|
||||
for (int j = 0; j < 32; j++) {
|
||||
if ((jj_la1_0[i] & (1<<j)) != 0) {
|
||||
la1tokens[j] = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
for (int i = 0; i < 27; i++) {
|
||||
if (la1tokens[i]) {
|
||||
jj_expentry = new int[1];
|
||||
jj_expentry[0] = i;
|
||||
jj_expentries.addElement(jj_expentry);
|
||||
}
|
||||
}
|
||||
jj_endpos = 0;
|
||||
jj_rescan_token();
|
||||
jj_add_error_token(0, 0);
|
||||
int[][] exptokseq = new int[jj_expentries.size()][];
|
||||
for (int i = 0; i < jj_expentries.size(); i++) {
|
||||
exptokseq[i] = (int[])jj_expentries.elementAt(i);
|
||||
}
|
||||
return new ParseException(token, exptokseq, tokenImage);
|
||||
}
|
||||
|
||||
final public void enable_tracing() {
|
||||
}
|
||||
|
||||
final public void disable_tracing() {
|
||||
}
|
||||
|
||||
final private void jj_rescan_token() {
|
||||
jj_rescan = true;
|
||||
for (int i = 0; i < 2; i++) {
|
||||
JJCalls p = jj_2_rtns[i];
|
||||
do {
|
||||
if (p.gen > jj_gen) {
|
||||
jj_la = p.arg; jj_lastpos = jj_scanpos = p.first;
|
||||
switch (i) {
|
||||
case 0: jj_3_1(); break;
|
||||
case 1: jj_3_2(); break;
|
||||
}
|
||||
}
|
||||
p = p.next;
|
||||
} while (p != null);
|
||||
}
|
||||
jj_rescan = false;
|
||||
}
|
||||
|
||||
final private void jj_save(int index, int xla) {
|
||||
JJCalls p = jj_2_rtns[index];
|
||||
while (p.gen > jj_gen) {
|
||||
if (p.next == null) { p = p.next = new JJCalls(); break; }
|
||||
p = p.next;
|
||||
}
|
||||
p.gen = jj_gen + xla - jj_la; p.first = token; p.arg = xla;
|
||||
}
|
||||
|
||||
static final class JJCalls {
|
||||
int gen;
|
||||
Token first;
|
||||
int arg;
|
||||
JJCalls next;
|
||||
}
|
||||
|
||||
// void handleException(Exception e) {
|
||||
// System.out.println(e.toString()); // print the error message
|
||||
// System.out.println("Skipping...");
|
||||
// Token t;
|
||||
// do {
|
||||
// t = getNextToken();
|
||||
// } while (t.kind != TagEnd);
|
||||
// }
|
||||
}
|
|
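The getNextToken and getToken methods in the parser above share one singly linked token list: lookahead appends tokens to the list on demand, and consuming a token only advances along it. A minimal sketch of that idea follows. The class name LookaheadSketch, the nested Tok type, and the Iterator-based token source are hypothetical stand-ins for illustration, not part of Lucene or JavaCC.

```java
import java.util.Iterator;

// Hypothetical, simplified model of the lookahead list used by the generated parser.
public class LookaheadSketch {
    // Stand-in for the generated Token class: an image plus a link to the next token.
    static final class Tok {
        final String image;
        Tok next;
        Tok(String image) { this.image = image; }
    }

    private final Iterator<String> source; // stand-in for token_source
    private Tok token = new Tok("");       // last consumed token (dummy head at start)

    LookaheadSketch(Iterator<String> source) { this.source = source; }

    // Pulls one more token from the source; "<EOF>" once the source is exhausted.
    private Tok fetch() {
        return new Tok(source.hasNext() ? source.next() : "<EOF>");
    }

    // Consumes a token: reuse one already fetched by lookahead, else fetch a new one.
    Tok getNextToken() {
        if (token.next != null) token = token.next;
        else token = token.next = fetch();
        return token;
    }

    // Peeks index tokens ahead without consuming; index 0 is the current token.
    Tok getToken(int index) {
        Tok t = token;
        for (int i = 0; i < index; i++) {
            if (t.next != null) t = t.next;
            else t = t.next = fetch();
        }
        return t;
    }
}
```

Because peeked tokens stay linked behind the current one, a later getNextToken call returns the same objects that lookahead already produced instead of reading the source again.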
@ -0,0 +1,71 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. HTMLParserConstants.java */
|
||||
package org.apache.lucene.demo.html;
|
||||
|
||||
public interface HTMLParserConstants {
|
||||
|
||||
int EOF = 0;
|
||||
int TagName = 1;
|
||||
int DeclName = 2;
|
||||
int Comment1 = 3;
|
||||
int Comment2 = 4;
|
||||
int Word = 5;
|
||||
int LET = 6;
|
||||
int NUM = 7;
|
||||
int Entity = 8;
|
||||
int Space = 9;
|
||||
int SP = 10;
|
||||
int Punct = 11;
|
||||
int ArgName = 12;
|
||||
int ArgEquals = 13;
|
||||
int TagEnd = 14;
|
||||
int ArgValue = 15;
|
||||
int ArgQuote1 = 16;
|
||||
int ArgQuote2 = 17;
|
||||
int Quote1Text = 19;
|
||||
int CloseQuote1 = 20;
|
||||
int Quote2Text = 21;
|
||||
int CloseQuote2 = 22;
|
||||
int CommentText1 = 23;
|
||||
int CommentEnd1 = 24;
|
||||
int CommentText2 = 25;
|
||||
int CommentEnd2 = 26;
|
||||
|
||||
int DEFAULT = 0;
|
||||
int WithinTag = 1;
|
||||
int AfterEquals = 2;
|
||||
int WithinQuote1 = 3;
|
||||
int WithinQuote2 = 4;
|
||||
int WithinComment1 = 5;
|
||||
int WithinComment2 = 6;
|
||||
|
||||
String[] tokenImage = {
|
||||
"<EOF>",
|
||||
"<TagName>",
|
||||
"<DeclName>",
|
||||
"\"<!--\"",
|
||||
"\"<!\"",
|
||||
"<Word>",
|
||||
"<LET>",
|
||||
"<NUM>",
|
||||
"<Entity>",
|
||||
"<Space>",
|
||||
"<SP>",
|
||||
"<Punct>",
|
||||
"<ArgName>",
|
||||
"\"=\"",
|
||||
"<TagEnd>",
|
||||
"<ArgValue>",
|
||||
"\"\\\'\"",
|
||||
"\"\\\"\"",
|
||||
"<token of kind 18>",
|
||||
"<Quote1Text>",
|
||||
"<CloseQuote1>",
|
||||
"<Quote2Text>",
|
||||
"<CloseQuote2>",
|
||||
"<CommentText1>",
|
||||
"\"-->\"",
|
||||
"<CommentText2>",
|
||||
"\">\"",
|
||||
};
|
||||
|
||||
}
|
File diff suppressed because it is too large
|
@ -0,0 +1,192 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. ParseException.java Version 3.0 */
|
||||
package org.apache.lucene.demo.html;
|
||||
|
||||
/**
|
||||
* This exception is thrown when parse errors are encountered.
|
||||
* You can explicitly create objects of this exception type by
|
||||
* calling the method generateParseException in the generated
|
||||
* parser.
|
||||
*
|
||||
* You can modify this class to customize your error reporting
|
||||
* mechanisms so long as you retain the public fields.
|
||||
*/
|
||||
public class ParseException extends Exception {
|
||||
|
||||
/**
|
||||
* This constructor is used by the method "generateParseException"
|
||||
* in the generated parser. Calling this constructor generates
|
||||
* a new object of this type with the fields "currentToken",
|
||||
* "expectedTokenSequences", and "tokenImage" set. The boolean
|
||||
* flag "specialConstructor" is also set to true to indicate that
|
||||
* this constructor was used to create this object.
|
||||
* This constructor calls its super class with the empty string
|
||||
* to force the "toString" method of parent class "Throwable" to
|
||||
* print the error message in the form:
|
||||
* ParseException: <result of getMessage>
|
||||
*/
|
||||
public ParseException(Token currentTokenVal,
|
||||
int[][] expectedTokenSequencesVal,
|
||||
String[] tokenImageVal
|
||||
)
|
||||
{
|
||||
super("");
|
||||
specialConstructor = true;
|
||||
currentToken = currentTokenVal;
|
||||
expectedTokenSequences = expectedTokenSequencesVal;
|
||||
tokenImage = tokenImageVal;
|
||||
}
|
||||
|
||||
/**
|
||||
* The following constructors are for use by you for whatever
|
||||
* purpose you can think of. Constructing the exception in this
|
||||
* manner makes the exception behave in the normal way - i.e., as
|
||||
* documented in the class "Throwable". The fields "currentToken",
|
||||
* "expectedTokenSequences", and "tokenImage" do not contain
|
||||
* relevant information. The JavaCC generated code does not use
|
||||
* these constructors.
|
||||
*/
|
||||
|
||||
public ParseException() {
|
||||
super();
|
||||
specialConstructor = false;
|
||||
}
|
||||
|
||||
public ParseException(String message) {
|
||||
super(message);
|
||||
specialConstructor = false;
|
||||
}
|
||||
|
||||
/**
|
||||
* This variable determines which constructor was used to create
|
||||
* this object and thereby affects the semantics of the
|
||||
* "getMessage" method (see below).
|
||||
*/
|
||||
protected boolean specialConstructor;
|
||||
|
||||
/**
|
||||
* This is the last token that has been consumed successfully. If
|
||||
* this object has been created due to a parse error, the token
|
||||
* following this token will (therefore) be the first error token.
|
||||
*/
|
||||
public Token currentToken;
|
||||
|
||||
/**
|
||||
* Each entry in this array is an array of integers. Each array
|
||||
* of integers represents a sequence of tokens (by their ordinal
|
||||
* values) that is expected at this point of the parse.
|
||||
*/
|
||||
public int[][] expectedTokenSequences;
|
||||
|
||||
/**
|
||||
* This is a reference to the "tokenImage" array of the generated
|
||||
* parser within which the parse error occurred. This array is
|
||||
* defined in the generated ...Constants interface.
|
||||
*/
|
||||
public String[] tokenImage;
|
||||
|
||||
/**
|
||||
* This method has the standard behavior when this object has been
|
||||
* created using the standard constructors. Otherwise, it uses
|
||||
* "currentToken" and "expectedTokenSequences" to generate a parse
|
||||
* error message and returns it. If this object has been created
|
||||
* due to a parse error, and you do not catch it (it gets thrown
|
||||
* from the parser), then this method is called during the printing
|
||||
* of the final stack trace, and hence the correct error message
|
||||
* gets displayed.
|
||||
*/
|
||||
public String getMessage() {
|
||||
if (!specialConstructor) {
|
||||
return super.getMessage();
|
||||
}
|
||||
String expected = "";
|
||||
int maxSize = 0;
|
||||
for (int i = 0; i < expectedTokenSequences.length; i++) {
|
||||
if (maxSize < expectedTokenSequences[i].length) {
|
||||
maxSize = expectedTokenSequences[i].length;
|
||||
}
|
||||
for (int j = 0; j < expectedTokenSequences[i].length; j++) {
|
||||
expected += tokenImage[expectedTokenSequences[i][j]] + " ";
|
||||
}
|
||||
if (expectedTokenSequences[i][expectedTokenSequences[i].length - 1] != 0) {
|
||||
expected += "...";
|
||||
}
|
||||
expected += eol + " ";
|
||||
}
|
||||
String retval = "Encountered \"";
|
||||
Token tok = currentToken.next;
|
||||
for (int i = 0; i < maxSize; i++) {
|
||||
if (i != 0) retval += " ";
|
||||
if (tok.kind == 0) {
|
||||
retval += tokenImage[0];
|
||||
break;
|
||||
}
|
||||
retval += add_escapes(tok.image);
|
||||
tok = tok.next;
|
||||
}
|
||||
retval += "\" at line " + currentToken.next.beginLine + ", column " + currentToken.next.beginColumn;
|
||||
retval += "." + eol;
|
||||
if (expectedTokenSequences.length == 1) {
|
||||
retval += "Was expecting:" + eol + " ";
|
||||
} else {
|
||||
retval += "Was expecting one of:" + eol + " ";
|
||||
}
|
||||
retval += expected;
|
||||
return retval;
|
||||
}
|
||||
|
||||
/**
|
||||
* The end of line string for this machine.
|
||||
*/
|
||||
protected String eol = System.getProperty("line.separator", "\n");
|
||||
|
||||
/**
|
||||
* Used to convert raw characters to their escaped version
|
||||
* when these raw versions cannot be used as part of an ASCII
|
||||
* string literal.
|
||||
*/
|
||||
protected String add_escapes(String str) {
|
||||
StringBuffer retval = new StringBuffer();
|
||||
char ch;
|
||||
for (int i = 0; i < str.length(); i++) {
|
||||
switch (str.charAt(i))
|
||||
{
|
||||
case 0 :
|
||||
continue;
|
||||
case '\b':
|
||||
retval.append("\\b");
|
||||
continue;
|
||||
case '\t':
|
||||
retval.append("\\t");
|
||||
continue;
|
||||
case '\n':
|
||||
retval.append("\\n");
|
||||
continue;
|
||||
case '\f':
|
||||
retval.append("\\f");
|
||||
continue;
|
||||
case '\r':
|
||||
retval.append("\\r");
|
||||
continue;
|
||||
case '\"':
|
||||
retval.append("\\\"");
|
||||
continue;
|
||||
case '\'':
|
||||
retval.append("\\\'");
|
||||
continue;
|
||||
case '\\':
|
||||
retval.append("\\\\");
|
||||
continue;
|
||||
default:
|
||||
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
|
||||
String s = "0000" + Integer.toString(ch, 16);
|
||||
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
|
||||
} else {
|
||||
retval.append(ch);
|
||||
}
|
||||
continue;
|
||||
}
|
||||
}
|
||||
return retval.toString();
|
||||
}
|
||||
|
||||
}
|
|
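The escaping done by add_escapes in ParseException above can be tried in isolation. The sketch below restates the same rules in a standalone class; the name EscapeSketch is hypothetical and the class is only an illustration, not the generated code itself.

```java
// Hypothetical standalone restatement of the escaping rules used by add_escapes.
public class EscapeSketch {
    static String escape(String str) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            switch (ch) {
                case 0:    break;                        // NUL characters are dropped
                case '\b': out.append("\\b");  break;
                case '\t': out.append("\\t");  break;
                case '\n': out.append("\\n");  break;
                case '\f': out.append("\\f");  break;
                case '\r': out.append("\\r");  break;
                case '\"': out.append("\\\""); break;
                case '\'': out.append("\\'");  break;
                case '\\': out.append("\\\\"); break;
                default:
                    if (ch < 0x20 || ch > 0x7e) {
                        // Outside printable ASCII: emit a four-digit hex escape,
                        // left-padded with zeros.
                        String s = "0000" + Integer.toString(ch, 16);
                        out.append("\\u").append(s.substring(s.length() - 4));
                    } else {
                        out.append(ch);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A tab and a quote become two-character escapes in the output.
        System.out.println(escape("a\tb \"quoted\""));
    }
}
```

The zero-padding step is what keeps the hex escape at a fixed width of four digits regardless of the character value.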
@ -0,0 +1,401 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. SimpleCharStream.java Version 3.0 */
|
||||
package org.apache.lucene.demo.html;
|
||||
|
||||
/**
|
||||
* An implementation of interface CharStream, where the stream is assumed to
|
||||
* contain only ASCII characters (without unicode processing).
|
||||
*/
|
||||
|
||||
public class SimpleCharStream
|
||||
{
|
||||
public static final boolean staticFlag = false;
|
||||
int bufsize;
|
||||
int available;
|
||||
int tokenBegin;
|
||||
public int bufpos = -1;
|
||||
protected int bufline[];
|
||||
protected int bufcolumn[];
|
||||
|
||||
protected int column = 0;
|
||||
protected int line = 1;
|
||||
|
||||
protected boolean prevCharIsCR = false;
|
||||
protected boolean prevCharIsLF = false;
|
||||
|
||||
protected java.io.Reader inputStream;
|
||||
|
||||
protected char[] buffer;
|
||||
protected int maxNextCharInd = 0;
|
||||
protected int inBuf = 0;
|
||||
|
||||
protected void ExpandBuff(boolean wrapAround)
|
||||
{
|
||||
char[] newbuffer = new char[bufsize + 2048];
|
||||
int newbufline[] = new int[bufsize + 2048];
|
||||
int newbufcolumn[] = new int[bufsize + 2048];
|
||||
|
||||
try
|
||||
{
|
||||
if (wrapAround)
|
||||
{
|
||||
System.arraycopy(buffer, tokenBegin, newbuffer, 0, bufsize - tokenBegin);
|
||||
System.arraycopy(buffer, 0, newbuffer,
|
||||
bufsize - tokenBegin, bufpos);
|
||||
buffer = newbuffer;
|
||||
|
||||
System.arraycopy(bufline, tokenBegin, newbufline, 0, bufsize - tokenBegin);
|
||||
System.arraycopy(bufline, 0, newbufline, bufsize - tokenBegin, bufpos);
|
||||
bufline = newbufline;
|
||||
|
||||
System.arraycopy(bufcolumn, tokenBegin, newbufcolumn, 0, bufsize - tokenBegin);
|
||||
System.arraycopy(bufcolumn, 0, newbufcolumn, bufsize - tokenBegin, bufpos);
|
||||
bufcolumn = newbufcolumn;
|
||||
|
||||
maxNextCharInd = (bufpos += (bufsize - tokenBegin));
|
||||
}
|
||||
else
|
||||
{
|
||||
System.arraycopy(buffer, tokenBegin, newbuffer, 0, bufsize - tokenBegin);
|
||||
buffer = newbuffer;
|
||||
|
||||
System.arraycopy(bufline, tokenBegin, newbufline, 0, bufsize - tokenBegin);
|
||||
bufline = newbufline;
|
||||
|
||||
System.arraycopy(bufcolumn, tokenBegin, newbufcolumn, 0, bufsize - tokenBegin);
|
||||
bufcolumn = newbufcolumn;
|
||||
|
||||
maxNextCharInd = (bufpos -= tokenBegin);
|
||||
}
|
||||
}
|
||||
catch (Throwable t)
|
||||
{
|
||||
throw new Error(t.getMessage());
|
||||
}
|
||||
|
||||
|
||||
bufsize += 2048;
|
||||
available = bufsize;
|
||||
tokenBegin = 0;
|
||||
}
|
||||
|
||||
protected void FillBuff() throws java.io.IOException
|
||||
{
|
||||
if (maxNextCharInd == available)
|
||||
{
|
||||
if (available == bufsize)
|
||||
{
|
||||
if (tokenBegin > 2048)
|
||||
{
|
||||
bufpos = maxNextCharInd = 0;
|
||||
available = tokenBegin;
|
||||
}
|
||||
else if (tokenBegin < 0)
|
||||
bufpos = maxNextCharInd = 0;
|
||||
else
|
||||
ExpandBuff(false);
|
||||
}
|
||||
else if (available > tokenBegin)
|
||||
available = bufsize;
|
||||
else if ((tokenBegin - available) < 2048)
|
||||
ExpandBuff(true);
|
||||
else
|
||||
available = tokenBegin;
|
||||
}
|
||||
|
||||
int i;
|
||||
try {
|
||||
if ((i = inputStream.read(buffer, maxNextCharInd,
|
||||
available - maxNextCharInd)) == -1)
|
||||
{
|
||||
inputStream.close();
|
||||
throw new java.io.IOException();
|
||||
}
|
||||
else
|
||||
maxNextCharInd += i;
|
||||
return;
|
||||
}
|
||||
catch(java.io.IOException e) {
|
||||
--bufpos;
|
||||
backup(0);
|
||||
if (tokenBegin == -1)
|
||||
tokenBegin = bufpos;
|
||||
throw e;
|
||||
}
|
||||
}
|
||||
|
||||
public char BeginToken() throws java.io.IOException
|
||||
{
|
||||
tokenBegin = -1;
|
||||
char c = readChar();
|
||||
tokenBegin = bufpos;
|
||||
|
||||
return c;
|
||||
}
|
||||
|
||||
protected void UpdateLineColumn(char c)
|
||||
{
|
||||
column++;
|
||||
|
||||
if (prevCharIsLF)
|
||||
{
|
||||
prevCharIsLF = false;
|
||||
line += (column = 1);
|
||||
}
|
||||
else if (prevCharIsCR)
|
||||
{
|
||||
prevCharIsCR = false;
|
||||
if (c == '\n')
|
||||
{
|
||||
prevCharIsLF = true;
|
||||
}
|
||||
else
|
||||
line += (column = 1);
|
||||
}
|
||||
|
||||
switch (c)
|
||||
{
|
||||
case '\r' :
|
||||
prevCharIsCR = true;
|
||||
break;
|
||||
case '\n' :
|
||||
prevCharIsLF = true;
|
||||
break;
|
||||
case '\t' :
|
||||
column--;
|
||||
column += (8 - (column & 07));
|
||||
break;
|
||||
default :
|
||||
break;
|
||||
}
|
||||
|
||||
bufline[bufpos] = line;
|
||||
bufcolumn[bufpos] = column;
|
||||
}
|
||||
|
||||
public char readChar() throws java.io.IOException
|
||||
{
|
||||
if (inBuf > 0)
|
||||
{
|
||||
--inBuf;
|
||||
|
||||
if (++bufpos == bufsize)
|
||||
bufpos = 0;
|
||||
|
||||
return buffer[bufpos];
|
||||
}
|
||||
|
||||
if (++bufpos >= maxNextCharInd)
|
||||
FillBuff();
|
||||
|
||||
char c = buffer[bufpos];
|
||||
|
||||
UpdateLineColumn(c);
|
||||
return (c);
|
||||
}
|
||||
|
||||
/**
|
||||
* @deprecated
|
||||
* @see #getEndColumn
|
||||
*/
|
||||
|
||||
public int getColumn() {
|
||||
return bufcolumn[bufpos];
|
||||
}
|
||||
|
||||
/**
|
||||
* @deprecated
|
||||
* @see #getEndLine
|
||||
*/
|
||||
|
||||
public int getLine() {
|
||||
return bufline[bufpos];
|
||||
}
|
||||
|
||||
public int getEndColumn() {
|
||||
return bufcolumn[bufpos];
|
||||
}
|
||||
|
||||
public int getEndLine() {
|
||||
return bufline[bufpos];
|
||||
}
|
||||
|
||||
public int getBeginColumn() {
|
||||
return bufcolumn[tokenBegin];
|
||||
}
|
||||
|
||||
public int getBeginLine() {
|
||||
return bufline[tokenBegin];
|
||||
}
|
||||
|
||||
public void backup(int amount) {
|
||||
|
||||
inBuf += amount;
|
||||
if ((bufpos -= amount) < 0)
|
||||
bufpos += bufsize;
|
||||
}
|
||||
|
||||
public SimpleCharStream(java.io.Reader dstream, int startline,
|
||||
int startcolumn, int buffersize)
|
||||
{
|
||||
inputStream = dstream;
|
||||
line = startline;
|
||||
column = startcolumn - 1;
|
||||
|
||||
available = bufsize = buffersize;
|
||||
buffer = new char[buffersize];
|
||||
bufline = new int[buffersize];
|
||||
bufcolumn = new int[buffersize];
|
||||
}
|
||||
|
||||
public SimpleCharStream(java.io.Reader dstream, int startline,
|
||||
int startcolumn)
|
||||
{
|
||||
this(dstream, startline, startcolumn, 4096);
|
||||
}
|
||||
|
||||
public SimpleCharStream(java.io.Reader dstream)
|
||||
{
|
||||
this(dstream, 1, 1, 4096);
|
||||
}
|
||||
public void ReInit(java.io.Reader dstream, int startline,
|
||||
int startcolumn, int buffersize)
|
||||
{
|
||||
inputStream = dstream;
|
||||
line = startline;
|
||||
column = startcolumn - 1;
|
||||
|
||||
if (buffer == null || buffersize != buffer.length)
|
||||
{
|
||||
available = bufsize = buffersize;
|
||||
buffer = new char[buffersize];
|
||||
bufline = new int[buffersize];
|
||||
bufcolumn = new int[buffersize];
|
||||
}
|
||||
prevCharIsLF = prevCharIsCR = false;
|
||||
tokenBegin = inBuf = maxNextCharInd = 0;
|
||||
bufpos = -1;
|
||||
}
|
||||
|
||||
public void ReInit(java.io.Reader dstream, int startline,
|
||||
int startcolumn)
|
||||
{
|
||||
ReInit(dstream, startline, startcolumn, 4096);
|
||||
}
|
||||
|
||||
public void ReInit(java.io.Reader dstream)
|
||||
{
|
||||
ReInit(dstream, 1, 1, 4096);
|
||||
}
|
||||
public SimpleCharStream(java.io.InputStream dstream, int startline,
|
||||
int startcolumn, int buffersize)
|
||||
{
|
||||
this(new java.io.InputStreamReader(dstream), startline, startcolumn, buffersize);
|
||||
}
|
||||
|
||||
public SimpleCharStream(java.io.InputStream dstream, int startline,
|
||||
int startcolumn)
|
||||
{
|
||||
this(dstream, startline, startcolumn, 4096);
|
||||
}
|
||||
|
||||
public SimpleCharStream(java.io.InputStream dstream)
|
||||
{
|
||||
this(dstream, 1, 1, 4096);
|
||||
}
|
||||
|
||||
public void ReInit(java.io.InputStream dstream, int startline,
|
||||
int startcolumn, int buffersize)
|
||||
{
|
||||
ReInit(new java.io.InputStreamReader(dstream), startline, startcolumn, buffersize);
|
||||
}
|
||||
|
||||
public void ReInit(java.io.InputStream dstream)
|
||||
{
|
||||
ReInit(dstream, 1, 1, 4096);
|
||||
}
|
||||
public void ReInit(java.io.InputStream dstream, int startline,
|
||||
int startcolumn)
|
||||
{
|
||||
ReInit(dstream, startline, startcolumn, 4096);
|
||||
}
|
||||
public String GetImage()
|
||||
{
|
||||
if (bufpos >= tokenBegin)
|
||||
return new String(buffer, tokenBegin, bufpos - tokenBegin + 1);
|
||||
else
|
||||
return new String(buffer, tokenBegin, bufsize - tokenBegin) +
|
||||
new String(buffer, 0, bufpos + 1);
|
||||
}
|
||||
|
||||
public char[] GetSuffix(int len)
|
||||
{
|
||||
char[] ret = new char[len];
|
||||
|
||||
if ((bufpos + 1) >= len)
|
||||
System.arraycopy(buffer, bufpos - len + 1, ret, 0, len);
|
||||
else
|
||||
{
|
||||
System.arraycopy(buffer, bufsize - (len - bufpos - 1), ret, 0,
|
||||
len - bufpos - 1);
|
||||
System.arraycopy(buffer, 0, ret, len - bufpos - 1, bufpos + 1);
|
||||
}
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
public void Done()
|
||||
{
|
||||
buffer = null;
|
||||
bufline = null;
|
||||
bufcolumn = null;
|
||||
}
|
||||
|
||||
/**
|
||||
* Method to adjust line and column numbers for the start of a token.
|
||||
*/
|
||||
public void adjustBeginLineColumn(int newLine, int newCol)
|
||||
{
|
||||
int start = tokenBegin;
|
||||
int len;
|
||||
|
||||
if (bufpos >= tokenBegin)
|
||||
{
|
||||
len = bufpos - tokenBegin + inBuf + 1;
|
||||
}
|
||||
else
|
||||
{
|
||||
len = bufsize - tokenBegin + bufpos + 1 + inBuf;
|
||||
}
|
||||
|
||||
int i = 0, j = 0, k = 0;
|
||||
int nextColDiff = 0, columnDiff = 0;
|
||||
|
||||
while (i < len &&
|
||||
bufline[j = start % bufsize] == bufline[k = ++start % bufsize])
|
||||
{
|
||||
bufline[j] = newLine;
|
||||
nextColDiff = columnDiff + bufcolumn[k] - bufcolumn[j];
|
||||
bufcolumn[j] = newCol + columnDiff;
|
||||
columnDiff = nextColDiff;
|
||||
i++;
|
||||
}
|
||||
|
||||
if (i < len)
|
||||
{
|
||||
bufline[j] = newLine++;
|
||||
bufcolumn[j] = newCol + columnDiff;
|
||||
|
||||
while (i++ < len)
|
||||
{
|
||||
if (bufline[j = start % bufsize] != bufline[++start % bufsize])
|
||||
bufline[j] = newLine++;
|
||||
else
|
||||
bufline[j] = newLine;
|
||||
}
|
||||
}
|
||||
|
||||
line = bufline[j];
|
||||
column = bufcolumn[j];
|
||||
}
|
||||
|
||||
}
|
|
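UpdateLineColumn in SimpleCharStream above advances the column by one per character and snaps tabs forward to the next multiple of eight. The sketch below isolates just that column arithmetic; the class TabColumnSketch and the method columnAfter are hypothetical names, and line tracking for CR/LF is deliberately left out.

```java
// Hypothetical helper isolating the column arithmetic of UpdateLineColumn.
public class TabColumnSketch {
    // Returns the column recorded for character c, given the column of the
    // previous character on the same line (0 at the start of a line).
    static int columnAfter(int column, char c) {
        column++;                           // every character advances by one
        if (c == '\t') {
            column--;                       // undo the plain advance
            column += (8 - (column & 07));  // jump to the next multiple of 8
        }
        return column;
    }

    public static void main(String[] args) {
        int col = 0;
        for (char c : "ab\tc".toCharArray()) {
            col = columnAfter(col, c);
            System.out.println((c == '\t' ? "<tab>" : String.valueOf(c)) + " -> column " + col);
        }
    }
}
```

With this scheme a tab is reported at column 8, 16, 24 and so on, so the character that follows a tab lands at column 9, 17, 25.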
@ -0,0 +1,81 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. Token.java Version 3.0 */
|
||||
package org.apache.lucene.demo.html;
|
||||
|
||||
/**
|
||||
* Describes the input token stream.
|
||||
*/
|
||||
|
||||
public class Token {
|
||||
|
||||
/**
|
||||
* An integer that describes the kind of this token. This numbering
|
||||
* system is determined by JavaCCParser, and a table of these numbers is
|
||||
* stored in the file ...Constants.java.
|
||||
*/
|
||||
public int kind;
|
||||
|
||||
/**
|
||||
* beginLine and beginColumn describe the position of the first character
|
||||
* of this token; endLine and endColumn describe the position of the
|
||||
* last character of this token.
|
||||
*/
|
||||
public int beginLine, beginColumn, endLine, endColumn;
|
||||
|
||||
/**
|
||||
* The string image of the token.
|
||||
*/
|
||||
public String image;
|
||||
|
||||
/**
|
||||
* A reference to the next regular (non-special) token from the input
|
||||
* stream. If this is the last token from the input stream, or if the
|
||||
* token manager has not read tokens beyond this one, this field is
|
||||
* set to null. This is true only if this token is also a regular
|
||||
* token. Otherwise, see below for a description of the contents of
|
||||
* this field.
|
||||
*/
|
||||
public Token next;
|
||||
|
||||
/**
|
||||
* This field is used to access special tokens that occur prior to this
|
||||
* token, but after the immediately preceding regular (non-special) token.
|
||||
* If there are no such special tokens, this field is set to null.
|
||||
* When there is more than one such special token, this field refers
|
||||
* to the last of these special tokens, which in turn refers to the next
|
||||
* previous special token through its specialToken field, and so on
|
||||
* until the first special token (whose specialToken field is null).
|
||||
* The next fields of special tokens refer to other special tokens that
|
||||
* immediately follow it (without an intervening regular token). If there
|
||||
* is no such token, this field is null.
|
||||
*/
|
||||
public Token specialToken;
|
||||
|
||||
/**
|
||||
* Returns the image.
|
||||
*/
|
||||
public String toString()
|
||||
{
|
||||
return image;
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns a new Token object, by default. However, if you want, you
|
||||
* can create and return subclass objects based on the value of ofKind.
|
||||
* Simply add the cases to the switch for all those special cases.
|
||||
* For example, if you have a subclass of Token called IDToken that
|
||||
* you want to create if ofKind is ID, simply add something like :
|
||||
*
|
||||
* case MyParserConstants.ID : return new IDToken();
|
||||
*
|
||||
* to the following switch statement. Then you can cast matchedToken
|
||||
* variable to the appropriate type and use it in your lexical actions.
|
||||
*/
|
||||
public static final Token newToken(int ofKind)
|
||||
{
|
||||
switch(ofKind)
|
||||
{
|
||||
default : return new Token();
|
||||
}
|
||||
}
|
||||
|
||||
}
|
|
@ -0,0 +1,133 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. TokenMgrError.java Version 3.0 */
|
||||
package org.apache.lucene.demo.html;
|
||||
|
||||
public class TokenMgrError extends Error
|
||||
{
|
||||
/*
|
||||
* Ordinals for various reasons why an Error of this type can be thrown.
|
||||
*/
|
||||
|
||||
/**
|
||||
* Lexical error occurred.
|
||||
*/
|
||||
static final int LEXICAL_ERROR = 0;
|
||||
|
||||
/**
|
||||
* An attempt was made to create a second instance of a static token manager.
|
||||
*/
|
||||
static final int STATIC_LEXER_ERROR = 1;
|
||||
|
||||
/**
|
||||
* Tried to change to an invalid lexical state.
|
||||
*/
|
||||
static final int INVALID_LEXICAL_STATE = 2;
|
||||
|
||||
/**
|
||||
* Detected (and bailed out of) an infinite loop in the token manager.
|
||||
*/
|
||||
static final int LOOP_DETECTED = 3;
|
||||
|
||||
/**
|
||||
* Indicates the reason why the exception is thrown. It will have
|
||||
* one of the above 4 values.
|
||||
*/
|
||||
int errorCode;
|
||||
|
||||
/**
|
||||
* Replaces unprintable characters by their escaped (or unicode escaped)
|
||||
* equivalents in the given string
|
||||
*/
|
||||
protected static final String addEscapes(String str) {
|
||||
StringBuffer retval = new StringBuffer();
|
||||
char ch;
|
||||
for (int i = 0; i < str.length(); i++) {
|
||||
switch (str.charAt(i))
|
||||
{
|
||||
case 0 :
|
||||
continue;
|
||||
case '\b':
|
||||
retval.append("\\b");
|
||||
continue;
|
||||
case '\t':
|
||||
retval.append("\\t");
|
||||
continue;
|
||||
case '\n':
|
||||
retval.append("\\n");
|
||||
continue;
|
||||
case '\f':
|
||||
retval.append("\\f");
|
||||
continue;
|
||||
case '\r':
|
||||
retval.append("\\r");
|
||||
continue;
|
||||
case '\"':
|
||||
retval.append("\\\"");
|
||||
continue;
|
||||
case '\'':
|
||||
retval.append("\\\'");
|
||||
continue;
|
||||
case '\\':
|
||||
retval.append("\\\\");
|
||||
continue;
|
||||
default:
|
||||
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
|
||||
String s = "0000" + Integer.toString(ch, 16);
|
||||
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
|
||||
} else {
|
||||
retval.append(ch);
|
||||
}
|
||||
continue;
|
||||
}
|
||||
}
|
||||
return retval.toString();
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns a detailed message for the Error when it is thrown by the
|
||||
* token manager to indicate a lexical error.
|
||||
* Parameters :
|
||||
* EOFSeen : indicates if EOF caused the lexical error
|
||||
* lexState : lexical state in which this error occurred
|
||||
* errorLine : line number when the error occurred
|
||||
* errorColumn : column number when the error occurred
|
||||
* errorAfter : prefix that was seen before this error occurred
|
||||
* curChar : the offending character
|
||||
* Note: You can customize the lexical error message by modifying this method.
|
||||
*/
|
||||
protected static String LexicalError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar) {
|
||||
return("Lexical error at line " +
|
||||
errorLine + ", column " +
|
||||
errorColumn + ". Encountered: " +
|
||||
(EOFSeen ? "<EOF> " : ("\"" + addEscapes(String.valueOf(curChar)) + "\"") + " (" + (int)curChar + "), ") +
|
||||
"after : \"" + addEscapes(errorAfter) + "\"");
|
||||
}
|
||||
|
||||
/**
|
||||
* You can also modify the body of this method to customize your error messages.
|
||||
* For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not
|
||||
* of concern to end users, so you can return something like :
|
||||
*
|
||||
* "Internal Error : Please file a bug report .... "
|
||||
*
|
||||
* from this method for such cases in the release version of your parser.
|
||||
*/
|
||||
public String getMessage() {
|
||||
return super.getMessage();
|
||||
}
|
||||
|
||||
/*
|
||||
* Constructors of various flavors follow.
|
||||
*/
|
||||
|
||||
public TokenMgrError() {
|
||||
}
|
||||
|
||||
public TokenMgrError(String message, int reason) {
|
||||
super(message);
|
||||
errorCode = reason;
|
||||
}
|
||||
|
||||
public TokenMgrError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar, int reason) {
|
||||
this(LexicalError(EOFSeen, lexState, errorLine, errorColumn, errorAfter, curChar), reason);
|
||||
}
|
||||
}
|
|
@ -1,6 +0,0 @@
|
|||
Token.java
|
||||
StandardTokenizer.java
|
||||
StandardTokenizerTokenManager.java
|
||||
TokenMgrError.java
|
||||
CharStream.java
|
||||
StandardTokenizerConstants.java
|
|
@ -0,0 +1,110 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. CharStream.java Version 3.0 */
|
||||
package org.apache.lucene.analysis.standard;
|
||||
|
||||
/**
|
||||
* This interface describes a character stream that maintains line and
|
||||
* column number positions of the characters. It also has the capability
|
||||
* to backup the stream to some extent. An implementation of this
|
||||
* interface is used in the TokenManager implementation generated by
|
||||
* JavaCCParser.
|
||||
*
|
||||
* All the methods except backup can be implemented in any fashion. backup
|
||||
* needs to be implemented correctly for the correct operation of the lexer.
|
||||
* The rest of the methods are all used to get information like line number,
|
||||
* column number and the String that constitutes a token and are not used
|
||||
* by the lexer. Hence their implementation won't affect the generated lexer's
|
||||
* operation.
|
||||
*/
|
||||
|
||||
public interface CharStream {
|
||||
|
||||
/**
|
||||
* Returns the next character from the selected input. The method
|
||||
* of selecting the input is the responsibility of the class
|
||||
* implementing this interface. Can throw any java.io.IOException.
|
||||
*/
|
||||
char readChar() throws java.io.IOException;
|
||||
|
||||
/**
|
||||
* Returns the column position of the character last read.
|
||||
* @deprecated
|
||||
* @see #getEndColumn
|
||||
*/
|
||||
int getColumn();
|
||||
|
||||
/**
|
||||
* Returns the line number of the character last read.
|
||||
* @deprecated
|
||||
* @see #getEndLine
|
||||
*/
|
||||
int getLine();
|
||||
|
||||
/**
|
||||
* Returns the column number of the last character for current token (being
|
||||
* matched after the last call to BeginToken).
|
||||
*/
|
||||
int getEndColumn();
|
||||
|
||||
/**
|
||||
* Returns the line number of the last character for current token (being
|
||||
* matched after the last call to BeginToken).
|
||||
*/
|
||||
int getEndLine();
|
||||
|
||||
/**
|
||||
* Returns the column number of the first character for current token (being
|
||||
* matched after the last call to BeginToken).
|
||||
*/
|
||||
int getBeginColumn();
|
||||
|
||||
/**
|
||||
* Returns the line number of the first character for current token (being
|
||||
* matched after the last call to BeginToken).
|
||||
*/
|
||||
int getBeginLine();
|
||||
|
||||
/**
|
||||
* Backs up the input stream by amount steps. Lexer calls this method if it
|
||||
* had already read some characters, but could not use them to match a
|
||||
* (longer) token. So, they will be used again as the prefix of the next
|
||||
* token and it is the implementation's responsibility to do this right.
|
||||
*/
|
||||
void backup(int amount);
|
||||
|
||||
/**
|
||||
* Returns the next character that marks the beginning of the next token.
|
||||
* All characters must remain in the buffer between two successive calls
|
||||
* to this method to implement backup correctly.
|
||||
*/
|
||||
char BeginToken() throws java.io.IOException;
|
||||
|
||||
/**
|
||||
* Returns a string made up of characters from the marked token beginning
|
||||
* to the current buffer position. Implementations have the choice of returning
|
||||
* anything that they want to. For example, for efficiency, one might decide
|
||||
* to just return null, which is a valid implementation.
|
||||
*/
|
||||
String GetImage();
|
||||
|
||||
/**
|
||||
* Returns an array of characters that make up the suffix of length 'len' for
|
||||
* the currently matched token. This is used to build up the matched string
|
||||
* for use in actions in the case of MORE. A simple and inefficient
|
||||
* implementation of this is as follows :
|
||||
*
|
||||
* {
|
||||
* String t = GetImage();
|
||||
* return t.substring(t.length() - len, t.length()).toCharArray();
|
||||
* }
|
||||
*/
|
||||
char[] GetSuffix(int len);
|
||||
|
||||
/**
|
||||
* The lexer calls this function to indicate that it is done with the stream
|
||||
* and hence implementations can free any resources held by this class.
|
||||
* Again, the body of this function can be just empty and it will not
|
||||
* affect the lexer's operation.
|
||||
*/
|
||||
void Done();
|
||||
|
||||
}
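The CharStream contract above leaves most methods free-form but requires backup to be exact: characters read past the end of a token must be handed out again. The following is a minimal sketch of that behaviour, assuming the whole input is already held in a String. The class name StringCharStreamSketch and its fields are hypothetical and not part of this commit; it mirrors the interface's method names without implementing it, and the line/column methods are left out.

```java
import java.io.IOException;

// Hypothetical, illustration-only stream over an in-memory String.
// It mirrors the CharStream method names but does not implement the
// interface, so the example stays self-contained.
class StringCharStreamSketch {
  private final String input;
  private int tokenStart = 0; // index marked by BeginToken()
  private int pos = 0;        // index of the next character to read

  StringCharStreamSketch(String input) {
    this.input = input;
  }

  // Returns the next character; signals end of input with an IOException.
  char readChar() throws IOException {
    if (pos >= input.length()) throw new IOException("read past eof");
    return input.charAt(pos++);
  }

  // Marks the start of a new token and returns its first character.
  char BeginToken() throws IOException {
    tokenStart = pos;
    return readChar();
  }

  // Un-reads the last 'amount' characters so they are read again.
  void backup(int amount) {
    pos -= amount;
  }

  // Characters from the marked token start up to the current position.
  String GetImage() {
    return input.substring(tokenStart, pos);
  }

  // Last 'len' characters of the current token image.
  char[] GetSuffix(int len) {
    return input.substring(pos - len, pos).toCharArray();
  }

  // Reads "ab" plus one character too many, then backs up by one.
  static String demo() {
    try {
      StringCharStreamSketch s = new StringCharStreamSketch("ab cd");
      s.BeginToken(); // reads 'a'
      s.readChar();   // reads 'b'
      s.readChar();   // reads ' ', which does not belong to the token
      s.backup(1);    // give the space back to the stream
      return s.GetImage() + "|" + new String(s.GetSuffix(1)) + "|" + s.readChar();
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Keeping the token start and the read position as plain indexes makes backup a single subtraction; a buffered implementation would have to keep the same characters available between two BeginToken calls.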
|
|
@@ -0,0 +1,195 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. StandardTokenizer.java */
|
||||
package org.apache.lucene.analysis.standard;
|
||||
|
||||
import java.io.*;
|
||||
|
||||
/** A grammar-based tokenizer constructed with JavaCC.
|
||||
*
|
||||
* <p> This should be a good tokenizer for most European-language documents.
|
||||
*
|
||||
* <p>Many applications have specific tokenizer needs. If this tokenizer does
|
||||
* not suit your application, please consider copying this source code
|
||||
* directory to your project and maintaining your own grammar-based tokenizer.
|
||||
*/
|
||||
public class StandardTokenizer extends org.apache.lucene.analysis.Tokenizer implements StandardTokenizerConstants {
|
||||
|
||||
/** Constructs a tokenizer for this Reader. */
|
||||
public StandardTokenizer(Reader reader) {
|
||||
this(new FastCharStream(reader));
|
||||
this.input = reader;
|
||||
}
|
||||
|
||||
/** Returns the next token in the stream, or null at EOS.
|
||||
* <p>The returned token's type is set to an element of {@link
|
||||
* StandardTokenizerConstants#tokenImage}.
|
||||
*/
|
||||
final public org.apache.lucene.analysis.Token next() throws ParseException, IOException {
|
||||
Token token = null;
|
||||
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
|
||||
case ALPHANUM:
|
||||
token = jj_consume_token(ALPHANUM);
|
||||
break;
|
||||
case APOSTROPHE:
|
||||
token = jj_consume_token(APOSTROPHE);
|
||||
break;
|
||||
case ACRONYM:
|
||||
token = jj_consume_token(ACRONYM);
|
||||
break;
|
||||
case COMPANY:
|
||||
token = jj_consume_token(COMPANY);
|
||||
break;
|
||||
case EMAIL:
|
||||
token = jj_consume_token(EMAIL);
|
||||
break;
|
||||
case HOST:
|
||||
token = jj_consume_token(HOST);
|
||||
break;
|
||||
case NUM:
|
||||
token = jj_consume_token(NUM);
|
||||
break;
|
||||
case 0:
|
||||
token = jj_consume_token(0);
|
||||
break;
|
||||
default:
|
||||
jj_la1[0] = jj_gen;
|
||||
jj_consume_token(-1);
|
||||
throw new ParseException();
|
||||
}
|
||||
if (token.kind == EOF) {
|
||||
{if (true) return null;}
|
||||
} else {
|
||||
{if (true) return
|
||||
new org.apache.lucene.analysis.Token(token.image,
|
||||
token.beginColumn,token.endColumn,
|
||||
tokenImage[token.kind]);}
|
||||
}
|
||||
throw new Error("Missing return statement in function");
|
||||
}
|
||||
|
||||
public StandardTokenizerTokenManager token_source;
|
||||
public Token token, jj_nt;
|
||||
private int jj_ntk;
|
||||
private int jj_gen;
|
||||
final private int[] jj_la1 = new int[1];
|
||||
static private int[] jj_la1_0;
|
||||
static {
|
||||
jj_la1_0();
|
||||
}
|
||||
private static void jj_la1_0() {
|
||||
jj_la1_0 = new int[] {0xff,};
|
||||
}
|
||||
|
||||
public StandardTokenizer(CharStream stream) {
|
||||
token_source = new StandardTokenizerTokenManager(stream);
|
||||
token = new Token();
|
||||
jj_ntk = -1;
|
||||
jj_gen = 0;
|
||||
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
|
||||
}
|
||||
|
||||
public void ReInit(CharStream stream) {
|
||||
token_source.ReInit(stream);
|
||||
token = new Token();
|
||||
jj_ntk = -1;
|
||||
jj_gen = 0;
|
||||
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
|
||||
}
|
||||
|
||||
public StandardTokenizer(StandardTokenizerTokenManager tm) {
|
||||
token_source = tm;
|
||||
token = new Token();
|
||||
jj_ntk = -1;
|
||||
jj_gen = 0;
|
||||
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
|
||||
}
|
||||
|
||||
public void ReInit(StandardTokenizerTokenManager tm) {
|
||||
token_source = tm;
|
||||
token = new Token();
|
||||
jj_ntk = -1;
|
||||
jj_gen = 0;
|
||||
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
|
||||
}
|
||||
|
||||
final private Token jj_consume_token(int kind) throws ParseException {
|
||||
Token oldToken;
|
||||
if ((oldToken = token).next != null) token = token.next;
|
||||
else token = token.next = token_source.getNextToken();
|
||||
jj_ntk = -1;
|
||||
if (token.kind == kind) {
|
||||
jj_gen++;
|
||||
return token;
|
||||
}
|
||||
token = oldToken;
|
||||
jj_kind = kind;
|
||||
throw generateParseException();
|
||||
}
|
||||
|
||||
final public Token getNextToken() {
|
||||
if (token.next != null) token = token.next;
|
||||
else token = token.next = token_source.getNextToken();
|
||||
jj_ntk = -1;
|
||||
jj_gen++;
|
||||
return token;
|
||||
}
|
||||
|
||||
final public Token getToken(int index) {
|
||||
Token t = token;
|
||||
for (int i = 0; i < index; i++) {
|
||||
if (t.next != null) t = t.next;
|
||||
else t = t.next = token_source.getNextToken();
|
||||
}
|
||||
return t;
|
||||
}
|
||||
|
||||
final private int jj_ntk() {
|
||||
if ((jj_nt=token.next) == null)
|
||||
return (jj_ntk = (token.next=token_source.getNextToken()).kind);
|
||||
else
|
||||
return (jj_ntk = jj_nt.kind);
|
||||
}
|
||||
|
||||
private java.util.Vector jj_expentries = new java.util.Vector();
|
||||
private int[] jj_expentry;
|
||||
private int jj_kind = -1;
|
||||
|
||||
public ParseException generateParseException() {
|
||||
jj_expentries.removeAllElements();
|
||||
boolean[] la1tokens = new boolean[14];
|
||||
for (int i = 0; i < 14; i++) {
|
||||
la1tokens[i] = false;
|
||||
}
|
||||
if (jj_kind >= 0) {
|
||||
la1tokens[jj_kind] = true;
|
||||
jj_kind = -1;
|
||||
}
|
||||
for (int i = 0; i < 1; i++) {
|
||||
if (jj_la1[i] == jj_gen) {
|
||||
for (int j = 0; j < 32; j++) {
|
||||
if ((jj_la1_0[i] & (1<<j)) != 0) {
|
||||
la1tokens[j] = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
for (int i = 0; i < 14; i++) {
|
||||
if (la1tokens[i]) {
|
||||
jj_expentry = new int[1];
|
||||
jj_expentry[0] = i;
|
||||
jj_expentries.addElement(jj_expentry);
|
||||
}
|
||||
}
|
||||
int[][] exptokseq = new int[jj_expentries.size()][];
|
||||
for (int i = 0; i < jj_expentries.size(); i++) {
|
||||
exptokseq[i] = (int[])jj_expentries.elementAt(i);
|
||||
}
|
||||
return new ParseException(token, exptokseq, tokenImage);
|
||||
}
|
||||
|
||||
final public void enable_tracing() {
|
||||
}
|
||||
|
||||
final public void disable_tracing() {
|
||||
}
|
||||
|
||||
}
|
|
@@ -0,0 +1,40 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. StandardTokenizerConstants.java */
|
||||
package org.apache.lucene.analysis.standard;
|
||||
|
||||
public interface StandardTokenizerConstants {
|
||||
|
||||
int EOF = 0;
|
||||
int ALPHANUM = 1;
|
||||
int APOSTROPHE = 2;
|
||||
int ACRONYM = 3;
|
||||
int COMPANY = 4;
|
||||
int EMAIL = 5;
|
||||
int HOST = 6;
|
||||
int NUM = 7;
|
||||
int P = 8;
|
||||
int HAS_DIGIT = 9;
|
||||
int ALPHA = 10;
|
||||
int LETTER = 11;
|
||||
int DIGIT = 12;
|
||||
int NOISE = 13;
|
||||
|
||||
int DEFAULT = 0;
|
||||
|
||||
String[] tokenImage = {
|
||||
"<EOF>",
|
||||
"<ALPHANUM>",
|
||||
"<APOSTROPHE>",
|
||||
"<ACRONYM>",
|
||||
"<COMPANY>",
|
||||
"<EMAIL>",
|
||||
"<HOST>",
|
||||
"<NUM>",
|
||||
"<P>",
|
||||
"<HAS_DIGIT>",
|
||||
"<ALPHA>",
|
||||
"<LETTER>",
|
||||
"<DIGIT>",
|
||||
"<NOISE>",
|
||||
};
|
||||
|
||||
}
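In generateParseException of the generated tokenizer, each bit set in jj_la1_0 marks a token kind that would have been acceptable at the point of failure. With the generated mask 0xff, bits 0 through 7 are set, which matches EOF through NUM in the table above. A small sketch of that decoding follows; the class and method names are hypothetical, and the image table is copied from StandardTokenizerConstants.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class LookaheadMaskSketch {
  // Copied from StandardTokenizerConstants.tokenImage.
  static final String[] TOKEN_IMAGE = {
    "<EOF>", "<ALPHANUM>", "<APOSTROPHE>", "<ACRONYM>", "<COMPANY>",
    "<EMAIL>", "<HOST>", "<NUM>", "<P>", "<HAS_DIGIT>", "<ALPHA>",
    "<LETTER>", "<DIGIT>", "<NOISE>",
  };

  // Returns the images of all token kinds whose bit is set in 'mask'.
  static List<String> expectedImages(int mask) {
    List<String> expected = new ArrayList<String>();
    for (int kind = 0; kind < TOKEN_IMAGE.length; kind++) {
      if ((mask & (1 << kind)) != 0) {
        expected.add(TOKEN_IMAGE[kind]);
      }
    }
    return expected;
  }

  public static void main(String[] args) {
    System.out.println(expectedImages(0xff));
  }
}
```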
|
File diff suppressed because it is too large
|
@@ -0,0 +1,81 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. Token.java Version 3.0 */
|
||||
package org.apache.lucene.analysis.standard;
|
||||
|
||||
/**
|
||||
* Describes the input token stream.
|
||||
*/
|
||||
|
||||
public class Token {
|
||||
|
||||
/**
|
||||
* An integer that describes the kind of this token. This numbering
|
||||
* system is determined by JavaCCParser, and a table of these numbers is
|
||||
* stored in the file ...Constants.java.
|
||||
*/
|
||||
public int kind;
|
||||
|
||||
/**
|
||||
* beginLine and beginColumn describe the position of the first character
|
||||
* of this token; endLine and endColumn describe the position of the
|
||||
* last character of this token.
|
||||
*/
|
||||
public int beginLine, beginColumn, endLine, endColumn;
|
||||
|
||||
/**
|
||||
* The string image of the token.
|
||||
*/
|
||||
public String image;
|
||||
|
||||
/**
|
||||
* A reference to the next regular (non-special) token from the input
|
||||
* stream. If this is the last token from the input stream, or if the
|
||||
* token manager has not read tokens beyond this one, this field is
|
||||
* set to null. This is true only if this token is also a regular
|
||||
* token. Otherwise, see below for a description of the contents of
|
||||
* this field.
|
||||
*/
|
||||
public Token next;
|
||||
|
||||
/**
|
||||
* This field is used to access special tokens that occur prior to this
|
||||
* token, but after the immediately preceding regular (non-special) token.
|
||||
* If there are no such special tokens, this field is set to null.
|
||||
* When there is more than one such special token, this field refers
|
||||
* to the last of these special tokens, which in turn refers to the next
|
||||
* previous special token through its specialToken field, and so on
|
||||
* until the first special token (whose specialToken field is null).
|
||||
* The next fields of special tokens refer to other special tokens that
|
||||
* immediately follow it (without an intervening regular token). If there
|
||||
* is no such token, this field is null.
|
||||
*/
|
||||
public Token specialToken;
|
||||
|
||||
/**
|
||||
* Returns the image.
|
||||
*/
|
||||
public String toString()
|
||||
{
|
||||
return image;
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns a new Token object, by default. However, if you want, you
|
||||
* can create and return subclass objects based on the value of ofKind.
|
||||
* Simply add the cases to the switch for all those special cases.
|
||||
* For example, if you have a subclass of Token called IDToken that
|
||||
* you want to create if ofKind is ID, simply add something like :
|
||||
*
|
||||
* case MyParserConstants.ID : return new IDToken();
|
||||
*
|
||||
* to the following switch statement. Then you can cast matchedToken
|
||||
* variable to the appropriate type and use it in your lexical actions.
|
||||
*/
|
||||
public static final Token newToken(int ofKind)
|
||||
{
|
||||
switch(ofKind)
|
||||
{
|
||||
default : return new Token();
|
||||
}
|
||||
}
|
||||
|
||||
}
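The newToken comment above describes returning a Token subclass for selected kinds. A minimal sketch of that customization is shown here with stand-in classes: the outer class name, the nested Token and IDToken types, and the ID constant are all hypothetical and only illustrate the switch the comment refers to.

```java
// Hypothetical stand-ins, for illustration only; the real Token class is
// the generated one shown above.
class TokenFactorySketch {
  static final int ID = 1; // assumed token kind, not from any real grammar

  static class Token {
    int kind;
    String image;
  }

  // Subclass returned only for tokens of kind ID.
  static class IDToken extends Token {
  }

  // Same shape as the generated factory, with one extra case added.
  static Token newToken(int ofKind) {
    switch (ofKind) {
      case ID:
        return new IDToken();
      default:
        return new Token();
    }
  }

  public static void main(String[] args) {
    System.out.println(newToken(ID) instanceof IDToken);
    System.out.println(newToken(0) instanceof IDToken);
  }
}
```

Note that a hand-edited Token.java is overwritten only if JavaCC is asked to regenerate it, so such a change needs to be kept in mind when the grammar is rebuilt.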
|
|
@@ -0,0 +1,133 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. TokenMgrError.java Version 3.0 */
|
||||
package org.apache.lucene.analysis.standard;
|
||||
|
||||
public class TokenMgrError extends Error
|
||||
{
|
||||
/*
|
||||
* Ordinals for various reasons why an Error of this type can be thrown.
|
||||
*/
|
||||
|
||||
/**
|
||||
* Lexical error occurred.
|
||||
*/
|
||||
static final int LEXICAL_ERROR = 0;
|
||||
|
||||
/**
|
||||
* An attempt was made to create a second instance of a static token manager.
|
||||
*/
|
||||
static final int STATIC_LEXER_ERROR = 1;
|
||||
|
||||
/**
|
||||
* Tried to change to an invalid lexical state.
|
||||
*/
|
||||
static final int INVALID_LEXICAL_STATE = 2;
|
||||
|
||||
/**
|
||||
* Detected (and bailed out of) an infinite loop in the token manager.
|
||||
*/
|
||||
static final int LOOP_DETECTED = 3;
|
||||
|
||||
/**
|
||||
* Indicates the reason why the exception is thrown. It will have
|
||||
* one of the above 4 values.
|
||||
*/
|
||||
int errorCode;
|
||||
|
||||
/**
|
||||
* Replaces unprintable characters by their escaped (or unicode escaped)
|
||||
* equivalents in the given string
|
||||
*/
|
||||
protected static final String addEscapes(String str) {
|
||||
StringBuffer retval = new StringBuffer();
|
||||
char ch;
|
||||
for (int i = 0; i < str.length(); i++) {
|
||||
switch (str.charAt(i))
|
||||
{
|
||||
case 0 :
|
||||
continue;
|
||||
case '\b':
|
||||
retval.append("\\b");
|
||||
continue;
|
||||
case '\t':
|
||||
retval.append("\\t");
|
||||
continue;
|
||||
case '\n':
|
||||
retval.append("\\n");
|
||||
continue;
|
||||
case '\f':
|
||||
retval.append("\\f");
|
||||
continue;
|
||||
case '\r':
|
||||
retval.append("\\r");
|
||||
continue;
|
||||
case '\"':
|
||||
retval.append("\\\"");
|
||||
continue;
|
||||
case '\'':
|
||||
retval.append("\\\'");
|
||||
continue;
|
||||
case '\\':
|
||||
retval.append("\\\\");
|
||||
continue;
|
||||
default:
|
||||
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
|
||||
String s = "0000" + Integer.toString(ch, 16);
|
||||
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
|
||||
} else {
|
||||
retval.append(ch);
|
||||
}
|
||||
continue;
|
||||
}
|
||||
}
|
||||
return retval.toString();
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns a detailed message for the Error when it is thrown by the
|
||||
* token manager to indicate a lexical error.
|
||||
* Parameters :
|
||||
* EOFSeen : indicates if EOF caused the lexical error
|
||||
* lexState : lexical state in which this error occurred
|
||||
* errorLine : line number when the error occurred
|
||||
* errorColumn : column number when the error occurred
|
||||
* errorAfter : prefix that was seen before this error occurred
|
||||
* curChar : the offending character
|
||||
* Note: You can customize the lexical error message by modifying this method.
|
||||
*/
|
||||
protected static String LexicalError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar) {
|
||||
return("Lexical error at line " +
|
||||
errorLine + ", column " +
|
||||
errorColumn + ". Encountered: " +
|
||||
(EOFSeen ? "<EOF> " : ("\"" + addEscapes(String.valueOf(curChar)) + "\"") + " (" + (int)curChar + "), ") +
|
||||
"after : \"" + addEscapes(errorAfter) + "\"");
|
||||
}
|
||||
|
||||
/**
|
||||
* You can also modify the body of this method to customize your error messages.
|
||||
* For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not
|
||||
* the end user's concern, so you can return something like :
|
||||
*
|
||||
* "Internal Error : Please file a bug report .... "
|
||||
*
|
||||
* from this method for such cases in the release version of your parser.
|
||||
*/
|
||||
public String getMessage() {
|
||||
return super.getMessage();
|
||||
}
|
||||
|
||||
/*
|
||||
* Constructors of various flavors follow.
|
||||
*/
|
||||
|
||||
public TokenMgrError() {
|
||||
}
|
||||
|
||||
public TokenMgrError(String message, int reason) {
|
||||
super(message);
|
||||
errorCode = reason;
|
||||
}
|
||||
|
||||
public TokenMgrError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar, int reason) {
|
||||
this(LexicalError(EOFSeen, lexState, errorLine, errorColumn, errorAfter, curChar), reason);
|
||||
}
|
||||
}
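The addEscapes method above turns control characters into their Java escape sequences, drops NUL, and writes anything outside the printable ASCII range as a four-digit hexadecimal unicode escape. The sketch below restates that logic in a standalone class so the behaviour can be tried in isolation; the class name EscapeSketch is hypothetical, and StringBuilder is used here in place of the StringBuffer in the generated code.

```java
// Hypothetical standalone restatement of the escaping logic, for
// illustration only.
class EscapeSketch {
  static String addEscapes(String str) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
      char ch = str.charAt(i);
      switch (ch) {
        case 0: // NUL characters are dropped
          break;
        case '\b': out.append("\\b"); break;
        case '\t': out.append("\\t"); break;
        case '\n': out.append("\\n"); break;
        case '\f': out.append("\\f"); break;
        case '\r': out.append("\\r"); break;
        case '\"': out.append("\\\""); break;
        case '\'': out.append("\\\'"); break;
        case '\\': out.append("\\\\"); break;
        default:
          if (ch < 0x20 || ch > 0x7e) {
            // Left-pad the hex code to four digits.
            String s = "0000" + Integer.toString(ch, 16);
            out.append("\\u").append(s.substring(s.length() - 4));
          } else {
            out.append(ch);
          }
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(addEscapes("tab\there"));
    System.out.println(addEscapes("caf\u00e9"));
  }
}
```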
|
|
@@ -1,6 +0,0 @@
|
|||
QueryParser.java
|
||||
TokenMgrError.java
|
||||
ParseException.java
|
||||
Token.java
|
||||
TokenManager.java
|
||||
QueryParserConstants.java
|
|
@@ -0,0 +1,110 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. CharStream.java Version 3.0 */
|
||||
package org.apache.lucene.queryParser;
|
||||
|
||||
/**
|
||||
* This interface describes a character stream that maintains line and
|
||||
* column number positions of the characters. It also has the capability
|
||||
* to back up the stream to some extent. An implementation of this
|
||||
* interface is used in the TokenManager implementation generated by
|
||||
* JavaCCParser.
|
||||
*
|
||||
* All the methods except backup can be implemented in any fashion. backup
|
||||
* needs to be implemented correctly for the correct operation of the lexer.
|
||||
* The rest of the methods are all used to get information like line number,
|
||||
* column number and the String that constitutes a token and are not used
|
||||
* by the lexer. Hence their implementation won't affect the generated lexer's
|
||||
* operation.
|
||||
*/
|
||||
|
||||
public interface CharStream {
|
||||
|
||||
/**
|
||||
* Returns the next character from the selected input. The method
|
||||
* of selecting the input is the responsibility of the class
|
||||
* implementing this interface. Can throw any java.io.IOException.
|
||||
*/
|
||||
char readChar() throws java.io.IOException;
|
||||
|
||||
/**
|
||||
* Returns the column position of the character last read.
|
||||
* @deprecated
|
||||
* @see #getEndColumn
|
||||
*/
|
||||
int getColumn();
|
||||
|
||||
/**
|
||||
* Returns the line number of the character last read.
|
||||
* @deprecated
|
||||
* @see #getEndLine
|
||||
*/
|
||||
int getLine();
|
||||
|
||||
/**
|
||||
* Returns the column number of the last character for current token (being
|
||||
* matched after the last call to BeginToken).
|
||||
*/
|
||||
int getEndColumn();
|
||||
|
||||
/**
|
||||
* Returns the line number of the last character for current token (being
|
||||
* matched after the last call to BeginToken).
|
||||
*/
|
||||
int getEndLine();
|
||||
|
||||
/**
|
||||
* Returns the column number of the first character for current token (being
|
||||
* matched after the last call to BeginToken).
|
||||
*/
|
||||
int getBeginColumn();
|
||||
|
||||
/**
|
||||
* Returns the line number of the first character for current token (being
|
||||
* matched after the last call to BeginToken).
|
||||
*/
|
||||
int getBeginLine();
|
||||
|
||||
/**
|
||||
* Backs up the input stream by amount steps. Lexer calls this method if it
|
||||
* had already read some characters, but could not use them to match a
|
||||
* (longer) token. So, they will be used again as the prefix of the next
|
||||
* token and it is the implementation's responsibility to do this right.
|
||||
*/
|
||||
void backup(int amount);
|
||||
|
||||
/**
|
||||
* Returns the next character that marks the beginning of the next token.
|
||||
* All characters must remain in the buffer between two successive calls
|
||||
* to this method to implement backup correctly.
|
||||
*/
|
||||
char BeginToken() throws java.io.IOException;
|
||||
|
||||
/**
|
||||
* Returns a string made up of characters from the marked token beginning
|
||||
* to the current buffer position. Implementations have the choice of returning
|
||||
* anything that they want to. For example, for efficiency, one might decide
|
||||
* to just return null, which is a valid implementation.
|
||||
*/
|
||||
String GetImage();
|
||||
|
||||
/**
|
||||
* Returns an array of characters that make up the suffix of length 'len' for
|
||||
* the currently matched token. This is used to build up the matched string
|
||||
* for use in actions in the case of MORE. A simple and inefficient
|
||||
* implementation of this is as follows :
|
||||
*
|
||||
* {
|
||||
* String t = GetImage();
|
||||
* return t.substring(t.length() - len, t.length()).toCharArray();
|
||||
* }
|
||||
*/
|
||||
char[] GetSuffix(int len);
|
||||
|
||||
/**
|
||||
* The lexer calls this function to indicate that it is done with the stream
|
||||
* and hence implementations can free any resources held by this class.
|
||||
* Again, the body of this function can be just empty and it will not
|
||||
* affect the lexer's operation.
|
||||
*/
|
||||
void Done();
|
||||
|
||||
}
|
|
@@ -0,0 +1,192 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. ParseException.java Version 3.0 */
|
||||
package org.apache.lucene.queryParser;
|
||||
|
||||
/**
|
||||
* This exception is thrown when parse errors are encountered.
|
||||
* You can explicitly create objects of this exception type by
|
||||
* calling the method generateParseException in the generated
|
||||
* parser.
|
||||
*
|
||||
* You can modify this class to customize your error reporting
|
||||
* mechanisms so long as you retain the public fields.
|
||||
*/
|
||||
public class ParseException extends Exception {
|
||||
|
||||
/**
|
||||
* This constructor is used by the method "generateParseException"
|
||||
* in the generated parser. Calling this constructor generates
|
||||
* a new object of this type with the fields "currentToken",
|
||||
* "expectedTokenSequences", and "tokenImage" set. The boolean
|
||||
* flag "specialConstructor" is also set to true to indicate that
|
||||
* this constructor was used to create this object.
|
||||
* This constructor calls its super class with the empty string
|
||||
* to force the "toString" method of parent class "Throwable" to
|
||||
* print the error message in the form:
|
||||
* ParseException: <result of getMessage>
|
||||
*/
|
||||
public ParseException(Token currentTokenVal,
|
||||
int[][] expectedTokenSequencesVal,
|
||||
String[] tokenImageVal
|
||||
)
|
||||
{
|
||||
super("");
|
||||
specialConstructor = true;
|
||||
currentToken = currentTokenVal;
|
||||
expectedTokenSequences = expectedTokenSequencesVal;
|
||||
tokenImage = tokenImageVal;
|
||||
}
|
||||
|
||||
/**
|
||||
* The following constructors are for use by you for whatever
|
||||
* purpose you can think of. Constructing the exception in this
|
||||
* manner makes the exception behave in the normal way - i.e., as
|
||||
* documented in the class "Throwable". The fields "currentToken",
|
||||
* "expectedTokenSequences", and "tokenImage" do not contain
|
||||
* relevant information. The JavaCC generated code does not use
|
||||
* these constructors.
|
||||
*/
|
||||
|
||||
public ParseException() {
|
||||
super();
|
||||
specialConstructor = false;
|
||||
}
|
||||
|
||||
public ParseException(String message) {
|
||||
super(message);
|
||||
specialConstructor = false;
|
||||
}
|
||||
|
||||
/**
|
||||
* This variable determines which constructor was used to create
|
||||
* this object and thereby affects the semantics of the
|
||||
* "getMessage" method (see below).
|
||||
*/
|
||||
protected boolean specialConstructor;
|
||||
|
||||
/**
|
||||
* This is the last token that has been consumed successfully. If
|
||||
* this object has been created due to a parse error, the token
|
||||
* following this token will (therefore) be the first error token.
|
||||
*/
|
||||
public Token currentToken;
|
||||
|
||||
/**
|
||||
* Each entry in this array is an array of integers. Each array
|
||||
* of integers represents a sequence of tokens (by their ordinal
|
||||
* values) that is expected at this point of the parse.
|
||||
*/
|
||||
public int[][] expectedTokenSequences;
|
||||
|
||||
/**
|
||||
* This is a reference to the "tokenImage" array of the generated
|
||||
* parser within which the parse error occurred. This array is
|
||||
* defined in the generated ...Constants interface.
|
||||
*/
|
||||
public String[] tokenImage;
|
||||
|
||||
/**
|
||||
* This method has the standard behavior when this object has been
|
||||
* created using the standard constructors. Otherwise, it uses
|
||||
* "currentToken" and "expectedTokenSequences" to generate a parse
|
||||
* error message and returns it. If this object has been created
|
||||
* due to a parse error, and you do not catch it (it gets thrown
|
||||
* from the parser), then this method is called during the printing
|
||||
* of the final stack trace, and hence the correct error message
|
||||
* gets displayed.
|
||||
*/
|
||||
public String getMessage() {
|
||||
if (!specialConstructor) {
|
||||
return super.getMessage();
|
||||
}
|
||||
String expected = "";
|
||||
int maxSize = 0;
|
||||
for (int i = 0; i < expectedTokenSequences.length; i++) {
|
||||
if (maxSize < expectedTokenSequences[i].length) {
|
||||
maxSize = expectedTokenSequences[i].length;
|
||||
}
|
||||
for (int j = 0; j < expectedTokenSequences[i].length; j++) {
|
||||
expected += tokenImage[expectedTokenSequences[i][j]] + " ";
|
||||
}
|
||||
if (expectedTokenSequences[i][expectedTokenSequences[i].length - 1] != 0) {
|
||||
expected += "...";
|
||||
}
|
||||
expected += eol + " ";
|
||||
}
|
||||
String retval = "Encountered \"";
|
||||
Token tok = currentToken.next;
|
||||
for (int i = 0; i < maxSize; i++) {
|
||||
if (i != 0) retval += " ";
|
||||
if (tok.kind == 0) {
|
||||
retval += tokenImage[0];
|
||||
break;
|
||||
}
|
||||
retval += add_escapes(tok.image);
|
||||
tok = tok.next;
|
||||
}
|
||||
retval += "\" at line " + currentToken.next.beginLine + ", column " + currentToken.next.beginColumn;
|
||||
retval += "." + eol;
|
||||
if (expectedTokenSequences.length == 1) {
|
||||
retval += "Was expecting:" + eol + " ";
|
||||
} else {
|
||||
retval += "Was expecting one of:" + eol + " ";
|
||||
}
|
||||
retval += expected;
|
||||
return retval;
|
||||
}
|
||||
|
||||
/**
|
||||
* The end of line string for this machine.
|
||||
*/
|
||||
protected String eol = System.getProperty("line.separator", "\n");
|
||||
|
||||
/**
|
||||
* Used to convert raw characters to their escaped version
|
||||
* when these raw versions cannot be used as part of an ASCII
|
||||
* string literal.
|
||||
*/
|
||||
protected String add_escapes(String str) {
|
||||
StringBuffer retval = new StringBuffer();
|
||||
char ch;
|
||||
for (int i = 0; i < str.length(); i++) {
|
||||
switch (str.charAt(i))
|
||||
{
|
||||
case 0 :
|
||||
continue;
|
||||
case '\b':
|
||||
retval.append("\\b");
|
||||
continue;
|
||||
case '\t':
|
||||
retval.append("\\t");
|
||||
continue;
|
||||
case '\n':
|
||||
retval.append("\\n");
|
||||
continue;
|
||||
case '\f':
|
||||
retval.append("\\f");
|
||||
continue;
|
||||
case '\r':
|
||||
retval.append("\\r");
|
||||
continue;
|
||||
case '\"':
|
||||
retval.append("\\\"");
|
||||
continue;
|
||||
case '\'':
|
||||
retval.append("\\\'");
|
||||
continue;
|
||||
case '\\':
|
||||
retval.append("\\\\");
|
||||
continue;
|
||||
default:
|
||||
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
|
||||
String s = "0000" + Integer.toString(ch, 16);
|
||||
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
|
||||
} else {
|
||||
retval.append(ch);
|
||||
}
|
||||
continue;
|
||||
}
|
||||
}
|
||||
return retval.toString();
|
||||
}
|
||||
|
||||
}
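The getMessage method above builds its "Was expecting" list by joining the images of every expected token sequence and appending "..." to any sequence that does not end in EOF (kind 0). The sketch below isolates that step. The class name, the method name, and the three-entry image table are hypothetical, and the line separator is passed in explicitly so the result does not depend on the platform.

```java
// Hypothetical helper that mirrors the first loop of
// ParseException.getMessage(), for illustration only.
class ExpectedListSketch {
  static String expected(int[][] sequences, String[] tokenImage, String eol) {
    StringBuilder sb = new StringBuilder();
    for (int[] seq : sequences) {
      for (int kind : seq) {
        sb.append(tokenImage[kind]).append(" ");
      }
      // A sequence that does not end in EOF may be followed by more tokens.
      if (seq[seq.length - 1] != 0) {
        sb.append("...");
      }
      sb.append(eol).append(" ");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String[] images = {"<EOF>", "<AND>", "<OR>"}; // made-up subset
    int[][] sequences = {{1}, {0}};
    System.out.print(expected(sequences, images, "\n"));
  }
}
```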
|
File diff suppressed because it is too large
|
@@ -0,0 +1,80 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. QueryParserConstants.java */
|
||||
package org.apache.lucene.queryParser;
|
||||
|
||||
public interface QueryParserConstants {
|
||||
|
||||
int EOF = 0;
|
||||
int _NUM_CHAR = 1;
|
||||
int _ESCAPED_CHAR = 2;
|
||||
int _TERM_START_CHAR = 3;
|
||||
int _TERM_CHAR = 4;
|
||||
int _WHITESPACE = 5;
|
||||
int AND = 7;
|
||||
int OR = 8;
|
||||
int NOT = 9;
|
||||
int PLUS = 10;
|
||||
int MINUS = 11;
|
||||
int LPAREN = 12;
|
||||
int RPAREN = 13;
|
||||
int COLON = 14;
|
||||
int CARAT = 15;
|
||||
int QUOTED = 16;
|
||||
int TERM = 17;
|
||||
int FUZZY = 18;
|
||||
int SLOP = 19;
|
||||
int PREFIXTERM = 20;
|
||||
int WILDTERM = 21;
|
||||
int RANGEIN_START = 22;
|
||||
int RANGEEX_START = 23;
|
||||
int NUMBER = 24;
|
||||
int RANGEIN_TO = 25;
|
||||
int RANGEIN_END = 26;
|
||||
int RANGEIN_QUOTED = 27;
|
||||
int RANGEIN_GOOP = 28;
|
||||
int RANGEEX_TO = 29;
|
||||
int RANGEEX_END = 30;
|
||||
int RANGEEX_QUOTED = 31;
|
||||
int RANGEEX_GOOP = 32;
|
||||
|
||||
int Boost = 0;
|
||||
int RangeEx = 1;
|
||||
int RangeIn = 2;
|
||||
int DEFAULT = 3;
|
||||
|
||||
String[] tokenImage = {
|
||||
"<EOF>",
|
||||
"<_NUM_CHAR>",
|
||||
"<_ESCAPED_CHAR>",
|
||||
"<_TERM_START_CHAR>",
|
||||
"<_TERM_CHAR>",
|
||||
"<_WHITESPACE>",
|
||||
"<token of kind 6>",
|
||||
"<AND>",
|
||||
"<OR>",
|
||||
"<NOT>",
|
||||
"\"+\"",
|
||||
"\"-\"",
|
||||
"\"(\"",
|
||||
"\")\"",
|
||||
"\":\"",
|
||||
"\"^\"",
|
||||
"<QUOTED>",
|
||||
"<TERM>",
|
||||
"\"~\"",
|
||||
"<SLOP>",
|
||||
"<PREFIXTERM>",
|
||||
"<WILDTERM>",
|
||||
"\"[\"",
|
||||
"\"{\"",
|
||||
"<NUMBER>",
|
||||
"\"TO\"",
|
||||
"\"]\"",
|
||||
"<RANGEIN_QUOTED>",
|
||||
"<RANGEIN_GOOP>",
|
||||
"\"TO\"",
|
||||
"\"}\"",
|
||||
"<RANGEEX_QUOTED>",
|
||||
"<RANGEEX_GOOP>",
|
||||
};
|
||||
|
||||
}
|
File diff suppressed because it is too large
|
@@ -0,0 +1,81 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. Token.java Version 3.0 */
|
||||
package org.apache.lucene.queryParser;
|
||||
|
||||
/**
|
||||
* Describes the input token stream.
|
||||
*/
|
||||
|
||||
public class Token {
|
||||
|
||||
/**
|
||||
* An integer that describes the kind of this token. This numbering
|
||||
* system is determined by JavaCCParser, and a table of these numbers is
|
||||
* stored in the file ...Constants.java.
|
||||
*/
|
||||
public int kind;
|
||||
|
||||
/**
|
||||
* beginLine and beginColumn describe the position of the first character
|
||||
* of this token; endLine and endColumn describe the position of the
|
||||
* last character of this token.
|
||||
*/
|
||||
public int beginLine, beginColumn, endLine, endColumn;
|
||||
|
||||
/**
|
||||
* The string image of the token.
|
||||
*/
|
||||
public String image;
|
||||
|
||||
/**
|
||||
* A reference to the next regular (non-special) token from the input
|
||||
* stream. If this is the last token from the input stream, or if the
|
||||
* token manager has not read tokens beyond this one, this field is
|
||||
* set to null. This is true only if this token is also a regular
|
||||
* token. Otherwise, see below for a description of the contents of
|
||||
* this field.
|
||||
*/
|
||||
public Token next;
|
||||
|
||||
/**
|
||||
* This field is used to access special tokens that occur prior to this
|
||||
* token, but after the immediately preceding regular (non-special) token.
|
||||
* If there are no such special tokens, this field is set to null.
|
||||
* When there is more than one such special token, this field refers
|
||||
* to the last of these special tokens, which in turn refers to the next
|
||||
* previous special token through its specialToken field, and so on
|
||||
* until the first special token (whose specialToken field is null).
|
||||
* The next fields of special tokens refer to other special tokens that
|
||||
* immediately follow it (without an intervening regular token). If there
|
||||
* is no such token, this field is null.
|
||||
*/
|
||||
public Token specialToken;
|
||||
|
||||
/**
|
||||
* Returns the image.
|
||||
*/
|
||||
public String toString()
|
||||
{
|
||||
return image;
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns a new Token object, by default. However, if you want, you
|
||||
* can create and return subclass objects based on the value of ofKind.
|
||||
* Simply add the cases to the switch for all those special cases.
|
||||
* For example, if you have a subclass of Token called IDToken that
|
||||
* you want to create if ofKind is ID, simply add something like :
|
||||
*
|
||||
* case MyParserConstants.ID : return new IDToken();
|
||||
*
|
||||
* to the following switch statement. Then you can cast matchedToken
|
||||
* variable to the appropriate type and use it in your lexical actions.
|
||||
*/
|
||||
public static final Token newToken(int ofKind)
|
||||
{
|
||||
switch(ofKind)
|
||||
{
|
||||
default : return new Token();
|
||||
}
|
||||
}
|
||||
|
||||
}
|
|
@@ -0,0 +1,133 @@
|
|||
/* Generated By:JavaCC: Do not edit this line. TokenMgrError.java Version 3.0 */
|
||||
package org.apache.lucene.queryParser;
|
||||
|
||||
public class TokenMgrError extends Error
|
||||
{
|
||||
/*
|
||||
* Ordinals for various reasons why an Error of this type can be thrown.
|
||||
*/
|
||||
|
||||
/**
|
||||
* Lexical error occurred.
|
||||
*/
|
||||
static final int LEXICAL_ERROR = 0;
|
||||
|
||||
/**
|
||||
* An attempt was made to create a second instance of a static token manager.
|
||||
*/
|
||||
static final int STATIC_LEXER_ERROR = 1;
|
||||
|
||||
/**
|
||||
* Tried to change to an invalid lexical state.
|
||||
*/
|
||||
static final int INVALID_LEXICAL_STATE = 2;
|
||||
|
||||
/**
|
||||
* Detected (and bailed out of) an infinite loop in the token manager.
|
||||
*/
|
||||
static final int LOOP_DETECTED = 3;
|
||||
|
||||
/**
|
||||
* Indicates the reason why the exception is thrown. It will have
|
||||
* one of the above 4 values.
|
||||
*/
|
||||
int errorCode;
|
||||
|
||||
/**
|
||||
* Replaces unprintable characters by their escaped (or unicode escaped)
|
||||
* equivalents in the given string
|
||||
*/
|
||||
protected static final String addEscapes(String str) {
|
||||
StringBuffer retval = new StringBuffer();
|
||||
char ch;
|
||||
for (int i = 0; i < str.length(); i++) {
|
||||
switch (str.charAt(i))
|
||||
{
|
||||
case 0 :
|
||||
continue;
|
||||
case '\b':
|
||||
retval.append("\\b");
|
||||
continue;
|
||||
case '\t':
|
||||
retval.append("\\t");
|
||||
continue;
|
||||
case '\n':
|
||||
retval.append("\\n");
|
||||
continue;
|
||||
case '\f':
|
||||
retval.append("\\f");
|
||||
continue;
|
||||
case '\r':
|
||||
retval.append("\\r");
|
||||
continue;
|
||||
case '\"':
|
||||
retval.append("\\\"");
|
||||
continue;
|
||||
case '\'':
|
||||
retval.append("\\\'");
|
||||
continue;
|
||||
case '\\':
|
||||
retval.append("\\\\");
|
||||
continue;
|
||||
default:
|
||||
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
|
||||
String s = "0000" + Integer.toString(ch, 16);
|
||||
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
|
||||
} else {
|
||||
retval.append(ch);
|
||||
}
|
||||
continue;
|
||||
}
|
||||
}
|
||||
return retval.toString();
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns a detailed message for the Error when it is thrown by the
|
||||
* token manager to indicate a lexical error.
|
||||
* Parameters :
|
||||
* EOFSeen : indicates if EOF caused the lexical error
|
||||
* lexState : lexical state in which this error occurred
|
||||
* errorLine : line number when the error occurred
|
||||
* errorColumn : column number when the error occurred
|
||||
* errorAfter : prefix that was seen before this error occurred
|
||||
* curChar : the offending character
|
||||
* Note: You can customize the lexical error message by modifying this method.
|
||||
*/
|
||||
protected static String LexicalError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar) {
|
||||
return("Lexical error at line " +
|
||||
errorLine + ", column " +
|
||||
errorColumn + ". Encountered: " +
|
||||
(EOFSeen ? "<EOF> " : ("\"" + addEscapes(String.valueOf(curChar)) + "\"") + " (" + (int)curChar + "), ") +
|
||||
"after : \"" + addEscapes(errorAfter) + "\"");
|
||||
}
|
||||
|
||||
/**
|
||||
* You can also modify the body of this method to customize your error messages.
|
||||
* For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not
|
||||
* of end-users' concern, so you can return something like :
|
||||
*
|
||||
* "Internal Error : Please file a bug report .... "
|
||||
*
|
||||
* from this method for such cases in the release version of your parser.
|
||||
*/
|
||||
public String getMessage() {
|
||||
return super.getMessage();
|
||||
}
|
||||
|
||||
/*
|
||||
* Constructors of various flavors follow.
|
||||
*/
|
||||
|
||||
public TokenMgrError() {
|
||||
}
|
||||
|
||||
public TokenMgrError(String message, int reason) {
|
||||
super(message);
|
||||
errorCode = reason;
|
||||
}
|
||||
|
||||
public TokenMgrError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar, int reason) {
|
||||
this(LexicalError(EOFSeen, lexState, errorLine, errorColumn, errorAfter, curChar), reason);
|
||||
}
|
||||
}
|