PR 19468, but not exactly as it was done in the provided patches. JavaCC is no longer required to build Lucene, but can be run optionally

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@150017 13f79535-47bb-0310-9956-ffa450edef68
Erik Hatcher 2003-09-11 01:51:33 +00:00
parent 798fc0f0ef
commit 2af2d85877
25 changed files with 7297 additions and 90 deletions


@ -3,15 +3,15 @@ Lucene Build Instructions
$Id$
Basic steps:
0) Install JDK 1.2 (or greater), Ant 1.4 (or greater), and the Ant
0) Install JDK 1.2 (or greater), Ant 1.5 (or greater), and the Ant
optional.jar
1) Download Lucene from Apache and unpack it
2) Connect to the top-level of your Lucene installation
3) Install JavaCC
3) Install JavaCC (optional)
4) Run ant
Step 0) Set up your development environment (JDK 1.2 or greater,
Ant 1.4 or greater)
Ant 1.5 or greater)
We'll assume that you know how to get and set up the JDK - if you
don't, then we suggest starting at http://java.sun.com and learning
@ -22,26 +22,22 @@ with the development version of Lucene, we recommend you stick with
the most current version of Java (at the time of this writing, JDK
1.4). Also, note that if you're working with the Lucene source,
you'll need to use Ant (see below) and Ant requires at least JDK 1.1
(and in the future will likely move to requiring JDK 1.2, according to
(and in the future will move to requiring JDK 1.2, according to
the Ant install docs).
Like most of the Jakarta projects, Lucene uses Apache Ant for build
control. Specifically, you MUST use Ant version 1.4 or greater.
control. Specifically, you MUST use Ant version 1.5 or greater.
Ant is "kind of like make without make's wrinkles". Ant is
implemented in java and uses XML-based configuration files. You can
get it at:
http://jakarta.apache.org/ant
Specifically, you can get the binary distributions at:
http://jakarta.apache.org/builds/jakarta-ant/release/
http://ant.apache.org
You'll need to download both the Ant binary distribution and the
"optional" jar file. Install these according to the instructions at:
http://jakarta.apache.org/ant/manual
http://ant.apache.org/manual
Step 1) Download Lucene from Apache
@ -79,21 +75,16 @@ NOTE: the ~ character represents your user account home directory.
Step 3) Install JavaCC
Building the Lucene distribution from the source requires the JavaCC
parser generator. This software has a separate license agreement that
must be agreed to before you can use it. The web page for JavaCC is here:
Building the Lucene distribution from the source does not require the JavaCC
parser generator, but if you wish to regenerate any of the pre-generated
parser pieces, you will need to install JavaCC.
http://www.experimentalstuff.com/Technologies/JavaCC/
http://javacc.dev.java.net
Follow the download links and download the zip file to a temporary
location on your file system. Unzip the file and run the large class file
in the directory. On windows, use this command from the temp directory:
location on your file system.
java -cp . JavaCC2_1
This will launch a Java GUI installer. There is also a command line
installer available, and the installation class will give you those
directions. After JavaCC is installed, edit your build properties
After JavaCC is installed, edit your build.properties
(as in step 2), and add the line
javacc.home=/javacc/bin
@ -107,14 +98,16 @@ location of your ant installation, typing "ant" at the shell prompt
and command prompt should run ant. Ant will by default look for the
"build.xml" file in your current directory, and compile Lucene.
To rebuild any of the JavaCC-based parsers, run "ant javacc".
For further information on Lucene, go to:
http://jakarta.apache.org/lucene/
Please join the Lucene-User mailing list by visiting this site:
http://jakarta.apache.org/site/mail.html
Please post suggestions, questions, corrections or additions to this
document to the Lucene-User mailing list.
document to the lucene-user mailing list.
This file was originally written by Steven J. Owens <puff@darksleep.com>.
This file was modified by Jon S. Stevens <jon@latchkey.com>.

build.xml

@ -9,6 +9,8 @@
<property file="${basedir}/build.properties" />
<property file="${basedir}/default.properties" />
<property name="javacc.main.class" value="org.javacc.parser.Main"/>
<!-- Build classpath -->
<path id="classpath">
<pathelement location="${build.classes}"/>
@ -52,8 +54,8 @@
<available
property="javacc.present"
classname="COM.sun.labs.javacc.Main"
classpath="${javacc.zip}"
classname="${javacc.main.class}"
classpath="${javacc.jar}"
/>
<available
@ -67,21 +69,21 @@
</tstamp>
</target>
<target name="javacc_check" depends="init" unless="javacc.present">
<echo>
<target name="javacc-check" depends="init">
<fail unless="javacc.present">
##################################################################
JavaCC not found.
JavaCC Home: ${javacc.home}
JavaCC Zip: ${javacc.zip}
JavaCC Zip: ${javacc.jar}
Please download and install JavaCC 2.0 from:
Please download and install JavaCC from:
&lt;http://www.experimentalstuff.com/Technologies/JavaCC/&gt;
&lt;http://javacc.dev.java.net&gt;
Then, create a build.properties file either in your home
directory, or within the Lucene directory and set the javacc.home
property to the path where JavaCC.zip is located. For example,
if you installed JavaCC in /usr/local/java/javacc2.0, then set the
if you installed JavaCC in /usr/local/java/javacc3.2, then set the
javacc.home property to:
javacc.home=/usr/local/java/javacc2.0/bin
@ -89,9 +91,10 @@
If you get an error like the one below, then you have not installed
things correctly. Please check all your paths and try again.
java.lang.NoClassDefFoundError: COM/sun/labs/javacc/Main
java.lang.NoClassDefFoundError: org.javacc.parser.Main
##################################################################
</echo>
</fail>
</target>
<!-- ================================================================== -->
@ -99,25 +102,10 @@
<!-- ================================================================== -->
<!-- -->
<!-- ================================================================== -->
<target name="compile" depends="init,javacc_check" if="javacc.present">
<mkdir dir="${build.src}/org/apache/lucene/analysis/standard"/>
<javacc
target="${src.dir}/org/apache/lucene/analysis/standard/StandardTokenizer.jj"
javacchome="${javacc.zip.dir}"
outputdirectory="${build.src}/org/apache/lucene/analysis/standard"
/>
<delete file="${build.src}/org/apache/lucene/analysis/standard/ParseException.java"/>
<mkdir dir="${build.src}/org/apache/lucene/queryParser"/>
<javacc
target="${src.dir}/org/apache/lucene/queryParser/QueryParser.jj"
javacchome="${javacc.zip.dir}"
outputdirectory="${build.src}/org/apache/lucene/queryParser"
/>
<target name="compile" depends="init">
<javac
encoding="${build.encoding}"
srcdir="${src.dir}:${build.src}"
srcdir="${src.dir}"
includes="org/**/*.java"
destdir="${build.classes}"
debug="${debug}">
@ -135,7 +123,7 @@
<!-- ================================================================== -->
<!-- -->
<!-- ================================================================== -->
<target name="jar" depends="compile" if="javacc.present">
<target name="jar" depends="compile">
<!-- Create Jar MANIFEST file -->
<echo file="${build.manifest}">Manifest-Version: 1.0
@ -158,7 +146,7 @@ Implementation-Vendor: Lucene
/>
</target>
<target name="jardemo" depends="compile,demo" if="javacc.present">
<target name="jardemo" depends="compile,demo">
<jar
jarfile="${build.demo}/${build.demo.name}.jar"
basedir="${build.demo.classes}"
@ -166,7 +154,7 @@ Implementation-Vendor: Lucene
/>
</target>
<target name="wardemo" depends="compile,demo,jar,jardemo" if="javacc.present">
<target name="wardemo" depends="compile,demo,jar,jardemo">
<mkdir dir="${build.demo}/${build.demo.war.name}"/>
<mkdir dir="${build.demo}/${build.demo.war.name}/WEB-INF"/>
<mkdir dir="${build.demo}/${build.demo.war.name}/WEB-INF/lib"/>
@ -202,22 +190,8 @@ Implementation-Vendor: Lucene
<!-- ================================================================== -->
<!-- -->
<!-- ================================================================== -->
<target name="jar-src" depends="init,javacc_check" if="javacc.present">
<target name="jar-src" depends="init">
<mkdir dir="${build.src}/org/apache/lucene/analysis/standard"/>
<javacc
target="${src.dir}/org/apache/lucene/analysis/standard/StandardTokenizer.jj"
javacchome="${javacc.zip.dir}"
outputdirectory="${build.src}/org/apache/lucene/analysis/standard"
/>
<delete file="${build.src}/org/apache/lucene/analysis/standard/ParseException.java"/>
<mkdir dir="${build.src}/org/apache/lucene/queryParser"/>
<javacc
target="${src.dir}/org/apache/lucene/queryParser/QueryParser.jj"
javacchome="${javacc.zip.dir}"
outputdirectory="${build.src}/org/apache/lucene/queryParser"
/>
<jar jarfile="${build.dir}/${final.name}-src.jar">
<fileset dir="${build.dir}" includes="**/*.java"/>
</jar>
@ -228,7 +202,7 @@ Implementation-Vendor: Lucene
<!-- ================================================================== -->
<!-- -->
<!-- ================================================================== -->
<target name="demo" depends="compile" if="javacc.present">
<target name="demo" depends="compile">
<mkdir dir="${build.demo}"/>
<mkdir dir="${build.demo.src}" />
@ -239,11 +213,6 @@ Implementation-Vendor: Lucene
</fileset>
</copy>
<javacc
target="${build.demo.src}/org/apache/lucene/demo/html/HTMLParser.jj"
javacchome="${javacc.zip.dir}"
outputdirectory="${build.demo.src}/org/apache/lucene/demo/html"
/>
<mkdir dir="${build.demo.classes}"/>
<javac
@ -355,7 +324,7 @@ Implementation-Vendor: Lucene
<!-- ================================================================== -->
<!-- -->
<!-- ================================================================== -->
<target name="javadocs" depends="compile" if="javacc.present">
<target name="javadocs" depends="compile">
<mkdir dir="${build.javadocs}"/>
<javadoc
sourcepath="${src.dir}:${build.src}"
@ -619,4 +588,51 @@ Implementation-Vendor: Lucene
</war>
</target>
-->
<!-- ================================================================== -->
<!-- Build the JavaCC files into the source tree -->
<!-- ================================================================== -->
<target name="javacc" depends="javacc-StandardAnalyzer,javacc-QueryParser,javacc-HTMLParser"/>
<target name="javacc-StandardAnalyzer" depends="init,javacc-check" if="javacc.present">
<!-- generate this in a build directory so we can exclude ParseException -->
<mkdir dir="${build.src}/org/apache/lucene/analysis/standard"/>
<antcall target="invoke-javacc">
<param name="target" location="${src.dir}/org/apache/lucene/analysis/standard/StandardTokenizer.jj"/>
<param name="output.dir" location="${build.src}/org/apache/lucene/analysis/standard"/>
</antcall>
<copy todir="${src.dir}/org/apache/lucene/analysis/standard">
<fileset dir="${build.src}/org/apache/lucene/analysis/standard">
<include name="*.java"/>
<exclude name="ParseException.java"/>
</fileset>
</copy>
</target>
<target name="javacc-QueryParser" depends="init,javacc-check" if="javacc.present">
<antcall target="invoke-javacc">
<param name="target" location="${src.dir}/org/apache/lucene/queryParser/QueryParser.jj"/>
<param name="output.dir" location="${src.dir}/org/apache/lucene/queryParser"/>
</antcall>
</target>
<target name="javacc-HTMLParser" depends="init,javacc-check" if="javacc.present">
<antcall target="invoke-javacc">
<param name="target" location="${demo.src}/org/apache/lucene/demo/html/HTMLParser.jj"/>
<param name="output.dir" location="${demo.src}/org/apache/lucene/demo/html"/>
</antcall>
</target>
<target name="invoke-javacc">
<java classname="${javacc.main.class}" fork="true">
<classpath path="${javacc.jar}"/>
<sysproperty key="install.root" file="${javacc.home}"/>
<arg value="-OUTPUT_DIRECTORY:${output.dir}"/>
<arg value="${target}"/>
</java>
</target>
</project>
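Not part of the commit, but for orientation: the new invoke-javacc target above just forks JavaCC's entry class (javacc.main.class, i.e. org.javacc.parser.Main) with an -OUTPUT_DIRECTORY option and the grammar file, with ${javacc.jar} on the classpath. A rough standalone Java equivalent, with illustrative paths:

public class InvokeJavaCCSketch {
  public static void main(String[] ignored) throws Exception {
    // Mirrors <java classname="${javacc.main.class}" fork="true"> above;
    // javacc.jar (see the properties change below) must be on the classpath.
    String grammar = "src/java/org/apache/lucene/queryParser/QueryParser.jj"; // illustrative path
    String outputDir = "src/java/org/apache/lucene/queryParser";               // illustrative path
    org.javacc.parser.Main.main(new String[] {
        "-OUTPUT_DIRECTORY:" + outputDir,
        grammar
    });
  }
}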


@ -58,8 +58,8 @@ junit.reports = ${build.dir}/unit-reports
# Home directory of JavaCC
javacc.home = .
javacc.zip.dir = ${javacc.home}/lib
javacc.zip = ${javacc.zip.dir}/JavaCC.zip
javacc.zip.dir = ${javacc.home}/bin/lib
javacc.jar = ${javacc.zip.dir}/javacc.jar
# Home directory of jakarta-site2
jakarta.site2.home = ../jakarta-site2


@ -0,0 +1,688 @@
/* Generated By:JavaCC: Do not edit this line. HTMLParser.java */
package org.apache.lucene.demo.html;
import java.io.*;
import java.util.Properties;
public class HTMLParser implements HTMLParserConstants {
public static int SUMMARY_LENGTH = 200;
StringBuffer title = new StringBuffer(SUMMARY_LENGTH);
StringBuffer summary = new StringBuffer(SUMMARY_LENGTH * 2);
Properties metaTags=new Properties();
String currentMetaTag="";
int length = 0;
boolean titleComplete = false;
boolean inTitle = false;
boolean inMetaTag = false;
boolean inStyle = false;
boolean inScript = false;
boolean afterTag = false;
boolean afterSpace = false;
String eol = System.getProperty("line.separator");
PipedReader pipeIn = null;
PipedWriter pipeOut;
public HTMLParser(File file) throws FileNotFoundException {
this(new FileInputStream(file));
}
public String getTitle() throws IOException, InterruptedException {
if (pipeIn == null)
getReader(); // spawn parsing thread
while (true) {
synchronized(this) {
if (titleComplete || (length > SUMMARY_LENGTH))
break;
wait(10);
}
}
return title.toString().trim();
}
public Properties getMetaTags() throws IOException,
InterruptedException {
if (pipeIn == null)
getReader(); // spawn parsing thread
while (true) {
synchronized(this) {
if (titleComplete || (length > SUMMARY_LENGTH))
break;
wait(10);
}
}
return metaTags;
}
public String getSummary() throws IOException, InterruptedException {
if (pipeIn == null)
getReader(); // spawn parsing thread
while (true) {
synchronized(this) {
if (summary.length() >= SUMMARY_LENGTH)
break;
wait(10);
}
}
if (summary.length() > SUMMARY_LENGTH)
summary.setLength(SUMMARY_LENGTH);
String sum = summary.toString().trim();
String tit = getTitle();
if (sum.startsWith(tit))
return sum.substring(tit.length());
else
return sum;
}
public Reader getReader() throws IOException {
if (pipeIn == null) {
pipeIn = new PipedReader();
pipeOut = new PipedWriter(pipeIn);
Thread thread = new ParserThread(this);
thread.start(); // start parsing
}
return pipeIn;
}
void addToSummary(String text) {
if (summary.length() < SUMMARY_LENGTH) {
summary.append(text);
if (summary.length() >= SUMMARY_LENGTH) {
synchronized(this) {
notifyAll();
}
}
}
}
void addText(String text) throws IOException {
if (inScript)
return;
if (inStyle)
return;
if (inMetaTag)
{
metaTags.setProperty(currentMetaTag, text);
return;
}
if (inTitle)
title.append(text);
else {
addToSummary(text);
if (!titleComplete && !title.equals("")) { // finished title
synchronized(this) {
titleComplete = true; // tell waiting threads
notifyAll();
}
}
}
length += text.length();
pipeOut.write(text);
afterSpace = false;
}
void addSpace() throws IOException {
if (inScript)
return;
if (!afterSpace) {
if (inTitle)
title.append(" ");
else
addToSummary(" ");
String space = afterTag ? eol : " ";
length += space.length();
pipeOut.write(space);
afterSpace = true;
}
}
final public void HTMLDocument() throws ParseException, IOException {
Token t;
label_1:
while (true) {
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case TagName:
case DeclName:
case Comment1:
case Comment2:
case Word:
case Entity:
case Space:
case Punct:
;
break;
default:
jj_la1[0] = jj_gen;
break label_1;
}
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case TagName:
Tag();
afterTag = true;
break;
case DeclName:
t = Decl();
afterTag = true;
break;
case Comment1:
case Comment2:
CommentTag();
afterTag = true;
break;
case Word:
t = jj_consume_token(Word);
addText(t.image); afterTag = false;
break;
case Entity:
t = jj_consume_token(Entity);
addText(Entities.decode(t.image)); afterTag = false;
break;
case Punct:
t = jj_consume_token(Punct);
addText(t.image); afterTag = false;
break;
case Space:
jj_consume_token(Space);
addSpace(); afterTag = false;
break;
default:
jj_la1[1] = jj_gen;
jj_consume_token(-1);
throw new ParseException();
}
}
jj_consume_token(0);
}
final public void Tag() throws ParseException, IOException {
Token t1, t2;
boolean inImg = false;
t1 = jj_consume_token(TagName);
inTitle = t1.image.equalsIgnoreCase("<title"); // keep track if in <TITLE>
inMetaTag = t1.image.equalsIgnoreCase("<META"); // keep track if in <META>
inStyle = t1.image.equalsIgnoreCase("<STYLE"); // keep track if in <STYLE>
inImg = t1.image.equalsIgnoreCase("<img"); // keep track if in <IMG>
if (inScript) { // keep track if in <SCRIPT>
inScript = !t1.image.equalsIgnoreCase("</script");
} else {
inScript = t1.image.equalsIgnoreCase("<script");
}
label_2:
while (true) {
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case ArgName:
;
break;
default:
jj_la1[2] = jj_gen;
break label_2;
}
t1 = jj_consume_token(ArgName);
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case ArgEquals:
jj_consume_token(ArgEquals);
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case ArgValue:
case ArgQuote1:
case ArgQuote2:
t2 = ArgValue();
if (inImg && t1.image.equalsIgnoreCase("alt") && t2 != null)
addText("[" + t2.image + "]");
if(inMetaTag &&
( t1.image.equalsIgnoreCase("name") ||
t1.image.equalsIgnoreCase("HTTP-EQUIV")
)
&& t2 != null)
{
currentMetaTag=t2.image.toLowerCase();
}
if(inMetaTag && t1.image.equalsIgnoreCase("content") && t2 !=
null)
{
addText(t2.image);
}
break;
default:
jj_la1[3] = jj_gen;
;
}
break;
default:
jj_la1[4] = jj_gen;
;
}
}
jj_consume_token(TagEnd);
}
final public Token ArgValue() throws ParseException {
Token t = null;
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case ArgValue:
t = jj_consume_token(ArgValue);
{if (true) return t;}
break;
default:
jj_la1[5] = jj_gen;
if (jj_2_1(2)) {
jj_consume_token(ArgQuote1);
jj_consume_token(CloseQuote1);
{if (true) return t;}
} else {
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case ArgQuote1:
jj_consume_token(ArgQuote1);
t = jj_consume_token(Quote1Text);
jj_consume_token(CloseQuote1);
{if (true) return t;}
break;
default:
jj_la1[6] = jj_gen;
if (jj_2_2(2)) {
jj_consume_token(ArgQuote2);
jj_consume_token(CloseQuote2);
{if (true) return t;}
} else {
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case ArgQuote2:
jj_consume_token(ArgQuote2);
t = jj_consume_token(Quote2Text);
jj_consume_token(CloseQuote2);
{if (true) return t;}
break;
default:
jj_la1[7] = jj_gen;
jj_consume_token(-1);
throw new ParseException();
}
}
}
}
}
throw new Error("Missing return statement in function");
}
final public Token Decl() throws ParseException {
Token t;
t = jj_consume_token(DeclName);
label_3:
while (true) {
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case ArgName:
case ArgEquals:
case ArgValue:
case ArgQuote1:
case ArgQuote2:
;
break;
default:
jj_la1[8] = jj_gen;
break label_3;
}
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case ArgName:
jj_consume_token(ArgName);
break;
case ArgValue:
case ArgQuote1:
case ArgQuote2:
ArgValue();
break;
case ArgEquals:
jj_consume_token(ArgEquals);
break;
default:
jj_la1[9] = jj_gen;
jj_consume_token(-1);
throw new ParseException();
}
}
jj_consume_token(TagEnd);
{if (true) return t;}
throw new Error("Missing return statement in function");
}
final public void CommentTag() throws ParseException {
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case Comment1:
jj_consume_token(Comment1);
label_4:
while (true) {
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case CommentText1:
;
break;
default:
jj_la1[10] = jj_gen;
break label_4;
}
jj_consume_token(CommentText1);
}
jj_consume_token(CommentEnd1);
break;
case Comment2:
jj_consume_token(Comment2);
label_5:
while (true) {
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case CommentText2:
;
break;
default:
jj_la1[11] = jj_gen;
break label_5;
}
jj_consume_token(CommentText2);
}
jj_consume_token(CommentEnd2);
break;
default:
jj_la1[12] = jj_gen;
jj_consume_token(-1);
throw new ParseException();
}
}
final private boolean jj_2_1(int xla) {
jj_la = xla; jj_lastpos = jj_scanpos = token;
try { return !jj_3_1(); }
catch(LookaheadSuccess ls) { return true; }
finally { jj_save(0, xla); }
}
final private boolean jj_2_2(int xla) {
jj_la = xla; jj_lastpos = jj_scanpos = token;
try { return !jj_3_2(); }
catch(LookaheadSuccess ls) { return true; }
finally { jj_save(1, xla); }
}
final private boolean jj_3_1() {
if (jj_scan_token(ArgQuote1)) return true;
if (jj_scan_token(CloseQuote1)) return true;
return false;
}
final private boolean jj_3_2() {
if (jj_scan_token(ArgQuote2)) return true;
if (jj_scan_token(CloseQuote2)) return true;
return false;
}
public HTMLParserTokenManager token_source;
SimpleCharStream jj_input_stream;
public Token token, jj_nt;
private int jj_ntk;
private Token jj_scanpos, jj_lastpos;
private int jj_la;
public boolean lookingAhead = false;
private boolean jj_semLA;
private int jj_gen;
final private int[] jj_la1 = new int[13];
static private int[] jj_la1_0;
static {
jj_la1_0();
}
private static void jj_la1_0() {
jj_la1_0 = new int[] {0xb3e,0xb3e,0x1000,0x38000,0x2000,0x8000,0x10000,0x20000,0x3b000,0x3b000,0x800000,0x2000000,0x18,};
}
final private JJCalls[] jj_2_rtns = new JJCalls[2];
private boolean jj_rescan = false;
private int jj_gc = 0;
public HTMLParser(java.io.InputStream stream) {
jj_input_stream = new SimpleCharStream(stream, 1, 1);
token_source = new HTMLParserTokenManager(jj_input_stream);
token = new Token();
jj_ntk = -1;
jj_gen = 0;
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
}
public void ReInit(java.io.InputStream stream) {
jj_input_stream.ReInit(stream, 1, 1);
token_source.ReInit(jj_input_stream);
token = new Token();
jj_ntk = -1;
jj_gen = 0;
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
}
public HTMLParser(java.io.Reader stream) {
jj_input_stream = new SimpleCharStream(stream, 1, 1);
token_source = new HTMLParserTokenManager(jj_input_stream);
token = new Token();
jj_ntk = -1;
jj_gen = 0;
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
}
public void ReInit(java.io.Reader stream) {
jj_input_stream.ReInit(stream, 1, 1);
token_source.ReInit(jj_input_stream);
token = new Token();
jj_ntk = -1;
jj_gen = 0;
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
}
public HTMLParser(HTMLParserTokenManager tm) {
token_source = tm;
token = new Token();
jj_ntk = -1;
jj_gen = 0;
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
}
public void ReInit(HTMLParserTokenManager tm) {
token_source = tm;
token = new Token();
jj_ntk = -1;
jj_gen = 0;
for (int i = 0; i < 13; i++) jj_la1[i] = -1;
for (int i = 0; i < jj_2_rtns.length; i++) jj_2_rtns[i] = new JJCalls();
}
final private Token jj_consume_token(int kind) throws ParseException {
Token oldToken;
if ((oldToken = token).next != null) token = token.next;
else token = token.next = token_source.getNextToken();
jj_ntk = -1;
if (token.kind == kind) {
jj_gen++;
if (++jj_gc > 100) {
jj_gc = 0;
for (int i = 0; i < jj_2_rtns.length; i++) {
JJCalls c = jj_2_rtns[i];
while (c != null) {
if (c.gen < jj_gen) c.first = null;
c = c.next;
}
}
}
return token;
}
token = oldToken;
jj_kind = kind;
throw generateParseException();
}
static private final class LookaheadSuccess extends java.lang.Error { }
final private LookaheadSuccess jj_ls = new LookaheadSuccess();
final private boolean jj_scan_token(int kind) {
if (jj_scanpos == jj_lastpos) {
jj_la--;
if (jj_scanpos.next == null) {
jj_lastpos = jj_scanpos = jj_scanpos.next = token_source.getNextToken();
} else {
jj_lastpos = jj_scanpos = jj_scanpos.next;
}
} else {
jj_scanpos = jj_scanpos.next;
}
if (jj_rescan) {
int i = 0; Token tok = token;
while (tok != null && tok != jj_scanpos) { i++; tok = tok.next; }
if (tok != null) jj_add_error_token(kind, i);
}
if (jj_scanpos.kind != kind) return true;
if (jj_la == 0 && jj_scanpos == jj_lastpos) throw jj_ls;
return false;
}
final public Token getNextToken() {
if (token.next != null) token = token.next;
else token = token.next = token_source.getNextToken();
jj_ntk = -1;
jj_gen++;
return token;
}
final public Token getToken(int index) {
Token t = lookingAhead ? jj_scanpos : token;
for (int i = 0; i < index; i++) {
if (t.next != null) t = t.next;
else t = t.next = token_source.getNextToken();
}
return t;
}
final private int jj_ntk() {
if ((jj_nt=token.next) == null)
return (jj_ntk = (token.next=token_source.getNextToken()).kind);
else
return (jj_ntk = jj_nt.kind);
}
private java.util.Vector jj_expentries = new java.util.Vector();
private int[] jj_expentry;
private int jj_kind = -1;
private int[] jj_lasttokens = new int[100];
private int jj_endpos;
private void jj_add_error_token(int kind, int pos) {
if (pos >= 100) return;
if (pos == jj_endpos + 1) {
jj_lasttokens[jj_endpos++] = kind;
} else if (jj_endpos != 0) {
jj_expentry = new int[jj_endpos];
for (int i = 0; i < jj_endpos; i++) {
jj_expentry[i] = jj_lasttokens[i];
}
boolean exists = false;
for (java.util.Enumeration e = jj_expentries.elements(); e.hasMoreElements();) {
int[] oldentry = (int[])(e.nextElement());
if (oldentry.length == jj_expentry.length) {
exists = true;
for (int i = 0; i < jj_expentry.length; i++) {
if (oldentry[i] != jj_expentry[i]) {
exists = false;
break;
}
}
if (exists) break;
}
}
if (!exists) jj_expentries.addElement(jj_expentry);
if (pos != 0) jj_lasttokens[(jj_endpos = pos) - 1] = kind;
}
}
public ParseException generateParseException() {
jj_expentries.removeAllElements();
boolean[] la1tokens = new boolean[27];
for (int i = 0; i < 27; i++) {
la1tokens[i] = false;
}
if (jj_kind >= 0) {
la1tokens[jj_kind] = true;
jj_kind = -1;
}
for (int i = 0; i < 13; i++) {
if (jj_la1[i] == jj_gen) {
for (int j = 0; j < 32; j++) {
if ((jj_la1_0[i] & (1<<j)) != 0) {
la1tokens[j] = true;
}
}
}
}
for (int i = 0; i < 27; i++) {
if (la1tokens[i]) {
jj_expentry = new int[1];
jj_expentry[0] = i;
jj_expentries.addElement(jj_expentry);
}
}
jj_endpos = 0;
jj_rescan_token();
jj_add_error_token(0, 0);
int[][] exptokseq = new int[jj_expentries.size()][];
for (int i = 0; i < jj_expentries.size(); i++) {
exptokseq[i] = (int[])jj_expentries.elementAt(i);
}
return new ParseException(token, exptokseq, tokenImage);
}
final public void enable_tracing() {
}
final public void disable_tracing() {
}
final private void jj_rescan_token() {
jj_rescan = true;
for (int i = 0; i < 2; i++) {
JJCalls p = jj_2_rtns[i];
do {
if (p.gen > jj_gen) {
jj_la = p.arg; jj_lastpos = jj_scanpos = p.first;
switch (i) {
case 0: jj_3_1(); break;
case 1: jj_3_2(); break;
}
}
p = p.next;
} while (p != null);
}
jj_rescan = false;
}
final private void jj_save(int index, int xla) {
JJCalls p = jj_2_rtns[index];
while (p.gen > jj_gen) {
if (p.next == null) { p = p.next = new JJCalls(); break; }
p = p.next;
}
p.gen = jj_gen + xla - jj_la; p.first = token; p.arg = xla;
}
static final class JJCalls {
int gen;
Token first;
int arg;
JJCalls next;
}
// void handleException(Exception e) {
// System.out.println(e.toString()); // print the error message
// System.out.println("Skipping...");
// Token t;
// do {
// t = getNextToken();
// } while (t.kind != TagEnd);
// }
}
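No usage appears in this diff, but the generated demo parser above is self-driving: getTitle(), getSummary() and getMetaTags() spawn the parse on a background ParserThread (via getReader()) and block until enough of the document has been seen. A minimal usage sketch with a placeholder file name (it assumes the page has a <title> and at least a couple hundred characters of body text, otherwise the blocking getters keep waiting):

import java.io.File;
import java.io.Reader;
import org.apache.lucene.demo.html.HTMLParser;

public class HTMLParserSketch {
  public static void main(String[] args) throws Exception {
    HTMLParser parser = new HTMLParser(new File("page.html"));  // placeholder path
    // blocks until the parser has moved past the <title> element
    System.out.println("title:   " + parser.getTitle());
    System.out.println("meta:    " + parser.getMetaTags());
    // blocks until SUMMARY_LENGTH (200) chars of body text have been gathered
    System.out.println("summary: " + parser.getSummary());
    Reader body = parser.getReader();  // plain-text body, fed by the parsing thread
    for (int c = body.read(); c != -1; c = body.read()) {
      System.out.print((char) c);
    }
    body.close();
  }
}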


@ -0,0 +1,71 @@
/* Generated By:JavaCC: Do not edit this line. HTMLParserConstants.java */
package org.apache.lucene.demo.html;
public interface HTMLParserConstants {
int EOF = 0;
int TagName = 1;
int DeclName = 2;
int Comment1 = 3;
int Comment2 = 4;
int Word = 5;
int LET = 6;
int NUM = 7;
int Entity = 8;
int Space = 9;
int SP = 10;
int Punct = 11;
int ArgName = 12;
int ArgEquals = 13;
int TagEnd = 14;
int ArgValue = 15;
int ArgQuote1 = 16;
int ArgQuote2 = 17;
int Quote1Text = 19;
int CloseQuote1 = 20;
int Quote2Text = 21;
int CloseQuote2 = 22;
int CommentText1 = 23;
int CommentEnd1 = 24;
int CommentText2 = 25;
int CommentEnd2 = 26;
int DEFAULT = 0;
int WithinTag = 1;
int AfterEquals = 2;
int WithinQuote1 = 3;
int WithinQuote2 = 4;
int WithinComment1 = 5;
int WithinComment2 = 6;
String[] tokenImage = {
"<EOF>",
"<TagName>",
"<DeclName>",
"\"<!--\"",
"\"<!\"",
"<Word>",
"<LET>",
"<NUM>",
"<Entity>",
"<Space>",
"<SP>",
"<Punct>",
"<ArgName>",
"\"=\"",
"<TagEnd>",
"<ArgValue>",
"\"\\\'\"",
"\"\\\"\"",
"<token of kind 18>",
"<Quote1Text>",
"<CloseQuote1>",
"<Quote2Text>",
"<CloseQuote2>",
"<CommentText1>",
"\"-->\"",
"<CommentText2>",
"\">\"",
};
}
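The kind numbers above double as indexes into tokenImage, which is how the parser's error reporting names tokens. A trivial illustration, not in the commit:

import org.apache.lucene.demo.html.HTMLParserConstants;

public class TokenImageSketch implements HTMLParserConstants {
  public static void main(String[] args) {
    System.out.println(tokenImage[Comment1]); // the literal token, printed with its quotes: "<!--"
    System.out.println(tokenImage[Word]);     // named (non-literal) tokens print as <Word>
  }
}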

File diff suppressed because it is too large


@ -0,0 +1,192 @@
/* Generated By:JavaCC: Do not edit this line. ParseException.java Version 3.0 */
package org.apache.lucene.demo.html;
/**
* This exception is thrown when parse errors are encountered.
* You can explicitly create objects of this exception type by
* calling the method generateParseException in the generated
* parser.
*
* You can modify this class to customize your error reporting
* mechanisms so long as you retain the public fields.
*/
public class ParseException extends Exception {
/**
* This constructor is used by the method "generateParseException"
* in the generated parser. Calling this constructor generates
* a new object of this type with the fields "currentToken",
* "expectedTokenSequences", and "tokenImage" set. The boolean
* flag "specialConstructor" is also set to true to indicate that
* this constructor was used to create this object.
* This constructor calls its super class with the empty string
* to force the "toString" method of parent class "Throwable" to
* print the error message in the form:
* ParseException: <result of getMessage>
*/
public ParseException(Token currentTokenVal,
int[][] expectedTokenSequencesVal,
String[] tokenImageVal
)
{
super("");
specialConstructor = true;
currentToken = currentTokenVal;
expectedTokenSequences = expectedTokenSequencesVal;
tokenImage = tokenImageVal;
}
/**
* The following constructors are for use by you for whatever
* purpose you can think of. Constructing the exception in this
* manner makes the exception behave in the normal way - i.e., as
* documented in the class "Throwable". The fields "errorToken",
* "expectedTokenSequences", and "tokenImage" do not contain
* relevant information. The JavaCC generated code does not use
* these constructors.
*/
public ParseException() {
super();
specialConstructor = false;
}
public ParseException(String message) {
super(message);
specialConstructor = false;
}
/**
* This variable determines which constructor was used to create
* this object and thereby affects the semantics of the
* "getMessage" method (see below).
*/
protected boolean specialConstructor;
/**
* This is the last token that has been consumed successfully. If
* this object has been created due to a parse error, the token
* following this token will (therefore) be the first error token.
*/
public Token currentToken;
/**
* Each entry in this array is an array of integers. Each array
* of integers represents a sequence of tokens (by their ordinal
* values) that is expected at this point of the parse.
*/
public int[][] expectedTokenSequences;
/**
* This is a reference to the "tokenImage" array of the generated
* parser within which the parse error occurred. This array is
* defined in the generated ...Constants interface.
*/
public String[] tokenImage;
/**
* This method has the standard behavior when this object has been
* created using the standard constructors. Otherwise, it uses
* "currentToken" and "expectedTokenSequences" to generate a parse
* error message and returns it. If this object has been created
* due to a parse error, and you do not catch it (it gets thrown
* from the parser), then this method is called during the printing
* of the final stack trace, and hence the correct error message
* gets displayed.
*/
public String getMessage() {
if (!specialConstructor) {
return super.getMessage();
}
String expected = "";
int maxSize = 0;
for (int i = 0; i < expectedTokenSequences.length; i++) {
if (maxSize < expectedTokenSequences[i].length) {
maxSize = expectedTokenSequences[i].length;
}
for (int j = 0; j < expectedTokenSequences[i].length; j++) {
expected += tokenImage[expectedTokenSequences[i][j]] + " ";
}
if (expectedTokenSequences[i][expectedTokenSequences[i].length - 1] != 0) {
expected += "...";
}
expected += eol + " ";
}
String retval = "Encountered \"";
Token tok = currentToken.next;
for (int i = 0; i < maxSize; i++) {
if (i != 0) retval += " ";
if (tok.kind == 0) {
retval += tokenImage[0];
break;
}
retval += add_escapes(tok.image);
tok = tok.next;
}
retval += "\" at line " + currentToken.next.beginLine + ", column " + currentToken.next.beginColumn;
retval += "." + eol;
if (expectedTokenSequences.length == 1) {
retval += "Was expecting:" + eol + " ";
} else {
retval += "Was expecting one of:" + eol + " ";
}
retval += expected;
return retval;
}
/**
* The end of line string for this machine.
*/
protected String eol = System.getProperty("line.separator", "\n");
/**
* Used to convert raw characters to their escaped version
* when these raw versions cannot be used as part of an ASCII
* string literal.
*/
protected String add_escapes(String str) {
StringBuffer retval = new StringBuffer();
char ch;
for (int i = 0; i < str.length(); i++) {
switch (str.charAt(i))
{
case 0 :
continue;
case '\b':
retval.append("\\b");
continue;
case '\t':
retval.append("\\t");
continue;
case '\n':
retval.append("\\n");
continue;
case '\f':
retval.append("\\f");
continue;
case '\r':
retval.append("\\r");
continue;
case '\"':
retval.append("\\\"");
continue;
case '\'':
retval.append("\\\'");
continue;
case '\\':
retval.append("\\\\");
continue;
default:
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
String s = "0000" + Integer.toString(ch, 16);
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
} else {
retval.append(ch);
}
continue;
}
}
return retval.toString();
}
}
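The parser's entry points (HTMLDocument(), Tag(), ArgValue(), ...) throw this checked ParseException when the token stream does not match the grammar, and getMessage() formats the encountered token against the expected alternatives. A contrived sketch, not in the commit, that forces one by calling ArgValue() on plain text:

import java.io.StringReader;
import org.apache.lucene.demo.html.HTMLParser;
import org.apache.lucene.demo.html.ParseException;

public class ParseExceptionSketch {
  public static void main(String[] args) {
    try {
      new HTMLParser(new StringReader("just text, no attribute value here")).ArgValue();
    } catch (ParseException e) {
      // "Encountered ... Was expecting one of: <ArgValue> ..." built from
      // currentToken, expectedTokenSequences and tokenImage
      System.err.println(e.getMessage());
    }
  }
}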


@ -0,0 +1,401 @@
/* Generated By:JavaCC: Do not edit this line. SimpleCharStream.java Version 3.0 */
package org.apache.lucene.demo.html;
/**
* An implementation of interface CharStream, where the stream is assumed to
* contain only ASCII characters (without unicode processing).
*/
public class SimpleCharStream
{
public static final boolean staticFlag = false;
int bufsize;
int available;
int tokenBegin;
public int bufpos = -1;
protected int bufline[];
protected int bufcolumn[];
protected int column = 0;
protected int line = 1;
protected boolean prevCharIsCR = false;
protected boolean prevCharIsLF = false;
protected java.io.Reader inputStream;
protected char[] buffer;
protected int maxNextCharInd = 0;
protected int inBuf = 0;
protected void ExpandBuff(boolean wrapAround)
{
char[] newbuffer = new char[bufsize + 2048];
int newbufline[] = new int[bufsize + 2048];
int newbufcolumn[] = new int[bufsize + 2048];
try
{
if (wrapAround)
{
System.arraycopy(buffer, tokenBegin, newbuffer, 0, bufsize - tokenBegin);
System.arraycopy(buffer, 0, newbuffer,
bufsize - tokenBegin, bufpos);
buffer = newbuffer;
System.arraycopy(bufline, tokenBegin, newbufline, 0, bufsize - tokenBegin);
System.arraycopy(bufline, 0, newbufline, bufsize - tokenBegin, bufpos);
bufline = newbufline;
System.arraycopy(bufcolumn, tokenBegin, newbufcolumn, 0, bufsize - tokenBegin);
System.arraycopy(bufcolumn, 0, newbufcolumn, bufsize - tokenBegin, bufpos);
bufcolumn = newbufcolumn;
maxNextCharInd = (bufpos += (bufsize - tokenBegin));
}
else
{
System.arraycopy(buffer, tokenBegin, newbuffer, 0, bufsize - tokenBegin);
buffer = newbuffer;
System.arraycopy(bufline, tokenBegin, newbufline, 0, bufsize - tokenBegin);
bufline = newbufline;
System.arraycopy(bufcolumn, tokenBegin, newbufcolumn, 0, bufsize - tokenBegin);
bufcolumn = newbufcolumn;
maxNextCharInd = (bufpos -= tokenBegin);
}
}
catch (Throwable t)
{
throw new Error(t.getMessage());
}
bufsize += 2048;
available = bufsize;
tokenBegin = 0;
}
protected void FillBuff() throws java.io.IOException
{
if (maxNextCharInd == available)
{
if (available == bufsize)
{
if (tokenBegin > 2048)
{
bufpos = maxNextCharInd = 0;
available = tokenBegin;
}
else if (tokenBegin < 0)
bufpos = maxNextCharInd = 0;
else
ExpandBuff(false);
}
else if (available > tokenBegin)
available = bufsize;
else if ((tokenBegin - available) < 2048)
ExpandBuff(true);
else
available = tokenBegin;
}
int i;
try {
if ((i = inputStream.read(buffer, maxNextCharInd,
available - maxNextCharInd)) == -1)
{
inputStream.close();
throw new java.io.IOException();
}
else
maxNextCharInd += i;
return;
}
catch(java.io.IOException e) {
--bufpos;
backup(0);
if (tokenBegin == -1)
tokenBegin = bufpos;
throw e;
}
}
public char BeginToken() throws java.io.IOException
{
tokenBegin = -1;
char c = readChar();
tokenBegin = bufpos;
return c;
}
protected void UpdateLineColumn(char c)
{
column++;
if (prevCharIsLF)
{
prevCharIsLF = false;
line += (column = 1);
}
else if (prevCharIsCR)
{
prevCharIsCR = false;
if (c == '\n')
{
prevCharIsLF = true;
}
else
line += (column = 1);
}
switch (c)
{
case '\r' :
prevCharIsCR = true;
break;
case '\n' :
prevCharIsLF = true;
break;
case '\t' :
column--;
column += (8 - (column & 07));
break;
default :
break;
}
bufline[bufpos] = line;
bufcolumn[bufpos] = column;
}
public char readChar() throws java.io.IOException
{
if (inBuf > 0)
{
--inBuf;
if (++bufpos == bufsize)
bufpos = 0;
return buffer[bufpos];
}
if (++bufpos >= maxNextCharInd)
FillBuff();
char c = buffer[bufpos];
UpdateLineColumn(c);
return (c);
}
/**
* @deprecated
* @see #getEndColumn
*/
public int getColumn() {
return bufcolumn[bufpos];
}
/**
* @deprecated
* @see #getEndLine
*/
public int getLine() {
return bufline[bufpos];
}
public int getEndColumn() {
return bufcolumn[bufpos];
}
public int getEndLine() {
return bufline[bufpos];
}
public int getBeginColumn() {
return bufcolumn[tokenBegin];
}
public int getBeginLine() {
return bufline[tokenBegin];
}
public void backup(int amount) {
inBuf += amount;
if ((bufpos -= amount) < 0)
bufpos += bufsize;
}
public SimpleCharStream(java.io.Reader dstream, int startline,
int startcolumn, int buffersize)
{
inputStream = dstream;
line = startline;
column = startcolumn - 1;
available = bufsize = buffersize;
buffer = new char[buffersize];
bufline = new int[buffersize];
bufcolumn = new int[buffersize];
}
public SimpleCharStream(java.io.Reader dstream, int startline,
int startcolumn)
{
this(dstream, startline, startcolumn, 4096);
}
public SimpleCharStream(java.io.Reader dstream)
{
this(dstream, 1, 1, 4096);
}
public void ReInit(java.io.Reader dstream, int startline,
int startcolumn, int buffersize)
{
inputStream = dstream;
line = startline;
column = startcolumn - 1;
if (buffer == null || buffersize != buffer.length)
{
available = bufsize = buffersize;
buffer = new char[buffersize];
bufline = new int[buffersize];
bufcolumn = new int[buffersize];
}
prevCharIsLF = prevCharIsCR = false;
tokenBegin = inBuf = maxNextCharInd = 0;
bufpos = -1;
}
public void ReInit(java.io.Reader dstream, int startline,
int startcolumn)
{
ReInit(dstream, startline, startcolumn, 4096);
}
public void ReInit(java.io.Reader dstream)
{
ReInit(dstream, 1, 1, 4096);
}
public SimpleCharStream(java.io.InputStream dstream, int startline,
int startcolumn, int buffersize)
{
this(new java.io.InputStreamReader(dstream), startline, startcolumn, 4096);
}
public SimpleCharStream(java.io.InputStream dstream, int startline,
int startcolumn)
{
this(dstream, startline, startcolumn, 4096);
}
public SimpleCharStream(java.io.InputStream dstream)
{
this(dstream, 1, 1, 4096);
}
public void ReInit(java.io.InputStream dstream, int startline,
int startcolumn, int buffersize)
{
ReInit(new java.io.InputStreamReader(dstream), startline, startcolumn, 4096);
}
public void ReInit(java.io.InputStream dstream)
{
ReInit(dstream, 1, 1, 4096);
}
public void ReInit(java.io.InputStream dstream, int startline,
int startcolumn)
{
ReInit(dstream, startline, startcolumn, 4096);
}
public String GetImage()
{
if (bufpos >= tokenBegin)
return new String(buffer, tokenBegin, bufpos - tokenBegin + 1);
else
return new String(buffer, tokenBegin, bufsize - tokenBegin) +
new String(buffer, 0, bufpos + 1);
}
public char[] GetSuffix(int len)
{
char[] ret = new char[len];
if ((bufpos + 1) >= len)
System.arraycopy(buffer, bufpos - len + 1, ret, 0, len);
else
{
System.arraycopy(buffer, bufsize - (len - bufpos - 1), ret, 0,
len - bufpos - 1);
System.arraycopy(buffer, 0, ret, len - bufpos - 1, bufpos + 1);
}
return ret;
}
public void Done()
{
buffer = null;
bufline = null;
bufcolumn = null;
}
/**
* Method to adjust line and column numbers for the start of a token.
*/
public void adjustBeginLineColumn(int newLine, int newCol)
{
int start = tokenBegin;
int len;
if (bufpos >= tokenBegin)
{
len = bufpos - tokenBegin + inBuf + 1;
}
else
{
len = bufsize - tokenBegin + bufpos + 1 + inBuf;
}
int i = 0, j = 0, k = 0;
int nextColDiff = 0, columnDiff = 0;
while (i < len &&
bufline[j = start % bufsize] == bufline[k = ++start % bufsize])
{
bufline[j] = newLine;
nextColDiff = columnDiff + bufcolumn[k] - bufcolumn[j];
bufcolumn[j] = newCol + columnDiff;
columnDiff = nextColDiff;
i++;
}
if (i < len)
{
bufline[j] = newLine++;
bufcolumn[j] = newCol + columnDiff;
while (i++ < len)
{
if (bufline[j = start % bufsize] != bufline[++start % bufsize])
bufline[j] = newLine++;
else
bufline[j] = newLine;
}
}
line = bufline[j];
column = bufcolumn[j];
}
}
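SimpleCharStream is the buffering reader the generated token manager pulls characters from (HTMLParser wires one up in its constructors): BeginToken() marks the start of a token, readChar() reads and tracks line/column, backup() pushes characters back, and GetImage() returns everything since the mark. A small sketch of that contract, not in the commit:

import java.io.StringReader;
import org.apache.lucene.demo.html.SimpleCharStream;

public class CharStreamSketch {
  public static void main(String[] args) throws java.io.IOException {
    SimpleCharStream in = new SimpleCharStream(new StringReader("<title>Hi</title>"));
    in.BeginToken();                    // reads '<' and marks the token start
    for (int i = 0; i < 6; i++) {
      in.readChar();                    // reads "title>"
    }
    System.out.println(in.GetImage());  // "<title>", everything since BeginToken()
    System.out.println(in.getBeginLine() + ":" + in.getBeginColumn()
        + " to " + in.getEndLine() + ":" + in.getEndColumn()); // 1:1 to 1:7
    in.backup(1);                       // push '>' back into the buffer
    System.out.println(in.readChar());  // '>' again, served from the buffer
  }
}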


@ -0,0 +1,81 @@
/* Generated By:JavaCC: Do not edit this line. Token.java Version 3.0 */
package org.apache.lucene.demo.html;
/**
* Describes the input token stream.
*/
public class Token {
/**
* An integer that describes the kind of this token. This numbering
* system is determined by JavaCCParser, and a table of these numbers is
* stored in the file ...Constants.java.
*/
public int kind;
/**
* beginLine and beginColumn describe the position of the first character
* of this token; endLine and endColumn describe the position of the
* last character of this token.
*/
public int beginLine, beginColumn, endLine, endColumn;
/**
* The string image of the token.
*/
public String image;
/**
* A reference to the next regular (non-special) token from the input
* stream. If this is the last token from the input stream, or if the
* token manager has not read tokens beyond this one, this field is
* set to null. This is true only if this token is also a regular
* token. Otherwise, see below for a description of the contents of
* this field.
*/
public Token next;
/**
* This field is used to access special tokens that occur prior to this
* token, but after the immediately preceding regular (non-special) token.
* If there are no such special tokens, this field is set to null.
* When there are more than one such special token, this field refers
* to the last of these special tokens, which in turn refers to the next
* previous special token through its specialToken field, and so on
* until the first special token (whose specialToken field is null).
* The next fields of special tokens refer to other special tokens that
* immediately follow it (without an intervening regular token). If there
* is no such token, this field is null.
*/
public Token specialToken;
/**
* Returns the image.
*/
public String toString()
{
return image;
}
/**
* Returns a new Token object, by default. However, if you want, you
* can create and return subclass objects based on the value of ofKind.
* Simply add the cases to the switch for all those special cases.
* For example, if you have a subclass of Token called IDToken that
* you want to create if ofKind is ID, simply add something like :
*
* case MyParserConstants.ID : return new IDToken();
*
* to the following switch statement. Then you can cast matchedToken
* variable to the appropriate type and use it in your lexical actions.
*/
public static final Token newToken(int ofKind)
{
switch(ofKind)
{
default : return new Token();
}
}
}


@ -0,0 +1,133 @@
/* Generated By:JavaCC: Do not edit this line. TokenMgrError.java Version 3.0 */
package org.apache.lucene.demo.html;
public class TokenMgrError extends Error
{
/*
* Ordinals for various reasons why an Error of this type can be thrown.
*/
/**
* Lexical error occurred.
*/
static final int LEXICAL_ERROR = 0;
/**
* An attempt was made to create a second instance of a static token manager.
*/
static final int STATIC_LEXER_ERROR = 1;
/**
* Tried to change to an invalid lexical state.
*/
static final int INVALID_LEXICAL_STATE = 2;
/**
* Detected (and bailed out of) an infinite loop in the token manager.
*/
static final int LOOP_DETECTED = 3;
/**
* Indicates the reason why the exception is thrown. It will have
* one of the above 4 values.
*/
int errorCode;
/**
* Replaces unprintable characters by their escaped (or unicode escaped)
* equivalents in the given string
*/
protected static final String addEscapes(String str) {
StringBuffer retval = new StringBuffer();
char ch;
for (int i = 0; i < str.length(); i++) {
switch (str.charAt(i))
{
case 0 :
continue;
case '\b':
retval.append("\\b");
continue;
case '\t':
retval.append("\\t");
continue;
case '\n':
retval.append("\\n");
continue;
case '\f':
retval.append("\\f");
continue;
case '\r':
retval.append("\\r");
continue;
case '\"':
retval.append("\\\"");
continue;
case '\'':
retval.append("\\\'");
continue;
case '\\':
retval.append("\\\\");
continue;
default:
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
String s = "0000" + Integer.toString(ch, 16);
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
} else {
retval.append(ch);
}
continue;
}
}
return retval.toString();
}
/**
* Returns a detailed message for the Error when it is thrown by the
* token manager to indicate a lexical error.
* Parameters :
* EOFSeen : indicates if EOF caused the lexical error
* curLexState : lexical state in which this error occurred
* errorLine : line number when the error occurred
* errorColumn : column number when the error occurred
* errorAfter : prefix that was seen before this error occurred
* curchar : the offending character
* Note: You can customize the lexical error message by modifying this method.
*/
protected static String LexicalError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar) {
return("Lexical error at line " +
errorLine + ", column " +
errorColumn + ". Encountered: " +
(EOFSeen ? "<EOF> " : ("\"" + addEscapes(String.valueOf(curChar)) + "\"") + " (" + (int)curChar + "), ") +
"after : \"" + addEscapes(errorAfter) + "\"");
}
/**
* You can also modify the body of this method to customize your error messages.
* For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not
* of end-users concern, so you can return something like :
*
* "Internal Error : Please file a bug report .... "
*
* from this method for such cases in the release version of your parser.
*/
public String getMessage() {
return super.getMessage();
}
/*
* Constructors of various flavors follow.
*/
public TokenMgrError() {
}
public TokenMgrError(String message, int reason) {
super(message);
errorCode = reason;
}
public TokenMgrError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar, int reason) {
this(LexicalError(EOFSeen, lexState, errorLine, errorColumn, errorAfter, curChar), reason);
}
}
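Note that TokenMgrError extends Error, not Exception, so lexical failures escape the generated getNextToken() unchecked; code that wants to survive malformed input has to catch it explicitly. A hedged sketch at the raw lexer level (HTMLParserTokenManager itself is referenced by HTMLParser but its diff is not shown in this excerpt):

import java.io.StringReader;
import org.apache.lucene.demo.html.HTMLParserTokenManager;
import org.apache.lucene.demo.html.SimpleCharStream;
import org.apache.lucene.demo.html.Token;
import org.apache.lucene.demo.html.TokenMgrError;

public class LexerSketch {
  public static void main(String[] args) {
    HTMLParserTokenManager lexer =
        new HTMLParserTokenManager(new SimpleCharStream(new StringReader("<p>hello</p>")));
    try {
      for (Token t = lexer.getNextToken(); t.kind != 0; t = lexer.getNextToken()) {
        System.out.println(t.kind + "\t" + t.image);  // kind 0 is EOF per HTMLParserConstants
      }
    } catch (TokenMgrError e) {
      // unchecked: thrown straight out of getNextToken() on a lexical error
      System.err.println(e.getMessage());
    }
  }
}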


@ -1,6 +0,0 @@
Token.java
StandardTokenizer.java
StandardTokenizerTokenManager.java
TokenMgrError.java
CharStream.java
StandardTokenizerConstants.java


@ -0,0 +1,110 @@
/* Generated By:JavaCC: Do not edit this line. CharStream.java Version 3.0 */
package org.apache.lucene.analysis.standard;
/**
* This interface describes a character stream that maintains line and
* column number positions of the characters. It also has the capability
* to backup the stream to some extent. An implementation of this
* interface is used in the TokenManager implementation generated by
* JavaCCParser.
*
* All the methods except backup can be implemented in any fashion. backup
* needs to be implemented correctly for the correct operation of the lexer.
* Rest of the methods are all used to get information like line number,
* column number and the String that constitutes a token and are not used
* by the lexer. Hence their implementation won't affect the generated lexer's
* operation.
*/
public interface CharStream {
/**
* Returns the next character from the selected input. The method
* of selecting the input is the responsibility of the class
* implementing this interface. Can throw any java.io.IOException.
*/
char readChar() throws java.io.IOException;
/**
* Returns the column position of the character last read.
* @deprecated
* @see #getEndColumn
*/
int getColumn();
/**
* Returns the line number of the character last read.
* @deprecated
* @see #getEndLine
*/
int getLine();
/**
* Returns the column number of the last character for current token (being
* matched after the last call to BeginToken).
*/
int getEndColumn();
/**
* Returns the line number of the last character for current token (being
* matched after the last call to BeginToken).
*/
int getEndLine();
/**
* Returns the column number of the first character for current token (being
* matched after the last call to BeginToken).
*/
int getBeginColumn();
/**
* Returns the line number of the first character for current token (being
* matched after the last call to BeginToken).
*/
int getBeginLine();
/**
* Backs up the input stream by amount steps. Lexer calls this method if it
* had already read some characters, but could not use them to match a
* (longer) token. So, they will be used again as the prefix of the next
* token and it is the implementation's responsibility to do this right.
*/
void backup(int amount);
/**
* Returns the next character that marks the beginning of the next token.
* All characters must remain in the buffer between two successive calls
* to this method to implement backup correctly.
*/
char BeginToken() throws java.io.IOException;
/**
* Returns a string made up of characters from the marked token beginning
* to the current buffer position. Implementations have the choice of returning
* anything that they want to. For example, for efficiency, one might decide
* to just return null, which is a valid implementation.
*/
String GetImage();
/**
* Returns an array of characters that make up the suffix of length 'len' for
* the currently matched token. This is used to build up the matched string
* for use in actions in the case of MORE. A simple and inefficient
* implementation of this is as follows :
*
* {
* String t = GetImage();
* return t.substring(t.length() - len, t.length()).toCharArray();
* }
*/
char[] GetSuffix(int len);
/**
* The lexer calls this function to indicate that it is done with the stream
* and hence implementations can free any resources held by this class.
* Again, the body of this function can be just empty and it will not
* affect the lexer's operation.
*/
void Done();
}
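Unlike the demo grammar, the standard-analyzer grammar is generated against this CharStream interface rather than a concrete SimpleCharStream; Lucene plugs in its own implementation, FastCharStream, which StandardTokenizer (below) wraps around the caller's Reader but which is not part of this diff. A sketch of driving the token manager through the interface, assuming FastCharStream's Reader constructor:

import java.io.StringReader;
import org.apache.lucene.analysis.standard.CharStream;
import org.apache.lucene.analysis.standard.FastCharStream;
import org.apache.lucene.analysis.standard.StandardTokenizerTokenManager;
import org.apache.lucene.analysis.standard.Token;

public class RawStandardLexerSketch {
  public static void main(String[] args) {
    CharStream input = new FastCharStream(new StringReader("lucene-user@jakarta.apache.org rocks"));
    StandardTokenizerTokenManager lexer = new StandardTokenizerTokenManager(input);
    for (Token t = lexer.getNextToken(); t.kind != 0; t = lexer.getNextToken()) {
      System.out.println(t.kind + "\t" + t.image);  // kinds are the StandardTokenizerConstants values
    }
  }
}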


@ -0,0 +1,195 @@
/* Generated By:JavaCC: Do not edit this line. StandardTokenizer.java */
package org.apache.lucene.analysis.standard;
import java.io.*;
/** A grammar-based tokenizer constructed with JavaCC.
*
* <p> This should be a good tokenizer for most European-language documents.
*
* <p>Many applications have specific tokenizer needs. If this tokenizer does
* not suit your application, please consider copying this source code
* directory to your project and maintaining your own grammar-based tokenizer.
*/
public class StandardTokenizer extends org.apache.lucene.analysis.Tokenizer implements StandardTokenizerConstants {
/** Constructs a tokenizer for this Reader. */
public StandardTokenizer(Reader reader) {
this(new FastCharStream(reader));
this.input = reader;
}
/** Returns the next token in the stream, or null at EOS.
* <p>The returned token's type is set to an element of {@link
* StandardTokenizerConstants#tokenImage}.
*/
final public org.apache.lucene.analysis.Token next() throws ParseException, IOException {
Token token = null;
switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
case ALPHANUM:
token = jj_consume_token(ALPHANUM);
break;
case APOSTROPHE:
token = jj_consume_token(APOSTROPHE);
break;
case ACRONYM:
token = jj_consume_token(ACRONYM);
break;
case COMPANY:
token = jj_consume_token(COMPANY);
break;
case EMAIL:
token = jj_consume_token(EMAIL);
break;
case HOST:
token = jj_consume_token(HOST);
break;
case NUM:
token = jj_consume_token(NUM);
break;
case 0:
token = jj_consume_token(0);
break;
default:
jj_la1[0] = jj_gen;
jj_consume_token(-1);
throw new ParseException();
}
if (token.kind == EOF) {
{if (true) return null;}
} else {
{if (true) return
new org.apache.lucene.analysis.Token(token.image,
token.beginColumn,token.endColumn,
tokenImage[token.kind]);}
}
throw new Error("Missing return statement in function");
}
public StandardTokenizerTokenManager token_source;
public Token token, jj_nt;
private int jj_ntk;
private int jj_gen;
final private int[] jj_la1 = new int[1];
static private int[] jj_la1_0;
static {
jj_la1_0();
}
private static void jj_la1_0() {
jj_la1_0 = new int[] {0xff,};
}
public StandardTokenizer(CharStream stream) {
token_source = new StandardTokenizerTokenManager(stream);
token = new Token();
jj_ntk = -1;
jj_gen = 0;
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
}
public void ReInit(CharStream stream) {
token_source.ReInit(stream);
token = new Token();
jj_ntk = -1;
jj_gen = 0;
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
}
public StandardTokenizer(StandardTokenizerTokenManager tm) {
token_source = tm;
token = new Token();
jj_ntk = -1;
jj_gen = 0;
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
}
public void ReInit(StandardTokenizerTokenManager tm) {
token_source = tm;
token = new Token();
jj_ntk = -1;
jj_gen = 0;
for (int i = 0; i < 1; i++) jj_la1[i] = -1;
}
final private Token jj_consume_token(int kind) throws ParseException {
Token oldToken;
if ((oldToken = token).next != null) token = token.next;
else token = token.next = token_source.getNextToken();
jj_ntk = -1;
if (token.kind == kind) {
jj_gen++;
return token;
}
token = oldToken;
jj_kind = kind;
throw generateParseException();
}
final public Token getNextToken() {
if (token.next != null) token = token.next;
else token = token.next = token_source.getNextToken();
jj_ntk = -1;
jj_gen++;
return token;
}
final public Token getToken(int index) {
Token t = token;
for (int i = 0; i < index; i++) {
if (t.next != null) t = t.next;
else t = t.next = token_source.getNextToken();
}
return t;
}
final private int jj_ntk() {
if ((jj_nt=token.next) == null)
return (jj_ntk = (token.next=token_source.getNextToken()).kind);
else
return (jj_ntk = jj_nt.kind);
}
private java.util.Vector jj_expentries = new java.util.Vector();
private int[] jj_expentry;
private int jj_kind = -1;
public ParseException generateParseException() {
jj_expentries.removeAllElements();
boolean[] la1tokens = new boolean[14];
for (int i = 0; i < 14; i++) {
la1tokens[i] = false;
}
if (jj_kind >= 0) {
la1tokens[jj_kind] = true;
jj_kind = -1;
}
for (int i = 0; i < 1; i++) {
if (jj_la1[i] == jj_gen) {
for (int j = 0; j < 32; j++) {
if ((jj_la1_0[i] & (1<<j)) != 0) {
la1tokens[j] = true;
}
}
}
}
for (int i = 0; i < 14; i++) {
if (la1tokens[i]) {
jj_expentry = new int[1];
jj_expentry[0] = i;
jj_expentries.addElement(jj_expentry);
}
}
int[][] exptokseq = new int[jj_expentries.size()][];
for (int i = 0; i < jj_expentries.size(); i++) {
exptokseq[i] = (int[])jj_expentries.elementAt(i);
}
return new ParseException(token, exptokseq, tokenImage);
}
final public void enable_tracing() {
}
final public void disable_tracing() {
}
}
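At the analysis level, next() above converts each JavaCC token into an org.apache.lucene.analysis.Token whose type string is the tokenImage entry for its kind. A minimal usage sketch, assuming the pre-existing analysis Token accessors termText() and type() (that class is not part of this diff):

import java.io.StringReader;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class StandardTokenizerSketch {
  public static void main(String[] args) throws Exception {
    StandardTokenizer stream = new StandardTokenizer(
        new StringReader("The quick brown fox, AT&T, xyz@example.com, I.B.M."));
    for (org.apache.lucene.analysis.Token t = stream.next(); t != null; t = stream.next()) {
      // type() is e.g. <ALPHANUM>, <COMPANY>, <EMAIL>, <ACRONYM>: entries of tokenImage
      System.out.println(t.termText() + "\t" + t.type());
    }
  }
}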


@ -0,0 +1,40 @@
/* Generated By:JavaCC: Do not edit this line. StandardTokenizerConstants.java */
package org.apache.lucene.analysis.standard;
public interface StandardTokenizerConstants {
int EOF = 0;
int ALPHANUM = 1;
int APOSTROPHE = 2;
int ACRONYM = 3;
int COMPANY = 4;
int EMAIL = 5;
int HOST = 6;
int NUM = 7;
int P = 8;
int HAS_DIGIT = 9;
int ALPHA = 10;
int LETTER = 11;
int DIGIT = 12;
int NOISE = 13;
int DEFAULT = 0;
String[] tokenImage = {
"<EOF>",
"<ALPHANUM>",
"<APOSTROPHE>",
"<ACRONYM>",
"<COMPANY>",
"<EMAIL>",
"<HOST>",
"<NUM>",
"<P>",
"<HAS_DIGIT>",
"<ALPHA>",
"<LETTER>",
"<DIGIT>",
"<NOISE>",
};
}
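These type strings are exactly what StandardTokenizer.next() assigns via tokenImage[token.kind], so downstream code can compare against them. A tiny sketch (the ACRONYM_TYPE name is my own, and it assumes the analysis Token's type() accessor):

import org.apache.lucene.analysis.standard.StandardTokenizerConstants;

public class TypeCheckSketch {
  // the type string StandardTokenizer.next() gives acronyms such as "I.B.M."
  private static final String ACRONYM_TYPE =
      StandardTokenizerConstants.tokenImage[StandardTokenizerConstants.ACRONYM];

  static boolean isAcronym(org.apache.lucene.analysis.Token t) {
    return t.type().equals(ACRONYM_TYPE);  // type() carries the tokenImage entry
  }
}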

File diff suppressed because it is too large


@ -0,0 +1,81 @@
/* Generated By:JavaCC: Do not edit this line. Token.java Version 3.0 */
package org.apache.lucene.analysis.standard;
/**
* Describes the input token stream.
*/
public class Token {
/**
* An integer that describes the kind of this token. This numbering
* system is determined by JavaCCParser, and a table of these numbers is
* stored in the file ...Constants.java.
*/
public int kind;
/**
* beginLine and beginColumn describe the position of the first character
* of this token; endLine and endColumn describe the position of the
* last character of this token.
*/
public int beginLine, beginColumn, endLine, endColumn;
/**
* The string image of the token.
*/
public String image;
/**
* A reference to the next regular (non-special) token from the input
* stream. If this is the last token from the input stream, or if the
* token manager has not read tokens beyond this one, this field is
* set to null. This is true only if this token is also a regular
* token. Otherwise, see below for a description of the contents of
* this field.
*/
public Token next;
/**
* This field is used to access special tokens that occur prior to this
* token, but after the immediately preceding regular (non-special) token.
* If there are no such special tokens, this field is set to null.
* When there is more than one such special token, this field refers
* to the last of these special tokens, which in turn refers to the next
* previous special token through its specialToken field, and so on
* until the first special token (whose specialToken field is null).
* The next fields of special tokens refer to other special tokens that
* immediately follow it (without an intervening regular token). If there
* is no such token, this field is null.
*/
public Token specialToken;
/**
* Returns the image.
*/
public String toString()
{
return image;
}
/**
* Returns a new Token object, by default. However, if you want, you
* can create and return subclass objects based on the value of ofKind.
* Simply add the cases to the switch for all those special cases.
* For example, if you have a subclass of Token called IDToken that
* you want to create if ofKind is ID, simply add something like :
*
* case MyParserConstants.ID : return new IDToken();
*
* to the following switch statement. Then you can cast matchedToken
* variable to the appropriate type and use it in your lexical actions.
*/
public static final Token newToken(int ofKind)
{
switch(ofKind)
{
default : return new Token();
}
}
}
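/*
 * Illustrative sketch -- not part of this commit. It spells out the subclassing
 * pattern described in the newToken() javadoc above. The IDToken class and the
 * ID token kind mentioned there are hypothetical; the generated StandardTokenizer
 * grammar does not define them.
 */
package org.apache.lucene.analysis.standard;
class IDToken extends Token {
  // Extra per-token state a subclass might carry for use in lexical actions.
  boolean isKeyword = false;
}
// With such a subclass in place, newToken() above would gain a case such as
//
//   case MyParserConstants.ID : return new IDToken();
//
// before the default branch, so matchedToken can be cast to IDToken in lexical actions.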


@@ -0,0 +1,133 @@
/* Generated By:JavaCC: Do not edit this line. TokenMgrError.java Version 3.0 */
package org.apache.lucene.analysis.standard;
public class TokenMgrError extends Error
{
/*
* Ordinals for various reasons why an Error of this type can be thrown.
*/
/**
* Lexical error occurred.
*/
static final int LEXICAL_ERROR = 0;
/**
* An attempt was made to create a second instance of a static token manager.
*/
static final int STATIC_LEXER_ERROR = 1;
/**
* Tried to change to an invalid lexical state.
*/
static final int INVALID_LEXICAL_STATE = 2;
/**
* Detected (and bailed out of) an infinite loop in the token manager.
*/
static final int LOOP_DETECTED = 3;
/**
* Indicates the reason why the exception is thrown. It will have
* one of the above 4 values.
*/
int errorCode;
/**
* Replaces unprintable characters by their escaped (or unicode escaped)
* equivalents in the given string.
*/
protected static final String addEscapes(String str) {
StringBuffer retval = new StringBuffer();
char ch;
for (int i = 0; i < str.length(); i++) {
switch (str.charAt(i))
{
case 0 :
continue;
case '\b':
retval.append("\\b");
continue;
case '\t':
retval.append("\\t");
continue;
case '\n':
retval.append("\\n");
continue;
case '\f':
retval.append("\\f");
continue;
case '\r':
retval.append("\\r");
continue;
case '\"':
retval.append("\\\"");
continue;
case '\'':
retval.append("\\\'");
continue;
case '\\':
retval.append("\\\\");
continue;
default:
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
String s = "0000" + Integer.toString(ch, 16);
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
} else {
retval.append(ch);
}
continue;
}
}
return retval.toString();
}
/**
* Returns a detailed message for the Error when it is thrown by the
* token manager to indicate a lexical error.
* Parameters :
* EOFSeen : indicates if EOF caused the lexical error
* curLexState : lexical state in which this error occurred
* errorLine : line number when the error occurred
* errorColumn : column number when the error occurred
* errorAfter : prefix that was seen before this error occurred
* curchar : the offending character
* Note: You can customize the lexical error message by modifying this method.
*/
protected static String LexicalError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar) {
return("Lexical error at line " +
errorLine + ", column " +
errorColumn + ". Encountered: " +
(EOFSeen ? "<EOF> " : ("\"" + addEscapes(String.valueOf(curChar)) + "\"") + " (" + (int)curChar + "), ") +
"after : \"" + addEscapes(errorAfter) + "\"");
}
/**
* You can also modify the body of this method to customize your error messages.
* For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not
* of end-users' concern, so you can return something like :
*
* "Internal Error : Please file a bug report .... "
*
* from this method for such cases in the release version of your parser.
*/
public String getMessage() {
return super.getMessage();
}
/*
* Constructors of various flavors follow.
*/
public TokenMgrError() {
}
public TokenMgrError(String message, int reason) {
super(message);
errorCode = reason;
}
public TokenMgrError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar, int reason) {
this(LexicalError(EOFSeen, lexState, errorLine, errorColumn, errorAfter, curChar), reason);
}
}
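/*
 * Illustrative sketch -- not part of this commit. It demonstrates what the
 * addEscapes() helper above produces for a string containing a control character
 * and a non-ASCII character. The class name EscapeDemo is hypothetical; it is
 * placed in the same package so it can reach the protected static helper.
 */
package org.apache.lucene.analysis.standard;
class EscapeDemo {
  public static void main(String[] args) {
    // The tab and the character '\u00e9' are rewritten, printable ASCII is kept.
    // Prints: a\tb\u00e9c
    System.out.println(TokenMgrError.addEscapes("a\tb\u00e9c"));
  }
}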


@@ -1,6 +0,0 @@
QueryParser.java
TokenMgrError.java
ParseException.java
Token.java
TokenManager.java
QueryParserConstants.java


@@ -0,0 +1,110 @@
/* Generated By:JavaCC: Do not edit this line. CharStream.java Version 3.0 */
package org.apache.lucene.queryParser;
/**
* This interface describes a character stream that maintains line and
* column number positions of the characters. It also has the capability
* to backup the stream to some extent. An implementation of this
* interface is used in the TokenManager implementation generated by
* JavaCCParser.
*
* All the methods except backup can be implemented in any fashion. backup
* needs to be implemented correctly for the correct operation of the lexer.
* The rest of the methods are used to get information such as the line number,
* column number, and the String that constitutes a token; they are not used
* by the lexer, so their implementation won't affect the generated lexer's
* operation.
*/
public interface CharStream {
/**
* Returns the next character from the selected input. The method
* of selecting the input is the responsibility of the class
* implementing this interface. Can throw any java.io.IOException.
*/
char readChar() throws java.io.IOException;
/**
* Returns the column position of the character last read.
* @deprecated
* @see #getEndColumn
*/
int getColumn();
/**
* Returns the line number of the character last read.
* @deprecated
* @see #getEndLine
*/
int getLine();
/**
* Returns the column number of the last character for the current token (being
* matched after the last call to BeginToken).
*/
int getEndColumn();
/**
* Returns the line number of the last character for the current token (being
* matched after the last call to BeginToken).
*/
int getEndLine();
/**
* Returns the column number of the first character for the current token (being
* matched after the last call to BeginToken).
*/
int getBeginColumn();
/**
* Returns the line number of the first character for the current token (being
* matched after the last call to BeginToken).
*/
int getBeginLine();
/**
* Backs up the input stream by amount steps. Lexer calls this method if it
* had already read some characters, but could not use them to match a
* (longer) token. So, they will be used again as the prefix of the next
* token and it is the implementation's responsibility to do this right.
*/
void backup(int amount);
/**
* Returns the next character that marks the beginning of the next token.
* All characters must remain in the buffer between two successive calls
* to this method to implement backup correctly.
*/
char BeginToken() throws java.io.IOException;
/**
* Returns a string made up of characters from the marked token beginning
* to the current buffer position. Implementations have the choice of returning
* anything that they want to. For example, for efficiency, one might decide
* to just return null, which is a valid implementation.
*/
String GetImage();
/**
* Returns an array of characters that make up the suffix of length 'len' for
* the currently matched token. This is used to build up the matched string
* for use in actions in the case of MORE. A simple and inefficient
* implementation of this is as follows :
*
* {
* String t = GetImage();
* return t.substring(t.length() - len, t.length()).toCharArray();
* }
*/
char[] GetSuffix(int len);
/**
* The lexer calls this function to indicate that it is done with the stream
* and hence implementations can free any resources held by this class.
* Again, the body of this function can be just empty and it will not
* affect the lexer's operation.
*/
void Done();
}
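/*
 * Illustrative sketch -- not part of this commit. It shows one minimal way to
 * satisfy the CharStream contract described above, backed by a single in-memory
 * String. Line/column tracking is omitted; the class name StringCharStream is
 * hypothetical. Real implementations also track positions for error reporting.
 */
package org.apache.lucene.queryParser;
class StringCharStream implements CharStream {
  private final String input;   // entire input, kept so backup() can rewind
  private int pos = 0;          // index of the next character to return
  private int tokenBegin = 0;   // index where the current token started
  StringCharStream(String input) { this.input = input; }
  public char readChar() throws java.io.IOException {
    if (pos >= input.length()) throw new java.io.IOException("read past end of input");
    return input.charAt(pos++);
  }
  public char BeginToken() throws java.io.IOException {
    tokenBegin = pos;           // mark the start of the next token
    return readChar();
  }
  public void backup(int amount) { pos -= amount; }  // un-read characters
  public String GetImage() { return input.substring(tokenBegin, pos); }
  public char[] GetSuffix(int len) {
    return input.substring(pos - len, pos).toCharArray();
  }
  public void Done() {}
  // Position tracking is not implemented in this sketch.
  public int getColumn() { return -1; }
  public int getLine() { return -1; }
  public int getEndColumn() { return -1; }
  public int getEndLine() { return -1; }
  public int getBeginColumn() { return -1; }
  public int getBeginLine() { return -1; }
}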


@@ -0,0 +1,192 @@
/* Generated By:JavaCC: Do not edit this line. ParseException.java Version 3.0 */
package org.apache.lucene.queryParser;
/**
* This exception is thrown when parse errors are encountered.
* You can explicitly create objects of this exception type by
* calling the method generateParseException in the generated
* parser.
*
* You can modify this class to customize your error reporting
* mechanisms so long as you retain the public fields.
*/
public class ParseException extends Exception {
/**
* This constructor is used by the method "generateParseException"
* in the generated parser. Calling this constructor generates
* a new object of this type with the fields "currentToken",
* "expectedTokenSequences", and "tokenImage" set. The boolean
* flag "specialConstructor" is also set to true to indicate that
* this constructor was used to create this object.
* This constructor calls its super class with the empty string
* to force the "toString" method of parent class "Throwable" to
* print the error message in the form:
* ParseException: <result of getMessage>
*/
public ParseException(Token currentTokenVal,
int[][] expectedTokenSequencesVal,
String[] tokenImageVal
)
{
super("");
specialConstructor = true;
currentToken = currentTokenVal;
expectedTokenSequences = expectedTokenSequencesVal;
tokenImage = tokenImageVal;
}
/**
* The following constructors are for use by you for whatever
* purpose you can think of. Constructing the exception in this
* manner makes the exception behave in the normal way - i.e., as
* documented in the class "Throwable". The fields "errorToken",
* "expectedTokenSequences", and "tokenImage" do not contain
* relevant information. The JavaCC generated code does not use
* these constructors.
*/
public ParseException() {
super();
specialConstructor = false;
}
public ParseException(String message) {
super(message);
specialConstructor = false;
}
/**
* This variable determines which constructor was used to create
* this object and thereby affects the semantics of the
* "getMessage" method (see below).
*/
protected boolean specialConstructor;
/**
* This is the last token that has been consumed successfully. If
* this object has been created due to a parse error, the token
* following this token will (therefore) be the first error token.
*/
public Token currentToken;
/**
* Each entry in this array is an array of integers. Each array
* of integers represents a sequence of tokens (by their ordinal
* values) that is expected at this point of the parse.
*/
public int[][] expectedTokenSequences;
/**
* This is a reference to the "tokenImage" array of the generated
* parser within which the parse error occurred. This array is
* defined in the generated ...Constants interface.
*/
public String[] tokenImage;
/**
* This method has the standard behavior when this object has been
* created using the standard constructors. Otherwise, it uses
* "currentToken" and "expectedTokenSequences" to generate a parse
* error message and returns it. If this object has been created
* due to a parse error, and you do not catch it (it gets thrown
* from the parser), then this method is called during the printing
* of the final stack trace, and hence the correct error message
* gets displayed.
*/
public String getMessage() {
if (!specialConstructor) {
return super.getMessage();
}
String expected = "";
int maxSize = 0;
for (int i = 0; i < expectedTokenSequences.length; i++) {
if (maxSize < expectedTokenSequences[i].length) {
maxSize = expectedTokenSequences[i].length;
}
for (int j = 0; j < expectedTokenSequences[i].length; j++) {
expected += tokenImage[expectedTokenSequences[i][j]] + " ";
}
if (expectedTokenSequences[i][expectedTokenSequences[i].length - 1] != 0) {
expected += "...";
}
expected += eol + " ";
}
String retval = "Encountered \"";
Token tok = currentToken.next;
for (int i = 0; i < maxSize; i++) {
if (i != 0) retval += " ";
if (tok.kind == 0) {
retval += tokenImage[0];
break;
}
retval += add_escapes(tok.image);
tok = tok.next;
}
retval += "\" at line " + currentToken.next.beginLine + ", column " + currentToken.next.beginColumn;
retval += "." + eol;
if (expectedTokenSequences.length == 1) {
retval += "Was expecting:" + eol + " ";
} else {
retval += "Was expecting one of:" + eol + " ";
}
retval += expected;
return retval;
}
/**
* The end of line string for this machine.
*/
protected String eol = System.getProperty("line.separator", "\n");
/**
* Used to convert raw characters to their escaped versions
* when these raw versions cannot be used as part of an ASCII
* string literal.
*/
protected String add_escapes(String str) {
StringBuffer retval = new StringBuffer();
char ch;
for (int i = 0; i < str.length(); i++) {
switch (str.charAt(i))
{
case 0 :
continue;
case '\b':
retval.append("\\b");
continue;
case '\t':
retval.append("\\t");
continue;
case '\n':
retval.append("\\n");
continue;
case '\f':
retval.append("\\f");
continue;
case '\r':
retval.append("\\r");
continue;
case '\"':
retval.append("\\\"");
continue;
case '\'':
retval.append("\\\'");
continue;
case '\\':
retval.append("\\\\");
continue;
default:
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
String s = "0000" + Integer.toString(ch, 16);
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
} else {
retval.append(ch);
}
continue;
}
}
return retval.toString();
}
}
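/*
 * Illustrative sketch -- not part of this commit. It shows how a caller typically
 * sees this exception: the query parser's parse method declares ParseException,
 * and getMessage() formats the expected-token information collected by
 * generateParseException(). QueryParser itself is in the suppressed diff below,
 * so its static parse(String, String, Analyzer) convenience method is assumed
 * here; the field name "contents" and the class name ParseDemo are arbitrary.
 */
package org.apache.lucene.queryParser;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
class ParseDemo {
  public static void main(String[] args) {
    try {
      // Unfinished boolean clause: the parser hits <EOF> where a term is expected.
      QueryParser.parse("title:(foo AND", "contents", new StandardAnalyzer());
    } catch (ParseException e) {
      // e.currentToken is the last successfully consumed token; the message
      // lists the token kinds that were expected next.
      System.err.println(e.getMessage());
    }
  }
}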

File diff suppressed because it is too large


@@ -0,0 +1,80 @@
/* Generated By:JavaCC: Do not edit this line. QueryParserConstants.java */
package org.apache.lucene.queryParser;
public interface QueryParserConstants {
int EOF = 0;
int _NUM_CHAR = 1;
int _ESCAPED_CHAR = 2;
int _TERM_START_CHAR = 3;
int _TERM_CHAR = 4;
int _WHITESPACE = 5;
int AND = 7;
int OR = 8;
int NOT = 9;
int PLUS = 10;
int MINUS = 11;
int LPAREN = 12;
int RPAREN = 13;
int COLON = 14;
int CARAT = 15;
int QUOTED = 16;
int TERM = 17;
int FUZZY = 18;
int SLOP = 19;
int PREFIXTERM = 20;
int WILDTERM = 21;
int RANGEIN_START = 22;
int RANGEEX_START = 23;
int NUMBER = 24;
int RANGEIN_TO = 25;
int RANGEIN_END = 26;
int RANGEIN_QUOTED = 27;
int RANGEIN_GOOP = 28;
int RANGEEX_TO = 29;
int RANGEEX_END = 30;
int RANGEEX_QUOTED = 31;
int RANGEEX_GOOP = 32;
int Boost = 0;
int RangeEx = 1;
int RangeIn = 2;
int DEFAULT = 3;
String[] tokenImage = {
"<EOF>",
"<_NUM_CHAR>",
"<_ESCAPED_CHAR>",
"<_TERM_START_CHAR>",
"<_TERM_CHAR>",
"<_WHITESPACE>",
"<token of kind 6>",
"<AND>",
"<OR>",
"<NOT>",
"\"+\"",
"\"-\"",
"\"(\"",
"\")\"",
"\":\"",
"\"^\"",
"<QUOTED>",
"<TERM>",
"\"~\"",
"<SLOP>",
"<PREFIXTERM>",
"<WILDTERM>",
"\"[\"",
"\"{\"",
"<NUMBER>",
"\"TO\"",
"\"]\"",
"<RANGEIN_QUOTED>",
"<RANGEIN_GOOP>",
"\"TO\"",
"\"}\"",
"<RANGEEX_QUOTED>",
"<RANGEEX_GOOP>",
};
}

File diff suppressed because it is too large


@@ -0,0 +1,81 @@
/* Generated By:JavaCC: Do not edit this line. Token.java Version 3.0 */
package org.apache.lucene.queryParser;
/**
* Describes the input token stream.
*/
public class Token {
/**
* An integer that describes the kind of this token. This numbering
* system is determined by JavaCCParser, and a table of these numbers is
* stored in the file ...Constants.java.
*/
public int kind;
/**
* beginLine and beginColumn describe the position of the first character
* of this token; endLine and endColumn describe the position of the
* last character of this token.
*/
public int beginLine, beginColumn, endLine, endColumn;
/**
* The string image of the token.
*/
public String image;
/**
* A reference to the next regular (non-special) token from the input
* stream. If this is the last token from the input stream, or if the
* token manager has not read tokens beyond this one, this field is
* set to null. This is true only if this token is also a regular
* token. Otherwise, see below for a description of the contents of
* this field.
*/
public Token next;
/**
* This field is used to access special tokens that occur prior to this
* token, but after the immediately preceding regular (non-special) token.
* If there are no such special tokens, this field is set to null.
* When there is more than one such special token, this field refers
* to the last of these special tokens, which in turn refers to the next
* previous special token through its specialToken field, and so on
* until the first special token (whose specialToken field is null).
* The next fields of special tokens refer to other special tokens that
* immediately follow it (without an intervening regular token). If there
* is no such token, this field is null.
*/
public Token specialToken;
/**
* Returns the image.
*/
public String toString()
{
return image;
}
/**
* Returns a new Token object, by default. However, if you want, you
* can create and return subclass objects based on the value of ofKind.
* Simply add the cases to the switch for all those special cases.
* For example, if you have a subclass of Token called IDToken that
* you want to create if ofKind is ID, simply add something like :
*
* case MyParserConstants.ID : return new IDToken();
*
* to the following switch statement. Then you can cast matchedToken
* variable to the appropriate type and use it in your lexical actions.
*/
public static final Token newToken(int ofKind)
{
switch(ofKind)
{
default : return new Token();
}
}
}


@@ -0,0 +1,133 @@
/* Generated By:JavaCC: Do not edit this line. TokenMgrError.java Version 3.0 */
package org.apache.lucene.queryParser;
public class TokenMgrError extends Error
{
/*
* Ordinals for various reasons why an Error of this type can be thrown.
*/
/**
* Lexical error occurred.
*/
static final int LEXICAL_ERROR = 0;
/**
* An attempt was made to create a second instance of a static token manager.
*/
static final int STATIC_LEXER_ERROR = 1;
/**
* Tried to change to an invalid lexical state.
*/
static final int INVALID_LEXICAL_STATE = 2;
/**
* Detected (and bailed out of) an infinite loop in the token manager.
*/
static final int LOOP_DETECTED = 3;
/**
* Indicates the reason why the exception is thrown. It will have
* one of the above 4 values.
*/
int errorCode;
/**
* Replaces unprintable characters by their escaped (or unicode escaped)
* equivalents in the given string.
*/
protected static final String addEscapes(String str) {
StringBuffer retval = new StringBuffer();
char ch;
for (int i = 0; i < str.length(); i++) {
switch (str.charAt(i))
{
case 0 :
continue;
case '\b':
retval.append("\\b");
continue;
case '\t':
retval.append("\\t");
continue;
case '\n':
retval.append("\\n");
continue;
case '\f':
retval.append("\\f");
continue;
case '\r':
retval.append("\\r");
continue;
case '\"':
retval.append("\\\"");
continue;
case '\'':
retval.append("\\\'");
continue;
case '\\':
retval.append("\\\\");
continue;
default:
if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
String s = "0000" + Integer.toString(ch, 16);
retval.append("\\u" + s.substring(s.length() - 4, s.length()));
} else {
retval.append(ch);
}
continue;
}
}
return retval.toString();
}
/**
* Returns a detailed message for the Error when it is thrown by the
* token manager to indicate a lexical error.
* Parameters :
* EOFSeen : indicates if EOF caused the lexical error
* curLexState : lexical state in which this error occurred
* errorLine : line number when the error occurred
* errorColumn : column number when the error occurred
* errorAfter : prefix that was seen before this error occurred
* curchar : the offending character
* Note: You can customize the lexical error message by modifying this method.
*/
protected static String LexicalError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar) {
return("Lexical error at line " +
errorLine + ", column " +
errorColumn + ". Encountered: " +
(EOFSeen ? "<EOF> " : ("\"" + addEscapes(String.valueOf(curChar)) + "\"") + " (" + (int)curChar + "), ") +
"after : \"" + addEscapes(errorAfter) + "\"");
}
/**
* You can also modify the body of this method to customize your error messages.
* For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not
* of end-users' concern, so you can return something like :
*
* "Internal Error : Please file a bug report .... "
*
* from this method for such cases in the release version of your parser.
*/
public String getMessage() {
return super.getMessage();
}
/*
* Constructors of various flavors follow.
*/
public TokenMgrError() {
}
public TokenMgrError(String message, int reason) {
super(message);
errorCode = reason;
}
public TokenMgrError(boolean EOFSeen, int lexState, int errorLine, int errorColumn, String errorAfter, char curChar, int reason) {
this(LexicalError(EOFSeen, lexState, errorLine, errorColumn, errorAfter, curChar), reason);
}
}