LUCENE-6803: Deprecate sandbox Regexp Query

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1707884 13f79535-47bb-0310-9956-ffa450edef68
Uwe Schindler 2015-10-10 15:09:11 +00:00
parent 4edfc729df
commit bbcea96908
17 changed files with 2 additions and 1095 deletions

View File

@ -97,6 +97,8 @@ API Changes
spatial Filters are now subclass Query. The spatial heatmap/facet API
now accepts a Bits parameter to filter counts. (David Smiley, Adrien Grand)
* LUCENE-6803: Deprecate sandbox Regexp Query. (Uwe Schindler)
Optimizations
* LUCENE-6708: TopFieldCollector does not compute the score several times on the

View File

@ -73,7 +73,6 @@ com.sun.jersey.version = 1.9
/hsqldb/hsqldb = 1.8.0.10
/io.airlift/slice = 0.10
/io.netty/netty = 3.7.0.Final
/jakarta-regexp/jakarta-regexp = 1.4
/javax.activation/activation = 1.1.1
/javax.inject/javax.inject= 1
/javax.servlet/javax.servlet-api = 3.1.0

View File

@ -1 +0,0 @@
0ea514a179ac1dd7e81c7e6594468b9b9910d298

View File

@ -1,201 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@ -1,10 +0,0 @@
Apache Regexp
Copyright 2001-2007 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
It consists of voluntary contributions made by many individuals
on behalf of the Apache Software Foundation. Please visit the
project homepage (http://jakarta.apache.org/regexp) for more
information.

View File

@ -23,9 +23,4 @@
<import file="../module-build.xml"/>
<path id="classpath">
<fileset dir="lib"/>
<path refid="base.classpath"/>
</path>
</project>

View File

@ -18,11 +18,4 @@
-->
<ivy-module version="2.0">
<info organisation="org.apache.lucene" module="sandbox"/>
<configurations defaultconfmapping="compile->master">
<conf name="compile" transitive="false"/>
</configurations>
<dependencies>
<dependency org="jakarta-regexp" name="jakarta-regexp" rev="${/jakarta-regexp/jakarta-regexp}" conf="compile"/>
<exclude org="*" ext="*" matcher="regexp" type="${ivy.exclude.types}"/>
</dependencies>
</ivy-module>

View File

@ -1,175 +0,0 @@
package org.apache.lucene.sandbox.queries.regex;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.CharsRefBuilder;
import org.apache.lucene.util.SuppressForbidden;
import org.apache.regexp.CharacterIterator;
import org.apache.regexp.RE;
import org.apache.regexp.REProgram;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
/**
* Implementation tying <a href="http://jakarta.apache.org/regexp">Jakarta
* Regexp</a> to RegexQuery. Jakarta Regexp internally supports a
* {@link RegexCapabilities.RegexMatcher#prefix()} implementation which can offer
* performance gains under certain circumstances. Yet, the implementation appears
* to be rather shaky as it doesn't always provide a prefix even if one would exist.
*/
public class JakartaRegexpCapabilities implements RegexCapabilities {
private static Field prefixField;
private static Method getPrefixMethod;
static {
initClass();
}
@SuppressForbidden(reason = "TODO: Remove this class completely and also the hack around setAccessible!")
private static void initClass() {
try {
getPrefixMethod = REProgram.class.getMethod("getPrefix");
} catch (Exception e) {
getPrefixMethod = null;
}
try {
prefixField = REProgram.class.getDeclaredField("prefix");
prefixField.setAccessible(true);
} catch (Exception e) {
prefixField = null;
}
}
// Define the flags that are possible. Redefine them here
// to avoid exposing the RE class to the caller.
private int flags = RE.MATCH_NORMAL;
/**
* Flag to specify normal, case-sensitive matching behaviour. This is the default.
*/
public static final int FLAG_MATCH_NORMAL = RE.MATCH_NORMAL;
/**
* Flag to specify that matching should be case-independent (folded)
*/
public static final int FLAG_MATCH_CASEINDEPENDENT = RE.MATCH_CASEINDEPENDENT;
/**
* Constructs a RegexCapabilities with the default MATCH_NORMAL match style.
*/
public JakartaRegexpCapabilities() {}
/**
* Constructs a RegexCapabilities with the provided match flags.
* Multiple flags should be ORed together.
*
* @param flags The matching style
*/
public JakartaRegexpCapabilities(int flags) {
this.flags = flags;
}
@Override
public RegexCapabilities.RegexMatcher compile(String regex) {
return new JakartaRegexMatcher(regex, flags);
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + flags;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
JakartaRegexpCapabilities other = (JakartaRegexpCapabilities) obj;
return flags == other.flags;
}
class JakartaRegexMatcher implements RegexCapabilities.RegexMatcher {
private RE regexp;
private final CharsRefBuilder utf16 = new CharsRefBuilder();
private final CharacterIterator utf16wrapper = new CharacterIterator() {
@Override
public char charAt(int pos) {
return utf16.charAt(pos);
}
@Override
public boolean isEnd(int pos) {
return pos >= utf16.length();
}
@Override
public String substring(int beginIndex) {
return substring(beginIndex, utf16.length());
}
@Override
public String substring(int beginIndex, int endIndex) {
return new String(utf16.chars(), beginIndex, endIndex - beginIndex);
}
};
public JakartaRegexMatcher(String regex, int flags) {
regexp = new RE(regex, flags);
}
@Override
public boolean match(BytesRef term) {
utf16.copyUTF8Bytes(term);
return regexp.match(utf16wrapper, 0);
}
@Override
public String prefix() {
try {
final char[] prefix;
if (getPrefixMethod != null) {
prefix = (char[]) getPrefixMethod.invoke(regexp.getProgram());
} else if (prefixField != null) {
prefix = (char[]) prefixField.get(regexp.getProgram());
} else {
return null;
}
return prefix == null ? null : new String(prefix);
} catch (Exception e) {
// if we cannot get the prefix, return none
return null;
}
}
}
}
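
The static initializer in the class above first looks for a public `getPrefix()` accessor and, failing that, falls back to the private `prefix` field. Below is a minimal, self-contained sketch of that probe-then-fallback pattern. `Program` is a hypothetical stand-in for `REProgram`, not the Jakarta Regexp class, and `readPrefix` is an illustrative helper name that does not appear in the removed code.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ReflectiveFallbackSketch {

  /** Hypothetical stand-in for REProgram: a private field and no public accessor. */
  public static class Program {
    private final char[] prefix;

    public Program(char[] prefix) {
      this.prefix = prefix;
    }
  }

  /** Reads the prefix through a public getter if one exists, else through the private field. */
  public static String readPrefix(Object program) {
    try {
      // Preferred path: a public, no-argument getPrefix() method.
      Method getter = program.getClass().getMethod("getPrefix");
      char[] p = (char[]) getter.invoke(program);
      return p == null ? null : new String(p);
    } catch (ReflectiveOperationException | RuntimeException e) {
      // No usable accessor; fall through to the field lookup.
    }
    try {
      // Fallback path: open the private field reflectively.
      Field field = program.getClass().getDeclaredField("prefix");
      field.setAccessible(true);
      char[] p = (char[]) field.get(program);
      return p == null ? null : new String(p);
    } catch (ReflectiveOperationException | RuntimeException e) {
      // Neither path worked: report "no prefix" rather than failing.
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(readPrefix(new Program("luc".toCharArray())));
    System.out.println(readPrefix("an object without a prefix field"));
  }
}
```

Returning null when both lookups fail mirrors the removed `prefix()` method, which treats any reflection failure as "no prefix available".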

View File

@ -1,125 +0,0 @@
package org.apache.lucene.sandbox.queries.regex;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.CharsRefBuilder;
/**
* An implementation tying Java's built-in java.util.regex to RegexQuery.
*
* Note that because this implementation currently only returns null from
* {@link RegexCapabilities.RegexMatcher#prefix()} that queries using this implementation
* will enumerate and attempt to {@link RegexCapabilities.RegexMatcher#match(BytesRef)} each
* term for the specified field in the index.
*/
public class JavaUtilRegexCapabilities implements RegexCapabilities {
private int flags = 0;
// Define the optional flags from Pattern that can be used.
// Do this here to keep Pattern contained within this class.
public static final int FLAG_CANON_EQ = Pattern.CANON_EQ;
public static final int FLAG_CASE_INSENSITIVE = Pattern.CASE_INSENSITIVE;
public static final int FLAG_COMMENTS = Pattern.COMMENTS;
public static final int FLAG_DOTALL = Pattern.DOTALL;
public static final int FLAG_LITERAL = Pattern.LITERAL;
public static final int FLAG_MULTILINE = Pattern.MULTILINE;
public static final int FLAG_UNICODE_CASE = Pattern.UNICODE_CASE;
public static final int FLAG_UNIX_LINES = Pattern.UNIX_LINES;
/**
* Default constructor that uses java.util.regex.Pattern
* with its default flags.
*/
public JavaUtilRegexCapabilities() {
this.flags = 0;
}
/**
* Constructor that allows for the modification of the flags that
* the java.util.regex.Pattern will use to compile the regular expression.
* This gives the user the ability to fine-tune how the regular expression
* is matched to suit the functionality that they need.
* The {@link java.util.regex.Pattern Pattern} class supports specifying
* these fields via the regular expression text itself, but this gives the caller
* another option to modify the behavior. Useful in cases where the regular expression text
* cannot be modified, or if doing so is undesired.
*
* @param flags The flags that are ORed together.
*/
public JavaUtilRegexCapabilities(int flags) {
this.flags = flags;
}
@Override
public RegexCapabilities.RegexMatcher compile(String regex) {
return new JavaUtilRegexMatcher(regex, flags);
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + flags;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
JavaUtilRegexCapabilities other = (JavaUtilRegexCapabilities) obj;
return flags == other.flags;
}
class JavaUtilRegexMatcher implements RegexCapabilities.RegexMatcher {
private final Pattern pattern;
private final Matcher matcher;
private final CharsRefBuilder utf16 = new CharsRefBuilder();
public JavaUtilRegexMatcher(String regex, int flags) {
this.pattern = Pattern.compile(regex, flags);
this.matcher = this.pattern.matcher(utf16.get());
}
@Override
public boolean match(BytesRef term) {
utf16.copyUTF8Bytes(term);
utf16.get();
return matcher.reset().matches();
}
@Override
public String prefix() {
return null;
}
}
}
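
The inner matcher class above binds one `Matcher` to a reusable character buffer and calls `reset()` before each `matches()`, so no new `Matcher` is allocated per term. A rough sketch of the same idea using only the JDK follows. It assumes a `StringBuilder` in place of Lucene's `CharsRefBuilder`, and the class name is hypothetical. It relies on `reset()` picking up the buffer's current length, so the buffer must not change between `reset()` and the end of the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReusedMatcherSketch {
  private final StringBuilder buffer = new StringBuilder();
  private final Matcher matcher;

  public ReusedMatcherSketch(String regex, int flags) {
    // Bind the matcher to the mutable buffer once, at construction time.
    this.matcher = Pattern.compile(regex, flags).matcher(buffer);
  }

  /** Tests whether the whole term matches, reusing the same Matcher instance. */
  public boolean match(CharSequence term) {
    buffer.setLength(0);
    buffer.append(term);
    // reset() discards previous state and re-reads the input length.
    return matcher.reset().matches();
  }

  public static void main(String[] args) {
    ReusedMatcherSketch m = new ReusedMatcherSketch("luc[e]?ne", Pattern.CASE_INSENSITIVE);
    System.out.println(m.match("lucene"));
    System.out.println(m.match("LUCNE"));
    System.out.println(m.match("lucenes"));
  }
}
```

As in the removed class, there is no prefix to narrow the scan here, so a caller would still have to test every term of the field.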

View File

@ -1,64 +0,0 @@
package org.apache.lucene.sandbox.queries.regex;
import org.apache.lucene.util.BytesRef;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Defines basic operations needed by {@link RegexQuery} for a regular
* expression implementation.
*/
public interface RegexCapabilities {
/**
* Called by the constructor of {@link RegexTermsEnum} allowing
* implementations to cache a compiled version of the regular
* expression pattern.
*
* @param pattern regular expression pattern
*/
public RegexMatcher compile(String pattern);
/**
* Interface for basic regex matching.
* <p>
* Implementations return true for {@link #match} if the term
* matches the regex.
* <p>
* Implementing {@link #prefix()} can restrict the TermsEnum to only
* a subset of terms when the regular expression matches a constant
* prefix.
* <p>
* NOTE: implementations cannot seek.
*/
public interface RegexMatcher {
/**
*
* @param term The term in bytes.
* @return true if string matches the pattern last passed to {@link #compile}.
*/
public boolean match(BytesRef term);
/**
* A wise prefix implementation can reduce the term enumeration (and thus increase performance)
* of RegexQuery dramatically!
*
* @return static non-regex prefix of the pattern last passed to {@link #compile}. May return null.
*/
public String prefix();
}
}

View File

@ -1,120 +0,0 @@
package org.apache.lucene.sandbox.queries.regex;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.index.FilteredTermsEnum;
import org.apache.lucene.search.RegexpQuery; // javadoc
import org.apache.lucene.index.Term;
import org.apache.lucene.index.Terms;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.ToStringUtils;
import java.io.IOException;
/** Implements the regular expression term search query.
* The expressions supported depend on the regular expression implementation
* used by way of the {@link RegexCapabilities} interface.
* <p>
* NOTE: You may wish to consider using the regex query support
* in {@link RegexpQuery} instead, as it has better performance.
*
* @see RegexTermsEnum
*/
public class RegexQuery extends MultiTermQuery implements RegexQueryCapable {
private RegexCapabilities regexImpl = new JavaUtilRegexCapabilities();
private Term term;
/** Constructs a query for terms matching <code>term</code>. */
public RegexQuery(Term term) {
super(term.field());
this.term = term;
}
public Term getTerm() {
return term;
}
@Override
public void setRegexImplementation(RegexCapabilities impl) {
this.regexImpl = impl;
}
@Override
public RegexCapabilities getRegexImplementation() {
return regexImpl;
}
@Override
protected FilteredTermsEnum getTermsEnum(Terms terms, AttributeSource atts) throws IOException {
return new RegexTermsEnum(terms.iterator(), term, regexImpl);
}
@Override
public String toString(String field) {
StringBuilder buffer = new StringBuilder();
if (!term.field().equals(field)) {
buffer.append(term.field());
buffer.append(":");
}
buffer.append(term.text());
return buffer.toString();
}
@Override
public int hashCode() {
final int prime = 31;
int result = super.hashCode();
result = prime * result + ((regexImpl == null) ? 0 : regexImpl.hashCode());
result = prime * result + ((term == null) ? 0 : term.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (!super.equals(obj)) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
RegexQuery other = (RegexQuery) obj;
if (regexImpl == null) {
if (other.regexImpl != null) {
return false;
}
} else if (!regexImpl.equals(other.regexImpl)) {
return false;
}
if (term == null) {
if (other.term != null) {
return false;
}
} else if (!term.equals(other.term)) {
return false;
}
return true;
}
}

View File

@ -1,36 +0,0 @@
package org.apache.lucene.sandbox.queries.regex;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Defines methods for regular expression supporting Querys to use.
*/
public interface RegexQueryCapable {
/**
* Defines which {@link RegexCapabilities} implementation is used by this instance.
* @see #getRegexImplementation()
*/
void setRegexImplementation(RegexCapabilities impl);
/**
* Returns the implementation used by this instance.
* @see #setRegexImplementation(RegexCapabilities)
*/
RegexCapabilities getRegexImplementation();
}

View File

@ -1,63 +0,0 @@
package org.apache.lucene.sandbox.queries.regex;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.index.FilteredTermsEnum;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.StringHelper;
/**
* Subclass of FilteredTermsEnum for enumerating all terms that match the
* specified regular expression term using the specified regular expression
* implementation.
* <p>
* Term enumerations are always ordered by Term.compareTo(). Each term in
* the enumeration is greater than all that precede it.
*/
public class RegexTermsEnum extends FilteredTermsEnum {
private RegexCapabilities.RegexMatcher regexImpl;
private final BytesRef prefixRef;
public RegexTermsEnum(TermsEnum tenum, Term term, RegexCapabilities regexCap) {
super(tenum);
String text = term.text();
this.regexImpl = regexCap.compile(text);
String pre = regexImpl.prefix();
if (pre == null) {
pre = "";
}
setInitialSeekTerm(prefixRef = new BytesRef(pre));
}
@Override
protected AcceptStatus accept(BytesRef term) {
if (StringHelper.startsWith(term, prefixRef)) {
// TODO: set BoostAttr based on distance of
// searchTerm.text() and term().text()
return regexImpl.match(term) ? AcceptStatus.YES : AcceptStatus.NO;
} else {
return AcceptStatus.NO;
}
}
}
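
The enum above seeks to the regex's constant prefix and then runs the full match only on terms that start with it. The sketch below illustrates that strategy with a sorted `TreeSet<String>` standing in for the term dictionary. The class and method names are hypothetical, and `java.util.regex` is used purely for illustration. One deliberate difference: `accept()` above returns `AcceptStatus.NO` for terms outside the prefix, whereas this sketch stops at the first such term, which is safe only because the terms are iterated in sorted order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class PrefixScanSketch {

  /** Seeks to the first term >= prefix, then regex-tests only terms sharing that prefix. */
  public static List<String> matchingTerms(TreeSet<String> terms, String prefix, String regex) {
    Pattern pattern = Pattern.compile(regex);
    List<String> hits = new ArrayList<>();
    // A null prefix means "no constant prefix known": scan from the first term.
    String start = (prefix == null) ? "" : prefix;
    for (String term : terms.tailSet(start, true)) {
      if (!term.startsWith(start)) {
        // Sorted order: once the prefix no longer matches, no later term can have it.
        break;
      }
      if (pattern.matcher(term).matches()) {
        hits.add(term);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    TreeSet<String> terms =
        new TreeSet<>(Arrays.asList("apple", "luc", "lucene", "lucid", "lucy", "zebra"));
    System.out.println(matchingTerms(terms, "luc", "luc(ene|y)"));
    System.out.println(matchingTerms(terms, null, "z.*"));
  }
}
```

With the prefix `luc`, only the four terms beginning with `luc` are regex-tested; with a null prefix every term is tested, which is the situation the `JavaUtilRegexCapabilities` javadoc describes.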

View File

@ -1,22 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Regular expression Query.
*/
package org.apache.lucene.sandbox.queries.regex;

View File

@ -1,47 +0,0 @@
package org.apache.lucene.sandbox.queries.regex;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.LuceneTestCase;
/**
* Testcase for {@link JakartaRegexpCapabilities}
*/
public class TestJakartaRegexpCapabilities extends LuceneTestCase {
public void testGetPrefix(){
JakartaRegexpCapabilities cap = new JakartaRegexpCapabilities();
RegexCapabilities.RegexMatcher matcher = cap.compile("luc[e]?");
assertTrue(matcher.match(new BytesRef("luce")));
assertEquals("luc", matcher.prefix());
matcher = cap.compile("lucene");
assertTrue(matcher.match(new BytesRef("lucene")));
assertEquals("lucene", matcher.prefix());
}
public void testShakyPrefix(){
JakartaRegexpCapabilities cap = new JakartaRegexpCapabilities();
RegexCapabilities.RegexMatcher matcher = cap.compile("(ab|ac)");
assertTrue(matcher.match(new BytesRef("ab")));
assertTrue(matcher.match(new BytesRef("ac")));
// TODO: unclear why prefix() returns null here rather than the common prefix "a"
assertNull(matcher.prefix());
}
}


@ -1,135 +0,0 @@
package org.apache.lucene.sandbox.queries.regex;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.document.Field;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.store.Directory;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.RandomIndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.Terms;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.LuceneTestCase;
public class TestRegexQuery extends LuceneTestCase {
private IndexSearcher searcher;
private IndexReader reader;
private Directory directory;
private final String FN = "field";
@Override
public void setUp() throws Exception {
super.setUp();
directory = newDirectory();
RandomIndexWriter writer = new RandomIndexWriter(random(), directory);
Document doc = new Document();
doc.add(newTextField(FN, "the quick brown fox jumps over the lazy dog", Field.Store.NO));
writer.addDocument(doc);
reader = writer.getReader();
writer.close();
searcher = newSearcher(reader);
}
@Override
public void tearDown() throws Exception {
reader.close();
directory.close();
super.tearDown();
}
private Term newTerm(String value) { return new Term(FN, value); }
private int regexQueryNrHits(String regex, RegexCapabilities capability) throws Exception {
RegexQuery query = new RegexQuery( newTerm(regex));
if ( capability != null )
query.setRegexImplementation(capability);
return searcher.search(query, 1000).totalHits;
}
private int spanRegexQueryNrHits(String regex1, String regex2, int slop, boolean ordered) throws Exception {
SpanQuery srq1 = new SpanMultiTermQueryWrapper<>(new RegexQuery(newTerm(regex1)));
SpanQuery srq2 = new SpanMultiTermQueryWrapper<>(new RegexQuery(newTerm(regex2)));
SpanNearQuery query = new SpanNearQuery( new SpanQuery[]{srq1, srq2}, slop, ordered);
return searcher.search(query, 1000).totalHits;
}
public void testMatchAll() throws Exception {
Terms terms = MultiFields.getTerms(searcher.getIndexReader(), FN);
TermsEnum te = new RegexQuery(new Term(FN, "jum.")).getTermsEnum(terms, new AttributeSource() /*dummy*/);
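// "jum." would have to match a whole term; the only candidate, "jumps", is one character too long, so no term should match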
assertNull(te.next());
}
public void testRegex1() throws Exception {
assertEquals(1, regexQueryNrHits("^q.[aeiou]c.*$", null));
}
public void testRegex2() throws Exception {
assertEquals(0, regexQueryNrHits("^.[aeiou]c.*$", null));
}
public void testRegex3() throws Exception {
assertEquals(0, regexQueryNrHits("^q.[aeiou]c$", null));
}
public void testSpanRegex1() throws Exception {
assertEquals(1, spanRegexQueryNrHits("^q.[aeiou]c.*$", "dog", 6, true));
}
public void testSpanRegex2() throws Exception {
assertEquals(0, spanRegexQueryNrHits("^q.[aeiou]c.*$", "dog", 5, true));
}
public void testEquals() throws Exception {
RegexQuery query1 = new RegexQuery( newTerm("foo.*"));
query1.setRegexImplementation(new JakartaRegexpCapabilities());
RegexQuery query2 = new RegexQuery( newTerm("foo.*"));
assertFalse(query1.equals(query2));
}
public void testJakartaCaseSensativeFail() throws Exception {
assertEquals(0, regexQueryNrHits("^.*DOG.*$", new JakartaRegexpCapabilities()));
}
public void testJavaUtilCaseSensativeFail() throws Exception {
assertEquals(0, regexQueryNrHits("^.*DOG.*$", null));
}
public void testJakartaCaseInsensative() throws Exception {
assertEquals(1, regexQueryNrHits("^.*DOG.*$", new JakartaRegexpCapabilities(JakartaRegexpCapabilities.FLAG_MATCH_CASEINDEPENDENT)));
}
public void testJavaUtilCaseInsensative() throws Exception {
assertEquals(1, regexQueryNrHits("^.*DOG.*$", new JavaUtilRegexCapabilities(JavaUtilRegexCapabilities.FLAG_CASE_INSENSITIVE)));
}
}


@ -1,83 +0,0 @@
package org.apache.lucene.sandbox.queries.regex;
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.MockAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.IOUtils;
import org.apache.lucene.util.LuceneTestCase;
public class TestSpanRegexQuery extends LuceneTestCase {
Directory indexStoreA;
Directory indexStoreB;
@Override
public void setUp() throws Exception {
super.setUp();
indexStoreA = newDirectory();
indexStoreB = newDirectory();
}
@Override
public void tearDown() throws Exception {
indexStoreA.close();
indexStoreB.close();
super.tearDown();
}
public void testSpanRegex() throws Exception {
Directory directory = newDirectory();
Analyzer analyzer = new MockAnalyzer(random());
IndexWriter writer = new IndexWriter(directory, newIndexWriterConfig(analyzer));
Document doc = new Document();
// doc.add(newField("field", "the quick brown fox jumps over the lazy dog",
// Field.Store.NO, Field.Index.ANALYZED));
// writer.addDocument(doc);
// doc = new Document();
doc.add(newTextField("field", "auto update", Field.Store.NO));
writer.addDocument(doc);
doc = new Document();
doc.add(newTextField("field", "first auto update", Field.Store.NO));
writer.addDocument(doc);
writer.forceMerge(1);
writer.close();
IndexReader reader = DirectoryReader.open(directory);
IndexSearcher searcher = newSearcher(reader);
SpanQuery srq = new SpanMultiTermQueryWrapper<>(new RegexQuery(new Term("field", "aut.*")));
SpanFirstQuery sfq = new SpanFirstQuery(srq, 1);
// SpanNearQuery query = new SpanNearQuery(new SpanQuery[] {srq, stq}, 6,
// true);
int numHits = searcher.search(sfq, 1000).totalHits;
assertEquals(1, numHits);
IOUtils.close(reader, directory, analyzer);
}
}