SOLR-934 -- A MailEntityProcessor to enable indexing mails from POP/IMAP sources into a solr index

git-svn-id: https://svn.apache.org/repos/asf/lucene/solr/trunk@764601 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Shalin Shekhar Mangar 2009-04-13 20:26:10 +00:00
parent 5a6d6cb386
commit 12024eba47
20 changed files with 2381 additions and 17 deletions

View File

@ -893,4 +893,99 @@ ASM library (asm)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.
=================================================================================================
The following license applies to JavaMail API 1.4.1 and JavaBeans Activation Framework (JAF) 1.1
-------------------------------------------------------------------------------------------------
COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0
1. Definitions.
1.1. Contributor means each individual or entity that creates or contributes to the creation of Modifications.
1.2. Contributor Version means the combination of the Original Software, prior Modifications used by a Contributor (if any), and the Modifications made by that particular Contributor.
1.3. Covered Software means (a) the Original Software, or (b) Modifications, or (c) the combination of files containing Original Software with files containing Modifications, in each case including portions thereof.
1.4. Executable means the Covered Software in any form other than Source Code.
1.5. Initial Developer means the individual or entity that first makes Original Software available under this License.
1.6. Larger Work means a work which combines Covered Software or portions thereof with code not governed by the terms of this License.
1.7. License means this document.
1.8. Licensable means having the right to grant, to the maximum extent possible, whether at the time of the initial grant or subsequently acquired, any and all of the rights conveyed herein.
1.9. Modifications means the Source Code and Executable form of any of the following: A. Any file that results from an addition to, deletion from or modification of the contents of a file containing Original Software or previous Modifications; B. Any new file that contains any part of the Original Software or previous Modification; or C. Any new file that is contributed or otherwise made available under the terms of this License.
1.10. Original Software means the Source Code and Executable form of computer software code that is originally released under this License.
1.11. Patent Claims means any patent claim(s), now owned or hereafter acquired, including without limitation, method, process, and apparatus claims, in any patent Licensable by grantor.
1.12. Source Code means (a) the common form of computer software code in which modifications are made and (b) associated documentation included in or with such code.
1.13. You (or Your) means an individual or a legal entity exercising rights under, and complying with all of the terms of, this License. For legal entities, You includes any entity which controls, is controlled by, or is under common control with You. For purposes of this definition, control means (a) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (b) ownership of more than fifty percent (50%) of the outstanding shares or beneficial ownership of such entity.
2. License Grants.
2.1. The Initial Developer Grant. Conditioned upon Your compliance with Section 3.1 below and subject to third party intellectual property claims, the Initial Developer hereby grants You a world-wide, royalty-free, non-exclusive license:
(a) under intellectual property rights (other than patent or trademark) Licensable by Initial Developer, to use, reproduce, modify, display, perform, sublicense and distribute the Original Software (or portions thereof), with or without Modifications, and/or as part of a Larger Work; and
(b) under Patent Claims infringed by the making, using or selling of Original Software, to make, have made, use, practice, sell, and offer for sale, and/or otherwise dispose of the Original Software (or portions thereof);
(c) The licenses granted in Sections 2.1(a) and (b) are effective on the date Initial Developer first distributes or otherwise makes the Original Software available to a third party under the terms of this License;
(d) Notwithstanding Section 2.1(b) above, no patent license is granted: (1) for code that You delete from the Original Software, or (2) for infringements caused by: (i) the modification of the Original Software, or (ii) the combination of the Original Software with other software or devices.
2.2. Contributor Grant. Conditioned upon Your compliance with Section 3.1 below and subject to third party intellectual property claims, each Contributor hereby grants You a world-wide, royalty-free, non-exclusive license:
(a) under intellectual property rights (other than patent or trademark) Licensable by Contributor to use, reproduce, modify, display, perform, sublicense and distribute the Modifications created by such Contributor (or portions thereof), either on an unmodified basis, with other Modifications, as Covered Software and/or as part of a Larger Work; and
(b) under Patent Claims infringed by the making, using, or selling of Modifications made by that Contributor either alone and/or in combination with its Contributor Version (or portions of such combination), to make, use, sell, offer for sale, have made, and/or otherwise dispose of: (1) Modifications made by that Contributor (or portions thereof); and (2) the combination of Modifications made by that Contributor with its Contributor Version (or portions of such combination).
(c) The licenses granted in Sections 2.2(a) and 2.2(b) are effective on the date Contributor first distributes or otherwise makes the Modifications available to a third party.
(d) Notwithstanding Section 2.2(b) above, no patent license is granted: (1) for any code that Contributor has deleted from the Contributor Version; (2) for infringements caused by: (i) third party modifications of Contributor Version, or (ii) the combination of Modifications made by that Contributor with other software (except as part of the Contributor Version) or other devices; or (3) under Patent Claims infringed by Covered Software in the absence of Modifications made by that Contributor.
3. Distribution Obligations.
3.1. Availability of Source Code. Any Covered Software that You distribute or otherwise make available in Executable form must also be made available in Source Code form and that Source Code form must be distributed only under the terms of this License. You must include a copy of this License with every copy of the Source Code form of the Covered Software You distribute or otherwise make available. You must inform recipients of any such Covered Software in Executable form as to how they can obtain such Covered Software in Source Code form in a reasonable manner on or through a medium customarily used for software exchange.
3.2. Modifications. The Modifications that You create or to which You contribute are governed by the terms of this License. You represent that You believe Your Modifications are Your original creation(s) and/or You have sufficient rights to grant the rights conveyed by this License.
3.3. Required Notices. You must include a notice in each of Your Modifications that identifies You as the Contributor of the Modification. You may not remove or alter any copyright, patent or trademark notices contained within the Covered Software, or any notices of licensing or any descriptive text giving attribution to any Contributor or the Initial Developer.
3.4. Application of Additional Terms. You may not offer or impose any terms on any Covered Software in Source Code form that alters or restricts the applicable version of this License or the recipients' rights hereunder. You may choose to offer, and to charge a fee for, warranty, support, indemnity or liability obligations to one or more recipients of Covered Software. However, You may do so only on Your own behalf, and not on behalf of the Initial Developer or any Contributor. You must make it absolutely clear that any such warranty, support, indemnity or liability obligation is offered by You alone, and You hereby agree to indemnify the Initial Developer and every Contributor for any liability incurred by the Initial Developer or such Contributor as a result of warranty, support, indemnity or liability terms You offer.
3.5. Distribution of Executable Versions. You may distribute the Executable form of the Covered Software under the terms of this License or under the terms of a license of Your choice, which may contain terms different from this License, provided that You are in compliance with the terms of this License and that the license for the Executable form does not attempt to limit or alter the recipient's rights in the Source Code form from the rights set forth in this License. If You distribute the Covered Software in Executable form under a different license, You must make it absolutely clear that any terms which differ from this License are offered by You alone, not by the Initial Developer or Contributor. You hereby agree to indemnify the Initial Developer and every Contributor for any liability incurred by the Initial Developer or such Contributor as a result of any such terms You offer.
3.6. Larger Works. You may create a Larger Work by combining Covered Software with other code not governed by the terms of this License and distribute the Larger Work as a single product. In such a case, You must make sure the requirements of this License are fulfilled for the Covered Software.
4. Versions of the License.
4.1. New Versions. Sun Microsystems, Inc. is the initial license steward and may publish revised and/or new versions of this License from time to time. Each version will be given a distinguishing version number. Except as provided in Section 4.3, no one other than the license steward has the right to modify this License.
4.2. Effect of New Versions. You may always continue to use, distribute or otherwise make the Covered Software available under the terms of the version of the License under which You originally received the Covered Software. If the Initial Developer includes a notice in the Original Software prohibiting it from being distributed or otherwise made available under any subsequent version of the License, You must distribute and make the Covered Software available under the terms of the version of the License under which You originally received the Covered Software. Otherwise, You may also choose to use, distribute or otherwise make the Covered Software available under the terms of any subsequent version of the License published by the license steward.
4.3. Modified Versions. When You are an Initial Developer and You want to create a new license for Your Original Software, You may create and use a modified version of this License if You: (a) rename the license and remove any references to the name of the license steward (except to note that the license differs from this License); and (b) otherwise make it clear that the license contains terms which differ from this License.
5. DISCLAIMER OF WARRANTY. COVERED SOFTWARE IS PROVIDED UNDER THIS LICENSE ON AN AS IS BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE COVERED SOFTWARE IS FREE OF DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE COVERED SOFTWARE IS WITH YOU. SHOULD ANY COVERED SOFTWARE PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL DEVELOPER OR ANY OTHER CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. THIS DISCLAIMER OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE. NO USE OF ANY COVERED SOFTWARE IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER.
6. TERMINATION.
6.1. This License and the rights granted hereunder will terminate automatically if You fail to comply with terms herein and fail to cure such breach within 30 days of becoming aware of the breach. Provisions which, by their nature, must remain in effect beyond the termination of this License shall survive.
6.2. If You assert a patent infringement claim (excluding declaratory judgment actions) against Initial Developer or a Contributor (the Initial Developer or Contributor against whom You assert such claim is referred to as Participant) alleging that the Participant Software (meaning the Contributor Version where the Participant is a Contributor or the Original Software where the Participant is the Initial Developer) directly or indirectly infringes any patent, then any and all rights granted directly or indirectly to You by such Participant, the Initial Developer (if the Initial Developer is not the Participant) and all Contributors under Sections 2.1 and/or 2.2 of this License shall, upon 60 days notice from Participant terminate prospectively and automatically at the expiration of such 60 day notice period, unless if within such 60 day period You withdraw Your claim with respect to the Participant Software against such Participant either unilaterally or pursuant to a written agreement with Participant.
6.3. In the event of termination under Sections 6.1 or 6.2 above, all end user licenses that have been validly granted by You or any distributor hereunder prior to termination (excluding licenses granted to You by any distributor) shall survive termination.
7. LIMITATION OF LIABILITY. UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL YOU, THE INITIAL DEVELOPER, ANY OTHER CONTRIBUTOR, OR ANY DISTRIBUTOR OF COVERED SOFTWARE, OR ANY SUPPLIER OF ANY OF SUCH PARTIES, BE LIABLE TO ANY PERSON FOR ANY INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOST PROFITS, LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER COMMERCIAL DAMAGES OR LOSSES, EVEN IF SUCH PARTY SHALL HAVE BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION OF LIABILITY SHALL NOT APPLY TO LIABILITY FOR DEATH OR PERSONAL INJURY RESULTING FROM SUCH PARTY'S NEGLIGENCE TO THE EXTENT APPLICABLE LAW PROHIBITS SUCH LIMITATION. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THIS EXCLUSION AND LIMITATION MAY NOT APPLY TO YOU.
8. U.S. GOVERNMENT END USERS. The Covered Software is a commercial item, as that term is defined in 48 C.F.R. 2.101 (Oct. 1995), consisting of commercial computer software (as that term is defined at 48 C.F.R. 252.227-7014(a)(1)) and commercial computer software documentation as such terms are used in 48 C.F.R. 12.212 (Sept. 1995). Consistent with 48 C.F.R. 12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (June 1995), all U.S. Government End Users acquire Covered Software with only those rights set forth herein. This U.S. Government Rights clause is in lieu of, and supersedes, any other FAR, DFAR, or other clause or provision that addresses Government rights in computer software under this License.
9. MISCELLANEOUS. This License represents the complete agreement concerning subject matter hereof. If any provision of this License is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable. This License shall be governed by the law of the jurisdiction specified in a notice contained within the Original Software (except to the extent applicable law, if any, provides otherwise), excluding such jurisdiction's conflict-of-law provisions. Any litigation relating to this License shall be subject to the jurisdiction of the courts located in the jurisdiction and venue specified in a notice contained within the Original Software, with the losing party responsible for costs, including, without limitation, court costs and reasonable attorneys' fees and expenses. The application of the United Nations Convention on Contracts for the International Sale of Goods is expressly excluded. Any law or regulation which provides that the language of a contract shall be construed against the drafter shall not apply to this License. You agree that You alone are responsible for compliance with the United States export administration regulations (and the export control laws and regulation of any other countries) when You use, distribute or otherwise make available any Covered Software.
10. RESPONSIBILITY FOR CLAIMS. As between Initial Developer and the Contributors, each party is responsible for claims and damages arising, directly or indirectly, out of its utilization of rights under this License and You agree to work with Initial Developer and Contributors to distribute such responsibility on an equitable basis. Nothing herein is intended or shall be deemed to constitute any admission of liability.
NOTICE PURSUANT TO SECTION 9 OF THE COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) The code released under the CDDL shall be governed by the laws of the State of California (excluding conflict-of-law provisions). Any litigation relating to this License shall be subject to the jurisdiction of the Federal Courts of the Northern District of California and the state courts of the State of California, with venue lying in Santa Clara County, California.

View File

@ -28,6 +28,13 @@ License: The BSD License (http://www.opensource.org/licenses/bsd-license.php)
This product includes a JUnit jar: http://junit.sourceforge.net/
License: Common Public License - v 1.0 (http://junit.sourceforge.net/cpl-v10.html)
This product includes the JavaMail API 1.4.1 jar: https://glassfish.dev.java.net/javaee5/mail/
License: Common Development and Distribution License (CDDL) v1.0 (https://glassfish.dev.java.net/public/CDDLv1.0.html)
This product includes the JavaBeans Activation Framework (JAF) 1.1 jar: http://java.sun.com/products/javabeans/jaf/index.jsp
License: Common Development and Distribution License (CDDL) v1.0 (https://glassfish.dev.java.net/public/CDDLv1.0.html)
=========================================================================
== Apache Lucene Notice ==
=========================================================================

View File

@ -488,6 +488,8 @@
<solr-jar destfile="${dist}/apache-solr-dataimporthandler-src-${version}.jar"
basedir="contrib/dataimporthandler/src/main/java" />
<solr-jar destfile="${dist}/apache-solr-dataimporthandler-extras-src-${version}.jar"
basedir="contrib/dataimporthandler/src/extras/main/java" />
<solr-jar destfile="${dist}/apache-solr-cell-src-${version}.jar"
basedir="contrib/extraction/src" />
@ -683,6 +685,7 @@
<sign-maven-dependency-artifacts artifact.id="solr-commons-csv"/>
<sign-maven-artifacts artifact.id="solr-core"/>
<sign-maven-artifacts artifact.id="solr-dataimporthandler"/>
<sign-maven-artifacts artifact.id="solr-dataimporthandler-extras"/>
<sign-maven-artifacts artifact.id="solr-cell"/>
<sign-maven-dependency-artifacts artifact.id="solr-lucene-analyzers"/>
<sign-maven-dependency-artifacts artifact.id="solr-lucene-core"/>
@ -767,6 +770,15 @@
</artifact-attachments>
</m2-deploy>
<m2-deploy pom.xml="contrib/dataimporthandler/solr-dataimporthandler-extras-pom.xml.template"
jar.file="${dist}/apache-solr-dataimporthandler-extras-${version}.jar">
<artifact-attachments>
<attach file="${dist}/apache-solr-dataimporthandler-extras-src-${version}.jar" classifier="sources"/>
<attach file="${dist}/apache-solr-dataimporthandler-docs-${version}.jar" classifier="javadoc"/>
</artifact-attachments>
</m2-deploy>
<m2-deploy pom.xml="contrib/extraction/solr-cell-pom.xml.template"
jar.file="${dist}/apache-solr-cell-${version}.jar">

View File

@ -120,6 +120,9 @@ New Features
28. SOLR-1083: An Evaluator for escaping query characters.
(Noble Paul, shalin)
29. SOLR-934: A MailEntityProcessor to enable indexing mails from POP/IMAP sources into a solr index.
(Preetam Rao, shalin)
Optimizations
----------------------
1. SOLR-846: Reduce memory consumption during delta import by removing keys when used

View File

@ -20,7 +20,8 @@
<project name="solr-dataimporthandler" default="build">
<property name="solr-path" value="../.." />
<property name="tikalibs-path" value="../extraction/lib" />
<import file="../../common-build.xml"/>
<description>
@ -35,7 +36,16 @@
<path id="common.classpath">
<pathelement location="${solr-path}/build/solr" />
<pathelement location="${solr-path}/build/solrj" />
<fileset dir="${solr-path}/lib" includes="*.jar"/>
</path>
<path id="extras.classpath">
<pathelement location="${solr-path}/build/solr" />
<pathelement location="${solr-path}/build/solrj" />
<pathelement location="target/classes" />
<fileset dir="${solr-path}/lib" includes="*.jar"/>
<fileset dir="lib/" includes="*.jar"/>
<fileset dir="${tikalibs-path}" includes="*.jar"/>
</path>
<path id="test.classpath">
@ -45,12 +55,26 @@
<pathelement path="target/test-classes" />
<pathelement path="${java.class.path}"/>
</path>
<path id="test.extras.classpath">
<path refid="extras.classpath" />
<path refid="classpath.jetty" />
<pathelement path="target/classes" />
<pathelement path="target/extras/classes" />
<pathelement path="target/test-classes" />
<pathelement path="target/extras/test-classes" />
<pathelement path="${java.class.path}"/>
</path>
<target name="clean">
<delete failonerror="false" dir="target"/>
<delete failonerror="false">
<fileset dir="src/test/resources" includes="**/dataimport.properties" />
</delete>
<!-- Clean up examples -->
<delete failonerror="false">
<fileset dir="${example}/example-DIH/solr/mail/lib" includes="*.jar" />
</delete>
</target>
<target name="init">
@ -66,10 +90,19 @@
<src path="src/main/java" />
</solr-javac>
</target>
<target name="compileExtras" depends="compile">
<solr-javac destdir="target/extras/classes"
classpathref="extras.classpath">
<src path="src/extras/main/java" />
</solr-javac>
</target>
<target name="build" depends="compile">
<target name="build" depends="compile,compileExtras">
<solr-jar destfile="target/${fullnamever}.jar" basedir="target/classes"
manifest="${common.dir}/${dest}/META-INF/MANIFEST.MF" />
<solr-jar destfile="target/apache-${ant.project.name}-extras-${version}.jar" basedir="target/extras/classes"
manifest="${common.dir}/${dest}/META-INF/MANIFEST.MF" />
</target>
<target name="compileTests" depends="compile">
@ -78,8 +111,17 @@
<src path="src/test/java" />
</solr-javac>
</target>
<target name="compileExtrasTests" depends="compileExtras">
<solr-javac destdir="target/extras/test-classes"
classpathref="test.classpath">
<src path="src/extras/test/java" />
</solr-javac>
</target>
<target name="test" depends="testCore,testExtras"/>
<target name="test" depends="compileTests">
<target name="testCore" depends="compileTests">
<mkdir dir="${junit.output.dir}"/>
<junit printsummary="on"
@ -101,6 +143,29 @@
<fail if="tests.failed">Tests failed!</fail>
</target>
<target name="testExtras" depends="compileExtrasTests">
<mkdir dir="${junit.output.dir}"/>
<junit printsummary="on"
haltonfailure="no"
errorProperty="tests.failed"
failureProperty="tests.failed"
dir="src/extras/test/resources/"
>
<formatter type="brief" usefile="false" if="junit.details"/>
<classpath refid="test.extras.classpath"/>
<formatter type="xml"/>
<batchtest fork="yes" todir="${junit.output.dir}" unless="testcase">
<fileset dir="src/extras/test/java" includes="${junit.includes}"/>
</batchtest>
<batchtest fork="yes" todir="${junit.output.dir}" if="testcase">
<fileset dir="src/extras/test/java" includes="**/${testcase}.java"/>
</batchtest>
</junit>
<fail if="tests.failed">Tests failed!</fail>
</target>
<target name="dist" depends="build">
<copy todir="../../build/web">
@ -109,6 +174,7 @@
<mkdir dir="../../build/web/WEB-INF/lib"/>
<copy file="target/${fullnamever}.jar" todir="${solr-path}/build/web/WEB-INF/lib"></copy>
<copy file="target/${fullnamever}.jar" todir="${solr-path}/dist"></copy>
<copy file="target/apache-${ant.project.name}-extras-${version}.jar" todir="${solr-path}/dist"></copy>
</target>
<target name="javadoc">
@ -117,6 +183,7 @@
<path id="javadoc.classpath">
<path refid="common.classpath"/>
<path refid="extras.classpath"/>
</path>
<invoke-javadoc
@ -124,11 +191,25 @@
title="${Name} ${version} contrib-${fullnamever} API">
<sources>
<packageset dir="src/main/java"/>
<packageset dir="src/extras/main/java"/>
</sources>
</invoke-javadoc>
</sequential>
</target>
<target name="example" depends="build"/>
<target name="example" depends="build">
<!-- Copy the jar into example-DIH/solr/mail/lib -->
<copy file="target/apache-${ant.project.name}-extras-${version}.jar" todir="${example}/example-DIH/solr/mail/lib"/>
<copy todir="${example}/example-DIH/solr/mail/lib">
<fileset dir="lib">
<include name="**/*.jar"/>
</fileset>
</copy>
<copy todir="${example}/example-DIH/solr/mail/lib">
<fileset dir="${example}/solr/lib">
<include name="**/*.jar"/>
</fileset>
</copy>
</target>
</project>

View File

@ -0,0 +1,2 @@
AnyObjectId[53f82a1c4c492dc810c27317857bbb02afd6fa58] was removed in git history.
Apache SVN contains full history.

View File

@ -0,0 +1,2 @@
AnyObjectId[1d15e793ecd1c709de0739a7d3d818266c2e141b] was removed in git history.
Apache SVN contains full history.

View File

@ -0,0 +1,52 @@
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.solr</groupId>
<artifactId>solr-parent</artifactId>
<version>@maven_version@</version>
</parent>
<groupId>org.apache.solr</groupId>
<artifactId>solr-dataimporthandler-extras</artifactId>
<name>Apache Solr DataImportHandler Extras</name>
<version>@maven_version@</version>
<description>Apache Solr DataImportHandler Extras</description>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>javax.activation</groupId>
<artifactId>activation</artifactId>
<version>1.1</version>
</dependency>
<dependency>
<groupId>javax.mail</groupId>
<artifactId>mail</artifactId>
<version>1.4.1</version>
</dependency>
</dependencies>
</project>

View File

@ -0,0 +1,614 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.solr.handler.dataimport;
import com.sun.mail.imap.IMAPMessage;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.utils.ParseUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import javax.mail.*;
import javax.mail.internet.AddressException;
import javax.mail.internet.ContentType;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;
import javax.mail.search.AndTerm;
import javax.mail.search.ComparisonTerm;
import javax.mail.search.ReceivedDateTerm;
import javax.mail.search.SearchTerm;
import java.io.InputStream;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.*;
/**
* An EntityProcessor instance which can index emails along with their attachments from POP3 or IMAP sources. Refer to
* <a href="http://wiki.apache.org/solr/DataImportHandler">http://wiki.apache.org/solr/DataImportHandler</a> for more
* details. <b>This API is experimental and subject to change</b>
*
* @version $Id$
* @since solr 1.4
*/
public class MailEntityProcessor extends EntityProcessorBase {
public interface CustomFilter {
public SearchTerm getCustomSearch(Folder folder);
}
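// Illustrative sketch (not part of this patch): a CustomFilter that matches only
// unseen messages, using javax.mail.search.FlagTerm. The implementing class is
// named by the entity's "customFilter" attribute and must have a no-arg
// constructor, since it is instantiated via Class.newInstance() in createFilters().
//
//   public class UnseenMailFilter implements MailEntityProcessor.CustomFilter {
//     public SearchTerm getCustomSearch(Folder folder) {
//       return new javax.mail.search.FlagTerm(new Flags(Flags.Flag.SEEN), false);
//     }
//   }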
public void init(Context context) {
super.init(context);
// set attributes using getXXXFromContext(attribute, defaultValue);
// applies the variable resolver and returns the default if the value is missing or null
// REQUIRED : connection and folder info
user = getStringFromContext("user", null);
password = getStringFromContext("password", null);
host = getStringFromContext("host", null);
protocol = getStringFromContext("protocol", null);
folderNames = getStringFromContext("folders", null);
// validate
if (host == null || protocol == null || user == null || password == null
|| folderNames == null)
throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
"'user|password|protocol|host|folders' are required attributes");
// OPTIONAL: these have defaults and may be omitted
recurse = getBoolFromContext("recurse", true);
String excludes = getStringFromContext("exclude", "");
if (excludes != null && !excludes.trim().equals("")) {
exclude = Arrays.asList(excludes.split(","));
}
String includes = getStringFromContext("include", "");
if (includes != null && !includes.trim().equals("")) {
include = Arrays.asList(includes.split(","));
}
batchSize = getIntFromContext("batchSize", 20);
customFilter = getStringFromContext("customFilter", "");
String s = getStringFromContext("fetchMailsSince", "");
if (s != null && !s.trim().equals("")) // only parse when the attribute is present
try {
fetchMailsSince = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(s);
} catch (ParseException e) {
throw new DataImportHandlerException(DataImportHandlerException.SEVERE, "Invalid value for fetchMailsSince: " + s, e);
}
fetchSize = getIntFromContext("fetchSize", 32 * 1024);
cTimeout = getIntFromContext("connectTimeout", 30 * 1000);
rTimeout = getIntFromContext("readTimeout", 60 * 1000);
processAttachment = getBoolFromContext("processAttachement", true); // NOTE: attribute is spelled "processAttachement" (sic) in configs and tests
logConfig();
}
public Map<String, Object> nextRow() {
Message mail;
Map<String, Object> row = null;
do {
// try till there is a valid document or folders get exhausted.
// when mail == NULL, it means end of processing
mail = getNextMail();
if (mail != null)
row = getDocumentFromMail(mail);
}
while (row == null && mail != null);
if (row != null) {
row = super.applyTransformer(row);
logRow(row);
}
return row;
}
private Message getNextMail() {
if (!connected) {
if (!connectToMailBox())
return null;
connected = true;
}
if (folderIter == null) {
createFilters();
folderIter = new FolderIterator(mailbox);
}
// get next message from the folder
// if folder is exhausted get next folder
// loop till a valid mail or all folders exhausted.
while (msgIter == null || !msgIter.hasNext()) {
Folder next = folderIter.hasNext() ? folderIter.next() : null;
if (next == null) {
return null;
}
msgIter = new MessageIterator(next, batchSize);
}
return msgIter.next();
}
private Map<String, Object> getDocumentFromMail(Message mail) {
Map<String, Object> row = new HashMap<String, Object>();
try {
addPartToDocument(mail, row, true);
return row;
} catch (Exception e) {
return null;
}
}
public void addPartToDocument(Part part, Map<String, Object> row, boolean outerMost) throws Exception {
if (part instanceof Message) {
addEnvelopToDocument(part, row);
}
String ct = part.getContentType();
ContentType ctype = new ContentType(ct);
if (part.isMimeType("multipart/*")) {
Multipart mp = (Multipart) part.getContent();
int count = mp.getCount();
if (part.isMimeType("multipart/alternative"))
count = 1;
for (int i = 0; i < count; i++)
addPartToDocument(mp.getBodyPart(i), row, false);
} else if (part.isMimeType("message/rfc822")) {
addPartToDocument((Part) part.getContent(), row, false);
} else {
String disp = part.getDisposition();
if (disp != null && disp.equalsIgnoreCase(Part.ATTACHMENT)
&& !processAttachment)
return;
InputStream is = part.getInputStream();
String fileName = part.getFileName();
String content = ParseUtils.getStringContent(is, TikaConfig.getDefaultConfig(), ctype.getBaseType().toLowerCase());
if (disp != null && disp.equalsIgnoreCase(Part.ATTACHMENT)) {
if (row.get(ATTACHMENT) == null)
row.put(ATTACHMENT, new ArrayList<String>());
List<String> contents = (List<String>) row.get(ATTACHMENT);
contents.add(content);
row.put(ATTACHMENT, contents);
if (row.get(ATTACHMENT_NAMES) == null)
row.put(ATTACHMENT_NAMES, new ArrayList<String>());
List<String> names = (List<String>) row.get(ATTACHMENT_NAMES);
names.add(fileName);
row.put(ATTACHMENT_NAMES, names);
} else {
if (row.get(CONTENT) == null)
row.put(CONTENT, new ArrayList<String>());
List<String> contents = (List<String>) row.get(CONTENT);
contents.add(content);
row.put(CONTENT, contents);
}
}
}
private void addEnvelopToDocument(Part part, Map<String, Object> row) throws MessagingException {
MimeMessage mail = (MimeMessage) part;
Address[] adresses;
if ((adresses = mail.getFrom()) != null && adresses.length > 0)
row.put(FROM, adresses[0].toString());
List<String> to = new ArrayList<String>();
if ((adresses = mail.getRecipients(Message.RecipientType.TO)) != null)
addAddressToList(adresses, to);
if ((adresses = mail.getRecipients(Message.RecipientType.CC)) != null)
addAddressToList(adresses, to);
if ((adresses = mail.getRecipients(Message.RecipientType.BCC)) != null)
addAddressToList(adresses, to);
if (to.size() > 0)
row.put(TO_CC_BCC, to);
row.put(MESSAGE_ID, mail.getMessageID());
row.put(SUBJECT, mail.getSubject());
Date d = mail.getSentDate();
if (d != null) {
row.put(SENT_DATE, d);
}
List<String> flags = new ArrayList<String>();
for (Flags.Flag flag : mail.getFlags().getSystemFlags()) {
if (flag == Flags.Flag.ANSWERED)
flags.add(FLAG_ANSWERED);
else if (flag == Flags.Flag.DELETED)
flags.add(FLAG_DELETED);
else if (flag == Flags.Flag.DRAFT)
flags.add(FLAG_DRAFT);
else if (flag == Flags.Flag.FLAGGED)
flags.add(FLAG_FLAGGED);
else if (flag == Flags.Flag.RECENT)
flags.add(FLAG_RECENT);
else if (flag == Flags.Flag.SEEN)
flags.add(FLAG_SEEN);
}
flags.addAll(Arrays.asList(mail.getFlags().getUserFlags()));
row.put(FLAGS, flags);
String[] hdrs = mail.getHeader("X-Mailer");
if (hdrs != null)
row.put(XMAILER, hdrs[0]);
}
private void addAddressToList(Address[] adresses, List<String> to) throws AddressException {
for (Address address : adresses) {
to.add(address.toString());
InternetAddress ia = (InternetAddress) address;
if (ia.isGroup()) {
InternetAddress[] group = ia.getGroup(false);
for (InternetAddress member : group)
to.add(member.toString());
}
}
}
private boolean connectToMailBox() {
try {
Properties props = new Properties();
props.setProperty("mail.store.protocol", protocol);
props.setProperty("mail.imap.fetchsize", "" + fetchSize);
props.setProperty("mail.imap.timeout", "" + rTimeout);
props.setProperty("mail.imap.connectiontimeout", "" + cTimeout);
Session session = Session.getDefaultInstance(props, null);
mailbox = session.getStore(protocol);
mailbox.connect(host, user, password);
LOG.info("Connected to mailbox");
return true;
} catch (MessagingException e) {
throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
"Connection failed", e);
}
}
private void createFilters() {
if (fetchMailsSince != null) {
filters.add(new MailsSinceLastCheckFilter(fetchMailsSince));
}
if (customFilter != null && !customFilter.equals("")) {
try {
Class cf = Class.forName(customFilter);
Object obj = cf.newInstance();
if (obj instanceof CustomFilter) {
filters.add((CustomFilter) obj);
}
} catch (Exception e) {
throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
"Custom filter could not be created", e);
}
}
}
private void logConfig() {
StringBuffer config = new StringBuffer();
config.append("user : ").append(user).append(System.getProperty("line.separator"));
config.append("pwd : ").append(password).append(System.getProperty("line.separator"));
config.append("protocol : ").append(protocol).append(System.getProperty("line.separator"));
config.append("host : ").append(host).append(System.getProperty("line.separator"));
config.append("folders : ").append(folderNames).append(System.getProperty("line.separator"));
config.append("recurse : ").append(recurse).append(System.getProperty("line.separator"));
config.append("exclude : ").append(exclude.toString()).append(System.getProperty("line.separator"));
config.append("include : ").append(include.toString()).append(System.getProperty("line.separator"));
config.append("batchSize : ").append(batchSize).append(System.getProperty("line.separator"));
config.append("fetchSize : ").append(fetchSize).append(System.getProperty("line.separator"));
config.append("read timeout : ").append(rTimeout).append(System.getProperty("line.separator"));
config.append("conection timeout : ").append(cTimeout).append(System.getProperty("line.separator"));
config.append("custom filter : ").append(customFilter).append(System.getProperty("line.separator"));
config.append("fetch mail since : ").append(fetchMailsSince).append(System.getProperty("line.separator"));
LOG.info(config.toString());
}
private void logRow(Map<String, Object> row) {
StringBuffer config = new StringBuffer();
String from = row.get(FROM) == null ? "" : row.get(FROM).toString();
String to = row.get(TO_CC_BCC) == null ? "" : row.get(TO_CC_BCC).toString();
String subject = row.get(SUBJECT) == null ? "" : row.get(SUBJECT).toString();
config.append("From: ").append(from).append("To: ").append(to).append(" " + "Subject: ").append(subject);
LOG.debug("ROW " + (rowCount++) + ": " + config.toString());
}
class FolderIterator implements Iterator<Folder> {
private Store mailbox;
private List<String> topLevelFolders;
private List<Folder> folders = null;
private Folder lastFolder = null;
public FolderIterator(Store mailBox) {
this.mailbox = mailBox;
folders = new ArrayList<Folder>();
getTopLevelFolders(mailBox);
}
public boolean hasNext() {
return !folders.isEmpty();
}
public Folder next() {
try {
boolean hasMessages = false;
Folder next;
do {
if (lastFolder != null) {
lastFolder.close(false);
lastFolder = null;
}
if (folders.isEmpty()) {
mailbox.close();
return null;
}
next = folders.remove(0);
if (next != null) {
String fullName = next.getFullName();
if (!excludeFolder(fullName)) {
hasMessages = (next.getType() & Folder.HOLDS_MESSAGES) != 0;
next.open(Folder.READ_ONLY);
lastFolder = next;
LOG.info("Opened folder : " + fullName);
}
if (recurse && ((next.getType() & Folder.HOLDS_FOLDERS) != 0)) {
Folder[] children = next.list();
LOG.info("Added its children to list : ");
for (int i = children.length - 1; i >= 0; i--) {
folders.add(0, children[i]);
LOG.info("child name : " + children[i].getFullName());
}
if (children.length == 0)
LOG.info("NO children : ");
}
}
}
while (!hasMessages);
return next;
} catch (MessagingException e) {
//throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
// "Folder open failed", e);
}
return null;
}
public void remove() {
throw new UnsupportedOperationException("It's read-only mode...");
}
private void getTopLevelFolders(Store mailBox) {
if (folderNames != null)
topLevelFolders = Arrays.asList(folderNames.split(","));
for (int i = 0; topLevelFolders != null && i < topLevelFolders.size(); i++) {
try {
folders.add(mailbox.getFolder(topLevelFolders.get(i)));
} catch (MessagingException e) {
// skip bad ones unless it's the last one and there is still no good folder
if (folders.size() == 0 && i == topLevelFolders.size() - 1)
throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
"Folder retreival failed");
}
}
if (topLevelFolders == null || topLevelFolders.size() == 0) {
try {
folders.add(mailBox.getDefaultFolder());
} catch (MessagingException e) {
throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
"Folder retreival failed");
}
}
}
private boolean excludeFolder(String name) {
for (String s : exclude) {
if (name.matches(s))
return true;
}
for (String s : include) {
if (name.matches(s))
return false;
}
return include.size() > 0;
}
}
class MessageIterator implements Iterator<Message> {
private Folder folder;
private Message[] messagesInCurBatch;
private int current = 0;
private int currentBatch = 0;
private int batchSize = 0;
private int totalInFolder = 0;
private boolean doBatching = true;
public MessageIterator(Folder folder, int batchSize) {
try {
this.folder = folder;
this.batchSize = batchSize;
SearchTerm st = getSearchTerm();
if (st != null) {
doBatching = false;
messagesInCurBatch = folder.search(st);
totalInFolder = messagesInCurBatch.length;
folder.fetch(messagesInCurBatch, fp);
current = 0;
LOG.info("Total messages : " + totalInFolder);
LOG.info("Search criteria applied. Batching disabled");
} else {
totalInFolder = folder.getMessageCount();
LOG.info("Total messages : " + totalInFolder);
getNextBatch(batchSize, folder);
}
} catch (MessagingException e) {
throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
"Message retreival failed", e);
}
}
private void getNextBatch(int batchSize, Folder folder) throws MessagingException {
// after each batch invalidate cache
if (messagesInCurBatch != null) {
for (Message m : messagesInCurBatch) {
if (m instanceof IMAPMessage)
((IMAPMessage) m).invalidateHeaders();
}
}
int lastMsg = (currentBatch + 1) * batchSize;
lastMsg = lastMsg > totalInFolder ? totalInFolder : lastMsg;
messagesInCurBatch = folder.getMessages(currentBatch * batchSize + 1, lastMsg);
folder.fetch(messagesInCurBatch, fp);
current = 0;
currentBatch++;
LOG.info("Current Batch : " + currentBatch);
LOG.info("Messages in this batch : " + messagesInCurBatch.length);
}
public boolean hasNext() {
boolean hasMore = current < messagesInCurBatch.length;
if (!hasMore && doBatching
&& currentBatch * batchSize < totalInFolder) {
// try next batch
try {
getNextBatch(batchSize, folder);
hasMore = current < messagesInCurBatch.length;
} catch (MessagingException e) {
throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
"Message retreival failed", e);
}
}
return hasMore;
}
public Message next() {
return hasNext() ? messagesInCurBatch[current++] : null;
}
public void remove() {
throw new UnsupportedOperationException("It's read-only mode...");
}
private SearchTerm getSearchTerm() {
if (filters.size() == 0)
return null;
if (filters.size() == 1)
return filters.get(0).getCustomSearch(folder);
SearchTerm last = filters.get(0).getCustomSearch(folder);
for (int i = 1; i < filters.size(); i++) {
CustomFilter filter = filters.get(i);
SearchTerm st = filter.getCustomSearch(folder);
if (st != null) {
last = new AndTerm(last, st);
}
}
return last;
}
}
class MailsSinceLastCheckFilter implements CustomFilter {
private Date since;
public MailsSinceLastCheckFilter(Date date) {
since = date;
}
public SearchTerm getCustomSearch(Folder folder) {
return new ReceivedDateTerm(ComparisonTerm.GE, since);
}
}
// user settings stored in member variables
private String user;
private String password;
private String host;
private String protocol;
private String folderNames;
private List<String> exclude = new ArrayList<String>();
private List<String> include = new ArrayList<String>();
private boolean recurse;
private int batchSize;
private int fetchSize;
private int cTimeout;
private int rTimeout;
private Date fetchMailsSince;
private String customFilter;
private boolean processAttachment = true;
// holds the current state
private Store mailbox;
private boolean connected = false;
private FolderIterator folderIter;
private MessageIterator msgIter;
private List<CustomFilter> filters = new ArrayList<CustomFilter>();
private static FetchProfile fp = new FetchProfile();
private static final Logger LOG = LoggerFactory.getLogger(DataImporter.class);
// diagnostics
private int rowCount = 0;
static {
fp.add(FetchProfile.Item.ENVELOPE);
fp.add(FetchProfile.Item.FLAGS);
fp.add("X-Mailer");
}
// Fields To Index
// single valued
private static final String MESSAGE_ID = "messageId";
private static final String SUBJECT = "subject";
private static final String FROM = "from";
private static final String SENT_DATE = "sentDate";
private static final String XMAILER = "xMailer";
// multi valued
private static final String TO_CC_BCC = "allTo";
private static final String FLAGS = "flags";
private static final String CONTENT = "content";
private static final String ATTACHMENT = "attachment";
private static final String ATTACHMENT_NAMES = "attachmentNames";
// flag values
private static final String FLAG_ANSWERED = "answered";
private static final String FLAG_DELETED = "deleted";
private static final String FLAG_DRAFT = "draft";
private static final String FLAG_FLAGGED = "flagged";
private static final String FLAG_RECENT = "recent";
private static final String FLAG_SEEN = "seen";
private int getIntFromContext(String prop, int ifNull) {
int v = ifNull;
try {
String val = context.getEntityAttribute(prop);
if (val != null) {
val = context.getVariableResolver().replaceTokens(val);
v = Integer.valueOf(val);
}
} catch (NumberFormatException e) {
//do nothing
}
return v;
}
private boolean getBoolFromContext(String prop, boolean ifNull) {
boolean v = ifNull;
String val = context.getEntityAttribute(prop);
if (val != null) {
val = context.getVariableResolver().replaceTokens(val);
v = Boolean.valueOf(val);
}
return v;
}
private String getStringFromContext(String prop, String ifNull) {
String v = ifNull;
String val = context.getEntityAttribute(prop);
if (val != null) {
val = context.getVariableResolver().replaceTokens(val);
v = val;
}
return v;
}
}

View File

@ -0,0 +1,211 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.solr.handler.dataimport;
import junit.framework.Assert;
import org.apache.solr.common.SolrInputDocument;
import org.junit.Ignore;
import org.junit.Test;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
// Test mailbox is like this: foldername(mailcount)
// top1(2) -> child11(6)
// -> child12(0)
// top2(2) -> child21(1)
// -> grandchild211(2)
// -> grandchild212(1)
// -> child22(2)
/**
* Test for MailEntityProcessor. The tests are marked as ignored because we'd need a mail server (real or mocked) for
* these to work.
*
* TODO: Find a way to make the tests actually test code
*
* @version $Id$
* @see org.apache.solr.handler.dataimport.MailEntityProcessor
* @since solr 1.4
*/
public class TestMailEntityProcessor {
// Credentials
private static final String user = "user";
private static final String password = "password";
private static final String host = "host";
private static final String protocol = "imaps";
private static Map<String, String> paramMap = new HashMap<String, String>();
@Test
@Ignore
public void testConnection() {
// also tests recurse = false and default settings
paramMap.put("folders", "top2");
paramMap.put("recurse", "false");
paramMap.put("processAttachement", "false");
DataImporter di = new DataImporter();
di.loadDataConfig(getConfigFromMap(paramMap));
DataConfig.Entity ent = di.getConfig().document.entities.get(0);
ent.isDocRoot = true;
DataImporter.RequestParams rp = new DataImporter.RequestParams();
rp.command = "full-import";
SolrWriterImpl swi = new SolrWriterImpl();
di.runCmd(rp, swi);
Assert.assertEquals("top1 did not return 2 messages", swi.docs.size(), 2);
}
@Test
@Ignore
public void testRecursion() {
paramMap.put("folders", "top2");
paramMap.put("recurse", "true");
paramMap.put("processAttachement", "false");
DataImporter di = new DataImporter();
di.loadDataConfig(getConfigFromMap(paramMap));
DataConfig.Entity ent = di.getConfig().document.entities.get(0);
ent.isDocRoot = true;
DataImporter.RequestParams rp = new DataImporter.RequestParams();
rp.command = "full-import";
SolrWriterImpl swi = new SolrWriterImpl();
di.runCmd(rp, swi);
Assert.assertEquals("top2 and its children did not return 8 messages", swi.docs.size(), 8);
}
@Test
@Ignore
public void testExclude() {
paramMap.put("folders", "top2");
paramMap.put("recurse", "true");
paramMap.put("processAttachement", "false");
paramMap.put("exclude", ".*grandchild.*");
DataImporter di = new DataImporter();
di.loadDataConfig(getConfigFromMap(paramMap));
DataConfig.Entity ent = di.getConfig().document.entities.get(0);
ent.isDocRoot = true;
DataImporter.RequestParams rp = new DataImporter.RequestParams();
rp.command = "full-import";
SolrWriterImpl swi = new SolrWriterImpl();
di.runCmd(rp, swi);
Assert.assertEquals("top2 and its direct children did not return 5 messages", swi.docs.size(), 5);
}
@Test
@Ignore
public void testInclude() {
paramMap.put("folders", "top2");
paramMap.put("recurse", "true");
paramMap.put("processAttachement", "false");
paramMap.put("include", ".*grandchild.*");
DataImporter di = new DataImporter();
di.loadDataConfig(getConfigFromMap(paramMap));
DataConfig.Entity ent = di.getConfig().document.entities.get(0);
ent.isDocRoot = true;
DataImporter.RequestParams rp = new DataImporter.RequestParams();
rp.command = "full-import";
SolrWriterImpl swi = new SolrWriterImpl();
di.runCmd(rp, swi);
Assert.assertEquals("top2 and its direct children did not return 3 messages", swi.docs.size(), 3);
}
@Test
@Ignore
public void testIncludeAndExclude() {
paramMap.put("folders", "top1,top2");
paramMap.put("recurse", "true");
paramMap.put("processAttachement", "false");
paramMap.put("exclude", ".*top1.*");
paramMap.put("include", ".*grandchild.*");
DataImporter di = new DataImporter();
di.loadDataConfig(getConfigFromMap(paramMap));
DataConfig.Entity ent = di.getConfig().document.entities.get(0);
ent.isDocRoot = true;
DataImporter.RequestParams rp = new DataImporter.RequestParams();
rp.command = "full-import";
SolrWriterImpl swi = new SolrWriterImpl();
di.runCmd(rp, swi);
Assert.assertEquals("top2 and its direct children did not return 3 messages", swi.docs.size(), 3);
}
@Test
@Ignore
public void testFetchTimeSince() throws ParseException {
paramMap.put("folders", "top1/child11");
paramMap.put("recurse", "true");
paramMap.put("processAttachement", "false");
paramMap.put("fetchMailsSince", "2008-12-26 00:00:00");
DataImporter di = new DataImporter();
di.loadDataConfig(getConfigFromMap(paramMap));
DataConfig.Entity ent = di.getConfig().document.entities.get(0);
ent.isDocRoot = true;
DataImporter.RequestParams rp = new DataImporter.RequestParams();
rp.command = "full-import";
SolrWriterImpl swi = new SolrWriterImpl();
di.runCmd(rp, swi);
Assert.assertEquals("top2 and its direct children did not return 3 messages", swi.docs.size(), 3);
}
private String getConfigFromMap(Map<String, String> params) {
String conf =
"<dataConfig>" +
"<document>" +
"<entity processor=\"org.apache.solr.handler.dataimport.MailEntityProcessor\" " +
"someconfig" +
"/>" +
"</document>" +
"</dataConfig>";
params.put("user", user);
params.put("password", password);
params.put("host", host);
params.put("protocol", protocol);
StringBuilder attribs = new StringBuilder("");
for (String key : params.keySet())
attribs.append(" ").append(key).append("=" + "\"").append(params.get(key)).append("\"");
attribs.append(" ");
return conf.replace("someconfig", attribs.toString());
}
static class SolrWriterImpl extends SolrWriter {
List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
Boolean deleteAllCalled;
Boolean commitCalled;
public SolrWriterImpl() {
super(null, ".");
}
public boolean upload(SolrInputDocument doc) {
return docs.add(doc);
}
public void log(int event, String name, Object row) {
// Do nothing
}
public void doDeleteAll() {
deleteAllCalled = Boolean.TRUE;
}
public void commit(boolean b) {
commitCalled = Boolean.TRUE;
}
}
}

View File

@ -39,6 +39,11 @@ To import data from the slashdot feed, connect to
http://localhost:8983/solr/rss/dataimport?command=full-import
To import data from your IMAP server
1. Edit the example-DIH/solr/mail/conf/data-config.xml and add details about username, password, and the IMAP server
2. Connect to http://localhost:8983/solr/mail/dataimport?command=full-import
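For reference, a minimal mail entity definition looks like the sketch below (placeholder account details; the config that ships in example-DIH/solr/mail/conf/data-config.xml has the same shape):
<dataConfig>
  <document>
    <entity processor="MailEntityProcessor" user="someone@example.com"
            password="secret" host="imap.example.com" protocol="imaps"
            folders="inbox"/>
  </document>
</dataConfig>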
See also README.txt in the solr subdirectory, and check
http://wiki.apache.org/solr/DataImportHandler for detailed
usage guide and tutorial.

View File

@ -29,12 +29,6 @@
-->
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
<!-- Used to specify an alternate directory to hold all index data
other than the default ./data under the Solr home.
If replication is in use, this should match the replication configuration. -->
<dataDir>${solr.data.dir:./solr/db/data}</dataDir>
<indexDefaults>
<!-- Values here affect all index writers and act as a default unless overridden. -->
<useCompoundFile>false</useCompoundFile>

View File

@ -0,0 +1,7 @@
<dataConfig>
<document>
<entity processor="MailEntityProcessor" user="email@gmail.com"
password="password" host="imap.gmail.com" protocol="imaps"
fetchMailsSince="2009-04-01 00:00:00" batchSize="20" folders="inbox"/>
</document>
</dataConfig>

View File

@ -0,0 +1,21 @@
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#-----------------------------------------------------------------------
# Use a protected word file to protect against the stemmer reducing two
# unrelated words to the same base word.
# Some non-words that normally won't be encountered,
# just to test that they won't be stemmed.
dontstems
zwhacky

View File

@ -0,0 +1,370 @@
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!--
This is the Solr schema file. This file should be named "schema.xml" and
should be in the conf directory under the solr home
(i.e. ./solr/conf/schema.xml by default)
or located where the classloader for the Solr webapp can find it.
This example schema is the recommended starting point for users.
It should be kept correct and concise, usable out-of-the-box.
For more information, on how to customize this file, please see
http://wiki.apache.org/solr/SchemaXml
NOTE: this schema includes many optional features and should not
be used for benchmarking.
-->
<schema name="example" version="1.2">
<!-- attribute "name" is the name of this schema and is only used for display purposes.
Applications should change this to reflect the nature of the search collection.
version="1.2" is Solr's version number for the schema syntax and semantics. It should
not normally be changed by applications.
1.0: multiValued attribute did not exist, all fields are multiValued by nature
1.1: multiValued attribute introduced, false by default
1.2: omitTf attribute introduced, true by default -->
<types>
<!-- field type definitions. The "name" attribute is
just a label to be used by field definitions. The "class"
attribute and any other attributes determine the real
behavior of the fieldType.
Class names starting with "solr" refer to java classes in the
org.apache.solr.analysis package.
-->
<!-- The StrField type is not analyzed, but indexed/stored verbatim.
- StrField and TextField support an optional compressThreshold which
limits compression (if enabled in the derived fields) to values which
exceed a certain size (in characters).
-->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
<!-- The optional sortMissingLast and sortMissingFirst attributes are
currently supported on types that are sorted internally as strings.
- If sortMissingLast="true", then a sort on this field will cause documents
without the field to come after documents with the field,
regardless of the requested sort order (asc or desc).
- If sortMissingFirst="true", then a sort on this field will cause documents
without the field to come before documents with the field,
regardless of the requested sort order.
- If sortMissingLast="false" and sortMissingFirst="false" (the default),
then default lucene sorting will be used which places docs without the
field first in an ascending sort and last in a descending sort.
-->
<!-- numeric field types that store and index the text
value verbatim (and hence don't support range queries, since the
lexicographic ordering isn't equal to the numeric ordering) -->
<fieldType name="integer" class="solr.IntField" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
<!-- Numeric field types that manipulate the value into
a string value that isn't human-readable in its internal form,
but with a lexicographic ordering the same as the numeric ordering,
so that range queries work correctly. -->
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
<!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and
is a more restricted form of the canonical representation of dateTime
http://www.w3.org/TR/xmlschema-2/#dateTime
The trailing "Z" designates UTC time and is mandatory.
Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
All other components are mandatory.
Expressions can also be used to denote calculations that should be
performed relative to "NOW" to determine the value, ie...
NOW/HOUR
... Round to the start of the current hour
NOW-1DAY
... Exactly 1 day prior to now
NOW/DAY+6MONTHS+3DAYS
... 6 months and 3 days in the future from the start of
the current day
Consult the DateField javadocs for more information.
-->
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<!--
Numeric field types that manipulate the value into trie encoded strings which are not
human readable in the internal form. Range searches on such fields use the fast Trie Range Queries
which are much faster than range searches on the SortableNumberField types.
For the fast range search to work, trie fields must be indexed. Trie fields are <b>not</b> sortable
in numerical order. Also, they cannot be used in function queries. If one needs sorting as well as
fast range search, one should create a copy field specifically for sorting. The same workaround is
suggested for using trie fields in function queries.
For each number added to this field, multiple terms are generated as per the algorithm described in
the org.apache.lucene.search.trie package description. The possible number of terms depends on the precisionStep
attribute and increases dramatically with higher precision steps (factor 2**precisionStep). The default
value of precisionStep is 8.
Note that if you use a precisionStep of 32 for int/float and 64 for long/double, then multiple terms
will not be generated, range search will be no faster than any other number field,
but sorting will be possible.
-->
<fieldType name="tint" class="solr.TrieField" type="integer" omitNorms="true" positionIncrementGap="0" indexed="true" stored="false" />
<fieldType name="tfloat" class="solr.TrieField" type="float" omitNorms="true" positionIncrementGap="0" indexed="true" stored="false" />
<fieldType name="tlong" class="solr.TrieField" type="long" omitNorms="true" positionIncrementGap="0" indexed="true" stored="false" />
<fieldType name="tdouble" class="solr.TrieField" type="double" omitNorms="true" positionIncrementGap="0" indexed="true" stored="false" />
<fieldType name="tdouble4" class="solr.TrieField" type="double" precisionStep="4" omitNorms="true" positionIncrementGap="0" indexed="true" stored="false" />
<!--
This date field manipulates the value into trie encoded strings for fast range searches. They follow the
same format and semantics as the normal DateField and support the date math syntax except that they are
not sortable and cannot be used in function queries.
-->
<fieldType name="tdate" class="solr.TrieField" type="date" omitNorms="true" positionIncrementGap="0" indexed="true" stored="false" />
<!-- The "RandomSortField" is not used to store or search any
data. You can declare fields of this type it in your schema
to generate psuedo-random orderings of your docs for sorting
purposes. The ordering is generated based on the field name
and the version of the index, As long as the index version
remains unchanged, and the same field name is reused,
the ordering of the docs will be consistent.
If you want differend psuedo-random orderings of documents,
for the same version of the index, use a dynamicField and
change the name
-->
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
<!-- solr.TextField allows the specification of custom text analyzers
specified as a tokenizer and a list of token filters. Different
analyzers may be specified for indexing and querying.
The optional positionIncrementGap puts space between multiple fields of
this type on the same document, with the purpose of preventing false phrase
matching across fields.
For more info on customizing your analyzer chain, please see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
-->
<!-- One can also specify an existing Analyzer class that has a
default constructor via the class attribute on the analyzer element
<fieldType name="text_greek" class="solr.TextField">
<analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>
-->
<!-- A text field that only splits on whitespace for exact matching of words -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
<!-- A text field that uses WordDelimiterFilter to enable splitting and matching of
words on case-change, alpha numeric boundaries, and non-alphanumeric chars,
so that a query of "wifi" or "wi fi" could match a document containing "Wi-Fi".
Synonyms and stopwords are customized by external files, and stemming is enabled.
Duplicate tokens at the same position (which may result from Stemmed Synonyms or
WordDelim parts) are removed.
-->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<!-- Case insensitive stop word removal.
add enablePositionIncrements=true in both the index and query
analyzers to leave a 'gap' for more accurate phrase queries.
-->
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
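<!-- A worked example of the index-time chain above on the input "Wi-Fi":
     WhitespaceTokenizer:  Wi-Fi
     WordDelimiterFilter:  Wi, Fi, WiFi  (word parts from the dash split, plus the catenated pair)
     LowerCaseFilter:      wi, fi, wifi
     which is why queries for "wifi" or "wi fi" match a document containing "Wi-Fi".
-->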
<!-- Less flexible matching, but less false matches. Probably not ideal for product names,
but may be good for SKUs. Can insert dashes in the wrong place and still match. -->
<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<!--
Setup simple analysis for spell checking
-->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<!-- charFilter + "CharStream aware" WhitespaceTokenizer -->
<!--
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
-->
<!-- This is an example of using the KeywordTokenizer along
with various TokenFilterFactories to produce a sortable field
that does not include some properties of the source text
-->
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<!-- KeywordTokenizer does no actual tokenizing, so the entire
input string is preserved as a single token
-->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<!-- The LowerCase TokenFilter does what you expect, which can be
useful when you want your sorting to be case insensitive
-->
<filter class="solr.LowerCaseFilterFactory" />
<!-- The TrimFilter removes any leading or trailing whitespace -->
<filter class="solr.TrimFilterFactory" />
<!-- The PatternReplaceFilter gives you the flexibility to use
Java Regular expression to replace any sequence of characters
matching a pattern with an arbitrary replacement string,
which may include back references to portions of the original
string matched by the pattern.
See the Java Regular Expression documentation for more
information on pattern and replacement string syntax.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/package-summary.html
-->
<filter class="solr.PatternReplaceFilterFactory"
pattern="([^a-z])" replacement="" replace="all"
/>
</analyzer>
</fieldType>
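<!-- A worked example of this chain on the input "  My iPhone ":
     KeywordTokenizer:      "  My iPhone "   (the whole input, as one token)
     LowerCaseFilter:       "  my iphone "
     TrimFilter:            "my iphone"
     PatternReplaceFilter:  "myiphone"       (every character outside a-z removed)
-->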
<fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" >
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
</analyzer>
</fieldtype>
<!-- since fields of this type are by default not stored or indexed, any data added to
them will be ignored outright
-->
<fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
</types>
<fields>
<!-- Valid attributes for fields:
name: mandatory - the name for the field
type: mandatory - the name of a previously defined type from the <types> section
indexed: true if this field should be indexed (searchable or sortable)
stored: true if this field should be retrievable
compressed: [false] if this field should be stored using gzip compression
(this will only apply if the field type is compressable; among
the standard field types, only TextField and StrField are)
multiValued: true if this field may contain multiple values per document
omitNorms: (expert) set to true to omit the norms associated with
this field (this disables length normalization and index-time
boosting for the field, and saves some memory). Only full-text
fields or fields that need an index-time boost need norms.
termVectors: [false] set to true to store the term vector for a given field.
When using MoreLikeThis, fields used for similarity should be stored for
best performance.
termPositions: Store position information with the term vector. This will increase storage costs.
termOffsets: Store offset information with the term vector. This will increase storage costs.
-->
<field name="messageId" type="string" indexed="true" stored="true" required="true" omitNorms="true" />
<field name="subject" type="string" indexed="true" stored="true" omitNorms="true" />
<field name="from" type="string" indexed="true" stored="true" omitNorms="true"/>
<field name="sentDate" type="date" indexed="true" stored="true"/>
<field name="xMailer" type="string" indexed="true" stored="true" omitNorms="true"/>
<field name="allTo" type="string" indexed="true" stored="true" omitNorms="true" multiValued="true"/>
<field name="flags" type="string" indexed="true" stored="true" omitNorms="true" multiValued="true"/>
<field name="content" type="text" indexed="true" stored="true" omitNorms="true" multiValued="true"/>
<field name="attachment" type="text" indexed="true" stored="true" omitNorms="true" multiValued="true"/>
<field name="attachmentNames" type="string" indexed="true" stored="true" omitNorms="true" multiValued="true"/>
<field name="catchAllField" type="text" indexed="true" stored="true" omitNorms="true" multiValued="true"/>
</fields>
<copyField source="content" dest="catchAllField"/>
<copyField source="attachmentNames" dest="catchAllField"/>
<copyField source="attachment" dest="catchAllField"/>
<copyField source="subject" dest="catchAllField"/>
<copyField source="allTo" dest="catchAllField"/>
<!-- The unique key. Note that some mail servers may not send the message-id, or they may send duplicate ones -->
<uniqueKey>messageId</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>catchAllField</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
</schema>


@@ -0,0 +1,804 @@
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<config>
<!-- Set this to 'false' if you want solr to continue working after it has
encountered a severe configuration error. In a production environment,
you may want solr to keep working even if one handler is mis-configured.
You may also set this to false by setting the system property:
-Dsolr.abortOnConfigurationError=false
-->
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
<indexDefaults>
<!-- Values here affect all index writers and act as a default unless overridden. -->
<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<!--
If both ramBufferSizeMB and maxBufferedDocs are set, then Lucene will flush based on whichever limit is hit first.
-->
<!--<maxBufferedDocs>1000</maxBufferedDocs>-->
<!-- Tell Lucene when to flush documents to disk.
Giving Lucene more memory for indexing means faster indexing, at the cost of more RAM.
If both ramBufferSizeMB and maxBufferedDocs are set, then Lucene will flush based on whichever limit is hit first.
-->
<ramBufferSizeMB>32</ramBufferSizeMB>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>10000</commitLockTimeout>
<!--
Expert: Turn on Lucene's auto commit capability.
This causes intermediate segment flushes to write a new lucene
index descriptor, enabling it to be opened by an external
IndexReader.
NOTE: Despite the name, this value does not have any relation to Solr's autoCommit functionality
-->
<!--<luceneAutoCommit>false</luceneAutoCommit>-->
<!--
Expert:
The Merge Policy in Lucene controls how merging is handled by Lucene. The default in 2.3 is the LogByteSizeMergePolicy, previous
versions used LogDocMergePolicy.
LogByteSizeMergePolicy chooses segments to merge based on their size. The Lucene 2.2 default, LogDocMergePolicy, chose
when to merge based on the number of documents.
Other implementations of MergePolicy must have a no-argument constructor
-->
<!--<mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>-->
<!--
Expert:
The Merge Scheduler in Lucene controls how merges are performed. The ConcurrentMergeScheduler (Lucene 2.3 default)
can perform merges in the background using separate threads. The SerialMergeScheduler (Lucene 2.2 default) does not.
-->
<!--<mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>-->
<!--
This option specifies which Lucene LockFactory implementation to use.
single = SingleInstanceLockFactory - suggested for a read-only index
or when there is no possibility of another process trying
to modify the index.
native = NativeFSLockFactory
simple = SimpleFSLockFactory
(For backwards compatibility with Solr 1.2, 'simple' is the default
if not specified.)
-->
<lockType>single</lockType>
</indexDefaults>
<mainIndex>
<!-- options specific to the main on-disk lucene index -->
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>32</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>
<!-- Deprecated -->
<!--<maxBufferedDocs>1000</maxBufferedDocs>-->
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000</maxFieldLength>
<!-- If true, unlock any held write or commit locks on startup.
This defeats the locking mechanism that allows multiple
processes to safely access a lucene index, and should be
used with care.
This is not needed if lock type is 'none' or 'single'
-->
<unlockOnStartup>false</unlockOnStartup>
<!--
Custom deletion policies can be specified here. The class must
implement org.apache.lucene.index.IndexDeletionPolicy.
http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/index/IndexDeletionPolicy.html
The standard Solr IndexDeletionPolicy implementation supports deleting
index commit points on number of commits, age of commit point and
optimized status.
The latest commit point should always be preserved regardless
of the criteria.
-->
<deletionPolicy class="solr.SolrDeletionPolicy">
<!-- Keep only optimized commit points -->
<str name="keepOptimizedOnly">false</str>
<!-- The maximum number of commit points to be kept -->
<str name="maxCommitsToKeep">1</str>
<!--
Delete all commit points once they have reached the given age.
Supports DateMathParser syntax e.g.
<str name="maxCommitAge">30MINUTES</str>
<str name="maxCommitAge">1DAY</str>
-->
</deletionPolicy>
</mainIndex>
<!-- Enables JMX if and only if an existing MBeanServer is found, use
this if you want to configure JMX through JVM parameters. Remove
this to disable exposing Solr configuration and statistics to JMX.
If you want to connect to a particular server, specify the agentId
e.g. <jmx agentId="myAgent" />
If you want to start a new MBeanServer, specify the serviceUrl
e.g <jmx serviceUrl="service:jmx:rmi:///jndi/rmi://localhost:9999/solr" />
For more details see http://wiki.apache.org/solr/SolrJmx
-->
<jmx />
<!-- the default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
<!-- A prefix of "solr." for class names is an alias that
causes solr to search appropriate packages, including
org.apache.solr.(search|update|request|core|analysis)
-->
<!-- Perform a <commit/> automatically under certain conditions:
maxDocs - number of updates since last commit is greater than this
maxTime - oldest uncommitted update (in ms) is this long ago
<autoCommit>
<maxDocs>10000</maxDocs>
<maxTime>1000</maxTime>
</autoCommit>
-->
<!-- The RunExecutableListener executes an external command.
exe - the name of the executable to run
dir - dir to use as the current working directory. default="."
wait - the calling thread waits until the executable returns. default="true"
args - the arguments to pass to the program. default=nothing
env - environment variables to set. default=nothing
-->
<!-- A postCommit event is fired after every commit or optimize command
<listener event="postCommit" class="solr.RunExecutableListener">
<str name="exe">solr/bin/snapshooter</str>
<str name="dir">.</str>
<bool name="wait">true</bool>
<arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
<arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>
-->
<!-- A postOptimize event is fired only after every optimize command, useful
in conjunction with index distribution to only distribute optimized indices
<listener event="postOptimize" class="solr.RunExecutableListener">
<str name="exe">snapshooter</str>
<str name="dir">solr/bin</str>
<bool name="wait">true</bool>
</listener>
-->
</updateHandler>
<query>
<!-- Maximum number of clauses in a boolean query... can affect
range or prefix queries that expand to big boolean
queries. An exception is thrown if exceeded. -->
<maxBooleanClauses>1024</maxBooleanClauses>
<!-- There are two implementations of cache available for Solr,
LRUCache, based on a synchronized LinkedHashMap, and
FastLRUCache, based on a ConcurrentHashMap. FastLRUCache has faster gets
and slower puts in single threaded operation and thus is generally faster
than LRUCache when the hit ratio of the cache is high (> 75%), and may be
faster under other scenarios on multi-cpu systems. -->
<!-- Cache used by SolrIndexSearcher for filters (DocSets),
unordered sets of *all* documents that match a query.
When a new searcher is opened, its caches may be prepopulated
or "autowarmed" using data from caches in the old searcher.
autowarmCount is the number of items to prepopulate. For LRUCache,
the autowarmed items will be the most recently accessed items.
Parameters:
class - the SolrCache implementation LRUCache or FastLRUCache
size - the maximum number of entries in the cache
initialSize - the initial capacity (number of entries) of
the cache. (see java.util.HashMap)
autowarmCount - the number of entries to prepopulate from
an old cache.
-->
<filterCache
class="solr.FastLRUCache"
size="512"
initialSize="512"
autowarmCount="128"/>
<!-- Cache used to hold field values that are quickly accessible
by document id. The fieldValueCache is created by default
even if not configured here.
<fieldValueCache
class="solr.FastLRUCache"
size="512"
autowarmCount="128"
showItems="32"
/>
-->
<!-- queryResultCache caches results of searches - ordered lists of
document ids (DocList) based on a query, a sort, and the range
of documents requested. -->
<queryResultCache
class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="32"/>
<!-- documentCache caches Lucene Document objects (the stored fields for each document).
Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
<documentCache
class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<!-- If true, stored fields that are not requested will be loaded lazily.
This can result in a significant speed improvement if the usual case is to
not load all stored fields, especially if the skipped fields are large compressed
text fields.
-->
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<!-- Example of a generic cache. These caches may be accessed by name
through SolrIndexSearcher.getCache(), cacheLookup(), and cacheInsert().
The purpose is to enable easy caching of user/application level data.
The regenerator argument should be specified as an implementation
of solr.search.CacheRegenerator if autowarming is desired. -->
<!--
<cache name="myUserCache"
class="solr.LRUCache"
size="4096"
initialSize="1024"
autowarmCount="1024"
regenerator="org.mycompany.mypackage.MyRegenerator"
/>
-->
<!-- An optimization that attempts to use a filter to satisfy a search.
If the requested sort does not include score, then the filterCache
will be checked for a filter matching the query. If found, the filter
will be used as the source of document ids, and then the sort will be
applied to that.
<useFilterForSortedQuery>true</useFilterForSortedQuery>
-->
<!-- An optimization for use with the queryResultCache. When a search
is requested, a superset of the requested number of document ids
are collected. For example, if a search for a particular query
requests matching documents 10 through 19, and queryResultWindowSize is 50,
then documents 0 through 49 will be collected and cached. Any further
requests in that range can be satisfied via the cache. -->
<queryResultWindowSize>50</queryResultWindowSize>
<!-- Maximum number of documents to cache for any entry in the
queryResultCache. -->
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
<!-- This entry enables an int hash representation for filters (DocSets)
when the number of items in the set is less than maxSize. For smaller
sets, this representation is more memory efficient, more efficient to
iterate over, and faster to take intersections. -->
<HashDocSet maxSize="3000" loadFactor="0.75"/>
<!-- a newSearcher event is fired whenever a new searcher is being prepared
and there is a current searcher handling requests (aka registered). -->
<!-- QuerySenderListener takes an array of NamedList and executes a
local query request for each NamedList in sequence. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
<lst> <str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str> </lst>
<lst><str name="q">static newSearcher warming query from solrconfig.xml</str></lst>
</arr>
</listener>
<!-- a firstSearcher event is fired whenever a new searcher is being
prepared but there is no current registered searcher to handle
requests or to gain autowarming data from. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
<lst><str name="q">static firstSearcher warming query from solrconfig.xml</str></lst>
</arr>
</listener>
<!-- If a search request comes in and there is no current registered searcher,
then immediately register the still warming searcher and use it. If
"false" then all requests will block until the first searcher is done
warming. -->
<useColdSearcher>false</useColdSearcher>
<!-- Maximum number of searchers that may be warming in the background
concurrently. An error is returned if this limit is exceeded. Recommend
1-2 for read-only slaves, higher for masters w/o cache warming. -->
<maxWarmingSearchers>2</maxWarmingSearchers>
</query>
<!--
Let the dispatch filter handle /select?qt=XXX
handleSelect=true will use consistent error handling for /select and /update
handleSelect=false will use solr1.1 style error formatting
-->
<requestDispatcher handleSelect="true" >
<!--Make sure your system has some authentication before enabling remote streaming! -->
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000" />
<!-- Set HTTP caching related parameters (for proxy caches and clients).
To get the behaviour of Solr 1.2 (ie: no caching related headers)
use the never304="true" option and do not specify a value for
<cacheControl>
-->
<!-- <httpCaching never304="true"> -->
<httpCaching lastModifiedFrom="openTime"
etagSeed="Solr">
<!-- lastModFrom="openTime" is the default, the Last-Modified value
(and validation against If-Modified-Since requests) will all be
relative to when the current Searcher was opened.
You can change it to lastModFrom="dirLastMod" if you want the
value to exactly correspond to when the physical index was last
modified.
etagSeed="..." is an option you can change to force the ETag
header (and validation against If-None-Match requests) to be
different even if the index has not changed (ie: when making
significant changes to your config file)
lastModifiedFrom and etagSeed are both ignored if you use the
never304="true" option.
-->
<!-- If you include a <cacheControl> directive, it will be used to
generate a Cache-Control header, as well as an Expires header
if the value contains "max-age="
By default, no Cache-Control header is generated.
You can use the <cacheControl> option even if you have set
never304="true"
-->
<!-- <cacheControl>max-age=30, public</cacheControl> -->
</httpCaching>
</requestDispatcher>
<!-- requestHandler plugins... incoming queries will be dispatched to the
correct handler based on the path or the qt (query type) param.
Names starting with a '/' are accessed with a path equal to the
registered name. Names without a leading '/' are accessed with:
http://host/app/select?qt=name
If no qt is defined, the requestHandler that declares default="true"
will be used.
-->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<!-- default values for query parameters -->
<lst name="defaults">
<str name="echoParams">explicit</str>
<!--
<int name="rows">10</int>
<str name="fl">*</str>
<str name="version">2.1</str>
-->
</lst>
</requestHandler>
<!-- Please refer to http://wiki.apache.org/solr/SolrReplication for details on configuring replication -->
<!--Master config-->
<!--
<requestHandler name="/replication" class="solr.ReplicationHandler" >
<lst name="master">
<str name="replicateAfter">commit</str>
<str name="confFiles">schema.xml,stopwords.txt</str>
</lst>
</requestHandler>
-->
<!-- Slave config-->
<!--
<requestHandler name="/replication" class="solr.ReplicationHandler">
<lst name="slave">
<str name="masterUrl">http://localhost:8983/solr/replication</str>
<str name="pollInterval">00:00:60</str>
</lst>
</requestHandler>
-->
<!-- DisMaxRequestHandler allows easy searching across multiple fields
for simple user-entered phrases. Its implementation is now
just the standard SearchHandler with a default query type
of "dismax".
see http://wiki.apache.org/solr/DisMaxRequestHandler
-->
<requestHandler name="dismax" class="solr.SearchHandler" >
<lst name="defaults">
<str name="defType">dismax</str>
<str name="echoParams">explicit</str>
<float name="tie">0.01</float>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="pf">
text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
</str>
<str name="bf">
ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
</str>
<str name="fl">
id,name,price,score
</str>
<str name="mm">
2&lt;-1 5&lt;-2 6&lt;90%
</str>
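<!-- A worked reading of the "mm" spec above: queries of 1 or 2 clauses require all
     clauses to match; 3 to 5 clauses require all but one; 6 clauses require all but
     two; above 6 clauses, 90% (rounded down) must match, e.g. 7 of an 8-term query.
-->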
<int name="ps">100</int>
<str name="q.alt">*:*</str>
<!-- example highlighter config, enable per-query with hl=true -->
<str name="hl.fl">text features name</str>
<!-- for this field, we want no fragmenting, just highlighting -->
<str name="f.name.hl.fragsize">0</str>
<!-- instructs Solr to return the field itself if no query terms are
found -->
<str name="f.name.hl.alternateField">name</str>
<str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
</lst>
</requestHandler>
<!-- Note how you can register the same handler multiple times with
different names (and different init parameters)
-->
<requestHandler name="partitioned" class="solr.SearchHandler" >
<lst name="defaults">
<str name="defType">dismax</str>
<str name="echoParams">explicit</str>
<str name="qf">text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0</str>
<str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
<!-- This is an example of using Date Math to specify a constantly
moving date range in a config...
-->
<str name="bq">incubationdate_dt:[* TO NOW/DAY-1MONTH]^2.2</str>
</lst>
<!-- In addition to defaults, "appends" params can be specified
to identify values which should be appended to the list of
multi-val params from the query (or the existing "defaults").
In this example, the param "fq=instock:true" will be appended to
any query time fq params the user may specify, as a mechanism for
partitioning the index, independent of any user selected filtering
that may also be desired (perhaps as a result of faceted searching).
NOTE: there is *absolutely* nothing a client can do to prevent these
"appends" values from being used, so don't use this mechanism
unless you are sure you always want it.
-->
<lst name="appends">
<str name="fq">inStock:true</str>
</lst>
<!-- "invariants" are a way of letting the Solr maintainer lock down
the options available to Solr clients. Any params values
specified here are used regardless of what values may be specified
in either the query, the "defaults", or the "appends" params.
In this example, the facet.field and facet.query params are fixed,
limiting the facets clients can use. Faceting is not turned on by
default - but if the client does specify facet=true in the request,
these are the only facets they will be able to see counts for;
regardless of what other facet.field or facet.query params they
may specify.
NOTE: there is *absolutely* nothing a client can do to prevent these
"invariants" values from being used, so don't use this mechanism
unless you are sure you always want it.
-->
<lst name="invariants">
<str name="facet.field">cat</str>
<str name="facet.field">manu_exact</str>
<str name="facet.query">price:[* TO 500]</str>
<str name="facet.query">price:[500 TO *]</str>
</lst>
</requestHandler>
<!--
Search components are registered to SolrCore and used by Search Handlers
By default, the following components are available:
<searchComponent name="query" class="org.apache.solr.handler.component.QueryComponent" />
<searchComponent name="facet" class="org.apache.solr.handler.component.FacetComponent" />
<searchComponent name="mlt" class="org.apache.solr.handler.component.MoreLikeThisComponent" />
<searchComponent name="highlight" class="org.apache.solr.handler.component.HighlightComponent" />
<searchComponent name="stats" class="org.apache.solr.handler.component.StatsComponent" />
<searchComponent name="debug" class="org.apache.solr.handler.component.DebugComponent" />
Default configuration in a requestHandler would look like:
<arr name="components">
<str>query</str>
<str>facet</str>
<str>mlt</str>
<str>highlight</str>
<str>stats</str>
<str>debug</str>
</arr>
If you register a searchComponent to one of the standard names, that will be used instead.
To insert components before or after the 'standard' components, use:
<arr name="first-components">
<str>myFirstComponentName</str>
</arr>
<arr name="last-components">
<str>myLastComponentName</str>
</arr>
-->
<!-- The spell check component can return a list of alternative spelling
suggestions. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">spell</str>
<str name="spellcheckIndexDir">./spellchecker1</str>
</lst>
<lst name="spellchecker">
<str name="name">jarowinkler</str>
<str name="field">spell</str>
<!-- Use a different Distance Measure -->
<str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
<str name="spellcheckIndexDir">./spellchecker2</str>
</lst>
<lst name="spellchecker">
<str name="classname">solr.FileBasedSpellChecker</str>
<str name="name">file</str>
<str name="sourceLocation">spellings.txt</str>
<str name="characterEncoding">UTF-8</str>
<str name="spellcheckIndexDir">./spellcheckerFile</str>
</lst>
</searchComponent>
<!-- A request handler utilizing the spellcheck component.
################################################################################################
NOTE: This is purely an example. The whole purpose of the SpellCheckComponent is to hook it into
the request handler that handles queries (i.e. the standard or dismax SearchHandler)
so that a separate request is not needed to get suggestions.
IN OTHER WORDS, THERE IS A REALLY GOOD CHANCE THE SETUP BELOW IS NOT WHAT YOU WANT FOR YOUR PRODUCTION SYSTEM!
################################################################################################
-->
<requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
<lst name="defaults">
<!-- omp = Only More Popular -->
<str name="spellcheck.onlyMorePopular">false</str>
<!-- exr = Extended Results -->
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
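<!-- An illustrative request against the handler above (the core name and the
     misspelled term are examples only; spellcheck.q falls back to q when omitted):
     http://localhost:8983/solr/mail/spellCheckCompRH?q=documemt&spellcheck=true&spellcheck.count=5
-->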
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>
<!-- A request handler for working with the tvComponent. This is purely an example.
You will likely want to add the component to your already specified request handlers. -->
<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
<bool name="tv">true</bool>
</lst>
<arr name="last-components">
<str>tvComponent</str>
</arr>
</requestHandler>
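<!-- An illustrative request against the handler above; since "tvrh" has no leading
     '/', it is reached via the qt parameter. tv.fl and tv.tf are TermVectorComponent
     parameters and should be verified against your version's documentation:
     http://localhost:8983/solr/mail/select?qt=tvrh&q=*:*&tv.fl=content&tv.tf=true
-->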
<!--
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
<lst name="defaults">
<str name="ext.map.Last-Modified">last_modified</str>
<bool name="ext.ignore.und.fl">true</bool>
</lst>
</requestHandler>
-->
<searchComponent name="termsComp" class="org.apache.solr.handler.component.TermsComponent"/>
<requestHandler name="/autoSuggest" class="org.apache.solr.handler.component.SearchHandler">
<arr name="components">
<str>termsComp</str>
</arr>
</requestHandler>
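<!-- An illustrative auto-suggest request; terms.fl and terms.prefix are
     TermsComponent parameters, and the exact names should be verified against
     your version's documentation:
     http://localhost:8983/solr/mail/autoSuggest?terms=true&terms.fl=catchAllField&terms.prefix=sol
-->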
<!-- Update request handler.
Note: Since Solr 1.1, request handlers require a valid content type header if posted in
the body. For example, curl now requires: -H 'Content-type:text/xml; charset=utf-8'
The response format differs from solr1.1 formatting and returns a standard error code.
To enable solr1.1 behavior, remove the /update handler or change its path
-->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
<!--
Analysis request handler. Since Solr 1.3. Use it to return how a document is analyzed. Useful
for debugging and as a token server for other types of applications
-->
<requestHandler name="/analysis" class="solr.AnalysisRequestHandler" />
<!-- CSV update handler, loaded on demand -->
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" />
<!--
Admin Handlers - This will register all the standard admin RequestHandlers. Adding
this single handler is equivalent to registering:
<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
<requestHandler name="/admin/system" class="org.apache.solr.handler.admin.SystemInfoHandler" />
<requestHandler name="/admin/plugins" class="org.apache.solr.handler.admin.PluginInfoHandler" />
<requestHandler name="/admin/threads" class="org.apache.solr.handler.admin.ThreadDumpHandler" />
<requestHandler name="/admin/properties" class="org.apache.solr.handler.admin.PropertiesRequestHandler" />
<requestHandler name="/admin/file" class="org.apache.solr.handler.admin.ShowFileRequestHandler" >
If you wish to hide files under ${solr.home}/conf, explicitly register the ShowFileRequestHandler using:
<requestHandler name="/admin/file" class="org.apache.solr.handler.admin.ShowFileRequestHandler" >
<lst name="invariants">
<str name="hidden">synonyms.txt</str>
<str name="hidden">anotherfile.txt</str>
</lst>
</requestHandler>
-->
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
<!-- ping/healthcheck -->
<requestHandler name="/admin/ping" class="PingRequestHandler">
<lst name="defaults">
<str name="qt">standard</str>
<str name="q">solrpingquery</str>
<str name="echoParams">all</str>
</lst>
</requestHandler>
<!-- Echo the request contents back to the client -->
<requestHandler name="/debug/dump" class="solr.DumpRequestHandler" >
<lst name="defaults">
<str name="echoParams">explicit</str> <!-- for all params (including the default etc) use: 'all' -->
<str name="echoHandler">true</str>
</lst>
</requestHandler>
<highlighting>
<!-- Configure the standard fragmenter -->
<!-- This could most likely be commented out in the "default" case -->
<fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
<lst name="defaults">
<int name="hl.fragsize">100</int>
</lst>
</fragmenter>
<!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
<fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
<lst name="defaults">
<!-- slightly smaller fragsizes work better because of slop -->
<int name="hl.fragsize">70</int>
<!-- allow 50% slop on fragment sizes -->
<float name="hl.regex.slop">0.5</float>
<!-- a basic sentence pattern -->
<str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
</lst>
</fragmenter>
<!-- Configure the standard formatter -->
<formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
</highlighting>
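<!-- Highlighting is enabled per request; for example, to highlight matches from the
     mail body using the fragmenter and formatter configured above (illustrative URL):
     http://localhost:8983/solr/mail/select?q=meeting&hl=true&hl.fl=content
-->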
<!-- An example dedup update processor that creates the "id" field on the fly
based on the hash code of some other fields. This example has overwriteDupes
set to false since we are using the id field as the signatureField and Solr
will maintain uniqueness based on that anyway. -->
<!--
<updateRequestProcessorChain name="dedupe">
<processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
<bool name="enabled">true</bool>
<str name="signatureField">id</str>
<bool name="overwriteDupes">false</bool>
<str name="fields">name,features,cat</str>
<str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
-->
<!-- queryResponseWriter plugins... query responses will be written using the
writer specified by the 'wt' request parameter matching the name of a registered
writer.
The "default" writer is the default and will be used if 'wt' is not specified
in the request. XMLResponseWriter will be used if nothing is specified here.
The json, python, and ruby writers are also available by default.
<queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter" default="true"/>
<queryResponseWriter name="json" class="org.apache.solr.request.JSONResponseWriter"/>
<queryResponseWriter name="python" class="org.apache.solr.request.PythonResponseWriter"/>
<queryResponseWriter name="ruby" class="org.apache.solr.request.RubyResponseWriter"/>
<queryResponseWriter name="php" class="org.apache.solr.request.PHPResponseWriter"/>
<queryResponseWriter name="phps" class="org.apache.solr.request.PHPSerializedResponseWriter"/>
<queryResponseWriter name="custom" class="com.example.MyResponseWriter"/>
-->
<!-- XSLT response writer transforms the XML output by any xslt file found
in Solr's conf/xslt directory. Changes to xslt files are re-checked
every xsltCacheLifetimeSeconds seconds.
-->
<queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
<int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>
<!-- example of registering a query parser
<queryParser name="lucene" class="org.apache.solr.search.LuceneQParserPlugin"/>
-->
<!-- example of registering a custom function parser
<valueSourceParser name="myfunc" class="com.mycompany.MyValueSourceParser" />
-->
<!-- config for the admin interface -->
<admin>
<defaultQuery>solr</defaultQuery>
<!-- configure a healthcheck file for servers behind a loadbalancer
<healthcheck type="file">server-enabled</healthcheck>
-->
</admin>
</config>


@@ -0,0 +1,58 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#-----------------------------------------------------------------------
# a couple of test stopwords to test that the words are really being
# configured from this file:
stopworda
stopwordb
# Standard English stop words taken from Lucene's StopAnalyzer
a
an
and
are
as
at
be
but
by
for
if
in
into
is
it
no
not
of
on
or
s
such
t
that
the
their
then
there
these
they
this
to
was
will
with


@@ -0,0 +1,31 @@
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#-----------------------------------------------------------------------
# Some test synonym mappings unlikely to appear in real input text
aaa => aaaa
bbb => bbbb1 bbbb2
ccc => cccc1,cccc2
a\=>a => b\=>b
a\,a => b\,b
fooaaa,baraaa,bazaaa
# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs
# Notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
# after us won't split it into two words.
# Synonym mappings can be used for spelling correction too
pixima => pixma
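# A worked example of the two mapping styles above: with the query-time
# SynonymFilter in schema.xml (expand="true"), a query for "TV" expands to
# Television, Televisions, TV and TVs, while an explicit mapping such as
# "pixima => pixma" replaces the left-hand token with the right-hand one.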


@@ -29,12 +29,6 @@
-->
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
<!-- Used to specify an alternate directory to hold all index data
other than the default ./data under the Solr home.
If replication is in use, this should match the replication configuration. -->
<dataDir>${solr.data.dir:./solr/rss/data}</dataDir>
<indexDefaults>
<!-- Values here affect all index writers and act as a default unless overridden. -->
<useCompoundFile>false</useCompoundFile>


@@ -3,5 +3,6 @@
<cores adminPath="/admin/cores">
<core default="true" instanceDir="db" name="db"></core>
<core default="false" instanceDir="rss" name="rss"></core>
<core default="false" instanceDir="mail" name="mail"></core>
</cores>
</solr>