mirror of https://github.com/apache/lucene.git

SOLR-14429: Convert .txt files to properly formatted .md files (#1450)

commit f03e6aac59 (parent ce18505e28)
@@ -130,6 +130,7 @@ class RatTask extends DefaultTask {
     List<String> srcExcludes = [
         "**/TODO",
         "**/*.txt",
+        "**/*.md",
         "**/*.iml",
         "build/**"
     ]
@@ -147,10 +147,12 @@ ant.fileScanner{
     include(name: 'dev-tools/**/*.' + it)
     include(name: '*.' + it)
   }
-  // TODO: For now we don't scan txt files, so we
+  // TODO: For now we don't scan txt / md files, so we
   // check licenses in top-level folders separately:
   include(name: '*.txt')
   include(name: '*/*.txt')
+  include(name: '*.md')
+  include(name: '*/*.md')
   // excludes:
   exclude(name: '**/build/**')
   exclude(name: '**/dist/**')
@@ -62,6 +62,8 @@ Other Changes

 * SOLR-14420: AuthenticationPlugin.authenticate accepts HttpServletRequest instead of ServletRequest. (Mike Drob)

+* SOLR-14429: Convert .txt files to properly formatted .md files. (Tomoko Uchida, Uwe Schindler)
+
 ================== 8.6.0 ==================

 Consult the LUCENE_CHANGES.txt file for additional, low level, changes in this release.
@ -1,18 +1,19 @@
|
|||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
Welcome to the Apache Solr project!
|
||||
-----------------------------------
|
||||
|
@ -22,7 +23,7 @@ from the Apache Lucene project.
|
|||
|
||||
For a complete description of the Solr project, team composition, source
|
||||
code repositories, and other details, please see the Solr web site at
|
||||
http://lucene.apache.org/solr
|
||||
https://lucene.apache.org/solr
|
||||
|
||||
|
||||
Getting Started
|
||||
|
@ -30,37 +31,51 @@ Getting Started
|
|||
|
||||
To start Solr for the first time after installation, simply do:
|
||||
|
||||
```
|
||||
bin/solr start
|
||||
```
|
||||
|
||||
This will launch a standalone Solr server in the background of your shell,
|
||||
listening on port 8983. Alternatively, you can launch Solr in "cloud" mode,
|
||||
which allows you to scale out using sharding and replication. To launch Solr
|
||||
in cloud mode, do:
|
||||
|
||||
```
|
||||
bin/solr start -cloud
|
||||
```
|
||||
|
||||
To see all available options for starting Solr, please do:
|
||||
|
||||
```
|
||||
bin/solr start -help
|
||||
```
|
||||
|
||||
After starting Solr, create either a core or collection depending on whether
|
||||
Solr is running in standalone (core) or SolrCloud mode (collection) by doing:
|
||||
|
||||
```
|
||||
bin/solr create -c <name>
|
||||
```
|
||||
|
||||
This will create a collection that uses a data-driven schema which tries to guess
|
||||
the correct field type when you add documents to the index. To see all available
|
||||
options for creating a new collection, execute:
|
||||
|
||||
```
|
||||
bin/solr create -help
|
||||
```
|
||||
|
||||
After starting Solr, direct your Web browser to the Solr Admin Console at:
|
||||
|
||||
```
|
||||
http://localhost:8983/solr/
|
||||
```
|
||||
|
||||
When finished with your Solr installation, shut it down by executing:
|
||||
|
||||
```
|
||||
bin/solr stop -all
|
||||
```
|
||||
|
||||
The `-p PORT` option can also be used to identify the Solr instance to shutdown,
|
||||
where more than one Solr is running on the machine.
|
||||
|
@ -71,43 +86,55 @@ Solr Examples
|
|||
|
||||
Solr includes a few examples to help you get started. To run a specific example, do:
|
||||
|
||||
```
|
||||
bin/solr -e <EXAMPLE> where <EXAMPLE> is one of:
|
||||
|
||||
cloud : SolrCloud example
|
||||
dih : Data Import Handler (rdbms, mail, atom, tika)
|
||||
schemaless : Schema-less example (schema is inferred from data during indexing)
|
||||
techproducts : Kitchen sink example providing comprehensive examples of Solr features
|
||||
```
|
||||
|
||||
For instance, if you want to run the Solr Data Import Handler example, do:
|
||||
|
||||
```
|
||||
bin/solr -e dih
|
||||
|
||||
```
|
||||
|
||||
Indexing Documents
|
||||
---------------
|
||||
|
||||
To add documents to the index, use bin/post. For example:
|
||||
|
||||
```
|
||||
bin/post -c <collection_name> example/exampledocs/*.xml
|
||||
```
|
||||
|
||||
For more information about Solr examples please read...
|
||||
|
||||
* example/README.txt
|
||||
For more information about the "Solr Home" and Solr specific configuration
|
||||
* [example/README.md](example/README.md)
|
||||
|
||||
For more information about the "Solr Home" and Solr specific configuration
|
||||
|
||||
* https://lucene.apache.org/solr/guide/solr-tutorial.html
|
||||
For a Solr tutorial
|
||||
* http://lucene.apache.org/solr/resources.html
|
||||
For a list of other tutorials and introductory articles.
|
||||
|
||||
For a Solr tutorial
|
||||
|
||||
* https://lucene.apache.org/solr/resources.html
|
||||
|
||||
For a list of other tutorials and introductory articles.
|
||||
|
||||
or linked from "docs/index.html" in a binary distribution.
|
||||
|
||||
Also, there are Solr clients for many programming languages, see
|
||||
http://wiki.apache.org/solr/IntegratingSolr
|
||||
|
||||
* https://wiki.apache.org/solr/IntegratingSolr
|
||||
|
||||
|
||||
Files included in an Apache Solr binary distribution
|
||||
----------------------------------------------------
|
||||
|
||||
```
|
||||
server/
|
||||
A self-contained Solr instance, complete with a sample
|
||||
configuration and documents to index. Please see: bin/solr start -help
|
||||
|
@ -116,7 +143,7 @@ server/
|
|||
example/
|
||||
Contains example documents and an alternative Solr home
|
||||
directory containing examples of how to use the Data Import Handler,
|
||||
see example/example-DIH/README.txt for more information.
|
||||
see example/example-DIH/README.md for more information.
|
||||
|
||||
dist/solr-<component>-XX.jar
|
||||
The Apache Solr libraries. To compile Apache Solr Plugins,
|
||||
|
@ -126,7 +153,7 @@ dist/solr-<component>-XX.jar
|
|||
|
||||
docs/index.html
|
||||
A link to the online version of Apache Solr Javadoc API documentation and Tutorial
|
||||
|
||||
```
|
||||
|
||||
Instructions for Building Apache Solr from Source
|
||||
-------------------------------------------------
|
||||
|
@ -151,14 +178,14 @@ Instructions for Building Apache Solr from Source
|
|||
Alternately, you can obtain a copy of the latest Apache Solr source code
|
||||
directly from the GIT repository:
|
||||
|
||||
http://lucene.apache.org/solr/versioncontrol.html
|
||||
https://lucene.apache.org/solr/versioncontrol.html
|
||||
|
||||
4. Navigate to the "solr" folder and issue an "ant" command to see the available options
|
||||
for building, testing, and packaging Solr.
|
||||
|
||||
NOTE:
|
||||
To see Solr in action, you may want to use the "ant server" command to build
|
||||
and package Solr into the server directory. See also server/README.txt.
|
||||
and package Solr into the server directory. See also server/README.md.
|
||||
|
||||
|
||||
Export control
|
||||
|
@ -184,6 +211,7 @@ code and source code.
|
|||
|
||||
The following provides more details on the included cryptographic
|
||||
software:
|
||||
Apache Solr uses the Apache Tika which uses the Bouncy Castle generic encryption libraries for
|
||||
extracting text content and metadata from encrypted PDF files.
|
||||
See http://www.bouncycastle.org/ for more details on Bouncy Castle.
|
||||
|
||||
Apache Solr uses the Apache Tika which uses the Bouncy Castle generic encryption libraries for
|
||||
extracting text content and metadata from encrypted PDF files.
|
||||
See http://www.bouncycastle.org/ for more details on Bouncy Castle.
|
|
@ -57,7 +57,7 @@
|
|||
<attribute name="Main-Class" value="org.apache.solr.util.SimplePostTool"/>
|
||||
</manifest>
|
||||
</jar>
|
||||
<echo>See ${common-solr.dir}/README.txt for how to run the Solr server.</echo>
|
||||
<echo>See ${common-solr.dir}/README.md for how to run the Solr server.</echo>
|
||||
</target>
|
||||
|
||||
<target name="run-example" depends="server"
|
||||
|
@ -207,8 +207,8 @@
|
|||
</xslt>
|
||||
|
||||
<markdown todir="${javadoc.dir}">
|
||||
<fileset dir="site" includes="**/*.mdtext"/>
|
||||
<globmapper from="*.mdtext" to="*.html"/>
|
||||
<fileset dir="site" includes="**/*.md"/>
|
||||
<globmapper from="*.md" to="*.html"/>
|
||||
</markdown>
|
||||
|
||||
<copy todir="${javadoc.dir}">
|
||||
|
@ -530,8 +530,8 @@
|
|||
fullpath="${fullnamever}/LUCENE_CHANGES.txt" />
|
||||
<tarfileset dir="."
|
||||
prefix="${fullnamever}"
|
||||
includes="LICENSE.txt NOTICE.txt CHANGES.txt README.txt SYSTEM_REQUIREMENTS.txt
|
||||
bin/** server/** example/** contrib/**/lib/** contrib/**/conf/** contrib/**/README.txt
|
||||
includes="LICENSE.txt NOTICE.txt CHANGES.txt README.md site/SYSTEM_REQUIREMENTS.md
|
||||
bin/** server/** example/** contrib/**/lib/** contrib/**/conf/** contrib/**/README.md
|
||||
licenses/**"
|
||||
excludes="licenses/README.committers.txt **/data/ **/logs/*
|
||||
**/classes/ **/*.sh **/ivy.xml **/build.xml
|
||||
|
|
|
@@ -0,0 +1,26 @@
Apache Solr - Analysis Extras
=============================

The analysis-extras plugin provides additional analyzers that rely
upon large dependencies/dictionaries.

It includes integration with ICU for multilingual support,
analyzers for Chinese and Polish, and integration with
OpenNLP for multilingual tokenization, part-of-speech tagging,
lemmatization, phrase chunking, and named-entity recognition.

Each of the jars below relies upon including `/dist/solr-analysis-extras-X.Y.jar`
in the `solrconfig.xml` (see the sketch after this list).

* ICU relies upon `lucene-libs/lucene-analyzers-icu-X.Y.jar`
  and `lib/icu4j-X.Y.jar`

* Smartcn relies upon `lucene-libs/lucene-analyzers-smartcn-X.Y.jar`

* Stempel relies on `lucene-libs/lucene-analyzers-stempel-X.Y.jar`

* Morfologik relies on `lucene-libs/lucene-analyzers-morfologik-X.Y.jar`
  and `lib/morfologik-*.jar`

* OpenNLP relies on `lucene-libs/lucene-analyzers-opennlp-X.Y.jar`
  and `lib/opennlp-*.jar`
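A hedged sketch of the `solrconfig.xml` wiring mentioned above; the `<lib>` directive is standard Solr, but the directory layout, the `${solr.install.dir}` fallback, and the X.Y version are placeholders, not taken from this commit:

```
<!-- Goes inside the <config> element of solrconfig.xml; adjust dir/regex to your layout. -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-analysis-extras-\d.*\.jar"/>
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lucene-libs" regex=".*\.jar"/>
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lib" regex=".*\.jar"/>
```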
@ -1,23 +0,0 @@
|
|||
The analysis-extras plugin provides additional analyzers that rely
|
||||
upon large dependencies/dictionaries.
|
||||
|
||||
It includes integration with ICU for multilingual support,
|
||||
analyzers for Chinese and Polish, and integration with
|
||||
OpenNLP for multilingual tokenization, part-of-speech tagging
|
||||
lemmatization, phrase chunking, and named-entity recognition.
|
||||
|
||||
Each of the jars below relies upon including /dist/solr-analysis-extras-X.Y.jar
|
||||
in the solrconfig.xml
|
||||
|
||||
ICU relies upon lucene-libs/lucene-analyzers-icu-X.Y.jar
|
||||
and lib/icu4j-X.Y.jar
|
||||
|
||||
Smartcn relies upon lucene-libs/lucene-analyzers-smartcn-X.Y.jar
|
||||
|
||||
Stempel relies on lucene-libs/lucene-analyzers-stempel-X.Y.jar
|
||||
|
||||
Morfologik relies on lucene-libs/lucene-analyzers-morfologik-X.Y.jar
|
||||
and lib/morfologik-*.jar
|
||||
|
||||
OpenNLP relies on lucene-libs/lucene-analyzers-opennlp-X.Y.jar
|
||||
and lib/opennlp-*.jar
|
|
@ -1,4 +1,5 @@
|
|||
Apache Solr - DataImportHandler
|
||||
Apache Solr - DataImportHandler
|
||||
================================
|
||||
|
||||
Introduction
|
||||
------------
|
|
@ -1,4 +1,5 @@
|
|||
Apache Solr Content Extraction Library (Solr Cell)
|
||||
==================================================
|
||||
|
||||
Introduction
|
||||
------------
|
|
@@ -18,6 +18,7 @@ Note that all library of solr-jaegertracer must be included in the classpath of
```

List of parameters for JaegerTracerConfigurator include:

|Parameter|Type|Required|Default|Description|
|---------|----|--------|-------|-----------|
|agentHost|string|Yes||The host of Jaeger backend|
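For orientation only, a hedged sketch of how such a configurator is typically registered in `solr.xml`; only `agentHost` appears in the table fragment above, so the element layout, class name, and the extra `agentPort` parameter are assumptions to verify against the full README:

```
<!-- Assumed solr.xml registration; names other than agentHost are illustrative. -->
<tracerConfig name="tracerConfig" class="org.apache.solr.jaeger.JaegerTracerConfigurator">
  <str name="agentHost">localhost</str>
  <int name="agentPort">5775</int>
</tracerConfig>
```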
@ -1,5 +1,5 @@
|
|||
Apache Solr Language Identifier
|
||||
|
||||
===============================
|
||||
|
||||
Introduction
|
||||
------------
|
|
@ -13,7 +13,7 @@ For information on how to get started with solr ltr please see:
|
|||
# Getting Started With Solr
|
||||
|
||||
For information on how to get started with solr please see:
|
||||
* [solr/README.txt](../../README.txt)
|
||||
* [solr/README.md](../../README.md)
|
||||
* [Solr Tutorial](https://lucene.apache.org/solr/guide/solr-tutorial.html)
|
||||
|
||||
# How To Contribute
|
||||
|
|
|
@ -1 +0,0 @@
|
|||
README.md
|
|
@ -11,7 +11,7 @@ For information on how to get started with solr-exporter please see:
|
|||
# Getting Started With Solr
|
||||
|
||||
For information on how to get started with solr please see:
|
||||
* [solr/README.txt](../../README.txt)
|
||||
* [solr/README.md](../../README.md)
|
||||
* [Solr Tutorial](https://lucene.apache.org/solr/guide/solr-tutorial.html)
|
||||
|
||||
# How To Contribute
|
|
@ -1,17 +1,19 @@
|
|||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
Solr example
|
||||
------------
|
||||
|
@ -19,44 +21,59 @@ Solr example
|
|||
This directory contains Solr examples. Each example is contained in a
|
||||
separate directory. To run a specific example, do:
|
||||
|
||||
```
|
||||
bin/solr -e <EXAMPLE> where <EXAMPLE> is one of:
|
||||
|
||||
cloud : SolrCloud example
|
||||
dih : Data Import Handler (rdbms, mail, atom, tika)
|
||||
schemaless : Schema-less example (schema is inferred from data during indexing)
|
||||
techproducts : Kitchen sink example providing comprehensive examples of Solr features
|
||||
```
|
||||
|
||||
For instance, if you want to run the Solr Data Import Handler example, do:
|
||||
|
||||
```
|
||||
bin/solr -e dih
|
||||
```
|
||||
|
||||
To see all the options available when starting Solr:
|
||||
|
||||
```
|
||||
bin/solr start -help
|
||||
```
|
||||
|
||||
After starting a Solr example, direct your Web browser to:
|
||||
|
||||
```
|
||||
http://localhost:8983/solr/
|
||||
```
|
||||
|
||||
To add documents to the index, use bin/post, for example:
|
||||
|
||||
```
|
||||
bin/post -c techproducts example/exampledocs/*.xml
|
||||
```
|
||||
|
||||
(where "techproducts" is the Solr core name)
|
||||
|
||||
For more information about this example please read...
|
||||
|
||||
* example/solr/README.txt
|
||||
For more information about the "Solr Home" and Solr specific configuration
|
||||
* https://lucene.apache.org/solr/guide/solr-tutorial.html
|
||||
For a Solr tutorial
|
||||
* http://wiki.apache.org/solr/SolrResources
|
||||
For a list of other tutorials and introductory articles.
|
||||
* [solr/example/README.md](./README.md)
|
||||
|
||||
For more information about the "Solr Home" and Solr specific configuration
|
||||
|
||||
* https://lucene.apache.org/solr/guide/solr-tutorial.html
|
||||
|
||||
For a Solr tutorial
|
||||
|
||||
* https://wiki.apache.org/solr/SolrResources
|
||||
|
||||
For a list of other tutorials and introductory articles.
|
||||
|
||||
Notes About These Examples
|
||||
--------------------------
|
||||
|
||||
* References to Jar Files Outside This Directory *
|
||||
### References to Jar Files Outside This Directory
|
||||
|
||||
Various example SolrHome dirs contained in this directory may use "<lib>"
|
||||
statements in the solrconfig.xml file to reference plugin jars outside of
|
||||
|
@ -68,7 +85,7 @@ clustering component, or any other modules in "contrib", you will need to
|
|||
copy the required jars or update the paths to those jars in your
|
||||
solrconfig.xml.
|
||||
|
||||
* Logging *
|
||||
### Logging
|
||||
|
||||
By default, Jetty & Solr will log to the console and logs/solr.log. This can
|
||||
be convenient when first getting started, but eventually you will want to
|
|
@ -41,7 +41,7 @@ task assemblePackaging(type: Sync) {
|
|||
include "exampledocs/**"
|
||||
include "files/**"
|
||||
include "films/**"
|
||||
include "README.txt"
|
||||
include "README.md"
|
||||
exclude "**/*.jar"
|
||||
})
|
||||
|
||||
|
|
|
@ -1,24 +1,28 @@
|
|||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
Solr DataImportHandler example configuration
|
||||
--------------------------------------------
|
||||
|
||||
To run this multi-core example, use the "-e" option of the bin/solr script:
|
||||
|
||||
```
|
||||
> bin/solr -e dih
|
||||
```
|
||||
|
||||
When Solr is started connect to:
|
||||
|
|
@ -22,28 +22,34 @@ PDFs, HTML, and many other supported types.
|
|||
|
||||
For further explanations, see the frequently asked questions at the end of the guide.
|
||||
|
||||
##GETTING STARTED
|
||||
## GETTING STARTED
|
||||
|
||||
* To start Solr, enter the following command (make sure you’ve cd’ed into the directory in which Solr was installed):
|
||||
|
||||
```
|
||||
bin/solr start
|
||||
```
|
||||
|
||||
* If you’ve started correctly, you should see the following output:
|
||||
|
||||
```
|
||||
Waiting to see Solr listening on port 8983 [/]
|
||||
Started Solr server on port 8983 (pid=<your pid>). Happy searching!
|
||||
<hr>
|
||||
```
|
||||
|
||||
##CREATING THE CORE/COLLECTION
|
||||
## CREATING THE CORE/COLLECTION
|
||||
|
||||
* Before you can index your documents, you’ll need to create a core/collection. Do this by entering:
|
||||
|
||||
```
|
||||
bin/solr create -c files -d example/files/conf
|
||||
```
|
||||
|
||||
* Now you’ve created a core called “files” using a configuration tuned for indexing and querying rich text files.
|
||||
|
||||
* You should see the following response:
|
||||
|
||||
```
|
||||
Creating new core 'files' using command:
|
||||
http://localhost:8983/solr/admin/cores?action=CREATE&name=files&instanceDir=files
|
||||
|
||||
|
@ -52,26 +58,31 @@ For further explanations, see the frequently asked questions at the end of the g
|
|||
"status":0,
|
||||
"QTime":239},
|
||||
"core":"files"}
|
||||
```
|
||||
|
||||
<hr>
|
||||
##INDEXING DOCUMENTS
|
||||
## INDEXING DOCUMENTS
|
||||
|
||||
* Return to your command shell. To post all of your documents to the documents core, enter the following:
|
||||
|
||||
```
|
||||
bin/post -c files ~/Documents
|
||||
```
|
||||
|
||||
* Depending on how many documents you have, this could take a while. Sit back and watch the magic happen. When all of your documents have been indexed you’ll see something like:
|
||||
|
||||
```
|
||||
<some number> files indexed.
|
||||
COMMITting Solr index changes to http://localhost:8983/solr/files/update...
|
||||
Time spent: <some amount of time>
|
||||
```
|
||||
|
||||
* To see a list of accepted file types, do:
|
||||
|
||||
```
|
||||
bin/post -h
|
||||
```
|
||||
|
||||
|
||||
<hr>
|
||||
##BROWSING DOCUMENTS
|
||||
## BROWSING DOCUMENTS
|
||||
|
||||
* Your document information can be viewed in multiple formats: XML, JSON, CSV, as well as a nice HTML interface.
|
||||
|
||||
|
@ -80,8 +91,7 @@ For further explanations, see the frequently asked questions at the end of the g
|
|||
* To view your document information in XML or other formats, add &wt (for writer type) to the end of that URL. i.e. To view your results in xml format direct your browser to:
|
||||
[http://localhost:8983/solr/files/browse?&wt=xml](http://localhost:8983/solr/files/browse?&wt=xml)
|
||||
|
||||
<hr>
|
||||
##ADMIN UI
|
||||
## ADMIN UI
|
||||
|
||||
* Another way to verify that your core has been created is to view it in the Admin User Interface.
|
||||
|
||||
|
@ -98,8 +108,7 @@ For further explanations, see the frequently asked questions at the end of the g
|
|||
* Now you’ve opened the core page. On this page there are a multitude of different tools you can use to analyze and search your core. You will make use of these features after indexing your documents.
|
||||
* Take note of the "Num Docs" field in your core Statistics. If after indexing your documents, it shows Num Docs to be 0, that means there was a problem indexing.
|
||||
|
||||
<hr>
|
||||
##QUERYING INDEX
|
||||
## QUERYING INDEX
|
||||
|
||||
* In the Admin UI, enter a term in the query box to see which documents contain the word.
|
||||
|
||||
|
@ -111,8 +120,8 @@ For further explanations, see the frequently asked questions at the end of the g
|
|||
* Another way to query the index is by manipulating the URL in your address bar once in the browse view.
|
||||
|
||||
* i.e. : [http://localhost:8983/solr/files/browse?q=Lucene](http://localhost:8983/solr/files/browse?q=Lucene)
|
||||
<hr>
|
||||
##FAQs
|
||||
|
||||
## FAQs
|
||||
|
||||
* Why use -d when creating a core?
|
||||
* -d specifies a specific configuration to use. This example has a configuration tuned for indexing and querying rich
|
||||
|
@ -120,16 +129,22 @@ For further explanations, see the frequently asked questions at the end of the g
|
|||
|
||||
* How do I delete a core?
|
||||
* To delete a core (i.e. files), you can enter the following in your command shell:
|
||||
|
||||
```
|
||||
bin/solr delete -c files
|
||||
```
|
||||
|
||||
* You should see the following output:
|
||||
|
||||
Deleting core 'files' using command:
|
||||
|
||||
```
|
||||
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=files&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
|
||||
|
||||
{"responseHeader":{
|
||||
"status":0,
|
||||
"QTime":19}}
|
||||
```
|
||||
|
||||
* This calls the Solr core admin handler, "UNLOAD", and the parameters "deleteDataDir" and "deleteInstanceDir" to ensure that all data associated with the core is also removed
|
||||
|
||||
|
@ -139,14 +154,14 @@ For further explanations, see the frequently asked questions at the end of the g
|
|||
re-create or patch a core/collection with an updated configuration)**, Solr can be started with a special system property
|
||||
set to the _absolute_ path to the conf/velocity directory, like this:
|
||||
|
||||
```
|
||||
bin/solr start -Dvelocity.template.base.dir=</full/path/to>/example/files/conf/velocity/
|
||||
```
|
||||
|
||||
If you want to adjust the browse templates for an existing collection, edit the core’s configuration
|
||||
under server/solr/files/conf/velocity.
|
||||
If you want to adjust the browse templates for an existing collection, edit the core’s configuration
|
||||
under server/solr/files/conf/velocity.
|
||||
|
||||
## Provenance of free images used in this example:
|
||||
|
||||
=======
|
||||
|
||||
* Provenance of free images used in this example:
|
||||
- Globe icon: visualpharm.com
|
||||
- Flag icons: freeflagicons.com
|
|
@ -12,13 +12,20 @@ This data consists of the following fields:
|
|||
|
||||
Steps:
|
||||
* Start Solr:
|
||||
```
|
||||
bin/solr start
|
||||
```
|
||||
|
||||
* Create a "films" core:
|
||||
|
||||
```
|
||||
bin/solr create -c films
|
||||
```
|
||||
|
||||
* Set the schema on a couple of fields that Solr would otherwise guess differently (than we'd like) about:
|
||||
curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
|
||||
|
||||
```
|
||||
curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
|
||||
"add-field" : {
|
||||
"name":"name",
|
||||
"type":"text_general",
|
||||
|
@ -30,19 +37,23 @@ curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:applicatio
|
|||
"type":"pdate",
|
||||
"stored":true
|
||||
}
|
||||
}'
|
||||
}'
|
||||
```
|
||||
|
||||
* Now let's index the data, using one of these three commands:
|
||||
|
||||
- JSON: bin/post -c films example/films/films.json
|
||||
- XML: bin/post -c films example/films/films.xml
|
||||
- CSV: bin/post \
|
||||
- JSON: `bin/post -c films example/films/films.json`
|
||||
- XML: `bin/post -c films example/films/films.xml`
|
||||
- CSV:
|
||||
```
|
||||
bin/post \
|
||||
-c films \
|
||||
example/films/films.csv \
|
||||
-params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
|
||||
|
||||
```
|
||||
* Let's get searching!
|
||||
- Search for 'Batman':
|
||||
|
||||
http://localhost:8983/solr/films/query?q=name:batman
|
||||
|
||||
* If you get an error about the name field not existing, you haven't yet indexed the data
|
||||
|
@ -51,26 +62,33 @@ curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:applicatio
|
|||
It's easiest to simply reset the environment and try again, ensuring that each step successfully executes.
|
||||
|
||||
- Show me all 'Super hero' movies:
|
||||
|
||||
http://localhost:8983/solr/films/query?q=*:*&fq=genre:%22Superhero%20movie%22
|
||||
|
||||
- Let's see the distribution of genres across all the movies. See the facet section of the response for the counts:
|
||||
|
||||
http://localhost:8983/solr/films/query?q=*:*&facet=true&facet.field=genre
|
||||
|
||||
- Browse the indexed films in a traditional browser search interface:
|
||||
|
||||
http://localhost:8983/solr/films/browse
|
||||
|
||||
Now browse including the genre field as a facet:
|
||||
|
||||
http://localhost:8983/solr/films/browse?facet.field=genre
|
||||
|
||||
If you want to set a facet for /browse to keep around for every request add the facet.field into the "facets"
|
||||
param set (which the /browse handler is already configured to use):
|
||||
curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{
|
||||
"update" : {
|
||||
|
||||
```
|
||||
curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{
|
||||
"update" : {
|
||||
"facets": {
|
||||
"facet.field":"genre"
|
||||
}
|
||||
}
|
||||
}'
|
||||
}'
|
||||
```
|
||||
|
||||
And now http://localhost:8983/solr/films/browse will display the _genre_ facet automatically.
|
||||
|
||||
|
@ -93,6 +111,7 @@ FAQ:
|
|||
|
||||
Is there an easy to copy/paste script to do all of the above?
|
||||
|
||||
```
|
||||
Here ya go << END_OF_SCRIPT
|
||||
|
||||
bin/solr stop
|
||||
|
@ -123,9 +142,11 @@ curl http://localhost:8983/solr/films/config/params -H 'Content-type:application
|
|||
}'
|
||||
|
||||
# END_OF_SCRIPT
|
||||
```
|
||||
|
||||
Additional fun -
|
||||
|
||||
```
|
||||
Add highlighting:
|
||||
curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{
|
||||
"set" : {
|
||||
|
@ -135,4 +156,6 @@ curl http://localhost:8983/solr/films/config/params -H 'Content-type:application
|
|||
}
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
try http://localhost:8983/solr/films/browse?q=batman now, and you'll see "batman" highlighted in the results
|
|
@ -72,7 +72,7 @@ task toDir(type: Sync) {
|
|||
include "CHANGES.txt"
|
||||
include "LICENSE.txt"
|
||||
include "NOTICE.txt"
|
||||
include "README.txt"
|
||||
include "README.md"
|
||||
})
|
||||
|
||||
from(project(":lucene").projectDir, {
|
||||
|
|
|
@ -1,17 +1,19 @@
|
|||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
Solr server
|
||||
------------
|
||||
|
@ -21,14 +23,17 @@ run Solr.
|
|||
|
||||
To run Solr:
|
||||
|
||||
```
|
||||
cd $SOLR_INSTALL
|
||||
bin/solr start
|
||||
```
|
||||
|
||||
where $SOLR_INSTALL is the location where you extracted the Solr installation bundle.
|
||||
|
||||
Server directory layout
|
||||
-----------------------
|
||||
|
||||
```
|
||||
server/contexts
|
||||
|
||||
This directory contains the Jetty Web application deployment descriptor for the Solr Web app.
|
||||
|
@ -75,18 +80,18 @@ server/solr/configsets
|
|||
server/solr-webapp
|
||||
|
||||
Contains files used by the Solr server; do not edit files in this directory (Solr is not a Java Web application).
|
||||
|
||||
```
|
||||
|
||||
Notes About Solr Examples
|
||||
--------------------------
|
||||
|
||||
* SolrHome *
|
||||
### SolrHome
|
||||
|
||||
By default, start.jar starts Solr in Jetty using the default Solr Home
|
||||
directory of "./solr/" (relative to the working directory of the servlet
|
||||
container).
|
||||
|
||||
* References to Jar Files Outside This Directory *
|
||||
### References to Jar Files Outside This Directory
|
||||
|
||||
Various example SolrHome dirs contained in this directory may use "<lib>"
|
||||
statements in the solrconfig.xml file to reference plugin jars outside of
|
||||
|
@ -98,7 +103,7 @@ clustering component, or any other modules in "contrib", you will need to
|
|||
copy the required jars or update the paths to those jars in your
|
||||
solrconfig.xml.
|
||||
|
||||
* Logging *
|
||||
### Logging
|
||||
|
||||
By default, Jetty & Solr will log to the console and logs/solr.log. This can
|
||||
be convenient when first getting started, but eventually you will want to
|
|
@ -100,7 +100,7 @@ task assemblePackaging(type: Sync) {
|
|||
include "resources/**"
|
||||
include "scripts/**"
|
||||
include "solr/**"
|
||||
include "README.txt"
|
||||
include "README.md"
|
||||
})
|
||||
|
||||
from(configurations.compileClasspath, {
|
||||
|
|
|
@ -1,18 +1,19 @@
|
|||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
<!--
|
||||
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
contributor license agreements. See the NOTICE file distributed with
|
||||
this work for additional information regarding copyright ownership.
|
||||
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
(the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
-->
|
||||
|
||||
Default Solr Home Directory
|
||||
=============================
|
||||
|
@ -26,7 +27,7 @@ Basic Directory Structure
|
|||
|
||||
The Solr Home directory typically contains the following...
|
||||
|
||||
* solr.xml *
|
||||
### solr.xml
|
||||
|
||||
This is the primary configuration file Solr looks for when starting;
|
||||
it specifies high-level configuration options that apply to all
|
||||
|
@ -44,13 +45,13 @@ for collection1 should be the same as the Solr Home Directory.
|
|||
For more information about solr.xml, please see:
|
||||
https://lucene.apache.org/solr/guide/solr-cores-and-solr-xml.html
|
||||
|
||||
* Individual SolrCore Instance Directories *
|
||||
### Individual SolrCore Instance Directories
|
||||
|
||||
Although solr.xml can be configured to look for SolrCore Instance Directories
|
||||
in any path, simple sub-directories of the Solr Home Dir using relative paths
|
||||
are common for many installations.
|
||||
|
||||
* Core Discovery *
|
||||
### Core Discovery
|
||||
|
||||
During startup, Solr will scan sub-directories of Solr home looking for
|
||||
a specific file named core.properties. If core.properties is found in a
|
||||
|
@ -60,15 +61,16 @@ defined in core.properties. For an example of core.properties, please see:
|
|||
example/solr/collection1/core.properties
|
||||
|
||||
For more information about core discovery, please see:
|
||||
|
||||
https://lucene.apache.org/solr/guide/defining-core-properties.html
|
||||
|
||||
* A Shared 'lib' Directory *
|
||||
### A Shared 'lib' Directory
|
||||
|
||||
Although solr.xml can be configured with an optional "sharedLib" attribute
|
||||
that can point to any path, it is common to use a "./lib" sub-directory of the
|
||||
Solr Home Directory.
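As a hedged illustration of that convention (a sketch only; the value is a placeholder resolved relative to the Solr Home directory):

```
<!-- solr.xml fragment; "lib" here is the ./lib sub-directory of the Solr Home dir -->
<solr>
  <str name="sharedLib">lib</str>
</solr>
```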
|
||||
|
||||
* ZooKeeper Files *
|
||||
### ZooKeeper Files
|
||||
|
||||
When using SolrCloud using the embedded ZooKeeper option for Solr, it is
|
||||
common to have a "zoo.cfg" file and "zoo_data" directories in the Solr Home
|
|
@ -30,6 +30,7 @@ It's nice in this context because change to the templates
|
|||
are immediately visible in browser on the next visit.
|
||||
|
||||
Links:
|
||||
|
||||
http://velocity.apache.org
|
||||
http://wiki.apache.org/velocity/
|
||||
http://velocity.apache.org/engine/releases/velocity-1.7/user-guide.html
|
||||
|
@ -39,14 +40,18 @@ File List
|
|||
---------
|
||||
|
||||
System and Misc:
|
||||
```
|
||||
VM_global_library.vm - Macros used by other templates,
|
||||
exact filename is important for Velocity to see it
|
||||
error.vm - shows errors, if any
|
||||
debug.vm - includes toggle links for "explain" and "all fields"
|
||||
activated by debug link in footer.vm
|
||||
README.txt - this file
|
||||
README.md - this file
|
||||
```
|
||||
|
||||
Overall Page Composition:
|
||||
|
||||
```
|
||||
browse.vm - Main entry point into templates
|
||||
layout.vm - overall HTML page layout
|
||||
head.vm - elements in the <head> section of the HTML document
|
||||
|
@ -55,22 +60,30 @@ Overall Page Composition:
|
|||
includes debug and help links
|
||||
main.css - CSS style for overall pages
|
||||
see also jquery.autocomplete.css
|
||||
```
|
||||
|
||||
Query Form and Options:
|
||||
|
||||
```
|
||||
query_form.vm - renders query form
|
||||
query_group.vm - group by fields
|
||||
e.g.: Manufacturer or Popularity
|
||||
query_spatial.vm - select box for location-based Geospatial search
|
||||
```
|
||||
|
||||
Spelling Suggestions:
|
||||
|
||||
```
|
||||
did_you_mean.vm - hyperlinked spelling suggestions in results
|
||||
suggest.vm - dynamic spelling suggestions
|
||||
as you type in the search form
|
||||
jquery.autocomplete.js - supporting files for dynamic suggestions
|
||||
jquery.autocomplete.css - Most CSS is defined in main.css
|
||||
|
||||
```
|
||||
|
||||
Search Results, General:
|
||||
|
||||
```
|
||||
(see also browse.vm)
|
||||
tabs.vm - provides navigation to advanced search options
|
||||
pagination_top.vm - paging and statistics at top of results
|
||||
|
@ -84,9 +97,10 @@ Search Results, General:
|
|||
richtext_doc.vm - display a complex/misc. document
|
||||
hit_plain.vm - basic display of all fields,
|
||||
edit results_list.vm to enable this
|
||||
|
||||
```
|
||||
|
||||
Search Results, Facets & Clusters:
|
||||
```
|
||||
facets.vm - calls the 4 facet and 1 cluster template
|
||||
facet_fields.vm - display facets based on field values
|
||||
e.g.: fields specified by &facet.field=
|
||||
|
@ -99,3 +113,4 @@ Search Results, Facets & Clusters:
|
|||
cluster.vm - if clustering is available
|
||||
then call cluster_results.vm
|
||||
cluster_results.vm - actual rendering of clusters
|
||||
```
|
|
@ -816,7 +816,7 @@ Note that for this filter to work properly, the upstream tokenizer must not remo
|
|||
|
||||
This filter is a custom Unicode normalization form that applies the foldings specified in http://www.unicode.org/reports/tr30/tr30-4.html[Unicode TR #30: Character Foldings] in addition to the `NFKC_Casefold` normalization form as described in <<ICU Normalizer 2 Filter>>. This filter is a better substitute for the combined behavior of the <<ASCII Folding Filter>>, <<Lower Case Filter>>, and <<ICU Normalizer 2 Filter>>.
|
||||
|
||||
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
|
||||
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
|
||||
|
||||
*Factory class:* `solr.ICUFoldingFilterFactory`
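A minimal sketch of wiring this factory into a field type; the field-type name and the choice of `solr.ICUTokenizerFactory` are illustrative, not prescribed by the text above:

```
<fieldType name="text_icu_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- any tokenizer can precede the folding filter; the ICU tokenizer is a common pairing -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```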
|
||||
|
||||
|
@ -924,7 +924,7 @@ This filter factory normalizes text according to one of five Unicode Normalizati
|
|||
|
||||
For detailed information about these normalization forms, see http://unicode.org/reports/tr15/[Unicode Normalization Forms].
|
||||
|
||||
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
|
||||
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
|
||||
|
||||
== ICU Transform Filter
|
||||
|
||||
|
@ -966,7 +966,7 @@ This filter applies http://userguide.icu-project.org/transforms/general[ICU Tran
|
|||
|
||||
For detailed information about ICU Transforms, see http://userguide.icu-project.org/transforms/general.
|
||||
|
||||
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
|
||||
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
|
||||
|
||||
== Keep Word Filter
|
||||
|
||||
|
|
|
@ -220,7 +220,7 @@ Unicode Collation in Solr is fast, because all the work is done at index time.
|
|||
|
||||
Rather than specifying an analyzer within `<fieldtype ... class="solr.TextField">`, the `solr.CollationField` and `solr.ICUCollationField` field type classes provide this functionality. `solr.ICUCollationField`, which is backed by http://site.icu-project.org[the ICU4J library], provides more flexible configuration, has more locales, is significantly faster, and requires less memory and less index space, since its keys are smaller than those produced by the JDK implementation that backs `solr.CollationField`.
|
||||
|
||||
To use `solr.ICUCollationField`, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
|
||||
To use `solr.ICUCollationField`, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
|
||||
|
||||
`solr.ICUCollationField` and `solr.CollationField` fields can be created in two ways:
|
||||
|
||||
|
@ -487,7 +487,7 @@ The `lucene/analysis/opennlp` module provides OpenNLP integration via several an
|
|||
|
||||
NOTE: The <<OpenNLP Tokenizer>> must be used with all other OpenNLP analysis components, for two reasons: first, the OpenNLP Tokenizer detects and marks the sentence boundaries required by all the OpenNLP filters; and second, since the pre-trained OpenNLP models used by these filters were trained using the corresponding language-specific sentence-detection/tokenization models, the same tokenization, using the same models, must be used at runtime for optimal performance.
|
||||
|
||||
To use the OpenNLP components, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
|
||||
To use the OpenNLP components, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
|
||||
|
||||
=== OpenNLP Tokenizer
|
||||
|
||||
|
@ -1033,7 +1033,7 @@ Solr can stem Catalan using the Snowball Porter Stemmer with an argument of `lan
|
|||
|
||||
=== Traditional Chinese
|
||||
|
||||
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
|
||||
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
|
||||
|
||||
<<tokenizers.adoc#standard-tokenizer,Standard Tokenizer>> can also be used to tokenize Traditional Chinese text. Following the Word Break rules from the Unicode Text Segmentation algorithm, it produces one token per Chinese character. When combined with <<CJK Bigram Filter>>, overlapping bigrams of Chinese characters are formed.
|
||||
|
||||
|
@ -1105,9 +1105,9 @@ See the example under <<Traditional Chinese>>.
|
|||
|
||||
=== Simplified Chinese
|
||||
|
||||
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <<HMM Chinese Tokenizer>>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
|
||||
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <<HMM Chinese Tokenizer>>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
|
||||
|
||||
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
|
||||
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
|
||||
|
||||
Also useful for Chinese analysis:
|
||||
|
||||
|
@ -1162,7 +1162,7 @@ Also useful for Chinese analysis:
|
|||
|
||||
=== HMM Chinese Tokenizer
|
||||
|
||||
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
|
||||
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
|
||||
|
||||
*Factory class:* `solr.HMMChineseTokenizerFactory`
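A hedged sketch of an analyzer built around this tokenizer; the field-type name and the extra width/lower-case filters are illustrative additions:

```
<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <!-- optional, commonly paired normalization steps -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```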
|
||||
|
||||
|
@ -1960,7 +1960,7 @@ Example:
|
|||
[[hebrew-lao-myanmar-khmer]]
|
||||
=== Hebrew, Lao, Myanmar, Khmer
|
||||
|
||||
Lucene provides support, in addition to UAX#29 word break rules, for Hebrew's use of the double and single quote characters, and for segmenting Lao, Myanmar, and Khmer into syllables with the `solr.ICUTokenizerFactory` in the `analysis-extras` contrib module. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt for` instructions on which jars you need to add.
|
||||
Lucene provides support, in addition to UAX#29 word break rules, for Hebrew's use of the double and single quote characters, and for segmenting Lao, Myanmar, and Khmer into syllables with the `solr.ICUTokenizerFactory` in the `analysis-extras` contrib module. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
|
||||
|
||||
See <<tokenizers.adoc#icu-tokenizer,the ICUTokenizer>> for more information.
|
||||
|
||||
|
@ -2167,7 +2167,7 @@ Solr includes support for normalizing Persian, and Lucene includes an example st
|
|||
|
||||
=== Polish
|
||||
|
||||
Solr provides support for Polish stemming with the `solr.StempelPolishStemFilterFactory`, and `solr.MorphologikFilterFactory` for lemmatization, in the `contrib/analysis-extras` module. The `solr.StempelPolishStemFilterFactory` component includes an algorithmic stemmer with tables for Polish. To use either of these filters, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
|
||||
Solr provides support for Polish stemming with the `solr.StempelPolishStemFilterFactory`, and `solr.MorphologikFilterFactory` for lemmatization, in the `contrib/analysis-extras` module. The `solr.StempelPolishStemFilterFactory` component includes an algorithmic stemmer with tables for Polish. To use either of these filters, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
|
||||
|
||||
*Factory class:* `solr.StempelPolishStemFilterFactory` and `solr.MorfologikFilterFactory`
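A brief sketch of the Stempel stemmer in an analyzer chain; the field-type name and tokenizer are illustrative, and Morfologik lemmatization would use `solr.MorfologikFilterFactory` instead:

```
<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StempelPolishStemFilterFactory"/>
  </analyzer>
</fieldType>
```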
|
||||
|
||||
|
@ -2684,7 +2684,7 @@ Solr includes support for stemming Turkish with the `solr.SnowballPorterFilterFa
|
|||
|
||||
=== Ukrainian
|
||||
|
||||
Solr provides support for Ukrainian lemmatization with the `solr.MorphologikFilterFactory`, in the `contrib/analysis-extras` module. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
|
||||
Solr provides support for Ukrainian lemmatization with the `solr.MorphologikFilterFactory`, in the `contrib/analysis-extras` module. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
|
||||
|
||||
Lucene also includes an example Ukrainian stopword list, in the `lucene-analyzers-morfologik` jar.
|
||||
|
||||
|
|
|
@ -916,7 +916,7 @@ You may get errors as it works through your documents. These might be caused by
|
|||
DataImportHandler::
|
||||
Solr includes a tool called the <<uploading-structured-data-store-data-with-the-data-import-handler.adoc#uploading-structured-data-store-data-with-the-data-import-handler,Data Import Handler (DIH)>> which can connect to databases (if you have a jdbc driver), mail servers, or other structured data sources. There are several examples included for feeds, GMail, and a small HSQL database.
|
||||
+
|
||||
The `README.txt` file in `example/example-DIH` will give you details on how to start working with this tool.
|
||||
The `README.md` file in `example/example-DIH` will give you details on how to start working with this tool.
|
||||
|
||||
SolrJ::
|
||||
SolrJ is a Java-based client for interacting with Solr. Use <<using-solrj.adoc#using-solrj,SolrJ>> for JVM-based languages or other <<client-apis.adoc#client-apis,Solr clients>> to programmatically create documents to send to Solr.
|
||||
|
|
|
@ -514,7 +514,7 @@ The default configuration for `solr.ICUTokenizerFactory` provides UAX#29 word br
|
|||
[IMPORTANT]
|
||||
====
|
||||
|
||||
To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
|
||||
To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
|
||||
|
||||
====
|
||||
|
||||
|
|
|
@ -111,7 +111,7 @@
|
|||
</fileset>
|
||||
<fileset dir=".">
|
||||
<include name="lib/*" />
|
||||
<include name="README.txt" />
|
||||
<include name="README.md" />
|
||||
</fileset>
|
||||
</copy>
|
||||
</target>
|
||||
|