SOLR-14429: Convert .txt files to properly formatted .md files (#1450)

Tomoko Uchida 2020-04-27 08:43:04 +09:00 committed by GitHub
parent ce18505e28
commit f03e6aac59
33 changed files with 339 additions and 220 deletions

@ -130,6 +130,7 @@ class RatTask extends DefaultTask {
List<String> srcExcludes = [ List<String> srcExcludes = [
"**/TODO", "**/TODO",
"**/*.txt", "**/*.txt",
"**/*.md",
"**/*.iml", "**/*.iml",
"build/**" "build/**"
] ]
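
The exclude list in the hunk above uses Ant-style glob patterns. As a rough illustration only, the Python sketch below approximates how such patterns select paths. `ant_glob_to_regex` and `is_excluded` are hypothetical helper names, not part of the build, and the translation is a simplification that ignores edge cases the real Ant/Gradle matcher handles.

```python
import re

# Patterns copied from the srcExcludes list in the hunk above.
SRC_EXCLUDES = ["**/TODO", "**/*.txt", "**/*.md", "**/*.iml", "build/**"]


def ant_glob_to_regex(pattern):
    """Approximate an Ant-style glob as a regex (simplified, assumption-laden)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def is_excluded(path, patterns=SRC_EXCLUDES):
    """Return True if a '/'-separated relative path matches any pattern."""
    return any(ant_glob_to_regex(p).match(path) for p in patterns)
```

Under these assumptions, adding `"**/*.md"` means converted files such as `solr/example/README.md` are skipped in the same way the old `.txt` versions were.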

@ -147,10 +147,12 @@ ant.fileScanner{
include(name: 'dev-tools/**/*.' + it) include(name: 'dev-tools/**/*.' + it)
include(name: '*.' + it) include(name: '*.' + it)
} }
// TODO: For now we don't scan txt files, so we // TODO: For now we don't scan txt / md files, so we
// check licenses in top-level folders separately: // check licenses in top-level folders separately:
include(name: '*.txt') include(name: '*.txt')
include(name: '*/*.txt') include(name: '*/*.txt')
include(name: '*.md')
include(name: '*/*.md')
// excludes: // excludes:
exclude(name: '**/build/**') exclude(name: '**/build/**')
exclude(name: '**/dist/**') exclude(name: '**/dist/**')

@ -62,6 +62,8 @@ Other Changes
* SOLR-14420: AuthenticationPlugin.authenticate accepts HttpServletRequest instead of ServletRequest. (Mike Drob) * SOLR-14420: AuthenticationPlugin.authenticate accepts HttpServletRequest instead of ServletRequest. (Mike Drob)
* SOLR-14429: Convert .txt files to properly formatted .md files. (Tomoko Uchida, Uwe Schindler)
================== 8.6.0 ================== ================== 8.6.0 ==================
Consult the LUCENE_CHANGES.txt file for additional, low level, changes in this release. Consult the LUCENE_CHANGES.txt file for additional, low level, changes in this release.

@ -1,18 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one or more <!--
# contributor license agreements. See the NOTICE file distributed with Licensed to the Apache Software Foundation (ASF) under one or more
# this work for additional information regarding copyright ownership. contributor license agreements. See the NOTICE file distributed with
# The ASF licenses this file to You under the Apache License, Version 2.0 this work for additional information regarding copyright ownership.
# (the "License"); you may not use this file except in compliance with The ASF licenses this file to You under the Apache License, Version 2.0
# the License. You may obtain a copy of the License at (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Welcome to the Apache Solr project! Welcome to the Apache Solr project!
----------------------------------- -----------------------------------
@ -22,7 +23,7 @@ from the Apache Lucene project.
For a complete description of the Solr project, team composition, source For a complete description of the Solr project, team composition, source
code repositories, and other details, please see the Solr web site at code repositories, and other details, please see the Solr web site at
http://lucene.apache.org/solr https://lucene.apache.org/solr
Getting Started Getting Started
@ -30,37 +31,51 @@ Getting Started
To start Solr for the first time after installation, simply do: To start Solr for the first time after installation, simply do:
```
bin/solr start bin/solr start
```
This will launch a standalone Solr server in the background of your shell, This will launch a standalone Solr server in the background of your shell,
listening on port 8983. Alternatively, you can launch Solr in "cloud" mode, listening on port 8983. Alternatively, you can launch Solr in "cloud" mode,
which allows you to scale out using sharding and replication. To launch Solr which allows you to scale out using sharding and replication. To launch Solr
in cloud mode, do: in cloud mode, do:
```
bin/solr start -cloud bin/solr start -cloud
```
To see all available options for starting Solr, please do: To see all available options for starting Solr, please do:
```
bin/solr start -help bin/solr start -help
```
After starting Solr, create either a core or collection depending on whether After starting Solr, create either a core or collection depending on whether
Solr is running in standalone (core) or SolrCloud mode (collection) by doing: Solr is running in standalone (core) or SolrCloud mode (collection) by doing:
```
bin/solr create -c <name> bin/solr create -c <name>
```
This will create a collection that uses a data-driven schema which tries to guess This will create a collection that uses a data-driven schema which tries to guess
the correct field type when you add documents to the index. To see all available the correct field type when you add documents to the index. To see all available
options for creating a new collection, execute: options for creating a new collection, execute:
```
bin/solr create -help bin/solr create -help
```
After starting Solr, direct your Web browser to the Solr Admin Console at: After starting Solr, direct your Web browser to the Solr Admin Console at:
```
http://localhost:8983/solr/ http://localhost:8983/solr/
```
When finished with your Solr installation, shut it down by executing: When finished with your Solr installation, shut it down by executing:
```
bin/solr stop -all bin/solr stop -all
```
The `-p PORT` option can also be used to identify the Solr instance to shutdown, The `-p PORT` option can also be used to identify the Solr instance to shutdown,
where more than one Solr is running on the machine. where more than one Solr is running on the machine.
@ -71,43 +86,55 @@ Solr Examples
Solr includes a few examples to help you get started. To run a specific example, do: Solr includes a few examples to help you get started. To run a specific example, do:
```
bin/solr -e <EXAMPLE> where <EXAMPLE> is one of: bin/solr -e <EXAMPLE> where <EXAMPLE> is one of:
cloud : SolrCloud example cloud : SolrCloud example
dih : Data Import Handler (rdbms, mail, atom, tika) dih : Data Import Handler (rdbms, mail, atom, tika)
schemaless : Schema-less example (schema is inferred from data during indexing) schemaless : Schema-less example (schema is inferred from data during indexing)
techproducts : Kitchen sink example providing comprehensive examples of Solr features techproducts : Kitchen sink example providing comprehensive examples of Solr features
```
For instance, if you want to run the Solr Data Import Handler example, do: For instance, if you want to run the Solr Data Import Handler example, do:
```
bin/solr -e dih bin/solr -e dih
```
Indexing Documents Indexing Documents
--------------- ---------------
To add documents to the index, use bin/post. For example: To add documents to the index, use bin/post. For example:
```
bin/post -c <collection_name> example/exampledocs/*.xml bin/post -c <collection_name> example/exampledocs/*.xml
```
For more information about Solr examples please read... For more information about Solr examples please read...
* example/README.txt * [example/README.md](example/README.md)
For more information about the "Solr Home" and Solr specific configuration
For more information about the "Solr Home" and Solr specific configuration
* https://lucene.apache.org/solr/guide/solr-tutorial.html * https://lucene.apache.org/solr/guide/solr-tutorial.html
For a Solr tutorial
* http://lucene.apache.org/solr/resources.html For a Solr tutorial
For a list of other tutorials and introductory articles.
* https://lucene.apache.org/solr/resources.html
For a list of other tutorials and introductory articles.
or linked from "docs/index.html" in a binary distribution. or linked from "docs/index.html" in a binary distribution.
Also, there are Solr clients for many programming languages, see Also, there are Solr clients for many programming languages, see
http://wiki.apache.org/solr/IntegratingSolr
* https://wiki.apache.org/solr/IntegratingSolr
Files included in an Apache Solr binary distribution Files included in an Apache Solr binary distribution
---------------------------------------------------- ----------------------------------------------------
```
server/ server/
A self-contained Solr instance, complete with a sample A self-contained Solr instance, complete with a sample
configuration and documents to index. Please see: bin/solr start -help configuration and documents to index. Please see: bin/solr start -help
@ -116,7 +143,7 @@ server/
example/ example/
Contains example documents and an alternative Solr home Contains example documents and an alternative Solr home
directory containing examples of how to use the Data Import Handler, directory containing examples of how to use the Data Import Handler,
see example/example-DIH/README.txt for more information. see example/example-DIH/README.md for more information.
dist/solr-<component>-XX.jar dist/solr-<component>-XX.jar
The Apache Solr libraries. To compile Apache Solr Plugins, The Apache Solr libraries. To compile Apache Solr Plugins,
@ -126,7 +153,7 @@ dist/solr-<component>-XX.jar
docs/index.html docs/index.html
A link to the online version of Apache Solr Javadoc API documentation and Tutorial A link to the online version of Apache Solr Javadoc API documentation and Tutorial
```
Instructions for Building Apache Solr from Source Instructions for Building Apache Solr from Source
------------------------------------------------- -------------------------------------------------
@ -151,14 +178,14 @@ Instructions for Building Apache Solr from Source
Alternately, you can obtain a copy of the latest Apache Solr source code Alternately, you can obtain a copy of the latest Apache Solr source code
directly from the GIT repository: directly from the GIT repository:
http://lucene.apache.org/solr/versioncontrol.html https://lucene.apache.org/solr/versioncontrol.html
4. Navigate to the "solr" folder and issue an "ant" command to see the available options 4. Navigate to the "solr" folder and issue an "ant" command to see the available options
for building, testing, and packaging Solr. for building, testing, and packaging Solr.
NOTE: NOTE:
To see Solr in action, you may want to use the "ant server" command to build To see Solr in action, you may want to use the "ant server" command to build
and package Solr into the server directory. See also server/README.txt. and package Solr into the server directory. See also server/README.md.
Export control Export control
@ -184,6 +211,7 @@ code and source code.
The following provides more details on the included cryptographic The following provides more details on the included cryptographic
software: software:
Apache Solr uses the Apache Tika which uses the Bouncy Castle generic encryption libraries for
extracting text content and metadata from encrypted PDF files. Apache Solr uses the Apache Tika which uses the Bouncy Castle generic encryption libraries for
See http://www.bouncycastle.org/ for more details on Bouncy Castle. extracting text content and metadata from encrypted PDF files.
See http://www.bouncycastle.org/ for more details on Bouncy Castle.

@ -57,7 +57,7 @@
<attribute name="Main-Class" value="org.apache.solr.util.SimplePostTool"/> <attribute name="Main-Class" value="org.apache.solr.util.SimplePostTool"/>
</manifest> </manifest>
</jar> </jar>
<echo>See ${common-solr.dir}/README.txt for how to run the Solr server.</echo> <echo>See ${common-solr.dir}/README.md for how to run the Solr server.</echo>
</target> </target>
<target name="run-example" depends="server" <target name="run-example" depends="server"
@ -207,8 +207,8 @@
</xslt> </xslt>
<markdown todir="${javadoc.dir}"> <markdown todir="${javadoc.dir}">
<fileset dir="site" includes="**/*.mdtext"/> <fileset dir="site" includes="**/*.md"/>
<globmapper from="*.mdtext" to="*.html"/> <globmapper from="*.md" to="*.html"/>
</markdown> </markdown>
<copy todir="${javadoc.dir}"> <copy todir="${javadoc.dir}">
@ -530,8 +530,8 @@
fullpath="${fullnamever}/LUCENE_CHANGES.txt" /> fullpath="${fullnamever}/LUCENE_CHANGES.txt" />
<tarfileset dir="." <tarfileset dir="."
prefix="${fullnamever}" prefix="${fullnamever}"
includes="LICENSE.txt NOTICE.txt CHANGES.txt README.txt SYSTEM_REQUIREMENTS.txt includes="LICENSE.txt NOTICE.txt CHANGES.txt README.md site/SYSTEM_REQUIREMENTS.md
bin/** server/** example/** contrib/**/lib/** contrib/**/conf/** contrib/**/README.txt bin/** server/** example/** contrib/**/lib/** contrib/**/conf/** contrib/**/README.md
licenses/**" licenses/**"
excludes="licenses/README.committers.txt **/data/ **/logs/* excludes="licenses/README.committers.txt **/data/ **/logs/*
**/classes/ **/*.sh **/ivy.xml **/build.xml **/classes/ **/*.sh **/ivy.xml **/build.xml

@ -0,0 +1,26 @@
Apache Solr - Analysis Extras
=============================
The analysis-extras plugin provides additional analyzers that rely
upon large dependencies/dictionaries.
It includes integration with ICU for multilingual support,
analyzers for Chinese and Polish, and integration with
OpenNLP for multilingual tokenization, part-of-speech tagging
lemmatization, phrase chunking, and named-entity recognition.
Each of the jars below relies upon including `/dist/solr-analysis-extras-X.Y.jar`
in the `solrconfig.xml`
* ICU relies upon `lucene-libs/lucene-analyzers-icu-X.Y.jar`
and `lib/icu4j-X.Y.jar`
* Smartcn relies upon `lucene-libs/lucene-analyzers-smartcn-X.Y.jar`
* Stempel relies on `lucene-libs/lucene-analyzers-stempel-X.Y.jar`
* Morfologik relies on `lucene-libs/lucene-analyzers-morfologik-X.Y.jar`
and `lib/morfologik-*.jar`
* OpenNLP relies on `lucene-libs/lucene-analyzers-opennlp-X.Y.jar`
and `lib/opennlp-*.jar`

@ -1,23 +0,0 @@
The analysis-extras plugin provides additional analyzers that rely
upon large dependencies/dictionaries.
It includes integration with ICU for multilingual support,
analyzers for Chinese and Polish, and integration with
OpenNLP for multilingual tokenization, part-of-speech tagging
lemmatization, phrase chunking, and named-entity recognition.
Each of the jars below relies upon including /dist/solr-analysis-extras-X.Y.jar
in the solrconfig.xml
ICU relies upon lucene-libs/lucene-analyzers-icu-X.Y.jar
and lib/icu4j-X.Y.jar
Smartcn relies upon lucene-libs/lucene-analyzers-smartcn-X.Y.jar
Stempel relies on lucene-libs/lucene-analyzers-stempel-X.Y.jar
Morfologik relies on lucene-libs/lucene-analyzers-morfologik-X.Y.jar
and lib/morfologik-*.jar
OpenNLP relies on lucene-libs/lucene-analyzers-opennlp-X.Y.jar
and lib/opennlp-*.jar

@ -1,4 +1,5 @@
Apache Solr - DataImportHandler Apache Solr - DataImportHandler
================================
Introduction Introduction
------------ ------------

@ -1,4 +1,5 @@
Apache Solr Content Extraction Library (Solr Cell) Apache Solr Content Extraction Library (Solr Cell)
==================================================
Introduction Introduction
------------ ------------

@ -18,6 +18,7 @@ Note that all library of solr-jaegertracer must be included in the classpath of
``` ```
List of parameters for JaegerTracerConfigurator include: List of parameters for JaegerTracerConfigurator include:
|Parameter|Type|Required|Default|Description| |Parameter|Type|Required|Default|Description|
|---------|----|--------|-------|-----------| |---------|----|--------|-------|-----------|
|agentHost|string|Yes||The host of Jaeger backend| |agentHost|string|Yes||The host of Jaeger backend|

@ -1,5 +1,5 @@
Apache Solr Language Identifier Apache Solr Language Identifier
===============================
Introduction Introduction
------------ ------------

@ -13,7 +13,7 @@ For information on how to get started with solr ltr please see:
# Getting Started With Solr # Getting Started With Solr
For information on how to get started with solr please see: For information on how to get started with solr please see:
* [solr/README.txt](../../README.txt) * [solr/README.md](../../README.md)
* [Solr Tutorial](https://lucene.apache.org/solr/guide/solr-tutorial.html) * [Solr Tutorial](https://lucene.apache.org/solr/guide/solr-tutorial.html)
# How To Contribute # How To Contribute

@ -1 +0,0 @@
README.md

@ -11,7 +11,7 @@ For information on how to get started with solr-exporter please see:
# Getting Started With Solr # Getting Started With Solr
For information on how to get started with solr please see: For information on how to get started with solr please see:
* [solr/README.txt](../../README.txt) * [solr/README.md](../../README.md)
* [Solr Tutorial](https://lucene.apache.org/solr/guide/solr-tutorial.html) * [Solr Tutorial](https://lucene.apache.org/solr/guide/solr-tutorial.html)
# How To Contribute # How To Contribute

@ -1,17 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one or more <!--
# contributor license agreements. See the NOTICE file distributed with Licensed to the Apache Software Foundation (ASF) under one or more
# this work for additional information regarding copyright ownership. contributor license agreements. See the NOTICE file distributed with
# The ASF licenses this file to You under the Apache License, Version 2.0 this work for additional information regarding copyright ownership.
# (the "License"); you may not use this file except in compliance with The ASF licenses this file to You under the Apache License, Version 2.0
# the License. You may obtain a copy of the License at (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, Unless required by applicable law or agreed to in writing, software
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# limitations under the License. See the License for the specific language governing permissions and
limitations under the License.
-->
Solr example Solr example
------------ ------------
@ -19,44 +21,59 @@ Solr example
This directory contains Solr examples. Each example is contained in a This directory contains Solr examples. Each example is contained in a
separate directory. To run a specific example, do: separate directory. To run a specific example, do:
```
bin/solr -e <EXAMPLE> where <EXAMPLE> is one of: bin/solr -e <EXAMPLE> where <EXAMPLE> is one of:
cloud : SolrCloud example cloud : SolrCloud example
dih : Data Import Handler (rdbms, mail, atom, tika) dih : Data Import Handler (rdbms, mail, atom, tika)
schemaless : Schema-less example (schema is inferred from data during indexing) schemaless : Schema-less example (schema is inferred from data during indexing)
techproducts : Kitchen sink example providing comprehensive examples of Solr features techproducts : Kitchen sink example providing comprehensive examples of Solr features
```
For instance, if you want to run the Solr Data Import Handler example, do: For instance, if you want to run the Solr Data Import Handler example, do:
```
bin/solr -e dih bin/solr -e dih
```
To see all the options available when starting Solr: To see all the options available when starting Solr:
```
bin/solr start -help bin/solr start -help
```
After starting a Solr example, direct your Web browser to: After starting a Solr example, direct your Web browser to:
```
http://localhost:8983/solr/ http://localhost:8983/solr/
```
To add documents to the index, use bin/post, for example: To add documents to the index, use bin/post, for example:
```
bin/post -c techproducts example/exampledocs/*.xml bin/post -c techproducts example/exampledocs/*.xml
```
(where "techproducts" is the Solr core name) (where "techproducts" is the Solr core name)
For more information about this example please read... For more information about this example please read...
* example/solr/README.txt * [solr/example/README.md](./README.md)
For more information about the "Solr Home" and Solr specific configuration
* https://lucene.apache.org/solr/guide/solr-tutorial.html For more information about the "Solr Home" and Solr specific configuration
For a Solr tutorial
* http://wiki.apache.org/solr/SolrResources * https://lucene.apache.org/solr/guide/solr-tutorial.html
For a list of other tutorials and introductory articles.
For a Solr tutorial
* https://wiki.apache.org/solr/SolrResources
For a list of other tutorials and introductory articles.
Notes About These Examples Notes About These Examples
-------------------------- --------------------------
* References to Jar Files Outside This Directory * ### References to Jar Files Outside This Directory
Various example SolrHome dirs contained in this directory may use "<lib>" Various example SolrHome dirs contained in this directory may use "<lib>"
statements in the solrconfig.xml file to reference plugin jars outside of statements in the solrconfig.xml file to reference plugin jars outside of
@ -68,7 +85,7 @@ clustering component, or any other modules in "contrib", you will need to
copy the required jars or update the paths to those jars in your copy the required jars or update the paths to those jars in your
solrconfig.xml. solrconfig.xml.
* Logging * ### Logging
By default, Jetty & Solr will log to the console and logs/solr.log. This can By default, Jetty & Solr will log to the console and logs/solr.log. This can
be convenient when first getting started, but eventually you will want to be convenient when first getting started, but eventually you will want to

@ -41,7 +41,7 @@ task assemblePackaging(type: Sync) {
include "exampledocs/**" include "exampledocs/**"
include "files/**" include "files/**"
include "films/**" include "films/**"
include "README.txt" include "README.md"
exclude "**/*.jar" exclude "**/*.jar"
}) })

@ -1,24 +1,28 @@
# Licensed to the Apache Software Foundation (ASF) under one or more <!--
# contributor license agreements. See the NOTICE file distributed with Licensed to the Apache Software Foundation (ASF) under one or more
# this work for additional information regarding copyright ownership. contributor license agreements. See the NOTICE file distributed with
# The ASF licenses this file to You under the Apache License, Version 2.0 this work for additional information regarding copyright ownership.
# (the "License"); you may not use this file except in compliance with The ASF licenses this file to You under the Apache License, Version 2.0
# the License. You may obtain a copy of the License at (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, Unless required by applicable law or agreed to in writing, software
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# limitations under the License. See the License for the specific language governing permissions and
limitations under the License.
-->
Solr DataImportHandler example configuration Solr DataImportHandler example configuration
-------------------------------------------- --------------------------------------------
To run this multi-core example, use the "-e" option of the bin/solr script: To run this multi-core example, use the "-e" option of the bin/solr script:
```
> bin/solr -e dih > bin/solr -e dih
```
When Solr is started connect to: When Solr is started connect to:
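
The header change in this diff, repeated in the other README diffs in this commit, replaces a `#`-prefixed license block with an HTML comment so that Markdown renderers do not treat each line as a heading. The Python sketch below shows one way such a conversion could be scripted. `convert_header` is a hypothetical helper, and it assumes the file really does start with a `#`-commented license block; a file that opens with a genuine `# Heading` would be misdetected.

```python
def convert_header(text):
    """Wrap a leading block of '#'-prefixed lines in an HTML comment.

    Assumption: the leading '#' lines are a license header, not Markdown headings.
    """
    lines = text.split("\n")
    n = 0
    while n < len(lines) and lines[n].startswith("#"):
        n += 1
    if n == 0:
        return text  # no leading '#' block, leave the file unchanged
    # Drop only the '#' marker; indentation after it is preserved.
    body = [line[1:].rstrip() for line in lines[:n]]
    return "\n".join(["<!--", *body, "-->", *lines[n:]])
```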

@ -22,28 +22,34 @@ PDFs, HTML, and many other supported types.
For further explanations, see the frequently asked questions at the end of the guide. For further explanations, see the frequently asked questions at the end of the guide.
##GETTING STARTED ## GETTING STARTED
* To start Solr, enter the following command (make sure you've cd'ed into the directory in which Solr was installed): * To start Solr, enter the following command (make sure you've cd'ed into the directory in which Solr was installed):
```
bin/solr start bin/solr start
```
* If you've started correctly, you should see the following output: * If you've started correctly, you should see the following output:
```
Waiting to see Solr listening on port 8983 [/] Waiting to see Solr listening on port 8983 [/]
Started Solr server on port 8983 (pid=<your pid>). Happy searching! Started Solr server on port 8983 (pid=<your pid>). Happy searching!
<hr> ```
##CREATING THE CORE/COLLECTION ## CREATING THE CORE/COLLECTION
* Before you can index your documents, you'll need to create a core/collection. Do this by entering: * Before you can index your documents, you'll need to create a core/collection. Do this by entering:
```
bin/solr create -c files -d example/files/conf bin/solr create -c files -d example/files/conf
```
* Now you've created a core called “files” using a configuration tuned for indexing and querying rich text files. * Now you've created a core called “files” using a configuration tuned for indexing and querying rich text files.
* You should see the following response: * You should see the following response:
```
Creating new core 'files' using command: Creating new core 'files' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=files&instanceDir=files http://localhost:8983/solr/admin/cores?action=CREATE&name=files&instanceDir=files
@ -52,26 +58,31 @@ For further explanations, see the frequently asked questions at the end of the g
"status":0, "status":0,
"QTime":239}, "QTime":239},
"core":"files"} "core":"files"}
```
<hr> ## INDEXING DOCUMENTS
##INDEXING DOCUMENTS
* Return to your command shell. To post all of your documents to the documents core, enter the following: * Return to your command shell. To post all of your documents to the documents core, enter the following:
```
bin/post -c files ~/Documents bin/post -c files ~/Documents
```
* Depending on how many documents you have, this could take a while. Sit back and watch the magic happen. When all of your documents have been indexed you'll see something like: * Depending on how many documents you have, this could take a while. Sit back and watch the magic happen. When all of your documents have been indexed you'll see something like:
```
<some number> files indexed. <some number> files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/files/update... COMMITting Solr index changes to http://localhost:8983/solr/files/update...
Time spent: <some amount of time> Time spent: <some amount of time>
```
* To see a list of accepted file types, do: * To see a list of accepted file types, do:
```
bin/post -h bin/post -h
```
## BROWSING DOCUMENTS
<hr>
##BROWSING DOCUMENTS
* Your document information can be viewed in multiple formats: XML, JSON, CSV, as well as a nice HTML interface. * Your document information can be viewed in multiple formats: XML, JSON, CSV, as well as a nice HTML interface.
@ -80,8 +91,7 @@ For further explanations, see the frequently asked questions at the end of the g
* To view your document information in XML or other formats, add &wt (for writer type) to the end of that URL. i.e. To view your results in xml format direct your browser to: * To view your document information in XML or other formats, add &wt (for writer type) to the end of that URL. i.e. To view your results in xml format direct your browser to:
[http://localhost:8983/solr/files/browse?&wt=xml](http://localhost:8983/solr/files/browse?&wt=xml) [http://localhost:8983/solr/files/browse?&wt=xml](http://localhost:8983/solr/files/browse?&wt=xml)
<hr> ## ADMIN UI
##ADMIN UI
* Another way to verify that your core has been created is to view it in the Admin User Interface. * Another way to verify that your core has been created is to view it in the Admin User Interface.
@ -98,8 +108,7 @@ For further explanations, see the frequently asked questions at the end of the g
* Now you've opened the core page. On this page there are a multitude of different tools you can use to analyze and search your core. You will make use of these features after indexing your documents. * Now you've opened the core page. On this page there are a multitude of different tools you can use to analyze and search your core. You will make use of these features after indexing your documents.
* Take note of the "Num Docs" field in your core Statistics. If after indexing your documents, it shows Num Docs to be 0, that means there was a problem indexing. * Take note of the "Num Docs" field in your core Statistics. If after indexing your documents, it shows Num Docs to be 0, that means there was a problem indexing.
<hr> ## QUERYING INDEX
##QUERYING INDEX
* In the Admin UI, enter a term in the query box to see which documents contain the word. * In the Admin UI, enter a term in the query box to see which documents contain the word.
@ -111,42 +120,48 @@ For further explanations, see the frequently asked questions at the end of the g
* Another way to query the index is by manipulating the URL in your address bar once in the browse view. * Another way to query the index is by manipulating the URL in your address bar once in the browse view.
* i.e. : [http://localhost:8983/solr/files/browse?q=Lucene](http://localhost:8983/solr/files/browse?q=Lucene) * i.e. : [http://localhost:8983/solr/files/browse?q=Lucene](http://localhost:8983/solr/files/browse?q=Lucene)
<hr>
##FAQs ## FAQs
* Why use -d when creating a core? * Why use -d when creating a core?
* -d specifies a specific configuration to use. This example as a configuration tuned for indexing and query rich * -d specifies a specific configuration to use. This example as a configuration tuned for indexing and query rich
text files. text files.
* How do I delete a core? * How do I delete a core?
* To delete a core (i.e. files), you can enter the following in your command shell: * To delete a core (i.e. files), you can enter the following in your command shell:
bin/solr delete -c files
* You should see the following output: ```
bin/solr delete -c files
```
Deleting core 'files' using command: * You should see the following output:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=files&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
{"responseHeader":{ Deleting core 'files' using command:
"status":0,
"QTime":19}}
* This calls the Solr core admin handler, "UNLOAD", and the parameters "deleteDataDir" and "deleteInstanceDir" to ensure that all data associated with core is also removed ```
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=files&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
{"responseHeader":{
"status":0,
"QTime":19}}
```
* This calls the Solr core admin handler, "UNLOAD", and the parameters "deleteDataDir" and "deleteInstanceDir" to ensure that all data associated with core is also removed
* How can I change the /browse UI? * How can I change the /browse UI?
The primary templates are under example/files/conf/velocity. **In order to edit those files in place (without having to The primary templates are under example/files/conf/velocity. **In order to edit those files in place (without having to
re-create or patch a core/collection with an updated configuration)**, Solr can be started with a special system property re-create or patch a core/collection with an updated configuration)**, Solr can be started with a special system property
set to the _absolute_ path to the conf/velocity directory, like this: set to the _absolute_ path to the conf/velocity directory, like this:
bin/solr start -Dvelocity.template.base.dir=</full/path/to>/example/files/conf/velocity/ ```
bin/solr start -Dvelocity.template.base.dir=</full/path/to>/example/files/conf/velocity/
```
If you want to adjust the browse templates for an existing collection, edit the cores configuration If you want to adjust the browse templates for an existing collection, edit the cores configuration
under server/solr/files/conf/velocity. under server/solr/files/conf/velocity.
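
The core deletion FAQ above shows the core admin URL that `bin/solr delete` reports. As a hedged sketch, the snippet below assembles the same URL from its parameters so that the role of each flag is visible. The parameter names are taken from the output shown above; the function name `unload_url` is an assumption made for illustration, and the snippet only builds the URL without contacting Solr.

```python
from urllib.parse import urlencode

# Core admin endpoint as it appears in the command output above.
ADMIN = "http://localhost:8983/solr/admin/cores"

def unload_url(core):
    """Build the UNLOAD request that also removes index, data dir, and instance dir."""
    params = {
        "action": "UNLOAD",
        "core": core,
        "deleteIndex": "true",        # drop the index files
        "deleteDataDir": "true",      # drop the core's data directory
        "deleteInstanceDir": "true",  # drop the core's instance directory
    }
    return ADMIN + "?" + urlencode(params)

print(unload_url("files"))
```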
## Provenance of free images used in this example:
=======
* Provenance of free images used in this example:
- Globe icon: visualpharm.com - Globe icon: visualpharm.com
- Flag icons: freeflagicons.com - Flag icons: freeflagicons.com


@ -12,37 +12,48 @@ This data consists of the following fields:
Steps: Steps:
* Start Solr: * Start Solr:
bin/solr start ```
bin/solr start
```
* Create a "films" core: * Create a "films" core:
bin/solr create -c films
```
bin/solr create -c films
```
* Set the schema on a couple of fields that Solr would otherwise guess differently (than we'd like) about: * Set the schema on a couple of fields that Solr would otherwise guess differently (than we'd like) about:
curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
"add-field" : { ```
"name":"name", curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
"type":"text_general", "add-field" : {
"multiValued":false, "name":"name",
"stored":true "type":"text_general",
}, "multiValued":false,
"add-field" : { "stored":true
"name":"initial_release_date", },
"type":"pdate", "add-field" : {
"stored":true "name":"initial_release_date",
} "type":"pdate",
}' "stored":true
}
}'
```
* Now let's index the data, using one of these three commands: * Now let's index the data, using one of these three commands:
- JSON: bin/post -c films example/films/films.json - JSON: `bin/post -c films example/films/films.json`
- XML: bin/post -c films example/films/films.xml - XML: `bin/post -c films example/films/films.xml`
- CSV: bin/post \ - CSV:
```
bin/post \
-c films \ -c films \
example/films/films.csv \ example/films/films.csv \
-params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|" -params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
```
* Let's get searching! * Let's get searching!
- Search for 'Batman': - Search for 'Batman':
http://localhost:8983/solr/films/query?q=name:batman http://localhost:8983/solr/films/query?q=name:batman
* If you get an error about the name field not existing, you haven't yet indexed the data * If you get an error about the name field not existing, you haven't yet indexed the data
@ -51,26 +62,33 @@ curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:applicatio
It's easiest to simply reset the environment and try again, ensuring that each step successfully executes. It's easiest to simply reset the environment and try again, ensuring that each step successfully executes.
- Show me all 'Super hero' movies: - Show me all 'Super hero' movies:
http://localhost:8983/solr/films/query?q=*:*&fq=genre:%22Superhero%20movie%22 http://localhost:8983/solr/films/query?q=*:*&fq=genre:%22Superhero%20movie%22
- Let's see the distribution of genres across all the movies. See the facet section of the response for the counts: - Let's see the distribution of genres across all the movies. See the facet section of the response for the counts:
http://localhost:8983/solr/films/query?q=*:*&facet=true&facet.field=genre http://localhost:8983/solr/films/query?q=*:*&facet=true&facet.field=genre
- Browse the indexed films in a traditional browser search interface: - Browse the indexed films in a traditional browser search interface:
http://localhost:8983/solr/films/browse http://localhost:8983/solr/films/browse
Now browse including the genre field as a facet: Now browse including the genre field as a facet:
http://localhost:8983/solr/films/browse?facet.field=genre http://localhost:8983/solr/films/browse?facet.field=genre
If you want to set a facet for /browse to keep around for every request add the facet.field into the "facets" If you want to set a facet for /browse to keep around for every request add the facet.field into the "facets"
param set (which the /browse handler is already configured to use): param set (which the /browse handler is already configured to use):
curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{
"update" : { ```
"facets": { curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{
"facet.field":"genre" "update" : {
} "facets": {
} "facet.field":"genre"
}' }
}
}'
```
And now http://localhost:8983/solr/films/browse will display the _genre_ facet automatically. And now http://localhost:8983/solr/films/browse will display the _genre_ facet automatically.
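
The filter and facet queries in the steps above can also be composed programmatically. This is a minimal sketch, assuming the `films` core and the `/query` handler from the URLs above; `films_query_url` is a hypothetical helper. Note that `urlencode` percent-encodes reserved characters, so the printed URLs look different from the hand-written ones but decode to the same parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Query endpoint of the "films" core used throughout this example.
BASE = "http://localhost:8983/solr/films/query"

def films_query_url(params):
    """Return a query URL with every parameter URL-encoded."""
    return BASE + "?" + urlencode(params)

# All documents, filtered to the "Superhero movie" genre.
superhero = films_query_url({"q": "*:*", "fq": 'genre:"Superhero movie"'})

# All documents, with facet counts over the genre field.
genre_facets = films_query_url({"q": "*:*", "facet": "true", "facet.field": "genre"})

print(superhero)
print(genre_facets)
```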
@ -93,6 +111,7 @@ FAQ:
Is there an easy to copy/paste script to do all of the above? Is there an easy to copy/paste script to do all of the above?
```
Here ya go << END_OF_SCRIPT Here ya go << END_OF_SCRIPT
bin/solr stop bin/solr stop
@ -123,9 +142,11 @@ curl http://localhost:8983/solr/films/config/params -H 'Content-type:application
}' }'
# END_OF_SCRIPT # END_OF_SCRIPT
```
Additional fun - Additional fun -
```
Add highlighting: Add highlighting:
curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{ curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{
"set" : { "set" : {
@ -135,4 +156,6 @@ curl http://localhost:8983/solr/films/config/params -H 'Content-type:application
} }
} }
}' }'
```
try http://localhost:8983/solr/films/browse?q=batman now, and you'll see "batman" highlighted in the results try http://localhost:8983/solr/films/browse?q=batman now, and you'll see "batman" highlighted in the results


@ -72,7 +72,7 @@ task toDir(type: Sync) {
include "CHANGES.txt" include "CHANGES.txt"
include "LICENSE.txt" include "LICENSE.txt"
include "NOTICE.txt" include "NOTICE.txt"
include "README.txt" include "README.md"
}) })
from(project(":lucene").projectDir, { from(project(":lucene").projectDir, {


@ -1,17 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one or more <!--
# contributor license agreements. See the NOTICE file distributed with Licensed to the Apache Software Foundation (ASF) under one or more
# this work for additional information regarding copyright ownership. contributor license agreements. See the NOTICE file distributed with
# The ASF licenses this file to You under the Apache License, Version 2.0 this work for additional information regarding copyright ownership.
# (the "License"); you may not use this file except in compliance with The ASF licenses this file to You under the Apache License, Version 2.0
# the License. You may obtain a copy of the License at (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, Unless required by applicable law or agreed to in writing, software
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# limitations under the License. See the License for the specific language governing permissions and
limitations under the License.
-->
Solr server Solr server
------------ ------------
@ -21,14 +23,17 @@ run Solr.
To run Solr: To run Solr:
```
cd $SOLR_INSTALL cd $SOLR_INSTALL
bin/solr start bin/solr start
```
where $SOLR_INSTALL is the location where you extracted the Solr installation bundle. where $SOLR_INSTALL is the location where you extracted the Solr installation bundle.
Server directory layout Server directory layout
----------------------- -----------------------
```
server/contexts server/contexts
This directory contains the Jetty Web application deployment descriptor for the Solr Web app. This directory contains the Jetty Web application deployment descriptor for the Solr Web app.
@ -75,18 +80,18 @@ server/solr/configsets
server/solr-webapp server/solr-webapp
Contains files used by the Solr server; do not edit files in this directory (Solr is not a Java Web application). Contains files used by the Solr server; do not edit files in this directory (Solr is not a Java Web application).
```
Notes About Solr Examples Notes About Solr Examples
-------------------------- --------------------------
* SolrHome * ### SolrHome
By default, start.jar starts Solr in Jetty using the default Solr Home By default, start.jar starts Solr in Jetty using the default Solr Home
directory of "./solr/" (relative to the working directory of the servlet directory of "./solr/" (relative to the working directory of the servlet
container). container).
* References to Jar Files Outside This Directory * ### References to Jar Files Outside This Directory
Various example SolrHome dirs contained in this directory may use "<lib>" Various example SolrHome dirs contained in this directory may use "<lib>"
statements in the solrconfig.xml file to reference plugin jars outside of statements in the solrconfig.xml file to reference plugin jars outside of
@ -98,7 +103,7 @@ clustering component, or any other modules in "contrib", you will need to
copy the required jars or update the paths to those jars in your copy the required jars or update the paths to those jars in your
solrconfig.xml. solrconfig.xml.
* Logging * ### Logging
By default, Jetty & Solr will log to the console and logs/solr.log. This can By default, Jetty & Solr will log to the console and logs/solr.log. This can
be convenient when first getting started, but eventually you will want to be convenient when first getting started, but eventually you will want to


@ -100,7 +100,7 @@ task assemblePackaging(type: Sync) {
include "resources/**" include "resources/**"
include "scripts/**" include "scripts/**"
include "solr/**" include "solr/**"
include "README.txt" include "README.md"
}) })
from(configurations.compileClasspath, { from(configurations.compileClasspath, {


@ -1,18 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one or more <!--
# contributor license agreements. See the NOTICE file distributed with Licensed to the Apache Software Foundation (ASF) under one or more
# this work for additional information regarding copyright ownership. contributor license agreements. See the NOTICE file distributed with
# The ASF licenses this file to You under the Apache License, Version 2.0 this work for additional information regarding copyright ownership.
# (the "License"); you may not use this file except in compliance with The ASF licenses this file to You under the Apache License, Version 2.0
# the License. You may obtain a copy of the License at (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Default Solr Home Directory Default Solr Home Directory
============================= =============================
@ -26,7 +27,7 @@ Basic Directory Structure
The Solr Home directory typically contains the following... The Solr Home directory typically contains the following...
* solr.xml * ### solr.xml
This is the primary configuration file Solr looks for when starting; This is the primary configuration file Solr looks for when starting;
it specifies high-level configuration options that apply to all it specifies high-level configuration options that apply to all
@ -44,13 +45,13 @@ for collection1 should be the same as the Solr Home Directory.
For more information about solr.xml, please see: For more information about solr.xml, please see:
https://lucene.apache.org/solr/guide/solr-cores-and-solr-xml.html https://lucene.apache.org/solr/guide/solr-cores-and-solr-xml.html
* Individual SolrCore Instance Directories * ### Individual SolrCore Instance Directories
Although solr.xml can be configured to look for SolrCore Instance Directories Although solr.xml can be configured to look for SolrCore Instance Directories
in any path, simple sub-directories of the Solr Home Dir using relative paths in any path, simple sub-directories of the Solr Home Dir using relative paths
are common for many installations. are common for many installations.
* Core Discovery * ### Core Discovery
During startup, Solr will scan sub-directories of Solr home looking for During startup, Solr will scan sub-directories of Solr home looking for
a specific file named core.properties. If core.properties is found in a a specific file named core.properties. If core.properties is found in a
@ -60,15 +61,16 @@ defined in core.properties. For an example of core.properties, please see:
example/solr/collection1/core.properties example/solr/collection1/core.properties
For more information about core discovery, please see: For more information about core discovery, please see:
https://lucene.apache.org/solr/guide/defining-core-properties.html https://lucene.apache.org/solr/guide/defining-core-properties.html
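
To illustrate the core discovery scan described above, here is a rough Python sketch that walks a directory tree looking for `core.properties`. It is not Solr's implementation; the function name `discover_cores` is hypothetical, and the choice to stop descending below a directory that contains `core.properties` is an assumption based on the Solr Reference Guide's description of core discovery.

```python
import os
import tempfile

def discover_cores(solr_home):
    """Return directories at or below solr_home that contain core.properties."""
    found = []
    for dirpath, dirnames, filenames in os.walk(solr_home):
        if "core.properties" in filenames:
            found.append(dirpath)
            # Assumption: a core directory is not searched for nested cores.
            dirnames.clear()
    return sorted(found)

# Usage: a throwaway Solr home with one core and one unrelated directory.
with tempfile.TemporaryDirectory() as home:
    os.makedirs(os.path.join(home, "collection1", "conf"))
    os.makedirs(os.path.join(home, "lib"))
    open(os.path.join(home, "collection1", "core.properties"), "w").close()
    print([os.path.relpath(p, home) for p in discover_cores(home)])
```

Clearing `dirnames` in place is how `os.walk` is told to prune a subtree, which keeps the scan from visiting the `conf` and `data` directories of each core.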
* A Shared 'lib' Directory * ### A Shared 'lib' Directory
Although solr.xml can be configured with an optional "sharedLib" attribute Although solr.xml can be configured with an optional "sharedLib" attribute
that can point to any path, it is common to use a "./lib" sub-directory of the that can point to any path, it is common to use a "./lib" sub-directory of the
Solr Home Directory. Solr Home Directory.
* ZooKeeper Files * ### ZooKeeper Files
When using SolrCloud using the embedded ZooKeeper option for Solr, it is When using SolrCloud using the embedded ZooKeeper option for Solr, it is
common to have a "zoo.cfg" file and "zoo_data" directories in the Solr Home common to have a "zoo.cfg" file and "zoo_data" directories in the Solr Home


@ -30,6 +30,7 @@ It's nice in this context because change to the templates
are immediately visible in browser on the next visit. are immediately visible in browser on the next visit.
Links: Links:
http://velocity.apache.org http://velocity.apache.org
http://wiki.apache.org/velocity/ http://wiki.apache.org/velocity/
http://velocity.apache.org/engine/releases/velocity-1.7/user-guide.html http://velocity.apache.org/engine/releases/velocity-1.7/user-guide.html
@ -39,14 +40,18 @@ File List
--------- ---------
System and Misc: System and Misc:
```
VM_global_library.vm - Macros used other templates, VM_global_library.vm - Macros used other templates,
exact filename is important for Velocity to see it exact filename is important for Velocity to see it
error.vm - shows errors, if any error.vm - shows errors, if any
debug.vm - includes toggle links for "explain" and "all fields" debug.vm - includes toggle links for "explain" and "all fields"
activated by debug link in footer.vm activated by debug link in footer.vm
README.txt - this file README.md - this file
```
Overall Page Composition: Overall Page Composition:
```
browse.vm - Main entry point into templates browse.vm - Main entry point into templates
layout.vm - overall HTML page layout layout.vm - overall HTML page layout
head.vm - elements in the <head> section of the HTML document head.vm - elements in the <head> section of the HTML document
@ -55,22 +60,30 @@ Overall Page Composition:
includes debug and help links includes debug and help links
main.css - CSS style for overall pages main.css - CSS style for overall pages
see also jquery.autocomplete.css see also jquery.autocomplete.css
```
Query Form and Options: Query Form and Options:
```
query_form.vm - renders query form query_form.vm - renders query form
query_group.vm - group by fields query_group.vm - group by fields
e.g.: Manufacturer or Poplularity e.g.: Manufacturer or Poplularity
query_spatial.vm - select box for location based Geospacial search query_spatial.vm - select box for location based Geospacial search
```
Spelling Suggestions: Spelling Suggestions:
```
did_you_mean.vm - hyperlinked spelling suggestions in results did_you_mean.vm - hyperlinked spelling suggestions in results
suggest.vm - dynamic spelling suggestions suggest.vm - dynamic spelling suggestions
as you type in the search form as you type in the search form
jquery.autocomplete.js - supporting files for dynamic suggestions jquery.autocomplete.js - supporting files for dynamic suggestions
jquery.autocomplete.css - Most CSS is defined in main.css jquery.autocomplete.css - Most CSS is defined in main.css
```
Search Results, General: Search Results, General:
```
(see also browse.vm) (see also browse.vm)
tabs.vm - provides navigation to advanced search options tabs.vm - provides navigation to advanced search options
pagination_top.vm - paging and staticis at top of results pagination_top.vm - paging and staticis at top of results
@ -84,9 +97,10 @@ Search Results, General:
richtext_doc.vm - display a complex/misc. document richtext_doc.vm - display a complex/misc. document
hit_plain.vm - basic display of all fields, hit_plain.vm - basic display of all fields,
edit results_list.vm to enable this edit results_list.vm to enable this
```
Search Results, Facets & Clusters: Search Results, Facets & Clusters:
```
facets.vm - calls the 4 facet and 1 cluster template facets.vm - calls the 4 facet and 1 cluster template
facet_fields.vm - display facets based on field values facet_fields.vm - display facets based on field values
e.g.: fields specified by &facet.field= e.g.: fields specified by &facet.field=
@ -99,3 +113,4 @@ Search Results, Facets & Clusters:
cluster.vm - if clustering is available cluster.vm - if clustering is available
then call cluster_results.vm then call cluster_results.vm
cluster_results.vm - actual rendering of clusters cluster_results.vm - actual rendering of clusters
```


@ -816,7 +816,7 @@ Note that for this filter to work properly, the upstream tokenizer must not remo
This filter is a custom Unicode normalization form that applies the foldings specified in http://www.unicode.org/reports/tr30/tr30-4.html[Unicode TR #30: Character Foldings] in addition to the `NFKC_Casefold` normalization form as described in <<ICU Normalizer 2 Filter>>. This filter is a better substitute for the combined behavior of the <<ASCII Folding Filter>>, <<Lower Case Filter>>, and <<ICU Normalizer 2 Filter>>. This filter is a custom Unicode normalization form that applies the foldings specified in http://www.unicode.org/reports/tr30/tr30-4.html[Unicode TR #30: Character Foldings] in addition to the `NFKC_Casefold` normalization form as described in <<ICU Normalizer 2 Filter>>. This filter is a better substitute for the combined behavior of the <<ASCII Folding Filter>>, <<Lower Case Filter>>, and <<ICU Normalizer 2 Filter>>.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
*Factory class:* `solr.ICUFoldingFilterFactory` *Factory class:* `solr.ICUFoldingFilterFactory`
@ -924,7 +924,7 @@ This filter factory normalizes text according to one of five Unicode Normalizati
For detailed information about these normalization forms, see http://unicode.org/reports/tr15/[Unicode Normalization Forms]. For detailed information about these normalization forms, see http://unicode.org/reports/tr15/[Unicode Normalization Forms].
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
== ICU Transform Filter == ICU Transform Filter
@ -966,7 +966,7 @@ This filter applies http://userguide.icu-project.org/transforms/general[ICU Tran
For detailed information about ICU Transforms, see http://userguide.icu-project.org/transforms/general. For detailed information about ICU Transforms, see http://userguide.icu-project.org/transforms/general.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
== Keep Word Filter == Keep Word Filter


@ -220,7 +220,7 @@ Unicode Collation in Solr is fast, because all the work is done at index time.
Rather than specifying an analyzer within `<fieldtype ... class="solr.TextField">`, the `solr.CollationField` and `solr.ICUCollationField` field type classes provide this functionality. `solr.ICUCollationField`, which is backed by http://site.icu-project.org[the ICU4J library], provides more flexible configuration, has more locales, is significantly faster, and requires less memory and less index space, since its keys are smaller than those produced by the JDK implementation that backs `solr.CollationField`. Rather than specifying an analyzer within `<fieldtype ... class="solr.TextField">`, the `solr.CollationField` and `solr.ICUCollationField` field type classes provide this functionality. `solr.ICUCollationField`, which is backed by http://site.icu-project.org[the ICU4J library], provides more flexible configuration, has more locales, is significantly faster, and requires less memory and less index space, since its keys are smaller than those produced by the JDK implementation that backs `solr.CollationField`.
To use `solr.ICUCollationField`, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. To use `solr.ICUCollationField`, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
`solr.ICUCollationField` and `solr.CollationField` fields can be created in two ways: `solr.ICUCollationField` and `solr.CollationField` fields can be created in two ways:
@ -487,7 +487,7 @@ The `lucene/analysis/opennlp` module provides OpenNLP integration via several an
NOTE: The <<OpenNLP Tokenizer>> must be used with all other OpenNLP analysis components, for two reasons: first, the OpenNLP Tokenizer detects and marks the sentence boundaries required by all the OpenNLP filters; and second, since the pre-trained OpenNLP models used by these filters were trained using the corresponding language-specific sentence-detection/tokenization models, the same tokenization, using the same models, must be used at runtime for optimal performance. NOTE: The <<OpenNLP Tokenizer>> must be used with all other OpenNLP analysis components, for two reasons: first, the OpenNLP Tokenizer detects and marks the sentence boundaries required by all the OpenNLP filters; and second, since the pre-trained OpenNLP models used by these filters were trained using the corresponding language-specific sentence-detection/tokenization models, the same tokenization, using the same models, must be used at runtime for optimal performance.
To use the OpenNLP components, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. To use the OpenNLP components, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
=== OpenNLP Tokenizer === OpenNLP Tokenizer
@ -1033,7 +1033,7 @@ Solr can stem Catalan using the Snowball Porter Stemmer with an argument of `lan
=== Traditional Chinese === Traditional Chinese
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
<<tokenizers.adoc#standard-tokenizer,Standard Tokenizer>> can also be used to tokenize Traditional Chinese text. Following the Word Break rules from the Unicode Text Segmentation algorithm, it produces one token per Chinese character. When combined with <<CJK Bigram Filter>>, overlapping bigrams of Chinese characters are formed. <<tokenizers.adoc#standard-tokenizer,Standard Tokenizer>> can also be used to tokenize Traditional Chinese text. Following the Word Break rules from the Unicode Text Segmentation algorithm, it produces one token per Chinese character. When combined with <<CJK Bigram Filter>>, overlapping bigrams of Chinese characters are formed.
@ -1105,9 +1105,9 @@ See the example under <<Traditional Chinese>>.
=== Simplified Chinese === Simplified Chinese
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <<HMM Chinese Tokenizer>>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <<HMM Chinese Tokenizer>>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
Also useful for Chinese analysis: Also useful for Chinese analysis:
@ -1162,7 +1162,7 @@ Also useful for Chinese analysis:
=== HMM Chinese Tokenizer === HMM Chinese Tokenizer
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
*Factory class:* `solr.HMMChineseTokenizerFactory` *Factory class:* `solr.HMMChineseTokenizerFactory`
@ -1960,7 +1960,7 @@ Example:
[[hebrew-lao-myanmar-khmer]] [[hebrew-lao-myanmar-khmer]]
=== Hebrew, Lao, Myanmar, Khmer === Hebrew, Lao, Myanmar, Khmer
Lucene provides support, in addition to UAX#29 word break rules, for Hebrew's use of the double and single quote characters, and for segmenting Lao, Myanmar, and Khmer into syllables with the `solr.ICUTokenizerFactory` in the `analysis-extras` contrib module. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt for` instructions on which jars you need to add. Lucene provides support, in addition to UAX#29 word break rules, for Hebrew's use of the double and single quote characters, and for segmenting Lao, Myanmar, and Khmer into syllables with the `solr.ICUTokenizerFactory` in the `analysis-extras` contrib module. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
See <<tokenizers.adoc#icu-tokenizer,the ICUTokenizer>> for more information. See <<tokenizers.adoc#icu-tokenizer,the ICUTokenizer>> for more information.
@ -2167,7 +2167,7 @@ Solr includes support for normalizing Persian, and Lucene includes an example st
=== Polish === Polish
Solr provides support for Polish stemming with the `solr.StempelPolishStemFilterFactory`, and `solr.MorphologikFilterFactory` for lemmatization, in the `contrib/analysis-extras` module. The `solr.StempelPolishStemFilterFactory` component includes an algorithmic stemmer with tables for Polish. To use either of these filters, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. Solr provides support for Polish stemming with the `solr.StempelPolishStemFilterFactory`, and `solr.MorphologikFilterFactory` for lemmatization, in the `contrib/analysis-extras` module. The `solr.StempelPolishStemFilterFactory` component includes an algorithmic stemmer with tables for Polish. To use either of these filters, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
*Factory class:* `solr.StempelPolishStemFilterFactory` and `solr.MorfologikFilterFactory` *Factory class:* `solr.StempelPolishStemFilterFactory` and `solr.MorfologikFilterFactory`
@ -2684,7 +2684,7 @@ Solr includes support for stemming Turkish with the `solr.SnowballPorterFilterFa
=== Ukrainian === Ukrainian
Solr provides support for Ukrainian lemmatization with the `solr.MorphologikFilterFactory`, in the `contrib/analysis-extras` module. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. Solr provides support for Ukrainian lemmatization with the `solr.MorphologikFilterFactory`, in the `contrib/analysis-extras` module. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
Lucene also includes an example Ukrainian stopword list, in the `lucene-analyzers-morfologik` jar. Lucene also includes an example Ukrainian stopword list, in the `lucene-analyzers-morfologik` jar.

@ -916,7 +916,7 @@ You may get errors as it works through your documents. These might be caused by
DataImportHandler:: DataImportHandler::
Solr includes a tool called the <<uploading-structured-data-store-data-with-the-data-import-handler.adoc#uploading-structured-data-store-data-with-the-data-import-handler,Data Import Handler (DIH)>> which can connect to databases (if you have a jdbc driver), mail servers, or other structured data sources. There are several examples included for feeds, GMail, and a small HSQL database. Solr includes a tool called the <<uploading-structured-data-store-data-with-the-data-import-handler.adoc#uploading-structured-data-store-data-with-the-data-import-handler,Data Import Handler (DIH)>> which can connect to databases (if you have a jdbc driver), mail servers, or other structured data sources. There are several examples included for feeds, GMail, and a small HSQL database.
+ +
The `README.txt` file in `example/example-DIH` will give you details on how to start working with this tool. The `README.md` file in `example/example-DIH` will give you details on how to start working with this tool.
SolrJ:: SolrJ::
SolrJ is a Java-based client for interacting with Solr. Use <<using-solrj.adoc#using-solrj,SolrJ>> for JVM-based languages or other <<client-apis.adoc#client-apis,Solr clients>> to programmatically create documents to send to Solr. SolrJ is a Java-based client for interacting with Solr. Use <<using-solrj.adoc#using-solrj,SolrJ>> for JVM-based languages or other <<client-apis.adoc#client-apis,Solr clients>> to programmatically create documents to send to Solr.

@ -514,7 +514,7 @@ The default configuration for `solr.ICUTokenizerFactory` provides UAX#29 word br
[IMPORTANT] [IMPORTANT]
==== ====
To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
==== ====

@ -111,7 +111,7 @@
</fileset> </fileset>
<fileset dir="."> <fileset dir=".">
<include name="lib/*" /> <include name="lib/*" />
<include name="README.txt" /> <include name="README.md" />
</fileset> </fileset>
</copy> </copy>
</target> </target>