SOLR-14429: Convert .txt files to properly formatted .md files (#1450)

Tomoko Uchida 2020-04-27 08:43:04 +09:00 committed by GitHub
parent ce18505e28
commit f03e6aac59
33 changed files with 339 additions and 220 deletions

View File

@ -130,6 +130,7 @@ class RatTask extends DefaultTask {
List<String> srcExcludes = [
"**/TODO",
"**/*.txt",
"**/*.md",
"**/*.iml",
"build/**"
]

View File

@ -147,10 +147,12 @@ ant.fileScanner{
include(name: 'dev-tools/**/*.' + it)
include(name: '*.' + it)
}
// TODO: For now we don't scan txt files, so we
// TODO: For now we don't scan txt / md files, so we
// check licenses in top-level folders separately:
include(name: '*.txt')
include(name: '*/*.txt')
include(name: '*.md')
include(name: '*/*.md')
// excludes:
exclude(name: '**/build/**')
exclude(name: '**/dist/**')

View File

@ -62,6 +62,8 @@ Other Changes
* SOLR-14420: AuthenticationPlugin.authenticate accepts HttpServletRequest instead of ServletRequest. (Mike Drob)
* SOLR-14429: Convert .txt files to properly formatted .md files. (Tomoko Uchida, Uwe Schindler)
================== 8.6.0 ==================
Consult the LUCENE_CHANGES.txt file for additional, low level, changes in this release.

View File

@ -1,18 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Welcome to the Apache Solr project!
-----------------------------------
@ -22,7 +23,7 @@ from the Apache Lucene project.
For a complete description of the Solr project, team composition, source
code repositories, and other details, please see the Solr web site at
http://lucene.apache.org/solr
https://lucene.apache.org/solr
Getting Started
@ -30,37 +31,51 @@ Getting Started
To start Solr for the first time after installation, simply do:
```
bin/solr start
```
This will launch a standalone Solr server in the background of your shell,
listening on port 8983. Alternatively, you can launch Solr in "cloud" mode,
which allows you to scale out using sharding and replication. To launch Solr
in cloud mode, do:
```
bin/solr start -cloud
```
To see all available options for starting Solr, please do:
```
bin/solr start -help
```
After starting Solr, create either a core or collection depending on whether
Solr is running in standalone (core) or SolrCloud mode (collection) by doing:
```
bin/solr create -c <name>
```
This will create a collection that uses a data-driven schema which tries to guess
the correct field type when you add documents to the index. To see all available
options for creating a new collection, execute:
```
bin/solr create -help
```
After starting Solr, direct your Web browser to the Solr Admin Console at:
```
http://localhost:8983/solr/
```
When finished with your Solr installation, shut it down by executing:
```
bin/solr stop -all
```
The `-p PORT` option can also be used to identify which Solr instance to shut down
when more than one Solr is running on the machine.
@ -71,43 +86,55 @@ Solr Examples
Solr includes a few examples to help you get started. To run a specific example, do:
```
bin/solr -e <EXAMPLE> where <EXAMPLE> is one of:
cloud : SolrCloud example
dih : Data Import Handler (rdbms, mail, atom, tika)
schemaless : Schema-less example (schema is inferred from data during indexing)
techproducts : Kitchen sink example providing comprehensive examples of Solr features
```
For instance, if you want to run the Solr Data Import Handler example, do:
```
bin/solr -e dih
```
Indexing Documents
---------------
To add documents to the index, use bin/post. For example:
```
bin/post -c <collection_name> example/exampledocs/*.xml
```
For more information about Solr examples, please read...
* example/README.txt
For more information about the "Solr Home" and Solr specific configuration
* [example/README.md](example/README.md)
For more information about the "Solr Home" and Solr specific configuration
* https://lucene.apache.org/solr/guide/solr-tutorial.html
For a Solr tutorial
* http://lucene.apache.org/solr/resources.html
For a list of other tutorials and introductory articles.
For a Solr tutorial
* https://lucene.apache.org/solr/resources.html
For a list of other tutorials and introductory articles.
or linked from "docs/index.html" in a binary distribution.
Also, there are Solr clients for many programming languages, see
http://wiki.apache.org/solr/IntegratingSolr
* https://wiki.apache.org/solr/IntegratingSolr
Files included in an Apache Solr binary distribution
----------------------------------------------------
```
server/
A self-contained Solr instance, complete with a sample
configuration and documents to index. Please see: bin/solr start -help
@ -116,7 +143,7 @@ server/
example/
Contains example documents and an alternative Solr home
directory containing examples of how to use the Data Import Handler,
see example/example-DIH/README.txt for more information.
see example/example-DIH/README.md for more information.
dist/solr-<component>-XX.jar
The Apache Solr libraries. To compile Apache Solr Plugins,
@ -126,7 +153,7 @@ dist/solr-<component>-XX.jar
docs/index.html
A link to the online version of Apache Solr Javadoc API documentation and Tutorial
```
Instructions for Building Apache Solr from Source
-------------------------------------------------
@ -151,14 +178,14 @@ Instructions for Building Apache Solr from Source
Alternately, you can obtain a copy of the latest Apache Solr source code
directly from the GIT repository:
http://lucene.apache.org/solr/versioncontrol.html
https://lucene.apache.org/solr/versioncontrol.html
4. Navigate to the "solr" folder and issue an "ant" command to see the available options
for building, testing, and packaging Solr.
NOTE:
To see Solr in action, you may want to use the "ant server" command to build
and package Solr into the server directory. See also server/README.txt.
and package Solr into the server directory. See also server/README.md.
Export control
@ -184,6 +211,7 @@ code and source code.
The following provides more details on the included cryptographic
software:
Apache Solr uses the Apache Tika which uses the Bouncy Castle generic encryption libraries for
extracting text content and metadata from encrypted PDF files.
See http://www.bouncycastle.org/ for more details on Bouncy Castle.
Apache Solr uses the Apache Tika which uses the Bouncy Castle generic encryption libraries for
extracting text content and metadata from encrypted PDF files.
See http://www.bouncycastle.org/ for more details on Bouncy Castle.

View File

@ -57,7 +57,7 @@
<attribute name="Main-Class" value="org.apache.solr.util.SimplePostTool"/>
</manifest>
</jar>
<echo>See ${common-solr.dir}/README.txt for how to run the Solr server.</echo>
<echo>See ${common-solr.dir}/README.md for how to run the Solr server.</echo>
</target>
<target name="run-example" depends="server"
@ -207,8 +207,8 @@
</xslt>
<markdown todir="${javadoc.dir}">
<fileset dir="site" includes="**/*.mdtext"/>
<globmapper from="*.mdtext" to="*.html"/>
<fileset dir="site" includes="**/*.md"/>
<globmapper from="*.md" to="*.html"/>
</markdown>
<copy todir="${javadoc.dir}">
@ -530,8 +530,8 @@
fullpath="${fullnamever}/LUCENE_CHANGES.txt" />
<tarfileset dir="."
prefix="${fullnamever}"
includes="LICENSE.txt NOTICE.txt CHANGES.txt README.txt SYSTEM_REQUIREMENTS.txt
bin/** server/** example/** contrib/**/lib/** contrib/**/conf/** contrib/**/README.txt
includes="LICENSE.txt NOTICE.txt CHANGES.txt README.md site/SYSTEM_REQUIREMENTS.md
bin/** server/** example/** contrib/**/lib/** contrib/**/conf/** contrib/**/README.md
licenses/**"
excludes="licenses/README.committers.txt **/data/ **/logs/*
**/classes/ **/*.sh **/ivy.xml **/build.xml

View File

@ -0,0 +1,26 @@
Apache Solr - Analysis Extras
=============================
The analysis-extras plugin provides additional analyzers that rely
upon large dependencies/dictionaries.
It includes integration with ICU for multilingual support,
analyzers for Chinese and Polish, and integration with
OpenNLP for multilingual tokenization, part-of-speech tagging,
lemmatization, phrase chunking, and named-entity recognition.
Each of the jars below relies upon including `/dist/solr-analysis-extras-X.Y.jar`
in the `solrconfig.xml` (a sketch of the matching `<lib>` directives follows the list).
* ICU relies upon `lucene-libs/lucene-analyzers-icu-X.Y.jar`
and `lib/icu4j-X.Y.jar`
* Smartcn relies upon `lucene-libs/lucene-analyzers-smartcn-X.Y.jar`
* Stempel relies on `lucene-libs/lucene-analyzers-stempel-X.Y.jar`
* Morfologik relies on `lucene-libs/lucene-analyzers-morfologik-X.Y.jar`
and `lib/morfologik-*.jar`
* OpenNLP relies on `lucene-libs/lucene-analyzers-opennlp-X.Y.jar`
and `lib/opennlp-*.jar`
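
Putting the list above together, a minimal sketch of the matching `solrconfig.xml` directives, assuming the default directory layout of a binary distribution (the `dir` paths and `regex` patterns are illustrative, not prescriptive):

```xml
<!-- Load the analysis-extras contrib and its dependencies; paths are
     relative to the core's instanceDir, so adjust to your installation. -->
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-analysis-extras-\d.*\.jar" />
```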

View File

@ -1,23 +0,0 @@
The analysis-extras plugin provides additional analyzers that rely
upon large dependencies/dictionaries.
It includes integration with ICU for multilingual support,
analyzers for Chinese and Polish, and integration with
OpenNLP for multilingual tokenization, part-of-speech tagging
lemmatization, phrase chunking, and named-entity recognition.
Each of the jars below relies upon including /dist/solr-analysis-extras-X.Y.jar
in the solrconfig.xml
ICU relies upon lucene-libs/lucene-analyzers-icu-X.Y.jar
and lib/icu4j-X.Y.jar
Smartcn relies upon lucene-libs/lucene-analyzers-smartcn-X.Y.jar
Stempel relies on lucene-libs/lucene-analyzers-stempel-X.Y.jar
Morfologik relies on lucene-libs/lucene-analyzers-morfologik-X.Y.jar
and lib/morfologik-*.jar
OpenNLP relies on lucene-libs/lucene-analyzers-opennlp-X.Y.jar
and lib/opennlp-*.jar

View File

@ -1,4 +1,5 @@
Apache Solr - DataImportHandler
Apache Solr - DataImportHandler
================================
Introduction
------------

View File

@ -1,4 +1,5 @@
Apache Solr Content Extraction Library (Solr Cell)
==================================================
Introduction
------------

View File

@ -18,6 +18,7 @@ Note that all libraries of solr-jaegertracer must be included in the classpath of
```
The list of parameters for JaegerTracerConfigurator includes the following (a configuration sketch follows the table):
|Parameter|Type|Required|Default|Description|
|---------|----|--------|-------|-----------|
|agentHost|string|Yes||The host of Jaeger backend|
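
For orientation, a hedged sketch of how this configurator might be declared in `solr.xml` (the `tracerConfig` element follows the plugin's documented setup; the host and port values are illustrative):

```xml
<solr>
  <tracerConfig name="tracerConfig" class="org.apache.solr.jaeger.JaegerTracerConfigurator">
    <!-- Illustrative values: a Jaeger agent running locally -->
    <str name="agentHost">localhost</str>
    <int name="agentPort">5775</int>
  </tracerConfig>
</solr>
```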

View File

@ -1,5 +1,5 @@
Apache Solr Language Identifier
===============================
Introduction
------------

View File

@ -13,7 +13,7 @@ For information on how to get started with solr ltr please see:
# Getting Started With Solr
For information on how to get started with solr please see:
* [solr/README.txt](../../README.txt)
* [solr/README.md](../../README.md)
* [Solr Tutorial](https://lucene.apache.org/solr/guide/solr-tutorial.html)
# How To Contribute

View File

@ -1 +0,0 @@
README.md

View File

@ -11,7 +11,7 @@ For information on how to get started with solr-exporter please see:
# Getting Started With Solr
For information on how to get started with solr please see:
* [solr/README.txt](../../README.txt)
* [solr/README.md](../../README.md)
* [Solr Tutorial](https://lucene.apache.org/solr/guide/solr-tutorial.html)
# How To Contribute

View File

@ -1,17 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Solr example
------------
@ -19,44 +21,59 @@ Solr example
This directory contains Solr examples. Each example is contained in a
separate directory. To run a specific example, do:
```
bin/solr -e <EXAMPLE> where <EXAMPLE> is one of:
cloud : SolrCloud example
dih : Data Import Handler (rdbms, mail, atom, tika)
schemaless : Schema-less example (schema is inferred from data during indexing)
techproducts : Kitchen sink example providing comprehensive examples of Solr features
```
For instance, if you want to run the Solr Data Import Handler example, do:
```
bin/solr -e dih
```
To see all the options available when starting Solr:
```
bin/solr start -help
```
After starting a Solr example, direct your Web browser to:
```
http://localhost:8983/solr/
```
To add documents to the index, use bin/post, for example:
```
bin/post -c techproducts example/exampledocs/*.xml
```
(where "techproducts" is the Solr core name)
For more information about this example please read...
* example/solr/README.txt
For more information about the "Solr Home" and Solr specific configuration
* https://lucene.apache.org/solr/guide/solr-tutorial.html
For a Solr tutorial
* http://wiki.apache.org/solr/SolrResources
For a list of other tutorials and introductory articles.
* [solr/example/README.md](./README.md)
For more information about the "Solr Home" and Solr specific configuration
* https://lucene.apache.org/solr/guide/solr-tutorial.html
For a Solr tutorial
* https://wiki.apache.org/solr/SolrResources
For a list of other tutorials and introductory articles.
Notes About These Examples
--------------------------
* References to Jar Files Outside This Directory *
### References to Jar Files Outside This Directory
Various example SolrHome dirs contained in this directory may use "<lib>"
statements in the solrconfig.xml file to reference plugin jars outside of
@ -68,7 +85,7 @@ clustering component, or any other modules in "contrib", you will need to
copy the required jars or update the paths to those jars in your
solrconfig.xml.
* Logging *
### Logging
By default, Jetty & Solr will log to the console and logs/solr.log. This can
be convenient when first getting started, but eventually you will want to

View File

@ -41,7 +41,7 @@ task assemblePackaging(type: Sync) {
include "exampledocs/**"
include "files/**"
include "films/**"
include "README.txt"
include "README.md"
exclude "**/*.jar"
})

View File

@ -1,24 +1,28 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Solr DataImportHandler example configuration
--------------------------------------------
To run this multi-core example, use the "-e" option of the bin/solr script:
```
> bin/solr -e dih
```
When Solr is started connect to:

View File

@ -22,28 +22,34 @@ PDFs, HTML, and many other supported types.
For further explanations, see the frequently asked questions at the end of the guide.
##GETTING STARTED
## GETTING STARTED
* To start Solr, enter the following command (make sure you've cd'ed into the directory in which Solr was installed):
```
bin/solr start
```
* If you've started correctly, you should see the following output:
```
Waiting to see Solr listening on port 8983 [/]
Started Solr server on port 8983 (pid=<your pid>). Happy searching!
<hr>
```
##CREATING THE CORE/COLLECTION
## CREATING THE CORE/COLLECTION
* Before you can index your documents, you'll need to create a core/collection. Do this by entering:
```
bin/solr create -c files -d example/files/conf
```
* Now you've created a core called “files” using a configuration tuned for indexing and querying rich text files.
* You should see the following response:
```
Creating new core 'files' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=files&instanceDir=files
@ -52,26 +58,31 @@ For further explanations, see the frequently asked questions at the end of the g
"status":0,
"QTime":239},
"core":"files"}
```
<hr>
##INDEXING DOCUMENTS
## INDEXING DOCUMENTS
* Return to your command shell. To post all of your documents to the documents core, enter the following:
```
bin/post -c files ~/Documents
```
* Depending on how many documents you have, this could take a while. Sit back and watch the magic happen. When all of your documents have been indexed, you'll see something like:
```
<some number> files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/files/update...
Time spent: <some amount of time>
```
* To see a list of accepted file types, do:
bin/post -h
<hr>
##BROWSING DOCUMENTS
```
bin/post -h
```
## BROWSING DOCUMENTS
* Your document information can be viewed in multiple formats: XML, JSON, CSV, as well as a nice HTML interface.
@ -80,8 +91,7 @@ For further explanations, see the frequently asked questions at the end of the g
* To view your document information in XML or other formats, add &wt (for writer type) to the end of that URL, e.g., to view your results in XML format, direct your browser to:
[http://localhost:8983/solr/files/browse?&wt=xml](http://localhost:8983/solr/files/browse?&wt=xml)
<hr>
##ADMIN UI
## ADMIN UI
* Another way to verify that your core has been created is to view it in the Admin User Interface.
@ -98,8 +108,7 @@ For further explanations, see the frequently asked questions at the end of the g
* Now you've opened the core page. On this page there are a multitude of different tools you can use to analyze and search your core. You will make use of these features after indexing your documents.
* Take note of the "Num Docs" field in your core Statistics. If, after indexing your documents, it shows Num Docs to be 0, that means there was a problem indexing.
<hr>
##QUERYING INDEX
## QUERYING INDEX
* In the Admin UI, enter a term in the query box to see which documents contain the word.
@ -111,42 +120,48 @@ For further explanations, see the frequently asked questions at the end of the g
* Another way to query the index is by manipulating the URL in your address bar once in the browse view.
* i.e. : [http://localhost:8983/solr/files/browse?q=Lucene](http://localhost:8983/solr/files/browse?q=Lucene)
<hr>
##FAQs
## FAQs
* Why use -d when creating a core?
* -d specifies a specific configuration to use. This example has a configuration tuned for indexing and querying rich
text files.
* How do I delete a core?
* To delete a core (i.e. files), you can enter the following in your command shell:
bin/solr delete -c files
* You should see the following output:
* To delete a core (i.e. files), you can enter the following in your command shell:
Deleting core 'files' using command:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=files&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
```
bin/solr delete -c files
```
* You should see the following output:
Deleting core 'files' using command:
```
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=files&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
{"responseHeader":{
"status":0,
"QTime":19}}
* This calls the Solr core admin handler, "UNLOAD", and the parameters "deleteDataDir" and "deleteInstanceDir" to ensure that all data associated with the core is also removed
{"responseHeader":{
"status":0,
"QTime":19}}
```
* This calls the Solr core admin handler, "UNLOAD", and the parameters "deleteDataDir" and "deleteInstanceDir" to ensure that all data associated with the core is also removed
* How can I change the /browse UI?
The primary templates are under example/files/conf/velocity. **In order to edit those files in place (without having to
re-create or patch a core/collection with an updated configuration)**, Solr can be started with a special system property
set to the _absolute_ path to the conf/velocity directory, like this:
The primary templates are under example/files/conf/velocity. **In order to edit those files in place (without having to
re-create or patch a core/collection with an updated configuration)**, Solr can be started with a special system property
set to the _absolute_ path to the conf/velocity directory, like this:
bin/solr start -Dvelocity.template.base.dir=</full/path/to>/example/files/conf/velocity/
If you want to adjust the browse templates for an existing collection, edit the core's configuration
under server/solr/files/conf/velocity.
```
bin/solr start -Dvelocity.template.base.dir=</full/path/to>/example/files/conf/velocity/
```
If you want to adjust the browse templates for an existing collection, edit the core's configuration
under server/solr/files/conf/velocity.
## Provenance of free images used in this example:
* Provenance of free images used in this example:
- Globe icon: visualpharm.com
- Flag icons: freeflagicons.com

View File

@ -12,37 +12,48 @@ This data consists of the following fields:
Steps:
* Start Solr:
bin/solr start
```
bin/solr start
```
* Create a "films" core:
bin/solr create -c films
```
bin/solr create -c films
```
* Set the schema on a couple of fields that Solr would otherwise guess differently (than we'd like) about:
curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
"add-field" : {
"name":"name",
"type":"text_general",
"multiValued":false,
"stored":true
},
"add-field" : {
"name":"initial_release_date",
"type":"pdate",
"stored":true
}
}'
```
curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
"add-field" : {
"name":"name",
"type":"text_general",
"multiValued":false,
"stored":true
},
"add-field" : {
"name":"initial_release_date",
"type":"pdate",
"stored":true
}
}'
```
* Now let's index the data, using one of these three commands:
- JSON: bin/post -c films example/films/films.json
- XML: bin/post -c films example/films/films.xml
- CSV: bin/post \
- JSON: `bin/post -c films example/films/films.json`
- XML: `bin/post -c films example/films/films.xml`
- CSV:
```
bin/post \
-c films \
example/films/films.csv \
-params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
```
* Let's get searching!
- Search for 'Batman':
http://localhost:8983/solr/films/query?q=name:batman
* If you get an error about the name field not existing, you haven't yet indexed the data
@ -51,27 +62,34 @@ curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:applicatio
It's easiest to simply reset the environment and try again, ensuring that each step successfully executes.
- Show me all 'Super hero' movies:
http://localhost:8983/solr/films/query?q=*:*&fq=genre:%22Superhero%20movie%22
- Let's see the distribution of genres across all the movies. See the facet section of the response for the counts:
http://localhost:8983/solr/films/query?q=*:*&facet=true&facet.field=genre
- Browse the indexed films in a traditional browser search interface:
http://localhost:8983/solr/films/browse
Now browse including the genre field as a facet:
http://localhost:8983/solr/films/browse?facet.field=genre
If you want to set a facet for /browse to keep around for every request, add the facet.field into the "facets"
param set (which the /browse handler is already configured to use):
curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{
"update" : {
"facets": {
"facet.field":"genre"
}
}
}'
```
curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{
"update" : {
"facets": {
"facet.field":"genre"
}
}
}'
```
And now http://localhost:8983/solr/films/browse will display the _genre_ facet automatically.
Exploring the data further -
@ -93,6 +111,7 @@ FAQ:
Is there an easy to copy/paste script to do all of the above?
```
Here ya go << END_OF_SCRIPT
bin/solr stop
@ -123,9 +142,11 @@ curl http://localhost:8983/solr/films/config/params -H 'Content-type:application
}'
# END_OF_SCRIPT
```
Additional fun -
```
Add highlighting:
curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{
"set" : {
@ -135,4 +156,6 @@ curl http://localhost:8983/solr/films/config/params -H 'Content-type:application
}
}
}'
```
try http://localhost:8983/solr/films/browse?q=batman now, and you'll see "batman" highlighted in the results

View File

@ -72,7 +72,7 @@ task toDir(type: Sync) {
include "CHANGES.txt"
include "LICENSE.txt"
include "NOTICE.txt"
include "README.txt"
include "README.md"
})
from(project(":lucene").projectDir, {

View File

@ -1,17 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Solr server
------------
@ -21,14 +23,17 @@ run Solr.
To run Solr:
```
cd $SOLR_INSTALL
bin/solr start
```
where $SOLR_INSTALL is the location where you extracted the Solr installation bundle.
Server directory layout
-----------------------
```
server/contexts
This directory contains the Jetty Web application deployment descriptor for the Solr Web app.
@ -75,18 +80,18 @@ server/solr/configsets
server/solr-webapp
Contains files used by the Solr server; do not edit files in this directory (Solr is not a Java Web application).
```
Notes About Solr Examples
--------------------------
* SolrHome *
### SolrHome
By default, start.jar starts Solr in Jetty using the default Solr Home
directory of "./solr/" (relative to the working directory of the servlet
container).
* References to Jar Files Outside This Directory *
### References to Jar Files Outside This Directory
Various example SolrHome dirs contained in this directory may use "<lib>"
statements in the solrconfig.xml file to reference plugin jars outside of
@ -98,7 +103,7 @@ clustering component, or any other modules in "contrib", you will need to
copy the required jars or update the paths to those jars in your
solrconfig.xml.
* Logging *
### Logging
By default, Jetty & Solr will log to the console and logs/solr.log. This can
be convenient when first getting started, but eventually you will want to

View File

@ -100,7 +100,7 @@ task assemblePackaging(type: Sync) {
include "resources/**"
include "scripts/**"
include "solr/**"
include "README.txt"
include "README.md"
})
from(configurations.compileClasspath, {

View File

@ -1,18 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Default Solr Home Directory
=============================
@ -26,7 +27,7 @@ Basic Directory Structure
The Solr Home directory typically contains the following...
* solr.xml *
### solr.xml
This is the primary configuration file Solr looks for when starting;
it specifies high-level configuration options that apply to all
@ -44,13 +45,13 @@ for collection1 should be the same as the Solr Home Directory.
For more information about solr.xml, please see:
https://lucene.apache.org/solr/guide/solr-cores-and-solr-xml.html
* Individual SolrCore Instance Directories *
### Individual SolrCore Instance Directories
Although solr.xml can be configured to look for SolrCore Instance Directories
in any path, simple sub-directories of the Solr Home Dir using relative paths
are common for many installations.
* Core Discovery *
### Core Discovery
During startup, Solr will scan sub-directories of Solr home looking for
a specific file named core.properties. If core.properties is found in a
@ -60,15 +61,16 @@ defined in core.properties. For an example of core.properties, please see:
example/solr/collection1/core.properties
For more information about core discovery, please see:
https://lucene.apache.org/solr/guide/defining-core-properties.html
* A Shared 'lib' Directory *
### A Shared 'lib' Directory
Although solr.xml can be configured with an optional "sharedLib" attribute
that can point to any path, it is common to use a "./lib" sub-directory of the
Solr Home Directory.
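
A hedged sketch of that common arrangement in `solr.xml` (the path is resolved relative to the Solr Home Directory):

```xml
<solr>
  <!-- Jars placed in <solr-home>/lib become visible to all cores -->
  <str name="sharedLib">lib</str>
</solr>
```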
* ZooKeeper Files *
### ZooKeeper Files
When using SolrCloud with the embedded ZooKeeper option for Solr, it is
common to have a "zoo.cfg" file and "zoo_data" directories in the Solr Home

View File

@ -30,6 +30,7 @@ It's nice in this context because changes to the templates
are immediately visible in the browser on the next visit.
Links:
http://velocity.apache.org
http://wiki.apache.org/velocity/
http://velocity.apache.org/engine/releases/velocity-1.7/user-guide.html
@ -39,14 +40,18 @@ File List
---------
System and Misc:
```
VM_global_library.vm - Macros used by other templates,
exact filename is important for Velocity to see it
error.vm - shows errors, if any
debug.vm - includes toggle links for "explain" and "all fields"
activated by debug link in footer.vm
README.txt - this file
README.md - this file
```
Overall Page Composition:
```
browse.vm - Main entry point into templates
layout.vm - overall HTML page layout
head.vm - elements in the <head> section of the HTML document
@ -55,22 +60,30 @@ Overall Page Composition:
includes debug and help links
main.css - CSS style for overall pages
see also jquery.autocomplete.css
```
Query Form and Options:
```
query_form.vm - renders query form
query_group.vm - group by fields
e.g.: Manufacturer or Popularity
query_spatial.vm - select box for location-based Geospatial search
```
Spelling Suggestions:
```
did_you_mean.vm - hyperlinked spelling suggestions in results
suggest.vm - dynamic spelling suggestions
as you type in the search form
jquery.autocomplete.js - supporting files for dynamic suggestions
jquery.autocomplete.css - Most CSS is defined in main.css
```
Search Results, General:
```
(see also browse.vm)
tabs.vm - provides navigation to advanced search options
pagination_top.vm - paging and statistics at top of results
@ -84,9 +97,10 @@ Search Results, General:
richtext_doc.vm - display a complex/misc. document
hit_plain.vm - basic display of all fields,
edit results_list.vm to enable this
```
Search Results, Facets & Clusters:
```
facets.vm - calls the 4 facet and 1 cluster template
facet_fields.vm - display facets based on field values
e.g.: fields specified by &facet.field=
@ -99,3 +113,4 @@ Search Results, Facets & Clusters:
cluster.vm - if clustering is available
then call cluster_results.vm
cluster_results.vm - actual rendering of clusters
```

View File

@ -816,7 +816,7 @@ Note that for this filter to work properly, the upstream tokenizer must not remo
This filter is a custom Unicode normalization form that applies the foldings specified in http://www.unicode.org/reports/tr30/tr30-4.html[Unicode TR #30: Character Foldings] in addition to the `NFKC_Casefold` normalization form as described in <<ICU Normalizer 2 Filter>>. This filter is a better substitute for the combined behavior of the <<ASCII Folding Filter>>, <<Lower Case Filter>>, and <<ICU Normalizer 2 Filter>>.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
*Factory class:* `solr.ICUFoldingFilterFactory`
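
As a sketch, the factory is typically placed in an analyzer chain directly after a tokenizer; the tokenizer chosen below is an assumption, not a requirement:

[source,xml]
----
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- Applies the UTR #30 foldings plus NFKC_Casefold normalization -->
  <filter class="solr.ICUFoldingFilterFactory"/>
</analyzer>
----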
@ -924,7 +924,7 @@ This filter factory normalizes text according to one of five Unicode Normalizati
For detailed information about these normalization forms, see http://unicode.org/reports/tr15/[Unicode Normalization Forms].
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
== ICU Transform Filter
@ -966,7 +966,7 @@ This filter applies http://userguide.icu-project.org/transforms/general[ICU Tran
For detailed information about ICU Transforms, see http://userguide.icu-project.org/transforms/general.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
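
As a sketch, a transform is selected via the `id` attribute; the Traditional-to-Simplified transform below is one illustrative choice:

[source,xml]
----
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- id names an ICU system transform, here Traditional to Simplified Chinese -->
  <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
</analyzer>
----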
== Keep Word Filter

View File

@ -220,7 +220,7 @@ Unicode Collation in Solr is fast, because all the work is done at index time.
Rather than specifying an analyzer within `<fieldtype ... class="solr.TextField">`, the `solr.CollationField` and `solr.ICUCollationField` field type classes provide this functionality. `solr.ICUCollationField`, which is backed by http://site.icu-project.org[the ICU4J library], provides more flexible configuration, has more locales, is significantly faster, and requires less memory and less index space, since its keys are smaller than those produced by the JDK implementation that backs `solr.CollationField`.
To use `solr.ICUCollationField`, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
To use `solr.ICUCollationField`, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
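
As a minimal sketch (the attribute values are illustrative; an empty `locale` selects the ICU root collator):

[source,xml]
----
<!-- A root-locale ICU collated field at primary strength,
     which ignores case and accent differences -->
<fieldType name="collatedICU" class="solr.ICUCollationField"
           locale="" strength="primary"/>
----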
`solr.ICUCollationField` and `solr.CollationField` fields can be created in two ways:
@ -487,7 +487,7 @@ The `lucene/analysis/opennlp` module provides OpenNLP integration via several an
NOTE: The <<OpenNLP Tokenizer>> must be used with all other OpenNLP analysis components, for two reasons: first, the OpenNLP Tokenizer detects and marks the sentence boundaries required by all the OpenNLP filters; and second, since the pre-trained OpenNLP models used by these filters were trained using the corresponding language-specific sentence-detection/tokenization models, the same tokenization, using the same models, must be used at runtime for optimal performance.
To use the OpenNLP components, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
To use the OpenNLP components, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
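
A hedged sketch of wiring in the tokenizer (the model file names are assumptions; pre-trained OpenNLP models must be present in the configset):

[source,xml]
----
<analyzer>
  <!-- sentenceModel and tokenizerModel reference binary OpenNLP model files -->
  <tokenizer class="solr.OpenNLPTokenizerFactory"
             sentenceModel="en-sent.bin"
             tokenizerModel="en-tokenizer.bin"/>
</analyzer>
----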
=== OpenNLP Tokenizer
@ -1033,7 +1033,7 @@ Solr can stem Catalan using the Snowball Porter Stemmer with an argument of `lan
=== Traditional Chinese
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
<<tokenizers.adoc#standard-tokenizer,Standard Tokenizer>> can also be used to tokenize Traditional Chinese text. Following the Word Break rules from the Unicode Text Segmentation algorithm, it produces one token per Chinese character. When combined with <<CJK Bigram Filter>>, overlapping bigrams of Chinese characters are formed.
@ -1105,9 +1105,9 @@ See the example under <<Traditional Chinese>>.
=== Simplified Chinese
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <<HMM Chinese Tokenizer>>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <<HMM Chinese Tokenizer>>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
Also useful for Chinese analysis:
@ -1162,7 +1162,7 @@ Also useful for Chinese analysis:
=== HMM Chinese Tokenizer
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
*Factory class:* `solr.HMMChineseTokenizerFactory`
@ -1960,7 +1960,7 @@ Example:
[[hebrew-lao-myanmar-khmer]]
=== Hebrew, Lao, Myanmar, Khmer
Lucene provides support, in addition to UAX#29 word break rules, for Hebrew's use of the double and single quote characters, and for segmenting Lao, Myanmar, and Khmer into syllables with the `solr.ICUTokenizerFactory` in the `analysis-extras` contrib module. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt for` instructions on which jars you need to add.
Lucene provides support, in addition to UAX#29 word break rules, for Hebrew's use of the double and single quote characters, and for segmenting Lao, Myanmar, and Khmer into syllables with the `solr.ICUTokenizerFactory` in the `analysis-extras` contrib module. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
See <<tokenizers.adoc#icu-tokenizer,the ICUTokenizer>> for more information.
@ -2167,7 +2167,7 @@ Solr includes support for normalizing Persian, and Lucene includes an example st
=== Polish
Solr provides support for Polish stemming with the `solr.StempelPolishStemFilterFactory`, and `solr.MorphologikFilterFactory` for lemmatization, in the `contrib/analysis-extras` module. The `solr.StempelPolishStemFilterFactory` component includes an algorithmic stemmer with tables for Polish. To use either of these filters, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
Solr provides support for Polish stemming with the `solr.StempelPolishStemFilterFactory`, and `solr.MorphologikFilterFactory` for lemmatization, in the `contrib/analysis-extras` module. The `solr.StempelPolishStemFilterFactory` component includes an algorithmic stemmer with tables for Polish. To use either of these filters, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
*Factory class:* `solr.StempelPolishStemFilterFactory` and `solr.MorfologikFilterFactory`
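
A sketch of the Stempel variant (the lower-case filter is a conventional companion here, not a requirement):

[source,xml]
----
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Algorithmic Polish stemming backed by the compiled Stempel tables -->
  <filter class="solr.StempelPolishStemFilterFactory"/>
</analyzer>
----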
@ -2684,7 +2684,7 @@ Solr includes support for stemming Turkish with the `solr.SnowballPorterFilterFa
=== Ukrainian
Solr provides support for Ukrainian lemmatization with the `solr.MorphologikFilterFactory`, in the `contrib/analysis-extras` module. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
Solr provides support for Ukrainian lemmatization with the `solr.MorphologikFilterFactory`, in the `contrib/analysis-extras` module. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.md` for instructions on which jars you need to add.
Lucene also includes an example Ukrainian stopword list, in the `lucene-analyzers-morfologik` jar.

View File

@ -916,7 +916,7 @@ You may get errors as it works through your documents. These might be caused by
DataImportHandler::
Solr includes a tool called the <<uploading-structured-data-store-data-with-the-data-import-handler.adoc#uploading-structured-data-store-data-with-the-data-import-handler,Data Import Handler (DIH)>> which can connect to databases (if you have a JDBC driver), mail servers, or other structured data sources. There are several examples included for feeds, GMail, and a small HSQL database.
+
The `README.txt` file in `example/example-DIH` will give you details on how to start working with this tool.
The `README.md` file in `example/example-DIH` will give you details on how to start working with this tool.
SolrJ::
SolrJ is a Java-based client for interacting with Solr. Use <<using-solrj.adoc#using-solrj,SolrJ>> for JVM-based languages or other <<client-apis.adoc#client-apis,Solr clients>> to programmatically create documents to send to Solr.

View File

@ -514,7 +514,7 @@ The default configuration for `solr.ICUTokenizerFactory` provides UAX#29 word br
[IMPORTANT]
====
To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.md` for information on which jars you need to add.
====
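
As a minimal sketch of the default setup (no arguments, giving the UAX#29 behavior with the customizations described above):

[source,xml]
----
<analyzer>
  <tokenizer class="solr.ICUTokenizerFactory"/>
</analyzer>
----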

View File

@ -111,7 +111,7 @@
</fileset>
<fileset dir=".">
<include name="lib/*" />
<include name="README.txt" />
<include name="README.md" />
</fileset>
</copy>
</target>