lucene/solr/solr-ref-guide/src/managed-resources.adoc

292 lines
14 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

= Managed Resources
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
Managed resources expose a REST API endpoint for performing Create-Read-Update-Delete (CRUD) operations on a Solr object.
Any long-lived Solr object that has configuration settings and/or data is a good candidate to be a managed resource. Managed resources complement other programmatically manageable components in Solr, such as the RESTful schema API to add fields to a managed schema.
Consider a Web-based UI that offers Solr-as-a-Service where users need to configure a set of stop words and synonym mappings as part of an initial setup process for their search application. This type of use case can easily be supported using the Managed Stop Filter & Managed Synonym Graph Filter Factories provided by Solr, via the Managed resources REST API.
Users can also write their own custom plugins, that leverage the same internal hooks to make additional resources REST managed.
All of the examples in this section assume you are running the "techproducts" Solr example:
[source,bash]
----
bin/solr -e techproducts
----
== Managed Resources Overview
Let's begin learning about managed resources by looking at a couple of examples provided by Solr for managing stop words and synonyms using a REST API. After reading this section, you'll be ready to dig into the details of how managed resources are implemented in Solr so you can start building your own implementation.
=== Managing Stop Words
To begin, you need to define a field type that uses the <<filter-descriptions.adoc#managed-stop-filter,ManagedStopFilterFactory>>, such as:
[source,xml,subs="verbatim,callouts"]
----
<fieldType name="managed_en" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ManagedStopFilterFactory" <!--1-->
managed="english" /> <!--2-->
</analyzer>
</fieldType>
----
There are two important things to notice about this field type definition:
<1> The filter implementation class is `solr.ManagedStopFilterFactory`. This is a special implementation of the <<filter-descriptions.adoc#stop-filter,StopFilterFactory>> that uses a set of stop words that are managed from a REST API.
<2> The `managed=”english”` attribute gives a name to the set of managed stop words, in this case indicating the stop words are for English text.
The REST endpoint for managing the English stop words in the techproducts collection is: `/solr/techproducts/schema/analysis/stopwords/english`.
The example resource path should be mostly self-explanatory. It should be noted that the ManagedStopFilterFactory implementation determines the `/schema/analysis/stopwords` part of the path, which makes sense because this is an analysis component defined by the schema.
It follows that a field type that uses the following filter:
[source,xml]
----
<filter class="solr.ManagedStopFilterFactory"
managed="french" />
----
would resolve to path: `/solr/techproducts/schema/analysis/stopwords/french`.
So now lets see this API in action, starting with a simple GET request:
[source,bash]
----
curl "http://localhost:8983/solr/techproducts/schema/analysis/stopwords/english"
----
Assuming you sent this request to Solr, the response body is a JSON document:
[source,json]
----
{
"responseHeader":{
"status":0,
"QTime":1
},
"wordSet":{
"initArgs":{"ignoreCase":true},
"initializedOn":"2014-03-28T20:53:53.058Z",
"managedList":[
"a",
"an",
"and",
"are",
]
}
}
----
The `sample_techproducts_configs` <<config-sets.adoc#config-sets,configset>> ships with a pre-built set of managed stop words, however you should only interact with this file using the API and not edit it directly.
One thing that should stand out to you in this response is that it contains a `managedList` of words as well as `initArgs`. This is an important concept in this framework -- managed resources typically have configuration and data. For stop words, the only configuration parameter is a boolean that determines whether to ignore the case of tokens during stop word filtering (ignoreCase=true|false). The data is a list of words, which is represented as a JSON array named `managedList` in the response.
Now, lets add a new word to the English stop word list using an HTTP PUT:
[source,bash]
----
curl -X PUT -H 'Content-type:application/json' --data-binary '["foo"]' "http://localhost:8983/solr/techproducts/schema/analysis/stopwords/english"
----
Here were using curl to PUT a JSON list containing a single word “foo” to the managed English stop words set. Solr will return 200 if the request was successful. You can also put multiple words in a single PUT request.
You can test to see if a specific word exists by sending a GET request for that word as a child resource of the set, such as:
[source,bash]
----
curl "http://localhost:8983/solr/techproducts/schema/analysis/stopwords/english/foo"
----
This request will return a status code of 200 if the child resource (foo) exists or 404 if it does not exist the managed list.
To delete a stop word, you would do:
[source,bash]
----
curl -X DELETE "http://localhost:8983/solr/techproducts/schema/analysis/stopwords/english/foo"
----
NOTE: PUT/POST is used to add terms to an existing list instead of replacing the list entirely. This is because it is more common to add a term to an existing list than it is to replace a list altogether, so the API favors the more common approach of incrementally adding terms especially since deleting individual terms is also supported.
=== Managing Synonyms
For the most part, the API for managing synonyms behaves similar to the API for stop words, except instead of working with a list of words, it uses a map, where the value for each entry in the map is a set of synonyms for a term. As with stop words, the `sample_techproducts_configs` <<config-sets.adoc#config-sets,configset>> includes a pre-built set of synonym mappings suitable for the sample data that is activated by the following field type definition in `schema.xml`:
[source,xml]
----
<fieldType name="managed_en" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ManagedStopFilterFactory" managed="english" />
<filter class="solr.ManagedSynonymGraphFilterFactory" managed="english" />
<filter class="solr.FlattenGraphFilterFactory"/> <!-- required on index analyzers after graph filters -->
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ManagedStopFilterFactory" managed="english" />
<filter class="solr.ManagedSynonymGraphFilterFactory" managed="english" />
</analyzer>
</fieldType>
----
To get the map of managed synonyms, send a GET request to:
[source,bash]
----
curl "http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english"
----
This request will return a response that looks like:
[source,json]
----
{
"responseHeader":{
"status":0,
"QTime":3},
"synonymMappings":{
"initArgs":{
"ignoreCase":true,
"format":"solr"},
"initializedOn":"2014-12-16T22:44:05.33Z",
"managedMap":{
"GB":
["GiB",
"Gigabyte"],
"TV":
["Television"],
"happy":
["glad",
"joyful"]}}}
----
Managed synonyms are returned under the `managedMap` property which contains a JSON Map where the value of each entry is a set of synonyms for a term, such as "happy" has synonyms "glad" and "joyful" in the example above.
To add a new synonym mapping, you can PUT/POST a single mapping such as:
[source,bash]
----
curl -X PUT -H 'Content-type:application/json' --data-binary '{"mad":["angry","upset"]}' "http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english"
----
The API will return status code 200 if the PUT request was successful. To determine the synonyms for a specific term, you send a GET request for the child resource, such as `/schema/analysis/synonyms/english/mad` would return `["angry","upset"]`.
You can also PUT a list of symmetric synonyms, which will be expanded into a mapping for each term in the list. For example, you could PUT the following list of symmetric synonyms using the JSON list syntax instead of a map:
[source,bash]
----
curl -X PUT -H 'Content-type:application/json' --data-binary '["funny", "entertaining", "whimiscal", "jocular"]' "http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english"
----
Note that the expansion is performed when processing the PUT request so the underlying persistent state is still a managed map. Consequently, if after sending the previous PUT request, you did a GET for `/schema/analysis/synonyms/english/jocular`, then you would receive a list containing `["funny", "entertaining", "whimiscal"]`. Once you've created synonym mappings using a list, each term must be managed separately.
Lastly, you can delete a mapping by sending a DELETE request to the managed endpoint.
== Applying Managed Resource Changes
Changes made to managed resources via this REST API are not applied to the active Solr components until the Solr collection (or Solr core in single server mode) is reloaded.
For example: after adding or deleting a stop word, you must reload the core/collection before changes become active; related APIs: <<coreadmin-api.adoc#coreadmin-api,CoreAdmin API>> and <<collections-api.adoc#collections-api,Collections API>>.
This approach is required when running in distributed mode so that we are assured changes are applied to all cores in a collection at the same time so that behavior is consistent and predictable. It goes without saying that you dont want one of your replicas working with a different set of stop words or synonyms than the others.
One subtle outcome of this _apply-changes-at-reload_ approach is that the once you make changes with the API, there is no way to read the active data. In other words, the API returns the most up-to-date data from an API perspective, which could be different than what is currently being used by Solr components.
However, the intent of this API implementation is that changes will be applied using a reload within a short time frame after making them so the time in which the data returned by the API differs from what is active in the server is intended to be negligible.
[IMPORTANT]
====
Changing things like stop words and synonym mappings typically require re-indexing existing documents if being used by index-time analyzers. The RestManager framework does not guard you from this, it simply makes it possible to programmatically build up a set of stop words, synonyms etc.
====
== RestManager Endpoint
Metadata about registered ManagedResources is available using the `/schema/managed` endpoint for each collection.
Assuming you have the `managed_en` field type shown above defined in your `schema.xml`, sending a GET request to the following resource will return metadata about which schema-related resources are being managed by the RestManager:
[source,bash]
----
curl "http://localhost:8983/solr/techproducts/schema/managed"
----
The response body is a JSON document containing metadata about managed resources under the /schema root:
[source,json]
----
{
"responseHeader":{
"status":0,
"QTime":3
},
"managedResources":[
{
"resourceId":"/schema/analysis/stopwords/english",
"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource",
"numObservers":"1"
},
{
"resourceId":"/schema/analysis/synonyms/english",
"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager",
"numObservers":"1"
}
]
}
----
You can also create new managed resource using PUT/POST to the appropriate URL before ever configuring anything that uses these resources.
For example, imagine we want to build up a set of German stop words. Before we can start adding stop words, we need to create the endpoint:
`/solr/techproducts/schema/analysis/stopwords/german`
To create this endpoint, send the following PUT/POST request to the endpoint we wish to create:
[source,bash]
----
curl -X PUT -H 'Content-type:application/json' --data-binary \
'{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
"http://localhost:8983/solr/techproducts/schema/analysis/stopwords/german"
----
Solr will respond with status code 200 if the request is successful. Effectively, this action registers a new endpoint for a managed resource in the RestManager. From here you can start adding German stop words as we saw above:
[source,bash]
----
curl -X PUT -H 'Content-type:application/json' --data-binary '["die"]' \
"http://localhost:8983/solr/techproducts/schema/analysis/stopwords/german"
----
For most users, creating resources in this way should never be necessary, since managed resources are created automatically when configured.
However, You may want to explicitly delete managed resources if they are no longer being used by a Solr component.
For instance, the managed resource for German that we created above can be deleted because there are no Solr components that are using it, whereas the managed resource for English stop words cannot be deleted because there is a token filter declared in `schema.xml` that is using it.
[source,bash]
----
curl -X DELETE "http://localhost:8983/solr/techproducts/schema/analysis/stopwords/german"
----