= Other Schema Elements
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

This section describes several other important elements of `schema.xml` not covered in earlier sections.

== Unique Key

The `uniqueKey` element specifies which field is a unique identifier for documents. Although `uniqueKey` is not required, it is nearly always warranted by your application design. For example, `uniqueKey` should be used if you will ever update a document in the index.

You can define the unique key field by naming it:

[source,xml]
----
<uniqueKey>id</uniqueKey>
----
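
The name inside `<uniqueKey>` refers to a field declared elsewhere in the same schema. As a minimal sketch, assuming the conventional `id` field and a `string` field type backed by `solr.StrField` (both names are illustrative, not mandated), the matching declarations might look like this:

[source,xml]
----
<!-- Assumed names: "string" and "id" are conventional, not required -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<!-- Non-analyzed, single-valued field that the uniqueKey element points at -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
----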

Schema defaults and `copyFields` cannot be used to populate the `uniqueKey` field. The `fieldType` of `uniqueKey` must not be analyzed and must not be any of the `*PointField` types. You can use `UUIDUpdateProcessorFactory` to have `uniqueKey` values generated automatically.
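
As an illustration of automatic generation, the sketch below adds `UUIDUpdateProcessorFactory` to an update request processor chain in `solrconfig.xml`. The chain name `uuid` and the target field `id` are assumptions made for this example, and the chain still has to be selected for update requests, which is not shown here.

[source,xml]
----
<updateRequestProcessorChain name="uuid">
  <!-- Adds a generated UUID to "id" when the incoming document has no value for it -->
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
----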

Further, operations that rely on `uniqueKey` will fail if the field it names is multivalued, whether the field is declared that way or inherits `multiValued` from its `fieldType`. `uniqueKey` continues to work as long as the field is single-valued.

== Similarity

Similarity is a Lucene class used to score documents during a search.

Each collection has one "global" Similarity. By default, Solr uses an implicit {solr-javadocs}/solr-core/org/apache/solr/search/similarities/SchemaSimilarityFactory.html[`SchemaSimilarityFactory`], which allows individual field types to be configured with a "per-type" Similarity and implicitly uses `BM25Similarity` for any field type that does not declare one.

This default behavior can be overridden by declaring a top-level `<similarity/>` element in your `schema.xml`, outside of any single field type. This similarity declaration can either refer directly to the name of a class with a no-argument constructor, such as in this example showing `BM25Similarity`:

[source,xml]
----
<similarity class="solr.BM25SimilarityFactory"/>
----

or it can refer to a `SimilarityFactory` implementation, which may take optional initialization parameters:

[source,xml]
----
<similarity class="solr.DFRSimilarityFactory">
  <str name="basicModel">P</str>
  <str name="afterEffect">L</str>
  <str name="normalization">H2</str>
  <float name="c">7</float>
</similarity>
----

In most cases, specifying a global Similarity like this will cause an error if your `schema.xml` also includes field type-specific `<similarity/>` declarations. One key exception is that you may explicitly declare a {solr-javadocs}/solr-core/org/apache/solr/search/similarities/SchemaSimilarityFactory.html[`SchemaSimilarityFactory`] and use `defaultSimFromFieldType` to name a field type that _is_ configured with a specific Similarity. That field type's Similarity then becomes the default for every field type that does not declare one explicitly:

[source,xml]
----
<similarity class="solr.SchemaSimilarityFactory">
  <str name="defaultSimFromFieldType">text_dfr</str>
</similarity>
<fieldType name="text_dfr" class="solr.TextField">
  <analyzer ... />
  <similarity class="solr.DFRSimilarityFactory">
    <str name="basicModel">I(F)</str>
    <str name="afterEffect">B</str>
    <str name="normalization">H3</str>
    <float name="mu">900</float>
  </similarity>
</fieldType>
<fieldType name="text_ib" class="solr.TextField">
  <analyzer ... />
  <similarity class="solr.IBSimilarityFactory">
    <str name="distribution">SPL</str>
    <str name="lambda">DF</str>
    <str name="normalization">H2</str>
  </similarity>
</fieldType>
<fieldType name="text_other" class="solr.TextField">
  <analyzer ... />
</fieldType>
----

In the example above, `IBSimilarityFactory` (using the Information-Based model) will be used for any fields of type `text_ib`, while `DFRSimilarityFactory` (divergence from randomness) will be used for any fields of type `text_dfr`, as well as any fields using a type that does not explicitly specify a `<similarity/>`, such as `text_other`.

If `SchemaSimilarityFactory` is explicitly declared without configuring a `defaultSimFromFieldType`, then `BM25Similarity` is implicitly used as the default for `luceneMatchVersion >= 8.0.0`. For earlier values, `LegacyBM25Similarity` is used instead, to mimic the BM25 formula that was the default in those versions.
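
For comparison with the earlier example, an explicit declaration that leaves out `defaultSimFromFieldType` can be as short as the sketch below. Field types that carry their own `<similarity/>` keep it, and the rest fall back to the BM25 default described above.

[source,xml]
----
<!-- No defaultSimFromFieldType: field types without their own <similarity/> get the BM25 default -->
<similarity class="solr.SchemaSimilarityFactory"/>
----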

In addition to the factories mentioned on this page, several other similarity implementations are available, such as `SweetSpotSimilarityFactory`, `ClassicSimilarityFactory`, and `LegacyBM25SimilarityFactory`. For details, see the Solr Javadocs for the {solr-javadocs}/solr-core/org/apache/solr/schema/SimilarityFactory.html[similarity factories].
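
As a rough sketch of how such factories plug into field types, the fragment below assumes the global Similarity is `SchemaSimilarityFactory` (implicit or explicit). The field type names `text_classic` and `text_bm25_tuned` are hypothetical, and the `k1` and `b` values are the usual BM25 defaults, included only to show the parameter syntax.

[source,xml]
----
<!-- Hypothetical field type names; assumes SchemaSimilarityFactory is the global Similarity -->
<fieldType name="text_classic" class="solr.TextField">
  <analyzer ... />
  <similarity class="solr.ClassicSimilarityFactory"/>
</fieldType>
<fieldType name="text_bm25_tuned" class="solr.TextField">
  <analyzer ... />
  <!-- k1 and b are optional initialization parameters of BM25SimilarityFactory -->
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.75</float>
  </similarity>
</fieldType>
----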