Expose Lucene's FeatureField. (#30618)
Lucene has a new `FeatureField` that records numeric features as term frequencies. Its main benefit is that it allows boosting queries with the values of these features while efficiently skipping non-competitive documents using block-max WAND and indexed impacts.
This commit is contained in:
parent 739bb4f0ec
commit 886db84ad2
@@ -40,6 +40,8 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>

<<parent-join>>:: Defines parent/child relation for documents within the same index

<<feature>>:: Record numeric features to boost hits at query time.

[float]
=== Multi-fields

@@ -86,6 +88,6 @@ include::types/percolator.asciidoc[]

include::types/parent-join.asciidoc[]

include::types/feature.asciidoc[]
@@ -0,0 +1,59 @@
[[feature]]
=== Feature datatype

A `feature` field can index numbers so that they can later be used to boost
documents in queries with a <<query-dsl-feature-query,`feature`>> query.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "pagerank": {
          "type": "feature" <1>
        },
        "url_length": {
          "type": "feature",
          "positive_score_impact": false <2>
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "pagerank": 8,
  "url_length": 22
}

GET my_index/_search
{
  "query": {
    "feature": {
      "field": "pagerank"
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> Feature fields must use the `feature` field type
<2> Features that correlate negatively with the score need to declare it

NOTE: `feature` fields only support single-valued fields and strictly positive
values. Multi-valued fields and negative values will be rejected.

NOTE: `feature` fields do not support querying, sorting or aggregating. They may
only be used within <<query-dsl-feature-query,`feature`>> queries.

NOTE: `feature` fields only preserve 9 significant bits for the precision, which
translates to a relative error of about 0.4%.

Features that correlate negatively with the score should set
`positive_score_impact` to `false` (defaults to `true`). This will be used by
the <<query-dsl-feature-query,`feature`>> query to modify the scoring formula
in such a way that the score decreases with the value of the feature instead of
increasing. For instance in web search, the url length is a commonly used
feature which correlates negatively with scores.
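The effect of `positive_score_impact: false` can be sketched in a few lines of Python. This mirrors the `value = 1 / value` step in the mapper code later in this commit; `indexed_feature_value` is a made-up helper name for illustration, not an Elasticsearch API:

```python
def indexed_feature_value(value: float, positive_score_impact: bool = True) -> float:
    """Sketch of how a feature value is stored: features that correlate
    negatively with the score are indexed as their reciprocal, so the
    scoring functions can keep assuming that larger values score higher."""
    if value <= 0:
        raise ValueError("feature values must be strictly positive")
    return value if positive_score_impact else 1 / value

# A longer URL yields a smaller indexed value, hence a lower score.
short_url = indexed_feature_value(22, positive_score_impact=False)
long_url = indexed_feature_value(50, positive_score_impact=False)
assert short_url > long_url
```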

@@ -0,0 +1,181 @@
[[query-dsl-feature-query]]
=== Feature Query

The `feature` query is a specialized query that only works on
<<feature,`feature`>> fields. Its goal is to boost the score of documents based
on the values of numeric features. It is typically put in a `should` clause of
a <<query-dsl-bool-query,`bool`>> query so that its score is added to the score
of the query.

Compared to using <<query-dsl-function-score-query,`function_score`>> or other
ways to modify the score, this query has the benefit of being able to
efficiently skip non-competitive hits when
<<search-uri-request,`track_total_hits`>> is set to `false`. Speedups may be
spectacular.

Here is an example:

[source,js]
--------------------------------------------------
PUT test
{
  "mappings": {
    "_doc": {
      "properties": {
        "pagerank": {
          "type": "feature"
        },
        "url_length": {
          "type": "feature",
          "positive_score_impact": false
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "pagerank": 10,
  "url_length": 50
}

PUT test/_doc/2
{
  "pagerank": 100,
  "url_length": 20
}

POST test/_refresh

GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank"
    }
  }
}

GET test/_search
{
  "query": {
    "feature": {
      "field": "url_length"
    }
  }
}
--------------------------------------------------
// CONSOLE

[float]
=== Supported functions

The `feature` query supports three functions for boosting scores with the
values of features. If you do not know where to start, we recommend that you
start with the `saturation` function, which is the default when no function is
provided.

[float]
==== Saturation

This function gives a score equal to `S / (S + pivot)`, where `S` is the value
of the feature and `pivot` is a configurable pivot value, so that the result is
less than +0.5+ if `S` is less than the pivot and greater than +0.5+ otherwise.
Scores are always in +(0, 1)+.

If the feature has a negative score impact then the function will be computed as
`pivot / (S + pivot)`, which decreases when `S` increases.
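Both variants of the formula can be checked with a small Python sketch. This illustrates the math only, not the Lucene implementation; `saturation` is a hypothetical helper name:

```python
def saturation(S: float, pivot: float, positive_score_impact: bool = True) -> float:
    """S / (S + pivot) for positive-impact features, pivot / (S + pivot)
    otherwise. The score crosses 0.5 exactly at S == pivot and always
    stays within (0, 1)."""
    if positive_score_impact:
        return S / (S + pivot)
    return pivot / (S + pivot)

print(saturation(8, 8))   # 0.5: feature value equal to the pivot
print(saturation(80, 8))  # large values saturate towards 1
```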

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

If +pivot+ is not supplied then Elasticsearch will compute a default value that
will be approximately equal to the geometric mean of all feature values that
exist in the index. We recommend this if you haven't had the opportunity to
train a good pivot value.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[float]
==== Logarithm

This function gives a score that is equal to `log(scaling_factor + S)` where
`S` is the value of the feature and `scaling_factor` is a configurable scaling
factor. Scores are unbounded.

This function only supports features that have a positive score impact.
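A quick sketch of the formula (math only, not the Lucene scorer; `log_score` is a made-up name) shows why negative-impact features are rejected: the score grows without bound as `S` grows, and small reciprocal values would yield negative scores:

```python
import math

def log_score(S: float, scaling_factor: float) -> float:
    """log(scaling_factor + S): unbounded and monotonically increasing,
    which is why it is restricted to positive-impact features."""
    return math.log(scaling_factor + S)

print(log_score(8, 4))  # log(12), roughly 2.48
```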

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[float]
==== Sigmoid

This function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. Like for the
`saturation` function, `pivot` is the value of `S` that gives a score of +0.5+
and scores are in +(0, 1)+.

The `exponent` must be positive and is typically in +[0.5, 1]+. A good value
should be computed via training. If you don't have the opportunity to do so, we
recommend that you stick to the `saturation` function instead.
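The formula can again be sketched in Python (illustration only; `sigmoid` here is a hypothetical helper, not the Lucene implementation). Note how an exponent of `1` reduces it to the saturation function:

```python
def sigmoid(S: float, pivot: float, exponent: float) -> float:
    """S^exp / (S^exp + pivot^exp); with exponent == 1 this reduces to
    the saturation function, and S == pivot still scores exactly 0.5."""
    s = S ** exponent
    return s / (s + pivot ** exponent)

print(sigmoid(7, 7, 0.6))  # 0.5, regardless of the exponent
```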

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
@@ -19,6 +19,11 @@ This query allows a script to act as a filter. Also see the

This query finds queries that are stored as documents that match with
the specified document.

<<query-dsl-feature-query,`feature` query>>::

A query that computes scores based on the values of numeric features and is
able to efficiently skip non-competitive hits.

<<query-dsl-wrapper-query,`wrapper` query>>::

A query that accepts other queries as a JSON or YAML string.

@@ -29,4 +34,6 @@ include::script-query.asciidoc[]

include::percolate-query.asciidoc[]

include::feature-query.asciidoc[]

include::wrapper-query.asciidoc[]
@@ -0,0 +1,248 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.index.mapper;

import org.apache.lucene.document.FeatureField;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.elasticsearch.common.lucene.Lucene;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentParser.Token;
import org.elasticsearch.common.xcontent.support.XContentMapValues;
import org.elasticsearch.index.fielddata.IndexFieldData;
import org.elasticsearch.index.fielddata.plain.DocValuesIndexFieldData;
import org.elasticsearch.index.query.QueryShardContext;

import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/**
 * A {@link FieldMapper} that exposes Lucene's {@link FeatureField}.
 */
public class FeatureFieldMapper extends FieldMapper {

    public static final String CONTENT_TYPE = "feature";

    public static class Defaults {
        public static final MappedFieldType FIELD_TYPE = new FeatureFieldType();

        static {
            FIELD_TYPE.setTokenized(false);
            FIELD_TYPE.setIndexOptions(IndexOptions.NONE);
            FIELD_TYPE.setHasDocValues(false);
            FIELD_TYPE.setOmitNorms(true);
            FIELD_TYPE.freeze();
        }
    }

    public static class Builder extends FieldMapper.Builder<Builder, FeatureFieldMapper> {

        public Builder(String name) {
            super(name, Defaults.FIELD_TYPE, Defaults.FIELD_TYPE);
            builder = this;
        }

        @Override
        public FeatureFieldType fieldType() {
            return (FeatureFieldType) super.fieldType();
        }

        public Builder positiveScoreImpact(boolean v) {
            fieldType().setPositiveScoreImpact(v);
            return builder;
        }

        @Override
        public FeatureFieldMapper build(BuilderContext context) {
            setupFieldType(context);
            return new FeatureFieldMapper(
                    name, fieldType, defaultFieldType,
                    context.indexSettings(), multiFieldsBuilder.build(this, context), copyTo);
        }
    }

    public static class TypeParser implements Mapper.TypeParser {
        @Override
        public Mapper.Builder<?,?> parse(String name, Map<String, Object> node, ParserContext parserContext) throws MapperParsingException {
            FeatureFieldMapper.Builder builder = new FeatureFieldMapper.Builder(name);
            for (Iterator<Map.Entry<String, Object>> iterator = node.entrySet().iterator(); iterator.hasNext();) {
                Map.Entry<String, Object> entry = iterator.next();
                String propName = entry.getKey();
                Object propNode = entry.getValue();
                if (propName.equals("positive_score_impact")) {
                    builder.positiveScoreImpact(XContentMapValues.nodeBooleanValue(propNode));
                    iterator.remove();
                }
            }
            return builder;
        }
    }

    public static final class FeatureFieldType extends MappedFieldType {

        private boolean positiveScoreImpact = true;

        public FeatureFieldType() {
            setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
            setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
        }

        protected FeatureFieldType(FeatureFieldType ref) {
            super(ref);
            this.positiveScoreImpact = ref.positiveScoreImpact;
        }

        public FeatureFieldType clone() {
            return new FeatureFieldType(this);
        }

        @Override
        public boolean equals(Object o) {
            if (super.equals(o) == false) {
                return false;
            }
            FeatureFieldType other = (FeatureFieldType) o;
            return Objects.equals(positiveScoreImpact, other.positiveScoreImpact);
        }

        @Override
        public int hashCode() {
            int h = super.hashCode();
            h = 31 * h + Objects.hashCode(positiveScoreImpact);
            return h;
        }

        @Override
        public void checkCompatibility(MappedFieldType other, List<String> conflicts) {
            super.checkCompatibility(other, conflicts);
            if (positiveScoreImpact != ((FeatureFieldType) other).positiveScoreImpact()) {
                conflicts.add("mapper [" + name() + "] has different [positive_score_impact] values");
            }
        }

        @Override
        public String typeName() {
            return CONTENT_TYPE;
        }

        public boolean positiveScoreImpact() {
            return positiveScoreImpact;
        }

        public void setPositiveScoreImpact(boolean positiveScoreImpact) {
            checkIfFrozen();
            this.positiveScoreImpact = positiveScoreImpact;
        }

        @Override
        public Query existsQuery(QueryShardContext context) {
            return new TermQuery(new Term("_feature", name()));
        }

        @Override
        public Query nullValueQuery() {
            if (nullValue() == null) {
                return null;
            }
            return termQuery(nullValue(), null);
        }

        @Override
        public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName) {
            failIfNoDocValues();
            return new DocValuesIndexFieldData.Builder();
        }

        @Override
        public Query termQuery(Object value, QueryShardContext context) {
            throw new UnsupportedOperationException("Queries on [feature] fields are not supported");
        }
    }

    private FeatureFieldMapper(String simpleName, MappedFieldType fieldType, MappedFieldType defaultFieldType,
                               Settings indexSettings, MultiFields multiFields, CopyTo copyTo) {
        super(simpleName, fieldType, defaultFieldType, indexSettings, multiFields, copyTo);
        assert fieldType.indexOptions().compareTo(IndexOptions.DOCS_AND_FREQS) <= 0;
    }

    @Override
    protected FeatureFieldMapper clone() {
        return (FeatureFieldMapper) super.clone();
    }

    @Override
    public FeatureFieldType fieldType() {
        return (FeatureFieldType) super.fieldType();
    }

    @Override
    protected void parseCreateField(ParseContext context, List<IndexableField> fields) throws IOException {
        float value;
        if (context.externalValueSet()) {
            Object v = context.externalValue();
            if (v instanceof Number) {
                value = ((Number) v).floatValue();
            } else {
                value = Float.parseFloat(v.toString());
            }
        } else if (context.parser().currentToken() == Token.VALUE_NULL) {
            // skip
            return;
        } else {
            value = context.parser().floatValue();
        }

        if (context.doc().getByKey(name()) != null) {
            throw new IllegalArgumentException("[feature] fields do not support indexing multiple values for the same field [" + name() +
                    "] in the same document");
        }

        if (fieldType().positiveScoreImpact() == false) {
            value = 1 / value;
        }

        context.doc().addWithKey(name(), new FeatureField("_feature", name(), value));
    }

    @Override
    protected String contentType() {
        return CONTENT_TYPE;
    }

    @Override
    protected void doXContentBody(XContentBuilder builder, boolean includeDefaults, Params params) throws IOException {
        super.doXContentBody(builder, includeDefaults, params);

        if (includeDefaults || fieldType().nullValue() != null) {
            builder.field("null_value", fieldType().nullValue());
        }

        if (includeDefaults || fieldType().positiveScoreImpact() == false) {
            builder.field("positive_score_impact", fieldType().positiveScoreImpact());
        }
    }
}

@@ -0,0 +1,151 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.index.mapper;

import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.search.Query;
import org.elasticsearch.common.lucene.Lucene;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.index.query.QueryShardContext;

import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * This meta field only exists because feature fields index everything into a
 * common _feature field and Elasticsearch has a custom codec that complains
 * when fields exist in the index and not in mappings.
 */
public class FeatureMetaFieldMapper extends MetadataFieldMapper {

    public static final String NAME = "_feature";

    public static final String CONTENT_TYPE = "_feature";

    public static class Defaults {
        public static final MappedFieldType FIELD_TYPE = new FeatureMetaFieldType();

        static {
            FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
            FIELD_TYPE.setTokenized(true);
            FIELD_TYPE.setStored(false);
            FIELD_TYPE.setOmitNorms(true);
            FIELD_TYPE.setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
            FIELD_TYPE.setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
            FIELD_TYPE.setName(NAME);
            FIELD_TYPE.freeze();
        }
    }

    public static class Builder extends MetadataFieldMapper.Builder<Builder, FeatureMetaFieldMapper> {

        public Builder(MappedFieldType existing) {
            super(NAME, existing == null ? Defaults.FIELD_TYPE : existing, Defaults.FIELD_TYPE);
        }

        @Override
        public FeatureMetaFieldMapper build(BuilderContext context) {
            setupFieldType(context);
            return new FeatureMetaFieldMapper(fieldType, context.indexSettings());
        }
    }

    public static class TypeParser implements MetadataFieldMapper.TypeParser {
        @Override
        public MetadataFieldMapper.Builder<?,?> parse(String name,
                Map<String, Object> node, ParserContext parserContext) throws MapperParsingException {
            return new Builder(parserContext.mapperService().fullName(NAME));
        }

        @Override
        public MetadataFieldMapper getDefault(MappedFieldType fieldType, ParserContext context) {
            final Settings indexSettings = context.mapperService().getIndexSettings().getSettings();
            if (fieldType != null) {
                return new FeatureMetaFieldMapper(indexSettings, fieldType);
            } else {
                return parse(NAME, Collections.emptyMap(), context)
                        .build(new BuilderContext(indexSettings, new ContentPath(1)));
            }
        }
    }

    public static final class FeatureMetaFieldType extends MappedFieldType {

        public FeatureMetaFieldType() {
        }

        protected FeatureMetaFieldType(FeatureMetaFieldType ref) {
            super(ref);
        }

        @Override
        public FeatureMetaFieldType clone() {
            return new FeatureMetaFieldType(this);
        }

        @Override
        public String typeName() {
            return CONTENT_TYPE;
        }

        @Override
        public Query existsQuery(QueryShardContext context) {
            throw new UnsupportedOperationException("Cannot run exists query on [_feature]");
        }

        @Override
        public Query termQuery(Object value, QueryShardContext context) {
            throw new UnsupportedOperationException("The [_feature] field may not be queried directly");
        }
    }

    private FeatureMetaFieldMapper(Settings indexSettings, MappedFieldType existing) {
        this(existing.clone(), indexSettings);
    }

    private FeatureMetaFieldMapper(MappedFieldType fieldType, Settings indexSettings) {
        super(NAME, fieldType, Defaults.FIELD_TYPE, indexSettings);
    }

    @Override
    public void preParse(ParseContext context) throws IOException {}

    @Override
    protected void parseCreateField(ParseContext context, List<IndexableField> fields) throws IOException {
        throw new AssertionError("Should never be called");
    }

    @Override
    public void postParse(ParseContext context) throws IOException {}

    @Override
    protected String contentType() {
        return CONTENT_TYPE;
    }

    @Override
    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
        return builder;
    }
}

@@ -19,21 +19,37 @@

package org.elasticsearch.index.mapper;

import org.elasticsearch.index.mapper.MetadataFieldMapper.TypeParser;
import org.elasticsearch.index.query.FeatureQueryBuilder;
import org.elasticsearch.plugins.MapperPlugin;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.plugins.SearchPlugin;

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

-public class MapperExtrasPlugin extends Plugin implements MapperPlugin {
+public class MapperExtrasPlugin extends Plugin implements MapperPlugin, SearchPlugin {

    @Override
    public Map<String, Mapper.TypeParser> getMappers() {
        Map<String, Mapper.TypeParser> mappers = new LinkedHashMap<>();
        mappers.put(ScaledFloatFieldMapper.CONTENT_TYPE, new ScaledFloatFieldMapper.TypeParser());
        mappers.put(TokenCountFieldMapper.CONTENT_TYPE, new TokenCountFieldMapper.TypeParser());
        mappers.put(FeatureFieldMapper.CONTENT_TYPE, new FeatureFieldMapper.TypeParser());
        return Collections.unmodifiableMap(mappers);
    }

    @Override
    public Map<String, TypeParser> getMetadataMappers() {
        return Collections.singletonMap(FeatureMetaFieldMapper.CONTENT_TYPE, new FeatureMetaFieldMapper.TypeParser());
    }

    @Override
    public List<QuerySpec<?>> getQueries() {
        return Collections.singletonList(
            new QuerySpec<>(FeatureQueryBuilder.NAME, FeatureQueryBuilder::new, p -> FeatureQueryBuilder.PARSER.parse(p, null)));
    }

}
@@ -0,0 +1,354 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.index.query;

import org.apache.lucene.document.FeatureField;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;
import org.elasticsearch.common.ParseField;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.xcontent.ConstructingObjectParser;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.index.mapper.FeatureFieldMapper.FeatureFieldType;
import org.elasticsearch.index.mapper.MappedFieldType;

import java.io.IOException;
import java.util.Arrays;
import java.util.Objects;

/**
 * Query to run on a [feature] field.
 */
public final class FeatureQueryBuilder extends AbstractQueryBuilder<FeatureQueryBuilder> {

    /**
     * Scoring function for a [feature] field.
     */
    public abstract static class ScoreFunction {

        private ScoreFunction() {} // prevent extensions by users

        abstract void writeTo(StreamOutput out) throws IOException;

        abstract Query toQuery(String feature, boolean positiveScoreImpact) throws IOException;

        abstract void doXContent(XContentBuilder builder) throws IOException;

        /**
         * A scoring function that scores documents as {@code Math.log(scalingFactor + S)}
         * where S is the value of the static feature.
         */
        public static class Log extends ScoreFunction {

            private static final ConstructingObjectParser<Log, Void> PARSER = new ConstructingObjectParser<>(
                    "log", a -> new Log((Float) a[0]));
            static {
                PARSER.declareFloat(ConstructingObjectParser.constructorArg(), new ParseField("scaling_factor"));
            }

            private final float scalingFactor;

            public Log(float scalingFactor) {
                this.scalingFactor = scalingFactor;
            }

            private Log(StreamInput in) throws IOException {
                this(in.readFloat());
            }

            @Override
            public boolean equals(Object obj) {
                if (obj == null || obj.getClass() != getClass()) {
                    return false;
                }
                Log that = (Log) obj;
                return scalingFactor == that.scalingFactor;
            }

            @Override
            public int hashCode() {
                return Float.hashCode(scalingFactor);
            }

            @Override
            void writeTo(StreamOutput out) throws IOException {
                out.writeByte((byte) 0);
                out.writeFloat(scalingFactor);
            }

            @Override
            void doXContent(XContentBuilder builder) throws IOException {
                builder.startObject("log");
                builder.field("scaling_factor", scalingFactor);
                builder.endObject();
            }

            @Override
            Query toQuery(String feature, boolean positiveScoreImpact) throws IOException {
                if (positiveScoreImpact == false) {
                    throw new IllegalArgumentException("Cannot use the [log] function with a field that has a negative score impact as " +
                            "it would trigger negative scores");
                }
                return FeatureField.newLogQuery("_feature", feature, DEFAULT_BOOST, scalingFactor);
            }
        }

        /**
         * A scoring function that scores documents as {@code S / (S + pivot)} where S is
         * the value of the static feature.
         */
        public static class Saturation extends ScoreFunction {

            private static final ConstructingObjectParser<Saturation, Void> PARSER = new ConstructingObjectParser<>(
                    "saturation", a -> new Saturation((Float) a[0]));
            static {
                PARSER.declareFloat(ConstructingObjectParser.optionalConstructorArg(), new ParseField("pivot"));
            }

            private final Float pivot;

            /** Constructor with a default pivot, computed as the geometric average of
             *  all feature values in the index. */
            public Saturation() {
                this((Float) null);
            }

            public Saturation(float pivot) {
                this(Float.valueOf(pivot));
            }

            private Saturation(Float pivot) {
                this.pivot = pivot;
            }

            private Saturation(StreamInput in) throws IOException {
                this(in.readOptionalFloat());
            }

            @Override
            public boolean equals(Object obj) {
                if (obj == null || obj.getClass() != getClass()) {
                    return false;
                }
                Saturation that = (Saturation) obj;
                return Objects.equals(pivot, that.pivot);
            }

            @Override
            public int hashCode() {
                return Objects.hashCode(pivot);
            }

            @Override
            void writeTo(StreamOutput out) throws IOException {
                out.writeByte((byte) 1);
                out.writeOptionalFloat(pivot);
            }

            @Override
            void doXContent(XContentBuilder builder) throws IOException {
                builder.startObject("saturation");
                if (pivot != null) {
                    builder.field("pivot", pivot);
                }
                builder.endObject();
            }

            @Override
            Query toQuery(String feature, boolean positiveScoreImpact) throws IOException {
                if (pivot == null) {
                    return FeatureField.newSaturationQuery("_feature", feature);
                } else {
                    return FeatureField.newSaturationQuery("_feature", feature, DEFAULT_BOOST, pivot);
                }
            }
        }

        /**
         * A scoring function that scores documents as {@code S^exp / (S^exp + pivot^exp)}
         * where S is the value of the static feature.
         */
        public static class Sigmoid extends ScoreFunction {

            private static final ConstructingObjectParser<Sigmoid, Void> PARSER = new ConstructingObjectParser<>(
                    "sigmoid", a -> new Sigmoid((Float) a[0], ((Float) a[1]).floatValue()));
            static {
                PARSER.declareFloat(ConstructingObjectParser.constructorArg(), new ParseField("pivot"));
                PARSER.declareFloat(ConstructingObjectParser.constructorArg(), new ParseField("exponent"));
            }

            private final float pivot;
            private final float exp;

            public Sigmoid(float pivot, float exp) {
                this.pivot = pivot;
                this.exp = exp;
            }

            private Sigmoid(StreamInput in) throws IOException {
                this(in.readFloat(), in.readFloat());
            }

            @Override
            public boolean equals(Object obj) {
                if (obj == null || obj.getClass() != getClass()) {
                    return false;
                }
                Sigmoid that = (Sigmoid) obj;
                return pivot == that.pivot
                        && exp == that.exp;
            }

            @Override
            public int hashCode() {
                return Objects.hash(pivot, exp);
            }

            @Override
            void writeTo(StreamOutput out) throws IOException {
                out.writeByte((byte) 2);
                out.writeFloat(pivot);
                out.writeFloat(exp);
            }

            @Override
            void doXContent(XContentBuilder builder) throws IOException {
                builder.startObject("sigmoid");
                builder.field("pivot", pivot);
                builder.field("exponent", exp);
                builder.endObject();
            }

            @Override
            Query toQuery(String feature, boolean positiveScoreImpact) throws IOException {
                return FeatureField.newSigmoidQuery("_feature", feature, DEFAULT_BOOST, pivot, exp);
            }
        }
    }

    private static ScoreFunction readScoreFunction(StreamInput in) throws IOException {
        byte b = in.readByte();
        switch (b) {
        case 0:
            return new ScoreFunction.Log(in);
        case 1:
            return new ScoreFunction.Saturation(in);
        case 2:
            return new ScoreFunction.Sigmoid(in);
        default:
            throw new IOException("Illegal score function id: " + b);
        }
    }

    public static ConstructingObjectParser<FeatureQueryBuilder, Void> PARSER = new ConstructingObjectParser<>(
|
||||
"feature", args -> {
|
||||
final String field = (String) args[0];
|
||||
final float boost = args[1] == null ? DEFAULT_BOOST : (Float) args[1];
|
||||
final String queryName = (String) args[2];
|
||||
long numNonNulls = Arrays.stream(args, 3, args.length).filter(Objects::nonNull).count();
|
||||
final FeatureQueryBuilder query;
|
||||
if (numNonNulls > 1) {
|
||||
throw new IllegalArgumentException("Can only specify one of [log], [saturation] and [sigmoid]");
|
||||
} else if (numNonNulls == 0) {
|
||||
query = new FeatureQueryBuilder(field, new ScoreFunction.Saturation());
|
||||
} else {
|
||||
ScoreFunction scoreFunction = (ScoreFunction) Arrays.stream(args, 3, args.length)
|
||||
.filter(Objects::nonNull)
|
||||
.findAny()
|
||||
.get();
|
||||
query = new FeatureQueryBuilder(field, scoreFunction);
|
||||
}
|
||||
query.boost(boost);
|
||||
query.queryName(queryName);
|
||||
return query;
|
||||
});
|
||||
static {
|
||||
PARSER.declareString(ConstructingObjectParser.constructorArg(), new ParseField("field"));
|
||||
PARSER.declareFloat(ConstructingObjectParser.optionalConstructorArg(), BOOST_FIELD);
|
||||
PARSER.declareString(ConstructingObjectParser.optionalConstructorArg(), NAME_FIELD);
|
||||
PARSER.declareObject(ConstructingObjectParser.optionalConstructorArg(),
|
||||
ScoreFunction.Log.PARSER, new ParseField("log"));
|
||||
PARSER.declareObject(ConstructingObjectParser.optionalConstructorArg(),
|
||||
ScoreFunction.Saturation.PARSER, new ParseField("saturation"));
|
||||
PARSER.declareObject(ConstructingObjectParser.optionalConstructorArg(),
|
||||
ScoreFunction.Sigmoid.PARSER, new ParseField("sigmoid"));
|
||||
}
|
||||
|
||||
public static final String NAME = "feature";
|
||||
|
||||
private final String field;
|
||||
private final ScoreFunction scoreFunction;
|
||||
|
||||
public FeatureQueryBuilder(String field, ScoreFunction scoreFunction) {
|
||||
this.field = Objects.requireNonNull(field);
|
||||
this.scoreFunction = Objects.requireNonNull(scoreFunction);
|
||||
}
|
||||
|
||||
public FeatureQueryBuilder(StreamInput in) throws IOException {
|
||||
super(in);
|
||||
this.field = in.readString();
|
||||
this.scoreFunction = readScoreFunction(in);
|
||||
}
|
||||
|
||||
@Override
|
||||
public String getWriteableName() {
|
||||
return NAME;
|
||||
}
|
||||
|
||||
@Override
|
||||
protected void doWriteTo(StreamOutput out) throws IOException {
|
||||
out.writeString(field);
|
||||
scoreFunction.writeTo(out);
|
||||
}
|
||||
|
||||
@Override
|
||||
protected void doXContent(XContentBuilder builder, Params params) throws IOException {
|
||||
builder.startObject(getName());
|
||||
builder.field("field", field);
|
||||
scoreFunction.doXContent(builder);
|
||||
printBoostAndQueryName(builder);
|
||||
builder.endObject();
|
||||
}
|
||||
|
||||
@Override
|
||||
protected Query doToQuery(QueryShardContext context) throws IOException {
|
||||
final MappedFieldType ft = context.fieldMapper(field);
|
||||
if (ft == null) {
|
||||
return new MatchNoDocsQuery();
|
||||
}
|
||||
if (ft instanceof FeatureFieldType == false) {
|
||||
throw new IllegalArgumentException("[feature] query only works on [feature] fields, not [" + ft.typeName() + "]");
|
||||
}
|
||||
final FeatureFieldType fft = (FeatureFieldType) ft;
|
||||
return scoreFunction.toQuery(field, fft.positiveScoreImpact());
|
||||
}
|
||||
|
||||
@Override
|
||||
protected boolean doEquals(FeatureQueryBuilder other) {
|
||||
return Objects.equals(field, other.field) && Objects.equals(scoreFunction, other.scoreFunction);
|
||||
}
|
||||
|
||||
@Override
|
||||
protected int doHashCode() {
|
||||
return Objects.hash(field, scoreFunction);
|
||||
}
|
||||
|
||||
}
|
|
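The three score functions implemented above all delegate to Lucene's `FeatureField` queries. As an illustrative sketch only (not part of this commit), the shapes they produce can be reproduced in plain Python; the formulas below assume a boost of 1, with the sigmoid formula taken from the class javadoc above and the saturation/log shapes from `FeatureField`'s behavior:

```python
import math

# Illustrative shapes of the feature query's score functions (boost = 1).
# Sigmoid is documented above as S^exp / (S^exp + pivot^exp); the other two
# follow Lucene's FeatureField semantics and are assumptions of this sketch.

def log_score(s, scaling_factor):
    # "log": unbounded growth, which is why it is rejected for fields with
    # positive_score_impact=false (it could yield negative scores).
    return math.log(scaling_factor + s)

def saturation_score(s, pivot):
    # "saturation": approaches 1 as s grows; equals 0.5 exactly at s == pivot.
    return s / (s + pivot)

def sigmoid_score(s, pivot, exp):
    # "sigmoid": S^exp / (S^exp + pivot^exp), per the javadoc above.
    return s ** exp / (s ** exp + pivot ** exp)

print(saturation_score(20.0, 20.0))  # 0.5 at the pivot
```

All three are monotonic in the feature value, which is what lets block-max WAND skip non-competitive documents.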
@@ -0,0 +1,173 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.index.mapper;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermFrequencyAttribute;
import org.apache.lucene.document.FeatureField;
import org.apache.lucene.index.IndexableField;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.compress.CompressedXContent;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.IndexService;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.test.ESSingleNodeTestCase;
import org.hamcrest.Matchers;
import org.junit.Before;

import java.io.IOException;
import java.util.Collection;

public class FeatureFieldMapperTests extends ESSingleNodeTestCase {

    IndexService indexService;
    DocumentMapperParser parser;

    @Before
    public void setup() {
        indexService = createIndex("test");
        parser = indexService.mapperService().documentMapperParser();
    }

    @Override
    protected Collection<Class<? extends Plugin>> getPlugins() {
        return pluginList(MapperExtrasPlugin.class);
    }

    private static int getFrequency(TokenStream tk) throws IOException {
        TermFrequencyAttribute freqAttribute = tk.addAttribute(TermFrequencyAttribute.class);
        tk.reset();
        assertTrue(tk.incrementToken());
        int freq = freqAttribute.getTermFrequency();
        assertFalse(tk.incrementToken());
        return freq;
    }

    public void testDefaults() throws Exception {
        String mapping = Strings.toString(XContentFactory.jsonBuilder().startObject().startObject("type")
                .startObject("properties").startObject("field").field("type", "feature").endObject().endObject()
                .endObject().endObject());

        DocumentMapper mapper = parser.parse("type", new CompressedXContent(mapping));

        assertEquals(mapping, mapper.mappingSource().toString());

        ParsedDocument doc1 = mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
                .bytes(XContentFactory.jsonBuilder()
                        .startObject()
                        .field("field", 10)
                        .endObject()),
                XContentType.JSON));

        IndexableField[] fields = doc1.rootDoc().getFields("_feature");
        assertEquals(1, fields.length);
        assertThat(fields[0], Matchers.instanceOf(FeatureField.class));
        FeatureField featureField1 = (FeatureField) fields[0];

        ParsedDocument doc2 = mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
                .bytes(XContentFactory.jsonBuilder()
                        .startObject()
                        .field("field", 12)
                        .endObject()),
                XContentType.JSON));

        FeatureField featureField2 = (FeatureField) doc2.rootDoc().getFields("_feature")[0];

        int freq1 = getFrequency(featureField1.tokenStream(null, null));
        int freq2 = getFrequency(featureField2.tokenStream(null, null));
        assertTrue(freq1 < freq2);
    }

    public void testNegativeScoreImpact() throws Exception {
        String mapping = Strings.toString(XContentFactory.jsonBuilder().startObject().startObject("type")
                .startObject("properties").startObject("field").field("type", "feature")
                .field("positive_score_impact", false).endObject().endObject()
                .endObject().endObject());

        DocumentMapper mapper = parser.parse("type", new CompressedXContent(mapping));

        assertEquals(mapping, mapper.mappingSource().toString());

        ParsedDocument doc1 = mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
                .bytes(XContentFactory.jsonBuilder()
                        .startObject()
                        .field("field", 10)
                        .endObject()),
                XContentType.JSON));

        IndexableField[] fields = doc1.rootDoc().getFields("_feature");
        assertEquals(1, fields.length);
        assertThat(fields[0], Matchers.instanceOf(FeatureField.class));
        FeatureField featureField1 = (FeatureField) fields[0];

        ParsedDocument doc2 = mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
                .bytes(XContentFactory.jsonBuilder()
                        .startObject()
                        .field("field", 12)
                        .endObject()),
                XContentType.JSON));

        FeatureField featureField2 = (FeatureField) doc2.rootDoc().getFields("_feature")[0];

        int freq1 = getFrequency(featureField1.tokenStream(null, null));
        int freq2 = getFrequency(featureField2.tokenStream(null, null));
        assertTrue(freq1 > freq2);
    }

    public void testRejectMultiValuedFields() throws MapperParsingException, IOException {
        String mapping = Strings.toString(XContentFactory.jsonBuilder().startObject().startObject("type")
                .startObject("properties").startObject("field").field("type", "feature").endObject().startObject("foo")
                .startObject("properties").startObject("field").field("type", "feature").endObject().endObject()
                .endObject().endObject().endObject().endObject());

        DocumentMapper mapper = parser.parse("type", new CompressedXContent(mapping));

        assertEquals(mapping, mapper.mappingSource().toString());

        MapperParsingException e = null;/*expectThrows(MapperParsingException.class,
                () -> mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
                        .bytes(XContentFactory.jsonBuilder()
                                .startObject()
                                .field("field", Arrays.asList(10, 20))
                                .endObject()),
                        XContentType.JSON)));
        assertEquals("[feature] fields do not support indexing multiple values for the same field [field] in the same document",
                e.getCause().getMessage());*/

        e = expectThrows(MapperParsingException.class,
                () -> mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
                        .bytes(XContentFactory.jsonBuilder()
                                .startObject()
                                .startArray("foo")
                                    .startObject()
                                        .field("field", 10)
                                    .endObject()
                                    .startObject()
                                        .field("field", 20)
                                    .endObject()
                                .endArray()
                                .endObject()),
                        XContentType.JSON)));
        assertEquals("[feature] fields do not support indexing multiple values for the same field [foo.field] in the same document",
                e.getCause().getMessage());
    }
}
@@ -0,0 +1,46 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.index.mapper;

import org.junit.Before;

public class FeatureFieldTypeTests extends FieldTypeTestCase {

    @Override
    protected MappedFieldType createDefaultFieldType() {
        return new FeatureFieldMapper.FeatureFieldType();
    }

    @Before
    public void setupProperties() {
        addModifier(new Modifier("positive_score_impact", false) {
            @Override
            public void modify(MappedFieldType ft) {
                FeatureFieldMapper.FeatureFieldType tft = (FeatureFieldMapper.FeatureFieldType) ft;
                tft.setPositiveScoreImpact(tft.positiveScoreImpact() == false);
            }
            @Override
            public void normalizeOther(MappedFieldType other) {
                super.normalizeOther(other);
                ((FeatureFieldMapper.FeatureFieldType) other).setPositiveScoreImpact(true);
            }
        });
    }
}
@@ -0,0 +1,58 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.index.mapper;

import org.elasticsearch.common.Strings;
import org.elasticsearch.common.compress.CompressedXContent;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.index.IndexService;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.test.ESSingleNodeTestCase;
import org.junit.Before;

import java.util.Collection;

public class FeatureMetaFieldMapperTests extends ESSingleNodeTestCase {

    IndexService indexService;
    DocumentMapperParser parser;

    @Before
    public void setup() {
        indexService = createIndex("test");
        parser = indexService.mapperService().documentMapperParser();
    }

    @Override
    protected Collection<Class<? extends Plugin>> getPlugins() {
        return pluginList(MapperExtrasPlugin.class);
    }

    public void testBasics() throws Exception {
        String mapping = Strings.toString(XContentFactory.jsonBuilder().startObject().startObject("type")
                .startObject("properties").startObject("field").field("type", "feature").endObject().endObject()
                .endObject().endObject());

        DocumentMapper mapper = parser.parse("type", new CompressedXContent(mapping));

        assertEquals(mapping, mapper.mappingSource().toString());
        assertNotNull(mapper.metadataMapper(FeatureMetaFieldMapper.class));
    }
}
@@ -0,0 +1,29 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.index.mapper;

public class FeatureMetaFieldTypeTests extends FieldTypeTestCase {

    @Override
    protected MappedFieldType createDefaultFieldType() {
        return new FeatureMetaFieldMapper.FeatureMetaFieldType();
    }

}
@@ -0,0 +1,130 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.index.query;

import org.apache.lucene.document.FeatureField;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.compress.CompressedXContent;
import org.elasticsearch.index.mapper.MapperExtrasPlugin;
import org.elasticsearch.index.mapper.MapperService;
import org.elasticsearch.index.query.FeatureQueryBuilder.ScoreFunction;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.search.internal.SearchContext;
import org.elasticsearch.test.AbstractQueryTestCase;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;

import static org.hamcrest.CoreMatchers.instanceOf;
import static org.hamcrest.Matchers.either;

public class FeatureQueryBuilderTests extends AbstractQueryTestCase<FeatureQueryBuilder> {

    @Override
    protected void initializeAdditionalMappings(MapperService mapperService) throws IOException {
        for (String type : getCurrentTypes()) {
            mapperService.merge(type, new CompressedXContent(Strings.toString(PutMappingRequest.buildFromSimplifiedDef(type,
                    "my_feature_field", "type=feature",
                    "my_negative_feature_field", "type=feature,positive_score_impact=false"))), MapperService.MergeReason.MAPPING_UPDATE);
        }
    }

    @Override
    protected Collection<Class<? extends Plugin>> getPlugins() {
        return Collections.singleton(MapperExtrasPlugin.class);
    }

    @Override
    protected FeatureQueryBuilder doCreateTestQueryBuilder() {
        ScoreFunction function;
        switch (random().nextInt(3)) {
            case 0:
                function = new ScoreFunction.Log(1 + randomFloat());
                break;
            case 1:
                if (randomBoolean()) {
                    function = new ScoreFunction.Saturation();
                } else {
                    function = new ScoreFunction.Saturation(randomFloat());
                }
                break;
            case 2:
                function = new ScoreFunction.Sigmoid(randomFloat(), randomFloat());
                break;
            default:
                throw new AssertionError();
        }
        return new FeatureQueryBuilder("my_feature_field", function);
    }

    @Override
    protected void doAssertLuceneQuery(FeatureQueryBuilder queryBuilder, Query query, SearchContext context) throws IOException {
        Class<?> expectedClass = FeatureField.newSaturationQuery("", "", 1, 1).getClass();
        assertThat(query, either(instanceOf(MatchNoDocsQuery.class)).or(instanceOf(expectedClass)));
    }

    @Override
    @AwaitsFix(bugUrl="https://github.com/elastic/elasticsearch/issues/30605")
    public void testUnknownField() {
        super.testUnknownField();
    }

    public void testDefaultScoreFunction() throws IOException {
        assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0);
        String query = "{\n" +
                "    \"feature\" : {\n" +
                "        \"field\": \"my_feature_field\"\n" +
                "    }\n" +
                "}";
        Query parsedQuery = parseQuery(query).toQuery(createShardContext());
        assertEquals(FeatureField.newSaturationQuery("_feature", "my_feature_field"), parsedQuery);
    }

    public void testIllegalField() throws IOException {
        assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0);
        String query = "{\n" +
                "    \"feature\" : {\n" +
                "        \"field\": \"" + STRING_FIELD_NAME + "\"\n" +
                "    }\n" +
                "}";
        IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> parseQuery(query).toQuery(createShardContext()));
        assertEquals("[feature] query only works on [feature] fields, not [text]", e.getMessage());
    }

    public void testIllegalCombination() throws IOException {
        assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0);
        String query = "{\n" +
                "    \"feature\" : {\n" +
                "        \"field\": \"my_negative_feature_field\",\n" +
                "        \"log\" : {\n" +
                "            \"scaling_factor\": 4.5\n" +
                "        }\n" +
                "    }\n" +
                "}";
        IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> parseQuery(query).toQuery(createShardContext()));
        assertEquals(
                "Cannot use the [log] function with a field that has a negative score impact as it would trigger negative scores",
                e.getMessage());
    }
}
@@ -0,0 +1,160 @@
setup:
  - skip:
      version: " - 6.99.99"
      reason: "The feature field/query was introduced in 7.0.0"

  - do:
      indices.create:
        index: test
        body:
          settings:
            number_of_replicas: 0
          mappings:
            _doc:
              properties:
                pagerank:
                  type: feature
                url_length:
                  type: feature
                  positive_score_impact: false

  - do:
      index:
        index: test
        type: _doc
        id: 1
        body:
          pagerank: 10
          url_length: 50

  - do:
      index:
        index: test
        type: _doc
        id: 2
        body:
          pagerank: 100
          url_length: 20

  - do:
      indices.refresh: {}

---
"Positive log":

  - do:
      search:
        body:
          query:
            feature:
              field: pagerank
              log:
                scaling_factor: 3

  - match:
      hits.total: 2

  - match:
      hits.hits.0._id: "2"

  - match:
      hits.hits.1._id: "1"

---
"Positive saturation":

  - do:
      search:
        body:
          query:
            feature:
              field: pagerank
              saturation:
                pivot: 20

  - match:
      hits.total: 2

  - match:
      hits.hits.0._id: "2"

  - match:
      hits.hits.1._id: "1"

---
"Positive sigmoid":

  - do:
      search:
        body:
          query:
            feature:
              field: pagerank
              sigmoid:
                pivot: 20
                exponent: 0.6

  - match:
      hits.total: 2

  - match:
      hits.hits.0._id: "2"

  - match:
      hits.hits.1._id: "1"

---
"Negative log":

  - do:
      catch: bad_request
      search:
        body:
          query:
            feature:
              field: url_length
              log:
                scaling_factor: 3

---
"Negative saturation":

  - do:
      search:
        body:
          query:
            feature:
              field: url_length
              saturation:
                pivot: 20

  - match:
      hits.total: 2

  - match:
      hits.hits.0._id: "2"

  - match:
      hits.hits.1._id: "1"

---
"Negative sigmoid":

  - do:
      search:
        body:
          query:
            feature:
              field: url_length
              sigmoid:
                pivot: 20
                exponent: 0.6

  - match:
      hits.total: 2

  - match:
      hits.hits.0._id: "2"

  - match:
      hits.hits.1._id: "1"