Expose Lucene's FeatureField. (#30618)

Lucene has a new `FeatureField` which gives the ability to record numeric
features as term frequencies. Its main benefit is that it allows queries to be
boosted by the values of these features while efficiently skipping
non-competitive documents using block-max WAND and indexed impacts.
Adrien Grand 2018-05-23 08:55:21 +02:00 committed by GitHub
parent 739bb4f0ec
commit 886db84ad2
14 changed files with 1616 additions and 2 deletions


@@ -40,6 +40,8 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>
<<parent-join>>:: Defines parent/child relation for documents within the same index
<<feature>>:: Record numeric features to boost hits at query time.
[float]
=== Multi-fields
@@ -86,6 +88,6 @@ include::types/percolator.asciidoc[]
include::types/parent-join.asciidoc[]
include::types/feature.asciidoc[]


@@ -0,0 +1,59 @@
[[feature]]
=== Feature datatype
A `feature` field can index numbers so that they can later be used to boost
documents in queries with a <<query-dsl-feature-query,`feature`>> query.
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"_doc": {
"properties": {
"pagerank": {
"type": "feature" <1>
},
"url_length": {
"type": "feature",
"positive_score_impact": false <2>
}
}
}
}
}
PUT my_index/_doc/1
{
"pagerank": 8,
"url_length": 22
}
GET my_index/_search
{
"query": {
"feature": {
"field": "pagerank"
}
}
}
--------------------------------------------------
// CONSOLE
<1> Feature fields must use the `feature` field type
<2> Features that correlate negatively with the score need to declare it
NOTE: `feature` fields only support single-valued fields and strictly positive
values. Multi-valued fields and negative values will be rejected.
NOTE: `feature` fields do not support querying, sorting or aggregating. They may
only be used within <<query-dsl-feature-query,`feature`>> queries.
NOTE: `feature` fields only preserve 9 significant bits of precision, which
translates to a relative error of about 0.4%.
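The precision note can be illustrated with a small sketch. Assuming the value
is stored by keeping only the top bits of its float representation (an assumed
model of the encoding, not spelled out in this change), truncating the
mantissa so that only 9 significant bits remain bounds the relative error by
2^-8, roughly 0.4%:

```java
// Hypothetical model of the 9-significant-bit precision: keep the sign,
// exponent, and top 8 explicit mantissa bits (plus the implicit leading 1),
// dropping the low 15 of the 23 mantissa bits of a float.
public class NineBitPrecision {
    static float truncateTo9Bits(float v) {
        int bits = Float.floatToIntBits(v);
        bits &= ~((1 << 15) - 1); // zero the low 15 mantissa bits
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float v : new float[] { 1.0f, 3.14159f, 8f, 22f, 12345.678f }) {
            float t = truncateTo9Bits(v);
            double relErr = (v - t) / v;
            // truncation error stays below 2^-8, about 0.39% relative
            System.out.printf("%s -> %s (rel err %.5f)%n", v, t, relErr);
        }
    }
}
```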
Features that correlate negatively with the score should set
`positive_score_impact` to `false` (defaults to `true`). This will be used by
the <<query-dsl-feature-query,`feature`>> query to modify the scoring formula
in such a way that the score decreases with the value of the feature instead of
increasing. For instance in web search, the url length is a commonly used
feature which correlates negatively with scores.
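The mapper introduced in this commit implements the negative-impact case by
indexing the reciprocal of the value (see `FeatureFieldMapper#parseCreateField`
in this diff). A minimal sketch of that behavior:

```java
// Sketch of the negative-impact handling in this change's
// FeatureFieldMapper#parseCreateField: when positive_score_impact is
// false, the value is indexed as 1/value, so a larger raw value
// (e.g. a longer URL) becomes a smaller feature value.
public class NegativeImpact {
    static float indexedValue(float value, boolean positiveScoreImpact) {
        return positiveScoreImpact ? value : 1 / value;
    }

    public static void main(String[] args) {
        System.out.println(indexedValue(8f, true));   // pagerank stays 8.0
        System.out.println(indexedValue(22f, false)); // url_length 22 -> ~0.0455
    }
}
```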


@@ -0,0 +1,181 @@
[[query-dsl-feature-query]]
=== Feature Query
The `feature` query is a specialized query that only works on
<<feature,`feature`>> fields. Its goal is to boost the score of documents based
on the values of numeric features. It is typically put in a `should` clause of
a <<query-dsl-bool-query,`bool`>> query so that its score is added to the score
of the query.
Compared to using <<query-dsl-function-score-query,`function_score`>> or other
ways to modify the score, this query has the benefit of being able to
efficiently skip non-competitive hits when
<<search-uri-request,`track_total_hits`>> is set to `false`. Speedups may be
spectacular.
Here is an example:
[source,js]
--------------------------------------------------
PUT test
{
"mappings": {
"_doc": {
"properties": {
"pagerank": {
"type": "feature"
},
"url_length": {
"type": "feature",
"positive_score_impact": false
}
}
}
}
}
PUT test/_doc/1
{
"pagerank": 10,
"url_length": 50
}
PUT test/_doc/2
{
"pagerank": 100,
"url_length": 20
}
POST test/_refresh
GET test/_search
{
"query": {
"feature": {
"field": "pagerank"
}
}
}
GET test/_search
{
"query": {
"feature": {
"field": "url_length"
}
}
}
--------------------------------------------------
// CONSOLE
[float]
=== Supported functions
The `feature` query supports three functions to boost scores using the
values of features. If you do not know where to start, we recommend the
`saturation` function, which is the default when no function is provided.
[float]
==== Saturation
This function gives a score equal to `S / (S + pivot)`, where `S` is the
value of the feature and `pivot` is a configurable pivot value, so that the
result is less than +0.5+ if `S` is less than `pivot` and greater than +0.5+
otherwise. Scores are always in +(0, 1)+.
If the feature has a negative score impact then the function will be computed as
`pivot / (S + pivot)`, which decreases when `S` increases.
[source,js]
--------------------------------------------------
GET test/_search
{
"query": {
"feature": {
"field": "pagerank",
"saturation": {
"pivot": 8
}
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
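The formula itself is plain arithmetic and can be sanity-checked outside
Elasticsearch; a minimal sketch covering both the positive- and
negative-impact variants:

```java
// Saturation scoring from the formulas above: S / (S + pivot) for
// positive-impact features, pivot / (S + pivot) for negative-impact ones.
public class SaturationScore {
    static double score(double s, double pivot, boolean positiveImpact) {
        return positiveImpact ? s / (s + pivot) : pivot / (s + pivot);
    }

    public static void main(String[] args) {
        System.out.println(score(8, 8, true));    // S == pivot -> 0.5
        System.out.println(score(100, 8, true));  // approaches 1 as S grows
        System.out.println(score(100, 8, false)); // decreases as S grows
    }
}
```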
If +pivot+ is not supplied then Elasticsearch will compute a default value that
will be approximately equal to the geometric mean of all feature values that
exist in the index. We recommend this if you haven't had the opportunity to
train a good pivot value.
[source,js]
--------------------------------------------------
GET test/_search
{
"query": {
"feature": {
"field": "pagerank",
"saturation": {}
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
[float]
==== Logarithm
This function gives a score that is equal to `log(scaling_factor + S)` where
`S` is the value of the feature and `scaling_factor` is a configurable scaling
factor. Scores are unbounded.
This function only supports features that have a positive score impact.
[source,js]
--------------------------------------------------
GET test/_search
{
"query": {
"feature": {
"field": "pagerank",
"log": {
"scaling_factor": 4
}
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
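As a quick arithmetic check of the formula, independent of Elasticsearch:

```java
// Log scoring from the formula above: log(scaling_factor + S).
// Scores are unbounded and grow monotonically with the feature value.
public class LogScore {
    static double score(double s, double scalingFactor) {
        return Math.log(scalingFactor + s);
    }

    public static void main(String[] args) {
        System.out.println(score(10, 4));                 // ln(14) ~ 2.64
        System.out.println(score(100, 4) > score(10, 4)); // prints true
    }
}
```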
[float]
==== Sigmoid
This function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. Like for the
`saturation` function, `pivot` is the value of `S` that gives a score of +0.5+
and scores are in +(0, 1)+.
`exponent` must be positive, but is typically in +[0.5, 1]+. A good value should
be computed via training. If you don't have the opportunity to do so, we recommend
that you stick to the `saturation` function instead.
[source,js]
--------------------------------------------------
GET test/_search
{
"query": {
"feature": {
"field": "pagerank",
"sigmoid": {
"pivot": 7,
"exponent": 0.6
}
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
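A quick check of the formula; note that with `exponent` equal to 1 the sigmoid
reduces to the saturation function:

```java
// Sigmoid scoring from the formula above: S^exp / (S^exp + pivot^exp).
public class SigmoidScore {
    static double score(double s, double pivot, double exp) {
        return Math.pow(s, exp) / (Math.pow(s, exp) + Math.pow(pivot, exp));
    }

    public static void main(String[] args) {
        System.out.println(score(7, 7, 0.6));  // S == pivot -> 0.5
        System.out.println(score(10, 7, 0.6)); // between 0.5 and 1
        System.out.println(score(10, 7, 1.0)); // == 10/17, the saturation score
    }
}
```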


@@ -19,6 +19,11 @@ This query allows a script to act as a filter. Also see the
This query finds queries that are stored as documents that match with
the specified document.
<<query-dsl-feature-query,`feature` query>>::
A query that computes scores based on the values of numeric features and is
able to efficiently skip non-competitive hits.
<<query-dsl-wrapper-query,`wrapper` query>>::
A query that accepts other queries as a JSON or YAML string.
@@ -29,4 +34,6 @@ include::script-query.asciidoc[]
include::percolate-query.asciidoc[]
include::feature-query.asciidoc[]
include::wrapper-query.asciidoc[]


@@ -0,0 +1,248 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.mapper;
import org.apache.lucene.document.FeatureField;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.elasticsearch.common.lucene.Lucene;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentParser.Token;
import org.elasticsearch.common.xcontent.support.XContentMapValues;
import org.elasticsearch.index.fielddata.IndexFieldData;
import org.elasticsearch.index.fielddata.plain.DocValuesIndexFieldData;
import org.elasticsearch.index.query.QueryShardContext;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
/**
* A {@link FieldMapper} that exposes Lucene's {@link FeatureField}.
*/
public class FeatureFieldMapper extends FieldMapper {
public static final String CONTENT_TYPE = "feature";
public static class Defaults {
public static final MappedFieldType FIELD_TYPE = new FeatureFieldType();
static {
FIELD_TYPE.setTokenized(false);
FIELD_TYPE.setIndexOptions(IndexOptions.NONE);
FIELD_TYPE.setHasDocValues(false);
FIELD_TYPE.setOmitNorms(true);
FIELD_TYPE.freeze();
}
}
public static class Builder extends FieldMapper.Builder<Builder, FeatureFieldMapper> {
public Builder(String name) {
super(name, Defaults.FIELD_TYPE, Defaults.FIELD_TYPE);
builder = this;
}
@Override
public FeatureFieldType fieldType() {
return (FeatureFieldType) super.fieldType();
}
public Builder positiveScoreImpact(boolean v) {
fieldType().setPositiveScoreImpact(v);
return builder;
}
@Override
public FeatureFieldMapper build(BuilderContext context) {
setupFieldType(context);
return new FeatureFieldMapper(
name, fieldType, defaultFieldType,
context.indexSettings(), multiFieldsBuilder.build(this, context), copyTo);
}
}
public static class TypeParser implements Mapper.TypeParser {
@Override
public Mapper.Builder<?,?> parse(String name, Map<String, Object> node, ParserContext parserContext) throws MapperParsingException {
FeatureFieldMapper.Builder builder = new FeatureFieldMapper.Builder(name);
for (Iterator<Map.Entry<String, Object>> iterator = node.entrySet().iterator(); iterator.hasNext();) {
Map.Entry<String, Object> entry = iterator.next();
String propName = entry.getKey();
Object propNode = entry.getValue();
if (propName.equals("positive_score_impact")) {
builder.positiveScoreImpact(XContentMapValues.nodeBooleanValue(propNode));
iterator.remove();
}
}
return builder;
}
}
public static final class FeatureFieldType extends MappedFieldType {
private boolean positiveScoreImpact = true;
public FeatureFieldType() {
setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
}
protected FeatureFieldType(FeatureFieldType ref) {
super(ref);
this.positiveScoreImpact = ref.positiveScoreImpact;
}
public FeatureFieldType clone() {
return new FeatureFieldType(this);
}
@Override
public boolean equals(Object o) {
if (super.equals(o) == false) {
return false;
}
FeatureFieldType other = (FeatureFieldType) o;
return Objects.equals(positiveScoreImpact, other.positiveScoreImpact);
}
@Override
public int hashCode() {
int h = super.hashCode();
h = 31 * h + Objects.hashCode(positiveScoreImpact);
return h;
}
@Override
public void checkCompatibility(MappedFieldType other, List<String> conflicts) {
super.checkCompatibility(other, conflicts);
if (positiveScoreImpact != ((FeatureFieldType) other).positiveScoreImpact()) {
conflicts.add("mapper [" + name() + "] has different [positive_score_impact] values");
}
}
@Override
public String typeName() {
return CONTENT_TYPE;
}
public boolean positiveScoreImpact() {
return positiveScoreImpact;
}
public void setPositiveScoreImpact(boolean positiveScoreImpact) {
checkIfFrozen();
this.positiveScoreImpact = positiveScoreImpact;
}
@Override
public Query existsQuery(QueryShardContext context) {
return new TermQuery(new Term("_feature", name()));
}
@Override
public Query nullValueQuery() {
if (nullValue() == null) {
return null;
}
return termQuery(nullValue(), null);
}
@Override
public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName) {
failIfNoDocValues();
return new DocValuesIndexFieldData.Builder();
}
@Override
public Query termQuery(Object value, QueryShardContext context) {
throw new UnsupportedOperationException("Queries on [feature] fields are not supported");
}
}
private FeatureFieldMapper(String simpleName, MappedFieldType fieldType, MappedFieldType defaultFieldType,
Settings indexSettings, MultiFields multiFields, CopyTo copyTo) {
super(simpleName, fieldType, defaultFieldType, indexSettings, multiFields, copyTo);
assert fieldType.indexOptions().compareTo(IndexOptions.DOCS_AND_FREQS) <= 0;
}
@Override
protected FeatureFieldMapper clone() {
return (FeatureFieldMapper) super.clone();
}
@Override
public FeatureFieldType fieldType() {
return (FeatureFieldType) super.fieldType();
}
@Override
protected void parseCreateField(ParseContext context, List<IndexableField> fields) throws IOException {
float value;
if (context.externalValueSet()) {
Object v = context.externalValue();
if (v instanceof Number) {
value = ((Number) v).floatValue();
} else {
value = Float.parseFloat(v.toString());
}
} else if (context.parser().currentToken() == Token.VALUE_NULL) {
// skip
return;
} else {
value = context.parser().floatValue();
}
if (context.doc().getByKey(name()) != null) {
throw new IllegalArgumentException("[feature] fields do not support indexing multiple values for the same field [" + name() +
"] in the same document");
}
if (fieldType().positiveScoreImpact() == false) {
value = 1 / value;
}
context.doc().addWithKey(name(), new FeatureField("_feature", name(), value));
}
@Override
protected String contentType() {
return CONTENT_TYPE;
}
@Override
protected void doXContentBody(XContentBuilder builder, boolean includeDefaults, Params params) throws IOException {
super.doXContentBody(builder, includeDefaults, params);
if (includeDefaults || fieldType().nullValue() != null) {
builder.field("null_value", fieldType().nullValue());
}
if (includeDefaults || fieldType().positiveScoreImpact() == false) {
builder.field("positive_score_impact", fieldType().positiveScoreImpact());
}
}
}


@@ -0,0 +1,151 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.mapper;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.search.Query;
import org.elasticsearch.common.lucene.Lucene;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.index.query.QueryShardContext;
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;
/**
* This meta field only exists because feature fields index everything into a
* common _feature field and Elasticsearch has a custom codec that complains
* when fields exist in the index and not in mappings.
*/
public class FeatureMetaFieldMapper extends MetadataFieldMapper {
public static final String NAME = "_feature";
public static final String CONTENT_TYPE = "_feature";
public static class Defaults {
public static final MappedFieldType FIELD_TYPE = new FeatureMetaFieldType();
static {
FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
FIELD_TYPE.setTokenized(true);
FIELD_TYPE.setStored(false);
FIELD_TYPE.setOmitNorms(true);
FIELD_TYPE.setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
FIELD_TYPE.setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
FIELD_TYPE.setName(NAME);
FIELD_TYPE.freeze();
}
}
public static class Builder extends MetadataFieldMapper.Builder<Builder, FeatureMetaFieldMapper> {
public Builder(MappedFieldType existing) {
super(NAME, existing == null ? Defaults.FIELD_TYPE : existing, Defaults.FIELD_TYPE);
}
@Override
public FeatureMetaFieldMapper build(BuilderContext context) {
setupFieldType(context);
return new FeatureMetaFieldMapper(fieldType, context.indexSettings());
}
}
public static class TypeParser implements MetadataFieldMapper.TypeParser {
@Override
public MetadataFieldMapper.Builder<?,?> parse(String name,
Map<String, Object> node, ParserContext parserContext) throws MapperParsingException {
return new Builder(parserContext.mapperService().fullName(NAME));
}
@Override
public MetadataFieldMapper getDefault(MappedFieldType fieldType, ParserContext context) {
final Settings indexSettings = context.mapperService().getIndexSettings().getSettings();
if (fieldType != null) {
return new FeatureMetaFieldMapper(indexSettings, fieldType);
} else {
return parse(NAME, Collections.emptyMap(), context)
.build(new BuilderContext(indexSettings, new ContentPath(1)));
}
}
}
public static final class FeatureMetaFieldType extends MappedFieldType {
public FeatureMetaFieldType() {
}
protected FeatureMetaFieldType(FeatureMetaFieldType ref) {
super(ref);
}
@Override
public FeatureMetaFieldType clone() {
return new FeatureMetaFieldType(this);
}
@Override
public String typeName() {
return CONTENT_TYPE;
}
@Override
public Query existsQuery(QueryShardContext context) {
throw new UnsupportedOperationException("Cannot run exists query on [_feature]");
}
@Override
public Query termQuery(Object value, QueryShardContext context) {
throw new UnsupportedOperationException("The [_feature] field may not be queried directly");
}
}
private FeatureMetaFieldMapper(Settings indexSettings, MappedFieldType existing) {
this(existing.clone(), indexSettings);
}
private FeatureMetaFieldMapper(MappedFieldType fieldType, Settings indexSettings) {
super(NAME, fieldType, Defaults.FIELD_TYPE, indexSettings);
}
@Override
public void preParse(ParseContext context) throws IOException {}
@Override
protected void parseCreateField(ParseContext context, List<IndexableField> fields) throws IOException {
throw new AssertionError("Should never be called");
}
@Override
public void postParse(ParseContext context) throws IOException {}
@Override
protected String contentType() {
return CONTENT_TYPE;
}
@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
return builder;
}
}


@@ -19,21 +19,37 @@
package org.elasticsearch.index.mapper;
import org.elasticsearch.index.mapper.MetadataFieldMapper.TypeParser;
import org.elasticsearch.index.query.FeatureQueryBuilder;
import org.elasticsearch.plugins.MapperPlugin;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.plugins.SearchPlugin;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
public class MapperExtrasPlugin extends Plugin implements MapperPlugin {
public class MapperExtrasPlugin extends Plugin implements MapperPlugin, SearchPlugin {
@Override
public Map<String, Mapper.TypeParser> getMappers() {
Map<String, Mapper.TypeParser> mappers = new LinkedHashMap<>();
mappers.put(ScaledFloatFieldMapper.CONTENT_TYPE, new ScaledFloatFieldMapper.TypeParser());
mappers.put(TokenCountFieldMapper.CONTENT_TYPE, new TokenCountFieldMapper.TypeParser());
mappers.put(FeatureFieldMapper.CONTENT_TYPE, new FeatureFieldMapper.TypeParser());
return Collections.unmodifiableMap(mappers);
}
@Override
public Map<String, TypeParser> getMetadataMappers() {
return Collections.singletonMap(FeatureMetaFieldMapper.CONTENT_TYPE, new FeatureMetaFieldMapper.TypeParser());
}
@Override
public List<QuerySpec<?>> getQueries() {
return Collections.singletonList(
new QuerySpec<>(FeatureQueryBuilder.NAME, FeatureQueryBuilder::new, p -> FeatureQueryBuilder.PARSER.parse(p, null)));
}
}


@@ -0,0 +1,354 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.query;
import org.apache.lucene.document.FeatureField;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;
import org.elasticsearch.common.ParseField;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.xcontent.ConstructingObjectParser;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.index.mapper.FeatureFieldMapper.FeatureFieldType;
import org.elasticsearch.index.mapper.MappedFieldType;
import java.io.IOException;
import java.util.Arrays;
import java.util.Objects;
/**
* Query to run on a [feature] field.
*/
public final class FeatureQueryBuilder extends AbstractQueryBuilder<FeatureQueryBuilder> {
/**
* Scoring function for a [feature] field.
*/
public abstract static class ScoreFunction {
private ScoreFunction() {} // prevent extensions by users
abstract void writeTo(StreamOutput out) throws IOException;
abstract Query toQuery(String feature, boolean positiveScoreImpact) throws IOException;
abstract void doXContent(XContentBuilder builder) throws IOException;
/**
* A scoring function that scores documents as {@code Math.log(scalingFactor + S)}
* where S is the value of the static feature.
*/
public static class Log extends ScoreFunction {
private static final ConstructingObjectParser<Log, Void> PARSER = new ConstructingObjectParser<>(
"log", a -> new Log((Float) a[0]));
static {
PARSER.declareFloat(ConstructingObjectParser.constructorArg(), new ParseField("scaling_factor"));
}
private final float scalingFactor;
public Log(float scalingFactor) {
this.scalingFactor = scalingFactor;
}
private Log(StreamInput in) throws IOException {
this(in.readFloat());
}
@Override
public boolean equals(Object obj) {
if (obj == null || obj.getClass() != getClass()) {
return false;
}
Log that = (Log) obj;
return scalingFactor == that.scalingFactor;
}
@Override
public int hashCode() {
return Float.hashCode(scalingFactor);
}
@Override
void writeTo(StreamOutput out) throws IOException {
out.writeByte((byte) 0);
out.writeFloat(scalingFactor);
}
@Override
void doXContent(XContentBuilder builder) throws IOException {
builder.startObject("log");
builder.field("scaling_factor", scalingFactor);
builder.endObject();
}
@Override
Query toQuery(String feature, boolean positiveScoreImpact) throws IOException {
if (positiveScoreImpact == false) {
throw new IllegalArgumentException("Cannot use the [log] function with a field that has a negative score impact as " +
"it would trigger negative scores");
}
return FeatureField.newLogQuery("_feature", feature, DEFAULT_BOOST, scalingFactor);
}
}
/**
* A scoring function that scores documents as {@code S / (S + pivot)} where S is
* the value of the static feature.
*/
public static class Saturation extends ScoreFunction {
private static final ConstructingObjectParser<Saturation, Void> PARSER = new ConstructingObjectParser<>(
"saturation", a -> new Saturation((Float) a[0]));
static {
PARSER.declareFloat(ConstructingObjectParser.optionalConstructorArg(), new ParseField("pivot"));
}
private final Float pivot;
/** Constructor with a default pivot, computed as the geometric average of
* all feature values in the index. */
public Saturation() {
this((Float) null);
}
public Saturation(float pivot) {
this(Float.valueOf(pivot));
}
private Saturation(Float pivot) {
this.pivot = pivot;
}
private Saturation(StreamInput in) throws IOException {
this(in.readOptionalFloat());
}
@Override
public boolean equals(Object obj) {
if (obj == null || obj.getClass() != getClass()) {
return false;
}
Saturation that = (Saturation) obj;
return Objects.equals(pivot, that.pivot);
}
@Override
public int hashCode() {
return Objects.hashCode(pivot);
}
@Override
void writeTo(StreamOutput out) throws IOException {
out.writeByte((byte) 1);
out.writeOptionalFloat(pivot);
}
@Override
void doXContent(XContentBuilder builder) throws IOException {
builder.startObject("saturation");
if (pivot != null) {
builder.field("pivot", pivot);
}
builder.endObject();
}
@Override
Query toQuery(String feature, boolean positiveScoreImpact) throws IOException {
if (pivot == null) {
return FeatureField.newSaturationQuery("_feature", feature);
} else {
return FeatureField.newSaturationQuery("_feature", feature, DEFAULT_BOOST, pivot);
}
}
}
/**
* A scoring function that scores documents as {@code S^exp / (S^exp + pivot^exp)}
* where S is the value of the static feature.
*/
public static class Sigmoid extends ScoreFunction {
private static final ConstructingObjectParser<Sigmoid, Void> PARSER = new ConstructingObjectParser<>(
"sigmoid", a -> new Sigmoid((Float) a[0], ((Float) a[1]).floatValue()));
static {
PARSER.declareFloat(ConstructingObjectParser.constructorArg(), new ParseField("pivot"));
PARSER.declareFloat(ConstructingObjectParser.constructorArg(), new ParseField("exponent"));
}
private final float pivot;
private final float exp;
public Sigmoid(float pivot, float exp) {
this.pivot = pivot;
this.exp = exp;
}
private Sigmoid(StreamInput in) throws IOException {
this(in.readFloat(), in.readFloat());
}
@Override
public boolean equals(Object obj) {
if (obj == null || obj.getClass() != getClass()) {
return false;
}
Sigmoid that = (Sigmoid) obj;
return pivot == that.pivot
&& exp == that.exp;
}
@Override
public int hashCode() {
return Objects.hash(pivot, exp);
}
@Override
void writeTo(StreamOutput out) throws IOException {
out.writeByte((byte) 2);
out.writeFloat(pivot);
out.writeFloat(exp);
}
@Override
void doXContent(XContentBuilder builder) throws IOException {
builder.startObject("sigmoid");
builder.field("pivot", pivot);
builder.field("exponent", exp);
builder.endObject();
}
@Override
Query toQuery(String feature, boolean positiveScoreImpact) throws IOException {
return FeatureField.newSigmoidQuery("_feature", feature, DEFAULT_BOOST, pivot, exp);
}
}
}
private static ScoreFunction readScoreFunction(StreamInput in) throws IOException {
byte b = in.readByte();
switch (b) {
case 0:
return new ScoreFunction.Log(in);
case 1:
return new ScoreFunction.Saturation(in);
case 2:
return new ScoreFunction.Sigmoid(in);
default:
throw new IOException("Illegal score function id: " + b);
}
}
public static final ConstructingObjectParser<FeatureQueryBuilder, Void> PARSER = new ConstructingObjectParser<>(
"feature", args -> {
final String field = (String) args[0];
final float boost = args[1] == null ? DEFAULT_BOOST : (Float) args[1];
final String queryName = (String) args[2];
long numNonNulls = Arrays.stream(args, 3, args.length).filter(Objects::nonNull).count();
final FeatureQueryBuilder query;
if (numNonNulls > 1) {
throw new IllegalArgumentException("Can only specify one of [log], [saturation] and [sigmoid]");
} else if (numNonNulls == 0) {
query = new FeatureQueryBuilder(field, new ScoreFunction.Saturation());
} else {
ScoreFunction scoreFunction = (ScoreFunction) Arrays.stream(args, 3, args.length)
.filter(Objects::nonNull)
.findAny()
.get();
query = new FeatureQueryBuilder(field, scoreFunction);
}
query.boost(boost);
query.queryName(queryName);
return query;
});
static {
PARSER.declareString(ConstructingObjectParser.constructorArg(), new ParseField("field"));
PARSER.declareFloat(ConstructingObjectParser.optionalConstructorArg(), BOOST_FIELD);
PARSER.declareString(ConstructingObjectParser.optionalConstructorArg(), NAME_FIELD);
PARSER.declareObject(ConstructingObjectParser.optionalConstructorArg(),
ScoreFunction.Log.PARSER, new ParseField("log"));
PARSER.declareObject(ConstructingObjectParser.optionalConstructorArg(),
ScoreFunction.Saturation.PARSER, new ParseField("saturation"));
PARSER.declareObject(ConstructingObjectParser.optionalConstructorArg(),
ScoreFunction.Sigmoid.PARSER, new ParseField("sigmoid"));
}
public static final String NAME = "feature";
private final String field;
private final ScoreFunction scoreFunction;
public FeatureQueryBuilder(String field, ScoreFunction scoreFunction) {
this.field = Objects.requireNonNull(field);
this.scoreFunction = Objects.requireNonNull(scoreFunction);
}
public FeatureQueryBuilder(StreamInput in) throws IOException {
super(in);
this.field = in.readString();
this.scoreFunction = readScoreFunction(in);
}
@Override
public String getWriteableName() {
return NAME;
}
@Override
protected void doWriteTo(StreamOutput out) throws IOException {
out.writeString(field);
scoreFunction.writeTo(out);
}
@Override
protected void doXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject(getName());
builder.field("field", field);
scoreFunction.doXContent(builder);
printBoostAndQueryName(builder);
builder.endObject();
}
@Override
protected Query doToQuery(QueryShardContext context) throws IOException {
final MappedFieldType ft = context.fieldMapper(field);
if (ft == null) {
return new MatchNoDocsQuery();
}
if (ft instanceof FeatureFieldType == false) {
throw new IllegalArgumentException("[feature] query only works on [feature] fields, not [" + ft.typeName() + "]");
}
final FeatureFieldType fft = (FeatureFieldType) ft;
return scoreFunction.toQuery(field, fft.positiveScoreImpact());
}
@Override
protected boolean doEquals(FeatureQueryBuilder other) {
return Objects.equals(field, other.field) && Objects.equals(scoreFunction, other.scoreFunction);
}
@Override
protected int doHashCode() {
return Objects.hash(field, scoreFunction);
}
}


@@ -0,0 +1,173 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.mapper;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermFrequencyAttribute;
import org.apache.lucene.document.FeatureField;
import org.apache.lucene.index.IndexableField;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.compress.CompressedXContent;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.IndexService;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.test.ESSingleNodeTestCase;
import org.hamcrest.Matchers;
import org.junit.Before;
import java.io.IOException;
import java.util.Collection;
public class FeatureFieldMapperTests extends ESSingleNodeTestCase {
IndexService indexService;
DocumentMapperParser parser;
@Before
public void setup() {
indexService = createIndex("test");
parser = indexService.mapperService().documentMapperParser();
}
@Override
protected Collection<Class<? extends Plugin>> getPlugins() {
return pluginList(MapperExtrasPlugin.class);
}
private static int getFrequency(TokenStream tk) throws IOException {
TermFrequencyAttribute freqAttribute = tk.addAttribute(TermFrequencyAttribute.class);
tk.reset();
assertTrue(tk.incrementToken());
int freq = freqAttribute.getTermFrequency();
assertFalse(tk.incrementToken());
return freq;
}
public void testDefaults() throws Exception {
String mapping = Strings.toString(XContentFactory.jsonBuilder().startObject().startObject("type")
.startObject("properties").startObject("field").field("type", "feature").endObject().endObject()
.endObject().endObject());
DocumentMapper mapper = parser.parse("type", new CompressedXContent(mapping));
assertEquals(mapping, mapper.mappingSource().toString());
ParsedDocument doc1 = mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
.bytes(XContentFactory.jsonBuilder()
.startObject()
.field("field", 10)
.endObject()),
XContentType.JSON));
IndexableField[] fields = doc1.rootDoc().getFields("_feature");
assertEquals(1, fields.length);
assertThat(fields[0], Matchers.instanceOf(FeatureField.class));
FeatureField featureField1 = (FeatureField) fields[0];
ParsedDocument doc2 = mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
.bytes(XContentFactory.jsonBuilder()
.startObject()
.field("field", 12)
.endObject()),
XContentType.JSON));
FeatureField featureField2 = (FeatureField) doc2.rootDoc().getFields("_feature")[0];
int freq1 = getFrequency(featureField1.tokenStream(null, null));
int freq2 = getFrequency(featureField2.tokenStream(null, null));
assertTrue(freq1 < freq2);
}
public void testNegativeScoreImpact() throws Exception {
String mapping = Strings.toString(XContentFactory.jsonBuilder().startObject().startObject("type")
.startObject("properties").startObject("field").field("type", "feature")
.field("positive_score_impact", false).endObject().endObject()
.endObject().endObject());
DocumentMapper mapper = parser.parse("type", new CompressedXContent(mapping));
assertEquals(mapping, mapper.mappingSource().toString());
ParsedDocument doc1 = mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
.bytes(XContentFactory.jsonBuilder()
.startObject()
.field("field", 10)
.endObject()),
XContentType.JSON));
IndexableField[] fields = doc1.rootDoc().getFields("_feature");
assertEquals(1, fields.length);
assertThat(fields[0], Matchers.instanceOf(FeatureField.class));
FeatureField featureField1 = (FeatureField) fields[0];
ParsedDocument doc2 = mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
.bytes(XContentFactory.jsonBuilder()
.startObject()
.field("field", 12)
.endObject()),
XContentType.JSON));
FeatureField featureField2 = (FeatureField) doc2.rootDoc().getFields("_feature")[0];
int freq1 = getFrequency(featureField1.tokenStream(null, null));
int freq2 = getFrequency(featureField2.tokenStream(null, null));
assertTrue(freq1 > freq2);
}
public void testRejectMultiValuedFields() throws MapperParsingException, IOException {
String mapping = Strings.toString(XContentFactory.jsonBuilder().startObject().startObject("type")
.startObject("properties").startObject("field").field("type", "feature").endObject().startObject("foo")
.startObject("properties").startObject("field").field("type", "feature").endObject().endObject()
.endObject().endObject().endObject().endObject());
DocumentMapper mapper = parser.parse("type", new CompressedXContent(mapping));
assertEquals(mapping, mapper.mappingSource().toString());
MapperParsingException e = expectThrows(MapperParsingException.class,
() -> mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
.bytes(XContentFactory.jsonBuilder()
.startObject()
.field("field", java.util.Arrays.asList(10, 20))
.endObject()),
XContentType.JSON)));
assertEquals("[feature] fields do not support indexing multiple values for the same field [field] in the same document",
e.getCause().getMessage());
e = expectThrows(MapperParsingException.class,
() -> mapper.parse(SourceToParse.source("test", "type", "1", BytesReference
.bytes(XContentFactory.jsonBuilder()
.startObject()
.startArray("foo")
.startObject()
.field("field", 10)
.endObject()
.startObject()
.field("field", 20)
.endObject()
.endArray()
.endObject()),
XContentType.JSON)));
assertEquals("[feature] fields do not support indexing multiple values for the same field [foo.field] in the same document",
e.getCause().getMessage());
}
}
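The tests above only compare relative term frequencies, because the feature value is packed into the frequency in an order-preserving way. For positive IEEE-754 floats the raw bit patterns sort in the same order as the values, so dropping low mantissa bits keeps the ordering; the sketch below shows that idea plus the reciprocal trick that `testNegativeScoreImpact` suggests for negative-impact fields. The exact shift and the inversion strategy are assumptions about implementation details, not the production encoding:

```java
public class FeatureEncodingSketch {

    // Order-preserving sketch of packing a strictly positive float into a term
    // frequency: positive float bit patterns sort like the values themselves,
    // so truncating low mantissa bits preserves ordering. (Lucene's
    // FeatureField uses this idea; the exact shift may differ.)
    static int encode(float featureValue) {
        if (!(featureValue > 0) || !Float.isFinite(featureValue)) {
            throw new IllegalArgumentException("feature values must be positive and finite");
        }
        return Float.floatToIntBits(featureValue) >>> 15;
    }

    // Hypothetical negative-impact variant: index the reciprocal so larger raw
    // values produce smaller frequencies, matching testNegativeScoreImpact.
    static int encode(float featureValue, boolean positiveScoreImpact) {
        return encode(positiveScoreImpact ? featureValue : 1 / featureValue);
    }

    public static void main(String[] args) {
        // testDefaults: 12 must encode to a higher frequency than 10.
        if (!(encode(12f) > encode(10f))) throw new AssertionError();
        // testNegativeScoreImpact: the ordering flips for negative impact.
        if (!(encode(10f, false) > encode(12f, false))) throw new AssertionError();
        System.out.println("ok");
    }
}
```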


@ -0,0 +1,46 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.mapper;
import org.junit.Before;
public class FeatureFieldTypeTests extends FieldTypeTestCase {
@Override
protected MappedFieldType createDefaultFieldType() {
return new FeatureFieldMapper.FeatureFieldType();
}
@Before
public void setupProperties() {
addModifier(new Modifier("positive_score_impact", false) {
@Override
public void modify(MappedFieldType ft) {
FeatureFieldMapper.FeatureFieldType tft = (FeatureFieldMapper.FeatureFieldType)ft;
tft.setPositiveScoreImpact(tft.positiveScoreImpact() == false);
}
@Override
public void normalizeOther(MappedFieldType other) {
super.normalizeOther(other);
((FeatureFieldMapper.FeatureFieldType) other).setPositiveScoreImpact(true);
}
});
}
}


@ -0,0 +1,58 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.mapper;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.compress.CompressedXContent;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.index.IndexService;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.test.ESSingleNodeTestCase;
import org.junit.Before;
import java.util.Collection;
public class FeatureMetaFieldMapperTests extends ESSingleNodeTestCase {
IndexService indexService;
DocumentMapperParser parser;
@Before
public void setup() {
indexService = createIndex("test");
parser = indexService.mapperService().documentMapperParser();
}
@Override
protected Collection<Class<? extends Plugin>> getPlugins() {
return pluginList(MapperExtrasPlugin.class);
}
public void testBasics() throws Exception {
String mapping = Strings.toString(XContentFactory.jsonBuilder().startObject().startObject("type")
.startObject("properties").startObject("field").field("type", "feature").endObject().endObject()
.endObject().endObject());
DocumentMapper mapper = parser.parse("type", new CompressedXContent(mapping));
assertEquals(mapping, mapper.mappingSource().toString());
assertNotNull(mapper.metadataMapper(FeatureMetaFieldMapper.class));
}
}


@ -0,0 +1,29 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.mapper;
public class FeatureMetaFieldTypeTests extends FieldTypeTestCase {
@Override
protected MappedFieldType createDefaultFieldType() {
return new FeatureMetaFieldMapper.FeatureMetaFieldType();
}
}


@ -0,0 +1,130 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.query;
import org.apache.lucene.document.FeatureField;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.compress.CompressedXContent;
import org.elasticsearch.index.mapper.MapperExtrasPlugin;
import org.elasticsearch.index.mapper.MapperService;
import org.elasticsearch.index.query.FeatureQueryBuilder.ScoreFunction;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.search.internal.SearchContext;
import org.elasticsearch.test.AbstractQueryTestCase;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import static org.hamcrest.CoreMatchers.instanceOf;
import static org.hamcrest.Matchers.either;
public class FeatureQueryBuilderTests extends AbstractQueryTestCase<FeatureQueryBuilder> {
@Override
protected void initializeAdditionalMappings(MapperService mapperService) throws IOException {
for (String type : getCurrentTypes()) {
mapperService.merge(type, new CompressedXContent(Strings.toString(PutMappingRequest.buildFromSimplifiedDef(type,
"my_feature_field", "type=feature",
"my_negative_feature_field", "type=feature,positive_score_impact=false"))), MapperService.MergeReason.MAPPING_UPDATE);
}
}
@Override
protected Collection<Class<? extends Plugin>> getPlugins() {
return Collections.singleton(MapperExtrasPlugin.class);
}
@Override
protected FeatureQueryBuilder doCreateTestQueryBuilder() {
ScoreFunction function;
switch (random().nextInt(3)) {
case 0:
function = new ScoreFunction.Log(1 + randomFloat());
break;
case 1:
if (randomBoolean()) {
function = new ScoreFunction.Saturation();
} else {
function = new ScoreFunction.Saturation(randomFloat());
}
break;
case 2:
function = new ScoreFunction.Sigmoid(randomFloat(), randomFloat());
break;
default:
throw new AssertionError();
}
return new FeatureQueryBuilder("my_feature_field", function);
}
@Override
protected void doAssertLuceneQuery(FeatureQueryBuilder queryBuilder, Query query, SearchContext context) throws IOException {
Class<?> expectedClass = FeatureField.newSaturationQuery("", "", 1, 1).getClass();
assertThat(query, either(instanceOf(MatchNoDocsQuery.class)).or(instanceOf(expectedClass)));
}
@Override
@AwaitsFix(bugUrl="https://github.com/elastic/elasticsearch/issues/30605")
public void testUnknownField() {
super.testUnknownField();
}
public void testDefaultScoreFunction() throws IOException {
assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0);
String query = "{\n" +
" \"feature\" : {\n" +
" \"field\": \"my_feature_field\"\n" +
" }\n" +
"}";
Query parsedQuery = parseQuery(query).toQuery(createShardContext());
assertEquals(FeatureField.newSaturationQuery("_feature", "my_feature_field"), parsedQuery);
}
public void testIllegalField() throws IOException {
assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0);
String query = "{\n" +
" \"feature\" : {\n" +
" \"field\": \"" + STRING_FIELD_NAME + "\"\n" +
" }\n" +
"}";
IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> parseQuery(query).toQuery(createShardContext()));
assertEquals("[feature] query only works on [feature] fields, not [text]", e.getMessage());
}
public void testIllegalCombination() throws IOException {
assumeTrue("test runs only when at least a type is registered", getCurrentTypes().length > 0);
String query = "{\n" +
" \"feature\" : {\n" +
" \"field\": \"my_negative_feature_field\",\n" +
" \"log\" : {\n" +
" \"scaling_factor\": 4.5\n" +
" }\n" +
" }\n" +
"}";
IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> parseQuery(query).toQuery(createShardContext()));
assertEquals(
"Cannot use the [log] function with a field that has a negative score impact as it would trigger negative scores",
e.getMessage());
}
}


@ -0,0 +1,160 @@
setup:
- skip:
version: " - 6.99.99"
reason: "The feature field/query was introduced in 7.0.0"
- do:
indices.create:
index: test
body:
settings:
number_of_replicas: 0
mappings:
_doc:
properties:
pagerank:
type: feature
url_length:
type: feature
positive_score_impact: false
- do:
index:
index: test
type: _doc
id: 1
body:
pagerank: 10
url_length: 50
- do:
index:
index: test
type: _doc
id: 2
body:
pagerank: 100
url_length: 20
- do:
indices.refresh: {}
---
"Positive log":
- do:
search:
body:
query:
feature:
field: pagerank
log:
scaling_factor: 3
- match:
hits.total: 2
- match:
hits.hits.0._id: "2"
- match:
hits.hits.1._id: "1"
---
"Positive saturation":
- do:
search:
body:
query:
feature:
field: pagerank
saturation:
pivot: 20
- match:
hits.total: 2
- match:
hits.hits.0._id: "2"
- match:
hits.hits.1._id: "1"
---
"Positive sigmoid":
- do:
search:
body:
query:
feature:
field: pagerank
sigmoid:
pivot: 20
exponent: 0.6
- match:
hits.total: 2
- match:
hits.hits.0._id: "2"
- match:
hits.hits.1._id: "1"
---
"Negative log":
- do:
catch: bad_request
search:
body:
query:
feature:
field: url_length
log:
scaling_factor: 3
---
"Negative saturation":
- do:
search:
body:
query:
feature:
field: url_length
saturation:
pivot: 20
- match:
hits.total: 2
- match:
hits.hits.0._id: "2"
- match:
hits.hits.1._id: "1"
---
"Negative sigmoid":
- do:
search:
body:
query:
feature:
field: url_length
sigmoid:
pivot: 20
exponent: 0.6
- match:
hits.total: 2
- match:
hits.hits.0._id: "2"
- match:
hits.hits.1._id: "1"
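Every passing search in the REST tests above expects document 2 first. A quick back-of-envelope check with the saturation formula (weight taken as 1) confirms that ordering for both fields, under the assumption — mirroring the mapper tests — that a negative-impact field indexes the reciprocal of the value:

```java
public class FeatureYamlOrderingSketch {

    // saturation score with weight 1: S / (S + pivot)
    static double saturation(double s, double pivot) {
        return s / (s + pivot);
    }

    public static void main(String[] args) {
        // Positive field: pagerank 100 (doc 2) vs 10 (doc 1), pivot 20.
        double doc2 = saturation(100, 20); // 100/120
        double doc1 = saturation(10, 20);  // 10/30
        if (!(doc2 > doc1)) throw new AssertionError();

        // Negative-impact field: url_length 20 (doc 2) vs 50 (doc 1), assuming
        // the mapper indexes 1/value so shorter URLs score higher.
        double doc2Neg = saturation(1.0 / 20, 20);
        double doc1Neg = saturation(1.0 / 50, 20);
        if (!(doc2Neg > doc1Neg)) throw new AssertionError();

        System.out.println("doc 2 ranks first in both cases");
    }
}
```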