Aggs: Make it possible to configure missing values.

Most aggregations (terms, histogram, stats, percentiles, geohash-grid) now
support a new `missing` option which defines the value to use when a
field does not have a value. This can be handy if, for example, you want a
terms aggregation to treat documents that have "N/A" in a `tag` field and
documents that have no value for that field the same way.

This works in a very similar way to the `missing` option on the `sort`
element.

One known issue is that this option sometimes cannot make the right decision
in the unmapped case: it needs to replace all values with the `missing` value
but might not know what kind of values source to produce (numerics,
strings, geo points?). For this reason, we might want to add an `unmapped_type`
option in the future, like we did for sorting.
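Conceptually, `missing` just substitutes a single value for documents whose value set is empty, before the values reach the aggregator. A minimal Python sketch of that semantics (a toy model, not Elasticsearch code):

```python
# Toy model of the `missing` option: when a document has no values for the
# field, the aggregation sees the configured missing value instead of
# skipping the document.

def values_with_missing(doc_values, missing):
    """Return the values an aggregation sees for one document.

    doc_values: list of the document's field values (may be empty).
    missing:    the configured `missing` value, or None if unset.
    """
    if doc_values:
        return doc_values
    if missing is not None:
        return [missing]
    return []  # default behavior: the document is ignored

def terms_counts(docs, missing=None):
    """Toy terms aggregation over per-document value lists."""
    counts = {}
    for doc_values in docs:
        for v in values_with_missing(doc_values, missing):
            counts[v] = counts.get(v, 0) + 1
    return counts
```

With `missing: "N/A"` on a terms aggregation, a document with no `tag` value and a document tagged `"N/A"` end up in the same bucket.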

Related to #5324
Adrien Grand 2015-05-07 16:46:40 +02:00
parent 66921ffa50
commit 32e23b9100
25 changed files with 1311 additions and 48 deletions


@@ -123,3 +123,26 @@ settings and filter the returned buckets based on a `min_doc_count` setting (by
bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds`
setting, which enables extending the bounds of the histogram beyond the data itself (to read more on why you'd want to
do that please refer to the explanation <<search-aggregations-bucket-histogram-aggregation-extended-bounds,here>>).
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"publish_date" : {
"date_histogram" : {
"field" : "publish_date",
"interval": "year",
"missing": "2000-01-01" <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `publish_date` field will fall into the same bucket as documents that have the value `2000-01-01`.


@@ -317,3 +317,26 @@ Response:
}
}
--------------------------------------------------
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"quantity" : {
"histogram" : {
"field" : "quantity",
"interval": 10,
"missing": 0 <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `quantity` field will fall into the same bucket as documents that have the value `0`.


@@ -655,3 +655,25 @@ in inner aggregations.
<1> experimental[] the possible values are `map`, `global_ordinals`, `global_ordinals_hash` and `global_ordinals_low_cardinality`
Please note that Elasticsearch will ignore this execution hint if it is not applicable and that there is no backward compatibility guarantee on these hints.
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"tags" : {
"terms" : {
"field" : "tags",
"missing": "N/A" <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `tags` field will fall into the same bucket as documents that have the value `N/A`.


@@ -72,4 +72,26 @@ It turned out that the exam was way above the level of the students and a grade
}
}
}
--------------------------------------------------
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"grade_avg" : {
"avg" : {
"field" : "grade",
"missing": 10 <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `grade` field will fall into the same bucket as documents that have the value `10`.


@@ -155,3 +155,24 @@ however since hashes need to be computed on the fly.
TIP: The `script` parameter expects an inline script. Use `script_id` for indexed scripts and `script_file` for scripts in the `config/scripts/` directory.
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"tag_cardinality" : {
"cardinality" : {
"field" : "tag",
"missing": "N/A" <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `tag` field will fall into the same bucket as documents that have the value `N/A`.


@@ -116,4 +116,26 @@ It turned out that the exam was way above the level of the students and a grade
}
}
}
--------------------------------------------------
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"grades_stats" : {
"extended_stats" : {
"field" : "grade",
"missing": 0 <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `grade` field will fall into the same bucket as documents that have the value `0`.


@@ -67,3 +67,24 @@ Let's say that the prices of the documents in our index are in USD, but we would
}
--------------------------------------------------
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"grade_max" : {
"max" : {
"field" : "grade",
"missing": 10 <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `grade` field will fall into the same bucket as documents that have the value `10`.


@@ -66,3 +66,25 @@ Let's say that the prices of the documents in our index are in USD, but we would
}
}
--------------------------------------------------
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"grade_min" : {
"min" : {
"field" : "grade",
"missing": 10 <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `grade` field will fall into the same bucket as documents that have the value `10`.


@@ -190,3 +190,25 @@ A "node" uses roughly 32 bytes of memory, so under worst-case scenarios (large a
of data which arrives sorted and in-order) the default settings will produce a
TDigest roughly 64KB in size. In practice data tends to be more random and
the TDigest will use less memory.
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"grade_percentiles" : {
"percentiles" : {
"field" : "grade",
"missing": 10 <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `grade` field will fall into the same bucket as documents that have the value `10`.


@@ -86,3 +86,25 @@ script to generate values which percentile ranks are calculated on
<2> Scripting supports parameterized input just like any other script
TIP: The `script` parameter expects an inline script. Use `script_id` for indexed scripts and `script_file` for scripts in the `config/scripts/` directory.
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"grade_ranks" : {
"percentile_ranks" : {
"field" : "grade",
"missing": 10 <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `grade` field will fall into the same bucket as documents that have the value `10`.


@@ -78,4 +78,26 @@ It turned out that the exam was way above the level of the students and a grade
}
}
}
--------------------------------------------------
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"grades_stats" : {
"stats" : {
"field" : "grade",
"missing": 0 <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `grade` field will fall into the same bucket as documents that have the value `0`.


@@ -77,3 +77,25 @@ Computing the sum of squares over all stock tick changes:
}
}
--------------------------------------------------
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.
[source,js]
--------------------------------------------------
{
"aggs" : {
"total_time" : {
"sum" : {
"field" : "took",
"missing": 100 <1>
}
}
}
}
--------------------------------------------------
<1> Documents without a value in the `took` field will fall into the same bucket as documents that have the value `100`.


@@ -409,19 +409,24 @@ public class GeoUtils {
return point.reset(lat, lon);
} else if(parser.currentToken() == Token.VALUE_STRING) {
String data = parser.text();
int comma = data.indexOf(',');
if(comma > 0) {
lat = Double.parseDouble(data.substring(0, comma).trim());
lon = Double.parseDouble(data.substring(comma + 1).trim());
return point.reset(lat, lon);
} else {
return point.resetFromGeoHash(data);
}
return parseGeoPoint(data, point);
} else {
throw new ElasticsearchParseException("geo_point expected");
}
}
/** parse a {@link GeoPoint} from a String */
public static GeoPoint parseGeoPoint(String data, GeoPoint point) {
int comma = data.indexOf(',');
if(comma > 0) {
double lat = Double.parseDouble(data.substring(0, comma).trim());
double lon = Double.parseDouble(data.substring(comma + 1).trim());
return point.reset(lat, lon);
} else {
return point.resetFromGeoHash(data);
}
}
private GeoUtils() {
}
}
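The extracted `parseGeoPoint(String, GeoPoint)` helper accepts either a `"lat,lon"` pair or a geohash. A quick Python model of the string branch (a sketch, not Elasticsearch code; geohash decoding is stubbed out, since the real code delegates to `GeoPoint.resetFromGeoHash`):

```python
# Toy model of GeoUtils.parseGeoPoint's string handling:
# a comma means "lat,lon"; anything else is treated as a geohash.

def parse_geo_point(data):
    comma = data.find(',')
    if comma > 0:
        lat = float(data[:comma].strip())
        lon = float(data[comma + 1:].strip())
        return (lat, lon)
    # Geohash decoding omitted; the real implementation calls
    # point.resetFromGeoHash(data) here.
    raise NotImplementedError("geohash decoding not modeled here")
```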


@@ -34,6 +34,7 @@ public abstract class ValuesSourceAggregationBuilder<B extends ValuesSourceAggre
private String script;
private String lang;
private Map<String, Object> params;
private Object missing;
/**
* Constructs a new builder.
@@ -117,6 +118,14 @@ public abstract class ValuesSourceAggregationBuilder<B extends ValuesSourceAggre
return (B) this;
}
/**
* Configure the value to use when documents miss a value.
*/
public B missing(Object missingValue) {
this.missing = missingValue;
return (B) this;
}
@Override
protected final XContentBuilder internalXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject();
@@ -132,6 +141,9 @@ public abstract class ValuesSourceAggregationBuilder<B extends ValuesSourceAggre
if (this.params != null) {
builder.field("params").map(this.params);
}
if (missing != null) {
builder.field("missing", missing);
}
doInternalXContent(builder, params);
return builder.endObject();


@@ -58,7 +58,7 @@ import java.util.Map;
*/
public class GlobalOrdinalsStringTermsAggregator extends AbstractStringTermsAggregator {
protected final ValuesSource.Bytes.WithOrdinals.FieldData valuesSource;
protected final ValuesSource.Bytes.WithOrdinals valuesSource;
protected final IncludeExclude.OrdinalsFilter includeExclude;
// TODO: cache the acceptedglobalValues per aggregation definition.
@@ -71,7 +71,7 @@ public class GlobalOrdinalsStringTermsAggregator extends AbstractStringTermsAggr
protected RandomAccessOrds globalOrds;
public GlobalOrdinalsStringTermsAggregator(String name, AggregatorFactories factories, ValuesSource.Bytes.WithOrdinals.FieldData valuesSource,
public GlobalOrdinalsStringTermsAggregator(String name, AggregatorFactories factories, ValuesSource.Bytes.WithOrdinals valuesSource,
Terms.Order order, BucketCountThresholds bucketCountThresholds,
IncludeExclude.OrdinalsFilter includeExclude, AggregationContext aggregationContext, Aggregator parent, SubAggCollectionMode collectionMode, boolean showTermDocCountError, List<Reducer> reducers, Map<String, Object> metaData) throws IOException {
super(name, factories, aggregationContext, parent, order, bucketCountThresholds, collectionMode, showTermDocCountError, reducers,
@@ -334,7 +334,7 @@ public class GlobalOrdinalsStringTermsAggregator extends AbstractStringTermsAggr
private RandomAccessOrds segmentOrds;
public LowCardinality(String name, AggregatorFactories factories, ValuesSource.Bytes.WithOrdinals.FieldData valuesSource,
public LowCardinality(String name, AggregatorFactories factories, ValuesSource.Bytes.WithOrdinals valuesSource,
Terms.Order order, BucketCountThresholds bucketCountThresholds, AggregationContext aggregationContext, Aggregator parent, SubAggCollectionMode collectionMode, boolean showTermDocCountError, List<Reducer> reducers, Map<String, Object> metaData) throws IOException {
super(name, factories, valuesSource, order, bucketCountThresholds, null, aggregationContext, parent, collectionMode, showTermDocCountError, reducers, metaData);
assert factories == null || factories.count() == 0;


@@ -69,7 +69,7 @@ public class TermsAggregatorFactory extends ValuesSourceAggregatorFactory<Values
Terms.Order order, TermsAggregator.BucketCountThresholds bucketCountThresholds, IncludeExclude includeExclude,
AggregationContext aggregationContext, Aggregator parent, SubAggCollectionMode subAggCollectMode, boolean showTermDocCountError, List<Reducer> reducers, Map<String, Object> metaData) throws IOException {
final IncludeExclude.OrdinalsFilter filter = includeExclude == null ? null : includeExclude.convertToOrdinalsFilter();
return new GlobalOrdinalsStringTermsAggregator(name, factories, (ValuesSource.Bytes.WithOrdinals.FieldData) valuesSource, order, bucketCountThresholds, filter, aggregationContext, parent, subAggCollectMode, showTermDocCountError, reducers, metaData);
return new GlobalOrdinalsStringTermsAggregator(name, factories, (ValuesSource.Bytes.WithOrdinals) valuesSource, order, bucketCountThresholds, filter, aggregationContext, parent, subAggCollectMode, showTermDocCountError, reducers, metaData);
}
@Override
@@ -103,7 +103,7 @@ public class TermsAggregatorFactory extends ValuesSourceAggregatorFactory<Values
if (includeExclude != null || factories.count() > 0) {
return GLOBAL_ORDINALS.create(name, factories, valuesSource, order, bucketCountThresholds, includeExclude, aggregationContext, parent, subAggCollectMode, showTermDocCountError, reducers, metaData);
}
return new GlobalOrdinalsStringTermsAggregator.LowCardinality(name, factories, (ValuesSource.Bytes.WithOrdinals.FieldData) valuesSource, order, bucketCountThresholds, aggregationContext, parent, subAggCollectMode, showTermDocCountError, reducers, metaData);
return new GlobalOrdinalsStringTermsAggregator.LowCardinality(name, factories, (ValuesSource.Bytes.WithOrdinals) valuesSource, order, bucketCountThresholds, aggregationContext, parent, subAggCollectMode, showTermDocCountError, reducers, metaData);
}
@Override


@@ -36,6 +36,7 @@ public abstract class ValuesSourceMetricsAggregationBuilder<B extends ValuesSour
private String lang;
private String format;
private Map<String, Object> params;
private Object missing;
protected ValuesSourceMetricsAggregationBuilder(String name, String type) {
super(name, type);
@@ -84,6 +85,14 @@ public abstract class ValuesSourceMetricsAggregationBuilder<B extends ValuesSour
return (B) this;
}
/**
* Configure the value to use when documents miss a value.
*/
public B missing(Object missingValue) {
this.missing = missingValue;
return (B) this;
}
@Override
protected void internalXContent(XContentBuilder builder, Params params) throws IOException {
if (field != null) {
@@ -105,5 +114,9 @@ public abstract class ValuesSourceMetricsAggregationBuilder<B extends ValuesSour
if (this.params != null && !this.params.isEmpty()) {
builder.field("params").map(this.params);
}
if (missing != null) {
builder.field("missing", missing);
}
}
}


@@ -18,17 +18,24 @@
*/
package org.elasticsearch.search.aggregations.support;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.cache.recycler.PageCacheRecycler;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.common.geo.GeoPoint;
import org.elasticsearch.common.geo.GeoUtils;
import org.elasticsearch.common.util.BigArrays;
import org.elasticsearch.index.fielddata.IndexFieldData;
import org.elasticsearch.index.fielddata.IndexGeoPointFieldData;
import org.elasticsearch.index.fielddata.IndexNumericFieldData;
import org.elasticsearch.index.fielddata.IndexOrdinalsFieldData;
import org.elasticsearch.index.fielddata.plain.ParentChildIndexFieldData;
import org.elasticsearch.index.mapper.core.DateFieldMapper;
import org.elasticsearch.search.SearchParseException;
import org.elasticsearch.search.aggregations.AggregationExecutionException;
import org.elasticsearch.search.internal.SearchContext;
import java.io.IOException;
import java.util.Date;
/**
*
@@ -53,11 +60,78 @@ public class AggregationContext {
return searchContext.bigArrays();
}
/** Get a value source given its configuration and the depth of the aggregator in the aggregation tree. */
public <VS extends ValuesSource> VS valuesSource(ValuesSourceConfig<VS> config) throws IOException {
/** Get a value source given its configuration. A return value of null indicates that
* no value source could be built. */
@Nullable
public <VS extends ValuesSource> VS valuesSource(ValuesSourceConfig<VS> config, SearchContext context) throws IOException {
assert config.valid() : "value source config is invalid - must have either a field context or a script or marked as unmapped";
assert !config.unmapped : "value source should not be created for unmapped fields";
final VS vs;
if (config.unmapped) {
if (config.missing == null) {
// otherwise we will have values because of the missing value
vs = null;
} else if (ValuesSource.Numeric.class.isAssignableFrom(config.valueSourceType)) {
vs = (VS) ValuesSource.Numeric.EMPTY;
} else if (ValuesSource.GeoPoint.class.isAssignableFrom(config.valueSourceType)) {
vs = (VS) ValuesSource.GeoPoint.EMPTY;
} else if (ValuesSource.class.isAssignableFrom(config.valueSourceType)
|| ValuesSource.Bytes.class.isAssignableFrom(config.valueSourceType)
|| ValuesSource.Bytes.WithOrdinals.class.isAssignableFrom(config.valueSourceType)) {
vs = (VS) ValuesSource.Bytes.EMPTY;
} else {
throw new SearchParseException(searchContext, "Can't deal with unmapped ValuesSource type " + config.valueSourceType, null);
}
} else {
vs = originalValuesSource(config);
}
if (config.missing == null) {
return vs;
}
if (vs instanceof ValuesSource.Bytes) {
final BytesRef missing = new BytesRef(config.missing.toString());
if (vs instanceof ValuesSource.Bytes.WithOrdinals) {
return (VS) MissingValues.replaceMissing((ValuesSource.Bytes.WithOrdinals) vs, missing);
} else {
return (VS) MissingValues.replaceMissing((ValuesSource.Bytes) vs, missing);
}
} else if (vs instanceof ValuesSource.Numeric) {
Number missing = null;
if (config.missing instanceof Number) {
missing = (Number) config.missing;
} else {
if (config.fieldContext != null && config.fieldContext.mapper() instanceof DateFieldMapper) {
final DateFieldMapper mapper = (DateFieldMapper) config.fieldContext.mapper();
try {
missing = mapper.dateTimeFormatter().parser().parseDateTime(config.missing.toString()).getMillis();
} catch (IllegalArgumentException e) {
throw new SearchParseException(context, "Expected a date value in [missing] but got [" + config.missing + "]", null, e);
}
} else {
try {
missing = Double.parseDouble(config.missing.toString());
} catch (NumberFormatException e) {
throw new SearchParseException(context, "Expected a numeric value in [missing] but got [" + config.missing + "]", null, e);
}
}
}
return (VS) MissingValues.replaceMissing((ValuesSource.Numeric) vs, missing);
} else if (vs instanceof ValuesSource.GeoPoint) {
// TODO: also support the structured formats of geo points
final GeoPoint missing = GeoUtils.parseGeoPoint(config.missing.toString(), new GeoPoint());
return (VS) MissingValues.replaceMissing((ValuesSource.GeoPoint) vs, missing);
} else {
// Should not happen
throw new SearchParseException(searchContext, "Can't apply missing values on a " + vs.getClass(), null);
}
}
/**
* Return the original values source, before we apply `missing`.
*/
private <VS extends ValuesSource> VS originalValuesSource(ValuesSourceConfig<VS> config) throws IOException {
if (config.fieldContext == null) {
if (ValuesSource.Numeric.class.isAssignableFrom(config.valueSourceType)) {
return (VS) numericScript(config);
@@ -111,7 +185,7 @@ public class AggregationContext {
}
private ValuesSource.GeoPoint geoPointField(ValuesSourceConfig<?> config) throws IOException {
return new ValuesSource.GeoPoint((IndexGeoPointFieldData) config.fieldContext.indexFieldData());
return new ValuesSource.GeoPoint.Fielddata((IndexGeoPointFieldData) config.fieldContext.indexFieldData());
}
}
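For numeric sources, `AggregationContext.valuesSource` coerces the `missing` value as follows: a `Number` passes through, a date field parses it with the field's date formatter, and anything else goes through `Double.parseDouble`. A rough Python analogue (a sketch only; the hard-coded `"%Y-%m-%d"` format is a stand-in for the mapper's configurable `dateTimeFormatter`):

```python
from datetime import datetime, timezone

def coerce_numeric_missing(missing, is_date_field=False):
    """Toy version of the numeric-`missing` coercion (not Elasticsearch code)."""
    if isinstance(missing, (int, float)) and not isinstance(missing, bool):
        return missing  # a Number passes through unchanged
    if is_date_field:
        # Stand-in for the field mapper's date formatter:
        # parse "yyyy-MM-dd" into milliseconds since the epoch.
        dt = datetime.strptime(str(missing), "%Y-%m-%d").replace(tzinfo=timezone.utc)
        return int(dt.timestamp() * 1000)
    try:
        return float(missing)
    except ValueError:
        raise ValueError(f"Expected a numeric value in [missing] but got [{missing}]")
```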


@@ -0,0 +1,337 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.search.aggregations.support;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.RandomAccessOrds;
import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.common.geo.GeoPoint;
import org.elasticsearch.index.fielddata.AbstractRandomAccessOrds;
import org.elasticsearch.index.fielddata.MultiGeoPointValues;
import org.elasticsearch.index.fielddata.SortedBinaryDocValues;
import org.elasticsearch.index.fielddata.SortedNumericDoubleValues;
import java.io.IOException;
/**
* Utility class that allows to return views of {@link ValuesSource}s that
* replace the missing value with a configured value.
*/
public enum MissingValues {
;
// TODO: we could specialize the single value case
public static ValuesSource.Bytes replaceMissing(final ValuesSource.Bytes valuesSource, final BytesRef missing) {
return new ValuesSource.Bytes() {
@Override
public SortedBinaryDocValues bytesValues(LeafReaderContext context) throws IOException {
SortedBinaryDocValues values = valuesSource.bytesValues(context);
return replaceMissing(values, missing);
}
};
}
static SortedBinaryDocValues replaceMissing(final SortedBinaryDocValues values, final BytesRef missing) {
return new SortedBinaryDocValues() {
private int count;
@Override
public BytesRef valueAt(int index) {
if (count > 0) {
return values.valueAt(index);
} else if (index == 0) {
return missing;
} else {
throw new IndexOutOfBoundsException();
}
}
@Override
public void setDocument(int docId) {
values.setDocument(docId);
count = values.count();
}
@Override
public int count() {
return count == 0 ? 1 : count;
}
};
}
public static ValuesSource.Numeric replaceMissing(final ValuesSource.Numeric valuesSource, final Number missing) {
final boolean missingIsFloat = missing.longValue() != (long) missing.doubleValue();
final boolean isFloatingPoint = valuesSource.isFloatingPoint() || missingIsFloat;
return new ValuesSource.Numeric() {
@Override
public boolean isFloatingPoint() {
return isFloatingPoint;
}
@Override
public SortedBinaryDocValues bytesValues(LeafReaderContext context) throws IOException {
return replaceMissing(valuesSource.bytesValues(context), new BytesRef(missing.toString()));
}
@Override
public SortedNumericDocValues longValues(LeafReaderContext context) throws IOException {
final SortedNumericDocValues values = valuesSource.longValues(context);
return replaceMissing(values, missing.longValue());
}
@Override
public SortedNumericDoubleValues doubleValues(LeafReaderContext context) throws IOException {
final SortedNumericDoubleValues values = valuesSource.doubleValues(context);
return replaceMissing(values, missing.doubleValue());
}
};
}
static SortedNumericDocValues replaceMissing(final SortedNumericDocValues values, final long missing) {
return new SortedNumericDocValues() {
private int count;
@Override
public void setDocument(int doc) {
values.setDocument(doc);
count = values.count();
}
@Override
public long valueAt(int index) {
if (count > 0) {
return values.valueAt(index);
} else if (index == 0) {
return missing;
} else {
throw new IndexOutOfBoundsException();
}
}
@Override
public int count() {
return count == 0 ? 1 : count;
}
};
}
static SortedNumericDoubleValues replaceMissing(final SortedNumericDoubleValues values, final double missing) {
return new SortedNumericDoubleValues() {
private int count;
@Override
public void setDocument(int doc) {
values.setDocument(doc);
count = values.count();
}
@Override
public double valueAt(int index) {
if (count > 0) {
return values.valueAt(index);
} else if (index == 0) {
return missing;
} else {
throw new IndexOutOfBoundsException();
}
}
@Override
public int count() {
return count == 0 ? 1 : count;
}
};
}
public static ValuesSource.Bytes replaceMissing(final ValuesSource.Bytes.WithOrdinals valuesSource, final BytesRef missing) {
return new ValuesSource.Bytes.WithOrdinals() {
@Override
public SortedBinaryDocValues bytesValues(LeafReaderContext context) throws IOException {
SortedBinaryDocValues values = valuesSource.bytesValues(context);
return replaceMissing(values, missing);
}
@Override
public RandomAccessOrds ordinalsValues(LeafReaderContext context) {
RandomAccessOrds values = valuesSource.ordinalsValues(context);
return replaceMissing(values, missing);
}
@Override
public RandomAccessOrds globalOrdinalsValues(LeafReaderContext context) {
RandomAccessOrds values = valuesSource.globalOrdinalsValues(context);
return replaceMissing(values, missing);
}
};
}
static RandomAccessOrds replaceMissing(final RandomAccessOrds values, final BytesRef missing) {
final long missingOrd = values.lookupTerm(missing);
if (missingOrd >= 0) {
// The value already exists
return replaceMissingOrd(values, missingOrd);
} else {
final long insertedOrd = -1 - missingOrd;
return insertOrd(values, insertedOrd, missing);
}
}
static RandomAccessOrds replaceMissingOrd(final RandomAccessOrds values, final long missingOrd) {
return new AbstractRandomAccessOrds() {
private int cardinality = 0;
@Override
public void doSetDocument(int docID) {
values.setDocument(docID);
cardinality = values.cardinality();
}
@Override
public BytesRef lookupOrd(long ord) {
return values.lookupOrd(ord);
}
@Override
public long getValueCount() {
return values.getValueCount();
}
@Override
public long ordAt(int index) {
if (cardinality > 0) {
return values.ordAt(index);
} else if (index == 0) {
return missingOrd;
} else {
throw new IndexOutOfBoundsException();
}
}
@Override
public int cardinality() {
return cardinality == 0 ? 1 : cardinality;
}
};
}
static RandomAccessOrds insertOrd(final RandomAccessOrds values, final long insertedOrd, final BytesRef missingValue) {
return new AbstractRandomAccessOrds() {
private int cardinality = 0;
@Override
public void doSetDocument(int docID) {
values.setDocument(docID);
cardinality = values.cardinality();
}
@Override
public BytesRef lookupOrd(long ord) {
if (ord < insertedOrd) {
return values.lookupOrd(ord);
} else if (ord > insertedOrd) {
return values.lookupOrd(ord - 1);
} else {
return missingValue;
}
}
@Override
public long getValueCount() {
return 1 + values.getValueCount();
}
@Override
public long ordAt(int index) {
if (cardinality > 0) {
final long ord = values.ordAt(index);
if (ord < insertedOrd) {
return ord;
} else {
return ord + 1;
}
} else if (index == 0) {
return insertedOrd;
} else {
throw new IndexOutOfBoundsException();
}
}
@Override
public int cardinality() {
return cardinality == 0 ? 1 : cardinality;
}
};
}
public static ValuesSource.GeoPoint replaceMissing(final ValuesSource.GeoPoint valuesSource, final GeoPoint missing) {
return new ValuesSource.GeoPoint() {
@Override
public SortedBinaryDocValues bytesValues(LeafReaderContext context) throws IOException {
return replaceMissing(valuesSource.bytesValues(context), new BytesRef(missing.toString()));
}
@Override
public MultiGeoPointValues geoPointValues(LeafReaderContext context) {
final MultiGeoPointValues values = valuesSource.geoPointValues(context);
return replaceMissing(values, missing);
}
};
}
static MultiGeoPointValues replaceMissing(final MultiGeoPointValues values, final GeoPoint missing) {
return new MultiGeoPointValues() {
private int count;
@Override
public GeoPoint valueAt(int index) {
if (count > 0) {
return values.valueAt(index);
} else if (index == 0) {
return missing;
} else {
throw new IndexOutOfBoundsException();
}
}
@Override
public void setDocument(int docId) {
values.setDocument(docId);
count = values.count();
}
@Override
public int count() {
return count == 0 ? 1 : count;
}
};
}
}
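The trickiest part of `MissingValues` is the ordinals case: `lookupTerm` returns either the existing ordinal of the missing term, or `-1 - insertionPoint` when it is absent, in which case `insertOrd` splices in a new ordinal and shifts every ordinal at or above it up by one. A rough Python model of that remapping (simplified: the term dictionary is a plain sorted list rather than Lucene doc values):

```python
import bisect

def lookup_term(sorted_terms, term):
    """Model of RandomAccessOrds.lookupTerm: the term's ordinal if
    present, else -1 - insertion_point."""
    i = bisect.bisect_left(sorted_terms, term)
    if i < len(sorted_terms) and sorted_terms[i] == term:
        return i
    return -1 - i

def insert_ord_view(sorted_terms, missing):
    """Return (lookup_ord, value_count) for a view of the dictionary
    with `missing` present, mirroring replaceMissing/insertOrd."""
    ord_ = lookup_term(sorted_terms, missing)
    if ord_ >= 0:
        # The value already exists: reuse its ordinal, dictionary unchanged.
        return sorted_terms.__getitem__, len(sorted_terms)
    inserted = -1 - ord_
    def lookup_ord(o):
        if o < inserted:
            return sorted_terms[o]       # ordinals below the insert are unchanged
        if o > inserted:
            return sorted_terms[o - 1]   # ordinals above it shift up by one
        return missing                   # the inserted ordinal maps to `missing`
    return lookup_ord, len(sorted_terms) + 1
```

For the dictionary `["a", "c"]` with `missing` = `"b"`, the view has three ordinals mapping to `"a"`, `"b"`, `"c"`; with `missing` = `"c"`, the existing ordinal is reused and the count stays at two.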


@@ -68,6 +68,25 @@ public abstract class ValuesSource {
public static abstract class Bytes extends ValuesSource {
public static final WithOrdinals EMPTY = new WithOrdinals() {
@Override
public RandomAccessOrds ordinalsValues(LeafReaderContext context) {
return DocValues.emptySortedSet();
}
@Override
public RandomAccessOrds globalOrdinalsValues(LeafReaderContext context) {
return DocValues.emptySortedSet();
}
@Override
public SortedBinaryDocValues bytesValues(LeafReaderContext context) throws IOException {
return org.elasticsearch.index.fielddata.FieldData.emptySortedBinary(context.reader().maxDoc());
}
};
@Override
public Bits docsWithValue(LeafReaderContext context) throws IOException {
final SortedBinaryDocValues bytes = bytesValues(context);
@@ -94,7 +113,16 @@ public abstract class ValuesSource {
public abstract RandomAccessOrds globalOrdinalsValues(LeafReaderContext context);
public abstract long globalMaxOrd(IndexSearcher indexSearcher);
public long globalMaxOrd(IndexSearcher indexSearcher) {
IndexReader indexReader = indexSearcher.getIndexReader();
if (indexReader.leaves().isEmpty()) {
return 0;
} else {
LeafReaderContext atomicReaderContext = indexReader.leaves().get(0);
RandomAccessOrds values = globalOrdinalsValues(atomicReaderContext);
return values.getValueCount();
}
}
public static class FieldData extends WithOrdinals {
@@ -122,20 +150,6 @@ public abstract class ValuesSource {
final AtomicOrdinalsFieldData atomicFieldData = global.load(context);
return atomicFieldData.getOrdinalsValues();
}
@Override
public long globalMaxOrd(IndexSearcher indexSearcher) {
IndexReader indexReader = indexSearcher.getIndexReader();
if (indexReader.leaves().isEmpty()) {
return 0;
} else {
LeafReaderContext atomicReaderContext = indexReader.leaves().get(0);
IndexOrdinalsFieldData globalFieldData = indexFieldData.loadGlobal(indexReader);
AtomicOrdinalsFieldData afd = globalFieldData.load(atomicReaderContext);
RandomAccessOrds values = afd.getOrdinalsValues();
return values.getValueCount();
}
}
}
}
@@ -212,6 +226,30 @@ public abstract class ValuesSource {
public static abstract class Numeric extends ValuesSource {
public static final Numeric EMPTY = new Numeric() {
@Override
public boolean isFloatingPoint() {
return false;
}
@Override
public SortedNumericDocValues longValues(LeafReaderContext context) {
return DocValues.emptySortedNumeric(context.reader().maxDoc());
}
@Override
public SortedNumericDoubleValues doubleValues(LeafReaderContext context) throws IOException {
return org.elasticsearch.index.fielddata.FieldData.emptySortedNumericDoubles(context.reader().maxDoc());
}
@Override
public SortedBinaryDocValues bytesValues(LeafReaderContext context) throws IOException {
return org.elasticsearch.index.fielddata.FieldData.emptySortedBinary(context.reader().maxDoc());
}
};
/** Whether the underlying data is floating-point or not. */
public abstract boolean isFloatingPoint();
@@ -452,13 +490,21 @@ public abstract class ValuesSource {
}
}
public static abstract class GeoPoint extends ValuesSource {
public static final GeoPoint EMPTY = new GeoPoint() {
@Override
public MultiGeoPointValues geoPointValues(LeafReaderContext context) {
return org.elasticsearch.index.fielddata.FieldData.emptyMultiGeoPoints(context.reader().maxDoc());
}
@Override
public SortedBinaryDocValues bytesValues(LeafReaderContext context) throws IOException {
return org.elasticsearch.index.fielddata.FieldData.emptySortedBinary(context.reader().maxDoc());
}
};
@Override
public Bits docsWithValue(LeafReaderContext context) {
@@ -470,13 +516,24 @@ public abstract class ValuesSource {
}
}
public abstract MultiGeoPointValues geoPointValues(LeafReaderContext context);
public static class Fielddata extends GeoPoint {
protected final IndexGeoPointFieldData indexFieldData;
public Fielddata(IndexGeoPointFieldData indexFieldData) {
this.indexFieldData = indexFieldData;
}
@Override
public SortedBinaryDocValues bytesValues(LeafReaderContext context) {
return indexFieldData.load(context).getBytesValues();
}
@Override
public MultiGeoPointValues geoPointValues(LeafReaderContext context) {
return indexFieldData.load(context).getGeoPointValues();
}
}
}


@@ -57,10 +57,10 @@ public abstract class ValuesSourceAggregatorFactory<VS extends ValuesSource> ext
@Override
public Aggregator createInternal(AggregationContext context, Aggregator parent, boolean collectsFromSingleBucket,
List<Reducer> reducers, Map<String, Object> metaData) throws IOException {
VS vs = context.valuesSource(config, context.searchContext());
if (vs == null) {
return createUnmapped(context, parent, reducers, metaData);
}
return doCreateInternal(vs, context, parent, collectsFromSingleBucket, reducers, metaData);
}


@@ -35,6 +35,7 @@ public class ValuesSourceConfig<VS extends ValuesSource> {
boolean unmapped = false;
String formatPattern;
ValueFormat format;
Object missing;
public ValuesSourceConfig(Class<VS> valueSourceType) {
this.valueSourceType = valueSourceType;


@@ -68,6 +68,7 @@ public class ValuesSourceParser<VS extends ValuesSource> {
Map<String, Object> params = null;
ValueType valueType = null;
String format = null;
Object missing = null;
}
private final String aggName;
@@ -90,6 +91,10 @@
}
public boolean token(String currentFieldName, XContentParser.Token token, XContentParser parser) throws IOException {
if ("missing".equals(currentFieldName) && token.isValue()) {
input.missing = parser.objectText();
return true;
}
if (token == XContentParser.Token.VALUE_STRING) {
if ("field".equals(currentFieldName)) {
input.field = parser.text();
@@ -146,6 +151,7 @@
valuesSourceType = ValuesSource.Bytes.class;
}
ValuesSourceConfig<VS> config = new ValuesSourceConfig<VS>(valuesSourceType);
config.missing = input.missing;
config.format = resolveFormat(input.format, valueType);
config.script = createScript();
config.scriptValueType = valueType;
@@ -156,6 +162,7 @@
if (mapper == null) {
Class<VS> valuesSourceType = valueType != null ? (Class<VS>) valueType.getValuesSourceType() : this.valuesSourceType;
ValuesSourceConfig<VS> config = new ValuesSourceConfig<>(valuesSourceType);
config.missing = input.missing;
config.format = resolveFormat(input.format, valueType);
config.unmapped = true;
if (valueType != null) {
@@ -181,6 +188,7 @@
}
config.fieldContext = new FieldContext(input.field, indexFieldData, mapper);
config.missing = input.missing;
config.script = createScript();
config.format = resolveFormat(input.format, mapper);
return config;


@@ -0,0 +1,195 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.search.aggregations;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.geo.GeoPoint;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.aggregations.bucket.histogram.Histogram;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.cardinality.Cardinality;
import org.elasticsearch.search.aggregations.metrics.geobounds.GeoBounds;
import org.elasticsearch.search.aggregations.metrics.percentiles.Percentiles;
import org.elasticsearch.search.aggregations.metrics.stats.Stats;
import org.elasticsearch.test.ElasticsearchIntegrationTest;
import static org.elasticsearch.search.aggregations.AggregationBuilders.cardinality;
import static org.elasticsearch.search.aggregations.AggregationBuilders.dateHistogram;
import static org.elasticsearch.search.aggregations.AggregationBuilders.geoBounds;
import static org.elasticsearch.search.aggregations.AggregationBuilders.histogram;
import static org.elasticsearch.search.aggregations.AggregationBuilders.percentiles;
import static org.elasticsearch.search.aggregations.AggregationBuilders.stats;
import static org.elasticsearch.search.aggregations.AggregationBuilders.terms;
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked;
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertSearchResponse;
@ElasticsearchIntegrationTest.SuiteScopeTest
public class MissingValueTests extends ElasticsearchIntegrationTest {
@Override
protected int maximumNumberOfShards() {
return 2;
}
@Override
protected void setupSuiteScopeCluster() throws Exception {
assertAcked(prepareCreate("idx").addMapping("type", "date", "type=date", "location", "type=geo_point").get());
indexRandom(true,
client().prepareIndex("idx", "type", "1").setSource(),
client().prepareIndex("idx", "type", "2").setSource("str", "foo", "long", 3L, "double", 5.5, "date", "2015-05-07", "location", "1,2"));
}
public void testUnmappedTerms() {
SearchResponse response = client().prepareSearch("idx").addAggregation(terms("my_terms").field("non_existing_field").missing("bar")).get();
assertSearchResponse(response);
Terms terms = response.getAggregations().get("my_terms");
assertEquals(1, terms.getBuckets().size());
assertEquals(2, terms.getBucketByKey("bar").getDocCount());
}
public void testStringTerms() {
SearchResponse response = client().prepareSearch("idx").addAggregation(terms("my_terms").field("str").missing("bar")).get();
assertSearchResponse(response);
Terms terms = response.getAggregations().get("my_terms");
assertEquals(2, terms.getBuckets().size());
assertEquals(1, terms.getBucketByKey("foo").getDocCount());
assertEquals(1, terms.getBucketByKey("bar").getDocCount());
response = client().prepareSearch("idx").addAggregation(terms("my_terms").field("str").missing("foo")).get();
assertSearchResponse(response);
terms = response.getAggregations().get("my_terms");
assertEquals(1, terms.getBuckets().size());
assertEquals(2, terms.getBucketByKey("foo").getDocCount());
}
public void testLongTerms() {
SearchResponse response = client().prepareSearch("idx").addAggregation(terms("my_terms").field("long").missing(4)).get();
assertSearchResponse(response);
Terms terms = response.getAggregations().get("my_terms");
assertEquals(2, terms.getBuckets().size());
assertEquals(1, terms.getBucketByKey("3").getDocCount());
assertEquals(1, terms.getBucketByKey("4").getDocCount());
response = client().prepareSearch("idx").addAggregation(terms("my_terms").field("long").missing(3)).get();
assertSearchResponse(response);
terms = response.getAggregations().get("my_terms");
assertEquals(1, terms.getBuckets().size());
assertEquals(2, terms.getBucketByKey("3").getDocCount());
}
public void testDoubleTerms() {
SearchResponse response = client().prepareSearch("idx").addAggregation(terms("my_terms").field("double").missing(4.5)).get();
assertSearchResponse(response);
Terms terms = response.getAggregations().get("my_terms");
assertEquals(2, terms.getBuckets().size());
assertEquals(1, terms.getBucketByKey("4.5").getDocCount());
assertEquals(1, terms.getBucketByKey("5.5").getDocCount());
response = client().prepareSearch("idx").addAggregation(terms("my_terms").field("double").missing(5.5)).get();
assertSearchResponse(response);
terms = response.getAggregations().get("my_terms");
assertEquals(1, terms.getBuckets().size());
assertEquals(2, terms.getBucketByKey("5.5").getDocCount());
}
public void testUnmappedHistogram() {
SearchResponse response = client().prepareSearch("idx").addAggregation(histogram("my_histogram").field("non-existing_field").interval(5).missing(12)).get();
assertSearchResponse(response);
Histogram histogram = response.getAggregations().get("my_histogram");
assertEquals(1, histogram.getBuckets().size());
assertEquals(10L, histogram.getBuckets().get(0).getKey());
assertEquals(2, histogram.getBuckets().get(0).getDocCount());
}
public void testHistogram() {
SearchResponse response = client().prepareSearch("idx").addAggregation(histogram("my_histogram").field("long").interval(5).missing(7)).get();
assertSearchResponse(response);
Histogram histogram = response.getAggregations().get("my_histogram");
assertEquals(2, histogram.getBuckets().size());
assertEquals(0L, histogram.getBuckets().get(0).getKey());
assertEquals(1, histogram.getBuckets().get(0).getDocCount());
assertEquals(5L, histogram.getBuckets().get(1).getKey());
assertEquals(1, histogram.getBuckets().get(1).getDocCount());
response = client().prepareSearch("idx").addAggregation(histogram("my_histogram").field("long").interval(5).missing(3)).get();
assertSearchResponse(response);
histogram = response.getAggregations().get("my_histogram");
assertEquals(1, histogram.getBuckets().size());
assertEquals(0L, histogram.getBuckets().get(0).getKey());
assertEquals(2, histogram.getBuckets().get(0).getDocCount());
}
public void testDateHistogram() {
SearchResponse response = client().prepareSearch("idx").addAggregation(dateHistogram("my_histogram").field("date").interval(DateHistogramInterval.YEAR).missing("2014-05-07")).get();
assertSearchResponse(response);
Histogram histogram = response.getAggregations().get("my_histogram");
assertEquals(2, histogram.getBuckets().size());
assertEquals("2014-01-01T00:00:00.000Z", histogram.getBuckets().get(0).getKeyAsString());
assertEquals(1, histogram.getBuckets().get(0).getDocCount());
assertEquals("2015-01-01T00:00:00.000Z", histogram.getBuckets().get(1).getKeyAsString());
assertEquals(1, histogram.getBuckets().get(1).getDocCount());
response = client().prepareSearch("idx").addAggregation(dateHistogram("my_histogram").field("date").interval(DateHistogramInterval.YEAR).missing("2015-05-07")).get();
assertSearchResponse(response);
histogram = response.getAggregations().get("my_histogram");
assertEquals(1, histogram.getBuckets().size());
assertEquals("2015-01-01T00:00:00.000Z", histogram.getBuckets().get(0).getKeyAsString());
assertEquals(2, histogram.getBuckets().get(0).getDocCount());
}
public void testCardinality() {
SearchResponse response = client().prepareSearch("idx").addAggregation(cardinality("card").field("long").missing(2)).get();
assertSearchResponse(response);
Cardinality cardinality = response.getAggregations().get("card");
assertEquals(2, cardinality.getValue());
}
public void testPercentiles() {
SearchResponse response = client().prepareSearch("idx").addAggregation(percentiles("percentiles").field("long").missing(1000)).get();
assertSearchResponse(response);
Percentiles percentiles = response.getAggregations().get("percentiles");
assertEquals(1000, percentiles.percentile(100), 0);
}
public void testStats() {
SearchResponse response = client().prepareSearch("idx").addAggregation(stats("stats").field("long").missing(5)).get();
assertSearchResponse(response);
Stats stats = response.getAggregations().get("stats");
assertEquals(2, stats.getCount());
assertEquals(4, stats.getAvg(), 0);
}
public void testUnmappedGeoBounds() {
SearchResponse response = client().prepareSearch("idx").addAggregation(geoBounds("bounds").field("non_existing_field").missing("2,1")).get();
assertSearchResponse(response);
GeoBounds bounds = response.getAggregations().get("bounds");
assertEquals(new GeoPoint(2,1), bounds.bottomRight());
assertEquals(new GeoPoint(2,1), bounds.topLeft());
}
public void testGeoBounds() {
SearchResponse response = client().prepareSearch("idx").addAggregation(geoBounds("bounds").field("location").missing("2,1")).get();
assertSearchResponse(response);
GeoBounds bounds = response.getAggregations().get("bounds");
assertEquals(new GeoPoint(1,2), bounds.bottomRight());
assertEquals(new GeoPoint(2,1), bounds.topLeft());
}
}
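The histogram tests above rely on a simple bucketing rule: a document's value, or the configured `missing` replacement when the document has none, is floored to a multiple of the interval to pick a bucket. A hedged sketch of that arithmetic; `bucketKey` and `buckets` are illustration helpers, not Elasticsearch APIs:

```java
import java.util.Map;
import java.util.TreeMap;

public class HistogramMissingSketch {

    // Floor a value to the lower bound of its histogram bucket.
    static long bucketKey(long value, long interval) {
        return Math.floorDiv(value, interval) * interval;
    }

    // values[i] == null models a document without a value for the field.
    static Map<Long, Integer> buckets(Long[] values, Long missing, long interval) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Long v : values) {
            Long effective = (v != null) ? v : missing;
            if (effective == null) {
                continue; // no value and no `missing` configured: doc is ignored
            }
            counts.merge(bucketKey(effective, interval), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Long[] docs = { 3L, null }; // mirrors the two test documents above
        System.out.println(buckets(docs, 7L, 5));  // missing=7: two buckets, 0 and 5
        System.out.println(buckets(docs, 3L, 5));  // missing=3: both docs share bucket 0
    }
}
```

This is exactly the pair of outcomes `testHistogram` asserts: `missing(7)` with interval 5 yields buckets 0 and 5 with one doc each, while `missing(3)` collapses both documents into bucket 0.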


@@ -0,0 +1,297 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.search.aggregations.support;
import com.carrotsearch.randomizedtesting.generators.RandomPicks;
import com.carrotsearch.randomizedtesting.generators.RandomStrings;
import org.apache.lucene.index.RandomAccessOrds;
import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.TestUtil;
import org.elasticsearch.common.geo.GeoPoint;
import org.elasticsearch.index.fielddata.AbstractRandomAccessOrds;
import org.elasticsearch.index.fielddata.MultiGeoPointValues;
import org.elasticsearch.index.fielddata.SortedBinaryDocValues;
import org.elasticsearch.index.fielddata.SortedNumericDoubleValues;
import org.elasticsearch.test.ElasticsearchTestCase;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
public class MissingValuesTests extends ElasticsearchTestCase {
public void testMissingBytes() {
final int numDocs = TestUtil.nextInt(random(), 1, 100);
final BytesRef[][] values = new BytesRef[numDocs][];
for (int i = 0; i < numDocs; ++i) {
values[i] = new BytesRef[random().nextInt(4)];
for (int j = 0; j < values[i].length; ++j) {
values[i][j] = new BytesRef(RandomStrings.randomAsciiOfLength(random(), 2));
}
Arrays.sort(values[i]);
}
SortedBinaryDocValues asBinaryValues = new SortedBinaryDocValues() {
int i = -1;
@Override
public BytesRef valueAt(int index) {
return values[i][index];
}
@Override
public void setDocument(int docId) {
i = docId;
}
@Override
public int count() {
return values[i].length;
}
};
final BytesRef missing = new BytesRef(RandomStrings.randomAsciiOfLength(random(), 2));
SortedBinaryDocValues withMissingReplaced = MissingValues.replaceMissing(asBinaryValues, missing);
for (int i = 0; i < numDocs; ++i) {
withMissingReplaced.setDocument(i);
if (values[i].length > 0) {
assertEquals(values[i].length, withMissingReplaced.count());
for (int j = 0; j < values[i].length; ++j) {
assertEquals(values[i][j], withMissingReplaced.valueAt(j));
}
} else {
assertEquals(1, withMissingReplaced.count());
assertEquals(missing, withMissingReplaced.valueAt(0));
}
}
}
public void testMissingOrds() {
final int numDocs = TestUtil.nextInt(random(), 1, 100);
final int numOrds = TestUtil.nextInt(random(), 1, 10);
final Set<BytesRef> valueSet = new HashSet<>();
while (valueSet.size() < numOrds) {
valueSet.add(new BytesRef(RandomStrings.randomAsciiOfLength(random(), 5)));
}
final BytesRef[] values = valueSet.toArray(new BytesRef[numOrds]);
Arrays.sort(values);
final int[][] ords = new int[numDocs][];
for (int i = 0; i < numDocs; ++i) {
ords[i] = new int[random().nextInt(numOrds)];
for (int j = 0; j < ords[i].length; ++j) {
ords[i][j] = j;
}
for (int j = ords[i].length - 1; j >= 0; --j) {
final int maxOrd = j == ords[i].length - 1 ? numOrds : ords[i][j+1];
ords[i][j] = TestUtil.nextInt(random(), ords[i][j], maxOrd - 1);
}
}
RandomAccessOrds asRandomAccessOrds = new AbstractRandomAccessOrds() {
int i = -1;
@Override
public void doSetDocument(int docID) {
i = docID;
}
@Override
public BytesRef lookupOrd(long ord) {
return values[(int) ord];
}
@Override
public long getValueCount() {
return values.length;
}
@Override
public long ordAt(int index) {
return ords[i][index];
}
@Override
public int cardinality() {
return ords[i].length;
}
};
final BytesRef existingMissing = RandomPicks.randomFrom(random(), values);
final BytesRef missingMissing = new BytesRef(RandomStrings.randomAsciiOfLength(random(), 5));
for (BytesRef missing : Arrays.asList(existingMissing, missingMissing)) {
RandomAccessOrds withMissingReplaced = MissingValues.replaceMissing(asRandomAccessOrds, missing);
if (valueSet.contains(missing)) {
assertEquals(values.length, withMissingReplaced.getValueCount());
} else {
assertEquals(values.length + 1, withMissingReplaced.getValueCount());
}
for (int i = 0; i < numDocs; ++i) {
withMissingReplaced.setDocument(i);
if (ords[i].length > 0) {
assertEquals(ords[i].length, withMissingReplaced.cardinality());
for (int j = 0; j < ords[i].length; ++j) {
assertEquals(values[ords[i][j]], withMissingReplaced.lookupOrd(withMissingReplaced.ordAt(j)));
}
} else {
assertEquals(1, withMissingReplaced.cardinality());
assertEquals(missing, withMissingReplaced.lookupOrd(withMissingReplaced.ordAt(0)));
}
}
}
}
public void testMissingLongs() {
final int numDocs = TestUtil.nextInt(random(), 1, 100);
final int[][] values = new int[numDocs][];
for (int i = 0; i < numDocs; ++i) {
values[i] = new int[random().nextInt(4)];
for (int j = 0; j < values[i].length; ++j) {
values[i][j] = randomInt();
}
Arrays.sort(values[i]);
}
SortedNumericDocValues asNumericValues = new SortedNumericDocValues() {
int i = -1;
@Override
public long valueAt(int index) {
return values[i][index];
}
@Override
public void setDocument(int docId) {
i = docId;
}
@Override
public int count() {
return values[i].length;
}
};
final long missing = randomInt();
SortedNumericDocValues withMissingReplaced = MissingValues.replaceMissing(asNumericValues, missing);
for (int i = 0; i < numDocs; ++i) {
withMissingReplaced.setDocument(i);
if (values[i].length > 0) {
assertEquals(values[i].length, withMissingReplaced.count());
for (int j = 0; j < values[i].length; ++j) {
assertEquals(values[i][j], withMissingReplaced.valueAt(j));
}
} else {
assertEquals(1, withMissingReplaced.count());
assertEquals(missing, withMissingReplaced.valueAt(0));
}
}
}
public void testMissingDoubles() {
final int numDocs = TestUtil.nextInt(random(), 1, 100);
final double[][] values = new double[numDocs][];
for (int i = 0; i < numDocs; ++i) {
values[i] = new double[random().nextInt(4)];
for (int j = 0; j < values[i].length; ++j) {
values[i][j] = randomDouble();
}
Arrays.sort(values[i]);
}
SortedNumericDoubleValues asNumericValues = new SortedNumericDoubleValues() {
int i = -1;
@Override
public double valueAt(int index) {
return values[i][index];
}
@Override
public void setDocument(int docId) {
i = docId;
}
@Override
public int count() {
return values[i].length;
}
};
final long missing = randomInt();
SortedNumericDoubleValues withMissingReplaced = MissingValues.replaceMissing(asNumericValues, missing);
for (int i = 0; i < numDocs; ++i) {
withMissingReplaced.setDocument(i);
if (values[i].length > 0) {
assertEquals(values[i].length, withMissingReplaced.count());
for (int j = 0; j < values[i].length; ++j) {
assertEquals(values[i][j], withMissingReplaced.valueAt(j), 0);
}
} else {
assertEquals(1, withMissingReplaced.count());
assertEquals(missing, withMissingReplaced.valueAt(0), 0);
}
}
}
public void testMissingGeoPoints() {
final int numDocs = TestUtil.nextInt(random(), 1, 100);
final GeoPoint[][] values = new GeoPoint[numDocs][];
for (int i = 0; i < numDocs; ++i) {
values[i] = new GeoPoint[random().nextInt(4)];
for (int j = 0; j < values[i].length; ++j) {
values[i][j] = new GeoPoint(randomDouble() * 90, randomDouble() * 180);
}
}
MultiGeoPointValues asGeoValues = new MultiGeoPointValues() {
int i = -1;
@Override
public GeoPoint valueAt(int index) {
return values[i][index];
}
@Override
public void setDocument(int docId) {
i = docId;
}
@Override
public int count() {
return values[i].length;
}
};
final GeoPoint missing = new GeoPoint(randomDouble() * 90, randomDouble() * 180);
MultiGeoPointValues withMissingReplaced = MissingValues.replaceMissing(asGeoValues, missing);
for (int i = 0; i < numDocs; ++i) {
withMissingReplaced.setDocument(i);
if (values[i].length > 0) {
assertEquals(values[i].length, withMissingReplaced.count());
for (int j = 0; j < values[i].length; ++j) {
assertEquals(values[i][j], withMissingReplaced.valueAt(j));
}
} else {
assertEquals(1, withMissingReplaced.count());
assertEquals(missing, withMissingReplaced.valueAt(0));
}
}
}
}