Doc values integration.
This commit allows for using Lucene doc values as a backend for field data,
moving the cost of building field data from the refresh operation to indexing.
In addition, Lucene doc values can be stored on disk (partially, or even
entirely), so that memory management is done at the operating system level
(file-system cache) instead of the JVM, avoiding long pauses during major
collections due to large heaps.
So far doc values are supported on numeric types and non-analyzed strings
(index:no or index:not_analyzed). Under the hood, it uses SORTED_SET doc values
which is the only type to support multi-valued fields. Since the field data API
set is a bit wider than the doc values API set, some operations are not
supported:
- field data filtering: this will fail if doc values are enabled,
- field data cache clearing, even for memory-based doc values formats,
- getting the memory usage for a specific field,
- knowing whether a field is actually multi-valued.
This commit also allows for configuring doc-values formats on a per-field basis
similarly to postings formats. In particular the doc values format of the
_version field can be configured through its own field mapper (it used to be
handled in UidFieldMapper previously).
Closes #3806
2013-06-12 12:51:51 +02:00
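The format selection that the commit message describes, and that the test in this file asserts, can be pictured with a small standalone model. This is only a sketch under stated assumptions: `FieldDataFormatSketch`, `resolve`, and the string keys are hypothetical and are not Elasticsearch code. In particular, `"doc_values"` is assumed here to stand in for `FieldDataType.DOC_VALUES_FORMAT_VALUE`, and `"long"` stands in for all integral types. Only the implementation class names are taken from the test's assertions.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: class, method and key names are hypothetical, not Elasticsearch APIs.
final class FieldDataFormatSketch {

    // Format assumed when a field has no explicit field data settings.
    private static final Map<String, String> DEFAULT_FORMAT = new HashMap<>();
    // "type/format" -> implementation class name, as asserted in the test.
    private static final Map<String, String> IMPLEMENTATION = new HashMap<>();

    static {
        DEFAULT_FORMAT.put("string", "paged_bytes");
        DEFAULT_FORMAT.put("long", "array");
        DEFAULT_FORMAT.put("float", "array");
        DEFAULT_FORMAT.put("double", "array");

        IMPLEMENTATION.put("string/paged_bytes", "PagedBytesIndexFieldData");
        IMPLEMENTATION.put("string/fst", "FSTBytesIndexFieldData");
        IMPLEMENTATION.put("string/doc_values", "SortedSetDVBytesIndexFieldData");
        IMPLEMENTATION.put("long/array", "PackedArrayIndexFieldData");
        IMPLEMENTATION.put("long/doc_values", "BinaryDVNumericIndexFieldData");
        IMPLEMENTATION.put("float/array", "FloatArrayIndexFieldData");
        IMPLEMENTATION.put("float/doc_values", "BinaryDVNumericIndexFieldData");
        IMPLEMENTATION.put("double/array", "DoubleArrayIndexFieldData");
        IMPLEMENTATION.put("double/doc_values", "BinaryDVNumericIndexFieldData");
    }

    // Resolves the implementation name; a null format means "use the type's default".
    static String resolve(String type, String format) {
        String effective = (format != null) ? format : DEFAULT_FORMAT.get(type);
        String implementation = IMPLEMENTATION.get(type + "/" + effective);
        if (implementation == null) {
            throw new IllegalArgumentException(
                    "No field data for type [" + type + "] and format [" + effective + "]");
        }
        return implementation;
    }

    public static void main(String[] args) {
        System.out.println(resolve("string", null));
        System.out.println(resolve("string", "doc_values"));
        System.out.println(resolve("double", "array"));
    }
}
```

In this model an explicit format always takes precedence over the default, which mirrors what testByPassDocValues checks: a field that requests "fst" or "array" gets the in-memory implementation even though doc values settings were supplied first.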
/*
 * Licensed to ElasticSearch and Shay Banon under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. ElasticSearch licenses this
 * file to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.index.fielddata;

import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.*;
import org.apache.lucene.store.RAMDirectory;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.fielddata.plain.*;
import org.elasticsearch.index.mapper.ContentPath;
import org.elasticsearch.index.mapper.FieldMapper;
import org.elasticsearch.index.mapper.Mapper.BuilderContext;
import org.elasticsearch.index.mapper.MapperBuilders;
import org.elasticsearch.index.mapper.core.*;
import org.elasticsearch.indices.fielddata.breaker.DummyCircuitBreakerService;
import org.elasticsearch.test.ElasticsearchTestCase;

import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

import static org.hamcrest.Matchers.instanceOf;

public class IndexFieldDataServiceTests extends ElasticsearchTestCase {

    // Field data settings that select the doc values format.
    private static final Settings DOC_VALUES_SETTINGS = ImmutableSettings.builder().put(FieldDataType.FORMAT_KEY, FieldDataType.DOC_VALUES_FORMAT_VALUE).build();

    // Checks which field data implementation each field type gets, with and without doc values.
    @SuppressWarnings("unchecked")
    public void testGetForFieldDefaults() {
        final IndexFieldDataService ifdService = new IndexFieldDataService(new Index("test"), new DummyCircuitBreakerService());
        for (boolean docValues : Arrays.asList(true, false)) {
            final BuilderContext ctx = new BuilderContext(null, new ContentPath(1));
            final StringFieldMapper stringMapper = new StringFieldMapper.Builder("string").tokenized(false).fieldDataSettings(docValues ? DOC_VALUES_SETTINGS : ImmutableSettings.EMPTY).build(ctx);
            ifdService.clear();
            IndexFieldData<?> fd = ifdService.getForField(stringMapper);
            if (docValues) {
                assertTrue(fd instanceof SortedSetDVBytesIndexFieldData);
            } else {
                assertTrue(fd instanceof PagedBytesIndexFieldData);
            }

            for (FieldMapper<?> mapper : Arrays.asList(
                    new ByteFieldMapper.Builder("int").fieldDataSettings(docValues ? DOC_VALUES_SETTINGS : ImmutableSettings.EMPTY).build(ctx),
                    new ShortFieldMapper.Builder("int").fieldDataSettings(docValues ? DOC_VALUES_SETTINGS : ImmutableSettings.EMPTY).build(ctx),
                    new IntegerFieldMapper.Builder("int").fieldDataSettings(docValues ? DOC_VALUES_SETTINGS : ImmutableSettings.EMPTY).build(ctx),
                    new LongFieldMapper.Builder("long").fieldDataSettings(docValues ? DOC_VALUES_SETTINGS : ImmutableSettings.EMPTY).build(ctx)
                    )) {
                ifdService.clear();
                fd = ifdService.getForField(mapper);
                if (docValues) {
                    assertTrue(fd instanceof BinaryDVNumericIndexFieldData);
                } else {
                    assertTrue(fd instanceof PackedArrayIndexFieldData);
                }
            }

            final FloatFieldMapper floatMapper = new FloatFieldMapper.Builder("float").fieldDataSettings(docValues ? DOC_VALUES_SETTINGS : ImmutableSettings.EMPTY).build(ctx);
            ifdService.clear();
            fd = ifdService.getForField(floatMapper);
            if (docValues) {
                assertTrue(fd instanceof BinaryDVNumericIndexFieldData);
            } else {
                assertTrue(fd instanceof FloatArrayIndexFieldData);
            }

            final DoubleFieldMapper doubleMapper = new DoubleFieldMapper.Builder("double").fieldDataSettings(docValues ? DOC_VALUES_SETTINGS : ImmutableSettings.EMPTY).build(ctx);
            ifdService.clear();
            fd = ifdService.getForField(doubleMapper);
            if (docValues) {
                assertTrue(fd instanceof BinaryDVNumericIndexFieldData);
            } else {
                assertTrue(fd instanceof DoubleArrayIndexFieldData);
            }
        }
    }

    @SuppressWarnings("unchecked")
    public void testByPassDocValues() {
        final IndexFieldDataService ifdService = new IndexFieldDataService(new Index("test"), new DummyCircuitBreakerService());
        final BuilderContext ctx = new BuilderContext(null, new ContentPath(1));
        // The later fieldDataSettings call asks for the in-memory "fst" format, so doc values are expected to be bypassed.
        final StringFieldMapper stringMapper = MapperBuilders.stringField("string").tokenized(false).fieldDataSettings(DOC_VALUES_SETTINGS).fieldDataSettings(ImmutableSettings.builder().put("format", "fst").build()).build(ctx);
        ifdService.clear();
        IndexFieldData<?> fd = ifdService.getForField(stringMapper);
        assertTrue(fd instanceof FSTBytesIndexFieldData);

        // Same check for numeric fields, this time with an explicit "array" format.
        final Settings fdSettings = ImmutableSettings.builder().put("format", "array").build();
        for (FieldMapper<?> mapper : Arrays.asList(
                new ByteFieldMapper.Builder("int").fieldDataSettings(DOC_VALUES_SETTINGS).fieldDataSettings(fdSettings).build(ctx),
                new ShortFieldMapper.Builder("int").fieldDataSettings(DOC_VALUES_SETTINGS).fieldDataSettings(fdSettings).build(ctx),
                new IntegerFieldMapper.Builder("int").fieldDataSettings(DOC_VALUES_SETTINGS).fieldDataSettings(fdSettings).build(ctx),
                new LongFieldMapper.Builder("long").fieldDataSettings(DOC_VALUES_SETTINGS).fieldDataSettings(fdSettings).build(ctx)
                )) {
            ifdService.clear();
            fd = ifdService.getForField(mapper);
            assertTrue(fd instanceof PackedArrayIndexFieldData);
        }

        final FloatFieldMapper floatMapper = MapperBuilders.floatField("float").fieldDataSettings(DOC_VALUES_SETTINGS).fieldDataSettings(fdSettings).build(ctx);
        ifdService.clear();
        fd = ifdService.getForField(floatMapper);
        assertTrue(fd instanceof FloatArrayIndexFieldData);

        final DoubleFieldMapper doubleMapper = MapperBuilders.doubleField("double").fieldDataSettings(DOC_VALUES_SETTINGS).fieldDataSettings(fdSettings).build(ctx);
        ifdService.clear();
        fd = ifdService.getForField(doubleMapper);
        assertTrue(fd instanceof DoubleArrayIndexFieldData);
    }

    public void testChangeFieldDataFormat() throws Exception {
        final IndexFieldDataService ifdService = new IndexFieldDataService(new Index("test"), new DummyCircuitBreakerService());
        final BuilderContext ctx = new BuilderContext(null, new ContentPath(1));
        final StringFieldMapper mapper1 = MapperBuilders.stringField("s").tokenized(false).fieldDataSettings(ImmutableSettings.builder().put(FieldDataType.FORMAT_KEY, "paged_bytes").build()).build(ctx);
        final IndexWriter writer = new IndexWriter(new RAMDirectory(), new IndexWriterConfig(TEST_VERSION_CURRENT, new KeywordAnalyzer()));
        Document doc = new Document();
        doc.add(new StringField("s", "thisisastring", Store.NO));
        writer.addDocument(doc);
        final IndexReader reader1 = DirectoryReader.open(writer, true);
        IndexFieldData<?> ifd = ifdService.getForField(mapper1);
        assertThat(ifd, instanceOf(PagedBytesIndexFieldData.class));
        Set<AtomicReader> oldSegments = Collections.newSetFromMap(new IdentityHashMap<AtomicReader, Boolean>());
        for (AtomicReaderContext arc : reader1.leaves()) {
            oldSegments.add(arc.reader());
            AtomicFieldData<?> afd = ifd.load(arc);
            assertThat(afd, instanceOf(PagedBytesAtomicFieldData.class));
        }
        // write new segment
        writer.addDocument(doc);
        final IndexReader reader2 = DirectoryReader.open(writer, true);
        final StringFieldMapper mapper2 = MapperBuilders.stringField("s").tokenized(false).fieldDataSettings(ImmutableSettings.builder().put(FieldDataType.FORMAT_KEY, "fst").build()).build(ctx);
        ifdService.onMappingUpdate();
        ifd = ifdService.getForField(mapper2);
        assertThat(ifd, instanceOf(FSTBytesIndexFieldData.class));
        // Segments loaded before the mapping update keep their paged_bytes data; only the new segment is loaded as fst.
        for (AtomicReaderContext arc : reader2.leaves()) {
            AtomicFieldData<?> afd = ifd.load(arc);
            if (oldSegments.contains(arc.reader())) {
                assertThat(afd, instanceOf(PagedBytesAtomicFieldData.class));
            } else {
                assertThat(afd, instanceOf(FSTBytesAtomicFieldData.class));
            }
        }
        reader1.close();
        reader2.close();
        writer.close();
        writer.getDirectory().close();
    }

}