Commit Graph

63 Commits

Author SHA1 Message Date
David Pilato 6c7a44ccd9 Fix test in mapper attachments plugin 2016-04-29 15:02:04 +02:00
David Pilato 2636703afa Merge branch 'master' into pr/attachments-add-test-forced-values 2016-04-29 14:55:42 +02:00
Adrien Grand d84c643f58 Use the new points API to index numeric fields. #17746
This makes all numeric fields including `date`, `ip` and `token_count` use
points instead of the inverted index as a lookup structure. This is expected
to perform worse for exact queries, but faster for range queries. It also
requires less storage.

Notes about how the change works:
 - Numeric mappers have been split into a legacy version that is essentially
   the current mapper, and a new version that uses points, eg.
   LegacyDateFieldMapper and DateFieldMapper.
 - Since new and old fields have the same names, the decision about which one
   to use is made based on the index creation version.
 - If you try to force using a legacy field on a new index or a field that uses
   points on an old index, you will get an exception.
 - IP addresses now support IPv6 via Lucene's InetAddressPoint and store them
   in SORTED_SET doc values using the same encoding (fixed length of 16 bytes
   and sortable).
 - The internal MappedFieldType that is stored by the new mappers does not have
   any of the points-related properties set. Instead, it keeps setting the index
   options when parsing the `index` property of mappings and does
   `if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }`
   when parsing documents.

Known issues that won't fix:
 - You can't use numeric fields in significant terms aggregations anymore since
   this requires document frequencies, which points do not record.
 - Term queries on numeric fields will now return constant scores instead of
   giving better scores to the rare values.

Known issues that we could work around (in follow-up PRs, this one is too large
already):
 - Range queries on `ip` addresses only work if both the lower and upper bounds
   are inclusive (exclusive bounds are not exposed in Lucene). We could either
   decide to implement it, or drop range support entirely and tell users to
   query subnets using the CIDR notation instead.
 - Since IP addresses now use a different representation for doc values,
   aggregations will fail when running a terms aggregation on an ip field on a
   list of indices that contains both pre-5.0 and 5.0 indices.
 - The ip range aggregation does not work on the new ip field. We need to either
   implement range aggs for SORTED_SET doc values or drop support for ip ranges
   and tell users to use filters instead. #17700

Closes #16751
Closes #17007
Closes #11513
2016-04-14 17:56:23 +02:00
Adrien Grand 013acf9179 Remove MappedFieldType.value. #17557
This commit removes `MappedFieldType.value` and simplifies
`MappedFieldType.valueforSearch`. `valueforSearch` was used to post-process
values that come for stored fields (eg. to convert a long back to a string
representation of a date in the case of a date field) and also values that
are extracted from the source but only in the case of GET calls: it would
not be called when performing source filtering on search requests.

`valueforSearch` is now only called for stored fields, since values that are
extracted from the source should already be formatted as expected.
2016-04-12 09:12:56 +02:00
Adrien Grand 42526ac28e Remove Settings.settingsBuilder.
We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly
the same thing, so we should keep only one. I kept `Settings.builder` since it
has my preference but also it is the one that we use in examples of the Java API.
2016-04-08 18:10:02 +02:00
David Pilato c6b1beb083 Add a test for forced values in mapper-attachments plugin
This PR just adds a new test where we check that we forcing a value in the JSON document actually works as expected:

```json
{
     "file": {
        "_content": "BASE64"
        "_name": "12-240.pdf",
        "_language": "en",
        "_content_type": "pdf"
    }
}
```

Note that we don't support forcing all values. So sending:

```json
{
     "file": {
        "_content": "BASE64"
        "_name": "12-240.pdf",
        "_title": "12-240.pdf",
        "_keywords": "Div42 Src580 LGE Mechtech",
        "_language": "en",
        "_content_type": "pdf"
    }
}
```

Will have absolutely no effect on fields `title` and `keywords`.

Note that when `_language` is set, it only works if `index.mapping.attachment.detect_language` is set to `true`.

Related to https://discuss.elastic.co/t/mapper-attachments/46615/4
2016-04-08 10:07:21 +02:00
David Pilato 9acb0bb28c Merge branch 'master' into pr/16598-register-filter-settings
# Conflicts:
#	core/src/main/java/org/elasticsearch/cluster/service/InternalClusterService.java
#	core/src/main/java/org/elasticsearch/common/settings/IndexScopedSettings.java
#	core/src/main/java/org/elasticsearch/common/settings/Setting.java
2016-03-13 14:52:10 +01:00
Ryan Ernst 591fb8f028 Merge branch 'master' into cli-parsing 2016-03-11 10:45:05 -08:00
Ryan Ernst 2f3efc3fe1 Add doc and docx rest test to mapper attachment along with
getClassLoader permission
2016-03-10 13:28:19 -08:00
Ryan Ernst e5c852f767 Convert bootstrapcli parser to jopt-simple 2016-03-08 13:39:37 -08:00
David Pilato e35032950e Merge branch 'master' into pr/16598-register-filter-settings 2016-03-05 11:37:03 +01:00
David Pilato 2bb3846d1f Update after review:
* remove `ClusterScope`
* rename `ClusterSettings` to `NodeSettings`
* rename `SettingsProperty` to `Property`
2016-03-04 16:53:24 +01:00
David Pilato c11cf3bf1f Merge branch 'master' into pr/16598-register-filter-settings
# Conflicts:
#	core/src/main/java/org/elasticsearch/common/logging/ESLoggerFactory.java
#	core/src/main/java/org/elasticsearch/common/settings/Setting.java
#	core/src/test/java/org/elasticsearch/common/settings/SettingTests.java
2016-03-04 12:23:10 +01:00
David Pilato f97ce3c728 Deprecate mapper-attachments plugin
See #16910
2016-03-04 11:49:12 +01:00
Lee Hinman 6adbbff97c Fix organization rename in all files in project
Basically a query-replace of "https://github.com/elasticsearch/" with "https://github.com/elastic/"
2016-03-03 12:04:13 -07:00
Adrien Grand eef19be072 Deprecate string in favor of text/keyword. #16877
This commit removes the ability to use string fields on indices created on or
after 5.0. Dynamic mappings now generate text fields by default for strings
but there are plans to also add a sub keyword field (in a future PR).

Most of the changes in this commit are just about replacing string with
keyword or text. Some tests have been removed because they existed because of
corner cases of string mappings like setting ignore-above on a text field or
enabling term vectors on a keyword field which are now impossible.

The plan is to remove strings entirely in 6.0.
2016-03-03 10:20:56 +01:00
David Pilato 5fbf1b95dc Merge branch 'master' into pr/16598-register-filter-settings
# Conflicts:
#	core/src/main/java/org/elasticsearch/common/logging/ESLoggerFactory.java
#	core/src/main/java/org/elasticsearch/discovery/DiscoveryService.java
#	core/src/main/java/org/elasticsearch/discovery/DiscoverySettings.java
#	core/src/main/java/org/elasticsearch/http/HttpTransportSettings.java
#	plugins/repository-azure/src/main/java/org/elasticsearch/cloud/azure/storage/AzureStorageService.java
2016-03-02 09:43:53 +01:00
Nik Everett 95cc3e38fc Check test naming conventions on all modules
The big win here is catching tests that are incorrectly named and will
be skipped by gradle, providing a false sense of security.

The whole thing takes about 10 seconds on my Macbook Air, not counting
compiling the test classes, which seems worth it. Because this runs as
a gradle task with propery UP-TO-DATE handling it can be skipped if the
tests haven't been changed which should save some time.

I chose to keep this in test:framework rather than a new subproject of
buildSrc because ESIntegTestCase and doesn't inroduce any additional
dependencies.
2016-02-29 16:31:49 -05:00
David Pilato d77daf3861 Use an SettingsProperty.Dynamic for dynamic properties 2016-02-28 11:06:45 +01:00
David Pilato 31b5e0888f Use an SettingsProperty enumSet
Instead of modifying methods each time we need to add a new behavior for settings, we can simply pass `SettingsProperty... properties` instead.

`SettingsProperty` could be defined then:

```
public enum SettingsProperty {
  Filtered,
  Dynamic,
  ClusterScope,
  NodeScope,
  IndexScope
 // HereGoesYours;
}
```

Then in setting code, it become much more flexible.

TODO: Note that we need to validate SettingsProperty which are added to a Setting as some of them might be mutually exclusive.
2016-02-28 00:48:04 +01:00
Adrien Grand 4f8895eae3 Add a text field.
This new field is intended to replace analyzed string fields.
2016-02-15 10:43:44 +01:00
Adrien Grand a1e251af20 Remove the MapperBuilders utility class.
We can just call constructors directly.
2016-02-11 17:32:58 +01:00
Tanguy Leroux 865bbc2096 Remove string formatting from Terminal print methods
This can be trappy and wrong formating strings can throws format exceptions and hide the real message.
2016-02-02 19:57:16 +01:00
Ryan Ernst a2c37c0989 CliTool: Allow unexpected exceptions to propagate
Cli tools currently catch all exceptions, and only print the exception
message, except when a special system property is set. Even with this
flag set, certain exceptions, like IOException, are captured and their
stack trace is always lost.

This change adds a UserError class, which can be used a cli tools to
specify a message to the user, as well as an exit status. All other
exceptions are propagated out of main, so java will exit with non-zero
and print the stack trace.
2016-02-01 16:35:22 -08:00
Adrien Grand 35709f62b6 Be stricter about parsing boolean values in mappings.
Parsing is currently very lenient, which has the bad side-effect that if you
have a typo and pass eg. `store: fasle` this will actually be interpreted as
`store: true`. Since mappings can't be changed after the fact, it is quite bad
if it happens on an index that already contains data.

Note that this does not cover all settings that accept a boolean, but since the
PR was quite hard to build and already covers some main settirgs like `store`
or `doc_values` this would already be a good incremental improvement.
2016-01-27 09:06:00 +01:00
Daniel Mitterdorfer e9bb3d31a3 Convert "path.*" and "pidfile" to new settings infra 2016-01-22 15:14:13 +01:00
Ryan Ernst df24019261 Merge pull request #16038 from rjernst/remove_site_plugin
Plugins: Remove site plugins
2016-01-21 12:32:22 -08:00
Nik Everett 6b6d01c4fe Merge pull request #16072 from nik9000/xlint_plugins
Remove remaining xlints from plugins
2016-01-19 10:54:15 -05:00
Nik Everett 4efa8c4ff5 Remove remaining xlints from plugins 2016-01-19 10:53:48 -05:00
Simon Willnauer fbfa9f4925 Merge branch 'master' into new_index_settings 2016-01-19 10:13:48 +01:00
Ryan Ernst ef4f0a8699 Test: Make rest test framework accept http directly for the test cluster
The rest test framework, because it used to be tightly integrated with
ESIntegTestCase, currently expects the addresses for the test cluster to
be passed using the transport protocol port. However, it only uses this
to then find the http address.

This change makes ESRestTestCase extend from ESTestCase instead of
ESIntegTestCase, and changes the sysprop used to tests.rest.cluster,
which now takes the http address.

closes #15459
2016-01-18 16:44:14 -08:00
Simon Willnauer 7925e2ef84 convert IndexModule settings 2016-01-18 09:23:35 +01:00
Ryan Ernst 3b78267c71 Plugins: Remove site plugins
Site plugins used to be used for things like kibana and marvel, but
there is no longer a need since kibana (and marvel as a kibana plugin)
uses node.js. This change removes site plugins, as well as the flag for
jvm plugins. Now all plugins are jvm plugins.
2016-01-16 22:45:37 -08:00
Nik Everett 81a7607256 Remove -Xlint:-deprecation from plugins
Instead we suppress warnings about using deprecated stuff near the usage
site with a comment about why its ok.
2016-01-07 20:44:46 -05:00
Robert Muir 180ab2493e Improve thirdPartyAudit check, round 3 2015-12-28 22:38:55 -05:00
Adrien Grand d8d8666877 Remove `index_name` back compat.
Since 2.0 we enforce that fields have the same full and index names. So in 3.x
we can remove the ability to have different names on the same field.
2015-12-23 14:55:26 +01:00
Adrien Grand ac393b7a31 Make mappings tests more realistic.
DocumentMapperParser has both parse and parseCompressed methods. Except that the
parse methods are ONLY used from the unit tests. This commit removes the parse
method and moves all tests to parseCompressed so that they test more
realistically how mappings are managed.

Then I renamed parseCompressed to parse given that this is the only alternative
anyway.
2015-12-21 10:44:00 +01:00
Ryan Ernst 4ea19995cf Remove wildcard imports 2015-12-18 12:43:47 -08:00
Robert Muir 2e2e328879 add missing license header 2015-12-18 13:02:39 -05:00
Adrien Grand 6ccc759691 Merge pull request #15480 from jpountz/fix/mapping_explicit_defaults
Make mapping serialization more robust.
2015-12-17 11:23:27 +01:00
Robert Muir 6692e42d9a thirdPartyAudit round 2
This fixes the `lenient` parameter to be `missingClasses`. I will remove this boolean and we can handle them via the normal whitelist.
It also adds a check for sheisty classes (jar hell with the jdk).
This is inspired by the lucene "sheisty" classes check, but it has false positives. This check is more evil, it validates every class file against the extension classloader as a resource, to see if it exists there. If so: jar hell.

This jar hell is a problem for several reasons:

1. causes insanely-hard-to-debug problems (like bugs in forbidden-apis)
2. hides problems (like internal api access)
3. the code you think is executing, is not really executing
4. security permissions are not what you think they are
5. brings in unnecessary dependencies
6. its jar hell

The more difficult problems are stuff like jython, where these classes are simply 'uberjared' directly in, so you cant just fix them by removing a bogus dependency. And there is a legit reason for them to do that, they want to support java 1.4.
2015-12-17 02:35:00 -05:00
Robert Muir 42138007db add some more comments about internal api usage 2015-12-16 18:56:02 -05:00
Robert Muir ee79d46583 Add gradle thirdPartyAudit to precommit tasks 2015-12-16 16:38:16 -05:00
Adrien Grand 8ac8c1f547 Make mapping serialization more robust.
When creating a metadata mapper for a new type, we reuse an existing
configuration from an existing type (if any) in order to avoid introducing
conflicts. However this field type that is provided is considered as both an
initial configuration and the default configuration. So at serialization time,
we might only serialize the difference between the current configuration and
this default configuration, which might be different to what is actually
considered the default configuration.

This does not cause bugs today because metadata mappers usually override the
toXContent method and compare the current field type with Defaults.FIELD_TYPE
instead of defaultFieldType() but I would still like to do this change to
avoid future bugs.
2015-12-16 16:08:45 +01:00
Adrien Grand d94bba2d9c Remove back compat for the `path` option.
The `path` option allowed to index/store a field `a.b.c` under just `c` when
set to `just_name`. This "feature" has been removed in 2.0 in favor of `copy_to`
so we can remove the back compat in 3.x.
2015-12-15 14:55:23 +01:00
Adrien Grand 50eeafa75c Make mappings immutable.
Today mappings are mutable because of two APIs:
 - Mapper.merge, which expects changes to be performed in-place
 - IncludeInAll, which allows to change whether values should be put in the
   `_all` field in place.

This commit changes both APIs to return a modified copy instead of modifying in
place so that mappings can be immutable. For now, only the type-level object is
immutable, but in the future we can imagine making them immutable at the
index-level so that mapping updates could be completely atomic at the index
level.

Close #9365
2015-12-15 10:20:28 +01:00
Britta Weber e0aa481bf5 Merge pull request #15213 from brwe/copy-to-in-multi-fields-exception
throw exception if a copy_to is within a multi field

Copy to within multi field is ignored from 2.0 on, see #10802.
Instead of just ignoring it, we should throw an exception if this
is found in the mapping when a mapping is added. For already
existing indices we should at least log a warning.
We remove the copy_to in any case.

related to #14946
2015-12-08 14:41:07 +01:00
Adrien Grand 3f86adddbf Remove MergeMappingException.
Failures to merge a mapping can either come as a MergeMappingException if they
come from Mapper.merge or as an IllegalArgumentException if they come from
FieldTypeLookup.checkCompatibility. I think we should settle on one: this pull
request replaces all usage of MergeMappingException with
IllegalArgumentException.
2015-12-04 12:56:26 +01:00
Britta Weber d8a1a4bd43 fix toXContent() for mapper attachments field
We must use simpleName() instead of name() because otherwise when the mapping
is generated as a string the field name will be the full path with dots
and that is illegal from es 2.0 on.

closes https://github.com/elastic/elasticsearch-mapper-attachments/issues/169
2015-11-30 15:28:12 +01:00
Robert Muir a2816ec574 don't assert exact expected lengths for documents.
This will change depending on newline of the operating system
2015-11-29 11:48:54 -05:00