OpenSearch

Commit Graph

Author	SHA1	Message	Date
Adrien Grand	d84c643f58	Use the new points API to index numeric fields. #17746 This makes all numeric fields including `date`, `ip` and `token_count` use points instead of the inverted index as a lookup structure. This is expected to perform worse for exact queries, but faster for range queries. It also requires less storage. Notes about how the change works: - Numeric mappers have been split into a legacy version that is essentially the current mapper, and a new version that uses points, eg. LegacyDateFieldMapper and DateFieldMapper. - Since new and old fields have the same names, the decision about which one to use is made based on the index creation version. - If you try to force using a legacy field on a new index or a field that uses points on an old index, you will get an exception. - IP addresses now support IPv6 via Lucene's InetAddressPoint and store them in SORTED_SET doc values using the same encoding (fixed length of 16 bytes and sortable). - The internal MappedFieldType that is stored by the new mappers does not have any of the points-related properties set. Instead, it keeps setting the index options when parsing the `index` property of mappings and does `if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }` when parsing documents. Known issues that won't fix: - You can't use numeric fields in significant terms aggregations anymore since this requires document frequencies, which points do not record. - Term queries on numeric fields will now return constant scores instead of giving better scores to the rare values. Known issues that we could work around (in follow-up PRs, this one is too large already): - Range queries on `ip` addresses only work if both the lower and upper bounds are inclusive (exclusive bounds are not exposed in Lucene). We could either decide to implement it, or drop range support entirely and tell users to query subnets using the CIDR notation instead. 
- Since IP addresses now use a different representation for doc values, aggregations will fail when running a terms aggregation on an ip field on a list of indices that contains both pre-5.0 and 5.0 indices. - The ip range aggregation does not work on the new ip field. We need to either implement range aggs for SORTED_SET doc values or drop support for ip ranges and tell users to use filters instead. #17700 Closes #16751 Closes #17007 Closes #11513	2016-04-14 17:56:23 +02:00
Yannick Welsch	80cf9fc761	Add EC2 discovery tests to check permissions of AWS Java SDK (#17677)	2016-04-13 10:01:49 +02:00
Adrien Grand	3bf6f4076c	Do not set analyzers on numeric fields. When it comes to query parsing, either a field is tokenized and it would go through analysis with its search_analyzer. Or it is not tokenized and the raw string should be passed to termQuery(). Since numeric fields are not tokenized and also declare a search analyzer, values would currently go through analysis twice...	2016-04-12 17:47:29 +02:00
Adrien Grand	013acf9179	Remove MappedFieldType.value. #17557 This commit removes `MappedFieldType.value` and simplifies `MappedFieldType.valueForSearch`. `valueForSearch` was used to post-process values that come from stored fields (eg. to convert a long back to a string representation of a date in the case of a date field) and also values that are extracted from the source but only in the case of GET calls: it would not be called when performing source filtering on search requests. `valueForSearch` is now only called for stored fields, since values that are extracted from the source should already be formatted as expected.	2016-04-12 09:12:56 +02:00
Adrien Grand	496c7fbd84	Upgrade Lucene 6 Release * upgrades numerics to new Point format * updates geo api changes * adds GeoPointDistanceRangeQuery as XGeoPointDistanceRangeQuery * cuts over to ES GeoHashUtils	2016-04-11 16:50:04 -05:00
Yannick Welsch	b08d453a0a	Fix EC2 Discovery settings (#17651) Fixes two bugs introduced by the settings refactoring in #16602	2016-04-11 16:17:55 +02:00
Alexander Reelsen	da19ddf3e6	Ingest Attachment: Allow to prevent base64 conversions by using raw bytes (#16601) CBOR is natively supported in Elasticsearch and allows for byte arrays. This means that by using CBOR the user can prevent base64 conversions for the data being sent back and forth. This PR adds support to extract data from a byte array in addition to a string. This also required adding a ByteArrayValueSource class.	2016-04-11 14:14:56 +02:00
Adrien Grand	42526ac28e	Remove Settings.settingsBuilder. We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly the same thing, so we should keep only one. I kept `Settings.builder` since it has my preference but also it is the one that we use in examples of the Java API.	2016-04-08 18:10:02 +02:00
Chris Earle	d97d5ebb8b	Remove hostname from NetworkAddress.format This removes the inconsistent output of IP addresses. The format was parsing-unfriendly and it makes it hard to reason about API responses, such as to _nodes. With this change in place, it will never print the hostname as part of the default format, which has the added benefit that it can be used consistently for URIs, which was not the case when the hostname might appear at the front with "hostname/ip:port".	2016-04-07 17:27:59 -04:00
javanna	b9f9b2e3ee	Merge branch 'master' into enhancement/discovery_node_one_getter	2016-03-30 17:22:40 +02:00
javanna	f8b5d1f5b0	Remove DiscoveryNodes#masterNodeId in favour of existing DiscoveryNodes#getMasterNodeId	2016-03-30 15:28:06 +02:00
Adrien Grand	068c788ec8	Disable fielddata on text fields by default. #17386 `text` fields will have fielddata disabled by default. Fielddata can still be enabled on an existing index by setting `fielddata=true` in the mappings.	2016-03-30 14:35:32 +02:00
javanna	8fc9dbbb99	Merge branch 'master' into enhancement/remove_node_client_setting	2016-03-29 14:27:04 +02:00
Clinton Gormley	579d976e90	The source parameter should not be defined in the delete-by-query REST spec	2016-03-29 11:45:20 +02:00
javanna	93ce36a198	separated attributes from node roles in DiscoveryNode Node roles are now serialized as well; they are not part of the node attributes anymore. DiscoveryNodeService takes care of dividing settings into attributes and roles. DiscoveryNode always requires passing in attributes and roles separately.	2016-03-25 20:14:27 +01:00
Jason Tedor	7f0134e725	Revert "Merge pull request #16843 from xuzha/s3-encryption" This reverts commit `37a183d9ed`, reversing changes made to `08903f1ed8`.	2016-03-24 17:11:02 -04:00
Xu Zhang	38923b89c2	Update Format, add new settings into the setting test	2016-03-24 12:16:57 -07:00
Xu Zhang	7499e3aa4a	Update and rebase the init implementation. Also removes the MD5 checks from our side, AWS S3 SDK java is doing the check.	2016-03-24 11:21:40 -07:00
Nicolas Trésegnie	ea78fd6560	Add client-side encryption The Java Cryptography Extension (JCE) has to be installed to use this feature.	2016-03-24 11:13:37 -07:00
David Pilato	4b1ae331f0	Update after review	2016-03-23 17:32:51 +01:00
David Pilato	e907b7c11e	Check that S3 setting `buffer_size` is always lower than `chunk_size` We can be better at checking `buffer_size` and `chunk_size` for S3 repositories. For example, we know that: * `buffer_size` should be more than `5mb` * `chunk_size` should be no more than `5tb` * `buffer_size` should be lower than `chunk_size` Otherwise, setting `buffer_size` is useless. For the record: `chunk_size` is a Snapshot setting whatever the implementation is. `buffer_size` is an S3 implementation setting. Let's say that you are snapshotting a 500mb file. If you set `chunk_size` to `200mb`, then Snapshot service will call S3 repository to snapshot 3 files with the following sizes: * `200mb` * `200mb` * `100mb` If you set `buffer_size` to `100mb` (AWS maximum size recommendation), the first file of `200mb` will be uploaded on S3 using the multipart feature in 2 chunks and the workflow is basically the following: * create the multipart request and get back an `id` from AWS S3 platform * upload part1: `100mb` * upload part2: `100mb` * "commit" the full upload using the `id`. Closes #17244.	2016-03-23 10:39:54 +01:00
Simon Willnauer	1988b8b387	[TEST] Reuse EsTestCase#createAnalysisService in KuromojiAnalysisTests	2016-03-22 13:45:20 +01:00
Jun Ohtani	a9a0f262af	Analysis Kuromoji: Add nbest option and NumberFilter Add nbest_cost and nbest_examples parameter to KuromojiTokenizerFactory Add KuromojiNumberFilterFactory	2016-03-22 20:09:56 +09:00
Ryan Ernst	f71f0d6010	Revert "Build: Switch to maven-publish plugin" This reverts commit `a90a2b34fc`.	2016-03-18 17:22:25 -07:00
Ryan Ernst	6af4c43c4f	Merge pull request #17128 from rjernst/maven_publish Build: Switch to maven-publish plugin	2016-03-17 11:53:50 -07:00
Simon Willnauer	e91a141233	Prevent index level setting from being configured on a node level Today we allow to set all kinds of index level settings on the node level which is error prone and difficult to get right in a consistent manner. For instance if some analyzers are setup in a yaml config file some nodes might not have these analyzers and then index creation fails. Nevertheless, this change allows some selected settings to be specified on a node level for instance: * `index.codec` which is used in a hot/cold node architecture and its value is really per node or per index * `index.store.fs.fs_lock` which is also dependent on the filesystem a node uses All other index level settings must be specified on the index level. For existing clusters the index must be closed and all settings must be updated via the API on each of the indices. Closes #16799	2016-03-17 14:42:18 +01:00
Ryan Ernst	a90a2b34fc	Build: Switch to maven-publish plugin The build currently uses the old maven support in gradle. This commit switches to use the newer maven-publish plugin. This will allow future changes, for example, easily publishing to artifactory. An additional part of this change makes publishing of build-tools part of the normal publishing, instead of requiring a separate upload step from within buildSrc. That also sets us up for a follow up to enable precommit checks on the buildSrc code itself.	2016-03-15 19:16:37 -07:00
Jason Tedor	618441aea3	Merge pull request #17088 from jasontedor/simplify-bootstrap-settings Bootstrap does not set system properties	2016-03-15 19:25:16 -04:00
Jason Tedor	66ba044ec5	Use setting in integration test cluster config	2016-03-15 17:45:17 -04:00
Yannick Welsch	f5e6db4090	Remove System.out.println and Throwable.printStackTrace from tests	2016-03-15 15:40:37 +01:00
Yannick Welsch	d14ae5f8b6	Remove Python and Javascript Benchmark classes	2016-03-15 15:02:50 +01:00
David Pilato	84c862b825	Merge remote-tracking branch 'origin/master'	2016-03-15 09:25:26 +01:00
David Pilato	a3bf57d116	Upgrade azure SDK to 0.9.3 We are ATM using azure SDK 0.9.0. Azure latest release is now 0.9.3 (released in February 2016). Artifacts are on [maven central](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22com.microsoft.azure%22%20AND%20(a%3Aazure-serviceruntime%20OR%20a%3Aazure-servicebus%20OR%20a%3Aazure-svc-)) Change log: ## 2016.2.18 Version 0.9.3 Fix enum bugs in azure-svc-mgmt-websites ## 2016.1.26 Version 0.9.2 * Fix HTTP Proxy for Apache HTTP Client in Service Clients * Key Vault: Fix KeyVaultKey to not attempt to load RSA Private Key ## 2016.1.8 Version 0.9.1 * Support HTTP Proxy * Fix token expiration issue #557 * Service Bus: Add missing attributes: partitionKey, viaPartitionKey * Traffic Manager: Update API version, add MinChildEndpoints for NestedEndpoints * Media: Add support for Widevine (DRM) dynamic encryption Closes #17042.	2016-03-15 09:18:34 +01:00
Simon Willnauer	345e988bbc	Merge pull request #17072 from s1monw/add_backwards_rest_tests Add infrastructure to run REST tests on a multi-version cluster This change adds the infrastructure to run the rest tests on a multi-node cluster that uses 2 different minor versions of elasticsearch. It doesn't implement any dedicated BWC tests but rather leverages the existing REST tests. Since we don't have a real version to test against, the tests uses the current version until the first minor / RC is released to ensure the infrastructure works. Given the amount of problems this change already found I think it's worth having this run with our test suite by default. The structure of this infra will likely change over time but for now it's a step in the right direction. We will likely want to split it up into integTests and integBwcTests etc. so each plugin can have its own bwc tests but that's left for future refactoring.	2016-03-15 09:17:43 +01:00
Areek Zillur	c3078f4d65	adapt tests to use index uuid as folder name	2016-03-14 23:24:24 -04:00
Simon Willnauer	554bf2c282	[TEST] Test that all processors are available	2016-03-14 22:35:25 +01:00
Simon Willnauer	6f28c173e2	[TEST] Test that all processors are available	2016-03-14 21:42:37 +01:00
Adrien Grand	5596e31068	Upgrade to lucene-6.0.0-f0aa4fc. #17075	2016-03-14 07:58:52 +01:00
Jason Tedor	8a05c2a2be	Bootstrap does not set system properties Today, certain bootstrap properties are set and read via system properties. This action-at-distance way of managing these properties is rather confusing, and completely unnecessary. But another problem exists with setting these as system properties. Namely, these system properties are interpreted as Elasticsearch settings, not all of which are registered. This leads to Elasticsearch failing to start up if any of these special properties are set. Instead, these properties should be kept as local as possible, and passed around as method parameters where needed. This eliminates the action-at-distance way of handling these properties, and eliminates the need to register these non-setting properties. This commit does exactly that. Additionally, today we use the "-D" command line flag to set the properties, but this is confusing because "-D" is a special flag to the JVM for setting system properties. This creates confusion because some "-D" properties should be passed via arguments to the JVM (so via ES_JAVA_OPTS), and some should be passed as arguments to Elasticsearch. This commit changes the "-D" flag for Elasticsearch settings to "-E".	2016-03-13 20:09:15 -04:00
David Pilato	9acb0bb28c	Merge branch 'master' into pr/16598-register-filter-settings # Conflicts: # core/src/main/java/org/elasticsearch/cluster/service/InternalClusterService.java # core/src/main/java/org/elasticsearch/common/settings/IndexScopedSettings.java # core/src/main/java/org/elasticsearch/common/settings/Setting.java	2016-03-13 14:52:10 +01:00
Ryan Ernst	591fb8f028	Merge branch 'master' into cli-parsing	2016-03-11 10:45:05 -08:00
Yannick Welsch	04e55ecf6b	Make logging message String constant to allow static checks	2016-03-11 10:30:59 +01:00
Yannick Welsch	718876a941	Fix wrong placeholder usage in logging statements	2016-03-11 10:30:59 +01:00
Ryan Ernst	42a6869bb1	Merge pull request #17059 from elastic/fix/16864-attachment-doctypes Fix attachments plugins with docx	2016-03-10 17:27:02 -08:00
Ryan Ernst	2f3efc3fe1	Add doc and docx rest test to mapper attachment along with getClassLoader permission	2016-03-10 13:28:19 -08:00
Ryan Ernst	51d87d94dc	Add getClassLoader perm for tika in ingest	2016-03-10 11:17:25 -08:00
thefourtheye	304cbbbf31	fix redundant stack in comments	2016-03-11 00:31:38 +05:30
David Pilato	6deabac8e8	Cannot extract text from Office documents (`.docx` extension) Add REST test for: * `.doc` * `.docx` The latter fails with: ``` ==> Test Info: seed=DB93397128B876D4; jvm=1; suite=1 Suite: org.elasticsearch.ingest.attachment.IngestAttachmentRestIT 2> REPRODUCE WITH: gradle :plugins:ingest-attachment:integTest -Dtests.seed=DB93397128B876D4 -Dtests.class=org.elasticsearch.ingest.attachment.IngestAttachmentRestIT -Dtests.method="test {yaml=ingest_attachment/30_files_supported/Test ingest attachment processor with .docx file}" -Des.logger.level=WARN -Dtests.security.manager=true -Dtests.locale=bg -Dtests.timezone=Europe/Athens FAILURE 4.53s | IngestAttachmentRestIT.test {yaml=ingest_attachment/30_files_supported/Test ingest attachment processor with .docx file} <<< FAILURES! > Throwable #1: java.lang.AssertionError: expected [2xx] status code but api [index] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parse_exception","reason":"Error parsing document in field [field1]"}],"type":"parse_exception","reason":"Error parsing document in field [field1]","caused_by":{"type":"tika_exception","reason":"Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@7f85baa5","caused_by":{"type":"illegal_state_exception","reason":"access denied (\"java.lang.RuntimePermission\" \"getClassLoader\")","caused_by":{"type":"access_control_exception","reason":"access denied (\"java.lang.RuntimePermission\" \"getClassLoader\")"}}}},"status":400}] > at __randomizedtesting.SeedInfo.seed([DB93397128B876D4:53C706AB86441B2C]:0) > at org.elasticsearch.test.rest.section.DoSection.execute(DoSection.java:107) > at org.elasticsearch.test.rest.ESRestTestCase.test(ESRestTestCase.java:395) > at java.lang.Thread.run(Thread.java:745) ``` Related to #16864	2016-03-10 10:57:59 +01:00
Simon Willnauer	7a53a396e4	Remove Unneeded @Inject annotations	2016-03-09 12:10:47 +01:00
Ryan Ernst	80ae2b0002	Fix more licenses	2016-03-09 00:10:59 -08:00

1286 Commits