OpenSearch

Commit Graph

Author	SHA1	Message	Date
Jason Tedor	2dea449949	Remove Strings#splitStringToArray This commit removes the method Strings#splitStringToArray and replaces the call sites with invocations to String#split. There are only two explanations for the existence of this method. The first is that String#split is slightly tricky in that it accepts a regular expression rather than a character to split on. This means that if s is a string, s.split(".") does not split on the character '.', but rather splits on the regular expression '.' which splits on every character (of course, this is easily fixed by invoking s.split("\\.") instead). The second possible explanation is that (again) String#split accepts a regular expression. This means that there could be a performance concern compared to just splitting on a single character. However, it turns out that String#split has a fast path for the case of splitting on a single character and microbenchmarks show that String#split has 1.5x--2x the throughput of Strings#splitStringToArray. There is a slight behavior difference between Strings#splitStringToArray and String#split: namely, the former would return an empty array in cases when the input string was null or empty but String#split will just NPE at the call site on null and return a one-element array containing the empty string when the input string is empty. There was only one place relying on this behavior and the call site has been modified accordingly.	2016-05-04 08:12:41 -04:00
Martijn van Groningen	7aca1389e2	ingest: Add `date_index_name` processor. Closes #17814	2016-04-29 17:20:48 +02:00
Tal Levy	07c2fbf83a	Validate properties values according to database type (#17940) Fixes #17683.	2016-04-29 07:58:27 -07:00
Yannick Welsch	37382ecfb2	Add Azure discovery tests mocking Azure management endpoint (#18004)	2016-04-29 15:54:15 +02:00
David Pilato	6c7a44ccd9	Fix test in mapper attachments plugin	2016-04-29 15:02:04 +02:00
David Pilato	2636703afa	Merge branch 'master' into pr/attachments-add-test-forced-values	2016-04-29 14:55:42 +02:00
Alexander Reelsen	f71eb0b888	Version: Set version to 5.0.0-alpha2	2016-04-26 09:30:26 +02:00
Xu Zhang	3e4b470f83	Fix icu IndexScope setting	2016-04-22 15:03:02 -07:00
Ryan Ernst	d12a4bb51d	Merge pull request #17933 from rjernst/camelcase4 Remove camelCase support	2016-04-22 13:46:43 -07:00
xuzha	cd527c5b92	Add support for customizing the rule file in ICU tokenizer Lucene allows creating an ICUTokenizer with a special config argument that enables customization of the rule-based iterator through custom rule files. This commit enables this feature. Users can provide a list of RBBI rule files to the ICU tokenizer. closes #13146	2016-04-22 12:39:20 -07:00
Ryan Ernst	55388590c1	Remove camelCase support Now that the current uses of magical camelCase support have been deprecated, we can remove these in master (sans remaining issues like BulkRequest). This change removes camel case support from ParseField, query types, analysis, and settings lookup. see #8988	2016-04-22 09:18:10 -07:00
Martijn van Groningen	c5ad2e2865	Changed indexed scripts to be stored in the cluster state instead of the `.scripts` index. Also added max script size soft limit for stored scripts. Closes #16651	2016-04-22 13:42:55 +02:00
Martijn van Groningen	dd2184ab25	ingest: Streamline option naming for several processors: * `rename` processor, renamed `to` to `target_field` * `date` processor, renamed `match_field` to `field` and renamed `match_formats` to `formats` * `geoip` processor, renamed `source_field` to `field` and renamed `fields` to `properties` * `attachment` processor, renamed `source_field` to `field` and renamed `fields` to `properties` Closes #17835	2016-04-21 13:40:43 +02:00
Jun Ohtani	9eb242a5fe	Analyze API : Rename filters/token_filters/char_filter to filter/token_filter/char_filter Closes #15189	2016-04-21 18:05:11 +09:00
Ryan Ernst	523b071836	Internal: Remove XContentBuilderString This was previously used by xcontentbuilder to support camelCase. However, it is no longer used, and can be replaced with just String.	2016-04-18 14:32:18 -07:00
Nik Everett	ff9b28d806	Deprecate remaining readXYZ|writeXYZ methods	2016-04-18 16:19:45 -04:00
Adrien Grand	d84c643f58	Use the new points API to index numeric fields. #17746 This makes all numeric fields including `date`, `ip` and `token_count` use points instead of the inverted index as a lookup structure. This is expected to perform worse for exact queries, but faster for range queries. It also requires less storage. Notes about how the change works: - Numeric mappers have been split into a legacy version that is essentially the current mapper, and a new version that uses points, eg. LegacyDateFieldMapper and DateFieldMapper. - Since new and old fields have the same names, the decision about which one to use is made based on the index creation version. - If you try to force using a legacy field on a new index or a field that uses points on an old index, you will get an exception. - IP addresses now support IPv6 via Lucene's InetAddressPoint and store them in SORTED_SET doc values using the same encoding (fixed length of 16 bytes and sortable). - The internal MappedFieldType that is stored by the new mappers does not have any of the points-related properties set. Instead, it keeps setting the index options when parsing the `index` property of mappings and does `if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }` when parsing documents. Known issues that won't fix: - You can't use numeric fields in significant terms aggregations anymore since this requires document frequencies, which points do not record. - Term queries on numeric fields will now return constant scores instead of giving better scores to the rare values. Known issues that we could work around (in follow-up PRs, this one is too large already): - Range queries on `ip` addresses only work if both the lower and upper bounds are inclusive (exclusive bounds are not exposed in Lucene). We could either decide to implement it, or drop range support entirely and tell users to query subnets using the CIDR notation instead. - Since IP addresses now use a different representation for doc values, aggregations will fail when running a terms aggregation on an ip field on a list of indices that contains both pre-5.0 and 5.0 indices. - The ip range aggregation does not work on the new ip field. We need to either implement range aggs for SORTED_SET doc values or drop support for ip ranges and tell users to use filters instead. #17700 Closes #16751 Closes #17007 Closes #11513	2016-04-14 17:56:23 +02:00
Yannick Welsch	80cf9fc761	Add EC2 discovery tests to check permissions of AWS Java SDK (#17677)	2016-04-13 10:01:49 +02:00
Adrien Grand	3bf6f4076c	Do not set analyzers on numeric fields. When it comes to query parsing, either a field is tokenized and it would go through analysis with its search_analyzer. Or it is not tokenized and the raw string should be passed to termQuery(). Since numeric fields are not tokenized and also declare a search analyzer, values would currently go through analysis twice...	2016-04-12 17:47:29 +02:00
Adrien Grand	013acf9179	Remove MappedFieldType.value. #17557 This commit removes `MappedFieldType.value` and simplifies `MappedFieldType.valueforSearch`. `valueforSearch` was used to post-process values that come for stored fields (eg. to convert a long back to a string representation of a date in the case of a date field) and also values that are extracted from the source but only in the case of GET calls: it would not be called when performing source filtering on search requests. `valueforSearch` is now only called for stored fields, since values that are extracted from the source should already be formatted as expected.	2016-04-12 09:12:56 +02:00
Adrien Grand	496c7fbd84	Upgrade Lucene 6 Release * upgrades numerics to new Point format * updates geo api changes * adds GeoPointDistanceRangeQuery as XGeoPointDistanceRangeQuery * cuts over to ES GeoHashUtils	2016-04-11 16:50:04 -05:00
Yannick Welsch	b08d453a0a	Fix EC2 Discovery settings (#17651) Fixes two bugs introduced by the settings refactoring in #16602	2016-04-11 16:17:55 +02:00
Alexander Reelsen	da19ddf3e6	Ingest Attachment: Allow to prevent base64 conversions by using raw bytes (#16601) CBOR is natively supported in Elasticsearch and allows for byte arrays. This means that by using CBOR the user can prevent base64 conversions for the data being sent back and forth. This PR adds support to extract data from a byte array in addition to a string. This also required adding a ByteArrayValueSource class.	2016-04-11 14:14:56 +02:00
Adrien Grand	42526ac28e	Remove Settings.settingsBuilder. We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly the same thing, so we should keep only one. I kept `Settings.builder` since it has my preference but also it is the one that we use in examples of the Java API.	2016-04-08 18:10:02 +02:00
David Pilato	c6b1beb083	Add a test for forced values in mapper-attachments plugin This PR just adds a new test where we check that forcing a value in the JSON document actually works as expected: ```json { "file": { "_content": "BASE64", "_name": "12-240.pdf", "_language": "en", "_content_type": "pdf" } } ``` Note that we don't support forcing all values. So sending: ```json { "file": { "_content": "BASE64", "_name": "12-240.pdf", "_title": "12-240.pdf", "_keywords": "Div42 Src580 LGE Mechtech", "_language": "en", "_content_type": "pdf" } } ``` Will have absolutely no effect on fields `title` and `keywords`. Note that when `_language` is set, it only works if `index.mapping.attachment.detect_language` is set to `true`. Related to https://discuss.elastic.co/t/mapper-attachments/46615/4	2016-04-08 10:07:21 +02:00
Chris Earle	d97d5ebb8b	Remove hostname from NetworkAddress.format This removes the inconsistent output of IP addresses. The format was parsing-unfriendly and it makes it hard to reason about API responses, such as to _nodes. With this change in place, it will never print the hostname as part of the default format, which has the added benefit that it can be used consistently for URIs, which was not the case when the hostname might appear at the front with "hostname/ip:port".	2016-04-07 17:27:59 -04:00
javanna	b9f9b2e3ee	Merge branch 'master' into enhancement/discovery_node_one_getter	2016-03-30 17:22:40 +02:00
javanna	f8b5d1f5b0	Remove DiscoveryNodes#masterNodeId in favour of existing DiscoveryNodes#getMasterNodeId	2016-03-30 15:28:06 +02:00
Adrien Grand	068c788ec8	Disable fielddata on text fields by defaults. #17386 `text` fields will have fielddata disabled by default. Fielddata can still be enabled on an existing index by setting `fielddata=true` in the mappings.	2016-03-30 14:35:32 +02:00
javanna	8fc9dbbb99	Merge branch 'master' into enhancement/remove_node_client_setting	2016-03-29 14:27:04 +02:00
Clinton Gormley	579d976e90	The source parameter should not be defined in the delete-by-query REST spec	2016-03-29 11:45:20 +02:00
javanna	93ce36a198	separated attributes from node roles in DiscoveryNode Node roles are now serialized as well, they are not part of the node attributes anymore. DiscoveryNodeService takes care of dividing settings into attributes and roles. DiscoveryNode always requires to pass in attributes and roles separately.	2016-03-25 20:14:27 +01:00
Jason Tedor	7f0134e725	Revert "Merge pull request #16843 from xuzha/s3-encryption" This reverts commit `37a183d9ed`, reversing changes made to `08903f1ed8`.	2016-03-24 17:11:02 -04:00
Xu Zhang	38923b89c2	Update Format, add new settings into the setting test	2016-03-24 12:16:57 -07:00
Xu Zhang	7499e3aa4a	Update and rebase the init implementation. Also removes the MD5 checks from our side, AWS S3 SDK java is doing the check.	2016-03-24 11:21:40 -07:00
Nicolas Trésegnie	ea78fd6560	Add client-side encryption The Java Cryptography Extension (JCE) has to be installed to use this feature.	2016-03-24 11:13:37 -07:00
David Pilato	4b1ae331f0	Update after review	2016-03-23 17:32:51 +01:00
David Pilato	e907b7c11e	Check that S3 setting `buffer_size` is always lower than `chunk_size` We can be better at checking `buffer_size` and `chunk_size` for S3 repositories. For example, we know that: * `buffer_size` should be more than `5mb` * `chunk_size` should be no more than `5tb` * `buffer_size` should be lower than `chunk_size` Otherwise, setting `buffer_size` is useless. For the record: `chunk_size` is a Snapshot setting whatever the implementation is. `buffer_size` is an S3 implementation setting. Let's say that you are snapshotting a 500mb file. If you set `chunk_size` to `200mb`, then Snapshot service will call S3 repository to snapshot 3 files with the following sizes: * `200mb` * `200mb` * `100mb` If you set `buffer_size` to `100mb` (AWS maximum size recommendation), the first file of `200mb` will be uploaded to S3 using the multipart feature in 2 chunks and the workflow is basically the following: * create the multipart request and get back an `id` from AWS S3 platform * upload part1: `100mb` * upload part2: `100mb` * "commit" the full upload using the `id`. Closes #17244.	2016-03-23 10:39:54 +01:00
Simon Willnauer	1988b8b387	[TEST] Reuse EsTestCase#createAnalysisService in KuromojiAnalysisTests	2016-03-22 13:45:20 +01:00
Jun Ohtani	a9a0f262af	Analysis Kuromoji: Add nbest option and NumberFilter Add nbest_cost and nbest_examples parameter to KuromojiTokenizerFactory Add KuromojiNumberFilterFactory	2016-03-22 20:09:56 +09:00
Ryan Ernst	f71f0d6010	Revert "Build: Switch to maven-publish plugin" This reverts commit `a90a2b34fc`.	2016-03-18 17:22:25 -07:00
Ryan Ernst	6af4c43c4f	Merge pull request #17128 from rjernst/maven_publish Build: Switch to maven-publish plugin	2016-03-17 11:53:50 -07:00
Simon Willnauer	e91a141233	Prevent index level setting from being configured on a node level Today we allow setting all kinds of index level settings on the node level, which is error prone and difficult to get right in a consistent manner. For instance, if some analyzers are set up in a yaml config file, some nodes might not have these analyzers and then index creation fails. Nevertheless, this change allows some selected settings to be specified on a node level, for instance: * `index.codec` which is used in a hot/cold node architecture and its value is really per node or per index * `index.store.fs.fs_lock` which is also dependent on the filesystem a node uses All other index level settings must be specified on the index level. For existing clusters the index must be closed and all settings must be updated via the API on each of the indices. Closes #16799	2016-03-17 14:42:18 +01:00
Ryan Ernst	a90a2b34fc	Build: Switch to maven-publish plugin The build currently uses the old maven support in gradle. This commit switches to use the newer maven-publish plugin. This will allow future changes, for example, easily publishing to artifactory. An additional part of this change makes publishing of build-tools part of the normal publishing, instead of requiring a separate upload step from within buildSrc. That also sets us up for a follow up to enable precommit checks on the buildSrc code itself.	2016-03-15 19:16:37 -07:00
Jason Tedor	618441aea3	Merge pull request #17088 from jasontedor/simplify-bootstrap-settings Bootstrap does not set system properties	2016-03-15 19:25:16 -04:00
Jason Tedor	66ba044ec5	Use setting in integration test cluster config	2016-03-15 17:45:17 -04:00
Yannick Welsch	f5e6db4090	Remove System.out.println and Throwable.printStackTrace from tests	2016-03-15 15:40:37 +01:00
Yannick Welsch	d14ae5f8b6	Remove Python and Javascript Benchmark classes	2016-03-15 15:02:50 +01:00
David Pilato	84c862b825	Merge remote-tracking branch 'origin/master'	2016-03-15 09:25:26 +01:00
David Pilato	a3bf57d116	Upgrade azure SDK to 0.9.3 We are ATM using azure SDK 0.9.0. Azure latest release is now 0.9.3 (released in February 2016). Artifacts are on [maven central](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22com.microsoft.azure%22%20AND%20(a%3Aazure-serviceruntime%20OR%20a%3Aazure-servicebus%20OR%20a%3Aazure-svc-)) Change log: ## 2016.2.18 Version 0.9.3 Fix enum bugs in azure-svc-mgmt-websites ## 2016.1.26 Version 0.9.2 * Fix HTTP Proxy for Apache HTTP Client in Service Clients * Key Vault: Fix KeyVaultKey to not attempt to load RSA Private Key ## 2016.1.8 Version 0.9.1 * Support HTTP Proxy * Fix token expiration issue #557 * Service Bus: Add missing attributes: partitionKey, viaPartitionKey * Traffic Manager: Update API version, add MinChildEndpoints for NestedEndpoints * Media: Add support for Widevine (DRM) dynamic encryption Closes #17042.	2016-03-15 09:18:34 +01:00
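
The `String#split` behaviour described in commit 2dea449949 can be illustrated with a minimal Java sketch. The class name `SplitDemo` and the helper `splitOrEmpty` are hypothetical and do not come from the commit; the helper only approximates the empty-array behaviour that the message attributes to the removed `Strings#splitStringToArray`.

```java
final class SplitDemo {

    // Splits on the literal '.' character by escaping the regex metacharacter.
    public static String[] splitOnLiteralDot(String s) {
        return s.split("\\.");
    }

    // Hypothetical null-safe wrapper: returns an empty array for null or empty
    // input, which is the behaviour the commit message ascribes to the old helper.
    public static String[] splitOrEmpty(String s, String regex) {
        if (s == null || s.isEmpty()) {
            return new String[0];
        }
        return s.split(regex);
    }

    public static void main(String[] args) {
        // An unescaped "." is a regex that matches every character, so every
        // piece is empty and the trailing empty strings are dropped.
        System.out.println("a.b.c".split(".").length);           // 0
        System.out.println(splitOnLiteralDot("a.b.c").length);   // 3
        // Plain String#split on an empty string yields one empty element.
        System.out.println("".split(",").length);                // 1
        System.out.println(splitOrEmpty("", ",").length);        // 0
        System.out.println(splitOrEmpty(null, ",").length);      // 0
    }
}
```

A call site that relied on the old empty-array result would need a guard of this kind, since calling `split` on a null reference throws a `NullPointerException`.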

1303 Commits
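
The sizing arithmetic in commit e907b7c11e (a 500mb file, `chunk_size` of `200mb`, `buffer_size` of `100mb`) can be sketched in Java as follows. `ChunkPlan` and `splitSizes` are hypothetical names used for illustration only; this is not the S3 repository implementation, just the division of a size into fixed-size pieces, with all sizes expressed in mb.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkPlan {

    // Divides `total` into consecutive pieces of at most `pieceSize`;
    // the last piece holds whatever remains.
    public static List<Long> splitSizes(long total, long pieceSize) {
        if (total < 0 || pieceSize <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and pieceSize > 0");
        }
        List<Long> sizes = new ArrayList<>();
        for (long remaining = total; remaining > 0; remaining -= pieceSize) {
            sizes.add(Math.min(remaining, pieceSize));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // chunk_size = 200mb applied to a 500mb file
        System.out.println(splitSizes(500, 200)); // [200, 200, 100]
        // buffer_size = 100mb applied to the first 200mb chunk
        System.out.println(splitSizes(200, 100)); // [100, 100]
    }
}
```

The same helper shows why a `buffer_size` at or above `chunk_size` has no effect: `splitSizes(200, 200)` yields a single piece, so no chunk is ever split into multiple parts.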