Commit Graph

3857 Commits

Author SHA1 Message Date
Nik Everett ea4eb06b0a Test: Make update-by-query test more resilient
`UpdateByQueryWhileModifyingTests#testUpdateWhileReindexing`
runs update-by-query and concurrently updates, asserting that
the update-by-query never reverts any changes made by the update.
It is a smoke test for concurrent updates.

Now, it expects to hit a certain number of version conflicts
during the updates. This is normal as it is racing the
update-by-query. We have a maximum number of failures we
expect (10) and I'd never seen us come close until
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.x+multijob-unix-compatibility/os=sles/495/console

This bumps the max failures from 10 to 50 and improves
logging a bit. If we continue to see this failure then we have
some other issue.

Closes #22938
2017-02-03 09:18:26 -05:00
Jay Modi 7520a107be Optionally require a valid content type for all rest requests with content (#22691)
This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.

The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.

As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.

In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.

See #19388
2017-02-02 14:07:13 -05:00
Nik Everett ce8e042b66 Reindex: fix reindex-from-remote from <2.0 (#22931)
In 5.2 we stopped sending the source parameter if the user didn't
specify it. This was a mistake as versions before 2.0 look like
they don't always include the `_source`. This is because reindex
requests some metadata fields. Anyway, now we say `"_source": true`
if there isn't a `_source` configured in the reindex request.

Closes #22893
2017-02-02 11:46:24 -05:00
Nik Everett 73bf29072f Painless: Fix def invoked qualified method refs (#22918)
We were incorrectly resolving qualified method references at run
time when invoked on `def`. This lead to errors like
`The struct with name [org] has not been defined.` when attempting

```
doc.date.dates.stream().map(
  org.joda.time.ReadableDateTime::centuryOfEra
).collect(Collectors.toList())
```
2017-02-02 10:15:03 -05:00
Nik Everett dacc150934 Expose multi-valued dates to scripts and document painless's date functions (#22875)
Implemented by wrapping an array of reused `ModuleDateTime`s that
we grow when needed. The `ModuleDateTime`s are reused when we
move to the next document.

Also improves the error message returned when attempting to modify
the `ScriptdocValues`, removes a couple of allocations, and documents
that the date functions are available in Painless.

Relates to #22162
2017-02-01 21:57:07 -05:00
Jack Conradson 3d2626c4c6 Change Namespace for Stored Script to Only Use Id (#22206)
Currently, stored scripts use a namespace of (lang, id) to be put, get, deleted, and executed. This is not necessary since the lang is stored with the stored script. A user should only have to specify an id to use a stored script. This change makes that possible while keeping backwards compatibility with the previous namespace of (lang, id). Anywhere the previous namespace is used will log deprecation warnings.

The new behavior is the following:

When a user specifies a stored script, that script will be stored under both the new namespace and old namespace.

Take for example script 'A' with lang 'L0' and data 'D0'. If we add script 'A' to the empty set, the scripts map will be ["A" -- D0, "A#L0" -- D0]. If a script 'A' with lang 'L1' and data 'D1' is then added, the scripts map will be ["A" -- D1, "A#L1" -- D1, "A#L0" -- D0].

When a user deletes a stored script, that script will be deleted from both the new namespace (if it exists) and the old namespace.

Take for example a scripts map with {"A" -- D1, "A#L1" -- D1, "A#L0" -- D0}. If a script is removed specified by an id 'A' and lang null then the scripts map will be {"A#L0" -- D0}. To remove the final script, the deprecated namespace must be used, so an id 'A' and lang 'L0' would need to be specified.

When a user gets/executes a stored script, if the new namespace is used then the script will be retrieved/executed using only 'id', and if the old namespace is used then the script will be retrieved/executed using 'id' and 'lang'
2017-01-31 13:27:02 -08:00
Nik Everett 2e48fb8294 Move delete by query helpers into core (#22810)
This moves the building blocks for delete by query into core. This
should enabled two thigns:
1. Plugins other than reindex to implement "bulk by scroll" style
operations.
2. Plugins to directly call delete by query. Those plugins should
be careful to make sure that task cancellation still works, but
this should be possible.

Notes:
1. I've mostly just moved classes and moved around tests methods.
2. I haven't been super careful about cohesion between these core
classes and reindex. They are quite interconnected because I wanted
to make the change as mechanical as possible.

Closes #22616
2017-01-27 16:09:18 -05:00
Nik Everett 8a2d424d68 Generate reference links for painless API (#22775)
Adds "Appending B. Painless API Reference", a reference of all classes
and methods available from Painless. Removes links to java packages
because they contain methods that we don't expose and don't contain
methods that we do expose (the ones in Augmentation). Instead this
generates a list of every class and every exposed method using the same
type information available to the
interpreter/compiler/whatever-we-call-it. From there you can jump to
the relevant docs.

Right now you build all the asciidoc files by running
```
gradle generatePainlessApi
```

These files are expected to be committed because we build the docs
without running `gradle`.

Also changes the output of `Debug.explain` so that it is easy to
search for the class in the generated reference documentation.

You can also run it in an IDE safely if you pass the path to the
directory in which to generate the docs as the first parameter. It'll
blow away the entire directory an recreate it from scratch so be careful.

And then you can build the docs by running something like:
```
../docs/build_docs.pl --out ../built_docs/ --doc docs/reference/index.asciidoc --open
```

That is, if you have checked out https://github.com/elastic/docs in
`../docs`. Wait a minute or two and your browser will pop open in with
all of Elasticsearch's reference documentation. If you go to
`http://localhost:8000/painless-api-reference.html` you can see this
list. Or you can get there by following the links to `Modules` and
`Scripting` and `Painless` and then clicking the link in the paragraphs
below titled `Appendix B. Painless API Reference`.

I like having these in asciidoc because we can deep link to them from the
rest of the guide with constructs like
`<<painless-api-reference-Object-hashCode-0>>` and
`<<painless-api-reference->>` and we get link checking. Then the only
brittle link maintenance bit is the link generation for javadoc. Which
sucks. But I think it is important that we link to the methods directly
so they are easy to find.

Relates to #22720
2017-01-26 10:39:19 -05:00
Tim Brooks 719e75bb3f Add repository-url module and move URLRepository (#22752)
This is related to #22116. URLRepository requires SocketPermission
connect. This commit introduces a new module called "repository-url"
where URLRepository will reside. With the new module, permissions can
be removed from core.
2017-01-25 17:09:25 -06:00
Tal Levy e9a68b3287 fix date-processor to a new default year for every new pipeline execution. (#22601)
Beforehand, the DateProcessor constructs its joda pattern formatter during processor
construction. This led to newly ingested documents being defaulted to
the year that the pipeline was constructed, not that of processing.

Fixes #22547.
2017-01-25 15:09:07 -08:00
Chris Earle f0f75b187a Support Preemptive Authentication with RestClient (#21336)
This adds the necessary `AuthCache` needed to support preemptive authorization. By adding every host to the cache, the automatically added `RequestAuthCache` interceptor will add credentials on the first pass rather than waiting to do it after _each_ anonymous request is rejected (thus always sending everything twice when basic auth is required).
2017-01-24 11:34:05 -05:00
Luca Cavanna 47c0e13a3b Stop returning "es." internal exception headers as http response headers (#22703)
move "es." internal headers to separate metadata set in ElasticsearchException and stop returning them as response headers

Closes #17593

* [TEST] remove ESExceptionTests, move its methods to ElasticsearchExceptionTests or ExceptionSerializationTests
2017-01-24 16:12:45 +01:00
Nik Everett 28cfc533e2 Generate javadoc jar for painless's public API (#22704)
The simplest way to do that is to move the public API into a
new package and generate javadoc for that package.
2017-01-23 17:16:20 -05:00
Jim Ferenczi e48bc2eed7 Add field collapsing for search request (#22337)
* Add top hits collapsing to search request

The field collapsing is done with a custom top docs collector that "collapse" search hits with same field value.
The distributed aspect is resolve using the two passes that the regular search uses. The first pass "collapse" the top hits, then the coordinating node merge/collapse the top hits from each shard.

```
GET _search
{
   "collapse": {
      "field": "category",
   }
}
```

This change also adds an ExpandCollapseSearchResponseListener that intercepts the search response and expands collapsed hits using the CollapseBuilder#innerHit} options.
The retrieval of each inner_hits is done by sending a query to all shards filtered by the collapse key.

```
GET _search
{
   "collapse": {
      "field": "category",
      "inner_hits": {
	"size": 2
      }
   }
}
```
2017-01-23 16:33:51 +01:00
Tim Brooks a4ac29c005 Add single static instance of SpecialPermission (#22726)
This commit adds a SpecialPermission constant and uses that constant
opposed to introducing new instances everywhere.

Additionally, this commit introduces a single static method to check that
the current code has permission. This avoids all the duplicated access
blocks that exist currently.
2017-01-21 12:03:52 -06:00
Jim Ferenczi 8028578305 Upgrade to Lucene 6.4.0 (#22724)
* Upgrade to Lucene 6.4.0

`ValueSource`s are now converted to `DoubleValueSource`s using the Lucene adapter made for the migration to the new API in 6.4.0.
2017-01-21 04:48:01 +01:00
Nik Everett 6265ef1c1b Deguice rest handlers (#22575)
There are presently 7 ctor args used in any rest handlers:
* `Settings`: Every handler uses it to initialize a logger and
  some other strange things.
* `RestController`: Every handler registers itself with it.
* `ClusterSettings`: Used by `RestClusterGetSettingsAction` to
  render the default values for cluster settings.
* `IndexScopedSettings`: Used by `RestGetSettingsAction` to get
  the default values for index settings.
* `SettingsFilter`: Used by a few handlers to filter returned
  settings so we don't expose stuff like passwords.
* `IndexNameExpressionResolver`: Used by `_cat/indices` to
  filter the list of indices.
* `Supplier<DiscoveryNodes>`: Used to fill enrich the response
  by handlers that list tasks.

We probably want to reduce these arguments over time but
switching construction away from guice gives us tighter
control over the list of available arguments.

These parameters are passed to plugins using
`ActionPlugin#initRestHandlers` which is expected to build and
return that handlers immediately. This felt simpler than
returning an reference to the ctors given all the different
possible args.

Breaks java plugins by moving rest handlers off of guice.
2017-01-20 11:48:51 -05:00
Tim Brooks bc16162d21 Remove accept SocketPermissions from core (#22622)
This is related to #22116. Core no longer needs SocketPermission 
accept. This permission is relegated to the transport-netty4 module 
and (for tests) to the mocksocket jar.
2017-01-20 09:27:45 -06:00
Nik Everett 22f1c9fa0f Remove @header we no longer need 2017-01-19 11:44:13 -05:00
Nik Everett bb83c283bb Make lexer abstract 2017-01-19 11:41:50 -05:00
Nik Everett dbb4a2ca6c Move lexer hacks to EnhancedPainlessLexer
This "feels" nicer. Less classes at least.
2017-01-19 11:23:16 -05:00
Nik Everett e2da6a8ee5 Improve painless's javadocs
Hopefully useful references.
2017-01-19 11:04:08 -05:00
Tim Brooks a10aa8aade Add TestWithDependenciesPlugin to build (#22646)
This commit adds a MessyRestTestPlugin to the gradle build. It extends 
StandaloneRestTestPlugin. The main piece of functionality that it adds 
is to copy plugin-metadata from dependencies into the 
generated-resources for the current test source. This is necessary to 
ensure that permissions for dependencies are applied when running the 
tests.

A current limitation is that the permissions are applied differently 
than in the distribution sources. When permissions are granted to all 
depedencies for a module or plugin, the permissions are granted to all 
dependencies on the classpath for tests besides a few hardcoded 
exclusions:
- es core
- es test framework
- lucene test framework
- randomized runner
- junit library
2017-01-19 09:43:53 -06:00
Nik Everett 3ce41a0e15 Painless: Add augmentation to string for base 64 (#22665)
We don't want to expose `String#getBytes` which is required for
`Base64.getEncoder.encode` to work because we're worried about
character sets. This adds `encodeBase64` and `decodeBase64`
methods to `String` in Painless that are duals of one another
such that:
`someString == someString.encodeBase64().decodeBase64()`.

Both methods work with the UTF-8 encoding of the string.

Closes #22648
2017-01-19 09:31:45 -05:00
Nik Everett ee5f8c4522 Consolidate some reindex utility classes (#22666)
Everything that extended `AbstractAsyncBulkByScrollAction` also
extended `AbstractAsyncBulkIndexByScrollAction` so this removes
`AbstractAsyncBulkIndexByScrollAction`, merging it into
`AbstractAsyncBulkByScrollAction`.
2017-01-18 16:58:39 -05:00
Nik Everett 1fe74a6b4b Better error when can't auto create index (#22488)
Changes the error message when `action.auto_create_index` or
`index.mapper.dynamic` forbids automatic creation of an index
from `no such index` to one of:
* `no such index and [action.auto_create_index] is [false]`
* `no such index and [index.mapper.dynamic] is [false]`
* `no such index and [action.auto_create_index] contains [-<pattern>] which forbids automatic creation of the index`
* `no such index and [action.auto_create_index] ([all patterns]) doesn't match`

This should make it more clear *why* there is `no such index`.

Closes #22435
2017-01-18 15:18:32 -05:00
Simon Willnauer 24e2847af2 Streamline foreign stored context restore and allow to perserve response headers (#22677)
Today we do not preserve response headers if they are present on a transport protocol
response. While preserving these headers is not always desired, in the most cases we
should pass on these headers to have consistent results for depreciation headers etc.
yet, this hasn't been much of a problem since most of the deprecations are detected early
ie. on the coordinating node such that this bug wasn't uncovered until #22647

This commit allow to optionally preserve headers when a context is restored and also streamlines
the context restore since it leaked frequently into the callers thread context when the callers
context wasn't restored again.
2017-01-18 16:17:54 +01:00
Igor Motov 500548fcda Remove taskManager.registerChildTask
Instead of forcing each task to register all nodes where its children are running, this commit runs cancellation on all nodes. The task cancellation operation doesn't run too frequently, so this optimization doesn't seem to be worth additional complexity of the interface.
2017-01-17 18:07:31 -05:00
Ali Beyad e2977889b8 Allow comma delimited array settings to have a space after each entry (#22591)
Previously, certain settings that could take multiple comma delimited
values would pick up incorrect values for all entries but the first if
each comma separated value was followed by a whitespace character.  For
example, the multi-value "A,B,C" would be correctly parsed as
["A", "B", "C"] but the multi-value "A, B, C" would be incorrectly parsed
as ["A", " B", " C"].

This commit allows a comma separated list to have whitespace characters
after each entry.  The specific settings that were affected by this are:

  cluster.routing.allocation.awareness.attributes
  index.routing.allocation.require.*
  index.routing.allocation.include.*
  index.routing.allocation.exclude.*
  cluster.routing.allocation.require.*
  cluster.routing.allocation.include.*
  cluster.routing.allocation.exclude.*
  http.cors.allow-methods
  http.cors.allow-headers

For the allocation filtering related settings, this commit also provides
validation of each specified entry if the filtering is done by _ip,
_host_ip, or _publish_ip, to ensure that each entry is a valid IP
address.

Closes #22297
2017-01-17 08:51:04 -06:00
Tanguy Leroux f5542ed47f Simplify ElasticsearchException rendering as a XContent (#22611)
This commit tries to simplify the way ElasticsearchException are rendered to xcontent. It adds some documentation and renames and merges some methods. Current behavior is preserved, the goal is to be more readable and centralize everything in the ElasticsearchException class.
2017-01-17 15:44:49 +01:00
Tim Brooks 16a76d9bc0 Remove blocking TCP clients and servers (#22639)
This commit removes the option to use the blocking variants of the TCP
transport server, TCP transport client, or http server.
2017-01-16 18:38:51 -06:00
Simon Willnauer f30b1f82ee Remove HttpServer and HttpServerAdapter in favor of a simple dispatch method (#22636)
Today we have quite some abstractions that are essentially providing a simple
dispatch method to the plugins defining a `HttpServerTransport`. This commit
removes `HttpServer` and `HttpServerAdaptor` and introduces a simple `Dispatcher` functional
interface that delegate to `RestController` by default.

Relates to #18482
2017-01-16 21:06:08 +01:00
javanna a8a13bb46f replace custom functional interface with CheckedFunction in percolate module 2017-01-16 13:57:58 +01:00
Alexander Reelsen f6ee6e420b Indexing: Add shard id to indexing operation listener (#22606)
The IndexingOperationListener interface did not provide any
information about the shard id when a document was indexed.

This commit adds the shard id as the first parameter to all methods
in the IndexingOperationListener.
2017-01-16 09:08:16 +01:00
Tim Brooks f4270f9914 Wrap netty accept/connect ops with doPrivileged (#22572)
This is related to #22116. netty channels require socket `connect` and
`accept` privileges. Netty does not currently wrap these operations
with `doPrivileged` blocks. These changes extend the netty channels
and wrap calls to the relevant super methods in doPrivileged blocks.
2017-01-13 14:27:09 -06:00
Zachary Tong 18fdc39b8c Increase visibility of doExecute so it can be used directly (#22614) 2017-01-13 09:42:02 -05:00
Nik Everett baed02bbe2 Whitelist some ScriptDocValues in painless (#22600)
Without this whitelist painless can't use ip or binary doc values.

Closes #22584
2017-01-12 15:26:09 -05:00
Jason Tedor 126efea56c Upgrade to Netty 4.1.7
This commit upgrades the Netty dependency to version 4.1.7.Final,
picking up some important bug fixes.

Relates #22587
2017-01-12 10:58:21 -05:00
javanna 64c3212fdb Remove ParseFieldMatcher usages from IndexSettings 2017-01-12 14:43:35 +01:00
javanna 8072f168a3 Remove ParseFieldMatcher usages from QueryParseContext 2017-01-12 14:43:35 +01:00
Luca Cavanna 0f7d52df68 Remove some more ParseFieldMatcher usages (#22571) 2017-01-12 10:04:10 +01:00
Nik Everett 25a5f1869a Improve error message when reindex-from-remote gets bad json (#22536)
Adds a message about how the remote is unlikely to be Elasticsearch.
This isn't as good as including the whole message from the remote but
we can't do that because we are stream parsing it and we don't want
to mark the whole request.

Closes #22330
2017-01-11 12:55:23 -05:00
Jack Conradson 0c694b3d19 Update loop counter to be higher (1000000) instead of (10000). 2017-01-11 09:22:24 -08:00
Nik Everett abb7d7841f Remove SearchRequestParsers (#22538)
It is empty now that we've moved all the parsing into `namedObject`.
2017-01-11 10:28:14 -05:00
Luca Cavanna 0f391336f5 Clean up SearchShardTarget (#22468)
* unify shard target setter

* Remove indexText member from SearchShardTarget

* Remove duplicated indexName getter from SearchShardTarget

* Remove duplicated shardId getter from SearchShardTarget

* Remove duplicated nodeIde getter from SearchShardTarget

* Rename SearchShardTarget#nodeIdText getter to getNodeIdText

* Remove unused InternalSearchHit#internalSourceRef unused method

* Remove unused InternalSearchHit#internalHighlightFields unused method

* Make SearchShardTarget members final
2017-01-11 10:08:31 +01:00
Nik Everett b71b8acf59 Remove ClusterService from ctors in reindex (#22539)
Moves fetching the local node id into `NodeClient` which is a
fairly useful place to put it so you can generate task ids from
`NodeClient#executeLocally`.
2017-01-10 18:26:06 -05:00
Nik Everett d50f96e122 Remove InternalAggregation.Type (#22511)
It is no longer needed. It used to contain a lot of strings
used by serialization but those have since been removed. Now
it is just another thing to pass around that we don't really
need.
2017-01-10 11:57:19 -05:00
Nik Everett 78bb56671e Fix reindex from remote clearing scroll (#22525)
Reindex-from-remote had a race when it tried to clear the scroll. It
first starts the request to clear the scroll and then submits a task
to the generic threadpool to shutdown the client. These two things
race and, in my experience, closing the scroll generally loses. That
means that most of the time reindex-from-remote isn't clearing the
scrolls that it uses. This isn't the end of the world because we
flush old scroll contexts after a while but this isn't great.

Noticed while experimenting with #22514.
2017-01-10 10:30:23 -05:00
Nik Everett 5ef78fd015 Fix source filtering in reindex-from-remote (#22514)
Reindex-from-remote was accepting source filtering in the request
but ignoring it and setting `_source=true` on the search URI. This
fixes the filtering so it is piped through to the remote node and
adds tests for that.

Closes #22507
2017-01-10 09:00:12 -05:00
Martijn van Groningen cb2333dacd percolator: remove deprecated percolate and mpercolate apis 2017-01-10 11:18:27 +01:00