Commit Graph

4279 Commits

Author SHA1 Message Date
Christoph Büscher b83e14858a Correcting some minor typos in comments 2017-12-07 16:39:23 +01:00
Robin Neatherway 057efea893 Correct two equality checks on incomparable types (#27688) 2017-12-07 14:18:11 +01:00
Martijn van Groningen 4d78e1a9ad
Added msearch api to high level client 2017-12-05 10:17:47 +01:00
Christoph Büscher c4fe7d3f72 [Docs] add deprecation warning for `delimited_payload_filter` renaming 2017-12-04 10:22:05 +01:00
Adrien Grand 6323bb0d97
Upgrade to lucene-7.2.0-snapshot-8c94404. (#27619)
This new snapshot mostly brings a change to TopFieldCollector which can now
early terminate collection when trackTotalHits is `false`.

As a follow-up, we should replace our usage of
`EarlyTerminatingSortingCollector` with this new option.
2017-12-04 09:40:08 +01:00
Jack Conradson 2d927fabab
Painless: Fix errors allowing void to be assigned to def. (#27460) 2017-11-28 13:44:52 -08:00
Jack Conradson 9e42b77f7e
Painless: Fix variable scoping issue in lambdas not including captured variables. (#27571) 2017-11-28 13:30:13 -08:00
Adrien Grand 996990ad1f
Upgrade to lucene-7.2.0-snapshot-8c94404. (#27496)
The main highlight of this new snapshot is that it introduces the opportunity
for queries to opt out of caching. If a query opts out of caching, not only
will it never be cached, but no compound query that wraps it will be cached
either.
2017-11-28 14:52:42 +01:00
Martijn van Groningen cb1204774b
Include the _index, _type and _id with nested search hits in the top_hits and inner_hits responses.
Also include _type and _id for parent/child hits inside inner hits.

In the case of the top_hits aggregation the nested search hits are
returned directly and are not grouped by a root or parent document, so
it is important to include the _id and _index attributes in order to know
which documents these nested search hits belong to.

Closes #27053
2017-11-28 14:05:29 +01:00
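For context, a minimal request that produces such nested search hits (the index, nested path, and query values here are hypothetical) uses `inner_hits` on a `nested` query; with the change above each returned inner hit also carries its `_index`, `_type` and `_id`:

````
GET blog/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": { "match": { "comments.text": "elasticsearch" } },
      "inner_hits": {}
    }
  }
}
````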
Martijn van Groningen 4ab638b71d
percolator: Avoid TooManyClauses exception if number of terms / ranges is exactly equal to 1024
The logic that decides whether to use CoveringQuery lived in two places, which is how this bug snuck in.
2017-11-27 08:55:11 +01:00
kel 4885acb048 Replace `delimited_payload_filter` by `delimited_payload` (#26625)
The `delimited_payload_filter` is renamed to `delimited_payload`; the old name is
deprecated and should be replaced by `delimited_payload`.

Closes #21978
2017-11-24 13:03:19 +01:00
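As an illustration (the index and analyzer names are hypothetical), a custom analyzer should now reference the new filter name:

````
PUT payloads
{
  "settings": {
    "analysis": {
      "analyzer": {
        "payload_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["delimited_payload"]
        }
      }
    }
  }
}
````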
Simon Willnauer fadbe0de08
Automatically prepare indices for splitting (#27451)
Today we require users to prepare their indices for split operations.
Yet, we can do this automatically when an index is created which would
make the split feature a much more appealing option since it doesn't have
any 3rd party prerequisites anymore.

This change automatically sets the number of routing shards such that
an index is guaranteed to be able to split once into twice as many shards.
The number of routing shards is scaled towards the default shard limit per index
such that indices with a smaller amount of shards can be split more often than
larger ones. For instance an index with 1 or 2 shards can be split 10x
(until it approaches 1024 shards) while an index created with 128 shards can only
be split 3x by a factor of 2. Please note this is just a default value and users
can still prepare their indices with `index.number_of_routing_shards` for custom
splitting.

NOTE: this change has an impact on the document distribution since we are changing
the hash space. Documents are still uniformly distributed across all shards, but since
we are artificially changing the number of buckets in the consistent hashing space
documents might be hashed into different shards compared to previous versions.

This is a 7.0 only change.
2017-11-23 09:48:54 +01:00
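For reference, a rough sketch of the workflow this change simplifies (index names, shard counts, and routing-shard values are hypothetical); explicitly setting `index.number_of_routing_shards` at creation time remains possible for custom splitting:

````
PUT logs
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_routing_shards": 16
  }
}

PUT logs/_settings
{
  "index.blocks.write": true
}

POST logs/_split/logs_split
{
  "settings": {
    "index.number_of_shards": 4
  }
}
````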
Simon Willnauer 5a0b6d1977
Use the primary_term field to identify parent documents (#27469)
This change stops indexing the `_primary_term` field for nested documents
to allow fast retrieval of parent documents. Today we create a doc values
field for children to ensure we have a dense data structure on disk. Yet,
since we only use the primary term to tie-break when we see the same
seqID while indexing, having a dense data structure is less important. We can
use this now to improve the performance and memory footprint of nested docs.

Relates to #24362
2017-11-21 15:14:03 +01:00
Jim Ferenczi 6319424e4a
Move composite aggregation to core (#27474)
This change removes the module named aggs-composite and adds the `composite` aggs
as a core aggregation. This allows other plugins to use this new aggregation
and simplifies the integration in the HL rest client.
2017-11-21 13:31:01 +01:00
Luca Cavanna 29450de7b5
Cross Cluster Search: make remote clusters optional (#27182)
Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails even though some of the data (local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, for example if remote shards are missing, which is not going to fail the whole request but rather yield partial results, with the _shards section in the response indicating that.

This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows certain clusters to be skipped if they are down when trying to reach them through a cross cluster search request. By default all clusters are mandatory.

Scroll requests support this setting too when they are first initiated (the first search request with the scroll parameter), but subsequent scroll rounds (the _search/scroll endpoint) will fail if some of the remote clusters went down in the meantime.

The search API response now contains a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped:

"_clusters" : {
    "total" : 3,
    "successful" : 2,
    "skipped" : 1
}
This section won't be part of the response if no clusters have been skipped.

The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 11:41:47 +01:00
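A sketch of how the per-cluster setting mentioned above might be applied (the cluster alias `cluster_one` is hypothetical, and the setting is shown under the `skip_unavailable` name referenced in the last paragraph of the commit):

````
PUT _cluster/settings
{
  "persistent": {
    "search.remote.cluster_one.skip_unavailable": true
  }
}
````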
Tim Brooks 4e04f95ab4
Fix issue where pages aren't released (#27459)
This is related to #27422. Right now when we send a write to the netty
transport, we attach a listener to the future. When you submit a write
on the netty event loop and the event loop is shut down, the onFailure
method is called. Unfortunately, netty then tries to notify the listener,
which cannot be done without dispatching to the event loop. In this
case, the dispatch fails and netty logs an error and does not tell us.

This commit checks that netty has not been shut down after sending a
message. If netty is shut down, we complete the listener.
2017-11-20 14:53:08 -07:00
Tim Brooks 0a8f48d592
Transition transport apis to use void listeners (#27440)
Currently we use ActionListener<TcpChannel> for connect, close, and send
message listeners in TcpTransport. However, all of the listeners have to
capture a reference to a channel in case the exception API is called.
This commit changes these listeners to be of type <Void> as passing
the channel to onResponse is not necessary. Additionally, this change
makes it easier to integrate with low level transports (which use
different implementations of TcpChannel).
2017-11-20 10:47:47 -07:00
Tim Brooks 80ef9bbdb1
Remove parameterization from TcpTransport (#27407)
This commit is a follow up to the work completed in #27132. Essentially
it transitions two more methods (sendMessage and getLocalAddress) from
Transport to TcpChannel. With this change, there is no longer a need for
TcpTransport to be aware of the specific type of channel a transport
returns. So that class is no longer parameterized by channel type.
2017-11-16 11:19:36 -07:00
Jim Ferenczi 623367d793
Add composite aggregator (#26800)
* This change adds a module called `aggs-composite` that defines a new aggregation named `composite`.
The `composite` aggregation is a multi-bucket aggregation that creates composite buckets made of multiple sources.
The sources for each bucket can be defined as:
  * A `terms` source, values are extracted from a field or a script.
  * A `date_histogram` source, values are extracted from a date field and rounded to the provided interval.
This aggregation can be used to retrieve all buckets of a deeply nested aggregation by flattening the nested aggregation into composite buckets.
A composite bucket is composed of one value per source and is built for each document from the combination of values in the provided sources.
For instance the following aggregation:

````
"test_agg": {
  "terms": {
    "field": "field1"
  },
  "aggs": {
    "nested_test_agg":
      "terms": {
        "field": "field2"
      }
  }
}
````
... which retrieves the top N terms for `field1` and for each top term in `field1` the top N terms for `field2`, can be replaced by a `composite` aggregation in order to retrieve **all** the combinations of `field1`, `field2` in the matching documents:

````
"composite_agg": {
  "composite": {
    "sources": [
      {
	"field1": {
          "terms": {
              "field": "field1"
            }
        }
      },
      {
	"field2": {
          "terms": {
            "field": "field2"
          }
        }
      },
    }
  }
````

The response of the aggregation looks like this:

````
"aggregations": {
  "composite_agg": {
    "buckets": [
      {
        "key": {
          "field1": "alabama",
          "field2": "almanach"
        },
        "doc_count": 100
      },
      {
        "key": {
          "field1": "alabama",
          "field2": "calendar"
        },
        "doc_count": 1
      },
      {
        "key": {
          "field1": "arizona",
          "field2": "calendar"
        },
        "doc_count": 1
      }
    ]
  }
}
````

By default this aggregation returns 10 buckets sorted in ascending order of the composite key.
Pagination can be achieved by providing `after` values, the values of the composite key to aggregate after.
For instance the following aggregation will aggregate all composite keys that sort after `arizona, calendar`:

````
"composite_agg": {
  "composite": {
    "after": {"field1": "alabama", "field2": "calendar"},
    "size": 100,
    "sources": [
      {
	"field1": {
          "terms": {
            "field": "field1"
          }
        }
      },
      {
	"field2": {
          "terms": {
            "field": "field2"
          }
	}
      }
    }
  }
````

This aggregation is optimized for indices that define an index sort that matches the composite source definition.
For instance the aggregation above could run faster on indices that define an index sort like this:

````
"settings": {
  "index.sort.field": ["field1", "field2"]
}
````

In this case the `composite` aggregation can early terminate on each segment.
This aggregation also accepts multi-valued fields but disables early termination for these fields even if index sorting matches the sources definition.
This is mandatory because index sorting picks only one value per document to perform the sort.
2017-11-16 15:13:36 +01:00
Tim Brooks ca11085bb6
Add TcpChannel to unify Transport implementations (#27132)
Right now our different transport implementations must duplicate
functionality in order to stay compliant with the requirements of
TcpTransport. They must all implement common logic to open channels,
close channels, keep track of channels for eventual shutdown, etc.

Additionally, there is a weird and complicated relationship between
Transport and TransportService. We eventually want to start merging
some of the functionality between these classes.

This commit starts moving towards a world where TransportService retains
all the application logic and channel state. Transport implementations
in this world will only be tasked with returning a channel when one is
requested, calling transport service when a channel is accepted from
a server, and starting / stopping itself.

Specifically this commit changes how channels are opened and closed. All
Transport implementations now return a channel type that must comply with
the new TcpChannel interface. This interface has the methods necessary
for TcpTransport to completely manage the lifecycle of a channel. This
includes setting the channel up, waiting for connection, adding close
listeners, and eventually closing.
2017-11-15 12:38:39 -07:00
Clinton Gormley 1caa5c8e32 Rest test fixes (#27354)
* REST: Rename ingest.processor.grok to ingest.processor_grok
* REST: Rename remote.info to cluster.remote_info
* REST: Fixed bad YAML comments
* REST: Force dummy scripts to be strings, not numbers
* REST: Fix bad YAML in search/110_field_collapsing.yml
* REST: Adjust percentile tests to work with Perl number handling
2017-11-14 11:14:14 +01:00
Tal Levy 5c34533761
add json-processor support for non-map json types (#27335)
The Json Processor originally only supported parsing field values into Maps even
though the JSON spec specifies that strings, null values, numbers, booleans, and arrays
are also valid JSON types. This commit enables parsing these values as well.

In response to #25972.
2017-11-13 10:28:19 -08:00
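For example, a minimal pipeline (the pipeline and field names are hypothetical) that parses a string field which may now hold any JSON type, not just an object:

````
PUT _ingest/pipeline/parse_payload
{
  "description": "Parse the JSON string in `payload`, whatever JSON type it contains",
  "processors": [
    {
      "json": {
        "field": "payload",
        "target_field": "payload_parsed"
      }
    }
  ]
}
````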
Martijn van Groningen 7c056f4523
reword comment 2017-11-13 08:00:34 +01:00
Ryan Ernst 8b9e23de93
Plugins: Add versionless alias to all security policy codebase properties (#26756)
This is a followup to #26521. This commit expands the alias added for
the elasticsearch client codebase to all codebases. The original full
jar name property is left intact. This only adds an alias without the
version, which should help ease the pain in updating any versions (ES
itself or dependencies).
2017-11-10 11:00:09 -08:00
Martijn van Groningen 1bd31e9b53
percolator: fixed an issue where, in indices created before 6.1, if minimum should match has been specified on a disjunction,
the query would be marked as a verified candidate match. This is wrong as it can only be marked as a verified candidate match
on indices created on or after 6.1, due to the use of the CoveringQuery.
2017-11-10 12:02:33 +01:00
Martijn van Groningen b4048b4e7f
Use CoveringQuery to select percolate candidate matches and
extract all clauses from a conjunction query.

When clauses from a conjunction are extracted the number of clauses is
also stored in an internal doc values field (minimum_should_match field).
This field is used by the CoveringQuery and allows the percolator to
reduce the number of false positives when selecting candidate matches and
in certain cases be absolutely sure that a conjunction candidate match
will match and then skip MemoryIndex validation. This can greatly improve
performance.

Before this change only a single clause was extracted from a conjunction
query. The percolator tried to extract the clause that was rarest
(based on term length) so that fewer candidate queries would be selected
in the first place. However with this method there is still a very high
chance that candidate query matches are false positives.

This change also removes the influencing query extraction added via #26081
as this is no longer needed because now all conjunction clauses are extracted.

https://www.elastic.co/guide/en/elasticsearch/reference/6.x/percolator.html#_influencing_query_extraction

Closes #26307
2017-11-10 07:44:42 +01:00
Tal Levy d22fd4ea58
Introduce templating support to timezone/locale in DateProcessor (#27089)
Sometimes systems like Beats would want to extract the date's timezone and/or locale
from a value in a field of the document. This PR adds support for mustache templating
to extract these values.

Closes #24024.
2017-11-09 09:45:32 -08:00
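A minimal sketch of the templating described above (the pipeline, field, and template names are hypothetical): the timezone and locale are read from fields of the incoming document via mustache templates:

````
PUT _ingest/pipeline/parse_event_date
{
  "processors": [
    {
      "date": {
        "field": "event_date",
        "formats": ["yyyy-MM-dd HH:mm:ss"],
        "timezone": "{{event_timezone}}",
        "locale": "{{event_locale}}"
      }
    }
  ]
}
````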
Mayya Sharipova 148376c2c5
Add limits for ngram and shingle settings (#27211)
* Add limits for ngram and shingle settings (#27211)

Create index-level settings:
max_ngram_diff - maximum allowed difference between max_gram and min_gram in
NGramTokenFilter/NGramTokenizer. Default is 1.
max_shingle_diff - maximum allowed difference between max_shingle_size and
 min_shingle_size in ShingleTokenFilter.  Default is 3.

Throw an IllegalArgumentException when
trying to create an NGramTokenFilter, NGramTokenizer, or ShingleTokenFilter
where the difference between max_size and min_size exceeds the setting's value.

Closes #25887
2017-11-07 08:14:55 -05:00
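As an illustrative sketch (the index name and values are hypothetical), the new index-level settings can be raised at index creation time when larger gram or shingle differences are genuinely needed:

````
PUT my_index
{
  "settings": {
    "index.max_ngram_diff": 2,
    "index.max_shingle_diff": 4
  }
}
````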
David Roberts 749c3ec716
Remove the single argument Environment constructor (#27235)
Only tests should use the single argument Environment constructor.  To
enforce this the single arg Environment constructor has been replaced with
a test framework factory method.

Production code (beyond initial Bootstrap) should always use the same
Environment object that Node.getEnvironment() returns.  This Environment
is also available via dependency injection.
2017-11-04 13:25:09 +00:00
Armin Braun 3deba0ed1f #26260 Allow ip_range to accept CIDR notation (#27192)
*  #26260 Allow ip_range to accept CIDR notation

*  #26260 added non-byte-aligned CIDR test cases
2017-11-03 13:34:48 -06:00
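An illustrative sketch (the index, type, and field names are hypothetical) of an `ip_range` field accepting CIDR notation after this change:

````
PUT networks
{
  "mappings": {
    "doc": {
      "properties": {
        "allowed_ips": { "type": "ip_range" }
      }
    }
  }
}

PUT networks/doc/1
{
  "allowed_ips": "192.168.0.0/16"
}
````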
Armin Braun 8f0f024507 #27189 Fixed rounding of bounds in scaled float comparison (#27207)
*  #27189 Fixed rounding of bounds in scaled float comparison

*  #27189 more assertions from CR
2017-11-03 13:23:07 -06:00
Armin Braun f9e755f980 Fixed byte buffer leak in Netty4 request handler
If creating the REST request throws an exception (for example, because
of invalid headers), we leak the request due to failure to release the
buffer (which would otherwise happen after replying on the
channel). This commit addresses this leak by handling the failure case.

Relates #27222
2017-11-02 20:22:19 -04:00
Colin Goodheart-Smithe c1b8140c83
Upgrade to Lucene 7.1 (#27225) 2017-11-02 13:25:33 +00:00
Colin Goodheart-Smithe 99aca9cdfc
Enhances exists queries to reduce need for `_field_names` (#26930)
* Enhances exists queries to reduce need for `_field_names`

Before this change we wrote the names of all the fields in a document to a `_field_names` field and then implemented exists queries as a term query on this field. The problem with this approach is that it bloats the index and also affects indexing performance.

This change adds a new method `existsQuery()` to `MappedFieldType` which is implemented by each sub-class. For most field types if doc values are available a `DocValuesFieldExistsQuery` is used, falling back to using `_field_names` if doc values are disabled. Note that only fields where no doc values are available are written to `_field_names`.

Closes #26770

* Addresses review comments

* Addresses more review comments

* implements existsQuery explicitly on every mapper

* Reinstates ability to perform term query on `_field_names`

* Added bwc depending on index created version

* Review Comments

* Skips tests that are not supported in 6.1.0

These values will need to be changed after backporting this PR to 6.x
2017-11-01 10:46:59 +00:00
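For reference, the query shape this change optimizes (the index and field names are hypothetical); with doc values available, the exists query no longer needs `_field_names`:

````
GET my_index/_search
{
  "query": {
    "exists": { "field": "user" }
  }
}
````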
Jack Conradson abaede2373
Upgrade Painless from ANTLR 4.5.1-1 to ANTLR 4.5.3. (#27153) 2017-10-27 11:07:49 -07:00
Christoph Büscher b88dbe8f49 [Tests] Fix occasional test failure due to two random values being the same 2017-10-27 12:06:16 +02:00
Jack Conradson dda5d1af29 Allow for the Painless Definition to have multiple instances (#27096) 2017-10-26 08:33:55 -07:00
Jason Tedor 9aae2f593a Avoid stack overflow on search phases
When a search is executing locally over many shards, we can stack
overflow during query phase execution. This happens due to callbacks
that occur after a phase completes for a shard and we move to the same
phase on another shard. If all the shards for the query are local to the
local node then we will never go async and these callbacks will end up
as recursive calls. With sufficiently many shards, this will end up as a
stack overflow. This commit addresses this by truncating the stack by
forking to another thread on the executor for the phase.

Relates #27069
2017-10-25 22:05:46 -04:00
Ryan Ernst 2a8452b513 Reindex: Fix headers in reindex action (#26937)
The headers passed to reindex were skipped except for the last one. This
commit fixes the copying of the headers, as well as adds a base test
case for rest client builders to access the headers within the built
rest client.

relates #22976
2017-10-25 16:37:01 -07:00
Tim Brooks a7fa5d3335 Remove dangerous `ByteBufStreamInput` methods (#27076)
This commit removes the `ByteBufStreamInput` `readBytesReference` and
`readBytesRef` methods. These methods are zero-copy which means that
they retain a reference to the underlying netty buffer. The problem is
that our `TcpTransport` is not designed to handle zero-copy. The netty
implementation sets the read index past the current message once it has
been deserialized, handled, and most likely dispatched to another
thread. This means that netty is free to release this buffer. So it is
unsafe to retain a reference to it without calling `retain`. And we
cannot call `retain` because we are not currently designed to handle
reference counting past the transport level.

This should not currently impact us as we wrap the `ByteBufStreamInput`
in `NamedWriteableAwareStreamInput` in the `TcpTransport`. This stream
essentially delegates to the underlying stream. However, in the case of
`readBytesReference` and `readBytesRef` it leaves the implementations
to the standard `StreamInput` methods. These methods call the read byte
array method which delegates to `ByteBufStreamInput`. The read byte
array method on `ByteBufStreamInput` copies so it is safe. The only
impact of this commit should be removing methods that could be dangerous
if they were eventually called due to some refactoring.
2017-10-24 08:51:14 -06:00
Martijn van Groningen 93107f8466
removed unused import 2017-10-23 10:00:54 +02:00
Martijn van Groningen 141d1b62e9
ingest: date processor should not fail if timestamp is specified as json number
Closes #26967
2017-10-23 09:32:44 +02:00
Tanguy Leroux 463e7e6fa3 Revert "Upgrade to Jackson 2.9.2 (#27032)"
This reverts commit 0b9acc5ace.
2017-10-20 08:25:41 +02:00
Tanguy Leroux 0b9acc5ace Upgrade to Jackson 2.9.2 (#27032)
Upgrade to Jackson 2.9.2 and also use a boolean `closed` flag to
indicate that a FastStringReader instance is closed, so that length
is still correctly reported after the reader is closed.
2017-10-19 15:15:02 +02:00
Simon Willnauer 8dda827ff4 Don't refresh on `_flush` `_force_merge` and `_upgrade` (#27000)
Today all these API calls have a side effect of making documents visible
to search requests. While this is sometimes desired, it's an unnecessary side effect,
and now that we have an internal (engine-private) index reader (#26972) we artificially
add a refresh call for bwc. This change removes this side effect in 7.0.
2017-10-16 10:16:35 +02:00
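As a sketch (the index name is hypothetical): from 7.0, callers that relied on the implicit refresh need to issue one explicitly after such calls:

````
POST my_index/_flush

POST my_index/_refresh
````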
Tim Brooks 277637f42f Do not set SO_LINGER on server channels (#26997)
Right now we are attempting to set SO_LINGER to 0 on server channels
when we are stopping the tcp transport. This is not a supported socket
option and throws an exception. This also prevents the channels from
being closed.

This commit 1. doesn't set SO_LINGER for server channels, 2. checks
that it is a supported option in nio, and 3. changes the log message
to warn for server channel close exceptions.
2017-10-13 13:06:38 -06:00
Anton Pozhidaev cee9640c20 Update by Query is modified to accept short `script` parameter. (#26841)
Update by Query is modified to accept short `script` parameter.

Closes issue #24898
2017-10-11 21:57:46 +00:00
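A hedged sketch of the shorter form (the index, field, and script contents are hypothetical), passing the script source directly as a string instead of an object:

````
POST my_index/_update_by_query
{
  "query": { "term": { "user": "kimchy" } },
  "script": "ctx._source.counter = 0"
}
````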
kel 2e36f19051 Add support for parsing inline script (#23824) (#26846)
* Add support for parsing inline script (#23824)

* Fix test
2017-10-11 09:15:37 -07:00
Jason Tedor 4c06b8f1d2 Check for closed connection while opening
While opening a connection to a node, a channel can subsequently
close. If this happens, a future callback whose purpose is to close all
other channels and disconnect from the node will fire. However, this
future will not be ready to close all the channels because the
connection will not be exposed to the future callback yet. Since this
callback is run once, we will never try to disconnect from this node
again and we will be left with a closed channel. This commit adds a
check that all channels are open before exposing the channel and throws
a general connection exception. In this case, the usual connection retry
logic will take over.

Relates #26932
2017-10-10 13:34:51 -04:00
Nik Everett 4a06dd919a Painless: add tests for cached boxing (#24163)
We had a TODO about adding tests around cached boxing. In #24077
I tracked down the uncached boxing tests and saw the TODO. Cached
boxing testing is a fairly small extension to that work.
2017-10-10 10:34:03 -04:00