Commit Graph

93 Commits

Author SHA1 Message Date
Tim Brooks 21838d73b5
Extract message serialization from `TcpTransport` (#37034)
This commit introduces a NetworkMessage class. This class has two
subclasses - InboundMessage and OutboundMessage. These messages can
be serialized and deserialized independent of the transport. This allows
more granular testing. Additionally, the serialization mechanism is now
a simple Supplier. This builds the framework to eventually move the
serialization of transport messages to the network thread. This is the
one serialization component that is not currently performed on the
network thread (transport deserialization and http serialization and
deserialization are all on the network thread).
2019-01-21 14:14:18 -07:00
Tim Brooks f516d68fb2
Share `NioGroup` between http and transport impls (#37396)
Currently we create dedicated network threads for both the http and
transport implementations. Since these these threads should never
perform blocking operations, these threads could be shared. This commit
modifies the nio-transport to have 0 http workers be default. If the
default configs are used, this will cause the http transport to be run
on the transport worker threads. The http worker setting will still exist
in case the user would like to configure dedicated workers. Additionally,
this commmit deletes dedicated acceptor threads. We have never had these
for the netty transport and they can be added back if a need is
determined in the future.
2019-01-21 13:50:56 -07:00
Tim Vernum 6d99e790b3
Add SSL Configuration Library (#37287)
This introduces a new ssl-config library that can parse
and validate SSL/TLS settings and files.

It supports the standard configuration settings as used in the
Elastic Stack such as "ssl.verification_mode" and
"ssl.certificate_authorities" as well as all file formats used
in other parts of Elasticsearch security (such as PEM, JKS,
PKCS#12, PKCS#8, et al).
2019-01-16 21:52:17 +11:00
Igor Motov 6f91f06d86
Geo: Adds a set of no dependency geo classes for JDBC driver (#36477)
Adds a set of geo classes to represent geo data in the JDBC driver and 
to be used as an intermediate format to pass geo shapes for indexing 
and query generation in #35320.

Relates to #35767 and #35320
2019-01-15 10:52:46 -05:00
Tim Brooks 9de62f1262
Increase IO direct byte buffers to 256KB (#37283)
Currently we read and write 64KB at a time in the nio libraries. As a
single byte buffer per event loop thread does not consume much memory,
there is little reason to not increase it further. This commit increases
the buffer to 256KB but still limits a single write to 64KB. The write
limit could be increased, but too high of a write limit will lead to
copying more data (if all the data is not flushed and needs to be copied
on the next call). This is something to explore in the future.
2019-01-10 09:17:20 -07:00
Tim Brooks cfa58a51af
Add TLS/SSL channel close timeouts (#37246)
Closing a channel using TLS/SSL requires reading and writing a
CLOSE_NOTIFY message (for pre-1.3 TLS versions). Many implementations do
not actually send the CLOSE_NOTIFY message, which means we are depending
on the TCP close from the other side to ensure channels are closed. In
case there is an issue with this, we need a timeout. This commit adds a
timeout to the channel close process for TLS secured channels.

As part of this change, we need a timer service. We could use the
generic Elasticsearch timeout threadpool. However, it would be nice to
have a local to the nio event loop timer service dedicated to network needs. In
the future this service could support read timeouts, connect timeouts,
request timeouts, etc. This commit adds a basic priority queue backed
service. Since our timeout volume (channel closes) is very low, this
should be fine. However, this can be updated to something more efficient
in the future if needed (timer wheel). Everything being local to the event loop
thread makes the logic simple as no locking or synchronization is necessary.
2019-01-09 11:46:24 -07:00
Alpar Torok 6344e9a3ce
Testing conventions: add support for checking base classes (#36650) 2019-01-08 13:39:03 +02:00
Alpar Torok a7c3d5842a
Split third party audit exclusions by type (#36763) 2019-01-07 17:24:19 +02:00
Alpar Torok e9ef5bdce8
Converting randomized testing to create a separate unitTest task instead of replacing the builtin test task (#36311)
- Create a separate unitTest task instead of Gradle's built in 
- convert all configuration to use the new task 
- the  built in task is now disabled
2018-12-19 08:25:20 +02:00
Tim Brooks e63d52af63
Move page size constants to PageCacheRecycler (#36524)
`PageCacheRecycler` is the class that creates and holds pages of arrays
for various uses. `BigArrays` is just one user of these pages. This
commit moves the constants that define the page sizes for the recycler
to be on the recycler class.
2018-12-12 07:00:50 -07:00
Tim Brooks 373c67dd7a
Add DirectByteBuffer strategy for transport-nio (#36289)
This is related to #27260. In Elasticsearch all of the messages that we
serialize to write to the network are composed of heap bytes. When you
read or write to a nio socket in java, the heap memory you passed down
must be copied to/from direct memory. The JVM internally does some
buffering of the direct memory, however it is essentially unbounded.

This commit introduces a simple mechanism of buffering and copying the
memory in transport-nio. Each network event loop is given a 64kb
DirectByteBuffer. When we go to read we use this buffer and copy the
data after the read. Additionally, when we go to write, we copy the data
to the direct memory before calling write. 64KB is chosen as this is the
default receive buffer size we use for transport-netty4
(NETTY_RECEIVE_PREDICTOR_SIZE).

Since we only have one buffer per thread, we could afford larger.
However, if we the buffer is large and not all of the data is flushed in
a write call, we will do excess copies. This is something we can
explore in the future.
2018-12-06 18:09:07 -07:00
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00
Tim Brooks b6ed6ef189
Add sni name to SSLEngine in nio transport (#35920)
This commit is related to #32517. It allows an "sni_server_name"
attribute on a DiscoveryNode to be propagated to the server using
the TLS SNI extentsion. Prior to this commit, this functionality
was only support for the netty transport. This commit adds this
functionality to the security nio transport.
2018-11-27 09:06:52 -07:00
John 0baffda390 ingest: grok remove duplicated patterns (#35886)
This commit removes the redundant (and incorrect) JAVACLASS
and JAVAFILE grok patterns. This helps to keep parity with 
Logstash's patterns. 

See also: https://github.com/logstash-plugins/logstash-patterns-core/pull/237
 
closes #35699
2018-11-26 11:13:46 -06:00
Igor Motov 39789d0a10
GEO: More robust handling of ignore_malformed in geoshape parsing (#35603)
Adds an XContent sub parser class that can to wrap another
XContent parser at the beginning of an object and allow skiping
all children in case of the parsing failure. It also uses this
subparser to ignore the rest of the GeoJson shape if the 
parsing fails and we need to ignore the geoshape due to the 
ignore_malformed flag.

Supersedes #34498

Closes #34047
2018-11-21 11:04:01 -10:00
Simon Willnauer 0cc0fd2d15
Add a frozen engine implementation (#34357)
This change adds a `frozen` engine that allows lazily open a directory reader
on a read-only shard. The engine wraps general purpose searchers in a LazyDirectoryReader
that also allows to release and reset the underlying index readers after any and before
secondary search phases.

Relates to #34352
2018-11-07 20:23:35 +01:00
Alan Woodward e2af849f70
Move ObjectPath and XContentUtils to libs/x-content (#34803)
These are generally useful utility classes that do not need to live in the Watcher code
2018-11-02 15:12:09 +00:00
Nik Everett 3cde1356c1
XContent: Check for bad parsers (#34561)
Adds checks for misbehaving parsers. The checks aren't perfect at all but
they are simple and fast enough that we can do them all the time so
they'll catch most badly behaving parsers.

Closes #34351
2018-10-25 17:03:42 -04:00
Jay Modi d824cbe992
Test: ensure char[] doesn't being with prefix (#34816)
The testCharsBeginsWith test has a check that a random prefix of length
2 is not the prefix of a char[]. However, there is no check that the
char[] is not randomly generated with the same two characters as the
prefix. This change ensures that the char[] does not begin with the
prefix.

Closes #34765
2018-10-25 08:58:21 -06:00
Julie Tibshirani 5a4866f67d Mute CharArraysTests#testCharsBeginsWith while we await a fix. 2018-10-23 11:37:54 -07:00
Alpar Torok 0536635c44
Upgrade forbiddenapis to 2.6 (#33809)
* Upgrade forbiddenapis to 2.6

Closes #33759

* Switch forbiddenApis back to official plugin

* Remove CLI based task

* Fix forbiddenApisJava9
2018-10-23 12:06:46 +03:00
Daniel Mitterdorfer dbb6fe58fa
Remove hand-coded XContent duplicate checks
With this commit we cleanup hand-coded duplicate checks in XContent
parsing. They were necessary previously but since we reconfigured the
underlying parser in #22073 and #22225, these checks are obsolete and
were also ineffective unless an undocumented system property has been
set. As we also remove this escape hatch, we can remove the additional
checks as well.

Closes #22253
Relates #34588
2018-10-19 10:13:13 +02:00
Daniel Mitterdorfer 92b2e1a209
Remove lenient boolean handling
With this commit we remove some leftovers from #26389 which cleaned up
lenient boolean handling.

Relates #26389
Relates #22298
Relates #34467
2018-10-16 06:30:00 +02:00
Mayya Sharipova 80c5d30f30
XContentBuilder to handle BigInteger and BigDecimal (#32888)
Although we allow to index BigInteger and BigDecimal into a keyword
field, source filtering on these fields would fail
as XContentBuilder was not able to deserialize BigInteger and BigDecimal
to json.

This modifies XContentBuilder to allow to handle BigInteger and
BigDecimal.

Closes #32395
2018-09-26 14:24:31 -04:00
Christoph Büscher ba3ceeaccf
Clean up "unused variable" warnings (#31876)
This change cleans up "unused variable" warnings. There are several cases were we 
most likely want to suppress the warnings (especially in the client documentation test
where the snippets contain many unused variables). In a lot of cases the unused
variables can just be deleted though.
2018-09-26 14:09:32 +02:00
Vladimir Dolzhenko a3e8b831ee
add elasticsearch-shard tool (#32281)
Relates #31389
2018-09-19 10:28:22 +02:00
Simon Willnauer c783488e97
Add `_source`-only snapshot repository (#32844)
This change adds a `_source` only snapshot repository that allows to wrap
any existing repository as a _backend_ to snapshot only the `_source` part
including live docs markers. Snapshots taken with the `source` repository
won't include any indices,  doc-values or points. The snapshot will be reduced in size and
functionality such that it requires full re-indexing after it's successfully restored.

The restore process will copy the `_source` data locally starts a special shard and engine
to allow `match_all` scrolls and searches. Any other query, or get call will fail with and unsupported operation exception.  The restored index is also marked as read-only.

This feature aims mainly for disaster recovery use-cases where snapshot size is
a concern or where time to restore is less of an issue.

**NOTE**: The snapshot produced by this repository is still a valid lucene index. This change doesn't allow for any longer retention policies which is out of scope for this change.
2018-09-12 17:47:10 +02:00
Alpar Torok 44ed5f6306
Enable forbiddenapis server java9 (#33245) 2018-08-31 09:31:55 +03:00
Alpar Torok 5cf6e0d4bc
Ignore module-info in jar hell checks (#33011)
* Ignore module-info in JarHell checks
* Add unit test
* integration test to test that jarhell is ran with precommit
2018-08-30 11:41:39 +03:00
Alpar Torok 82d10b484a
Run forbidden api checks with runtimeJavaVersion (#32947)
Run forbidden APIs checks with runtime hava version
2018-08-22 09:05:22 +03:00
Adrien Grand 039babddf5 CharArraysTests: Fix test bug. 2018-08-16 11:54:39 +02:00
Jay Modi 1a45b27d8b
Move CharArrays to core lib (#32851)
This change cleans up some methods in the CharArrays class from x-pack, which
includes the unification of char[] to utf8 and utf8 to char[] conversions that
intentionally do not use strings. There was previously an implementation in
x-pack and in the reloading of secure settings. The method from the reloading
of secure settings was adopted as it handled more scenarios related to the
backing byte and char buffers that were used to perform the conversions. The
cleaned up class is moved into libs/core to allow it to be used by requests
that will be migrated to the high level rest client.

Relates #32332
2018-08-15 15:26:00 -06:00
Jake Landis be62092060
Introduce the dissect library (#32297)
The dissect library will be used for the ingest node as an alternative
to Grok to split a string based on a pattern. Dissect differs from
Grok such that regular expressions are not used to split the string.
Note - Regular expressions are used during construction of the
objects, but not in the hot path.

A dissect pattern takes the form of: '%{a} %{b},%{c}' which is
composed of 3 keys (a,b,c) and two delimiters (space and comma).
This dissect pattern will match a string of the form: 'foo bar,baz'
and will result a key/value pairing of 'a=foo, b=bar, and c=baz'.
See the comments in DissectParser for a full explanation.

This commit does not include the ingest node processor that will consume
it. However, the consumption should be a trivial mapping between the
key/value pairing returned by the parser and the key/value pairing
needed for the IngestDocument.
2018-08-14 17:08:55 -07:00
Armin Braun 580d59e2d7
CORE: Upgrade to Jackson 2.8.11 (#32670)
* closes #30352
2018-08-08 12:04:25 +02:00
Jason Tedor 3fb0923182
Fix content type detection with leading whitespace (#32632)
Today content type detection on an input stream works by peeking up to
twenty bytes into the stream. If the stream is headed by more whitespace
than twenty bytes, we might fail to detect the content type. We should
be ignoring this whitespace before attempting to detect the content
type. This commit does that by ignoring all leading whitespace in an
input stream before attempting to guess the content type.
2018-08-06 18:07:46 -04:00
Armin Braun 4dda5a990b
INGEST: Fix ThreadWatchDog Throwing on Shutdown (#32578)
* INGEST: Fix ThreadWatchDog Throwing on Shutdown

* #32539 is caused by the fact that ThreadWatchDog.Default could throw on shutdown if the ThreadPool is interrupted while `interruptLongRunningExecutions` is in progress. This is a result of the watchdog not having a lifecycle of its own (normally it terminates when the threadpool terminates).
  * We can't easily use `org.elasticsearch.common.util.concurrent.EsRejectedExecutionException#isExecutorShutdown` to catch this state the same way other components do since thatwould require adding the core lib to Grok as a dependency
  * Since we have no knowledge of the lifecycle in this compontent since we're only passed the scheduler `BiFunction` I fixed this by only scheduling the watchdog when there's actually registered threads in it.
    * I think using the patter of locking via two `Atomic*` values should not be much of a performance concern here under load since either the integer will likely be > 0 in this case (because we have multiple Grok in parallel) or the running state will be true because there likely was at least one thread registered when the watchdog ran and so the enqueing of the watchdog task during `register` will happen very rarely here (in the worst case scenario of only a single Grok thread it will happen less frequently than once every `ingest.grok.watchdog.interval`). The atomic update on the count should not be relevant relative to the cost of adding a new node to the CHM either.
* Fixes #32539
  * Also fixes the watchdog to run if it doens't have to in general.
2018-08-06 22:46:26 +02:00
Christoph Büscher ff87b7aba4
Remove unnecessary warning supressions (#32250) 2018-07-23 11:31:04 +02:00
Alpar Torok 38e2e1d553
Detect and prevent configuration that triggers a Gradle bug (#31912)
* Detect and prevent configuration that triggers a Gradle bug

As we found in #31862, this can lead to a lot of wasted time as it's not
immediatly obvius what's going on.
Givent how many projects we have it's getting increasingly easier to run
into gradle/gradle#847.
2018-07-19 06:46:58 +00:00
Tim Brooks c375d5ab23
Add nio transport to security plugin (#31942)
This is related to #27260. It adds the SecurityNioTransport to the
security plugin. Additionally, it adds support for ip filtering. And it
randomly uses the nio transport in security integration tests.
2018-07-12 11:55:38 -06:00
Christoph Büscher 4ae4ac08d5
Add Expected Reciprocal Rank metric (#31891)
This change adds Expected Reciprocal Rank (ERR) as a ranking evaluation metric
as descriped in:

Chapelle, O., Metlzer, D., Zhang, Y., & Grinspan, P. (2009).
Expected reciprocal rank for graded relevance.
Proceeding of the 18th ACM Conference on Information and Knowledge Management.
https://doi.org/10.1145/1645953.1646033

ERR is an extension of the classical reciprocal rank to the graded relevance
case and assumes a cascade browsing model. It quantifies the usefulness of a
document at rank `i` conditioned on the degree of relevance of the items at ranks
less than `i`. ERR seems to be gain traction as an alternative to (n)DCG, so it
seems like a good metric to support. Also ERR seems to be the default optimization
metric used for training in RankLib, a widely used learning to rank library.

Relates to #29653
2018-07-12 15:50:58 +02:00
Nik Everett fb27f3e7f0
HLREST: Add x-pack-info API (#31870)
This is the first x-pack API we're adding to the high level REST client
so there is a lot to talk about here!

= Open source

The *client* for these APIs is open source. We're taking the previously
Elastic licensed files used for the `Request` and `Response` objects and
relicensing them under the Apache 2 license.

The implementation of these features is staying under the Elastic
license. This lines up with how the rest of the Elasticsearch language
clients work.

= Location of the new files

We're moving all of the `Request` and `Response` objects that we're
relicensing to the `x-pack/protocol` directory. We're adding a copy of
the Apache 2 license to the root fo the `x-pack/protocol` directory to
line up with the language in the root `LICENSE.txt` file. All files in
this directory will have the Apache 2 license header as well. We don't
want there to be any confusion. Even though the files are under the
`x-pack` directory, they are Apache 2 licensed.

We chose this particular directory layout because it keeps the X-Pack
stuff together and easier to think about.

= Location of the API in the REST client

We've been following the layout of the rest-api-spec files for other
APIs and we plan to do this for the X-Pack APIs with one exception:
we're dropping the `xpack` from the name of most of the APIs. So
`xpack.graph.explore` will become `graph().explore()` and
`xpack.license.get` will become `license().get()`.

`xpack.info` and `xpack.usage` are special here though because they
don't belong to any proper category. For now I'm just calling
`xpack.info` `xPackInfo()` and intend to call usage `xPackUsage` though
I'm not convinced that this is the final name for them. But it does get
us started.

= Jars, jars everywhere!

This change makes the `xpack:protocol` project a `compile` scoped
dependency of the `x-pack:plugin:core` and `client:rest-high-level`
projects. I intend to keep it a compile scoped dependency of
`x-pack:plugin:core` but I intend to bundle the contents of the protocol
jar into the `client:rest-high-level` jar in a follow up. This change
has grown large enough at this point.

In that followup I'll address javadoc issues as well.

= Breaking-Java

This breaks that transport client by a few classes around. We've
traditionally been ok with doing this to the transport client.
2018-07-08 11:03:56 -04:00
Armin Braun b7b413e55e
Extend allowed characters for grok field names (#21745) (#31653) 2018-06-29 09:12:47 +02:00
Tim Brooks 86423f9563
Ensure local addresses aren't null (#31440)
Currently we set local addresses on the creation time of a NioChannel.
However, this may return null as the local address may not have been
set yet. An example is the local address has not been set on a client
channel as the connection process is not yet complete.

This PR modifies the getter to set the local field if it is currently null.
2018-06-20 19:50:14 -06:00
Tim Brooks ffba20b748
Do not preallocate bytes for channel buffer (#31400)
Currently, when we open a new channel, we pass it an
InboundChannelBuffer. The channel buffer is preallocated a single 16kb
page. However, there is no guarantee that this channel will be read from
anytime soon. Instead, this commit does not preallocate that page. That
page will be allocated when we receive a read event.
2018-06-19 09:36:12 -06:00
Tim Brooks a705e1a9e3
Add byte array pooling to nio http transport (#31349)
This is related to #28898. This PR implements pooling of bytes arrays
when reading from the wire in the http server transport. In order to do
this, we must integrate with netty reference counting. That manner in
which this PR implements this is making Pages in InboundChannelBuffer
reference counted. When we accessing the underlying page to pass to
netty, we retain the page. When netty releases its bytebuf, it releases
the underlying pages we have passed to it.
2018-06-15 14:01:03 -06:00
Tim Brooks 700357d04e
Immediately flush channel after writing to buffer (#31301)
This is related to #27260. Currently when we queue a write with a
channel we set OP_WRITE and wait until the next selection loop to flush
the write. However, if the channel does not have a pending write, it
is probably ready to flush. This PR implements an optimistic flush logic
that will attempt this flush.
2018-06-13 15:32:13 -06:00
Martijn van Groningen 6030d4be1e
[INGEST] Interrupt the current thread if evaluation grok expressions take too long (#31024)
This adds a thread interrupter that allows us to encapsulate calls to org.joni.Matcher#search()
This method can hang forever if the regex expression is too complex.

The thread interrupter in the background checks every 3 seconds whether there are threads
execution the org.joni.Matcher#search() method for longer than 5 seconds and
if so interrupts these threads.

Joni has checks that that for every 30k iterations it checks if the current thread is interrupted and
if so returns org.joni.Matcher#INTERRUPTED

Closes #28731
2018-06-12 07:49:03 +02:00
Tanguy Leroux bf58660482
Remove all unused imports and fix CRLF (#31207)
The X-Pack opening and the recent other refactorings left a lot of 
unused imports in the codebase. This commit removes them all.
2018-06-11 15:12:12 +02:00
Jason Tedor 5296c11e4f
Rename elasticsearch-nio to nio (#31186)
This commit renames :libs:elasticsearch-nio to :libs:nio.
2018-06-07 17:00:00 -04:00
Jason Tedor 94be9b471f
Rename elasticsearch-core to core (#31185)
This commit renames :libs:elasticsearch-core to :libs:core.
2018-06-07 16:50:21 -04:00