OpenSearch

Commit Graph

Author	SHA1	Message	Date
Simon Willnauer	d82a434d10	[STORE] Make a hybrid directory default using `mmapfs` and `niofs`. `mmapfs` is really good for random access but can have side effects if memory maps are large, depending on the operating system etc. A hybrid solution, where only selected files are actually memory mapped and the others are mostly consumed sequentially, brings the best of both worlds and minimizes the memory map impact. This commit mmaps only the `dvd` and `tim` files for fast random access on doc values and term dictionaries. Closes #6636	2014-07-10 00:01:43 +02:00
Clinton Gormley	d3f8c66e26	Updated cache.asciidoc The index level filter cache was removed a long time ago Closes #6455	2014-07-04 14:26:20 +02:00
Ian Babrou	698eb7de9b	Fixed JSON in fielddata docs	2014-07-01 12:53:10 +02:00
Adrien Grand	7a34702925	[DOCS] Clarify the trade-off of the `disk` doc values format.	2014-06-13 13:24:53 +02:00
Lee Hinman	3a3f81d59b	Enable DiskThresholdDecider by default, change default limits to 85/90% Fixes #6200 Fixes #6201	2014-06-12 16:35:29 +02:00
Clinton Gormley	c41e63c2f9	Docs: Updated index-modules/store and setup/configuration Explain how to set different index storage types, and added the vm settings required to stop mmapfs from running out of memory Closes #6327	2014-06-12 13:56:06 +02:00
Israel Tsadok	1a58016ea1	[DOCS] Add special attributes for indices allocation filtering	2014-06-05 10:38:07 +02:00
Simon Willnauer	9d5507047f	Update Documentation Feature Flags [1.2.0]	2014-05-22 15:06:42 +02:00
Simon Willnauer	85a0b76dbb	Upgrade to Lucene 4.8.1. This commit upgrades to the latest Lucene 4.8.1 release, including the following bug fixes: * An IndexThrottle now kicks in when merges start falling behind, limiting indexing threads to 1 until merges catch up. Closes #6066 * RateLimiter now kicks in at the configured rate, where previously the limiter was limiting at ~8MB/sec almost all the time. Closes #6018	2014-05-19 20:47:55 +02:00
mikemccand	00fcf4d560	#6081 : set IO throttling back to 20 MB/sec now that #6018 is fixed	2014-05-12 14:42:26 -04:00
mikemccand	b6ae7fbadb	#5882 : fix docs	2014-05-12 14:16:27 -04:00
mikemccand	254ebc2f88	#6120 Remove SerialMergeScheduler (master only) It's dangerous to expose SerialMergeScheduler as an option: since it only allows one merge at a time, it can easily cause merging to fall behind. Closes #6120	2014-05-12 14:06:20 -04:00
Ivan Brusic	bac0627c5e	Update fielddata.asciidoc Spelling correction	2014-05-08 10:59:24 +02:00
Ivan Brusic	59e0c34cdb	Update fielddata.asciidoc Fixed default value for circuit breaker	2014-05-08 10:58:10 +02:00
mikemccand	9daaae27b3	clarify that CMS defaults change is coming in 1.2	2014-05-07 13:49:54 -04:00
Adrien Grand	fc78dd2f13	[DOC] Fix default values for filter cache size and field data circuit breaker. Relates to #5990	2014-05-06 10:13:05 +02:00
mikemccand	07563379dc	fix docs for merging and throttling	2014-05-05 16:22:00 -04:00
Simon Willnauer	b4f0603169	Change default merge throttling to 50MB / sec. The current setting of 20MB/sec seems too conservative given the capabilities of modern hardware. Even on cloud infrastructure it seems too low. A 50MB/sec default should provide better out-of-the-box performance.	2014-04-22 21:08:40 +02:00
Simon Willnauer	1cf62e7782	Use unlimited flush_threshold_ops for translog. Currently we use 5k operations as a flush threshold. Indexing 5k documents per second is rather common, which would cause the index to be committed at the Lucene level each time the flush logic runs, which is every 5 seconds by default. We should rather use a size-based threshold, similar to the Lucene IndexWriter, that doesn't cause such aggressive commits, which can slow down indexing significantly, especially since they cause the underlying devices to fsync their data.	2014-04-22 16:37:07 +02:00
Christoph Frick	e3e631eca5	Update allocation.asciidoc	2014-04-17 14:42:58 +02:00
Kouhei Sutou	de59cde926	Remove garbage	2014-04-15 17:57:25 +02:00
Simon Willnauer	9898eed30c	[DOCS] Update merge docs to reflect the max_merge_at_once property	2014-04-15 16:42:23 +02:00
Simon Willnauer	320a206352	Switch back to ConcurrentMergeScheduler. Load tests showed that SerialMS has trouble keeping up with merges under high load. We should switch back to CMS until we have a better story for balancing merge threads / effort across shards on a single node. Closes #5817	2014-04-15 16:42:23 +02:00
Kevin Wang	ecab74fe6c	add lucene language model similarities (Dirichlet & JelinekMercer)	2014-04-07 10:48:03 +02:00
Martijn van Groningen	ade1d0ef57	Added global ordinals (unique incremental numbering for terms) to fielddata. Added a terms aggregation implementation that works on global ordinals, which is also the default. Closes #5672	2014-04-07 11:06:41 +07:00
Lee Hinman	211f740100	Add `getAsRatio` to Settings class, allow DiskThresholdDecider to take percentages Adds new RatioValue class that parses ratios between 0-100% expressed in either floating-point (0.13) or percentage (51.12%) notation. Closes #5690	2014-04-04 13:19:35 -06:00
Lee Hinman	c3089701f2	[DOCS] remove extraneous ` from cache page	2014-04-02 16:07:00 -06:00
Shay Banon	0ef3b03be1	Move to use serial merge scheduler by default. Today, we use ConcurrentMergeScheduler, and this can be painful since it is concurrent on a shard level, with a max of 3 threads doing concurrent merges. If there are several shards being indexed, then there will be a minor explosion of threads trying to do merges, all being throttled by our merge throttling. Moving to serial merge scheduler will still maintain concurrency of merges across shards, as we have the merge thread pool that schedules those merges. It will just be a serial one on a specific shard. Also, on serial merge scheduler, we now have a limit of how many merges it will do in one go, so it will let other shards get their fair chance of merging. We use the pending merges on IW to check if merges are needed or not for it. Note that if a merge is happening, it will not block due to a sync on the maybeMerge call at indexing (flush) time, since we wrap our merge scheduler with the EnabledMergeScheduler, where maybeMerge is not activated during indexing, only with explicit calls to IW#maybeMerge (see Merges). closes #5447	2014-03-18 13:17:00 +01:00
Konrad Feldmeier	d7b0d547d4	[DOCS] Multiple doc fixes Closes #5047	2014-03-07 14:24:58 +01:00
Oleg Anashkin	eb0e1aa38f	Fix typo in similarity docs DRF similarity -> DFR similarity	2014-02-13 07:45:30 -08:00
Clinton Gormley	93930d6dc7	Removed 0.90.* deprecation and addition notifications Closes #5052	2014-02-07 20:52:49 +01:00
Shay Banon	d36e345f1f	fix docs to reflect removal of byte buffer memory	2014-02-03 09:54:30 -05:00
Brusic	d9b71a8083	[DOCS] various docs fixes Removed unused misc.asciidoc file Added plugins directory to directory layout Fixed transport.tcp.connect_timeout value to match the code found in NetworkService.TcpSettings Clarified that phrase query does not preserve order of terms Clarified merge page Added instructions on how to build documentation to docs/README	2014-01-23 10:52:13 +01:00
Clinton Gormley	faddd66e87	[DOCS] Added breaking changes in 1.0	2014-01-15 17:50:24 +01:00
Lee Hinman	3062e59f51	[DOCS] Fix default setting in circuit breaker documentation	2014-01-15 07:05:05 -07:00
Clinton Gormley	f8a427e266	[DOCS] Moved fielddata circuit breaker higher up the page	2014-01-15 14:00:08 +01:00
Shay Banon	4aa5ef139e	randomize flush interval so multiple shards won't flush at the same time - also, allow updating the interval using update settings on an index	2014-01-07 19:58:28 +01:00
Simon Willnauer	fa16969360	Cleanup comments and class names s/ElasticSearch/Elasticsearch * Clean up s/ElasticSearch/Elasticsearch on docs/* * Clean up s/ElasticSearch/Elasticsearch on src/* bin/* & pom.xml * Clean up s/ElasticSearch/Elasticsearch on NOTICE.txt and README.textile Closes #4634	2014-01-07 11:21:51 +01:00
Lee Hinman	47607a69a1	Default the circuit breaker limit to 80% of the maximum JVM heap	2014-01-03 16:21:55 -07:00
Lee Hinman	a754224751	Add field data memory circuit breaker. This adds the field data circuit breaker, which is used to estimate the amount of memory required to load field data before loading it. It then raises a CircuitBreakingException if the limit is exceeded. It is configured with two parameters: `indices.fielddata.cache.breaker.limit` - the maximum number of bytes of field data to be loaded before circuit breaking. Defaults to `indices.fielddata.cache.size` if set, unbounded otherwise. `indices.fielddata.cache.breaker.overhead` - a constant that all field data estimations are multiplied by before aggregation. Defaults to 1.03. Both settings can be configured dynamically using the cluster update settings API.	2014-01-02 15:04:47 -07:00
Adrien Grand	05448b6276	Doc values for geo points. This commit adds doc values support to geo points using the exact same approach as for numeric data: geo points for a given document are stored uncompressed and sequentially in a single binary doc values field. Close #4207	2013-12-27 12:45:18 +01:00
Clinton Gormley	dea6b112ae	[DOCS] Corrected bloom loading docs	2013-12-20 11:20:54 +01:00
Clinton Gormley	2b8c82c883	[DOCS] Documented index.codec.bloom.load for #4525	2013-12-20 10:51:17 +01:00
Adrien Grand	52db8eb324	More documentation improvements for fielddata loading.	2013-12-18 16:05:35 +01:00
Adrien Grand	07443089ce	Improve documentation of the new `disabled` field data format.	2013-12-18 15:44:57 +01:00
Adrien Grand	4e7ce4ee02	Make field data changes immediately taken into account and add the ability to disallow field data loading. This commit changes field data configuration updates so that they are immediately taken into account for loading new segments. The way it works is that field data configuration is now cached separately from the field data cache, meaning that it is now possible to clear the field data configuration from IndexFieldDataService while the cache stays around. The next time Elasticsearch reloads field data configuration, it will check if there is already a cache entry and reuse it if it exists. To disable field data loading, all that is required is to change the field data format to "none" (supported by all field data types) using the update mapping API. Elasticsearch will then refuse to load field data on any new segment, but field data which has been loaded on the previous segments will remain available. So you need to clear the field data cache in order to reclaim memory (otherwise memory will be reclaimed more slowly, as segments get merged). Close #4430 Close #4431	2013-12-16 14:34:33 +01:00
Lee Hinman	f7d5d1e5c9	[DOCS] Update store docs to indicate mmapfs is now the default on 64-bit Linux	2013-11-09 11:42:43 -07:00
Clinton Gormley	870346070e	[DOCS] Added compound_on_flush docs and updated compound_format docs to include note about accepting a float	2013-10-15 13:30:56 +02:00
Adrien Grand	f2d75654bf	Add clear warnings that only the default codec, postings format and doc values format have backward compatibility guarantees.	2013-10-10 13:30:08 +02:00
Adrien Grand	4fa8f6f61f	Doc values integration. This commit allows for using Lucene doc values as a backend for field data, moving the cost of building field data from the refresh operation to indexing. In addition, Lucene doc values can be stored on disk (partially, or even entirely), so that memory management is done at the operating system level (file-system cache) instead of the JVM, avoiding long pauses during major collections due to large heaps. So far doc values are supported on numeric types and non-analyzed strings (index:no or index:not_analyzed). Under the hood, it uses SORTED_SET doc values which is the only type to support multi-valued fields. Since the field data API set is a bit wider than the doc values API set, some operations are not supported: - field data filtering: this will fail if doc values are enabled, - field data cache clearing, even for memory-based doc values formats, - getting the memory usage for a specific field, - knowing whether a field is actually multi-valued. This commit also allows for configuring doc-values formats on a per-field basis similarly to postings formats. In particular the doc values format of the _version field can be configured through its own field mapper (it used to be handled in UidFieldMapper previously). Closes #3806	2013-10-09 16:34:30 +02:00


58 Commits