OpenSearch/docs/reference
Adrien Grand 9ea25df649 Switch to murmurhash3 to route documents to shards.
We currently use the djb2 hash function to compute the shard a document
should go to. Unfortunately, this hash function is not very sophisticated,
and you can sometimes hit adversarial cases, such as numeric ids on
33 shards.

Murmur3 generates hashes with a better distribution, which should avoid the
adversarial cases.

Here are some examples of how 100000 incremental ids are distributed across
shards using either djb2 or murmur3.

5 shards:
Murmur3: [19933, 19964, 19940, 20030, 20133]
DJB:     [20000, 20000, 20000, 20000, 20000]

3 shards:
Murmur3: [33185, 33347, 33468]
DJB:     [30100, 30000, 39900]

33 shards:
Murmur3: [2999, 3096, 2930, 2986, 3070, 3093, 3023, 3052, 3112, 2940, 3036, 2985, 3031, 3048, 3127, 2961, 2901, 3105, 3041, 3130, 3013, 3035, 3031, 3019, 3008, 3022, 3111, 3086, 3016, 2996, 3075, 2945, 2977]
DJB:     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 900, 900, 900, 900, 1000, 1000, 10000, 10000, 10000, 10000, 9100, 9100, 9100, 9100, 9000, 9000, 0, 0, 0, 0, 0, 0]

Even though djb2 looks ideal in some cases (5 shards), the patterns in the
distribution of its hashes can cause problems with some shard counts
(e.g. 3, or even worse, 33).
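
A minimal sketch of how such a comparison could be reproduced, not the code
used for the numbers above. It assumes ids are the decimal strings "0" to
"99999", that hashes are signed 32-bit values reduced with a non-negative
modulo, and that murmur3 is the 32-bit x86 variant with seed 0 applied to the
UTF-8 bytes of the id. The helper names (djb2, murmur3_32, murmur3_key,
distribute) are hypothetical, and the exact murmur3 counts may differ from
the ones listed above if the real byte encoding differs.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def djb2(key):
    """djb2 string hash: h = h * 33 + char, wrapping at 32 bits."""
    h = 5381
    for ch in key:
        h = (h * 33 + ord(ch)) & MASK32
    return to_signed32(h)


def rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data, seed=0):
    """MurmurHash3 x86 32-bit over a bytes object."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    full = n - (n % 4)
    # Body: mix one little-endian 4-byte block at a time.
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[full:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization: fold in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return to_signed32(h)


def murmur3_key(key):
    """Assumption: hash the UTF-8 bytes of the id with seed 0."""
    return murmur3_32(key.encode("utf-8"))


def distribute(hash_fn, num_shards, num_ids=100000):
    """Count how many incremental ids land on each shard."""
    counts = [0] * num_shards
    for i in range(num_ids):
        # Python's % is non-negative for a positive divisor,
        # even when the hash itself is negative.
        counts[hash_fn(str(i)) % num_shards] += 1
    return counts


if __name__ == "__main__":
    for shards in (5, 3, 33):
        print(f"{shards} shards:")
        print("Murmur3:", distribute(murmur3_key, shards))
        print("DJB:    ", distribute(djb2, shards))
```

With 33 shards the djb2 weakness is easy to see in this sketch: since
h = h * 33 + char, the hash modulo 33 depends only on the last character of
the id (plus a shift from 32-bit overflow), so numeric ids can only reach a
handful of shards.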

Some tests have been modified because they relied on implementation details of
the routing hash function.

Close #7954
2014-11-04 16:32:42 +01:00
analysis Docs: Add mention of `hyphenation_patterns_path` 2014-11-01 15:47:53 +01:00
cat Cat Recovery API: Reverting changes introduced with commit e1c75bae87 2014-10-27 20:49:58 +01:00
cluster Update update-settings.asciidoc 2014-10-22 12:46:33 +02:00
docs Docs: Document action.replication_type setting 2014-10-31 13:53:34 +01:00
images Docs: fix formula typo in documentation of the gauss decay in function_score 2014-09-01 09:46:13 +02:00
index-modules Docs: Fix curl statements in query-cache.asciidoc 2014-10-15 13:16:20 +02:00
indices Fix missing word in upgrade docs. 2014-11-03 11:44:41 -08:00
mapping Update source-field.asciidoc 2014-10-29 14:51:05 +01:00
migration Switch to murmurhash3 to route documents to shards. 2014-11-04 16:32:42 +01:00
modules Update cluster.asciidoc - fix invalid asciidoc 2014-11-04 15:22:36 +01:00
query-dsl MLT Field Query: remove it from master 2014-10-29 10:19:00 +01:00
search [Aggregations] Meta data support 2014-11-03 22:32:23 +01:00
setup Docs: rolling upgrade process seems incorrect 2014-10-24 16:45:42 +02:00
testing Docs: Removed all the added/deprecated tags from 1.x 2014-09-26 21:04:42 +02:00
analysis.asciidoc Add more anchor links to documentation 2013-09-30 13:13:16 -06:00
api-conventions.asciidoc Docs: Removed all the added/deprecated tags from 1.x 2014-09-26 21:04:42 +02:00
cat.asciidoc [DOCS] reordered cat apis menu 2014-06-03 11:06:35 +02:00
cluster.asciidoc [DOCS] Fix HTTP endpoints after stats API changes 2014-01-09 11:30:28 +01:00
docs.asciidoc Bulk UDP: Removal. 2014-09-11 09:52:09 +02:00
getting-started.asciidoc Docs: Update getting-started.asciidoc 2014-10-29 15:04:13 +01:00
glossary.asciidoc Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00
index-modules.asciidoc [DOCS] Remove the section about codecs. 2014-08-07 11:24:44 +02:00
index.asciidoc Core: Added the `index.query.parse.allow_unmapped_fields` setting to fail queries if they refer to unmapped fields. 2014-09-09 15:00:47 +02:00
indices.asciidoc Add forgotten include for upgrade docs. 2014-10-10 10:55:45 -07:00
mapping.asciidoc Facets: Removal from master. 2014-08-21 10:34:39 +02:00
modules.asciidoc [DOCS] Fixed link to tribe.asciidoc 2014-01-13 22:01:12 +01:00
query-dsl.asciidoc Facets: Removal from master. 2014-08-21 10:34:39 +02:00
search.asciidoc Search Exists API: Checks if any matching documents exist for a given query 2014-07-31 15:42:30 -04:00
setup.asciidoc Docs: Removed all the added/deprecated tags from 1.x 2014-09-26 21:04:42 +02:00
testing.asciidoc [DOCS] Test framework documentation 2013-12-02 18:01:45 +01:00