OpenSearch

Commit Graph

Author	SHA1	Message	Date
Adrien Grand	9ea25df649	Switch to murmurhash3 to route documents to shards. We currently use the djb2 hash function to compute the shard a document should go to. Unfortunately this hash function is not very sophisticated, and you can sometimes hit adversarial cases, such as numeric ids on 33 shards. Murmur3 generates hashes with a better distribution, which should avoid the adversarial cases. Here are some examples of how 100000 incremental ids are distributed to shards using either djb2 or murmur3. 5 shards: Murmur3: [19933, 19964, 19940, 20030, 20133] DJB: [20000, 20000, 20000, 20000, 20000] 3 shards: Murmur3: [33185, 33347, 33468] DJB: [30100, 30000, 39900] 33 shards: Murmur3: [2999, 3096, 2930, 2986, 3070, 3093, 3023, 3052, 3112, 2940, 3036, 2985, 3031, 3048, 3127, 2961, 2901, 3105, 3041, 3130, 3013, 3035, 3031, 3019, 3008, 3022, 3111, 3086, 3016, 2996, 3075, 2945, 2977] DJB: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 900, 900, 900, 900, 1000, 1000, 10000, 10000, 10000, 10000, 9100, 9100, 9100, 9100, 9000, 9000, 0, 0, 0, 0, 0, 0] Even if djb2 looks ideal in some cases (5 shards), the patterns in the distribution of its hashes can cause problems with some shard counts (e.g. 3, or even worse, 33). Some tests have been modified because they relied on implementation details of the routing hash function. Close #7954	2014-11-04 16:32:42 +01:00
javanna	f283a9f3a5	[TEST] specified number_of_shards 5 to make sure the two docs end up in different shards that can be refreshed separately	2014-04-07 19:01:19 +02:00
Luca Cavanna	e02635e9c2	[TEST] Fixed delete/50_refresh REST test to work against multiple nodes. delete/50_refresh tests per-shard refresh using refresh:true in the delete API. We might still run into trouble if a replica gets initialized after a doc was indexed and deleted without a refresh, as that doc won't be found when searching against that specific replica shard (a refresh happens automatically before a replica is exposed as started).	2014-02-10 10:58:13 +01:00
Clinton Gormley	df94f08dab	Renamed "ok" and "not_ok" to "is_true" and "is_false" and tidied up the layout to make it easier to read	2013-07-01 15:58:23 +02:00
Clinton Gormley	262b047224	Added tests for delete	2013-06-28 17:22:32 +02:00
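
The murmurhash3 commit (9ea25df649) routes a document by hashing its id and reducing the hash modulo the shard count. The Python sketch below compares djb2 with 32-bit MurmurHash3 (x86_32 variant, seed 0) over incremental ids. Several assumptions are not taken from the commit. The ids are assumed to be the decimal strings "0" to "99999". The shard is assumed to be `hash % num_shards` with a floor (non-negative) modulo. Murmur3 is assumed to hash the UTF-8 bytes of the id. The real routing code may encode ids and reduce hashes differently, so the counts this prints need not match the figures quoted in the commit message. The helpers `djb2_32`, `murmur3_32`, `murmur3_id` and `distribution` are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Callable, List

MASK32 = 0xFFFFFFFF  # keep all arithmetic within 32 bits


def djb2_32(key: str) -> int:
    """djb2 (h = h * 33 + c, starting at 5381), as a signed 32-bit int."""
    h = 5381
    for ch in key:
        # Truncating at every step keeps the same low 32 bits as truncating once at the end.
        h = (h * 33 + ord(ch)) & MASK32
    return h - (1 << 32) if h & 0x80000000 else h


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32, returned as an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    # Body: mix each little-endian 4-byte block into h.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    # Tail: up to 3 leftover bytes.
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization (fmix32) spreads every input bit across the output.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def murmur3_id(key: str) -> int:
    # Assumption: hash the UTF-8 bytes of the id with seed 0.
    return murmur3_32(key.encode("utf-8"))


def distribution(hash_fn: Callable[[str], int], num_shards: int, n_ids: int = 100_000) -> List[int]:
    """Count how many of the ids "0" .. str(n_ids - 1) land on each shard."""
    # Python's % with a positive divisor is a floor modulo, so negative hashes still map to 0..num_shards-1.
    counts = Counter(hash_fn(str(i)) % num_shards for i in range(n_ids))
    return [counts[s] for s in range(num_shards)]


if __name__ == "__main__":
    for shards in (3, 5, 33):
        print(f"{shards} shards")
        print("  djb2:   ", distribution(djb2_32, shards))
        print("  murmur3:", distribution(murmur3_id, shards))
```

The sketch also suggests why 33 shards is the worst case for djb2. Each step computes `h * 33 + c`, so before 32-bit overflow, `h mod 33` equals the last character's code mod 33, and decimal ids end in only ten distinct characters. Overflow breaks this only partially, which is consistent with the clustered djb2 counts for 33 shards in the commit message. Murmur3's finalization step mixes all input bits, so no shard count interacts with the hash this way.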

5 Commits