Elasticsearch REST API JSON specification
This repository contains a collection of JSON files which describe the Elasticsearch HTTP API.
Their purpose is to formalize and standardize the API, to facilitate development of libraries and integrations.
Example for the "Create Index" API:
    {
      "indices.create": {
        "documentation": "http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index/",
        "methods": ["PUT", "POST"],
        "url": {
          "path": "/{index}",
          "paths": ["/{index}"],
          "parts": {
            "index": {
              "type" : "string",
              "required" : true,
              "description" : "The name of the index"
            }
          },
          "params": {
            "timeout": {
              "type" : "time",
              "description" : "Explicit operation timeout"
            }
          }
        },
        "body": {
          "description" : "The configuration for the index (`settings` and `mappings`)"
        }
      }
    }
The specification contains:
- The name of the API (`indices.create`), which usually corresponds to the client calls
- Link to the documentation at http://elasticsearch.org
- List of HTTP methods for the endpoint
- URL specification: path, parts, parameters
- Whether a request body is allowed for the endpoint, and its description
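As a rough illustration of how a client library might read these fields, here is a minimal Python sketch. It assumes each spec document holds exactly one top-level key (the API name) and that a missing or null `body` means the endpoint takes no body. `summarize_spec` is a hypothetical helper and not part of this repository.

```python
import json

# Spec text copied from the "Create Index" example above.
SPEC_JSON = """
{
  "indices.create": {
    "documentation": "http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index/",
    "methods": ["PUT", "POST"],
    "url": {
      "path": "/{index}",
      "paths": ["/{index}"],
      "parts": {
        "index": {"type": "string", "required": true, "description": "The name of the index"}
      },
      "params": {
        "timeout": {"type": "time", "description": "Explicit operation timeout"}
      }
    },
    "body": {"description": "The configuration for the index (`settings` and `mappings`)"}
  }
}
"""


def summarize_spec(spec_text):
    """Return a flat summary of a single-endpoint spec (hypothetical helper)."""
    spec = json.loads(spec_text)
    # Assumption: one top-level key per document, which is the API name.
    name, endpoint = next(iter(spec.items()))
    url = endpoint["url"]
    parts = url.get("parts", {})
    return {
        "name": name,
        "documentation": endpoint.get("documentation"),
        "methods": endpoint["methods"],
        "paths": url["paths"],
        "required_parts": sorted(n for n, info in parts.items() if info.get("required")),
        "params": sorted(url.get("params", {})),
        # Assumption: a missing or null "body" means no body is accepted.
        "accepts_body": endpoint.get("body") is not None,
    }


summary = summarize_spec(SPEC_JSON)
print(summary["name"], summary["methods"], summary["paths"])
```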
The `methods` and `url.paths` elements list all possible HTTP methods and URLs for the endpoint; it is the responsibility of the developer to use this information to build a sensible API on the target platform.
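One possible way to turn `methods` and `url.paths` into a concrete request is sketched below in Python. The policy shown is only an assumption of this sketch, not something the specification prescribes: default to the first listed method and pick the longest path whose placeholders are all supplied. `build_request` is a hypothetical helper.

```python
import re
from urllib.parse import quote

# Trimmed copy of the "Create Index" spec entry from the example above.
ENDPOINT = {
    "methods": ["PUT", "POST"],
    "url": {
        "paths": ["/{index}"],
        "parts": {"index": {"type": "string", "required": True}},
        "params": {"timeout": {"type": "time"}},
    },
}


def build_request(endpoint, parts, params=None, method=None):
    """Choose a method and path from the spec and fill in the URL parts."""
    params = params or {}
    url = endpoint["url"]

    # Refuse to guess when a required URL part is missing.
    for name, info in url.get("parts", {}).items():
        if info.get("required") and name not in parts:
            raise ValueError(f"missing required part: {name}")

    # Reject query parameters the spec does not declare.
    unknown = set(params) - set(url.get("params", {}))
    if unknown:
        raise ValueError(f"unknown params: {sorted(unknown)}")

    # Sketch policy: default to the first method listed in the spec.
    method = method or endpoint["methods"][0]
    if method not in endpoint["methods"]:
        raise ValueError(f"method not allowed: {method}")

    # Sketch policy: use the longest path whose placeholders are all supplied.
    candidates = [
        path for path in url["paths"]
        if set(re.findall(r"\{(\w+)\}", path)) <= set(parts)
    ]
    if not candidates:
        raise ValueError("no path matches the supplied parts")

    # Percent-encode each part so it cannot introduce extra path segments.
    safe_parts = {name: quote(str(value), safe="") for name, value in parts.items()}
    return method, max(candidates, key=len).format(**safe_parts), params


print(build_request(ENDPOINT, {"index": "my-index"}, {"timeout": "30s"}))
# prints ('PUT', '/my-index', {'timeout': '30s'})
```

A real client would likely make different choices here, for example exposing one function per method or mapping parameter types such as `time` to native types.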
Utilities
The repository contains some utilities in the `utils` directory:
- `thor api:generate:spec` generates the basic JSON specification from the Java source code
- `thor api:generate:code` generates Ruby source code and tests from the specs, and can be extended to generate assets in another programming language
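To give a feel for what generating code from a spec entry can look like, here is a small Python sketch that renders a function stub as source text. It is not how the thor tasks in `utils` work; `generate_stub` and its naming scheme (dots replaced by underscores, required URL parts as positional arguments) are assumptions made for illustration only.

```python
# Trimmed copy of the "Create Index" spec entry from the example above.
ENDPOINT = {
    "documentation": "http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index/",
    "methods": ["PUT", "POST"],
    "url": {
        "paths": ["/{index}"],
        "parts": {"index": {"type": "string", "required": True}},
    },
}


def generate_stub(api_name, endpoint):
    """Render Python source for a stub function from one spec entry."""
    # Assumed naming scheme: "indices.create" becomes "indices_create".
    func_name = api_name.replace(".", "_")
    parts = endpoint["url"].get("parts", {})

    # Required URL parts become positional arguments; optional ones default to None.
    required = [name for name, info in parts.items() if info.get("required")]
    optional = [name for name in parts if name not in required]
    args = required + [f"{name}=None" for name in optional] + ["params=None", "body=None"]

    lines = [
        f"def {func_name}({', '.join(args)}):",
        f'    """{endpoint["documentation"]}"""',
        f"    raise NotImplementedError({func_name!r})",
    ]
    return "\n".join(lines) + "\n"


source = generate_stub("indices.create", ENDPOINT)
print(source)
```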
Run `bundle install` and then `thor list` in the `utils` folder to list the available tasks and their exact names.
The full command to generate the API spec is:
    thor api:spec:generate --output=myfolder --elasticsearch=/path/to/es
License
This software is licensed under the Apache 2 license.