OpenSearch/x-pack/docs
Zachary Tong fc9fb64ad5
[Rollup] Improve ID scheme for rollup documents (#32558)
Previously, we were using a simple CRC32 for the IDs of rollup documents.
This is a very poor choice however, since 32bit IDs leads to collisions
between documents very quickly.

This commit moves Rollups over to a 128bit ID.  The ID is a concatenation
of all the keys in the document (similar to the rolling CRC before),
hashed with 128bit Murmur3, then base64 encoded.  Finally, the job
ID and a delimiter (`$`) are prepended to the ID.

This gurantees that there are 128bits per-job.  128bits should
essentially remove all chances of collisions, and the prepended
job ID means that _if_ there is a collision, it stays "within"
the job.

BWC notes:

We can only upgrade the ID scheme after we know there has been a good
checkpoint during indexing.  We don't rely on a STARTED/STOPPED
status since we can't guarantee that resulted from a real checkpoint,
or other state.  So we only upgrade the ID after we have reached
a checkpoint state during an active index run, and only after the
checkpoint has been confirmed.

Once a job has been upgraded and checkpointed, the version increments
and the new ID is used in the future.  All new jobs use the
new ID from the start
2018-08-03 11:13:25 -04:00
..
en [Rollup] Improve ID scheme for rollup documents (#32558) 2018-08-03 11:13:25 -04:00
src/test/java/org/elasticsearch/smoketest [DOCS] Check for Windows and *nix file paths (#31648) 2018-07-02 13:10:52 +01:00
build.gradle [ML][DOCS] Add documentation for detector rules and filters (#32013) 2018-07-25 16:10:32 +01:00