[role="xpack"]
[[rollup-get-job]]
=== Get Rollup Jobs API
++++
<titleabbrev>Get Job</titleabbrev>
++++

experimental[]

This API returns the configuration, stats and status of rollup jobs. The API can return the details for a single job,
or for all jobs.

Note: This API only returns active (both `STARTED` and `STOPPED`) jobs. If a job was created, ran for a while, and was then deleted,
this API will not return any details about that job.

For details about a historical job, the <<rollup-get-rollup-caps,Rollup Capabilities API>> may be more useful.

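For example, assuming a job was rolling up indices matching `sensor-*` before it was deleted, the capabilities recorded in its rollup data can still be inspected with a request like the following (a sketch of the Rollup Capabilities API; see that page for details):

[source,js]
--------------------------------------------------
GET _xpack/rollup/data/sensor-*
--------------------------------------------------
// NOTCONSOLE
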
==== Request

`GET _xpack/rollup/job/<job_id>`

//===== Description

==== Path Parameters

`job_id`::
  (string) Identifier for the job to retrieve. If omitted (or `_all` is used), all jobs will be returned.

==== Request Body

There is no request body for the Get Jobs API.

==== Authorization

You must have `monitor`, `monitor_rollup`, `manage` or `manage_rollup` cluster privileges to use this API.
For more information, see
{xpack-ref}/security-privileges.html[Security Privileges].

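For example, a read-only role that can call this API (but cannot create or modify rollup jobs) only needs the `monitor_rollup` cluster privilege. The following is a sketch using the security Create Role API; the role name `rollup_reader` is purely illustrative:

[source,js]
--------------------------------------------------
PUT _xpack/security/role/rollup_reader
{
  "cluster": [ "monitor_rollup" ]
}
--------------------------------------------------
// NOTCONSOLE
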
==== Examples

If we have already created a rollup job named `sensor`, the details about the job can be retrieved with:

[source,js]
--------------------------------------------------
GET _xpack/rollup/job/sensor
--------------------------------------------------
// CONSOLE
// TEST[setup:sensor_rollup_job]

Which will yield the following response:

[source,js]
----
{
  "jobs" : [
    {
      "config" : {
        "id" : "sensor",
        "index_pattern" : "sensor-*",
        "rollup_index" : "sensor_rollup",
        "cron" : "*/30 * * * * ?",
        "groups" : {
          "date_histogram" : {
            "interval" : "1h",
            "delay": "7d",
            "field": "timestamp",
            "time_zone": "UTC"
          },
          "terms" : {
            "fields" : [
              "node"
            ]
          }
        },
        "metrics" : [
          {
            "field" : "temperature",
            "metrics" : [
              "min",
              "max",
              "sum"
            ]
          },
          {
            "field" : "voltage",
            "metrics" : [
              "avg"
            ]
          }
        ],
        "timeout" : "20s",
        "page_size" : 1000
      },
      "status" : {
        "job_state" : "stopped",
        "upgraded_doc_id": true
      },
      "stats" : {
        "pages_processed" : 0,
        "documents_processed" : 0,
        "rollups_indexed" : 0,
        "trigger_count" : 0
      }
    }
  ]
}
----
// TESTRESPONSE

The `jobs` array contains a single job (`id: sensor`) since we requested a single job in the endpoint's URL. The
details for this job contain three top-level parameters: `config`, `status` and `stats`.

`config` holds the rollup job's configuration, which is identical to the configuration that was supplied when creating
the job via the <<rollup-put-job,Create Job API>>.

The `status` object holds the current status of the rollup job's indexer. The possible values and their meanings are:

- `stopped` means the indexer is paused and will not process data, even if its cron interval triggers
- `started` means the indexer is running, but not actively indexing data. When the cron interval triggers, the job's
indexer will begin to process data
- `indexing` means the indexer is actively processing data and creating new rollup documents. When in this state, any
subsequent cron interval triggers will be ignored because the job is already active with the prior trigger
- `abort` is a transient state, which is usually not witnessed by the user. The `abort` state is used if the task needs to
be shut down for some reason (the job has been deleted, an unrecoverable error has been encountered, etc). Shortly after
the `abort` state is set, the job will remove itself from the cluster

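A `stopped` job, like the `sensor` job shown above, is moved into the `started` state with the Start Rollup Jobs API. The request below is only a sketch for reference; see that API's documentation for details:

[source,js]
--------------------------------------------------
POST _xpack/rollup/job/sensor/_start
--------------------------------------------------
// NOTCONSOLE
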
Finally, the `stats` object provides transient statistics about the rollup job, such as how many documents have been
processed and how many rollup summary docs have been indexed. These stats are not persisted, so if a node is restarted
these stats will be reset.

If we add another job, we can see how multi-job responses are handled:

[source,js]
--------------------------------------------------
PUT _xpack/rollup/job/sensor2 <1>
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "interval": "1h",
      "delay": "7d"
    },
    "terms": {
      "fields": ["node"]
    }
  },
  "metrics": [
    {
      "field": "temperature",
      "metrics": ["min", "max", "sum"]
    },
    {
      "field": "voltage",
      "metrics": ["avg"]
    }
  ]
}

GET _xpack/rollup/job/_all <2>
--------------------------------------------------
// CONSOLE
// TEST[setup:sensor_rollup_job]
<1> We create a second job with the name `sensor2`
<2> Then request all jobs by using `_all` in the Get Jobs API

Which will yield the following response:

[source,js]
----
{
  "jobs" : [
    {
      "config" : {
        "id" : "sensor2",
        "index_pattern" : "sensor-*",
        "rollup_index" : "sensor_rollup",
        "cron" : "*/30 * * * * ?",
        "groups" : {
          "date_histogram" : {
            "interval" : "1h",
            "delay": "7d",
            "field": "timestamp",
            "time_zone": "UTC"
          },
          "terms" : {
            "fields" : [
              "node"
            ]
          }
        },
        "metrics" : [
          {
            "field" : "temperature",
            "metrics" : [
              "min",
              "max",
              "sum"
            ]
          },
          {
            "field" : "voltage",
            "metrics" : [
              "avg"
            ]
          }
        ],
        "timeout" : "20s",
        "page_size" : 1000
      },
      "status" : {
        "job_state" : "stopped",
        "upgraded_doc_id": true
      },
      "stats" : {
        "pages_processed" : 0,
        "documents_processed" : 0,
        "rollups_indexed" : 0,
        "trigger_count" : 0
      }
    },
    {
      "config" : {
        "id" : "sensor",
        "index_pattern" : "sensor-*",
        "rollup_index" : "sensor_rollup",
        "cron" : "*/30 * * * * ?",
        "groups" : {
          "date_histogram" : {
            "interval" : "1h",
            "delay": "7d",
            "field": "timestamp",
            "time_zone": "UTC"
          },
          "terms" : {
            "fields" : [
              "node"
            ]
          }
        },
        "metrics" : [
          {
            "field" : "temperature",
            "metrics" : [
              "min",
              "max",
              "sum"
            ]
          },
          {
            "field" : "voltage",
            "metrics" : [
              "avg"
            ]
          }
        ],
        "timeout" : "20s",
        "page_size" : 1000
      },
      "status" : {
        "job_state" : "stopped",
        "upgraded_doc_id": true
      },
      "stats" : {
        "pages_processed" : 0,
        "documents_processed" : 0,
        "rollups_indexed" : 0,
        "trigger_count" : 0
      }
    }
  ]
}
----
// NOTCONSOLE