Merge pull request #446 from opensearch-project/transforms-changes

Added new continuous parameter to transforms
2022-03-17 13:22:26 -05:00 · 2022-03-17 13:22:26 -05:00 · 8a67fb1726
parent 27f322caab 8e30e8591d
commit 8a67fb1726
2 changed files with 18 additions and 6 deletions
--- a/_im-plugin/index-transforms/index.md
+++ b/_im-plugin/index-transforms/index.md
@@ -16,7 +16,7 @@ For example, suppose that you have airline data that’s scattered across multip
 You can use transform jobs in two ways:

 1. Use the OpenSearch Dashboards UI to specify the index you want to transform and any optional data filters you want to use to filter the original index. Then select the fields you want to transform and the aggregations to use in the transformation. Finally, define a schedule for your job to follow.
-2. Use the transforms API to specify all the details about your job: the index you want to transform, target groups you want the transformed index to have, any aggregations you want to use to group columns, and a schedule for your job to follow.
+1. Use the transforms API to specify all the details about your job: the index you want to transform, target groups you want the transformed index to have, any aggregations you want to use to group columns, and a schedule for your job to follow.

 OpenSearch Dashboards provides a detailed summary of the jobs you created and their relevant information, such as associated indices and job statuses. You can review and edit your job’s details and selections before creation, and even preview a transformed index’s data as you’re choosing which fields to transform. However, you can also use the REST API to create transform jobs and preview transform job results, but you must know all of the necessary settings and parameters to submit them as part of the HTTP request body. Submitting your transform job configurations as JSON scripts offers you more portability, allowing you to share and replicate your transform jobs, which is harder to do using OpenSearch Dashboards.

@@ -44,16 +44,17 @@ On the other hand, aggregations let you perform simple calculations. For example

    Currently, transform jobs support histogram, date_histogram, and terms groupings. For more information about groupings, see [Bucket Aggregations]({{site.url}}{{site.baseurl}}/opensearch/bucket-agg/). In terms of aggregations, you can select from `sum`, `avg`, `max`, `min`, `value_count`, `percentiles`, and `scripted_metric`. For more information about aggregations, see [Metric Aggregations]({{site.url}}{{site.baseurl}}/opensearch/metric-agg/).

-2. Repeat step 1 for any other fields that you want to transform.
-3. After selecting the fields that you want to transform and verifying the transformation, choose **Next**.
+1. Repeat step 1 for any other fields that you want to transform.
+1. After selecting the fields that you want to transform and verifying the transformation, choose **Next**.

 ### Step 3: Specify a schedule

 You can configure transform jobs to run once or multiple times on a schedule. Transform jobs are enabled by default.

-1. For **transformation execution interval**, specify a transform interval in minutes, hours, or days.
-2. Under **Advanced**, specify an optional amount for **Pages per execution**. A larger number means more data is processed in each search request, but also uses more memory and causes higher latency. Exceeding allowed memory limits can cause exceptions and errors to occur.
-3. Choose **Next**.
+1. Choose whether the job should be **continuous**. Continuous jobs execute at each **transformation execution interval** and incrementally transform newly modified buckets, which can include new data added to the source indexes. Non-continuous jobs execute only once.
+1. For **transformation execution interval**, specify a transform interval in minutes, hours, or days. This interval dictates how often continuous jobs execute; non-continuous jobs execute once, after the interval elapses.
+1. Under **Advanced**, specify an optional amount for **Pages per execution**. A larger number means more data is processed in each search request, but also uses more memory and causes higher latency. Exceeding allowed memory limits can cause exceptions and errors to occur.
+1. Choose **Next**.

 ### Step 4: Review and confirm details

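The continuous and non-continuous schedules described in Step 3 can be modeled with a short Python sketch. It is illustrative only: `planned_runs` is a hypothetical helper, not part of OpenSearch, and it assumes that a continuous job's first execution also waits for one full interval, as the text describes for non-continuous jobs.

```python
from datetime import datetime, timedelta
from typing import List


def planned_runs(start: datetime, interval: timedelta,
                 continuous: bool, horizon: datetime) -> List[datetime]:
    """Hypothetical model of when a transform job executes.

    Continuous jobs run at every interval; non-continuous jobs run once,
    after the first interval has elapsed. Assumes the continuous job's
    first run also waits for one interval.
    """
    runs: List[datetime] = []
    t = start + interval  # first execution once the interval elapses
    while t <= horizon:
        runs.append(t)
        if not continuous:
            break  # a non-continuous job executes only once
        t += interval
    return runs


if __name__ == "__main__":
    start = datetime(2022, 3, 17, 0, 0)
    horizon = start + timedelta(hours=3)
    print(len(planned_runs(start, timedelta(hours=1), True, horizon)))   # 3
    print(len(planned_runs(start, timedelta(hours=1), False, horizon)))  # 1
```

Over a three-hour window with a one-hour interval, the continuous job runs three times and the non-continuous job runs once.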
--- a/_im-plugin/index-transforms/transforms-apis.md
+++ b/_im-plugin/index-transforms/transforms-apis.md
@@ -28,6 +28,7 @@ PUT _plugins/_transform/<transform_id>
 {
  "transform": {
  "enabled": true,
+  "continuous": true,
  "schedule": {
    "interval": {
    "period": 1,
@@ -78,6 +79,7 @@ PUT _plugins/_transform/<transform_id>
  "transform": {
    "transform_id": "sample",
    "schema_version": 7,
+    "continuous": true,
    "schedule": {
      "interval": {
        "start_time": 1621467964243,
@@ -128,6 +130,7 @@ You can specify the following options in the HTTP request body:
 Option | Data Type | Description | Required
 :--- | :--- | :--- | :---
 enabled | Boolean | If true, the transform job is enabled at creation. | No
+continuous | Boolean | Specifies whether the transform job should be continuous. Continuous jobs execute every time they are scheduled according to the `schedule` option and incrementally transform newly modified buckets, which can include new data added to the source indexes. Non-continuous jobs execute only once. Default is `false`. | No
 schedule | Object | The schedule the transform job runs on. | Yes
 start_time | Integer | The Unix epoch time of the transform job's start time. | Yes
 description | String | Describes the transform job. | No
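A minimal Python sketch of how a client might set the `continuous` option when assembling the request body for `PUT _plugins/_transform/<transform_id>`. The helpers `transform_body` and `is_continuous` are hypothetical; all other transform options are passed through unchanged, and omitting `continuous` leaves the documented default of `false` in effect.

```python
import json
from typing import Any, Dict, Optional


def transform_body(options: Dict[str, Any],
                   continuous: Optional[bool] = None) -> Dict[str, Any]:
    """Hypothetical helper: wrap options in the {"transform": {...}} envelope.

    `options` holds the other fields (enabled, schedule, description, ...).
    continuous=None omits the field so the default (false) applies.
    """
    transform = dict(options)  # shallow copy; don't mutate the caller's dict
    if continuous is not None:
        if not isinstance(continuous, bool):
            raise TypeError("continuous must be a Boolean")
        transform["continuous"] = continuous
    return {"transform": transform}


def is_continuous(body: Dict[str, Any]) -> bool:
    """Effective value of `continuous`, applying the default of false."""
    return bool(body["transform"].get("continuous", False))


if __name__ == "__main__":
    # Illustrative option values only; add the rest of your job's settings.
    options = {"enabled": True, "schedule": {"interval": {"period": 1}}}
    print(json.dumps(transform_body(options, continuous=True), indent=2))
```

Rejecting non-Boolean values on the client side mirrors the `Boolean` data type listed in the table, so a typo such as `"true"` (a string) is caught before the request is sent.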
@@ -291,6 +294,7 @@ GET _plugins/_transform/<transform_id>
  "transform": {
    "transform_id": "sample",
    "schema_version": 7,
+    "continuous": true,
    "schedule": {
      "interval": {
        "start_time": 1621467964243,
@@ -358,6 +362,7 @@ GET _plugins/_transform/
      "transform": {
        "transform_id": "sample",
        "schema_version": 7,
+        "continuous": true,
        "schedule": {
          "interval": {
            "start_time": 1621467964243,
@@ -595,6 +600,12 @@ GET _plugins/_transform/<transform_id>/_explain
  "sample": {
    "metadata_id": "PzmjweME5xbgkenl9UpsYw",
    "transform_metadata": {
+      "continuous_stats": {
+        "last_timestamp": 1621883525672,
+        "documents_behind": {
+          "sample_index": 72
+        }
+      },
      "transform_id": "sample",
      "last_updated_at": 1621883525873,
      "status": "finished",
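The `continuous_stats` object added to the `_explain` response can be read with a few lines of Python. This sketch assumes that `last_timestamp` is in epoch milliseconds (matching `last_updated_at`) and that `documents_behind` maps each source index to a document count. The helper names are hypothetical, and a job that reports no `continuous_stats` is treated as having zero documents behind.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def continuous_stats(explain: Dict[str, Any],
                     transform_id: str) -> Optional[Dict[str, Any]]:
    """Return transform_metadata.continuous_stats, or None if absent."""
    metadata = explain.get(transform_id, {}).get("transform_metadata") or {}
    return metadata.get("continuous_stats")


def total_documents_behind(explain: Dict[str, Any], transform_id: str) -> int:
    """Sum documents_behind across all source indexes (0 if not reported)."""
    stats = continuous_stats(explain, transform_id) or {}
    return sum(stats.get("documents_behind", {}).values())


def last_timestamp_utc(explain: Dict[str, Any],
                       transform_id: str) -> Optional[datetime]:
    """Convert last_timestamp (assumed epoch milliseconds) to a UTC datetime."""
    stats = continuous_stats(explain, transform_id)
    if not stats or "last_timestamp" not in stats:
        return None
    ms = stats["last_timestamp"]
    # Split into whole seconds plus milliseconds to avoid float rounding.
    return (datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
            + timedelta(milliseconds=ms % 1000))
```

Applied to the sample response above, `total_documents_behind(response, "sample")` returns 72.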