mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-02-14 17:05:36 +00:00
Unlike `classification`, which is using a cross validation splitter that produces training sets whose size is predictable and equal to `training_percent * class_cardinality`, for regression we have been using a random splitter that takes an independent decision for each document. This means we cannot predict the exact size of the training set. This poses a problem as we move towards performing test inference on the java side as we need to be able to provide an accurate upper bound of the training set size to the c++ process. This commit replaces the random splitter we use for regression with the same streaming-reservoir approach we do for `classification`. Backport of #58331
Elastic License Functionality
This directory tree contains files subject to the Elastic License. The files subject to the Elastic License are grouped in this directory to clearly separate them from files licensed under the Apache License 2.0.