2015-04-09 19:14:48 -04:00
|
|
|
[[search-field-stats]]
|
|
|
|
== Field stats API
|
|
|
|
|
|
|
|
experimental[]
|
|
|
|
|
2015-05-24 09:08:27 -04:00
|
|
|
The field stats api allows one to find statistical properties of a field
|
|
|
|
without executing a search, but looking up measurements that are natively
|
|
|
|
available in the Lucene index. This can be useful to explore a dataset which
|
|
|
|
you don't know much about. For example, this allows creating a histogram
|
|
|
|
aggregation with meaningful intervals based on the min/max range of values.
|
2015-04-09 19:14:48 -04:00
|
|
|
|
2015-05-24 09:08:27 -04:00
|
|
|
The field stats api by defaults executes on all indices, but can execute on
|
|
|
|
specific indices too.
|
2015-04-09 19:14:48 -04:00
|
|
|
|
|
|
|
All indices:
|
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
|
|
|
curl -XGET "http://localhost:9200/_field_stats?fields=rating"
|
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
Specific indices:
|
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
|
|
|
curl -XGET "http://localhost:9200/index1,index2/_field_stats?fields=rating"
|
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
Supported request options:
|
2015-04-23 04:50:01 -04:00
|
|
|
|
2015-04-26 09:51:52 -04:00
|
|
|
[horizontal]
|
2015-05-24 09:08:27 -04:00
|
|
|
`fields`:: A list of fields to compute stats for.
|
|
|
|
`level`:: Defines if field stats should be returned on a per index level or on a
|
|
|
|
cluster wide level. Valid values are `indices` and `cluster` (default).
|
2015-04-09 19:14:48 -04:00
|
|
|
|
2015-05-24 09:08:27 -04:00
|
|
|
[float]
|
2015-04-23 03:57:31 -04:00
|
|
|
=== Field statistics
|
2015-04-09 19:14:48 -04:00
|
|
|
|
|
|
|
The field stats api is supported on string based, number based and date based fields and can return the following statistics per field:
|
|
|
|
|
2015-04-26 09:51:52 -04:00
|
|
|
[horizontal]
|
|
|
|
`max_doc`::
|
|
|
|
|
|
|
|
The total number of documents.
|
|
|
|
|
|
|
|
`doc_count`::
|
|
|
|
|
|
|
|
The number of documents that have at least one term for this field, or -1 if
|
|
|
|
this measurement isn't available on one or more shards.
|
|
|
|
|
|
|
|
`density`::
|
|
|
|
|
|
|
|
The percentage of documents that have at least one value for this field. This
|
|
|
|
is a derived statistic and is based on the `max_doc` and `doc_count`.
|
|
|
|
|
|
|
|
`sum_doc_freq`::
|
|
|
|
|
|
|
|
The sum of each term's document frequency in this field, or -1 if this
|
2015-05-24 09:08:27 -04:00
|
|
|
measurement isn't available on one or more shards.
|
|
|
|
Document frequency is the number of documents containing a particular term.
|
2015-04-26 09:51:52 -04:00
|
|
|
|
|
|
|
`sum_total_term_freq`::
|
|
|
|
|
|
|
|
The sum of the term frequencies of all terms in this field across all
|
2015-05-24 09:08:27 -04:00
|
|
|
documents, or -1 if this measurement isn't available on one or more shards.
|
2015-04-26 09:51:52 -04:00
|
|
|
Term frequency is the total number of occurrences of a term in a particular
|
|
|
|
document and field.
|
|
|
|
|
|
|
|
`min_value`::
|
|
|
|
|
|
|
|
The lowest value in the field represented in a displayable form.
|
|
|
|
|
|
|
|
`max_value`::
|
|
|
|
|
|
|
|
The highest value in the field represented in a displayable form.
|
|
|
|
|
2015-05-24 09:08:27 -04:00
|
|
|
NOTE: Documents marked as deleted (but not yet removed by the merge process)
|
|
|
|
still affect all the mentioned statistics.
|
2015-04-09 19:14:48 -04:00
|
|
|
|
2015-05-24 09:08:27 -04:00
|
|
|
|
|
|
|
.Cluster level field statistics
|
|
|
|
==================================================
|
2015-04-09 19:14:48 -04:00
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
2015-05-24 09:08:27 -04:00
|
|
|
GET /_field_stats?fields=rating,answer_count,creation_date,display_name
|
2015-04-09 19:14:48 -04:00
|
|
|
--------------------------------------------------
|
|
|
|
|
2015-05-24 09:08:27 -04:00
|
|
|
[source,json]
|
2015-04-09 19:14:48 -04:00
|
|
|
--------------------------------------------------
|
|
|
|
{
|
|
|
|
"_shards": {
|
|
|
|
"total": 1,
|
|
|
|
"successful": 1,
|
|
|
|
"failed": 0
|
|
|
|
},
|
|
|
|
"indices": {
|
|
|
|
"_all": { <1>
|
|
|
|
"fields": {
|
|
|
|
"creation_date": {
|
|
|
|
"max_doc": 1326564,
|
|
|
|
"doc_count": 564633,
|
|
|
|
"density": 42,
|
|
|
|
"sum_doc_freq": 2258532,
|
|
|
|
"sum_total_term_freq": -1,
|
|
|
|
"min_value": "2008-08-01T16:37:51.513Z",
|
|
|
|
"max_value": "2013-06-02T03:23:11.593Z"
|
|
|
|
},
|
|
|
|
"display_name": {
|
|
|
|
"max_doc": 1326564,
|
|
|
|
"doc_count": 126741,
|
|
|
|
"density": 9,
|
|
|
|
"sum_doc_freq": 166535,
|
|
|
|
"sum_total_term_freq": 166616,
|
|
|
|
"min_value": "0",
|
|
|
|
"max_value": "정혜선"
|
|
|
|
},
|
|
|
|
"answer_count": {
|
|
|
|
"max_doc": 1326564,
|
|
|
|
"doc_count": 139885,
|
|
|
|
"density": 10,
|
|
|
|
"sum_doc_freq": 559540,
|
|
|
|
"sum_total_term_freq": -1,
|
|
|
|
"min_value": 0,
|
|
|
|
"max_value": 160
|
|
|
|
},
|
|
|
|
"rating": {
|
|
|
|
"max_doc": 1326564,
|
|
|
|
"doc_count": 437892,
|
|
|
|
"density": 33,
|
|
|
|
"sum_doc_freq": 1751568,
|
|
|
|
"sum_total_term_freq": -1,
|
|
|
|
"min_value": -14,
|
|
|
|
"max_value": 1277
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
<1> The `_all` key indicates that it contains the field stats of all indices in the cluster.
|
2015-05-24 09:08:27 -04:00
|
|
|
==================================================
|
2015-04-09 19:14:48 -04:00
|
|
|
|
2015-05-24 09:08:27 -04:00
|
|
|
.Indices level field statistics
|
|
|
|
==================================================
|
2015-04-09 19:14:48 -04:00
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
2015-05-24 09:08:27 -04:00
|
|
|
GET /_field_stats?fields=rating,answer_count,creation_date,display_name&level=indices
|
2015-04-09 19:14:48 -04:00
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
|
|
|
{
|
|
|
|
"_shards": {
|
|
|
|
"total": 1,
|
|
|
|
"successful": 1,
|
|
|
|
"failed": 0
|
|
|
|
},
|
|
|
|
"indices": {
|
|
|
|
"stack": { <1>
|
|
|
|
"fields": {
|
|
|
|
"creation_date": {
|
|
|
|
"max_doc": 1326564,
|
|
|
|
"doc_count": 564633,
|
|
|
|
"density": 42,
|
|
|
|
"sum_doc_freq": 2258532,
|
|
|
|
"sum_total_term_freq": -1,
|
|
|
|
"min_value": "2008-08-01T16:37:51.513Z",
|
|
|
|
"max_value": "2013-06-02T03:23:11.593Z"
|
|
|
|
},
|
|
|
|
"display_name": {
|
|
|
|
"max_doc": 1326564,
|
|
|
|
"doc_count": 126741,
|
|
|
|
"density": 9,
|
|
|
|
"sum_doc_freq": 166535,
|
|
|
|
"sum_total_term_freq": 166616,
|
|
|
|
"min_value": "0",
|
|
|
|
"max_value": "정혜선"
|
|
|
|
},
|
|
|
|
"answer_count": {
|
|
|
|
"max_doc": 1326564,
|
|
|
|
"doc_count": 139885,
|
|
|
|
"density": 10,
|
|
|
|
"sum_doc_freq": 559540,
|
|
|
|
"sum_total_term_freq": -1,
|
|
|
|
"min_value": 0,
|
|
|
|
"max_value": 160
|
|
|
|
},
|
|
|
|
"rating": {
|
|
|
|
"max_doc": 1326564,
|
|
|
|
"doc_count": 437892,
|
|
|
|
"density": 33,
|
|
|
|
"sum_doc_freq": 1751568,
|
|
|
|
"sum_total_term_freq": -1,
|
|
|
|
"min_value": -14,
|
|
|
|
"max_value": 1277
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
--------------------------------------------------
|
|
|
|
|
2015-05-24 09:08:27 -04:00
|
|
|
<1> The `stack` key means it contains all field stats for the `stack` index.
|
|
|
|
|
|
|
|
==================================================
|