This adds a few things to the `breakdown` of the profiler:
* `histogram` aggregations now contain `total_buckets` which is the
count of buckets that they collected. This could be useful when
debugging a histogram inside of another bucketing agg that is fairly
selective.
* All bucketing aggs that can delay their sub-aggregations will now add
a list of delayed sub-aggregations. This is useful because we
sometimes have fairly involved logic around which sub-aggregations get
delayed and this will save you from having to guess.
* Aggregtations wrapped in the `MultiBucketAggregatorWrapper` can't
accurately add anything to the breakdown. Instead they the wrapper
adds a marker entry `"multi_bucket_aggregator_wrapper": true` so we
can be quickly pick out such aggregations when debugging.
It also fixes a bug where `_count` breakdown entries were contributing
to the overall `time_in_nanos`. They didn't add a large amount of time
so it is unlikely that this caused a big problem, but I was there.
To support the arbitrary breakdown data this reworks the profiler so
that the `breakdown` can contain any data that is supported by
`StreamOutput#writeGenericValue(Object)` and
`XContentBuilder#value(Object)`.