druid/examples/bin/generate-example-metrics

#!/usr/bin/env python3
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import argparse
import json
import random
import sys
from datetime import datetime

def main():
    parser = argparse.ArgumentParser(description='Generate example page request latency metrics.')
    parser.add_argument('--count', '-c', type=int, default=25, help='Number of events to generate (negative for unlimited)')
    args = parser.parse_args()

    count = 0
    while args.count < 0 or count < args.count:
        timestamp = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
        r = random.randint(1, 4)
        if r == 1 or r == 2:
            page = '/'
        elif r == 3:
            page = '/list'
        else:
            page = '/get/' + str(random.randint(1, 99))
        server = 'www' + str(random.randint(1, 5)) + '.example.com'
        latency = max(1, random.gauss(80, 40))
        print(json.dumps({
            'timestamp': timestamp,
            'metricType': 'request/latency',
            # Additional dimensions
            'value': int(latency),
            'page': page,
            'server': server,
            'http_method': 'GET',
            'http_code': '200',
            'unit': 'milliseconds'
        }))
        count += 1

try:
    main()
except KeyboardInterrupt:
    sys.exit(1)
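
The generator prints one JSON event per line. A minimal sketch (not part of the original script; the sample line below is an assumed example of the generator's output) of parsing one emitted event to show the schema downstream consumers can expect:

```python
import json

# Hypothetical sample line, matching the fields the generator emits.
line = ('{"timestamp": "2023-01-01T00:00:00Z", "metricType": "request/latency", '
        '"value": 62, "page": "/", "server": "www3.example.com", '
        '"http_method": "GET", "http_code": "200", "unit": "milliseconds"}')

event = json.loads(line)

# Every event carries a metric identity plus dimensional attributes.
assert event['metricType'] == 'request/latency'
assert event['unit'] == 'milliseconds'
assert isinstance(event['value'], int)
```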