druid/website/.spelling

1943 lines
30 KiB
Plaintext
Raw Normal View History

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# markdown-spellcheck spelling configuration file
# Format - lines beginning # are comments
# global dictionary is at the start, file overrides afterwards
# one word per line, to define a file override use ' - filename'
# where filename is relative to this configuration file
32-bit
500MiB
64-bit
ACL
ACLs
APIs
AvroStorage
ARN
AWS
AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
AWS_CONTAINER_CREDENTIALS_FULL_URI
Actian
Authorizer
Avatica
Avro
Azul
BCP
Base64
Base64-encoded
ByteBuffer
CIDR
CORS
CNF
CPUs
CSVs
Ceph
CloudWatch
ColumnDescriptor
Corretto
DDL
DML
DNS
DRUIDVERSION
DataSketches
DateTime
DateType
DimensionSpec
DimensionSpecs
Dockerfile
DogStatsD
Double.NEGATIVE_INFINITY
Double.NEGATIVE_INFINITY.
Double.POSITIVE_INFINITY
Double.POSITIVE_INFINITY.
Dropwizard
dropwizard
DruidInputSource
DruidSQL
DynamicConfigProvider
EC2
EC2ContainerCredentialsProviderWrapper
ECS
EMR
EMRFS
ETL
Elasticsearch
Enums
FirehoseFactory
FlattenSpec
Float.NEGATIVE_INFINITY
Float.POSITIVE_INFINITY
ForwardedRequestCustomizer
GC
GPG
GSSAPI
GUIs
GroupBy
Guice
HDFS
HDFSFirehose
HLL
HashSet
Homebrew
HyperLogLog
IAM
IANA
IETF
IP
IPv4
IS_BROADCAST
IS_JOINABLE
IS0
ISO-8601
ISO8601
IndexSpec
IndexTask
InfluxDB
InputFormat
InputSource
InputSources
Integer.MAX_VALUE
JBOD
JDBC
JDK
JDK7
JDK8
JKS
JMX
JRE
JS
JSON
JsonPath
JSSE
JVM
JVMs
Joda
JsonProperty
Jupyter
KMS
Kerberized
Kerberos
KeyStores
Kinesis
Kubernetes
LRU
LZ4
LZO
LimitSpec
Long.MAX_VALUE
Long.MIN_VALUE
Lucene
MapBD
MapDB
MariaDB
MiddleManager
MiddleManagers
Montréal
Murmur3
MVCC
NFS
OCF
OLAP
OOMs
OpenJDK
OpenLDAP
OpenTSDB
OutputStream
ParAccel
ParseSpec
ParseSpecs
Protobuf
pull-deps
RDBMS
RDDs
RDS
Rackspace
Redis
S3
SDK
SIGAR
SPNEGO
SqlInputSource
SQLServer
SSD
SSDs
SSL
Samza
Splunk
SqlFirehose
SqlParameter
SslContextFactory
StatsD
SYSTEM_TABLE
TCP
TGT
TLS
TopN
TopNs
UI
UIs
URI
URIs
UTF-16
UTF-8
UTF8
XMLs
ZK
ZSTD
accessor
ad-hoc
aggregator
aggregators
ambari
analytics
assumeRoleArn
assumeRoleExternalId
parallel broker merges on fork join pool (#8578) * sketch of broker parallel merges done in small batches on fork join pool * fix non-terminating sequences, auto compute parallelism * adjust benches * adjust benchmarks * now hella more faster, fixed dumb * fix * remove comments * log.info for debug * javadoc * safer block for sequence to yielder conversion * refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool * smooth yield rate adjustment, more logs to help tune * cleanup, less logs * error handling, bug fixes, on by default, more parallel, more tests * remove unused var * comments * timeboundary mergeFn * simplify, more javadoc * formatting * pushdown config * use nanos consistently, move logs back to debug level, bit more javadoc * static terminal result batch * javadoc for nullability of createMergeFn * cleanup * oops * fix race, add docs * spelling, remove todo, add unhandled exception log * cleanup, revert unintended change * another unintended change * review stuff * add ParallelMergeCombiningSequenceBenchmark, fixes * hyper-threading is the enemy * fix initial start delay, lol * parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2 * fix those important style issues with the benchmarks code * lazy sequence creation for benchmarks * more benchmark comments * stable sequence generation time * update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs * add jmh thread based benchmarks, cleanup some stuff * oops * style * add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose * retool benchmark to allow modeling more typical heterogenous heavy workloads * spelling * fix * refactor benchmarks * formatting * docs * add maxThreadStartDelay parameter to threaded benchmark * why does catch need to be on its own line but else doesnt
2019-11-07 14:58:46 -05:00
async
authorizer
authorizers
autocomplete
autodiscovery
autoscaler
autoscaling
averager
averagers
backend
backfills
backpressure
base64
big-endian
blobstore
boolean
breakpoint
broadcasted
checksums
classpath
clickstream
codebase
codec
colocated
colocation
compactable
compactionTask
config
configs
consumerProperties
cron
csv
customizable
dataset
datasets
datasketches
datasource
datasources
dbcp
deepstore
denormalization
denormalize
denormalized
deprioritization
deprioritizes
dequeued
deserialization
deserialize
deserialized
downtimes
druid extension for OpenID Connect auth using pac4j lib (#8992) * druid pac4j security extension for OpenID Connect OAuth 2.0 authentication * update version in druid-pac4j pom * introducing unauthorized resource filter * authenticated but authorized /unified-webconsole.html * use httpReq.getRequestURI() for matching callback path * add documentation * minor doc addition * licesne file updates * make dependency analyze succeed * fix doc build * hopefully fixes doc build * hopefully fixes license check build * yet another try on fixing license build * revert unintentional changes to website folder * update version to 0.18.0-SNAPSHOT * check session and its expiry on each request * add crypto service * code for encrypting the cookie * update doc with cookiePassphrase * update license yaml * make sessionstore in Pac4jFilter private non static * make Pac4jFilter fields final * okta: use sha256 for hmac * remove incubating * add UTs for crypto util and session store impl * use standard charsets * add license header * remove unused file * add org.objenesis.objenesis to license.yaml * a bit of nit changes in CryptoService and embedding EncryptionResult for clarity * rename alg to cipherAlgName * take cipher alg name, mode and padding as input * add java doc for CryptoService and make it more understandable * another UT for CryptoService * cache pac4j Config * use generics clearly in Pac4jSessionStore * update cookiePassphrase doc to mention PasswordProvider * mark stuff Nullable where appropriate in Pac4jSessionStore * update doc to mention jdbc * add error log on reaching callback resource * javadoc for Pac4jCallbackResource * introduce NOOP_HTTP_ACTION_ADAPTER * add correct module name in license file * correct extensions folder name in licenses.yaml * replace druid-kubernetes-extensions to druid-pac4j * cache SecureRandom instance * rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter
2020-03-23 21:15:45 -04:00
druid
druidkubernetes-extensions
e.g.
encodings
endian
enum
expr
failover
featureSpec
findColumnsFromHeader
filenames
filesystem
firefox
firehose
firehoses
fromPigAvroStorage
frontends
granularities
granularitySpec
gzip
gzipped
hadoop
hasher
hashtable
historicals
hostname
hostnames
http
https
idempotency
i.e.
influxdb
ingestionSpec
injective
inlined
interruptible
jackson-jq
javadoc
joinable
kerberos
keystore
keytool
keytab
kubernetes
laning
lifecycle
localhost
log4j
log4j2
log4j2.xml
lookback
lookups
mapreduce
masse
maxNumFiles
maxNumSegments
max_map_count
memcached
mergeable
metadata
millis
misconfiguration
misconfigured
mostAvailableSize
multitenancy
multitenant
mysql
namespace
namespaced
namespaces
natively
netflow
non-nullable
noop
numerics
numShards
parameterized
parseable
partitioner
partitionFunction
partitionsSpec
performant
plaintext
pluggable
postgres
postgresql
pre-aggregated
pre-aggregates
pre-aggregating
pre-aggregation
pre-computation
pre-compute
pre-computing
pre-configured
pre-filtered
pre-filtering
pre-generated
pre-made
pre-processing
preemptible
prefetch
prefetched
prefetching
prepend
prepended
prepending
prepends
prepopulated
preprocessing
priori
procs
programmatically
proto
proxied
quantile
quantiles
queryable
quickstart
realtime
rebalance
redis
regexes
reimported
reindex
reindexing
reingest
reingesting
reingestion
repo
requireSSL
rollup
rollups
rsync
runtime
schemas
searchable
secondaryPartitionPruning
seekable-stream
servlet
simple-client-sslcontext
sharded
sharding
skipHeaderRows
smooshed
splittable
ssl
sslmode
stdout
storages
stringified
subarray
subnet
subqueries
subquery
subsecond
substring
subtask
subtasks
supervisorTaskId
symlink
tiering
timeseries
timestamp
timestamps
tradeoffs
tsv
ulimit
unannounce
unannouncements
unary
unassign
uncomment
underutilization
unintuitive
unioned
unmergeable
unmerged
UNNEST
unparseable
unparsed
unsetting
untrusted
useFilterCNF
useSSL
uptime
uris
urls
useFieldDiscovery
v1
v2
vCPUs
validator
vectorizable
vectorize
vectorizeVirtualColumns
versioning
w.r.t.
whitelist
whitelisted
whitespace
wildcard
wildcards
xml
znode
znodes
- ../docs/comparisons/druid-vs-elasticsearch.md
100x
- ../docs/configuration/logging.md
_common
- ../docs/dependencies/deep-storage.md
druid-hdfs-storage
druid-s3-extensions
- ../docs/dependencies/metadata-storage.md
BasicDataSource
- ../docs/dependencies/zookeeper.md
LeadershipLatch
3.5.x
3.4.x
- ../docs/design/auth.md
AllowAll
AuthenticationResult
AuthorizationLoadingLookupTest
HttpClient
allowAll
authenticatorChain
defaultUser
- ../docs/design/coordinator.md
inputSegmentSizeBytes
skipOffsetFromLatest
- ../docs/design/router.md
brokerService
c3.2xlarge
defaultManualBrokerService
maxPriority
minPriority
runtime.properties
timeBoundary
- ../docs/design/segments.md
0x0
0x9
2GB
300mb-700mb
Bieber
IndexTask-based
Ke
datasource_intervalStart_intervalEnd_version_partitionNum
partitionNum
v9
- ../docs/development/build.md
3.x
8u92
DskipTests
Papache-release
Pdist
Ddruid.console.skip
yaml
Phadoop3
dist-hadoop3
hadoop3
hadoop2
2.x.x
3.x.x
- ../docs/development/extensions-contrib/ambari-metrics-emitter.md
ambari-metrics
metricName
trustStore
- ../docs/development/extensions-core/azure.md
StaticAzureBlobStoreFirehose
StaticS3Firehose
fetchTimeout
gz
maxCacheCapacityBytes
maxFetchCapacityBytes
maxFetchRetry
prefetchTriggerBytes
shardSpecs
- ../docs/development/extensions-contrib/cloudfiles.md
StaticCloudFilesFirehose
cloudfiles
rackspace-cloudfiles-uk
rackspace-cloudfiles-us
StaticAzureBlobStoreFirehose
gz
shardSpecs
maxCacheCapacityBytes
maxFetchCapacityBytes
fetchTimeout
maxFetchRetry
- ../docs/development/extensions-contrib/distinctcount.md
distinctCount
groupBy
maxIntermediateRows
numValuesPerPass
queryGranularity
segmentGranularity
topN
visitor_id
- ../docs/development/extensions-contrib/influx.md
cpu
web_requests
- ../docs/development/extensions-contrib/influxdb-emitter.md
_
druid_
druid_cache_total
druid_hits
druid_query
historical001
- ../docs/development/extensions-contrib/materialized-view.md
HadoopTuningConfig
TuningConfig
base-dataSource's
baseDataSource
baseDataSource-hashCode
classpathPrefix
derivativeDataSource
dimensionsSpec
druid.extensions.hadoopDependenciesDir
hadoopDependencyCoordinates
maxTaskCount
metricsSpec
queryType
tuningConfig
- ../docs/development/extensions-contrib/momentsketch-quantiles.md
arcsinh
fieldName
momentSketchMerge
momentsketch
- ../docs/development/extensions-contrib/moving-average-query.md
10-minutes
MeanNoNulls
P1D
cycleSize
doubleMax
doubleAny
doubleMean
doubleMeanNoNulls
doubleMin
doubleSum
druid.generic.useDefaultValueForNull
limitSpec
longMax
longAny
longMean
longMeanNoNulls
longMin
longSum
movingAverage
postAggregations
postAveragers
pull-deps
- ../docs/development/extensions-contrib/opentsdb-emitter.md
defaultMetrics.json
Add config option for namespacePrefix (#9372) * Add config option for namespacePrefix opentsdb emitter sends metric names to opentsdb verbatim as what druid names them, for example "query.count", this doesn't fit well with a central opentsdb server which might have namespaced metrics, for example "druid.query.count". This adds support for adding an optional prefix. The prefix also gets a trailing dot (.), after it, so the metric name becomes <namespacePrefix>.<metricname> configureable as "druid.emitter.opentsdb.namespacePrefix", as documented. Co-authored-by: Martin Gerholm <martin.gerholm@deltaprojects.com> Signed-off-by: Martin Gerholm <martin.gerholm@deltaprojects.com> Signed-off-by: Björn Zettergren <bjorn.zettergren@deltaprojects.com> * Spelling for PR #9372 Added "namespacePrefix" to .spelling exceptions, it's a variable name used in documentation for opentsdb-emitter. * fixing tests for PR #9372 changed naming of variables to be more descriptive added test of prefix being an empty string: "". added a conditional to buildNamespacePrefix to check for empty string being fed if EventConverter called without OpentsdbEmitterConfig instance. * fixing checkstyle errors for PR #9372 used == to compare literal string, should be equals() * cleaned up and updated PR #9372 Created a buildMetric function as suggested by clintropolis, and removed redundant tests for empty strings as they're only used when calling EventConverter directly without going through OpentsdbEmitterConfig. * consistent naming of tests PR #9372 Changed names of tests in files to match better with what it was actually testing changed check for Strings.isNullOrEmpty to just check for `null`, as empty string valued `namespacePrefix` is handled in OpentsdbEmitterConfig. Co-authored-by: Martin Gerholm <inspector-martin@users.noreply.github.com>
2020-02-20 17:01:41 -05:00
namespacePrefix
src
- ../docs/development/extensions-contrib/redis-cache.md
loadList
pull-deps
PT2S
- ../docs/development/extensions-contrib/sqlserver.md
com.microsoft.sqlserver.jdbc.SQLServerDriver
sqljdbc
- ../docs/development/extensions-contrib/statsd.md
convertRange
prometheus metric exporter (#10412) * prometheus-emitter * use existing jetty server to expose prometheus collection endpoint * unused variables * better variable names * removed unused dependencies * more metric definitions * reorganize * use prometheus HTTPServer instead of hooking into Jetty server * temporary empty help string * temporary non-empty help. fix incorrect dimension value in JSON (also updated statsd json) * added full help text. added metric conversion factor for timers that are not using seconds. Correct metric dimension name in documentation * added documentation for prometheus emitter * safety for invalid labelNames * fix travis checks * Unit test and better sanitization of metrics names and label values * add precondition to check namespace against regex * use precompiled regex * remove static imports. fix metric types * better docs. fix possible NPE in PrometheusEmitterConfig. Guard against multiple calls to PrometheusEmitter.start() * Update regex for label-value replacements to allow internal numeric values. Additional tests * Adds missing license header updates website/.spelling to add words used in prometheus-emitter docs. updates docs/operations/metrics.md to correct the spelling of bufferPoolName * fixes version in extensions-contrib/prometheus-emitter * fix style guide errors * update import ordering * add another word to website/.spelling * remove unthrown declared exception * remove unused import * Pushgateway strategy for metrics * typo * Format fix and nullable strategy * Update pom file for prometheus-emitter * code review comments. Counter to gauge for cache metrics, periodical task to pushGateway * Syntax fix * Dimension label regex include numeric character back, fix previous commit * bump prometheus-emitter pom dev version * Remove scheduled task inside poen that push metrics * Fix checkstyle * Unit test coverage * Unit test coverage * Spelling * Doc fix * spelling Co-authored-by: Michael Schiff <michael.schiff@tubemogul.com> Co-authored-by: Michael Schiff <schiff.michael@gmail.com> Co-authored-by: Tianxin Zhao <tianxin.zhao@tubemogul.com> Co-authored-by: Tianxin Zhao <tizhao@adobe.com>
2021-03-09 17:37:31 -05:00
- ../docs/development/extensions-contrib/prometheus.md
HTTPServer
conversionFactor
prometheus
Pushgateway
- ../docs/development/extensions-contrib/tdigestsketch-quantiles.md
postAggregator
quantileFromTDigestSketch
quantilesFromTDigestSketch
tDigestSketch
- ../docs/development/extensions-contrib/thrift.md
HadoopDruidIndexer
LzoThriftBlock
SequenceFile
classname
hadoop-lzo
inputFormat
inputSpec
ioConfig
parseSpec
thriftClass
thriftJar
- ../docs/development/extensions-contrib/time-min-max.md
timeMax
timeMin
support Aliyun OSS service as deep storage (#9898) * init commit, all tests passed * fix format Signed-off-by: frank chen <frank.chen021@outlook.com> * data stored successfully * modify config path * add doc * add aliyun-oss extension to project * remove descriptor deletion code to avoid warning message output by aliyun client * fix warnings reported by lgtm-com * fix ci warnings Signed-off-by: frank chen <frank.chen021@outlook.com> * fix errors reported by intellj inspection check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc spelling check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix dependency warnings reported by ci Signed-off-by: frank chen <frank.chen021@outlook.com> * fix warnings reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com> * add package configuration to support showing extension info Signed-off-by: frank chen <frank.chen021@outlook.com> * add IT test cases and fix bugs Signed-off-by: frank chen <frank.chen021@outlook.com> * 1. code review comments adopted 2. change schema from 'aliyun-oss' to 'oss' Signed-off-by: frank chen <frank.chen021@outlook.com> * add license info Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * exclude execution of IT testcases of OSS extension from CI Signed-off-by: frank chen <frank.chen021@outlook.com> * put the extensions under contrib group and add to distribution * fix names in test cases * add unit test to cover OssInputSource * fix names in test cases * fix dependency problem reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-07-02 01:20:53 -04:00
- ../docs/development/extensions-contrib/aliyun-oss-extensions.md
Alibaba
support Aliyun OSS service as deep storage (#9898) * init commit, all tests passed * fix format Signed-off-by: frank chen <frank.chen021@outlook.com> * data stored successfully * modify config path * add doc * add aliyun-oss extension to project * remove descriptor deletion code to avoid warning message output by aliyun client * fix warnings reported by lgtm-com * fix ci warnings Signed-off-by: frank chen <frank.chen021@outlook.com> * fix errors reported by intellj inspection check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc spelling check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix dependency warnings reported by ci Signed-off-by: frank chen <frank.chen021@outlook.com> * fix warnings reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com> * add package configuration to support showing extension info Signed-off-by: frank chen <frank.chen021@outlook.com> * add IT test cases and fix bugs Signed-off-by: frank chen <frank.chen021@outlook.com> * 1. code review comments adopted 2. change schema from 'aliyun-oss' to 'oss' Signed-off-by: frank chen <frank.chen021@outlook.com> * add license info Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * exclude execution of IT testcases of OSS extension from CI Signed-off-by: frank chen <frank.chen021@outlook.com> * put the extensions under contrib group and add to distribution * fix names in test cases * add unit test to cover OssInputSource * fix names in test cases * fix dependency problem reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-07-02 01:20:53 -04:00
Aliyun
aliyun-oss-extensions
support Aliyun OSS service as deep storage (#9898) * init commit, all tests passed * fix format Signed-off-by: frank chen <frank.chen021@outlook.com> * data stored successfully * modify config path * add doc * add aliyun-oss extension to project * remove descriptor deletion code to avoid warning message output by aliyun client * fix warnings reported by lgtm-com * fix ci warnings Signed-off-by: frank chen <frank.chen021@outlook.com> * fix errors reported by intellj inspection check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc spelling check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix dependency warnings reported by ci Signed-off-by: frank chen <frank.chen021@outlook.com> * fix warnings reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com> * add package configuration to support showing extension info Signed-off-by: frank chen <frank.chen021@outlook.com> * add IT test cases and fix bugs Signed-off-by: frank chen <frank.chen021@outlook.com> * 1. code review comments adopted 2. change schema from 'aliyun-oss' to 'oss' Signed-off-by: frank chen <frank.chen021@outlook.com> * add license info Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * exclude execution of IT testcases of OSS extension from CI Signed-off-by: frank chen <frank.chen021@outlook.com> * put the extensions under contrib group and add to distribution * fix names in test cases * add unit test to cover OssInputSource * fix names in test cases * fix dependency problem reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-07-02 01:20:53 -04:00
AccessKey
accessKey
support Aliyun OSS service as deep storage (#9898) * init commit, all tests passed * fix format Signed-off-by: frank chen <frank.chen021@outlook.com> * data stored successfully * modify config path * add doc * add aliyun-oss extension to project * remove descriptor deletion code to avoid warning message output by aliyun client * fix warnings reported by lgtm-com * fix ci warnings Signed-off-by: frank chen <frank.chen021@outlook.com> * fix errors reported by intellj inspection check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc spelling check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix dependency warnings reported by ci Signed-off-by: frank chen <frank.chen021@outlook.com> * fix warnings reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com> * add package configuration to support showing extension info Signed-off-by: frank chen <frank.chen021@outlook.com> * add IT test cases and fix bugs Signed-off-by: frank chen <frank.chen021@outlook.com> * 1. code review comments adopted 2. change schema from 'aliyun-oss' to 'oss' Signed-off-by: frank chen <frank.chen021@outlook.com> * add license info Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * exclude execution of IT testcases of OSS extension from CI Signed-off-by: frank chen <frank.chen021@outlook.com> * put the extensions under contrib group and add to distribution * fix names in test cases * add unit test to cover OssInputSource * fix names in test cases * fix dependency problem reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-07-02 01:20:53 -04:00
aliyun-oss
json
OSS
support Aliyun OSS service as deep storage (#9898) * init commit, all tests passed * fix format Signed-off-by: frank chen <frank.chen021@outlook.com> * data stored successfully * modify config path * add doc * add aliyun-oss extension to project * remove descriptor deletion code to avoid warning message output by aliyun client * fix warnings reported by lgtm-com * fix ci warnings Signed-off-by: frank chen <frank.chen021@outlook.com> * fix errors reported by intellj inspection check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc spelling check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix dependency warnings reported by ci Signed-off-by: frank chen <frank.chen021@outlook.com> * fix warnings reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com> * add package configuration to support showing extension info Signed-off-by: frank chen <frank.chen021@outlook.com> * add IT test cases and fix bugs Signed-off-by: frank chen <frank.chen021@outlook.com> * 1. code review comments adopted 2. change schema from 'aliyun-oss' to 'oss' Signed-off-by: frank chen <frank.chen021@outlook.com> * add license info Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * exclude execution of IT testcases of OSS extension from CI Signed-off-by: frank chen <frank.chen021@outlook.com> * put the extensions under contrib group and add to distribution * fix names in test cases * add unit test to cover OssInputSource * fix names in test cases * fix dependency problem reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-07-02 01:20:53 -04:00
oss
secretKey
support Aliyun OSS service as deep storage (#9898) * init commit, all tests passed * fix format Signed-off-by: frank chen <frank.chen021@outlook.com> * data stored successfully * modify config path * add doc * add aliyun-oss extension to project * remove descriptor deletion code to avoid warning message output by aliyun client * fix warnings reported by lgtm-com * fix ci warnings Signed-off-by: frank chen <frank.chen021@outlook.com> * fix errors reported by intellj inspection check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc spelling check Signed-off-by: frank chen <frank.chen021@outlook.com> * fix dependency warnings reported by ci Signed-off-by: frank chen <frank.chen021@outlook.com> * fix warnings reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com> * add package configuration to support showing extension info Signed-off-by: frank chen <frank.chen021@outlook.com> * add IT test cases and fix bugs Signed-off-by: frank chen <frank.chen021@outlook.com> * 1. code review comments adopted 2. change schema from 'aliyun-oss' to 'oss' Signed-off-by: frank chen <frank.chen021@outlook.com> * add license info Signed-off-by: frank chen <frank.chen021@outlook.com> * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * exclude execution of IT testcases of OSS extension from CI Signed-off-by: frank chen <frank.chen021@outlook.com> * put the extensions under contrib group and add to distribution * fix names in test cases * add unit test to cover OssInputSource * fix names in test cases * fix dependency problem reported by CI Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-07-02 01:20:53 -04:00
url
- ../docs/development/extensions-core/approximate-histograms.md
approxHistogram
approxHistogramFold
fixedBucketsHistogram
bucketNum
lowerLimit
numBuckets
upperLimit
- ../docs/development/extensions-core/avro.md
AVRO-1124
Avro-1124
SchemaRepo
avro
avroBytesDecoder
protoBytesDecoder
flattenSpec
jq
org.apache.druid.extensions
schemaRepository
schema_inline
subjectAndIdConverter
url
- ../docs/development/extensions-core/bloom-filter.md
BloomKFilter
bitset
outputStream
- ../docs/development/extensions-core/datasketches-hll.md
HLLSketchBuild
HLLSketchMerge
lgK
log2
tgtHllType
- ../docs/development/extensions-core/datasketches-quantiles.md
CDF
DoublesSketch
maxStreamLength
PMF
quantilesDoublesSketch
toString
- ../docs/development/extensions-core/datasketches-theta.md
isInputThetaSketch
thetaSketch
user_id
- ../docs/development/extensions-core/datasketches-tuple.md
ArrayOfDoublesSketch
arrayOfDoublesSketch
metricColumns
nominalEntries
numberOfValues
- ../docs/development/extensions-core/druid-basic-security.md
INFORMATION_SCHEMA
MyBasicAuthenticator
MyBasicAuthorizer
authenticatorName
authorizerName
druid_system
pollingPeriod
roleName
LDAP
ldap
MyBasicMetadataAuthenticator
MyBasicLDAPAuthenticator
MyBasicMetadataAuthorizer
MyBasicLDAPAuthorizer
credentialsValidator
sAMAccountName
objectClass
initialAdminRole
adminGroupMapping
groupMappingName
- ../docs/development/extensions-core/druid-kerberos.md
8KiB
HttpComponents
MyKerberosAuthenticator
RFC-4559
SPNego
_HOST
- ../docs/development/extensions-core/druid-lookups.md
cacheFactory
concurrencyLevel
dataFetcher
expireAfterAccess
expireAfterWrite
initialCapacity
loadingCacheSpec
maxEntriesSize
maxStoreSize
maximumSize
onHeapPolling
pollPeriod
reverseLoadingCacheSpec
druid extension for OpenID Connect auth using pac4j lib (#8992) * druid pac4j security extension for OpenID Connect OAuth 2.0 authentication * update version in druid-pac4j pom * introducing unauthorized resource filter * authenticated but authorized /unified-webconsole.html * use httpReq.getRequestURI() for matching callback path * add documentation * minor doc addition * licesne file updates * make dependency analyze succeed * fix doc build * hopefully fixes doc build * hopefully fixes license check build * yet another try on fixing license build * revert unintentional changes to website folder * update version to 0.18.0-SNAPSHOT * check session and its expiry on each request * add crypto service * code for encrypting the cookie * update doc with cookiePassphrase * update license yaml * make sessionstore in Pac4jFilter private non static * make Pac4jFilter fields final * okta: use sha256 for hmac * remove incubating * add UTs for crypto util and session store impl * use standard charsets * add license header * remove unused file * add org.objenesis.objenesis to license.yaml * a bit of nit changes in CryptoService and embedding EncryptionResult for clarity * rename alg to cipherAlgName * take cipher alg name, mode and padding as input * add java doc for CryptoService and make it more understandable * another UT for CryptoService * cache pac4j Config * use generics clearly in Pac4jSessionStore * update cookiePassphrase doc to mention PasswordProvider * mark stuff Nullable where appropriate in Pac4jSessionStore * update doc to mention jdbc * add error log on reaching callback resource * javadoc for Pac4jCallbackResource * introduce NOOP_HTTP_ACTION_ADAPTER * add correct module name in license file * correct extensions folder name in licenses.yaml * replace druid-kubernetes-extensions to druid-pac4j * cache SecureRandom instance * rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter
2020-03-23 21:15:45 -04:00
- ../docs/development/extensions-core/druid-pac4j.md
OAuth
Okta
OpenID
pac4j
- ../docs/development/extensions-core/kubernetes.md
Env
POD_NAME
POD_NAMESPACE
ConfigMap
PT17S
- ../docs/development/extensions-core/google.md
GCS
StaticGoogleBlobStoreFirehose
- ../docs/development/extensions-core/hdfs.md
gcs-connector
hadoop2
hdfs
- ../docs/development/extensions-core/kafka-extraction-namespace.md
LookupExtractorFactory
zookeeper.connect
- ../docs/development/extensions-core/kafka-ingestion.md
0.11.x.
00Z
2016-01-01T11
2016-01-01T12
2016-01-01T14
CONNECTING_TO_STREAM
CREATING_TASKS
DISCOVERING_INITIAL_TASKS
KafkaSupervisorIOConfig
KafkaSupervisorTuningConfig
LOST_CONTACT_WITH_STREAM
OffsetOutOfRangeException
P2147483647D
PT10M
PT10S
PT1H
PT30M
PT30S
PT5S
PT80S
Docs - update dynamic config provider topic (#11795) * update dynamic config provider * update topic * add examples for dynamic config provider: * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Update docs/operations/dynamic-config-provider.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Update kafka-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-10-14 20:51:32 -04:00
SASL
SegmentWriteOutMediumFactory
UNABLE_TO_CONNECT_TO_STREAM
UNHEALTHY_SUPERVISOR
UNHEALTHY_TASKS
dimensionCompression
earlyMessageRejectionPeriod
indexSpec
intermediateHandoffPeriod
longEncoding
maxBytesInMemory
maxPendingPersists
maxRowsInMemory
maxRowsPerSegment
maxSavedParseExceptions
maxTotalRows
metricCompression
numKafkaPartitions
taskCount
taskDuration
- ../docs/development/extensions-core/kinesis-ingestion.md
9.2dist
KinesisSupervisorIOConfig
KinesisSupervisorTuningConfig
Resharding
resharding
LZ4LZFuncompressedLZ4LZ4LZFuncompressednoneLZ4autolongsautolongslongstypeconcisetyperoaringcompressRunOnSerializationtruetypestreamendpointreplicastaskCounttaskCount
deaggregate
druid-kinesis-indexing-service
maxRecordsPerPoll
maxRecordsPerPollrecordsPerFetchfetchDelayMillisreplicasfetchDelayMillisrecordsPerFetchfetchDelayMillismaxRecordsPerPollamazon-kinesis-client1
numKinesisShards
numProcessors
q.size
repartitionTransitionDuration
replicastaskCounttaskCount
resetuseEarliestSequenceNumberPOST
resumePOST
statusrecentErrorsdruid.supervisor.maxStoredExceptionEventsstatedetailedStatestatedetailedStatestatestatePENDINGRUNNINGSUSPENDEDSTOPPINGUNHEALTHY_SUPERVISORUNHEALTHY_TASKSdetailedStatestatedruid.supervisor.unhealthinessThresholddruid.supervisor.taskUnhealthinessThresholdtaskDurationtaskCountreplicasdetailedStatedetailedStateRUNNINGPOST
supervisorPOST
supervisorfetchThreadsfetchDelayMillisrecordsPerFetchmaxRecordsPerPollpoll
suspendPOST
taskCounttaskDurationreplicas
taskCounttaskDurationtaskDurationPOST
taskDurationstartDelayperioduseEarliestSequenceNumbercompletionTimeouttaskDurationlateMessageRejectionPeriodPT1HearlyMessageRejectionPeriodPT1HPT1HrecordsPerFetchfetchDelayMillisawsAssumedRoleArnawsExternalIddeaggregateGET
terminatePOST
terminatedruid.worker.capacitytaskDurationcompletionTimeoutreplicastaskCountreplicas
PT2M
kinesis.us
amazonaws.com
PT6H
GetRecords
KCL
signalled
ProvisionedThroughputExceededException
Deaggregation
- ../docs/development/extensions-core/lookups-cached-global.md
baz
customJson
lookupParseSpec
namespaceParseSpec
simpleJson
- ../docs/development/extensions-core/orc.md
dimensionSpec
flattenSpec
- ../docs/development/extensions-core/parquet.md
binaryAsString
- ../docs/development/extensions-core/postgresql.md
sslFactory's
sslMode
- ../docs/development/extensions-core/protobuf.md
Proto
metrics.desc
metrics.desc.
metrics.proto.
metrics_pb
protoMessageType
timeAndDims
tmp
- ../docs/development/extensions-core/s3.md
SigV4
jvm.config
kms
s3
s3a
s3n
uris
- ../docs/development/extensions-core/simple-client-sslcontext.md
KeyManager
SSLContext
TrustManager
- ../docs/development/extensions-core/stats.md
GenericUDAFVariance
Golub
J.L.
LeVeque
Numer
chunk1
chunk2
stddev
t1
t2
variance1
variance2
varianceFold
variance_pop
variance_sample
- ../docs/development/extensions-core/test-stats.md
Berry_statbook
Berry_statbook_chpt6.pdf
S.E.
engineering.com
jcb0773
n1
n2
p1
p2
pvalue2tailedZtest
sqrt
successCount1
successCount2
www.isixsigma.com
www.paypal
www.ucs.louisiana.edu
zscore
zscore2sample
ztests
- ../docs/development/extensions.md
DistinctCount
artifactId
com.example
common.runtime.properties
druid-aws-rds-extensions
druid-cassandra-storage
druid-distinctcount
druid-ec2-extensions
druid-kafka-extraction-namespace
druid-kafka-indexing-service
druid-opentsdb-emitter
druid-protobuf-extensions
druid-tdigestsketch
druid.apache.org
groupId
jvm-global
kafka-emitter
org.apache.druid.extensions.contrib.
pull-deps
sqlserver-metadata-storage
statsd-emitter
- ../docs/development/geo.md
coords
dimName
maxCoords
Mb
minCoords
- ../docs/development/javascript.md
Metaspace
dev
- ../docs/development/modules.md
AggregatorFactory
ArchiveTask
ComplexMetrics
DataSegmentArchiver
DataSegmentKiller
DataSegmentMover
DataSegmentPuller
DataSegmentPusher
DruidModule
ExtractionFns
HdfsStorageDruidModule
JacksonInject
MapBinder
MoveTask
ObjectMapper
PasswordProvider
PostAggregators
QueryRunnerFactory
SegmentMetadataQuery
SegmentMetadataQueryQueryToolChest
StaticS3FirehoseFactory
loadSpec
multibind
pom.xml
- ../docs/ingestion/data-formats.md
0.6.x
0.7.x
0.7.x.
TimeAndDims
column2
column_1
column_n
com.opencsv
ctrl
Kafka Input Format for headers, key and payload parsing (#11630) ### Description Today we ingest a number of high cardinality metrics into Druid across dimensions. These metrics are rolled up on a per minute basis, and are very useful when looking at metrics on a partition or client basis. Events is another class of data that provides useful information about a particular incident/scenario inside a Kafka cluster. Events themselves are carried inside kafka payload, but nonetheless there are some very useful metadata that is carried in kafka headers that can serve as useful dimension for aggregation and in turn bringing better insights. PR(https://github.com/apache/druid/pull/10730) introduced support of Kafka headers in InputFormats. We still need an input format to parse out the headers and translate those into relevant columns in Druid. Until that’s implemented, none of the information available in the Kafka message headers would be exposed. So first there is a need to write an input format that can parse headers in any given format(provided we support the format) like we parse payloads today. Apart from headers there is also some useful information present in the key portion of the kafka record. We also need a way to expose the data present in the key as druid columns. We need a generic way to express at configuration time what attributes from headers, key and payload need to be ingested into druid. We need to keep the design generic enough so that users can specify different parsers for headers, key and payload. This PR is designed to solve the above by providing wrapper around any existing input formats and merging the data into a single unified Druid row. Lets look at a sample input format from the above discussion "inputFormat": { "type": "kafka", // New input format type "headerLabelPrefix": "kafka.header.", // Label prefix for header columns, this will avoid collusions while merging columns "recordTimestampLabelPrefix": "kafka.", // Kafka record's timestamp is made available in case payload does not carry timestamp "headerFormat": // Header parser specifying that values are of type string { "type": "string" }, "valueFormat": // Value parser from json parsing { "type": "json", "flattenSpec": { "useFieldDiscovery": true, "fields": [...] } }, "keyFormat": // Key parser also from json parsing { "type": "json" } } Since we have independent sections for header, key and payload, it will enable parsing each section with its own parser, eg., headers coming in as string and payload as json. KafkaInputFormat will be the uber class extending inputFormat interface and will be responsible for creating individual parsers for header, key and payload, blend the data resolving conflicts in columns and generating a single unified InputRow for Druid ingestion. "headerFormat" will allow users to plug parser type for the header values and will add default header prefix as "kafka.header."(can be overridden) for attributes to avoid collision while merging attributes with payload. Kafka payload parser will be responsible for parsing the Value portion of the Kafka record. This is where most of the data will come from and we should be able to plugin existing parser. One thing to note here is that if batching is performed, then the code is augmenting header and key values to every record in the batch. Kafka key parser will handle parsing Key portion of the Kafka record and will ingest the Key with dimension name as "kafka.key". ## KafkaInputFormat Class: This is the class that orchestrates sending the consumerRecord to each parser, retrieve rows, merge the columns into one final row for Druid consumption. KafkaInputformat should make sure to release the resources that gets allocated as a part of reader in CloseableIterator<InputRow> during normal and exception cases. During conflicts in dimension/metrics names, the code will prefer dimension names from payload and ignore the dimension either from headers/key. This is done so that existing input formats can be easily migrated to this new format without worrying about losing information.
2021-10-07 11:56:27 -04:00
headerFormat
headerLabelPrefix
jsonLowercase
Kafka Input Format for headers, key and payload parsing (#11630) ### Description Today we ingest a number of high cardinality metrics into Druid across dimensions. These metrics are rolled up on a per minute basis, and are very useful when looking at metrics on a partition or client basis. Events is another class of data that provides useful information about a particular incident/scenario inside a Kafka cluster. Events themselves are carried inside kafka payload, but nonetheless there are some very useful metadata that is carried in kafka headers that can serve as useful dimension for aggregation and in turn bringing better insights. PR(https://github.com/apache/druid/pull/10730) introduced support of Kafka headers in InputFormats. We still need an input format to parse out the headers and translate those into relevant columns in Druid. Until that’s implemented, none of the information available in the Kafka message headers would be exposed. So first there is a need to write an input format that can parse headers in any given format(provided we support the format) like we parse payloads today. Apart from headers there is also some useful information present in the key portion of the kafka record. We also need a way to expose the data present in the key as druid columns. We need a generic way to express at configuration time what attributes from headers, key and payload need to be ingested into druid. We need to keep the design generic enough so that users can specify different parsers for headers, key and payload. This PR is designed to solve the above by providing wrapper around any existing input formats and merging the data into a single unified Druid row. Lets look at a sample input format from the above discussion "inputFormat": { "type": "kafka", // New input format type "headerLabelPrefix": "kafka.header.", // Label prefix for header columns, this will avoid collusions while merging columns "recordTimestampLabelPrefix": "kafka.", // Kafka record's timestamp is made available in case payload does not carry timestamp "headerFormat": // Header parser specifying that values are of type string { "type": "string" }, "valueFormat": // Value parser from json parsing { "type": "json", "flattenSpec": { "useFieldDiscovery": true, "fields": [...] } }, "keyFormat": // Key parser also from json parsing { "type": "json" } } Since we have independent sections for header, key and payload, it will enable parsing each section with its own parser, eg., headers coming in as string and payload as json. KafkaInputFormat will be the uber class extending inputFormat interface and will be responsible for creating individual parsers for header, key and payload, blend the data resolving conflicts in columns and generating a single unified InputRow for Druid ingestion. "headerFormat" will allow users to plug parser type for the header values and will add default header prefix as "kafka.header."(can be overridden) for attributes to avoid collision while merging attributes with payload. Kafka payload parser will be responsible for parsing the Value portion of the Kafka record. This is where most of the data will come from and we should be able to plugin existing parser. One thing to note here is that if batching is performed, then the code is augmenting header and key values to every record in the batch. Kafka key parser will handle parsing Key portion of the Kafka record and will ingest the Key with dimension name as "kafka.key". ## KafkaInputFormat Class: This is the class that orchestrates sending the consumerRecord to each parser, retrieve rows, merge the columns into one final row for Druid consumption. KafkaInputformat should make sure to release the resources that gets allocated as a part of reader in CloseableIterator<InputRow> during normal and exception cases. During conflicts in dimension/metrics names, the code will prefer dimension names from payload and ignore the dimension either from headers/key. This is done so that existing input formats can be easily migrated to this new format without worrying about losing information.
2021-10-07 11:56:27 -04:00
kafka
KafkaStringHeaderFormat
kafka.header.
kafka.key
kafka.timestamp
keyColumnName
keyFormat
listDelimiter
Kafka Input Format for headers, key and payload parsing (#11630) ### Description Today we ingest a number of high cardinality metrics into Druid across dimensions. These metrics are rolled up on a per minute basis, and are very useful when looking at metrics on a partition or client basis. Events is another class of data that provides useful information about a particular incident/scenario inside a Kafka cluster. Events themselves are carried inside kafka payload, but nonetheless there are some very useful metadata that is carried in kafka headers that can serve as useful dimension for aggregation and in turn bringing better insights. PR(https://github.com/apache/druid/pull/10730) introduced support of Kafka headers in InputFormats. We still need an input format to parse out the headers and translate those into relevant columns in Druid. Until that’s implemented, none of the information available in the Kafka message headers would be exposed. So first there is a need to write an input format that can parse headers in any given format(provided we support the format) like we parse payloads today. Apart from headers there is also some useful information present in the key portion of the kafka record. We also need a way to expose the data present in the key as druid columns. We need a generic way to express at configuration time what attributes from headers, key and payload need to be ingested into druid. We need to keep the design generic enough so that users can specify different parsers for headers, key and payload. This PR is designed to solve the above by providing wrapper around any existing input formats and merging the data into a single unified Druid row. Lets look at a sample input format from the above discussion "inputFormat": { "type": "kafka", // New input format type "headerLabelPrefix": "kafka.header.", // Label prefix for header columns, this will avoid collusions while merging columns "recordTimestampLabelPrefix": "kafka.", // Kafka record's timestamp is made available in case payload does not carry timestamp "headerFormat": // Header parser specifying that values are of type string { "type": "string" }, "valueFormat": // Value parser from json parsing { "type": "json", "flattenSpec": { "useFieldDiscovery": true, "fields": [...] } }, "keyFormat": // Key parser also from json parsing { "type": "json" } } Since we have independent sections for header, key and payload, it will enable parsing each section with its own parser, eg., headers coming in as string and payload as json. KafkaInputFormat will be the uber class extending inputFormat interface and will be responsible for creating individual parsers for header, key and payload, blend the data resolving conflicts in columns and generating a single unified InputRow for Druid ingestion. "headerFormat" will allow users to plug parser type for the header values and will add default header prefix as "kafka.header."(can be overridden) for attributes to avoid collision while merging attributes with payload. Kafka payload parser will be responsible for parsing the Value portion of the Kafka record. This is where most of the data will come from and we should be able to plugin existing parser. One thing to note here is that if batching is performed, then the code is augmenting header and key values to every record in the batch. Kafka key parser will handle parsing Key portion of the Kafka record and will ingest the Key with dimension name as "kafka.key". ## KafkaInputFormat Class: This is the class that orchestrates sending the consumerRecord to each parser, retrieve rows, merge the columns into one final row for Druid consumption. KafkaInputformat should make sure to release the resources that gets allocated as a part of reader in CloseableIterator<InputRow> during normal and exception cases. During conflicts in dimension/metrics names, the code will prefer dimension names from payload and ignore the dimension either from headers/key. This is done so that existing input formats can be easily migrated to this new format without worrying about losing information.
2021-10-07 11:56:27 -04:00
timestampColumnName
timestampSpec
urls
Kafka Input Format for headers, key and payload parsing (#11630) ### Description Today we ingest a number of high cardinality metrics into Druid across dimensions. These metrics are rolled up on a per minute basis, and are very useful when looking at metrics on a partition or client basis. Events is another class of data that provides useful information about a particular incident/scenario inside a Kafka cluster. Events themselves are carried inside kafka payload, but nonetheless there are some very useful metadata that is carried in kafka headers that can serve as useful dimension for aggregation and in turn bringing better insights. PR(https://github.com/apache/druid/pull/10730) introduced support of Kafka headers in InputFormats. We still need an input format to parse out the headers and translate those into relevant columns in Druid. Until that’s implemented, none of the information available in the Kafka message headers would be exposed. So first there is a need to write an input format that can parse headers in any given format(provided we support the format) like we parse payloads today. Apart from headers there is also some useful information present in the key portion of the kafka record. We also need a way to expose the data present in the key as druid columns. We need a generic way to express at configuration time what attributes from headers, key and payload need to be ingested into druid. We need to keep the design generic enough so that users can specify different parsers for headers, key and payload. This PR is designed to solve the above by providing wrapper around any existing input formats and merging the data into a single unified Druid row. Lets look at a sample input format from the above discussion "inputFormat": { "type": "kafka", // New input format type "headerLabelPrefix": "kafka.header.", // Label prefix for header columns, this will avoid collusions while merging columns "recordTimestampLabelPrefix": "kafka.", // Kafka record's timestamp is made available in case payload does not carry timestamp "headerFormat": // Header parser specifying that values are of type string { "type": "string" }, "valueFormat": // Value parser from json parsing { "type": "json", "flattenSpec": { "useFieldDiscovery": true, "fields": [...] } }, "keyFormat": // Key parser also from json parsing { "type": "json" } } Since we have independent sections for header, key and payload, it will enable parsing each section with its own parser, eg., headers coming in as string and payload as json. KafkaInputFormat will be the uber class extending inputFormat interface and will be responsible for creating individual parsers for header, key and payload, blend the data resolving conflicts in columns and generating a single unified InputRow for Druid ingestion. "headerFormat" will allow users to plug parser type for the header values and will add default header prefix as "kafka.header."(can be overridden) for attributes to avoid collision while merging attributes with payload. Kafka payload parser will be responsible for parsing the Value portion of the Kafka record. This is where most of the data will come from and we should be able to plugin existing parser. One thing to note here is that if batching is performed, then the code is augmenting header and key values to every record in the batch. Kafka key parser will handle parsing Key portion of the Kafka record and will ingest the Key with dimension name as "kafka.key". ## KafkaInputFormat Class: This is the class that orchestrates sending the consumerRecord to each parser, retrieve rows, merge the columns into one final row for Druid consumption. KafkaInputformat should make sure to release the resources that gets allocated as a part of reader in CloseableIterator<InputRow> during normal and exception cases. During conflicts in dimension/metrics names, the code will prefer dimension names from payload and ignore the dimension either from headers/key. This is done so that existing input formats can be easily migrated to this new format without worrying about losing information.
2021-10-07 11:56:27 -04:00
valueFormat
- ../docs/ingestion/data-management.md
1GB
IOConfig
compactionTask
compactionTasks
ingestSegmentFirehose
numShards
- ../docs/ingestion/faq.md
IngestSegment
IngestSegmentFirehose
maxSizes
windowPeriod
- ../docs/ingestion/hadoop.md
2012-01-01T00
2012-01-03T00
2012-01-05T00
2012-01-07T00
500MB
CombineTextInputFormat
HadoopIndexTask
InputFormat
InputSplit
JobHistory
a.example.com
assumeGrouped
awaitSegmentAvailabilityTimeoutMillis
cleanupOnFailure
combineText
connectURI
dataGranularity
datetime
f.example.com
filePattern
forceExtendableShardSpecs
granularitySpec
ignoreInvalidRows
ignoreWhenNoSegments
indexSpecForIntermediatePersists
index_hadoop
inputPath
inputSpecs
interval1
interval2
jobProperties
leaveIntermediate
logParseExceptions
mapred.map.tasks
mapreduce.job.maps
maxParseExceptions
maxPartitionSize
maxSplitSize
metadataUpdateSpec
numBackgroundPersistThreads
overwriteFiles
partitionDimension
partitionDimensions
partitionSpec
pathFormat
segmentOutputPath
segmentTable
shardSpec
single_dim
targetPartitionSize
targetRowsPerSegment
useCombiner
useExplicitVersion
useNewAggs
useYarnRMJobStatusFallback
workingPath
z.example.com
- ../docs/ingestion/native-batch.md
150MB
CombiningFirehose
DataSchema
DefaultPassword
EnvironmentVariablePasswordProvider
HttpFirehose
IOConfig
InlineFirehose
LocalFirehose
PartitionsSpec
PasswordProviders
SegmentsSplitHintSpec
SplitHintSpec
accessKeyId
appendToExisting
baseDir
chatHandlerNumRetries
chatHandlerTimeout
connectorConfig
DruidInputSource: Fix issues in column projection, timestamp handling. (#10267) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.
2021-03-25 13:32:21 -04:00
countryName
dataSchema's
dropExisting
foldCase
forceGuaranteedRollup
httpAuthenticationPassword
httpAuthenticationUsername
ingestSegment
InputSource
DruidInputSource
maxColumnsToMerge
maxInputSegmentBytesPerTask
maxNumConcurrentSubTasks
maxNumSegmentsToMerge
maxRetry
pushTimeout
reportParseExceptions
secretAccessKey
segmentWriteOutMediumFactory
sql
sqls
splitHintSpec
taskStatusCheckPeriodMs
timeChunk
totalNumMergeTasks
StaticS3Firehose
prefetchTriggerBytes
awaitSegmentAvailabilityTimeoutMillis
- ../docs/ingestion/schema-design.md
product_category
product_id
product_name
- ../docs/ingestion/tasks.md
BUILD_SEGMENTS
DETERMINE_PARTITIONS
forceTimeChunkLock
taskLockTimeout
- ../docs/misc/math-expr.md
DOUBLE_ARRAY
DOY
DateTimeFormat
LONG_ARRAY
Los_Angeles
P3M
PT12H
STRING_ARRAY
String.format
acos
args
arr1
arr2
array_append
array_concat
array_set_add
array_set_add_all
array_contains
array_length
array_offset
array_offset_of
array_ordinal
array_ordinal_of
array_overlap
array_prepend
array_slice
array_to_string
asin
atan
atan2
bitwise
bitwiseAnd
bitwiseComplement
bitwiseConvertDoubleToLongBits
bitwiseConvertLongBitsToDouble
bitwiseOr
bitwiseShiftLeft
bitwiseShiftRight
bitwiseXor
bloom_filter_test
cartesian_fold
cartesian_map
case_searched
case_simple
cbrt
concat
copysign
expm1
expr
expr1
expr2
fromIndex
getExponent
hypot
ipv4_match
ipv4_parse
ipv4_stringify
java.lang.Math
java.lang.String
log10
log1p
lpad
ltrim
nextUp
nextafter
nvl
parse_long
regexp_extract
regexp_like
contains_string
icontains_string
result1
result2
rint
rpad
rtrim
scalb
signum
str1
str2
string_to_array
stringAny
strlen
strpos
timestamp_ceil
timestamp_extract
timestamp_floor
timestamp_format
timestamp_parse
timestamp_shift
todegrees
toradians
ulp
unix_timestamp
value1
value2
valueOf
IEC
human_readable_binary_byte_format
human_readable_decimal_byte_format
human_readable_decimal_format
- ../docs/misc/papers-and-talks.md
RADStack
- ../docs/operations/api-reference.md
00.000Z
2015-09-12T03
2015-09-12T05
2016-06-27_2016-06-28
Param
SupervisorSpec
dropRule
druid.query.segmentMetadata.defaultHistory
isointerval
json
loadRule
maxTime
minTime
numCandidates
param
segmentId1
segmentId2
taskId
taskid
un
- ../docs/operations/basic-cluster-tuning.md
100MiB
128MiB
15ms
2.5MiB
24GiB
256MiB
30GiB-60GiB
4GiB
5MB
64KiB
8GiB
G1GC
GroupBys
QoS-type
- ../docs/operations/dump-segment.md
DumpSegment
SegmentMetadata
__time
bitmapSerdeFactory
columnName
index.zip
time-iso8601
- ../docs/operations/export-metadata.md
hadoopStorageDirectory
- ../docs/operations/insert-segment-to-db.md
0.14.x
- ../docs/operations/metrics.md
0.14.x
1s
Bufferpool
EventReceiverFirehose
EventReceiverFirehoseMonitor
Filesystesm
JVMMonitor
QueryCountStatsMonitor
RealtimeMetricsMonitor
Sys
SysMonitor
TaskCountStatsMonitor
TaskSlotCountStatsMonitor
bufferCapacity
bufferpoolName
cms
cpuName
cpuTime
druid.server.http.numThreads
druid.server.http.queueSize
fsDevName
fsDirName
fsOptions
fsSysTypeName
fsTypeName
g1
gcGen
gcName
handoffed
hasFilters
memKind
nativeQueryIds
netAddress
netHwaddr
netName
noticeType
numComplexMetrics
numDimensions
numMetrics
poolKind
poolName
remoteAddress
serviceName
taskStatus
taskType
threadPoolNumBusyThreads.
threadPoolNumIdleThreads
threadPoolNumTotalThreads.
- ../docs/operations/other-hadoop.md
CDH
Classloader
assembly.sbt
build.sbt
classloader
druid_build
mapred-default
mapred-site
sbt
scala-2
- ../docs/operations/pull-deps.md
org.apache.hadoop
proxy.com.
remoteRepository
- ../docs/operations/recommendations.md
JBOD
druid.processing.buffer.sizeBytes.
druid.processing.numMergeBuffers
druid.processing.numThreads
tmpfs
- ../docs/operations/rule-configuration.md
broadcastByInterval
broadcastByPeriod
broadcastForever
colocatedDataSources
dropBeforeByPeriod
dropByInterval
dropByPeriod
dropForever
loadByInterval
loadByPeriod
loadForever
- ../docs/operations/segment-optimization.md
700MB
- ../docs/operations/single-server.md
128GiB
16GiB
256GiB
4GiB
512GiB
64GiB
Nano-Quickstart
i3
i3.16xlarge
i3.2xlarge
i3.4xlarge
i3.8xlarge
- ../docs/operations/tls-support.md
CN
subjectAltNames
- ../docs/querying/aggregations.md
HyperUnique
hyperUnique
longSum
- ../docs/querying/datasource.md
groupBys
- ../docs/querying/datasourcemetadataquery.md
dataSourceMetadata
- ../docs/querying/dimensionspecs.md
ExtractionDimensionSpec
SimpleDateFormat
bar_1
dimensionSpecs
isWhitelist
joda
nullHandling
product_1
product_3
registeredLookup
timeFormat
tz
v3
weekyears
- ../docs/querying/filters.md
___bar
caseSensitive
extractionFn
insensitive_contains
last_name
lowerStrict
upperStrict
- ../docs/querying/granularities.md
1970-01-01T00
P2W
PT0.750S
PT1H30M
TimeseriesQuery
- ../docs/querying/groupbyquery.md
D1
D2
D3
druid.query.groupBy.defaultStrategy
druid.query.groupBy.maxMergingDictionarySize
druid.query.groupBy.maxOnDiskStorage
druid.query.groupBy.maxResults.
groupByStrategy
maxOnDiskStorage
maxResults
orderby
orderbys
outputName
pushdown
row1
subtotalsSpec
- ../docs/querying/having.md
HavingSpec
HavingSpecs
dimSelector
equalTo
greaterThan
lessThan
- ../docs/querying/hll-old.md
DefaultDimensionSpec
druid-hll
isInputHyperUnique
- ../docs/querying/joins.md
pre-join
- ../docs/querying/limitspec.md
DefaultLimitSpec
OrderByColumnSpec
OrderByColumnSpecs
dimensionOrder
- ../docs/querying/lookups.md
60_000
kafka-extraction-namespace
mins
tierName
- ../docs/querying/multi-value-dimensions.md
row2
row3
row4
t3
t4
t5
- ../docs/querying/multitenancy.md
500ms
tenant_id
- ../docs/querying/post-aggregations.md
fieldAccess
finalizingFieldAccess
hyperUniqueCardinality
- ../docs/querying/query-context.md
brokerService
bySegment
doubleSum
druid.broker.cache.populateCache
druid.broker.cache.populateResultLevelCache
druid.broker.cache.useCache
druid.broker.cache.useResultLevelCache
druid.historical.cache.populateCache
druid.historical.cache.useCache
parallel broker merges on fork join pool (#8578) * sketch of broker parallel merges done in small batches on fork join pool * fix non-terminating sequences, auto compute parallelism * adjust benches * adjust benchmarks * now hella more faster, fixed dumb * fix * remove comments * log.info for debug * javadoc * safer block for sequence to yielder conversion * refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool * smooth yield rate adjustment, more logs to help tune * cleanup, less logs * error handling, bug fixes, on by default, more parallel, more tests * remove unused var * comments * timeboundary mergeFn * simplify, more javadoc * formatting * pushdown config * use nanos consistently, move logs back to debug level, bit more javadoc * static terminal result batch * javadoc for nullability of createMergeFn * cleanup * oops * fix race, add docs * spelling, remove todo, add unhandled exception log * cleanup, revert unintended change * another unintended change * review stuff * add ParallelMergeCombiningSequenceBenchmark, fixes * hyper-threading is the enemy * fix initial start delay, lol * parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2 * fix those important style issues with the benchmarks code * lazy sequence creation for benchmarks * more benchmark comments * stable sequence generation time * update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs * add jmh thread based benchmarks, cleanup some stuff * oops * style * add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose * retool benchmark to allow modeling more typical heterogenous heavy workloads * spelling * fix * refactor benchmarks * formatting * docs * add maxThreadStartDelay parameter to threaded benchmark * why does catch need to be on its own line but else doesnt
2019-11-07 14:58:46 -05:00
enableParallelMerge
floatSum
maxQueuedBytes
maxScatterGatherBytes
minTopNThreshold
parallel broker merges on fork join pool (#8578) * sketch of broker parallel merges done in small batches on fork join pool * fix non-terminating sequences, auto compute parallelism * adjust benches * adjust benchmarks * now hella more faster, fixed dumb * fix * remove comments * log.info for debug * javadoc * safer block for sequence to yielder conversion * refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool * smooth yield rate adjustment, more logs to help tune * cleanup, less logs * error handling, bug fixes, on by default, more parallel, more tests * remove unused var * comments * timeboundary mergeFn * simplify, more javadoc * formatting * pushdown config * use nanos consistently, move logs back to debug level, bit more javadoc * static terminal result batch * javadoc for nullability of createMergeFn * cleanup * oops * fix race, add docs * spelling, remove todo, add unhandled exception log * cleanup, revert unintended change * another unintended change * review stuff * add ParallelMergeCombiningSequenceBenchmark, fixes * hyper-threading is the enemy * fix initial start delay, lol * parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2 * fix those important style issues with the benchmarks code * lazy sequence creation for benchmarks * more benchmark comments * stable sequence generation time * update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs * add jmh thread based benchmarks, cleanup some stuff * oops * style * add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose * retool benchmark to allow modeling more typical heterogenous heavy workloads * spelling * fix * refactor benchmarks * formatting * docs * add maxThreadStartDelay parameter to threaded benchmark * why does catch need to be on its own line but else doesnt
2019-11-07 14:58:46 -05:00
parallelMergeInitialYieldRows
parallelMergeParallelism
parallelMergeSmallBatchRows
populateCache
populateResultLevelCache
queryId
row-matchers
serializeDateTimeAsLong
serializeDateTimeAsLongInner
skipEmptyBuckets
useCache
useResultLevelCache
vectorSize
enableJoinLeftTableScanDirect
- ../docs/querying/querying.md
7KiB
DatasourceMetadata
TimeBoundary
errorClass
errorMessage
x-jackson-smile
- ../docs/querying/scan-query.md
batchSize
compactedList
druid.query.scan.legacy
druid.query.scan.maxRowsQueuedForOrdering
druid.query.scan.maxSegmentPartitionsOrderedInMemory
maxRowsQueuedForOrdering
maxSegmentPartitionsOrderedInMemory
resultFormat
valueVector
- ../docs/querying/searchquery.md
SearchQuerySpec
cursorOnly
druid.query.search.searchStrategy
queryableIndexSegment
searchDimensions
searchStrategy
useIndexes
- ../docs/querying/searchqueryspec.md
ContainsSearchQuerySpec
FragmentSearchQuerySpec
InsensitiveContainsSearchQuerySpec
RegexSearchQuerySpec
- ../docs/querying/segmentmetadataquery.md
analysisType
analysisTypes
lenientAggregatorMerge
minmax
segmentMetadata
toInclude
- ../docs/querying/select-query.md
PagingSpec
fromNext
pagingSpec
- ../docs/querying/sorting-orders.md
BoundFilter
GroupByQuery's
SearchQuery
TopNMetricSpec
compareTo
file12
file2
- ../docs/querying/sql.md
APPROX_COUNT_DISTINCT
APPROX_QUANTILE
ARRAY_AGG
BIGINT
CATALOG_NAME
CHARACTER_MAXIMUM_LENGTH
CHARACTER_OCTET_LENGTH
CHARACTER_SET_NAME
COLLATION_NAME
COLUMN_DEFAULT
COLUMN_NAME
Concats
DATA_TYPE
DATETIME_PRECISION
DEFAULT_CHARACTER_SET_CATALOG
DEFAULT_CHARACTER_SET_NAME
DEFAULT_CHARACTER_SET_SCHEMA
ISODOW
ISOYEAR
IS_NULLABLE
JDBC_TYPE
MIDDLE_MANAGER
NULLable
NUMERIC_PRECISION
NUMERIC_PRECISION_RADIX
NUMERIC_SCALE
ORDINAL_POSITION
PT1M
PT5M
SCHEMA_NAME
SCHEMA_OWNER
SERVER_SEGMENTS
SMALLINT
SQL_PATH
STRING_AGG
SYSTEM_TABLE
TABLE_CATALOG
TABLE_NAME
TABLE_SCHEMA
TABLE_TYPE
TIME_PARSE
TIME_SHIFT
TINYINT
VARCHAR
avg_num_rows
avg_size
created_time
current_size
detailed_state
druid.server.maxSize
druid.server.tier
druid.sql.planner.maxSemiJoinRowsInMemory
druid.sql.planner.sqlTimeZone
druid.sql.planner.useApproximateCountDistinct
druid.sql.planner.useGroupingSetForExactDistinct
druid.sql.planner.useApproximateTopN
error_msg
exprs
group_id
interval_expr
is_available
is_leader
is_overshadowed
is_published
is_realtime
java.sql.Types
last_compaction_state
max_size
num_replicas
num_rows
num_segments
partition_num
plaintext_port
queue_insertion_time
runner_status
segment_id
server_type
shard_spec
sqlTimeZone
supervisor_id
sys
sys.segments
task_id
timestamp_expr
tls_port
total_size
useApproximateCountDistinct
useGroupingSetForExactDistinct
useApproximateTopN
wikipedia
IEC
- ../docs/querying/timeseriesquery.md
fieldName1
fieldName2
- ../docs/querying/topnmetricspec.md
DimensionTopNMetricSpec
metricSpec
previousStop
- ../docs/querying/topnquery.md
GroupByQuery
top500
- ../docs/querying/virtual-columns.md
outputType
- ../docs/tutorials/cluster.md
1.9TB
16CPU
WebUpd8
m5.2xlarge
metadata.storage.
256GiB
128GiB
- ../docs/tutorials/tutorial-batch-hadoop.md
PATH_TO_DRUID
namenode
- ../docs/tutorials/tutorial-delete-data.md
segmentID
segmentIds
- ../docs/tutorials/tutorial-ingestion-spec.md
dstIP
dstPort
srcIP
srcPort
- ../docs/tutorials/tutorial-kerberos-hadoop.md
common_runtime_properties
druid.extensions.directory
druid.extensions.loadList
druid.hadoop.security.kerberos.keytab
druid.hadoop.security.kerberos.principal
druid.indexer.logs.directory
druid.indexer.logs.type
druid.storage.storageDirectory
druid.storage.type
hdfs.headless.keytab
indexing_log
keytabs
- ../docs/tutorials/tutorial-query.md
dsql
- ../docs/tutorials/tutorial-retention.md
2015-09-12T12
- ../docs/tutorials/tutorial-update-data.md
bear-111
- ../docs/configuration/index.md
10KiB
2GiB
512KiB
1GiB
KiB
GiB
00.000Z
100ms
10ms
1GB
1_000_000
2012-01-01T00
2GB
30_000
524288000L
5MiB
8u60
Autoscaler
APPROX_COUNT_DISTINCT_BUILTIN
AvaticaConnectionBalancer
EventReceiverFirehose
File.getFreeSpace
File.getTotalSpace
ForkJoinPool
GCE
HadoopIndexTasks
HttpEmitter
HttpPostEmitter
InetAddress.getLocalHost
IOConfig
JRE8u60
KeyManager
L1
L2
ListManagedInstances
LoadSpec
LoggingEmitter
Los_Angeles
MDC
NoopServiceEmitter
NUMA
ONLY_EVENTS
P1D
P1W
PT-1S
PT0.050S
PT10M
PT10S
PT15M
PT1800S
PT1M
PT1S
PT24H
PT300S
PT30S
PT3600S
PT5M
PT5S
PT60S
PT90M
Param
Runtime.maxMemory
SSLContext
SegmentMetadata
SegmentWriteOutMediumFactory
ServiceEmitter
System.getProperty
TLSv1.2
TrustManager
TuningConfig
_N_
_default
_default_tier
addr
affinityConfig
allowAll
ANDed
array_mod
autoscale
autoscalers
batch_index_task
cgroup
classloader
com.metamx
common.runtime.properties
cpuacct
dataSourceName
datetime
defaultHistory
dimensionsSpec
doubleMax
doubleMin
doubleSum
druid.enableTlsPort
druid.indexer.autoscale.workerVersion
druid.service
druid.storage.disableAcl
druid_audit
druid_config
druid_dataSource
druid_pendingSegments
druid_rules
druid_segments
druid_supervisors
druid_taskLock
druid_taskLog
druid_tasks
DruidQueryRel
ec2
equalDistribution
extractionFn
file.encoding
fillCapacity
first_location
floatMax
floatAny
floatMin
floatSum
freeSpacePercent
gce
gce-extensions
getCanonicalHostName
groupBy
hdfs
httpRemote
indexTask
info_dir
inlining
java.class.path
java.io.tmpdir
javaOpts
javaOptsArray
Making optimal usage of multiple segment cache locations (#8038) * #7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations * Fixing indentation * WIP * Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy * fixing code style * Fixing test * Adding a method visible only for testing, fixing tests * 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes * fixing the conditional statement * Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy * to trigger CI build * Add documentation for the selection strategy configuration * to re trigger CI build * updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes * In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail * Implementing review comments. Added tests for StorageLocationSelectorStrategy * Checkstyle fixes * Adding java doc comments for StorageLocationSelectorStrategy interface * checkstyle * empty commit to retrigger build * Empty commit * Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file * Impl review comments including updating docs as suggested * Removing checkLocationConfigForNull(), @NotEmpty annotation serves the purpose * Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy * Fixing the round robin iterator * Removed numLocationsToTry, updated java docs * changing property attribute value from tier to type * Fixing assert messages
2019-09-28 02:17:44 -04:00
leastBytesUsed
loadList
loadqueuepeon
loadspec
localStorage
maxHeaderSize
maxQueuedBytes
maxSize
middlemanager
minTimeMs
minmax
mins
nullable
orderby
orderbys
org.apache.druid
org.apache.druid.jetty.RequestLog
org.apache.hadoop
overlord.html
pendingSegments
pre-flight
preloaded
queryType
remoteTaskRunnerConfig
rendezvousHash
replicants
resultsets
Making optimal usage of multiple segment cache locations (#8038) * #7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations * Fixing indentation * WIP * Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy * fixing code style * Fixing test * Adding a method visible only for testing, fixing tests * 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes * fixing the conditional statement * Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy * to trigger CI build * Add documentation for the selection strategy configuration * to re trigger CI build * updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes * In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail * Implementing review comments. Added tests for StorageLocationSelectorStrategy * Checkstyle fixes * Adding java doc comments for StorageLocationSelectorStrategy interface * checkstyle * empty commit to retrigger build * Empty commit * Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file * Impl review comments including updating docs as suggested * Removing checkLocationConfigForNull(), @NotEmpty annotation serves the purpose * Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy * Fixing the round robin iterator * Removed numLocationsToTry, updated java docs * changing property attribute value from tier to type * Fixing assert messages
2019-09-28 02:17:44 -04:00
roundRobin
runtime.properties
runtime.properties.
s3
s3a
s3n
slf4j
sql
sqlQuery
successfulSending
taskBlackListCleanupPeriod
tasklogs
timeBoundary
DruidInputSource: Fix issues in column projection, timestamp handling. (#10267) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.
2021-03-25 13:32:21 -04:00
timestampSpec
tmp
tmpfs
truststore
tuningConfig
unioning
useIndexes
user.timezone
v0.12.0
versionReplacementString
workerId
yyyy-MM-dd
taskType
index_kafka
c1
c2
ds1
equalDistributionWithCategorySpec
fillCapacityWithCategorySpec
WorkerCategorySpec
workerCategorySpec
CategoryConfig
- ../docs/design/index.md
logsearch
- ../docs/ingestion/index.md
2000-01-01T01
DateTimeFormat
JsonPath
autodetect
createBitmapIndex
dimensionExclusions
expr
jackson-jq
missingValue
schemaless
skipBytesInMemoryOverheadCheck
spatialDimensions
useFieldDiscovery
- ../docs/tutorials/index.md
4CPU
cityName
countryIsoCode
countryName
isAnonymous
isMinor
isNew
isRobot
isUnpatrolled
metroCode
regionIsoCode
regionName
4GiB
512GiB
- ../docs/development/extensions-core/druid-ranger-security.md
json
metastore
UserGroupInformation
CVE-2019-17571
CVE-2019-12399
CVE-2018-17196
bin.tar.gz
- ../docs/configuration/human-readable-byte.md
0s
1T
3G
1_000
1_000_000
1_000_000_000
1_000_000_000_000
1_000_000_000_000_000
Giga
Tera
Peta
KiB
MiB
GiB
TiB
PiB
protobuf
Golang
multiValueHandling