mirror of https://github.com/apache/druid.git
Merge remote-tracking branch 'upstream/master' into rabbitmq-module
commit 386c81f0d9
(Diffs suppressed for two added image files: 30 KiB and 47 KiB.)
@@ -4,11 +4,36 @@ layout: doc_page
 Concepts and Terminology
 ========================
 
-* **Aggregators**: A mechanism for combining records during realtime incremental indexing, Hadoop batch indexing, and in queries.
-* **DataSource**: A table-like view of data; specified in a "specFile" and in a query.
-* **Granularity**: The time interval corresponding to aggregation by time.
-* **indexGranularity**: specifies the granularity used to bucket timestamps within a segment.
-* **segmentGranularity**: specifies the granularity of the segment, i.e. the amount of time a segment will represent
-* **Segment**: A collection of (internal) records that are stored and processed together.
-* **Shard**: A sub-partition of the data in a segment. It is possible to have multiple segments represent all data for a given segmentGranularity.
-* **specFile**: is specification for services in JSON format; see [Realtime](Realtime.html) and [Batch-ingestion](Batch-ingestion.html)
+The following definitions are given with respect to the Druid data store. They are intended to help you better understand the Druid documentation, where these terms and concepts occur.
+
+More definitions are available on the [design page](Design.html).
+
+* **Aggregation** The summarizing of data meeting certain specifications. Druid aggregates [timeseries data](#timeseries), which in effect compacts the data. Time intervals (set in configuration) are used to create buckets, while [timestamps](#timestamp) determine which buckets data are aggregated into.
+
+* **Aggregators** A mechanism for combining records during realtime incremental indexing, Hadoop batch indexing, and in queries.
+
+* **DataSource** A table-like view of data; specified in [specFiles](#specfile) and in queries. A dataSource specifies the source of data being ingested and ultimately stored in [segments](#segment).
+
+* **Dimensions** Aspects or categories of data, such as languages or locations. For example, with *language* and *country* as dimensions, values could be "English" or "Mandarin" for language, and "USA" or "China" for country. In Druid, dimensions can serve as filters for narrowing down hits (for example, language = "English" or country = "China").
+
+* **Granularity** The time interval corresponding to aggregation by time. Druid configuration settings specify the granularity of [timestamp](#timestamp) buckets in a [segment](#segment) (for example, by minute or by hour), as well as the granularity of the segment itself. The latter is essentially the overall range of absolute time covered by the segment. In queries, granularity settings control the summarization of findings.
+
+* **Ingestion** The pulling and initial storing and processing of data. Druid supports realtime and batch ingestion of data, and applies indexing in both cases.
+
+* **Metrics** Countable data that can be aggregated. Metrics, for example, can be the number of visitors to a website, the number of tweets per day, or average revenue.
+
+* **Rollup** The aggregation of data that occurs at one or more stages, based on settings in a [configuration file](#specfile).
+
+<a name="segment"></a>
+* **Segment** A collection of (internal) records that are stored and processed together. Druid chunks data into segments representing a time interval, and these are stored and manipulated in the cluster.
+
+* **Shard** A sub-partition of the data, allowing multiple [segments](#segment) to represent the data in a certain time interval. Sharding occurs along time partitions to better handle amounts of data that exceed certain limits on segment size, although sharding along dimensions may also occur to optimize efficiency.
+
+<a name="specfile"></a>
+* **specFile** The specification for services in JSON format; see [Realtime](Realtime.html) and [Batch-ingestion](Batch-ingestion.html).
+
+<a name="timeseries"></a>
+* **Timeseries Data** Data points which are ordered in time. The closing value of a financial index or the number of tweets per hour with a certain hashtag are examples of timeseries data.
+
+<a name="timestamp"></a>
+* **Timestamp** An absolute position on a timeline, given in a standard alphanumeric format such as UTC time. [Timeseries data](#timeseries) points can be ordered by timestamp, and in Druid, they are.
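To see how several of these terms fit together in practice, here is a minimal Java sketch that posts a groupBy query to a Druid broker. The dataSource name, dimension, interval, and broker URL are illustrative assumptions, not values taken from this commit:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class GroupByQuerySketch
{
  public static void main(String[] args) throws Exception
  {
    // A query names a dataSource, a granularity for time bucketing,
    // the dimensions to group on, and the aggregators to apply.
    String query =
        "{\n"
        + "  \"queryType\": \"groupBy\",\n"
        + "  \"dataSource\": \"wikipedia\",\n"             // hypothetical dataSource
        + "  \"granularity\": \"hour\",\n"                 // bucket timestamps by hour
        + "  \"dimensions\": [\"language\"],\n"            // group hits by language
        + "  \"aggregations\": [{\"type\": \"count\", \"name\": \"rows\"}],\n"
        + "  \"intervals\": [\"2013-01-01/2013-01-02\"]\n" // absolute time range queried
        + "}";

    // Hypothetical broker endpoint; adjust host and port for your cluster.
    URL url = new URL("http://localhost:8080/druid/v2/");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream os = conn.getOutputStream()) {
      os.write(query.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
```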
@@ -49,7 +49,7 @@ Aside from these nodes, there are 3 external dependencies to the system:
 2. A [MySQL instance](MySQL.html) for maintenance of metadata about the data segments that should be served by the system
 3. A ["deep storage" LOB store/file system](Deep-Storage.html) to hold the stored segments
 
-The following diagram shows how certain nodes and dependencies help manage the cluster by tracking and exchanging metadata. This management layer is illustrated in the following diagram:
+The following diagram illustrates the cluster's management layer, showing how certain nodes and dependencies help manage the cluster by tracking and exchanging metadata:
 
 <img src="../img/druid-manage-1.png" width="800"/>
 
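As a rough sketch of how these external dependencies surface in node configuration, the properties below show the general shape. The property names are assumptions drawn from later Druid releases and vary by version, so treat this as illustrative rather than as settings from this commit:

```properties
# ZooKeeper quorum used for coordination (assumed host)
druid.zk.service.host=localhost

# Metadata store tracking which segments should be served
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid

# Deep storage holding the segment files themselves
druid.storage.type=local
druid.storage.storageDirectory=/tmp/druid/segments
```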
@@ -140,7 +140,8 @@ The result looks something like this:
 
 This groupBy query is a bit complicated and we'll return to it later. For the time being, just make sure you are getting some blocks of data back. If you are having problems, make sure you have [curl](http://curl.haxx.se/) installed. Control+C to break out of the client script.
 
-h2. Querying Druid
+Querying Druid
+--------------
 
 In your favorite editor, create the file:
 
@@ -248,5 +248,5 @@ druid.processing.buffer.sizeBytes=10000000
 
 Next Steps
 ----------
-If you are interested in how data flows through the different Druid components, check out the Druid [Data Flow](Data-Flow.html). Now that you have an understanding of what the Druid cluster looks like, why not load some of your own data?
+If you are interested in how data flows through the different Druid components, check out the [Druid data flow architecture](Design.html). Now that you have an understanding of what the Druid cluster looks like, why not load some of your own data?
 Check out the next [tutorial](Tutorial%3A-Loading-Your-Data-Part-1.html) section for more info!
@@ -8,6 +8,7 @@ h1. Contents
 * "Concepts and Terminology":./Concepts-and-Terminology.html
 
 h2. Getting Started
+* "Concepts and Terminology":./Concepts-and-Terminology.html
 * "Tutorial: A First Look at Druid":./Tutorial:-A-First-Look-at-Druid.html
 * "Tutorial: The Druid Cluster":./Tutorial:-The-Druid-Cluster.html
 * "Tutorial: Loading Your Data Part 1":./Tutorial:-Loading-Your-Data-Part-1.html
@@ -49,7 +49,7 @@ public class JacksonModule implements Module
   public ObjectMapper smileMapper()
   {
     ObjectMapper retVal = new DefaultObjectMapper(new SmileFactory());
-    retVal.getJsonFactory().setCodec(retVal);
+    retVal.getFactory().setCodec(retVal);
     return retVal;
   }
 }
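For background, Jackson 2.x deprecates `ObjectMapper.getJsonFactory()` in favor of `getFactory()`, which is what this change and the ones below track. A minimal, self-contained sketch of the pattern, substituting a plain `ObjectMapper` for Druid's `DefaultObjectMapper`:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

public class SmileMapperSketch
{
  public static ObjectMapper smileMapper()
  {
    // Build a mapper that reads and writes the binary Smile format.
    ObjectMapper mapper = new ObjectMapper(new SmileFactory());
    // Passing a factory to the constructor does not register a codec on it,
    // so set the mapper as codec explicitly; parsers created straight from
    // the factory can then bind values and trees.
    mapper.getFactory().setCodec(mapper);
    return mapper;
  }
}
```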
@@ -89,7 +89,7 @@ public class DirectDruidClient<T> implements QueryRunner<T>
     this.httpClient = httpClient;
     this.host = host;
 
-    this.isSmile = this.objectMapper.getJsonFactory() instanceof SmileFactory;
+    this.isSmile = this.objectMapper.getFactory() instanceof SmileFactory;
     this.openConnections = new AtomicInteger();
   }
 
@@ -269,7 +269,7 @@ public class DirectDruidClient<T> implements QueryRunner<T>
       {
         if (jp == null) {
           try {
-            jp = objectMapper.getJsonFactory().createJsonParser(future.get());
+            jp = objectMapper.getFactory().createParser(future.get());
             if (jp.nextToken() != JsonToken.START_ARRAY) {
               throw new IAE("Next token wasn't a START_ARRAY, was[%s]", jp.getCurrentToken());
             } else {
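The surrounding code streams query results out of a JSON array as they arrive rather than buffering the whole response. A standalone sketch of that streaming pattern with Jackson 2.x, parsing from a string here instead of an HTTP response future:

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class StreamingArraySketch
{
  public static void main(String[] args) throws Exception
  {
    ObjectMapper mapper = new ObjectMapper();
    // createParser() replaces the deprecated createJsonParser().
    JsonParser jp = mapper.getFactory().createParser("[{\"n\":1},{\"n\":2}]");
    if (jp.nextToken() != JsonToken.START_ARRAY) {
      throw new IllegalStateException("expected START_ARRAY, got " + jp.getCurrentToken());
    }
    // Advance element by element without materializing the full array.
    while (jp.nextToken() == JsonToken.START_OBJECT) {
      Map<?, ?> row = jp.readValueAs(Map.class);
      System.out.println(row);
    }
    jp.close();
  }
}
```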
@@ -292,7 +292,9 @@ public class DirectDruidClient<T> implements QueryRunner<T>
         @Override
         public void close() throws IOException
         {
-          jp.close();
+          if (jp != null) {
+            jp.close();
+          }
         }
       }
     }
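The null check added to `close()` above matters because the parser `jp` is created lazily on first read; if the result sequence is closed before anything is consumed, `jp` is still null and the old unconditional `jp.close()` would throw a NullPointerException. A generic, hypothetical illustration of the lazy-init-plus-guarded-close pattern (not Druid code):

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.StringReader;

public class LazyResource implements Closeable
{
  private StringReader reader; // created lazily, so it may still be null at close()

  public int read() throws IOException
  {
    if (reader == null) {
      reader = new StringReader("lazily created");
    }
    return reader.read();
  }

  @Override
  public void close()
  {
    if (reader != null) { // without the guard, close-before-first-read would NPE
      reader.close();
    }
  }
}
```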