druid/docs/Concepts-and-Terminology.md

Concepts and Terminology
========================

-   **Aggregators:** A mechanism for combining records during realtime incremental indexing, Hadoop batch indexing, and in queries.
-   **DataSource:** A table-like view of data; specified in a “specFile” and in a query.
-   **Granularity:** The time interval over which records are aggregated or grouped.
    -   The *indexGranularity* setting in a schema is used to aggregate input (ingest) records within an interval into a single output (internal) record.
    -   The *segmentGranularity* setting is the interval that determines which internal records are stored together in a single file.

-   **Segment:** A collection of (internal) records that are stored and processed together.
-   **Shard:** A unit for partitioning data across machines. TODO: clarify; by time or other dimensions?
-   **specFile:** A specification for services, in JSON format; see [[Realtime]] and [[Batch-ingestion]].
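
The two granularity settings can be illustrated with a minimal Python sketch. This is not Druid code: the function names (`truncate`, `roll_up`, `group_into_segments`), the `(timestamp, page, count)` record layout, and the choice of a sum as the aggregator are all assumptions made for illustration. It only shows the idea of first aggregating ingest records per index interval, then grouping the resulting internal records per segment interval.

```python
from collections import defaultdict
from datetime import datetime


def truncate(ts, granularity):
    """Floor a timestamp to the start of its 'minute', 'hour', or 'day' bucket."""
    if granularity == "minute":
        return ts.replace(second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def roll_up(records, index_granularity):
    """Aggregate ingest records sharing a time bucket and dimension value.

    A sum stands in for the aggregator here (an assumption for this sketch).
    """
    totals = defaultdict(int)
    for ts, page, count in records:
        totals[(truncate(ts, index_granularity), page)] += count
    return sorted((ts, page, total) for (ts, page), total in totals.items())


def group_into_segments(rows, segment_granularity):
    """Group internal records by the segment interval they fall into."""
    segments = defaultdict(list)
    for row in rows:
        segments[truncate(row[0], segment_granularity)].append(row)
    return dict(segments)


# Hypothetical ingest records: (timestamp, page dimension, count metric).
records = [
    (datetime(2013, 9, 13, 18, 5), "home", 1),
    (datetime(2013, 9, 13, 18, 40), "home", 2),
    (datetime(2013, 9, 13, 19, 10), "home", 4),
    (datetime(2013, 9, 14, 0, 30), "about", 8),
]

rows = roll_up(records, "hour")            # index granularity: one row per hour and page
segments = group_into_segments(rows, "day")  # segment granularity: one group per day
```

In this sketch the two 18:00 records collapse into one internal record, and the three resulting records fall into two day-level groups.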

Add docs from github wiki 2013-09-13 18:20:39 -04:00