* Druid data is stored in [datasources](index.html#datasources), which are similar to tables in a traditional RDBMS.
* Druid datasources can be ingested with or without [rollup](#rollup). With rollup enabled, Druid partially aggregates your data during ingestion, potentially reducing its row count, decreasing storage footprint, and improving query performance. With rollup disabled, Druid stores one row for each row in your input data, without any pre-aggregation.
* Every row in Druid must have a timestamp. Data is always partitioned by time, and every query has a time filter. Query results can also be broken down by time buckets like minutes, hours, days, and so on.
* All columns in Druid datasources, other than the timestamp column, are either dimensions or metrics. This follows the [standard naming convention](https://en.wikipedia.org/wiki/Online_analytical_processing#Overview_of_OLAP_systems) of OLAP data.
* Typical production datasources have tens to hundreds of columns.
* [Dimension columns](ingestion-spec.html#dimensions) are stored as-is, so they can be filtered on, grouped by, or aggregated at query time. They are always single Strings, [arrays of Strings](../querying/multi-value-dimensions.html), single Longs, single Doubles or single Floats.
* Metric columns are stored [pre-aggregated](../querying/aggregations.html), so they can only be aggregated at query time (not filtered or grouped by). They are often stored as numbers (integers or floats) but can also be stored as complex objects like [HyperLogLog sketches or approximate quantile sketches](../querying/aggregations.html#approx). Metrics can be configured at ingestion time even when rollup is disabled, but are most useful when rollup is enabled.
The rest of this page discusses tips for users coming from other kinds of systems, as well as general tips and
common practices.
## If you're coming from a...
### Relational model
(Like Hive or PostgreSQL.)
Druid datasources are generally equivalent to tables in a relational database. Druid [lookups](../querying/lookups.html)
can act similarly to data-warehouse-style dimension tables, but as you'll see below, denormalization is often
recommended if you can get away with it.
Common practice for relational data modeling involves [normalization](https://en.wikipedia.org/wiki/Database_normalization):
the idea of splitting up data into multiple tables such that data redundancy is reduced or eliminated. For example, in a
"sales" table, best-practices relational modeling calls for a "product id" column that is a foreign key into a separate
"products" table, which in turn has "product id", "product name", and "product category" columns. This prevents the
product name and category from needing to be repeated on different rows in the "sales" table that refer to the same
product.
In Druid, on the other hand, it is common to use totally flat datasources that do not require joins at query time. In
the example of the "sales" table, in Druid it would be typical to store "product_id", "product_name", and
"product_category" as dimensions directly in a Druid "sales" datasource, without using a separate "products" table.
Totally flat schemas substantially increase performance, since the need for joins is eliminated at query time. As an
an added speed boost, this also allows Druid's query layer to operate directly on compressed dictionary-encoded data.
Perhaps counter-intuitively, this does _not_ substantially increase storage footprint relative to normalized schemas,
since Druid uses dictionary encoding to effectively store just a single integer per row for string columns.
If necessary, Druid datasources can be partially normalized through the use of [lookups](../querying/lookups.html),
which are the rough equivalent of dimension tables in a relational database. At query time, you would use Druid's SQL
`LOOKUP` function, or native lookup extraction functions, instead of using the JOIN keyword like you would in a
relational database. Since lookup tables impose an increase in memory footprint and incur more computational overhead
at query time, it is only recommended to do this if you need the ability to update a lookup table and have the changes
reflected immediately for already-ingested rows in your main table.
Tips for modeling relational data in Druid:
- Druid datasources do not have primary or unique keys, so skip those.
- Denormalize if possible. If you need to be able to update dimension / lookup tables periodically and have those
changes reflected in already-ingested data, consider partial normalization with [lookups](../querying/lookups.html).
- If you need to join two large distributed tables with each other, you must do this before loading the data into Druid.
Druid does not support query-time joins of two datasources. Lookups do not help here, since a full copy of each lookup
table is stored on each Druid server, so they are not a good choice for large tables.
- Consider whether you want to enable [rollup](#rollup) for pre-aggregation, or whether you want to disable
rollup and load your existing data as-is. Rollup in Druid is similar to creating a summary table in a relational model.
### Time series model
(Like OpenTSDB or InfluxDB.)
Similar to time series databases, Druid's data model requires a timestamp. Druid is not a timeseries database, but
If the user wishes to ingest a column as a numeric-typed dimension (Long, Double or Float), it is necessary to specify the type of the column in the `dimensions` section of the `dimensionsSpec`. If the type is omitted, Druid will ingest a column as the default String type.
One workflow with unique IDs is to be able to filter on a particular ID, while still being able to do fast unique counts on the ID column.
If you are not using schema-less dimensions, this use case is supported by setting the `name` of the metric to something different than the dimension.
If you are using schema-less dimensions, the best practice here is to include the same column twice, once as a dimension, and as a `hyperUnique` metric. This may involve
some work at ETL time.
As an example, for schema-less dimensions, repeat the same column: