190 lines
7.8 KiB
Plaintext
190 lines
7.8 KiB
Plaintext
[glossary]
|
|
[[glossary]]
|
|
= Glossary of terms
|
|
|
|
[glossary]
|
|
[[glossary-analysis]] analysis ::
|
|
|
|
Analysis is the process of converting <<glossary-text,full text>> to
|
|
<<glossary-term,terms>>. Depending on which analyzer is used, these phrases:
|
|
`FOO BAR`, `Foo-Bar`, `foo,bar` will probably all result in the
|
|
terms `foo` and `bar`. These terms are what is actually stored in
|
|
the index.
|
|
+
|
|
A full text query (not a <<glossary-term,term>> query) for `FoO:bAR` will
|
|
also be analyzed to the terms `foo`,`bar` and will thus match the
|
|
terms stored in the index.
|
|
+
|
|
It is this process of analysis (both at index time and at search time)
|
|
that allows elasticsearch to perform full text queries.
|
|
+
|
|
Also see <<glossary-text,text>> and <<glossary-term,term>>.
|
|
|
|
[[glossary-cluster]] cluster ::
|
|
|
|
A cluster consists of one or more <<glossary-node,nodes>> which share the
|
|
same cluster name. Each cluster has a single master node which is
|
|
chosen automatically by the cluster and which can be replaced if the
|
|
current master node fails.
|
|
|
|
[[glossary-document]] document ::
|
|
|
|
A document is a JSON document which is stored in elasticsearch. It is
|
|
like a row in a table in a relational database. Each document is
|
|
stored in an <<glossary-index,index>> and has a <<glossary-type,type>> and an
|
|
<<glossary-id,id>>.
|
|
+
|
|
A document is a JSON object (also known in other languages as a hash /
|
|
hashmap / associative array) which contains zero or more
|
|
<<glossary-field,fields>>, or key-value pairs.
|
|
+
|
|
The original JSON document that is indexed will be stored in the
|
|
<<glossary-source_field,`_source` field>>, which is returned by default when
|
|
getting or searching for a document.
|
|
|
|
[[glossary-id]] id ::
|
|
|
|
The ID of a <<glossary-document,document>> identifies a document. The
|
|
`index/type/id` of a document must be unique. If no ID is provided,
|
|
then it will be auto-generated. (also see <<glossary-routing,routing>>)
|
|
|
|
[[glossary-field]] field ::
|
|
|
|
A <<glossary-document,document>> contains a list of fields, or key-value
|
|
pairs. The value can be a simple (scalar) value (eg a string, integer,
|
|
date), or a nested structure like an array or an object. A field is
|
|
similar to a column in a table in a relational database.
|
|
+
|
|
The <<glossary-mapping,mapping>> for each field has a field _type_ (not to
|
|
be confused with document <<glossary-type,type>>) which indicates the type
|
|
of data that can be stored in that field, eg `integer`, `string`,
|
|
`object`. The mapping also allows you to define (amongst other things)
|
|
how the value for a field should be analyzed.
|
|
|
|
[[glossary-index]] index ::
|
|
|
|
An index is like a _table_ in a relational database. It has a
|
|
<<glossary-mapping,mapping>> which defines the <<glossary-field,fields>> in the index,
|
|
which are grouped by multiple <<glossary-type,type>>.
|
|
+
|
|
An index is a logical namespace which maps to one or more
|
|
<<glossary-primary-shard,primary shards>> and can have zero or more
|
|
<<glossary-replica-shard,replica shards>>.
|
|
|
|
[[glossary-mapping]] mapping ::
|
|
|
|
A mapping is like a _schema definition_ in a relational database. Each
|
|
<<glossary-index,index>> has a mapping, which defines each <<glossary-type,type>>
|
|
within the index, plus a number of index-wide settings.
|
|
+
|
|
A mapping can either be defined explicitly, or it will be generated
|
|
automatically when a document is indexed.
|
|
|
|
[[glossary-node]] node ::
|
|
|
|
A node is a running instance of elasticsearch which belongs to a
|
|
<<glossary-cluster,cluster>>. Multiple nodes can be started on a single
|
|
server for testing purposes, but usually you should have one node per
|
|
server.
|
|
+
|
|
At startup, a node will use unicast to discover an existing cluster with
|
|
the same cluster name and will try to join that cluster.
|
|
|
|
[[glossary-primary-shard]] primary shard ::
|
|
|
|
Each document is stored in a single primary <<glossary-shard,shard>>. When
|
|
you index a document, it is indexed first on the primary shard, then
|
|
on all <<glossary-replica-shard,replicas>> of the primary shard.
|
|
+
|
|
By default, an <<glossary-index,index>> has 5 primary shards. You can
|
|
specify fewer or more primary shards to scale the number of
|
|
<<glossary-document,documents>> that your index can handle.
|
|
+
|
|
You cannot change the number of primary shards in an index, once the
|
|
index is created.
|
|
+
|
|
See also <<glossary-routing,routing>>
|
|
|
|
[[glossary-replica-shard]] replica shard ::
|
|
|
|
Each <<glossary-primary-shard,primary shard>> can have zero or more
|
|
replicas. A replica is a copy of the primary shard, and has two
|
|
purposes:
|
|
+
|
|
1. increase failover: a replica shard can be promoted to a primary
|
|
shard if the primary fails
|
|
2. increase performance: get and search requests can be handled by
|
|
primary or replica shards.
|
|
+
|
|
By default, each primary shard has one replica, but the number of
|
|
replicas can be changed dynamically on an existing index. A replica
|
|
shard will never be started on the same node as its primary shard.
|
|
|
|
[[glossary-routing]] routing ::
|
|
|
|
When you index a document, it is stored on a single
|
|
<<glossary-primary-shard,primary shard>>. That shard is chosen by hashing
|
|
the `routing` value. By default, the `routing` value is derived from
|
|
the ID of the document or, if the document has a specified parent
|
|
document, from the ID of the parent document (to ensure that child and
|
|
parent documents are stored on the same shard).
|
|
+
|
|
This value can be overridden by specifying a `routing` value at index
|
|
time, or a <<mapping-routing-field,routing
|
|
field>> in the <<glossary-mapping,mapping>>.
|
|
|
|
[[glossary-shard]] shard ::
|
|
|
|
A shard is a single Lucene instance. It is a low-level “worker” unit
|
|
which is managed automatically by elasticsearch. An index is a logical
|
|
namespace which points to <<glossary-primary-shard,primary>> and
|
|
<<glossary-replica-shard,replica>> shards.
|
|
+
|
|
Other than defining the number of primary and replica shards that an
|
|
index should have, you never need to refer to shards directly.
|
|
Instead, your code should deal only with an index.
|
|
+
|
|
Elasticsearch distributes shards amongst all <<glossary-node,nodes>> in the
|
|
<<glossary-cluster,cluster>>, and can move shards automatically from one
|
|
node to another in the case of node failure, or the addition of new
|
|
nodes.
|
|
|
|
[[glossary-source_field]] source field ::
|
|
|
|
By default, the JSON document that you index will be stored in the
|
|
`_source` field and will be returned by all get and search requests.
|
|
This allows you access to the original object directly from search
|
|
results, rather than requiring a second step to retrieve the object
|
|
from an ID.
|
|
|
|
[[glossary-term]] term ::
|
|
|
|
A term is an exact value that is indexed in elasticsearch. The terms
|
|
`foo`, `Foo`, `FOO` are NOT equivalent. Terms (i.e. exact values) can
|
|
be searched for using _term_ queries. +
|
|
See also <<glossary-text,text>> and <<glossary-analysis,analysis>>.
|
|
|
|
[[glossary-text]] text ::
|
|
|
|
Text (or full text) is ordinary unstructured text, such as this
|
|
paragraph. By default, text will be <<glossary-analysis,analyzed>> into
|
|
<<glossary-term,terms>>, which is what is actually stored in the index.
|
|
+
|
|
Text <<glossary-field,fields>> need to be analyzed at index time in order to
|
|
be searchable as full text, and keywords in full text queries must be
|
|
analyzed at search time to produce (and search for) the same terms
|
|
that were generated at index time.
|
|
+
|
|
See also <<glossary-term,term>> and <<glossary-analysis,analysis>>.
|
|
|
|
[[glossary-type]] type ::
|
|
|
|
A type represents the _type_ of document, e.g. an `email`, a `user`, or a `tweet`.
|
|
The search API can filter documents by type.
|
|
An <<glossary-index,index>> can contain multiple types, and each type has a
|
|
list of <<glossary-field,fields>> that can be specified for
|
|
<<glossary-document,documents>> of that type. Fields with the same
|
|
name in different types in the same index must have the same <<glossary-mapping,mapping>>
|
|
(which defines how each field in the document is indexed and made searchable).
|
|
|