[DOCS] Multiple doc fixes

Closes #5047
Konrad Feldmeier 2014-03-07 14:21:45 +01:00 committed by Clinton Gormley
parent 2affa5004f
commit d7b0d547d4
28 changed files with 121 additions and 110 deletions

View File

@ -5,7 +5,7 @@
[[shard-allocation-filtering]]
=== Shard Allocation Filtering
Allow to control allocation if indices on nodes based on include/exclude
Allows to control the allocation of indices on nodes based on include/exclude
filters. The filters can be set both on the index level and on the
cluster level. Let's start with an example of setting it on the cluster
level:
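
For illustration, a minimal sketch of a cluster-level filter set via the
cluster update settings API (the attribute name and values here are
illustrative):

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.include.tag" : "value1,value2"
    }
}'
--------------------------------------------------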

View File

@ -2,8 +2,8 @@
== Analysis
The index analysis module acts as a configurable registry of Analyzers
that can be used in order to both break indexed (analyzed) fields when a
document is indexed and process query strings. It maps to the Lucene
that can be used in order to break down indexed (analyzed) fields when a
document is indexed as well as to process query strings. It maps to the Lucene
`Analyzer`.
Analyzers are (generally) composed of a single `Tokenizer` and zero or

View File

@ -2,8 +2,8 @@
== Codec module
Codecs define how documents are written to disk and read from disk. The
postings format is the part of the codec that responsible for reading
and writing the term dictionary, postings lists and positions, payloads
postings format is the part of the codec that is responsible for reading
and writing the term dictionary, postings lists and positions, as well as the payloads
and offsets stored in the postings list. The doc values format is
responsible for reading column-stride storage for a field and is typically
used for sorting or faceting. When a field doesn't have doc values enabled,
@ -25,7 +25,7 @@ Elasticsearch, requiring data to be reindexed.
[[custom-postings]]
=== Configuring a custom postings format
Custom postings format can be defined in the index settings in the
A custom postings format can be defined in the index settings in the
`codec` part. The `codec` part can be configured when creating an index
or updating index settings. An example on how to define your custom
postings format:
@ -48,7 +48,7 @@ curl -XPUT 'http://localhost:9200/twitter/' -d '{
}'
--------------------------------------------------
Then we defining your mapping your can use the `my_format` name in the
Then when defining your mapping you can use the `my_format` name in the
`postings_format` option as the example below illustrates:
[source,js]
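--------------------------------------------------
# a sketch: the index, type, and field names here are illustrative
curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping' -d '{
    "tweet" : {
        "properties" : {
            "message" : {"type" : "string", "postings_format" : "my_format"}
        }
    }
}'
--------------------------------------------------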

View File

@ -8,16 +8,31 @@ explicit mappings pre defined. For more information about mapping
definitions, check out the <<mapping,mapping section>>.
[float]
=== Dynamic / Default Mappings
=== Dynamic Mappings
Dynamic mappings allow to automatically apply generic mapping definition
to types that do not have mapping pre defined or applied to new mapping
definitions (overridden). This is mainly done thanks to the fact that
the `object` type and namely the root `object` type allow for schema
less dynamic addition of unmapped fields.
New types and new fields within types can be added dynamically just
by indexing a document. When Elasticsearch encounters a new type,
it creates the type using the `_default_` mapping (see below).
The default mapping definition is plain mapping definition that is
embedded within Elasticsearch:
When it encounters a new field within a type, it autodetects the
datatype that the field contains and adds it to the type mapping
automatically.
See <<mapping-dynamic-mapping>> for details of how to control and
configure dynamic mapping.
[float]
=== Default Mapping
When a new type is created (at <<indices-create-index,index creation>> time,
using the <<indices-put-mapping,`put-mapping` API>> or just by indexing a
document into it), the type uses the `_default_` mapping as its basis. Any
mapping specified in the <<indices-create-index,`create-index`>> or
<<indices-put-mapping,`put-mapping`>> request overrides values set in the
`_default_` mapping.
The default mapping definition is a plain mapping definition that is
embedded within Elasticsearch:
[source,js]
--------------------------------------------------
@ -27,13 +42,15 @@ embedded within Elasticsearch:
}
--------------------------------------------------
Pretty short, no? Basically, everything is defaulted, especially the
dynamic nature of the root object mapping. The default mapping
definition can be overridden in several manners. The simplest manner is
to simply define a file called `default-mapping.json` and placed it
under the `config` directory (which can be configured to exist in a
different location). It can also be explicitly set using the
`index.mapper.default_mapping_location` setting.
Pretty short, isn't it? Basically, everything is `_default_`ed, including the
dynamic nature of the root object mapping which allows new fields to be added
automatically.
The built-in default mapping definition can be overridden in several ways. A
`_default_` mapping can be specified when creating a new index, or the global
`_default_` mapping (for all indices) can be configured by creating a file
called `config/default-mapping.json`. (This location can be changed with
the `index.mapper.default_mapping_location` setting.)
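
For example, a minimal sketch of supplying a `_default_` mapping at index
creation time (the index name and the settings used are illustrative):

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/myindex' -d '{
    "mappings" : {
        "_default_" : {
            "_source" : {"enabled" : false}
        }
    }
}'
--------------------------------------------------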
Dynamic creation of mappings for unmapped types can be completely
disabled by setting `index.mapper.dynamic` to `false`.

View File

@ -43,7 +43,7 @@ This policy has the following settings:
Segments smaller than this are "rounded up" to this size, i.e. treated as
equal (floor) size for merge selection. This is to prevent frequent
flushing of tiny segments from allowing a long tail in the index. Default
flushing of tiny segments, thus preventing a long tail in the index. Default
is `2mb`.
`index.merge.policy.max_merge_at_once`::
@ -67,7 +67,7 @@ This policy has the following settings:
Sets the allowed number of segments per tier. Smaller values mean more
merging but fewer segments. Default is `10`. Note, this value needs to be
>= then the `max_merge_at_once` otherwise you'll force too many merges to
>= `max_merge_at_once`, otherwise you'll force too many merges to
occur.
`index.reclaim_deletes_weight`::
@ -83,8 +83,8 @@ This policy has the following settings:
<<index-modules-settings>>.
For normal merging, this policy first computes a "budget" of how many
segments are allowed by be in the index. If the index is over-budget,
then the policy sorts segments by decreasing size (pro-rating by percent
segments are allowed to be in the index. If the index is over-budget,
then the policy sorts segments by decreasing size (proportionally considering percent
deletes), and then finds the least-cost merge. Merge cost is measured by
a combination of the "skew" of the merge (size of largest seg divided by
smallest seg), total merge size and pct deletes reclaimed, so that
@ -99,7 +99,7 @@ budget.
Note, this can mean that for large shards that hold many gigabytes of
data, the default `max_merged_segment` (`5gb`) can cause many
segments to remain in an index, causing searches to be slower. Use the
indices segments API to see the segments that an index have, and
indices segments API to see the segments that an index has, and
possibly either increase the `max_merged_segment` or issue an optimize
call for the index (try to issue it at a low-traffic time).
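
For example, a sketch of such an optimize call (the index name is
illustrative):

[source,js]
--------------------------------------------------
curl -XPOST 'http://localhost:9200/twitter/_optimize?max_num_segments=1'
--------------------------------------------------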
@ -192,24 +192,21 @@ supported, with the default being the `ConcurrentMergeScheduler`.
[float]
==== ConcurrentMergeScheduler
A merge scheduler that runs merges using a separated thread, until the
maximum number of threads at which when a merge is needed, the thread(s)
that are updating the index will pause until one or more merges
completes.
A merge scheduler that runs merges using a separate thread. When the maximum
number of threads is reached, further merges will wait until a merge thread
becomes available.
The scheduler supports the following settings:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|index.merge.scheduler.max_thread_count |The maximum number of threads
to perform the merge operation. Defaults to
`index.merge.scheduler.max_thread_count`::
The maximum number of threads to perform the merge operation. Defaults to
`Math.max(1, Math.min(3, Runtime.getRuntime().availableProcessors() / 2))`.
|=======================================================================
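
For illustration, a sketch of setting this explicitly in
`elasticsearch.yml` (the value is arbitrary):

[source,js]
--------------------------------------------------
index.merge.scheduler.max_thread_count: 3
--------------------------------------------------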
[float]
==== SerialMergeScheduler
A merge scheduler that simply does each merge sequentially using the
calling thread (blocking the operations that triggered the merge, the
calling thread (blocking the operations that triggered the merge or the
index operation).

View File

@ -163,7 +163,7 @@ An alias can also be added with the endpoint
where
[horizontal]
`index`:: The index to alias refers to. Can be any of `blank | * | _all | glob pattern | name1, name2, …`
`index`:: The index the alias refers to. Can be any of `blank | * | _all | glob pattern | name1, name2, …`
`name`:: The name of the alias. This is a required option.
`routing`:: An optional routing that can be associated with an alias.
`filter`:: An optional filter that can be associated with an alias.
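
For example, a sketch of adding an alias with routing and a filter (the
index, alias, and field names are illustrative):

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/users/_alias/user_12' -d '{
    "routing" : "12",
    "filter" : {
        "term" : {"user_id" : 12}
    }
}'
--------------------------------------------------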
@ -248,7 +248,6 @@ found index aliases.
Possible options:
[horizontal]
`index`::
The index name to get aliases for. Partial names are
supported via wildcards, and multiple index names can be specified
separated by commas. The alias name for an index can also be used.

View File

@ -47,7 +47,7 @@ Also, the analyzer can be derived based on a field mapping, for example:
curl -XGET 'localhost:9200/test/_analyze?field=obj1.field1' -d 'this is a test'
--------------------------------------------------
Will cause the analysis to happen based on the analyzer configure in the
Will cause the analysis to happen based on the analyzer configured in the
mapping for `obj1.field1` (and if not, the default index analyzer).
Also, the text can be provided as part of the request body, and not as a

View File

@ -1,7 +1,7 @@
[[indices-get-mapping]]
== Get Mapping
The get mapping API allows to retrieve mapping definition of index or
The get mapping API allows to retrieve mapping definitions for an index or
index/type.
[source,js]
@ -15,7 +15,7 @@ curl -XGET 'http://localhost:9200/twitter/tweet/_mapping'
The get mapping API can be used to get more than one index or type
mapping with a single call. General usage of the API follows the
following syntax: `host:port/{index}/{type}/_mapping` where both
`{index}` and `{type}` can stand for comma-separated list of names. To
`{index}` and `{type}` can accept a comma-separated list of names. To
get mappings for all indices you can use `_all` for `{index}`. The
following are some examples:
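
For example (the index and type names below are placeholders):

[source,js]
--------------------------------------------------
curl -XGET 'http://localhost:9200/_all/_mapping'
curl -XGET 'http://localhost:9200/twitter,wikipedia/_mapping'
curl -XGET 'http://localhost:9200/twitter/tweet,user/_mapping'
--------------------------------------------------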

View File

@ -26,7 +26,7 @@ merge needs to execute, and if so, executes it.
`only_expunge_deletes`:: Should the optimize process only expunge segments
with deletes in it. In Lucene, a document is not deleted from a segment,
just marked as deleted. During a merge process of segments, a new
segment is created that does not have those deletes. This flag allow to
segment is created that does not have those deletes. This flag allows to
only merge segments that have deletes. Defaults to `false`.
`flush`:: Should a flush be performed after the optimize. Defaults to

View File

@ -38,7 +38,7 @@ which means conflicts are *not* ignored.
The definition of conflict is really dependent on the type merged, but
in general, if a different core type is defined, it is considered as a
conflict. New mapping definitions can be added to object types, and core
type mapping can be upgraded by specifying multi fields on a core type.
type mappings can be upgraded by specifying multi fields on a core type.
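
For example, a sketch of a put-mapping call that ignores such conflicts
(the index, type, and field names are illustrative):

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping?ignore_conflicts=true' -d '{
    "tweet" : {
        "properties" : {
            "message" : {"type" : "string", "analyzer" : "whitespace"}
        }
    }
}'
--------------------------------------------------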
[float]
[[put-mapping-multi-index]]

View File

@ -61,7 +61,7 @@ actual index name that the template gets applied to during index creation.
=== Deleting a Template
Index templates are identified by a name (in the above case
`template_1`) and can be delete as well:
`template_1`) and can be deleted as well:
[source,js]
--------------------------------------------------
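curl -XDELETE 'http://localhost:9200/_template/template_1'
--------------------------------------------------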

View File

@ -13,7 +13,7 @@ and get them.
Index warmup can be disabled by setting `index.warmer.enabled` to
`false`. It is supported as a realtime setting using update settings
API. This can be handy when doing initial bulk indexing, disabling pre
API. This can be handy when doing initial bulk indexing: disable pre
registered warmers to make indexing faster and less expensive, and then
enable them again.
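
For example, a sketch of toggling this setting with the update settings
API (the index name is illustrative):

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/test/_settings' -d '{
    "index.warmer.enabled" : false
}'
--------------------------------------------------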

View File

@ -1,15 +1,15 @@
[[mapping-dynamic-mapping]]
== Dynamic Mapping
Default mappings allow to automatically apply generic mapping definition
to types that do not have mapping pre defined. This is mainly done
Default mappings allow to automatically apply generic mapping definitions
to types that do not have mappings predefined. This is mainly done
thanks to the fact that the
<<mapping-object-type,object mapping>> and
namely the <<mapping-root-object-type,root
object mapping>> allow for schema-less dynamic addition of unmapped
fields.
The default mapping definition is plain mapping definition that is
The default mapping definition is a plain mapping definition that is
embedded within the distribution:
[source,js]
@ -20,10 +20,10 @@ embedded within the distribution:
}
--------------------------------------------------
Pretty short, no? Basically, everything is defaulted, especially the
Pretty short, isn't it? Basically, everything is defaulted, especially the
dynamic nature of the root object mapping. The default mapping
definition can be overridden in several manners. The simplest manner is
to simply define a file called `default-mapping.json` and placed it
to simply define a file called `default-mapping.json` and to place it
under the `config` directory (which can be configured to exist in a
different location). It can also be explicitly set using the
`index.mapper.default_mapping_location` setting.

View File

@ -7,8 +7,8 @@ especially for search requests, where we want to execute a search query
against the content of a document, without knowing which fields to
search on. This comes at the expense of CPU cycles and index size.
The `_all` fields can be completely disabled. Explicit field mapping and
object mapping can be excluded / included in the `_all` field. By
The `_all` fields can be completely disabled. Explicit field mappings and
object mappings can be excluded / included in the `_all` field. By
default, it is enabled and all fields are included in it for ease of
use.
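
For example, a minimal sketch of disabling the `_all` field in a mapping
(the type name is illustrative):

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "_all" : {"enabled" : false}
    }
}
--------------------------------------------------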
@ -69,7 +69,7 @@ specific `index_analyzer` and `search_analyzer`) to be set.
For any field to allow
<<search-request-highlighting,highlighting>> it has
to be either stored or part of the `_source` field. By default `_all`
to be either stored or part of the `_source` field. By default the `_all`
field does not qualify for either, so highlighting for it does not yield
any data.

View File

@ -20,7 +20,7 @@ Here is a simple mapping:
--------------------------------------------------
The above will use the value of the `my_field` to lookup an analyzer
registered under it. For example, indexing a the following doc:
registered under it. For example, indexing the following doc:
[source,js]
--------------------------------------------------
@ -33,7 +33,7 @@ Will cause the `whitespace` analyzer to be used as the index analyzer
for all fields without explicit analyzer setting.
The default path value is `_analyzer`, so the analyzer can be driven for
a specific document by setting `_analyzer` field in it. If custom json
a specific document by setting the `_analyzer` field in it. If a custom json
field name is needed, an explicit mapping with a different path should
be set.
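
For example, a sketch of such an explicit mapping (the path value is
illustrative):

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "_analyzer" : {
            "path" : "my_analyzer_field"
        }
    }
}
--------------------------------------------------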

View File

@ -4,7 +4,7 @@
deprecated[1.0.0.RC1,See <<function-score-instead-of-boost>>]
Boosting is the process of enhancing the relevancy of a document or
field. Field level mapping allows to define explicit boost level on a
field. Field level mapping allows to define an explicit boost level on a
specific field. The boost field mapping (applied on the
<<mapping-root-object-type,root object>>) allows
to define a boost field mapping where *its content will control the
@ -20,7 +20,7 @@ mapping:
}
--------------------------------------------------
The above mapping defines mapping for a field named `my_boost`. If the
The above mapping defines a mapping for a field named `my_boost`. If the
`my_boost` field exists within the JSON document indexed, its value will
control the boost level of the document indexed. For example, the
following JSON document will be indexed with a boost value of `2.2`:
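
[source,js]
--------------------------------------------------
{
    "my_boost" : 2.2,
    "message" : "This is a document"
}
--------------------------------------------------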

View File

@ -9,7 +9,7 @@ to the date the document was processed by the indexing chain.
[float]
==== enabled
By default it is disabled, in order to enable it, the following mapping
By default it is disabled. In order to enable it, the following mapping
should be defined:
[source,js]
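--------------------------------------------------
{
    "tweet" : {
        "_timestamp" : {"enabled" : true}
    }
}
--------------------------------------------------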

View File

@ -86,7 +86,7 @@ The following table lists all the attributes that can be used with the
|`index_name` |The name of the field that will be stored in the index.
Defaults to the property/field name.
|`store` |Set to `true` to store actual field in the index, `false` to not
|`store` |Set to `true` to actually store the field in the index, `false` to not
store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).
@ -208,8 +208,8 @@ store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).
|`index` |Set to `no` if the value should not be indexed. Setting to
`no` disables `include_in_all`. If set to `no` the field can be stored
in `_source`, have `include_in_all` enabled, or `store` should be set to
`no` disables `include_in_all`. If set to `no` the field should either be stored
in `_source`, have `include_in_all` enabled, or have `store` set to
`true` for this to be useful.
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
@ -317,8 +317,8 @@ store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).
|`index` |Set to `no` if the value should not be indexed. Setting to
`no` disables `include_in_all`. If set to `no` the field can be stored
in `_source`, have `include_in_all` enabled, or `store` should be set to
`no` disables `include_in_all`. If set to `no` the field should either be stored
in `_source`, have `include_in_all` enabled, or have `store` set to
`true` for this to be useful.
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
@ -380,8 +380,8 @@ store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).
|`index` |Set to `no` if the value should not be indexed. Setting to
`no` disables `include_in_all`. If set to `no` the field can be stored
in `_source`, have `include_in_all` enabled, or `store` should be set to
`no` disables `include_in_all`. If set to `no` the field should either be stored
in `_source`, have `include_in_all` enabled, or have `store` set to
`true` for this to be useful.
|`boost` |The boost value. Defaults to `1.0`.
@ -488,13 +488,13 @@ Elasticsearch has several builtin formats:
contained in a very low number of documents.
`pulsing`::
A postings format in-lines the posting lists for very low
A postings format that in-lines the posting lists for very low
frequent terms in the term dictionary. This is useful to improve lookup
performance for low-frequency terms.
`bloom_default`::
A postings format that uses a bloom filter to
improve term lookup performance. This is useful for primarily keys or
improve term lookup performance. This is useful for primary keys or
fields that are used as a delete key.
`bloom_pulsing`::
@ -579,10 +579,8 @@ custom doc values formats. See
==== Similarity
Elasticsearch allows you to configure a similarity (scoring algorithm) per field.
Allowing users a simpler extension beyond the usual TF/IDF algorithm. As
part of this, new algorithms have been added including BM25. Also as
part of the changes, it is now possible to define a Similarity per
field, giving even greater control over scoring.
The `similarity` setting provides a simple way of choosing a similarity algorithm
other than the default TF/IDF, such as `BM25`.
You can configure similarities via the
<<index-modules-similarity,similarity module>>
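
For example, a minimal sketch of selecting `BM25` for a single field (the
type and field names are illustrative):

[source,js]
--------------------------------------------------
{
    "book" : {
        "properties" : {
            "title" : {"type" : "string", "similarity" : "BM25"}
        }
    }
}
--------------------------------------------------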

View File

@ -139,21 +139,21 @@ number of terms that will be indexed depends on the `geohash_precision`.
Defaults to `false`. *Note*: This option implicitly enables `geohash`.
|`validate` |Set to `true` to reject geo points with invalid latitude or
longitude (default is `false`) *Note*: Validation only works when
longitude (default is `false`). *Note*: Validation only works when
normalization has been disabled.
|`validate_lat` |Set to `true` to reject geo points with an invalid
latitude
latitude.
|`validate_lon` |Set to `true` to reject geo points with an invalid
longitude
longitude.
|`normalize` |Set to `true` to normalize latitude and longitude (default
is `true`)
is `true`).
|`normalize_lat` |Set to `true` to normalize latitude
|`normalize_lat` |Set to `true` to normalize latitude.
|`normalize_lon` |Set to `true` to normalize longitude
|`normalize_lon` |Set to `true` to normalize longitude.
|=======================================================================
[float]

View File

@ -88,8 +88,8 @@ configured it may return some false positives or false negatives for
certain queries. To mitigate this, it is important to select an
appropriate value for the tree_levels parameter and to adjust
expectations accordingly. For example, a point may be near the border of
a particular grid cell. And may not match a query that only matches the
cell right next to it even though the shape is very close to the point.
a particular grid cell and may thus not match a query that only matches the
cell right next to it -- even though the shape is very close to the point.
[float]
===== Example
@ -116,8 +116,8 @@ this into a tree_levels setting of 26.
Elasticsearch uses the paths in the prefix tree as terms in the index
and in queries. The higher the level (and thus the precision), the
more terms are generated. Both calculating the terms, keeping them in
memory, and storing them has a price of course. Especially with higher
more terms are generated. Of course, calculating the terms, keeping them in
memory, and storing them on disk all have a price. Especially with higher
tree levels, indices can become extremely large even with a modest
amount of data. Additionally, the size of the features also matters.
Big, complex polygons can take up a lot of space at higher tree levels.
@ -174,7 +174,7 @@ for upper left and lower right points of the shape:
===== http://www.geojson.org/geojson-spec.html#id4[Polygon]
A polygon is defined by a list of a list of points. The first and last
points in each list must be the same (the polygon must be closed).
points in each (outer) list must be the same (the polygon must be closed).
[source,js]
--------------------------------------------------
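{
    "location" : {
        "type" : "polygon",
        "coordinates" : [
            [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]
        ]
    }
}
--------------------------------------------------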

View File

@ -134,7 +134,7 @@ example, if we added age and its value is a number, then it can't be
treated as a string.
The `dynamic` parameter can also be set to `strict`, meaning that not
only new fields will not be introduced into the mapping, parsing
only will new fields not be introduced into the mapping, but also parsing
(indexing) docs with such new fields will fail.
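
For example, a sketch of a type mapping with `dynamic` set to `strict`
(the type and field names are illustrative):

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "dynamic" : "strict",
        "properties" : {
            "message" : {"type" : "string"}
        }
    }
}
--------------------------------------------------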
[float]
@ -173,6 +173,6 @@ In the above, `name` and its content will not be indexed at all.
==== include_in_all
`include_in_all` can be set on the `object` type level. When set, it
propagates down to all the inner mapping defined within the `object`
propagates down to all the inner mappings defined within the `object`
that do not explicitly set it.

View File

@ -90,7 +90,7 @@ date fields, not for `date` fields that you specify in your mapping.
[float]
==== date_detection
Allows to disable automatic date type detection (a new field introduced
Allows to disable automatic date type detection (if a new field is introduced
and matches the provided format), for example:
[source,js]
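--------------------------------------------------
{
    "tweet" : {
        "date_detection" : false
    }
}
--------------------------------------------------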

View File

@ -123,7 +123,7 @@ specific attributes.
For example, lets say we have an awareness attribute called `zone`, and
we know we are going to have two zones, `zone1` and `zone2`. Here is how
we can force awareness one a node:
we can force awareness on a node:
[source,js]
-------------------------------------------------------------------
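cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone1,zone2
-------------------------------------------------------------------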
@ -153,7 +153,7 @@ The settings can be updated using the <<cluster-update-settings,cluster update s
[[allocation-filtering]]
=== Shard Allocation Filtering
Allow to control allocation if indices on nodes based on include/exclude
Allow to control allocation of indices on nodes based on include/exclude
filters. The filters can be set both on the index level and on the
cluster level. Let's start with an example of setting it on the cluster
level:
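
For illustration, a sketch of excluding a node by IP at the cluster level
(the address is illustrative):

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
    }
}'
--------------------------------------------------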

View File

@ -38,7 +38,7 @@ Accept-Encoding). Defaults to `false`.
Defaults to `6`.
|=======================================================================
It also shares the uses the common
It also uses the common
<<modules-network,network settings>>.
[float]

View File

@ -16,8 +16,8 @@ The `indices.memory.index_buffer_size` accepts either a percentage or a
byte size value. It defaults to `10%`, meaning that `10%` of the total
memory allocated to a node will be used as the indexing buffer size.
This amount is then divided between all the different shards. Also, if
percentage is used, allow to set `min_index_buffer_size` (defaults to
`48mb`) and `max_index_buffer_size` which by default is unbounded.
percentage is used, it is possible to set `min_index_buffer_size` (defaults to
`48mb`) and `max_index_buffer_size` (defaults to unbounded).
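
For example, an illustrative sketch in `elasticsearch.yml` (the values are
arbitrary):

[source,js]
--------------------------------------------------
indices.memory.index_buffer_size: 20%
indices.memory.min_index_buffer_size: 96mb
--------------------------------------------------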
The `indices.memory.min_shard_index_buffer_size` allows to set a hard
lower limit for the memory allocated per shard for its own indexing
@ -27,7 +27,7 @@ buffer. It defaults to `4mb`.
[[indices-ttl]]
=== TTL interval
You can dynamically set the `indices.ttl.interval` allows to set how
You can dynamically set the `indices.ttl.interval`, which allows to set how
often expired documents will be automatically deleted. The default value
is 60s.
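
For example, a sketch of changing the interval with the cluster update
settings API (the value is illustrative):

[source,js]
--------------------------------------------------
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "transient" : {"indices.ttl.interval" : "300s"}
}'
--------------------------------------------------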
@ -40,7 +40,7 @@ See also <<mapping-ttl-field>>.
[[recovery]]
=== Recovery
The following settings can be set to manage recovery policy:
The following settings can be set to manage the recovery policy:
[horizontal]
`indices.recovery.concurrent_streams`::
@ -65,7 +65,7 @@ The following settings can be set to manage recovery policy:
[[throttling]]
=== Store level throttling
The following settings can be set to control store throttling:
The following settings can be set to control the store throttling:
[horizontal]
`indices.store.throttle.type`::

View File

@ -58,7 +58,7 @@ The following are the settings the can be configured for memcached:
|`memcached.port` |A bind port range. Defaults to `11211-11311`.
|===============================================================
It also shares the uses the common
It also uses the common
<<modules-network,network settings>>.
[float]

View File

@ -81,7 +81,7 @@ to `false`.
[float]
=== Native (Java) Scripts
Even though `mvel` is pretty fast, allow to register native Java based
Even though `mvel` is pretty fast, this allows to register native Java based
scripts for faster execution.
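
As a sketch, such a script could be registered in `elasticsearch.yml` (the
script name and factory class here are hypothetical):

[source,js]
--------------------------------------------------
script.native.my_script.type: org.example.MyNativeScriptFactory
--------------------------------------------------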
In order to allow for scripts, the `NativeScriptFactory` needs to be
@ -174,7 +174,7 @@ of this geo point field from the provided geohash.
[float]
=== Stored Fields
Stored fields can also be accessed when executed a script. Note, they
Stored fields can also be accessed when executing a script. Note, they
are much slower to access compared with document fields, but are not
loaded into memory. They can be simply accessed using
`_fields['my_field_name'].value` or `_fields['my_field_name'].values`.

View File

@ -11,8 +11,8 @@ The transport mechanism is completely asynchronous in nature, meaning
that there is no blocking thread waiting for a response. The benefit of
using asynchronous communication is first solving the
http://en.wikipedia.org/wiki/C10k_problem[C10k problem], as well as
being the idle solution for scatter (broadcast) / gather operations such
as search in Elasticsearch.
being the ideal solution for scatter (broadcast) / gather operations such
as search in Elasticsearch.
[float]
=== TCP Transport
@ -38,7 +38,7 @@ time setting format). Defaults to `30s`.
between all nodes. Defaults to `false`.
|=======================================================================
It also shares the uses the common
It also uses the common
<<modules-network,network settings>>.
[float]