Commit Graph

438 Commits

Author SHA1 Message Date
Alexander Reelsen 669d72e47a Fix dense/sparse vector limit documentation (#40852)
The documentation stated a wrong limit of dense/sparse vector sizes.
This was changed in #40597 but the documentation was not fixed.
2019-04-05 09:35:30 +02:00
Jim Ferenczi 4c8c4e5951 remove experimental label from search_as_you_type documentation (#40744) 2019-04-03 09:42:20 +02:00
Andy Bristol 23395a9b9f
search as you type fieldmapper (#35600)
Adds the search_as_you_type field type that acts like a text field optimized
for as-you-type search completion. It creates a couple subfields that analyze
the indexed terms as shingles, against which full terms are queried, and a
prefix subfield that analyze terms as the largest shingle size used and
edge-ngrams, against which partial terms are queried

Adds a match_bool_prefix query type that creates a boolean clause of a term
query for each term except the last, for which a boolean clause with a prefix
query is created.

The match_bool_prefix query is the recommended way of querying a search as you
type field, which will boil down to term queries for each shingle of the input
text on the appropriate shingle field, and the final (possibly partial) term
as a term query on the prefix field. This field type also supports phrase and
phrase prefix queries however
2019-03-27 13:29:13 -07:00
Julie Tibshirani 6ffa8a040d Document the limitation around field aliases and percolator. (#40073)
Currently if a field alias is updated, any percolator queries that contain the
alias will still refer to its old target. This PR documents the issue while we
look into addressing it.

Relates to #37212.
2019-03-15 10:54:09 -07:00
Mayya Sharipova e80284231d
Backport distance functions vectors (#39330)
Distance functions for dense and sparse vectors

Backport for #37947, #39313
2019-02-23 11:52:43 -05:00
Alexander Reelsen 8e5e48319e
Add documentation about breaking java time changes (#38886)
In addition remove joda time mentions across the docs, make 
sure links are updated to java time javadocs.

Forward port of #38720
2019-02-14 10:18:12 +01:00
Julie Tibshirani 4ad4bc7f5f Update the removal of types docs with the new 6.7 behavior. (#38869)
Follow-up to #38825, where we made a tweak to the deprecation behavior.
2019-02-13 14:45:17 -08:00
Alexander Reelsen 87f3579125
Add nanosecond field mapper (#37755)
This adds a dedicated field mapper that supports nanosecond resolution -
at the price of a reduced date range.

When using the date field mapper, the time is stored as milliseconds since the epoch
in a long in lucene. This field mapper stores the time in nanoseconds
since the epoch - which means its range is much smaller, ranging roughly from
1970 to 2262.

Note that aggregations will still be in milliseconds.
However docvalue fields will have full nanosecond resolution

Relates #27330
2019-02-04 11:31:16 +01:00
Nick Knize 603cdf40f1
Update geo_shape docs to include unsupported features (#38138)
There are a two major features that are not yet supported by BKD Backed geo_shape: MultiPoint queries, and CONTAINS relation. It is important we are explicitly clear in the documentation that using the new approach may not work for users that depend on these features. This commit adds an IMPORTANT NOTE section to geo_shape docs that explicitly highlights these missing features and what should be done if they are an absolute necessity.
2019-02-01 10:41:41 -06:00
Julie Tibshirani 91b79ebed4
Update 'removal of types' docs to reflect the new plan. (#38003) 2019-01-31 10:26:24 -08:00
Adrien Grand 3332e332c7
Fix typo in docs. (#38018)
This has been introduced in #37871.
2019-01-31 09:51:03 +01:00
Adrien Grand b63b50b945
Give precedence to index creation when mixing typed templates with typeless index creation and vice-versa. (#37871)
Currently if you mix typed templates and typeless index creation or typeless
templates and typed index creation then you will end up with an error because
Elasticsearch tries to create an index that has multiple types: `_doc` and
the explicit type name that you used.

This commit proposes to give precedence to the index creation call so that
the type from the template will be ignored if the index creation call is
typeless while the template is typed, and the type from the index creation
call will be used if there is a typeless template.

This is consistent with the fact that index creation already "wins" if a field
is defined differently in the index creation call and in a template: the
definition from the index creation call is used in such cases.

Closes #37773
2019-01-30 10:28:24 +01:00
Mayya Sharipova a30ce6a00a
Rename feature, feature_vector and feature_query (#37794)
Ranaming as follows:
feature -> rank_feature
feature_vector -> rank_features
feature query -> rank_feature query

Ranaming is done to distinguish from other vector types.

Closes #36723
2019-01-24 19:18:48 -05:00
Christoph Büscher b3f9becf5f
Modify removal_of_types.asciidoc (#37648)
After switching the default behaviour of "include_type_name" to "false" in 7.0,
some parts of the types removal documentation can be adapted as well.
2019-01-23 09:48:00 +01:00
Christoph Büscher 34f2d2ec91
Remove remaining occurances of "include_type_name=true" in docs (#37646) 2019-01-22 15:13:52 +01:00
Christoph Büscher 25aac4f77f
Remove `include_type_name` in asciidoc where possible (#37568)
The "include_type_name" parameter was temporarily introduced in #37285 to facilitate
moving the default parameter setting to "false" in many places in the documentation
code snippets. Most of the places can simply be reverted without causing errors.
In this change I looked for asciidoc files that contained the
"include_type_name=true" addition when creating new indices but didn't look
likey they made use of the "_doc" type for mappings. This is mostly the case
e.g. in the analysis docs where index creating often only contains settings. I
manually corrected the use of types in some places where the docs still used an
explicit type name and not the dummy "_doc" type.
2019-01-18 09:34:11 +01:00
Julie Tibshirani 36a3b84fc9
Update the default for include_type_name to false. (#37285)
* Default include_type_name to false for get and put mappings.

* Default include_type_name to false for get field mappings.

* Add a constant for the default include_type_name value.

* Default include_type_name to false for get and put index templates.

* Default include_type_name to false for create index.

* Update create index calls in REST documentation to use include_type_name=true.

* Some minor clean-ups around the get index API.

* In REST tests, use include_type_name=true by default for index creation.

* Make sure to use 'expression == false'.

* Clarify the different IndexTemplateMetaData toXContent methods.

* Fix FullClusterRestartIT#testSnapshotRestore.

* Fix the ml_anomalies_default_mappings test.

* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.

We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.

This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.

* Fix more REST tests.

* Improve some wording in the create index documentation.

* Add a note about types removal in the create index docs.

* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.

* Make sure to mention include_type_name in the REST docs for affected APIs.

* Make sure to use 'expression == false' in FullClusterRestartIT.

* Mention include_type_name in the REST templates docs.
2019-01-14 13:08:01 -08:00
Peter Dyson 96cfa000a5
[DOCS] copy_to only works one level deep, not recursively (#37249) 2019-01-13 16:24:34 +10:00
Josh Soref edb48321ba [DOCS] Various spelling corrections (#37046) 2019-01-07 14:44:12 +01:00
Nick Knize ec0dc2c0e9
[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#36751)
* [Geo] Expose BKDBackedGeoShapes as new VECTOR strategy

This commit exposes lucene's LatLonShape field as a new
strategy in GeoShapeFieldMapper. To use the new indexing
approach, strategy should be set to "vector" in the
geo_shape field mapper. If the tree parameter is set
the mapper will throw an IAE. Note the following:

When using vector strategy:

* geo_shape query does not support querying by POINT,
MULTIPOINT, or GEOMETRYCOLLECTION.
* LINESTRING and MULTILINESTRING queries do not support
WITHIN relation.
* CONTAINS relation is not supported.
* The tree, precision, tree_levels, distance_error_pct,
and points_only parameters will not throw an exception
but they have no effect and will be marked as
deprecated..

All other features are supported.

* revert change to PercolatorFieldMapper

* fix ExistsQuery for geo_shape vector strategy

* add deprecation logging for tree, precision, tree_levels, distance_error_pct, and points_only

* initial update to geoshape docs, including mapping migration updates

* initial support for GeoCollection queries

* fix docs and javadoc errors

* clean up geocollection queries

* set deprecated mapping tests to NOTCONSOLE

* fix geo-shape mapper asciidoc mapping and test warnings

* add support for point queries using LatLonShapeBoundingBoxQuery

* update GeoShapeQueryBuilderTests to include POINT queries for VECTOR strategy. Other comment cleanups

* add lucene geometry build testing to ShapeBuilder tests

* remove deprecated prefix tree mapping from geo-shape.asciidoc

* refactor GeoShapeFieldMapper into LegacyGeoShapeFieldMapper and GeoShapeFieldMapper

Both classes derive from BaseGeoShapeFieldMapper that provides shared parameters:
coerce, ignoreMalformed, ignore_z_value, orientation.

* update docs to remove vector strategy

* fix GeometryCollectionBuilder#buildLucene to return the object created by the shape builder

* fix LineLength failure in GeoJsonShapeParserTests

* ShapeMapper refactor changes from PR feedback

* fix typo in geo-shape.asciidoc

* ignore circle test in docs

* update indexing-approach ref to geoshape-indexing-approach

* add warnings check for LegacyGeoShapeFieldMapper to AbstractBuilderTestCase

* fix deprecatedParameters setup

* update indexing approach

* fixing unexpected warnings failures

* move orientation back to field type

* remove if in LegacyGeoShapeFieldMapper#doXContent. Fix GeoShapeFieldMapper to work with double array as a point

* fix indexing-approach link in circle section of geoshape docs

* add strategy to deprecation warnings check

* fix test failures

* fix typo in QueryStringQueryBuilderTests

* fix total hits to totalHits().value

* fix version number

* add version check to BaseGeoShapeFieldMapper

* fix line length!

* revert version check in BaseGeoShapeFieldMapper

* Fix serialization of mappings of legacy shapes.
2018-12-18 09:54:56 -06:00
Nicholas Knize 96d279ed83 Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)"
This reverts commit 5bc7822562.
2018-12-17 20:09:46 -06:00
Nick Knize 5bc7822562
[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)
This commit  exposes lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new 
indexing approach, simply set "type" : "geo_shape" in 
the mappings without setting any of the strategy, precision, 
tree_levels, or distance_error_pct parameters. Note the 
following when using the new indexing approach:

* geo_shape query does not support querying by 
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not 
yet support WITHIN relation.
* CONTAINS relation is not yet supported.
The tree, precision, tree_levels, distance_error_pct, 
and points_only parameters are deprecated.
2018-12-17 14:38:14 -06:00
Mayya Sharipova bda03163e7 Make vector fields experimental feature
Relates to #33022
2018-12-13 07:17:52 -05:00
Mayya Sharipova b5d532f9e3
Vector field (#33022)
1. Dense vector

PUT dindex
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_vector": {
          "type": "dense_vector"
        },
        "my_text" : {
          "type" : "keyword"
        }
      }
    }
  }
}

PUT dinex/_doc/1
{
  "my_text" : "text1",
  "my_vector" : [ 0.5, 10, 6 ]
}

2. Sparse vector

PUT sindex
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_vector": {
          "type": "sparse_vector"
        },
        "my_text" : {
          "type" : "keyword"
        }
      }
    }
  }
}

PUT sindex/_doc/1
{
  "my_text" : "text1",
  "my_vector" : {"1": 0.5, "99": -0.5,  "5": 1}
}
2018-12-12 21:20:53 -05:00
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00
Alan Woodward 73ceaad03a
Update to lucene-8.0.0-snapshot-c78429a554 (#36212)
Includes:

* A fix for a bug in Intervals.or() (https://issues.apache.org/jira/browse/LUCENE-8586)
* The ability to disable offset mangling in WordDelimiterGraphFilter
        (https://issues.apache.org/jira/browse/LUCENE-8509)
* BM25Similarity no longer multiplies scores by k1 + 1
2018-12-05 12:43:56 +00:00
Guido Lena Cota 89fae42833 (Minor) Fix some typos (#36180) 2018-12-04 11:10:30 +01:00
Peter Dyson 1f25a0bd31 [Docs] Add example for updating meta field (#35893) 2018-11-28 12:04:57 +01:00
Alan Woodward be8097f9ce
Improve docs for index_prefixes option (#35778)
This commit moves the documentation and examples for the `index_prefixes`
option on text fields to its own file, to bring it in line with other mapping 
parameters, and expands a bit on both.
2018-11-22 09:20:46 +00:00
Alan Woodward 26cc8ff8c3
Add pointer to the index-phrases option in shingle filter docs (#35771)
We should be discouraging the use of shingle filters and instead pointing users to the
index-phrases parameter on text fields.
2018-11-21 15:27:11 +00:00
Takuro Wada 7b2d547e8e [Docs] Delete inappropriate backtick (#35722) 2018-11-20 10:08:32 +01:00
Julie Tibshirani ec53288fc0
Remove include_type_name from the relevant APIs. (#35192)
We've decided that the bulk, delete, get, index, update, and search APIs should not
contain this request parameter, and we will instead accept both typed and typeless calls.
2018-11-06 14:33:48 -08:00
Julie Tibshirani 70da490f34
Remove some documentation that only makes sense with multiple types. (#35066)
* Remove a tip about ignore_above that only makes sense with multiple types.
* Remove a line from the percolator documentation that refers to multiple types.
2018-10-30 10:19:12 -07:00
Julie Tibshirani f854330e06
Make sure to use the type _doc in the REST documentation. (#34662)
* Replace custom type names with _doc in REST examples.
* Avoid using two mapping types in the percolator docs.
* Rename doc -> _doc in the main repository README.
* Also replace some custom type names in the HLRC docs.
2018-10-22 11:54:04 -07:00
Igor Motov 94bde37bcf
Geo: Don't flip longitude of envelopes crossing dateline (#34535)
When a envelope that crosses the dateline is specified as a part of
geo_shape query is parsed it shouldn't have its left and right points
flipped.

Fixes #34418
2018-10-19 13:53:54 -04:00
Daniel Mitterdorfer 02fb5aa4ec
Remove leftover doc about format being updatable
With this commit we remove a leftover in the docs about the `format`
field being updatable. This is not true since we removed support for
updates in #25285.

Closes #33986
Relates #25285
Relates #34006
2018-09-25 10:13:23 +02:00
markharwood 2fa09f062e
New plugin - Annotated_text field type (#30364)
New plugin for annotated_text field type.
Largely a copy of `text` field type but adds ability to include markdown-like syntax in the text.
The “AnnotatedText” class parses text+markup and converts into plain text and AnnotationTokens.
The annotation token values are injected unchanged alongside the regular text tokens to provide a
form of additional indexed overlay useful in positional searches and highlighting.
Annotated_text fields do not support fielddata as we want to phase this out.
Also includes a new "annotated" highlighter type that retains annotations and merges in search
hits as additional annotation markup.

Closes #29467
2018-09-18 10:25:27 +01:00
Jim Ferenczi 7ad71f906a
Upgrade to a Lucene 8 snapshot (#33310)
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309

Closes #32899
2018-09-06 14:42:06 +02:00
Pablo Musa a88f8789a0
Highlight that index_phrases only works if no slop is used (#33303)
Highlight that `index_phrases` only works if no slop is used at query time.
2018-08-31 14:48:55 +02:00
Luca Cavanna 393eec1482
Set maxScore for empty TopDocs to Nan rather than 0 (#32938)
We used to set `maxScore` to `0` within `TopDocs` in situations where there is really no score as the size was set to `0` and scores were not even tracked. In such scenarios, `Float.Nan` is more appropriate, which gets converted to `max_score: null` on the REST layer. That's also more consistent with lucene which set `maxScore` to `Float.Nan` when merging empty `TopDocs` (see `TopDocs#merge`).
2018-08-22 17:23:54 +02:00
Dimitrios Liappis abb4c183f1
Clarify ignore_above behavior with arrays of strings
Currently docs don't explain how `ignore_above` behaves with arrays of
strings.

Clarify how `ignore_above` applies for arrays of strings and
also note that all string(s) will still be visible in the
`_source` field.

Relates #33057
2018-08-22 18:18:30 +03:00
Julie Tibshirani 815c56b677
Fix an inaccuracy in the dynamic templates documentation. (#32890) 2018-08-20 11:00:11 -07:00
Julie Tibshirani 0f0068b91c
Ensure that field aliases cannot be used in multi-fields. (#32219) 2018-07-20 00:18:54 -07:00
Julie Tibshirani 15ff3da653
Add support for field aliases. (#32172)
* Add basic support for field aliases in index mappings. (#31287)
* Allow for aliases when fetching stored fields. (#31411)
* Add tests around accessing field aliases in scripts. (#31417)
* Add documentation around field aliases. (#31538)
* Add validation for field alias mappings. (#31518)
* Return both concrete fields and aliases in DocumentFieldMappers#getMapper. (#31671)
* Make sure that field-level security is enforced when using field aliases. (#31807)
* Add more comprehensive tests for field aliases in queries + aggregations. (#31565)
* Remove the deprecated method DocumentFieldMappers#getFieldMapper. (#32148)
2018-07-18 09:33:09 -07:00
Nik Everett 0522c6644d Docs: Remove duplicate test setup
The range docs had an introductory section that described how to set up
and index *and* a test setup section in `docs/build.gradle` that
duplicated that section. This is bad because these section can (and do)
drift from one another. This change removes the setup in build.gradle
and marks the introductor snippet with `// TESTSETUP` so it is used on
all the snippets.
2018-06-28 10:59:35 -04:00
Peter Dyson e7a7b9689d [Docs] Mention ip_range datatypes on ip type page (#31416)
A link to the ip_range datatype page provides a way for newer users to know
it exists if they land directly on the ip datatype page first via a search.
2018-06-20 13:04:03 +02:00
Julie Tibshirani 3f5ebb862d
Clarify that IP range data can be specified in CIDR notation. (#31374) 2018-06-18 08:21:41 -07:00
David Turner 6ad7217656
Remove reference to multiple fields with one name (#31127)
If there is only one type per index then each field's name is unique.
2018-06-07 12:38:57 +01:00
Rafał Bigaj 749d39061a [Docs] Correct minor typos in templates.asciidoc (#31167) 2018-06-07 10:44:57 +02:00
Adrien Grand 458bca11bc
Add a `feature_vector` field. (#31102)
This field is similar to the `feature` field but is better suited to index
sparse feature vectors. A use-case for this field could be to record topics
associated with every documents alongside a metric that quantifies how well
the topic is connected to this document, and then boost queries based on the
topics that the logged user is interested in.

Relates #27552
2018-06-07 10:05:37 +02:00