OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-04-01 04:48:28 +00:00

Author	SHA1	Message	Date
Adrien Grand	5821fa042c	Cardinality aggregation. This aggregation computes unique term counts using the hyperloglog++ algorithm which uses linear counting to estimate low cardinalities and hyperloglog on higher cardinalities. Since this algorithm works on hashes, it is useful for high-cardinality fields to store the hash of values directly in the index, which is the purpose of the new `murmur3` field type. This is less necessary on low-cardinality string fields because the aggregator is smart enough to only compute the hash once per unique value per segment thanks to ordinals, or on numeric fields since hashing them is very fast. Close #5426	2014-03-13 19:19:56 +01:00
Simon Willnauer	da707b6f32	Remove `omit_term_freq_and_positions` for new indices. `omit_term_freq_and_positions` was deprecated in `0.20` and is no longer documented. We should reject indices that are created with this option in the future. Closes #4722	2014-01-17 14:46:48 +01:00
Lee Hinman	2341825358	Make type wrapping optional for PUT Mapping API request. Put mapping now supports either of these formats: `POST foo/doc/_mapping { "doc": { "_routing": {"required": true}, "properties": { "body": {"type": "string"} } } }` or `POST foo/doc/_mapping { "_routing": {"required": true}, "properties": { "body": {"type": "string"} } }`. Closes #4483	2014-01-13 09:26:09 -07:00
Martijn van Groningen	943b62634c	Replaced the multi-field type in favour of the multi fields option that can be set on any core field. When upgrading to ES 1.0, existing mappings with a `multi_field` type are automatically replaced by a core field with the new `fields` option. If a `multi_field` typed field doesn't have a main / default field, a default field will be chosen for the multi fields syntax. The new main field type will be equal to the type of the first field configured under the `multi_field`, or `string` if no fields have been configured for the `multi_field` field. In both cases the default field will not be indexed (`index=no` is set on the default field). If a `multi_field` typed field has a default field, that field will replace the `multi_field` typed field. Closes to #4521	2014-01-13 09:21:53 +01:00
Simon Willnauer	10ec2e948a	Fix ASL Header in source files to reflect s/ElasticSearch/Elasticsearch. This commit also removes the license to Shay Banon in favor of solely Elasticsearch. Thanks Shay for this awesome product, you took it far! Closes #4636	2014-01-07 11:22:01 +01:00
Nik Everett	7690b40ec6	Allow string fields to store token counts. To use this one you send a string to a field of type 'token_count'. This makes the most sense with a multi-field.	2013-12-03 09:39:32 +01:00
Shay Banon	6f90a3e39a	allow to parse directly the compressed mapping	2013-11-26 09:48:33 +01:00
Shay Banon	021aa09614	External method to set rootTypeParsers in DocumentMapperParser is incorrect, fixes #4113	2013-11-07 01:06:57 +01:00
Boaz Leskes	0ef2493b2c	Throw an exception if a type's mapping root node is not equal to the type in question. Also, fix all the problems it brought up in tests. Removed OverrideTypeMappingTests as it is no longer relevant. Better naming for the default percolator mapping and change its content to use `_default_` as root node. Closes #4038	2013-11-05 11:54:25 +01:00
Adrien Grand	4fa8f6f61f	Doc values integration. This commit allows for using Lucene doc values as a backend for field data, moving the cost of building field data from the refresh operation to indexing. In addition, Lucene doc values can be stored on disk (partially, or even entirely), so that memory management is done at the operating system level (file-system cache) instead of the JVM, avoiding long pauses during major collections due to large heaps. So far doc values are supported on numeric types and non-analyzed strings (index:no or index:not_analyzed). Under the hood, it uses SORTED_SET doc values which is the only type to support multi-valued fields. Since the field data API set is a bit wider than the doc values API set, some operations are not supported: - field data filtering: this will fail if doc values are enabled, - field data cache clearing, even for memory-based doc values formats, - getting the memory usage for a specific field, - knowing whether a field is actually multi-valued. This commit also allows for configuring doc-values formats on a per-field basis similarly to postings formats. In particular the doc values format of the _version field can be configured through its own field mapper (it used to be handled in UidFieldMapper previously). Closes #3806	2013-10-09 16:34:30 +02:00
Simon Willnauer	f2dc4f810c	Added tests for malformed mappings with no root object. This commit also makes the error message more consistent with other exception messages in the DocumentMapperParser.	2013-08-07 14:01:32 +02:00
Manuel Bernhardt	27518b5e41	Improved error message when the mapping document is malformed	2013-08-07 13:41:49 +02:00
Alexander Reelsen	4f4f3a2b10	Added prefix suggestions based on AnalyzingSuggester. This commit introduces near realtime suggestions. For more information about its usage, refer to GitHub issue #3376. From the implementation point of view, a custom AnalyzingSuggester is used in combination with a custom postings format (which is not exposed to the user). Closes #3376	2013-08-01 08:44:09 +02:00
Shay Banon	1d63ff64c7	simplify parsing code	2013-06-11 13:19:54 +02:00
Chris Male	9e2469e04f	Add per-field Similarity support	2012-11-21 12:44:59 +13:00
Martijn van Groningen	fd5bd102aa	lucene 4: Exposed Lucene's codec api
This feature adds the option to configure a `PostingsFormat` and assign it to a field in the mapping. This feature is very expert and in almost all cases Elasticsearch's defaults will suit your needs.

## Configuring a postingsformat per field

Several postings formats are configured by default and can be used in your mapping:

* `direct` - A codec that wraps the default postings format during write time, but loads the terms and postinglists directly into memory as raw arrays during read time. This postings format is exceptionally memory intensive, but can give a substantial increase in search performance.
* `memory` - A codec that loads and stores terms and postinglists in memory using a FST. Acts like a cached postingslist.
* `bloom_default` - Maintains a bloom filter for the indexed terms, which is stored to disk and builds on top of the `default` postings format. This postings format is useful for low document frequency terms and offers a fail fast for seeks to terms that don't exist.
* `bloom_pulsing` - Similar to the `bloom_default` postings format, but builds on top of the `pulsing` postings format.
* `default` - The default postings format. The default if none is specified.

On all fields it is possible to configure a `postings_format` attribute. Example mapping:

```
{
  "person" : {
    "properties" : {
      "second_person_id" : {"type" : "string", "postings_format" : "pulsing"}
    }
  }
}
```

## Configuring a custom postingsformat

It is possible to instantiate custom postingsformats. This can be specified via the index settings.

```
{
  "codec" : {
    "postings_format" : {
      "my_format" : {
        "type" : "pulsing40",
        "freq_cut_off" : "5"
      }
    }
  }
}
```

In the above example the `freq_cut_off` is set to 5 (defaults to 1). This tells the pulsing postings format to inline the postinglist of terms with a document frequency lower than or equal to 5 in the term dictionary.
Closes #2411	2012-11-14 23:54:29 +01:00
Shay Banon	6c3847b0a9	move spatial4j and jts to be optional dependencies allowing data and client nodes to work without them, disabling shapes if needed	2012-09-01 00:05:49 +02:00
Chris Male	bea4346f3a	Added GeoShape indexing and querying support	2012-08-13 13:44:29 +02:00
Shay Banon	bb0f5cf234	improve map builder to initialize the inner map with a map to build the data from	2012-05-20 19:50:47 +02:00
Shay Banon	acbd7b686a	Allow to customize quote analyzer to be used when quoting text in a query_string, closes #1931.	2012-05-10 11:51:51 +03:00
Shay Banon	6a71eab51f	finalize structure, tests pass	2011-12-06 02:43:17 +02:00
Shay Banon	a8fd2d48b8	first cleanup phase, move to single src	2011-12-06 00:59:23 +02:00
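
The cardinality aggregation commit (5821fa042c) says that hyperloglog++ falls back to linear counting for low cardinalities. The snippet below is a minimal sketch of plain linear counting only, not the aggregation's actual code: the function name `linear_count`, the bitmap size `m`, and the use of SHA-1 in place of the `murmur3` hash are all assumptions made for illustration.

```python
import hashlib
import math


def linear_count(values, m=1024):
    """Estimate the number of distinct values with an m-slot bitmap.

    Hypothetical helper: SHA-1 stands in for murmur3, and m is an
    arbitrary illustrative size.
    """
    bitmap = [False] * m
    for value in values:
        # Hash the value and map the hash to one of the m slots.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        slot = int.from_bytes(digest[:8], "big") % m
        bitmap[slot] = True

    empty = bitmap.count(False)
    if empty == 0:
        # Every slot is set, so linear counting has no information left.
        return float("inf")
    # Linear counting estimate: m * ln(m / number_of_empty_slots).
    return m * math.log(m / empty)


# 300 inputs, but only 100 distinct values.
values = [f"user-{i % 100}" for i in range(300)]
estimate = linear_count(values)
```

Duplicates land in the same slot, so repeating a value never changes the estimate. The estimate degrades as the bitmap fills up, which is why an algorithm of this kind needs a different estimator at higher cardinalities.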

22 Commits
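
The upgrade rules described in commit 943b62634c (`multi_field` replaced by the `fields` option) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: `upgrade_multi_field` is a hypothetical helper, and it assumes that the default field is the sub-field whose name matches the `multi_field` itself.

```python
def upgrade_multi_field(name, mapping):
    """Convert a legacy `multi_field` mapping to a core field with `fields`.

    Hypothetical helper; assumes the default field is the sub-field
    named like the multi_field itself.
    """
    sub_fields = dict(mapping.get("fields", {}))

    if name in sub_fields:
        # A default field exists: it replaces the multi_field typed field.
        main = dict(sub_fields.pop(name))
    else:
        # No default field: take the type of the first configured field,
        # or `string` if none are configured, and do not index it.
        first = next(iter(sub_fields.values()), {})
        main = {"type": first.get("type", "string"), "index": "no"}

    if sub_fields:
        # The remaining sub-fields move under the new `fields` option.
        main["fields"] = sub_fields
    return main


legacy = {
    "type": "multi_field",
    "fields": {
        "title": {"type": "string"},
        "raw": {"type": "string", "index": "not_analyzed"},
    },
}
upgraded = upgrade_multi_field("title", legacy)
```

With the `legacy` mapping above, `upgraded` is a `string` field carrying `raw` under `fields`. If `title` were missing from the legacy `fields`, the main field would take the type of `raw` and get `"index": "no"`.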