5202 Commits

Author SHA1 Message Date
Adrien Grand cb34cccc1e Fix field number attribution to _version.
IndexUpgraderMergePolicy assumed that field numbers were dense and that
fieldInfos.size() was a free field number. This can however be wrong for a
segment which doesn't have one or more fields that some older segments have.

Close #3237
2013-06-26 16:57:13 +02:00
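The failure mode described in this commit can be illustrated with a small sketch. This is not the IndexUpgraderMergePolicy code; `next_free_field_number` and the sample field numbers are hypothetical, and taking the maximum plus one is just one way to avoid the collision.

```python
def next_free_field_number(used_numbers):
    """Return a field number that cannot collide with any number already in use."""
    return max(used_numbers, default=-1) + 1


# Hypothetical segment that lacks field 1, which some older segment had,
# so the field numbers in use are sparse rather than dense.
used = {0, 2, 3}

# Assuming density: the count of fields is taken as a free number.
naive = len(used)

# Not assuming density: go one past the highest number in use.
safe = next_free_field_number(used)
```

With sparse numbers the count of fields (`naive`) lands on a number that is already taken, while `safe` does not.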
Adrien Grand 1954f770a1 Put Eclipse settings in the root directory.
This ensures that settings are taken into account whichever means is used to
import the project into Eclipse (manual import, m2e, mvn eclipse:eclipse, ...).
2013-06-26 16:51:47 +02:00
Alexander Reelsen 7e55354f4a Added support for PatternReplaceCharFilter
PatternReplaceCharFilter allows the use of a regex to manipulate the characters in a string before analysis

Closes #3197
2013-06-26 15:25:18 +02:00
Florian Schilling 42b3f06a32 fixed ShapeFetchService. closes #3242 2013-06-26 12:55:56 +02:00
Shay Banon c3ef49f5b0 add 0.90.3 2013-06-26 09:02:54 +01:00
Shay Banon 1b870774b6 Terms Filter Lookup: Allow to disable caching of lookup terms
closes #3241
2013-06-26 08:45:57 +01:00
Shay Banon 991b5abdf4 Terms Filter Lookup: When no cache key is defined, use terms values as key to filter cache
closes #3240
2013-06-26 08:34:25 +01:00
Martijn van Groningen 64d42782a9 No need to fetch the freq for term filter 2013-06-25 22:40:59 +02:00
Boaz Leskes 99cb26fa02 A small doc change to reflect StreamOutput.writeVInt() does support negative numbers but not efficiently. StreamOutput.writeVLong & StreamInput.readVLong really support it.
This is to better describe the current situation. We probably want to normalize these methods and potentially add optimization/support for -1 values.
2013-06-25 14:13:44 +02:00
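Why a negative number is supported "but not efficiently" can be seen in a sketch of a typical 7-bit variable-length integer encoding. This assumes writeVInt works along these lines (7 payload bits per byte, high bit as a continuation flag, value treated as unsigned 32-bit); `write_vint` and `read_vint` are illustrative names, not the StreamOutput/StreamInput API.

```python
def write_vint(value):
    """Encode a 32-bit int using 7 bits per byte; the high bit marks continuation."""
    value &= 0xFFFFFFFF  # reinterpret as unsigned 32-bit, like an unsigned shift would
    out = bytearray()
    while value & ~0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def read_vint(data):
    """Decode bytes produced by write_vint back into a signed 32-bit int."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    if result >= 0x80000000:  # reinterpret as signed 32-bit
        result -= 0x100000000
    return result


small = write_vint(1)      # small positive values fit in a single byte
negative = write_vint(-1)  # the sign bit is set, so all five bytes are needed
```

Under these assumptions a negative value always sets the top bit and therefore always costs the maximum of five bytes, which is more than a fixed-width 4-byte int.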
Martijn van Groningen 4c0b10aec7 Made the minimum score only active when executing the main query and not during the context rewrite phase.
This fixes parent/child queries when using minimum_score.

Closes #3203
2013-06-25 13:38:10 +02:00
Florian Schilling 84fa9ead4d The `geohash_cell` filter now adopts the format of other geo-filters. The object field names automatically match the document field names. This invalidates the `field` parameter used in previous versions. The value of such a field is a `geo_point` value (all formats supported), which is internally translated to a geohash. Since those points always have maximum precision (level 12), a `precision` definition has been included. This precision can be defined either as the *length* of the geohash string or as a *distance*. A value without any unit is assumed to be a geohash length.
```
curl -XGET 'http://127.0.0.1:9200/locations/_search?pretty=true' -d '{
    "query": {
        "match_all":{}
    },
    "filter": {
        "geohash_cell": {
            "pin": {
                "lat": 13.4080,
                "lon": 52.5186
            },
            "precision": 3,
            "neighbors": true
        }
    }
}'
```
Closes #3229
2013-06-25 12:16:08 +02:00
Shay Banon d094042b08 Lookup Terms Filter ignores the routing parameter
fixes #3233
2013-06-25 11:54:09 +02:00
Shay Banon cbe18608ef Deleting or closing an index doesn't clean the memory properly
fixes #3232
2013-06-25 00:44:00 +02:00
Shay Banon b91cb8b779 properly set the set flag 2013-06-25 00:06:55 +02:00
Alexander Reelsen c561b1bbcf Added Arabic/PersianNormalizationFilters from Lucene 2013-06-24 22:09:53 +02:00
Shay Banon f3c068f637 only call terms lookup once and not per segment 2013-06-24 18:01:35 +02:00
Florian Schilling e0846448e9 Reduced geobulk data 2013-06-24 16:20:38 +02:00
Shay Banon 160cb36b9d better handling of null filters when caching them 2013-06-24 15:34:44 +02:00
Shay Banon 80ede081c3 Lookup Terms Filter _cache parameter not being taken into account
fixes #3219
2013-06-24 15:23:16 +02:00
Adrien Grand 432628086f Fix NumericTokenizer.
NumericTokenizer is a simple wrapper around a NumericTokenStream. However, its
implementation had a few issues: its reset() method was not idempotent,
causing exceptions if reset() was called twice (causing #3211), and it had no
attributes, meaning that the only thing it allowed was counting the number
of generated tokens. The reason why indexing numeric data worked is that
the mapper's parseCreateField directly generates a NumericTokenStream and
by-passes the analyzer.

This commit makes NumericTokenizer.reset idempotent and makes consuming a
NumericTokenizer behave the same way as consuming the underlying
NumericTokenStream.
2013-06-24 14:13:27 +02:00
Shay Banon 58e68db148 improve geohash_filter to use terms filter
and various other cleanups
2013-06-24 11:34:59 +02:00
Shay Banon 6fd74fa39e Terms Filter Lookup: Failure when no mappings for the terms field exists (no data indexed)
closes #3216
2013-06-22 19:41:02 +02:00
Simon Willnauer 7206c60019 stabilize more tests 2013-06-20 17:04:30 +02:00
Boaz Leskes 178629382c Added version support to update requests
Moved version handling from RobinEngine into VersionType. This avoids code duplication and makes it cleaner and easier to read.

Closes #3111
2013-06-20 13:27:00 +02:00
Cédric HOURCADE 71849668e9 Add Lucene CommonGrams/CommonGramsQuery token filter
Both filters merged in a single "common_grams" tokenfilter.

Closes #3202
2013-06-19 17:39:04 +02:00
Florian Schilling 5aa0a8438f GeoHash Filter
##############

Previous versions of the GeoPointFieldMapper just stored the actual geohash
of a point. This commit changes the behavior of storing geohashes by storing
the geohash and all its prefixes in decreasing order in the same field. To
enable this functionality the option geohash_prefix must be set in the mapping.

This behavior allows filtering GeoPoints by their geohashes. Basically a
geohash prefix is defined by the filter and all geohashes that match this
prefix will be returned. The neighbors flag additionally matches geohashes
that surround the given geohash cell. In general the neighborhood of a
geohash is defined by its eight adjacent cells.

To enable this, the type of filtered fields must be geo_point with geohashes
and geohash_prefix enabled.

For example:
    curl -XPUT 'http://127.0.0.1:9200/locations/?pretty=true' -d '{
        "mappings" : {
            "location": {
                "properties": {
                    "pin": {
                        "type": "geo_point",
                        "geohash": true,
                        "geohash_prefix": true
                    }
                }
            }
        }
    }'

This example defines a mapping for a type location in an index locations
with a field pin. The option geohash arranges storing the geohash of
the pin field.

To filter the results by the geohash a geohash_cell needs to be defined.
For example
    curl -XGET 'http://127.0.0.1:9200/locations/_search?pretty=true' -d '{
        "query": {
            "match_all":{}
        },
        "filter": {
            "geohash_cell": {
                "field": "pin",
                "geohash": "u30",
                "neighbors": true
            }
        }
    }'

This filter will match all geohashes that start with one of the following
prefixes: u30, u1r, u32, u33, u1p, u31, u0z, u2b and u2c.

Internally the GeoHashFilter is either a simple TermFilter, in case no
neighbors should be filtered or a BooleanFilter combining the TermFilters
of the geohash and all its neighbors.

Closes #2778
2013-06-19 14:35:02 +02:00
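The idea of indexing a geohash together with all of its prefixes can be sketched as follows. This is a plain implementation of standard geohash encoding, not the GeoPointFieldMapper code; `geohash_encode` and `geohash_prefixes` are illustrative names, and the sample point is arbitrary.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length=12):
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per char."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # the first bit always refines longitude
    while len(chars) < length:
        rng, val = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def geohash_prefixes(geohash):
    """The full geohash followed by all of its prefixes, longest first."""
    return [geohash[:n] for n in range(len(geohash), 0, -1)]


# Terms that would be indexed for one point, in decreasing prefix length.
terms = geohash_prefixes(geohash_encode(57.64911, 10.40744, length=5))

# A cell filter then reduces to an exact term match on a prefix.
matches_cell = "u4p" in terms
```

Because every prefix is stored as its own term, a cell lookup needs no prefix scan; a neighbors query would simply match any of nine such terms.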
Clinton Gormley bc90e73932 Expose fielddata "fields" param in standard in indicesStatsRequest
Closes #3205
2013-06-19 13:18:55 +02:00
Clinton Gormley b27ad99b8d The "fielddata" qs param to index stats was setting idCache, not fieldData
Closes #3204
2013-06-19 12:33:01 +02:00
Boaz Leskes 02c6222320 Trimming MVEL scripts before compiling them.
This bypasses an issue with MVEL error handling which can go into an infinite loop in some edge cases. More info here: http://jira.codehaus.org/browse/MVEL-292

Closes #3168
2013-06-19 12:14:10 +02:00
Adrien Grand fccbe9c185 Import the new n-gram tokenizers and filters from Lucene.
Lucene 4.4 will feature new n-gram tokenizers and filters that should not
generate broken offsets (that cause highlighting bugs) anymore. They also
correctly handle supplementary characters and the tokenizers can work in a
streaming fashion (they are not limited to the first 1024 chars of the
stream anymore).
2013-06-19 09:45:17 +02:00
Simon Willnauer a388588b1f Upgrade to Lucene 4.3.1 2013-06-18 22:15:31 +02:00
Simon Willnauer c9c68fced7 Add ShardId and Index to SuggestionContext
Suggesters might need access to the shard they run on as well as the
index they operate on. This patch adds indexname and shard ID to the
SuggestionContext

Closes #3199
2013-06-18 15:00:42 +02:00
Cédric HOURCADE d41c37fdfa Add support for "high_freq" and "low_freq" parameters for Common Query
"minimum_should_match" parameter. The "high_freq" parameter is used when the
query has only high-frequency terms.

Closes #3188
2013-06-17 20:31:38 +02:00
Simon Willnauer 8363fcf281 create 'shape' index explicitly to ensure tests don't hang 2013-06-17 17:48:35 +02:00
Simon Willnauer deda7a37fc Ensure tests wait for relocations 2013-06-17 13:55:18 +02:00
Martijn van Groningen e7d13971f3 Simplified validate check 2013-06-17 10:36:38 +02:00
Marcus Granström b7cb479a72 Added `doc_as_upsert` option to update api.
This option can reduce the amount of data being sent to Elasticsearch.
Closes #3195
2013-06-17 10:23:37 +02:00
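The saving comes from not having to send a separate upsert document alongside the partial document. A simplified sketch of the semantics, using an in-memory dict as a stand-in for the index (`apply_update` is a hypothetical helper, and the shallow merge is a simplification):

```python
def apply_update(store, doc_id, partial_doc, doc_as_upsert=False):
    """Merge partial_doc into an existing document, or insert it when allowed."""
    if doc_id in store:
        store[doc_id].update(partial_doc)   # existing document: merge the fields
    elif doc_as_upsert:
        store[doc_id] = dict(partial_doc)   # missing document: the doc itself is inserted
    else:
        raise KeyError(doc_id)              # missing document and no upsert requested
    return store[doc_id]


store = {"1": {"name": "old", "count": 1}}
apply_update(store, "1", {"name": "new"})
apply_update(store, "2", {"name": "fresh"}, doc_as_upsert=True)
```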
Clinton Gormley 2f616e3c2a Merge pull request #3192 from clintongormley/nodes_info_timeout
Expose timeout for nodes_info requests in the REST interface
2013-06-15 10:28:42 -07:00
Clinton Gormley 27a8083b7d Expose timeout for nodes_info requests in the REST interface
Closes #3191
2013-06-15 19:01:09 +02:00
Adrien Grand a30d58aae2 Compress PagedBytesAtomicFieldData's termOrdToBytesOffset.
Using MonotonicAppendingLongBuffer instead of a GrowableWriter should help
save several bits per value, especially when the bytes to store have similar
lengths.

Closes #3186
2013-06-15 09:31:23 +02:00
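The intuition behind the saving can be sketched conceptually: when offsets grow almost linearly, storing each offset's deviation from a straight line needs far fewer bits than storing the offset itself. This is only an illustration of that idea, not the layout used by MonotonicAppendingLongBuffer; the function names and sample offsets are made up.

```python
def bits_required(value):
    """Number of bits needed to store a non-negative integer (at least 1)."""
    return max(1, value.bit_length())


def plain_bits_per_value(offsets):
    """Bits per value when every offset is stored as-is."""
    return bits_required(max(offsets))


def monotonic_bits_per_value(offsets):
    """Bits per value when only the deviation from a linear estimate is stored."""
    n = len(offsets)
    slope = (offsets[-1] - offsets[0]) / (n - 1) if n > 1 else 0.0
    deviations = [
        off - (offsets[0] + round(slope * i)) for i, off in enumerate(offsets)
    ]
    # Shift deviations so the smallest becomes zero, then size the largest.
    return bits_required(max(deviations) - min(deviations))


# Byte offsets of terms whose lengths are all close to 10.
offsets = [0, 10, 21, 30, 41, 50, 60, 71]
plain = plain_bits_per_value(offsets)
packed = monotonic_bits_per_value(offsets)
```

The more uniform the term lengths, the smaller the deviations, which matches the commit's note that the gain is largest when the bytes to store have similar lengths.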
Simon Willnauer 25f19f8b87 Wait for relocations in utility methods 2013-06-14 21:59:43 +02:00
Simon Willnauer a4fc11b3d1 Wait for Yellow state after indexing 2013-06-14 12:14:43 +02:00
Clinton Gormley f537b8ccee Change default operator to "or" for "low_freq_operator" and "high_freq_operator" parameters for "common" queries
Closes #3178
2013-06-14 11:08:56 +02:00
Martijn van Groningen 8d59ed3ab0 Use SinglePackedOrdinals over SingleArrayOrdinals to reduce the memory ordinals take for single valued fields in field data.
Closes #3185
2013-06-14 10:16:49 +02:00
Simon Willnauer b995abfa80 Call DISI#cost() ahead of time to prevent NPE
NotDocIdSet resets the internal DocIdSetIterator to null causing NPE
if cost is called.

Closes #3177
2013-06-14 09:49:30 +02:00
Clinton Gormley c3332db7d0 Fixed an error message on the terms filter 2013-06-13 19:40:47 +02:00
Simon Willnauer 4e4529f3dc Check if Alias Creation was acknowledged in tests.
If there is a failure during alias creation, the tests don't fail with the
correct exception. This commit simplifies debugging by asserting on the ack
flag.
2013-06-13 15:52:33 +02:00
Simon Willnauer a654c3d103 Set a hard limit on the number of tokens we run suggestion on
PhraseSuggester can be very slow and CPU intensive if a lot of terms
are suggested. To prevent cluster instability and long-running requests,
this commit adds a hard limit (10 tokens by default): if the query is parsed
into more tokens, we simply return no correction.

Closes #3164
2013-06-13 15:12:38 +02:00
Alexander Reelsen 9d3e34b9f9 Allow date format to support groups of built-in patterns
Until now 'named dates' like dateOptionalTime could not be used as part of a
group of date formats. This patch allows grouping them arbitrarily, like this:

* yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||dateOptionalTime
* dateOptionalTime||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd
* yyyy/MM/dd HH:mm:ss||dateOptionalTime||yyyy/MM/dd
* date_time||date_time_no_millis

Closes #2132
2013-06-13 15:03:55 +02:00
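How a `||`-separated group of formats might be resolved can be sketched as trying each alternative in order until one parses. This is an illustration only: Elasticsearch uses Joda-Time patterns, so the `KNOWN_PATTERNS` table below, which maps a few of the names above to Python `strptime` formats, is a hypothetical stand-in, as is `parse_date`.

```python
from datetime import datetime

# Hypothetical mapping from a named or custom pattern to strptime formats.
KNOWN_PATTERNS = {
    "dateOptionalTime": ["%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"],
    "yyyy/MM/dd HH:mm:ss": ["%Y/%m/%d %H:%M:%S"],
    "yyyy/MM/dd": ["%Y/%m/%d"],
}


def parse_date(text, format_spec):
    """Try every alternative of a '||'-separated format spec, in order."""
    for name in format_spec.split("||"):
        for fmt in KNOWN_PATTERNS[name]:
            try:
                return datetime.strptime(text, fmt)
            except ValueError:
                continue  # this alternative does not match, try the next one
    raise ValueError(f"{text!r} matches none of {format_spec!r}")


spec = "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||dateOptionalTime"
with_time = parse_date("2013/06/13 15:03:55", spec)
date_only = parse_date("2013/06/13", spec)
named = parse_date("2013-06-13T15:03:55", spec)
```

Named and custom patterns are looked up the same way here, so they can appear in any position within the group.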
Martijn van Groningen 015d820e53 Made not found logic easier.
Relates #3172
2013-06-13 13:21:36 +02:00