Commit Graph

1071 Commits

Author SHA1 Message Date
Shay Banon 69ef822da6 cleanup docsets
- remove the DocSet abstraction, and use Bits where we can by getting it from DocIdSet
- better handling of acceptDocs, though still need to properly apply them when caching is involved
2012-11-27 10:04:21 -08:00
Igor Motov fb9143aac1 fix sporadically disappearing fields during concurrent dynamic mapping updates 2012-11-24 14:02:58 +01:00
Simon Willnauer 4ab78bc537 Add basic javadocs for o.e.cluster.routing package and related classes 2012-11-23 15:14:30 +01:00
Simon Willnauer 32a0772821 #2436 expose KeepWordTokenFilter by default 2012-11-23 10:11:30 +01:00
Igor Motov 65a43d3ad4 Fix handling of stop word _lang_ notation
Fixes #2412
2012-11-23 09:54:02 +01:00
Shay Banon 2094207bf1 add completed count to thread pools 2012-11-22 15:55:25 +01:00
Shay Banon e1679b89bb fix failed tests that were using the wrong form of the match query 2012-11-22 15:14:02 +01:00
Shay Banon 192cf5298a fix failed tests that were using the wrong form of the match query 2012-11-22 14:44:03 +01:00
Shay Banon f4d6d8139d Match query should fail when trying to provide several fields in its simplified form
fixes #2432
2012-11-22 10:23:48 +01:00
Chris Male 2541847945 Added control over Query used by MatchQuery when there are zero terms after analysis 2012-11-22 22:13:29 +13:00
Shay Banon 9a90c1c3b5 conservative timeouts on internal recovery actions
safeguards against cases where internal recovery actions take too long (possibly due to a bug)
2012-11-22 00:31:57 +01:00
Shay Banon f5a3261e15 only log that we delete unused shard if it exists 2012-11-21 20:45:31 +01:00
Shay Banon d9b78000b1 Setting logger levels using cluster update settings does not work
fixes #2428
2012-11-21 13:44:54 +01:00
Chris Male 9e2469e04f Add per-field Similarity support 2012-11-21 12:44:59 +13:00
Shay Banon 4e8a9008b7 second phase at optimizing merging/parsing large new mappings
apply the new mappings only after the parsing/merging of a full doc/mapping is done
2012-11-19 17:40:13 +01:00
Shay Banon 303752d78a first phase at optimizing merging large mappings
bulk the same-level ones together when traversing, then introduce them
2012-11-19 17:13:45 +01:00
Shay Banon 6e597ffccb allow to associate a payload with bulk requests 2012-11-19 16:16:35 +01:00
David Pilato 83257c8af8 Add constructor IndexRequest(String index, String type) and fix javadoc 2012-11-19 13:52:55 +01:00
David Pilato b2597b5316 Add a toString() method to MultiSearchResponse 2012-11-19 13:50:58 +01:00
Simon Willnauer 840eaf983d Add JavaDocs for Codecs, PostingsFormat and related services/modules 2012-11-19 10:25:26 +01:00
Shay Banon c09ee82ef5 keep the uidField around so we don't have to look it up 2012-11-15 12:52:58 +01:00
Shay Banon e2e25ffea3 uid to use bloom filter posting by default 2012-11-15 11:57:48 +01:00
Martijn van Groningen 3577d826f2 Removed old file. 2012-11-15 11:54:51 +01:00
Martijn van Groningen be70722de7 Renamed pulsing40 and Lucene40 postings format providers to pulsing and default respectively for more consistent naming in settings. 2012-11-15 09:54:00 +01:00
Martijn van Groningen 20c6085852 changed test method names. 2012-11-15 09:40:24 +01:00
Martijn van Groningen e80f74584b Added licence header. 2012-11-15 00:18:49 +01:00
Martijn van Groningen fd5bd102aa lucene 4: Exposed Lucene's codec api
This feature adds the option to configure a `PostingsFormat` and assign it to a field in the mapping. This is an expert feature, and in almost all cases Elasticsearch's defaults will suit your needs.

## Configuring a postingsformat per field

There are several postings formats configured by default which can be used in your mapping:
* `direct` - A codec that wraps the default postings format during write time, but loads the terms and postings lists directly into memory during read time as raw arrays. This postings format is exceptionally memory intensive, but can give a substantial increase in search performance.
* `memory` - A codec that loads and stores terms and postinglists in memory using a FST. Acts like a cached postingslist.
* `bloom_default` - Maintains a bloom filter for the indexed terms, which is stored to disk and builds on top of the `default` postings format. This postings format is useful for low document frequency terms and offers a fail fast for seeks to terms that don't exist.
* `bloom_pulsing` - Similar to the `bloom_default` postings format, but builds on top of the `pulsing` postings format.
* `default` - The default postings format. The default if none is specified.
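
The fail-fast behavior of the bloom-based formats can be illustrated with a small, self-contained sketch. This is a conceptual Python model, not Lucene's or Elasticsearch's implementation: the `BloomFilter` and `BloomTermDictionary` classes, the bit-array size, and the hashing scheme are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, term):
        # Derive several bit positions from salted SHA-1 digests of the term.
        for salt in range(self.num_hashes):
            digest = hashlib.sha1(f"{salt}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = 1

    def might_contain(self, term):
        return all(self.bits[pos] for pos in self._positions(term))


class BloomTermDictionary:
    """Hypothetical wrapper: consult the filter before the costly term lookup."""

    def __init__(self, postings):
        self.postings = dict(postings)  # term -> list of doc ids
        self.filter = BloomFilter()
        self.lookups = 0  # how often the underlying dictionary was consulted
        for term in self.postings:
            self.filter.add(term)

    def seek(self, term):
        if not self.filter.might_contain(term):
            return None  # fail fast: the term is definitely absent
        self.lookups += 1
        return self.postings.get(term)


terms = BloomTermDictionary({"doc#1": [0], "doc#2": [1]})
print(terms.seek("doc#1"), terms.seek("no-such-id"))
```

A seek for a term that was never indexed is usually rejected by the filter alone, which is why such a format pays off for fields like unique ids where most terms have a very low document frequency.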

On all fields it is possible to configure a `postings_format` attribute. Example mapping:
```
{
  "person" : {
     "properties" : {
         "second_person_id" : {"type" : "string", "postings_format" : "pulsing"}
     }
  }
}
```

## Configuring a custom postingsformat
It is possible to instantiate custom postings formats. This can be specified via the index settings.
```
{
   "codec" : {
      "postings_format" : {
         "my_format" : {
            "type" : "pulsing40",
            "freq_cut_off" : "5"
         }
      }
   }
}
```
In the above example the `freq_cut_off` is set to 5 (defaults to 1). This tells the pulsing postings format to inline the postings list of terms with a document frequency lower than or equal to 5 in the term dictionary.
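
The effect of the cut-off can be pictured with a minimal sketch. This is a conceptual Python model under stated assumptions, not Lucene's pulsing implementation: the `PulsingTermDictionary` class and its `term_dict` and `postings_file` attributes are hypothetical names introduced only for illustration.

```python
class PulsingTermDictionary:
    """Hypothetical model of a term dictionary with a separate postings file."""

    def __init__(self, freq_cut_off=1):
        self.freq_cut_off = freq_cut_off
        self.term_dict = {}      # term -> ("inline", doc_ids) or ("pointer", offset)
        self.postings_file = []  # stand-in for postings stored outside the dictionary

    def add_term(self, term, doc_ids):
        doc_freq = len(doc_ids)
        if doc_freq <= self.freq_cut_off:
            # Rare term: keep its postings list inside the term dictionary entry.
            self.term_dict[term] = ("inline", list(doc_ids))
        else:
            # Frequent term: store the postings separately and keep only a pointer.
            self.postings_file.append(list(doc_ids))
            self.term_dict[term] = ("pointer", len(self.postings_file) - 1)

    def postings(self, term):
        kind, value = self.term_dict[term]
        return value if kind == "inline" else self.postings_file[value]


index = PulsingTermDictionary(freq_cut_off=5)
index.add_term("rare_id", [7, 42])          # document frequency 2: inlined
index.add_term("common", list(range(100)))  # document frequency 100: pointer
print(index.term_dict["rare_id"][0], index.term_dict["common"][0])
```

Inlining saves one indirection when reading the postings of rare terms, at the cost of a larger term dictionary, so a higher cut-off trades dictionary size for fewer seeks.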

Closes #2411
2012-11-14 23:54:29 +01:00
Igor Motov 120560bd0a Using non-mapped fields in prefix queries shouldn't cause NullPointerException
Fixes #2408
2012-11-14 18:34:54 +01:00
Igor Motov f47d62cc30 Date fields shouldn't be returned as longs by Get API 2012-11-13 21:36:28 +01:00
Igor Motov d1281d283b Add `index.routing.allocation.require....` and `cluster.routing.allocation.require....` settings
Fixes #2404
2012-11-13 19:29:20 +01:00
Igor Motov ea2732a967 lucene 4: field visitors shouldn't return fields that were not present in the visited document 2012-11-13 07:35:54 -05:00
Shay Banon 258244ef37 Deriving the REST status code from a failure can, very rarely, cause an infinite loop
fixes #2402
2012-11-12 17:09:34 +01:00
Nicholas Tung 46e1886975 Allow both .yml and .yaml as valid YAML configuration file extensions 2012-11-12 14:06:04 +01:00
Igor Motov 3ff54c0b5c Add logging for environment paths on startup 2012-11-12 14:04:13 +01:00
Martijn van Groningen 978c95649e lucene 4: Fixed SimpleQueryTests 2012-11-12 13:44:42 +01:00
Martijn van Groningen 05746adeb2 lucene 4: Set number of replicas to 0. Makes the test run faster. 2012-11-12 13:44:42 +01:00
Martijn van Groningen e2c33ed659 lucene 4: Fixed BitsetExecutionChildQuerySearchTests class. 2012-11-12 13:44:42 +01:00
Shay Banon 9a79fb40bf lucene 4: sort values on hit are Text, not BytesRef 2012-11-12 13:44:42 +01:00
Igor Motov c46228254d lucene 4: fix TTL 2012-11-12 13:44:42 +01:00
Igor Motov c2f3eab7d3 lucene 4: fix sorting 2012-11-12 13:44:42 +01:00
Shay Banon 2b58c2dfff lucene 4: optimize read/write BytesRef handling 2012-11-12 13:44:42 +01:00
Igor Motov c8cf72d657 lucene 4: fix handling of deleted docs in TermFilter 2012-11-12 13:44:42 +01:00
uboness d069212ce4 * fixed the type check for short 2012-11-12 13:44:42 +01:00
uboness 46223c117a * removed unused Streamables class 2012-11-12 13:44:42 +01:00
uboness ed2b009f07 * changed instanceof to be consistent with other type checks 2012-11-12 13:44:41 +01:00
uboness cae66fb636 * lucene 4: added missing short support in stream input/output
* lucene 4: added more extensive test for stored fields
2012-11-12 13:44:41 +01:00
Igor Motov f8842d5a4f lucene 4: fix TokenFilterTests 2012-11-12 13:44:41 +01:00
Igor Motov 98eb97a1ff lucene 4: fix NoopCollector 2012-11-12 13:44:41 +01:00
Shay Banon 9d5cae23fa lucene 4: fix general mapping test
no need to test for boost, since we already have specific boost tests. In general, we should get rid of this test and use more specialized tests if we are missing some
2012-11-12 13:44:41 +01:00
Shay Banon 5c45aad260 lucene 4: fix boost mapping tests 2012-11-12 13:44:41 +01:00
Igor Motov ffd262e96f lucene 4: rollback optimization in SingleFieldVisitor for now to make it work 2012-11-12 13:44:41 +01:00
Igor Motov cfbd17992a lucene 4: convert script term to string 2012-11-12 13:44:41 +01:00
Igor Motov 74464f9f99 lucene 4: fix possible NPE in range queries and filters if one of the bounds is not specified 2012-11-12 13:44:41 +01:00
Igor Motov 6d40770200 lucene 4: fixed facets and filtering aliases
I am not completely sure about this one, but it reduces the number of failing tests from 98 to 31 so I am going to check it in. Please, review and fix it, if there is a better solution.

Because of a change in Lucene 4.0, ContextIndexSearcher was bypassed and Elasticsearch filters and collectors were ignored.

In Lucene 3.6 the stack of Searcher search calls looked like this:
search(Query query, int n)
search(Query query, Filter filter, int n)
search(Weight weight, Filter filter, int nDocs)
search(Weight weight, Filter filter, ScoreDoc after, int nDocs)
search(Weight weight, Filter filter, Collector collector) <-- this is where ContextIndexSearcher was injecting the combined filter and collector
search(Weight weight, Filter filter, Collector collector)

In Lucene 4.0 the stack looks like this:
search(Query query, int n)
search(Query query, Filter filter, int n) <-- here lucene wraps Query and Filter into Weight
search(Weight weight, ScoreDoc after, int nDocs)
search(List<AtomicReaderContext> leaves, Weight weight, ScoreDoc after, int nDocs)
search(List<AtomicReaderContext> leaves, Weight weight, Collector collector)
...

In other words, when we have a Filter, we don't have a Collector yet, but when we have a Collector, the Filter is already wrapped inside the Weight. The only fix for the problem that I could think of is to introduce two injection points: one for Filters and another one for Collectors:

search(Query query, int n)
search(Query query, Filter filter, int n) <-- here combined Filters are injected
search(Weight weight, ScoreDoc after, int nDocs)
search(List<AtomicReaderContext> leaves, Weight weight, ScoreDoc after, int nDocs)
search(List<AtomicReaderContext> leaves, Weight weight, Collector collector) <-- here Collectors are injected

A similar problem existed for count(), so I had to override search(Query query, Collector results) as well.
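
The two-injection-point idea can be sketched outside of Lucene. The following is a simplified Python model, not the actual ContextIndexSearcher code: `BaseSearcher`, `ContextSearcher`, `ListCollector`, `TeeCollector`, and the `create_weight` / `search_weight` method names are hypothetical stand-ins for the Java call chain described above.

```python
class ListCollector:
    """Collects matching doc ids into a list."""

    def __init__(self):
        self.hits = []

    def collect(self, doc_id):
        self.hits.append(doc_id)


class TeeCollector:
    """Forwards every hit to several collectors."""

    def __init__(self, *collectors):
        self.collectors = collectors

    def collect(self, doc_id):
        for collector in self.collectors:
            collector.collect(doc_id)


class BaseSearcher:
    """Stand-in for the Lucene 4.0 style call chain, heavily simplified."""

    def __init__(self, docs):
        self.docs = docs  # doc id -> text

    def search(self, query, collector, filter_=None):
        # Step 1: query and filter are wrapped into a single "weight".
        weight = self.create_weight(query, filter_)
        # Step 2: the weight is evaluated and hits are handed to the collector.
        self.search_weight(weight, collector)
        return collector

    def create_weight(self, query, filter_):
        def weight(doc_id, text):
            return query in text and (filter_ is None or filter_(doc_id))
        return weight

    def search_weight(self, weight, collector):
        for doc_id, text in self.docs.items():
            if weight(doc_id, text):
                collector.collect(doc_id)


class ContextSearcher(BaseSearcher):
    """Injects a context filter and an extra collector at two separate points."""

    def __init__(self, docs, alias_filter, extra_collector):
        super().__init__(docs)
        self.alias_filter = alias_filter
        self.extra_collector = extra_collector

    def create_weight(self, query, filter_):
        # Injection point 1: combine filters before they vanish into the weight.
        def combined(doc_id):
            return self.alias_filter(doc_id) and (filter_ is None or filter_(doc_id))
        return super().create_weight(query, combined)

    def search_weight(self, weight, collector):
        # Injection point 2: wrap the collector once it is finally available.
        super().search_weight(weight, TeeCollector(collector, self.extra_collector))


docs = {1: "lucene upgrade", 2: "lucene facets", 3: "other"}
facets = ListCollector()
searcher = ContextSearcher(docs, alias_filter=lambda d: d != 1, extra_collector=facets)
top = searcher.search("lucene", ListCollector())
print(top.hits, facets.hits)
```

In this model neither override alone is enough: the filter has to be combined before the weight is built, and the collector can only be wrapped after it, which mirrors the reasoning in the commit message.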
2012-11-12 13:44:41 +01:00
Igor Motov 2eaad61a9e lucene4: make SimpleIdCache more resilient to missing fields
Not sure if we can get a segment without the _uid field, but segments without the _parent field definitely happen.
2012-11-12 13:44:41 +01:00
Igor Motov 9ad05ecdea lucene 4: make FieldVisitors behave similar to FieldSelectors
Added back reset() method for now to make things work. Will refactor it out when we have tests passing.
2012-11-12 13:44:41 +01:00
Igor Motov 7aac88cf5c lucene4: check liveDocs and acceptedDocs for null before trying to call get() on them 2012-11-12 13:44:40 +01:00
Igor Motov 3f3a95668b lucene4: add support for omit_norms setting to numeric types and don't omit norms if boost is not 1.0
This commit enables setting boost for numeric fields. However, there is still no way to take advantage of boosted numeric fields during searching because all queries against numeric fields are translated into range queries wrapped in ConstantScore. Boost for numeric fields is broken on master as well https://gist.github.com/7ecedea4f6a5219efb89
2012-11-12 13:44:40 +01:00
Igor Motov 2fb3591792 lucene4: fixed default values tests to refer to correct default FieldType constants 2012-11-12 13:44:40 +01:00
Igor Motov a5bef30be9 lucene4: fixed CompressIndexInputOutputTests 2012-11-12 13:44:40 +01:00
Igor Motov 3816366780 lucene4: fixed SimpleAllMapperTests 2012-11-12 13:44:40 +01:00
Shay Banon 25717ab253 lucene 4: only omit_norms on non analyzed field if boost is not set 2012-11-12 13:44:40 +01:00
Shay Banon 72f41111c9 lucene 4: calling tokenStream is enough, verified to return a stream to analyze content 2012-11-12 13:44:40 +01:00
Shay Banon cb5df26bf7 lucene 4: use the proper token stream to return 2012-11-12 13:44:40 +01:00
Shay Banon a10f60873c lucene 4: fix numeric types to properly return numeric streams 2012-11-12 13:44:40 +01:00
Shay Banon a38064913f lucene 4: fix engine tests 2012-11-12 13:44:40 +01:00
Shay Banon 53d9b13e2f lucene 4: fix optimization check to set docs_only+omit_norms 2012-11-12 13:44:40 +01:00
Igor Motov 8a34ea1223 lucene4: fixed FloatFieldDataTests 2012-11-12 13:44:40 +01:00
Igor Motov bf13f3f81e lucene4: fixed SimpleIndexQueryParserTests 2012-11-12 13:44:39 +01:00
Martijn van Groningen db639e5c2e lucene 4: Upgraded SimpleLuceneTests class. Test actually passes now. 2012-11-12 13:44:39 +01:00
Martijn van Groningen 2a8161d096 lucene 4: Upgraded SimpleLuceneTests class.
The complete codebase compiles now!
2012-11-12 13:44:39 +01:00
Martijn van Groningen aa2a8c66cc lucene 4: Upgraded UidFieldTests class. 2012-11-12 13:44:39 +01:00
Shay Banon f796fe8d5e lucene 4: fix cases where number values are not stored 2012-11-12 13:44:39 +01:00
Martijn van Groningen 5c0ef796e8 lucene 4: Upgraded BoostMappingTests + SimpleMapperTests 2012-11-12 13:44:39 +01:00
Shay Banon cefe2ba870 lucene 4: fix fuzzy query test 2012-11-12 13:44:39 +01:00
Shay Banon bec0ffa623 lucene 4: make sure to apply doc boost only once per field name 2012-11-12 13:44:39 +01:00
Shay Banon 7ecfa9c35f lucene 4: caching should pass acceptDocs
still work left on streamlining filters
2012-11-12 13:44:39 +01:00
Shay Banon c60f20413b lucene 4: support doc level boost 2012-11-12 13:44:39 +01:00
Shay Banon b492320e2f lucene 4: switch directory not used 2012-11-12 13:44:39 +01:00
Shay Banon dca88a9b7c lucene 4: use field type in UidField 2012-11-12 13:44:39 +01:00
Shay Banon faf3e0e857 lucene 4: comment on adding DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS 2012-11-12 13:44:38 +01:00
Shay Banon e9f8d0c722 lucene 4: extract Lucene#readSegmentsInfo, and use it where applicable 2012-11-12 13:44:38 +01:00
Shay Banon 0660e20c47 lucene 4: cleanup terms/uid filter 2012-11-12 13:44:38 +01:00
Shay Banon 79368bb221 lucene 4: fix visitors to use constants for field names 2012-11-12 13:44:38 +01:00
Martijn van Groningen 6ca6407468 lucene 4: Re-fixed issue in SourceScoreOrderFragmentsBuilder and SourceSimpleFragmentsBuilder. 2012-11-12 13:44:38 +01:00
Simon Willnauer a3de9e521d lucene 4: replaced TrimFilter and WordDelimiterFilter with lucene versions 2012-11-12 13:44:38 +01:00
Martijn van Groningen e33ae96b38 lucene 4: added overloaded method. To fix issue in SourceScoreOrderFragmentsBuilder and SourceSimpleFragmentsBuilder. 2012-11-12 13:44:38 +01:00
Martijn van Groningen 38dc19d8bc lucene 4: Fixed compile error. 2012-11-12 13:44:38 +01:00
Martijn van Groningen 673712c0b2 lucene 4: Upgraded IndexedGeoBoundingBoxFilter & InMemoryGeoBoundingBoxFilter. 2012-11-12 13:44:38 +01:00
Martijn van Groningen d42d153c48 lucene 4: Upgraded GeoDistanceRangeFilter, GeoPolygonFilter. 2012-11-12 13:44:38 +01:00
Martijn van Groningen 415cfa2e89 lucene 4: Upgraded GeoDistanceFilter, MatchedFiltersFetchSubPhase. 2012-11-12 13:44:38 +01:00
Martijn van Groningen ba1b870580 lucene 4: Upgraded CacheKeyFilter. 2012-11-12 13:44:38 +01:00
Martijn van Groningen 3298ad2235 lucene 4: Upgraded UidField. (version can be stored later as doc values) 2012-11-12 13:44:37 +01:00
Martijn van Groningen 968b012911 lucene 4: Upgraded *ValueGeoPointFieldData and GeoDistanceDataComparator. 2012-11-12 13:44:37 +01:00
Martijn van Groningen 09fe15488d lucene 4: Upgraded ScanContext. 2012-11-12 13:44:37 +01:00
Igor Motov 41325113f0 lucene4: switched from Field.Index to boolean indexed in ParseContext.includeInAll() 2012-11-12 13:44:37 +01:00
Igor Motov daf347e67e lucene4: replace IndexCommit.getVersion() with IndexCommit.getGeneration() 2012-11-12 13:44:37 +01:00
Igor Motov 787b7a3900 lucene4: more unit test cleanup 2012-11-12 13:44:37 +01:00
Igor Motov 5ad40205c2 lucene4: remove DocumentBuilder and FieldBuilder 2012-11-12 13:44:37 +01:00
Shay Banon 594598f493 close the index input in any case when computing length 2012-11-12 13:44:37 +01:00