Commit Graph

125 Commits

Author SHA1 Message Date
Simon Willnauer 77bc5d5ecf release [1.0.0.Beta1] 2013-11-06 15:32:43 +01:00
Simon Willnauer 9654631186 Change 'standart' analyzer to use emtpy stopword list by default.
The 'default' / 'standard' analyzer can be a trappy default sicne it filters
english stopwords by default. Yet a default should not be dedicated to a certain language
since elasticsearch is used in many different scenarios where a standard analysis chain
with specialization to english full-text might be rather counter productive.

This commit changes the 'standard' analyzer to use an empty stopword list for indices
that are created from 1.0.0.Beta1 version onwards but will maintain backwards compatibiliy
for older indices.

Closes #3775
2013-11-05 21:07:21 +01:00
Shay Banon 7c32269f4f Dist. Percolation: Use .percolator instead of _percolator for type name
Use .percolator as the internal (hidden) type name for percolators within the index. Seems nicer name to represent "hidden" types within an index.
closes #4090
2013-11-05 20:02:59 +01:00
Boaz Leskes a9fdcadf01 [DOCS] Added documentation for the keep word token filter 2013-11-04 18:38:44 +01:00
Clinton Gormley 356de95840 Added simplified range syntax to query string docs 2013-11-04 18:18:36 +01:00
Karel Minarik b93dac678f [DOC] Added a link to the official Ruby client to the "Clients" page 2013-11-04 11:47:14 +01:00
Karel Minarik 7023ef2e3f [DOCS] Added a basic information about the official Ruby client to documentation 2013-11-04 11:46:36 +01:00
Ben McCann 46edfc484a [DOCS] Add some documentation about the performance of `_source` usage in scripts. 2013-11-04 11:05:55 +01:00
Igor Motov c724f0de5d Initial implementation of ResourceWatcherService
Closes #4062
2013-11-03 21:55:54 -05:00
Dan Everton 6df60b7271 [DOC] Improve documentation on search stats groups
Document the ability to return all search statistics groups and provide examples of returning search statistics for groups.
2013-11-01 13:53:39 +01:00
Martijn van Groningen 30ab6f841d [DOCS] Fixed percolate docs errors 2013-11-01 11:44:07 +01:00
Clinton Gormley 4206cc988e [DOCS] Typo on shingle tokenfilter 2013-10-31 20:18:00 +01:00
Opak Alex 6856cfc5e3 add reference for ember-data-elasticsearch-kit to integrations page 2013-10-31 11:40:01 +01:00
Alexander Reelsen dfcb3ca2d4 RegexpQueryBuilder now implements MultiTermQueryBuilder
This allows the RegexpQueryBuilder to be used in span queries

Added tests for all span multi term queries.
Also updated the documentation and removed mentioning of numeric range
queries for span queries (they have to be terms).

Closes #3392
2013-10-31 09:12:57 +01:00
Boaz Leskes 8819f91d47 Add a GetFieldMapping API
This new API allows to get the mapping for a specific set of fields rather than get the whole index mapping and traverse it.
The fields to be retrieved can be specified by their full path, index name and field name and will be resolved in this order.
In case multiple field match, the first one will be returned.

Since we are now generating the output (rather then fall back to the stored mapping), you can specify `include_defaults`=true on the request to have default values returned.

Closes #3941
2013-10-30 16:16:36 +01:00
Clinton Gormley 8b2efd4849 [DOCS] Added a version flag to percolation 2013-10-30 13:59:03 +01:00
Clinton Gormley 0585890a5f [DOCS] Fixed a typo 2013-10-30 13:57:18 +01:00
Alexander Reelsen 2ec9742147 [DOCS] Extending setup as a service documentation
* Tell people to use ES_JAVA_OPTS for es.node.name or similar parameters
* Showing a simple way to install Oracle JDK on ubuntu/debian

Closes #3999
2013-10-29 13:58:06 +01:00
David Pilato 5d90abf701 mget API should support global routing parameter
mget API support `_routing` field but not `routing` parameter.

Reproduction here:

```sh
curl -XDELETE "http://localhost:9200/test/"; echo
curl -XPUT "http://localhost:9200/test/" -d'{
   "settings": {
      "number_of_replicas": 0,
      "number_of_shards": 5
   }
}'; echo

curl -XPUT 'http://localhost:9200/test/order/1-1?routing=key1' -d '{
   "productName":"doc 1"
}'; echo
curl -XPUT 'http://localhost:9200/test/order/1-2?routing=key1' -d '{
   "productName":"doc 2"
}'; echo
curl -XPUT 'http://localhost:9200/test/order/1-3?routing=key1&refresh=true' -d '{
   "productName":"doc 3"
}'; echo

curl -XPOST 'http://localhost:9200/test/order/_mget?pretty' -d '{
    "docs" : [
        {
            "_index" : "test",
            "_type" : "order",
            "_id" : "1-1",
            "_routing" : "key1"
        },
        {
            "_index" : "test",
            "_type" : "order",
            "_id" : "1-2",
            "_routing" : "key1"
        },
        {
            "_index" : "test",
            "_type" : "order",
            "_id" : "1-3",
            "_routing" : "key1"
        }
    ]
}'; echo

curl -XPOST 'http://localhost:9200/test/order/_mget?pretty&routing=key1' -d '{
	"ids": [
		"1-1",
		"1-2",
		"1-3"
	]
}'; echo
```

Closes #3996.
2013-10-28 21:05:55 +01:00
Britta Weber c9dab6991e rename and document "index.mapping.date.parse_upper_inclusive" setting for date fields
The setting causes the upper bound for a range query/filter to be rounded up,
therefore the name `round_ceil` seems to make more sense.

Also this commit removes the redundant fourth parameter to DateMathParser.parse(..)
which was never used.
was:    parse(String text, long now, boolean roundUp, boolean upperInclusive)
is now: parse(String text, long now, boolean roundCeil)

closes #3914
2013-10-28 15:48:31 +01:00
Ben McCann cc4bc7d57d Fix nonsensical sentence in standard analyzer documentation so that it is more understandable 2013-10-25 00:18:32 +02:00
Luca Cavanna 48ac9747a8 Added third highlighter type based on lucene postings highlighter
Requires field index_options set to "offsets" in order to store positions and offsets in the postings list.
Considerably faster than the plain highlighter since it doesn't require to reanalyze the text to be highlighted: the larger the documents the better the performance gain should be.
Requires less disk space than term_vectors, needed for the fast_vector_highlighter.
Breaks the text into sentences and highlights them. Uses a BreakIterator to find sentences in the text. Plays really well with natural text, not quite the same if the text contains html markup for instance.
Treats the document as the whole corpus, and scores individual sentences as if they were documents in this corpus, using the BM25 algorithm.

Uses forked version of lucene postings highlighter to support:
- per value discrete highlighting for fields that have multiple values, needed when number_of_fragments=0 since we want to return a snippet per value
- manually passing in query terms to avoid calling extract terms multiple times, since we use a different highlighter instance per doc/field, but the query is always the same

The lucene postings highlighter api is  quite different compared to the existing highlighters api, the main difference being that it allows to highlight multiple fields in multiple docs with a single call, ensuring sequential IO.
The way it is introduced in elasticsearch in this first round is a compromise trying not to change the current highlight api, which works per document, per field. The main disadvantage is that we lose the sequential IO, but we can always refactor the highlight api to work with multiple documents.

Supports pre_tag, post_tag, number_of_fragments (0 highlights the whole field), require_field_match, no_match_size, order by score and html encoding.

Closes #3704
2013-10-24 23:38:00 +02:00
Luca Cavanna e981e411d7 [DOCS] rephrased docs for highlight no_match_size parameter
(removed 0.90.6 coming tag as it's needed only in 0.90 branch)
2013-10-24 14:38:32 +02:00
Nik Everett 14a709f563 Highlighting can return excerpt with no highlights
You can configure the highlighting api to return an excerpt of a field
even if there wasn't a match on the field.

The FVH makes excerpts from the beginning of the string to the first
boundary character after the requested length or the boundary_max_scan,
whichever comes first.  The Plain highlighter makes excerpts from the
beginning of the string to the end of the last token before the requested
length.

Closes #1171
2013-10-24 14:38:32 +02:00
Boaz Leskes 0e6e6f97dc Merge pull request #3940 from rboulton/patch-1
[Docs] Clean up wording in cluster health api doc
2013-10-22 04:09:13 -07:00
Markus Fischer 782d315da3 Fix markup 2013-10-21 16:11:09 +02:00
Richard Boulton b62cc7c716 Clean up wording to reduce confusion
The description of the timeout parameter was worded misleadingly; it implied that the API would wait until the cluster reached the desired level and then stayed at that level for the timeout. I've tweaked the sentence to remove the risk of confusion.
2013-10-21 12:37:50 +01:00
Clinton Gormley b2d82d7e75 [DOCS] Reorganised the highlight_query docs and added a version flag 2013-10-18 18:03:31 +02:00
Matt Weber 1e0a834c68 Document strict dynamic type mapping. 2013-10-18 08:29:31 -07:00
Nik Everett 60550e4cc2 phrase_len is not called phrase_length 2013-10-18 09:29:53 -04:00
Clinton Gormley adf0c8424b [DOCS] How to check max_file_descriptors 2013-10-17 11:54:36 +02:00
David Pilato 4efd94e7cf Java API Documentation (0.90+) needs update for accessors in Facets docs
Closes #3921.
(cherry picked from commit a753c48)
2013-10-17 09:50:15 +02:00
Honza Kral dd43d932f1 Added a link to official Python client to the client list, fixed perl link 2013-10-16 17:51:50 +02:00
Honza Kral 4f3ad73854 Added brief overview of the python client to the guide 2013-10-16 17:45:05 +02:00
Martijn van Groningen b7c4adeea3 [Docs] update reference to remove documentation about percolating during an index, bulk or update request. 2013-10-16 16:31:36 +02:00
Martijn van Groningen 1d0841e2b8 Added initial documentation for the redesigned percolator. 2013-10-16 14:12:19 +02:00
Boaz Leskes 18e12ef66c [Docs] updated refrences to dynamic_date_formats 2013-10-16 12:04:31 +02:00
Boaz Leskes 57b2d45142 [Docs] added document for the lenient option in match queries 2013-10-16 10:53:25 +02:00
Clinton Gormley f5e2cf9785 [Docs] Typo 2013-10-15 17:27:05 +02:00
Clinton Gormley 4798425da6 [Docs] Added a page for the Perl client 2013-10-15 17:22:34 +02:00
Alexander Reelsen 4d19239ec4 Add support for Lucene SuggestStopFilter
The suggest stop filter is an improved version of the stop filter, which
takes stopwords only into account if the last char of a query is a
whitespace. This allows you to keep stopwords, but to allow suggesting for
"a".

Example: Index document content "a word". You are now able to suggest for
"a" and get back results in the completion suggester, if the suggest stop
filter is used on the query side, but will not get back any results for
"a " as this is identified as a stopword.

The implementation allows to set the `remove_trailing` parameter for a
custom stop filter and thus use the suggest stop filter instead of the
standard stop filter.
2013-10-15 16:12:02 +02:00
Clinton Gormley 870346070e [DOCS] Added compound_on_flush docs and updated compound_format
docs to include note about accepting a float
2013-10-15 13:30:56 +02:00
Clinton Gormley d67331b554 [DOCS] Added script.disable_dynamic to the scripting page 2013-10-15 12:25:07 +02:00
steve mayzak 48656fd1ed removed a duplicate paragraphin config docs 2013-10-14 15:33:56 -07:00
Britta Weber 34441f3897 fix naming in function_score
- "boost" should be "boost_factor"
    - "mult" should be "multiply"

Also, store combine function names in ImmutableMap instead of iterating
over all possible names each time.

closes #3872 for master
2013-10-14 14:56:59 +02:00
Simon Willnauer 25d6f04f13 [DOCS] Note that cutoff_frequency doesn't handle stacked tokens gracefully 2013-10-14 14:09:38 +02:00
Britta Weber c3ab79a10e [DOCS] Add doc for delimited payload token filter 2013-10-14 13:41:35 +02:00
Clinton Gormley 9a062e465c [DOCS] Reorganised common API conventions 2013-10-13 16:46:56 +02:00
Clinton Gormley 4316b13880 [DOCS] Render common options on the same page 2013-10-13 14:14:50 +02:00
Shay Banon 420b3396f4 Set queue sizes by default on bulk/index thread pools
Now that we properly fixed the ability to set the queue size on the index / bulk thread pool, we should actually set them to a somehow reasonable value to protect from users potentially overflowing our system.

I suggest defaults to be 50 for bulk, and 200 for indexing.

Also, set the thread pool for get, which we should set (in a similar value to a "read" queue size we have today).
closes #3888
2013-10-12 21:51:37 +02:00