OpenSearch

Commit Graph

Author	SHA1	Message	Date
Simon Willnauer	b9feaa9999	Simplify TestCluster. TestCluster now doesn't use any reference counting anymore, and test cluster names are based on creation time to prevent conflicts if builds hang.	2013-06-10 12:07:11 +02:00
Britta Weber	11d08ac436	term vector request ================================ Returns information and statistics on terms in the fields of a particular document as stored in the index. curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?pretty=true' Three types of values can be requested: term information, term statistics and field statistics. By default, all term information and field statistics are returned for all fields, but no term statistics. Optionally, you can specify the fields for which the information is retrieved, either with a parameter in the URL curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?fields=text,...' or by adding the requested fields in the request body (see example below). Term information ------------------------- - term frequency in the field (always returned) - term positions ("positions" : true) - start and end offsets ("offsets" : true) - term payloads ("payloads" : true), as base64-encoded bytes. If the requested information wasn't stored in the index, it will be omitted without further warning. See [mapping](http://www.elasticsearch.org/guide/reference/mapping/core-types/) on how to configure your index to store term vectors. Term statistics ------------------------- Setting "term_statistics" to "true" (default is "false") will return - total term frequency (how often a term occurs in all documents) - document frequency (the number of documents containing the current term) By default these values are not returned since term statistics can have a serious performance impact. Field statistics ------------------------- Setting "field_statistics" to "false" (default is "true") will omit - document count (how many documents contain this field) - sum of document frequencies (the sum of document frequencies for all terms in this field) - sum of total term frequencies (the sum of total term frequencies of each term in this field) Behavior ------------------------- The term and field statistics are not accurate. 
Deleted documents are not taken into account. The information is only retrieved for the shard the requested document resides in. The term and field statistics are therefore only useful as relative measures whereas the absolute numbers have no meaning in this context. Example ------------------------- First, we create an index that stores term vectors, payloads etc. : curl -s -XPUT 'http://localhost:9200/twitter/' -d '{ "mappings": { "tweet": { "properties": { "text": { "type": "string", "term_vector": "with_positions_offsets_payloads", "store" : "yes", "index_analyzer" : "fulltext_analyzer" }, "fullname": { "type": "string", "term_vector": "with_positions_offsets_payloads", "index_analyzer" : "fulltext_analyzer" } } } }, "settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }, "analysis": { "analyzer": { "fulltext_analyzer": { "type": "custom", "tokenizer": "whitespace", "filter": [ "lowercase", "type_as_payload" ] } } } } }' Second, we add some documents: curl -XPUT 'http://localhost:9200/twitter/tweet/1?pretty=true' -d '{ "fullname" : "John Doe", "text" : "twitter test test test " }' curl -XPUT 'http://localhost:9200/twitter/tweet/2?pretty=true' -d '{ "fullname" : "Jane Doe", "text" : "Another twitter test ..." 
}' The following request returns all information and statistics for field "text" in document "1" (John Doe): curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?pretty=true' -d '{ "fields" : ["text"], "offsets" : true, "payloads" : true, "positions" : true, "term_statistics" : true, "field_statistics" : true }' Equivalently, all parameters can be passed as URI parameters: curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?pretty=true&fields=text&offsets=true&payloads=true&positions=true&term_statistics=true&field_statistics=true' Response: { "_index" : "twitter", "_type" : "tweet", "_id" : "1", "_version" : 1, "exists" : true, "term_vectors" : { "text" : { "field_statistics" : { "sum_doc_freq" : 6, "doc_count" : 2, "sum_ttf" : 8 }, "terms" : { "test" : { "doc_freq" : 2, "ttf" : 4, "term_freq" : 3, "pos" : [ 1, 2, 3 ], "start" : [ 8, 13, 18 ], "end" : [ 12, 17, 22 ], "payload" : [ "d29yZA==", "d29yZA==", "d29yZA==" ] }, "twitter" : { "doc_freq" : 2, "ttf" : 2, "term_freq" : 1, "pos" : [ 0 ], "start" : [ 0 ], "end" : [ 7 ], "payload" : [ "d29yZA==" ] } } } } } Further changes: ------------------------- XContentBuilder: new method public XContentBuilder field(XContentBuilderString name, int offset, int length, int... value) to put an integer array. IndicesAnalysisService: make the token filter for saving payloads available in elasticsearch. AbstractFieldMapper/TypeParser: make the term vector options string available and also fix the parsing of this string: with_positions_payloads is actually allowed, as can be seen in TermVectorsConsumerPerFields. Closes #3114	2013-06-10 11:09:11 +02:00
Simon Willnauer	945b89fd80	Don't test the test - who tests the test for the test? ;)	2013-06-07 20:40:50 +02:00
Simon Willnauer	b222e83d2b	Stabilize more tests	2013-06-07 20:33:17 +02:00
Britta Weber	ac75b1bcae	Fix addMapping() in AbstractSharedClusterTest for more than one field	2013-06-07 19:05:13 +02:00
Alexander Reelsen	a5f9173e14	Making deb installable by being lintian compatible. According to #2515, the Ubuntu Software Center does not allow installing Debian packages that are not lintian compatible. I worked on the package and made it lintian compatible by: * Ignoring errors about arch-dependent binaries, as we will not split this package. The arch-dependent libraries are used correctly. * Adding a copyright file pointing to the Apache license in debian. Closes #2515 Closes #2320	2013-06-07 13:53:14 +02:00
Simon Willnauer	962e3d58f7	Added shortcuts for several common commands. Added a simple way to add more complex mappings as well as shortcuts for flush, status, etc., all checking whether requests return failed shards.	2013-06-07 12:30:30 +02:00
Martijn van Groningen	8016d32a0e	Fixed minor issue in ASCT#indexExists(...)	2013-06-06 21:42:42 +02:00
Martijn van Groningen	e218ead19e	ChildrenQuery and ParentQuery now take into account documents that have been marked. Closes #3144	2013-06-06 17:13:49 +02:00
Simon Willnauer	3b01f812d6	Stabilize more tests. Wait for relocation before checking statistics or running refresh / optimize.	2013-06-06 17:03:36 +02:00
Simon Willnauer	1c513bc262	Fall back to extracting terms if MultiPhraseQuery is large. Currently, if an MPQ is very large, highlighting can take down a node or cause high CPU / RAM consumption. If the query grows beyond 16 terms, we just extract the terms and do term-by-term highlighting. Closes #3142 #3128	2013-06-06 11:22:49 +02:00
Simon Willnauer	f995c9c130	Correct offsets in FVH also if a stored field is used for highlighting. The SimpleFragmentsBuilder did not correct offsets when the analysis chain in use could produce broken offsets, which could lead to StringArrayIndexOutOfBounds exceptions. Closes #3140	2013-06-06 10:23:09 +02:00
Simon Willnauer	00c13532a9	report details if shard response has failed shards	2013-06-06 00:54:34 +02:00
Martijn van Groningen	7936417270	Added a benchmark for parent/child queries while indexing at the same time.	2013-06-05 22:27:18 +02:00
Martijn van Groningen	82ff1c6802	Fixed `has_parent` query and filter returning no results with multi level child docs.	2013-06-05 22:12:26 +02:00
Simon Willnauer	56dfa96851	More test cleanups	2013-06-05 15:45:03 +02:00
Simon Willnauer	23ad8401d0	Fix SearchStatsTest. Use the actual node in the test instead of the first node in the array	2013-06-05 09:47:37 +02:00
Simon Willnauer	4ff471ff82	Stabilize more failing tests. - SimpleSortTests#testSortScript, which was not using the mapping correctly - SearchStatsTests#testSimpleStats, which didn't clear the stats before running the test, so a previous run could have added queries	2013-06-04 08:32:48 +02:00
Adrien Grand	85f54edf66	Fix AbstractSimpleEngineTests versioning tests. Version is now stored in a distinct field, which AbstractSimpleEngineTests didn't correctly add before running tests. This generated a test failure when the version needed to be loaded from the index.	2013-06-04 00:58:54 +02:00
Simon Willnauer	07546d4d8d	Stabilize SearchStatsTests. Query stats are only present (non-zero) on nodes that hold a shard of the index.	2013-06-03 15:05:57 +02:00
Christoph Kempen	9f43814a86	Changed Java dependency from Depends to Suggests, since people use the Oracle Java distribution and not OpenJDK. You can still suggest it, of course. Now the installation will at least continue. If the init script is called, it will exit with a useful error message stating that no JDK is available via the JAVA_HOME variable.	2013-06-02 15:09:29 +02:00
Alexander Reelsen	609ad0e572	Changing version semantics to be more readable The Version class had hard to understand semantics when two versions were compared against each other. Sample of the new logic: * V_0_20_0.before(V_0_90_0) => true * V_0_90_0.after(V_0_20_0) => true Closes #3124	2013-06-02 14:58:36 +02:00
Simon Willnauer	3417b945dd	stabilize SimpleQueryTest	2013-06-02 13:02:36 +02:00
Simon Willnauer	a3f4d33aaa	Stabilize MoreLikeThisActionTest. Ensure the test sends the mapping with createIndex	2013-06-02 10:45:30 +02:00
Simon Willnauer	a5837b0f8d	Stabilize SearchStatsTest after search refactoring. SearchStatsTest depends on a given set of nodes and shards. The test needed to be adjusted to reflect a possibly random number of nodes.	2013-06-02 10:04:47 +02:00
Simon Willnauer	2682c24975	Add test for simple allocation scenario. This test checks for the "perfect" or a "sane" allocation when the total number of shards is divisible by the total number of nodes the index can be allocated on.	2013-06-02 08:06:40 +02:00
Alexander Reelsen	183bb76371	Ensure config files are not overwritten in RPM upgrade. In order to ensure that configuration files do not get overwritten when upgrading an RPM, it is not sufficient to mark them as configuration. You have to use the 'noreplace' parameter to make sure they are never overwritten. Added this parameter for the /etc/elasticsearch directory as well as the /etc/sysconfig/elasticsearch file. In addition, the post remove script now only deletes the user in case of a package removal (and does nothing on package upgrade). Closes #3123	2013-05-31 16:26:10 +02:00
Simon Willnauer	e6a3c9c153	Improve integration testing by reusing an abstracted cluster across tests. The new AbstractSharedClusterTest abstracts integration testing further to reduce the overhead of writing tests that don't rely on explicit control over the cluster. For instance, tests that run queries or facets, or that test highlighting, don't need to explicitly start and stop nodes. Tests for features like the ones just mentioned are based on the assumption that the underlying cluster can be arbitrary. Based on this assumption, this base class allows us to: * randomize cluster and index settings if not explicitly specified * transparently test transport & node clients * test features like search or highlighting on different cluster sizes * allow reuse of node instances across tests * provide utility methods that act as upper or lower bounds that a test must pass with, i.e. if a test requires at least 3 nodes then it should also pass with 4 nodes * given a cluster has unmodified cluster settings (persistent and transient), the cluster should not differ from a freshly started cluster when reused across nodes. * within a test, the client implementation and the client's associated node can be changed at any time and should return a valid result. This patch also prepares some redundant tests like 'RelocationTests.java' for randomized testing. Tests like this are very long-running on some machines and run the same test with different parameters like 'number of writers' or 'number of relocations', which can easily be chosen with a random number and run only once during development but multiple times during CI builds. All the improvements in this change reduce the test time by ~30%	2013-05-31 10:23:45 +02:00
iksnalybok	47154a79c5	Allow negative slops in SpanNearQueryParser This is mainly due to the fact that SpanNearQuery allows some neat tricks with negative slops to run zero-sloped near queries across 2 or more SpanTermQueries. Closes #3079	2013-05-31 09:35:46 +02:00
David Pilato	663f653ced	Add more information and options in PluginManager New option -l, --list displays list of existing plugins New option -h, --help displays help Deprecate options: -install is now -i, --install -remove is now -r, --remove -url is now -u, --url Catch ArraysOutOfBoundException when no arg given to install, remove or url option Add description on plugin name structure: - elasticsearch/plugin/version for official elasticsearch plugins (download from download.elasticsearch.org) - groupId/artifactId/version for community plugins (download from maven central or oss sonatype) - username/repository for site plugins (download from github master) Closes #3112.	2013-05-30 22:26:05 +02:00
Adrien Grand	c16a46e15c	Make it easier to get started with Eclipse. This patch makes mvn eclipse:eclipse generate additional eclipse configuration files so that Eclipse: - uses Java 1.6 compliance level, - truncates lines after 140 chars, - uses 4 spaces for indentation, - automatically adds a license header when creating a new class file, - organizes imports the same way as IntelliJ IDEA (which makes sense I guess since most of the code base has been written with IntelliJ; this will prevent large diffs caused by a changed import order).	2013-05-30 16:48:58 +02:00
Shay Banon	7931add154	add 0.90.2 version	2013-05-30 14:27:33 +02:00
Adrien Grand	490c7103ae	Store _version as a numeric doc values field. Doc values can be expected to be more compact than payloads and should provide better flexibility since doc values formats can be picked on a per-field basis. This patch: - makes _version stored as a numeric doc values field, - manages backwards compatibility: if a version is not found in doc values, then it will look into payloads, - uses background merges to upgrade old segments and move _version from payloads to doc values. Closes #3103	2013-05-30 11:28:54 +02:00
Adrien Grand	5ea6c77dad	Highlighting shouldn't fail when the field to highlight is absent. PlainHighlighter fails with a NPE when the field to highlight is marked as stored in the mapping but doesn't exist in a hit. This patch makes FieldsVisitor.fields less error-prone by returning an empty list instead of null when no matching stored field was found. Closes #3109	2013-05-30 10:56:36 +02:00
Alexander Reelsen	03a86604a4	Reuse suggester implementations in suggest parsers	2013-05-29 15:07:23 +02:00
Alexander Reelsen	8a5b7b21df	Make suggester implementation pluggable. This patch tries to make the suggester implementation as pluggable as facets or highlight implementations. The goal is to be able to create custom suggest implementations for use in a suggest query. Closes #3089	2013-05-28 08:59:50 +02:00
Martijn van Groningen	8b95c5fab8	Added indices aliases exists API. Added an indices aliases exists API that allows checking the existence of an index alias. This API redirects to the master to check for the existence of one or multiple index aliases. Possible options: * `index` - The index name to check index aliases for. Partial names are supported via wildcards, and multiple index names can be specified, separated by a comma. The alias name for an index can also be used. * `alias` - The name of the alias to check the existence of. Like the index option, this option supports wildcards and specifying multiple alias names separated by a comma. This is a required option. * `ignore_indices` - What to do if a specified index name doesn't exist. If set to `missing`, those indices are ignored. The REST HEAD endpoint is: `/{index}/_alias/{alias}` Examples: Check existence for any aliases with the name 2013 in any index: ``` curl -XHEAD 'localhost:9200/_alias/2013' ``` Check existence for any aliases that start with 2013_01 in any index: ``` curl -XHEAD 'localhost:9200/_alias/2013_01*' ``` Check existence for any aliases in the users index: ``` curl -XHEAD 'localhost:9200/users/_alias/*' ``` Closes #3100	2013-05-27 20:40:11 +02:00
Shay Banon	b4d75a50bf	Dates accessed from scripts should use UTC timezone. This was broken in the field data refactoring we did in 0.90; fixes #3091	2013-05-25 22:43:48 +02:00
Alexander Reelsen	24fccc91d8	Stabilized SimplePercolaterTests	2013-05-24 22:22:43 +02:00
Alexander Reelsen	2e4d18b519	Fixing percolation of documents with TTL set When a type is configured with a TTL, percolation of documents of this type was not possible. This fix ignores the TTL for percolation instead of throwing an exception that the document is already expired. Closes #2975	2013-05-24 17:58:02 +02:00
Simon Willnauer	6e366bae34	Never throw an IAE if the IndexMapper isn't present in PostingsFormat If we throw an exception in the PostingsFormat during a merge we essentially fail the entire merge which can lead to a corrupt index. We should rather return the default postings format for the new segment and log a warning. Closes #3088	2013-05-24 17:40:36 +02:00
Martijn van Groningen	9ed274822d	wait for green/yellow status	2013-05-24 17:32:45 +02:00
Simon Willnauer	d5ca1be34e	Added testcase to ensure #3078 doesn't fail	2013-05-23 23:18:45 +02:00
Simon Willnauer	13c1145548	Fix String.format to use Locale.ROOT	2013-05-23 18:26:12 +02:00
Martijn van Groningen	ffdebe9bc3	Added three new index alias related apis. Added apis to get specific index aliases based on filtering by alias name and index name: ``` curl -XGET 'localhost:9200/{index_or_alias}/_alias/{alias_name}' ``` Added delete index alias api for deleting a single index alias: ``` curl -XDELETE 'localhost:9200/{index}/_alias/{alias_name}' ``` Added create index alias api for adding a single index alias: ``` curl -XPUT 'localhost:9200/{index}/_alias/{alias_name}' curl -XPUT 'localhost:9200/{index}/_alias/{alias_name}' -d '{ "routing" : {routing}, "filter" : {filter} }' ``` Closes #3075 #3076 #3077	2013-05-23 09:18:17 +02:00
Simon Willnauer	841c2d1e14	Fix bug in DateFieldMapper where format is serialized instead of locale. This fix adds a default serialization step in the SimpleDateMappingTests that parses the mapping, builds the mapper, serializes the mapper and rebuilds the actual mapper from the serialization result. The contained information must be equivalent to the original mapping. The fixed bug has no issue assigned to it since the code is not yet released.	2013-05-22 21:14:19 +02:00
Clinton Gormley	bb9871bcb5	Changed common terms query to also support camelCased parameters and renamed disable_coords to disable_coord, to be consistent with the bool query Closes #3074	2013-05-22 16:52:32 +02:00
Matt Weber	927fda8a61	Apply QueryParser boost to top level query if applicable. Set the query boost of a parsed query string query to the product of the parsed query boost and the boost value specified in the "boost" query string parameter. This only applies if the top level query returned from the query parser has a boost assigned to it. In such a case we must multiply the boost with the top level query boost, otherwise the boost will be overwritten, e.g. 'foo^2' has a top-level boost of 2 while 'foo^2 OR bar^3' has a top level boost of 1.0 (default) since the boolean query is the top level query. Closes #3024	2013-05-21 10:29:28 +02:00
Simon Willnauer	af4205fd30	Fix method name typo & beef up tests. s/DateFieldMapper#parseLocal/DateFieldMapper#parseLocale/ SimpleDateMappingTests: tests now also check locale-dependent patterns with the root locale.	2013-05-21 09:37:35 +02:00
Shay Banon	e0825686f3	rollback multi get fields change. It seems it still fails while serializing, with sporadic failures in the tests (due to routing on serialization); need to test it in a consistent manner	2013-05-20 06:56:37 -07:00
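The statistics in the term vector example from commit 11d08ac436 can be recomputed by hand to see where each number comes from. The following is a minimal Python sketch, not Elasticsearch code: it assumes a single shard, approximates the `whitespace` tokenizer with the regular expression `\S+` followed by the `lowercase` filter, and assumes that `type_as_payload` stores the token type `word` as the payload. The helpers `analyze` and `term_vector` are hypothetical.

```python
import base64
import re
from collections import Counter

# The two example documents from the commit message (single-shard "twitter" index).
DOCS = {
    "1": "twitter test test test ",
    "2": "Another twitter test ...",
}


def analyze(text):
    """Approximate whitespace tokenizer + lowercase filter (hypothetical helper).

    Returns (term, position, start_offset, end_offset) tuples.
    """
    return [
        (m.group().lower(), pos, m.start(), m.end())
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]


def term_vector(docs, doc_id):
    """Recompute the "text" field's term vector for doc_id (hypothetical helper)."""
    analyzed = {d: analyze(t) for d, t in docs.items()}

    # Statistics over every document in the shard.
    doc_freq = Counter()
    ttf = Counter()
    for tokens in analyzed.values():
        terms = [term for term, _, _, _ in tokens]
        ttf.update(terms)            # total term frequency
        doc_freq.update(set(terms))  # each document counts once per term

    field_statistics = {
        "doc_count": sum(1 for tokens in analyzed.values() if tokens),
        "sum_doc_freq": sum(doc_freq.values()),
        "sum_ttf": sum(ttf.values()),
    }

    # Assumption: type_as_payload stores the token type "word" as the payload.
    payload = base64.b64encode(b"word").decode("ascii")

    terms = {}
    for term, pos, start, end in analyzed[doc_id]:
        entry = terms.setdefault(term, {
            "doc_freq": doc_freq[term], "ttf": ttf[term], "term_freq": 0,
            "pos": [], "start": [], "end": [], "payload": [],
        })
        entry["term_freq"] += 1
        entry["pos"].append(pos)
        entry["start"].append(start)
        entry["end"].append(end)
        entry["payload"].append(payload)

    return {"field_statistics": field_statistics, "terms": terms}


if __name__ == "__main__":
    import json
    print(json.dumps(term_vector(DOCS, "1"), indent=2))
```

For document "1", this sketch yields `sum_doc_freq` 6, `doc_count` 2 and `sum_ttf` 8, and the same per-term values for `test` and `twitter` as the response in the commit message. With more than one shard, the real statistics would only cover the shard holding the requested document.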
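Commit 609ad0e572 describes `before`/`after` comparisons on the Java `Version` class. As a rough Python analogue, assuming versions compare component by component as (major, minor, revision), the semantics can be sketched as follows; the `Version` dataclass and its fields are hypothetical and are not the Elasticsearch class.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    """Hypothetical stand-in: fields compare in declaration order, like a tuple."""
    major: int
    minor: int
    revision: int

    def before(self, other: "Version") -> bool:
        # True if this version is strictly older than `other`.
        return self < other

    def after(self, other: "Version") -> bool:
        # True if this version is strictly newer than `other`.
        return self > other


V_0_20_0 = Version(0, 20, 0)
V_0_90_0 = Version(0, 90, 0)

print(V_0_20_0.before(V_0_90_0))  # True
print(V_0_90_0.after(V_0_20_0))   # True
```

Because the components are compared numerically in declaration order, 0.90.0 sorts after 0.20.0, and a version is neither before nor after itself.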
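Commit 927fda8a61 multiplies the boost the query parser assigned to the top-level query by the query string's `boost` parameter, instead of overwriting it. A minimal sketch of that arithmetic, where `effective_boost` and its arguments are hypothetical names for illustration only:

```python
def effective_boost(parsed_top_level_boost: float, boost_param: float = 1.0) -> float:
    """Hypothetical helper: combine the parsed boost with the "boost" parameter.

    Multiplying (rather than overwriting) keeps a boost such as 'foo^2'
    when the query string query also sets "boost".
    """
    return parsed_top_level_boost * boost_param


# 'foo^2': the top-level query is the term query itself, with boost 2.0.
print(effective_boost(2.0, 3.0))  # 6.0
# 'foo^2 OR bar^3': the top-level query is a boolean query with default boost 1.0.
print(effective_boost(1.0, 3.0))  # 3.0
```

Overwriting would have turned the first case into 3.0 and silently discarded the `^2`.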

5230 Commits