Don't wrap the indices-level analyzers in a NamedAnalyzer inside AnalysisService, since that effectively creates a new analyzer instance (with a per-field reuse strategy) and we lose most of the benefit of reusing analyzers at the indices / node level.
The indices-level analyzers now return a NamedAnalyzer directly, and NamedAnalyzer uses the non-per-field reuse strategy, since that is the common case for it (there is no need for per-field reuse there).
Also, try to reuse numeric analyzers globally instead of creating them per numeric mapper. Although those analyzers are not used during indexing (we have a custom numeric field for that), they can still be used when searching, for example from a query string, when there is no specific query implementation in the mappers.
In Guice, we always use eagerly loaded singletons for all the modules we create, so we can optimize the memory used by injectors by reducing the construction information they store per binding. This results in a substantial reduction in memory usage when a node hosts many indices/shards.
Also, because all bindings are eager singletons (and effectively read-only), we can skip trying to create just-in-time bindings in the parent injector before creating them in the current injector, which improves object creation time and the time it takes to create an index or a shard on a node.
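A minimal sketch of the binding style this relies on, using the vanilla Guice API for illustration (Elasticsearch embeds its own copy of Guice, and ShardService here is a made-up stand-in, not an actual Elasticsearch class):

import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;

public class EagerSingletonSketch {

    // Made-up service standing in for the per-index/per-shard services bound by modules.
    static class ShardService {}

    static class ShardModule extends AbstractModule {
        @Override
        protected void configure() {
            // Eager singleton: constructed once when the injector is built, never lazily,
            // which is what allows the per-binding construction info to be trimmed.
            bind(ShardService.class).asEagerSingleton();
        }
    }

    public static void main(String[] args) {
        Injector injector = Guice.createInjector(new ShardModule());
        // Later lookups return the pre-built instance; no just-in-time binding
        // needs to be created in a parent injector first.
        ShardService service = injector.getInstance(ShardService.class);
    }
}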
The currently used maven shade plugin still keeps references to the
original classes in the constant pools of the shaded classes. This is
never a problem at runtime, but dependency tools which try to use the
constant pool for determining dependencies get confused (OSGi for
example). This patch simply bumps the plugin version, which implicitly
fixes http://jira.codehaus.org/browse/MSHADE-105
Closes#3254
Closes#3255
This has two advantages in the case where the term filter is *not* cached:
* We iterate only once over the matching docs. Before this fix we iterated once to create the FBS and a second time to consume the matching docs from the FBS.
* The DocIdSetIterator#cost method of a DocIdSetIterator obtained from the DocsEnum is accurate, because it is based on the document frequency, whereas the cost method of the FBS' iterator impl is based on the total number of bits (which is based on maxDoc). This makes the filter execute faster when it is included in a filtered query, because the filtered query can base its decision on which execution strategy to pick on an accurate heuristic (see the sketch below).
This change doesn't have any negative implications in the case where the filter is cached (which is the default). The FBS is now created lazily in the DocIdSets#toCacheable method, which is always invoked when the term filter needs to be cached.
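A rough before/after sketch of the iteration pattern, simplified and assuming the Lucene 4.x DocsEnum / FixedBitSet APIs (this is not the actual filter code):

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.FixedBitSet;

import java.io.IOException;

class TermIterationSketch {

    // Old pattern: two passes - fill a maxDoc-sized FixedBitSet from the DocsEnum,
    // then walk the FBS iterator again to consume the matches. The FBS iterator's
    // cost is tied to maxDoc, not to the term's document frequency.
    static void viaFixedBitSet(DocsEnum docs, int maxDoc) throws IOException {
        FixedBitSet bits = new FixedBitSet(maxDoc);
        int doc;
        while ((doc = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            bits.set(doc);
        }
        DocIdSetIterator second = bits.iterator();
        while ((doc = second.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            consume(doc); // second pass over the same matching docs
        }
    }

    // New pattern for the uncached case: a single pass straight over the DocsEnum,
    // whose cost() reflects the document frequency.
    static void direct(DocsEnum docs) throws IOException {
        int doc;
        while ((doc = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            consume(doc);
        }
    }

    static void consume(int doc) {
        // stand-in for whatever consumes the matching doc ids
    }
}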
The code handling geo-shapes is not centralized and points are created in
several different places. Also, the collection of supported geo_shapes is
not complete with respect to the GeoJSON specification. This commit
centralizes the code related to geo calculations and extends the old API
with a set of new shapes.
Null-Shapes
===========
The latest implementation of geo-shapes allows indexing null-shapes. This
means a field that is defined to hold a geo-shape can be set to null. For
example:
{
"shape": null
}
New Shapes
==========
The geo-shapes multipoint and multilinestring have been added to the
geo_shape types. Also geo_circle is introduced by this commit.
Dateline wrapping
=================
A major issue with geo-shapes is the spherical geometry. Since ElasticSearch
works on geo-coordinates by projecting the Earth's surface onto a plane,
some shapes are hard to define if they cross the +180°/-180° longitude.
To solve this issue, ElasticSearch offers the possibility to define geo
shapes crossing this border: it decomposes these shapes and automatically
re-composes them in a spherical manner. This feature may change the indexed
shape type. If, for example, a polygon is defined that crosses the dateline,
it will be re-assembled into a set of polygons, so a multipolygon is
indexed. Likewise, linestrings crossing the dateline might be re-assembled
into multilinestrings.
Builders
========
The API has been refactored to use builders instead of shapes, so parsing
geo-shapes results in builder objects. These builders can be parsed and
serialized without generating any shapes, which means shapes are generated
only on the nodes executing the actual operation. Also, the base class
ShapeBuilder implements the ToXContent interface, which allows it to be
written into XContent fields directly.
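A hedged sketch of what implementing ToXContent enables (the helper below is illustrative only; it accepts any ToXContent, so no actual ShapeBuilder factory methods are assumed):

import org.elasticsearch.common.xcontent.ToXContent;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

import java.io.IOException;

class ShapeSourceSketch {

    // Because ShapeBuilder implements ToXContent, its serialized form can be written
    // straight into a document or query source without building a Shape object here.
    static XContentBuilder shapeSource(ToXContent shapeBuilder) throws IOException {
        XContentBuilder builder = XContentFactory.jsonBuilder().startObject();
        builder.field("shape");                                     // field name only...
        shapeBuilder.toXContent(builder, ToXContent.EMPTY_PARAMS);  // ...the builder writes the value
        return builder.endObject();
    }
}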
TODO’s
======
- The geo-circle will not work if it crosses the dateline
- The envelope also needs to be wrapped
Closes#1997 #2708
When specifying the docs to be returned in a multi get request, a parent
field could not be specified, so some docs seemingly did not exist even
though they did.
This fix behaves like the normal GetRequest and simply sets the routing
from the parent value if the routing has not been set explicitly.
Also, a test for routing with mget has been added.
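For reference, a hedged sketch of the GetRequest behaviour the multi get items now mirror (standard GetRequest setters assumed; the index, type and id values are made up):

import org.elasticsearch.action.get.GetRequest;

class ParentRoutingSketch {

    static GetRequest getWithParent() {
        GetRequest request = new GetRequest("my_index", "my_type", "1");
        // The parent id is used as the routing value when no routing was set explicitly,
        // so the get is routed to the shard that holds the child document.
        request.parent("the-parent-id");
        return request;
    }
}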
Closes#3274
The MultiGet API stops with an IndexMissingException if only one of the
requested items tries to access a non-existing index. This patch records a
failure for that item without failing the whole request.
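With this change, callers check each item for a failure instead of handling a failure of the whole request; a hedged sketch of that pattern (response handling only, request construction omitted):

import org.elasticsearch.action.get.MultiGetItemResponse;
import org.elasticsearch.action.get.MultiGetResponse;

class MultiGetFailureSketch {

    static void handle(MultiGetResponse response) {
        for (MultiGetItemResponse item : response.getResponses()) {
            if (item.isFailed()) {
                // e.g. this item referenced a non-existing index
                System.out.println("failed: " + item.getFailure().getMessage());
            } else {
                System.out.println("exists: " + item.getResponse().isExists());
            }
        }
    }
}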
Closes#3267
Relying on deleted documents when loading field data is dangerous because a
field data instance might be loaded for a given generation of a segment and
then retrieved from the cache by an older generation of the same segment
that has fewer deleted documents. This could, for example, lead to
under-estimated facet counts. The same issue applies to the ID cache and
filter caches.
Close#3224
The `parent` option was ignored in the delete api (rest only) and for delete actions in the bulk api.
This bug occurred when the _parent field was enabled and only the parent option was used. It resulted in documents being deleted even if the specified parent value was incorrect.
Closes#3257
The current implementation executed suggestion parsing inside the pull
parser, which made it rely on the order of the elements in the request.
This fix changes the behaviour to parse the relevant parts of the request
first and then execute all the suggestions afterwards, so we can be sure
that all information has been extracted from the request before execution.
Closes#3247
Even though it is described in the documentation, enabling/disabling
index warmers at runtime was not supported. This commit adds support for
index.warmer.enabled as a dynamic setting.
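A hedged example of toggling the setting at runtime through the update settings API of the Java client (the index name is made up, and the setters shown are assumed to match the pre-2.0 client):

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

class WarmerToggleSketch {

    static void disableWarmers(Client client) {
        client.admin().indices().prepareUpdateSettings("my_index")
                .setSettings(ImmutableSettings.settingsBuilder()
                        .put("index.warmer.enabled", false)
                        .build())
                .execute().actionGet();
    }
}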
Closes#3246
This commit merges field data implementations for byte, short, int and long
data into PackedArrayAtomicFieldData which uses Lucene's PackedInts API to
store data.
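For context, a small sketch of the Lucene PackedInts primitives this builds on (the values and sizes are made up):

import org.apache.lucene.util.packed.PackedInts;

class PackedIntsSketch {

    public static void main(String[] args) {
        long[] values = {3, 1, 4, 1, 5, 9, 2, 6};          // made-up numeric field data
        int bitsPerValue = PackedInts.bitsRequired(9);     // the max value determines the width
        PackedInts.Mutable packed =
                PackedInts.getMutable(values.length, bitsPerValue, PackedInts.DEFAULT);
        for (int i = 0; i < values.length; i++) {
            packed.set(i, values[i]);                      // stored with only a few bits per value
        }
        long third = packed.get(2);                        // reads back 4
    }
}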
Close#3220
IndexUpgraderMergePolicy assumed that field numbers were dense and that
fieldInfos.size() was a free field number. This can however be wrong for a
segment which doesn't have one or more fields that some older segments have.
Close#3237
This ensures that the settings are taken into account whichever means is used to
import the project into Eclipse (manual import, m2e, mvn eclipse:eclipse, ...).