mirror of https://github.com/apache/lucene.git
Document the SpatialStrategies consistently and more thoroughly.
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1389204 13f79535-47bb-0310-9956-ffa450edef68
parent 11d421a36d
commit 84c5a4adde
|
@ -40,12 +40,12 @@ import org.apache.lucene.spatial.query.SpatialArgs;
|
||||||
* <li>What types of query shapes can be used?</li>
|
* <li>What types of query shapes can be used?</li>
|
||||||
* <li>What types of query operations are supported?
|
* <li>What types of query operations are supported?
|
||||||
* This might vary per shape.</li>
|
* This might vary per shape.</li>
|
||||||
* <li>Are there caches? Under what circumstances are they used?
|
* <li>Does it use the {@link org.apache.lucene.search.FieldCache}, {@link
|
||||||
* Roughly how big are they? Is it segmented by Lucene segments, such as is
|
* org.apache.lucene.index.DocValues} or some other type of cache? When?
|
||||||
* done by the Lucene {@link org.apache.lucene.search.FieldCache} and
|
|
||||||
* {@link org.apache.lucene.index.DocValues} (ideal) or is it for the entire
|
|
||||||
* index?
|
|
||||||
* </ul>
|
* </ul>
|
||||||
|
* If a strategy only supports certain shapes at index or query time, then in
|
||||||
|
* general it will throw an exception if given an incompatible one. It will not
|
||||||
|
* be coerced into compatibility.
|
||||||
* <p/>
|
* <p/>
|
||||||
* Note that a SpatialStrategy is not involved with the Lucene stored field
|
* Note that a SpatialStrategy is not involved with the Lucene stored field
|
||||||
* values of shapes, which is immaterial to indexing & search.
|
* values of shapes, which is immaterial to indexing & search.
|
||||||
|
@ -85,7 +85,7 @@ public abstract class SpatialStrategy {
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Returns the IndexableField(s) from the <code>shape</code> that are to be
|
* Returns the IndexableField(s) from the {@code shape} that are to be
|
||||||
* added to the {@link org.apache.lucene.document.Document}. These fields
|
* added to the {@link org.apache.lucene.document.Document}. These fields
|
||||||
* are expected to be marked as indexed and not stored.
|
* are expected to be marked as indexed and not stored.
|
||||||
* <p/>
|
* <p/>
|
||||||
|
@ -139,7 +139,7 @@ public abstract class SpatialStrategy {
|
||||||
/**
|
/**
|
||||||
* Returns a ValueSource with values ranging from 1 to 0, depending inversely
|
* Returns a ValueSource with values ranging from 1 to 0, depending inversely
|
||||||
* on the distance from {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}.
|
* on the distance from {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}.
|
||||||
* The formula is <code>c/(d + c)</code> where 'd' is the distance and 'c' is
|
* The formula is {@code c/(d + c)} where 'd' is the distance and 'c' is
|
||||||
* one tenth the distance to the farthest edge from the center. Thus the
|
* one tenth the distance to the farthest edge from the center. Thus the
|
||||||
* scores will be 1 for indexed points at the center of the query shape and as
|
* scores will be 1 for indexed points at the center of the query shape and as
|
||||||
* low as ~0.1 at its furthest edges.
|
* low as ~0.1 at its furthest edges.
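The formula in this Javadoc can be checked with a few lines of arithmetic. The following is a minimal sketch, not the library's ValueSource: the class name `RecipScoreSketch` and its parameters are hypothetical, and it only evaluates `c/(d + c)` with `c` set to one tenth of the farthest-edge distance.

```java
/** Illustrative only: evaluates the c/(d + c) score described in the Javadoc. */
class RecipScoreSketch {

  /**
   * @param distance d, the distance from the query shape's center to the indexed point
   * @param farthestEdgeDistance distance from the center to the farthest edge of the query shape
   */
  static double score(double distance, double farthestEdgeDistance) {
    double c = 0.1 * farthestEdgeDistance; // 'c' is one tenth of the farthest-edge distance
    return c / (distance + c);
  }

  public static void main(String[] args) {
    double edge = 50.0; // hypothetical: the farthest edge is 50 units from the center
    System.out.println("center: " + score(0.0, edge));
    System.out.println("farthest edge: " + score(edge, edge));
  }
}
```

At the center the score is `c/c = 1`. At the farthest edge `d = 10c`, so the score is `c/(11c) = 1/11`, roughly 0.09, which matches the "~0.1" stated in the comment.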
|
||||||
|
|
|
@ -43,8 +43,28 @@ import org.apache.lucene.spatial.query.UnsupportedSpatialOperation;
|
||||||
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Based on GeoPortal's
|
* A SpatialStrategy for indexing and searching Rectangles by storing its
|
||||||
* <a href="http://geoportal.svn.sourceforge.net/svnroot/geoportal/Geoportal/trunk/src/com/esri/gpt/catalog/lucene/SpatialClauseAdapter.java">SpatialClauseAdapter</a>.
|
* coordinates in numeric fields. It supports all {@link SpatialOperation}s and
|
||||||
|
* has a custom overlap relevancy. It is based on GeoPortal's <a
|
||||||
|
* href="http://geoportal.svn.sourceforge.net/svnroot/geoportal/Geoportal/trunk/src/com/esri/gpt/catalog/lucene/SpatialClauseAdapter.java">SpatialClauseAdapter</a>.
|
||||||
|
*
|
||||||
|
* <h4>Characteristics:</h4>
|
||||||
|
* <ul>
|
||||||
|
* <li>Only indexes Rectangles; just one per field value.</li>
|
||||||
|
* <li>Can query only by a Rectangle.</li>
|
||||||
|
* <li>Supports all {@link SpatialOperation}s.</li>
|
||||||
|
* <li>Uses the FieldCache for any sorting / relevancy.</li>
|
||||||
|
* </ul>
|
||||||
|
*
|
||||||
|
* <h4>Implementation:</h4>
|
||||||
|
* This uses 4 double fields for minX, maxX, minY, maxY
|
||||||
|
* and a boolean to mark a dateline cross. Depending on the particular {@link
|
||||||
|
* SpatialOperation}s, there is a variety of {@link NumericRangeQuery}s to be
|
||||||
|
* done.
|
||||||
|
* The {@link #makeBBoxAreaSimilarityValueSource(com.spatial4j.core.shape.Rectangle)}
|
||||||
|
* works by calculating the query bbox overlap percentage against the indexed
|
||||||
|
* shape overlap percentage. The indexed shape's coordinates are retrieved from
|
||||||
|
* the {@link org.apache.lucene.search.FieldCache}.
|
||||||
*
|
*
|
||||||
* @lucene.experimental
|
* @lucene.experimental
|
||||||
*/
|
*/
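The overlap relevancy described above compares how much of the query box and how much of the indexed box fall inside their intersection. The sketch below is a simplified illustration under stated assumptions: it multiplies the two overlap ratios directly, ignores dateline crossing, and uses a hypothetical class name (`BBoxOverlapSketch`). The actual similarity in the codebase may weight or transform these ratios differently.

```java
/** Illustrative only: a simplified bbox overlap score, not the library's formula. */
class BBoxOverlapSketch {

  /** Rectangles are {minX, maxX, minY, maxY}, the four values the strategy indexes. */
  static double area(double[] r) {
    return Math.max(0, r[1] - r[0]) * Math.max(0, r[3] - r[2]);
  }

  static double intersectionArea(double[] a, double[] b) {
    double w = Math.min(a[1], b[1]) - Math.max(a[0], b[0]);
    double h = Math.min(a[3], b[3]) - Math.max(a[2], b[2]);
    return (w <= 0 || h <= 0) ? 0 : w * h;
  }

  /** Assumption: the score is the product of the two overlap ratios. */
  static double similarity(double[] query, double[] indexed) {
    double inter = intersectionArea(query, indexed);
    if (inter == 0) {
      return 0; // disjoint or touching boxes score zero
    }
    double queryRatio = inter / area(query);     // share of the query box that overlaps
    double indexedRatio = inter / area(indexed); // share of the indexed box that overlaps
    return queryRatio * indexedRatio;
  }

  public static void main(String[] args) {
    double[] query = {0, 2, 0, 2};
    double[] indexed = {1, 3, 1, 3};
    System.out.println(similarity(query, indexed));
  }
}
```

Identical boxes score 1 under this sketch, and the score falls toward 0 as either box extends beyond the other.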
|
||||||
|
|
|
@ -37,8 +37,41 @@ import java.util.Map;
|
||||||
import java.util.concurrent.ConcurrentHashMap;
|
import java.util.concurrent.ConcurrentHashMap;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Abstract SpatialStrategy which provides common functionality for those
|
* An abstract SpatialStrategy based on {@link SpatialPrefixTree}. The two
|
||||||
* Strategys which use {@link SpatialPrefixTree}s
|
* subclasses are {@link RecursivePrefixTreeStrategy} and {@link
|
||||||
|
* TermQueryPrefixTreeStrategy}. This strategy is most effective as a fast
|
||||||
|
* approximate spatial search filter.
|
||||||
|
*
|
||||||
|
* <h4>Characteristics:</h4>
|
||||||
|
* <ul>
|
||||||
|
* <li>Can index any shape; however only {@link RecursivePrefixTreeStrategy}
|
||||||
|
* can effectively search non-point shapes. <em>Not tested.</em></li>
|
||||||
|
* <li>Can index a variable number of shapes per field value. This strategy
|
||||||
|
* can do it via multiple calls to {@link #createIndexableFields(com.spatial4j.core.shape.Shape)}
|
||||||
|
* for a document or by giving it some sort of Shape aggregate (e.g. JTS
|
||||||
|
* WKT MultiPoint). The shape's boundary is approximated to a grid precision.
|
||||||
|
* </li>
|
||||||
|
* <li>Can query with any shape. The shape's boundary is approximated to a grid
|
||||||
|
* precision.</li>
|
||||||
|
* <li>Only {@link org.apache.lucene.spatial.query.SpatialOperation#Intersects}
|
||||||
|
* is supported. If only points are indexed then this is effectively equivalent
|
||||||
|
* to IsWithin.</li>
|
||||||
|
* <li>The strategy supports {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}
|
||||||
|
* even for multi-valued data. However, <em>it will likely be removed in the
|
||||||
|
* future</em> in lieu of using another strategy with a more scalable
|
||||||
|
* implementation. Use of this call is the only
|
||||||
|
* circumstance in which a cache is used. The cache is simple but as such
|
||||||
|
* it doesn't scale to large numbers of points nor is it real-time-search
|
||||||
|
* friendly.</li>
|
||||||
|
* </ul>
|
||||||
|
*
|
||||||
|
* <h4>Implementation:</h4>
|
||||||
|
* The {@link SpatialPrefixTree} does most of the work, for example returning
|
||||||
|
* a list of terms representing grids of various sizes for a supplied shape.
|
||||||
|
* An important
|
||||||
|
* configuration item is {@link #setDistErrPct(double)} which balances
|
||||||
|
* shape precision against scalability. See those javadocs.
|
||||||
|
*
|
||||||
* @lucene.internal
|
* @lucene.internal
|
||||||
*/
|
*/
|
||||||
public abstract class PrefixTreeStrategy extends SpatialStrategy {
|
public abstract class PrefixTreeStrategy extends SpatialStrategy {
|
||||||
|
@ -52,7 +85,12 @@ public abstract class PrefixTreeStrategy extends SpatialStrategy {
|
||||||
this.grid = grid;
|
this.grid = grid;
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Used in the in-memory ValueSource as a default ArrayList length for this field's array of values, per doc. */
|
/**
|
||||||
|
* A memory hint used by {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}
|
||||||
|
* for how big the initial size of each Document's array should be. The
|
||||||
|
* default is 2. Set this to slightly more than the default expected number
|
||||||
|
* of points per document.
|
||||||
|
*/
|
||||||
public void setDefaultFieldValuesArrayLen(int defaultFieldValuesArrayLen) {
|
public void setDefaultFieldValuesArrayLen(int defaultFieldValuesArrayLen) {
|
||||||
this.defaultFieldValuesArrayLen = defaultFieldValuesArrayLen;
|
this.defaultFieldValuesArrayLen = defaultFieldValuesArrayLen;
|
||||||
}
|
}
|
||||||
|
@ -62,8 +100,14 @@ public abstract class PrefixTreeStrategy extends SpatialStrategy {
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* The default measure of shape precision affecting indexed and query shapes.
|
* The default measure of shape precision affecting shapes at index and query
|
||||||
* Specific shapes at index and query time can use something different.
|
* times. Points don't use this as they are always indexed at the configured
|
||||||
|
* maximum precision ({@link org.apache.lucene.spatial.prefix.tree.SpatialPrefixTree#getMaxLevels()});
|
||||||
|
* this applies to all other shapes. Specific shapes at index and query time
|
||||||
|
* can use something different than this default value. If you don't set a
|
||||||
|
* default then the default is {@link SpatialArgs#DEFAULT_DISTERRPCT} --
|
||||||
|
* 2.5%.
|
||||||
|
*
|
||||||
* @see org.apache.lucene.spatial.query.SpatialArgs#getDistErrPct()
|
* @see org.apache.lucene.spatial.query.SpatialArgs#getDistErrPct()
|
||||||
*/
|
*/
|
||||||
public void setDistErrPct(double distErrPct) {
|
public void setDistErrPct(double distErrPct) {
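To make the precision trade-off concrete, here is a hedged sketch of how a distErrPct value might translate into a grid level. Everything in it is an assumption for illustration: the class `DistErrPctSketch`, the use of half the bounding-box diagonal as the reference distance, planar (Euclidean) geometry, and a grid whose cell width halves at each level. The real mapping lives in `SpatialArgs` and the configured `SpatialPrefixTree`.

```java
/** Illustrative only: maps a distErrPct to a grid level on a hypothetical halving grid. */
class DistErrPctSketch {

  /** Assumption: the allowed error is distErrPct times half the bbox diagonal. */
  static double distErr(double minX, double maxX, double minY, double maxY, double distErrPct) {
    double halfDiagonal = Math.hypot(maxX - minX, maxY - minY) / 2.0;
    return halfDiagonal * distErrPct;
  }

  /** Returns the coarsest level whose cell width is no larger than distErr. */
  static int levelForDistance(double distErr, double worldWidth, int maxLevels) {
    if (distErr <= 0) {
      return maxLevels; // fully precise to the grid
    }
    double cellWidth = worldWidth;
    for (int level = 1; level < maxLevels; level++) {
      cellWidth /= 2.0; // each level halves the cell width
      if (cellWidth <= distErr) {
        return level;
      }
    }
    return maxLevels;
  }

  public static void main(String[] args) {
    double err = distErr(0, 8, 0, 6, 0.025); // 2.5%, the default mentioned above
    System.out.println("distErr=" + err + " level=" + levelForDistance(err, 360, 12));
  }
}
```

A smaller distErrPct yields a deeper level, hence more and finer cells per shape; a larger one yields fewer, coarser cells.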
|
||||||
|
@ -81,7 +125,8 @@ public abstract class PrefixTreeStrategy extends SpatialStrategy {
|
||||||
List<Node> cells = grid.getNodes(shape, detailLevel, true);//true=intermediates cells
|
List<Node> cells = grid.getNodes(shape, detailLevel, true);//true=intermediates cells
|
||||||
//If shape isn't a point, add a full-resolution center-point so that
|
//If shape isn't a point, add a full-resolution center-point so that
|
||||||
// PointPrefixTreeFieldCacheProvider has the center-points.
|
// PointPrefixTreeFieldCacheProvider has the center-points.
|
||||||
// TODO index each center of a multi-point? Yes/no?
|
//TODO index each point of a multi-point or other aggregate.
|
||||||
|
//TODO remove this once support for a distance ValueSource is removed.
|
||||||
if (!(shape instanceof Point)) {
|
if (!(shape instanceof Point)) {
|
||||||
Point ctr = shape.getCenter();
|
Point ctr = shape.getCenter();
|
||||||
//TODO should be smarter; don't index 2 tokens for this in CellTokenStream. Harmless though.
|
//TODO should be smarter; don't index 2 tokens for this in CellTokenStream. Harmless though.
|
||||||
|
|
|
@ -34,13 +34,15 @@ import java.io.IOException;
|
||||||
import java.util.LinkedList;
|
import java.util.LinkedList;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Performs a spatial intersection filter between a query shape and a field indexed with {@link SpatialPrefixTree}, a Trie.
|
* Performs a spatial intersection filter between a query shape and a field
|
||||||
* SPT yields terms (grids) at length 1 and at greater lengths corresponding to greater precisions.
|
* indexed with {@link SpatialPrefixTree}, a Trie. SPT yields terms (grids) at
|
||||||
* This filter recursively traverses each grid length and uses methods on {@link Shape} to efficiently know
|
* length 1 (aka "Level 1") and at greater lengths corresponding to greater
|
||||||
* that all points at a prefix fit in the shape or not to either short-circuit unnecessary traversals or to efficiently
|
* precisions. This filter recursively traverses each grid length and uses
|
||||||
* load all enclosed points. If no indexed data lies in a portion of the shape
|
* methods on {@link Shape} to efficiently know that all points at a prefix fit
|
||||||
* then that portion of the query shape is quickly passed over without
|
* in the shape or not to either short-circuit unnecessary traversals or to
|
||||||
* decomposing the shape unnecessarily.
|
* efficiently load all enclosed points. If no indexed data lies in a portion
|
||||||
|
* of the shape then that portion of the query shape is quickly passed over
|
||||||
|
* without decomposing the shape unnecessarily.
|
||||||
*
|
*
|
||||||
* @lucene.internal
|
* @lucene.internal
|
||||||
*/
|
*/
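The traversal this Javadoc describes can be sketched on an in-memory quad grid. This is not the filter's implementation: the real filter walks indexed terms and approximates to grid precision, whereas this sketch (hypothetical class `RecursiveGridFilterSketch`) holds raw points in lists and tests coordinates exactly at the deepest level. It does show the three shortcuts: skip cells with no data, skip cells outside the query, and accept whole cells that lie inside the query without decomposing further.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the RecursivePrefixTreeFilter implementation. */
class RecursiveGridFilterSketch {

  /** Axis-aligned rectangle, treated as half-open: [minX, maxX) x [minY, maxY). */
  static final class Rect {
    final double minX, minY, maxX, maxY;

    Rect(double minX, double minY, double maxX, double maxY) {
      this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
    }

    boolean contains(Rect o) {
      return minX <= o.minX && minY <= o.minY && maxX >= o.maxX && maxY >= o.maxY;
    }

    boolean intersects(Rect o) {
      return minX < o.maxX && o.minX < maxX && minY < o.maxY && o.minY < maxY;
    }

    boolean containsPoint(double[] p) {
      return p[0] >= minX && p[0] < maxX && p[1] >= minY && p[1] < maxY;
    }
  }

  /** Returns the points that fall inside {@code query}, searching within {@code world}. */
  static List<double[]> search(List<double[]> points, Rect world, Rect query, int maxLevel) {
    List<double[]> inWorld = new ArrayList<>();
    for (double[] p : points) if (world.containsPoint(p)) inWorld.add(p);
    List<double[]> hits = new ArrayList<>();
    collect(world, inWorld, query, 0, maxLevel, hits);
    return hits;
  }

  private static void collect(Rect cell, List<double[]> inCell, Rect query,
                              int level, int maxLevel, List<double[]> hits) {
    if (inCell.isEmpty() || !query.intersects(cell)) {
      return; // no indexed data here, or the cell is outside the query
    }
    if (query.contains(cell)) {
      hits.addAll(inCell); // every point at this prefix matches; no further decomposition
      return;
    }
    if (level >= maxLevel) {
      for (double[] p : inCell) if (query.containsPoint(p)) hits.add(p); // scan at the deepest level
      return;
    }
    double midX = (cell.minX + cell.maxX) / 2, midY = (cell.minY + cell.maxY) / 2;
    Rect[] children = {
      new Rect(cell.minX, cell.minY, midX, midY),
      new Rect(midX, cell.minY, cell.maxX, midY),
      new Rect(cell.minX, midY, midX, cell.maxY),
      new Rect(midX, midY, cell.maxX, cell.maxY)
    };
    for (Rect child : children) {
      List<double[]> inChild = new ArrayList<>();
      for (double[] p : inCell) if (child.containsPoint(p)) inChild.add(p);
      collect(child, inChild, query, level + 1, maxLevel, hits);
    }
  }
}
```

Re-filtering the point list per child stands in for the term-prefix lookup an index would provide; it keeps the sketch short at the cost of efficiency.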
|
||||||
|
@ -167,7 +169,7 @@ RE "scan" threshold:
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
public String toString() {
|
public String toString() {
|
||||||
return "GeoFilter{fieldName='" + fieldName + '\'' + ", shape=" + queryShape + '}';
|
return getClass().getSimpleName()+"{fieldName='" + fieldName + '\'' + ", shape=" + queryShape + '}';
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
|
|
|
@ -25,7 +25,11 @@ import org.apache.lucene.spatial.query.SpatialOperation;
|
||||||
import org.apache.lucene.spatial.query.UnsupportedSpatialOperation;
|
import org.apache.lucene.spatial.query.UnsupportedSpatialOperation;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Based on {@link RecursivePrefixTreeFilter}.
|
* A {@link PrefixTreeStrategy} which uses {@link RecursivePrefixTreeFilter}.
|
||||||
|
* This strategy has support for searching non-point shapes (note: not tested).
|
||||||
|
* Even a query shape with distErrPct=0 (fully precise to the grid) should have
|
||||||
|
* good performance for typical data, unless there is a lot of indexed data
|
||||||
|
* coincident with the shape's edge.
|
||||||
*
|
*
|
||||||
* @lucene.experimental
|
* @lucene.experimental
|
||||||
*/
|
*/
|
||||||
|
@ -38,6 +42,13 @@ public class RecursivePrefixTreeStrategy extends PrefixTreeStrategy {
|
||||||
prefixGridScanLevel = grid.getMaxLevels() - 4;//TODO this default constant is dependent on the prefix grid size
|
prefixGridScanLevel = grid.getMaxLevels() - 4;//TODO this default constant is dependent on the prefix grid size
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Sets the grid level [1-maxLevels] at which indexed terms are scanned brute-force
|
||||||
|
* instead of by grid decomposition. By default this is maxLevels - 4. The
|
||||||
|
* final level, maxLevels, is always scanned.
|
||||||
|
*
|
||||||
|
* @param prefixGridScanLevel 1 to maxLevels
|
||||||
|
*/
|
||||||
public void setPrefixGridScanLevel(int prefixGridScanLevel) {
|
public void setPrefixGridScanLevel(int prefixGridScanLevel) {
|
||||||
//TODO if negative then subtract from maxlevels
|
//TODO if negative then subtract from maxlevels
|
||||||
this.prefixGridScanLevel = prefixGridScanLevel;
|
this.prefixGridScanLevel = prefixGridScanLevel;
|
||||||
|
|
|
@ -30,14 +30,14 @@ import org.apache.lucene.spatial.query.UnsupportedSpatialOperation;
|
||||||
import java.util.List;
|
import java.util.List;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* A basic implementation of {@link PrefixTreeStrategy} using a large
|
* A basic implementation of {@link PrefixTreeStrategy} using a large {@link
|
||||||
* {@link TermsFilter} of all the nodes from
|
* TermsFilter} of all the nodes from {@link SpatialPrefixTree#getNodes(com.spatial4j.core.shape.Shape,
|
||||||
* {@link SpatialPrefixTree#getNodes(com.spatial4j.core.shape.Shape, int, boolean)}.
|
* int, boolean)}. It only supports the search of indexed Point shapes.
|
||||||
* It only supports the search of indexed Point shapes.
|
* <p/>
|
||||||
* <p />
|
* The precision of query shapes (distErrPct) is an important factor in using
|
||||||
* The precision of query shapes is an important factor in using this Strategy.
|
* this Strategy. If the precision is too precise then it will result in many
|
||||||
* If the precision is too precise then it will result in many terms which will
|
* terms which will amount to a slower query.
|
||||||
* amount to a slower query.
|
*
|
||||||
* @lucene.experimental
|
* @lucene.experimental
|
||||||
*/
|
*/
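The warning about overly fine precision can be illustrated by counting grid cells. The sketch below is an assumption-laden approximation: it uses a hypothetical square world of side `worldSize` starting at the origin, a grid that splits each cell in four per level, and it counts only the cells at the requested level that touch a query rectangle. The real `getNodes` call may also return intermediate cells, so treat the numbers as an order-of-magnitude guide.

```java
/** Illustrative only: counts grid cells touching a rectangle at a given level. */
class TermCountSketch {

  /**
   * @param worldSize side length of a hypothetical square world [0, worldSize)
   * @param level grid depth; the world is split into 2^level cells per axis
   */
  static long cellsCovering(double minX, double maxX, double minY, double maxY,
                            double worldSize, int level) {
    double cell = worldSize / (1L << level); // cell side length at this level
    long x0 = (long) Math.floor(minX / cell);
    long x1 = (long) Math.floor(maxX / cell);
    long y0 = (long) Math.floor(minY / cell);
    long y1 = (long) Math.floor(maxY / cell);
    return (x1 - x0 + 1) * (y1 - y0 + 1);
  }

  public static void main(String[] args) {
    for (int level = 4; level <= 8; level += 2) {
      System.out.println("level " + level + ": "
          + cellsCovering(10, 20, 10, 20, 256, level) + " cells");
    }
  }
}
```

For the same query rectangle, each extra level roughly quadruples the number of cells, and each cell becomes a term in the filter, which is why a very small distErrPct slows this strategy down.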
|
||||||
public class TermQueryPrefixTreeStrategy extends PrefixTreeStrategy {
|
public class TermQueryPrefixTreeStrategy extends PrefixTreeStrategy {
|
||||||
|
|
|
@ -28,10 +28,13 @@ import java.util.Collections;
|
||||||
import java.util.List;
|
import java.util.List;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* A spatial Prefix Tree, or Trie, which decomposes shapes into prefixed strings at variable lengths corresponding to
|
* A spatial Prefix Tree, or Trie, which decomposes shapes into prefixed strings
|
||||||
* variable precision. Each string corresponds to a spatial region.
|
* at variable lengths corresponding to variable precision. Each string
|
||||||
*
|
* corresponds to a rectangular spatial region. This approach is
|
||||||
* Implementations of this class should be thread-safe and immutable once initialized.
|
* also referred to as "Grids", "Tiles", and "Spatial Tiers".
|
||||||
|
* <p/>
|
||||||
|
* Implementations of this class should be thread-safe and immutable once
|
||||||
|
* initialized.
|
||||||
*
|
*
|
||||||
* @lucene.experimental
|
* @lucene.experimental
|
||||||
*/
|
*/
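As a rough illustration of "prefixed strings at variable lengths", the sketch below builds a quad-tree token for a point. The class `QuadTokenSketch` and the letter assigned to each quadrant are assumptions made for this example and need not match any concrete `SpatialPrefixTree` implementation. The property it demonstrates is the one the Javadoc relies on: a coarser token is always a prefix of the finer token for the same location.

```java
/** Illustrative only: builds a quad-tree prefix token for a point. */
class QuadTokenSketch {

  /** Assumed labels: A = north-west, B = north-east, C = south-west, D = south-east. */
  static String token(double x, double y,
                      double minX, double minY, double maxX, double maxY, int levels) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < levels; i++) {
      double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
      boolean east = x >= midX;
      boolean north = y >= midY;
      sb.append(north ? (east ? 'B' : 'A') : (east ? 'D' : 'C'));
      // Shrink the current cell to the chosen quadrant before the next level.
      if (east) minX = midX; else maxX = midX;
      if (north) minY = midY; else maxY = midY;
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    for (int levels = 1; levels <= 3; levels++) {
      System.out.println(token(10, 20, -180, -90, 180, 90, levels));
    }
  }
}
```

Each added character narrows the rectangle the string stands for, so string length corresponds directly to precision.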
|
||||||
|
|
|
@ -44,14 +44,31 @@ import org.apache.lucene.spatial.util.CachingDoubleValueSource;
|
||||||
import org.apache.lucene.spatial.util.ValueSourceFilter;
|
import org.apache.lucene.spatial.util.ValueSourceFilter;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Simple {@link SpatialStrategy} which represents Points in two numeric {@link DoubleField}s.
|
* Simple {@link SpatialStrategy} which represents Points in two numeric {@link
|
||||||
|
* DoubleField}s. The Strategy's best feature is decent distance sort.
|
||||||
*
|
*
|
||||||
* Note, currently only Points can be indexed by this Strategy. At query time, the bounding
|
* <h4>Characteristics:</h4>
|
||||||
* box of the given Shape is used to create {@link NumericRangeQuery}s to efficiently
|
* <ul>
|
||||||
* find Points within the Shape.
|
* <li>Only indexes points; just one per field value.</li>
|
||||||
|
* <li>Can query by a rectangle or circle.</li>
|
||||||
|
* <li>{@link
|
||||||
|
* org.apache.lucene.spatial.query.SpatialOperation#Intersects} and {@link
|
||||||
|
* SpatialOperation#IsWithin} are supported.</li>
|
||||||
|
* <li>Uses the FieldCache for
|
||||||
|
* {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)} and for
|
||||||
|
* searching with a Circle.</li>
|
||||||
|
* </ul>
|
||||||
*
|
*
|
||||||
* Due to the simple use of numeric fields, this Strategy provides support for sorting by
|
* <h4>Implementation:</h4>
|
||||||
* distance through {@link DistanceValueSource}
|
* This is a simple Strategy. Search works with {@link NumericRangeQuery}s on
|
||||||
|
* an x & y pair of fields. A Circle query does the same bbox query but adds a
|
||||||
|
* ValueSource filter on
|
||||||
|
* {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}.
|
||||||
|
* <p />
|
||||||
|
* One performance shortcoming with this strategy is that a scenario involving
|
||||||
|
* both a search using a Circle and sort will result in calculations for the
|
||||||
|
* spatial distance being done twice -- once for the filter and once for the
|
||||||
|
* sort.
|
||||||
*
|
*
|
||||||
* @lucene.experimental
|
* @lucene.experimental
|
||||||
*/
|
*/
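The circle search described in the Implementation notes, a bounding-box range check followed by a distance filter, can be sketched without any Lucene types. This is an illustration under assumptions: the class `PointCircleFilterSketch` is hypothetical, points are plain `{x, y}` arrays, and distance is Euclidean on a flat plane rather than whatever distance calculator the spatial context supplies.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: bbox pre-filter followed by an exact distance check. */
class PointCircleFilterSketch {

  static List<double[]> withinCircle(List<double[]> points, double cx, double cy, double r) {
    // Step 1: the circle's bounding box, analogous to range queries on the x and y fields.
    double minX = cx - r, maxX = cx + r, minY = cy - r, maxY = cy + r;
    List<double[]> hits = new ArrayList<>();
    for (double[] p : points) {
      boolean inBBox = p[0] >= minX && p[0] <= maxX && p[1] >= minY && p[1] <= maxY;
      // Step 2: the distance filter removes bbox corners that lie outside the circle.
      if (inBBox && Math.hypot(p[0] - cx, p[1] - cy) <= r) {
        hits.add(p);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<double[]> points = new ArrayList<>();
    points.add(new double[] {0.5, 0.5});
    points.add(new double[] {0.9, 0.9}); // inside the bbox but outside the unit circle
    System.out.println(withinCircle(points, 0, 0, 1).size());
  }
}
```

If results are then sorted by distance, the same distance would be computed a second time, which is the shortcoming the Javadoc points out.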
|
||||||
|
|