Document the SpatialStrategies consistently and more thoroughly.

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1389204 13f79535-47bb-0310-9956-ffa450edef68
2025-02-18 16:06:32 +00:00 · 2012-09-24 04:46:41 +00:00 · 2012-09-24 04:46:41 +00:00 · 84c5a4adde
commit 84c5a4adde
parent 11d421a36d
8 changed files with 140 additions and 42 deletions
--- a/lucene/spatial/src/java/org/apache/lucene/spatial/SpatialStrategy.java
+++ b/lucene/spatial/src/java/org/apache/lucene/spatial/SpatialStrategy.java
@ -40,12 +40,12 @@ import org.apache.lucene.spatial.query.SpatialArgs;
 *   <li>What types of query shapes can be used?</li>
 *   <li>What types of query operations are supported?
 *   This might vary per shape.</li>
- *   <li>Are there caches?  Under what circumstances are they used?
+ *   <li>Does it use the {@link org.apache.lucene.search.FieldCache}, {@link
- *   Roughly how big are they?  Is it segmented by Lucene segments, such as is
+ *   org.apache.lucene.index.DocValues} or some other type of cache?  When?
 *   done by the Lucene {@link org.apache.lucene.search.FieldCache} and
 *   {@link org.apache.lucene.index.DocValues} (ideal) or is it for the entire
 *   index?
 * </ul>
 * If a strategy only supports certain shapes at index or query time, then in
 * general it will throw an exception if given an incompatible one.  It will not
 * be coerced into compatibility.
 * <p/>
 * Note that a SpatialStrategy is not involved with the Lucene stored field
 * values of shapes, which is immaterial to indexing & search.
@ -85,7 +85,7 @@ public abstract class SpatialStrategy {
  }
  /**
-   * Returns the IndexableField(s) from the <code>shape</code> that are to be
+   * Returns the IndexableField(s) from the {@code shape} that are to be
   * added to the {@link org.apache.lucene.document.Document}.  These fields
   * are expected to be marked as indexed and not stored.
   * <p/>
@ -139,7 +139,7 @@ public abstract class SpatialStrategy {
  /**
   * Returns a ValueSource with values ranging from 1 to 0, depending inversely
   * on the distance from {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}.
-   * The formula is <code>c/(d + c)</code> where 'd' is the distance and 'c' is
+   * The formula is {@code c/(d + c)} where 'd' is the distance and 'c' is
   * one tenth the distance to the farthest edge from the center. Thus the
   * scores will be 1 for indexed points at the center of the query shape and as
   * low as ~0.1 at its furthest edges.
--- a/lucene/spatial/src/java/org/apache/lucene/spatial/bbox/BBoxStrategy.java
+++ b/lucene/spatial/src/java/org/apache/lucene/spatial/bbox/BBoxStrategy.java
@ -43,8 +43,28 @@ import org.apache.lucene.spatial.query.UnsupportedSpatialOperation;
 /**
- * Based on GeoPortal's
+ * A SpatialStrategy for indexing and searching Rectangles by storing its
- * <a href="http://geoportal.svn.sourceforge.net/svnroot/geoportal/Geoportal/trunk/src/com/esri/gpt/catalog/lucene/SpatialClauseAdapter.java">SpatialClauseAdapter</a>.
+ * coordinates in numeric fields. It supports all {@link SpatialOperation}s and
 * has a custom overlap relevancy. It is based on GeoPortal's <a
 * href="http://geoportal.svn.sourceforge.net/svnroot/geoportal/Geoportal/trunk/src/com/esri/gpt/catalog/lucene/SpatialClauseAdapter.java">SpatialClauseAdapter</a>.
 *
 * <h4>Characteristics:</h4>
 * <ul>
 * <li>Only indexes Rectangles; just one per field value.</li>
 * <li>Can query only by a Rectangle.</li>
 * <li>Supports all {@link SpatialOperation}s.</li>
 * <li>Uses the FieldCache for any sorting / relevancy.</li>
 * </ul>
 *
 * <h4>Implementation:</h4>
 * This uses 4 double fields for minX, maxX, minY, maxY
 * and a boolean to mark a dateline cross. Depending on the particular {@link
 * SpatialOperation}s, there is a variety of {@link NumericRangeQuery}s to be
 * done.
 * The {@link #makeBBoxAreaSimilarityValueSource(com.spatial4j.core.shape.Rectangle)}
 * works by calculating the query bbox overlap percentage against the indexed
 * shape overlap percentage. The indexed shape's coordinates are retrieved from
 * the {@link org.apache.lucene.search.FieldCache}.
 *
 * @lucene.experimental
 */
--- a/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/PrefixTreeStrategy.java
+++ b/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/PrefixTreeStrategy.java
@ -37,8 +37,41 @@ import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;
 /**
- * Abstract SpatialStrategy which provides common functionality for those 
+ * An abstract SpatialStrategy based on {@link SpatialPrefixTree}. The two
- * Strategys which use {@link SpatialPrefixTree}s
+ * subclasses are {@link RecursivePrefixTreeStrategy} and {@link
 * TermQueryPrefixTreeStrategy}.  This strategy is most effective as a fast
 * approximate spatial search filter.
 *
 * <h4>Characteristics:</h4>
 * <ul>
 * <li>Can index any shape; however only {@link RecursivePrefixTreeStrategy}
 * can effectively search non-point shapes. <em>Not tested.</em></li>
 * <li>Can index a variable number of shapes per field value. This strategy
 * can do it via multiple calls to {@link #createIndexableFields(com.spatial4j.core.shape.Shape)}
 * for a document or by giving it some sort of Shape aggregate (e.g. JTS
 * WKT MultiPoint).  The shape's boundary is approximated to a grid precision.
 * </li>
 * <li>Can query with any shape.  The shape's boundary is approximated to a grid
 * precision.</li>
 * <li>Only {@link org.apache.lucene.spatial.query.SpatialOperation#Intersects}
 * is supported.  If only points are indexed then this is effectively equivalent
 * to IsWithin.</li>
 * <li>The strategy supports {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}
 * even for multi-valued data.  However, <em>it will likely be removed in the
 * future</em> in lieu of using another strategy with a more scalable
 * implementation.  Use of this call is the only
 * circumstance in which a cache is used.  The cache is simple but as such
 * it doesn't scale to large numbers of points nor is it real-time-search
 * friendly.</li>
 * </ul>
 *
 * <h4>Implementation:</h4>
 * The {@link SpatialPrefixTree} does most of the work, for example returning
 * a list of terms representing grids of various sizes for a supplied shape.
 * An important
 * configuration item is {@link #setDistErrPct(double)} which balances
 * shape precision against scalability.  See those javadocs.
 *
 * @lucene.internal
 */
 public abstract class PrefixTreeStrategy extends SpatialStrategy {
@ -52,7 +85,12 @@ public abstract class PrefixTreeStrategy extends SpatialStrategy {
    this.grid = grid;
  }
-  /** Used in the in-memory ValueSource as a default ArrayList length for this field's array of values, per doc. */
+  /**
   * A memory hint used by {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}
   * for how big the initial size of each Document's array should be. The
   * default is 2.  Set this to slightly more than the default expected number
   * of points per document.
   */
  public void setDefaultFieldValuesArrayLen(int defaultFieldValuesArrayLen) {
    this.defaultFieldValuesArrayLen = defaultFieldValuesArrayLen;
  }
@ -62,8 +100,14 @@ public abstract class PrefixTreeStrategy extends SpatialStrategy {
  }
  /**
-   * The default measure of shape precision affecting indexed and query shapes.
+   * The default measure of shape precision affecting shapes at index and query
-   * Specific shapes at index and query time can use something different.
+   * times. Points don't use this as they are always indexed at the configured
   * maximum precision ({@link org.apache.lucene.spatial.prefix.tree.SpatialPrefixTree#getMaxLevels()});
   * this applies to all other shapes. Specific shapes at index and query time
   * can use something different than this default value.  If you don't set a
   * default then the default is {@link SpatialArgs#DEFAULT_DISTERRPCT} --
   * 2.5%.
   *
   * @see org.apache.lucene.spatial.query.SpatialArgs#getDistErrPct()
   */
  public void setDistErrPct(double distErrPct) {
@ -81,7 +125,8 @@ public abstract class PrefixTreeStrategy extends SpatialStrategy {
    List<Node> cells = grid.getNodes(shape, detailLevel, true);//true=intermediates cells
    //If shape isn't a point, add a full-resolution center-point so that
    // PointPrefixTreeFieldCacheProvider has the center-points.
-    // TODO index each center of a multi-point? Yes/no?
+    //TODO index each point of a multi-point or other aggregate.
    //TODO remove this once support for a distance ValueSource is removed.
    if (!(shape instanceof Point)) {
      Point ctr = shape.getCenter();
      //TODO should be smarter; don't index 2 tokens for this in CellTokenStream. Harmless though.
--- a/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeFilter.java
+++ b/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeFilter.java
@ -34,13 +34,15 @@ import java.io.IOException;
 import java.util.LinkedList;
 /**
- * Performs a spatial intersection filter between a query shape and a field indexed with {@link SpatialPrefixTree}, a Trie.
+ * Performs a spatial intersection filter between a query shape and a field
- * SPT yields terms (grids) at length 1 and at greater lengths corresponding to greater precisions.
+ * indexed with {@link SpatialPrefixTree}, a Trie. SPT yields terms (grids) at
- * This filter recursively traverses each grid length and uses methods on {@link Shape} to efficiently know
+ * length 1 (aka "Level 1") and at greater lengths corresponding to greater
- * that all points at a prefix fit in the shape or not to either short-circuit unnecessary traversals or to efficiently
+ * precisions. This filter recursively traverses each grid length and uses
- * load all enclosed points.  If no indexed data lies in a portion of the shape
+ * methods on {@link Shape} to efficiently know that all points at a prefix fit
- * then that portion of the query shape is quickly passed over without
+ * in the shape or not to either short-circuit unnecessary traversals or to
- * decomposing the shape unnecessarily.
+ * efficiently load all enclosed points.  If no indexed data lies in a portion
 * of the shape then that portion of the query shape is quickly passed over
 * without decomposing the shape unnecessarily.
 *
 * @lucene.internal
 */
@ -167,7 +169,7 @@ RE "scan" threshold:
  @Override
  public String toString() {
-    return "GeoFilter{fieldName='" + fieldName + '\'' + ", shape=" + queryShape + '}';
+    return getClass().getSimpleName()+"{fieldName='" + fieldName + '\'' + ", shape=" + queryShape + '}';
  }
  @Override
--- a/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeStrategy.java
+++ b/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/RecursivePrefixTreeStrategy.java
@ -25,7 +25,11 @@ import org.apache.lucene.spatial.query.SpatialOperation;
 import org.apache.lucene.spatial.query.UnsupportedSpatialOperation;
 /**
- * Based on {@link RecursivePrefixTreeFilter}.
+ * A {@link PrefixTreeStrategy} which uses {@link RecursivePrefixTreeFilter}.
 * This strategy has support for searching non-point shapes (note: not tested).
 * Even a query shape with distErrPct=0 (fully precise to the grid) should have
 * good performance for typical data, unless there is a lot of indexed data
 * coincident with the shape's edge.
 *
 * @lucene.experimental
 */
@ -38,6 +42,13 @@ public class RecursivePrefixTreeStrategy extends PrefixTreeStrategy {
    prefixGridScanLevel = grid.getMaxLevels() - 4;//TODO this default constant is dependent on the prefix grid size
  }
  /**
   * Sets the grid level [1-maxLevels] at which indexed terms are scanned brute-force
   * instead of by grid decomposition.  By default this is maxLevels - 4.  The
   * final level, maxLevels, is always scanned.
   *
   * @param prefixGridScanLevel 1 to maxLevels
   */
  public void setPrefixGridScanLevel(int prefixGridScanLevel) {
    //TODO if negative then subtract from maxlevels
    this.prefixGridScanLevel = prefixGridScanLevel;
--- a/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/TermQueryPrefixTreeStrategy.java
+++ b/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/TermQueryPrefixTreeStrategy.java
@ -30,14 +30,14 @@ import org.apache.lucene.spatial.query.UnsupportedSpatialOperation;
 import java.util.List;
 /**
- * A basic implementation of {@link PrefixTreeStrategy} using a large
+ * A basic implementation of {@link PrefixTreeStrategy} using a large {@link
- * {@link TermsFilter} of all the nodes from
+ * TermsFilter} of all the nodes from {@link SpatialPrefixTree#getNodes(com.spatial4j.core.shape.Shape,
- * {@link SpatialPrefixTree#getNodes(com.spatial4j.core.shape.Shape, int, boolean)}.
+ * int, boolean)}. It only supports the search of indexed Point shapes.
- * It only supports the search of indexed Point shapes.
+ * <p/>
- * <p />
+ * The precision of query shapes (distErrPct) is an important factor in using
- * The precision of query shapes is an important factor in using this Strategy.
+ * this Strategy. If the precision is too precise then it will result in many
- * If the precision is too precise then it will result in many terms which will
+ * terms which will amount to a slower query.
- * amount to a slower query.
+ *
 * @lucene.experimental
 */
 public class TermQueryPrefixTreeStrategy extends PrefixTreeStrategy {
--- a/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/tree/SpatialPrefixTree.java
+++ b/lucene/spatial/src/java/org/apache/lucene/spatial/prefix/tree/SpatialPrefixTree.java
@ -28,10 +28,13 @@ import java.util.Collections;
 import java.util.List;
 /**
- * A spatial Prefix Tree, or Trie, which decomposes shapes into prefixed strings at variable lengths corresponding to
+ * A spatial Prefix Tree, or Trie, which decomposes shapes into prefixed strings
- * variable precision.  Each string corresponds to a spatial region.
+ * at variable lengths corresponding to variable precision.   Each string
- *
+ * corresponds to a rectangular spatial region.  This approach is
- * Implementations of this class should be thread-safe and immutable once initialized.
+ * also referred to "Grids", "Tiles", and "Spatial Tiers".
 * <p/>
 * Implementations of this class should be thread-safe and immutable once
 * initialized.
 *
 * @lucene.experimental
 */
--- a/lucene/spatial/src/java/org/apache/lucene/spatial/vector/PointVectorStrategy.java
+++ b/lucene/spatial/src/java/org/apache/lucene/spatial/vector/PointVectorStrategy.java
@ -44,14 +44,31 @@ import org.apache.lucene.spatial.util.CachingDoubleValueSource;
 import org.apache.lucene.spatial.util.ValueSourceFilter;
 /**
- * Simple {@link SpatialStrategy} which represents Points in two numeric {@link DoubleField}s.
+ * Simple {@link SpatialStrategy} which represents Points in two numeric {@link
 * DoubleField}s.  The Strategy's best feature is decent distance sort.
 *
- * Note, currently only Points can be indexed by this Strategy.  At query time, the bounding
+ * <h4>Characteristics:</h4>
- * box of the given Shape is used to create {@link NumericRangeQuery}s to efficiently
+ * <ul>
- * find Points within the Shape.
+ * <li>Only indexes points; just one per field value.</li>
 * <li>Can query by a rectangle or circle.</li>
 * <li>{@link
 * org.apache.lucene.spatial.query.SpatialOperation#Intersects} and {@link
 * SpatialOperation#IsWithin} is supported.</li>
 * <li>Uses the FieldCache for
 * {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)} and for
 * searching with a Circle.</li>
 * </ul>
 *
- * Due to the simple use of numeric fields, this Strategy provides support for sorting by
+ * <h4>Implementation:</h4>
- * distance through {@link DistanceValueSource}
+ * This is a simple Strategy.  Search works with {@link NumericRangeQuery}s on
 * an x & y pair of fields.  A Circle query does the same bbox query but adds a
 * ValueSource filter on
 * {@link #makeDistanceValueSource(com.spatial4j.core.shape.Point)}.
 * <p />
 * One performance shortcoming with this strategy is that a scenario involving
 * both a search using a Circle and sort will result in calculations for the
 * spatial distance being done twice -- once for the filter and second for the
 * sort.
 *
 * @lucene.experimental
 */