Merge remote-tracking branch 'origin/feature/autoscaling' into feature/autoscaling

Commit d2849cd453 by Noble Paul, 2017-07-13 16:57:51 +09:30
166 changed files with 2410 additions and 2444 deletions

View File

@ -134,7 +134,7 @@ def update_example_solrconfigs(new_version):
print(' updating example solrconfig.xml files')
matcher = re.compile('<luceneMatchVersion>')
paths = ['solr/server/solr/configsets', 'solr/example']
paths = ['solr/server/solr/configsets', 'solr/example', 'solr/core/src/test-files/solr/configsets/_default']
for path in paths:
if not os.path.isdir(path):
raise RuntimeError("Can't locate configset dir (layout change?) : " + path)

View File

@ -58,8 +58,11 @@ New Features
----------------------
* SOLR-11019: Add addAll Stream Evaluator (Joel Bernstein)
* SOLR-10996: Implement TriggerListener API (ab, shalin)
* SOLR-11046: Add residuals Stream Evaluator (Joel Bernstein)
Bug Fixes
----------------------
@ -68,7 +71,8 @@ Bug Fixes
Optimizations
----------------------
(No Changes)
* SOLR-10985: Remove unnecessary toString() calls in solr-core's search package's debug logging.
(Michael Braun via Christine Poerschke)
Other Changes
----------------------
@ -80,6 +84,8 @@ Other Changes
* SOLR-10748: Make stream.body configurable and disabled by default (janhoy)
* SOLR-10964: Reduce SolrIndexSearcher casting in LTRRescorer. (Christine Poerschke)
================== 7.0.0 ==================
Versions of Major Components
@ -289,6 +295,8 @@ New Features
* SOLR-10965: New ExecutePlanAction for autoscaling which executes the operations computed by ComputePlanAction
against the cluster. (shalin)
* SOLR-10282: bin/solr support for enabling Kerberos authentication (Ishan Chattopadhyaya)
Bug Fixes
----------------------
* SOLR-9262: Connection and read timeouts are being ignored by UpdateShardHandler after SOLR-4509.
@ -347,6 +355,13 @@ Bug Fixes
* SOLR-10826: Fix CloudSolrClient to expand the collection parameter correctly (Tim Owen via Varun Thacker)
* SOLR-11039: Next button in Solr admin UI for collection list pagination does not work. (janhoy)
* SOLR-11041: MoveReplicaCmd do not specify ulog dir in case of HDFS (Cao Manh Dat)
* SOLR-11045: The new replica created by MoveReplica will have to have same name and coreName as the
old one in case of HDFS (Cao Manh Dat)
Optimizations
----------------------
@ -482,6 +497,10 @@ Other Changes
- SOLR-10977: Randomize the usage of Points based numerics in schema15.xml and all impacted tests (hossman)
- SOLR-10979: Randomize PointFields in schema-docValues*.xml and all affected tests (hossman)
- SOLR-10989: Randomize PointFields and general cleanup in schema files where some Trie fields were unused (hossman)
- SOLR-11048: Randomize PointsFields in schema-add-schema-fields-update-processor.xml in solr-core collection1 and
all affected tests (Anshum Gupta)
- SOLR-11059: Randomize PointFields in schema-blockjoinfacetcomponent.xml and all related tests (Anshum Gupta)
- SOLR-11060: Randomize PointFields in schema-custom-field.xml and all related tests (Anshum Gupta)
* SOLR-6807: Changed requestDispatcher's handleSelect to default to false, thus ignoring "qt".
Simplified configs to not refer to handleSelect or "qt". Switch all tests that assumed true to assume false
@ -498,6 +517,13 @@ Other Changes
* SOLR-11016: Fix TestCloudJSONFacetJoinDomain test-only bug (hossman)
* SOLR-11021: The elevate.xml config-file is made optional in the ElevationComponent.
The default configset doesn't ship with a elevate.xml file anymore (Varun Thacker)
* SOLR-10898: Fix SOLR-10898 to not deterministicly fail 1/512 runs (hossman)
* SOLR-10796: TestPointFields: increase randomized testing of non-trivial values. (Steve Rowe)
================== 6.7.0 ==================
Consult the LUCENE_CHANGES.txt file for additional, low level, changes in this release.
@ -630,6 +656,8 @@ when using one of Exact*StatsCache (Mikhail Khludnev)
* SOLR-10914: RecoveryStrategy's sendPrepRecoveryCmd can get stuck for 5 minutes if leader is unloaded. (shalin)
* SOLR-11024: ParallelStream should set the StreamContext when constructing SolrStreams (Joel Bernstein)
Optimizations
----------------------
* SOLR-10634: JSON Facet API: When a field/terms facet will retrieve all buckets (i.e. limit:-1)

View File

@ -555,20 +555,23 @@ function print_usage() {
echo ""
echo "Usage: solr auth enable [-type basicAuth] -credentials user:pass [-blockUnknown <true|false>] [-updateIncludeFileOnly <true|false>]"
echo " solr auth enable [-type basicAuth] -prompt <true|false> [-blockUnknown <true|false>] [-updateIncludeFileOnly <true|false>]"
echo " solr auth enable -type kerberos -config "<kerberos configs>" [-updateIncludeFileOnly <true|false>]"
echo " solr auth disable [-updateIncludeFileOnly <true|false>]"
echo ""
echo " -type <type> The authentication mechanism to enable. Defaults to 'basicAuth'."
echo " -type <type> The authentication mechanism (basicAuth or kerberos) to enable. Defaults to 'basicAuth'."
echo ""
echo " -credentials <user:pass> The username and password of the initial user"
echo " -credentials <user:pass> The username and password of the initial user. Applicable for basicAuth only."
echo " Note: only one of -prompt or -credentials must be provided"
echo ""
echo " -prompt <true|false> Prompts the user to provide the credentials"
echo " -config "<configs>" Configuration parameters (Solr startup parameters). Required and applicable only for Kerberos"
echo ""
echo " -prompt <true|false> Prompts the user to provide the credentials. Applicable for basicAuth only."
echo " Note: only one of -prompt or -credentials must be provided"
echo ""
echo " -blockUnknown <true|false> When true, this blocks out access to unauthenticated users. When not provided,"
echo " this defaults to false (i.e. unauthenticated users can access all endpoints, except the"
echo " operations like collection-edit, security-edit, core-admin-edit etc.). Check the reference"
echo " guide for Basic Authentication for more details."
echo " guide for Basic Authentication for more details. Applicable for basicAuth only."
echo ""
echo " -updateIncludeFileOnly <true|false> Only update the solr.in.sh or solr.in.cmd file, and skip actual enabling/disabling"
echo " authentication (i.e. don't update security.json)"
@ -975,6 +978,14 @@ if [[ "$SCRIPT_CMD" == "create" || "$SCRIPT_CMD" == "create_core" || "$SCRIPT_CM
exit 1
fi
if [ "$CREATE_CONFDIR" == "_default" ]; then
echo "WARNING: Using _default configset. Data driven schema functionality is enabled by default, which is"
echo " NOT RECOMMENDED for production use."
echo
echo " To turn it off:"
echo " curl http://$SOLR_TOOL_HOST:$CREATE_PORT/solr/$CREATE_NAME/config -d '{\"set-user-property\": {\"update.autoCreateFields\":\"false\"}}'"
fi
if [[ "$(whoami)" == "root" ]] && [[ "$FORCE" == "false" ]] ; then
echo "WARNING: Creating cores as the root user can cause Solr to fail and is not advisable. Exiting."
echo " If you started Solr as root (not advisable either), force core creation by adding argument -force"
@ -1242,6 +1253,11 @@ if [[ "$SCRIPT_CMD" == "auth" ]]; then
AUTH_PARAMS=("${AUTH_PARAMS[@]}" "-credentials" "$AUTH_CREDENTIALS")
shift 2
;;
-config)
AUTH_CONFIG="`echo $2| base64`"
AUTH_PARAMS=("${AUTH_PARAMS[@]}" "-config" "$AUTH_CONFIG")
shift 2
;;
-solrIncludeFile)
SOLR_INCLUDE="$2"
shift 2

View File

@ -1426,6 +1426,14 @@ if "!CREATE_PORT!"=="" (
goto err
)
if "!CREATE_CONFDIR!"=="_default" (
echo WARNING: Using _default configset. Data driven schema functionality is enabled by default, which is
echo NOT RECOMMENDED for production use.
echo To turn it off:
echo curl http://%SOLR_TOOL_HOST%:!CREATE_PORT!/solr/!CREATE_NAME!/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
)
if "%SCRIPT_CMD%"=="create_core" (
"%JAVA%" %SOLR_SSL_OPTS% %AUTHC_OPTS% %SOLR_ZK_CREDS_AND_ACLS% -Dsolr.install.dir="%SOLR_TIP%" ^
-Dlog4j.configuration="file:%DEFAULT_SERVER_DIR%\scripts\cloud-scripts\log4j.properties" ^

View File

@ -116,8 +116,7 @@ public class LTRRescorer extends Rescorer {
final LTRScoringQuery.ModelWeight modelWeight = (LTRScoringQuery.ModelWeight) searcher
.createNormalizedWeight(scoringQuery, true);
final SolrIndexSearcher solrIndexSearch = (SolrIndexSearcher) searcher;
scoreFeatures(solrIndexSearch, firstPassTopDocs,topN, modelWeight, hits, leaves, reranked);
scoreFeatures(searcher, firstPassTopDocs,topN, modelWeight, hits, leaves, reranked);
// Must sort all documents that we reranked, and then select the top
Arrays.sort(reranked, new Comparator<ScoreDoc>() {
@Override
@ -138,7 +137,7 @@ public class LTRRescorer extends Rescorer {
return new TopDocs(firstPassTopDocs.totalHits, reranked, reranked[0].score);
}
public void scoreFeatures(SolrIndexSearcher solrIndexSearch, TopDocs firstPassTopDocs,
public void scoreFeatures(IndexSearcher indexSearcher, TopDocs firstPassTopDocs,
int topN, LTRScoringQuery.ModelWeight modelWeight, ScoreDoc[] hits, List<LeafReaderContext> leaves,
ScoreDoc[] reranked) throws IOException {
@ -183,8 +182,8 @@ public class LTRRescorer extends Rescorer {
reranked[hitUpto] = hit;
// if the heap is not full, maybe I want to log the features for this
// document
if (featureLogger != null) {
featureLogger.log(hit.doc, scoringQuery, solrIndexSearch,
if (featureLogger != null && indexSearcher instanceof SolrIndexSearcher) {
featureLogger.log(hit.doc, scoringQuery, (SolrIndexSearcher)indexSearcher,
modelWeight.getFeaturesInfo());
}
} else if (hitUpto == topN) {
@ -200,8 +199,8 @@ public class LTRRescorer extends Rescorer {
if (hit.score > reranked[0].score) {
reranked[0] = hit;
heapAdjust(reranked, topN, 0);
if (featureLogger != null) {
featureLogger.log(hit.doc, scoringQuery, solrIndexSearch,
if (featureLogger != null && indexSearcher instanceof SolrIndexSearcher) {
featureLogger.log(hit.doc, scoringQuery, (SolrIndexSearcher)indexSearcher,
modelWeight.getFeaturesInfo());
}
}

View File

@ -31,6 +31,7 @@ import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.common.cloud.SolrZkClient;
import org.apache.solr.common.cloud.ZkStateReader;
import org.apache.solr.common.params.CoreAdminParams;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrResourceLoader;
@ -64,10 +65,11 @@ public class CloudUtil {
String cnn = replica.getName();
String baseUrl = replica.getStr(ZkStateReader.BASE_URL_PROP);
boolean isSharedFs = replica.getStr(CoreAdminParams.DATA_DIR) != null;
log.debug("compare against coreNodeName={} baseUrl={}", cnn, baseUrl);
if (thisCnn != null && thisCnn.equals(cnn)
&& !thisBaseUrl.equals(baseUrl)) {
&& !thisBaseUrl.equals(baseUrl) && isSharedFs) {
if (cc.getLoadedCoreNames().contains(desc.getName())) {
cc.unload(desc.getName());
}

View File

@ -324,6 +324,15 @@ public class CreateCollectionCmd implements Cmd {
ocmh.forwardToAutoScaling(AutoScaling.AUTO_ADD_REPLICAS_TRIGGER_DSL);
}
log.debug("Finished create command on all shards for collection: {}", collectionName);
// Emit a warning about production use of data driven functionality
boolean defaultConfigSetUsed = message.getStr(COLL_CONF) == null ||
message.getStr(COLL_CONF).equals(ConfigSetsHandlerApi.DEFAULT_CONFIGSET_NAME);
if (defaultConfigSetUsed) {
results.add("warning", "Using _default configset. Data driven schema functionality"
+ " is enabled by default, which is NOT RECOMMENDED for production use. To turn it off:"
+ " curl http://{host:port}/solr/" + collectionName + "/config -d '{\"set-user-property\": {\"update.autoCreateFields\":\"false\"}}'");
}
}
} catch (SolrException ex) {
throw ex;

View File

@ -34,6 +34,8 @@ import org.apache.solr.common.cloud.ZkNodeProps;
import org.apache.solr.common.params.CoreAdminParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.Utils;
import org.apache.solr.update.UpdateLog;
import org.apache.solr.util.TimeOut;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@ -105,18 +107,15 @@ public class MoveReplicaCmd implements Cmd{
}
assert slice != null;
Object dataDir = replica.get("dataDir");
final String ulogDir = replica.getStr("ulogDir");
if (dataDir != null && dataDir.toString().startsWith("hdfs:/")) {
moveHdfsReplica(clusterState, results, dataDir.toString(), ulogDir, targetNode, async, coll, replica, slice, timeout);
moveHdfsReplica(clusterState, results, dataDir.toString(), targetNode, async, coll, replica, slice, timeout);
} else {
moveNormalReplica(clusterState, results, targetNode, async, coll, replica, slice, timeout);
}
}
private void moveHdfsReplica(ClusterState clusterState, NamedList results, String dataDir, String ulogDir, String targetNode, String async,
private void moveHdfsReplica(ClusterState clusterState, NamedList results, String dataDir, String targetNode, String async,
DocCollection coll, Replica replica, Slice slice, int timeout) throws Exception {
String newCoreName = Assign.buildCoreName(coll, slice.getName(), replica.getType());
ZkNodeProps removeReplicasProps = new ZkNodeProps(
COLLECTION_PROP, coll.getName(),
SHARD_ID_PROP, slice.getName(),
@ -135,16 +134,32 @@ public class MoveReplicaCmd implements Cmd{
return;
}
TimeOut timeOut = new TimeOut(20L, TimeUnit.SECONDS);
while (!timeOut.hasTimedOut()) {
coll = ocmh.zkStateReader.getClusterState().getCollection(coll.getName());
if (coll.getReplica(replica.getName()) != null) {
Thread.sleep(100);
} else {
break;
}
}
if (timeOut.hasTimedOut()) {
results.add("failure", "Still see deleted replica in clusterstate!");
return;
}
String ulogDir = replica.getStr(CoreAdminParams.ULOG_DIR);
ZkNodeProps addReplicasProps = new ZkNodeProps(
COLLECTION_PROP, coll.getName(),
SHARD_ID_PROP, slice.getName(),
CoreAdminParams.NODE, targetNode,
CoreAdminParams.NAME, newCoreName,
CoreAdminParams.DATA_DIR, dataDir,
CoreAdminParams.ULOG_DIR, ulogDir);
CoreAdminParams.CORE_NODE_NAME, replica.getName(),
CoreAdminParams.NAME, replica.getCoreName(),
CoreAdminParams.ULOG_DIR, ulogDir.substring(0, ulogDir.lastIndexOf(UpdateLog.TLOG_NAME)),
CoreAdminParams.DATA_DIR, dataDir);
if(async!=null) addReplicasProps.getProperties().put(ASYNC, async);
NamedList addResult = new NamedList();
ocmh.addReplica(clusterState, addReplicasProps, addResult, null);
ocmh.addReplica(ocmh.zkStateReader.getClusterState(), addReplicasProps, addResult, null);
if (addResult.get("failure") != null) {
String errorString = String.format(Locale.ROOT, "Failed to create replica for collection=%s shard=%s" +
" on node=%s", coll.getName(), slice.getName(), targetNode);
@ -153,7 +168,7 @@ public class MoveReplicaCmd implements Cmd{
return;
} else {
String successString = String.format(Locale.ROOT, "MOVEREPLICA action completed successfully, moved replica=%s at node=%s " +
"to replica=%s at node=%s", replica.getCoreName(), replica.getNodeName(), newCoreName, targetNode);
"to replica=%s at node=%s", replica.getCoreName(), replica.getNodeName(), replica.getCoreName(), targetNode);
results.add("success", successString);
}
}

View File

@ -208,9 +208,9 @@ public class Overseer implements Closeable {
@Override
public void onEnqueue() throws Exception {
if (!itemWasMoved[0]) {
workQueue.offer(data);
stateUpdateQueue.poll();
itemWasMoved[0] = true;
workQueue.offer(data);
}
}

View File

@ -2250,10 +2250,7 @@ public class ZkController {
DocCollection collection = clusterState.getCollectionOrNull(desc
.getCloudDescriptor().getCollectionName());
if (collection != null) {
boolean autoAddReplicas = ClusterStateUtil.isAutoAddReplicas(getZkStateReader(), collection.getName());
if (autoAddReplicas) {
CloudUtil.checkSharedFSFailoverReplaced(cc, desc);
}
CloudUtil.checkSharedFSFailoverReplaced(cc, desc);
}
}
}

View File

@ -217,8 +217,8 @@ public class StreamHandler extends RequestHandlerBase implements SolrCoreAware,
.withFunctionName("scale", ScaleEvaluator.class)
.withFunctionName("sequence", SequenceEvaluator.class)
.withFunctionName("addAll", AddAllEvaluator.class)
.withFunctionName("residuals", ResidualsEvaluator.class)
// Boolean Stream Evaluators
.withFunctionName("and", AndEvaluator.class)
.withFunctionName("eor", ExclusiveOrEvaluator.class)

View File

@ -126,6 +126,7 @@ import static org.apache.solr.common.params.CoreAdminParams.DELETE_DATA_DIR;
import static org.apache.solr.common.params.CoreAdminParams.DELETE_INDEX;
import static org.apache.solr.common.params.CoreAdminParams.DELETE_INSTANCE_DIR;
import static org.apache.solr.common.params.CoreAdminParams.INSTANCE_DIR;
import static org.apache.solr.common.params.CoreAdminParams.ULOG_DIR;
import static org.apache.solr.common.params.ShardParams._ROUTE_;
import static org.apache.solr.common.util.StrUtils.formatString;
@ -633,6 +634,7 @@ public class CollectionsHandler extends RequestHandlerBase implements Permission
CoreAdminParams.NAME,
INSTANCE_DIR,
DATA_DIR,
ULOG_DIR,
REPLICA_TYPE);
return copyPropertiesWithPrefix(req.getParams(), props, COLL_PROP_PREFIX);
}),

View File

@ -204,53 +204,51 @@ public class QueryElevationComponent extends SearchComponent implements SolrCore
}
core.addTransformerFactory(markerName, elevatedMarkerFactory);
forceElevation = initArgs.getBool(QueryElevationParams.FORCE_ELEVATION, forceElevation);
try {
synchronized (elevationCache) {
elevationCache.clear();
String f = initArgs.get(CONFIG_FILE);
if (f == null) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"QueryElevationComponent must specify argument: '" + CONFIG_FILE
+ "' -- path to elevate.xml");
}
boolean exists = false;
// check if using ZooKeeper
ZkController zkController = core.getCoreContainer().getZkController();
if (zkController != null) {
// TODO : shouldn't have to keep reading the config name when it has been read before
exists = zkController.configFileExists(zkController.getZkStateReader().readConfigName(core.getCoreDescriptor().getCloudDescriptor().getCollectionName()), f);
} else {
File fC = new File(core.getResourceLoader().getConfigDir(), f);
File fD = new File(core.getDataDir(), f);
if (fC.exists() == fD.exists()) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"QueryElevationComponent missing config file: '" + f + "\n"
+ "either: " + fC.getAbsolutePath() + " or " + fD.getAbsolutePath() + " must exist, but not both.");
String f = initArgs.get(CONFIG_FILE);
if (f != null) {
try {
synchronized (elevationCache) {
elevationCache.clear();
boolean exists = false;
// check if using ZooKeeper
ZkController zkController = core.getCoreContainer().getZkController();
if (zkController != null) {
// TODO : shouldn't have to keep reading the config name when it has been read before
exists = zkController.configFileExists(zkController.getZkStateReader().readConfigName(core.getCoreDescriptor().getCloudDescriptor().getCollectionName()), f);
} else {
File fC = new File(core.getResourceLoader().getConfigDir(), f);
File fD = new File(core.getDataDir(), f);
if (fC.exists() == fD.exists()) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"QueryElevationComponent missing config file: '" + f + "\n"
+ "either: " + fC.getAbsolutePath() + " or " + fD.getAbsolutePath() + " must exist, but not both.");
}
if (fC.exists()) {
exists = true;
log.info("Loading QueryElevation from: " + fC.getAbsolutePath());
Config cfg = new Config(core.getResourceLoader(), f);
elevationCache.put(null, loadElevationMap(cfg));
}
}
if (fC.exists()) {
exists = true;
log.info("Loading QueryElevation from: " + fC.getAbsolutePath());
Config cfg = new Config(core.getResourceLoader(), f);
elevationCache.put(null, loadElevationMap(cfg));
}
}
//in other words, we think this is in the data dir, not the conf dir
if (!exists) {
// preload the first data
RefCounted<SolrIndexSearcher> searchHolder = null;
try {
searchHolder = core.getNewestSearcher(false);
IndexReader reader = searchHolder.get().getIndexReader();
getElevationMap(reader, core);
} finally {
if (searchHolder != null) searchHolder.decref();
//in other words, we think this is in the data dir, not the conf dir
if (!exists) {
// preload the first data
RefCounted<SolrIndexSearcher> searchHolder = null;
try {
searchHolder = core.getNewestSearcher(false);
IndexReader reader = searchHolder.get().getIndexReader();
getElevationMap(reader, core);
} finally {
if (searchHolder != null) searchHolder.decref();
}
}
}
} catch (Exception ex) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"Error initializing QueryElevationComponent.", ex);
}
} catch (Exception ex) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"Error initializing QueryElevationComponent.", ex);
}
}

View File

@ -179,15 +179,14 @@ public class ExactStatsCache extends StatsCache {
String termStatsString = StatsUtil.termStatsMapToString(statsMap);
rb.rsp.add(TERM_STATS_KEY, termStatsString);
if (LOG.isDebugEnabled()) {
LOG.debug("termStats=" + termStatsString + ", terms=" + terms + ", numDocs=" + searcher.maxDoc());
LOG.debug("termStats={}, terms={}, numDocs={}", termStatsString, terms, searcher.maxDoc());
}
}
if (colMap.size() != 0){
String colStatsString = StatsUtil.colStatsMapToString(colMap);
rb.rsp.add(COL_STATS_KEY, colStatsString);
if (LOG.isDebugEnabled()) {
LOG.debug("collectionStats="
+ colStatsString + ", terms=" + terms + ", numDocs=" + searcher.maxDoc());
LOG.debug("collectionStats={}, terms={}, numDocs={}", colStatsString, terms, searcher.maxDoc());
}
}
} catch (IOException e) {

View File

@ -136,7 +136,7 @@ public class LRUStatsCache extends ExactStatsCache {
throws IOException {
TermStats termStats = termStatsCache.get(term.toString());
if (termStats == null) {
LOG.debug("## Missing global termStats info: {}, using local", term.toString());
LOG.debug("## Missing global termStats info: {}, using local", term);
return localSearcher.localTermStatistics(term, context);
} else {
return termStats.toTermStatistics();

View File

@ -38,7 +38,7 @@ public class LocalStatsCache extends StatsCache {
@Override
public StatsSource get(SolrQueryRequest req) {
LOG.debug("## GET {}", req.toString());
LOG.debug("## GET {}", req);
return new LocalStatsSource();
}
@ -49,31 +49,33 @@ public class LocalStatsCache extends StatsCache {
// by returning null we don't create additional round-trip request.
@Override
public ShardRequest retrieveStatsRequest(ResponseBuilder rb) {
LOG.debug("## RDR {}", rb.req.toString());
LOG.debug("## RDR {}", rb.req);
return null;
}
@Override
public void mergeToGlobalStats(SolrQueryRequest req,
List<ShardResponse> responses) {
LOG.debug("## MTGD {}", req.toString());
for (ShardResponse r : responses) {
LOG.debug(" - {}", r);
if (LOG.isDebugEnabled()) {
LOG.debug("## MTGD {}", req);
for (ShardResponse r : responses) {
LOG.debug(" - {}", r);
}
}
}
@Override
public void returnLocalStats(ResponseBuilder rb, SolrIndexSearcher searcher) {
LOG.debug("## RLD {}", rb.req.toString());
LOG.debug("## RLD {}", rb.req);
}
@Override
public void receiveGlobalStats(SolrQueryRequest req) {
LOG.debug("## RGD {}", req.toString());
LOG.debug("## RGD {}", req);
}
@Override
public void sendGlobalStats(ResponseBuilder rb, ShardRequest outgoing) {
LOG.debug("## SGD {}", outgoing.toString());
LOG.debug("## SGD {}", outgoing);
}
}
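The three stats-cache diffs above (ExactStatsCache, LRUStatsCache, LocalStatsCache) all apply the same SOLR-10985 cleanup: drop explicit toString() calls and string concatenation in debug logging in favour of SLF4J placeholders, and wrap multi-statement debug output in an isDebugEnabled() guard. A minimal, self-contained sketch of that idiom follows; it is not part of the commit, and the class and method names are illustrative only.

import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class StatsLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(StatsLoggingSketch.class);

  void logResponses(Object req, List<?> responses) {
    // Parameterized message: req.toString() is only invoked if DEBUG is actually enabled.
    LOG.debug("## MTGD {}", req);
    // Guard the whole loop, not just each message, when producing the output costs extra work.
    if (LOG.isDebugEnabled()) {
      for (Object r : responses) {
        LOG.debug(" - {}", r);
      }
    }
  }
}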

View File

@ -43,6 +43,7 @@ import java.time.Instant;
import java.time.Period;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.Collection;
import java.util.Enumeration;
import java.util.HashMap;
@ -115,6 +116,7 @@ import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.ContentStreamBase;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.StrUtils;
import org.apache.solr.security.Sha256AuthenticationProvider;
import org.apache.solr.util.configuration.SSLConfigurationsFactory;
import org.noggit.CharArr;
@ -3548,7 +3550,7 @@ public class SolrCLI {
OptionBuilder
.withArgName("type")
.hasArg()
.withDescription("The authentication mechanism to enable. Defaults to 'basicAuth'.")
.withDescription("The authentication mechanism to enable (basicAuth or kerberos). Defaults to 'basicAuth'.")
.create("type"),
OptionBuilder
.withArgName("credentials")
@ -3561,6 +3563,11 @@ public class SolrCLI {
.withDescription("Prompts the user to provide the credentials. Use either -credentials or -prompt, not both")
.create("prompt"),
OptionBuilder
.withArgName("config")
.hasArgs()
.withDescription("Configuration parameters (Solr startup parameters). Required for Kerberos authentication")
.create("config"),
OptionBuilder
.withArgName("blockUnknown")
.withDescription("Blocks all access for unknown users (requires authentication for all endpoints)")
.hasArg()
@ -3603,11 +3610,141 @@ public class SolrCLI {
}
String type = cli.getOptionValue("type", "basicAuth");
if (type.equalsIgnoreCase("basicAuth") == false) {
System.out.println("Only type=basicAuth supported at the moment.");
exit(1);
switch (type) {
case "basicAuth":
return handleBasicAuth(cli);
case "kerberos":
return handleKerberos(cli);
default:
System.out.println("Only type=basicAuth or kerberos supported at the moment.");
exit(1);
}
return 1;
}
private int handleKerberos(CommandLine cli) throws Exception {
String cmd = cli.getArgs()[0];
boolean updateIncludeFileOnly = Boolean.parseBoolean(cli.getOptionValue("updateIncludeFileOnly", "false"));
String securityJson = "{" +
"\n \"authentication\":{" +
"\n \"class\":\"solr.KerberosPlugin\"" +
"\n }" +
"\n}";
switch (cmd) {
case "enable":
String zkHost = null;
boolean zkInaccessible = false;
if (!updateIncludeFileOnly) {
try {
zkHost = getZkHost(cli);
} catch (Exception ex) {
System.out.println("Unable to access ZooKeeper. Please add the following security.json to ZooKeeper (in case of SolrCloud):\n"
+ securityJson + "\n");
zkInaccessible = true;
}
if (zkHost == null) {
if (zkInaccessible == false) {
System.out.println("Unable to access ZooKeeper. Please add the following security.json to ZooKeeper (in case of SolrCloud):\n"
+ securityJson + "\n");
zkInaccessible = true;
}
}
// check if security is already enabled or not
if (!zkInaccessible) {
try (SolrZkClient zkClient = new SolrZkClient(zkHost, 10000)) {
if (zkClient.exists("/security.json", true)) {
byte oldSecurityBytes[] = zkClient.getData("/security.json", null, null, true);
if (!"{}".equals(new String(oldSecurityBytes, StandardCharsets.UTF_8).trim())) {
System.out.println("Security is already enabled. You can disable it with 'bin/solr auth disable'. Existing security.json: \n"
+ new String(oldSecurityBytes, StandardCharsets.UTF_8));
exit(1);
}
}
} catch (Exception ex) {
if (zkInaccessible == false) {
System.out.println("Unable to access ZooKeeper. Please add the following security.json to ZooKeeper (in case of SolrCloud):\n"
+ securityJson + "\n");
zkInaccessible = true;
}
}
}
}
if (!updateIncludeFileOnly) {
if (!zkInaccessible) {
System.out.println("Uploading following security.json: " + securityJson);
try (SolrZkClient zkClient = new SolrZkClient(zkHost, 10000)) {
zkClient.setData("/security.json", securityJson.getBytes(StandardCharsets.UTF_8), true);
} catch (Exception ex) {
if (zkInaccessible == false) {
System.out.println("Unable to access ZooKeeper. Please add the following security.json to ZooKeeper (in case of SolrCloud):\n"
+ securityJson);
zkInaccessible = true;
}
}
}
}
String config = StrUtils.join(Arrays.asList(cli.getOptionValues("config")), ' ');
// config is base64 encoded (to get around parsing problems), decode it
config = config.replaceAll(" ", "");
config = new String(Base64.getDecoder().decode(config.getBytes("UTF-8")), "UTF-8");
config = config.replaceAll("\n", "").replaceAll("\r", "");
String solrIncludeFilename = cli.getOptionValue("solrIncludeFile");
File includeFile = new File(solrIncludeFilename);
if (includeFile.exists() == false || includeFile.canWrite() == false) {
System.out.println("Solr include file " + solrIncludeFilename + " doesn't exist or is not writeable.");
printAuthEnablingInstructions(config);
System.exit(0);
}
// update the solr.in.sh file to contain the necessary authentication lines
updateIncludeFileEnableAuth(includeFile, null, config);
System.out.println("Please restart any running Solr nodes.");
return 0;
case "disable":
if (!updateIncludeFileOnly) {
zkHost = getZkHost(cli);
if (zkHost == null) {
stdout.print("ZK Host not found. Solr should be running in cloud mode");
exit(1);
}
System.out.println("Uploading following security.json: {}");
try (SolrZkClient zkClient = new SolrZkClient(zkHost, 10000)) {
zkClient.setData("/security.json", "{}".getBytes(StandardCharsets.UTF_8), true);
}
}
solrIncludeFilename = cli.getOptionValue("solrIncludeFile");
includeFile = new File(solrIncludeFilename);
if (!includeFile.exists() || !includeFile.canWrite()) {
System.out.println("Solr include file " + solrIncludeFilename + " doesn't exist or is not writeable.");
System.out.println("Security has been disabled. Please remove any SOLR_AUTH_TYPE or SOLR_AUTHENTICATION_OPTS configuration from solr.in.sh/solr.in.cmd.\n");
System.exit(0);
}
// update the solr.in.sh file to comment out the necessary authentication lines
updateIncludeFileDisableAuth(includeFile);
return 0;
default:
System.out.println("Valid auth commands are: enable, disable");
exit(1);
}
System.out.println("Options not understood.");
new HelpFormatter().printHelp("bin/solr auth <enable|disable> [OPTIONS]", getToolOptions(this));
return 1;
}
private int handleBasicAuth(CommandLine cli) throws Exception {
String cmd = cli.getArgs()[0];
boolean prompt = Boolean.parseBoolean(cli.getOptionValue("prompt", "false"));
boolean updateIncludeFileOnly = Boolean.parseBoolean(cli.getOptionValue("updateIncludeFileOnly", "false"));
@ -3715,7 +3852,7 @@ public class SolrCLI {
"httpBasicAuthUser=" + username + "\nhttpBasicAuthPassword=" + password, StandardCharsets.UTF_8);
// update the solr.in.sh file to contain the necessary authentication lines
updateIncludeFileEnableAuth(includeFile, basicAuthConfFile.getAbsolutePath());
updateIncludeFileEnableAuth(includeFile, basicAuthConfFile.getAbsolutePath(), null);
return 0;
case "disable":
@ -3754,7 +3891,6 @@ public class SolrCLI {
new HelpFormatter().printHelp("bin/solr auth <enable|disable> [OPTIONS]", getToolOptions(this));
return 1;
}
private void printAuthEnablingInstructions(String username, String password) {
if (SystemUtils.IS_OS_WINDOWS) {
System.out.println("\nAdd the following lines to the solr.in.cmd file so that the solr.cmd script can use subsequently.\n");
@ -3766,8 +3902,26 @@ public class SolrCLI {
+ "SOLR_AUTHENTICATION_OPTS=\"-Dbasicauth=" + username + ":" + password + "\"\n");
}
}
private void printAuthEnablingInstructions(String kerberosConfig) {
if (SystemUtils.IS_OS_WINDOWS) {
System.out.println("\nAdd the following lines to the solr.in.cmd file so that the solr.cmd script can use subsequently.\n");
System.out.println("set SOLR_AUTH_TYPE=kerberos\n"
+ "set SOLR_AUTHENTICATION_OPTS=\"" + kerberosConfig + "\"\n");
} else {
System.out.println("\nAdd the following lines to the solr.in.sh file so that the ./solr script can use subsequently.\n");
System.out.println("SOLR_AUTH_TYPE=\"kerberos\"\n"
+ "SOLR_AUTHENTICATION_OPTS=\"" + kerberosConfig + "\"\n");
}
}
private void updateIncludeFileEnableAuth(File includeFile, String basicAuthConfFile) throws IOException {
/**
* This will update the include file (e.g. solr.in.sh / solr.in.cmd) with the authentication parameters.
* @param includeFile The include file
* @param basicAuthConfFile If basicAuth, the path of the file containing credentials. If not, null.
* @param kerberosConfig If kerberos, the config string containing startup parameters. If not, null.
*/
private void updateIncludeFileEnableAuth(File includeFile, String basicAuthConfFile, String kerberosConfig) throws IOException {
assert !(basicAuthConfFile != null && kerberosConfig != null); // only one of the two needs to be populated
List<String> includeFileLines = FileUtils.readLines(includeFile, StandardCharsets.UTF_8);
for (int i=0; i<includeFileLines.size(); i++) {
String line = includeFileLines.get(i);
@ -3780,18 +3934,34 @@ public class SolrCLI {
}
}
includeFileLines.add(""); // blank line
if (SystemUtils.IS_OS_WINDOWS) {
includeFileLines.add("REM The following lines added by solr.cmd for enabling BasicAuth");
includeFileLines.add("set SOLR_AUTH_TYPE=basic");
includeFileLines.add("set SOLR_AUTHENTICATION_OPTS=\"-Dsolr.httpclient.config=" + basicAuthConfFile + "\"");
} else {
includeFileLines.add("# The following lines added by ./solr for enabling BasicAuth");
includeFileLines.add("SOLR_AUTH_TYPE=\"basic\"");
includeFileLines.add("SOLR_AUTHENTICATION_OPTS=\"-Dsolr.httpclient.config=" + basicAuthConfFile + "\"");
if (basicAuthConfFile != null) { // for basicAuth
if (SystemUtils.IS_OS_WINDOWS) {
includeFileLines.add("REM The following lines added by solr.cmd for enabling BasicAuth");
includeFileLines.add("set SOLR_AUTH_TYPE=basic");
includeFileLines.add("set SOLR_AUTHENTICATION_OPTS=\"-Dsolr.httpclient.config=" + basicAuthConfFile + "\"");
} else {
includeFileLines.add("# The following lines added by ./solr for enabling BasicAuth");
includeFileLines.add("SOLR_AUTH_TYPE=\"basic\"");
includeFileLines.add("SOLR_AUTHENTICATION_OPTS=\"-Dsolr.httpclient.config=" + basicAuthConfFile + "\"");
}
} else { // for kerberos
if (SystemUtils.IS_OS_WINDOWS) {
includeFileLines.add("REM The following lines added by solr.cmd for enabling BasicAuth");
includeFileLines.add("set SOLR_AUTH_TYPE=kerberos");
includeFileLines.add("set SOLR_AUTHENTICATION_OPTS=\"-Dsolr.httpclient.config=" + basicAuthConfFile + "\"");
} else {
includeFileLines.add("# The following lines added by ./solr for enabling BasicAuth");
includeFileLines.add("SOLR_AUTH_TYPE=\"kerberos\"");
includeFileLines.add("SOLR_AUTHENTICATION_OPTS=\"" + kerberosConfig + "\"");
}
}
FileUtils.writeLines(includeFile, StandardCharsets.UTF_8.name(), includeFileLines);
System.out.println("Written out credentials file: " + basicAuthConfFile + ", updated Solr include file: " + includeFile.getAbsolutePath() + ".");
if (basicAuthConfFile != null) {
System.out.println("Written out credentials file: " + basicAuthConfFile);
}
System.out.println("Updated Solr include file: " + includeFile.getAbsolutePath());
}
private void updateIncludeFileDisableAuth(File includeFile) throws IOException {
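The -config plumbing spans two of the files shown above: bin/solr base64-encodes the option value (AUTH_CONFIG="`echo $2| base64`") before handing it to SolrCLI, and handleKerberos() strips whitespace and decodes it with java.util.Base64. A small round-trip sketch of that hand-off, not taken from the commit and using hypothetical startup parameters:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

class AuthConfigRoundTripSketch {
  public static void main(String[] args) {
    // Example Kerberos startup parameters (hypothetical values).
    String original = "-Djava.security.auth.login.config=/path/jaas-client.conf "
        + "-Dsolr.kerberos.cookie.domain=localhost";

    // Roughly what `echo ... | base64` produces; long input may contain embedded newlines.
    String encoded = Base64.getMimeEncoder().encodeToString(original.getBytes(StandardCharsets.UTF_8));

    // Receiving side: strip spaces and newlines, then decode back to the original string.
    String cleaned = encoded.replaceAll(" ", "").replaceAll("\n", "").replaceAll("\r", "");
    String decoded = new String(Base64.getDecoder().decode(cleaned), StandardCharsets.UTF_8);

    System.out.println(decoded.equals(original)); // true
  }
}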

View File

@ -18,14 +18,14 @@
<schema name="add-schema-fields-update-processor" version="1.6">
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" multiValued="true" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" multiValued="true" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" multiValued="true" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" multiValued="true" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" multiValued="true" positionIncrementGap="0"/>
<fieldType name="tint" class="${solr.tests.IntegerFieldType}" docValues="${solr.tests.numeric.dv}" precisionStep="8" multiValued="true" positionIncrementGap="0"/>
<fieldType name="tfloat" class="${solr.tests.FloatFieldType}" docValues="${solr.tests.numeric.dv}" precisionStep="8" multiValued="true" positionIncrementGap="0"/>
<fieldType name="tlong" class="${solr.tests.LongFieldType}" docValues="${solr.tests.numeric.dv}" precisionStep="8" multiValued="true" positionIncrementGap="0"/>
<fieldType name="tdouble" class="${solr.tests.DoubleFieldType}" docValues="${solr.tests.numeric.dv}" precisionStep="8" multiValued="true" positionIncrementGap="0"/>
<fieldType name="tdate" class="${solr.tests.DateFieldType}" docValues="${solr.tests.numeric.dv}" precisionStep="6" multiValued="true" positionIncrementGap="0"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="${solr.tests.LongFieldType}" docValues="${solr.tests.numeric.dv}" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="text" class="solr.TextField" multiValued="true" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>

View File

@ -17,9 +17,9 @@
-->
<schema name="test" version="1.0">
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="int" class="${solr.tests.IntegerFieldType}" docValues="${solr.tests.numeric.dv}" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="${solr.tests.FloatFieldType}" docValues="${solr.tests.numeric.dv}" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="${solr.tests.LongFieldType}" docValues="${solr.tests.numeric.dv}" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldtype name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="false"/>

View File

@ -17,8 +17,8 @@
-->
<schema name="test-custom-field-sort" version="1.6">
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="int" class="${solr.tests.IntegerFieldType}" docValues="${solr.tests.numeric.dv}" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="long" class="${solr.tests.LongFieldType}" docValues="${solr.tests.numeric.dv}" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="text" class="solr.TextField">
<analyzer>

View File

@ -1,42 +0,0 @@
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- If this file is found in the config directory, it will only be
loaded once at startup. If it is found in Solr's data
directory, it will be re-loaded every commit.
See http://wiki.apache.org/solr/QueryElevationComponent for more info
-->
<elevate>
<!-- Query elevation examples
<query text="foo bar">
<doc id="1" />
<doc id="2" />
<doc id="3" />
</query>
for use with techproducts example
<query text="ipod">
<doc id="MA147LL/A" /> put the actual ipod at the top
<doc id="IW-02" exclude="true" /> exclude this cable
</query>
-->
</elevate>

View File

@ -1004,7 +1004,6 @@
<searchComponent name="elevator" class="solr.QueryElevationComponent" >
<!-- pick a fieldType to analyze queries -->
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
<!-- A request handler for demonstrating the elevator component -->

View File

@ -74,9 +74,10 @@ public class CollectionsAPISolrJTest extends SolrCloudTestCase {
assertEquals(0, (int)status.get("status"));
assertTrue(status.get("QTime") > 0);
}
// Use of _default configset should generate a warning for data-driven functionality in production use
assertTrue(response.getWarning() != null && response.getWarning().contains("NOT RECOMMENDED for production use"));
response = CollectionAdminRequest.deleteCollection(collectionName).process(cluster.getSolrClient());
assertEquals(0, response.getStatus());
assertTrue(response.isSuccess());
Map<String,NamedList<Integer>> nodesStatus = response.getCollectionNodesStatus();

View File

@ -54,7 +54,6 @@ public class MoveReplicaHDFSTest extends MoveReplicaTest {
dfsCluster = null;
}
public static class ForkJoinThreadsFilter implements ThreadFilter {
@Override

View File

@ -0,0 +1,142 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.solr.cloud;
import java.io.IOException;
import com.carrotsearch.randomizedtesting.annotations.ThreadLeakFilters;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;
import org.apache.solr.cloud.hdfs.HdfsTestUtil;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.cloud.ClusterStateUtil;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.ZkConfigManager;
import org.apache.solr.common.cloud.ZkStateReader;
import org.apache.solr.util.BadHdfsThreadsFilter;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
@ThreadLeakFilters(defaultFilters = true, filters = {
BadHdfsThreadsFilter.class, // hdfs currently leaks thread(s)
MoveReplicaHDFSTest.ForkJoinThreadsFilter.class
})
public class MoveReplicaHDFSUlogDirTest extends SolrCloudTestCase {
private static MiniDFSCluster dfsCluster;
@BeforeClass
public static void setupClass() throws Exception {
configureCluster(2)
.addConfig("conf1", TEST_PATH().resolve("configsets").resolve("cloud-dynamic").resolve("conf"))
.configure();
System.setProperty("solr.hdfs.blockcache.enabled", "false");
dfsCluster = HdfsTestUtil.setupClass(createTempDir().toFile().getAbsolutePath());
ZkConfigManager configManager = new ZkConfigManager(zkClient());
configManager.uploadConfigDir(configset("cloud-hdfs"), "conf1");
System.setProperty("solr.hdfs.home", HdfsTestUtil.getDataDir(dfsCluster, "data"));
}
@AfterClass
public static void teardownClass() throws Exception {
cluster.shutdown(); // need to close before the MiniDFSCluster
HdfsTestUtil.teardownClass(dfsCluster);
dfsCluster = null;
}
@Test
public void testDataDirAndUlogAreMaintained() throws Exception {
String coll = "movereplicatest_coll2";
CollectionAdminRequest.createCollection(coll, "conf1", 1, 1)
.setCreateNodeSet("")
.process(cluster.getSolrClient());
String hdfsUri = HdfsTestUtil.getURI(dfsCluster);
String dataDir = hdfsUri + "/dummyFolder/dataDir";
String ulogDir = hdfsUri + "/dummyFolder2/ulogDir";
CollectionAdminResponse res = CollectionAdminRequest
.addReplicaToShard(coll, "shard1")
.setDataDir(dataDir)
.setUlogDir(ulogDir)
.setNode(cluster.getJettySolrRunner(0).getNodeName())
.process(cluster.getSolrClient());
ulogDir += "/tlog";
ZkStateReader zkStateReader = cluster.getSolrClient().getZkStateReader();
assertTrue(ClusterStateUtil.waitForAllActiveAndLiveReplicas(zkStateReader, 120000));
DocCollection docCollection = zkStateReader.getClusterState().getCollection(coll);
Replica replica = docCollection.getReplicas().iterator().next();
assertTrue(replica.getStr("ulogDir"), replica.getStr("ulogDir").equals(ulogDir) || replica.getStr("ulogDir").equals(ulogDir+'/'));
assertTrue(replica.getStr("dataDir"),replica.getStr("dataDir").equals(dataDir) || replica.getStr("dataDir").equals(dataDir+'/'));
new CollectionAdminRequest.MoveReplica(coll, replica.getName(), cluster.getJettySolrRunner(1).getNodeName())
.process(cluster.getSolrClient());
assertTrue(ClusterStateUtil.waitForAllActiveAndLiveReplicas(zkStateReader, 120000));
docCollection = zkStateReader.getClusterState().getCollection(coll);
assertEquals(1, docCollection.getSlice("shard1").getReplicas().size());
Replica newReplica = docCollection.getReplicas().iterator().next();
assertEquals(newReplica.getNodeName(), cluster.getJettySolrRunner(1).getNodeName());
assertTrue(newReplica.getStr("ulogDir"), newReplica.getStr("ulogDir").equals(ulogDir) || newReplica.getStr("ulogDir").equals(ulogDir+'/'));
assertTrue(newReplica.getStr("dataDir"),newReplica.getStr("dataDir").equals(dataDir) || newReplica.getStr("dataDir").equals(dataDir+'/'));
assertEquals(replica.getName(), newReplica.getName());
assertEquals(replica.getCoreName(), newReplica.getCoreName());
assertFalse(replica.getNodeName().equals(newReplica.getNodeName()));
final int numDocs = 100;
addDocs(coll, numDocs); // indexed but not committed
cluster.getJettySolrRunner(1).stop();
Thread.sleep(5000);
new CollectionAdminRequest.MoveReplica(coll, newReplica.getName(), cluster.getJettySolrRunner(0).getNodeName())
.process(cluster.getSolrClient());
assertTrue(ClusterStateUtil.waitForAllActiveAndLiveReplicas(zkStateReader, 120000));
// assert that the old core will be removed on startup
cluster.getJettySolrRunner(1).start();
assertTrue(ClusterStateUtil.waitForAllActiveAndLiveReplicas(zkStateReader, 120000));
docCollection = zkStateReader.getClusterState().getCollection(coll);
assertEquals(1, docCollection.getReplicas().size());
newReplica = docCollection.getReplicas().iterator().next();
assertEquals(newReplica.getNodeName(), cluster.getJettySolrRunner(0).getNodeName());
assertTrue(newReplica.getStr("ulogDir"), newReplica.getStr("ulogDir").equals(ulogDir) || newReplica.getStr("ulogDir").equals(ulogDir+'/'));
assertTrue(newReplica.getStr("dataDir"),newReplica.getStr("dataDir").equals(dataDir) || newReplica.getStr("dataDir").equals(dataDir+'/'));
assertEquals(0, cluster.getJettySolrRunner(1).getCoreContainer().getCores().size());
cluster.getSolrClient().commit(coll);
assertEquals(numDocs, cluster.getSolrClient().query(coll, new SolrQuery("*:*")).getResults().getNumFound());
}
private void addDocs(String collection, int numDocs) throws SolrServerException, IOException {
SolrClient solrClient = cluster.getSolrClient();
for (int docId = 1; docId <= numDocs; docId++) {
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", docId);
solrClient.add(collection, doc);
}
}
}

View File

@ -27,7 +27,7 @@ import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.solr.SolrTestCaseJ4.SuppressObjectReleaseTracker;
import org.apache.solr.SolrTestCaseJ4.SuppressSSL;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
@ -52,6 +52,7 @@ import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@SuppressSSL(bugUrl = "https://issues.apache.org/jira/browse/SOLR-5776")
@SuppressObjectReleaseTracker(bugUrl="Testing purposes")
public class TestPullReplicaErrorHandling extends SolrCloudTestCase {
private final static int REPLICATION_TIMEOUT_SECS = 10;

View File

@ -19,9 +19,12 @@ package org.apache.solr.cloud;
import java.lang.invoke.MethodHandles;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import com.codahale.metrics.Counter;
import org.apache.lucene.util.TestUtil;
@ -41,7 +44,6 @@ import org.apache.solr.common.util.Utils;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrCore;
import org.apache.solr.metrics.SolrMetricManager;
import org.apache.solr.request.SolrRequestHandler;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@ -86,6 +88,25 @@ public class TestRandomRequestDistribution extends AbstractFullDistribZkTestBase
cloudClient.getZkStateReader().forceUpdateCollection("b1x1");
// get direct access to the metrics counters for each core/replica we're interested to monitor them
final Map<String,Counter> counters = new LinkedHashMap<>();
for (JettySolrRunner runner : jettys) {
CoreContainer container = runner.getCoreContainer();
SolrMetricManager metricManager = container.getMetricManager();
for (SolrCore core : container.getCores()) {
if ("a1x2".equals(core.getCoreDescriptor().getCollectionName())) {
String registry = core.getCoreMetricManager().getRegistryName();
Counter cnt = metricManager.counter(null, registry, "requests", "QUERY./select");
// sanity check
assertEquals(core.getName() + " has already received some requests?",
0, cnt.getCount());
counters.put(core.getName(), cnt);
}
}
}
assertEquals("Sanity Check: we know there should be 2 replicas", 2, counters.size());
// send queries to the node that doesn't host any core/replica and see where it routes them
ClusterState clusterState = cloudClient.getZkStateReader().getClusterState();
DocCollection b1x1 = clusterState.getCollection("b1x1");
Collection<Replica> replicas = b1x1.getSlice("shard1").getReplicas();
@ -94,29 +115,30 @@ public class TestRandomRequestDistribution extends AbstractFullDistribZkTestBase
if (!baseUrl.endsWith("/")) baseUrl += "/";
try (HttpSolrClient client = getHttpSolrClient(baseUrl + "a1x2", 2000, 5000)) {
long expectedTotalRequests = 0;
Set<String> uniqueCoreNames = new LinkedHashSet<>();
log.info("Making requests to " + baseUrl + "a1x2");
for (int i = 0; i < 10; i++) {
while (uniqueCoreNames.size() < counters.keySet().size() && expectedTotalRequests < 1000L) {
expectedTotalRequests++;
client.query(new SolrQuery("*:*"));
long actualTotalRequests = 0;
for (Map.Entry<String,Counter> e : counters.entrySet()) {
final long coreCount = e.getValue().getCount();
actualTotalRequests += coreCount;
if (0 < coreCount) {
uniqueCoreNames.add(e.getKey());
}
}
assertEquals("Sanity Check: Num Queries So Far Doesn't Match Total????",
expectedTotalRequests, actualTotalRequests);
}
}
Map<String, Integer> shardVsCount = new HashMap<>();
for (JettySolrRunner runner : jettys) {
CoreContainer container = runner.getCoreContainer();
SolrMetricManager metricManager = container.getMetricManager();
for (SolrCore core : container.getCores()) {
String registry = core.getCoreMetricManager().getRegistryName();
Counter cnt = metricManager.counter(null, registry, "requests", "QUERY./select");
SolrRequestHandler select = core.getRequestHandler("");
// long c = (long) select.getStatistics().get("requests");
shardVsCount.put(core.getName(), (int) cnt.getCount());
}
}
log.info("Shard count map = " + shardVsCount);
for (Map.Entry<String, Integer> entry : shardVsCount.entrySet()) {
assertTrue("Shard " + entry.getKey() + " received all 10 requests", entry.getValue() != 10);
log.info("Total requests: " + expectedTotalRequests);
assertEquals("either request randomization code is broken of this test seed is really unlucky, " +
"Gave up waiting for requests to hit every core at least once after " +
expectedTotalRequests + " requests",
uniqueCoreNames.size(), counters.size());
}
}

(File diff suppressed because it is too large.)

View File

@ -159,6 +159,19 @@ public class TestUseDocValuesAsStored extends AbstractBadConfigTestBase {
"{'id':'xyz'}"
+ "]");
}
@Test
public void testDuplicateMultiValued() throws Exception {
doTest("strTF", dvStringFieldName(3,true,false), "str", "X", "X", "Y");
doTest("strTT", dvStringFieldName(3,true,true), "str", "X", "X", "Y");
doTest("strFF", dvStringFieldName(3,false,false), "str", "X", "X", "Y");
doTest("int", "test_is_dvo", "int", "42", "42", "-666");
doTest("float", "test_fs_dvo", "float", "4.2", "4.2", "-66.666");
doTest("long", "test_ls_dvo", "long", "420", "420", "-6666666" );
doTest("double", "test_ds_dvo", "double", "0.0042", "0.0042", "-6.6666E-5");
doTest("date", "test_dts_dvo", "date", "2016-07-04T03:02:01Z", "2016-07-04T03:02:01Z", "1999-12-31T23:59:59Z" );
doTest("enum", "enums_dvo", "str", SEVERITY[0], SEVERITY[0], SEVERITY[1]);
}
@Test
public void testRandomSingleAndMultiValued() throws Exception {
@ -318,9 +331,14 @@ public class TestUseDocValuesAsStored extends AbstractBadConfigTestBase {
xpaths[i] = "//arr[@name='" + field + "']/" + type + "[.='" + value[i] + "']";
}
// Docvalues are sets, but stored values are ordered multisets, so cardinality depends on the value source
xpaths[value.length] = "*[count(//arr[@name='" + field + "']/" + type + ") = "
+ (isStoredField(field) ? value.length : valueSet.size()) + "]";
// See SOLR-10924...
// Trie/String based Docvalues are sets, but stored values & Point DVs are ordered multisets,
// so cardinality depends on the value source
final int expectedCardinality =
(isStoredField(field) || (Boolean.getBoolean(NUMERIC_POINTS_SYSPROP)
&& ! (field.startsWith("enum") || field.startsWith("test_s"))))
? value.length : valueSet.size();
xpaths[value.length] = "*[count(//arr[@name='"+field+"']/"+type+")="+expectedCardinality+"]";
assertU(adoc(fieldAndValues));
} else {

View File

@ -1,42 +0,0 @@
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- If this file is found in the config directory, it will only be
loaded once at startup. If it is found in Solr's data
directory, it will be re-loaded every commit.
See http://wiki.apache.org/solr/QueryElevationComponent for more info
-->
<elevate>
<!-- Query elevation examples
<query text="foo bar">
<doc id="1" />
<doc id="2" />
<doc id="3" />
</query>
for use with techproducts example
<query text="ipod">
<doc id="MA147LL/A" /> put the actual ipod at the top
<doc id="IW-02" exclude="true" /> exclude this cable
</query>
-->
</elevate>

View File

@ -1004,7 +1004,6 @@
<searchComponent name="elevator" class="solr.QueryElevationComponent" >
<!-- pick a fieldType to analyze queries -->
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
<!-- A request handler for demonstrating the elevator component -->

View File

@ -1,6 +1,7 @@
= About This Guide
:page-shortname: about-this-guide
:page-permalink: about-this-guide.html
:page-toc: false
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
@ -26,38 +27,13 @@ Designed to provide high-level documentation, this guide is intended to be more
The material as presented assumes that you are familiar with some basic search concepts and that you can read XML. It does not assume that you are a Java programmer, although knowledge of Java is helpful when working directly with Lucene or when developing custom extensions to a Lucene/Solr installation.
[[AboutThisGuide-SpecialInlineNotes]]
== Special Inline Notes
Special notes are included throughout these pages. There are several types of notes:
Information blocks::
+
NOTE: These provide additional information that's useful for you to know.
Important::
+
IMPORTANT: These provide information that is critical for you to know.
Tip::
+
TIP: These provide helpful tips.
Caution::
+
CAUTION: These provide details on scenarios or configurations you should be careful with.
Warning::
+
WARNING: These are meant to warn you from a possibly dangerous change or action.
[[AboutThisGuide-HostsandPortExamples]]
== Hosts and Port Examples
The default port when running Solr is 8983. The samples, URLs and screenshots in this guide may show different ports, because the port number that Solr uses is configurable. If you have not customized your installation of Solr, please make sure that you use port 8983 when following the examples, or configure your own installation to use the port numbers shown in the examples. For information about configuring port numbers, see the section <<managing-solr.adoc#managing-solr,Managing Solr>>.
The default port when running Solr is 8983. The samples, URLs and screenshots in this guide may show different ports, because the port number that Solr uses is configurable.
Similarly, URL examples use 'localhost' throughout; if you are accessing Solr from a location remote to the server hosting Solr, replace 'localhost' with the proper domain or IP where Solr is running.
If you have not customized your installation of Solr, please make sure that you use port 8983 when following the examples, or configure your own installation to use the port numbers shown in the examples. For information about configuring port numbers, see the section <<managing-solr.adoc#managing-solr,Managing Solr>>.
Similarly, URL examples use `localhost` throughout; if you are accessing Solr from a location remote to the server hosting Solr, replace `localhost` with the proper domain or IP where Solr is running.
For example, we might provide a sample query like:
@ -67,7 +43,32 @@ There are several items in this URL you might need to change locally. First, if
`\http://www.example.com/solr/mycollection/select?q=brown+cow`
[[AboutThisGuide-Paths]]
== Paths
Path information is given relative to `solr.home`, which is the location under the main Solr installation where Solr's collections and their `conf` and `data` directories are stored. When running the various examples mentioned through out this tutorial (i.e., `bin/solr -e techproducts`) the `solr.home` will be a sub-directory of `example/` created for you automatically.
Path information is given relative to `solr.home`, which is the location under the main Solr installation where Solr's collections and their `conf` and `data` directories are stored.
When running the various examples mentioned throughout this tutorial (e.g., `bin/solr -e techproducts`), the `solr.home` will be a sub-directory of `example/` created for you automatically.
== Special Inline Notes
Special notes are included throughout these pages. There are several types of notes:
=== Information blocks
NOTE: These provide additional information that's useful for you to know.
=== Important
IMPORTANT: These provide information that is critical for you to know.
=== Tip
TIP: These provide helpful tips.
=== Caution
CAUTION: These provide details on scenarios or configurations you should be careful with.
=== Warning
WARNING: These are meant to warn you against a possibly dangerous change or action.

View File

@ -37,7 +37,6 @@ A `TypeTokenFilterFactory` is available that creates a `TypeTokenFilter` that fi
For a complete list of the available TokenFilters, see the section <<tokenizers.adoc#tokenizers,Tokenizers>>.
[[AboutTokenizers-WhenTouseaCharFiltervs.aTokenFilter]]
== When To use a CharFilter vs. a TokenFilter
There are several pairs of CharFilters and TokenFilters that have related (ie: `MappingCharFilter` and `ASCIIFoldingFilter`) or nearly identical (ie: `PatternReplaceCharFilterFactory` and `PatternReplaceFilterFactory`) functionality and it may not always be obvious which is the best choice.

View File

@ -30,12 +30,10 @@ In addition to requiring that Solr by running in <<solrcloud.adoc#solrcloud,Solr
Before enabling this feature, users should carefully consider the issues discussed in the <<Securing Runtime Libraries>> section below.
====
[[AddingCustomPluginsinSolrCloudMode-UploadingJarFiles]]
== Uploading Jar Files
The first step is to use the <<blob-store-api.adoc#blob-store-api,Blob Store API>> to upload your jar files. This will put your jars in the `.system` collection and distribute them across your SolrCloud nodes. These jars are added to a separate classloader and only accessible to components that are configured with the property `runtimeLib=true`. These components are loaded lazily because the `.system` collection may not be loaded when a particular core is loaded.
[[AddingCustomPluginsinSolrCloudMode-ConfigAPICommandstouseJarsasRuntimeLibraries]]
== Config API Commands to use Jars as Runtime Libraries
The runtime library feature uses a special set of commands for the <<config-api.adoc#config-api,Config API>> to add, update, or remove jar files currently available in the blob store to the list of runtime libraries.
@ -74,14 +72,12 @@ curl http://localhost:8983/solr/techproducts/config -H 'Content-type:application
}'
----
[[AddingCustomPluginsinSolrCloudMode-SecuringRuntimeLibraries]]
== Securing Runtime Libraries
A drawback of this feature is that it could be used to load malicious executable code into the system. However, it is possible to restrict the system to load only trusted jars using http://en.wikipedia.org/wiki/Public_key_infrastructure[PKI] to verify that the executables loaded into the system are trustworthy.
The following steps will allow you to enable security for this feature. The instructions assume you have started all your Solr nodes with `-Denable.runtime.lib=true`.
[[Step1_GenerateanRSAPrivateKey]]
=== Step 1: Generate an RSA Private Key
The first step is to generate an RSA private key. The example below uses a 512-bit key, but you should use the strength appropriate to your needs.
@ -91,7 +87,6 @@ The first step is to generate an RSA private key. The example below uses a 512-b
$ openssl genrsa -out priv_key.pem 512
----
[[Step2_OutputthePublicKey]]
=== Step 2: Output the Public Key
The public portion of the key should be output in DER format so Java can read it.
@ -101,7 +96,6 @@ The public portion of the key should be output in DER format so Java can read it
$ openssl rsa -in priv_key.pem -pubout -outform DER -out pub_key.der
----
[[Step3_LoadtheKeytoZooKeeper]]
=== Step 3: Load the Key to ZooKeeper
The `.der` files that are output from Step 2 should then be loaded to ZooKeeper under a node `/keys/exe` so they are available throughout every node. You can load any number of public keys to that node and all are valid. If a key is removed from the directory, the signatures of that key will cease to be valid. So, before removing a key, make sure to update your runtime library configurations with valid signatures using the `update-runtimelib` command.
@ -130,7 +124,6 @@ $ .bin/zkCli.sh -server localhost:9983
After this, any attempt to load a jar will fail. All your jars must be signed with one of your private keys for Solr to trust them. The process to sign your jars and use the signature is outlined in Steps 4-6.
[[Step4_SignthejarFile]]
=== Step 4: Sign the jar File
Next you need to sign the sha1 digest of your jar file and get the base64 string.
@ -142,7 +135,6 @@ $ openssl dgst -sha1 -sign priv_key.pem myjar.jar | openssl enc -base64
The output of this step will be a string that you will need when you add the jar to your classpath in Step 6 below.
[[Step5_LoadthejartotheBlobStore]]
=== Step 5: Load the jar to the Blob Store
Load your jar to the Blob store, using the <<blob-store-api.adoc#blob-store-api,Blob Store API>>. This step does not require a signature; you will need the signature in Step 6 to add it to your classpath.
@ -155,7 +147,6 @@ http://localhost:8983/solr/.system/blob/{blobname}
The blob name that you give the jar file in this step will be used as the name in the next step.
[[Step6_AddthejartotheClasspath]]
=== Step 6: Add the jar to the Classpath
Finally, add the jar to the classpath using the Config API as detailed above. In this step, you will need to provide the signature of the jar that you got in Step 4.
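As a rough sketch (the blob name `myjar`, the version, and the signature string below are placeholders for the values produced in Steps 4 and 5), the call might look like this:

[source,bash]
----
# Hypothetical example: "myjar" is the blob name chosen in Step 5 and "sig" is the
# base64 signature produced in Step 4; substitute your own values.
curl http://localhost:8983/solr/techproducts/config -H 'Content-type:application/json' -d '{
  "add-runtimelib": { "name": "myjar", "version": 1, "sig": "<base64-signature-from-step-4>" }
}'
----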

View File

@ -60,7 +60,6 @@ In this case, no Analyzer class was specified on the `<analyzer>` element. Rathe
The output of an Analyzer affects the _terms_ indexed in a given field (and the terms used when parsing queries against those fields) but it has no impact on the _stored_ value for the fields. For example: an analyzer might split "Brown Cow" into two indexed terms "brown" and "cow", but the stored value will still be a single String: "Brown Cow"
====
[[Analyzers-AnalysisPhases]]
== Analysis Phases
Analysis takes place in two contexts. At index time, when a field is being created, the token stream that results from analysis is added to an index and defines the set of terms (including positions, sizes, and so on) for the field. At query time, the values being searched for are analyzed and the terms that result are matched against those that are stored in the field's index.
@ -89,7 +88,6 @@ In this theoretical example, at index time the text is tokenized, the tokens are
At query time, the only normalization that happens is to convert the query terms to lowercase. The filtering and mapping steps that occur at index time are not applied to the query terms. Queries must then, in this example, be very precise, using only the normalized terms that were stored at index time.
[[Analyzers-AnalysisforMulti-TermExpansion]]
=== Analysis for Multi-Term Expansion
In some types of queries (ie: Prefix, Wildcard, Regex, etc...) the input provided by the user is not natural language intended for Analysis. Things like Synonyms or Stop word filtering do not work in a logical way in these types of Queries.

View File

@ -27,7 +27,6 @@ All authentication and authorization plugins can work with Solr whether they are
The following section describes how to enable plugins with `security.json` and place them in the proper locations for your mode of operation.
[[AuthenticationandAuthorizationPlugins-EnablePluginswithsecurity.json]]
== Enable Plugins with security.json
All of the information required to initialize either type of security plugin is stored in a `security.json` file. This file contains 2 sections, one each for authentication and authorization.
@ -45,7 +44,7 @@ All of the information required to initialize either type of security plugin is
}
----
The `/security.json` file needs to be in the proper location before a Solr instance comes up so Solr starts with the security plugin enabled. See the section <<AuthenticationandAuthorizationPlugins-Usingsecurity.jsonwithSolr,Using security.json with Solr>> below for information on how to do this.
The `/security.json` file needs to be in the proper location before a Solr instance comes up so Solr starts with the security plugin enabled. See the section <<Using security.json with Solr>> below for information on how to do this.
Depending on the plugin(s) in use, other information will be stored in `security.json` such as user information or rules to create roles and permissions. This information is added through the APIs for each plugin provided by Solr, or, in the case of a custom plugin, the approach designed by you.
@ -66,10 +65,8 @@ Here is a more detailed `security.json` example. In this, the Basic authenticati
}}
----
[[AuthenticationandAuthorizationPlugins-Usingsecurity.jsonwithSolr]]
== Using security.json with Solr
[[AuthenticationandAuthorizationPlugins-InSolrCloudmode]]
=== In SolrCloud Mode
While configuring Solr to use an authentication or authorization plugin, you will need to upload a `security.json` file to ZooKeeper. The following command writes the file as it uploads it - you could also upload a file that you have already created locally.
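A sketch of such an upload (the plugin class here is only illustrative; use the plugin you actually intend to enable):

[source,bash]
----
# Write a minimal security.json straight into ZooKeeper as it is uploaded.
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd put /security.json \
  '{"authentication": {"class": "solr.KerberosPlugin"}}'
----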
@ -91,7 +88,6 @@ Depending on the authentication and authorization plugin that you use, you may h
Once `security.json` has been uploaded to ZooKeeper, you should use the appropriate APIs for the plugins you're using to update it. You can edit it manually, but you must take care to remove any version data so it will be properly updated across all ZooKeeper nodes. The version data is found at the end of the `security.json` file, and will appear as the letter "v" followed by a number, such as `{"v":138}`.
[[AuthenticationandAuthorizationPlugins-InStandaloneMode]]
=== In Standalone Mode
When running Solr in standalone mode, you need to create the `security.json` file and put it in the `$SOLR_HOME` directory for your installation (this is the same place you have located `solr.xml` and is usually `server/solr`).
@ -100,8 +96,7 @@ If you are using <<legacy-scaling-and-distribution.adoc#legacy-scaling-and-distr
You can use the authentication and authorization APIs, but if you are using the legacy scaling model, you will need to make the same API requests on each node separately. You can also edit `security.json` by hand if you prefer.
[[AuthenticationandAuthorizationPlugins-Authentication]]
== Authentication
== Authentication Plugins
Authentication plugins help in securing the endpoints of Solr by authenticating incoming requests. A custom plugin can be implemented by extending the AuthenticationPlugin class.
@ -110,7 +105,6 @@ An authentication plugin consists of two parts:
. Server-side component, which intercepts and authenticates incoming requests to Solr using a mechanism defined in the plugin, such as Kerberos, Basic Auth or others.
. Client-side component, i.e., an extension of `HttpClientConfigurer`, which enables a SolrJ client to make requests to a secure Solr instance using the authentication mechanism which the server understands.
[[AuthenticationandAuthorizationPlugins-EnablingaPlugin]]
=== Enabling a Plugin
* Specify the authentication plugin in `/security.json` as in this example:
@ -126,7 +120,6 @@ An authentication plugin consists of two parts:
* All of the content in the authentication block of `security.json` would be passed on as a map to the plugin during initialization.
* An authentication plugin can also be used with a standalone Solr instance by passing in `-DauthenticationPlugin=<plugin class name>` during startup.
[[AuthenticationandAuthorizationPlugins-AvailableAuthenticationPlugins]]
=== Available Authentication Plugins
Solr has the following implementations of authentication plugins:
@ -135,12 +128,10 @@ Solr has the following implementations of authentication plugins:
* <<basic-authentication-plugin.adoc#basic-authentication-plugin,Basic Authentication Plugin>>
* <<hadoop-authentication-plugin.adoc#hadoop-authentication-plugin,Hadoop Authentication Plugin>>
[[AuthenticationandAuthorizationPlugins-Authorization]]
== Authorization
An authorization plugin can be written for Solr by extending the {solr-javadocs}/solr-core/org/apache/solr/security/AuthorizationPlugin.html[AuthorizationPlugin] interface.
[[AuthenticationandAuthorizationPlugins-LoadingaCustomPlugin]]
=== Loading a Custom Plugin
* Make sure that the plugin implementation is in the classpath.
@ -162,21 +153,16 @@ All of the content in the `authorization` block of `security.json` would be pass
The authorization plugin is only supported in SolrCloud mode. Also, reloading the plugin isn't yet supported and requires a restart of the Solr installation (meaning, the JVM should be restarted, not simply a core reload).
====
[[AuthenticationandAuthorizationPlugins-AvailableAuthorizationPlugins]]
=== Available Authorization Plugins
Solr has one implementation of an authorization plugin:
* <<rule-based-authorization-plugin.adoc#rule-based-authorization-plugin,Rule-Based Authorization Plugin>>
[[AuthenticationandAuthorizationPlugins-PKISecuringInter-NodeRequests]]
[[AuthenticationandAuthorizationPlugins-PKI]]
== Securing Inter-Node Requests
Many requests originate from the Solr nodes themselves, for example requests from the Overseer to other nodes, recovery threads, etc. Each Authentication plugin declares whether it is capable of securing inter-node requests or not. If not, Solr will fall back to using a special internode authentication mechanism where each Solr node is a super user and is fully trusted by other Solr nodes, described below.
[[AuthenticationandAuthorizationPlugins-PKIAuthenticationPlugin]]
=== PKIAuthenticationPlugin
The PKIAuthenticationPlugin is used for requests between two Solr nodes when the configured Authentication plugin does not handle inter-node security.

View File

@ -22,10 +22,9 @@ Solr can support Basic authentication for users with the use of the BasicAuthPlu
An authorization plugin is also available to configure Solr with permissions to perform various activities in the system. The authorization plugin is described in the section <<rule-based-authorization-plugin.adoc#rule-based-authorization-plugin,Rule-Based Authorization Plugin>>.
[[BasicAuthenticationPlugin-EnableBasicAuthentication]]
== Enable Basic Authentication
To use Basic authentication, you must first create a `security.json` file. This file and where to put it is described in detail in the section <<authentication-and-authorization-plugins.adoc#AuthenticationandAuthorizationPlugins-EnablePluginswithsecurity.json,Enable Plugins with security.json>>.
To use Basic authentication, you must first create a `security.json` file. This file and where to put it is described in detail in the section <<authentication-and-authorization-plugins.adoc#enable-plugins-with-security-json,Enable Plugins with security.json>>.
For Basic authentication, the `security.json` file must have an `authentication` part which defines the class being used for authentication. Usernames and passwords (as a sha256(password+salt) hash) could be added when the file is created, or can be added later with the Basic authentication API, described below.
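A minimal sketch of such a file is shown below; the credential value is a placeholder for a real "<base64 sha256 hash> <base64 salt>" pair, and the authorization block is optional but commonly added at the same time.

[source,bash]
----
# Sketch of a minimal security.json enabling Basic authentication for a user "solr".
# The credential string is a placeholder; generate a real hash/salt pair or add the
# user later through the Authentication API.
cat > security.json <<'EOF'
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": { "solr": "<base64-sha256-hash> <base64-salt>" }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}
EOF
----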
@ -68,7 +67,6 @@ If you are using SolrCloud, you must upload `security.json` to ZooKeeper. You ca
bin/solr zk cp file:path_to_local_security.json zk:/security.json -z localhost:9983
----
[[BasicAuthenticationPlugin-Caveats]]
=== Caveats
There are a few things to keep in mind when using the Basic authentication plugin.
@ -77,19 +75,16 @@ There are a few things to keep in mind when using the Basic authentication plugi
* A user who has permission to write to `security.json` will be able to modify all permissions and how permissions are assigned to users. Special care should be taken to grant the ability to edit security only to appropriate users.
* Your network should, of course, be secure. Even with Basic authentication enabled, you should not unnecessarily expose Solr to the outside world.
[[BasicAuthenticationPlugin-EditingAuthenticationPluginConfiguration]]
== Editing Authentication Plugin Configuration
An Authentication API allows modifying user IDs and passwords. The API provides an endpoint with specific commands to set user details or delete a user.
[[BasicAuthenticationPlugin-APIEntryPoint]]
=== API Entry Point
`admin/authentication`
This endpoint is not collection-specific, so users are created for the entire Solr cluster. If users need to be restricted to a specific collection, that can be done with the authorization rules.
[[BasicAuthenticationPlugin-AddaUserorEditaPassword]]
=== Add a User or Edit a Password
The `set-user` command allows you to add users and change their passwords. For example, the following defines two users and their passwords:
@ -101,7 +96,6 @@ curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H 'C
"harry":"HarrysSecret"}}'
----
[[BasicAuthenticationPlugin-DeleteaUser]]
=== Delete a User
The `delete-user` command allows you to remove a user. The user password does not need to be sent to remove a user. In the following example, we've asked that user IDs 'tom' and 'harry' be removed from the system.
@ -112,7 +106,6 @@ curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H 'C
"delete-user": ["tom","harry"]}'
----
[[BasicAuthenticationPlugin-Setaproperty]]
=== Set a Property
Set arbitrary properties for the authentication plugin. The only supported property is `blockUnknown`.
@ -123,7 +116,6 @@ curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H 'C
"set-property": {"blockUnknown":false}}'
----
[[BasicAuthenticationPlugin-UsingBasicAuthwithSolrJ]]
=== Using BasicAuth with SolrJ
In SolrJ, the basic authentication credentials need to be set for each request as in this example:
@ -144,7 +136,6 @@ req.setBasicAuthCredentials(userName, password);
QueryResponse rsp = req.process(solrClient);
----
[[BasicAuthenticationPlugin-UsingCommandLinescriptswithBasicAuth]]
=== Using Command Line scripts with BasicAuth
Add the following lines to the `solr.in.sh` or `solr.in.cmd` file. This example tells the `bin/solr` command line to use "basic" as the type of authentication, and to pass credentials with the username "solr" and password "SolrRocks".
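A sketch of those lines, using the variable names from the stock start scripts (adjust the credentials to your own user):

[source,bash]
----
# Tell the bin/solr script to use HTTP Basic authentication with the user solr:SolrRocks.
SOLR_AUTH_TYPE="basic"
SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:SolrRocks"
----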

View File

@ -28,7 +28,6 @@ When using the blob store, note that the API does not delete or overwrite a prev
The blob store API is implemented as a requestHandler. A special collection named ".system" is used to store the blobs. This collection can be created in advance, but if it does not exist it will be created automatically.
[[BlobStoreAPI-Aboutthe.systemCollection]]
== About the .system Collection
Before uploading blobs to the blob store, a special collection must be created and it must be named `.system`. Solr will automatically create this collection if it does not already exist, but you can also create it manually if you choose.
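Creating it manually is an ordinary Collections API call; a sketch (the replication factor is illustrative) might look like:

[source,bash]
----
# Create the .system collection up front with two replicas.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=.system&replicationFactor=2"
----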
@ -46,7 +45,6 @@ curl http://localhost:8983/solr/admin/collections?action=CREATE&name=.system&rep
IMPORTANT: The `bin/solr` script cannot be used to create the `.system` collection.
[[BlobStoreAPI-UploadFilestoBlobStore]]
== Upload Files to Blob Store
After the `.system` collection has been created, files can be uploaded to the blob store with a request similar to the following:
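A sketch of such a request, with the local jar path and the blob name (`test1`) as placeholders:

[source,bash]
----
# POST the raw bytes of test1.jar to the blob store under the blob name "test1".
curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @test1.jar \
  http://localhost:8983/solr/.system/blob/test1
----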
@ -132,7 +130,6 @@ For the latest version of a blob, the \{version} can be omitted,
curl http://localhost:8983/solr/.system/blob/{blobname}?wt=filestream > {outputfilename}
----
[[BlobStoreAPI-UseaBlobinaHandlerorComponent]]
== Use a Blob in a Handler or Component
To use the blob as the class for a request handler or search component, you create a request handler in `solrconfig.xml` as usual. You will need to define the following parameters:

View File

@ -42,7 +42,7 @@ This example shows how you could add this search components to `solrconfig.xml`
This component can be added to any search request handler, and it works with distributed search in SolrCloud mode.
Documents should be added in children-parent blocks as described in <<uploading-data-with-index-handlers.adoc#UploadingDatawithIndexHandlers-NestedChildDocuments,indexing nested child documents>>. Examples:
Documents should be added in children-parent blocks as described in <<uploading-data-with-index-handlers.adoc#nested-child-documents,indexing nested child documents>>. Examples:
.Sample document
[source,xml]
@ -95,7 +95,7 @@ Documents should be added in children-parent blocks as described in <<uploading-
</add>
----
Queries are constructed the same way as for a <<other-parsers.adoc#OtherParsers-BlockJoinQueryParsers,Parent Block Join query>>. For example:
Queries are constructed the same way as for a <<other-parsers.adoc#block-join-query-parsers,Parent Block Join query>>. For example:
[source,text]
----

View File

@ -22,7 +22,6 @@ CharFilter is a component that pre-processes input characters.
CharFilters can be chained like Token Filters and placed in front of a Tokenizer. CharFilters can add, change, or remove characters while preserving the original character offsets to support features like highlighting.
[[CharFilterFactories-solr.MappingCharFilterFactory]]
== solr.MappingCharFilterFactory
This filter creates `org.apache.lucene.analysis.MappingCharFilter`, which can be used for changing one string to another (for example, for normalizing `é` to `e`).
@ -65,7 +64,6 @@ Mapping file syntax:
|===
** A backslash followed by any other character is interpreted as if the character were present without the backslash.
[[CharFilterFactories-solr.HTMLStripCharFilterFactory]]
== solr.HTMLStripCharFilterFactory
This filter creates `org.apache.solr.analysis.HTMLStripCharFilter`. This CharFilter strips HTML from the input stream and passes the result to another CharFilter or a Tokenizer.
@ -114,7 +112,6 @@ Example:
</analyzer>
----
[[CharFilterFactories-solr.ICUNormalizer2CharFilterFactory]]
== solr.ICUNormalizer2CharFilterFactory
This filter performs pre-tokenization Unicode normalization using http://site.icu-project.org[ICU4J].
@ -138,7 +135,6 @@ Example:
</analyzer>
----
[[CharFilterFactories-solr.PatternReplaceCharFilterFactory]]
== solr.PatternReplaceCharFilterFactory
This filter uses http://www.regular-expressions.info/reference.html[regular expressions] to replace or change character patterns.

View File

@ -24,10 +24,9 @@ The Collapsing query parser groups documents (collapsing the result set) accordi
[IMPORTANT]
====
In order to use these features with SolrCloud, the documents must be located on the same shard. To ensure document co-location, you can define the `router.name` parameter as `compositeId` when creating the collection. For more information on this option, see the section <<shards-and-indexing-data-in-solrcloud.adoc#ShardsandIndexingDatainSolrCloud-DocumentRouting,Document Routing>>.
In order to use these features with SolrCloud, the documents must be located on the same shard. To ensure document co-location, you can define the `router.name` parameter as `compositeId` when creating the collection. For more information on this option, see the section <<shards-and-indexing-data-in-solrcloud.adoc#document-routing,Document Routing>>.
====
[[CollapseandExpandResults-CollapsingQueryParser]]
== Collapsing Query Parser
The `CollapsingQParser` is really a _post filter_ that provides more performant field collapsing than Solr's standard approach when the number of distinct groups in the result set is high. This parser collapses the result set to a single document per group before it forwards the result set to the rest of the search components. So all downstream components (faceting, highlighting, etc...) will work with the collapsed result set.
@ -121,7 +120,6 @@ fq={!collapse field=group_field hint=top_fc}
The CollapsingQParserPlugin fully supports the QueryElevationComponent.
[[CollapseandExpandResults-ExpandComponent]]
== Expand Component
The ExpandComponent can be used to expand the groups that were collapsed by the http://heliosearch.org/the-collapsingqparserplugin-solrs-new-high-performance-field-collapsing-postfilter/[CollapsingQParserPlugin].
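As a sketch (the collection and field names are illustrative), a request that collapses on `group_field` and asks the expand component to return the other members of each group might look like:

[source,bash]
----
# Collapse to one document per group_field value, then return the collapsed members
# of each group in a separate "expanded" section of the response.
curl "http://localhost:8983/solr/techproducts/query" \
  --data-urlencode 'q=test' \
  --data-urlencode 'fq={!collapse field=group_field}' \
  --data-urlencode 'expand=true'
----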

View File

@ -24,7 +24,7 @@ The Collections API is used to create, remove, or reload collections.
In the context of SolrCloud you can use it to create collections with a specific number of shards and replicas, move replicas or shards, and create or delete collection aliases.
[[CollectionsAPI-create]]
[[create]]
== CREATE: Create a Collection
`/admin/collections?action=CREATE&name=_name_`
@ -45,7 +45,7 @@ The `compositeId` router hashes the value in the uniqueKey field and looks up th
+
When using the `implicit` router, the `shards` parameter is required. When using the `compositeId` router, the `numShards` parameter is required.
+
For more information, see also the section <<shards-and-indexing-data-in-solrcloud.adoc#ShardsandIndexingDatainSolrCloud-DocumentRouting,Document Routing>>.
For more information, see also the section <<shards-and-indexing-data-in-solrcloud.adoc#document-routing,Document Routing>>.
`numShards`::
The number of shards to be created as part of the collection. This is a required parameter when the `router.name` is `compositeId`.
@ -68,7 +68,7 @@ Allows defining the nodes to spread the new collection across. The format is a c
+
If not provided, the CREATE operation will create shard-replicas spread across all live Solr nodes.
+
Alternatively, use the special value of `EMPTY` to initially create no shard-replica within the new collection and then later use the <<CollectionsAPI-addreplica,ADDREPLICA>> operation to add shard-replicas when and where required.
Alternatively, use the special value of `EMPTY` to initially create no shard-replica within the new collection and then later use the <<addreplica,ADDREPLICA>> operation to add shard-replicas when and where required.
`createNodeSet.shuffle`::
Controls whether or not the shard-replicas created for this collection will be assigned to the nodes specified by the `createNodeSet` in a sequential manner, or if the list of nodes should be shuffled prior to creating individual replicas.
@ -89,10 +89,10 @@ Please note that <<realtime-get.adoc#realtime-get,RealTime Get>> or retrieval by
Set core property _name_ to _value_. See the section <<defining-core-properties.adoc#defining-core-properties,Defining core.properties>> for details on supported properties and values.
`autoAddReplicas`::
When set to `true`, enables automatic addition of replicas on shared file systems (such as HDFS) only. See the section <<running-solr-on-hdfs.adoc#RunningSolronHDFS-AutomaticallyAddReplicasinSolrCloud,autoAddReplicas Settings>> for more details on settings and overrides. The default is `false`.
When set to `true`, enables automatic addition of replicas on shared file systems (such as HDFS) only. See the section <<running-solr-on-hdfs.adoc#automatically-add-replicas-in-solrcloud,autoAddReplicas Settings>> for more details on settings and overrides. The default is `false`.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
`rule`::
Replica placement rules. See the section <<rule-based-replica-placement.adoc#rule-based-replica-placement,Rule-based Replica Placement>> for details.
@ -141,7 +141,7 @@ http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&nu
</response>
----
[[CollectionsAPI-modifycollection]]
[[modifycollection]]
== MODIFYCOLLECTION: Modify Attributes of a Collection
`/admin/collections?action=MODIFYCOLLECTION&collection=_<collection-name>&<attribute-name>=<attribute-value>&<another-attribute-name>=<another-value>_`
@ -165,10 +165,9 @@ The attributes that can be modified are:
* rule
* snitch
+
See the <<CollectionsAPI-create,CREATE action>> section above for details on these attributes.
See the <<create,CREATE action>> section above for details on these attributes.
[[CollectionsAPI-reload]]
[[reload]]
== RELOAD: Reload a Collection
`/admin/collections?action=RELOAD&name=_name_`
@ -177,11 +176,11 @@ The RELOAD action is used when you have changed a configuration in ZooKeeper.
=== RELOAD Parameters
|`name`::
`name`::
The name of the collection to reload. This parameter is required.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
=== RELOAD Response
@ -222,7 +221,7 @@ http://localhost:8983/solr/admin/collections?action=RELOAD&name=newCollection
</response>
----
[[CollectionsAPI-splitshard]]
[[splitshard]]
== SPLITSHARD: Split a Shard
`/admin/collections?action=SPLITSHARD&collection=_name_&shard=_shardID_`
@ -233,7 +232,7 @@ This command allows for seamless splitting and requires no downtime. A shard bei
The split is performed by dividing the original shard's hash range into two equal partitions and dividing up the documents in the original shard according to the new sub-ranges. Two parameters discussed below, `ranges` and `split.key`, provide further control over how the split occurs.
Shard splitting can be a long running process. In order to avoid timeouts, you should run this as an <<CollectionsAPI-async,asynchronous call>>.
Shard splitting can be a long running process. In order to avoid timeouts, you should run this as an <<Asynchronous Calls,asynchronous call>>.
=== SPLITSHARD Parameters
@ -259,7 +258,7 @@ For example, suppose `split.key=A!` hashes to the range `12-15` and belongs to s
Set core property _name_ to _value_. See the section <<defining-core-properties.adoc#defining-core-properties,Defining core.properties>> for details on supported properties and values.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>
=== SPLITSHARD Response
@ -338,7 +337,7 @@ http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=anothe
</response>
----
[[CollectionsAPI-createshard]]
[[createshard]]
== CREATESHARD: Create a Shard
Shards can only be created with this API for collections that use the 'implicit' router (i.e., when the collection was created with `router.name=implicit`). A new shard with a name can be created for an existing 'implicit' collection.
@ -364,7 +363,7 @@ The format is a comma-separated list of node_names, such as `localhost:8983_solr
Set core property _name_ to _value_. See the section <<defining-core-properties.adoc#defining-core-properties,Defining core.properties>> for details on supported properties and values.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
=== CREATESHARD Response
@ -393,7 +392,7 @@ http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=anImp
</response>
----
[[CollectionsAPI-deleteshard]]
[[deleteshard]]
== DELETESHARD: Delete a Shard
Deleting a shard will unload all replicas of the shard, remove them from `clusterstate.json`, and (by default) delete the instanceDir and dataDir for each replica. It will only remove shards that are inactive, or which have no range given for custom sharding.
@ -418,7 +417,7 @@ By default Solr will delete the dataDir of each replica that is deleted. Set thi
By default Solr will delete the index of each replica that is deleted. Set this to `false` to prevent the index directory from being deleted.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
=== DELETESHARD Response
@ -455,7 +454,7 @@ http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=anoth
</response>
----
[[CollectionsAPI-createalias]]
[[createalias]]
== CREATEALIAS: Create or Modify an Alias for a Collection
The `CREATEALIAS` action will create a new alias pointing to one or more collections. If an alias by the same name already exists, this action will replace the existing alias, effectively acting like an atomic "MOVE" command.
@ -471,14 +470,12 @@ The alias name to be created. This parameter is required.
A comma-separated list of collections to be aliased. The collections must already exist in the cluster. This parameter is required.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
[[CollectionsAPI-Output.5]]
=== CREATEALIAS Response
The output will simply be a responseHeader with details of the time it took to process the request. To confirm the creation of the alias, you can look in the Solr Admin UI, under the Cloud section and find the `aliases.json` file.
[[CollectionsAPI-Examples.5]]
=== Examples using CREATEALIAS
*Input*
@ -502,7 +499,7 @@ http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=testalias&c
</response>
----
[[CollectionsAPI-listaliases]]
[[listaliases]]
== LISTALIASES: List of all aliases in the cluster
`/admin/collections?action=LISTALIASES`
@ -531,7 +528,7 @@ The output will contain a list of aliases with the corresponding collection name
</response>
----
[[CollectionsAPI-deletealias]]
[[deletealias]]
== DELETEALIAS: Delete a Collection Alias
`/admin/collections?action=DELETEALIAS&name=_name_`
@ -542,7 +539,7 @@ The output will contain a list of aliases with the corresponding collection name
The name of the alias to delete. This parameter is required.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
=== DELETEALIAS Response
@ -571,7 +568,7 @@ http://localhost:8983/solr/admin/collections?action=DELETEALIAS&name=testalias
</response>
----
[[CollectionsAPI-delete]]
[[delete]]
== DELETE: Delete a Collection
`/admin/collections?action=DELETE&name=_collection_`
@ -582,7 +579,7 @@ http://localhost:8983/solr/admin/collections?action=DELETEALIAS&name=testalias
The name of the collection to delete. This parameter is required.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
=== DELETE Response
@ -625,7 +622,7 @@ http://localhost:8983/solr/admin/collections?action=DELETE&name=newCollection
</response>
----
[[CollectionsAPI-deletereplica]]
[[deletereplica]]
== DELETEREPLICA: Delete a Replica
Deletes a named replica from the specified collection and shard.
@ -665,7 +662,7 @@ By default Solr will delete the index of the replica that is deleted. Set this t
When set to `true`, no action will be taken if the replica is active. Default `false`.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
=== Examples using DELETEREPLICA
@ -688,7 +685,7 @@ http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=tes
</response>
----
[[CollectionsAPI-addreplica]]
[[addreplica]]
== ADDREPLICA: Add Replica
Add a replica to a shard in a collection. The node name can be specified if the replica is to be created on a specific node.
@ -722,7 +719,8 @@ The directory in which the core should be created
`property._name_=_value_`::
Set core property _name_ to _value_. See <<defining-core-properties.adoc#defining-core-properties,Defining core.properties>> for details about supported properties and values.
`async`:: string |No |Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>
`async`::
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>
=== Examples using ADDREPLICA
@ -754,7 +752,7 @@ http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=test2&
</response>
----
[[CollectionsAPI-clusterprop]]
[[clusterprop]]
== CLUSTERPROP: Cluster Properties
Add, edit or delete a cluster-wide property.
@ -794,7 +792,7 @@ http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&v
</response>
----
[[CollectionsAPI-migrate]]
[[migrate]]
== MIGRATE: Migrate Documents to Another Collection
`/admin/collections?action=MIGRATE&collection=_name_&split.key=_key1!_&target.collection=_target_collection_&forward.timeout=60`
@ -827,7 +825,7 @@ The timeout, in seconds, until which write requests made to the source collectio
Set core property _name_ to _value_. See the section <<defining-core-properties.adoc#defining-core-properties,Defining core.properties>> for details on supported properties and values.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
=== MIGRATE Response
@ -988,7 +986,7 @@ http://localhost:8983/solr/admin/collections?action=MIGRATE&collection=test1&spl
</response>
----
[[CollectionsAPI-addrole]]
[[addrole]]
== ADDROLE: Add a Role
`/admin/collections?action=ADDROLE&role=_roleName_&node=_nodeName_`
@ -1003,7 +1001,7 @@ Use this command to dedicate a particular node as Overseer. Invoke it multiple t
The name of the role. The only supported role as of now is `overseer`. This parameter is required.
`node`::
|The name of the node that will be assigned the role. It is possible to assign a role even before that node is started. This parameter is started.
The name of the node that will be assigned the role. It is possible to assign a role even before that node is started. This parameter is required.
=== ADDROLE Response
@ -1030,7 +1028,7 @@ http://localhost:8983/solr/admin/collections?action=ADDROLE&role=overseer&node=1
</response>
----
[[CollectionsAPI-removerole]]
[[removerole]]
== REMOVEROLE: Remove Role
Remove an assigned role. This API is used to undo the roles assigned using the ADDROLE operation.
@ -1046,7 +1044,6 @@ The name of the role. The only supported role as of now is `overseer`. This para
The name of the node where the role should be removed.
[[CollectionsAPI-Output.11]]
=== REMOVEROLE Response
The response will include the status of the request and the properties that were updated or removed. If the status is anything other than "0", an error message will explain why the request failed.
@ -1072,7 +1069,7 @@ http://localhost:8983/solr/admin/collections?action=REMOVEROLE&role=overseer&nod
</response>
----
[[CollectionsAPI-overseerstatus]]
[[overseerstatus]]
== OVERSEERSTATUS: Overseer Status and Statistics
Returns the current status of the overseer, performance statistics of various overseer APIs, and the last 10 failures per operation type.
@ -1146,7 +1143,7 @@ http://localhost:8983/solr/admin/collections?action=OVERSEERSTATUS&wt=json
}
----
[[CollectionsAPI-clusterstatus]]
[[clusterstatus]]
== CLUSTERSTATUS: Cluster Status
Fetch the cluster status including collections, shards, replicas, configuration name as well as collection aliases and cluster properties.
@ -1168,7 +1165,6 @@ This can be used if you need the details of the shard where a particular documen
The response will include the status of the request and the status of the cluster.
[[CollectionsAPI-Examples.15]]
=== Examples using CLUSTERSTATUS
*Input*
@ -1247,10 +1243,10 @@ http://localhost:8983/solr/admin/collections?action=clusterstatus&wt=json
}
----
[[CollectionsAPI-requeststatus]]
[[requeststatus]]
== REQUESTSTATUS: Request Status of an Async Call
Request the status and response of an already submitted <<CollectionsAPI-async,Asynchronous Collection API>> (below) call. This call is also used to clear up the stored statuses.
Request the status and response of an already submitted <<Asynchronous Calls,Asynchronous Collection API>> (below) call. This call is also used to clear up the stored statuses.
`/admin/collections?action=REQUESTSTATUS&requestid=_request-id_`
@ -1307,10 +1303,10 @@ http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=1004
</response>
----
[[CollectionsAPI-deletestatus]]
[[deletestatus]]
== DELETESTATUS: Delete Status
Deletes the stored response of an already failed or completed <<CollectionsAPI-async,Asynchronous Collection API>> call.
Deletes the stored response of an already failed or completed <<Asynchronous Calls,Asynchronous Collection API>> call.
`/admin/collections?action=DELETESTATUS&requestid=_request-id_`
@ -1384,7 +1380,7 @@ http://localhost:8983/solr/admin/collections?action=DELETESTATUS&flush=true
</response>
----
[[CollectionsAPI-list]]
[[list]]
== LIST: List Collections
Fetch the names of the collections in the cluster.
@ -1413,7 +1409,7 @@ http://localhost:8983/solr/admin/collections?action=LIST&wt=json
"example2"]}
----
[[CollectionsAPI-addreplicaprop]]
[[addreplicaprop]]
== ADDREPLICAPROP: Add Replica Property
Assign an arbitrary property to a particular replica and give it the value specified. If the property already exists, it will be overwritten with the new value.
@ -1501,7 +1497,7 @@ http://localhost:8983/solr/admin/collections?action=ADDREPLICAPROP&shard=shard1&
http://localhost:8983/solr/admin/collections?action=ADDREPLICAPROP&shard=shard1&collection=collection1&replica=core_node3&property=testprop&property.value=value2&shardUnique=true
----
[[CollectionsAPI-deletereplicaprop]]
[[deletereplicaprop]]
== DELETEREPLICAPROP: Delete Replica Property
Deletes an arbitrary property from a particular replica.
@ -1555,7 +1551,7 @@ http://localhost:8983/solr/admin/collections?action=DELETEREPLICAPROP&shard=shar
</response>
----
[[CollectionsAPI-balanceshardunique]]
[[balanceshardunique]]
== BALANCESHARDUNIQUE: Balance a Property Across Nodes
`/admin/collections?action=BALANCESHARDUNIQUE&collection=_collectionName_&property=_propertyName_`
@ -1607,7 +1603,7 @@ http://localhost:8983/solr/admin/collections?action=BALANCESHARDUNIQUE&collectio
Examining the clusterstate after issuing this call should show exactly one replica in each shard that has this property.
[[CollectionsAPI-rebalanceleaders]]
[[rebalanceleaders]]
== REBALANCELEADERS: Rebalance Leaders
Reassigns leaders in a collection according to the preferredLeader property across active nodes.
@ -1709,10 +1705,7 @@ The replica in the "inactivePreferreds" section had the `preferredLeader` proper
Examining the clusterstate after issuing this call should show that every live node that has the `preferredLeader` property also has the "leader" property set to _true_.
[[CollectionsAPI-FORCELEADER_ForceShardLeader]]
[[CollectionsAPI-forceleader]]
[[forceleader]]
== FORCELEADER: Force Shard Leader
In the unlikely event of a shard losing its leader, this command can be invoked to force the election of a new leader.
@ -1729,7 +1722,7 @@ The name of the shard where leader election should occur. This parameter is requ
WARNING: This is an expert level command, and should be invoked only when regular leader election is not working. This may potentially lead to loss of data in the event that the new leader doesn't have certain updates, possibly recent ones, which were acknowledged by the old leader before going down.
[[CollectionsAPI-migratestateformat]]
[[migratestateformat]]
== MIGRATESTATEFORMAT: Migrate Cluster State
An expert-level utility API to move a collection from the shared `clusterstate.json` ZooKeeper node (created with `stateFormat=1`, the default in all Solr releases prior to 5.0) to the per-collection `state.json` stored in ZooKeeper (created with `stateFormat=2`, the current default) seamlessly without any application downtime.
@ -1742,11 +1735,11 @@ A expert level utility API to move a collection from shared `clusterstate.json`
The name of the collection to be migrated from `clusterstate.json` to its own `state.json` ZooKeeper node. This parameter is required.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
This API is useful in migrating any collections created prior to Solr 5.0 to the more scalable cluster state format now used by default. If a collection was created in any Solr 5.x version or higher, then executing this command is not necessary.
[[CollectionsAPI-backup]]
[[backup]]
== BACKUP: Backup Collection
Backs up Solr collections and associated configurations to a shared filesystem - for example a Network File System.
@ -1761,15 +1754,15 @@ The BACKUP command will backup Solr indexes and configurations for a specified c
The name of the collection to be backed up. This parameter is required.
`location`::
The location on a shared drive for the backup command to write to. Alternately it can be set as a <<CollectionsAPI-clusterprop,cluster property>>.
The location on a shared drive for the backup command to write to. Alternately it can be set as a <<clusterprop,cluster property>>.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
`repository`::
The name of a repository to be used for the backup. If no repository is specified then the local filesystem repository will be used automatically.
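A sketch of a BACKUP call (the backup name, collection, and location below are illustrative):

[source,bash]
----
# Back up the "techproducts" collection to a shared filesystem location,
# tracking the request asynchronously under the ID "backup-1".
curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=myBackup&collection=techproducts&location=/mnt/shared/backups&async=backup-1"
----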
[[CollectionsAPI-restore]]
[[restore]]
== RESTORE: Restore Collection
Restores Solr indexes and associated configurations.
@ -1782,7 +1775,7 @@ The collection created will be have the same number of shards and replicas as th
While restoring, if a configSet with the same name exists in ZooKeeper then Solr will reuse it; otherwise it will upload the backed-up configSet to ZooKeeper and use that.
You can use the collection <<CollectionsAPI-createalias,CREATEALIAS>> command to make sure clients don't need to change the endpoint to query or index against the newly restored collection.
You can use the collection <<createalias,CREATEALIAS>> command to make sure clients don't need to change the endpoint to query or index against the newly restored collection.
=== RESTORE Parameters
@ -1790,10 +1783,10 @@ You can use the collection <<CollectionsAPI-createalias,CREATEALIAS>> command to
The collection into which the indexes will be restored. This parameter is required.
`location`::
The location on a shared drive for the RESTORE command to read from. Alternately it can be set as a <<CollectionsAPI-clusterprop,cluster property>>.
The location on a shared drive for the RESTORE command to read from. Alternately it can be set as a <<clusterprop,cluster property>>.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
`repository`::
The name of a repository to be used for the backup. If no repository is specified then the local filesystem repository will be used automatically.
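A matching RESTORE call, again with illustrative names, could look like:

[source,bash]
----
# Restore the backup taken above into a new collection named "techproducts_restored".
curl "http://localhost:8983/solr/admin/collections?action=RESTORE&name=myBackup&collection=techproducts_restored&location=/mnt/shared/backups&async=restore-1"
----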
@ -1814,12 +1807,11 @@ When creating collections, the shards and/or replicas are spread across all avai
If a node is not live when the CREATE operation is called, it will not get any parts of the new collection, which could lead to too many replicas being created on a single live node. Defining `maxShardsPerNode` sets a limit on the number of replicas CREATE will spread to each node. If the entire collection cannot fit on the live nodes, no collection will be created at all.
`autoAddReplicas`::
When set to `true`, enables auto addition of replicas on shared file systems. See the section <<running-solr-on-hdfs.adoc#RunningSolronHDFS-AutomaticallyAddReplicasinSolrCloud,Automatically Add Replicas in SolrCloud>> for more details on settings and overrides.
When set to `true`, enables auto addition of replicas on shared file systems. See the section <<running-solr-on-hdfs.adoc#automatically-add-replicas-in-solrcloud,Automatically Add Replicas in SolrCloud>> for more details on settings and overrides.
`property._name_=_value_`::
Set core property _name_ to _value_. See the section <<defining-core-properties.adoc#defining-core-properties,Defining core.properties>> for details on supported properties and values.
[[CollectionsAPI-deletenode]]
== DELETENODE: Delete Replicas in a Node
Deletes all replicas of all collections in that node. Please note that the node itself will remain as a live node after this operation.
@ -1828,12 +1820,12 @@ Deletes all replicas of all collections in that node. Please note that the node
=== DELETENODE Parameters
`node`:: string |Yes |The node to be removed. This parameter is required.
`node`::
The node to be removed. This parameter is required.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
[[CollectionsAPI-replacenode]]
== REPLACENODE: Move All Replicas in a Node to Another
This command recreates the replicas of one node (the source) on another node (the target). After each replica is copied, the replicas on the source node are deleted.
@ -1854,7 +1846,7 @@ The target node where replicas will be copied. This parameter is required.
If this flag is set to `true`, all replicas are created in separate threads. Keep in mind that this can lead to very high network and disk I/O if the replicas have very large indices. The default is `false`.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
`timeout`::
Time in seconds to wait until new replicas are created, and until leader replicas are fully recovered. The default is `300`, or 5 minutes.
@ -1864,7 +1856,6 @@ Time in seconds to wait until new replicas are created, and until leader replica
This operation does not hold the necessary locks on the replicas that belong to the source node, so do not perform other collection operations during this period.
====
[[CollectionsAPI-movereplica]]
== MOVEREPLICA: Move a Replica to a New Node
This command moves a replica from one node to a new node. In the case of shared filesystems, the `dataDir` will be reused.
@ -1889,12 +1880,11 @@ The name of the node that contains the replica. This parameter is required.
The name of the destination node. This parameter is required.
`async`::
Request ID to track this action which will be <<CollectionsAPI-async,processed asynchronously>>.
Request ID to track this action which will be <<Asynchronous Calls,processed asynchronously>>.
[[CollectionsAPI-async]]
== Asynchronous Calls
Since some collection API calls can be long running tasks (such as SPLITSHARD), you can optionally have the calls run asynchronously. Specifying `async=<request-id>` enables you to make an asynchronous call, the status of which can be requested using the <<CollectionsAPI-requeststatus,REQUESTSTATUS>> call at any time.
Since some collection API calls can be long running tasks (such as SPLITSHARD), you can optionally have the calls run asynchronously. Specifying `async=<request-id>` enables you to make an asynchronous call, the status of which can be requested using the <<requeststatus,REQUESTSTATUS>> call at any time.
As of now, REQUESTSTATUS does not automatically clean up the tracking data structures, meaning the status of completed or failed tasks stays stored in ZooKeeper unless cleared manually. DELETESTATUS can be used to clear the stored statuses. However, there is a limit of 10,000 on the number of async call responses stored in a cluster.
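As an illustrative sketch, a long-running call can be tagged with a request ID and then polled:

[source,bash]
----
# Kick off a shard split asynchronously under request ID 1000...
curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1&async=1000"

# ...then poll for its status (and later clear it with DELETESTATUS).
curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=1000"
----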

View File

@ -36,6 +36,6 @@ image::images/collections-core-admin/collection-admin.png[image,width=653,height
Replicas can be deleted by clicking the red "X" next to the replica name.
If the shard is inactive, for example after a <<collections-api.adoc#CollectionsAPI-splitshard,SPLITSHARD action>>, an option to delete the shard will appear as a red "X" next to the shard name.
If the shard is inactive, for example after a <<collections-api.adoc#splitshard,SPLITSHARD action>>, an option to delete the shard will appear as a red "X" next to the shard name.
image::images/collections-core-admin/DeleteShard.png[image,width=486,height=250]

View File

@ -36,7 +36,6 @@ The `zkcli.sh` provided by Solr is not the same as the https://zookeeper.apache.
ZooKeeper's `zkCli.sh` provides a completely general, application-agnostic shell for manipulating data in ZooKeeper. Solr's `zkcli.sh` discussed in this section is specific to Solr, and has command line arguments specific to dealing with Solr data in ZooKeeper.
====
[[CommandLineUtilities-UsingSolr_sZooKeeperCLI]]
== Using Solr's ZooKeeper CLI
Use the `help` option to get a list of available commands from the script itself, as in `./server/scripts/cloud-scripts/zkcli.sh help`.
@ -91,23 +90,20 @@ The short form parameter options may be specified with a single dash (eg: `-c my
The long form parameter options may be specified using either a single dash (eg: `-collection mycollection`) or a double dash (eg: `--collection mycollection`)
====
[[CommandLineUtilities-ZooKeeperCLIExamples]]
== ZooKeeper CLI Examples
Below are some examples of using the `zkcli.sh` CLI, which assume you have already started the SolrCloud example (`bin/solr -e cloud -noprompt`).
If you are on a Windows machine, simply replace `zkcli.sh` with `zkcli.bat` in these examples.
[[CommandLineUtilities-Uploadaconfigurationdirectory]]
=== Upload a configuration directory
=== Upload a Configuration Directory
[source,bash]
----
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd upconfig -confname my_new_config -confdir server/solr/configsets/_default/conf
----
[[CommandLineUtilities-BootstrapZooKeeperfromexistingSOLR_HOME]]
=== Bootstrap ZooKeeper from existing SOLR_HOME
=== Bootstrap ZooKeeper from an Existing solr.home
[source,bash]
----
@ -120,32 +116,28 @@ If you are on Windows machine, simply replace `zkcli.sh` with `zkcli.bat` in the
Using the bootstrap command with a ZooKeeper chroot in the `-zkhost` parameter, e.g. `-zkhost 127.0.0.1:2181/solr`, will automatically create the chroot path before uploading the configs.
====
[[CommandLineUtilities-PutarbitrarydataintoanewZooKeeperfile]]
=== Put arbitrary data into a new ZooKeeper file
=== Put Arbitrary Data into a New ZooKeeper file
[source,bash]
----
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd put /my_zk_file.txt 'some data'
----
[[CommandLineUtilities-PutalocalfileintoanewZooKeeperfile]]
=== Put a local file into a new ZooKeeper file
=== Put a Local File into a New ZooKeeper File
[source,bash]
----
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd putfile /my_zk_file.txt /tmp/my_local_file.txt
----
[[CommandLineUtilities-Linkacollectiontoaconfigurationset]]
=== Link a collection to a configuration set
=== Link a Collection to a ConfigSet
[source,bash]
----
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd linkconfig -collection gettingstarted -confname my_new_config
----
[[CommandLineUtilities-CreateanewZooKeeperpath]]
=== Create a new ZooKeeper path
=== Create a New ZooKeeper Path
This can be useful to create a chroot path in ZooKeeper before the first cluster start.
@ -154,13 +146,11 @@ This can be useful to create a chroot path in ZooKeeper before first cluster sta
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:2181 -cmd makepath /solr
----
[[CommandLineUtilities-Setaclusterproperty]]
=== Set a cluster property
=== Set a Cluster Property
This command will add or modify a single cluster property in `clusterprops.json`. Use this command instead of the usual getfile \-> edit \-> putfile cycle.
Unlike the CLUSTERPROP command on the <<collections-api.adoc#CollectionsAPI-clusterprop,Collections API>>, this command does *not* require a running Solr cluster.
Unlike the CLUSTERPROP command on the <<collections-api.adoc#clusterprop,Collections API>>, this command does *not* require a running Solr cluster.
[source,bash]
----

View File

@ -20,7 +20,7 @@
Several query parsers share supported query parameters.
The table below summarizes Solr's common query parameters, which are supported by the <<requesthandlers-and-searchcomponents-in-solrconfig#RequestHandlersandSearchComponentsinSolrConfig-SearchHandlers,Search RequestHandlers>>
The table below summarizes Solr's common query parameters, which are supported by the <<requesthandlers-and-searchcomponents-in-solrconfig#searchhandlers,Search RequestHandlers>>
// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
@ -249,7 +249,7 @@ As this check is periodically performed, the actual time for which a request can
This parameter may be set to either true or false.
If set to true, and if <<indexconfig-in-solrconfig.adoc#IndexConfiginSolrConfig-mergePolicyFactory,the mergePolicyFactory>> for this collection is a {solr-javadocs}/solr-core/org/apache/solr/index/SortingMergePolicyFactory.html[`SortingMergePolicyFactory`] which uses a `sort` option which is compatible with <<CommonQueryParameters-ThesortParameter,the sort parameter>> specified for this query, then Solr will attempt to use an {lucene-javadocs}/core/org/apache/lucene/search/EarlyTerminatingSortingCollector.html[`EarlyTerminatingSortingCollector`].
If set to true, and if <<indexconfig-in-solrconfig.adoc#mergepolicyfactory,the mergePolicyFactory>> for this collection is a {solr-javadocs}/solr-core/org/apache/solr/index/SortingMergePolicyFactory.html[`SortingMergePolicyFactory`] which uses a `sort` option which is compatible with <<CommonQueryParameters-ThesortParameter,the sort parameter>> specified for this query, then Solr will attempt to use an {lucene-javadocs}/core/org/apache/lucene/search/EarlyTerminatingSortingCollector.html[`EarlyTerminatingSortingCollector`].
If early termination is used, a `segmentTerminatedEarly` header will be included in the `responseHeader`.

View File

@ -24,15 +24,13 @@ This feature is enabled by default and works similarly in both SolrCloud and sta
When using this API, `solrconfig.xml` is not changed. Instead, all edited configuration is stored in a file called `configoverlay.json`. The values in `configoverlay.json` override the values in `solrconfig.xml`.
[[ConfigAPI-APIEntryPoints]]
== API Entry Points
== Config API Entry Points
* `/config`: retrieve or modify the config. GET to retrieve and POST for executing commands
* `/config/overlay`: retrieve the details in the `configoverlay.json` alone
* `/config/params` : allows creating parameter sets that can override or take the place of parameters defined in `solrconfig.xml`. See the <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>> section for more details.
[[ConfigAPI-Retrievingtheconfig]]
== Retrieving the config
== Retrieving the Config
All configuration items can be retrieved by sending a GET request to the `/config` endpoint - the results will be the effective configuration resulting from merging settings in `configoverlay.json` with those in `solrconfig.xml`:
@ -55,18 +53,16 @@ To further restrict returned results to a single component within a top level se
curl http://localhost:8983/solr/techproducts/config/requestHandler?componentName=/select
----
[[ConfigAPI-Commandstomodifytheconfig]]
== Commands to modify the config
== Commands to Modify the Config
This API uses specific commands to tell Solr what property or type of property to add to `configoverlay.json`. The commands are passed as part of the data sent with the request.
The config commands are categorized into 3 different sections which manipulate various data structures in `solrconfig.xml`. Each of these is described below.
* <<ConfigAPI-CommandsforCommonProperties,Common Properties>>
* <<ConfigAPI-CommandsforCustomHandlersandLocalComponents,Components>>
* <<ConfigAPI-CommandsforUser-DefinedProperties,User-defined properties>>
* <<Commands for Common Properties,Common Properties>>
* <<Commands for Custom Handlers and Local Components,Components>>
* <<Commands for User-Defined Properties,User-defined properties>>
[[ConfigAPI-CommandsforCommonProperties]]
=== Commands for Common Properties
The common properties are those that frequently need to be customized in a Solr instance. They are manipulated with two commands:
@ -120,7 +116,6 @@ The properties that are configured with these commands are predefined and listed
* `requestDispatcher.requestParsers.enableStreamBody`
* `requestDispatcher.requestParsers.addHttpRequestToContext`
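For illustration only, one of the predefined properties listed above could be changed with a `set-property` command; the host and core name below are placeholders:

[source,bash]
----
curl http://localhost:8983/solr/techproducts/config -H 'Content-type:application/json' -d '{
  "set-property": {"requestDispatcher.requestParsers.enableStreamBody": true}
}'
----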
[[ConfigAPI-CommandsforCustomHandlersandLocalComponents]]
=== Commands for Custom Handlers and Local Components
Custom request handlers, search components, and other types of localized Solr components (such as custom query parsers, update processors, etc.) can be added, updated and deleted with specific commands for the component being modified.
@ -133,7 +128,6 @@ Settings removed from `configoverlay.json` are not removed from `solrconfig.xml`
The full list of available commands follows below:
[[ConfigAPI-GeneralPurposeCommands]]
==== General Purpose Commands
These commands are the most commonly used:
@ -151,7 +145,6 @@ These commands are the most commonly used:
* `update-queryresponsewriter`
* `delete-queryresponsewriter`
[[ConfigAPI-AdvancedCommands]]
==== Advanced Commands
These commands allow registering more advanced customizations to Solr:
@ -179,9 +172,8 @@ These commands allow registering more advanced customizations to Solr:
* `update-runtimelib`
* `delete-runtimelib`
See the section <<ConfigAPI-CreatingandUpdatingRequestHandlers,Creating and Updating Request Handlers>> below for examples of using these commands.
See the section <<Creating and Updating Request Handlers>> below for examples of using these commands.
[[ConfigAPI-Whatabout_updateRequestProcessorChain_]]
==== What about updateRequestProcessorChain?
The Config API does not let you create or edit `updateRequestProcessorChain` elements. However, it is possible to create `updateProcessor` entries and use them by name to create a chain.
@ -198,7 +190,6 @@ curl http://localhost:8983/solr/techproducts/config -H 'Content-type:application
You can use this directly in your request by adding a parameter in the `updateRequestProcessorChain` for the specific update processor called `processor=firstFld`.
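A rough sketch of that usage, with the core name and document contents as placeholders, could look like:

[source,bash]
----
# apply only the 'firstFld' processor to this update request
curl 'http://localhost:8983/solr/techproducts/update?processor=firstFld&commit=true' \
  -H 'Content-type:application/json' -d '[{"id":"1"}]'
----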
[[ConfigAPI-CommandsforUser-DefinedProperties]]
=== Commands for User-Defined Properties
Solr lets users templatize `solrconfig.xml` using the placeholder format `${variable_name:default_val}`. You could set the values using system properties, for example, `-Dvariable_name=my_customvalue`. The same can be achieved at runtime using these commands:
@ -208,11 +199,10 @@ Solr lets users templatize the `solrconfig.xml` using the place holder format `$
The structure of the request is similar to the structure of requests using other commands, in the format of `"command":{"variable_name":"property_value"}`. You can add more than one variable at a time if necessary.
For more information about user-defined properties, see the section <<configuring-solrconfig-xml.adoc#Configuringsolrconfig.xml-Userdefinedpropertiesfromcore.properties,User defined properties from core.properties>>.
For more information about user-defined properties, see the section <<configuring-solrconfig-xml.adoc#user-defined-properties-in-core-properties,User defined properties in core.properties>>.
See also the section <<ConfigAPI-CreatingandUpdatingUser-DefinedProperties,Creating and Updating User-Defined Properties>> below for examples of how to use this type of command.
See also the section <<Creating and Updating User-Defined Properties>> below for examples of how to use this type of command.
[[ConfigAPI-HowtoMapsolrconfig.xmlPropertiestoJSON]]
== How to Map solrconfig.xml Properties to JSON
By using this API, you will be generating JSON representations of properties defined in `solrconfig.xml`. To understand how properties should be represented with the API, let's take a look at a few examples.
@ -364,15 +354,12 @@ Define the same properties with the Config API:
}
----
[[ConfigAPI-NameComponentsfortheConfigAPI]]
=== Name Components for the Config API
The Config API always allows changing the configuration of any component by name. However, some configurations such as `listener` or `initParams` do not require a name in `solrconfig.xml`. In order to be able to `update` and `delete` the same item in `configoverlay.json`, the name attribute becomes mandatory.
[[ConfigAPI-Examples]]
== Examples
== Config API Examples
[[ConfigAPI-CreatingandUpdatingCommonProperties]]
=== Creating and Updating Common Properties
This change sets the `query.filterCache.autowarmCount` to 1000 items and unsets the `query.filterCache.size`.
@ -403,7 +390,6 @@ And you should get a response like this:
"size":25}}}}}
----
[[ConfigAPI-CreatingandUpdatingRequestHandlers]]
=== Creating and Updating Request Handlers
To create a request handler, we can use the `add-requesthandler` command:
@ -471,7 +457,6 @@ curl http://localhost:8983/solr/techproducts/config -H 'Content-type:application
}'
----
[[ConfigAPI-CreatingandUpdatingUser-DefinedProperties]]
=== Creating and Updating User-Defined Properties
This command sets a user property.
@ -507,14 +492,12 @@ To unset the variable, issue a command like this:
curl http://localhost:8983/solr/techproducts/config -H'Content-type:application/json' -d '{"unset-user-property" : "variable_name"}'
----
[[ConfigAPI-HowItWorks]]
== How It Works
== How the Config API Works
Every core watches the ZooKeeper directory for the configset being used with that core. In standalone mode, however, there is no watch (because ZooKeeper is not running). If there are multiple cores in the same node using the same configset, only one ZooKeeper watch is used. For instance, if the configset 'myconf' is used by a core, the node would watch `/configs/myconf`. Every write operation performed through the API would 'touch' the directory (sets an empty byte[] to trigger watches) and all watchers are notified. Every core would check if the Schema file, `solrconfig.xml` or `configoverlay.json` is modified by comparing the `znode` versions and if modified, the core is reloaded.
If `params.json` is modified, the params object is just updated without a core reload (see the section <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>> for more information about `params.json`).
[[ConfigAPI-EmptyCommand]]
=== Empty Command
If an empty command is sent to the `/config` endpoint, the watch is triggered on all cores using this configset. For example:
@ -528,7 +511,6 @@ Directly editing any files without 'touching' the directory *will not* make it v
It is possible for components to watch for the configset 'touch' events by registering a listener using `SolrCore#registerConfListener()`.
[[ConfigAPI-ListeningtoconfigChanges]]
=== Listening to config Changes
Any component can register a listener using:

View File

@ -1,6 +1,7 @@
= ConfigSets API
:page-shortname: configsets-api
:page-permalink: configsets-api.html
:page-toclevels: 1
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
@ -24,45 +25,40 @@ To use a ConfigSet created with this API as the configuration for a collection,
This API can only be used with Solr running in SolrCloud mode. If you are not running Solr in SolrCloud mode but would still like to use shared configurations, please see the section <<config-sets.adoc#config-sets,Config Sets>>.
[[ConfigSetsAPI-APIEntryPoints]]
== API Entry Points
== ConfigSets API Entry Points
The base URL for all API calls is `\http://<hostname>:<port>/solr`.
* `/admin/configs?action=CREATE`: <<ConfigSetsAPI-create,create>> a ConfigSet, based on an existing ConfigSet
* `/admin/configs?action=DELETE`: <<ConfigSetsAPI-delete,delete>> a ConfigSet
* `/admin/configs?action=LIST`: <<ConfigSetsAPI-list,list>> all ConfigSets
* `/admin/configs?action=UPLOAD`: <<ConfigSetsAPI-upload,upload>> a ConfigSet
* `/admin/configs?action=CREATE`: <<configsets-create,create>> a ConfigSet, based on an existing ConfigSet
* `/admin/configs?action=DELETE`: <<configsets-delete,delete>> a ConfigSet
* `/admin/configs?action=LIST`: <<configsets-list,list>> all ConfigSets
* `/admin/configs?action=UPLOAD`: <<configsets-upload,upload>> a ConfigSet
[[ConfigSetsAPI-createCreateaConfigSet]]
[[ConfigSetsAPI-create]]
[[configsets-create]]
== Create a ConfigSet
`/admin/configs?action=CREATE&name=_name_&baseConfigSet=_baseConfigSet_`
Create a ConfigSet, based on an existing ConfigSet.
[[ConfigSetsAPI-Input]]
=== Input
=== Create ConfigSet Parameters
The following parameters are supported when creating a ConfigSet.
name:: The ConfigSet to be created. This parameter is required.
name::
The ConfigSet to be created. This parameter is required.
baseConfigSet:: The ConfigSet to copy as a base. This parameter is required.
baseConfigSet::
The ConfigSet to copy as a base. This parameter is required.
configSetProp._name_=_value_:: Any ConfigSet property from base to override.
configSetProp._name_=_value_::
Any ConfigSet property from base to override.
[[ConfigSetsAPI-Output]]
=== Output
=== Create ConfigSet Response
*Output Content*
The response will include the status of the request. If the status is anything other than "success", an error message will explain why the request failed.
The output will include the status of the request. If the status is anything other than "success", an error message will explain why the request failed.
[[ConfigSetsAPI-Examples]]
=== Examples
=== Create ConfigSet Examples
*Input*
@ -85,31 +81,23 @@ http://localhost:8983/solr/admin/configs?action=CREATE&name=myConfigSet&baseConf
</response>
----
[[ConfigSetsAPI-deleteDeleteaConfigSet]]
[[ConfigSetsAPI-delete]]
[[configsets-delete]]
== Delete a ConfigSet
`/admin/configs?action=DELETE&name=_name_`
Delete a ConfigSet
[[ConfigSetsAPI-Input.1]]
=== Input
=== Delete ConfigSet Parameters
*Query Parameters*
name::
The ConfigSet to be deleted. This parameter is required.
name:: The ConfigSet to be deleted. This parameter is required.
[[ConfigSetsAPI-Output.1]]
=== Output
*Output Content*
=== Delete ConfigSet Response
The output will include the status of the request. If the status is anything other than "success", an error message will explain why the request failed.
[[ConfigSetsAPI-Examples.1]]
=== Examples
=== Delete ConfigSet Examples
*Input*
@ -132,15 +120,14 @@ http://localhost:8983/solr/admin/configs?action=DELETE&name=myConfigSet
</response>
----
[[ConfigSetsAPI-list]]
[[configsets-list]]
== List ConfigSets
`/admin/configs?action=LIST`
Fetch the names of the ConfigSets in the cluster.
[[ConfigSetsAPI-Examples.2]]
=== Examples
=== List ConfigSet Examples
*Input*
@ -161,7 +148,7 @@ http://localhost:8983/solr/admin/configs?action=LIST&wt=json
"myConfig2"]}
----
[[ConfigSetsAPI-upload]]
[[configsets-upload]]
== Upload a ConfigSet
`/admin/configs?action=UPLOAD&name=_name_`
@ -173,22 +160,18 @@ Upload a ConfigSet, sent in as a zipped file. Please note that a ConfigSet is up
* XSLT transformer (tr parameter) cannot be used at request processing time.
* StatelessScriptUpdateProcessor does not initialize, if specified in the ConfigSet.
[[ConfigSetsAPI-Input.3]]
=== Input
=== Upload ConfigSet Parameters
name:: The ConfigSet to be created when the upload is complete. This parameter is required.
name::
The ConfigSet to be created when the upload is complete. This parameter is required.
The body of the request should contain a zipped config set.
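As a sketch, assuming the zipped configuration is in a local file named `myconfigset.zip`, the upload could be performed with:

[source,bash]
----
curl -X POST --header "Content-Type:application/octet-stream" \
  --data-binary @myconfigset.zip \
  "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=myConfigSet"
----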
[[ConfigSetsAPI-Output.3]]
=== Output
*Output Content*
=== Upload ConfigSet Response
The output will include the status of the request. If the status is anything other than "success", an error message will explain why the request failed.
[[ConfigSetsAPI-Examples.3]]
=== Examples
=== Upload ConfigSet Examples
Create a ConfigSet named 'myConfigSet' based on a 'predefinedTemplate' ConfigSet, overriding the immutable property to false.

View File

@ -25,7 +25,6 @@ Solr logs are a key way to know what's happening in the system. There are severa
In addition to the logging options described below, there is a way to configure which request parameters (such as parameters sent as part of queries) are logged with an additional request parameter called `logParamsList`. See the section on <<common-query-parameters.adoc#CommonQueryParameters-ThelogParamsListParameter,Common Query Parameters>> for more information.
====
[[ConfiguringLogging-TemporaryLoggingSettings]]
== Temporary Logging Settings
You can control the amount of logging output in Solr by using the Admin Web interface. Select the *LOGGING* link. Note that this page only lets you change settings in the running system and is not saved for the next run. (For more information about the Admin Web interface, see <<using-the-solr-administration-user-interface.adoc#using-the-solr-administration-user-interface,Using the Solr Administration User Interface>>.)
@ -59,7 +58,6 @@ Log levels settings are as follows:
Multiple settings at one time are allowed.
[[ConfiguringLogging-LoglevelAPI]]
=== Log level API
There is also a way of sending REST commands to the logging endpoint to do the same. Example:
@ -70,7 +68,6 @@ There is also a way of sending REST commands to the logging endpoint to do the s
curl -s http://localhost:8983/solr/admin/info/logging --data-binary "set=root:WARN&wt=json"
----
[[ConfiguringLogging-ChoosingLogLevelatStartup]]
== Choosing Log Level at Startup
You can temporarily choose a different logging level as you start Solr. There are two ways:
@ -87,7 +84,6 @@ bin/solr start -f -v
bin/solr start -f -q
----
[[ConfiguringLogging-PermanentLoggingSettings]]
== Permanent Logging Settings
Solr uses http://logging.apache.org/log4j/1.2/[Log4J version 1.2] for logging which is configured using `server/resources/log4j.properties`. Take a moment to inspect the contents of the `log4j.properties` file so that you are familiar with its structure. By default, Solr log messages will be written to `SOLR_LOGS_DIR/solr.log`.
@ -109,7 +105,6 @@ On every startup of Solr, the start script will clean up old logs and rotate the
You can disable the automatic log rotation at startup by changing the setting `SOLR_LOG_PRESTART_ROTATION` found in `bin/solr.in.sh` or `bin/solr.in.cmd` to false.
[[ConfiguringLogging-LoggingSlowQueries]]
== Logging Slow Queries
For high-volume search applications, logging every query can generate a large amount of logs and, depending on the volume, potentially impact performance. If you mine these logs for additional insights into your application, then logging every query request may be useful.

View File

@ -51,14 +51,12 @@ We've covered the options in the following sections:
* <<update-request-processors.adoc#update-request-processors,Update Request Processors>>
* <<codec-factory.adoc#codec-factory,Codec Factory>>
[[Configuringsolrconfig.xml-SubstitutingPropertiesinSolrConfigFiles]]
== Substituting Properties in Solr Config Files
Solr supports variable substitution of property values in config files, which allows runtime specification of various configuration options in `solrconfig.xml`. The syntax is `${propertyname[:optional default value]}`. This allows defining a default that can be overridden when Solr is launched. If a default value is not specified, then the property _must_ be specified at runtime or the configuration file will generate an error when parsed.
There are multiple methods for specifying properties that can be used in configuration files. Of those below, strongly consider "config overlay" as the preferred approach, as it stays local to the config set and because it's easy to modify.
[[Configuringsolrconfig.xml-JVMSystemProperties]]
=== JVM System Properties
Any JVM System properties, usually specified using the `-D` flag when starting the JVM, can be used as variables in any XML configuration file in Solr.
@ -79,8 +77,7 @@ bin/solr start -Dsolr.lock.type=none
In general, any Java system property that you want to set can be passed through the `bin/solr` script using the standard `-Dproperty=value` syntax. Alternatively, you can add common system properties to the `SOLR_OPTS` environment variable defined in the Solr include file (`bin/solr.in.sh` or `bin/solr.in.cmd`). For more information about how the Solr include file works, refer to: <<taking-solr-to-production.adoc#taking-solr-to-production,Taking Solr to Production>>.
[[Configuringsolrconfig.xml-ConfigAPI]]
=== Config API
=== Config API to Override solrconfig.xml
The <<config-api.adoc#config-api,Config API>> allows you to use an API to modify Solr's configuration, specifically user defined properties. Changes made with this API are stored in a file named `configoverlay.json`. This file should only be edited with the API, but will look like this example:
@ -94,7 +91,6 @@ The <<config-api.adoc#config-api,Config API>> allows you to use an API to modify
For more details, see the section <<config-api.adoc#config-api,Config API>>.
[[Configuringsolrconfig.xml-solrcore.properties]]
=== solrcore.properties
If the configuration directory for a Solr core contains a file named `solrcore.properties`, that file can contain any arbitrary user-defined property names and values using the Java standard https://en.wikipedia.org/wiki/.properties[properties file format], and those properties can be used as variables in the XML configuration files for that Solr core.
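As a brief sketch, with the property name and value as placeholders, such a file could contain:

[source,properties]
----
#solrcore.properties
my.custom.data.dir=/tmp/solr_data
----

A configuration file for that core could then reference it as `${my.custom.data.dir}`, optionally with a default, e.g., `${my.custom.data.dir:/var/data/solr}`.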
@ -120,7 +116,6 @@ The path and name of the `solrcore.properties` file can be overridden using the
====
[[Configuringsolrconfig.xml-Userdefinedpropertiesfromcore.properties]]
=== User-Defined Properties in core.properties
Every Solr core has a `core.properties` file, automatically created when using the APIs. When you create a SolrCloud collection, you can pass through custom parameters to go into each core.properties that will be created, by prefixing the parameter name with "property." as a URL parameter. Example:
@ -148,7 +143,6 @@ The `my.custom.prop` property can then be used as a variable, such as in `solrco
</requestHandler>
----
[[Configuringsolrconfig.xml-ImplicitCoreProperties]]
=== Implicit Core Properties
Several attributes of a Solr core are available as "implicit" properties that can be used in variable substitution, independent of where or how the underlying value is initialized. For example: regardless of whether the name for a particular Solr core is explicitly configured in `core.properties` or inferred from the name of the instance directory, the implicit property `solr.core.name` is available for use as a variable in that core's configuration file...
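As a small illustrative snippet (the handler and parameter names are placeholders), an implicit property is referenced like any other variable:

[source,xml]
----
<!-- illustration only: echo the core's own name back as a default request parameter -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="core.name">${solr.core.name}</str>
  </lst>
</requestHandler>
----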

View File

@ -22,8 +22,7 @@ Content streams are bulk data passed with a request to Solr.
When Solr RequestHandlers are accessed using path based URLs, the `SolrQueryRequest` object containing the parameters of the request may also contain a list of ContentStreams containing bulk data for the request. (The name SolrQueryRequest is a bit misleading: it is involved in all requests, regardless of whether it is a query request or an update request.)
[[ContentStreams-StreamSources]]
== Stream Sources
== Content Stream Sources
Currently request handlers can get content streams in a variety of ways:
@ -34,7 +33,6 @@ Currently request handlers can get content streams in a variety of ways:
By default, curl sends a `contentType="application/x-www-form-urlencoded"` header. If you need to test a SolrContentHeader content stream, you will need to set the content type with curl's `-H` flag.
[[ContentStreams-RemoteStreaming]]
== RemoteStreaming
Remote streaming lets you send the contents of a URL as a stream to a given SolrRequestHandler. You could use remote streaming to send a remote or local file to an update plugin.
@ -65,10 +63,9 @@ curl -d '
[IMPORTANT]
====
If `enableRemoteStreaming="true"` is used, be aware that this allows _anyone_ to send a request to any URL or local file. If <<ContentStreams-DebuggingRequests,DumpRequestHandler>> is enabled, it will allow anyone to view any file on your system.
If `enableRemoteStreaming="true"` is used, be aware that this allows _anyone_ to send a request to any URL or local file. If the <<Debugging Requests,DumpRequestHandler>> is enabled, it will allow anyone to view any file on your system.
====
[[ContentStreams-DebuggingRequests]]
== Debugging Requests
The implicit "dump" RequestHandler (see <<implicit-requesthandlers.adoc#implicit-requesthandlers,Implicit RequestHandlers>>) simply outputs the contents of the SolrQueryRequest using the specified writer type `wt`. This is a useful tool to help understand what streams are available to the RequestHandlers.

View File

@ -29,7 +29,7 @@ CoreAdmin actions can be executed via HTTP requests that specify an `action`
All action names are uppercase, and are defined in depth in the sections below.
[[CoreAdminAPI-STATUS]]
[[coreadmin-status]]
== STATUS
The `STATUS` action returns the status of all running Solr cores, or status for only the named core.
@ -44,7 +44,7 @@ The name of a core, as listed in the "name" attribute of a `<core>` element in `
`indexInfo`::
If `false`, information about the index will not be returned with a core STATUS request. In Solr implementations with a large number of cores (i.e., more than hundreds), retrieving the index information for each core can take a lot of time and isn't always required. The default is `true`.
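For example, with a placeholder core name, the status of a single core could be requested without index details:

[source,bash]
----
http://localhost:8983/solr/admin/cores?action=STATUS&core=my_core&indexInfo=false
----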
[[CoreAdminAPI-CREATE]]
[[coreadmin-create]]
== CREATE
The `CREATE` action creates a new core and registers it.
@ -102,7 +102,7 @@ WARNING: While it's possible to create a core for a non-existent collection, thi
The shard id this core represents. Normally you want to be auto-assigned a shard id.
`property._name_=_value_`::
Sets the core property _name_ to _value_. See the section on defining <<defining-core-properties.adoc#Definingcore.properties-core.properties_files,core.properties file contents>>.
Sets the core property _name_ to _value_. See the section on defining <<defining-core-properties.adoc#defining-core-properties-files,core.properties file contents>>.
`async`::
Request ID to track this action which will be processed asynchronously.
@ -115,7 +115,7 @@ Use `collection.configName=_configname_` to point to the config for a new collec
http://localhost:8983/solr/admin/cores?action=CREATE&name=my_core&collection=my_collection&shard=shard2
[[CoreAdminAPI-RELOAD]]
[[coreadmin-reload]]
== RELOAD
The RELOAD action loads a new core from the configuration of an existing, registered Solr core. While the new core is initializing, the existing one will continue to handle requests. When the new Solr core is ready, it takes over and the old core is unloaded.
@ -134,7 +134,7 @@ RELOAD performs "live" reloads of SolrCore, reusing some existing objects. Some
`core`::
The name of the core, as listed in the "name" attribute of a `<core>` element in `solr.xml`. This parameter is required.
[[CoreAdminAPI-RENAME]]
[[coreadmin-rename]]
== RENAME
The `RENAME` action changes the name of a Solr core.
@ -153,7 +153,7 @@ The new name for the Solr core. If the persistent attribute of `<solr>` is `true
Request ID to track this action which will be processed asynchronously.
[[CoreAdminAPI-SWAP]]
[[coreadmin-swap]]
== SWAP
`SWAP` atomically swaps the names used to access two existing Solr cores. This can be used to swap new content into production. The prior core remains available and can be swapped back, if necessary. Each core will be known by the name of the other, after the swap.
@ -162,9 +162,7 @@ Request ID to track this action which will be processed asynchronously.
[IMPORTANT]
====
Do not use `SWAP` with a SolrCloud node. It is not supported and can result in the core being unusable.
====
=== SWAP Parameters
@ -179,7 +177,7 @@ The name of one of the cores to be swapped. This parameter is required.
Request ID to track this action which will be processed asynchronously.
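A sketch of a swap between two existing cores might be the following, where the core names are placeholders and `other` is assumed to name the second core:

[source,bash]
----
http://localhost:8983/solr/admin/cores?action=SWAP&core=core1&other=core0
----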
[[CoreAdminAPI-UNLOAD]]
[[coreadmin-unload]]
== UNLOAD
The `UNLOAD` action removes a core from Solr. Active requests will continue to be processed, but no new requests will be sent to the named core. If a core is registered under more than one name, only the given name is removed.
@ -210,8 +208,7 @@ If `true`, removes everything related to the core, including the index directory
`async`::
Request ID to track this action which will be processed asynchronously.
[[CoreAdminAPI-MERGEINDEXES]]
[[coreadmin-mergeindexes]]
== MERGEINDEXES
The `MERGEINDEXES` action merges one or more indexes to another index. The indexes must have completed commits, and should be locked against writes until the merge is complete or the resulting merged index may become corrupted. The target core index must already exist and have a compatible schema with the one or more indexes that will be merged to it. Another commit on the target core should also be performed after the merge is complete.
@ -243,7 +240,7 @@ Multi-valued, source cores that would be merged.
Request ID to track this action which will be processed asynchronously
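As an illustration (core names are placeholders, and `srcCore` is assumed as the multi-valued source-core parameter), two source cores might be merged into a target core with:

[source,bash]
----
http://localhost:8983/solr/admin/cores?action=MERGEINDEXES&core=new_core_name&srcCore=core1&srcCore=core2
----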
[[CoreAdminAPI-SPLIT]]
[[coreadmin-split]]
== SPLIT
The `SPLIT` action splits an index into two or more indexes. The index being split can continue to handle requests. The split pieces can be placed into a specified directory on the server's filesystem or they can be merged into running Solr cores.
@ -270,7 +267,6 @@ The key to be used for splitting the index. If this parameter is used, `ranges`
`async`::
Request ID to track this action which will be processed asynchronously.
=== SPLIT Examples
The `core` index will be split into as many pieces as the number of `path` or `targetCore` parameters.
@ -305,9 +301,9 @@ This example uses the `ranges` parameter with hash ranges 0-500, 501-1000 and 10
The `targetCore` must already exist and must have a compatible schema with the `core` index. A commit is automatically called on the `core` index before it is split.
This command is used as part of the <<collections-api.adoc#CollectionsAPI-splitshard,SPLITSHARD>> command but it can be used for non-cloud Solr cores as well. When used against a non-cloud core without `split.key` parameter, this action will split the source index and distribute its documents alternately so that each split piece contains an equal number of documents. If the `split.key` parameter is specified then only documents having the same route key will be split from the source index.
This command is used as part of the <<collections-api.adoc#splitshard,SPLITSHARD>> command but it can be used for non-cloud Solr cores as well. When used against a non-cloud core without `split.key` parameter, this action will split the source index and distribute its documents alternately so that each split piece contains an equal number of documents. If the `split.key` parameter is specified then only documents having the same route key will be split from the source index.
[[CoreAdminAPI-REQUESTSTATUS]]
[[coreadmin-requeststatus]]
== REQUESTSTATUS
Request the status of an already submitted asynchronous CoreAdmin API call.
@ -326,7 +322,7 @@ The call below will return the status of an already submitted asynchronous CoreA
[source,bash]
http://localhost:8983/solr/admin/cores?action=REQUESTSTATUS&requestid=1
[[CoreAdminAPI-REQUESTRECOVERY]]
[[coreadmin-requestrecovery]]
== REQUESTRECOVERY
The `REQUESTRECOVERY` action manually asks a core to recover by syncing with the leader. This should be considered an "expert" level command and should be used in situations where the node (SolrCloud replica) is unable to become active automatically.
@ -338,7 +334,6 @@ The `REQUESTRECOVERY` action manually asks a core to recover by synching with th
`core`::
The name of the core to re-sync. This parameter is required.
[[CoreAdminAPI-Examples.1]]
=== REQUESTRECOVERY Examples
[source,bash]

View File

@ -140,8 +140,6 @@ The CDCR replication logic requires modification to the maintenance logic of the
If the communication with one of the target data centers is slow, the Updates Log on the source data center can grow to a substantial size. In such a scenario, it is necessary for the Updates Log to be able to efficiently find a given update operation given its identifier. Given that its identifier is an incremental number, it is possible to implement an efficient search strategy. Each transaction log file contains as part of its filename the version number of the first element. This is used to quickly traverse all the transaction log files and find the transaction log file containing one specific version number.
[[CrossDataCenterReplication_CDCR_-Monitoring]]
=== Monitoring
CDCR provides the following monitoring capabilities over the replication operations:
@ -155,24 +153,19 @@ Information about the lifecycle and statistics will be provided on a per-shard b
The CDC Replicator is a background thread that is responsible for replicating updates from a Source data center to one or more target data centers. It is responsible for providing monitoring information on a per-shard basis. As there can be a large number of collections and shards in a cluster, we will use a fixed-size pool of CDC Replicator threads that will be shared across shards.
[[CrossDataCenterReplication_CDCR_-Limitations]]
=== Limitations
=== CDCR Limitations
The current design of CDCR has some limitations. CDCR will continue to evolve over time and many of these limitations will be addressed. Among them are:
* CDCR is unlikely to be satisfactory for bulk-load situations where the update rate is high, especially if the bandwidth between the Source and target clusters is restricted. In this scenario, the initial bulk load should be performed, the Source and target data centers synchronized, and CDCR then utilized for incremental updates.
* CDCR is currently only active-passive; data is pushed from the Source cluster to the target cluster. There is active work being done in this area in the 6x code line to remove this limitation.
* CDCR works most robustly with the same number of shards in the Source and target collection. The shards in the two collections may have different numbers of replicas.
* Running CDCR with the indexes on HDFS is not currently supported, see the https://issues.apache.org/jira/browse/SOLR-9861[Solr CDCR over HDFS] JIRA issue.
[[CrossDataCenterReplication_CDCR_-Configuration]]
== Configuration
== CDCR Configuration
The source and target configurations differ in the case of the data centers being in separate clusters. "Cluster" here means separate ZooKeeper ensembles controlling disjoint Solr instances. Whether these data centers are physically separated or not is immaterial for this discussion.
[[CrossDataCenterReplication_CDCR_-SourceConfiguration]]
=== Source Configuration
Here is a sample of a source configuration file, a section in `solrconfig.xml`. The presence of the <replica> section causes CDCR to use this cluster as the Source and should not be present in the target collections in the cluster-to-cluster case. Details about each setting are after the two examples:
@ -211,8 +204,6 @@ Here is a sample of a source configuration file, a section in `solrconfig.xml`.
</updateHandler>
----
[[CrossDataCenterReplication_CDCR_-TargetConfiguration]]
=== Target Configuration
Here is a typical target configuration.
@ -256,7 +247,6 @@ The configuration details, defaults and options are as follows:
CDCR can be configured to forward update requests to one or more replicas. A replica is defined with a “replica” list as follows:
`zkHost`::
The host address for ZooKeeper of the target SolrCloud. Usually this is a comma-separated list of addresses to each node in the target ZooKeeper ensemble. This parameter is required.
@ -303,41 +293,27 @@ Monitor actions are performed at a core level, i.e., by using the following base
Currently, none of the CDCR API calls have parameters.
=== API Entry Points (Control)
* `<collection>/cdcr?action=STATUS`: <<CrossDataCenterReplication_CDCR_-STATUS,Returns the current state>> of CDCR.
* `<collection>/cdcr?action=START`: <<CrossDataCenterReplication_CDCR_-START,Starts CDCR>> replication
* `<collection>/cdcr?action=STOP`: <<CrossDataCenterReplication_CDCR_-STOP,Stops CDCR>> replication.
* `<collection>/cdcr?action=ENABLEBUFFER`: <<CrossDataCenterReplication_CDCR_-ENABLEBUFFER,Enables the buffering>> of updates.
* `<collection>/cdcr?action=DISABLEBUFFER`: <<CrossDataCenterReplication_CDCR_-DISABLEBUFFER,Disables the buffering>> of updates.
* `<collection>/cdcr?action=STATUS`: <<CDCR STATUS,Returns the current state>> of CDCR.
* `<collection>/cdcr?action=START`: <<CDCR START,Starts CDCR>> replication
* `<collection>/cdcr?action=STOP`: <<CDCR STOP,Stops CDCR>> replication.
* `<collection>/cdcr?action=ENABLEBUFFER`: <<ENABLEBUFFER,Enables the buffering>> of updates.
* `<collection>/cdcr?action=DISABLEBUFFER`: <<DISABLEBUFFER,Disables the buffering>> of updates.
=== API Entry Points (Monitoring)
* `core/cdcr?action=QUEUES`: <<CrossDataCenterReplication_CDCR_-QUEUES,Fetches statistics about the queue>> for each replica and about the update logs.
* `core/cdcr?action=OPS`: <<CrossDataCenterReplication_CDCR_-OPS,Fetches statistics about the replication performance>> (operations per second) for each replica.
* `core/cdcr?action=ERRORS`: <<CrossDataCenterReplication_CDCR_-ERRORS,Fetches statistics and other information about replication errors>> for each replica.
* `core/cdcr?action=QUEUES`: <<QUEUES,Fetches statistics about the queue>> for each replica and about the update logs.
* `core/cdcr?action=OPS`: <<OPS,Fetches statistics about the replication performance>> (operations per second) for each replica.
* `core/cdcr?action=ERRORS`: <<ERRORS,Fetches statistics and other information about replication errors>> for each replica.
=== Control Commands
[[CrossDataCenterReplication_CDCR_-STATUS]]
==== STATUS
==== CDCR STATUS
`/collection/cdcr?action=STATUS`
===== Input
*Query Parameters:* There are no parameters to this command.
===== Output
*Output Content*
The current state of the CDCR, which includes the state of the replication process and the state of the buffer.
[[cdcr_examples]]
===== Examples
===== CDCR Status Example
*Input*
@ -362,22 +338,15 @@ The current state of the CDCR, which includes the state of the replication proce
}
----
[[CrossDataCenterReplication_CDCR_-ENABLEBUFFER]]
==== ENABLEBUFFER
`/collection/cdcr?action=ENABLEBUFFER`
===== Input
===== Enable Buffer Response
*Query Parameters:* There are no parameters to this command.
The status of the process and an indication of whether the buffer is enabled.
===== Output
*Output Content*
The status of the process and an indication of whether the buffer is enabled
===== Examples
===== Enable Buffer Example
*Input*
@ -402,20 +371,15 @@ The status of the process and an indication of whether the buffer is enabled
}
----
[[CrossDataCenterReplication_CDCR_-DISABLEBUFFER]]
==== DISABLEBUFFER
`/collection/cdcr?action=DISABLEBUFFER`
===== Input
===== Disable Buffer Response
*Query Parameters:* There are no parameters to this command
The status of CDCR and an indication that the buffer is disabled.
===== Output
*Output Content:* The status of CDCR and an indication that the buffer is disabled.
===== Examples
===== Disable Buffer Example
*Input*
@ -440,20 +404,15 @@ http://host:8983/solr/<collection_name>/cdcr?action=DISABLEBUFFER
}
----
[[CrossDataCenterReplication_CDCR_-START]]
==== START
==== CDCR START
`/collection/cdcr?action=START`
===== Input
===== CDCR Start Response
*Query Parameters:* There are no parameters for this action
Confirmation that CDCR is started and the status of buffering
===== Output
*Output Content:* Confirmation that CDCR is started and the status of buffering
===== Examples
===== CDCR Start Examples
*Input*
@ -478,20 +437,15 @@ http://host:8983/solr/<collection_name>/cdcr?action=START
}
----
[[CrossDataCenterReplication_CDCR_-STOP]]
==== STOP
==== CDCR STOP
`/collection/cdcr?action=STOP`
===== Input
===== CDCR Stop Response
*Query Parameters:* There are no parameters for this command.
The status of CDCR, including the confirmation that CDCR is stopped.
===== Output
*Output Content:* The status of CDCR, including the confirmation that CDCR is stopped
===== Examples
===== CDCR Stop Examples
*Input*
@ -517,19 +471,13 @@ http://host:8983/solr/<collection_name>/cdcr?action=START
----
[[CrossDataCenterReplication_CDCR_-Monitoringcommands]]
=== Monitoring commands
=== CDCR Monitoring Commands
[[CrossDataCenterReplication_CDCR_-QUEUES]]
==== QUEUES
`/core/cdcr?action=QUEUES`
===== Input
*Query Parameters:* There are no parameters for this command
===== Output
===== QUEUES Response
*Output Content*
@ -537,7 +485,7 @@ The output is composed of a list “queues” which contains a list of (ZooKeepe
The “queues” object also contains information about the updates log, such as the size (in bytes) of the updates log on disk (“tlogTotalSize”), the number of transaction log files (“tlogTotalCount”) and the status of the updates log synchronizer (“updateLogSynchronizer”).
===== Examples
===== QUEUES Examples
*Input*
@ -569,20 +517,15 @@ The “queues” object also contains information about the updates log, such as
}
----
[[CrossDataCenterReplication_CDCR_-OPS]]
==== OPS
`/core/cdcr?action=OPS`
===== Input
===== OPS Response
*Query Parameters:* There are no parameters for this command.
The output is composed of `operationsPerSecond` which contains a list of (ZooKeeper) target hosts, themselves containing a list of target collections. For each collection, the average number of processed operations per second since the start of the replication process is provided. The operations are further broken down into two groups: add and delete operations.
===== Output
*Output Content:* The output is composed of a list “operationsPerSecond” which contains a list of (ZooKeeper) target hosts, themselves containing a list of target collections. For each collection, the average number of processed operations per second since the start of the replication process is provided. The operations are further broken down into two groups: add and delete operations.
===== Examples
===== OPS Examples
*Input*
@ -612,20 +555,15 @@ The “queues” object also contains information about the updates log, such as
}
----
[[CrossDataCenterReplication_CDCR_-ERRORS]]
==== ERRORS
`/core/cdcr?action=ERRORS`
===== Input
===== ERRORS Response
*Query Parameters:* There are no parameters for this command.
The output is composed of a list “errors” which contains a list of (ZooKeeper) target hosts, themselves containing a list of target collections. For each collection, information about errors encountered during the replication is provided, such as the number of consecutive errors encountered by the replicator thread, the number of bad requests or internal errors since the start of the replication process, and a list of the last errors encountered ordered by timestamp.
===== Output
*Output Content:* The output is composed of a list “errors” which contains a list of (ZooKeeper) target hosts, themselves containing a list of target collections. For each collection, information about errors encountered during the replication is provided, such as the number of consecutive errors encountered by the replicator thread, the number of bad requests or internal errors since the start of the replication process, and a list of the last errors encountered ordered by timestamp.
===== Examples
===== ERRORS Examples
*Input*
@ -728,7 +666,6 @@ http://host:port/solr/collection_name/cdcr?action=DISABLEBUFFER
+
* Re-enable indexing
[[CrossDataCenterReplication_CDCR_-Monitoring.1]]
== Monitoring
. Network and disk space monitoring are essential. Ensure that the system has plenty of available storage to queue up changes if there is a disconnect between the Source and Target. A network outage between the two data centers can cause your disk usage to grow.
@ -763,8 +700,3 @@ curl http://<Source>/solr/cloud1/update -H 'Content-type:application/json' -d '[
#check the Target
curl "http://<Target>:8983/solr/<collection_name>/select?q=SKU:ABC&wt=json&indent=true"
----
[[CrossDataCenterReplication_CDCR_-Limitations.1]]
== Limitations
* Running CDCR with the indexes on HDFS is not currently supported, see: https://issues.apache.org/jira/browse/SOLR-9861[Solr CDCR over HDFS].

View File

@ -35,7 +35,6 @@ If you are using replication to replicate the Solr index (as described in <<lega
NOTE: If the environment variable `SOLR_DATA_HOME` is defined, or if `solr.data.home` is configured for your DirectoryFactory, the location of the data directory will be `<SOLR_DATA_HOME>/<instance_name>/data`.
[[DataDirandDirectoryFactoryinSolrConfig-SpecifyingtheDirectoryFactoryForYourIndex]]
== Specifying the DirectoryFactory For Your Index
The default {solr-javadocs}/solr-core/org/apache/solr/core/StandardDirectoryFactory.html[`solr.StandardDirectoryFactory`] is filesystem based, and tries to pick the best implementation for the current JVM and platform. You can force a particular implementation and/or config options by specifying {solr-javadocs}/solr-core/org/apache/solr/core/MMapDirectoryFactory.html[`solr.MMapDirectoryFactory`], {solr-javadocs}/solr-core/org/apache/solr/core/NIOFSDirectoryFactory.html[`solr.NIOFSDirectoryFactory`], or {solr-javadocs}/solr-core/org/apache/solr/core/SimpleFSDirectoryFactory.html[`solr.SimpleFSDirectoryFactory`].
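For example, forcing the memory-mapped implementation might look like this sketch in `solrconfig.xml`:

[source,xml]
----
<directoryFactory name="DirectoryFactory"
                  class="solr.MMapDirectoryFactory"/>
----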
@ -57,7 +56,5 @@ The {solr-javadocs}/solr-core/org/apache/solr/core/RAMDirectoryFactory.html[`sol
[NOTE]
====
If you are using Hadoop and would like to store your indexes in HDFS, you should use the {solr-javadocs}/solr-core/org/apache/solr/core/HdfsDirectoryFactory.html[`solr.HdfsDirectoryFactory`] instead of either of the above implementations. For more details, see the section <<running-solr-on-hdfs.adoc#running-solr-on-hdfs,Running Solr on HDFS>>.
====

View File

@ -23,7 +23,6 @@ The Dataimport screen shows the configuration of the DataImportHandler (DIH) and
.The Dataimport Screen
image::images/dataimport-screen/dataimport.png[image,width=485,height=250]
This screen also lets you adjust various options to control how the data is imported to Solr, and view the data import configuration file that controls the import.
For more information about data importing with DIH, see the section on <<uploading-structured-data-store-data-with-the-data-import-handler.adoc#uploading-structured-data-store-data-with-the-data-import-handler,Uploading Structured Data Store Data with the Data Import Handler>>.

View File

@ -26,7 +26,6 @@ Preventing duplicate or near duplicate documents from entering an index or taggi
* Lookup3Signature: 64-bit hash used for exact duplicate detection. This is much faster than MD5 and smaller to index.
* http://wiki.apache.org/solr/TextProfileSignature[TextProfileSignature]: Fuzzy hashing implementation from Apache Nutch for near duplicate detection. It's tunable but works best on longer text.
Other, more sophisticated algorithms for fuzzy/near hashing can be added later.
[IMPORTANT]
@ -36,12 +35,10 @@ Adding in the de-duplication process will change the `allowDups` setting so that
Of course the `signatureField` could be the unique field, but generally you want the unique field to be unique. When a document is added, a signature will automatically be generated and attached to the document in the specified `signatureField`.
====
[[De-Duplication-ConfigurationOptions]]
== Configuration Options
There are two places in Solr to configure de-duplication: in `solrconfig.xml` and in `schema.xml`.
[[De-Duplication-Insolrconfig.xml]]
=== In solrconfig.xml
The `SignatureUpdateProcessorFactory` has to be registered in `solrconfig.xml` as part of an <<update-request-processors.adoc#update-request-processors,Update Request Processor Chain>>, as in this example:
@ -84,8 +81,6 @@ Set to *false* to disable de-duplication processing. The default is *true*.
overwriteDupes::
If true, the default, when a document exists that already matches this signature, it will be overwritten.
[[De-Duplication-Inschema.xml]]
=== In schema.xml
If you are using a separate field for storing the signature, you must have it indexed:
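For instance, a sketch of such a field definition (the field name is a placeholder) would be:

[source,xml]
----
<field name="signatureField" type="string" stored="true" indexed="true" multiValued="false"/>
----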

View File

@ -29,7 +29,6 @@ A minimal `core.properties` file looks like the example below. However, it can a
name=my_core_name
----
[[Definingcore.properties-Placementofcore.properties]]
== Placement of core.properties
Solr cores are configured by placing a file named `core.properties` in a sub-directory under `solr.home`. There are no a-priori limits to the depth of the tree, nor are there limits to the number of cores that can be defined. Cores may be anywhere in the tree with the exception that cores may _not_ be defined under an existing core. That is, the following is not allowed:
@ -61,11 +60,8 @@ Your `core.properties` file can be empty if necessary. Suppose `core.properties`
You can run Solr without configuring any cores.
====
[[Definingcore.properties-Definingcore.propertiesFiles]]
== Defining core.properties Files
[[Definingcore.properties-core.properties_files]]
The minimal `core.properties` file is an empty file, in which case all of the properties are defaulted appropriately.
Java properties files allow the hash (`#`) or bang (`!`) characters to specify comment-to-end-of-line.
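For instance, with placeholder values, a commented file might look like:

[source,properties]
----
# this core's name
name=my_core_name
! the bang character starts a comment as well
config=solrconfig.xml
----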
@ -98,4 +94,4 @@ The following properties are available:
`roles`:: Future parameter for SolrCloud or a way for users to mark nodes for their own use.
Additional user-defined properties may be specified for use as variables. For more information on how to define local properties, see the section <<configuring-solrconfig-xml.adoc#Configuringsolrconfig.xml-SubstitutingPropertiesinSolrConfigFiles,Substituting Properties in Solr Config Files>>.
Additional user-defined properties may be specified for use as variables. For more information on how to define local properties, see the section <<configuring-solrconfig-xml.adoc#substituting-properties-in-solr-config-files,Substituting Properties in Solr Config Files>>.

View File

@ -20,8 +20,7 @@
Fields are defined in the fields element of `schema.xml`. Once you have the field types set up, defining the fields themselves is simple.
[[DefiningFields-Example]]
== Example
== Example Field Definition
The following example defines a field named `price` with a type named `float` and a default value of `0.0`; the `indexed` and `stored` properties are explicitly set to `true`, while any other properties specified on the `float` field type are inherited.
@ -30,7 +29,6 @@ The following example defines a field named `price` with a type named `float` an
<field name="price" type="float" default="0.0" indexed="true" stored="true"/>
----
[[DefiningFields-FieldProperties]]
== Field Properties
Field definitions can have the following properties:
@ -44,7 +42,6 @@ The name of the `fieldType` for this field. This will be found in the `name` att
`default`::
A default value that will be added automatically to any document that does not have a value in this field when it is indexed. If this property is not specified, there is no default.
[[DefiningFields-OptionalFieldTypeOverrideProperties]]
== Optional Field Type Override Properties
Fields can have many of the same properties as field types. Properties from the table below which are specified on an individual field will override any explicit value for that property specified on the `fieldType` of the field, or any implicit default property value provided by the underlying `fieldType` implementation. The table below is reproduced from <<field-type-definitions-and-properties.adoc#field-type-definitions-and-properties,Field Type Definitions and Properties>>, which has more details:

View File

@ -31,12 +31,10 @@ For specific information on each of these language identification implementation
For more information about language analysis in Solr, see <<language-analysis.adoc#language-analysis,Language Analysis>>.
[[DetectingLanguagesDuringIndexing-ConfiguringLanguageDetection]]
== Configuring Language Detection
You can configure the `langid` UpdateRequestProcessor in `solrconfig.xml`. Both implementations take the same parameters, which are described in the following section. At a minimum, you must specify the fields for language identification and a field for the resulting language code.
[[DetectingLanguagesDuringIndexing-ConfiguringTikaLanguageDetection]]
=== Configuring Tika Language Detection
Here is an example of a minimal Tika `langid` configuration in `solrconfig.xml`:
@ -51,7 +49,6 @@ Here is an example of a minimal Tika `langid` configuration in `solrconfig.xml`:
</processor>
----
[[DetectingLanguagesDuringIndexing-ConfiguringLangDetectLanguageDetection]]
=== Configuring LangDetect Language Detection
Here is an example of a minimal LangDetect `langid` configuration in `solrconfig.xml`:
@ -66,7 +63,6 @@ Here is an example of a minimal LangDetect `langid` configuration in `solrconfig
</processor>
----
[[DetectingLanguagesDuringIndexing-langidParameters]]
== langid Parameters
As previously mentioned, both implementations of the `langid` UpdateRequestProcessor take the same parameters.

View File

@ -22,10 +22,9 @@ When a Solr node receives a search request, the request is routed behind the sce
The chosen replica acts as an aggregator: it creates internal requests to randomly chosen replicas of every shard in the collection, coordinates the responses, issues any subsequent internal requests as needed (for example, to refine facets values, or request additional stored fields), and constructs the final response for the client.
[[DistributedRequests-LimitingWhichShardsareQueried]]
== Limiting Which Shards are Queried
While one of the advantages of using SolrCloud is the ability to query very large collections distributed among various shards, in some cases <<shards-and-indexing-data-in-solrcloud.adoc#ShardsandIndexingDatainSolrCloud-DocumentRouting,you may know that you are only interested in results from a subset of your shards>>. You have the option of searching over all of your data or just parts of it.
While one of the advantages of using SolrCloud is the ability to query very large collections distributed among various shards, in some cases <<shards-and-indexing-data-in-solrcloud.adoc#document-routing,you may know that you are only interested in results from a subset of your shards>>. You have the option of searching over all of your data or just parts of it.
Querying all shards for a collection should look familiar; it's as though SolrCloud didn't even come into play:
And of course, you can specify a list of shards (separated by commas) each defined
http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=shard1,localhost:7574/solr/gettingstarted|localhost:7500/solr/gettingstarted
----
[[DistributedRequests-ConfiguringtheShardHandlerFactory]]
== Configuring the ShardHandlerFactory
You can directly configure aspects of the concurrency and thread-pooling used within distributed search in Solr. This allows for finer grained control and you can tune it to target your own specific requirements. The default configuration favors throughput over latency.
@ -118,7 +116,6 @@ If specified, the thread pool will use a backing queue instead of a direct hando
`fairnessPolicy`::
Chooses the JVM specifics dealing with fair policy queuing. If enabled, distributed searches will be handled in a First-In-First-Out fashion at a cost to throughput. If disabled, throughput will be favored over latency. The default is `false`.
[[DistributedRequests-ConfiguringstatsCache_DistributedIDF_]]
== Configuring statsCache (Distributed IDF)
Document and term statistics are needed in order to calculate relevancy. Solr provides four implementations out of the box when it comes to document stats calculation:
@ -135,15 +132,13 @@ The implementation can be selected by setting `<statsCache>` in `solrconfig.xml`
<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
----
[[DistributedRequests-AvoidingDistributedDeadlock]]
== Avoiding Distributed Deadlock
Each shard serves top-level query requests and then makes sub-requests to all of the other shards. Care should be taken to ensure that the max number of threads serving HTTP requests is greater than the possible number of requests from both top-level clients and other shards. If this is not the case, the configuration may result in a distributed deadlock.
For example, a deadlock might occur in the case of two shards, each with just a single thread to service HTTP requests. Both threads could receive a top-level request concurrently, and make sub-requests to each other. Because there are no more remaining threads to service requests, the incoming requests will be blocked until the other pending requests are finished, but they will not finish since they are waiting for the sub-requests. By ensuring that Solr is configured to handle a sufficient number of threads, you can avoid deadlock situations like this.
[[DistributedRequests-PreferLocalShards]]
== Prefer Local Shards
== preferLocalShards Parameter
Solr allows you to pass an optional boolean parameter named `preferLocalShards` to indicate that a distributed query should prefer local replicas of a shard when available. In other words, if a query includes `preferLocalShards=true`, then the query controller will look for local replicas to service the query instead of selecting replicas at random from across the cluster. This is useful when a query requests many fields or large fields to be returned per document because it avoids moving large amounts of data over the network when it is available locally. In addition, this feature can be useful for minimizing the impact of a problematic replica with degraded performance, as it reduces the likelihood that the degraded replica will be hit by other healthy replicas.
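For example, appending the parameter to an ordinary query is enough to turn it on (the collection name is illustrative):

[source,text]
----
http://localhost:8983/solr/gettingstarted/select?q=*:*&preferLocalShards=true
----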

@ -26,14 +26,12 @@ Everything on this page is specific to legacy setup of distributed search. Users
Update reorders (i.e., replica A may see update X then Y, and replica B may see update Y then X). *deleteByQuery* also handles reorders the same way, to ensure replicas are consistent. All replicas of a shard are consistent, even if the updates arrive in a different order on different replicas.
[[DistributedSearchwithIndexSharding-DistributingDocumentsacrossShards]]
== Distributing Documents across Shards
When not using SolrCloud, it is up to you to get all your documents indexed on each shard of your server farm. Solr supports distributed indexing (routing) in its true form only in the SolrCloud mode.
In the legacy distributed mode, Solr does not calculate universal term/doc frequencies. For most large-scale implementations, it is not likely to matter that Solr calculates TF/IDF at the shard level. However, if your collection is heavily skewed in its distribution across servers, you may find misleading relevancy results in your searches. In general, it is probably best to randomly distribute documents to your shards.
[[DistributedSearchwithIndexSharding-ExecutingDistributedSearcheswiththeshardsParameter]]
== Executing Distributed Searches with the shards Parameter
If a query request includes the `shards` parameter, the Solr server distributes the request across all the shards listed as arguments to the parameter. The `shards` parameter uses this syntax:
@ -63,7 +61,6 @@ The following components support distributed search:
* The *Stats* component, which returns simple statistics for numeric fields within the DocSet.
* The *Debug* component, which helps with debugging.
[[DistributedSearchwithIndexSharding-LimitationstoDistributedSearch]]
== Limitations to Distributed Search
Distributed searching in Solr has the following limitations:
@ -78,12 +75,10 @@ Distributed searching in Solr has the following limitations:
Formerly a limitation was that TF/IDF relevancy computations only used shard-local statistics. This is still the case by default. If your data isn't randomly distributed, or if you want more exact statistics, then remember to configure the ExactStatsCache.
[[DistributedSearchwithIndexSharding-AvoidingDistributedDeadlock]]
== Avoiding Distributed Deadlock
== Avoiding Distributed Deadlock with Distributed Search
Like in SolrCloud mode, inter-shard requests could lead to a distributed deadlock. It can be avoided by following the instructions in the section <<distributed-requests.adoc#distributed-requests,Distributed Requests>>.
[[DistributedSearchwithIndexSharding-TestingIndexShardingonTwoLocalServers]]
== Testing Index Sharding on Two Local Servers
For simple functional testing, it's easiest to just set up two local Solr servers on different ports. (In a production environment, of course, these servers would be deployed on separate machines.)

@ -42,28 +42,24 @@ The first step is to define the RequestHandler to use (aka, 'qt'). By default `/
Then choose the Document Type to define the type of document to load. The remaining parameters will change depending on the document type selected.
[[DocumentsScreen-JSON]]
== JSON
== JSON Documents
When using the JSON document type, the functionality is similar to using a requestHandler on the command line. Instead of putting the documents in a curl command, they can be entered directly into the Document entry box. The document structure should still be in proper JSON format.
Then you can choose when documents should be added to the index (Commit Within), and whether existing documents should be overwritten with incoming documents with the same id (if this is not *true*, then the incoming documents will be dropped).
This option will only add or overwrite documents to the index; for other update tasks, see the <<DocumentsScreen-SolrCommand,Solr Command>> option.
This option will only add or overwrite documents to the index; for other update tasks, see the <<Solr Command>> option.
[[DocumentsScreen-CSV]]
== CSV
== CSV Documents
When using the CSV document type, the functionality is similar to using a requestHandler on the command line. Instead of putting the documents in a curl command, they can be entered directly into the Document entry box. The document structure should still be in proper CSV format, with columns delimited and one row per document.
Then you can choose when documents should be added to the index (Commit Within), and whether existing documents should be overwritten with incoming documents with the same id (if this is not *true*, then the incoming documents will be dropped).
[[DocumentsScreen-DocumentBuilder]]
== Document Builder
The Document Builder provides a wizard-like interface to enter fields of a document.
[[DocumentsScreen-FileUpload]]
== File Upload
The File Upload option allows choosing a prepared file and uploading it. If using only `/update` for the Request-Handler option, you will be limited to XML, CSV, and JSON.
@ -72,18 +68,16 @@ However, to use the ExtractingRequestHandler (aka Solr Cell), you can modify the
Then you can choose when documents should be added to the index (Commit Within), and whether existing documents should be overwritten with incoming documents with the same id (if this is not *true*, then the incoming documents will be dropped).
[[DocumentsScreen-SolrCommand]]
== Solr Command
The Solr Command option allows you to use XML or JSON to perform specific actions on documents, such as defining documents to be added or deleted, updating only certain fields of documents, or issuing commit and optimize commands on the index.
The documents should be structured as they would be if using `/update` on the command line.
[[DocumentsScreen-XML]]
== XML
== XML Documents
When using the XML document type, the functionality is similar to using a requestHandler on the command line. Instead of putting the documents in a curl command, they can be entered directly into the Document entry box. The document structure should still be in proper Solr XML format, with each document separated by `<doc>` tags and each field defined.
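As a small illustration, a document entered in the Document entry box for this type would follow the usual Solr XML update format (field names and values here are placeholders):

[source,xml]
----
<add>
  <doc>
    <field name="id">change.me</field>
    <field name="title">An example document</field>
  </doc>
</add>
----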
Then you can choose when documents should be added to the index (Commit Within), and whether existing documents should be overwritten with incoming documents with the same id (if this is not **true**, then the incoming documents will be dropped).
This option will only add or overwrite documents to the index; for other update tasks, see the <<DocumentsScreen-SolrCommand,Solr Command>> option.
This option will only add or overwrite documents to the index; for other update tasks, see the <<Solr Command>> option.

@ -28,7 +28,6 @@ For other features that we now commonly associate with search, such as sorting,
In Lucene 4.0, a new approach was introduced. DocValue fields are now column-oriented fields with a document-to-value mapping built at index time. This approach promises to relieve some of the memory requirements of the fieldCache and make lookups for faceting, sorting, and grouping much faster.
[[DocValues-EnablingDocValues]]
== Enabling DocValues
To use docValues, you only need to enable it for a field that you will use it with. As with all schema design, you need to define a field type and then define fields of that type with docValues enabled. All of these actions are done in `schema.xml`.
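As a quick sketch, enabling docValues is just an attribute on the field definition; the field and type names here are illustrative:

[source,xml]
----
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<!-- docValues enabled on the field itself -->
<field name="manu_exact" type="string" indexed="false" stored="false" docValues="true"/>
----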
@ -76,7 +75,6 @@ Lucene index back-compatibility is only supported for the default codec. If you
If `docValues="true"` for a field, then DocValues will automatically be used any time the field is used for <<common-query-parameters.adoc#CommonQueryParameters-ThesortParameter,sorting>>, <<faceting.adoc#faceting,faceting>> or <<function-queries.adoc#function-queries,function queries>>.
[[DocValues-RetrievingDocValuesDuringSearch]]
=== Retrieving DocValues During Search
Field values retrieved during search queries are typically returned from stored values. However, non-stored docValues fields will also be returned along with other stored fields when all fields (or pattern-matching globs) are specified to be returned (e.g., "`fl=*`"), depending on the effective value of the `useDocValuesAsStored` parameter for each field. For schema versions >= 1.6, the implicit default is `useDocValuesAsStored="true"`. See <<field-type-definitions-and-properties.adoc#field-type-definitions-and-properties,Field Type Definitions and Properties>> and <<defining-fields.adoc#defining-fields,Defining Fields>> for more details.

@ -24,10 +24,8 @@ This section describes enabling SSL using a self-signed certificate.
For background on SSL certificates and keys, see http://www.tldp.org/HOWTO/SSL-Certificates-HOWTO/.
[[EnablingSSL-BasicSSLSetup]]
== Basic SSL Setup
[[EnablingSSL-Generateaself-signedcertificateandakey]]
=== Generate a Self-Signed Certificate and a Key
To generate a self-signed certificate and a single key that will be used to authenticate both the server and the client, we'll use the JDK https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html[`keytool`] command and create a separate keystore. This keystore will also be used as a truststore below. It's possible to use the keystore that comes with the JDK for these purposes, and to use a separate truststore, but those options aren't covered here.
@ -45,7 +43,6 @@ keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 -keypass secret -s
The above command will create a keystore file named `solr-ssl.keystore.jks` in the current directory.
[[EnablingSSL-ConvertthecertificateandkeytoPEMformatforusewithcURL]]
=== Convert the Certificate and Key to PEM Format for Use with cURL
cURL isn't capable of using JKS formatted keystores, so the JKS keystore needs to be converted to PEM format, which cURL understands.
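A typical conversion is the two-step sketch below, assuming the keystore file name used earlier on this page; you will be prompted for the keystore passwords:

[source,bash]
----
# Convert the JKS keystore into PKCS12 format
keytool -importkeystore -srckeystore solr-ssl.keystore.jks -destkeystore solr-ssl.keystore.p12 -srcstoretype jks -deststoretype pkcs12

# Export the certificate and key into a combined PEM file for cURL
openssl pkcs12 -in solr-ssl.keystore.p12 -out solr-ssl.pem
----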
@ -73,7 +70,6 @@ If you want to use cURL on OS X Yosemite (10.10), you'll need to create a certif
openssl pkcs12 -nokeys -in solr-ssl.keystore.p12 -out solr-ssl.cacert.pem
----
[[EnablingSSL-SetcommonSSLrelatedsystemproperties]]
=== Set Common SSL-Related System Properties
The Solr Control Script is already set up to pass SSL-related Java system properties to the JVM. To activate the SSL settings, uncomment and update the set of properties beginning with `SOLR_SSL_*` in `bin/solr.in.sh` (or `bin\solr.in.cmd` on Windows).
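For example, the uncommented block in `bin/solr.in.sh` typically ends up looking like this sketch (paths and passwords are illustrative):

[source,bash]
----
SOLR_SSL_KEY_STORE=etc/solr-ssl.keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=secret
SOLR_SSL_TRUST_STORE=etc/solr-ssl.keystore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=secret
# Require clients to authenticate
SOLR_SSL_NEED_CLIENT_AUTH=false
# Enable clients to authenticate (but not require)
SOLR_SSL_WANT_CLIENT_AUTH=false
----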
@ -116,7 +112,6 @@ REM Enable clients to authenticate (but not require)
set SOLR_SSL_WANT_CLIENT_AUTH=false
----
[[EnablingSSL-RunSingleNodeSolrusingSSL]]
=== Run Single Node Solr using SSL
Start Solr using the command shown below; by default clients will not be required to authenticate:
@ -133,12 +128,10 @@ bin/solr -p 8984
bin\solr.cmd -p 8984
----
[[EnablingSSL-SolrCloud]]
== SSL with SolrCloud
This section describes how to run a two-node SolrCloud cluster with no initial collections and a single-node external ZooKeeper. The commands below assume you have already created the keystore described above.
[[EnablingSSL-ConfigureZooKeeper]]
=== Configure ZooKeeper
NOTE: ZooKeeper does not support encrypted communication with clients like Solr. There are several related JIRA tickets where SSL support is being planned/worked on: https://issues.apache.org/jira/browse/ZOOKEEPER-235[ZOOKEEPER-235]; https://issues.apache.org/jira/browse/ZOOKEEPER-236[ZOOKEEPER-236]; https://issues.apache.org/jira/browse/ZOOKEEPER-1000[ZOOKEEPER-1000]; and https://issues.apache.org/jira/browse/ZOOKEEPER-2120[ZOOKEEPER-2120].
@ -161,12 +154,10 @@ server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd clusterprop -n
server\scripts\cloud-scripts\zkcli.bat -zkhost localhost:2181 -cmd clusterprop -name urlScheme -val https
----
If you have set up your ZooKeeper cluster to use a <<taking-solr-to-production.adoc#TakingSolrtoProduction-ZooKeeperchroot,chroot for Solr>> , make sure you use the correct `zkhost` string with `zkcli`, e.g. `-zkhost localhost:2181/solr`.
If you have set up your ZooKeeper cluster to use a <<taking-solr-to-production.adoc#zookeeper-chroot,chroot for Solr>>, make sure you use the correct `zkhost` string with `zkcli`, e.g. `-zkhost localhost:2181/solr`.
[[EnablingSSL-RunSolrCloudwithSSL]]
=== Run SolrCloud with SSL
[[EnablingSSL-CreateSolrhomedirectoriesfortwonodes]]
==== Create Solr Home Directories for Two Nodes
Create two copies of the `server/solr/` directory which will serve as the Solr home directories for each of your two SolrCloud nodes:
@ -187,7 +178,6 @@ xcopy /E server\solr cloud\node1\
xcopy /E server\solr cloud\node2\
----
[[EnablingSSL-StartthefirstSolrnode]]
==== Start the First Solr Node
Next, start the first Solr node on port 8984. Be sure to stop the standalone server first if you started it when working through the previous section on this page.
@ -220,7 +210,6 @@ bin/solr -cloud -s cloud/node1 -z localhost:2181 -p 8984 -Dsolr.ssl.checkPeerNam
bin\solr.cmd -cloud -s cloud\node1 -z localhost:2181 -p 8984 -Dsolr.ssl.checkPeerName=false
----
[[EnablingSSL-StartthesecondSolrnode]]
==== Start the Second Solr Node
Finally, start the second Solr node on port 7574 - again, to skip hostname verification, add `-Dsolr.ssl.checkPeerName=false`:
@ -237,14 +226,13 @@ bin/solr -cloud -s cloud/node2 -z localhost:2181 -p 7574
bin\solr.cmd -cloud -s cloud\node2 -z localhost:2181 -p 7574
----
[[EnablingSSL-ExampleClientActions]]
== Example Client Actions
[IMPORTANT]
====
cURL on OS X Mavericks (10.9) has degraded SSL support. For more information and workarounds to allow one-way SSL, see http://curl.haxx.se/mail/archive-2013-10/0036.html. cURL on OS X Yosemite (10.10) is improved - 2-way SSL is possible - see http://curl.haxx.se/mail/archive-2014-10/0053.html .
The cURL commands in the following sections will not work with the system `curl` on OS X Yosemite (10.10). Instead, the certificate supplied with the `-E` param must be in PKCS12 format, and the file supplied with the `--cacert` param must contain only the CA certificate, and no key (see <<EnablingSSL-ConvertthecertificateandkeytoPEMformatforusewithcURL,above>> for instructions on creating this file):
The cURL commands in the following sections will not work with the system `curl` on OS X Yosemite (10.10). Instead, the certificate supplied with the `-E` param must be in PKCS12 format, and the file supplied with the `--cacert` param must contain only the CA certificate, and no key (see <<Convert the Certificate and Key to PEM Format for Use with cURL,above>> for instructions on creating this file):
[source,bash]
curl -E solr-ssl.keystore.p12:secret --cacert solr-ssl.cacert.pem ...
@ -271,7 +259,6 @@ bin\solr.cmd create -c mycollection -shards 2
The `create` action will pass the `SOLR_SSL_*` properties set in your include file to the SolrJ code used to create the collection.
[[EnablingSSL-RetrieveSolrCloudclusterstatususingcURL]]
=== Retrieve SolrCloud Cluster Status using cURL
To get the resulting cluster status (again, if you have not enabled client authentication, remove the `-E solr-ssl.pem:secret` option):
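For example (a sketch; adjust the certificate, port, and output options as needed):

[source,bash]
----
curl -E solr-ssl.pem:secret --cacert solr-ssl.pem "https://localhost:8984/solr/admin/collections?action=CLUSTERSTATUS&wt=json&indent=on"
----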
@ -317,7 +304,6 @@ You should get a response that looks like this:
"properties":{"urlScheme":"https"}}}
----
[[EnablingSSL-Indexdocumentsusingpost.jar]]
=== Index Documents using post.jar
Use `post.jar` to index some example documents to the SolrCloud collection created above:
@ -329,7 +315,6 @@ cd example/exampledocs
java -Djavax.net.ssl.keyStorePassword=secret -Djavax.net.ssl.keyStore=../../server/etc/solr-ssl.keystore.jks -Djavax.net.ssl.trustStore=../../server/etc/solr-ssl.keystore.jks -Djavax.net.ssl.trustStorePassword=secret -Durl=https://localhost:8984/solr/mycollection/update -jar post.jar *.xml
----
[[EnablingSSL-QueryusingcURL]]
=== Query Using cURL
Use cURL to query the SolrCloud collection created above, from a directory containing the PEM formatted certificate and key created above (e.g., `example/etc/`). If you have not enabled client authentication (system property `-Djetty.ssl.clientAuth=true`), then you can remove the `-E solr-ssl.pem:secret` option:
@ -339,8 +324,7 @@ Use cURL to query the SolrCloud collection created above, from a directory conta
curl -E solr-ssl.pem:secret --cacert solr-ssl.pem "https://localhost:8984/solr/mycollection/select?q=*:*&wt=json&indent=on"
----
[[EnablingSSL-IndexadocumentusingCloudSolrClient]]
=== Index a document using CloudSolrClient
=== Index a Document using CloudSolrClient
From a java client using SolrJ, index a document. In the code below, the `javax.net.ssl.*` system properties are set programmatically, but you could instead specify them on the java command line, as in the `post.jar` example above:
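A minimal sketch of such a client, assuming SolrJ 7.x; the ZooKeeper address, keystore paths, and field values are illustrative and not part of any shipped example:

[source,java]
----
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SslIndexSketch {
  public static void main(String[] args) throws Exception {
    // Equivalent to passing the -Djavax.net.ssl.* flags on the java command line
    System.setProperty("javax.net.ssl.keyStore", "server/etc/solr-ssl.keystore.jks");
    System.setProperty("javax.net.ssl.keyStorePassword", "secret");
    System.setProperty("javax.net.ssl.trustStore", "server/etc/solr-ssl.keystore.jks");
    System.setProperty("javax.net.ssl.trustStorePassword", "secret");

    try (CloudSolrClient client = new CloudSolrClient.Builder()
        .withZkHost("localhost:2181")
        .build()) {
      client.setDefaultCollection("mycollection");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "ssl-doc-1");
      doc.addField("name", "indexed over https");
      client.add(doc);
      client.commit();
    }
  }
}
----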

@ -18,14 +18,12 @@
// specific language governing permissions and limitations
// under the License.
[[Errata-ErrataForThisDocumentation]]
== Errata For This Documentation
Any mistakes found in this documentation after its release will be listed on the on-line version of this page:
https://lucene.apache.org/solr/guide/{solr-docs-version}/errata.html
[[Errata-ErrataForPastVersionsofThisDocumentation]]
== Errata For Past Versions of This Documentation
Any known mistakes in past releases of this documentation will be noted below.

@ -25,19 +25,16 @@ This feature uses a stream sorting technique that begins to send records within
The cases where this functionality may be useful include: session analysis, distributed merge joins, time series roll-ups, aggregations on high cardinality fields, fully distributed field collapsing, and sort based stats.
[[ExportingResultSets-FieldRequirements]]
== Field Requirements
All the fields being sorted and exported must have docValues set to true. For more information, see the section on <<docvalues.adoc#docvalues,DocValues>>.
[[ExportingResultSets-The_exportRequestHandler]]
== The /export RequestHandler
The `/export` request handler with the appropriate configuration is one of Solr's out-of-the-box request handlers - see <<implicit-requesthandlers.adoc#implicit-requesthandlers,Implicit RequestHandlers>> for more information.
Note that this request handler's properties are defined as "invariants", which means they cannot be overridden by other properties passed at another time (such as at query time).
[[ExportingResultSets-RequestingResultsExport]]
== Requesting Results Export
You can use `/export` to make requests to export the result set of a query.
@ -53,19 +50,16 @@ Here is an example of an export request of some indexed log data:
http://localhost:8983/solr/core_name/export?q=my-query&sort=severity+desc,timestamp+desc&fl=severity,timestamp,msg
----
[[ExportingResultSets-SpecifyingtheSortCriteria]]
=== Specifying the Sort Criteria
The `sort` property defines how documents will be sorted in the exported result set. Results can be sorted by any field that has a field type of int, long, float, double, or string. The sort fields must be single-valued fields.
Up to four sort fields can be specified per request, with the 'asc' or 'desc' properties.
[[ExportingResultSets-SpecifyingtheFieldList]]
=== Specifying the Field List
The `fl` property defines the fields that will be exported with the result set. Any of the field types that can be sorted (i.e., int, long, float, double, string, date, boolean) can be used in the field list. The fields can be single or multi-valued. However, returning scores and using wildcards are not supported at this time.
[[ExportingResultSets-DistributedSupport]]
== Distributed Support
See the section <<streaming-expressions.adoc#streaming-expressions,Streaming Expressions>> for distributed support.

@ -21,7 +21,7 @@
Faceting is the arrangement of search results into categories based on indexed terms.
Searchers are presented with the indexed terms, along with numerical counts of how many matching documents were found were each term. Faceting makes it easy for users to explore search results, narrowing in on exactly the results they are looking for.
Searchers are presented with the indexed terms, along with numerical counts of how many matching documents were found for each term. Faceting makes it easy for users to explore search results, narrowing in on exactly the results they are looking for.
[[Faceting-GeneralParameters]]
== General Parameters
@ -351,7 +351,7 @@ The `facet.mincount` parameter, the same one as used in field faceting is also a
[NOTE]
====
Range faceting on date fields is a common situation where the <<working-with-dates.adoc#WorkingwithDates-TZ,`TZ`>> parameter can be useful to ensure that the "facet counts per day" or "facet counts per month" are based on a meaningful definition of when a given day/month "starts" relative to a particular TimeZone.
Range faceting on date fields is a common situation where the <<working-with-dates.adoc#tz,`TZ`>> parameter can be useful to ensure that the "facet counts per day" or "facet counts per month" are based on a meaningful definition of when a given day/month "starts" relative to a particular TimeZone.
For more information, see the examples in the <<working-with-dates.adoc#working-with-dates,Working with Dates>> section.

@ -27,7 +27,6 @@ A field type definition can include four types of information:
* If the field type is `TextField`, a description of the field analysis for the field type.
* Field type properties - depending on the implementation class, some properties may be mandatory.
[[FieldTypeDefinitionsandProperties-FieldTypeDefinitionsinschema.xml]]
== Field Type Definitions in schema.xml
Field types are defined in `schema.xml`. Each field type is defined between `fieldType` elements. They can optionally be grouped within a `types` element. Here is an example of a field type definition for a type called `text_general`:
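Its general shape is sketched below; the exact analysis chain varies between configsets, so treat the filters as illustrative:

[source,xml]
----
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
----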
@ -91,9 +90,9 @@ For multivalued fields, specifies a distance between multiple values, which prev
`autoGeneratePhraseQueries`:: For text fields. If `true`, Solr automatically generates phrase queries for adjacent terms. If `false`, terms must be enclosed in double-quotes to be treated as phrases.
`enableGraphQueries`::
For text fields, applicable when querying with <<the-standard-query-parser.adoc#TheStandardQueryParser-StandardQueryParserParameters,`sow=false`>>. Use `true` (the default) for field types with query analyzers including graph-aware filters, e.g., <<filter-descriptions.adoc#FilterDescriptions-SynonymGraphFilter,Synonym Graph Filter>> and <<filter-descriptions.adoc#FilterDescriptions-WordDelimiterGraphFilter,Word Delimiter Graph Filter>>.
For text fields, applicable when querying with <<the-standard-query-parser.adoc#TheStandardQueryParser-StandardQueryParserParameters,`sow=false`>>. Use `true` (the default) for field types with query analyzers including graph-aware filters, e.g., <<filter-descriptions.adoc#synonym-graph-filter,Synonym Graph Filter>> and <<filter-descriptions.adoc#word-delimiter-graph-filter,Word Delimiter Graph Filter>>.
+
Use `false` for field types with query analyzers including filters that can match docs when some tokens are missing, e.g., <<filter-descriptions.adoc#FilterDescriptions-ShingleFilter,Shingle Filter>>.
Use `false` for field types with query analyzers including filters that can match docs when some tokens are missing, e.g., <<filter-descriptions.adoc#shingle-filter,Shingle Filter>>.
[[FieldTypeDefinitionsandProperties-docValuesFormat]]
`docValuesFormat`::
@ -137,9 +136,8 @@ The default values for each property depend on the underlying `FieldType` class,
// TODO: SOLR-10655 END
[[FieldTypeDefinitionsandProperties-FieldTypeSimilarity]]
== Field Type Similarity
A field type may optionally specify a `<similarity/>` that will be used when scoring documents that refer to fields with this type, as long as the "global" similarity for the collection allows it.
By default, any field type which does not define a similarity, uses `BM25Similarity`. For more details, and examples of configuring both global & per-type Similarities, please see <<other-schema-elements.adoc#OtherSchemaElements-Similarity,Other Schema Elements>>.
By default, any field type which does not define a similarity uses `BM25Similarity`. For more details, and examples of configuring both global and per-type Similarities, please see <<other-schema-elements.adoc#similarity,Other Schema Elements>>.
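For instance, a per-type similarity can be declared alongside the analyzer (a sketch; the field type name is illustrative):

[source,xml]
----
<fieldType name="text_classic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- overrides the default BM25Similarity for this type only -->
  <similarity class="solr.ClassicSimilarityFactory"/>
</fieldType>
----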

@ -27,17 +27,17 @@ The following table lists the field types that are available in Solr. The `org.a
|Class |Description
|BinaryField |Binary data.
|BoolField |Contains either true or false. Values of "1", "t", or "T" in the first character are interpreted as true. Any other values in the first character are interpreted as false.
|CollationField |Supports Unicode collation for sorting and range queries. ICUCollationField is a better choice if you can use ICU4J. See the section <<language-analysis.adoc#LanguageAnalysis-UnicodeCollation,Unicode Collation>>.
|CollationField |Supports Unicode collation for sorting and range queries. ICUCollationField is a better choice if you can use ICU4J. See the section <<language-analysis.adoc#unicode-collation,Unicode Collation>>.
|CurrencyField |Deprecated in favor of CurrencyFieldType.
|CurrencyFieldType |Supports currencies and exchange rates. See the section <<working-with-currencies-and-exchange-rates.adoc#working-with-currencies-and-exchange-rates,Working with Currencies and Exchange Rates>>.
|DateRangeField |Supports indexing date ranges, to include point in time date instances as well (single-millisecond durations). See the section <<working-with-dates.adoc#working-with-dates,Working with Dates>> for more detail on using this field type. Consider using this field type even if it's just for date instances, particularly when the queries typically fall on UTC year/month/day/hour, etc., boundaries.
|ExternalFileField |Pulls values from a file on disk. See the section <<working-with-external-files-and-processes.adoc#working-with-external-files-and-processes,Working with External Files and Processes>>.
|EnumField |Allows defining an enumerated set of values which may not be easily sorted by either alphabetic or numeric order (such as a list of severities, for example). This field type takes a configuration file, which lists the proper order of the field values. See the section <<working-with-enum-fields.adoc#working-with-enum-fields,Working with Enum Fields>> for more information.
|ICUCollationField |Supports Unicode collation for sorting and range queries. See the section <<language-analysis.adoc#LanguageAnalysis-UnicodeCollation,Unicode Collation>>.
|ICUCollationField |Supports Unicode collation for sorting and range queries. See the section <<language-analysis.adoc#unicode-collation,Unicode Collation>>.
|LatLonPointSpatialField |<<spatial-search.adoc#spatial-search,Spatial Search>>: a latitude/longitude coordinate pair; possibly multi-valued for multiple points. Usually it's specified as "lat,lon" order with a comma.
|LatLonType |(deprecated) <<spatial-search.adoc#spatial-search,Spatial Search>>: a single-valued latitude/longitude coordinate pair. Usually it's specified as "lat,lon" order with a comma.
|PointType |<<spatial-search.adoc#spatial-search,Spatial Search>>: A single-valued n-dimensional point. It's both for sorting spatial data that is _not_ lat-lon, and for some more rare use-cases. (NOTE: this is _not_ related to the "Point" based numeric fields)
|PreAnalyzedField |Provides a way to send to Solr serialized token streams, optionally with independent stored values of a field, and have this information stored and indexed without any additional text processing. Configuration and usage of PreAnalyzedField is documented on the <<working-with-external-files-and-processes.adoc#WorkingwithExternalFilesandProcesses-ThePreAnalyzedFieldType,Working with External Files and Processes>> page.
|PreAnalyzedField |Provides a way to send to Solr serialized token streams, optionally with independent stored values of a field, and have this information stored and indexed without any additional text processing. Configuration and usage of PreAnalyzedField is documented on the <<working-with-external-files-and-processes.adoc#the-preanalyzedfield-type,Working with External Files and Processes>> page.
|RandomSortField |Does not contain a value. Queries that sort on this field type will return results in random order. Use a dynamic field to use this feature.
|SpatialRecursivePrefixTreeFieldType |(RPT for short) <<spatial-search.adoc#spatial-search,Spatial Search>>: Accepts latitude comma longitude strings or other shapes in WKT format.
|StrField |String (UTF-8 encoded string or Unicode). Strings are intended for small fields and are _not_ tokenized or analyzed in any way. They have a hard limit of slightly less than 32K.

@ -50,7 +50,6 @@ The following sections describe the filter factories that are included in this r
For user tips about Solr's filters, see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.
[[FilterDescriptions-ASCIIFoldingFilter]]
== ASCII Folding Filter
This filter converts alphabetic, numeric, and symbolic Unicode characters which are not in the Basic Latin Unicode block (the first 127 ASCII characters) to their ASCII equivalents, if one exists. This filter converts characters from the following Unicode blocks:
@ -92,10 +91,9 @@ This filter converts alphabetic, numeric, and symbolic Unicode characters which
*Out:* "a" (ASCII character 97)
[[FilterDescriptions-Beider-MorseFilter]]
== Beider-Morse Filter
Implements the Beider-Morse Phonetic Matching (BMPM) algorithm, which allows identification of similar names, even if they are spelled differently or in different languages. More information about how this works is available in the section on <<phonetic-matching.adoc#PhoneticMatching-Beider-MorsePhoneticMatching_BMPM_,Phonetic Matching>>.
Implements the Beider-Morse Phonetic Matching (BMPM) algorithm, which allows identification of similar names, even if they are spelled differently or in different languages. More information about how this works is available in the section on <<phonetic-matching.adoc#beider-morse-phonetic-matching-bmpm,Phonetic Matching>>.
[IMPORTANT]
====
@ -125,10 +123,9 @@ BeiderMorseFilter changed its behavior in Solr 5.0 due to an update to version 3
</analyzer>
----
[[FilterDescriptions-ClassicFilter]]
== Classic Filter
This filter takes the output of the <<tokenizers.adoc#Tokenizers-ClassicTokenizer,Classic Tokenizer>> and strips periods from acronyms and "'s" from possessives.
This filter takes the output of the <<tokenizers.adoc#classic-tokenizer,Classic Tokenizer>> and strips periods from acronyms and "'s" from possessives.
*Factory class:* `solr.ClassicFilterFactory`
@ -150,7 +147,6 @@ This filter takes the output of the <<tokenizers.adoc#Tokenizers-ClassicTokenize
*Out:* "IBM", "cat", "can't"
[[FilterDescriptions-CommonGramsFilter]]
== Common Grams Filter
This filter creates word shingles by combining common tokens such as stop words with regular tokens. This is useful for creating phrase queries containing common words, such as "the cat." Solr normally ignores stop words in queried phrases, so searching for "the cat" would return all matches for the word "cat."
@ -181,12 +177,10 @@ This filter creates word shingles by combining common tokens such as stop words
*Out:* "the_cat"
[[FilterDescriptions-CollationKeyFilter]]
== Collation Key Filter
Collation allows sorting of text in a language-sensitive way. It is usually used for sorting, but can also be used with advanced searches. We've covered this in much more detail in the section on <<language-analysis.adoc#LanguageAnalysis-UnicodeCollation,Unicode Collation>>.
Collation allows sorting of text in a language-sensitive way. It is usually used for sorting, but can also be used with advanced searches. We've covered this in much more detail in the section on <<language-analysis.adoc#unicode-collation,Unicode Collation>>.
[[FilterDescriptions-Daitch-MokotoffSoundexFilter]]
== Daitch-Mokotoff Soundex Filter
Implements the Daitch-Mokotoff Soundex algorithm, which allows identification of similar names, even if they are spelled differently. More information about how this works is available in the section on <<phonetic-matching.adoc#phonetic-matching,Phonetic Matching>>.
@ -207,7 +201,6 @@ Implements the Daitch-Mokotoff Soundex algorithm, which allows identification of
</analyzer>
----
[[FilterDescriptions-DoubleMetaphoneFilter]]
== Double Metaphone Filter
This filter creates tokens using the http://commons.apache.org/codec/apidocs/org/apache/commons/codec/language/DoubleMetaphone.html[`DoubleMetaphone`] encoding algorithm from commons-codec. For more information, see the <<phonetic-matching.adoc#phonetic-matching,Phonetic Matching>> section.
@ -260,7 +253,6 @@ Discard original token (`inject="false"`).
Note that "Kuczewski" has two encodings, which are added at the same position.
[[FilterDescriptions-EdgeN-GramFilter]]
== Edge N-Gram Filter
This filter generates edge n-gram tokens of sizes within the given range.
@ -327,7 +319,6 @@ A range of 4 to 6.
*Out:* "four", "scor", "score", "twen", "twent", "twenty"
[[FilterDescriptions-EnglishMinimalStemFilter]]
== English Minimal Stem Filter
This filter stems plural English words to their singular form.
@ -352,7 +343,6 @@ This filter stems plural English words to their singular form.
*Out:* "dog", "cat"
[[FilterDescriptions-EnglishPossessiveFilter]]
== English Possessive Filter
This filter removes singular possessives (trailing *'s*) from words. Note that plural possessives, e.g. the *s'* in "divers' snorkels", are not removed by this filter.
@ -377,7 +367,6 @@ This filter removes singular possessives (trailing *'s*) from words. Note that p
*Out:* "Man", "dog", "bites", "dogs'", "man"
[[FilterDescriptions-FingerprintFilter]]
== Fingerprint Filter
This filter outputs a single token which is a concatenation of the sorted and de-duplicated set of input tokens. This can be useful for clustering/linking use cases.
@ -406,7 +395,6 @@ This filter outputs a single token which is a concatenation of the sorted and de
*Out:* "brown_dog_fox_jumped_lazy_over_quick_the"
[[FilterDescriptions-FlattenGraphFilter]]
== Flatten Graph Filter
This filter must be included in index-time analyzer specifications that include at least one graph-aware filter, such as the Synonym Graph Filter and the Word Delimiter Graph Filter.
@ -417,7 +405,6 @@ This filter must be included on index-time analyzer specifications that include
See the examples below for <<Synonym Graph Filter>> and <<Word Delimiter Graph Filter>>.
[[FilterDescriptions-HunspellStemFilter]]
== Hunspell Stem Filter
The `Hunspell Stem Filter` provides support for several languages. You must provide the dictionary (`.dic`) and rules (`.aff`) files for each language you wish to use with the Hunspell Stem Filter. You can download those language files http://wiki.services.openoffice.org/wiki/Dictionaries[here].
@ -456,7 +443,6 @@ Be aware that your results will vary widely based on the quality of the provided
*Out:* "jump", "jump", "jump"
[[FilterDescriptions-HyphenatedWordsFilter]]
== Hyphenated Words Filter
This filter reconstructs hyphenated words that have been tokenized as two tokens because of a line break or other intervening whitespace in the field text. If a token ends with a hyphen, it is joined with the following token and the hyphen is discarded.
@ -483,10 +469,9 @@ Note that for this filter to work properly, the upstream tokenizer must not remo
*Out:* "A", "hyphenated", "word"
[[FilterDescriptions-ICUFoldingFilter]]
== ICU Folding Filter
This filter is a custom Unicode normalization form that applies the foldings specified in http://www.unicode.org/reports/tr30/tr30-4.html[Unicode Technical Report 30] in addition to the `NFKC_Casefold` normalization form as described in <<FilterDescriptions-ICUNormalizer2Filter,ICU Normalizer 2 Filter>>. This filter is a better substitute for the combined behavior of the <<FilterDescriptions-ASCIIFoldingFilter,ASCII Folding Filter>>, <<FilterDescriptions-LowerCaseFilter,Lower Case Filter>>, and <<FilterDescriptions-ICUNormalizer2Filter,ICU Normalizer 2 Filter>>.
This filter is a custom Unicode normalization form that applies the foldings specified in http://www.unicode.org/reports/tr30/tr30-4.html[Unicode Technical Report 30] in addition to the `NFKC_Casefold` normalization form as described in <<ICU Normalizer 2 Filter>>. This filter is a better substitute for the combined behavior of the <<ASCII Folding Filter>>, <<Lower Case Filter>>, and <<ICU Normalizer 2 Filter>>.
To use this filter, see `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add to your `solr_home/lib`. For more information about adding jars, see the section <<lib-directives-in-solrconfig.adoc#lib-directives-in-solrconfig,Lib Directives in Solrconfig>>.
@ -506,7 +491,6 @@ To use this filter, see `solr/contrib/analysis-extras/README.txt` for instructio
For detailed information on this normalization form, see http://www.unicode.org/reports/tr30/tr30-4.html.
[[FilterDescriptions-ICUNormalizer2Filter]]
== ICU Normalizer 2 Filter
This filter factory normalizes text according to one of five Unicode Normalization Forms as described in http://unicode.org/reports/tr15/[Unicode Standard Annex #15]:
@ -539,7 +523,6 @@ For detailed information about these Unicode Normalization Forms, see http://uni
To use this filter, see `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add to your `solr_home/lib`.
[[FilterDescriptions-ICUTransformFilter]]
== ICU Transform Filter
This filter applies http://userguide.icu-project.org/transforms/general[ICU Transforms] to text. This filter supports only ICU System Transforms. Custom rule sets are not supported.
@ -564,7 +547,6 @@ For detailed information about ICU Transforms, see http://userguide.icu-project.
To use this filter, see `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add to your `solr_home/lib`.
[[FilterDescriptions-KeepWordFilter]]
== Keep Word Filter
This filter discards all tokens except those that are listed in the given word list. This is the inverse of the Stop Words Filter. This filter can be useful for building specialized indices for a constrained set of terms.
@ -638,7 +620,6 @@ Using LowerCaseFilterFactory before filtering for keep words, no `ignoreCase` fl
*Out:* "happy", "funny"
[[FilterDescriptions-KStemFilter]]
== KStem Filter
KStem is an alternative to the Porter Stem Filter for developers looking for a less aggressive stemmer. KStem was written by Bob Krovetz, ported to Lucene by Sergio Guzman-Lara (UMASS Amherst). This stemmer is only appropriate for English language text.
@ -663,7 +644,6 @@ KStem is an alternative to the Porter Stem Filter for developers looking for a l
*Out:* "jump", "jump", "jump"
[[FilterDescriptions-LengthFilter]]
== Length Filter
This filter passes tokens whose length falls within the min/max limit specified. All other tokens are discarded.
@ -694,7 +674,6 @@ This filter passes tokens whose length falls within the min/max limit specified.
*Out:* "turn", "right"
[[FilterDescriptions-LimitTokenCountFilter]]
== Limit Token Count Filter
This filter limits the number of accepted tokens, typically useful for index analysis.
@ -726,7 +705,6 @@ By default, this filter ignores any tokens in the wrapped `TokenStream` once the
*Out:* "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"
[[FilterDescriptions-LimitTokenOffsetFilter]]
== Limit Token Offset Filter
This filter limits tokens to those before a configured maximum start character offset. This can be useful to limit highlighting, for example.
@ -758,7 +736,6 @@ By default, this filter ignores any tokens in the wrapped `TokenStream` once the
*Out:* "0", "2", "4", "6", "8", "A"
[[FilterDescriptions-LimitTokenPositionFilter]]
== Limit Token Position Filter
This filter limits tokens to those before a configured maximum token position.
@ -790,7 +767,6 @@ By default, this filter ignores any tokens in the wrapped `TokenStream` once the
*Out:* "1", "2", "3"
[[FilterDescriptions-LowerCaseFilter]]
== Lower Case Filter
Converts any uppercase letters in a token to the equivalent lowercase token. All other characters are left unchanged.
@ -815,10 +791,9 @@ Converts any uppercase letters in a token to the equivalent lowercase token. All
*Out:* "down", "with", "camelcase"
[[FilterDescriptions-ManagedStopFilter]]
== Managed Stop Filter
This is specialized version of the <<FilterDescriptions-StopFilter,Stop Words Filter Factory>> that uses a set of stop words that are <<managed-resources.adoc#managed-resources,managed from a REST API.>>
This is a specialized version of the <<Stop Filter,Stop Words Filter Factory>> that uses a set of stop words that are <<managed-resources.adoc#managed-resources,managed from a REST API>>.
*Arguments:*
@ -836,12 +811,11 @@ With this configuration the set of words is named "english" and can be managed v
</analyzer>
----
See <<FilterDescriptions-StopFilter,Stop Filter>> for example input/output.
See <<Stop Filter>> for example input/output.
[[FilterDescriptions-ManagedSynonymFilter]]
== Managed Synonym Filter
This is specialized version of the <<FilterDescriptions-SynonymFilter,Synonym Filter Factory>> that uses a mapping on synonyms that is <<managed-resources.adoc#managed-resources,managed from a REST API.>>
This is a specialized version of the <<Synonym Filter>> that uses a mapping on synonyms that is <<managed-resources.adoc#managed-resources,managed from a REST API>>.
.Managed Synonym Filter has been Deprecated
[WARNING]
@ -851,12 +825,11 @@ Managed Synonym Filter has been deprecated in favor of Managed Synonym Graph Fil
*Factory class:* `solr.ManagedSynonymFilterFactory`
For arguments and examples, see the Managed Synonym Graph Filter below.
For arguments and examples, see the <<Managed Synonym Graph Filter>> below.
[[FilterDescriptions-ManagedSynonymGraphFilter]]
== Managed Synonym Graph Filter
This is specialized version of the <<FilterDescriptions-SynonymGraphFilter,Synonym Graph Filter Factory>> that uses a mapping on synonyms that is <<managed-resources.adoc#managed-resources,managed from a REST API.>>
This is a specialized version of the <<Synonym Graph Filter>> that uses a mapping on synonyms that is <<managed-resources.adoc#managed-resources,managed from a REST API>>.
This filter maps single- or multi-token synonyms, producing a fully correct graph output. This filter is a replacement for the Managed Synonym Filter, which produces incorrect graphs for multi-token synonyms.
@ -881,9 +854,8 @@ With this configuration the set of mappings is named "english" and can be manage
</analyzer>
----
See <<FilterDescriptions-ManagedSynonymFilter,Managed Synonym Filter>> for example input/output.
See <<Managed Synonym Filter>> for example input/output.
[[FilterDescriptions-N-GramFilter]]
== N-Gram Filter
Generates n-gram tokens of sizes in the given range. Note that tokens are ordered by position and then by gram size.
@ -950,7 +922,6 @@ A range of 3 to 5.
*Out:* "fou", "four", "our", "sco", "scor", "score", "cor", "core", "ore"
[[FilterDescriptions-NumericPayloadTokenFilter]]
== Numeric Payload Token Filter
This filter adds a numeric floating point payload value to tokens that match a given type. Refer to the Javadoc for the `org.apache.lucene.analysis.Token` class for more information about token types and payloads.
@ -979,7 +950,6 @@ This filter adds a numeric floating point payload value to tokens that match a g
*Out:* "bing"[0.75], "bang"[0.75], "boom"[0.75]
[[FilterDescriptions-PatternReplaceFilter]]
== Pattern Replace Filter
This filter applies a regular expression to each token and, for those that match, substitutes the given replacement string in place of the matched pattern. Tokens which do not match are passed through unchanged.
@ -1048,7 +1018,6 @@ More complex pattern with capture group reference in the replacement. Tokens tha
*Out:* "cat", "foo_1234", "9987", "blah1234foo"
[[FilterDescriptions-PhoneticFilter]]
== Phonetic Filter
This filter creates tokens using one of the phonetic encoding algorithms in the `org.apache.commons.codec.language` package. For more information, see the section on <<phonetic-matching.adoc#phonetic-matching,Phonetic Matching>>.
@ -1119,7 +1088,6 @@ Default Soundex encoder.
*Out:* "four"(1), "F600"(1), "score"(2), "S600"(2), "and"(3), "A530"(3), "twenty"(4), "T530"(4)
[[FilterDescriptions-PorterStemFilter]]
== Porter Stem Filter
This filter applies the Porter Stemming Algorithm for English. The results are similar to using the Snowball Porter Stemmer with the `language="English"` argument. But this stemmer is coded directly in Java and is not based on Snowball. It does not accept a list of protected words and is only appropriate for English language text. However, it has been benchmarked as http://markmail.org/thread/d2c443z63z37rwf6[four times faster] than the English Snowball stemmer, so can provide a performance enhancement.
@ -1144,7 +1112,6 @@ This filter applies the Porter Stemming Algorithm for English. The results are s
*Out:* "jump", "jump", "jump"
[[FilterDescriptions-RemoveDuplicatesTokenFilter]]
== Remove Duplicates Token Filter
The filter removes duplicate tokens in the stream. Tokens are considered to be duplicates ONLY if they have the same text and position values.
@ -1223,7 +1190,6 @@ This filter reverses tokens to provide faster leading wildcard and prefix querie
*Out:* "oof*", "rab*"
[[FilterDescriptions-ShingleFilter]]
== Shingle Filter
This filter constructs shingles, which are token n-grams, from the token stream. It combines runs of tokens into a single token.
@ -1278,7 +1244,6 @@ A shingle size of four, do not include original token.
*Out:* "To be"(1), "To be or"(1), "To be or not"(1), "be or"(2), "be or not"(2), "be or not to"(2), "or not"(3), "or not to"(3), "or not to be"(3), "not to"(4), "not to be"(4), "to be"(5)
[[FilterDescriptions-SnowballPorterStemmerFilter]]
== Snowball Porter Stemmer Filter
This filter factory instantiates a language-specific stemmer generated by Snowball. Snowball is a software package that generates pattern-based word stemmers. This type of stemmer is not as accurate as a table-based stemmer, but is faster and less complex. Table-driven stemmers are labor intensive to create and maintain and so are typically commercial products.
@ -1349,7 +1314,6 @@ Spanish stemmer, Spanish words:
*Out:* "cant", "cant"
[[FilterDescriptions-StandardFilter]]
== Standard Filter
This filter removes dots from acronyms and the substring "'s" from the end of tokens. This filter depends on the tokens being tagged with the appropriate term-type to recognize acronyms and words with apostrophes.
@ -1363,7 +1327,6 @@ This filter removes dots from acronyms and the substring "'s" from the end of to
This filter is no longer operational in Solr when the `luceneMatchVersion` (in `solrconfig.xml`) is higher than "3.1".
====
[[FilterDescriptions-StopFilter]]
== Stop Filter
This filter discards, or _stops_ analysis of, tokens that are on the given stop words list. A standard stop words list is included in the Solr `conf` directory, named `stopwords.txt`, which is appropriate for typical English language text.
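A typical configuration is the short analyzer sketch below:

[source,xml]
----
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
</analyzer>
----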
@ -1414,10 +1377,9 @@ Case-sensitive matching, capitalized words not stopped. Token positions skip sto
*Out:* "what"(4)
[[FilterDescriptions-SuggestStopFilter]]
== Suggest Stop Filter
Like <<FilterDescriptions-StopFilter,Stop Filter>>, this filter discards, or _stops_ analysis of, tokens that are on the given stop words list.
Like <<Stop Filter>>, this filter discards, or _stops_ analysis of, tokens that are on the given stop words list.
Suggest Stop Filter differs from Stop Filter in that it will not remove the last token unless it is followed by a token separator. For example, a query `"find the"` would preserve the `'the'` since it was not followed by a space, punctuation etc., and mark it as a `KEYWORD` so that following filters will not change or remove it.
@ -1455,7 +1417,6 @@ By contrast, a query like "`find the popsicle`" would remove '`the`' as a stopwo
*Out:* "the"(2)
[[FilterDescriptions-SynonymFilter]]
== Synonym Filter
This filter does synonym mapping. Each token is looked up in the list of synonyms and if a match is found, then the synonym is emitted in place of the token. The position value of the new tokens is set such that they all occur at the same position as the original token.
@ -1470,7 +1431,6 @@ Synonym Filter has been deprecated in favor of Synonym Graph Filter, which is re
For arguments and examples, see the Synonym Graph Filter below.
[[FilterDescriptions-SynonymGraphFilter]]
== Synonym Graph Filter
This filter maps single- or multi-token synonyms, producing a fully correct graph output. This filter is a replacement for the Synonym Filter, which produces incorrect graphs for multi-token synonyms.
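A sketch of the usual setup pairs the graph filter with the Flatten Graph Filter at index time only:

[source,xml]
----
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"/>
  <!-- required after graph-producing filters at index time -->
  <filter class="solr.FlattenGraphFilterFactory"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"/>
</analyzer>
----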
@ -1542,7 +1502,6 @@ small => tiny,teeny,weeny
*Out:* "the"(1), "large"(2), "large"(3), "couch"(4), "sofa"(4), "divan"(4)
[[FilterDescriptions-TokenOffsetPayloadFilter]]
== Token Offset Payload Filter
This filter adds the numeric character offsets of the token as a payload value for that token.
@ -1567,7 +1526,6 @@ This filter adds the numeric character offsets of the token as a payload value f
*Out:* "bing"[0,4], "bang"[5,9], "boom"[10,14]
[[FilterDescriptions-TrimFilter]]
== Trim Filter
This filter trims leading and/or trailing whitespace from tokens. Most tokenizers break tokens at whitespace, so this filter is most often used for special situations.
@ -1596,7 +1554,6 @@ The PatternTokenizerFactory configuration used here splits the input on simple c
*Out:* "one", "two", "three", "four"
[[FilterDescriptions-TypeAsPayloadFilter]]
== Type As Payload Filter
This filter adds the token's type, as an encoded byte sequence, as its payload.
@ -1621,10 +1578,9 @@ This filter adds the token's type, as an encoded byte sequence, as its payload.
*Out:* "Pay"[<ALPHANUM>], "Bob's"[<APOSTROPHE>], "I.O.U."[<ACRONYM>]
[[FilterDescriptions-TypeTokenFilter]]
== Type Token Filter
This filter blacklists or whitelists a specified list of token types, assuming the tokens have type metadata associated with them. For example, the <<tokenizers.adoc#Tokenizers-UAX29URLEmailTokenizer,UAX29 URL Email Tokenizer>> emits "<URL>" and "<EMAIL>" typed tokens, as well as other types. This filter would allow you to pull out only e-mail addresses from text as tokens, if you wish.
This filter blacklists or whitelists a specified list of token types, assuming the tokens have type metadata associated with them. For example, the <<tokenizers.adoc#uax29-url-email-tokenizer,UAX29 URL Email Tokenizer>> emits "<URL>" and "<EMAIL>" typed tokens, as well as other types. This filter would allow you to pull out only e-mail addresses from text as tokens, if you wish.
*Factory class:* `solr.TypeTokenFilterFactory`
@ -1645,7 +1601,6 @@ This filter blacklists or whitelists a specified list of token types, assuming t
</analyzer>
----
[[FilterDescriptions-WordDelimiterFilter]]
== Word Delimiter Filter
This filter splits tokens at word delimiters.
@ -1660,7 +1615,6 @@ Word Delimiter Filter has been deprecated in favor of Word Delimiter Graph Filte
For a full description, including arguments and examples, see the Word Delimiter Graph Filter below.
[[FilterDescriptions-WordDelimiterGraphFilter]]
== Word Delimiter Graph Filter
This filter splits tokens at word delimiters.

@ -25,14 +25,13 @@ Function queries are supported by the <<the-dismax-query-parser.adoc#the-dismax-
Function queries use _functions_. The functions can be a constant (numeric or string literal), a field, another function or a parameter substitution argument. You can use these functions to modify the ranking of results for users. These could be used to change the ranking of results based on a user's location, or some other calculation.
[[FunctionQueries-UsingFunctionQuery]]
== Using Function Query
Functions must be expressed as function calls (for example, `sum(a,b)` instead of simply `a+b`).
There are several ways of using function queries in a Solr query:
* Via an explicit QParser that expects function arguments, such <<other-parsers.adoc#OtherParsers-FunctionQueryParser,`func`>> or <<other-parsers.adoc#OtherParsers-FunctionRangeQueryParser,`frange`>> . For example:
* Via an explicit QParser that expects function arguments, such as <<other-parsers.adoc#function-query-parser,`func`>> or <<other-parsers.adoc#function-range-query-parser,`frange`>>. For example:
+
[source,text]
----
@ -76,7 +75,6 @@ q=_val_:mynumericfield _val_:"recip(rord(myfield),1,2,3)"
Only functions with fast random access are recommended.
[[FunctionQueries-AvailableFunctions]]
== Available Functions
The table below summarizes the functions available for function queries.
@ -89,7 +87,7 @@ Returns the absolute value of the specified value or function.
* `abs(x)` `abs(-5)`
=== childfield(field) Function
Returns the value of the given field for one of the matched child docs when searching by <<other-parsers.adoc#OtherParsers-BlockJoinParentQueryParser,{!parent}>>. It can be used only in `sort` parameter.
Returns the value of the given field for one of the matched child docs when searching by <<other-parsers.adoc#block-join-parent-query-parser,{!parent}>>. It can only be used in the `sort` parameter.
*Syntax Examples*
@ -149,7 +147,6 @@ You can quote the term if it's more complex, or do parameter substitution for th
* `docfreq(text,'solr')`
* `...&defType=func` `&q=docfreq(text,$myterm)&myterm=solr`
[[FunctionQueries-field]]
=== field Function
Returns the numeric docValues or indexed value of the field with the specified name. In its simplest (single argument) form, this function can only be used on single-valued fields, and can be called using the name of the field as a string, or, for most conventional field names, simply by using the field name by itself without the `field(...)` syntax.
@ -232,7 +229,7 @@ If the value of `x` does not fall between `min` and `max`, then either the value
=== max Function
Returns the maximum numeric value of multiple nested functions or constants, which are specified as arguments: `max(x,y,...)`. The `max` function can also be useful for "bottoming out" another function or field at some specified constant.
Use the `field(myfield,max)` syntax for <<FunctionQueries-field,selecting the maximum value of a single multivalued field>>.
Use the `field(myfield,max)` syntax for <<field Function,selecting the maximum value of a single multivalued field>>.
*Syntax Example*
@ -248,7 +245,7 @@ Returns the number of documents in the index, including those that are marked as
=== min Function
Returns the minimum numeric value of multiple nested functions or constants, which are specified as arguments: `min(x,y,...)`. The `min` function can also be useful for providing an "upper bound" on a function using a constant.
Use the `field(myfield,min)` <<FunctionQueries-field,syntax for selecting the minimum value of a single multivalued field>>.
Use the `field(myfield,min)` <<field Function,syntax for selecting the minimum value of a single multivalued field>>.
*Syntax Example*
@ -502,8 +499,6 @@ Returns `true` if any member of the field exists.
*Syntax Example*
* `if(lt(ms(mydatefield),315569259747),0.8,1)` translates to this pseudocode: `if mydatefield < 315569259747 then 0.8 else 1`
[[FunctionQueries-ExampleFunctionQueries]]
== Example Function Queries
To give you a better understanding of how function queries can be used in Solr, suppose an index stores the dimensions in meters x,y,z of some hypothetical boxes, with arbitrary names stored in the field `boxname`. Suppose we want to search for a box matching the name `findbox`, but with results ranked according to the volumes of the boxes. The query parameters would be:
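As a sketch (using the `boxname`, `x`, `y`, and `z` fields above, with the score driven by the product of the dimensions):

[source,text]
----
q=boxname:findbox _val_:"product(x,y,z)"&fl=*,score
----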
@ -521,7 +516,6 @@ Suppose that you also have a field storing the weight of the box as `weight`. To
http://localhost:8983/solr/collection_name/select?q=boxname:findbox _val_:"div(weight,product(x,y,z))"&fl=boxname x y z weight score
----
[[FunctionQueries-SortByFunction]]
== Sort By Function
You can sort your query results by the output of a function. For example, to sort results by distance, you could enter:
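As a sketch, assuming two numeric fields `x` and `y` hold point coordinates, the `dist` function could drive the sort (here, distance from the origin):

[source,text]
----
q=*:*&sort=dist(2,x,y,0,0) asc
----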

View File

@ -33,10 +33,8 @@ In this section you will learn how to start a SolrCloud cluster using startup sc
This tutorial assumes that you're already familiar with the basics of using Solr. If you need a refresher, please see the <<getting-started.adoc#getting-started,Getting Started section>> to get a grounding in Solr concepts. If you load documents as part of that exercise, you should start over with a fresh Solr installation for these SolrCloud tutorials.
====
[[GettingStartedwithSolrCloud-SolrCloudExample]]
== SolrCloud Example
[[GettingStartedwithSolrCloud-InteractiveStartup]]
=== Interactive Startup
The `bin/solr` script makes it easy to get started with SolrCloud as it walks you through the process of launching Solr nodes in cloud mode and adding a collection. To get started, simply do:
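A minimal sketch, run from the Solr installation directory:

[source,bash]
----
# Launches the interactive SolrCloud example and walks through the prompts.
bin/solr -e cloud
----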
@ -120,7 +118,6 @@ To stop Solr in SolrCloud mode, you would use the `bin/solr` script and issue th
bin/solr stop -all
----
[[GettingStartedwithSolrCloud-Startingwith-noprompt]]
=== Starting with -noprompt
You can also get SolrCloud started with all the defaults instead of the interactive session using the following command:
@ -130,7 +127,6 @@ You can also get SolrCloud started with all the defaults instead of the interact
bin/solr -e cloud -noprompt
----
[[GettingStartedwithSolrCloud-RestartingNodes]]
=== Restarting Nodes
You can restart your SolrCloud nodes using the `bin/solr` script. For instance, to restart node1 running on port 8983 (with an embedded ZooKeeper server), you would do:
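A sketch of that command, assuming the default example directory layout (it mirrors the node2 command described below):

[source,bash]
----
# -c restarts the node in SolrCloud mode; node1 hosts the embedded ZooKeeper.
bin/solr restart -c -p 8983 -s example/cloud/node1/solr
----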
@ -149,7 +145,6 @@ bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr
Notice that you need to specify the ZooKeeper address (`-z localhost:9983`) when starting node2 so that it can join the cluster with node1.
[[GettingStartedwithSolrCloud-Addinganodetoacluster]]
=== Adding a node to a cluster
Adding a node to an existing cluster is a bit advanced and involves a little more understanding of Solr. Once you start up a SolrCloud cluster using the startup scripts, you can add a new node to it by:

View File

@ -31,7 +31,6 @@ The `nodes` function can be combined with the `scoreNodes` function to provide r
This document assumes a basic understanding of graph terminology and streaming expressions. You can begin exploring graph traversal concepts with this https://en.wikipedia.org/wiki/Graph_traversal[Wikipedia article]. More details about streaming expressions are available in this Guide, in the section <<streaming-expressions.adoc#streaming-expressions,Streaming Expressions>>.
====
[[GraphTraversal-BasicSyntax]]
== Basic Syntax
We'll start with the most basic syntax and slowly build up more complexity. The most basic syntax for `nodes` is:
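A minimal sketch, assuming the hypothetical `emails` collection used throughout this section, with `from` and `to` fields:

[source,plain]
----
nodes(emails,
      walk="johndoe@apache.org->from",
      gather="to")
----

This walks from the root node "johndoe@apache.org", matching it against the `from` field, and gathers the `to` field of the matching edges.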
@ -161,7 +160,6 @@ When scattering both branches and leaves the output would like this:
Now the level 0 root node is included in the output.
[[GraphTraversal-Aggregations]]
== Aggregations
`nodes` also supports aggregations. For example:
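A sketch along the same lines, counting how many times each gathered node is reached:

[source,plain]
----
nodes(emails,
      walk="johndoe@apache.org->from",
      gather="to",
      count(*))
----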
@ -182,8 +180,7 @@ Edges are uniqued as part of the traversal so the count will *not* reflect the n
The aggregation functions supported are `count(*)`, `sum(field)`, `min(field)`, `max(field)`, and `avg(field)`. The fields being aggregated should be present in the edges collected during the traversal. Later examples (below) will show how aggregations can be a powerful tool for providing recommendations and for limiting the scope of traversals.
[[GraphTraversal-Nestingnodesfunctions]]
== Nesting nodes functions
== Nesting nodes Functions
The `nodes` function can be nested to traverse deeper into the graph. For example:
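A sketch of a two-hop traversal, where the inner expression supplies the nodes for the outer one:

[source,plain]
----
nodes(emails,
      nodes(emails,
            walk="johndoe@apache.org->from",
            gather="to"),
      walk="node->from",
      gather="to")
----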
@ -207,14 +204,12 @@ Put more simply, the inner expression gathers all the people that "\johndoe@apac
This construct of nesting `nodes` functions is the basic technique for doing a controlled traversal through the graph.
[[GraphTraversal-CycleDetection]]
== Cycle Detection
The `nodes` function performs cycle detection across the entire traversal. This ensures that nodes that have already been visited are not traversed again. Cycle detection is important for both limiting the size of traversals and gathering accurate aggregations. Without cycle detection the size of the traversal could grow exponentially with each hop in the traversal. With cycle detection only new nodes encountered are traversed.
Cycle detection *does not* cross collection boundaries. This is because internally the collection name is part of the node ID. For example, the node ID "\johndoe@apache.org" is really `emails/johndoe@apache.org`. When traversing to another collection, "\johndoe@apache.org" will be traversed again.
[[GraphTraversal-FilteringtheTraversal]]
== Filtering the Traversal
Each level in the traversal can be filtered with a filter query. For example:
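A sketch, assuming the emails have a searchable `body` field:

[source,plain]
----
nodes(emails,
      walk="johndoe@apache.org->from",
      fq="body:(solr rocks)",
      gather="to")
----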
@ -229,7 +224,6 @@ nodes(emails,
In the example above only emails that match the filter query will be included in the traversal. Any Solr query can be included here. So you can do fun things like <<spatial-search.adoc#spatial-search,geospatial queries>>, apply any of the available <<query-syntax-and-parsing.adoc#query-syntax-and-parsing,query parsers>>, or even write custom query parsers to limit the traversal.
[[GraphTraversal-RootStreams]]
== Root Streams
Any streaming expression can be used to provide the root nodes for a traversal. For example:
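A sketch, where a `search` expression supplies the root nodes and its `to` field is mapped into the traversal:

[source,plain]
----
nodes(emails,
      search(emails, q="body:(solr rocks)", fl="to", sort="score desc", rows="20"),
      walk="to->from",
      gather="to")
----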
@ -246,7 +240,6 @@ The example above provides the root nodes through a search expression. You can a
Notice that the `walk` parameter maps a field from the tuples generated by the inner stream. In this case it maps the `to` field from the inner stream to the `from` field.
[[GraphTraversal-SkippingHighFrequencyNodes]]
== Skipping High Frequency Nodes
It's often desirable to skip traversing high frequency nodes in the graph. This is similar in nature to a search term stop list. The best way to describe this is through an example use case.
@ -277,7 +270,6 @@ The `nodes` function has the `maxDocFreq` param to allow for filtering out high
In the example above, the inner search expression searches the `logs` collection and returns all the articles viewed by "user1". The outer `nodes` expression takes all the articles emitted from the inner search expression and finds all the records in the logs collection for those articles. It then gathers and aggregates the users that have read the articles. The `maxDocFreq` parameter limits the articles returned to those that appear in no more than 10,000 log records (per shard). This guards against returning articles that have been viewed by millions of users.
[[GraphTraversal-TrackingtheTraversal]]
== Tracking the Traversal
By default the `nodes` function only tracks enough information to do cycle detection. This provides enough information to output the nodes and aggregations in the graph.
@ -298,7 +290,6 @@ nodes(emails,
gather="to")
----
[[GraphTraversal-Cross-CollectionTraversals]]
== Cross-Collection Traversals
Nested `nodes` functions can operate on different SolrCloud collections. This allows traversals to "walk" from one collection to another to gather nodes. Cycle detection does not cross collection boundaries, so nodes collected in one collection will be traversed in a different collection. This was done deliberately to support cross-collection traversals. Note that the output from a cross-collection traversal will likely contain duplicate nodes with different collection attributes.
@ -320,7 +311,6 @@ nodes(logs,
The example above finds all people who sent emails with a body that contains "solr rocks". It then finds all the people these people have emailed. Then it traverses to the logs collection and gathers all the content IDs that these people have edited.
[[GraphTraversal-CombiningnodesWithOtherStreamingExpressions]]
== Combining nodes With Other Streaming Expressions
The `nodes` function can act as both a stream source and a stream decorator. The connection with the wider stream expression library provides tremendous power and flexibility when performing graph traversals. Here is an example of using the streaming expression library to intersect two friend networks:
@ -348,10 +338,8 @@ The `nodes` function can act as both a stream source and a stream decorator. The
The example above gathers two separate friend networks, one rooted with "\johndoe@apache.org" and another rooted with "\janedoe@apache.org". The friend networks are then sorted by the `node` field, and intersected. The resulting node set will be the intersection of the two friend networks.
[[GraphTraversal-SampleUseCases]]
== Sample Use Cases
== Sample Use Cases for Graph Traversal
[[GraphTraversal-CalculateMarketBasketCo-occurrence]]
=== Calculate Market Basket Co-occurrence
It is often useful to know which products are most frequently purchased with a particular product. This example uses a simple market basket table (indexed in Solr) to store past shopping baskets. The schema for the table is very simple with each row containing a `basketID` and a `productID`. This can be seen as a graph with each row in the table representing an edge. And it can be traversed very quickly to calculate basket co-occurrence, even when the graph contains billions of edges.
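A sketch of such a traversal, assuming a `baskets` collection with `basketID` and `productID` fields and an illustrative product ID of `ABC`:

[source,plain]
----
top(n="5",
    sort="count(*) desc",
    nodes(baskets,
          random(baskets, q="productID:ABC", fl="basketID", rows="500"),
          walk="basketID->basketID",
          fq="-productID:ABC",
          gather="productID",
          count(*)))
----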
@ -378,15 +366,13 @@ Let's break down exactly what this traversal is doing.
In a nutshell this expression finds the products that most frequently co-occur with product "ABC" in past shopping baskets.
[[GraphTraversal-UsingthescoreNodesFunctiontoMakeaRecommendation]]
=== Using the scoreNodes Function to Make a Recommendation
This use case builds on the market basket example <<GraphTraversal-CalculateMarketBasketCo-occurrence,above>> that calculates which products co-occur most frequently with productID:ABC. The ranked co-occurrence counts provide candidates for a recommendation. The `scoreNodes` function can be used to score the candidates to find the best recommendation.
This use case builds on the market basket example <<Calculate Market Basket Co-occurrence,above>> that calculates which products co-occur most frequently with productID:ABC. The ranked co-occurrence counts provide candidates for a recommendation. The `scoreNodes` function can be used to score the candidates to find the best recommendation.
Before diving into the syntax of the `scoreNodes` function it's useful to understand why the raw co-occurrence counts may not produce the best recommendation. The reason is that raw co-occurrence counts favor items that occur frequently across all baskets. A better recommendation would find the product that has the most significant relationship with productID ABC. The `scoreNodes` function uses a term frequency-inverse document frequency (TF-IDF) algorithm to find the most significant relationship.
[[GraphTraversal-HowItWorks]]
==== *How It Works*
==== How scoreNodes Works
The `scoreNodes` function assigns a score to each node emitted by the nodes expression. By default the `scoreNodes` function uses the `count(*)` aggregation, which is the co-occurrence count, as the TF value. The IDF value for each node is fetched from the collection where the node was gathered. Each node is then scored using the TF*IDF formula, which provides a boost to nodes with a lower frequency across all market baskets.
@ -394,8 +380,7 @@ Combining the co-occurrence count with the IDF provides a score that shows how i
The `scoreNodes` function adds the score to each node in the `nodeScore` field.
[[GraphTraversal-ExampleSyntax]]
==== *Example Syntax*
==== Example scoreNodes Syntax
[source,plain]
----
@ -417,7 +402,6 @@ This example builds on the earlier example "Calculate market basket co-occurrenc
. The `scoreNodes` function then assigns a score to the candidates based on the TF*IDF of each node.
. The outer `top` expression selects the highest scoring node. This is the recommendation.
[[GraphTraversal-RecommendContentBasedonCollaborativeFilter]]
=== Recommend Content Based on Collaborative Filter
In this example we'll recommend content for a user based on a collaborative filter. This recommendation is made using log records that contain the `userID` and `articleID` and the action performed. In this scenario each log record can be viewed as an edge in a graph. The userID and articleID are the nodes and the action is an edge property used to filter the traversal.
@ -458,7 +442,6 @@ Note that it skips high frequency nodes using the `maxDocFreq` param to filter o
Any article selected in step 1 (user1's reading list) will not appear in this step due to cycle detection. So this step returns the articles read by the users with the most similar reading habits to "user1" that "user1" has not read yet. It also counts the number of times each article has been read across this user group.
. The outer `top` expression takes the top articles emitted from step 4. This is the recommendation.
[[GraphTraversal-ProteinPathwayTraversal]]
=== Protein Pathway Traversal
In recent years, scientists have become increasingly able to rationally design drugs that target the mutated proteins, called oncogenes, responsible for some cancers. Proteins typically act through long chains of chemical interactions between multiple proteins, called pathways, and, while the oncogene in the pathway may not have a corresponding drug, another protein in the pathway may. Graph traversal on a protein collection that records protein interactions and drugs may yield possible candidates. (Thanks to Lewis Geer of the NCBI, for providing this example).
@ -481,7 +464,6 @@ Let's break down exactly what this traversal is doing.
. The outer `nodes` expression also works with the `proteins` collection. It gathers all the drugs that correspond to proteins emitted from step 1.
. Using this stepwise approach you can gather the drugs along the pathway of interactions any number of steps away from the root protein.
[[GraphTraversal-ExportingGraphMLtoSupportGraphVisualization]]
== Exporting GraphML to Support Graph Visualization
In the examples above, the `nodes` expression was sent to Solr's `/stream` handler like any other streaming expression. This approach outputs the nodes in the same JSON tuple format as other streaming expressions so that it can be treated like any other streaming expression. You can use the `/stream` handler when you need to operate directly on the tuples, such as in the recommendation use cases above.
@ -496,8 +478,7 @@ There are a few things to keep mind when exporting a graph in GraphML:
. The `/graph` handler currently accepts an arbitrarily complex streaming expression which includes a `nodes` expression. If the streaming expression doesn't include a `nodes` expression, the `/graph` handler will not properly output GraphML.
. The `/graph` handler currently accepts a single arbitrarily complex, nested `nodes` expression per request. This means you cannot send in a streaming expression that joins or intersects the node sets from multiple `nodes` expressions. The `/graph` handler does support any level of nesting within a single `nodes` expression. The `/stream` handler does support joining and intersecting node sets, but the `/graph` handler currently does not.
[[GraphTraversal-SampleRequest]]
=== Sample Request
=== Sample GraphML Request
[source,bash]
----
@ -512,7 +493,6 @@ curl --data-urlencode 'expr=nodes(enron_emails,
gather="to")' http://localhost:8983/solr/enron_emails/graph
----
[[GraphTraversal-SampleGraphMLOutput]]
=== Sample GraphML Output
[source,xml]

View File

@ -30,7 +30,7 @@ For some of the authentication schemes (e.g., Kerberos), Solr provides a native
There are two plugin classes:
* `HadoopAuthPlugin`: This can be used with standalone Solr as well as Solrcloud with <<authentication-and-authorization-plugins.adoc#AuthenticationandAuthorizationPlugins-PKI,PKI authentication>> for internode communication.
* `HadoopAuthPlugin`: This can be used with standalone Solr as well as SolrCloud with <<authentication-and-authorization-plugins.adoc#securing-inter-node-requests,PKI authentication>> for internode communication.
* `ConfigurableInternodeAuthHadoopPlugin`: This is an extension of HadoopAuthPlugin that allows you to configure the authentication scheme for internode communication.
[TIP]
@ -38,7 +38,6 @@ There are two plugin classes:
For most SolrCloud or standalone Solr setups, the `HadoopAuthPlugin` should suffice.
====
[[HadoopAuthenticationPlugin-PluginConfiguration]]
== Plugin Configuration
`class`::
@ -70,11 +69,8 @@ Configures proxy users for the underlying Hadoop authentication mechanism. This
`clientBuilderFactory`:: No |
The `HttpClientBuilderFactory` implementation used for the Solr internal communication. Only applicable for `ConfigurableInternodeAuthHadoopPlugin`.
[[HadoopAuthenticationPlugin-ExampleConfigurations]]
== Example Configurations
[[HadoopAuthenticationPlugin-KerberosAuthenticationusingHadoopAuthenticationPlugin]]
=== Kerberos Authentication using Hadoop Authentication Plugin
This example lets you configure Solr to use Kerberos Authentication, similar to how you would use the <<kerberos-authentication-plugin.adoc#kerberos-authentication-plugin,Kerberos Authentication Plugin>>.
@ -105,7 +101,6 @@ To setup this plugin, use the following in your `security.json` file.
}
----
[[HadoopAuthenticationPlugin-SimpleAuthenticationwithDelegationTokens]]
=== Simple Authentication with Delegation Tokens
Similar to the previous example, this is an example of setting up a Solr cluster that uses delegation tokens. Refer to the parameters in the Hadoop authentication library's https://hadoop.apache.org/docs/stable/hadoop-auth/Configuration.html[documentation] or refer to the section <<kerberos-authentication-plugin.adoc#kerberos-authentication-plugin,Kerberos Authentication Plugin>> for further details. Please note that this example does not use Kerberos and the requests made to Solr must contain valid delegation tokens.

View File

@ -24,7 +24,6 @@ The fragments are included in a special section of the query response (the `high
Highlighting is extremely configurable, perhaps more than any other part of Solr. There are many parameters each for fragment sizing, formatting, ordering, backup/alternate behavior, and more options that are hard to categorize. Nonetheless, highlighting is very simple to use.
[[Highlighting-Usage]]
== Usage
=== Common Highlighter Parameters
@ -36,7 +35,7 @@ Use this parameter to enable or disable highlighting. The default is `false`. If
`hl.method`::
The highlighting implementation to use. Acceptable values are: `unified`, `original`, `fastVector`. The default is `original`.
+
See the <<Highlighting-ChoosingaHighlighter,Choosing a Highlighter>> section below for more details on the differences between the available highlighters.
See the <<Choosing a Highlighter>> section below for more details on the differences between the available highlighters.
`hl.fl`::
Specifies a list of fields to highlight. Accepts a comma- or space-delimited list of fields for which Solr should generate highlighted snippets.
@ -92,7 +91,6 @@ The default is `51200` characters.
There are more parameters supported as well depending on the highlighter (via `hl.method`) chosen.
[[Highlighting-HighlightingintheQueryResponse]]
=== Highlighting in the Query Response
In the response to a query, Solr includes highlighting data in a section separate from the documents. It is up to a client to determine how to process this response and display the highlights to users.
@ -136,7 +134,6 @@ Note the two sections `docs` and `highlighting`. The `docs` section contains the
The `highlighting` section includes the ID of each document, and the field that contains the highlighted portion. In this example, we used the `hl.fl` parameter to say we wanted query terms highlighted in the "manu" field. When there is a match to the query term in that field, it will be included for each document ID in the list.
[[Highlighting-ChoosingaHighlighter]]
== Choosing a Highlighter
Solr provides a `HighlightComponent` (a `SearchComponent`) and it's in the default list of components for search handlers. It offers a somewhat unified API over multiple actual highlighting implementations (or simply "highlighters") that do the business of highlighting.
@ -173,7 +170,6 @@ The Unified Highlighter is exclusively configured via search parameters. In cont
In addition to further information below, more information can be found in the {solr-javadocs}/solr-core/org/apache/solr/highlight/package-summary.html[Solr javadocs].
[[Highlighting-SchemaOptionsandPerformanceConsiderations]]
=== Schema Options and Performance Considerations
Fundamental to the internals of highlighting are detecting the _offsets_ of the individual words that match the query. Some of the highlighters can run the stored text through the analysis chain defined in the schema, some can look them up from _postings_, and some can look them up from _term vectors._ These choices have different trade-offs:
@ -198,7 +194,6 @@ This is definitely the fastest option for highlighting wildcard queries on large
+
This adds substantial weight to the index similar in size to the compressed stored text. If you are using the Unified Highlighter then this is not a recommended configuration since it's slower and heavier than postings with light term vectors. However, this could make sense if full term vectors are already needed for another use-case.
[[Highlighting-TheUnifiedHighlighter]]
== The Unified Highlighter
The Unified Highlighter supports these following additional parameters to the ones listed earlier:
@ -243,7 +238,6 @@ Indicates which character to break the text on. Use only if you have defined `hl
This is useful when the text has already been manipulated in advance to have a special delineation character at desired highlight passage boundaries. This character will still appear in the text as the last character of a passage.
[[Highlighting-TheOriginalHighlighter]]
== The Original Highlighter
The Original Highlighter supports these following additional parameters to the ones listed earlier:
@ -314,7 +308,6 @@ If this may happen and you know you don't need them for highlighting (i.e. your
The Original Highlighter has a plugin architecture that enables new functionality to be registered in `solrconfig.xml`. The "```techproducts```" configset shows most of these settings explicitly. You can use it as a guide to provide your own components to include a `SolrFormatter`, `SolrEncoder`, and `SolrFragmenter.`
[[Highlighting-TheFastVectorHighlighter]]
== The FastVector Highlighter
The FastVector Highlighter (FVH) can be used in conjunction with the Original Highlighter if not all fields should be highlighted with the FVH. In such a mode, set `hl.method=original` and `f.yourTermVecField.hl.method=fastVector` for all fields that should use the FVH. One annoyance to keep in mind is that the Original Highlighter uses `hl.simple.pre` whereas the FVH (and other highlighters) use `hl.tag.pre`.
@ -349,15 +342,12 @@ The maximum number of phrases to analyze when searching for the highest-scoring
`hl.multiValuedSeparatorChar`::
Text to use to separate one value from the next for a multi-valued field. The default is " " (a space).
[[Highlighting-UsingBoundaryScannerswiththeFastVectorHighlighter]]
=== Using Boundary Scanners with the FastVector Highlighter
The FastVector Highlighter will occasionally truncate highlighted words. To prevent this, implement a boundary scanner in `solrconfig.xml`, then use the `hl.boundaryScanner` parameter to specify the boundary scanner for highlighting.
Solr supports two boundary scanners: `breakIterator` and `simple`.
[[Highlighting-ThebreakIteratorBoundaryScanner]]
==== The breakIterator Boundary Scanner
The `breakIterator` boundary scanner offers excellent performance right out of the box by taking locale and boundary type into account. In most cases you will want to use the `breakIterator` boundary scanner. To implement the `breakIterator` boundary scanner, add this code to the `highlighting` section of your `solrconfig.xml` file, adjusting the type, language, and country values as appropriate to your application:
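A sketch of such a configuration (the values shown are placeholders to adjust):

[source,xml]
----
<boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
  <lst name="defaults">
    <str name="hl.bs.type">WORD</str> <!-- WORD, LINE, SENTENCE, or CHARACTER -->
    <str name="hl.bs.language">en</str>
    <str name="hl.bs.country">US</str>
  </lst>
</boundaryScanner>
----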
@ -375,7 +365,6 @@ The `breakIterator` boundary scanner offers excellent performance right out of t
Possible values for the `hl.bs.type` parameter are WORD, LINE, SENTENCE, and CHARACTER.
[[Highlighting-ThesimpleBoundaryScanner]]
==== The simple Boundary Scanner
The `simple` boundary scanner scans term boundaries for a specified maximum character value (`hl.bs.maxScan`) and for common delimiters such as punctuation marks (`hl.bs.chars`). The `simple` boundary scanner may be useful for some custom use cases. To implement the `simple` boundary scanner, add this code to the `highlighting` section of your `solrconfig.xml` file, adjusting the values as appropriate to your application:
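A sketch of such a configuration (the scan limit and delimiter characters are placeholders):

[source,xml]
----
<boundaryScanner name="simple" class="solr.highlight.SimpleBoundaryScanner" default="true">
  <lst name="defaults">
    <!-- hl.bs.maxScan: how far to scan for a boundary; hl.bs.chars: delimiter characters -->
    <str name="hl.bs.maxScan">10</str>
    <str name="hl.bs.chars">.,!?</str>
  </lst>
</boundaryScanner>
----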

View File

@ -27,13 +27,11 @@ The following sections cover provide general information about how various SolrC
If you are already familiar with SolrCloud concepts and basic functionality, you can skip to the section covering <<solrcloud-configuration-and-parameters.adoc#solrcloud-configuration-and-parameters,SolrCloud Configuration and Parameters>>.
[[HowSolrCloudWorks-KeySolrCloudConcepts]]
== Key SolrCloud Concepts
A SolrCloud cluster consists of some "logical" concepts layered on top of some "physical" concepts.
[[HowSolrCloudWorks-Logical]]
=== Logical
=== Logical Concepts
* A Cluster can host multiple Collections of Solr Documents.
* A collection can be partitioned into multiple Shards, which contain a subset of the Documents in the Collection.
@ -41,8 +39,7 @@ A SolrCloud cluster consists of some "logical" concepts layered on top of some "
** The theoretical limit to the number of Documents that a Collection can reasonably contain.
** The amount of parallelization that is possible for an individual search request.
[[HowSolrCloudWorks-Physical]]
=== Physical
=== Physical Concepts
* A Cluster is made up of one or more Solr Nodes, which are running instances of the Solr server process.
* Each Node can host multiple Cores.

View File

@ -20,7 +20,6 @@
Solr ships with many out-of-the-box RequestHandlers, which are called implicit because they are not configured in `solrconfig.xml`.
[[ImplicitRequestHandlers-ListofImplicitlyAvailableEndpoints]]
== List of Implicitly Available Endpoints
// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
@ -44,19 +43,18 @@ Solr ships with many out-of-the-box RequestHandlers, which are called implicit b
|`/debug/dump` |{solr-javadocs}/solr-core/org/apache/solr/handler/DumpRequestHandler.html[DumpRequestHandler] |`_DEBUG_DUMP` |Echo the request contents back to the client.
|<<exporting-result-sets.adoc#exporting-result-sets,`/export`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/component/SearchHandler.html[SearchHandler] |`_EXPORT` |Export full sorted result sets.
|<<realtime-get.adoc#realtime-get,`/get`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/RealTimeGetHandler.html[RealTimeGetHandler] |`_GET` |Real-time get: low-latency retrieval of the latest version of a document.
|<<graph-traversal.adoc#GraphTraversal-ExportingGraphMLtoSupportGraphVisualization,`/graph`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/GraphHandler.html[GraphHandler] |`_ADMIN_GRAPH` |Return http://graphml.graphdrawing.org/[GraphML] formatted output from a <<graph-traversal.adoc#graph-traversal,`gather` `Nodes` streaming expression>>.
|<<graph-traversal.adoc#exporting-graphml-to-support-graph-visualization,`/graph`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/GraphHandler.html[GraphHandler] |`_ADMIN_GRAPH` |Return http://graphml.graphdrawing.org/[GraphML] formatted output from a <<graph-traversal.adoc#graph-traversal,`gather` `Nodes` streaming expression>>.
|<<index-replication.adoc#index-replication,`/replication`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/ReplicationHandler.html[ReplicationHandler] |`_REPLICATION` |Replicate indexes for SolrCloud recovery and Master/Slave index distribution.
|<<schema-api.adoc#schema-api,`/schema`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/SchemaHandler.html[SchemaHandler] |`_SCHEMA` |Retrieve/modify Solr schema.
|<<parallel-sql-interface.adoc#sql-request-handler,`/sql`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/SQLHandler.html[SQLHandler] |`_SQL` |Front end of the Parallel SQL interface.
|<<streaming-expressions.adoc#StreamingExpressions-StreamingRequestsandResponses,`/stream`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/StreamHandler.html[StreamHandler] |`_STREAM` |Distributed stream processing.
|<<the-terms-component.adoc#TheTermsComponent-UsingtheTermsComponentinaRequestHandler,`/terms`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/component/SearchHandler.html[SearchHandler] |`_TERMS` |Return a field's indexed terms and the number of documents containing each term.
|<<streaming-expressions.adoc#streaming-requests-and-responses,`/stream`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/StreamHandler.html[StreamHandler] |`_STREAM` |Distributed stream processing.
|<<the-terms-component.adoc#using-the-terms-component-in-a-request-handler,`/terms`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/component/SearchHandler.html[SearchHandler] |`_TERMS` |Return a field's indexed terms and the number of documents containing each term.
|<<uploading-data-with-index-handlers.adoc#uploading-data-with-index-handlers,`/update`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/UpdateRequestHandler.html[UpdateRequestHandler] |`_UPDATE` |Add, delete and update indexed documents formatted as SolrXML, CSV, SolrJSON or javabin.
|<<uploading-data-with-index-handlers.adoc#UploadingDatawithIndexHandlers-CSVUpdateConveniencePaths,`/update/csv`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/UpdateRequestHandler.html[UpdateRequestHandler] |`_UPDATE_CSV` |Add and update CSV-formatted documents.
|<<uploading-data-with-index-handlers.adoc#UploadingDatawithIndexHandlers-CSVUpdateConveniencePaths,`/update/json`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/UpdateRequestHandler.html[UpdateRequestHandler] |`_UPDATE_JSON` |Add, delete and update SolrJSON-formatted documents.
|<<uploading-data-with-index-handlers.adoc#csv-update-convenience-paths,`/update/csv`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/UpdateRequestHandler.html[UpdateRequestHandler] |`_UPDATE_CSV` |Add and update CSV-formatted documents.
|<<uploading-data-with-index-handlers.adoc#csv-update-convenience-paths,`/update/json`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/UpdateRequestHandler.html[UpdateRequestHandler] |`_UPDATE_JSON` |Add, delete and update SolrJSON-formatted documents.
|<<transforming-and-indexing-custom-json.adoc#transforming-and-indexing-custom-json,`/update/json/docs`>> |{solr-javadocs}/solr-core/org/apache/solr/handler/UpdateRequestHandler.html[UpdateRequestHandler] |`_UPDATE_JSON_DOCS` |Add and update custom JSON-formatted documents.
|===
[[ImplicitRequestHandlers-HowtoViewtheConfiguration]]
== How to View the Configuration
You can see configuration for all request handlers, including the implicit request handlers, via the <<config-api.adoc#config-api,Config API>>. E.g. for the `gettingstarted` collection:
@ -71,7 +69,6 @@ To include the expanded paramset in the response, as well as the effective param
`curl "http://localhost:8983/solr/gettingstarted/config/requestHandler?componentName=/export&expandParams=true"`
[[ImplicitRequestHandlers-HowtoEdittheConfiguration]]
== How to Edit the Configuration
Because implicit request handlers are not present in `solrconfig.xml`, configuration of their associated `default`, `invariant` and `appends` parameters may be edited via the <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>> using the paramset listed in the above table. However, other parameters, including SearchHandler components, may not be modified. The invariants and appends specified in the implicit configuration cannot be overridden.

View File

@ -26,7 +26,6 @@ The figure below shows a Solr configuration using index replication. The master
image::images/index-replication/worddav2b7e14725d898b4104cdd9c502fc77cd.png[image,width=159,height=235]
[[IndexReplication-IndexReplicationinSolr]]
== Index Replication in Solr
Solr includes a Java implementation of index replication that works over HTTP:
@ -46,7 +45,6 @@ Although there is no explicit concept of "master/slave" nodes in a <<solrcloud.a
When using SolrCloud, the `ReplicationHandler` must be available via the `/replication` path. Solr does this implicitly unless overridden explicitly in your `solrconfig.xml`, but if you wish to override the default behavior, make certain that you do not explicitly set any of the "master" or "slave" configuration options mentioned below, or they will interfere with normal SolrCloud operation.
====
[[IndexReplication-ReplicationTerminology]]
== Replication Terminology
The table below defines the key terms associated with Solr replication.
@ -79,15 +77,13 @@ Snapshot::
A directory containing hard links to the data files of an index. Snapshots are distributed from the master nodes when the slaves pull them, "smart copying" any segments the slave node does not already have into a snapshot directory that contains the hard links to the most recent index data files.
[[IndexReplication-ConfiguringtheReplicationHandler]]
== Configuring the ReplicationHandler
In addition to `ReplicationHandler` configuration options specific to the master/slave roles, there are a few special configuration options that are generally supported (even when using SolrCloud).
* `maxNumberOfBackups`: an integer value dictating the maximum number of backups this node will keep on disk as it receives `backup` commands.
* Similar to most other request handlers in Solr you may configure a set of <<requesthandlers-and-searchcomponents-in-solrconfig.adoc#RequestHandlersandSearchComponentsinSolrConfig-SearchHandlers,defaults, invariants, and/or appends>> parameters corresponding with any request parameters supported by the `ReplicationHandler` when <<IndexReplication-HTTPAPICommandsfortheReplicationHandler,processing commands>>.
* Similar to most other request handlers in Solr, you may configure a set of <<requesthandlers-and-searchcomponents-in-solrconfig.adoc#searchhandlers,defaults, invariants, and/or appends>> parameters corresponding to any request parameters supported by the `ReplicationHandler` when <<HTTP API Commands for the ReplicationHandler,processing commands>>.
[[IndexReplication-ConfiguringtheReplicationRequestHandleronaMasterServer]]
=== Configuring the Replication RequestHandler on a Master Server
Before running a replication, you should set the following parameters on initialization of the handler:
@ -125,7 +121,6 @@ The example below shows a possible 'master' configuration for the `ReplicationHa
</requestHandler>
----
[[IndexReplication-Replicatingsolrconfig.xml]]
==== Replicating solrconfig.xml
In the configuration file on the master server, include a line like the following:
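A sketch of such a line (the master-side file comes first; it will be saved on the slave under the name after the colon):

[source,xml]
----
<str name="confFiles">solrconfig_slave.xml:solrconfig.xml</str>
----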
@ -139,7 +134,6 @@ This ensures that the local configuration `solrconfig_slave.xml` will be saved a
On the master server, the file name of the slave configuration file can be anything, as long as the name is correctly identified in the `confFiles` string; then it will be saved as whatever file name appears after the colon ':'.
[[IndexReplication-ConfiguringtheReplicationRequestHandleronaSlaveServer]]
=== Configuring the Replication RequestHandler on a Slave Server
The code below shows how to configure a ReplicationHandler on a slave.
@ -188,7 +182,6 @@ The code below shows how to configure a ReplicationHandler on a slave.
</requestHandler>
----
[[IndexReplication-SettingUpaRepeaterwiththeReplicationHandler]]
== Setting Up a Repeater with the ReplicationHandler
A master may be able to serve only so many slaves without affecting performance. Some organizations have deployed slave servers across multiple data centers. If each slave downloads the index from a remote data center, the resulting download may consume too much network bandwidth. To avoid performance degradation in cases like this, you can configure one or more slaves as repeaters. A repeater is simply a node that acts as both a master and a slave.
@ -213,7 +206,6 @@ Here is an example of a ReplicationHandler configuration for a repeater:
</requestHandler>
----
[[IndexReplication-CommitandOptimizeOperations]]
== Commit and Optimize Operations
When a commit or optimize operation is performed on the master, the RequestHandler reads the list of file names which are associated with each commit point. This relies on the `replicateAfter` parameter in the configuration to decide which types of events should trigger replication.
@ -233,7 +225,6 @@ The `replicateAfter` parameter can accept multiple arguments. For example:
<str name="replicateAfter">optimize</str>
----
[[IndexReplication-SlaveReplication]]
== Slave Replication
The master is totally unaware of the slaves.
@ -246,7 +237,6 @@ The slave continuously keeps polling the master (depending on the `pollInterval`
* After the download completes, all the new files are moved to the live index directory and each file's timestamp is the same as its counterpart on the master.
* A commit command is issued on the slave by the Slave's ReplicationHandler and the new index is loaded.
[[IndexReplication-ReplicatingConfigurationFiles]]
=== Replicating Configuration Files
To replicate configuration files, list them using the `confFiles` parameter. Only files found in the `conf` directory of the master's Solr instance will be replicated.
@ -259,7 +249,6 @@ As a precaution when replicating configuration files, Solr copies configuration
If a replication involves downloading at least one configuration file, the ReplicationHandler issues a core-reload command instead of a commit command.
[[IndexReplication-ResolvingCorruptionIssuesonSlaveServers]]
=== Resolving Corruption Issues on Slave Servers
If documents are added to the slave, then the slave is no longer in sync with its master. However, the slave will not undertake any action to put itself in sync, until the master has new index data.
@ -268,7 +257,6 @@ When a commit operation takes place on the master, the index version of the mast
To correct this problem, the slave then copies all the index files from master to a new index directory and asks the core to load the fresh index from the new directory.
[[IndexReplication-HTTPAPICommandsfortheReplicationHandler]]
== HTTP API Commands for the ReplicationHandler
You can use the HTTP commands below to control the ReplicationHandler's operations.
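For example, a hedged sketch of checking replication status on a core named `techproducts` (the host and core name are illustrative):

[source,bash]
----
# Reports the replication configuration and status of the named core.
curl "http://localhost:8983/solr/techproducts/replication?command=details"
----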
@ -355,7 +343,6 @@ There are two supported parameters:
* `location`: Location where the snapshot is created.
[[IndexReplication-DistributionandOptimization]]
== Distribution and Optimization
Optimizing an index is not something most users should generally worry about, but users should be aware of the impacts of optimizing an index when using the `ReplicationHandler`.

View File

@ -29,10 +29,8 @@ By default, the settings are commented out in the sample `solrconfig.xml` includ
</indexConfig>
----
[[IndexConfiginSolrConfig-WritingNewSegments]]
== Writing New Segments
[[IndexConfiginSolrConfig-ramBufferSizeMB]]
=== ramBufferSizeMB
Once accumulated document updates exceed this much memory space (defined in megabytes), the pending updates are flushed. This can also create new segments or trigger a merge. Using this setting is generally preferable to `maxBufferedDocs`. If both `maxBufferedDocs` and `ramBufferSizeMB` are set in `solrconfig.xml`, then a flush will occur when either limit is reached. The default is 100 MB.
@ -42,7 +40,6 @@ Once accumulated document updates exceed this much memory space (defined in mega
<ramBufferSizeMB>100</ramBufferSizeMB>
----
[[IndexConfiginSolrConfig-maxBufferedDocs]]
=== maxBufferedDocs
Sets the number of document updates to buffer in memory before they are flushed as a new segment. This may also trigger a merge. The default Solr configuration sets to flush by RAM usage (`ramBufferSizeMB`).
@ -52,20 +49,17 @@ Sets the number of document updates to buffer in memory before they are flushed
<maxBufferedDocs>1000</maxBufferedDocs>
----
[[IndexConfiginSolrConfig-useCompoundFile]]
=== useCompoundFile
Controls whether newly written (and not yet merged) index segments should use the <<IndexConfiginSolrConfig-CompoundFileSegments,Compound File Segment>> format. The default is false.
Controls whether newly written (and not yet merged) index segments should use the <<Compound File Segments>> format. The default is false.
[source,xml]
----
<useCompoundFile>false</useCompoundFile>
----
[[IndexConfiginSolrConfig-MergingIndexSegments]]
== Merging Index Segments
[[IndexConfiginSolrConfig-mergePolicyFactory]]
=== mergePolicyFactory
Defines how merging segments is done.
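A sketch of a typical configuration, using the `TieredMergePolicyFactory` and the option values referenced later in this section:

[source,xml]
----
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>
----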
@ -99,7 +93,6 @@ Choosing the best merge factors is generally a trade-off of indexing speed vs. s
Conversely, keeping more segments can accelerate indexing, because merges happen less often, so an update is less likely to trigger a merge. But searches become more computationally expensive and will likely be slower, because search terms must be looked up in more index segments. Faster index updates also mean shorter commit turnaround times, which means more timely search results.
[[IndexConfiginSolrConfig-CustomizingMergePolicies]]
=== Customizing Merge Policies
If the configuration options for the built-in merge policies do not fully suit your use case, you can customize them: either by creating a custom merge policy factory that you specify in your configuration, or by configuring a {solr-javadocs}/solr-core/org/apache/solr/index/WrapperMergePolicyFactory.html[merge policy wrapper] which uses a `wrapped.prefix` configuration option to control how the factory it wraps will be configured:
@ -117,7 +110,6 @@ If the configuration options for the built-in merge policies do not fully suit y
The example above shows Solr's {solr-javadocs}/solr-core/org/apache/solr/index/SortingMergePolicyFactory.html[`SortingMergePolicyFactory`] being configured to sort documents in merged segments by `"timestamp desc"`, and wrapped around a `TieredMergePolicyFactory` configured to use the values `maxMergeAtOnce=10` and `segmentsPerTier=10` via the `inner` prefix defined by `SortingMergePolicyFactory` 's `wrapped.prefix` option. For more information on using `SortingMergePolicyFactory`, see <<common-query-parameters.adoc#CommonQueryParameters-ThesegmentTerminateEarlyParameter,the segmentTerminateEarly parameter>>.
[[IndexConfiginSolrConfig-mergeScheduler]]
=== mergeScheduler
The merge scheduler controls how merges are performed. The default `ConcurrentMergeScheduler` performs merges in the background using separate threads. The alternative, `SerialMergeScheduler`, does not perform merges with separate threads.
@ -127,7 +119,6 @@ The merge scheduler controls how merges are performed. The default `ConcurrentMe
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
----
[[IndexConfiginSolrConfig-mergedSegmentWarmer]]
=== mergedSegmentWarmer
When using Solr for <<near-real-time-searching.adoc#near-real-time-searching,Near Real Time Searching>>, a merged segment warmer can be configured to warm the reader on the newly merged segment before the merge commits. This is not required for near real-time search, but will reduce search latency on opening a new near real-time reader after a merge completes.
@ -137,7 +128,6 @@ When using Solr in for <<near-real-time-searching.adoc#near-real-time-searching,
<mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>
----
[[IndexConfiginSolrConfig-CompoundFileSegments]]
== Compound File Segments
Each Lucene segment is typically composed of a dozen or so files. Lucene can be configured to bundle all of the files for a segment into a single compound file using a file extension of `.cfs`, an abbreviation for Compound File Segment.
@ -149,16 +139,14 @@ On systems where the number of open files allowed per process is limited, CFS ma
.CFS: New Segments vs Merged Segments
[NOTE]
====
To configure whether _newly written segments_ should use CFS, see the <<IndexConfiginSolrConfig-useCompoundFile,`useCompoundFile`>> setting described above. To configure whether _merged segments_ use CFS, review the Javadocs for your <<IndexConfiginSolrConfig-mergePolicyFactory,`mergePolicyFactory`>> .
To configure whether _newly written segments_ should use CFS, see the <<useCompoundFile,`useCompoundFile`>> setting described above. To configure whether _merged segments_ use CFS, review the Javadocs for your <<mergePolicyFactory,`mergePolicyFactory`>>.
Many <<IndexConfiginSolrConfig-MergingIndexSegments,Merge Policy>> implementations support `noCFSRatio` and `maxCFSSegmentSizeMB` settings with default values that prevent compound files from being used for large segments, but do use compound files for small segments.
Many <<Merging Index Segments,Merge Policy>> implementations support `noCFSRatio` and `maxCFSSegmentSizeMB` settings with default values that prevent compound files from being used for large segments, but do use compound files for small segments.
====
[[IndexConfiginSolrConfig-IndexLocks]]
== Index Locks
[[IndexConfiginSolrConfig-lockType]]
=== lockType
The LockFactory options specify the locking implementation to use.
@ -177,7 +165,6 @@ For more information on the nuances of each LockFactory, see http://wiki.apache.
<lockType>native</lockType>
----
[[IndexConfiginSolrConfig-writeLockTimeout]]
=== writeLockTimeout
The maximum time to wait for a write lock on an IndexWriter. The default is 1000, expressed in milliseconds.
@ -187,7 +174,6 @@ The maximum time to wait for a write lock on an IndexWriter. The default is 1000
<writeLockTimeout>1000</writeLockTimeout>
----
[[IndexConfiginSolrConfig-OtherIndexingSettings]]
== Other Indexing Settings
There are a few other parameters that may be important to configure for your implementation. These settings affect how or when updates are made to an index.

View File

@ -43,7 +43,6 @@ This section describes how Solr adds data to its index. It covers the following
* *<<uima-integration.adoc#uima-integration,UIMA Integration>>*: Information about integrating Solr with Apache's Unstructured Information Management Architecture (UIMA). UIMA lets you define custom pipelines of Analysis Engines that incrementally add metadata to your documents as annotations.
[[IndexingandBasicDataOperations-IndexingUsingClientAPIs]]
== Indexing Using Client APIs
Using client APIs, such as <<using-solrj.adoc#using-solrj,SolrJ>>, from your applications is an important option for updating Solr indexes. See the <<client-apis.adoc#client-apis,Client APIs>> section for more information.

View File

@ -55,8 +55,7 @@ For example, if an `<initParams>` section has the name "myParams", you can call
[source,xml]
<requestHandler name="/dump1" class="DumpRequestHandler" initParams="myParams"/>
[[InitParamsinSolrConfig-Wildcards]]
== Wildcards
== Wildcards in initParams
An `<initParams>` section can support wildcards to define nested paths that should use the parameters defined. A single asterisk (\*) denotes that a nested path one level deeper should use the parameters. Double asterisks (**) denote that all nested paths, no matter how deep, should use the parameters.
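A sketch of such a section (the paths and the default search field are illustrative):

[source,xml]
----
<initParams path="/update/**,/query,/select">
  <lst name="defaults">
    <str name="df">_text_</str>
  </lst>
</initParams>
----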

View File

@ -38,12 +38,10 @@ If the field name is defined in the Schema that is associated with the index, th
For more information on indexing in Solr, see the https://wiki.apache.org/solr/FrontPage[Solr Wiki].
[[IntroductiontoSolrIndexing-TheSolrExampleDirectory]]
== The Solr Example Directory
When starting Solr with the "-e" option, the `example/` directory will be used as base directory for the example Solr instances that are created. This directory also includes an `example/exampledocs/` subdirectory containing sample documents in a variety of formats that you can use to experiment with indexing into the various examples.
[[IntroductiontoSolrIndexing-ThecurlUtilityforTransferringFiles]]
== The curl Utility for Transferring Files
Many of the instructions and examples in this section make use of the `curl` utility for transferring content through a URL. `curl` posts and retrieves data over HTTP, FTP, and many other protocols. Most Linux distributions include a copy of `curl`. You'll find curl downloads for Linux, Windows, and many other operating systems at http://curl.haxx.se/download.html. Documentation for `curl` is available here: http://curl.haxx.se/docs/manpage.html.

View File

@ -24,7 +24,6 @@ Configuring your JVM can be a complex topic and a full discussion is beyond the
For more general information about improving Solr performance, see https://wiki.apache.org/solr/SolrPerformanceFactors.
[[JVMSettings-ChoosingMemoryHeapSettings]]
== Choosing Memory Heap Settings
The most important JVM configuration settings are those that determine the amount of memory it is allowed to allocate. There are two primary command-line options that set memory limits for the JVM. These are `-Xms`, which sets the initial size of the JVM's memory heap, and `-Xmx`, which sets the maximum size to which the heap is allowed to grow.
@ -41,12 +40,10 @@ When setting the maximum heap size, be careful not to let the JVM consume all av
On systems with many CPUs/cores, it can also be beneficial to tune the layout of the heap and/or the behavior of the garbage collector. Adjusting the relative sizes of the generational pools in the heap can affect how often GC sweeps occur and whether they run concurrently. Configuring the various settings of how the garbage collector should behave can greatly reduce the overall performance impact when it does run. There is a lot of good information on this topic available on Sun's website. A good place to start is here: http://www.oracle.com/technetwork/java/javase/tech/index-jsp-140228.html[Oracle's Java HotSpot Garbage Collection].
[[JVMSettings-UsetheServerHotSpotVM]]
== Use the Server HotSpot VM
If you are using Sun's JVM, add the `-server` command-line option when you start Solr. This tells the JVM that it should optimize for a long running, server process. If the Java runtime on your system is a JRE, rather than a full JDK distribution (including `javac` and other development tools), then it is possible that it may not support the `-server` JVM option. Test this by running `java -help` and look for `-server` as an available option in the displayed usage message.
[[JVMSettings-CheckingJVMSettings]]
== Checking JVM Settings
A great way to see what JVM settings your server is using, along with other useful information, is to use the admin RequestHandler, `solr/admin/system`. This request handler will display a wealth of server statistics and settings.

View File

@ -29,17 +29,14 @@ Support for the Kerberos authentication plugin is available in SolrCloud mode or
If you are using Solr with a Hadoop cluster secured with Kerberos and intend to store your Solr indexes in HDFS, also see the section <<running-solr-on-hdfs.adoc#running-solr-on-hdfs,Running Solr on HDFS>> for additional steps to configure Solr for that purpose. The instructions on this page apply only to scenarios where Solr will be secured with Kerberos. If you only need to store your indexes in a Kerberized HDFS system, please see the other section referenced above.
====
[[KerberosAuthenticationPlugin-HowSolrWorksWithKerberos]]
== How Solr Works With Kerberos
When setting up Solr to use Kerberos, configurations are put in place for Solr to use a _service principal_, or a Kerberos username, which is registered with the Key Distribution Center (KDC) to authenticate requests. The configurations define the service principal name and the location of the keytab file that contains the credentials.
[[KerberosAuthenticationPlugin-security.json]]
=== security.json
The Solr authentication model uses a file called `security.json`. A description of this file and how it is created and maintained is covered in the section <<authentication-and-authorization-plugins.adoc#authentication-and-authorization-plugins,Authentication and Authorization Plugins>>. If this file is created after an initial startup of Solr, a restart of each node of the system is required.
[[KerberosAuthenticationPlugin-ServicePrincipalsandKeytabFiles]]
=== Service Principals and Keytab Files
Each Solr node must have a service principal registered with the Key Distribution Center (KDC). The Kerberos plugin uses SPNego to negotiate authentication.
@ -56,7 +53,6 @@ Along with the service principal, each Solr node needs a keytab file which shoul
Since a Solr cluster requires internode communication, each node must also be able to make Kerberos enabled requests to other nodes. By default, Solr uses the same service principal and keytab as a 'client principal' for internode communication. You may configure a distinct client principal explicitly, but doing so is not recommended and is not covered in the examples below.
[[KerberosAuthenticationPlugin-KerberizedZooKeeper]]
=== Kerberized ZooKeeper
When setting up a kerberized SolrCloud cluster, it is recommended to enable Kerberos security for ZooKeeper as well.
@ -65,15 +61,13 @@ In such a setup, the client principal used to authenticate requests with ZooKeep
See the <<ZooKeeper Configuration>> section below for an example of starting ZooKeeper in Kerberos mode.
[[KerberosAuthenticationPlugin-BrowserConfiguration]]
=== Browser Configuration
In order for your browser to access the Solr Admin UI after enabling Kerberos authentication, it must be able to negotiate with the Kerberos authenticator service to allow you access. Each browser supports this differently, and some (like Chrome) do not support it at all. If you see 401 errors when trying to access the Solr Admin UI after enabling Kerberos authentication, it's likely your browser has not been configured properly to know how or where to negotiate the authentication request.
Detailed information on how to set up your browser is beyond the scope of this documentation; please see your system administrators for Kerberos for details on how to configure your browser.
[[KerberosAuthenticationPlugin-PluginConfiguration]]
== Plugin Configuration
== Kerberos Authentication Configuration
.Consult Your Kerberos Admins!
[WARNING]
@ -97,7 +91,6 @@ We'll walk through each of these steps below.
To use host names instead of IP addresses, use the `SOLR_HOST` configuration in `bin/solr.in.sh` or pass a `-Dhost=<hostname>` system parameter during Solr startup. This guide uses IP addresses. If you specify a hostname, replace all the IP addresses in the guide with the Solr hostname as appropriate.
====
[[KerberosAuthenticationPlugin-GetServicePrincipalsandKeytabs]]
=== Get Service Principals and Keytabs
Before configuring Solr, make sure you have a Kerberos service principal for each Solr host and ZooKeeper (if ZooKeeper has not already been configured) available in the KDC server, and generate a keytab file as shown below.
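A hedged sketch of doing so with MIT Kerberos tooling (the principal names, host IP, and keytab path are placeholders):

[source,bash]
----
# On the KDC host, as a Kerberos administrator:
sudo kadmin.local
# At the kadmin prompt:
#   addprinc HTTP/192.168.0.107
#   addprinc zookeeper/192.168.0.107
#   ktadd -k /tmp/107.keytab HTTP/192.168.0.107 zookeeper/192.168.0.107
----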
@ -128,7 +121,6 @@ Copy the keytab file from the KDC servers `/tmp/107.keytab` location to the S
You might need to take similar steps to create a ZooKeeper service principal and keytab if it has not already been set up. In that case, the example below shows a different service principal for ZooKeeper, so the above might be repeated with `zookeeper/host1` as the service principal for one of the nodes.
[[KerberosAuthenticationPlugin-ZooKeeperConfiguration]]
=== ZooKeeper Configuration
If you are using a ZooKeeper that has already been configured to use Kerberos, you can skip the ZooKeeper-related steps shown here.
@ -173,7 +165,6 @@ Once all of the pieces are in place, start ZooKeeper with the following paramete
bin/zkServer.sh start -Djava.security.auth.login.config=/etc/zookeeper/conf/jaas-client.conf
----
[[KerberosAuthenticationPlugin-Createsecurity.json]]
=== Create security.json
Create the `security.json` file.
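A minimal `security.json` for Kerberos only needs to enable the plugin class. One way to create it is to upload it directly to ZooKeeper with the `zkcli.sh` script; the sketch below assumes a local ZooKeeper on port 2181, so adjust the `-zkhost` value (and chroot, if any) for your ensemble.

[source,bash]
----
# Upload a minimal security.json that enables the Kerberos authentication plugin
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd put /security.json \
  '{"authentication":{"class":"org.apache.solr.security.KerberosPlugin"}}'
----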
@ -194,7 +185,6 @@ More details on how to use a `/security.json` file in Solr are available in the
If you already have a `/security.json` file in ZooKeeper, download the file, add or modify the authentication section and upload it back to ZooKeeper using the <<command-line-utilities.adoc#command-line-utilities,Command Line Utilities>> available in Solr.
====
[[KerberosAuthenticationPlugin-DefineaJAASConfigurationFile]]
=== Define a JAAS Configuration File
The JAAS configuration file defines the properties to use for authentication, such as the service principal and the location of the keytab file. Other properties can also be set to ensure ticket caching and other features.
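As a sketch, a JAAS file for the node at 192.168.0.107 might look like the following; the section name, keytab path, and principal are placeholders and should match your own keytab and the app name you configure at startup.

[source,text]
----
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/keytabs/107.keytab"
  storeKey=true
  useTicketCache=true
  debug=true
  principal="HTTP/192.168.0.107@EXAMPLE.COM";
};
----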
@ -227,7 +217,6 @@ The main properties we are concerned with are the `keyTab` and `principal` prope
* `debug`: this boolean property will output debug messages for help in troubleshooting.
* `principal`: the name of the service principal to be used.
[[KerberosAuthenticationPlugin-SolrStartupParameters]]
=== Solr Startup Parameters
While starting up Solr, the following host-specific parameters need to be passed. These parameters can be passed at the command line with the `bin/solr` start command (see <<solr-control-script-reference.adoc#solr-control-script-reference,Solr Control Script Reference>> for details on how to pass system parameters) or defined in `bin/solr.in.sh` or `bin/solr.in.cmd` as appropriate for your operating system.
@ -252,7 +241,6 @@ The app name (section name) within the JAAS configuration file which is required
`java.security.auth.login.config`::
Path to the JAAS configuration file for configuring a Solr client for internode communication. This parameter is required.
Here is an example that could be added to `bin/solr.in.sh`. Make sure to change this example to use the right hostname and the keytab file path.
[source,bash]
@ -273,7 +261,6 @@ For Java 1.8, this is available here: http://www.oracle.com/technetwork/java/jav
Replace the `local_policy.jar` present in `JAVA_HOME/jre/lib/security/` with the new `local_policy.jar` from the downloaded package and restart the Solr node.
====
[[KerberosAuthenticationPlugin-UsingDelegationTokens]]
=== Using Delegation Tokens
The Kerberos plugin can be configured to use delegation tokens, which allow an application to reuse the authentication of an end-user or another application.
@ -304,7 +291,6 @@ The ZooKeeper path where the secret provider information is stored. This is in t
`solr.kerberos.delegation.token.secret.manager.znode.working.path`::
The ZooKeeper path where token information is stored. This is in the form of the path + /security/zkdtsm. The path can include the chroot or the chroot can be omitted if you are not using it. This example includes the chroot: `server1:9983,server2:9983,server3:9983/solr/security/zkdtsm`.
[[KerberosAuthenticationPlugin-StartSolr]]
=== Start Solr
Once the configuration is complete, you can start Solr with the `bin/solr` script, as in the example below, which is for users in SolrCloud mode only. This example assumes you modified `bin/solr.in.sh` or `bin/solr.in.cmd` with the proper values, but if you did not, you would pass the system parameters along with the start command. Note you also need to customize the `-z` property as appropriate for the location of your ZooKeeper nodes.
@ -314,7 +300,6 @@ Once the configuration is complete, you can start Solr with the `bin/solr` scrip
bin/solr -c -z server1:2181,server2:2181,server3:2181/solr
----
[[KerberosAuthenticationPlugin-TesttheConfiguration]]
=== Test the Configuration
. Do a `kinit` with your username. For example, `kinit \user@EXAMPLE.COM`.
@ -325,7 +310,6 @@ bin/solr -c -z server1:2181,server2:2181,server3:2181/solr
curl --negotiate -u : "http://192.168.0.107:8983/solr/"
----
[[KerberosAuthenticationPlugin-UsingSolrJwithaKerberizedSolr]]
== Using SolrJ with a Kerberized Solr
To use Kerberos authentication in a SolrJ application, you need the following two lines before you create a SolrClient:
@ -353,7 +337,6 @@ SolrJClient {
};
----
[[KerberosAuthenticationPlugin-DelegationTokenswithSolrJ]]
=== Delegation Tokens with SolrJ
Delegation tokens are also supported with SolrJ, in the following ways:

View File

@ -26,7 +26,6 @@ In other languages the tokenization rules are often not so simple. Some European
For information about language detection at index time, see <<detecting-languages-during-indexing.adoc#detecting-languages-during-indexing,Detecting Languages During Indexing>>.
[[LanguageAnalysis-KeywordMarkerFilterFactory]]
== KeywordMarkerFilterFactory
Protects words from being modified by stemmers. A customized protected word list may be specified with the "protected" attribute in the schema. Any words in the protected word list will not be modified by any stemmer in Solr.
@ -44,7 +43,6 @@ A sample Solr `protwords.txt` with comments can be found in the `sample_techprod
</fieldtype>
----
[[LanguageAnalysis-KeywordRepeatFilterFactory]]
== KeywordRepeatFilterFactory
Emits each token twice, once with the `KEYWORD` attribute and once without.
@ -69,8 +67,6 @@ A sample fieldType configuration could look like this:
IMPORTANT: When adding the same token twice, it will also score twice (double), so you may have to re-tune your ranking rules.
[[LanguageAnalysis-StemmerOverrideFilterFactory]]
== StemmerOverrideFilterFactory
Overrides stemming algorithms by applying a custom mapping, then protecting these terms from being modified by stemmers.
@ -90,7 +86,6 @@ A sample http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-fil
</fieldtype>
----
[[LanguageAnalysis-DictionaryCompoundWordTokenFilter]]
== Dictionary Compound Word Token Filter
This filter splits, or _decompounds_, compound words into individual words using a dictionary of the component words. Each input token is passed through unchanged. If it can also be decompounded into subwords, each subword is also added to the stream at the same logical position.
@ -129,7 +124,6 @@ Assume that `germanwords.txt` contains at least the following words: `dumm kopf
*Out:* "Donaudampfschiff"(1), "Donau"(1), "dampf"(1), "schiff"(1), "dummkopf"(2), "dumm"(2), "kopf"(2)
[[LanguageAnalysis-UnicodeCollation]]
== Unicode Collation
Unicode Collation is a language-sensitive method of sorting text that can also be used for advanced search purposes.
@ -175,7 +169,6 @@ Expert options:
`variableTop`:: Single character or contraction. Controls what is variable for `alternate`.
[[LanguageAnalysis-SortingTextforaSpecificLanguage]]
=== Sorting Text for a Specific Language
In this example, text is sorted according to the default German rules provided by ICU4J.
@ -223,7 +216,6 @@ An example using the "city_sort" field to sort:
q=*:*&fl=city&sort=city_sort+asc
----
[[LanguageAnalysis-SortingTextforMultipleLanguages]]
=== Sorting Text for Multiple Languages
There are two approaches to supporting multiple languages: if there is a small list of languages you wish to support, consider defining collated fields for each language and using `copyField`. However, adding a large number of sort fields can increase disk and indexing costs. An alternative approach is to use the Unicode `default` collator.
@ -237,7 +229,6 @@ The Unicode `default` or `ROOT` locale has rules that are designed to work well
strength="primary" />
----
[[LanguageAnalysis-SortingTextwithCustomRules]]
=== Sorting Text with Custom Rules
You can define your own set of sorting rules. It's easiest to take existing rules that are close to what you want and customize them.
@ -277,7 +268,6 @@ This rule set can now be used for custom collation in Solr:
strength="primary" />
----
[[LanguageAnalysis-JDKCollation]]
=== JDK Collation
As mentioned above, ICU Unicode Collation is better in several ways than JDK Collation, but if you cannot use ICU4J for some reason, you can use `solr.CollationField`.
@ -321,7 +311,6 @@ Using a Tailored ruleset:
== ASCII & Decimal Folding Filters
[[LanguageAnalysis-AsciiFolding]]
=== ASCII Folding
This filter converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists. Only those characters with reasonable ASCII alternatives are converted.
@ -348,7 +337,6 @@ This can increase recall by causing more matches. On the other hand, it can redu
*Out:* "Bjorn", "Angstrom"
[[LanguageAnalysis-DecimalDigitFolding]]
=== Decimal Digit Folding
This filter converts any character in the Unicode "Decimal Number" general category (`Nd`) into their equivalent Basic Latin digits (0-9).
@ -369,7 +357,6 @@ This can increase recall by causing more matches. On the other hand, it can redu
</analyzer>
----
[[LanguageAnalysis-Language-SpecificFactories]]
== Language-Specific Factories
These factories are each designed to work with specific languages. The languages covered here are:
@ -380,8 +367,8 @@ These factories are each designed to work with specific languages. The languages
* <<Catalan>>
* <<Traditional Chinese>>
* <<Simplified Chinese>>
* <<LanguageAnalysis-Czech,Czech>>
* <<LanguageAnalysis-Danish,Danish>>
* <<Czech>>
* <<Danish>>
* <<Dutch>>
* <<Finnish>>
@ -389,7 +376,7 @@ These factories are each designed to work with specific languages. The languages
* <<Galician>>
* <<German>>
* <<Greek>>
* <<LanguageAnalysis-Hebrew_Lao_Myanmar_Khmer,Hebrew, Lao, Myanmar, Khmer>>
* <<hebrew-lao-myanmar-khmer,Hebrew, Lao, Myanmar, Khmer>>
* <<Hindi>>
* <<Indonesian>>
* <<Italian>>
@ -410,7 +397,6 @@ These factories are each designed to work with specific languages. The languages
* <<Turkish>>
* <<Ukrainian>>
[[LanguageAnalysis-Arabic]]
=== Arabic
Solr provides support for the http://www.mtholyoke.edu/~lballest/Pubs/arab_stem05.pdf[Light-10] (PDF) stemming algorithm, and Lucene includes an example stopword list.
@ -432,7 +418,6 @@ This algorithm defines both character normalization and stemming, so these are s
</analyzer>
----
[[LanguageAnalysis-BrazilianPortuguese]]
=== Brazilian Portuguese
This is a Java filter written specifically for stemming the Brazilian dialect of the Portuguese language. It uses the Lucene class `org.apache.lucene.analysis.br.BrazilianStemmer`. Although that stemmer can be configured to use a list of protected words (which should not be stemmed), this factory does not accept any arguments to specify such a list.
@ -457,7 +442,6 @@ This is a Java filter written specifically for stemming the Brazilian dialect of
*Out:* "pra", "pra"
[[LanguageAnalysis-Bulgarian]]
=== Bulgarian
Solr includes a light stemmer for Bulgarian, following http://members.unine.ch/jacques.savoy/Papers/BUIR.pdf[this algorithm] (PDF), and Lucene includes an example stopword list.
@ -477,7 +461,6 @@ Solr includes a light stemmer for Bulgarian, following http://members.unine.ch/j
</analyzer>
----
[[LanguageAnalysis-Catalan]]
=== Catalan
Solr can stem Catalan using the Snowball Porter Stemmer with an argument of `language="Catalan"`. Solr includes a set of contractions for Catalan, which can be stripped using `solr.ElisionFilterFactory`.
@ -507,14 +490,13 @@ Solr can stem Catalan using the Snowball Porter Stemmer with an argument of `lan
*Out:* "llengu"(1), "llengu"(2)
[[LanguageAnalysis-TraditionalChinese]]
=== Traditional Chinese
The default configuration of the <<tokenizers.adoc#Tokenizers-ICUTokenizer,ICU Tokenizer>> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<lib-directives-in-solrconfig.adoc#lib-directives-in-solrconfig,Lib Directives in SolrConfig>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add to your `SOLR_HOME/lib`.
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<lib-directives-in-solrconfig.adoc#lib-directives-in-solrconfig,Lib Directives in SolrConfig>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add to your `SOLR_HOME/lib`.
<<tokenizers.adoc#Tokenizers-StandardTokenizer,Standard Tokenizer>> can also be used to tokenize Traditional Chinese text. Following the Word Break rules from the Unicode Text Segmentation algorithm, it produces one token per Chinese character. When combined with <<LanguageAnalysis-CJKBigramFilter,CJK Bigram Filter>>, overlapping bigrams of Chinese characters are formed.
<<tokenizers.adoc#standard-tokenizer,Standard Tokenizer>> can also be used to tokenize Traditional Chinese text. Following the Word Break rules from the Unicode Text Segmentation algorithm, it produces one token per Chinese character. When combined with <<CJK Bigram Filter>>, overlapping bigrams of Chinese characters are formed.
<<LanguageAnalysis-CJKWidthFilter,CJK Width Filter>> folds fullwidth ASCII variants into the equivalent Basic Latin forms.
<<CJK Width Filter>> folds fullwidth ASCII variants into the equivalent Basic Latin forms.
*Examples:*
@ -537,10 +519,9 @@ The default configuration of the <<tokenizers.adoc#Tokenizers-ICUTokenizer,ICU T
</analyzer>
----
[[LanguageAnalysis-CJKBigramFilter]]
=== CJK Bigram Filter
Forms bigrams (overlapping 2-character sequences) of CJK characters that are generated from <<tokenizers.adoc#Tokenizers-StandardTokenizer,Standard Tokenizer>> or <<tokenizers.adoc#Tokenizers-ICUTokenizer,ICU Tokenizer>>.
Forms bigrams (overlapping 2-character sequences) of CJK characters that are generated from <<tokenizers.adoc#standard-tokenizer,Standard Tokenizer>> or <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>>.
By default, all CJK characters produce bigrams, but finer grained control is available by specifying orthographic type arguments `han`, `hiragana`, `katakana`, and `hangul`. When set to `false`, characters of the corresponding type will be passed through as unigrams, and will not be included in any bigrams.
@ -560,18 +541,17 @@ In all cases, all non-CJK input is passed through unmodified.
`outputUnigrams`:: (true/false) If true, in addition to forming bigrams, all characters are also passed through as unigrams. Default is false.
See the example under <<LanguageAnalysis-TraditionalChinese,Traditional Chinese>>.
See the example under <<Traditional Chinese>>.
[[LanguageAnalysis-SimplifiedChinese]]
=== Simplified Chinese
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <<LanguageAnalysis-HMMChineseTokenizer,HMM Chinese Tokenizer>>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<lib-directives-in-solrconfig.adoc#lib-directives-in-solrconfig,Lib Directives in SolrConfig>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add to your `SOLR_HOME/lib`.
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <<HMM Chinese Tokenizer>>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<lib-directives-in-solrconfig.adoc#lib-directives-in-solrconfig,Lib Directives in SolrConfig>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add to your `SOLR_HOME/lib`.
The default configuration of the <<tokenizers.adoc#Tokenizers-ICUTokenizer,ICU Tokenizer>> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<lib-directives-in-solrconfig.adoc#lib-directives-in-solrconfig,Lib Directives in SolrConfig>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add to your `SOLR_HOME/lib`.
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<lib-directives-in-solrconfig.adoc#lib-directives-in-solrconfig,Lib Directives in SolrConfig>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add to your `SOLR_HOME/lib`.
Also useful for Chinese analysis:
<<LanguageAnalysis-CJKWidthFilter,CJK Width Filter>> folds fullwidth ASCII variants into the equivalent Basic Latin forms, and folds halfwidth Katakana variants into their equivalent fullwidth forms.
<<CJK Width Filter>> folds fullwidth ASCII variants into the equivalent Basic Latin forms, and folds halfwidth Katakana variants into their equivalent fullwidth forms.
*Examples:*
@ -598,7 +578,6 @@ Also useful for Chinese analysis:
</analyzer>
----
[[LanguageAnalysis-HMMChineseTokenizer]]
=== HMM Chinese Tokenizer
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, see `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add to your `solr_home/lib`.
@ -613,9 +592,8 @@ To use the default setup with fallback to English Porter stemmer for English wor
`<analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>`
Or to configure your own analysis setup, use the `solr.HMMChineseTokenizerFactory` along with your custom filter setup. See an example of this in the <<LanguageAnalysis-SimplifiedChinese,Simplified Chinese>> section.
Or to configure your own analysis setup, use the `solr.HMMChineseTokenizerFactory` along with your custom filter setup. See an example of this in the <<Simplified Chinese>> section.
[[LanguageAnalysis-Czech]]
=== Czech
Solr includes a light stemmer for Czech, following https://dl.acm.org/citation.cfm?id=1598600[this algorithm], and Lucene includes an example stopword list.
@ -641,12 +619,11 @@ Solr includes a light stemmer for Czech, following https://dl.acm.org/citation.c
*Out:* "preziden", "preziden", "preziden"
[[LanguageAnalysis-Danish]]
=== Danish
Solr can stem Danish using the Snowball Porter Stemmer with an argument of `language="Danish"`.
Also relevant are the <<LanguageAnalysis-Scandinavian,Scandinavian normalization filters>>.
Also relevant are the <<Scandinavian,Scandinavian normalization filters>>.
*Factory class:* `solr.SnowballPorterFilterFactory`
@ -671,8 +648,6 @@ Also relevant are the <<LanguageAnalysis-Scandinavian,Scandinavian normalization
*Out:* "undersøg"(1), "undersøg"(2)
[[LanguageAnalysis-Dutch]]
=== Dutch
Solr can stem Dutch using the Snowball Porter Stemmer with an argument of `language="Dutch"`.
@ -700,7 +675,6 @@ Solr can stem Dutch using the Snowball Porter Stemmer with an argument of `langu
*Out:* "kanal", "kanal"
[[LanguageAnalysis-Finnish]]
=== Finnish
Solr includes support for stemming Finnish, and Lucene includes an example stopword list.
@ -726,10 +700,8 @@ Solr includes support for stemming Finnish, and Lucene includes an example stopw
*Out:* "kala", "kala"
[[LanguageAnalysis-French]]
=== French
[[LanguageAnalysis-ElisionFilter]]
==== Elision Filter
Removes article elisions from a token stream. This filter can be useful for languages such as French, Catalan, Italian, and Irish.
@ -760,7 +732,6 @@ Removes article elisions from a token stream. This filter can be useful for lang
*Out:* "histoire", "art"
[[LanguageAnalysis-FrenchLightStemFilter]]
==== French Light Stem Filter
Solr includes three stemmers for French: one in the `solr.SnowballPorterFilterFactory`, a lighter stemmer called `solr.FrenchLightStemFilterFactory`, and an even less aggressive stemmer called `solr.FrenchMinimalStemFilterFactory`. Lucene includes an example stopword list.
@ -800,7 +771,6 @@ Solr includes three stemmers for French: one in the `solr.SnowballPorterFilterFa
*Out:* "le", "chat", "le", "chat"
[[LanguageAnalysis-Galician]]
=== Galician
Solr includes a stemmer for Galician following http://bvg.udc.es/recursos_lingua/stemming.jsp[this algorithm], and Lucene includes an example stopword list.
@ -826,8 +796,6 @@ Solr includes a stemmer for Galician following http://bvg.udc.es/recursos_lingua
*Out:* "feliz", "luz"
[[LanguageAnalysis-German]]
=== German
Solr includes four stemmers for German: one in the `solr.SnowballPorterFilterFactory language="German"`, a stemmer called `solr.GermanStemFilterFactory`, a lighter stemmer called `solr.GermanLightStemFilterFactory`, and an even less aggressive stemmer called `solr.GermanMinimalStemFilterFactory`. Lucene includes an example stopword list.
@ -868,8 +836,6 @@ Solr includes four stemmers for German: one in the `solr.SnowballPorterFilterFac
*Out:* "haus", "haus"
[[LanguageAnalysis-Greek]]
=== Greek
This filter converts uppercase letters in the Greek character set to the equivalent lowercase character.
@ -893,7 +859,6 @@ Use of custom charsets is no longer supported as of Solr 3.1. If you need to ind
</analyzer>
----
[[LanguageAnalysis-Hindi]]
=== Hindi
Solr includes support for stemming Hindi following http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf[this algorithm] (PDF), support for common spelling differences through the `solr.HindiNormalizationFilterFactory`, support for encoding differences through the `solr.IndicNormalizationFilterFactory` following http://ldc.upenn.edu/myl/IndianScriptsUnicode.html[this algorithm], and Lucene includes an example stopword list.
@ -914,8 +879,6 @@ Solr includes support for stemming Hindi following http://computing.open.ac.uk/S
</analyzer>
----
[[LanguageAnalysis-Indonesian]]
=== Indonesian
Solr includes support for stemming Indonesian (Bahasa Indonesia) following http://www.illc.uva.nl/Publications/ResearchReports/MoL-2003-02.text.pdf[this algorithm] (PDF), and Lucene includes an example stopword list.
@ -941,7 +904,6 @@ Solr includes support for stemming Indonesian (Bahasa Indonesia) following http:
*Out:* "bagai", "bagai"
[[LanguageAnalysis-Italian]]
=== Italian
Solr includes two stemmers for Italian: one in the `solr.SnowballPorterFilterFactory language="Italian"`, and a lighter stemmer called `solr.ItalianLightStemFilterFactory`. Lucene includes an example stopword list.
@ -969,7 +931,6 @@ Solr includes two stemmers for Italian: one in the `solr.SnowballPorterFilterFac
*Out:* "propag", "propag", "propag"
[[LanguageAnalysis-Irish]]
=== Irish
Solr can stem Irish using the Snowball Porter Stemmer with an argument of `language="Irish"`. Solr includes `solr.IrishLowerCaseFilterFactory`, which can handle Irish-specific constructs. Solr also includes a set of contractions for Irish which can be stripped using `solr.ElisionFilterFactory`.
@ -999,22 +960,20 @@ Solr can stem Irish using the Snowball Porter Stemmer with an argument of `langu
*Out:* "siopadóir", "síceapaite", "fearr", "athair"
[[LanguageAnalysis-Japanese]]
=== Japanese
Solr includes support for analyzing Japanese, via the Lucene Kuromoji morphological analyzer, which includes several analysis components - more details on each below:
* <<LanguageAnalysis-JapaneseIterationMarkCharFilter,`JapaneseIterationMarkCharFilter`>> normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.
* <<LanguageAnalysis-JapaneseTokenizer,`JapaneseTokenizer`>> tokenizes Japanese using morphological analysis, and annotates each term with part-of-speech, base form (a.k.a. lemma), reading and pronunciation.
* <<LanguageAnalysis-JapaneseBaseFormFilter,`JapaneseBaseFormFilter`>> replaces original terms with their base forms (a.k.a. lemmas).
* <<LanguageAnalysis-JapanesePartOfSpeechStopFilter,`JapanesePartOfSpeechStopFilter`>> removes terms that have one of the configured parts-of-speech.
* <<LanguageAnalysis-JapaneseKatakanaStemFilter,`JapaneseKatakanaStemFilter`>> normalizes common katakana spelling variations ending in a long sound character (U+30FC) by removing the long sound character.
* <<Japanese Iteration Mark CharFilter,`JapaneseIterationMarkCharFilter`>> normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.
* <<Japanese Tokenizer,`JapaneseTokenizer`>> tokenizes Japanese using morphological analysis, and annotates each term with part-of-speech, base form (a.k.a. lemma), reading and pronunciation.
* <<Japanese Base Form Filter,`JapaneseBaseFormFilter`>> replaces original terms with their base forms (a.k.a. lemmas).
* <<Japanese Part Of Speech Stop Filter,`JapanesePartOfSpeechStopFilter`>> removes terms that have one of the configured parts-of-speech.
* <<Japanese Katakana Stem Filter,`JapaneseKatakanaStemFilter`>> normalizes common katakana spelling variations ending in a long sound character (U+30FC) by removing the long sound character.
Also useful for Japanese analysis, from lucene-analyzers-common:
* <<LanguageAnalysis-CJKWidthFilter,`CJKWidthFilter`>> folds fullwidth ASCII variants into the equivalent Basic Latin forms, and folds halfwidth Katakana variants into their equivalent fullwidth forms.
* <<CJK Width Filter,`CJKWidthFilter`>> folds fullwidth ASCII variants into the equivalent Basic Latin forms, and folds halfwidth Katakana variants into their equivalent fullwidth forms.
[[LanguageAnalysis-JapaneseIterationMarkCharFilter]]
==== Japanese Iteration Mark CharFilter
Normalizes horizontal Japanese iteration marks (odoriji) to their expanded form. Vertical iteration marks are not supported.
@ -1027,7 +986,6 @@ Normalizes horizontal Japanese iteration marks (odoriji) to their expanded form.
`normalizeKana`:: set to `false` to not normalize kana iteration marks (default is `true`)
[[LanguageAnalysis-JapaneseTokenizer]]
==== Japanese Tokenizer
Tokenizer for Japanese that uses morphological analysis, and annotates each term with part-of-speech, base form (a.k.a. lemma), reading and pronunciation.
@ -1052,7 +1010,6 @@ For some applications it might be good to use `search` mode for indexing and `no
`discardPunctuation`:: set to `false` to keep punctuation, `true` to discard (the default)
[[LanguageAnalysis-JapaneseBaseFormFilter]]
==== Japanese Base Form Filter
Replaces original terms' text with the corresponding base form (lemma). (`JapaneseTokenizer` annotates each term with its base form.)
@ -1061,7 +1018,6 @@ Replaces original terms' text with the corresponding base form (lemma). (`Japane
(no arguments)
[[LanguageAnalysis-JapanesePartOfSpeechStopFilter]]
==== Japanese Part Of Speech Stop Filter
Removes terms with one of the configured parts-of-speech. `JapaneseTokenizer` annotates terms with parts-of-speech.
@ -1074,12 +1030,11 @@ Removes terms with one of the configured parts-of-speech. `JapaneseTokenizer` an
`enablePositionIncrements`:: if `luceneMatchVersion` is `4.3` or earlier and `enablePositionIncrements="false"`, no position holes will be left by this filter when it removes tokens. *This argument is invalid if `luceneMatchVersion` is `5.0` or later.*
[[LanguageAnalysis-JapaneseKatakanaStemFilter]]
==== Japanese Katakana Stem Filter
Normalizes common katakana spelling variations ending in a long sound character (U+30FC) by removing the long sound character.
<<LanguageAnalysis-CJKWidthFilter,`solr.CJKWidthFilterFactory`>> should be specified prior to this filter to normalize half-width katakana to full-width.
<<CJK Width Filter,`solr.CJKWidthFilterFactory`>> should be specified prior to this filter to normalize half-width katakana to full-width.
*Factory class:* `JapaneseKatakanaStemFilterFactory`
@ -1087,7 +1042,6 @@ Normalizes common katakana spelling variations ending in a long sound character
`minimumLength`:: terms below this length will not be stemmed. Default is 4, value must be 2 or more.
[[LanguageAnalysis-CJKWidthFilter]]
==== CJK Width Filter
Folds fullwidth ASCII variants into the equivalent Basic Latin forms, and folds halfwidth Katakana variants into their equivalent fullwidth forms.
@ -1115,14 +1069,13 @@ Example:
</fieldType>
----
[[LanguageAnalysis-Hebrew_Lao_Myanmar_Khmer]]
[[hebrew-lao-myanmar-khmer]]
=== Hebrew, Lao, Myanmar, Khmer
Lucene provides support, in addition to UAX#29 word break rules, for Hebrew's use of the double and single quote characters, and for segmenting Lao, Myanmar, and Khmer into syllables with the `solr.ICUTokenizerFactory` in the `analysis-extras` contrib module. To use this tokenizer, see `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add to your `solr_home/lib`.
See <<tokenizers.adoc#Tokenizers-ICUTokenizer,the ICUTokenizer>> for more information.
See <<tokenizers.adoc#icu-tokenizer,the ICUTokenizer>> for more information.
[[LanguageAnalysis-Latvian]]
=== Latvian
Solr includes support for stemming Latvian, and Lucene includes an example stopword list.
@ -1150,16 +1103,14 @@ Solr includes support for stemming Latvian, and Lucene includes an example stopw
*Out:* "tirg", "tirg"
[[LanguageAnalysis-Norwegian]]
=== Norwegian
Solr includes two classes for stemming Norwegian, `NorwegianLightStemFilterFactory` and `NorwegianMinimalStemFilterFactory`. Lucene includes an example stopword list.
Another option is to use the Snowball Porter Stemmer with an argument of `language="Norwegian"`.
Also relevant are the <<LanguageAnalysis-Scandinavian,Scandinavian normalization filters>>.
Also relevant are the <<Scandinavian,Scandinavian normalization filters>>.
[[LanguageAnalysis-NorwegianLightStemmer]]
==== Norwegian Light Stemmer
The `NorwegianLightStemFilterFactory` requires a "two-pass" sort for the -dom and -het endings. This means that in the first pass the word "kristendom" is stemmed to "kristen", and then all the general rules apply so it will be further stemmed to "krist". The effect of this is that "kristen," "kristendom," "kristendommen," and "kristendommens" will all be stemmed to "krist."
@ -1209,7 +1160,6 @@ The second pass is to pick up -dom and -het endings. Consider this example:
*Out:* "forelske"
[[LanguageAnalysis-NorwegianMinimalStemmer]]
==== Norwegian Minimal Stemmer
The `NorwegianMinimalStemFilterFactory` stems plural forms of Norwegian nouns only.
@ -1244,10 +1194,8 @@ The `NorwegianMinimalStemFilterFactory` stems plural forms of Norwegian nouns on
*Out:* "bil"
[[LanguageAnalysis-Persian]]
=== Persian
[[LanguageAnalysis-PersianFilterFactories]]
==== Persian Filter Factories
Solr includes support for normalizing Persian, and Lucene includes an example stopword list.
@ -1267,7 +1215,6 @@ Solr includes support for normalizing Persian, and Lucene includes an example st
</analyzer>
----
[[LanguageAnalysis-Polish]]
=== Polish
Solr provides support for Polish stemming with the `solr.StempelPolishStemFilterFactory`, and `solr.MorphologikFilterFactory` for lemmatization, in the `contrib/analysis-extras` module. The `solr.StempelPolishStemFilterFactory` component includes an algorithmic stemmer with tables for Polish. To use either of these filters, see `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add to your `solr_home/lib`.
@ -1308,7 +1255,6 @@ Note the lower case filter is applied _after_ the Morfologik stemmer; this is be
The Morfologik dictionary parameter value is a constant specifying which dictionary to choose. The dictionary resource must be named `path/to/_language_.dict` and have an associated `.info` metadata file. See http://morfologik.blogspot.com/[the Morfologik project] for details. If the dictionary attribute is not provided, the Polish dictionary is loaded and used by default.
[[LanguageAnalysis-Portuguese]]
=== Portuguese
Solr includes four stemmers for Portuguese: one in the `solr.SnowballPorterFilterFactory`, an alternative stemmer called `solr.PortugueseStemFilterFactory`, a lighter stemmer called `solr.PortugueseLightStemFilterFactory`, and an even less aggressive stemmer called `solr.PortugueseMinimalStemFilterFactory`. Lucene includes an example stopword list.
@ -1352,8 +1298,6 @@ Solr includes four stemmers for Portuguese: one in the `solr.SnowballPorterFilte
*Out:* "pra", "pra"
[[LanguageAnalysis-Romanian]]
=== Romanian
Solr can stem Romanian using the Snowball Porter Stemmer with an argument of `language="Romanian"`.
@ -1375,11 +1319,8 @@ Solr can stem Romanian using the Snowball Porter Stemmer with an argument of `la
</analyzer>
----
[[LanguageAnalysis-Russian]]
=== Russian
[[LanguageAnalysis-RussianStemFilter]]
==== Russian Stem Filter
Solr includes two stemmers for Russian: one in the `solr.SnowballPorterFilterFactory language="Russian"`, and a lighter stemmer called `solr.RussianLightStemFilterFactory`. Lucene includes an example stopword list.
@ -1399,11 +1340,9 @@ Solr includes two stemmers for Russian: one in the `solr.SnowballPorterFilterFac
</analyzer>
----
[[LanguageAnalysis-Scandinavian]]
=== Scandinavian
Scandinavian is a language group spanning three languages, <<LanguageAnalysis-Norwegian,Norwegian>>, <<LanguageAnalysis-Swedish,Swedish>> and <<LanguageAnalysis-Danish,Danish>>, which are very similar.

Scandinavian is a language group spanning three languages, <<Norwegian>>, <<Swedish>> and <<Danish>>, which are very similar.
Swedish å, ä, ö are in fact the same letters as Norwegian and Danish å, æ, ø and thus interchangeable when used between these languages. They are, however, folded differently when people type them on a keyboard lacking these characters.
@ -1413,7 +1352,6 @@ There are two filters for helping with normalization between Scandinavian langua
See also each language section for other relevant filters.
[[LanguageAnalysis-ScandinavianNormalizationFilter]]
==== Scandinavian Normalization Filter
This filter normalizes use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
@ -1441,7 +1379,6 @@ It's a semantically less destructive solution than `ScandinavianFoldingFilter`,
*Out:* "blåbærsyltetøj", "blåbærsyltetøj", "blåbærsyltetøj", "blabarsyltetoj"
[[LanguageAnalysis-ScandinavianFoldingFilter]]
==== Scandinavian Folding Filter
This filter folds Scandinavian characters åÅäæÄÆ\->a and öÖøØ\->o. It also discriminates against the use of double vowels aa, ae, ao, oe and oo, leaving just the first one.
@ -1469,10 +1406,8 @@ It's a semantically more destructive solution than `ScandinavianNormalizationFil
*Out:* "blabarsyltetoj", "blabarsyltetoj", "blabarsyltetoj", "blabarsyltetoj"
[[LanguageAnalysis-Serbian]]
=== Serbian
[[LanguageAnalysis-SerbianNormalizationFilter]]
==== Serbian Normalization Filter
Solr includes a filter that normalizes Serbian Cyrillic and Latin characters. Note that this filter only works with lowercased input.
@ -1499,7 +1434,6 @@ See the Solr wiki for tips & advice on using this filter: https://wiki.apache.or
</analyzer>
----
[[LanguageAnalysis-Spanish]]
=== Spanish
Solr includes two stemmers for Spanish: one in the `solr.SnowballPorterFilterFactory language="Spanish"`, and a lighter stemmer called `solr.SpanishLightStemFilterFactory`. Lucene includes an example stopword list.
@ -1526,15 +1460,13 @@ Solr includes two stemmers for Spanish: one in the `solr.SnowballPorterFilterFac
*Out:* "tor", "tor", "tor"
[[LanguageAnalysis-Swedish]]
=== Swedish
[[LanguageAnalysis-SwedishStemFilter]]
==== Swedish Stem Filter
Solr includes two stemmers for Swedish: one in the `solr.SnowballPorterFilterFactory language="Swedish"`, and a lighter stemmer called `solr.SwedishLightStemFilterFactory`. Lucene includes an example stopword list.
Also relevant are the <<LanguageAnalysis-Scandinavian,Scandinavian normalization filters>>.
Also relevant are the <<Scandinavian,Scandinavian normalization filters>>.
*Factory class:* `solr.SwedishStemFilterFactory`
@ -1557,8 +1489,6 @@ Also relevant are the <<LanguageAnalysis-Scandinavian,Scandinavian normalization
*Out:* "klok", "klok", "klok"
[[LanguageAnalysis-Thai]]
=== Thai
This filter converts sequences of Thai characters into individual Thai words. Unlike European languages, Thai does not use whitespace to delimit words.
@ -1577,7 +1507,6 @@ This filter converts sequences of Thai characters into individual Thai words. Un
</analyzer>
----
[[LanguageAnalysis-Turkish]]
=== Turkish
Solr includes support for stemming Turkish with the `solr.SnowballPorterFilterFactory`; support for case-insensitive search with the `solr.TurkishLowerCaseFilterFactory`; support for stripping apostrophes and following suffixes with `solr.ApostropheFilterFactory` (see http://www.ipcsit.com/vol57/015-ICNI2012-M021.pdf[Role of Apostrophes in Turkish Information Retrieval]); support for a form of stemming that truncates tokens at a configurable maximum length through the `solr.TruncateTokenFilterFactory` (see http://www.users.muohio.edu/canf/papers/JASIST2008offPrint.pdf[Information Retrieval on Turkish Texts]); and Lucene includes an example stopword list.
@ -1613,10 +1542,6 @@ Solr includes support for stemming Turkish with the `solr.SnowballPorterFilterFa
</analyzer>
----
[[LanguageAnalysis-BacktoTop#main]]
===
[[LanguageAnalysis-Ukrainian]]
=== Ukrainian
Solr provides support for Ukrainian lemmatization with the `solr.MorphologikFilterFactory`, in the `contrib/analysis-extras` module. To use this filter, see `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add to your `solr_home/lib`.

View File

@ -22,21 +22,17 @@ With the *Learning To Rank* (or *LTR* for short) contrib module you can configur
The module also supports feature extraction inside Solr. The only thing you need to do outside Solr is train your own ranking model.
[[LearningToRank-Concepts]]
== Concepts
== Learning to Rank Concepts
[[LearningToRank-Re-Ranking]]
=== Re-Ranking
Re-Ranking allows you to run a simple query for matching documents and then re-rank the top N documents using the scores from a different, complex query. This page describes the use of *LTR* complex queries, information on other rank queries included in the Solr distribution can be found on the <<query-re-ranking.adoc#query-re-ranking,Query Re-Ranking>> page.
Re-Ranking allows you to run a simple query for matching documents and then re-rank the top N documents using the scores from a different, more complex query. This page describes the use of *LTR* complex queries, information on other rank queries included in the Solr distribution can be found on the <<query-re-ranking.adoc#query-re-ranking,Query Re-Ranking>> page.
[[LearningToRank-LearningToRank]]
=== Learning To Rank
=== Learning To Rank Models
In information retrieval systems, https://en.wikipedia.org/wiki/Learning_to_rank[Learning to Rank] is used to re-rank the top N retrieved documents using trained machine learning models. The hope is that such sophisticated models can make more nuanced ranking decisions than standard ranking functions like https://en.wikipedia.org/wiki/Tf%E2%80%93idf[TF-IDF] or https://en.wikipedia.org/wiki/Okapi_BM25[BM25].
[[LearningToRank-Model]]
==== Model
==== Ranking Model
A ranking model computes the scores used to rerank documents. Irrespective of any particular algorithm or implementation, a ranking model's computation can use three types of inputs:
@ -44,27 +40,23 @@ A ranking model computes the scores used to rerank documents. Irrespective of an
* features that represent the document being scored
* features that represent the query for which the document is being scored
[[LearningToRank-Feature]]
==== Feature
A feature is a value, a number, that represents some quantity or quality of the document being scored or of the query for which documents are being scored. For example, documents often have a 'recency' quality, and 'number of past purchases' might be a quantity that is passed to Solr as part of the search query.
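For illustration, a feature is declared to Solr as a small JSON object naming a feature class and its parameters. In the sketch below the feature names and the recency query are made-up examples, while the `SolrFeature` and `OriginalScoreFeature` classes are those listed under Feature Engineering below.

[source,json]
----
[
  {
    "name"   : "documentRecency",
    "class"  : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : { "q" : "{!func}recip(ms(NOW,last_modified),3.16e-11,1,1)" }
  },
  {
    "name"   : "originalScore",
    "class"  : "org.apache.solr.ltr.feature.OriginalScoreFeature",
    "params" : {}
  }
]
----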
[[LearningToRank-Normalizer]]
==== Normalizer
Some ranking models expect features on a particular scale. A normalizer can be used to translate arbitrary feature values into normalized values e.g. on a 0..1 or 0..100 scale.
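Within a model definition, a normalizer is attached to an individual feature. The following is only a sketch: it assumes the `MinMaxNormalizer` that ships with the module, and the feature name is illustrative.

[source,json]
----
{
  "name" : "documentRecency",
  "norm" : {
    "class"  : "org.apache.solr.ltr.norm.MinMaxNormalizer",
    "params" : { "min" : "0.0", "max" : "1.0" }
  }
}
----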
[[LearningToRank-Training]]
=== Training
=== Training Models
[[LearningToRank-Featureengineering]]
==== Feature engineering
==== Feature Engineering
The LTR contrib module includes several feature classes as well as support for custom features. Each feature class's javadocs contain an example to illustrate use of that class. The process of https://en.wikipedia.org/wiki/Feature_engineering[feature engineering] itself is then entirely up to your domain expertise and creativity.
[cols=",,,",options="header",]
|===
|Feature |Class |Example parameters |<<LearningToRank-ExternalFeatureInformation,External Feature Information>>
|Feature |Class |Example parameters |<<External Feature Information>>
|field length |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/FieldLengthFeature.html[FieldLengthFeature] |`{"field":"title"}` |not (yet) supported
|field value |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/FieldValueFeature.html[FieldValueFeature] |`{"field":"hits"}` |not (yet) supported
|original score |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/OriginalScoreFeature.html[OriginalScoreFeature] |`{}` |not applicable
@ -84,12 +76,10 @@ The LTR contrib module includes several feature classes as well as support for c
|(custom) |(custom class extending {solr-javadocs}/solr-ltr/org/apache/solr/ltr/norm/Normalizer.html[Normalizer]) |
|===
[[LearningToRank-Featureextraction]]
==== Feature Extraction
The ltr contrib module includes a <<transforming-result-documents.adoc#transforming-result-documents,[features>> transformer] to support the calculation and return of feature values for https://en.wikipedia.org/wiki/Feature_extraction[feature extraction] purposes including and especially when you do not yet have an actual reranking model.
[[LearningToRank-Featureselectionandmodeltraining]]
==== Feature Selection and Model Training
Feature selection and model training take place offline and outside Solr. The ltr contrib module supports two generalized forms of models as well as custom models. Each model class's javadocs contain an example to illustrate configuration of that class. In the form of JSON files your trained model or models (e.g. different models for different customer geographies) can then be directly uploaded into Solr using provided REST APIs.
@ -102,8 +92,7 @@ Feature selection and model training take place offline and outside Solr. The lt
|(custom) |(custom class extending {solr-javadocs}/solr-ltr/org/apache/solr/ltr/model/LTRScoringModel.html[LTRScoringModel]) |(not applicable)
|===
[[LearningToRank-QuickStartExample]]
== Quick Start Example
== Quick Start with LTR
The `"techproducts"` example included with Solr is pre-configured with the plugins required for learning-to-rank, but they are disabled by default.
@ -114,7 +103,6 @@ To enable the plugins, please specify the `solr.ltr.enabled` JVM System Property
bin/solr start -e techproducts -Dsolr.ltr.enabled=true
----
[[LearningToRank-Uploadingfeatures]]
=== Uploading Features
To upload features in a `/path/myFeatures.json` file, please run:
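As a sketch, the upload is an HTTP PUT of the JSON file to the collection's feature-store endpoint; adjust the host, collection name, and file path as needed.

[source,bash]
----
# Upload the feature definitions to the techproducts feature store endpoint
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' \
  --data-binary "@/path/myFeatures.json" \
  -H 'Content-type:application/json'
----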
@ -154,7 +142,6 @@ To view the features you just uploaded please open the following URL in a browse
]
----
[[LearningToRank-Extractingfeatures]]
=== Extracting Features
To extract features as part of a query, add `[features]` to the `fl` parameter, for example:
@ -184,7 +171,6 @@ The output XML will include feature values as a comma-separated list, resembling
}}
----
[[LearningToRank-Uploadingamodel]]
=== Uploading a Model
To upload the model in a `/path/myModel.json` file, please run:
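As with features, the model is uploaded with an HTTP PUT, this time to the model-store endpoint; the sketch below assumes the techproducts collection and a local Solr instance.

[source,bash]
----
# Upload the model definition to the techproducts model store endpoint
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' \
  --data-binary "@/path/myModel.json" \
  -H 'Content-type:application/json'
----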
@ -219,7 +205,6 @@ To view the model you just uploaded please open the following URL in a browser:
}
----
[[LearningToRank-Runningarerankquery]]
=== Running a Rerank Query
To rerank the results of a query, add the `rq` parameter to your search, for example:
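A sketch of such a query, assuming the `myModel` model uploaded above and reranking the top 100 documents via the `reRankDocs` parameter:

[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel reRankDocs=100}&fl=id,score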
@ -258,12 +243,10 @@ The output XML will include feature values as a comma-separated list, resembling
}}
----
[[LearningToRank-ExternalFeatureInformation]]
=== External Feature Information
The {solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/ValueFeature.html[ValueFeature] and {solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/SolrFeature.html[SolrFeature] classes support the use of external feature information, `efi` for short.
[[LearningToRank-Uploadingfeatures.1]]
==== Uploading Features
To upload features in a `/path/myEfiFeatures.json` file, please run:
@ -308,9 +291,8 @@ To view the features you just uploaded please open the following URL in a browse
]
----
As an aside, you may have noticed that the `myEfiFeatures.json` example uses `"store":"myEfiFeatureStore"` attributes: read more about feature `store` in the <<Lifecycle>> section of this page.
As an aside, you may have noticed that the `myEfiFeatures.json` example uses `"store":"myEfiFeatureStore"` attributes: read more about feature `store` in the <<LTR Lifecycle>> section of this page.
[[LearningToRank-Extractingfeatures.1]]
==== Extracting Features
To extract `myEfiFeatureStore` features as part of a query, add `efi.*` parameters to the `[features]` part of the `fl` parameter, for example:
@ -321,7 +303,6 @@ http://localhost:8983/solr/techproducts/query?q=test&fl=id,cat,manu,score,[featu
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&fl=id,cat,manu,score,[features store=myEfiFeatureStore efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=0 efi.answer=13]
[[LearningToRank-Uploadingamodel.1]]
==== Uploading a Model
To upload the model in a `/path/myEfiModel.json` file, please run:
@ -359,7 +340,6 @@ To view the model you just uploaded please open the following URL in a browser:
}
----
[[LearningToRank-Runningarerankquery.1]]
==== Running a Rerank Query
To obtain the feature values computed during reranking, add `[features]` to the `fl` parameter and `efi.*` parameters to the `rq` parameter, for example:
@ -368,39 +348,34 @@ To obtain the feature values computed during reranking, add `[features]` to the
http://localhost:8983/solr/techproducts/query?q=test&rq=\{!ltr model=myEfiModel efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=1}&fl=id,cat,manu,score,[features]] link:[]
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq=\{!ltr model=myEfiModel efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=0 efi.answer=13}&fl=id,cat,manu,score,[features]]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myEfiModel efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=0 efi.answer=13}&fl=id,cat,manu,score,[features]
Notice the absence of `efi.*` parameters in the `[features]` part of the `fl` parameter.
[[LearningToRank-Extractingfeatureswhilstreranking]]
==== Extracting Features While Reranking
To extract features for `myEfiFeatureStore` features while still reranking with `myModel`:
[source,text]
http://localhost:8983/solr/techproducts/query?q=test&rq=\{!ltr model=myModel}&fl=id,cat,manu,score,[features store=myEfiFeatureStore efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=1]] link:[]
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel}&fl=id,cat,manu,score,[features store=myEfiFeatureStore efi.text=test efi.preferredManufacturer=Apache efi.fromMobile=1]
Notice the absence of `efi.*` parameters in the `rq` parameter (because `myModel` does not use `efi` features) and the presence of `efi.*` parameters in the `[features]` part of the `fl` parameter (because `myEfiFeatureStore` contains `efi` features).

Notice the absence of `efi.\*` parameters in the `rq` parameter (because `myModel` does not use `efi` features) and the presence of `efi.*` parameters in the `[features]` part of the `fl` parameter (because `myEfiFeatureStore` contains `efi` features).
Read more about model evolution in the <<Lifecycle>> section of this page.
Read more about model evolution in the <<LTR Lifecycle>> section of this page.
[[LearningToRank-Trainingexample]]
=== Training Example
Example training data and a demo 'train and upload model' script can be found in the `solr/contrib/ltr/example` folder in the https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git[Apache lucene-solr git repository] which is mirrored on https://github.com/apache/lucene-solr/tree/releases/lucene-solr/6.4.0/solr/contrib/ltr/example[github.com] (the `solr/contrib/ltr/example` folder is not shipped in the solr binary release).
[[LearningToRank-Installation]]
== Installation
== Installation of LTR
The ltr contrib module requires the `dist/solr-ltr-*.jar` JARs.
[[LearningToRank-Configuration]]
== Configuration
== LTR Configuration
Learning-To-Rank is a contrib module and therefore its plugins must be configured in `solrconfig.xml`.
[[LearningToRank-Minimumrequirements]]
=== Minimum requirements
=== Minimum Requirements
* Include the required contrib JARs. Note that by default paths are relative to the Solr core so they may need adjustments to your configuration, or an explicit specification of the `$solr.install.dir`.
+
@ -437,15 +412,12 @@ Learning-To-Rank is a contrib module and therefore its plugins must be configure
</transformer>
----
[[LearningToRank-Advancedoptions]]
=== Advanced Options
[[LearningToRank-LTRThreadModule]]
==== LTRThreadModule
A thread module can be configured for the query parser and/or the transformer to parallelize the creation of feature weights. For details, please refer to the {solr-javadocs}/solr-ltr/org/apache/solr/ltr/LTRThreadModule.html[LTRThreadModule] javadocs.
[[LearningToRank-Featurevectorcustomization]]
==== Feature Vector Customization
The features transformer returns dense CSV values such as `featureA=0.1,featureB=0.2,featureC=0.3,featureD=0.0`.
@ -462,7 +434,6 @@ For sparse CSV output such as `featureA:0.1 featureB:0.2 featureC:0.3` you can c
</transformer>
----
[[LearningToRank-Implementationandcontributions]]
==== Implementation and Contributions
.How does Solr Learning-To-Rank work under the hood?
@ -481,10 +452,8 @@ Contributions for further models, features and normalizers are welcome. Related
* http://wiki.apache.org/lucene-java/HowToContribute
====
[[LearningToRank-Lifecycle]]
== Lifecycle
== LTR Lifecycle
[[LearningToRank-Featurestores]]
=== Feature Stores
It is recommended that you organise all your features into stores which are akin to namespaces:
@ -501,7 +470,6 @@ To inspect the content of the `commonFeatureStore` feature store:
`\http://localhost:8983/solr/techproducts/schema/feature-store/commonFeatureStore`
[[LearningToRank-Models]]
=== Models
* A model uses features from exactly one feature store.
@ -537,13 +505,11 @@ To delete the `currentFeatureStore` feature store:
curl -XDELETE 'http://localhost:8983/solr/techproducts/schema/feature-store/currentFeatureStore'
----
[[LearningToRank-Applyingchanges]]
=== Applying Changes
The feature store and the model store are both <<managed-resources.adoc#managed-resources,Managed Resources>>. Changes made to managed resources are not applied to the active Solr components until the Solr collection (or Solr core in single server mode) is reloaded.
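For example, after uploading or changing features and models for the `techproducts` collection, a reload applies them. The sketch below uses the Collections API; in single-server mode the Core Admin API's RELOAD action would be used instead.

[source,bash]
----
# Reload the collection so newly uploaded features and models become active
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=techproducts"
----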
[[LearningToRank-Examples]]
=== Examples
=== LTR Examples
==== One Feature Store, Multiple Ranking Models
@ -628,7 +594,6 @@ The feature store and the model store are both <<managed-resources.adoc#managed-
}
----
[[LearningToRank-Modelevolution]]
==== Model Evolution
* `linearModel201701` uses features from `featureStore201701`
@ -752,8 +717,7 @@ The feature store and the model store are both <<managed-resources.adoc#managed-
}
----
[[LearningToRank-AdditionalResources]]
== Additional Resources
== Additional LTR Resources
* "Learning to Rank in Solr" presentation at Lucene/Solr Revolution 2015 in Austin:
** Slides: http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp

View File

@ -32,7 +32,6 @@ We can prefix this query string with local parameters to provide more informatio
These local parameters would change the query to require a match on both "solr" and "rocks" while searching the "title" field by default.
[[LocalParametersinQueries-BasicSyntaxofLocalParameters]]
== Basic Syntax of Local Parameters
To specify a local parameter, insert the following before the argument to be modified:
@ -45,7 +44,6 @@ To specify a local parameter, insert the following before the argument to be mod
You may specify only one local parameters prefix per argument. Values in the key-value pairs may be quoted via single or double quotes, and backslash escaping works within quoted strings.
[[LocalParametersinQueries-QueryTypeShortForm]]
== Query Type Short Form
If a local parameter value appears without a name, it is given the implicit name of "type". This allows short-form representation for the type of query parser to use when parsing a query string. Thus
@ -74,7 +72,6 @@ is equivalent to
`q={!type=dismax qf=myfield v='solr rocks'}`
[[LocalParametersinQueries-ParameterDereferencing]]
== Parameter Dereferencing
Parameter dereferencing, or indirection, lets you use the value of another argument rather than specifying it directly. This can be used to simplify queries, decouple user input from query parameters, or decouple front-end GUI parameters from defaults set in `solrconfig.xml`.
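For instance, the following two queries are equivalent; the second pulls the query text from a separate `qq` parameter instead of embedding it directly (a sketch using the dismax parser):

[source,text]
----
q={!dismax qf=myfield}solr rocks

q={!dismax qf=myfield v=$qq}&qq=solr rocks
----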

View File

@ -27,7 +27,6 @@ image::images/logging/logging.png[image,width=621,height=250]
While this example shows logged messages for only one core, if you have multiple cores in a single instance, they will each be listed, with the level for each.
[[Logging-SelectingaLoggingLevel]]
== Selecting a Logging Level
When you select the *Level* link on the left, you see the hierarchy of classpaths and classnames for your instance. A row highlighted in yellow indicates that the class has logging capabilities. Click on a highlighted row, and a menu will appear to allow you to change the log level for that class. Characters in boldface indicate that the class will not be affected by level changes to root.

View File

@ -46,9 +46,9 @@ Built on streaming expressions, new in Solr 6 is a <<parallel-sql-interface.adoc
Replication across data centers is now possible with <<cross-data-center-replication-cdcr.adoc#cross-data-center-replication-cdcr,Cross Data Center Replication>>. Using an active-passive model, a SolrCloud cluster can be replicated to another data center, and monitored with a new API.
=== Graph Query Parser
=== Graph QueryParser
A new <<other-parsers.adoc#OtherParsers-GraphQueryParser,`graph` query parser>> makes it possible to do graph traversal queries of Directed (Cyclic) Graphs modelled using Solr documents.

A new <<other-parsers.adoc#graph-query-parser,`graph` query parser>> makes it possible to do graph traversal queries of Directed (Cyclic) Graphs modelled using Solr documents.
[[major-5-6-docvalues]]
=== DocValues

View File

@ -28,12 +28,12 @@ Support for backups when running SolrCloud is provided with the <<collections-ap
Two commands are available:
* `action=BACKUP`: This command backs up Solr indexes and configurations. More information is available in the section <<collections-api.adoc#CollectionsAPI-backup,Backup Collection>>.
* `action=RESTORE`: This command restores Solr indexes and configurations. More information is available in the section <<collections-api.adoc#CollectionsAPI-restore,Restore Collection>>.
* `action=BACKUP`: This command backs up Solr indexes and configurations. More information is available in the section <<collections-api.adoc#backup,Backup Collection>>.
* `action=RESTORE`: This command restores Solr indexes and configurations. More information is available in the section <<collections-api.adoc#restore,Restore Collection>>.
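As an illustration of the `BACKUP` action above, the call below backs up a collection named `techproducts`; the backup name and location are placeholders, and the location must be a path (or repository) accessible from every node in the cluster.

[source,bash]
----
# Back up a collection; the location must be visible to all nodes
curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=myBackupName&collection=techproducts&location=/path/to/shared/backups"
----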
== Standalone Mode Backups
Backups and restoration use Solr's replication handler. Out of the box, Solr includes implicit support for replication so this API can be used. Configuration of the replication handler can, however, be customized by defining your own replication handler in `solrconfig.xml`. For details on configuring the replication handler, see the section <<index-replication.adoc#IndexReplication-ConfiguringtheReplicationHandler,Configuring the ReplicationHandler>>.

Backups and restoration use Solr's replication handler. Out of the box, Solr includes implicit support for replication so this API can be used. Configuration of the replication handler can, however, be customized by defining your own replication handler in `solrconfig.xml`. For details on configuring the replication handler, see the section <<index-replication.adoc#configuring-the-replicationhandler,Configuring the ReplicationHandler>>.
=== Backup API
@ -58,7 +58,7 @@ The path where the backup will be created. If the path is not absolute then the
|name |The snapshot will be created in a directory called `snapshot.<name>`. If a name is not specified then the directory name would have the following format: `snapshot.<yyyyMMddHHmmssSSS>`.
`numberToKeep`::
The number of backups to keep. If `maxNumberOfBackups` has been specified on the replication handler in `solrconfig.xml`, `maxNumberOfBackups` is always used and attempts to use `numberToKeep` will cause an error. Also, this parameter is not taken into consideration if the backup name is specified. More information about `maxNumberOfBackups` can be found in the section <<index-replication.adoc#IndexReplication-ConfiguringtheReplicationHandler,Configuring the ReplicationHandler>>.
The number of backups to keep. If `maxNumberOfBackups` has been specified on the replication handler in `solrconfig.xml`, `maxNumberOfBackups` is always used and attempts to use `numberToKeep` will cause an error. Also, this parameter is not taken into consideration if the backup name is specified. More information about `maxNumberOfBackups` can be found in the section <<index-replication.adoc#configuring-the-replicationhandler,Configuring the ReplicationHandler>>.
`repository`::
The name of the repository to be used for the backup. If no repository is specified then the local filesystem repository will be used automatically.
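Putting these parameters together, a standalone backup request against the replication handler might look like the following sketch (the core name and location are illustrative):

[source,text]
----
http://localhost:8983/solr/gettingstarted/replication?command=backup&name=myBackup&location=/path/to/backup/dir
----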

View File

@ -33,15 +33,13 @@ All of the examples in this section assume you are running the "techproducts" So
bin/solr -e techproducts
----
[[ManagedResources-Overview]]
== Overview
== Managed Resources Overview
Let's begin learning about managed resources by looking at a couple of examples provided by Solr for managing stop words and synonyms using a REST API. After reading this section, you'll be ready to dig into the details of how managed resources are implemented in Solr so you can start building your own implementation.
[[ManagedResources-Stopwords]]
=== Stop Words
=== Managing Stop Words
To begin, you need to define a field type that uses the <<filter-descriptions.adoc#FilterDescriptions-ManagedStopFilter,ManagedStopFilterFactory>>, such as:
To begin, you need to define a field type that uses the <<filter-descriptions.adoc#managed-stop-filter,ManagedStopFilterFactory>>, such as:
[source,xml,subs="verbatim,callouts"]
----
@ -56,7 +54,7 @@ To begin, you need to define a field type that uses the <<filter-descriptions.ad
There are two important things to notice about this field type definition:
<1> The filter implementation class is `solr.ManagedStopFilterFactory`. This is a special implementation of the <<filter-descriptions.adoc#FilterDescriptions-StopFilter,StopFilterFactory>> that uses a set of stop words that are managed from a REST API.
<1> The filter implementation class is `solr.ManagedStopFilterFactory`. This is a special implementation of the <<filter-descriptions.adoc#stop-filter,StopFilterFactory>> that uses a set of stop words that are managed from a REST API.
<2> The `managed="english"` attribute gives a name to the set of managed stop words, in this case indicating the stop words are for English text.
@ -134,8 +132,7 @@ curl -X DELETE "http://localhost:8983/solr/techproducts/schema/analysis/stopword
NOTE: PUT/POST is used to add terms to an existing list instead of replacing the list entirely. This is because it is more common to add a term to an existing list than it is to replace a list altogether, so the API favors the more common approach of incrementally adding terms especially since deleting individual terms is also supported.
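Conversely, adding a new stop word is a PUT (or POST) of a JSON array to the same endpoint. A minimal sketch (the term is arbitrary):

[source,bash]
----
curl -X PUT -H 'Content-type:application/json' --data-binary '["foo"]' \
  "http://localhost:8983/solr/techproducts/schema/analysis/stopwords/english"
----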
[[ManagedResources-Synonyms]]
=== Synonyms
=== Managing Synonyms
For the most part, the API for managing synonyms behaves similar to the API for stop words, except instead of working with a list of words, it uses a map, where the value for each entry in the map is a set of synonyms for a term. As with stop words, the `sample_techproducts_configs` <<config-sets.adoc#config-sets,configset>> includes a pre-built set of synonym mappings suitable for the sample data that is activated by the following field type definition in schema.xml:
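As a sketch of what that looks like in practice (the mapping shown is arbitrary), a new synonym mapping can be PUT to the managed endpoint as a JSON object whose value is the list of synonyms:

[source,bash]
----
curl -X PUT -H 'Content-type:application/json' --data-binary '{"mad":["angry","upset"]}' \
  "http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english"
----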
@ -209,8 +206,7 @@ Note that the expansion is performed when processing the PUT request so the unde
Lastly, you can delete a mapping by sending a DELETE request to the managed endpoint.
[[ManagedResources-ApplyingChanges]]
== Applying Changes
== Applying Managed Resource Changes
Changes made to managed resources via this REST API are not applied to the active Solr components until the Solr collection (or Solr core in single server mode) is reloaded.
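For example, a collection can be reloaded with the Collections API so the updated resources take effect (a sketch; in single-server mode the equivalent Core Admin `RELOAD` command applies):

[source,bash]
----
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=techproducts"
----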
@ -227,7 +223,6 @@ However, the intent of this API implementation is that changes will be applied u
Changing things like stop words and synonym mappings typically requires re-indexing existing documents if they are used by index-time analyzers. The RestManager framework does not guard you from this; it simply makes it possible to programmatically build up a set of stop words, synonyms, etc.
====
[[ManagedResources-RestManagerEndpoint]]
== RestManager Endpoint
Metadata about registered ManagedResources is available using the `/schema/managed` endpoint for each collection.
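For instance, a GET on that endpoint for the `techproducts` collection lists the registered resources (a sketch):

[source,bash]
----
curl "http://localhost:8983/solr/techproducts/schema/managed"
----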

View File

@ -34,8 +34,7 @@ Specifies whether statistics are returned with results. You can override the `st
`wt`::
The output format. This operates the same as the <<response-writers.adoc#response-writers,`wt` parameter in a query>>. The default is `xml`.
[[MBeanRequestHandler-Examples]]
== Examples
== MBeanRequestHandler Examples
The following examples assume you are running Solr's `techproducts` example configuration:
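One such request, asking for statistics from all registered MBeans, might look like this sketch (`/admin/mbeans` is the handler's default path):

[source,text]
----
http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&wt=json
----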

View File

@ -27,7 +27,6 @@ To merge indexes, they must meet these requirements:
Optimally, the two indexes should be built using the same schema.
[[MergingIndexes-UsingIndexMergeTool]]
== Using IndexMergeTool
To merge the indexes, do the following:
@ -43,9 +42,8 @@ java -cp $SOLR/server/solr-webapp/webapp/WEB-INF/lib/lucene-core-VERSION.jar:$SO
This will create a new index at `/path/to/newindex` that contains both index1 and index2.
. Copy this new directory to the location of your application's solr index (move the old one aside first, of course) and start Solr.
[[MergingIndexes-UsingCoreAdmin]]
== Using CoreAdmin
The `MERGEINDEXES` command of the <<coreadmin-api.adoc#CoreAdminAPI-MERGEINDEXES,CoreAdminHandler>> can be used to merge indexes into a new core either from one or more arbitrary `indexDir` directories or by merging from one or more existing `srcCore` core names.
The `MERGEINDEXES` command of the <<coreadmin-api.adoc#coreadmin-mergeindexes,CoreAdminHandler>> can be used to merge indexes into a new core either from one or more arbitrary `indexDir` directories or by merging from one or more existing `srcCore` core names.
See the <<coreadmin-api.adoc#CoreAdminAPI-MERGEINDEXES,CoreAdminHandler>> section for details.
See the <<coreadmin-api.adoc#coreadmin-mergeindexes,CoreAdminHandler>> section for details.
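As a rough sketch, a core-level merge request over two index directories might look like this (the target core name and directory paths are illustrative):

[source,text]
----
http://localhost:8983/solr/admin/cores?action=MERGEINDEXES&core=new_core&indexDir=/path/to/index1&indexDir=/path/to/index2
----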

View File

@ -28,7 +28,6 @@ The second is to use it as a search component. This is less desirable since it p
The final approach is to use it as a request handler but with externally supplied text. This case, also referred to as the MoreLikeThisHandler, will supply information about similar documents in the index based on the text of the input document.
[[MoreLikeThis-HowMoreLikeThisWorks]]
== How MoreLikeThis Works
`MoreLikeThis` constructs a Lucene query based on terms in a document. It does this by pulling terms from the defined list of fields (see the `mlt.fl` parameter, below). For best results, the fields should have stored term vectors in `schema.xml`. For example:
@ -42,7 +41,6 @@ If term vectors are not stored, `MoreLikeThis` will generate terms from stored f
The next phase filters terms from the original document using thresholds defined with the MoreLikeThis parameters. Finally, a query is run with these terms, and any other query parameters that have been defined (see the `mlt.qf` parameter, below) and a new document set is returned.
[[MoreLikeThis-CommonParametersforMoreLikeThis]]
== Common Parameters for MoreLikeThis
The table below summarizes the `MoreLikeThis` parameters supported by Lucene/Solr. These parameters can be used with any of the three possible MoreLikeThis approaches.
@ -77,8 +75,6 @@ Specifies if the query will be boosted by the interesting term relevance. It can
`mlt.qf`::
Query fields and their boosts using the same format as that used by the <<the-dismax-query-parser.adoc#the-dismax-query-parser,DisMax Query Parser>>. These fields must also be specified in `mlt.fl`.
[[MoreLikeThis-ParametersfortheMoreLikeThisComponent]]
== Parameters for the MoreLikeThisComponent
Using MoreLikeThis as a search component returns similar documents for each document in the response set. In addition to the common parameters, these additional options are available:
@ -89,8 +85,6 @@ If set to `true`, activates the `MoreLikeThis` component and enables Solr to ret
`mlt.count`::
Specifies the number of similar documents to be returned for each result. The default value is 5.
[[MoreLikeThis-ParametersfortheMoreLikeThisHandler]]
== Parameters for the MoreLikeThisHandler
The table below summarizes parameters accessible through the `MoreLikeThisHandler`. It supports faceting, paging, and filtering using common query parameters, but does not work well with alternate query parsers.
@ -105,7 +99,6 @@ Specifies an offset into the main query search results to locate the document on
Controls how the `MoreLikeThis` component presents the "interesting" terms (the top TF/IDF terms) for the query. It supports three settings: `list` lists the terms, `none` lists no terms, and `details` lists the terms along with the boost value used for each term. Unless `mlt.boost=true`, all terms will have `boost=1.0`.
[[MoreLikeThis-MoreLikeThisQueryParser]]
== More Like This Query Parser
== MoreLikeThis Query Parser
The `mlt` query parser provides a mechanism to retrieve documents similar to a given document, like the handler. More information on the usage of the mlt query parser can be found in the section <<other-parsers.adoc#other-parsers,Other Parsers>>.

View File

@ -26,7 +26,6 @@ With NRT, you can modify a `commit` command to be a *soft commit*, which avoids
However, pay special attention to cache and autowarm settings as they can have a significant impact on NRT performance.
[[NearRealTimeSearching-CommitsandOptimizing]]
== Commits and Optimizing
A commit operation makes index changes visible to new search requests. A *hard commit* uses the transaction log to get the id of the latest document changes, and also calls `fsync` on the index files to ensure they have been flushed to stable storage and no data loss will result from a power failure. The current transaction log is closed and a new one is opened. See the "transaction log" discussion below for data loss issues.
@ -45,7 +44,6 @@ The number of milliseconds to wait before pushing documents to the index. It wor
Use `maxDocs` and `maxTime` judiciously to fine-tune your commit strategies.
[[NearRealTimeSearching-TransactionLogs]]
=== Transaction Logs (tlogs)
Transaction logs are a "rolling window" of at least the last `N` (default 100) documents indexed. Tlogs are configured in solrconfig.xml, including the value of `N`. The current transaction log is closed and a new one opened each time any variety of hard commit occurs. Soft commits have no effect on the transaction log.
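As a rough sketch of that configuration, the update log is declared inside the `<updateHandler>` section of `solrconfig.xml`; the `numRecordsToKeep` setting corresponds to the `N` discussed here, and the directory property shown is the stock default:

[source,xml]
----
<updateLog>
  <!-- sketch: stock data directory property and an explicit N of 100 records -->
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">100</int>
</updateLog>
----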
@ -54,7 +52,6 @@ When tlogs are enabled, documents being added to the index are written to the tl
When Solr is shut down gracefully (i.e. using the `bin/solr stop` command and the like) Solr will close the tlog file and index segments so no replay will be necessary on startup.
[[NearRealTimeSearching-AutoCommits]]
=== AutoCommits
An autocommit also uses the parameters `maxDocs` and `maxTime`. However, it's useful in many strategies to use both a hard `autoCommit` and an `autoSoftCommit` to achieve more flexible commits.
@ -72,7 +69,6 @@ For example:
It's better to use `maxTime` rather than `maxDocs` to modify an `autoSoftCommit`, especially when indexing a large number of documents through the commit operation. It's also better to turn off `autoSoftCommit` for bulk indexing.
[[NearRealTimeSearching-OptionalAttributesforcommitandoptimize]]
=== Optional Attributes for commit and optimize
`waitSearcher`::
@ -99,7 +95,6 @@ Example of `commit` and `optimize` with optional attributes:
<optimize waitSearcher="false"/>
----
[[NearRealTimeSearching-PassingcommitandcommitWithinparametersaspartoftheURL]]
=== Passing commit and commitWithin Parameters as Part of the URL
Update handlers can also get `commit`-related parameters as part of the update URL, if the `stream.body` feature is enabled. This example adds a small test document and causes an explicit commit to happen immediately afterwards:
@ -132,10 +127,9 @@ curl http://localhost:8983/solr/my_collection/update?commitWithin=10000
-H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'
----
WARNING: While the `stream.body` feature is great for development and testing, it should normally not be enabled in production systems, as it lets a user with READ permissions post data that may alter the system state. The feature is disabled by default. See <<requestdispatcher-in-solrconfig.adoc#RequestDispatcherinSolrConfig-requestParsersElement,RequestDispatcher in SolrConfig>> for details.
WARNING: While the `stream.body` feature is great for development and testing, it should normally not be enabled in production systems, as it lets a user with READ permissions post data that may alter the system state. The feature is disabled by default. See <<requestdispatcher-in-solrconfig.adoc#requestparsers-element,RequestDispatcher in SolrConfig>> for details.
[[NearRealTimeSearching-ChangingdefaultcommitWithinBehavior]]
=== Changing default commitWithin Behavior
=== Changing Default commitWithin Behavior
The `commitWithin` settings allow forcing document commits to happen in a defined time period. This is used most frequently with <<near-real-time-searching.adoc#near-real-time-searching,Near Real Time Searching>>, and for that reason the default is to perform a soft commit. This does not, however, replicate new documents to slave servers in a master/slave environment. If that's a requirement for your implementation, you can force a hard commit by adding a parameter, as in this example:
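A minimal sketch of that setting, placed inside the `<updateHandler>` section of `solrconfig.xml`:

[source,xml]
----
<!-- sketch: make commitWithin trigger a hard commit instead of a soft commit -->
<commitWithin>
  <softCommit>false</softCommit>
</commitWithin>
----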

View File

@ -24,7 +24,6 @@ This section details the other parsers, and gives examples for how they might be
Many of these parsers are expressed the same way as <<local-parameters-in-queries.adoc#local-parameters-in-queries,Local Parameters in Queries>>.
[[OtherParsers-BlockJoinQueryParsers]]
== Block Join Query Parsers
There are two query parsers that support block joins. These parsers allow indexing and searching for relational content that has been <<uploading-data-with-index-handlers.adoc#uploading-data-with-index-handlers,indexed as nested documents>>.
@ -55,7 +54,6 @@ The example usage of the query parsers below assumes these two documents and eac
</add>
----
[[OtherParsers-BlockJoinChildrenQueryParser]]
=== Block Join Children Query Parser
This parser takes a query that matches some parent documents and returns their children.
@ -80,16 +78,16 @@ Using the example documents above, we can construct a query such as `q={!child o
Note that the query for `someParents` should match only parent documents passed by `allParents` or you may get an exception:
....
[literal]
Parent query must not match any docs besides parent filter. Combine them as must (+) and must-not (-) clauses to find a problem doc.
....
In older versions, the error is:
....
[literal]
Parent query yields document which is not matched by parents filter.
....
You can search for `q=+(someParents) -(allParents)` to find a cause.
[[OtherParsers-BlockJoinParentQueryParser]]
=== Block Join Parent Query Parser
This parser takes a query that matches child documents and returns their parents.
@ -101,13 +99,15 @@ The parameter `allParents` is a filter that matches *only parent documents*; her
The parameter `someChildren` is a query that matches some or all of the child documents.
Note that the query for `someChildren` should match only child documents or you may get an exception:
....
[literal]
Child query must not match same docs with parent filter. Combine them as must clauses (+) to find a problem doc.
....
In older versions it's:
....
In older versions, the error is:
[literal]
child query must only match non-parent docs.
....
You can search for `q=+(parentFilter) +(someChildren)` to find a cause.
Again using the example documents above, we can construct a query such as `q={!parent which="content_type:parentDocument"}comments:SolrCloud`. We get this document in response:
@ -133,20 +133,17 @@ A common mistake is to try to filter parents with a `which` filter, as in this b
Instead, you should use a sibling mandatory clause as a filter:
`q= *+title:join* +{!parent which="*content_type:parentDocument*"}comments:SolrCloud`
====
[[OtherParsers-Scoring]]
=== Scoring
=== Scoring with the Block Join Parent Query Parser
You can optionally use the `score` local parameter to return scores of the subordinate query. The values to use for this parameter define the type of aggregation, which are `avg` (average), `max` (maximum), `min` (minimum), or `total` (sum). The implicit default is `none`, which returns `0.0`.
[[OtherParsers-BoostQueryParser]]
== Boost Query Parser
`BoostQParser` extends the `QParserPlugin` and creates a boosted query from the input value. The main value is the query to be boosted. Parameter `b` is the function query to use as the boost. The query to be boosted may be of any type.
Examples:
=== Boost Query Parser Examples
Creates a query "foo" which is boosted (scores are multiplied) by the function query `log(popularity)`:
@ -162,7 +159,7 @@ Creates a query "foo" which is boosted by the date boosting function referenced
{!boost b=recip(ms(NOW,mydatefield),3.16e-11,1,1)}foo
----
[[OtherParsers-CollapsingQueryParser]]
[[other-collapsing]]
== Collapsing Query Parser
The `CollapsingQParser` is really a _post filter_ that provides more performant field collapsing than Solr's standard approach when the number of distinct groups in the result set is high.
@ -171,7 +168,6 @@ This parser collapses the result set to a single document per group before it fo
Details about using the `CollapsingQParser` can be found in the section <<collapse-and-expand-results.adoc#collapse-and-expand-results,Collapse and Expand Results>>.
[[OtherParsers-ComplexPhraseQueryParser]]
== Complex Phrase Query Parser
The `ComplexPhraseQParser` provides support for wildcards, ORs, etc., inside phrase queries using Lucene's {lucene-javadocs}/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html[`ComplexPhraseQueryParser`].
@ -204,15 +200,13 @@ A mix of ordered and unordered complex phrase queries:
+_query_:"{!complexphrase inOrder=true}manu:\"a* c*\"" +_query_:"{!complexphrase inOrder=false df=name}\"bla* pla*\""
----
[[OtherParsers-Limitations]]
=== Limitations
=== Complex Phrase Parser Limitations
Performance is sensitive to the number of unique terms that are associated with a pattern. For instance, searching for "a*" will form a large OR clause (technically a SpanOr with many terms) for all of the terms in your index for the indicated field that start with the single letter 'a'. It may be prudent to restrict wildcards to at least two or preferably three letters as a prefix. Allowing very short prefixes may result in too many low-quality documents being returned.
Notice that it also supports leading wildcards ("*a") with consequent performance implications. Applying <<filter-descriptions.adoc#reversed-wildcard-filter,ReversedWildcardFilterFactory>> in index-time analysis is usually a good idea.
[[OtherParsers-MaxBooleanClauses]]
==== MaxBooleanClauses
==== MaxBooleanClauses with Complex Phrase Parser
You may need to increase MaxBooleanClauses in `solrconfig.xml` as a result of the term expansion above:
@ -221,10 +215,9 @@ You may need to increase MaxBooleanClauses in `solrconfig.xml` as a result of th
<maxBooleanClauses>4096</maxBooleanClauses>
----
This property is described in more detail in the section <<query-settings-in-solrconfig.adoc#QuerySettingsinSolrConfig-QuerySizingandWarming,Query Sizing and Warming>>.
This property is described in more detail in the section <<query-settings-in-solrconfig.adoc#query-sizing-and-warming,Query Sizing and Warming>>.
[[OtherParsers-Stopwords]]
==== Stopwords
==== Stopwords with Complex Phrase Parser
It is recommended not to use stopword elimination with this query parser.
@ -246,12 +239,10 @@ the document is returned. The next query that _does_ use the Complex Phrase Quer
does _not_ return that document because SpanNearQuery has no good way to handle stopwords in a way analogous to PhraseQuery. If you must remove stopwords for your use case, use a custom filter factory or perhaps a customized synonyms filter that reduces given stopwords to some impossible token.
[[OtherParsers-Escaping]]
==== Escaping
==== Escaping with Complex Phrase Parser
Special care has to be given when escaping: clauses between double quotes (usually the whole query) are parsed twice, so these parts have to be escaped twice, e.g., `"foo\\: bar\\^"`.
[[OtherParsers-FieldQueryParser]]
== Field Query Parser
The `FieldQParser` extends the `QParserPlugin` and creates a field query from the input value, applying text analysis and constructing a phrase query if appropriate. The parameter `f` is the field to be queried.
@ -265,7 +256,6 @@ Example:
This example creates a phrase query with "foo" followed by "bar" (assuming `myfield` is a text field with an analyzer that splits on whitespace and lowercases terms). This is generally equivalent to the Lucene query parser expression `myfield:"Foo Bar"`.
[[OtherParsers-FunctionQueryParser]]
== Function Query Parser
The `FunctionQParser` extends the `QParserPlugin` and creates a function query from the input value. This is only one way to use function queries in Solr; for another, more integrated, approach, see the section on <<function-queries.adoc#function-queries,Function Queries>>.
@ -277,7 +267,6 @@ Example:
{!func}log(foo)
----
[[OtherParsers-FunctionRangeQueryParser]]
== Function Range Query Parser
The `FunctionRangeQParser` extends the `QParserPlugin` and creates a range query over a function. This is also referred to as `frange`, as seen in the examples below.
@ -312,15 +301,13 @@ Both of these examples restrict the results by a range of values found in a decl
For more information about range queries over functions, see Yonik Seeley's introductory blog post https://lucidworks.com/2009/07/06/ranges-over-functions-in-solr-14/[Ranges over Functions in Solr 1.4].
[[OtherParsers-GraphQueryParser]]
== Graph Query Parser
The `graph` query parser does a breadth first, cyclic aware, graph traversal of all documents that are "reachable" from a starting set of root documents identified by a wrapped query.
The graph is built according to linkages between documents based on the terms found in `from` and `to` fields that you specify as part of the query.
[[OtherParsers-Parameters]]
=== Parameters
=== Graph Query Parameters
`to`::
The field name of matching documents to inspect to identify outgoing edges for graph traversal. Defaults to `edge_ids`.
@ -342,17 +329,15 @@ Boolean that indicates if the results of the query should be filtered so that on
`useAutn`:: Boolean that indicates if Automatons should be compiled for each iteration of the breadth first search, which may be faster for some graphs. Defaults to `false`.
[[OtherParsers-Limitations.1]]
=== Limitations
=== Graph Query Limitations
The `graph` parser only works in single node Solr installations, or with <<solrcloud.adoc#solrcloud,SolrCloud>> collections that use exactly 1 shard.
[[OtherParsers-Examples]]
=== Examples
=== Graph Query Examples
To understand how the graph parser works, consider the following Directed Cyclic Graph, containing 8 nodes (A to H) and 9 edges (1 to 9):
image::images/other-parsers/graph_qparser_example.png[image,height=200]
image::images/other-parsers/graph_qparser_example.png[image,height=100]
One way to model this graph as Solr documents would be to create one document per node, with multivalued fields identifying the incoming and outgoing edges for each node:
@ -426,7 +411,6 @@ http://localhost:8983/solr/my_graph/query?fl=id&q={!graph+from=in_edge+to=out_ed
}
----
[[OtherParsers-SimplifiedModels]]
=== Simplified Models
The Document & Field modeling used in the above examples enumerated all of the outgoing and incoming edges for each node explicitly, to help demonstrate exactly how the "from" and "to" params work, and to give you an idea of what is possible. With multiple sets of fields like these for identifying incoming and outgoing edges, it's possible to model many independent Directed Graphs that contain some or all of the documents in your collection.
@ -469,7 +453,6 @@ http://localhost:8983/solr/alt_graph/query?fl=id&q={!graph+from=id+to=out_edge+m
}
----
[[OtherParsers-JoinQueryParser]]
== Join Query Parser
`JoinQParser` extends the `QParserPlugin`. It allows normalizing relationships between documents with a join operation. This is different from the concept of a join in a relational database because no information is being truly joined. An appropriate SQL analogy would be an "inner query".
@ -493,8 +476,7 @@ fq = price:[* TO 12]
The join operation is done on a term basis, so the "from" and "to" fields must use compatible field types. For example: joining between a `StrField` and a `TrieIntField` will not work, likewise joining between a `StrField` and a `TextField` that uses `LowerCaseFilterFactory` will only work for values that are already lower cased in the string field.
[[OtherParsers-Scoring.1]]
=== Scoring
=== Join Parser Scoring
You can optionally use the `score` parameter to return scores of the subordinate query. The values to use for this parameter define the type of aggregation, which are `avg` (average), `max` (maximum), `min` (minimum) `total`, or `none`.
@ -504,7 +486,6 @@ You can optionally use the `score` parameter to return scores of the subordinate
Specifying the `score` local parameter switches the join algorithm. This might have performance implications on large indices, but more importantly, this algorithm won't work for single-valued numeric fields starting from 7.0. Users are encouraged to change field types to string and rebuild indexes during migration.
====
[[OtherParsers-JoiningAcrossCollections]]
=== Joining Across Collections
You can also specify a `fromIndex` parameter to join with a field from another core or collection. If running in SolrCloud mode, then the collection specified in the `fromIndex` parameter must have a single shard and a replica on all Solr nodes where the collection you're joining to has a replica.
@ -548,7 +529,6 @@ At query time, the `JoinQParser` will access the local replica of the *movie_dir
For more information about join queries, see the Solr Wiki page on http://wiki.apache.org/solr/Join[Joins]. Erick Erickson has also written a blog post about join performance titled https://lucidworks.com/2012/06/20/solr-and-joins/[Solr and Joins].
[[OtherParsers-LuceneQueryParser]]
== Lucene Query Parser
The `LuceneQParser` extends the `QParserPlugin` by parsing Solr's variant on the Lucene QueryParser syntax. This is effectively the same query parser that is used in Lucene. It uses the operators `q.op`, the default operator ("OR" or "AND") and `df`, the default field name.
@ -562,7 +542,6 @@ Example:
For more information about the syntax for the Lucene Query Parser, see the {lucene-javadocs}/queryparser/org/apache/lucene/queryparser/classic/package-summary.html[Classic QueryParser javadocs].
[[OtherParsers-LearningToRankQueryParser]]
== Learning To Rank Query Parser
The `LTRQParserPlugin` is a special purpose parser for reranking the top results of a simple query using a more complex ranking query which is based on a machine learnt model.
@ -576,7 +555,6 @@ Example:
Details about using the `LTRQParserPlugin` can be found in the <<learning-to-rank.adoc#learning-to-rank,Learning To Rank>> section.
[[OtherParsers-MaxScoreQueryParser]]
== Max Score Query Parser
The `MaxScoreQParser` extends the `LuceneQParser` but returns the Max score from the clauses. It does this by wrapping all `SHOULD` clauses in a `DisjunctionMaxQuery` with tie=1.0. Any `MUST` or `PROHIBITED` clauses are passed through as-is. Non-boolean queries, e.g., NumericRange, fall through to the `LuceneQParser` parser behavior.
@ -588,7 +566,6 @@ Example:
{!maxscore tie=0.01}C OR (D AND E)
----
[[OtherParsers-MoreLikeThisQueryParser]]
== More Like This Query Parser
`MLTQParser` enables retrieving documents that are similar to a given document. It uses Lucene's existing `MoreLikeThis` logic and also works in SolrCloud mode. The document identifier used here is the unique id value and not the Lucene internal document id. The list of returned documents excludes the queried document.
@ -638,7 +615,6 @@ Adding more constraints to what qualifies as similar using mintf and mindf.
{!mlt qf=name mintf=2 mindf=3}1
----
[[OtherParsers-NestedQueryParser]]
== Nested Query Parser
The `NestedParser` extends the `QParserPlugin` and creates a nested query, with the ability for that query to redefine its type via local parameters. This is useful in specifying defaults in configuration and letting clients indirectly reference them.
@ -662,7 +638,6 @@ If the `q1` parameter is price, then the query would be a function query on the
For more information about the possibilities of nested queries, see Yonik Seeley's blog post https://lucidworks.com/2009/03/31/nested-queries-in-solr/[Nested Queries in Solr].
[[OtherParsers-PayloadQueryParsers]]
== Payload Query Parsers
These query parsers utilize payloads encoded on terms during indexing.
@ -672,7 +647,6 @@ The main query, for both of these parsers, is parsed straightforwardly from the
* `PayloadScoreQParser`
* `PayloadCheckQParser`
[[OtherParsers-PayloadScoreParser]]
=== Payload Score Parser
`PayloadScoreQParser` incorporates each matching term's numeric (integer or float) payloads into the scores.
@ -695,7 +669,6 @@ If `true`, multiples computed payload factor by the score of the original query.
{!payload_score f=my_field_dpf v=some_term func=max}
----
[[OtherParsers-PayloadCheckParser]]
=== Payload Check Parser
`PayloadCheckQParser` only matches when the matching terms also have the specified payloads.
@ -719,7 +692,6 @@ Each specified payload will be encoded using the encoder determined from the fie
{!payload_check f=words_dps payloads="VERB NOUN"}searching stuff
----
[[OtherParsers-PrefixQueryParser]]
== Prefix Query Parser
`PrefixQParser` extends the `QParserPlugin` by creating a prefix query from the input value. Currently no analysis or value transformation is done to create this prefix query.
@ -735,7 +707,6 @@ Example:
This would be generally equivalent to the Lucene query parser expression `myfield:foo*`.
[[OtherParsers-RawQueryParser]]
== Raw Query Parser
`RawQParser` extends the `QParserPlugin` by creating a term query from the input value without any text analysis or transformation. This is useful in debugging, or when raw terms are returned from the terms component (this is not the default).
@ -751,18 +722,16 @@ Example:
This example constructs the query: `TermQuery(Term("myfield","Foo Bar"))`.
For easy filter construction to drill down in faceting, the <<OtherParsers-TermQueryParser,TermQParserPlugin>> is recommended.
For easy filter construction to drill down in faceting, the <<Term Query Parser,TermQParserPlugin>> is recommended.
For full analysis on all fields, including text fields, you may want to use the <<OtherParsers-FieldQueryParser,FieldQParserPlugin>>.
For full analysis on all fields, including text fields, you may want to use the <<Field Query Parser,FieldQParserPlugin>>.
[[OtherParsers-Re-RankingQueryParser]]
== Re-Ranking Query Parser
The `ReRankQParserPlugin` is a special purpose parser for Re-Ranking the top results of a simple query using a more complex ranking query.
Details about using the `ReRankQParserPlugin` can be found in the <<query-re-ranking.adoc#query-re-ranking,Query Re-Ranking>> section.
[[OtherParsers-SimpleQueryParser]]
== Simple Query Parser
The Simple query parser in Solr is based on Lucene's SimpleQueryParser. This query parser is designed to allow users to enter queries however they want, and it will do its best to interpret the query and return results.
@ -811,14 +780,12 @@ Defines the default field if none is defined in the Schema, or overrides the def
Any errors in syntax are ignored and the query parser will interpret queries as best it can. However, this can lead to odd results in some cases.
[[OtherParsers-SpatialQueryParsers]]
== Spatial Query Parsers
There are two spatial QParsers in Solr: `geofilt` and `bbox`. But there are other ways to query spatially: using the `frange` parser with a distance function, using the standard (lucene) query parser with the range syntax to pick the corners of a rectangle, or, with RPT and BBoxField, using the standard query parser with a special syntax within quotes that allows you to pick the spatial predicate.
All these options are documented further in the section <<spatial-search.adoc#spatial-search,Spatial Search>>.
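As a quick sketch of the most common case (the field name, point, and distance are illustrative), a `geofilt` filter query restricting results to documents within 5 kilometers of a point looks like this:

[source,text]
----
&q=*:*&fq={!geofilt sfield=store pt=45.15,-93.85 d=5}
----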
[[OtherParsers-SurroundQueryParser]]
== Surround Query Parser
The `SurroundQParser` enables the Surround query syntax, which provides proximity search functionality. There are two positional operators: `w` creates an ordered span query and `n` creates an unordered one. Both operators take a numeric value to indicate distance between two terms. The default is 1, and the maximum is 99.
@ -838,7 +805,6 @@ This query parser will also accept boolean operators (`AND`, `OR`, and `NOT`, in
The non-unary operators (everything but `NOT`) support both infix `(a AND b AND c)` and prefix `AND(a, b, c)` notation.
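For example, an ordered proximity query matching documents where "apache" appears within three positions before "solr" might look like this sketch:

[source,text]
----
{!surround} 3w(apache, solr)
----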
[[OtherParsers-SwitchQueryParser]]
== Switch Query Parser
`SwitchQParser` is a `QParserPlugin` that acts like a "switch" or "case" statement.
@ -895,7 +861,6 @@ Using the example configuration below, clients can optionally specify the custom
</requestHandler>
----
[[OtherParsers-TermQueryParser]]
== Term Query Parser
`TermQParser` extends the `QParserPlugin` by creating a single term query from the input value equivalent to `readableToIndexed()`. This is useful for generating filter queries from the external human readable terms returned by the faceting or terms components. The only parameter is `f`, for the field.
@ -907,14 +872,13 @@ Example:
{!term f=weight}1.5
----
For text fields, no analysis is done since raw terms are already returned from the faceting and terms components. To apply analysis to text fields as well, see the <<OtherParsers-FieldQueryParser,Field Query Parser>>, above.
For text fields, no analysis is done since raw terms are already returned from the faceting and terms components. To apply analysis to text fields as well, see the <<Field Query Parser>>, above.
If no analysis or transformation is desired for any type of field, see the <<OtherParsers-RawQueryParser,Raw Query Parser>>, above.
If no analysis or transformation is desired for any type of field, see the <<Raw Query Parser>>, above.
[[OtherParsers-TermsQueryParser]]
== Terms Query Parser
`TermsQParser` functions similarly to the <<OtherParsers-TermQueryParser,Term Query Parser>> but takes in multiple values separated by commas and returns documents matching any of the specified values.
`TermsQParser` functions similarly to the <<Term Query Parser,Term Query Parser>> but takes in multiple values separated by commas and returns documents matching any of the specified values.
This can be useful for generating filter queries from the external human-readable terms returned by the faceting or terms components, and may be more efficient in some cases than using the <<the-standard-query-parser.adoc#the-standard-query-parser,Standard Query Parser>> to generate a Boolean query since the default implementation `method` avoids scoring.
@ -929,7 +893,6 @@ Separator to use when parsing the input. If set to " " (a single blank space), w
`method`::
The internal query-building implementation: `termsFilter`, `booleanQuery`, `automaton`, or `docValuesTermsFilter`. Defaults to `termsFilter`.
*Examples*
[source,text]
@ -942,7 +905,6 @@ The internal query-building implementation: `termsFilter`, `booleanQuery`, `auto
{!terms f=categoryId method=booleanQuery separator=" "}8 6 7 5309
----
[[OtherParsers-XMLQueryParser]]
== XML Query Parser
The {solr-javadocs}/solr-core/org/apache/solr/search/XmlQParserPlugin.html[XmlQParserPlugin] extends the {solr-javadocs}/solr-core/org/apache/solr/search/QParserPlugin.html[QParserPlugin] and supports the creation of queries from XML. Example:
@ -1002,7 +964,6 @@ The XmlQParser implementation uses the {solr-javadocs}/solr-core/org/apache/solr
|<LegacyNumericRangeQuery> |LegacyNumericRangeQuery(Builder) is deprecated
|===
[[OtherParsers-CustomizingXMLQueryParser]]
=== Customizing XML Query Parser
You can configure your own custom query builders for additional XML elements. The custom builders need to extend the {solr-javadocs}/solr-core/org/apache/solr/search/SolrQueryBuilder.html[SolrQueryBuilder] or the {solr-javadocs}/solr-core/org/apache/solr/search/SolrSpanQueryBuilder.html[SolrSpanQueryBuilder] class. Example solrconfig.xml snippet:
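A rough sketch of such a snippet is shown below; the element name `HandyQuery` and the builder class are hypothetical placeholders for your own implementation:

[source,xml]
----
<queryParser name="xmlparser" class="solr.XmlQParserPlugin">
  <!-- hypothetical custom element mapped to a user-supplied builder class -->
  <str name="HandyQuery">com.example.HandyQueryBuilder</str>
</queryParser>
----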

View File

@ -20,7 +20,6 @@
This section describes several other important elements of `schema.xml` not covered in earlier sections.
[[OtherSchemaElements-UniqueKey]]
== Unique Key
The `uniqueKey` element specifies which field is a unique identifier for documents. Although `uniqueKey` is not required, it is nearly always warranted by your application design. For example, `uniqueKey` should be used if you will ever update a document in the index.
@ -37,7 +36,6 @@ Schema defaults and `copyFields` cannot be used to populate the `uniqueKey` fiel
Further, the operation will fail if the `uniqueKey` field is used, but is multivalued (or inherits the multivalue-ness from the `fieldtype`). However, `uniqueKey` will continue to work, as long as the field is properly used.
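The declaration itself is a single element in the schema; a minimal sketch using the conventional `id` field:

[source,xml]
----
<uniqueKey>id</uniqueKey>
----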
[[OtherSchemaElements-Similarity]]
== Similarity
Similarity is a Lucene class used to score a document in searching.
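A global similarity can be declared in `schema.xml`; for example, explicitly selecting BM25 (a minimal sketch, assuming the stock `BM25SimilarityFactory`):

[source,xml]
----
<similarity class="solr.BM25SimilarityFactory"/>
----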

View File

@ -54,7 +54,7 @@ Faceting makes use of fields defined when the search applications were indexed.
Solr also supports a feature called <<morelikethis.adoc#morelikethis,MoreLikeThis>>, which enables users to submit new queries that focus on particular terms returned in an earlier query. MoreLikeThis queries can make use of faceting or clustering to provide additional aid to users.
A Solr component called a <<response-writers.adoc#response-writers,*response writer*>> manages the final presentation of the query response. Solr includes a variety of response writers, including an <<response-writers.adoc#ResponseWriters-TheStandardXMLResponseWriter,XML Response Writer>> and a <<response-writers.adoc#ResponseWriters-JSONResponseWriter,JSON Response Writer>>.
A Solr component called a <<response-writers.adoc#response-writers,*response writer*>> manages the final presentation of the query response. Solr includes a variety of response writers, including an <<response-writers.adoc#standard-xml-response-writer,XML Response Writer>> and a <<response-writers.adoc#json-response-writer,JSON Response Writer>>.
The diagram below summarizes some key elements of the search process.

View File

@ -24,7 +24,7 @@ In most search applications, the "top" matching results (sorted by score, or som
In many applications the UI for these sorted results is displayed to the user in "pages" containing a fixed number of matching results, and users don't typically look at results past the first few pages' worth of results.
== Basic Pagination
In Solr, this basic paginated searching is supported using the `start` and `rows` parameters, and performance of this common behaviour can be tuned by utilizing the <<query-settings-in-solrconfig.adoc#QuerySettingsinSolrConfig-queryResultCache,`queryResultCache`>> and adjusting the <<query-settings-in-solrconfig.adoc#QuerySettingsinSolrConfig-queryResultWindowSize,`queryResultWindowSize`>> configuration options based on your expected page sizes.
In Solr, this basic paginated searching is supported using the `start` and `rows` parameters, and performance of this common behaviour can be tuned by utilizing the <<query-settings-in-solrconfig.adoc#queryresultcache,`queryResultCache`>> and adjusting the <<query-settings-in-solrconfig.adoc#queryresultwindowsize,`queryResultWindowSize`>> configuration options based on your expected page sizes.
=== Basic Pagination Examples
@ -103,7 +103,7 @@ There are a few important constraints to be aware of when using `cursorMark` par
* If `id` is your uniqueKey field, then sort params like `id asc` and `name asc, id desc` would both work fine, but `name asc` by itself would not
. Sorts including <<working-with-dates.adoc#working-with-dates,Date Math>> based functions that involve calculations relative to `NOW` will cause confusing results, since every document will get a new sort value on every subsequent request. This can easily result in cursors that never end, and constantly return the same documents over and over even if the documents are never updated.
+
In this situation, choose & re-use a fixed value for the <<working-with-dates.adoc#WorkingwithDates-NOW,`NOW` request param>> in all of your cursor requests.
In this situation, choose & re-use a fixed value for the <<working-with-dates.adoc#now,`NOW` request param>> in all of your cursor requests.
Cursor mark values are computed based on the sort values of each document in the result, which means multiple documents with identical sort values will produce identical Cursor mark values if one of them is the last document on a page of results. In that situation, the subsequent request using that `cursorMark` would not know which of the documents with the identical mark values should be skipped. Requiring that the uniqueKey field be used as a clause in the sort criteria guarantees that a deterministic ordering will be returned, and that every `cursorMark` value will identify a unique point in the sequence of documents.
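A first cursor request therefore always includes the uniqueKey field in the sort and starts with `cursorMark=*`; subsequent requests pass the `nextCursorMark` value returned by the previous response. A sketch (collection, field, and page size are illustrative):

[source,text]
----
http://localhost:8983/solr/techproducts/select?q=*:*&rows=10&sort=id+asc&cursorMark=*
----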

View File

@ -24,7 +24,7 @@ The same statistics are also exposed via the <<mbean-request-handler.adoc#mbean-
These statistics are per core. When you are running in SolrCloud mode, these statistics correspond to the performance of an individual replica.
== Request Handlers
== Request Handler Statistics
=== Update Request Handler
@ -93,7 +93,7 @@ Both Update Request Handler and Search Request Handler along with handlers like
|transaction_logs_total_size |Total size of all the TLogs created so far from the beginning of the Solr instance.
|===
== Caches
== Cache Statistics
=== Document Cache

View File

@ -22,11 +22,9 @@ Phonetic matching algorithms may be used to encode tokens so that two different
For overviews of and comparisons between algorithms, see http://en.wikipedia.org/wiki/Phonetic_algorithm and http://ntz-develop.blogspot.com/2011/03/phonetic-algorithms.html
[[PhoneticMatching-Beider-MorsePhoneticMatching_BMPM_]]
== Beider-Morse Phonetic Matching (BMPM)
For examples of how to use this encoding in your analyzer, see <<filter-descriptions.adoc#FilterDescriptions-Beider-MorseFilter,Beider Morse Filter>> in the Filter Descriptions section.
For examples of how to use this encoding in your analyzer, see <<filter-descriptions.adoc#beider-morse-filter,Beider Morse Filter>> in the Filter Descriptions section.
Beider-Morse Phonetic Matching (BMPM) is a "soundalike" tool that lets you search using a new phonetic matching system. BMPM helps you search for personal names (or just surnames) in a Solr/Lucene index, and is far superior to the existing phonetic codecs, such as regular soundex, metaphone, caverphone, etc.
@ -59,7 +57,7 @@ For more information, see here: http://stevemorse.org/phoneticinfo.htm and http:
== Daitch-Mokotoff Soundex
To use this encoding in your analyzer, see <<filter-descriptions.adoc#FilterDescriptions-Daitch-MokotoffSoundexFilter,Daitch-Mokotoff Soundex Filter>> in the Filter Descriptions section.
To use this encoding in your analyzer, see <<filter-descriptions.adoc#daitch-mokotoff-soundex-filter,Daitch-Mokotoff Soundex Filter>> in the Filter Descriptions section.
The Daitch-Mokotoff Soundex algorithm is a refinement of the Russel and American Soundex algorithms, yielding greater accuracy in matching especially Slavic and Yiddish surnames with similar pronunciation but differences in spelling.
@ -76,13 +74,13 @@ For more information, see http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_S
== Double Metaphone
To use this encoding in your analyzer, see <<filter-descriptions.adoc#FilterDescriptions-DoubleMetaphoneFilter,Double Metaphone Filter>> in the Filter Descriptions section. Alternatively, you may specify `encoding="DoubleMetaphone"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>, but note that the Phonetic Filter version will *not* provide the second ("alternate") encoding that is generated by the Double Metaphone Filter for some tokens.
To use this encoding in your analyzer, see <<filter-descriptions.adoc#double-metaphone-filter,Double Metaphone Filter>> in the Filter Descriptions section. Alternatively, you may specify `encoding="DoubleMetaphone"` with the <<filter-descriptions.adoc#phonetic-filter,Phonetic Filter>>, but note that the Phonetic Filter version will *not* provide the second ("alternate") encoding that is generated by the Double Metaphone Filter for some tokens.
Encodes tokens using the double metaphone algorithm by Lawrence Philips. See the original article at http://www.drdobbs.com/the-double-metaphone-search-algorithm/184401251?pgno=2
== Metaphone
To use this encoding in your analyzer, specify `encoding="Metaphone"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
To use this encoding in your analyzer, specify `encoding="Metaphone"` with the <<filter-descriptions.adoc#phonetic-filter,Phonetic Filter>>.
Encodes tokens using the Metaphone algorithm by Lawrence Philips, described in "Hanging on the Metaphone" in Computer Language, Dec. 1990.
@ -91,7 +89,7 @@ Another reference for more information is http://www.drdobbs.com/the-double-meta
== Soundex
To use this encoding in your analyzer, specify `encoding="Soundex"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
To use this encoding in your analyzer, specify `encoding="Soundex"` with the <<filter-descriptions.adoc#phonetic-filter,Phonetic Filter>>.
Encodes tokens using the Soundex algorithm, which is used to relate similar names, but can also be used as a general purpose scheme to find words with similar phonemes.
@ -99,7 +97,7 @@ See also http://en.wikipedia.org/wiki/Soundex.
== Refined Soundex
To use this encoding in your analyzer, specify `encoding="RefinedSoundex"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
To use this encoding in your analyzer, specify `encoding="RefinedSoundex"` with the <<filter-descriptions.adoc#phonetic-filter,Phonetic Filter>>.
Encodes tokens using an improved version of the Soundex algorithm.
@ -107,7 +105,7 @@ See http://en.wikipedia.org/wiki/Soundex.
== Caverphone
To use this encoding in your analyzer, specify `encoding="Caverphone"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
To use this encoding in your analyzer, specify `encoding="Caverphone"` with the <<filter-descriptions.adoc#phonetic-filter,Phonetic Filter>>.
Caverphone is an algorithm created by the Caversham Project at the University of Otago. The algorithm is optimised for accents present in the southern part of the city of Dunedin, New Zealand.
@ -115,7 +113,7 @@ See http://en.wikipedia.org/wiki/Caverphone and the Caverphone 2.0 specification
== Kölner Phonetik a.k.a. Cologne Phonetic
To use this encoding in your analyzer, specify `encoding="ColognePhonetic"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
To use this encoding in your analyzer, specify `encoding="ColognePhonetic"` with the <<filter-descriptions.adoc#phonetic-filter,Phonetic Filter>>.
The Kölner Phonetik, an algorithm published by Hans Joachim Postel in 1969, is optimized for the German language.
@ -123,7 +121,7 @@ See http://de.wikipedia.org/wiki/K%C3%B6lner_Phonetik
== NYSIIS
To use this encoding in your analyzer, specify `encoding="Nysiis"` with the <<filter-descriptions.adoc#FilterDescriptions-PhoneticFilter,Phonetic Filter>>.
To use this encoding in your analyzer, specify `encoding="Nysiis"` with the <<filter-descriptions.adoc#phonetic-filter,Phonetic Filter>>.
NYSIIS is an encoding used to relate similar names, but can also be used as a general purpose scheme to find words with similar phonemes.

Some files were not shown because too many files have changed in this diff