= Solr Glossary
:page-shortname: solr-glossary
:page-permalink: solr-glossary.html
:page-toc: false
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

These are common terms used with Solr.

== Solr Terms

Where possible, terms are linked to relevant parts of the Solr Reference Guide for more information.

*Jump to a letter:*

<<SolrGlossary-A,A>> <<SolrGlossary-B,B>> <<SolrGlossary-C,C>> <<SolrGlossary-D,D>> <<SolrGlossary-E,E>> <<SolrGlossary-F,F>> G H <<SolrGlossary-I,I>> J K <<SolrGlossary-L,L>> <<SolrGlossary-M,M>> <<SolrGlossary-N,N>> <<SolrGlossary-O,O>> P <<SolrGlossary-Q,Q>> <<SolrGlossary-R,R>> <<SolrGlossary-S,S>> <<SolrGlossary-T,T>> U V <<SolrGlossary-W,W>> X Y <<SolrGlossary-Z,Z>>

[[SolrGlossary-A]]
=== A
[[atomicupdates]]<<updating-parts-of-documents.adoc#UpdatingPartsofDocuments-AtomicUpdates,Atomic updates>>::
An approach to updating only one or more fields of a document, instead of reindexing the entire document.
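+
As a rough illustration, an atomic update request names the document by its unique key and wraps each changed field in a modifier such as `set` or `inc`. The sketch below only builds the JSON payload; the document ID and the field names (`price`, `popularity`) are hypothetical, and no request is sent to Solr.
+
```python
import json

# Hypothetical document ID and field names, chosen only for illustration.
atomic_update = [
    {
        "id": "book-1",             # unique key of the document to change
        "price": {"set": 12.5},     # replace the value of one field
        "popularity": {"inc": 1},   # increment a numeric field
    }
]

# Fields not mentioned in the payload are left as they are in the index.
payload = json.dumps(atomic_update)
print(payload)
```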
[[SolrGlossary-B]]
=== B
[[booleanoperators]]Boolean operators::
These control the inclusion or exclusion of keywords in a query by using operators such as AND, OR, and NOT.
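+
One way to picture these operators is as set operations over the documents that match each keyword. The sketch below assumes hypothetical posting lists (term to document IDs) and is not how Solr evaluates queries internally.
+
```python
# Hypothetical posting lists: term -> set of matching document IDs.
postings = {
    "solr": {1, 3, 4},
    "lucene": {2, 3},
    "zookeeper": {4},
}

both = postings["solr"] & postings["lucene"]        # solr AND lucene
either = postings["solr"] | postings["lucene"]      # solr OR lucene
without = postings["solr"] - postings["zookeeper"]  # solr NOT zookeeper

print(sorted(both), sorted(either), sorted(without))
```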
[[SolrGlossary-C]]
=== C
[[cluster]]Cluster::
In Solr, a cluster is a set of Solr nodes operating in coordination with each other via <<zookeeper,ZooKeeper>>, and managed as a unit. A cluster may contain many collections. See also <<solrclouddef,SolrCloud>>.
[[collection]]Collection::
In Solr, one or more <<document,Documents>> grouped together in a single logical index using a single configuration and Schema.
+
In <<solrclouddef,SolrCloud>>, a collection may be divided into multiple logical shards, which may in turn be distributed across many nodes. In a single-node Solr installation, a collection may be a single <<core,Core>>.
[[defcommit]]Commit::
To make document changes permanent in the index. Added documents become searchable after a _commit_.
[[core]]Core::
An individual Solr instance (represents a logical index). Multiple cores can run on a single node. See also <<solrclouddef,SolrCloud>>.
[[corereload]]Core reload::
To re-initialize a Solr core after changes to `schema.xml`, `solrconfig.xml` or other configuration files.
[[SolrGlossary-D]]
=== D
[[distributedsearch]]Distributed search::
A distributed search is one in which queries are processed across more than one <<shard,Shard>>.
[[document]]Document::
A group of <<field,fields>> and their values. Documents are the basic unit of data in a <<collection,collection>>. Documents are assigned to <<shard,shards>> using standard hashing, or by specifically assigning a shard within the document ID. Documents are versioned after each write operation.
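+
A minimal sketch of hash-based routing, assuming a fixed number of shards. It uses Python's `zlib.crc32` purely as a stand-in hash and treats the part of the ID before a `!` as a routing prefix; it is not claimed to reproduce Solr's actual hash function or hash-range assignment.
+
```python
import zlib

NUM_SHARDS = 4  # assumed shard count for this illustration

def route(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard index for a document ID.

    If the ID contains '!', only the prefix before it is hashed, so all
    documents sharing that prefix land on the same shard.
    """
    key = doc_id.split("!", 1)[0]
    return zlib.crc32(key.encode("utf-8")) % num_shards

# Both IDs share the prefix "customerA", so they route to the same shard.
print(route("customerA!doc1"), route("customerA!doc2"))
```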
[[SolrGlossary-E]]
=== E
[[ensemble]]Ensemble::
A <<zookeeper,ZooKeeper>> term to indicate multiple ZooKeeper instances running simultaneously and in coordination with each other for fault tolerance.
[[SolrGlossary-F]]
=== F
[[deffacet]]Facet::
The arrangement of search results into categories based on indexed terms.
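+
Conceptually, a facet is a count of matching documents per value of some field. The sketch below assumes a hypothetical result list and a field named `category`; it illustrates the idea only and does not use Solr's faceting API.
+
```python
from collections import Counter

# Hypothetical search results; "category" is an assumed field name.
results = [
    {"id": "1", "category": "books"},
    {"id": "2", "category": "music"},
    {"id": "3", "category": "books"},
]

def facet_counts(docs, field):
    """Count how many matching documents carry each value of `field`."""
    return Counter(doc[field] for doc in docs if field in doc)

counts = facet_counts(results, "category")
print(counts.most_common())
```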
[[field]]Field::
The content to be indexed/searched along with metadata defining how the content should be processed by Solr.
[[SolrGlossary-I]]
=== I
[[idf]]Inverse document frequency (IDF)::
A measure of the general importance of a term. It is calculated as the total number of Documents in the collection divided by the number of Documents in which a particular term occurs, often on a logarithmic scale. See http://en.wikipedia.org/wiki/Tf-idf and {lucene-javadocs}/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html[the Lucene TFIDFSimilarity javadocs] for more info on TF-IDF based scoring and Lucene scoring in particular. See also <<termfrequency,Term frequency>>.
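+
The ratio in this entry can be sketched as follows. The function names are hypothetical, and the log-scaled variant is a common textbook form rather than the exact formula used by any particular Lucene similarity.
+
```python
import math

def idf(total_docs: int, docs_with_term: int) -> float:
    """Raw ratio: total documents / documents containing the term."""
    return total_docs / docs_with_term

def log_idf(total_docs: int, docs_with_term: int) -> float:
    """Log-scaled variant commonly used in TF-IDF scoring."""
    return math.log(total_docs / docs_with_term)

# A rare term (10 of 1000 docs) scores higher than a common one (500 of 1000).
print(idf(1000, 10), idf(1000, 500))
```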
[[invertedindex]]Inverted index::
A way of creating a searchable index that lists every word and the documents that contain those words, similar to an index in the back of a book which lists words and the pages on which they can be found. When performing keyword searches, this method is considered more efficient than the alternative, which would be to create a list of documents paired with every word used in each document. Since users search using terms they expect to be in documents, finding the term before the document saves processing resources and time.
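+
A minimal sketch of the idea, assuming three hypothetical documents and simple whitespace tokenization. Real analysis chains are far richer, but the word-to-documents mapping is the same.
+
```python
from collections import defaultdict

# Hypothetical documents keyed by ID.
docs = {
    1: "solr is a search platform",
    2: "lucene is a search library",
    3: "solr builds on lucene",
}

def build_inverted_index(documents):
    """Map each word to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

index = build_inverted_index(docs)
print(index["solr"])    # [1, 3]
print(index["search"])  # [1, 2]
```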
[[SolrGlossary-L]]
=== L
[[leader]]Leader::
A single <<replica,Replica>> for each <<shard,Shard>> that takes charge of coordinating index updates (document additions or deletions) to other replicas in the same shard. This is a transient responsibility assigned to a node via an election. If the current Shard Leader goes down, a new node will automatically be elected to take its place. See also <<solrclouddef,SolrCloud>>.
[[SolrGlossary-M]]
=== M
[[metadata]]Metadata::
Literally, _data about data_. Metadata is information about a document, such as its title, author, or location.
[[SolrGlossary-N]]
=== N
[[naturallanguagequery]]Natural language query::
A search that is entered as a user would normally speak or write, as in, "What is aspirin?"
[[node]]Node::
A JVM instance running Solr. Also known as a Solr server.
[[SolrGlossary-O]]
=== O
[[optimisticconcurrency]]<<updating-parts-of-documents.adoc#UpdatingPartsofDocuments-OptimisticConcurrency,Optimistic concurrency>>::
Also known as "optimistic locking", this is an approach that allows for updates to documents currently in the index while retaining locking or version control.
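+
The general pattern can be sketched as a version check before each write. The in-memory `store` and the `update` helper below are hypothetical stand-ins, not Solr APIs; they only show how a stale version causes an update to be rejected.
+
```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""

# Hypothetical in-memory store: doc ID -> (version, fields).
store = {"doc1": (1, {"title": "Draft"})}

def update(doc_id, expected_version, new_fields):
    """Apply the update only if nobody else changed the document first."""
    current_version, _ = store[doc_id]
    if current_version != expected_version:
        raise VersionConflict(
            f"expected version {expected_version}, found {current_version}"
        )
    store[doc_id] = (current_version + 1, new_fields)
    return current_version + 1

new_version = update("doc1", 1, {"title": "Final"})  # succeeds
try:
    update("doc1", 1, {"title": "Stale write"})      # version 1 is now stale
    conflict = False
except VersionConflict:
    conflict = True
print(new_version, conflict)
```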
[[overseer]]Overseer::
A single node in <<solrclouddef,SolrCloud>> that is responsible for processing and coordinating actions involving the entire cluster. It keeps track of the state of existing nodes, collections, shards, and replicas, and assigns new replicas to nodes. This is a transient responsibility assigned to a node via an election. If the current Overseer goes down, a new node will be automatically elected to take its place. See also <<solrclouddef,SolrCloud>>.
[[SolrGlossary-Q]]
=== Q
[[query-parser]]Query parser::
A query parser interprets the query string entered by a user and translates it into a query that Solr can execute.
[[SolrGlossary-R]]
=== R
[[recall]]Recall::
The ability of a search engine to retrieve _all_ of the possible matches to a user's query.
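+
Recall is usually quantified as the share of all relevant documents that a search actually returned. The sketch below assumes hypothetical document IDs and that the full set of relevant documents is known.
+
```python
def recall(retrieved: set, relevant: set) -> float:
    """Fraction of all relevant documents that the search returned."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

# Hypothetical IDs: 4 relevant documents exist, the search found 3 of them.
score = recall(retrieved={"a", "b", "c", "x"}, relevant={"a", "b", "c", "d"})
print(score)  # 0.75
```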
[[relevancedef]]Relevance::
The appropriateness of a document to the search conducted by the user.
[[replica]]Replica::
A <<core,Core>> that acts as a physical copy of a <<shard,Shard>> in a <<solrclouddef,SolrCloud>> <<collection,Collection>>.
[[replication]]<<index-replication.adoc#index-replication,Replication>>::
A method of copying a master index from one server to one or more "slave" or "child" servers.
[[requesthandler]]<<requesthandlers-and-searchcomponents-in-solrconfig.adoc#requesthandlers-and-searchcomponents-in-solrconfig,RequestHandler>>::
Logic and configuration parameters that tell Solr how to handle incoming "requests", whether the requests are to return search results, to index documents, or to handle other custom situations.
[[SolrGlossary-S]]
=== S
[[searchcomponent]]<<requesthandlers-and-searchcomponents-in-solrconfig.adoc#requesthandlers-and-searchcomponents-in-solrconfig,SearchComponent>>::
Logic and configuration parameters used by request handlers to process query requests. Examples of search components include faceting, highlighting, and "more like this" functionality.
[[shard]]Shard::
In SolrCloud, a logical partition of a single <<collection,Collection>>. Every shard consists of at least one physical <<replica,Replica>>, but there may be multiple Replicas distributed across multiple <<node,Nodes>> for fault tolerance. See also <<solrclouddef,SolrCloud>>.
[[solrclouddef]]<<solrcloud.adoc#solrcloud,SolrCloud>>::
Umbrella term for a suite of functionality in Solr which allows managing a <<cluster,Cluster>> of Solr <<node,Nodes>> for scalability, fault tolerance, and high availability.
[[schema]]<<documents-fields-and-schema-design.adoc#documents-fields-and-schema-design,Solr Schema (managed-schema or schema.xml)>>::
The Solr index Schema defines the fields to be indexed and the type of each field (text, integers, etc.). By default, schema data can be "managed" at run time using the <<schema-api.adoc#schema-api,Schema API>> and is typically kept in a file named `managed-schema`, which Solr modifies as needed. A collection may instead be configured to use a static Schema, which is loaded only on startup from a human-edited configuration file, typically named `schema.xml`. See <<schema-factory-definition-in-solrconfig.adoc#schema-factory-definition-in-solrconfig,Schema Factory Definition in SolrConfig>> for details.
[[solrconfig]]<<the-well-configured-solr-instance.adoc#the-well-configured-solr-instance,SolrConfig (solrconfig.xml)>>::
The Apache Solr configuration file. It defines indexing options, RequestHandlers, highlighting, spellchecking, and various other configurations. The file, `solrconfig.xml`, is located in the Solr home conf directory.
[[spellcheck]]<<spell-checking.adoc#spell-checking,Spell Check>>::
The ability to suggest alternative spellings of search terms to a user, as a check against spelling errors causing few or zero results.
[[stopwords]]Stopwords::
Generally, words that have little meaning to a user's search but which may have been entered as part of a <<naturallanguagequery,natural language>> query. Stopwords are generally very short articles, pronouns, conjunctions, and prepositions (such as "the", "with", or "and").
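+
A minimal sketch of stopword removal, assuming a small hard-coded list and whitespace tokenization. In practice the stopword list is configurable and much longer.
+
```python
# Assumed stopword list for this sketch only.
STOPWORDS = {"the", "with", "and", "is", "what"}

def remove_stopwords(query: str) -> list:
    """Lowercase the query, split on whitespace, and drop stopwords."""
    return [w for w in query.lower().split() if w not in STOPWORDS]

terms = remove_stopwords("What is the dosage with aspirin")
print(terms)  # ['dosage', 'aspirin']
```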
[[suggesterdef]]<<suggester.adoc#suggester,Suggester>>::
Functionality in Solr that provides the ability to suggest possible query terms to users as they type.
[[synonyms]]Synonyms::
Synonyms generally are terms which are near to each other in meaning and may substitute for one another. In a search engine implementation, synonyms may be abbreviations as well as words, or terms that are not consistently hyphenated. Examples of synonyms in this context would be "Inc." and "Incorporated" or "iPod" and "i-pod".
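+
The examples in this entry can be sketched as synonym groups that expand a term into all of its equivalents. The `SYNONYM_GROUPS` structure and `expand` helper are hypothetical and do not reflect Solr's synonym file format.
+
```python
# Hypothetical synonym groups based on the examples in this entry.
SYNONYM_GROUPS = [
    {"inc.", "incorporated"},
    {"ipod", "i-pod"},
]

def expand(term: str) -> set:
    """Return the term plus every synonym from any group that contains it."""
    term = term.lower()
    expanded = {term}
    for group in SYNONYM_GROUPS:
        if term in group:
            expanded |= group
    return expanded

print(sorted(expand("iPod")))  # ['i-pod', 'ipod']
```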
[[SolrGlossary-T]]
=== T
[[termfrequency]]Term frequency::
The number of times a word occurs in a given document. See http://en.wikipedia.org/wiki/Tf-idf and {lucene-javadocs}/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html[the Lucene TFIDFSimilarity javadocs] for more info on TF-IDF based scoring and Lucene scoring in particular. See also <<idf,Inverse document frequency (IDF)>>.
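+
As a small illustration, term frequency for a single document can be computed by counting words. The sketch assumes lowercasing and whitespace tokenization, which is simpler than real text analysis.
+
```python
from collections import Counter

def term_frequencies(text: str) -> Counter:
    """Count occurrences of each whitespace-separated, lowercased word."""
    return Counter(text.lower().split())

tf = term_frequencies("to be or not to be")
print(tf["to"], tf["be"], tf["or"])  # 2 2 1
```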
[[transactionlog]]Transaction log::
An append-only log of write operations maintained by each <<replica,Replica>>. This log is required with SolrCloud implementations and is created and managed automatically by Solr.
[[SolrGlossary-W]]
=== W
[[wildcard]]Wildcard::
A wildcard allows a substitution of one or more letters of a word to account for possible variations in spelling or tenses.
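+
A sketch of wildcard matching, assuming `?` stands for exactly one character and `*` for any run of characters. It uses Python's `fnmatch` module over a hypothetical list of terms and is not how Solr expands wildcard queries internally.
+
```python
from fnmatch import fnmatchcase

# Hypothetical indexed terms.
terms = ["test", "text", "tests", "tester", "toast"]

def match(pattern: str, candidates):
    """Return the candidates that match the wildcard pattern."""
    return [t for t in candidates if fnmatchcase(t, pattern)]

print(match("te?t", terms))   # ['test', 'text']
print(match("test*", terms))  # ['test', 'tests', 'tester']
```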
[[SolrGlossary-Z]]
=== Z
[[zookeeper]]ZooKeeper::
Also known as http://zookeeper.apache.org/[Apache ZooKeeper]. The system used by SolrCloud to keep track of configuration files and node names for a cluster. A ZooKeeper cluster is used as the central configuration store for the cluster, a coordinator for operations requiring distributed synchronization, and the system of record for cluster topology. See also <<solrclouddef,SolrCloud>>.