Karl Wright 2018-04-08 06:44:45 -04:00
commit bd8fe72426
16 changed files with 93 additions and 36 deletions

View File

@ -153,6 +153,11 @@ Build
* LUCENE-8230: Upgrade forbiddenapis to version 2.5. (Uwe Schindler)
Documentation
* LUCENE-8238: Improve WordDelimiterFilter and WordDelimiterGraphFilter javadocs
(Mike Sokolov via Mike McCandless)
======================= Lucene 7.3.0 =======================
API Changes

View File

@ -55,11 +55,14 @@ import org.apache.lucene.util.InPlaceMergeSorter;
* </li>
* </ul>
*
* The <b>combinations</b> parameter affects how subwords are combined:
* The <b>GENERATE_...</b> options affect how incoming tokens are broken into parts, and the
* various <b>CATENATE_...</b> options affect how those parts are combined.
*
* <ul>
* <li>combinations="0" causes no subword combinations: <code>"PowerShot"</code>
* &#8594; <code>0:"Power", 1:"Shot"</code> (0 and 1 are the token positions)</li>
* <li>combinations="1" means that in addition to the subwords, maximum runs of
* <li>If no CATENATE option is set, then no subword combinations are generated:
* <code>"PowerShot"</code> &#8594; <code>0:"Power", 1:"Shot"</code> (0 and 1 are the token
* positions)</li>
* <li>CATENATE_WORDS means that in addition to the subwords, maximum runs of
* non-numeric subwords are catenated and produced at the same position of the
* last subword in the run:
* <ul>
@ -72,12 +75,15 @@ import org.apache.lucene.util.InPlaceMergeSorter;
* </li>
* </ul>
* </li>
* <li>CATENATE_NUMBERS works like CATENATE_WORDS, but for adjacent digit sequences.</li>
* <li>CATENATE_ALL catenates all the token parts together, without distinguishing numbers from words.</li>
* </ul>
*
* One use for {@link WordDelimiterFilter} is to help match words with different
* subword delimiters. For example, if the source text contained "wi-fi" one may
* want "wifi" "WiFi" "wi-fi" "wi+fi" queries to all match. One way of doing so
* is to specify combinations="1" in the analyzer used for indexing, and
* combinations="0" (the default) in the analyzer used for querying. Given that
* is to specify CATENATE options in the analyzer used for indexing, and
* not in the analyzer used for querying. Given that
* the current {@link StandardTokenizer} immediately removes many intra-word
* delimiters, it is recommended that this filter be used after a tokenizer that
* does not do this (such as {@link WhitespaceTokenizer}).
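As a rough sketch of the flag-based configuration described above (not part of this commit; it assumes the Lucene 7.x analysis API, where these flags are public constants on the filter), the filter can be wired up and its output inspected like this:

[source,java]
----
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class WordDelimiterDemo {
  public static void main(String[] args) throws Exception {
    // Generate the split word parts and also catenate runs of them,
    // so "wi-fi" yields "wi", "fi" plus the catenated "wifi".
    int flags = WordDelimiterFilter.GENERATE_WORD_PARTS
              | WordDelimiterFilter.CATENATE_WORDS;

    WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(new StringReader("wi-fi"));
    try (TokenStream ts = new WordDelimiterFilter(tokenizer, flags, null)) { // null: no protected words
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term); // wi, fi, wifi (relative order of same-position tokens may vary)
      }
      ts.end();
    }
  }
}
----

With no CATENATE flag set, the same input would produce only "wi" and "fi", matching the first bullet above.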

View File

@ -62,11 +62,14 @@ import org.apache.lucene.util.RamUsageEstimator;
* </li>
* </ul>
*
* The <b>combinations</b> parameter affects how subwords are combined:
* The <b>GENERATE_...</b> options affect how incoming tokens are broken into parts, and the
* various <b>CATENATE_...</b> options affect how those parts are combined.
*
* <ul>
* <li>combinations="0" causes no subword combinations: <code>"PowerShot"</code>
* &#8594; <code>0:"Power", 1:"Shot"</code> (0 and 1 are the token positions)</li>
* <li>combinations="1" means that in addition to the subwords, maximum runs of
* <li>If no CATENATE option is set, then no subword combinations are generated:
* <code>"PowerShot"</code> &#8594; <code>0:"Power", 1:"Shot"</code> (0 and 1 are the token
* positions)</li>
* <li>CATENATE_WORDS means that in addition to the subwords, maximum runs of
* non-numeric subwords are catenated and produced at the same position of the
* last subword in the run:
* <ul>
@ -79,12 +82,14 @@ import org.apache.lucene.util.RamUsageEstimator;
* </li>
* </ul>
* </li>
* <li>CATENATE_NUMBERS works like CATENATE_WORDS, but for adjacent digit sequences.</li>
* <li>CATENATE_ALL catenates all the token parts together, without distinguishing numbers from words.</li>
* </ul>
* One use for {@link WordDelimiterGraphFilter} is to help match words with different
* subword delimiters. For example, if the source text contained "wi-fi" one may
* want "wifi" "WiFi" "wi-fi" "wi+fi" queries to all match. One way of doing so
* is to specify combinations="1" in the analyzer used for indexing, and
* combinations="0" (the default) in the analyzer used for querying. Given that
* is to specify CATENATE options in the analyzer used for indexing, and not
* in the analyzer used for querying. Given that
* the current {@link StandardTokenizer} immediately removes many intra-word
* delimiters, it is recommended that this filter be used after a tokenizer that
* does not do this (such as {@link WhitespaceTokenizer}).
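A hedged sketch of the index-time analyzer recommended above (also not part of this commit): a whitespace tokenizer followed by the graph filter, with `FlattenGraphFilter` appended because graph token streams cannot be indexed directly.

[source,java]
----
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.FlattenGraphFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilter;

public class IndexSideAnalyzer {
  public static Analyzer create() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        // A whitespace tokenizer keeps intra-word delimiters like '-' intact
        Tokenizer source = new WhitespaceTokenizer();
        TokenStream sink = new WordDelimiterGraphFilter(source,
            WordDelimiterGraphFilter.GENERATE_WORD_PARTS
              | WordDelimiterGraphFilter.CATENATE_WORDS,
            null); // no protected terms
        // Graph token streams must be flattened before indexing
        sink = new FlattenGraphFilter(sink);
        return new TokenStreamComponents(source, sink);
      }
    };
  }
}
----

A matching query-side analyzer would simply omit the CATENATE flag (and the flattening), so queries see only the plain subword parts.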

View File

@ -2222,12 +2222,7 @@ public class TestIndexSorting extends LuceneTestCase {
// pits index time sorting against query time sorting
public void testRandom3() throws Exception {
int numDocs;
if (TEST_NIGHTLY) {
numDocs = atLeast(100000);
} else {
numDocs = atLeast(1000);
}
int numDocs = atLeast(1000);
List<RandomDoc> docs = new ArrayList<>();
Sort sort = randomSort();

View File

@ -111,6 +111,9 @@ Bug Fixes
* SOLR-11929: UpdateLog metrics are not initialized on core reload. (ab, Steve Rowe)
* SOLR-12199: TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure:
Server refused connection at: http://127.0.0.1:TEST_PORT/solr (Mikhail Khludnev, Dawid Weiss, Steve Rowe)
Optimizations
----------------------

View File

@ -294,6 +294,7 @@ public class TestReplicationHandler extends SolrTestCaseJ4 {
public void doTestDetails() throws Exception {
slaveJetty.stop();
slave.setTestPort(masterJetty.getLocalPort());
slave.copyConfigFile(CONF_DIR + "solrconfig-slave.xml", "solrconfig.xml");
slaveJetty = createJetty(slave);
@ -706,6 +707,7 @@ public class TestReplicationHandler extends SolrTestCaseJ4 {
public void doTestIndexFetchWithMasterUrl() throws Exception {
//change solrconfig on slave
//this has no entry for pollinginterval
slave.setTestPort(masterJetty.getLocalPort());
slave.copyConfigFile(CONF_DIR + "solrconfig-slave1.xml", "solrconfig.xml");
slaveJetty.stop();
slaveJetty = createJetty(slave);
@ -841,6 +843,7 @@ public class TestReplicationHandler extends SolrTestCaseJ4 {
try {
slave.setTestPort(masterJetty.getLocalPort());
slave.copyConfigFile(CONF_DIR +"solrconfig-slave1.xml", "solrconfig.xml");
slave.copyConfigFile(CONF_DIR +slaveSchema, "schema.xml");
slaveJetty.stop();
@ -986,6 +989,7 @@ public class TestReplicationHandler extends SolrTestCaseJ4 {
@Test
public void doTestRepeater() throws Exception {
// no polling
slave.setTestPort(masterJetty.getLocalPort());
slave.copyConfigFile(CONF_DIR + "solrconfig-slave1.xml", "solrconfig.xml");
slaveJetty.stop();
slaveJetty = createJetty(slave);
@ -993,7 +997,7 @@ public class TestReplicationHandler extends SolrTestCaseJ4 {
slaveClient = createNewSolrClient(slaveJetty.getLocalPort());
try {
repeater = new SolrInstance(createTempDir("solr-instance").toFile(), "repeater", null);
repeater = new SolrInstance(createTempDir("solr-instance").toFile(), "repeater", masterJetty.getLocalPort());
repeater.setUp();
repeater.copyConfigFile(CONF_DIR + "solrconfig-repeater.xml",
"solrconfig.xml");

View File

@ -21,6 +21,7 @@ import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
@ -42,17 +43,17 @@ import org.apache.lucene.index.RandomIndexWriter;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum.SeekStatus;
import org.apache.solr.legacy.LegacyIntField;
import org.apache.solr.legacy.LegacyLongField;
import org.apache.solr.legacy.LegacyNumericUtils;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.index.TermsEnum.SeekStatus;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.StringHelper;
import org.apache.lucene.util.TestUtil;
import org.apache.solr.index.SlowCompositeReaderWrapper;
import org.apache.solr.legacy.LegacyIntField;
import org.apache.solr.legacy.LegacyLongField;
import org.apache.solr.legacy.LegacyNumericUtils;
// TODO:
// - test w/ del docs
@ -145,6 +146,9 @@ public class TestDocTermOrds extends LuceneTestCase {
public void testTriggerUnInvertLimit() throws IOException {
final boolean SHOULD_TRIGGER = false; // Set this to true to use the test with the old implementation
assumeFalse("Don't run this massive test with MemoryPostingsFormat, as it can OOM",
Objects.equals(Codec.getDefault().postingsFormat().getName(), "Memory"));
// Ensure enough terms inside of a single UnInvert-pass-structure to trigger the limit
final int REF_LIMIT = (int) Math.pow(2, 24); // Maximum number of references within a single pass-structure
final int DOCS = (1<<16)-1; // The number of documents within a single pass (simplified)

View File

@ -213,6 +213,7 @@
imagesDir="${build.content.dir}"
doctype="book"
safemode="unsafe">
<attribute key="attribute-missing" value="warn" />
<attribute key="section-toc" value='' /><!-- we don't use these in the pdf -->
<attribute key="icons" value="font" />
<attribute key="icon-set" value="fa" />

View File

@ -91,6 +91,7 @@ asciidoctor:
safe: 0
attributes:
<<: *solr-attributes-ref
attribute-missing: "warn"
icons: "font"
source-highlighter: "pygments"
pygments-css: "style"

View File

@ -762,7 +762,7 @@ span.label.label-primary {
.col-lg-12 .nav li a {background-color: white}
a code {
color: ##2156a5;
color: #2156a5;
}
table th code {

View File

@ -61,7 +61,7 @@ The LTR contrib module includes several feature classes as well as support for c
|solr query |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/SolrFeature.html[SolrFeature] |`{"q":"{!func}` `recip(ms(NOW,last_modified)` `,3.16e-11,1,1)"}` |supported
|solr filter query |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/SolrFeature.html[SolrFeature] |`{"fq":["{!terms f=category}book"]}` |supported
|solr query + filter query |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/SolrFeature.html[SolrFeature] |`{"q":"{!func}` `recip(ms(NOW,last_modified),` `3.16e-11,1,1)",` `"fq":["{!terms f=category}book"]}` |supported
|value |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/ValueFeature.html[ValueFeature] |`{"value":"${userFromMobile}","required":true}` |supported
|value |{solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/ValueFeature.html[ValueFeature] |`{"value":"$\{userFromMobile}","required":true}` |supported
|(custom) |(custom class extending {solr-javadocs}/solr-ltr/org/apache/solr/ltr/feature/Feature.html[Feature]) | |
|===

View File

@ -141,5 +141,5 @@ These attributes use variables that are inserted by Ant during the PDF creation
`<attribute key="solr-docs-version" value="${solr-docs-version}" />`:: The version of Solr covered by this guide.
`<attribute key="solr-javadocs" value="${solr-javadocs}" />`:: Sets the path for Solr javadoc links to include the right path for the current release version.
`<attribute key="lucene-javadocs" value="${lucene-javadocs}" />`:: Sets the path for Lucene javadoc links to the right path for the current release version.
`<attribute key="build-date" value="${DSTAMP}" />`:: Sets the date of the build to add the date to the footer of each page of the PDF.
`<attribute key="build-date" value="$\{DSTAMP}" />`:: Sets the date of the build to add the date to the footer of each page of the PDF.
`<attribute key="build-year" value="${current.year}" />`:: Sets the year of the build to add the date to the copyright notice.

View File

@ -84,7 +84,7 @@ Tag values come from a plugin called Snitch. If there is a tag named rack
* *node*: node name
* *role*: The role of the node. The only supported role is 'overseer'
* *ip_1, ip_2, ip_3, ip_4*: These are IP fragments for each node. For example, in a host with IP `192.168.1.2`, `ip_1 = 2`, `ip_2 = 1`, `ip_3 = 168` and `ip_4 = 192`
* *sysprop.{PROPERTY_NAME}*: These are values available from system properties. `sysprop.key` means a value that is passed to the node as `-Dkey=keyValue` during the node startup. It is possible to use rules like `sysprop.key:expectedVal,shard:*`
* *sysprop.\{PROPERTY_NAME}*: These are values available from system properties. `sysprop.key` means a value that is passed to the node as `-Dkey=keyValue` during the node startup. It is possible to use rules like `sysprop.key:expectedVal,shard:*`
=== How Snitches are Configured

View File

@ -58,7 +58,7 @@ The core functionality of atomically updating a document requires that all field
If `<copyField/>` destinations are configured as stored, then Solr will attempt to index both the current value of the field and an additional copy from any source fields. If such fields contain some information that comes from the indexing program and some that comes from copyField, then the information that originally came from the indexing program will be lost when an atomic update is made.
There are other kinds of derived fields that must also be set so they aren't stored. Some spatial field types use derived fields. Examples of this are solr.BBoxField and solr.LatLonType. CurrencyFieldType also uses derived fields. These types create additional fields which are normally specified by a dynamic field definition. That dynamic field definition must be not stored, or indexing will fail.
There are other kinds of derived fields that must also be set so they aren't stored. Some spatial field types, such as BBoxField and LatLonType, use derived fields. CurrencyFieldType also uses derived fields. These types create additional fields that are normally specified by a dynamic field definition. That dynamic field definition must not be stored, or indexing will fail.
=== Example Updating Part of a Document
@ -188,30 +188,61 @@ When the client resubmits a changed document to Solr, the `\_version_` can be in
If the document being updated does not include the `\_version_` field, and atomic updates are not being used, the document will be treated by normal Solr rules, which is usually to discard the previous version.
When using Optimistic Concurrency, clients can include an optional `versions=true` request parameter to indicate that the _new_ versions of the documents being added should be included in the response. This allows clients to immediately know what the `\_version_` is of every documented added without needing to make a redundant <<realtime-get.adoc#realtime-get,`/get` request>>.
When using Optimistic Concurrency, clients can include an optional `versions=true` request parameter to indicate that the _new_ versions of the documents being added should be included in the response. This allows clients to immediately know what the `\_version_` is of every document added without needing to make a redundant <<realtime-get.adoc#realtime-get,`/get` request>>.
For example:
Following are some examples that use `versions=true`:
[source]
[source,bash]
----
$ curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/techproducts/update?versions=true' --data-binary '
[ { "id" : "aaa" },
{ "id" : "bbb" } ]'
----
[source,json]
----
{"responseHeader":{"status":0,"QTime":6},
"adds":["aaa",1498562471222312960,
"bbb",1498562471225458688]}
----
In this example, we have added two documents, "aaa" and "bbb". Because we added `versions=true` to the request, the response shows the version assigned to each document.
[source,bash]
----
$ curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/techproducts/update?_version_=999999&versions=true' --data-binary '
[{ "id" : "aaa",
"foo_s" : "update attempt with wrong existing version" }]'
----
[source,json]
----
{"responseHeader":{"status":409,"QTime":3},
"error":{"msg":"version conflict for aaa expected=999999 actual=1498562471222312960",
"code":409}}
----
In this example, we've attempted to update document "aaa" but specified the wrong version in the request: `_version_=999999` doesn't match the document version we just got when we added the document. We get an error in response.
[source,bash]
----
$ curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/techproducts/update?_version_=1498562471222312960&versions=true&commit=true' --data-binary '
[{ "id" : "aaa",
"foo_s" : "update attempt with correct existing version" }]'
----
[source,json]
----
{"responseHeader":{"status":0,"QTime":5},
"adds":["aaa",1498562624496861184]}
----
Now we've sent an update with a value for `\_version_` that matches the value in the index, and it succeeds. Because we included `versions=true` in the update request, the response includes the new value of the `\_version_` field.
[source,bash]
----
$ curl 'http://localhost:8983/solr/techproducts/query?q=*:*&fl=id,_version_'
----
[source,json]
----
{
"responseHeader":{
"status":0,
@ -229,11 +260,13 @@ $ curl 'http://localhost:8983/solr/techproducts/query?q=*:*&fl=id,_version_'
}}
----
For more information, please also see https://www.youtube.com/watch?v=WYVM6Wz-XTw[Yonik Seeley's presentation on NoSQL features in Solr 4] from Apache Lucene EuroCon 2012.
Finally, we can issue a query that requests the `\_version_` field be included in the response, and we can see it for the two documents in our example index.
For more information, please also see Yonik Seeley's presentation on https://www.youtube.com/watch?v=WYVM6Wz-XTw[NoSQL features in Solr 4] from Apache Lucene EuroCon 2012.
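For SolrJ users, here is a minimal sketch of the same optimistic-concurrency flow as the curl examples above (not from the ref guide itself; it assumes the `techproducts` example collection, and the exact exception surfaced on a conflict depends on the client implementation):

[source,java]
----
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class OptimisticUpdate {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder(
             "http://localhost:8983/solr/techproducts").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "aaa");
      doc.addField("foo_s", "update attempt with correct existing version");
      // Version taken from a previous response; a stale value makes the
      // add fail with the same 409 version conflict shown above.
      doc.addField("_version_", 1498562471222312960L);
      client.add(doc);
      client.commit();
    }
  }
}
----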
== Document Centric Versioning Constraints
Optimistic Concurrency is extremely powerful, and works very efficiently because it uses an internally assigned, globally unique values for the `\_version_` field. However, In some situations users may want to configure their own document specific version field, where the version values are assigned on a per-document basis by an external system, and have Solr reject updates that attempt to replace a document with an "older" version. In situations like this the {solr-javadocs}/solr-core/org/apache/solr/update/processor/DocBasedVersionConstraintsProcessorFactory.html[`DocBasedVersionConstraintsProcessorFactory`] can be useful.
Optimistic Concurrency is extremely powerful, and works very efficiently because it uses internally assigned, globally unique values for the `\_version_` field. However, in some situations users may want to configure their own document-specific version field, where the version values are assigned on a per-document basis by an external system, and have Solr reject updates that attempt to replace a document with an "older" version. In situations like this the {solr-javadocs}/solr-core/org/apache/solr/update/processor/DocBasedVersionConstraintsProcessorFactory.html[`DocBasedVersionConstraintsProcessorFactory`] can be useful.
The basic usage of `DocBasedVersionConstraintsProcessorFactory` is to configure it in `solrconfig.xml` as part of the <<update-request-processors.adoc#update-request-processor-configuration,UpdateRequestProcessorChain>> and specify the name of your custom `versionField` in your schema that should be checked when validating updates:

View File

@ -65,7 +65,7 @@ If you have a `/var/solr/solr.in.sh` file for your existing Solr install, runnin
Open `/etc/default/solr.in.sh` with a text editor and verify that the following variables are set correctly, or add them to the bottom of the include file as needed:
[source]
[source,properties]
ZK_HOST=
SOLR_HOST=
SOLR_PORT=

View File

@ -306,7 +306,7 @@ For MySQL driver, which doesn't honor fetchSize and pulls whole resultSet, which
+
In this case, set `batchSize=-1`, which passes `setFetchSize(Integer.MIN_VALUE)` to the driver and switches the result set to pull rows one at a time.
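For context, a minimal JDBC sketch of what `batchSize=-1` maps to (the connection URL, credentials, and table here are placeholders): MySQL Connector/J streams rows one at a time only when the fetch size is exactly `Integer.MIN_VALUE` on a forward-only, read-only statement.

[source,java]
----
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingFetchDemo {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:mysql://localhost/db", "user", "pass");
         Statement stmt = conn.createStatement(
             ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
      stmt.setFetchSize(Integer.MIN_VALUE); // what batchSize=-1 requests
      try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM item")) {
        while (rs.next()) {
          System.out.println(rs.getString("id")); // rows arrive one at a time
        }
      }
    }
  }
}
----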
All of them substitute properties via `${placeholders}`.
All of them substitute properties via `$\{placeholders}`.
=== URLDataSource