When SPLITSHARD is issued asynchronously, any exception in a sub-operation isn't propagated and the overall
SPLITSHARD task proceeds as if there were no failures. This results in marking the active parent shard inactive
and can result in two empty sub-shards, thus causing data loss.
use the underlying ZKStateReader of the ClusterStateProvider when waiting for the alias ZNodeVersion to change
prior versions of the test waited using the zkStateReader of the remote client, but there was no garuntee that the state had been updated on the ClusterStateProvider being used by the test
* Refactor existing work around in BasicAuthIntegrationTest up into SolrCloudAuthTestCase for re-use in JWTAuthPluginIntegrationTest
* Simplify BasicAuthOnSingleNodeTest and PKIAuthenticationIntegrationTest to use their existing (static) security settings on creation of MiniSolrCloud. Since they no longer modify security.json once the nodes are alive, the issue no longer affects them
The leader node on the target cluster will now increment its term after bootstrap succeeds so that all replicas of this leader are forced to recover and fetch the latest index from the leader.
A new replicaType property has been added to NodeAddTrigger so that new replicas of the given type are added when the preferredOp is addreplica. The default value of replicaType is `NRT`.
This closes#821.
my previous commit added waitForState calls to doIntegrationTest that forgot to take into account initial repFactor when createShard was called
as a result, the test could only pass if wather was called after a initial leader went live, before other replicas became live
this commit fixes this mistake, and hardens the assertions about the location of those replicas given the rule in use
also adds new expecation that trying to add additional replicas that would violate rule will cause request ot fail
These fixes all relate to testWatcher + testMultipleWatchers:
* add additional asserts to the test methods to assert the expected property values are found
* mark Watcher.props volatile to prevent stale read by test thread
* add some randomization to Watcher.props to either come from the onStateChanged() input or
from an explicit call to ZkStateReader.getCollectionProperties
- previuosly, for reasons i don't understand, the test only consulted
ZkStateReader.getCollectionProperties inside the Watcher, and ignored the onStateChanged()
input
- now the test validates both
* move all Watcher.triggered access into the existing synchronization blocks to prevent
waitForTrigger() from returning prematurely due to gaining synch lock _after_
Watcher.triggered was incremented in onStateChanged(), but _before_ onStateChanged() updated
Watcher.props
* add detailed logging to provide additional info to help debug any additional jenkins failures
that might pop up in the future if these fixes aren't sufficient
Remove errors from each host detail map
Display secureClientPort and server.1, server.2, server.3...
Added test for various failure responses and expected result from multiple nodes
* ensure all collections/replicas are active
* use waitForState or waitForActiveCollection before checking rules/snitch to prevent false failures on stale state
* ensure cluster policy is cleared after each test method
Some of these changes should also help ensure we don't get (more) spurious failures due to SOLR-13616
* SOLR-13565: initial commit
* SOLR-13565: updated with testcase
* SOLR-13565: removed unused methods
* SOLR-13565: better logging
* SOLR-13565: disable SSL
* SOLR-13565: more tests
* SOLR-13565: syncing with master
* SOLR-13565: fixing tests
* SOLR-13565: fixing tests
* SOLR-13534: Fix test
Remove buggy 'port roulette' code that can easily fail if OS gives the selected port to a different process just before creating the server
Use jetty's built in support for listining on an OS selected port instead
Also increase timeouts to better account for slow/heavily loaded (ie:jenkins) VMs where SolrCore reloading may take longer then 10 seconds
* SOLR-13565: set proper permission name
* SOLR-13565: syncing with master
* SOLR-13565: syncing with master
* SOLR-13565: removed accidental change
* SOLR-13565: removed accidental change
* SOLR-13565: removed accidental change
* SOLR-13565: more tests
* SOLR-13565: Tests with key signing tests
* SOLR-13565: fixing concurrency issues in tests
* SOLR-13565: add tests with 512 bit RSA
* SOLR-13565: fixing concurrency issues
* SOLR-13565: remove unused code
this point in the test since the collections available are changing due
to deletions and we might try to communicate with a collection
that was (correctly) deleted.
If the field is non-stored, non-indexed and docvalue enabled numeric field
then inplace update can be done. previously, lucene didn't support
docvalue update for field that is not yet present in indexWriter but
LUCENE-8316 added support for this.
This adds support to update field which satisfies inplace conditions
but which doesn't yet exist in any docs
Remove buggy 'port roulette' code that can easily fail if OS gives the selected port to a different process just before creating the server
Use jetty's built in support for listining on an OS selected port instead
Also increase timeouts to better account for slow/heavily loaded (ie:jenkins) VMs where SolrCore reloading may take longer then 10 seconds
* tighten assertions related to type of watcher that should be removed
* use waitForActiveCollection before deleting collections to work around SOLR-13616 and/or SOLR-13627
- ensure all collections/replicas are active
- tighten assertions around expected replica locations
- eliminate some redundent code
These changes should also help ensure we don't get (more) spurious failures due to SOLR-13616
Payload func with undefined used to throw NPE. In SOLR-11610, this
was fixed to return proper error but there are no tests to verify
changed the behavior.
This add simple test to verify error code and error message
This commit introduces custom tiebreakers which allows users to
specify custom tiebreakers when ordering hits to return. A
default tiebreaker is introduced for tie breaking on shard index
first and then docID.
group.query after execution forms QueryCommandResult. In case of
group.main=true or group.format=simple, QueryCommandResult was not
consumed in EndResultTransformer. Also, MainEndResultTransformer assumed
that always group.field would be specified. When group.field not specified
it failed with AIOOBE. After adding suppport for QueryCommandResult in
EndResultTransformers and handling AIOOBE, group.query started giving results
Working on tests exposed few other issues. Results differed b/w standalone
& distributed mode.
* One of the reason is that TopGroupShardResponseProcessor doesn't consider correct
limit and offset when group format is simple. In case of simple, start and rows should be used
as limit and offset instead of group.limit and group.offset.
* Secondly, In distributed second phase grouping, computing docsToCollect didn't consider
group response format. This issue is again similar to above issue
* offset(group.offset or start) not being considered during TopDocs#merge caused
different results. The fix was to use to offset in merge process
* group.offset doesn't support negative values but there is no checks on the value.
In case of negative values AIOOBE. Now, checks are added for negative values and
returns proper error message(this change is for both standalone and distrbuted).
Validation is done only in case of group.format=grouped as that is only case when
group.offset is consumed.
Fixing above issues resolved the differences b/w standalone and distributed mode.
* When ramPerThreadHardLimitMB is not specified, then Lucene's
default value 1945 is used. The specified value should be
greater than 0 and less than 2048MB
* Improve error message when collapsing is not supported on given
fieldtype
* Return 400 error code when unsupported value are passed for max,min
or in case of syntax error
This reverts commit 6d6f14d391.
Reason for revert: after doing more testing I realized there are tests I overlooked which can (with randomized SSL usage) trigger NullPointerException
(or other related failures) in After/AfterClass due assumptions about cleanup that isn't actaully neccessary due to the AssumptionFailure
that may occur during Before/BeforeClass
* Inplace update supports set and inc operation but when null or
empty list is specified with set op, then it should always be treated
as atomic update since this case is equivalent to removing field
from the document
The UH now detects that parts of the query are not understood by it.
When found, it highlights more safely/reliably.
Fixes compatibility with complex and surround query parsers.
this test already uses waitForState (frequently via verifyReplicaStatus) so there is no reason to include CPU/network/ZK intensive infinite loop checks looking for udpated cluster state
This specific commit affects all points in the casebase where the argument of a StringBuilder.append() call is itself a regular String concatenation.
This defeats the purpose of using StringBuilder and also introduces an extra alloction.
These changes should avoid that.
ant tests have run, succeeded on local machine.
Removing test files from the changes.
Another suggested rework.