SOLR-13822: A Package management system with the following features. A packages.json in ZK to store
the configuration, APIs to read/edit them and isolated classloaders to load the classes from
those packages if the 'class' attribute is prefixed with `<package-name>:`
* Wait for leader in testBasicLeaderElection
It can take some time (>4 seconds) to elect a new leader, and if the update is attempted immediately it'll fail. Need to either wait for the leader or retry the update in case of failure (which is what clients would do)
* Wait for leader to be active in testKillTlogReplica
* Add hack to prevent unrelated failure
* Reduce the wait time for replica state change
* A more robust attempt to add replicas in the tests
* Wait for replication for 2 times the replication time
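The "retry the update in case of failure" option mentioned above can be sketched roughly as follows. This is only an illustration: RetrySketch and its parameters are hypothetical names, not Solr or test-framework APIs, and the actual test may simply wait for the leader instead.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Solr): retries an operation a bounded
// number of times, pausing between attempts, as a client might do while a
// new leader is still being elected.
final class RetrySketch {
  static <T> T retry(int maxAttempts, long pauseMillis, Callable<T> op) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();             // success: return immediately
      } catch (Exception e) {
        last = e;                     // remember the failure
        if (attempt < maxAttempts) {
          Thread.sleep(pauseMillis);  // give the election time to finish
        }
      }
    }
    throw last;                       // every attempt failed
  }
}
```

A caller would wrap the update request in the Callable; the pause and attempt count are assumptions to be tuned to the election time noted above.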
ZkController#getSolrCloudManager() created a new instance of ZkStateReader, thereby causing mismatch in the
visibility of the cluster state and, as a result, undesired race conditions.
* SOLR-13815: add simple live split test to help debugging possible issue
* SOLR-13815: fix live split data loss due to cluster state change between checking current shard state and getting list of subShards
* SOLR-13760 - restore viability of date math in TRA start property by
fixing the start date for time routed aliases
upon the receipt of the first document to avoid problems
with date math calculations required by DRAs
Prior to this commit, the ByteArrayUtf8CharSequence issues had been
fixed on single value removeregex commands, but not if multiple regexes
were used.
This commit fixes our NamedList parsing for this additional case. It
also adds some tests for related atomic-update cases.
Co-Authored-By: Tim Owen
This is to make V2 APIs easier to write and less error prone
* All specs are always in sync with code
* specs are generated from code
* no need to learn and write json schema
remove use of LbSolrClient to prevent premature failure of low timeAllowed options on slow jenkins machines
increase cluster size to also test codepaths where requests are proxied by a node that does not host any core in the collection
replace implicit assumption about default index order with explicit assumption about uniqueKey order, to prevent spurious failures when concurrent out of order merges take place
This groundwork commit allows tests to randomize request content-type
more flexibly. This will be taken advantage of by subsequent commits.
Co-Authored-By: Thomas Woeckinger
Closes: #755
SOLR-13649: Property 'blockUnknown' of BasicAuthPlugin and JWTAuthPlugin now defaults to 'true'. This change is backward incompatible. To achieve the previous default behavior, explicitly set 'blockUnknown':'false' in security.json
When SPLITSHARD is issued asynchronously, any exception in a sub-operation isn't propagated and the overall
SPLITSHARD task proceeds as if there were no failures. This results in marking the active parent shard inactive
and can result in two empty sub-shards, thus causing data loss.
use the underlying ZKStateReader of the ClusterStateProvider when waiting for the alias ZNodeVersion to change
prior versions of the test waited using the zkStateReader of the remote client, but there was no guarantee that the state had been updated on the ClusterStateProvider being used by the test
* Refactor existing work around in BasicAuthIntegrationTest up into SolrCloudAuthTestCase for re-use in JWTAuthPluginIntegrationTest
* Simplify BasicAuthOnSingleNodeTest and PKIAuthenticationIntegrationTest to use their existing (static) security settings on creation of MiniSolrCloud. Since they no longer modify security.json once the nodes are alive, the issue no longer affects them
The leader node on the target cluster will now increment its term after bootstrap succeeds so that all replicas of this leader are forced to recover and fetch the latest index from the leader.
A new replicaType property has been added to NodeAddTrigger so that new replicas of the given type are added when the preferredOp is addreplica. The default value of replicaType is `NRT`.
This closes #821.
my previous commit added waitForState calls to doIntegrationTest that forgot to take into account initial repFactor when createShard was called
as a result, the test could only pass if the watcher was called after an initial leader went live, before other replicas became live
this commit fixes this mistake, and hardens the assertions about the location of those replicas given the rule in use
also adds new expectation that trying to add additional replicas that would violate the rule will cause the request to fail
These fixes all relate to testWatcher + testMultipleWatchers:
* add additional asserts to the test methods to assert the expected property values are found
* mark Watcher.props volatile to prevent stale read by test thread
* add some randomization to Watcher.props to either come from the onStateChanged() input or
from an explicit call to ZkStateReader.getCollectionProperties
- previously, for reasons I don't understand, the test only consulted
ZkStateReader.getCollectionProperties inside the Watcher, and ignored the onStateChanged()
input
- now the test validates both
* move all Watcher.triggered access into the existing synchronization blocks to prevent
waitForTrigger() from returning prematurely due to gaining synch lock _after_
Watcher.triggered was incremented in onStateChanged(), but _before_ onStateChanged() updated
Watcher.props
* add detailed logging to provide additional info to help debug any additional jenkins failures
that might pop up in the future if these fixes aren't sufficient
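The ordering problem described for Watcher.triggered and Watcher.props can be illustrated with a minimal sketch. WatcherSketch is a hypothetical stand-in, not the test's actual Watcher class; it assumes both fields are guarded by one lock, in which case volatile is not strictly needed for them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the test's Watcher: both fields are only read
// and written while holding the same lock, so a waiter can never observe
// the incremented counter together with stale props.
final class WatcherSketch {
  private final Object lock = new Object();
  private int triggered = 0;                           // guarded by lock
  private Map<String, String> props = new HashMap<>(); // guarded by lock

  void onStateChanged(Map<String, String> newProps) {
    synchronized (lock) {
      props = new HashMap<>(newProps); // update props first...
      triggered++;                     // ...then count, all under the lock
      lock.notifyAll();
    }
  }

  // Waits until more than `seen` triggers have happened and returns the
  // props recorded by the latest trigger, or null if the timeout elapses.
  Map<String, String> waitForTrigger(int seen, long timeoutMillis) throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    synchronized (lock) {
      while (triggered <= seen) {
        long remainingNanos = deadline - System.nanoTime();
        if (remainingNanos <= 0) {
          return null;
        }
        // wait(0) would block forever, so always wait at least 1 ms
        lock.wait(Math.max(1L, remainingNanos / 1_000_000L));
      }
      return props;
    }
  }
}
```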
Remove errors from each host detail map
Display secureClientPort and server.1, server.2, server.3...
Added test for various failure responses and expected result from multiple nodes
* ensure all collections/replicas are active
* use waitForState or waitForActiveCollection before checking rules/snitch to prevent false failures on stale state
* ensure cluster policy is cleared after each test method
Some of these changes should also help ensure we don't get (more) spurious failures due to SOLR-13616
* SOLR-13565: initial commit
* SOLR-13565: updated with testcase
* SOLR-13565: removed unused methods
* SOLR-13565: better logging
* SOLR-13565: disable SSL
* SOLR-13565: more tests
* SOLR-13565: syncing with master
* SOLR-13565: fixing tests
* SOLR-13565: fixing tests
* SOLR-13534: Fix test
Remove buggy 'port roulette' code that can easily fail if the OS gives the selected port to a different process just before creating the server
Use jetty's built-in support for listening on an OS-selected port instead
Also increase timeouts to better account for slow/heavily loaded (i.e. jenkins) VMs where SolrCore reloading may take longer than 10 seconds
* SOLR-13565: set proper permission name
* SOLR-13565: syncing with master
* SOLR-13565: syncing with master
* SOLR-13565: removed accidental change
* SOLR-13565: removed accidental change
* SOLR-13565: removed accidental change
* SOLR-13565: more tests
* SOLR-13565: Tests with key signing tests
* SOLR-13565: fixing concurrency issues in tests
* SOLR-13565: add tests with 512 bit RSA
* SOLR-13565: fixing concurrency issues
* SOLR-13565: remove unused code
this point in the test since the collections available are changing due
to deletions and we might try to communicate with a collection
that was (correctly) deleted.
If a field is a non-stored, non-indexed, docValues-enabled numeric field,
then an in-place update can be done. Previously, Lucene didn't support
docValues updates for a field that is not yet present in the IndexWriter, but
LUCENE-8316 added support for this.
This adds support for updating a field which satisfies the in-place conditions
but which doesn't yet exist in any docs
Remove buggy 'port roulette' code that can easily fail if the OS gives the selected port to a different process just before creating the server
Use jetty's built-in support for listening on an OS-selected port instead
Also increase timeouts to better account for slow/heavily loaded (i.e. jenkins) VMs where SolrCore reloading may take longer than 10 seconds
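The principle behind replacing 'port roulette' can be shown with the JDK alone. The sketch below uses java.net.ServerSocket rather than jetty, purely to stay self-contained; EphemeralPortSketch is a hypothetical name. Binding to port 0 lets the OS choose a free port as part of the bind itself, so there is no window in which another process can take it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical illustration (plain JDK, not jetty): ask the OS for a free
// port by binding to port 0, then read back the port that was assigned.
final class EphemeralPortSketch {
  static ServerSocket bindToOsSelectedPort() throws IOException {
    // port 0 = "any free port"; 50 is the accept backlog
    return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
  }

  public static void main(String[] args) throws IOException {
    try (ServerSocket socket = bindToOsSelectedPort()) {
      // The assigned port is only known after the bind has succeeded.
      System.out.println("listening on port " + socket.getLocalPort());
    }
  }
}
```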
* tighten assertions related to type of watcher that should be removed
* use waitForActiveCollection before deleting collections to work around SOLR-13616 and/or SOLR-13627
- ensure all collections/replicas are active
- tighten assertions around expected replica locations
- eliminate some redundant code
These changes should also help ensure we don't get (more) spurious failures due to SOLR-13616
Payload func with undefined used to throw NPE. In SOLR-11610, this
was fixed to return a proper error, but there were no tests to verify
the changed behavior.
This adds a simple test to verify the error code and error message
This commit introduces custom tiebreakers, which allow users to
specify how ties are broken when ordering hits to return. A
default tiebreaker is introduced that breaks ties on shard index
first and then docID.
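A tiebreaker of this shape can be sketched with java.util.Comparator. The Hit class and the names below are hypothetical and only mirror the description (shard index first, then docID); they are not the classes or signatures introduced by the commit.

```java
import java.util.Comparator;

// Hypothetical sketch of a default tiebreaker and a custom one being
// plugged into a score-descending ordering.
final class TieBreakerSketch {
  // Minimal stand-in for a hit: score, originating shard, and docID.
  static final class Hit {
    final float score;
    final int shardIndex;
    final int doc;

    Hit(float score, int shardIndex, int doc) {
      this.score = score;
      this.shardIndex = shardIndex;
      this.doc = doc;
    }
  }

  // Default: break ties on shard index first, then on docID.
  static final Comparator<Hit> DEFAULT_TIE_BREAKER =
      Comparator.<Hit>comparingInt(h -> h.shardIndex).thenComparingInt(h -> h.doc);

  // Orders by score (highest first); the supplied tiebreaker is consulted
  // only when two scores compare as equal.
  static Comparator<Hit> byScoreThen(Comparator<Hit> tieBreaker) {
    Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);
    return byScoreDesc.thenComparing(tieBreaker);
  }
}
```

Passing a different Comparator to byScoreThen is how a caller would supply a custom tiebreaker in this sketch.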
After execution, group.query forms a QueryCommandResult. In case of
group.main=true or group.format=simple, the QueryCommandResult was not
consumed in EndResultTransformer. Also, MainEndResultTransformer assumed
that group.field would always be specified. When group.field was not specified,
it failed with AIOOBE. After adding support for QueryCommandResult in
EndResultTransformers and handling the AIOOBE, group.query started giving results.
Working on tests exposed a few other issues. Results differed b/w standalone
& distributed mode.
* One reason is that TopGroupShardResponseProcessor doesn't consider the correct
limit and offset when the group format is simple. In case of simple, start and rows should be used
as offset and limit instead of group.offset and group.limit.
* Secondly, in distributed second-phase grouping, computing docsToCollect didn't consider
the group response format. This issue is similar to the one above.
* offset (group.offset or start) not being considered during TopDocs#merge caused
different results. The fix was to use the offset in the merge process.
* group.offset doesn't support negative values, but there were no checks on the value.
Negative values caused AIOOBE. Now, checks are added for negative values and
a proper error message is returned (this change is for both standalone and distributed).
Validation is done only in case of group.format=grouped, as that is the only case when
group.offset is consumed.
Fixing above issues resolved the differences b/w standalone and distributed mode.
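The offset/limit selection and the negative-offset check described in the list above might look roughly like this. GroupWindowSketch, its parameters, and the error text are hypothetical; the real change lives in the grouping classes named above and may be structured differently.

```java
// Hypothetical sketch: pick which parameters define the window of documents
// to return, depending on the group response format.
final class GroupWindowSketch {
  final int offset;
  final int limit;

  private GroupWindowSketch(int offset, int limit) {
    this.offset = offset;
    this.limit = limit;
  }

  static GroupWindowSketch resolve(
      boolean simpleFormat, int start, int rows, int groupOffset, int groupLimit) {
    if (simpleFormat) {
      // Simple format: start/rows apply; group.offset is not consumed,
      // so it is not validated here.
      return new GroupWindowSketch(start, rows);
    }
    // Grouped format: group.offset is consumed, so reject negative values
    // up front instead of failing later with an AIOOBE.
    if (groupOffset < 0) {
      throw new IllegalArgumentException("group.offset must not be negative: " + groupOffset);
    }
    return new GroupWindowSketch(groupOffset, groupLimit);
  }
}
```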