This change restores the Lucene70Codec for file format compatibility of
indices that are created within the Lucene 7 major version. These indices
can be opened via an expert API on DirectoryReader in read-only mode. Changes
to these indices are prohibited and will be rejected by the IndexWriter.
In fact, IndexWriter will not open an index that was created with a major version
older than N-1, where N is the current major version.
Hunspell: improve stemming of all-caps words
Repeat Hunspell's logic:
* when encountering a mixed-case or (inflectable) all-caps dictionary entry, add its title-case analog as a hidden entry
* use that hidden entry for stemming case variants for title- and uppercase words, but don't consider it a valid word itself
* ...unless there's another explicit dictionary entry of that title case
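
The hidden-entry rules above can be sketched in plain Java. This is a simplified model, not Lucene's Hunspell code: the class and method names are hypothetical, affixes and the "inflectable" condition are ignored, and the hidden entry is consulted only for all-uppercase input.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical, simplified model of hidden title-case entries; not Lucene's Hunspell code. */
final class HiddenTitleCaseSketch {
    // entry -> true if hidden (usable only to resolve case variants), false if explicit
    private final Map<String, Boolean> entries = new HashMap<>();

    static String titleCase(String w) {
        if (w.isEmpty()) return w;
        return w.substring(0, 1).toUpperCase(Locale.ROOT)
            + w.substring(1).toLowerCase(Locale.ROOT);
    }

    /** Adds an explicit dictionary entry, plus a hidden title-case analog when needed. */
    void add(String word) {
        entries.put(word, false); // an explicit entry overrides any hidden one
        String title = titleCase(word);
        boolean allLower = word.equals(word.toLowerCase(Locale.ROOT));
        if (!allLower && !word.equals(title)) { // mixed-case or all-caps entry
            entries.putIfAbsent(title, true);   // never demotes an explicit entry
        }
    }

    /** Only explicit entries count as valid words. */
    boolean isValidWord(String w) {
        return Boolean.FALSE.equals(entries.get(w));
    }

    /** Returns the dictionary entry that backs the input, or null if none does. */
    String resolve(String input) {
        if (isValidWord(input)) return input;
        if (input.equals(input.toUpperCase(Locale.ROOT))) { // all-caps input
            String title = titleCase(input);
            if (entries.containsKey(title)) return title;   // hidden or explicit
            String lower = input.toLowerCase(Locale.ROOT);
            if (isValidWord(lower)) return lower;
        }
        return null;
    }

    public static void main(String[] args) {
        HiddenTitleCaseSketch dict = new HiddenTitleCaseSketch();
        dict.add("OpenOffice");
        System.out.println(dict.isValidWord("Openoffice")); // false: hidden entry only
        System.out.println(dict.resolve("OPENOFFICE"));     // Openoffice
    }
}
```

Adding "Openoffice" explicitly afterwards turns it into a valid word, which mirrors the last bullet.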
Today we force indices that were created with N-2 and older versions of Lucene
to fail on open. This check doesn't even consider whether the codecs are available. In order
to allow users to open older indices, and for us to support N-2 versions, this change
adds an API on DirectoryReader to specify a minimum index version on a per-reader basis.
This doesn't apply to IndexWriter, which will still fail when opening older indices.
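
A minimal sketch of the asymmetry described above, in plain Java. All names, the value chosen for N, and the exception type are assumptions made for illustration; this is not the DirectoryReader or IndexWriter API.

```java
/** Hypothetical model of the version gate; names and numbers are illustrative only. */
final class IndexVersionGateSketch {
    static final int CURRENT_MAJOR = 9; // assumed value of N for this example

    /** Reader side: the caller chooses the oldest major version it is willing to open. */
    static void checkReader(int createdMajor, int minSupportedMajor) {
        if (createdMajor < minSupportedMajor) {
            throw new IllegalStateException(
                "index created by major version " + createdMajor
                    + " is older than the requested minimum " + minSupportedMajor);
        }
    }

    /** Writer side: no opt-in, anything older than N-1 is rejected. */
    static void checkWriter(int createdMajor) {
        if (createdMajor < CURRENT_MAJOR - 1) {
            throw new IllegalStateException(
                "index created by major version " + createdMajor
                    + " cannot be opened for writing");
        }
    }

    public static void main(String[] args) {
        checkReader(7, 7); // an N-2 index opens read-only when the reader asks for it
        try {
            checkWriter(7); // the same index is still rejected for writing
        } catch (IllegalStateException e) {
            System.out.println("writer rejected: " + e.getMessage());
        }
    }
}
```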
Call stem() recursively just once with different arguments depending on various conditions.
NOTE: committing directly as this is a refactoring, not a functional change (no CHANGES.txt entry).
* add test case for SOLR-15071
* add temporary @Ignore to be removed when the fix is committed
Co-authored-by: Florin Babes <florin.babes@emag.ro>
Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
* introduce jattach check if jstack is missing. jattach ships in the Solr docker image instead of jstack.
* get the full path to the jattach command
Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
Prior to this commit, SuggestComponent used a HashMap as part of the
response it built on the server side. This class is serialized/
deserialized differently depending on the SolrJ ResponseParser used:
a LinkedHashMap when javabin was used, and a SimpleOrderedMap when XML
was used. This discrepancy led to ClassCastExceptions in downstream
SolrJ code.
This commit fixes the issue by changing SuggestComponent to avoid these
types that are serialized differently. "suggest" response sections now
deserialize as a NamedList in SolrJ, and the SuggesterResponse POJO has
been updated accordingly.
This ensures that dereferenced fields are not parsed into an actual value source
but into a placeholder value. This works for one level of dereferencing.
* When the schema defines _root_, and you want to do atomic/partial updates...
** _root_ needn't be stored or have docValues any more
** _nest_path_ field isn't needed for this any more
** Simplified internal logic
* Allow (and recommend, eventually insist) that the _root_ field be passed for atomic/partial updates to child docs.
** In the absence of _root_, assume the _route_ param is equivalent, to limit the back-compat impact. This is a temporary hack; remove in SOLR-15064.
** One of the two is required; you'll get an exception if the assumption is false. THIS IS A BACK-COMPAT CHANGE
* Ensure that the update log contains the _root_ field if it's defined in the schema; in some cases it wasn't. It's important for robustness of atomic/partial updates to child docs. Caveat: the buffer replay scenario is not tested with child docs.
* Limited the cases when a realtime searcher is re-opened. Previously this happened for any update that included child docs; now it happens only for a narrow subset: atomic/partial updates, and only when the update log contains an in-place update for the same nest, because those log entries are complicated to resolve.
* Internal improvements to RealTimeGetComponent to aid clarity & robustness & probably performance...
** Use SolrDocumentFetcher.solrDoc(docID, ReturnFields) instead of more manual loading. Will do more with this in another PR.
** Clarify when only root doc IDs are expected.
** Use Resolution enum more, add PARTIAL, remove DOC_WITH_CHILDREN; enhance docs.
** When we have ReturnFields, a Set of "onlyTheseFields" becomes redundant. Add a child doc resolution via a transformer when needed.
** Clarified where copy-field targets are removed
* NestPathField should default to single valued, instead of inheriting the schema default, which for ancient schemas was multi-valued.
* AddUpdateCommand.getLuceneDocument(s) methods are very internal; made package visible and refactored a bit for clarity
* DocumentBuilder: when in-place update, skip id and _root_ here, thus also simplifying further logic
* NestedShardedAtomicUpdateTest no longer extends AbstractFullDistribZkTestBase because it wasn't really leveraging the "control client" checking, and it added too much complexity to debug failures.
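
The _root_ / _route_ fallback from the list above, reduced to a sketch. The method name, parameters, and exception type are hypothetical, and the later check that the assumed root really matches is not modeled.

```java
/** Hypothetical helper illustrating the temporary _root_ / _route_ fallback. */
final class RootIdSketch {
    /**
     * Picks the root document ID for an atomic/partial update to a child doc.
     * Prefers the _root_ field value; falls back to the _route_ param.
     */
    static String resolveRootId(String rootFieldValue, String routeParam) {
        if (rootFieldValue != null) {
            return rootFieldValue;
        }
        if (routeParam != null) {
            return routeParam; // assumed equivalent to _root_ for back-compat
        }
        throw new IllegalArgumentException(
            "an update to a child doc needs either _root_ or _route_");
    }

    public static void main(String[] args) {
        System.out.println(resolveRootId(null, "parent1")); // parent1
    }
}
```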
missing, allBuckets, and numBuckets are not supported by the stream method.
So avoid picking the stream method when any one of them is enabled, even if
the facet sort is 'index asc'.
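
The selection rule can be sketched as follows. The enum, method, and parameter names are hypothetical and do not mirror Solr's facet code; the sketch assumes the stream method was requested and only decides whether it may be used.

```java
/** Hypothetical sketch of the stream-method eligibility check. */
final class FacetMethodSketch {
    enum Method { STREAM, NON_STREAM }

    static Method choose(boolean streamRequested, String sort,
                         boolean missing, boolean allBuckets, boolean numBuckets) {
        // The stream method cannot compute any of these three options.
        boolean needsUnsupportedOption = missing || allBuckets || numBuckets;
        if (streamRequested && "index asc".equals(sort) && !needsUnsupportedOption) {
            return Method.STREAM;
        }
        return Method.NON_STREAM;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, "index asc", false, false, false)); // STREAM
        System.out.println(choose(true, "index asc", true, false, false));  // NON_STREAM
    }
}
```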
CopyFields are regenerated in case of replace-field or replace-field-type.
While regenerating, source and destination are checked against explicit fields, but the
source/dest could match a dynamic rule too.
For example,
<copyField source="something_s" dest="spellcheck"/>
<dynamicField name="*_s" type="string"/>
Here, something_s is not present in the schema but matches the dynamic rule.
To handle this case, dynamicFieldCache must also be checked while regenerating the
copyFields.
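
A minimal sketch of the lookup this implies, in plain Java. The class and method names are hypothetical and this is not Solr's schema code; it assumes dynamic rules are simple globs with a single leading or trailing `*`.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: resolve a copyField source/dest against explicit and dynamic fields. */
final class CopyFieldMatchSketch {
    /** Matches a field name against a glob with one leading or trailing '*'. */
    static boolean matchesDynamic(String name, String pattern) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return name.equals(pattern);
    }

    /** A name is usable if it is an explicit field or matches any dynamic rule. */
    static boolean isKnownField(String name, Set<String> explicitFields,
                                List<String> dynamicPatterns) {
        if (explicitFields.contains(name)) {
            return true;
        }
        for (String pattern : dynamicPatterns) {
            if (matchesDynamic(name, pattern)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> explicit = Set.of("spellcheck");
        List<String> dynamic = List.of("*_s");
        // something_s is not an explicit field but matches the *_s rule
        System.out.println(isKnownField("something_s", explicit, dynamic)); // true
    }
}
```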