Commit Graph

33250 Commits

Author SHA1 Message Date
Dawid Weiss cb68d7d2c5 LUCENE-9232: add a script-hack check so that in case somebody upgrades the scripts automatically they'll know they need to add the hack. 2020-02-21 10:40:27 +01:00
Jan Høydahl 89b13377a1
SOLR-14250: Do not log error when trying to consume non-existing input stream due to Expect: 100-continue (#1250) 2020-02-21 10:30:10 +01:00
Dawid Weiss f8a2c39906 LUCENE-9155: add missing naist dictionary generation, clean up the code a bit. 2020-02-21 10:24:05 +01:00
Noble Paul 9f3f7244ac
SOLR-14270 export command to have an option to write to a zip file (#1266) 2020-02-21 13:41:50 +11:00
Robert Muir 9302eee1e0
LUCENE-9235: upgrade all python to python3
Die, python2, die.

Some generated .java files change (parameterized automata for
spell-correction).

This is because the order of python dictionaries was not well-defined
previously. A sort() was added so that the python code now generates
reproducible output (Thanks @mikemccand).

So we'll suffer a change once, but the automata are equivalent. If you
run the script again you should not see source code changes.

The relevant unit tests are exhaustive (if you trust the paper!), so we can
be confident it does not break things, even though it looks very scary.
2020-02-20 21:27:38 -05:00
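The reproducibility point in LUCENE-9235 can be illustrated with a minimal sketch. This is not the project's generator script: `emit_table` and the sample data are hypothetical. It only shows that iterating over sorted keys makes generated text independent of the order in which a dictionary was built.

```python
def emit_table(transitions):
    """Render a dict as generated source lines, in sorted key order."""
    lines = []
    # sorted() fixes the iteration order, so the output does not depend
    # on how (or in what order) the dictionary was populated.
    for key in sorted(transitions):
        lines.append(f"  case {key}: return {transitions[key]};")
    return "\n".join(lines)

# Two dictionaries with the same content but different insertion order.
a = {3: 7, 1: 5, 2: 6}
b = {1: 5, 2: 6, 3: 7}

# Both render to identical text, so regenerating causes no spurious diffs.
print(emit_table(a) == emit_table(b))
```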
Mike Drob 79966132fc SOLR-14264 Set minimum gzip size for responses 2020-02-20 17:14:21 -06:00
Anshum Gupta cea4226367
SOLR-14271: Remove duplicate async id check meant for pre Solr 8 versions (#1268)
* SOLR-14271: Remove duplicate async id check meant for pre Solr 8 versions
2020-02-20 15:13:05 -08:00
Anshum Gupta cb18586ea0
LUCENE-9155: Add Apache License header to the Kuromoji dictionary compilation (#1271) 2020-02-20 14:59:06 -08:00
Jan Høydahl 073e7bd08c
LUCENE-9233 Add top level LICENSE file
This will tag our repo at GitHub as Apache 2.0 licensed
2020-02-20 20:53:57 +01:00
Nhat Nguyen a0b8f5c7c2 LUCENE-9228: Sort dvUpdates by terms before apply
With this change, we sort dvUpdates in the term order before applying if
they all update a single field to the same value. This optimization can
reduce the flush time by around 20% for the docValues update use cases.
2020-02-20 13:18:10 -05:00
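A rough sketch of the condition described in LUCENE-9228, assuming a simplified update record. `Update` and `order_for_apply` are hypothetical names, not Lucene classes, and the real change lives in Lucene's Java code; the sketch only shows when reordering by term is safe.

```python
from collections import namedtuple

# Hypothetical stand-in for a doc-values update; not a Lucene class.
Update = namedtuple("Update", ["field", "term", "value"])

def order_for_apply(updates):
    """Return updates sorted by term when they all set one field to one value."""
    fields = {u.field for u in updates}
    values = {u.value for u in updates}
    if len(fields) == 1 and len(values) == 1:
        # Same field and same value everywhere: the result does not depend on
        # application order, so sorting by term is safe.
        return sorted(updates, key=lambda u: u.term)
    # Otherwise keep arrival order, since later updates may override earlier ones.
    return list(updates)
```

For example, three updates that all set `price` to the same value come back ordered by term, while a batch with differing values is returned unchanged.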
Dawid Weiss 62662e477a LUCENE-9155: Port Kuromoji dictionary compilation (regenerate). 2020-02-20 19:00:56 +01:00
Dawid Weiss 7604639b59 Move jgit version declaration to scriptDepVersions. 2020-02-20 13:54:07 +01:00
Shalin Shekhar Mangar 2fdd3b02bb SOLR-12550: Adding entry to CHANGES.txt 2020-02-20 04:33:53 -08:00
Marc A. Morissette 051133c13f
SOLR-12550: ConcurrentUpdateSolrClient doesn't respect timeouts for commits and optimize (#417)
ConcurrentUpdateSolrClient now propagates its connection and read timeouts to the private HttpSolrClient used to commit and optimize.
2020-02-20 04:29:20 -08:00
iverase 054b3be627 LUCENE-8707: fix test bug. When the bounding box of a triangle
is within a circle, the triangle is within the circle as well.
2020-02-19 18:21:03 +01:00
Ignacio Vera d48bafb299
LUCENE-8707: Add LatLonShape and XYShape distance query (#587) 2020-02-19 16:03:30 +01:00
Mikhail Khludnev 001a35cc06 SOLR-14263: stripping .adoc to fix build. 2020-02-19 13:53:12 +03:00
Jan Høydahl 8389b87e39 LUCENE-9229: Fix some broken links
Change some wiki -> cwiki links

Signed-off-by: Jan Høydahl <janhoy@apache.org>
2020-02-19 10:33:26 +01:00
markharwood 79a4a680e7 Test fix - new binary doc values test could use invalid values. 2020-02-19 09:14:14 +00:00
Robert Muir b9a569e7be
LUCENE-9230: explicitly call python version we want from builds
On newer linux distros, at least, 'python' now means python3. So
we can't rely on what version of python it will invoke (at least for a
few years).

For example in Fedora Linux:

https://fedoraproject.org/wiki/Changes/Python_means_Python3

For python2.x code, explicitly call 'python2.7' and for python3.x code,
explicitly call 'python3'.

Ant variable names are cleaned up, e.g. 'python.exe' is renamed to
'python2.exe' and 'python32.exe' is renamed to 'python3.exe'. This also
makes it easy to identify remaining python 2.x code that should be
migrated to python 3.x
2020-02-18 18:58:17 -05:00
Erick Erickson aa130c4259 SOLR-14263: Update jvm-settings.adoc 2020-02-18 16:44:51 -05:00
Dawid Weiss 491c99a3de LUCENE-9232: tone down daemon defaults in generated local settings. 2020-02-18 19:43:39 +01:00
Dawid Weiss 22232a66dd LUCENE-9232: don't fork daemon on the initial run that writes local settings. 2020-02-18 19:38:11 +01:00
Dawid Weiss 2a88aa9d0f LUCENE-9219: Port ECJ-based linter to gradle
Co-authored-by: Tomoko Uchida <tomoko@apache.org>
2020-02-19 02:43:47 +09:00
Christine Poerschke 003303a9cc SOLR-13041: Add hashCode for autoscaling.Condition to accompany the already present equals.
(Zsolt Gyulavari via Christine Poerschke)
2020-02-18 14:47:44 +00:00
Eric Pugh f23def6b72 SOLR-13965: s/StreamHandler/GraphHandler fix GraphHandler.getDescription()
(Eric Pugh via Christine Poerschke)
2020-02-18 14:32:52 +00:00
Eric Pugh 5d32c04096 SOLR-13965: StreamHandler class-level javadoc edits
(Eric Pugh via Christine Poerschke)
2020-02-18 14:31:44 +00:00
markharwood ce2959fe4c
LUCENE-9211 Add compression for Binary doc value fields (#1234)
Stores groups of 32 binary doc values in LZ4-compressed blocks.
2020-02-18 14:02:42 +00:00
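The block layout mentioned in LUCENE-9211 can be sketched as follows. This is an illustration only, not the Lucene codec: `compress_blocks` and `read_value` are hypothetical helpers, and zlib is swapped in for LZ4 purely because it ships with Python.

```python
import zlib

BLOCK_SIZE = 32  # values per compressed block, as in the commit message

def compress_blocks(values):
    """Group byte strings into blocks of BLOCK_SIZE and compress each block."""
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        group = values[start:start + BLOCK_SIZE]
        # Record each value's length so the block can be split again later.
        lengths = [len(v) for v in group]
        # zlib stands in for LZ4 here only because it ships with Python.
        blocks.append((lengths, zlib.compress(b"".join(group))))
    return blocks

def read_value(blocks, index):
    """Fetch one value by decompressing only the block that contains it."""
    lengths, payload = blocks[index // BLOCK_SIZE]
    data = zlib.decompress(payload)
    offset = sum(lengths[: index % BLOCK_SIZE])
    return data[offset:offset + lengths[index % BLOCK_SIZE]]
```

Grouping values trades a little read-time work (one block must be decompressed per lookup) for better compression than compressing each value on its own.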
Robert Muir ccb390d4a6
LUCENE-9220: prevent zip file reproducibility issues based on users umask 2020-02-17 13:34:00 -05:00
Robert Muir 0203815ab2
LUCENE-9220: regenerate all stemmers/stopwords/test data from snowball 2.0 (#1262)
Previous situation:

* The snowball base classes (Among, SnowballProgram, etc) had accumulated local performance-related changes. There was a task that would also "patch" generated classes (e.g. GermanStemmer) after-the-fact.
* Snowball classes had many "non-changes" from the original such as removal of tabs, addition of javadocs, license headers, etc.
* Snowball test data (inputs and expected stems) was incorporated into lucene testing, but this was maintained manually. Also files had become large, making the test too slow (Nightly).
* Snowball stopwords lists from their website were manually maintained. In some cases encoding fixes were manually applied.
* Some generated stemmers (such as Estonian and Armenian) exist in lucene, but have no corresponding `.sbl` file in snowball sources at all.

Besides this mess, the snowball project is "moving along" and acquiring new languages, adding non-BSD-licensed test data, huge test data, and other complexity. So it is time to automate the integration better.

New situation:

* Lucene has a `gradle snowball` regeneration task. It works on Linux or Mac only. It checks out their repos, applies the `snowball.patch` in our repository, compiles snowball stemmers, regenerates all java code, applies any adjustments so that our build is happy.
* Test data is automatically regenerated from the commit hash of the snowball test data repository. Not all languages are tested from their data: only where the license is simple BSD. Test data is also (deterministically) sampled, so that we don't have huge files. We just want to make sure our integration works.
* Randomized tests are still set to test every language with generated fake words. The regeneration task ensures all languages get tested (it writes a simple text file list of them).
* Stopword files are automatically regenerated from the commit hash of the snowball website repository.
* The regeneration procedure is idempotent. This way when stuff does change, you know exactly what happened. For example if test data changes to a different license, you may see a git deletion. Or if a new language/stopwords/test data gets added, you will see git additions.
2020-02-17 12:38:01 -05:00
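One way to get the deterministic sampling mentioned in LUCENE-9220 is to select lines by a hash of their content. The sketch below is an assumption about how such sampling could work, not the logic of the actual `gradle snowball` task; `sample_lines` and `keep_one_in` are hypothetical names.

```python
import hashlib

def sample_lines(lines, keep_one_in=10):
    """Keep a stable subset of lines, chosen by a hash of each line's content."""
    kept = []
    for line in lines:
        digest = hashlib.sha256(line.encode("utf-8")).digest()
        # The hash depends only on the line itself, so the same input always
        # yields the same sample, with no seed or run-to-run state involved.
        if int.from_bytes(digest[:8], "big") % keep_one_in == 0:
            kept.append(line)
    return kept
```

Because selection depends only on content, rerunning the regeneration on unchanged input produces an identical sample, which fits the idempotent procedure described in the commit.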
Claire Pollard 188f620208
Update README.txt (#1090)
Update the analysis-extras README to include reference to including solr-analysis-extras jar.
2020-02-15 22:57:46 +01:00
Maxim Antonov cbf0eba176 Fixes #124 in bin/solr for 'cdpath' users
Signed-off-by: Jan Høydahl <janhoy@apache.org>
2020-02-15 18:10:14 +01:00
Jan Høydahl c777db7c8a Add HTML output format to gitHubPRs.py
Also better JSON output structure
2020-02-15 02:06:16 +01:00
Chris Hostetter f549ee3535 SOLR-13794: Replace redundant test only copy of '_default' configset with SolrTestCase logic to correctly set 'solr.default.confdir' system property
This change allows us to remove kludgy test only code from ZkController
2020-02-14 11:36:53 -07:00
Ignacio Vera ebec456602
Return CELL_CROSSES_QUERY when point inside the triangle (#1259) 2020-02-14 17:06:33 +01:00
Erick Erickson 2b4fad53e5 Merge branch 'master' of https://gitbox.apache.org/repos/asf/lucene-solr 2020-02-14 10:46:53 -05:00
Erick Erickson f52676cd82 LUCENE-9224: (ant) RAT report complains about ... solr/webapp rat-report.xml (from gradle) 2020-02-14 10:46:44 -05:00
Ignacio Vera 4a54ffb553
LUCENE-9218: XYGeometries should expose values as floats (#1252) 2020-02-14 11:39:10 +01:00
Adrien Grand 5cbe58f22c
Add back assertions removed by LUCENE-9187. (#1236)
This time they would only apply to TestFastLZ4/TestHighLZ4 and avoid slowing
down all tests.
2020-02-14 10:37:06 +01:00
Dawid Weiss dcf448efeb LUCENE-9134: Minor cleanups. 2020-02-13 11:18:01 +01:00
Chris Hostetter f1fc3e7ba2 SOLR-14247: Revert SolrTestCase Logger removal 2020-02-12 11:51:36 -07:00
Chris Hostetter 49e20dbee4 SOLR-14245: Fix ReplicaListTransformerTest
Previous changes to this issue 'fixed' the way the test was creating mock Replica instances,
to ensure all properties were specified -- but these changes tickled a bug in the existing test
scaffolding that caused its "expectations" to be based on a regex check against only the base "url"
even though the test logic itself looked at the entire "core url"

The result is that there were reproducible failures if/when the randomly generated regex matched
".*1.*" because the existing test logic did not expect that to match the url of a Replica with
a core name of "core1" because it only considered the base url
2020-02-12 11:10:26 -07:00
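The mismatch described in SOLR-14245 can be reproduced in miniature. The URLs below are hypothetical, chosen so that only the core name contains a "1"; the sketch is not taken from ReplicaListTransformerTest.

```python
import re

pattern = re.compile(".*1.*")  # the regex quoted in the commit message

# Hypothetical URLs, chosen so that only the core name contains a "1".
base_url = "http://host:8983/solr"
core_url = base_url + "/core1"

# Expectations built from the base url alone disagree with logic that
# checks the full core url whenever the two results below differ.
base_matches = pattern.fullmatch(base_url) is not None
core_matches = pattern.fullmatch(core_url) is not None
print(base_matches, core_matches)
```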
Erick Erickson 0767a9d4d7 Code comment only change 2020-02-11 19:32:54 -05:00
Erick Erickson f9357ab0d2
LUCENE-9134: Port ant-regenerate tasks to Gradle build (util and packed) (#1251)
* LUCENE-9134: Port ant-regenerate tasks to Gradle build
2020-02-11 18:56:11 -05:00
yonik c3e44e1fec SOLR-14058: fix peersync bounds check iterating over versions 2020-02-11 10:43:21 -08:00
David Smiley 9a4f7661e9
SOLR-14194: Highlighters now support docValues for the uniqueKey
and the original highlighter can highlight docValues.
2020-02-11 02:18:08 -05:00
Mike 71b869381e
SOLR-14247 Remove unneeded sleeps (#1244) 2020-02-10 21:13:56 -06:00
Dawid Weiss b21312f411 SOLR-14243: ant clean-jars should not delete gradle-wrapper.jar. 2020-02-10 16:12:19 +01:00
Shalin Shekhar Mangar c65b97665c
SOLR-13996: Refactor HttpShardHandler.prepDistributed method (#1220)
SOLR-13996: Refactor HttpShardHandler.prepDistributed method into smaller pieces

This commit introduces an interface named ReplicaSource which is marked as experimental. It has two sub-classes named CloudReplicaSource (for solr cloud) and LegacyReplicaSource for non-cloud clusters. The prepDistributed method now calls out to these sub-classes depending on whether the cluster is running in cloud mode or not.
2020-02-10 19:57:05 +05:30
Ignacio Vera 87421d7231 LUCENE-9216: Make sure we index LEAST_DOUBLE_VALUE (#1246) 2020-02-10 11:50:08 +01:00