Apache Solr Release Notes
Introduction
------------
Apache Solr is an open source enterprise search server based on the Apache Lucene Java
search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search,
caching, replication, and a web administration interface. It runs in a Java
servlet container such as Tomcat.
See http://lucene.apache.org/solr for more information.
Getting Started
---------------
You need a Java 1.6 VM or later installed.
In this release, there is an example Solr server including a bundled
servlet container in the directory named "example".
See the tutorial at http://lucene.apache.org/solr/tutorial.html
================== 5.0.0 ==================
(No changes)
================== 4.1.0 ==================
Detailed Change List
----------------------
New Features
----------------------
* SOLR-2255: Enhanced pivot faceting to use local-params in the same way that
regular field value faceting can. This means support for excluding a filter
query, using a different output key, and specifying 'threads' to do
facet.method=fcs concurrently. PivotFacetHelper now extends SimpleFacet and
the getFacetImplementation() extension hook was removed. (dsmiley)
* SOLR-3897: A highlighter parameter "hl.preserveMulti" to return all of the
values of a multiValued field in their original order when highlighting.
(Joel Bernstein via yonik)
* SOLR-3929: Support configuring IndexWriter max thread count in solrconfig.
(phunt via Mark Miller)
* SOLR-3906: Add support for AnalyzingSuggester (LUCENE-3842), where the
underlying analyzed form used for suggestions is separate from the returned
text. (Robert Muir)
* SOLR-3985: ExternalFileField caches can be reloaded on firstSearcher/
newSearcher events using the ExternalFileFieldReloader (Alan Woodward)
* SOLR-3911: Make Directory and DirectoryFactory first class so that the majority
of Solr's features work with any custom implementations. (Mark Miller)
* SOLR-1972: Add extra statistics to RequestHandlers - 5 & 15-minute reqs/sec
rolling averages; median, 75th, 95th, 99th, 99.9th percentile request times
(Alan Woodward, Shawn Heisey, Adrien Grand)
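As a rough illustration of the percentile statistics listed under SOLR-1972, the sketch below computes the same five levels from a list of request times. It assumes the simple nearest-rank method, which is not necessarily how Solr computes them. The function names are hypothetical.

```python
def nearest_rank(sorted_samples, per_mille):
    """Return the sample at a percentile given in tenths of a percent (500 = median)."""
    if not sorted_samples:
        raise ValueError("at least one sample is required")
    n = len(sorted_samples)
    # Integer ceiling of per_mille * n / 1000 gives the 1-based nearest rank.
    rank = (per_mille * n + 999) // 1000
    return sorted_samples[max(rank, 1) - 1]


def request_time_summary(times_ms):
    """Summarize request times at the levels named in the SOLR-1972 entry."""
    ordered = sorted(times_ms)
    levels = {"median": 500, "75th": 750, "95th": 950, "99th": 990, "99.9th": 999}
    return {name: nearest_rank(ordered, pm) for name, pm in levels.items()}
```

Expressing the levels in tenths of a percent keeps the rank calculation in integer arithmetic, which avoids floating point surprises at the 99.9th percentile.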
Optimizations
----------------------
* SOLR-3788: Admin Cores UI should redirect to newly created core details
(steffkes)
* SOLR-3895: XML and XSLT UpdateRequestHandler should not try to resolve
external entities. This improves speed of loading e.g. XSL-transformed
XHTML documents. (Martin Herfurt, uschindler, hossman)
* SOLR-3614: Fix XML parsing in XPathEntityProcessor to correctly expand
named entities, but ignore external entities. (uschindler, hossman)
* SOLR-3734: Improve Schema-Browser Handling for CopyField using
dynamicField's (steffkes)
* SOLR-3941: The "commitOnLeader" part of distributed recovery can use
openSearcher=false. (Tomas Fernandez Lobbe via Mark Miller)
Bug Fixes
----------------------
* SOLR-4007: Morfologik dictionaries not available in Solr field type
due to class loader lookup problems. (Lance Norskog, Dawid Weiss)
* SOLR-3560: Handle different types of Exception Messages for Logging UI
(steffkes)
* SOLR-3637: Commit Status at Core-Admin UI is always false (steffkes)
* SOLR-3917: Partial State on Schema-Browser UI is not defined for Dynamic
Fields & Types (steffkes)
* SOLR-3939: Consider a sync attempt from leader to replica that fails due
to 404 a success. (Mark Miller, Joel Bernstein)
* SOLR-3940: Rejoining the leader election incorrectly triggers the code path
for a fresh cluster start rather than fail over. (Mark Miller)
* SOLR-3961: Fixed error using LimitTokenCountFilterFactory
(Jack Krupansky, hossman)
* SOLR-3933: Distributed commits are not guaranteed to be ordered within a
request. (Mark Miller)
* SOLR-3939: An empty or just replicated index cannot become the leader of a
shard after a leader goes down. (Joel Bernstein, yonik, Mark Miller)
* SOLR-3971: A collection that is created with numShards=1 turns into a
numShards=2 collection after starting up a second core and not specifying
numShards. (Mark Miller)
* SOLR-3988: Fixed SolrTestCaseJ4.adoc(SolrInputDocument) to respect
field and document boosts (hossman)
* SOLR-3981: Fixed bug that resulted in document boosts being compounded in
destination fields. (hossman)
* SOLR-3920: Fix server list caching in CloudSolrServer when using more than one
collection list with the same instance. (Grzegorz Sobczyk, Mark Miller)
* SOLR-3938: prepareCommit command omits commitData causing a failure to trigger
replication to slaves. (yonik)
* SOLR-3992: QuerySenderListener doesn't populate document cache.
(Shotaro Kamio, yonik)
* SOLR-3995: Recovery may never finish on SolrCore shutdown if the last reference to
a SolrCore is closed by the recovery process. (Mark Miller)
* SOLR-3998: Atomic update on uniqueKey field itself causes duplicate document.
(Eric Spencer, yonik)
* SOLR-4001: In CachingDirectoryFactory#close, if there are still refs for a
Directory outstanding, we need to wait for them to be released before closing.
(Mark Miller)
* SOLR-4005: If CoreContainer fails to register a created core, it should close it.
(Mark Miller)
* SOLR-4009: OverseerCollectionProcessor is not resilient to many error conditions
and can stop running on errors. (milesli, Mark Miller)
* SOLR-4019: Log stack traces for 503/Service Unavailable SolrException if not
thrown by PingRequestHandler. Do not log exceptions if a user tries to view a
hidden file using ShowFileRequestHandler. (Tomás Fernández Löbbe via James Dyer)
* SOLR-3589: Edismax parser does not honor mm parameter if analyzer splits a token.
(Tom Burton-West, Robert Muir)
* SOLR-4031: Upgrade to Jetty 8.1.7 to fix a bug where, on very rare occasions,
the content of two concurrent requests gets mixed up. (Per Steffensen, yonik)
Other Changes
----------------------
* SOLR-3899: SolrCore should not log at warning level when the index directory
changes - it's an info event. (Tobias Bergman, Mark Miller)
* SOLR-3861: Refactor SolrCoreState so that it's managed by SolrCore.
(Mark Miller, hossman)
* SOLR-3966: Eliminate superfluous warning from LanguageIdentifierUpdateProcessor
(Markus Jelsma via hossman)
* SOLR-3932: SolrCmdDistributorTest either takes 3 seconds or 3 minutes.
(yonik, Mark Miller)
* SOLR-3856: New tests for SqlEntityProcessor/CachedSqlEntityProcessor
(James Dyer)
================== 4.0.0 ==================
Versions of Major Components
---------------------
Apache Tika 1.2
Carrot2 3.5.0
Velocity 1.6.4 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.3.6
Upgrading from Solr 4.0.0-BETA
----------------------
In order to better support distributed search mode, the TermVectorComponent's
response format has been changed so that if the schema defines a
uniqueKeyField, then that field value is used as the "key" for each document in
its response section, instead of the internal Lucene doc id. Users without a
uniqueKeyField will continue to see the same response format. See SOLR-3229
for more details.
If you are using SolrCloud's distributed update request capabilities and a non
string type id field, you must re-index.
Upgrading from Solr 4.0.0-ALPHA
----------------------
Solr is now much more strict about requiring that the uniqueKeyField feature
(if used) must refer to a field which is not multiValued. If you upgrade from
an earlier version of Solr and see an error that your uniqueKeyField "can not
be configured to be multivalued" please add 'multiValued="false"' to the
declaration for your uniqueKeyField. See SOLR-3682 for more details.
In addition, please review the notes above about upgrading from 4.0.0-BETA.
Upgrading from Solr 3.6
----------------------
* The Lucene index format has changed and as a result, once you upgrade,
previous versions of Solr will no longer be able to read your indices.
In a master/slave configuration, all searchers/slaves should be upgraded
before the master. If the master were to be updated first, the older
searchers would not be able to read the new index format.
* Setting abortOnConfigurationError=false is no longer supported
(since it has never worked properly). Solr will now warn you if
you attempt to set this configuration option at all. (see SOLR-1846)
* The default logic for the 'mm' param of the 'dismax' QParser has
been changed. If no 'mm' param is specified (either in the query,
or as a default in solrconfig.xml) then the effective value of the
'q.op' param (either in the query or as a default in solrconfig.xml
or from the 'defaultOperator' option in schema.xml) is used to
influence the behavior. If q.op is effectively "AND" then mm=100%.
If q.op is effectively "OR" then mm=0%. Users who wish to force the
legacy behavior should set a default value for the 'mm' param in
their solrconfig.xml file.
* The VelocityResponseWriter is no longer built into the core. Its JAR and
dependencies now need to be added (via <lib> or solr/home lib inclusion),
and it needs to be registered in solrconfig.xml like this:
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"/>
* The update request parameter to choose Update Request Processor Chain is
renamed from "update.processor" to "update.chain". The old parameter had been
deprecated (but still worked) since Solr 3.2, and is now removed
entirely.
* The <indexDefaults> and <mainIndex> sections of solrconfig.xml are discontinued
and replaced with the <indexConfig> section. There are also better defaults.
When migrating, if you don't know what your old settings mean, simply delete
both <indexDefaults> and <mainIndex> sections. If you have customizations,
put them in the <indexConfig> section, with the same syntax as before.
* Two of the SolrServer subclasses in SolrJ were renamed/replaced.
CommonsHttpSolrServer is now HttpSolrServer, and
StreamingUpdateSolrServer is now ConcurrentUpdateSolrServer.
* The PingRequestHandler no longer looks for a <healthcheck/> option in the
(legacy) <admin> section of solrconfig.xml. Users who wish to take
advantage of this feature should configure a "healthcheckFile" init param
directly on the PingRequestHandler. As part of this change, relative file
paths have been fixed to be resolved against the data dir. See the example
solrconfig.xml and SOLR-1258 for more details.
* Due to low level changes to support SolrCloud, the uniqueKey field can no
longer be populated via <copyField/> or <field default=...> in the
schema.xml. Users wishing to have Solr automatically generate a uniqueKey
value when adding documents should instead use an instance of
solr.UUIDUpdateProcessorFactory in their update processor chain. See
SOLR-2796 for more details.
In addition, please review the notes above about upgrading from 4.0.0-BETA and 4.0.0-ALPHA.
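The new default logic for the dismax 'mm' param described above can be summarized in a few lines. This is only a sketch of the stated rule, not Solr's implementation. The function name is hypothetical, and q_op stands for the effective 'q.op' value after all defaults have been resolved.

```python
def effective_mm(mm=None, q_op="OR"):
    """Return the 'mm' value dismax would effectively use under the new rule."""
    if mm is not None:
        # An explicit mm (from the query or solrconfig.xml defaults) always wins.
        return mm
    # With no mm specified, the effective q.op decides the value.
    return "100%" if q_op.upper() == "AND" else "0%"
```

For example, effective_mm(q_op="AND") returns "100%", while any explicitly configured mm is passed through unchanged, which is why setting a default 'mm' in solrconfig.xml restores a fixed behavior.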
Detailed Change List
----------------------
New Features
----------------------
* SOLR-3670: New CountFieldValuesUpdateProcessorFactory makes it easy to index
the number of values in another field for later use at query time. (hossman)
* SOLR-2768: new "mod(x,y)" function for computing the modulus of two value
sources. (hossman)
* SOLR-3238: Numerous small improvements to the Admin UI (steffkes)
* SOLR-3597: seems like a lot of wasted whitespace at the top of the admin screens
(steffkes)
* SOLR-3304: Added Solr adapters for Lucene 4's new spatial module. With
SpatialRecursivePrefixTreeFieldType ("location_rpt" in example schema), it is
possible to index a variable number of points per document (and sort on them),
index not just points but any Spatial4j supported shape such as polygons, and
to query on these shapes too. Polygons require adding JTS to the classpath.
(David Smiley)
* SOLR-3825: Added optional capability to log what ids are in a response
(Scott Stults via gsingers)
* SOLR-3821: Added 'df' to the UI Query form (steffkes)
* SOLR-3822: Added hover titles to the edismax params on the UI Query form
(steffkes)
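To make the SOLR-3670 entry concrete, the sketch below models only the outcome of counting field values: a document gains a field holding the number of values found in another field. It does not use or reproduce the CountFieldValuesUpdateProcessorFactory API. The function name, field names, and the choice to count a missing field as 0 are assumptions of this sketch.

```python
def add_value_count(doc, source_field, count_field):
    """Return a copy of doc with count_field set to the number of values in source_field."""
    values = doc.get(source_field)
    if values is None:
        count = 0  # assumption of this sketch: a missing field counts as zero
    elif isinstance(values, (list, tuple)):
        count = len(values)
    else:
        count = 1  # a single, non-list value
    result = dict(doc)  # leave the caller's document untouched
    result[count_field] = count
    return result
```

A document such as {"id": "1", "category": ["a", "b", "c"]} would come back with "category_count" set to 3, which can then be used at query time for filtering or sorting.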
Optimizations
----------------------
* SOLR-3715: improve concurrency of the transaction log by removing
synchronization around log record serialization. (yonik)
* SOLR-3807: Currently during recovery we pause for a number of seconds after
waiting for the leader to see a recovering state so that any previous updates
will have finished before our commit on the leader - we don't need this wait
for peersync. (Mark Miller)
* SOLR-3837: When a leader is elected and asks replicas to sync back to him and
that fails, we should ask those nodes to recover asynchronously rather than
synchronously. (Mark Miller)
* SOLR-3709: Cache the url list created from the ClusterState in CloudSolrServer
on each request. (Mark Miller)
Bug Fixes
----------------------
* SOLR-3685: Solr Cloud sometimes skipped peersync attempt and replicated instead due
to tlog flags not being cleared when no updates were buffered during a previous
replication. (Markus Jelsma, Mark Miller, yonik)
* SOLR-3229: Fixed TermVectorComponent to work with distributed search
(Hang Xie, hossman)
* SOLR-3725: Fixed package-local-src-tgz target to not bring in unnecessary jars
and binary contents. (Michael Dodsworth via rmuir)
* SOLR-3649: Fixed bug in JavabinLoader that caused deleteById(List<String> ids)
to not work in SolrJ (siren)
* SOLR-3730: Rollback is not implemented quite right and can cause corner case fails in
SolrCloud tests. (rmuir, Mark Miller)
* SOLR-2981: Fixed StatsComponent to no longer return duplicated information
when requesting multiple stats.facet fields.
(Roman Kliewer via hossman)
* SOLR-3743: Fixed issues with atomic updates and optimistic concurrency in
conjunction with stored copyField targets by making real-time get never
return copyField targets. (yonik)
* SOLR-3746: Proper error reporting if updateLog is configured w/o necessary
"_version_" field in schema.xml (hossman)
* SOLR-3745: Proper error reporting if SolrCloud mode is used w/o
necessary "_version_" field in schema.xml (hossman)
* SOLR-3770: Overseer may lose updates to cluster state (siren)
* SOLR-3721: Fix bug that could theoretically allow multiple recoveries to run
briefly at the same time if the recovery thread join call was interrupted.
(Per Steffensen, Mark Miller)
* SOLR-3782: A leader going down while updates are coming in can cause shard
inconsistency. (Mark Miller)
* SOLR-3611: We do not show ZooKeeper data in the UI for a node that has children.
(Mark Miller)
* SOLR-3789: Fix bug in SnapPuller that caused "internal" compression to fail.
(siren)
* SOLR-3790: ConcurrentModificationException could be thrown when using hl.fl=*.
Fixed in r1231606. (yonik, koji)
* SOLR-3668: DataImport : Specifying Custom Parameters (steffkes)
* SOLR-3793: UnInvertedField faceting cached big terms in the filter
cache that ignored deletions, leading to duplicate documents in search
later when a filter of the same term was specified.
(Günter Hipler, hossman, yonik)
* SOLR-3679: Core Admin UI gives no feedback if "Add Core" fails (steffkes, hossman)
* SOLR-3795: Fixed LukeRequestHandler response to correctly return field name
strings in copyDests and copySources arrays (hossman)
* SOLR-3699: Fixed some Directory leaks when there were errors during SolrCore
or SolrIndexWriter initialization. (hossman)
* SOLR-3518: Include final 'hits' in log information when aggregating a
distributed request (Markus Jelsma via hossman)
* SOLR-3628: SolrInputField and SolrInputDocument are now consistently backed
by Collections passed in to setValue/setField, and defensively copy values
from Collections passed to addValue/addField
(Tom Switzer via hossman)
* SOLR-3595: CurrencyField now generates an appropriate error on schema init
if it is configured as multiValued - this has never been properly supported,
but previously failed silently in odd ways. (hossman)
* SOLR-3823: Fix 'bq' parsing in edismax. Please note that this required
reverting the negative boost support added by SOLR-3278 (hossman)
* SOLR-3827: Fix shareSchema=true in solr.xml
(Tomás Fernández Löbbe via hossman)
* SOLR-3809: Fixed config file replication when subdirectories are used
(Emmanuel Espina via hossman)
* SOLR-3828: Fixed QueryElevationComponent so that using 'markExcludes' does
not modify the result set or ranking of 'excluded' documents relative to
not using elevation at all. (Alexey Serba via hossman)
* SOLR-3569: Fixed debug output on distributed requests when there are no
results found. (David Bowen via hossman)
* SOLR-3811: Query Form using wrong values for dismax, edismax (steffkes)
* SOLR-3779: DataImportHandler's LineEntityProcessor when used in conjunction
with FileListEntityProcessor would only process the first file.
(Ahmet Arslan via James Dyer)
* SOLR-3791: CachedSqlEntityProcessor would throw a NullPointerException when
a query returns a row with a NULL key. (Steffen Moelter via James Dyer)
* SOLR-3833: When an election is started because a leader went down, the new
leader candidate should decline if the last state they published was not
active. (yonik, Mark Miller)
* SOLR-3836: When doing peer sync, we should only count sync attempts that
cannot reach the given host as success when the candidate leader is
syncing with the replicas - not when replicas are syncing to the leader.
(Mark Miller)
* SOLR-3835: In our leader election algorithm, if on connection loss we found
we did not create our election node, we should retry, not throw an exception.
(Mark Miller)
* SOLR-3834: A new leader on cluster startup should also run the leader sync
process in case there was a bad cluster shutdown. (Mark Miller)
* SOLR-3772: On cluster startup, we should wait until we see all registered
replicas before running the leader process - or if they all do not come up,
N amount of time. (Mark Miller)
* SOLR-3756: If we are elected the leader of a shard, but we fail to publish
this for any reason, we should clean up and re-trigger a leader election.
(Mark Miller)
* SOLR-3812: ConnectionLoss during recovery can cause lost updates, leading to
shard inconsistency. (Mark Miller)
* SOLR-3813: When a new leader syncs, we need to ask all shards to sync back,
not just those that are active. (Mark Miller)
* SOLR-3641: CoreContainer is not persisting roles core attribute.
(hossman, Mark Miller)
* SOLR-3527: SolrCmdDistributor drops some of the important commit attributes
(maxOptimizeSegments, softCommit, expungeDeletes) when sending a commit to
replicas. (Andy Laird, Tomas Fernandez Lobbe, Mark Miller)
* SOLR-3844: SolrCore reload can fail because it tries to remove the index
write lock while already holding it. (Mark Miller)
* SOLR-3831: Atomic updates do not distribute correctly to other nodes.
(Jim Musil, Mark Miller)
* SOLR-3465: Replication causes two searcher warmups.
(Michael Garski, Mark Miller)
* SOLR-3645: /terms should default to distrib=false. (Nick Cotton, Mark Miller)
* SOLR-3759: Various fixes to the example-DIH configs (Ahmet Arslan, hossman)
* SOLR-3777: Dataimport-UI does not send unchecked checkboxes (Glenn MacStravic
via steffkes)
* SOLR-3850: DataImportHandler "cacheKey" parameter was incorrectly renamed "cachePk"
(James Dyer)
* SOLR-3087: Fixed DOMUtil so that code doing attribute validation will
automatically ignore nodes in the reserved "xml" prefix - in particular this
fixes some bugs related to xinclude and fieldTypes.
(Amit Nithian, hossman)
* SOLR-3783: Fixed Pivot Faceting to work with facet.missing=true (hossman)
* SOLR-3869: A PeerSync attempt to its replicas by a candidate leader should
not fail on o.a.http.conn.ConnectTimeoutException. (Mark Miller)
* SOLR-3875: Fixed index boosts on multi-valued fields when docBoost is used
(hossman)
* SOLR-3878: Exception when using open-ended range query with CurrencyField (janhoy)
* SOLR-3891: CacheValue in CachingDirectoryFactory cannot be used outside of
solr.core package. (phunt via Mark Miller)
* SOLR-3892: Inconsistent locking when accessing cache in CachingDirectoryFactory
from RAMDirectoryFactory and MockDirectoryFactory. (phunt via Mark Miller)
* SOLR-3883: Distributed indexing forwards non-applicable request params.
(Dan Sutton, Per Steffensen, yonik, Mark Miller)
* SOLR-3903: Fixed MissingFormatArgumentException in ConcurrentUpdateSolrServer
(hossman)
* SOLR-3916: Fixed whitespace bug in parsing the fl param (hossman)
Other Changes
----------------------
* SOLR-3690: Fixed binary release packages to include dependencies needed for
the solr-test-framework (hossman)
* SOLR-2857: The /update/json and /update/csv URLs were restored to aid
in the migration of existing clients. (yonik)
* SOLR-3691: SimplePostTool: Mode for crawling/posting web pages
See http://wiki.apache.org/solr/ExtractingRequestHandler for examples (janhoy)
* SOLR-3707: Upgrade Solr to Tika 1.2 (janhoy)
* SOLR-2747: Updated changes2html.pl to handle Solr's CHANGES.txt; added
target 'changes-to-html' to solr/build.xml.
(Steve Rowe, Robert Muir)
* SOLR-3752: When a leader goes down, have the Overseer clear the leader state
in cluster.json (Mark Miller)
* SOLR-3751: Add defensive checks for SolrCloud updates and requests that ensure
the local state matches what we can tell the request expected. (Mark Miller)
* SOLR-3773: Hash based on the external String id rather than the indexed
representation for distributed updates. (Michael Garski, yonik, Mark Miller)
* SOLR-3780: Maven build: Make solrj tests run separately from solr-core.
(Steve Rowe)
* SOLR-3772: Optionally, on cluster startup, we can wait until we see all registered
replicas before running the leader process - or if they all do not come up,
N amount of time. (Jan Høydahl, Per Steffensen, Mark Miller)
* SOLR-3750: Optionally, on session expiration, we can explicitly wait some time before
running the leader sync process so that we are sure every node participates.
(Per Steffensen, Mark Miller)
* SOLR-3824: Velocity: Error messages from search not displayed (janhoy)
* SOLR-3826: Test framework improvements for specifying coreName on initCore
(Amit Nithian, hossman)
* SOLR-3749: Allow default UpdateLog syncLevel to be configured by
solrconfig.xml (Raintung Li, Mark Miller)
* SOLR-3845: Rename numReplicas to replicationFactor in Collections API.
(yonik, Mark Miller)
* SOLR-3815: SolrCloud - Add properties such as "range" to shards, which changes
the clusterstate.json and puts the shard replicas under "replicas". (yonik)
* SOLR-3871: SyncStrategy should use an executor for the threads it creates to
request recoveries. (Mark Miller)
* SOLR-3870: SyncStrategy should have a close so it can abort earlier on
shutdown. (Mark Miller)
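SOLR-3773 and SOLR-3815 together describe routing by a hash of the external String id, with each shard owning a "range" of hash values. The sketch below illustrates that idea only. It uses CRC-32 as a stand-in hash and unsigned, evenly split ranges, both of which are assumptions and not the hash function or range layout Solr uses. The function names are hypothetical.

```python
import zlib


def shard_ranges(num_shards):
    """Split the unsigned 32-bit hash space into contiguous inclusive ranges."""
    size = 2 ** 32
    bounds = [size * i // num_shards for i in range(num_shards + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_shards)]


def route(doc_id, ranges):
    """Pick the shard index whose range contains the hash of the external id."""
    # Hash the external string form of the id, so the result does not
    # depend on how the id field type encodes values in the index.
    h = zlib.crc32(str(doc_id).encode("utf-8")) & 0xFFFFFFFF
    for index, (low, high) in enumerate(ranges):
        if low <= h <= high:
            return index
    raise ValueError("ranges do not cover the hash space")
```

Because the hash is taken over the external string, route(42, ranges) and route("42", ranges) land on the same shard regardless of the field type behind the id.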
================== 4.0.0-BETA ===================
Versions of Major Components
---------------------
Apache Tika 1.1
Carrot2 3.5.0
Velocity 1.6.4 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.3.6
Upgrading from Solr 4.0.0-ALPHA
----------------------
Solr is now much more strict about requiring that the uniqueKeyField feature
(if used) must refer to a field which is not multiValued. If you upgrade from
an earlier version of Solr and see an error that your uniqueKeyField "can not
be configured to be multivalued" please add 'multiValued="false"' to the
declaration for your uniqueKeyField. See SOLR-3682 for more details.
Detailed Change List
----------------------
New Features
----------------------
* LUCENE-4201: Added JapaneseIterationMarkCharFilterFactory to normalize Japanese
iteration marks. (Robert Muir, Christian Moen)
* SOLR-1856: In Solr Cell, literals should override Tika-parsed values.
Patch adds a param "literalsOverride" which defaults to true, but can be set
to "false" to let Tika-parsed values be appended to literal values (Chris Harris, janhoy)
* SOLR-3488: Added a Collection management API for SolrCloud.
(Tommaso Teofili, Sami Siren, yonik, Mark Miller)
* SOLR-3559: Full deleteByQuery support with SolrCloud distributed indexing.
All replicas of a shard will be consistent, even if updates arrive in a
different order on different replicas. (yonik)
* SOLR-1929: Index encrypted documents with ExtractingUpdateRequestHandler.
By supplying resource.password= or specifying an external file with regular
expressions matching file names, Solr will decrypt and index PDFs and DOCX formats.
(janhoy, Yiannis Pericleous)
* SOLR-3562: Add options to remove instance dir or data dir on core unload.
(Mark Miller, Per Steffensen)
* SOLR-2702: The default directory factory was changed to NRTCachingDirectoryFactory
which wraps the StandardDirectoryFactory and caches small files for improved
Near Real-time (NRT) performance. (Mark Miller, yonik)
* SOLR-2616: Include a sample java util logging configuration file.
(David Smiley, Mark Miller)
* SOLR-3460: Add cloud-scripts directory and a zkcli.sh|bat tool for easy scripting
and interaction with ZooKeeper. (Mark Miller)
* SOLR-1725: StatelessScriptUpdateProcessorFactory allows users to implement
the full ScriptUpdateProcessor API using any scripting language with a
javax.script.ScriptEngineFactory
(Uri Boness, ehatcher, Simon Rosenthal, hossman)
* SOLR-139: Change to updateable documents to create the document if it doesn't
already exist. To assert that the document must exist, use the optimistic
concurrency feature by specifying a _version_ of 1. (yonik)
* LUCENE-2510, LUCENE-4044: Migrated Solr's Tokenizer-, TokenFilter-, and
CharFilterFactories to the lucene-analysis module. To add new analysis
modules to Solr (like ICU, SmartChinese, Morfologik,...), just drop in
the JAR files from Lucene's binary distribution into your Solr instance's
lib folder. The factories are automatically made available with SPI.
(Chris Male, Robert Muir, Uwe Schindler)
* SOLR-3634, SOLR-3635: CoreContainer and CoreAdminHandler will now remember
and report back information about failures to initialize SolrCores. These
failures will be accessible from the web UI and CoreAdminHandler STATUS
command until they are "reset" by creating/renaming a SolrCore with the
same name. (hossman, steffkes)
* SOLR-1280: Added commented-out example of the new script update processor
to the example configuration. See http://wiki.apache.org/solr/ScriptUpdateProcessor (ehatcher)
* SOLR-3672: SimplePostTool: Improvements for posting files
Support for auto mode, recursive and wildcards (janhoy)
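The "literalsOverride" behavior from SOLR-1856 can be pictured as a merge of two field maps. The following is a minimal sketch of the rule as stated in that entry, not Solr Cell's code. The function name and the dictionary representation of literal and Tika-parsed fields are assumptions.

```python
def merge_literals(literals, parsed, literals_override=True):
    """Combine literal field values with Tika-parsed values for one document."""
    merged = {field: list(values) for field, values in parsed.items()}
    for field, values in literals.items():
        if literals_override or field not in merged:
            # Default: the literal replaces whatever Tika extracted.
            merged[field] = list(values)
        else:
            # literalsOverride=false: Tika values are appended to the literals.
            merged[field] = list(values) + merged[field]
    return merged
```

With a literal title and a Tika-extracted title, the default keeps only the literal, while literals_override=False keeps both, literal first.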
Optimizations
----------------------
* SOLR-3708: Add hashCode to ClusterState so that structures built based on the
ClusterState can be easily cached. (Mark Miller)
* SOLR-3709: Cache the url list created from the ClusterState in CloudSolrServer on each
request. (Mark Miller, yonik)
* SOLR-3710: Change CloudSolrServer so that update requests are only sent to leaders by
default. (Mark Miller)
Bug Fixes
----------------------
* SOLR-3582: Our ZooKeeper watchers respond to session events as if they are change events,
creating undesirable side effects. (Trym R. Møller, Mark Miller)
* SOLR-3467: ExtendedDismax escaping is missing several reserved characters
(Michael Dodsworth via janhoy)
* SOLR-3587: After reloading a SolrCore, the original Analyzer is still used rather than a new
one. (Alexey Serba, yonik, rmuir, Mark Miller)
* LUCENE-4185: Fix a bug where CharFilters were wrongly being applied twice. (Michael Froh, rmuir)
* SOLR-3610: After reloading a core, indexing would fail on any newly added fields to the schema. (Brent Mills, rmuir)
* SOLR-3377: edismax fails to correctly parse a fielded query wrapped by parens.
This regression was introduced in 3.6. (Bernd Fehling, Jan Høydahl, yonik)
* SOLR-3621: Fix rare concurrency issue when opening a new IndexWriter for replication or rollback.
(Mark Miller)
* SOLR-1781: Replication index directories not always cleaned up.
(Markus Jelsma, Terje Sten Bjerkseth, Mark Miller)
* SOLR-3639: Update ZooKeeper to 3.3.6 for a variety of bug fixes. (Mark Miller)
* SOLR-3629: Typo in solr.xml persistence when overriding the solrconfig.xml
file name using the "config" attribute prevented the override file from being
used. (Ryan Zezeski, hossman)
* SOLR-3642: Correct broken check for multivalued fields in stats.facet
(Yandong Yao, hossman)
* SOLR-3660: Velocity: Link to admin page broken (janhoy)
* SOLR-3658: Adding thousands of docs with one UpdateProcessorChain instance can briefly create
spikes of threads in the thousands. (yonik, Mark Miller)
* SOLR-3656: A core reload now always uses the same dataDir. (Mark Miller, yonik)
* SOLR-3662: Core reload bugs: a reload always obtained a non-NRT searcher, which
could go back in time with respect to the previous core's NRT searcher. Versioning
did not work correctly across a core reload, and update handler synchronization
was changed to synchronize on core state since more than one update handler
can coexist for a single index during a reload. (yonik)
* SOLR-3663: There are a couple of bugs in the sync process when a leader goes down and a
new leader is elected. (Mark Miller)
* SOLR-3623: Fixed inconsistent treatment of third-party dependencies for
solr contribs analysis-extras & uima (hossman)
* SOLR-3652: Fixed range faceting to error instead of looping infinitely
when 'gap' is zero -- or effectively zero due to floating point arithmetic
underflow. (hossman)
* SOLR-3648: Fixed VelocityResponseWriter template loading in SolrCloud mode.
For the example configuration, this means /browse now works with SolrCloud.
(janhoy, ehatcher)
* SOLR-3677: Fixed misleading error message in web ui to distinguish between
no SolrCores loaded vs. no /admin/ handler available.
(hossman, steffkes)
* SOLR-3428: SolrCmdDistributor flushAdds/flushDeletes can cause repeated
adds/deletes to be sent (Mark Miller, Per Steffensen)
* SOLR-3647: DistributedQueue should use our Solr zk client rather than the std zk
client. ZooKeeper expiration can be permanent otherwise. (Mark Miller)
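The failure mode behind SOLR-3652 is easy to reproduce in miniature: a range loop that adds 'gap' to the lower bound never terminates if the addition does not move the bound. The sketch below shows one way to guard against that. It is an illustration under that assumption, not the code used by Solr's range faceting, and the function name is hypothetical.

```python
def range_buckets(start, end, gap):
    """Return (low, high) buckets from start up to end, stepping by gap."""
    buckets = []
    low = start
    while low < end:
        high = low + gap
        if high <= low:
            # Covers gap == 0 and gaps lost to floating point underflow,
            # either of which would otherwise loop forever.
            raise ValueError("gap does not advance the range")
        buckets.append((low, high))
        low = high
    return buckets
```

Checking the computed upper bound, rather than testing gap == 0 directly, also catches a gap such as 0.5 applied at 1e16, where the sum rounds back to the starting value.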
Other Changes
----------------------
* SOLR-3524: Make discarding punctuation configurable in JapaneseTokenizerFactory.
The default is to discard punctuation, but this is overridable as an expert option.
(Kazuaki Hiraga, Jun Ohtani via Christian Moen)
* SOLR-1770: Move the default core instance directory into a collection1 folder.
(Mark Miller)
* SOLR-3355: Add shard and collection to SolrCore statistics. (Michael Garski, Mark Miller)
* SOLR-3575: solr.xml should default to persist=true (Mark Miller)
* SOLR-3563: Unloading all cores in a SolrCloud collection will now cause the removal of
that collection's meta data from ZooKeeper. (Mark Miller, Per Steffensen)
* SOLR-3599: Add zkClientTimeout to solr.xml so that it's obvious how to change it and so
that you can change it with a system property. (Mark Miller)
* SOLR-3609: Change Solr's expanded webapp directory to be at a consistent path called
solr-webapp rather than a temporary directory. (Mark Miller)
* SOLR-3600: Raise the default zkClientTimeout from 10 seconds to 15 seconds. (Mark Miller)
* SOLR-3215: Clone SolrInputDocument when distrib indexing so that update processors after
the distrib update process do not process the document twice. (Mark Miller)
* SOLR-3683: Improved error handling if an <analyzer> contains both an
explicit class attribute, as well as nested factories. (hossman)
* SOLR-3682: Fail to parse schema.xml if uniqueKeyField is multivalued (hossman)
* SOLR-2115: DIH no longer requires the "config" parameter to be specified in solrconfig.xml.
Instead, the configuration is loaded and parsed with every import. This allows the use of
a different configuration with each import, and makes correcting configuration errors simpler.
Also, the configuration itself can be passed using the "dataConfig" parameter rather than
using a file (this previously worked in debug mode only). When configuration errors are
encountered, the error message is returned in XML format. (James Dyer)
* SOLR-3439: Make SolrCell easier to use out of the box. Also improves "/browse" to display
rich-text documents correctly, along with facets for author and content_type.
With the new "content" field, highlighting of body is supported. See also SOLR-3672 for
easier posting of a whole directory structure. (Jack Krupansky, janhoy)
* SOLR-3579: SolrCloud view should default to the graph view rather than tree view.
(steffkes, Mark Miller)
================== 4.0.0-ALPHA ==================
More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
https://wiki.apache.org/solr/Solr4.0
Versions of Major Components
---------------------
Apache Tika 1.1
Carrot2 3.5.0
Velocity 1.6.4 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.3.4
Upgrading from Solr 3.6-dev
----------------------
* The Lucene index format has changed and as a result, once you upgrade,
previous versions of Solr will no longer be able to read your indices.
In a master/slave configuration, all searchers/slaves should be upgraded
before the master. If the master were to be updated first, the older
searchers would not be able to read the new index format.
* Setting abortOnConfigurationError=false is no longer supported
(since it has never worked properly). Solr will now warn you if
you attempt to set this configuration option at all. (see SOLR-1846)
* The default logic for the 'mm' param of the 'dismax' QParser has
been changed. If no 'mm' param is specified (either in the query,
or as a default in solrconfig.xml) then the effective value of the
'q.op' param (either in the query or as a default in solrconfig.xml
or from the 'defaultOperator' option in schema.xml) is used to
influence the behavior. If q.op is effectively "AND" then mm=100%.
If q.op is effectively "OR" then mm=0%. Users who wish to force the
legacy behavior should set a default value for the 'mm' param in
their solrconfig.xml file.
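The rule in the entry above can be sketched as a small function. This is
only an illustration of the described behavior, not Solr's code; the
function name effective_mm is hypothetical.

```python
def effective_mm(mm=None, q_op="OR"):
    """Return the 'mm' value dismax would use under the rule described above."""
    if mm is not None:
        return mm  # an explicit 'mm' (request or solrconfig.xml default) wins
    # No 'mm' given: derive it from the effective q.op.
    return "100%" if q_op.upper() == "AND" else "0%"


print(effective_mm(q_op="AND"))
print(effective_mm(q_op="OR"))
print(effective_mm(mm="2<-25%", q_op="AND"))
```

Setting a default 'mm' in solrconfig.xml corresponds to always passing the
mm argument here, which is why it restores the legacy behavior.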
* The VelocityResponseWriter is no longer built into the core. Its JAR and
dependencies now need to be added (via <lib> or solr/home lib inclusion),
and it needs to be registered in solrconfig.xml like this:
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"/>
* The update request parameter used to choose the Update Request Processor
Chain is renamed from "update.processor" to "update.chain". The old parameter
has been deprecated (but still functional) since Solr 3.2 and is now removed
entirely.
* The <indexDefaults> and <mainIndex> sections of solrconfig.xml are discontinued
and replaced with the <indexConfig> section. There are also better defaults.
When migrating, if you don't know what your old settings mean, simply delete
both the <indexDefaults> and <mainIndex> sections. If you have customizations,
put them in the <indexConfig> section, with the same syntax as before.
* Two of the SolrServer subclasses in SolrJ were renamed/replaced.
CommonsHttpSolrServer is now HttpSolrServer, and
StreamingUpdateSolrServer is now ConcurrentUpdateSolrServer.
* The PingRequestHandler no longer looks for a <healthcheck/> option in the
(legacy) <admin> section of solrconfig.xml. Users who wish to take
advantage of this feature should configure a "healthcheckFile" init param
directly on the PingRequestHandler. As part of this change, relative file
paths have been fixed to be resolved against the data dir. See the example
solrconfig.xml and SOLR-1258 for more details.
* Due to low level changes to support SolrCloud, the uniqueKey field can no
longer be populated via <copyField/> or <field default=...> in the
schema.xml. Users wishing to have Solr automatically generate a uniqueKey
value when adding documents should instead use an instance of
solr.UUIDUpdateProcessorFactory in their update processor chain. See
SOLR-2796 for more details.
Detailed Change List
----------------------
New Features
----------------------
* SOLR-3272: Solr filter factory for MorfologikFilter (Polish lemmatisation).
(Rafał Kuć via Dawid Weiss, Steven Rowe, Uwe Schindler).
* SOLR-571: The autowarmCount for LRUCaches (LRUCache and FastLRUCache) now
supports "percentages" which get evaluated relative to the current size of
the cache when warming happens.
(Tomas Fernandez Lobbe and hossman)
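A rough sketch of how a percentage autowarmCount could be resolved at
warming time. The helper name and the rounding (truncation) are assumptions
made for illustration; the entry above does not state how Solr rounds.

```python
def resolve_autowarm_count(setting, current_size):
    """Turn an autowarmCount setting into a number of cache entries to warm.

    Assumption: percentages are truncated toward zero, and an absolute
    count is capped at the number of entries currently in the cache.
    """
    text = str(setting).strip()
    if text.endswith("%"):
        percent = int(text[:-1])
        return current_size * percent // 100
    return min(int(text), current_size)


print(resolve_autowarm_count("50%", 200))
print(resolve_autowarm_count("128", 50))
```

The point of the percentage form is that the warmed amount tracks the cache
as it grows or shrinks, instead of being fixed in the configuration.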
* SOLR-1932: New relevancy function queries: termfreq, tf, docfreq, idf,
norm, maxdoc, numdocs. (yonik)
* SOLR-1665: Add debug component options for timings, results and query info only (gsingers, hossman, yonik)
* SOLR-2112: Solrj API now supports streaming results. (ryan)
* SOLR-792: Adding PivotFacetComponent for Hierarchical faceting
(ehatcher, Jeremy Hinegardner, Thibaut Lassalle, ryan)
* LUCENE-2507, SOLR-2571, SOLR-2576: Added DirectSolrSpellChecker, which uses Lucene's
DirectSpellChecker to retrieve correction candidates directly from the term dictionary using
Levenshtein automata. (James Dyer, rmuir)
* SOLR-1873, SOLR-2358: SolrCloud - added shared/central config and core/shard management via zookeeper,
built-in load balancing, and distributed indexing.
(Jamie Johnson, Sami Siren, Ted Dunning, yonik, Mark Miller)
Additional Work:
- SOLR-2324: SolrCloud solr.xml parameters are not persisted by CoreContainer.
(Massimo Schiavon, Mark Miller)
- SOLR-2287: Allow users to query by multiple, compatible collections with SolrCloud.
(Soheb Mahmood, Alex Cowell, Mark Miller)
- SOLR-2622: ShowFileRequestHandler does not work in SolrCloud mode.
(Stefan Matheis, Mark Miller)
- SOLR-3108: Error in SolrCloud's replica lookup code when replicas are hosted in the same Solr instance.
(Bruno Dumon, Sami Siren, Mark Miller)
- SOLR-3080: Remove shard info from zookeeper when SolrCore is explicitly unloaded.
(yonik, Mark Miller, siren)
- SOLR-3437: Recovery issues a spurious commit to the cluster. (Trym R. Møller via Mark Miller)
- SOLR-2822: Skip update processors already run on other nodes (hossman)
* SOLR-1566: Transforming documents in the ResponseWriters. This will allow
for more complex results in responses and open the door for function queries
as results.
(ryan with patches from grant, noble, cmale, yonik, Jan Høydahl,
Arul Kalaipandian, Luca Cavanna, hossman)
- SOLR-2037: Thanks to SOLR-1566, documents boosted by the QueryElevationComponent
can be marked as boosted. (gsingers, ryan, yonik)
* SOLR-2396: Add CollationField, which is much more efficient than
the Solr 3.x CollationKeyFilterFactory, and also supports
Locale-sensitive range queries. (rmuir)
* SOLR-2338: Add support for using <similarity/> in a schema's fieldType,
for customizing scoring on a per-field basis. (hossman, yonik, rmuir)
* SOLR-2335: New 'field("...")' function syntax for referring to complex
field names (containing whitespace or special characters) in functions.
* SOLR-2383: /browse improvements: generalize range and date facet display
(Jan Høydahl via yonik)
* SOLR-2272: Pseudo-join queries / filters. Examples:
- To restrict to the set of parents with at least one blue-eyed child:
fq={!join from=parent to=name}eyes:blue
- To restrict to the set of children with at least one blue-eyed parent:
fq={!join from=name to=parent}eyes:blue
(yonik)
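For clients that build request URLs by hand, the join filters above need
URL encoding because of the braces, spaces and '='. A minimal sketch using
only the Python standard library; the helper name join_filter and the
q=*:* main query are illustrative assumptions.

```python
from urllib.parse import urlencode, parse_qs


def join_filter(from_field, to_field, inner_query):
    """Build a {!join} filter string in the syntax shown in the entry above."""
    return "{!join from=%s to=%s}%s" % (from_field, to_field, inner_query)


# Parents with at least one blue-eyed child (first example above).
params = [("q", "*:*"), ("fq", join_filter("parent", "name", "eyes:blue"))]
query_string = urlencode(params)
print(query_string)
```

Decoding query_string with parse_qs gives back the original fq value, which
is a quick way to check that nothing was lost in the encoding.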
* SOLR-1942: Added the ability to select postings format per fieldType in schema.xml
as well as support custom Codecs in solrconfig.xml.
(simonw via rmuir)
* SOLR-2136: Boolean type added to function queries, along with
new functions exists(), if(), and(), or(), xor(), not(), def(),
and true and false constants. (yonik)
* SOLR-2491: Add support for using spellcheck collation in conjunction
with grouping. Note that the number of hits returned for collations
is the number of ungrouped hits. (James Dyer via rmuir)
* SOLR-1298: Return FunctionQuery as pseudo field. The Solr 'fl' param
now supports functions. For example: fl=id,sum(x,y) -- NOTE: only
functions with fast random access are recommended. (yonik, ryan)
* SOLR-705: Optionally return shard info with each document in distributed
search. Use fl=id,[shard] to return the shard url. (ryan)
* SOLR-2417: Add explain info directly to return documents using
?fl=id,[explain] (ryan)
* SOLR-2533: Converted ValueSource.ValueSourceSortField over to new rewriteable Lucene
SortFields. ValueSourceSortField instances must be rewritten before they can be used.
This is done by SolrIndexSearcher when necessary. (Chris Male).
* SOLR-2193, SOLR-2565: You may now specify a 'soft' commit when committing. This will
use Lucene's NRT feature to avoid guaranteeing documents are on stable storage in exchange
for faster reopen times. There is also a new 'soft' autocommit tracker that can be
configured. (Mark Miller, Robert Muir)
* SOLR-2399: Updated Solr Admin interface. New look and feel with per core administration
and many new options. (Stefan Matheis via ryan)
* SOLR-1032: CSV handler now supports "literal.field_name=value" parameters.
(Simon Rosenthal, ehatcher)
* SOLR-2656: realtime-get, efficiently retrieves the latest stored fields for specified
documents, even if they are not yet searchable (i.e. without reopening a searcher)
(yonik)
* SOLR-2703: Added support for Lucene's "surround" query parser. (Simon Rosenthal, ehatcher)
* SOLR-2754: Added factories for several ranking algorithms:
- BM25SimilarityFactory: Okapi BM25
- DFRSimilarityFactory: Divergence from Randomness models
- IBSimilarityFactory: Information-based models
- LMDirichletSimilarity: LM with Dirichlet smoothing
- LMJelinekMercerSimilarity: LM with Jelinek-Mercer smoothing
(David Mark Nemeskey, Robert Muir)
* SOLR-2134: Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types
(Ryan McKinley, Mike McCandless, Uwe Schindler, Erick Erickson)
* SOLR-2438: Added MultiTermAwareComponent to the various classes to allow automatic lowercasing
for multiterm queries (wildcards, regex, prefix, range, etc). You can now optionally specify a
"multiterm" analyzer in your schema.xml, but Solr should "do the right thing" if you don't
specify one. (Pete Sturge, Erick Erickson, mentoring from Seeley and Muir)
* SOLR-2481: Add support for commitWithin in DataImportHandler (Sami Siren via yonik)
* SOLR-2992: Add support for IndexWriter.prepareCommit() via prepareCommit=true
on update URLs. (yonik)
* SOLR-2906: Added LFU cache options to Solr. (Shawn Heisey via Erick Erickson)
* SOLR-3069: Ability to add openSearcher=false to not open a searcher when doing
a hard commit. commitWithin now only invokes a softCommit. (yonik)
* SOLR-2802: New FieldMutatingUpdateProcessor and Factory to simplify the
development of UpdateProcessors that modify field values of documents as
they are indexed. Also includes several useful new implementations:
- RemoveBlankFieldUpdateProcessorFactory
- TrimFieldUpdateProcessorFactory
- HTMLStripFieldUpdateProcessorFactory
- RegexReplaceProcessorFactory
- FieldLengthUpdateProcessorFactory
- ConcatFieldUpdateProcessorFactory
- FirstFieldValueUpdateProcessorFactory
- LastFieldValueUpdateProcessorFactory
- MinFieldValueUpdateProcessorFactory
- MaxFieldValueUpdateProcessorFactory
- TruncateFieldUpdateProcessorFactory
- IgnoreFieldUpdateProcessorFactory
(hossman, janhoy)
* SOLR-3120: Optional post filtering for spatial queries bbox and geofilt
for LatLonType. (yonik)
* SOLR-2459: Expose LogLevel selection with a RequestHandler rather than a servlet
(Stefan Matheis, Upayavira, ryan)
* SOLR-3134: Include shard info in distributed response when shards.info=true
(Russell Black, ryan)
* SOLR-2898: Support grouped faceting. (Martijn van Groningen)
Additional Work:
- SOLR-3406: Extended grouped faceting support to facet.query and facet.range parameters.
(David Boychuck, Martijn van Groningen)
* SOLR-2949: QueryElevationComponent is now supported with distributed search.
(Mark Miller, yonik)
* SOLR-3221: Added the ability to directly configure aspects of the concurrency
and thread-pooling used within distributed search in Solr. This allows for
finer-grained control and can be tuned by end users to target their own specific
requirements. This builds on the work of the HttpCommComponent and uses the same configuration
block to configure the thread pool. The default configuration has
the same behaviour as Solr 3.5, favouring throughput over latency. More
information can be found on the wiki (http://wiki.apache.org/solr/SolrConfigXml) (Greg Bowyer)
* SOLR-3278: Negative boost support to the Extended Dismax Query Parser Boost Query (bq).
(James Dyer)
* SOLR-3255: OpenExchangeRates.Org Exchange Rate Provider for CurrencyField (janhoy)
* SOLR-3358: Logging events are captured and available from the /admin/logging
request handler. (ryan)
* SOLR-1535: PreAnalyzedField type provides a functionality to index (and optionally store)
field content that was already processed and split into tokens using some external processing
chain. Serialization format is pluggable, and defaults to JSON. (ab)
* SOLR-3363: Consolidated Exceptions in Analysis Factories so they only throw
InitializationExceptions (Chris Male)
* SOLR-2690: New support for a "TZ" request param which overrides the TimeZone
used when rounding Dates in DateMath expressions for the entire request
(all date range queries and date faceting are affected). The default TZ
is still UTC. (David Schlotfeldt, hossman)
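To see why the time zone matters for rounding, the sketch below rounds the
same instant down to the start of the day in UTC and in a zone seven hours
behind UTC. It models the idea only and is not Solr's DateMath code; the
fixed offset stands in for a named zone and the function name is made up.

```python
from datetime import datetime, timedelta, timezone


def round_down_to_day(instant_utc, tz):
    """Round an instant down to local midnight in tz, returned in UTC."""
    local = instant_utc.astimezone(tz)
    local_midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    return local_midnight.astimezone(timezone.utc)


now = datetime(2012, 7, 1, 3, 30, tzinfo=timezone.utc)
minus_seven = timezone(timedelta(hours=-7))  # assumed fixed offset, no DST rules

print(round_down_to_day(now, timezone.utc))  # 2012-07-01 00:00:00+00:00
print(round_down_to_day(now, minus_seven))   # 2012-06-30 07:00:00+00:00
```

The two results differ by 17 hours, so a "start of today" range boundary
selects a different set of documents depending on the zone in effect.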
* SOLR-3402: Analysis Factories are now configured with their Lucene Version
through setLuceneMatchVersion, rather than through the Map passed to init.
Parsing and simple error checking for the Version is now done inside
the code that creates the Analysis Factories. (Chris Male)
* SOLR-3178: Optimistic locking. If a _version_ is provided with an update
that does not match the version in the index, an HTTP 409 error (Conflict)
will result. (Per Steffensen, yonik)
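The check described above can be modelled in a few lines. This is a toy
in-memory model, not Solr's implementation: the VersionConflict class, the
apply_update helper and the "increment by one" versioning are all
assumptions made for the sketch.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 (Conflict) response described above."""
    status = 409


def apply_update(index, doc_id, new_fields, expected_version=None):
    """Apply an update only if the caller's _version_ matches the indexed one."""
    current = index[doc_id]
    if expected_version is not None and expected_version != current["_version_"]:
        raise VersionConflict(
            "expected %s but index has %s" % (expected_version, current["_version_"])
        )
    updated = dict(current, **new_fields)
    # Assumption: the version simply increases; real version values differ.
    updated["_version_"] = current["_version_"] + 1
    index[doc_id] = updated
    return updated
```

A client that gets the conflict would typically re-read the document, redo
its change against the fresh copy, and retry with the new _version_.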
* SOLR-139: Updateable documents. JSON Example:
{"id":"mydoc", "f1":{"set":10}, "f2":{"add":20}} will result in field "f1"
being set to 10, "f2" having an additional value of 20 added, and all
other existing fields unchanged. All source fields must be stored for
this feature to work correctly. (Ryan McKinley, Erik Hatcher, yonik)
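The effect of the JSON example above on a stored document can be sketched
as follows. This models the described outcome only; the function name and
the sample stored document are hypothetical, and only the "set" and "add"
modifiers named in the entry are handled.

```python
def apply_atomic_update(stored, update, key="id"):
    """Return a copy of 'stored' with the 'set'/'add' modifiers applied."""
    doc = dict(stored)
    for field, change in update.items():
        if field == key:
            continue  # the key only identifies the document
        if not isinstance(change, dict):
            raise ValueError("expected a modifier for field %r" % field)
        if "set" in change:
            doc[field] = change["set"]
        elif "add" in change:
            existing = doc.get(field, [])
            if not isinstance(existing, list):
                existing = [existing]
            doc[field] = existing + [change["add"]]
        else:
            raise ValueError("unsupported modifier for field %r" % field)
    return doc


stored = {"id": "mydoc", "f1": 1, "f2": [5], "title": "unchanged"}
update = {"id": "mydoc", "f1": {"set": 10}, "f2": {"add": 20}}
print(apply_atomic_update(stored, update))
```

The sketch starts from a full copy of the stored document, which mirrors
why all source fields must be stored: the untouched fields have to come
from somewhere when the document is rewritten.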
* SOLR-2857: Support XML,CSV,JSON, and javabin in a single RequestHandler and
choose the correct ContentStreamLoader based on Content-Type header. This
also deprecates the existing [Xml,JSON,CSV,Binary,Xslt]UpdateRequestHandler.
(ryan)
* SOLR-2585: Context-Sensitive Spelling Suggestions & Collations. This adds support
for the "spellcheck.alternativeTermCount" & "spellcheck.maxResultsForSuggest"
parameters, letting users receive suggestions even when all the queried terms
exist in the dictionary. This differs from "spellcheck.onlyMorePopular" in
that the suggestions need not consist entirely of terms with a greater document
frequency than the queried terms. (James Dyer)
* SOLR-2058: Edismax query parser to allow "phrase slop" to be specified per-field
on the pf/pf2/pf3 parameters using optional "FieldName~slop^boost" syntax. The
prior "FieldName^boost" syntax is still accepted. In such cases the value on the
"ps" parameter serves as the default slop. (Ron Mayer via James Dyer)
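A small parser for the "FieldName~slop^boost" syntax shows how the pieces
fit together. It is an illustration, not the edismax parser: parse_pf is a
made-up name, and the default boost of 1.0 is an assumption.

```python
import re

# field, optional ~slop, optional ^boost
_PF_ENTRY = re.compile(
    r"^(?P<field>[^~^\s]+)(?:~(?P<slop>\d+))?(?:\^(?P<boost>\d+(?:\.\d+)?))?$"
)


def parse_pf(value, default_slop=0):
    """Split a pf/pf2/pf3 value into (field, slop, boost) tuples.

    default_slop plays the role of the "ps" parameter described above.
    """
    entries = []
    for token in value.split():
        match = _PF_ENTRY.match(token)
        if match is None:
            raise ValueError("cannot parse %r" % token)
        slop = int(match["slop"]) if match["slop"] is not None else default_slop
        boost = float(match["boost"]) if match["boost"] is not None else 1.0
        entries.append((match["field"], slop, boost))
    return entries


print(parse_pf("title~2^10 body^5 author", default_slop=1))
```

Here "title" carries its own slop of 2, while "body" and "author" fall back
to the default slop of 1.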
* SOLR-3495: New UpdateProcessors have been added to create default values for
configured fields. These work similarly to the <field default="..."/>
option in schema.xml, but are applied in the UpdateProcessorChain, so they
may be used prior to other UpdateProcessors, or to generate a uniqueKey field
value when using the DistributedUpdateProcessor (ie: SolrCloud)
TimestampUpdateProcessorFactory
UUIDUpdateProcessorFactory
DefaultValueUpdateProcessorFactory
(hossman)
* SOLR-2993: Add WordBreakSolrSpellChecker to offer suggestions by combining adjacent
query terms and/or breaking terms into multiple words. This spellchecker can be
configured with a traditional checker (ie: DirectSolrSpellChecker). The results
are combined and collations can contain a mix of corrections from both spellcheckers.
(James Dyer)
* SOLR-3508: Simplify JSON update format for deletes as well as allow
version specification for optimistic locking. Examples:
- {"delete":"myid"}
- {"delete":["id1","id2","id3"]}
- {"delete":{"id":"myid", "_version_":123456789}}
(yonik)
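The three delete forms above can be produced from one helper. A sketch
using the standard json module; the function name delete_command is
hypothetical and the request itself (URL, headers) is left out.

```python
import json


def delete_command(ids, version=None):
    """Build a JSON delete body in one of the three forms shown above."""
    if version is not None:
        if not isinstance(ids, str):
            raise ValueError("a _version_ applies to a single id")
        return json.dumps({"delete": {"id": ids, "_version_": version}})
    # A single id stays a string; several ids are sent as a list.
    return json.dumps({"delete": ids})


print(delete_command("myid"))
print(delete_command(["id1", "id2", "id3"]))
print(delete_command("myid", version=123456789))
```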
* SOLR-3211: Allow parameter overrides in conjunction with "spellcheck.maxCollationTries".
To do so, use parameters starting with "spellcheck.collateParam." For instance, to
override the "mm" parameter, specify "spellcheck.collateParam.mm". This is helpful
in cases where testing spellcheck collations for result counts should use different
parameters from the main query (James Dyer)
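As an illustration, the override parameters can be generated by prefixing
each name with "spellcheck.collateParam.". The helper name and the sample
query values below are made up for the sketch.

```python
def collation_overrides(overrides):
    """Prefix each parameter name as described in the entry above."""
    return {"spellcheck.collateParam." + name: value for name, value in overrides.items()}


# Main query insists on all terms matching; collation test queries relax that.
params = {
    "q": "delll ultrashar",
    "mm": "100%",
    "spellcheck": "true",
    "spellcheck.collate": "true",
    "spellcheck.maxCollationTries": "5",
}
params.update(collation_overrides({"mm": "1"}))

for name in sorted(params):
    print(name, "=", params[name])
```

The main query keeps mm=100%, while the queries used to test candidate
collations run with mm=1.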
* SOLR-2599: CloneFieldUpdateProcessorFactory provides similar functionality
to schema.xml's <copyField/> declaration but as an update processor that can
be combined with other processors in any order. (Jan Høydahl & hossman)
* SOLR-3351: eDismax: ps2 and ps3 params (janhoy)
* SOLR-3542: Add WeightedFragListBuilder for FVH and set it to default fragListBuilder
in example solrconfig.xml. (Sebastian Lutze, koji)
* SOLR-2396: Add ICUCollationField to contrib/analysis-extras, which is much
more efficient than the Solr 3.x ICUCollationKeyFilterFactory, and also
supports Locale-sensitive range queries. (rmuir)
Optimizations
----------------------
* SOLR-1875: Per-segment field faceting for single valued string fields.
Enable with facet.method=fcs, control the number of threads used with
the "threads" local param on the facet.field param. This algorithm will
only be faster in the presence of rapid index changes. (yonik)
* SOLR-1904: When facet.enum.cache.minDf > 0 and the base doc set is a
SortedIntSet, convert to HashDocSet for better performance. (yonik)
* SOLR-2092: Speed up single-valued and multi-valued "fc" faceting. Typical
improvement is 5%, but can be much greater (up to 10x faster) when facet.offset
is very large (deep paging). (yonik)
* SOLR-2193, SOLR-2565: The default Solr update handler has been improved so
that it uses fewer locks, keeps the IndexWriter open rather than closing it
on each commit (ie commits no longer wait for background merges to complete),
works with SolrCore to provide faster 'soft' commits, and has an improved API
that requires less instanceof special casing. (Mark Miller, Robert Muir)
Additional Work:
- SOLR-2697: commit and autocommit operations don't reset
DirectUpdateHandler2.numDocsPending stats attribute.
(Alexey Serba, Mark Miller)
* SOLR-2950: The QueryElevationComponent now avoids using the FieldCache and looking up
every document id (gsingers, yonik)
Bug Fixes
----------------------
* SOLR-3139: Make ConcurrentUpdateSolrServer send UpdateRequest.getParams()
as HTTP request params (siren)
* SOLR-3165: Cannot use DIH in Solrcloud + Zookeeper (Alexey Serba,
Mark Miller, siren)
* SOLR-3068: Occasional NPE in ThreadDumpHandler (siren)
* SOLR-2762: FSTLookup could return duplicate results or one result fewer
than requested. (David Smiley, Dawid Weiss)
* SOLR-2741: Bugs in facet range display in trunk (janhoy)
* SOLR-1908: Fixed SignatureUpdateProcessor to fail to initialize on
invalid config. Specifically: a signatureField that does not exist,
or overwriteDupes=true with a signatureField that is not indexed.
(hossman)
* SOLR-1824: IndexSchema will now fail to initialize if there is a
problem initializing one of the fields or field types. (hossman)
* SOLR-1928: TermsComponent didn't correctly break ties for non-text
fields sorted by count. (yonik)
* SOLR-2107: MoreLikeThisHandler doesn't work with alternate qparsers. (yonik)
* SOLR-2108: Fixed false positives when using wildcard queries on fields with reversed
wildcard support. For example, a query of *zemog* would match documents that contain
'gomez'. (Landon Kuhn via Robert Muir)
* SOLR-1962: SolrCore#initIndex should not use a mix of indexPath and newIndexPath (Mark Miller)
* SOLR-2275: fix DisMax 'mm' parsing to be tolerant of whitespace
(Erick Erickson via hossman)
* SOLR-2193, SOLR-2565, SOLR-2651: SolrCores now properly share IndexWriters across SolrCore reloads.
(Mark Miller, Robert Muir)
Additional Work:
- SOLR-2705: On reload, IndexWriterProvider holds onto the initial SolrCore it was created with.
(Yury Kats, Mark Miller)
* SOLR-2682: Remove addException() in SimpleFacet. FacetComponent no longer catches and embeds
exceptions that occurred during facet processing; it throws HTTP 400 or 500 exceptions instead. (koji)
* SOLR-2654: Directory instances used by a SolrCore are now closed when they are no longer used.
(Mark Miller)
* SOLR-2854: Now load URL content stream data (via stream.url) when called for during request handling,
rather than loading URL content streams automatically regardless of use.
(David Smiley and Ryan McKinley via ehatcher)
* SOLR-2829: Fix problem with false-positives due to incorrect
equals methods. (Yonik Seeley, Hossman, Erick Erickson.
Marc Tinnemeyer caught the bug)
* SOLR-2848: Removed 'instanceof AbstractLuceneSpellChecker' hacks from distributed spellchecking code,
and added a merge() method to SolrSpellChecker instead. Previously if you extended SolrSpellChecker
your spellchecker would not work in distributed fashion. (James Dyer via rmuir)
* SOLR-2509: StringIndexOutOfBoundsException in the spellchecker collate when the term contains
a hyphen. (Thomas Gambier caught the bug, Steffen Godskesen did the patch, via Erick Erickson)
* SOLR-1730: Made it clearer when a core failed to load as well as better logging when the
QueryElevationComponent fails to properly initialize (gsingers)
* SOLR-1520: QueryElevationComponent now supports non-string ids (gsingers)
* SOLR-3037: When using binary format in solrj the codec screws up parameters
(Sami Siren, Jörg Maier via yonik)
* SOLR-3062: A join in the main query was not respecting any filters pushed
down to it via acceptDocs since LUCENE-1536. (Mike Hugo, yonik)
* SOLR-3214: If you use multiple fl entries rather than a comma separated list, all but the first
entry can be ignored if you are using distributed search. (Tomas Fernandez Lobbe via Mark Miller)
* SOLR-3352: eDismax: pf2 should kick in for a query with 2 terms (janhoy)
* SOLR-3361: ReplicationHandler "maxNumberOfBackups" doesn't work if backups are triggered on commit
(James Dyer, Tomas Fernandez Lobbe)
* SOLR-2605: fixed tracking of the 'defaultCoreName' in CoreContainer so that
CoreAdminHandler could return consistent information regardless of whether
there is a default core name or not. (steffkes, hossman)
* SOLR-3370: fixed CSVResponseWriter to respect globs in the 'fl' param
(Keith Fligg via hossman)
* SOLR-3436: Group count incorrect when not all shards are queried in the second
pass. (Francois Perron, Martijn van Groningen)
* SOLR-3454: Exception when using result grouping with main=true and using
wt=javabin. (Ludovic Boutros, Martijn van Groningen)
* SOLR-3446: Better errors when PatternTokenizerFactory is configured with
an invalid pattern, and include the 'name' whenever possible in plugin init
error messages. (hossman)
* LUCENE-4075: Cleaner path usage in TestXPathEntityProcessor
(Greg Bowyer via hossman)
* SOLR-2923: IllegalArgumentException when using useFilterForSortedQuery on an
empty index. (Adrien Grand via Mark Miller)
* SOLR-2352: Fixed TermVectorComponent so that it will not fail if the fl
param contains globs or pseudo-fields (hossman)
* SOLR-3541: add missing solrj dependencies to binary packages.
(Thijs Vonk via siren)
* SOLR-3522: fixed parsing of the 'literal()' function (hossman)
* SOLR-3548: Fixed a bug in the cachability of queries using the {!join}
parser or the strdist() function, as well as some minor improvements to
the hashCode implementation of {!bbox} and {!geofilt} queries.
(hossman)
* SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories
are respected now (Stanislaw Osinski, Dawid Weiss)
* SOLR-3430: Added a new DIH test against a real SQL database. Fixed problems
revealed by this new test related to the expanded cache support added to
3.6/SOLR-2382 (James Dyer)
* SOLR-1958: When using the MailEntityProcessor, import would fail if
fetchMailsSince was not specified. (Max Lynch via James Dyer)
Other Changes
----------------------
* SOLR-1846: Eliminate support for the abortOnConfigurationError
option. It has never worked very well, and in recent versions of
Solr hasn't worked at all. (hossman)
* SOLR-1889: The default logic for the 'mm' param of DismaxQParser and
ExtendedDismaxQParser has been changed to be determined based on the
effective value of the 'q.op' param (hossman)
* SOLR-1946: Misc improvements to the SystemInfoHandler: /admin/system
(hossman)
* SOLR-2289: Tweak spatial coords for example docs so they are a bit
more spread out (Erick Erickson via hossman)
* SOLR-2288: Small tweaks to eliminate compiler warnings, primarily
using Generics where applicable in method/object declarations, and
adding @SuppressWarnings("unchecked") when appropriate (hossman)
* SOLR-2375: Suggester Lookup implementations now store trie data
and load it back on init. This means that large tries don't have to be
rebuilt on every commit or core reload. (ab)
* SOLR-2413: Support for returning multi-valued fields w/o <arr> tag
in the XMLResponseWriter was removed. XMLResponseWriter no longer
works with version values less than 2.2 (ryan)
* SOLR-2423: FieldType argument changed from String to Object
Conversion from SolrInputDocument > Object > Fieldable is now managed
by FieldType rather than DocumentBuilder. (ryan)
* SOLR-2461: QuerySenderListener and AbstractSolrEventListener are
now public (hossman)
* LUCENE-2995: Moved some spellchecker and suggest APIs to modules/suggest:
HighFrequencyDictionary, SortedIterator, TermFreqIterator, and the
suggester APIs and implementations. (rmuir)
* SOLR-2576: Remove deprecated SpellingResult.add(Token, int).
(James Dyer via rmuir)
* LUCENE-3232: Moved MutableValue classes to new 'common' module. (Chris Male)
* LUCENE-2883: FunctionQuery, DocValues (and its impls), ValueSource (and its
impls) and BoostedQuery have been consolidated into the queries module. They
can now be found at o.a.l.queries.function.
* SOLR-2027: FacetField.getValues() now returns an empty list if there are no
values, instead of null (Chris Male)
* SOLR-1825: SolrQuery.addFacetQuery now enables facets automatically, like
addFacetField (Chris Male)
* SOLR-2663: FieldTypePluginLoader has been refactored out of IndexSchema
and made public. (hossman)
* SOLR-2331,SOLR-2691: Refactor CoreContainer's SolrXML serialization code and improve testing
(Yury Kats, hossman, Mark Miller)
* SOLR-2698: Enhance CoreAdmin STATUS command to return index size.
(Yury Kats, hossman, Mark Miller)
* SOLR-2654: The same Directory instance is now always used across a SolrCore so that
it's easier to add other DirectoryFactory implementations without static caching hacks.
(Mark Miller)
* LUCENE-3286: 'luke' ant target has been disabled due to incompatibilities with XML
queryparser location (Chris Male)
* SOLR-1897: The data dir from the core descriptor should override the data dir from
the solrconfig.xml rather than the other way round. (Mark Miller)
* SOLR-2756: Maven configuration: Excluded transitive stax:stax-api dependency
from org.codehaus.woodstox:wstx-asl dependency. (David Smiley via Steve Rowe)
* SOLR-2588: Moved VelocityResponseWriter back to contrib module in order to
remove it as a mandatory core dependency. (ehatcher)
* SOLR-2862: More explicit lexical resources location logged if Carrot2 clustering
extension is used. Fixed solr. impl. of IResource and IResourceLookup. (Dawid Weiss)
* SOLR-1123: Changed JSONResponseWriter to now use application/json as its Content-Type
by default. However the Content-Type can be overwritten and is set to text/plain in
the example configuration. (Uri Boness, Chris Male)
* SOLR-2607: Removed deprecated client/ruby directory, which included solr-ruby and flare.
(ehatcher)
* SOLR-3032: logOnce from SolrException and all the supporting
structure is gone. abortOnConfigurationError is also gone as it is no longer referenced.
Errors should be caught and logged at the top-most level or logged and NOT propagated up the
chain. (Erick Erickson)
* SOLR-2105: Remove support for deprecated "update.processor" (since 3.2), in favor of
"update.chain" (janhoy)
* SOLR-3005: Default QueryResponseWriters are now initialized via init() with an empty
NamedList. (Gasol Wu, Chris Male)
* SOLR-2607: Removed obsolete client/ folder (ehatcher, Eric Pugh, janhoy)
* SOLR-3202, SOLR-3244: Dropping Support for JSP. New Admin UI is all client side
(ryan, Aliaksandr Zhuhrou, Uwe Schindler)
* SOLR-3159: Upgrade example and tests to run with Jetty 8 (ryan)
* SOLR-3254: Upgrade Solr to Tika 1.1 (janhoy)
* SOLR-3329: Dropped getSourceID() from SolrInfoMBean and using
getClass().getPackage().getSpecificationVersion() for Version. (ryan)
* SOLR-3302: Upgraded SLF4j to version 1.6.4 (hossman)
* SOLR-3322: Add more context to IndexReaderFactory.newReader (ab)
* SOLR-3343: Moved FastWriter, FileUtils, RegexFileFilter, RTimer and SystemIdResolver
from org.apache.solr.common to org.apache.solr.util (Chris Male)
* SOLR-3357: ResourceLoader.newInstance now accepts a Class representation of the expected
instance type (Chris Male)
* SOLR-3388: HTTP caching is now disabled by default for RequestUpdateHandlers. (ryan)
* SOLR-3309: web.xml now specifies metadata-complete=true (which requires
Servlet 2.5) to prevent servlet containers from scanning class annotations
on startup. This allows for faster startup times on some servlet containers.
(Bill Bell, hossman)
* SOLR-1893: Refactored some common code from LRUCache and FastLRUCache into
SolrCacheBase (Tomás Fernández Löbbe via hossman)
* SOLR-3403: Deprecated Analysis Factories now log their own deprecation messages.
No logging support is provided by Factory parent classes. (Chris Male)
* SOLR-1258: PingRequestHandler is now directly configured with a
"healthcheckFile" instead of looking for the legacy
<admin><healthcheck/></admin> syntax. Filenames specified as relative
paths have been fixed so that they are resolved against the data dir
instead of the CWD of the java process. (hossman)
* SOLR-3083: JMX beans now report Numbers as numeric values rather than Strings
(Tagged Siteops, Greg Bowyer via ryan)
* SOLR-2796: Due to low level changes to support SolrCloud, the uniqueKey
field can no longer be populated via <copyField/> or <field default=...>
in the schema.xml.
* SOLR-3534: The Dismax and eDismax query parsers will fall back on the 'df' parameter
when 'qf' is absent. If neither 'qf', 'df', nor a schema default search field is
present, an exception is now thrown. (dsmiley)
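The fallback order in the entry above, written out as a sketch. The
function name is hypothetical and ValueError stands in for whatever
exception Solr actually throws.

```python
def resolve_query_fields(qf=None, df=None, schema_default=None):
    """Pick the fields dismax/edismax would search, in the order described above."""
    for candidate in (qf, df, schema_default):
        if candidate:
            return candidate
    raise ValueError("no 'qf', no 'df' and no schema default search field")


print(resolve_query_fields(qf="title^2 body", df="text"))
print(resolve_query_fields(df="text"))
print(resolve_query_fields(schema_default="text"))
```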
* SOLR-3262: The "threads" feature of DIH is removed (deprecated in Solr 3.6)
(James Dyer)
* SOLR-3422: Refactored DIH internal data classes. All entities in
data-config.xml must have a name (James Dyer)
Documentation
----------------------
* SOLR-2232: Improved README info on solr.solr.home in examples
(Eric Pugh and hossman)
================== 3.6.1 ==================
More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
https://wiki.apache.org/solr/Solr3.6.1
Bug Fixes
----------------------
* LUCENE-3969: Throw IAE on bad arguments that could cause confusing errors in
PatternTokenizer. CommonGrams populates PositionLengthAttribute correctly.
(Uwe Schindler, Mike McCandless, Robert Muir)
* SOLR-3361: ReplicationHandler "maxNumberOfBackups" doesn't work if backups are triggered on commit
(James Dyer, Tomas Fernandez Lobbe)
* SOLR-3375: Fix charset problems with HttpSolrServer (Roger Håkansson, yonik, siren)
* SOLR-3436: Group count incorrect when not all shards are queried in the second
pass. (Francois Perron, Martijn van Groningen)
* SOLR-3454: Exception when using result grouping with main=true and using
wt=javabin. (Ludovic Boutros, Martijn van Groningen)
* SOLR-3489: Config file replication less error prone (Jochen Just via janhoy)
* SOLR-3477: SOLR does not start up when no cores are defined (Tomás Fernández Löbbe via tommaso)
* SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories
are respected now (Stanislaw Osinski, Dawid Weiss)
* SOLR-3360: More DIH bug fixes for the deprecated "threads" parameter.
(Mikhail Khludnev, Claudio R, via James Dyer)
* SOLR-3430: Added a new DIH test against a real SQL database. Fixed problems
revealed by this new test related to the expanded cache support added to
3.6/SOLR-2382 (James Dyer)
* SOLR-3336: SolrEntityProcessor substitutes most variables at query time.
(Michael Kroh, Lance Norskog, via Martijn van Groningen)
================== 3.6.0 ==================
More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
https://wiki.apache.org/solr/Solr3.6
Upgrading from Solr 3.5
----------------------
* SOLR-2983: As a consequence of moving the code which sets a MergePolicy from SolrIndexWriter to SolrIndexConfig,
(custom) MergePolicies should now have an empty constructor; thus an IndexWriter should not be passed as constructor
parameter but instead set using the setIndexWriter() method.
* Because the doGet() methods in SimplePostTool were changed to static, client
applications that use this class need to be recompiled.
* In Solr version 3.5 and earlier, HTMLStripCharFilter had known bugs in the
character offsets it provided, triggering e.g. exceptions in highlighting.
HTMLStripCharFilter has been re-implemented, addressing this and other
issues. See the entry for LUCENE-3690 in the Bug Fixes section below for a
detailed list of changes. For people who depend on the behavior of
HTMLStripCharFilter in Solr version 3.5 and earlier: the old implementation
(bugs and all) is preserved as LegacyHTMLStripCharFilter.
* As of Solr 3.6, the <indexDefaults> and <mainIndex> sections of solrconfig.xml are deprecated
and replaced with a new <indexConfig> section. Read more in SOLR-1052 below.
* SOLR-3040: The DIH's admin UI (dataimport.jsp) now requires DIH request handlers to start with
a '/'. (dsmiley)
* SOLR-3161: <requestDispatcher handleSelect="false"> is now the default. An existing config will
probably work as-is because handleSelect was explicitly enabled in default configs. HandleSelect
makes /select work as well as enables the 'qt' parameter. Instead, consider explicitly
configuring /select as is done in the example solrconfig.xml, and register your other search
handlers with a leading '/' which is a recommended practice. (David Smiley, Erik Hatcher)
* SOLR-3161: Don't use the 'qt' parameter with a leading '/'. It probably won't work in 4.0
and it's now limited in 3.6 to SearchHandler subclasses that aren't lazy-loaded.
* SOLR-2724: Specifying <defaultSearchField> and <solrQueryParser defaultOperator="..."/> in
schema.xml is now considered deprecated. Instead you are encouraged to specify these via the "df"
and "q.op" parameters in your request handler definition. (David Smiley)
* Bugs were found and fixed in the SignatureUpdateProcessor that previously caused
some documents to produce the same signature even when the configured fields
contained distinct (non-String) values. Users of SignatureUpdateProcessor
are strongly advised to re-index, as document signatures may
have changed. (see SOLR-3200 & SOLR-3226 for details)
New Features
----------------------
* SOLR-2020: Add Java client that uses Apache Http Components http client (4.x).
(Chantal Ackermann, Ryan McKinley, Yonik Seeley, siren)
* SOLR-2854: Now load URL content stream data (via stream.url) when called for during request handling,
rather than loading URL content streams automatically regardless of use.
(David Smiley and Ryan McKinley via ehatcher)
* SOLR-2904: BinaryUpdateRequestHandler should be able to accept multiple update requests from
a stream (shalin)
* SOLR-1565: StreamingUpdateSolrServer supports RequestWriter API and therefore, javabin update
format (shalin)
* SOLR-2438: Added MultiTermAwareComponent to the various classes to allow automatic lowercasing
for multiterm queries (wildcards, regex, prefix, range, etc). You can now optionally specify a
"multiterm" analyzer in your schema.xml, but Solr should "do the right thing" if you don't
specify one. (Pete Sturge, Erick Erickson, Mentoring from Seeley and Muir)
* SOLR-2919: Added support for localized range queries when the analysis chain uses
CollationKeyFilter or ICUCollationKeyFilter. (Michael Sokolov, rmuir)
* SOLR-2982: Added BeiderMorseFilterFactory for Beider-Morse (BMPM) phonetic encoder. Upgrades
commons-codec to version 1.6 (Brooke Schreier Ganz, rmuir)
* SOLR-1843: A new "rootName" attribute is now available when
configuring <jmx/> in solrconfig.xml. If this attribute is set,
Solr will use it as the root name for all MBeans Solr exposes via
JMX. The default root name is "solr" followed by the core name.
(Constantijn Visinescu, hossman)
* SOLR-2906: Added LFU cache options to Solr. (Shawn Heisey via Erick Erickson)
* SOLR-3036: Ability to specify overwrite=false on the URL for XML updates.
(Sami Siren via yonik)
* SOLR-2603: Add the encoding function for alternate fields in highlighting.
(Massimo Schiavon, koji)
* SOLR-1729: Evaluation of NOW for date math is done only once per request for
consistency, and is also propagated to shards in distributed search.
Adding a parameter NOW=<time_in_ms> to the request will override the
current time. (Peter Sturge, yonik, Simon Willnauer)
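The "evaluate NOW once per request" idea in SOLR-1729 can be illustrated with a
minimal sketch. This is not Solr's implementation: the class RequestNow and its
methods are hypothetical, and the sketch assumes the override value is given as
epoch milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Solr class: fixes "now" once for the lifetime
// of a request so that every date-math expression sees the same instant.
public class RequestNow {
    private final long nowMillis;

    // Assumption: an explicit "NOW" request parameter carries epoch milliseconds.
    public RequestNow(Map<String, String> params) {
        String override = params.get("NOW");
        this.nowMillis = (override != null)
                ? Long.parseLong(override)
                : System.currentTimeMillis();
    }

    // Always returns the instant captured at construction time.
    public long now() {
        return nowMillis;
    }

    // Simple stand-in for date math such as NOW-7DAYS.
    public long nowMinusDays(int days) {
        return nowMillis - days * 86_400_000L;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("NOW", "1000000000000");
        RequestNow clock = new RequestNow(params);
        // Both values derive from the same captured instant.
        System.out.println(clock.now());
        System.out.println(clock.nowMinusDays(1));
    }
}
```

Passing the same captured value along to each shard is what would keep a
distributed request consistent under this approach.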
* SOLR-1709: Distributed support for Date and Numeric Range Faceting
(Peter Sturge, David Smiley, hossman, Simon Willnauer)
* SOLR-3054, LUCENE-3671: Add TypeTokenFilterFactory that creates TypeTokenFilter
that filters tokens based on their TypeAttribute. (Tommaso Teofili via
Uwe Schindler)
* LUCENE-3305, SOLR-3056: Added Kuromoji morphological analyzer for Japanese.
See the 'text_ja' fieldtype in the example to get started.
(Christian Moen, Masaru Hasegawa via Robert Muir)
* SOLR-1860: StopFilterFactory, CommonGramsFilterFactory, and
CommonGramsQueryFilterFactory can optionally read stopwords in Snowball
format (specify format="snowball"). (Robert Muir)
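For readers unfamiliar with the Snowball stopword format mentioned in SOLR-1860,
the following is a minimal sketch of how such a file can be read. It assumes
the usual Snowball conventions ('|' starts a comment, and a line may contain
several whitespace-separated words). The class SnowballStopwords is
hypothetical and is not the loader that Solr or Lucene uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of Snowball-style stopword parsing.
public class SnowballStopwords {

    // Returns the words in file order, ignoring comments and blank lines.
    public static List<String> parse(String text) {
        List<String> words = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            int bar = line.indexOf('|');
            if (bar >= 0) {
                line = line.substring(0, bar); // drop the trailing comment
            }
            line = line.trim();
            if (line.isEmpty()) {
                continue; // blank or comment-only line
            }
            for (String word : line.split("\\s+")) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = " | A Snowball-style list\n"
                + "i          | first person\n"
                + "me my\n"
                + "\n"
                + "the | article\n";
        System.out.println(parse(sample)); // prints [i, me, my, the]
    }
}
```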
* SOLR-3105: ElisionFilterFactory optionally allows the parameter
ignoreCase (default=false). (Robert Muir)
* LUCENE-3714: Add WFSTLookupFactory, a suggester that uses a weighted FST
for more fine-grained suggestions. (Mike McCandless, Dawid Weiss, Robert Muir)
* SOLR-3143: Add SuggestQueryConverter, a QueryConverter intended for
auto-suggesters. (Robert Muir)
* SOLR-3033: ReplicationHandler's backup command now supports a 'maxNumberOfBackups'
init param that can be used to delete all but the most recent N backups. (Torsten Krah, James Dyer)
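The retention rule described in SOLR-3033 (keep only the most recent N backups)
can be sketched as follows. This is an illustration only: the class
BackupRetention is hypothetical, and it assumes that backup names end in a
sortable timestamp such as "snapshot.20120301120000", so that lexicographic
order matches chronological order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not the ReplicationHandler code.
public class BackupRetention {

    // Returns the backup names that fall outside the newest maxNumberOfBackups.
    public static List<String> selectForDeletion(List<String> names, int maxNumberOfBackups) {
        List<String> newestFirst = new ArrayList<>(names);
        // Assumption: names sort chronologically, so reverse order is newest first.
        Collections.sort(newestFirst, Collections.reverseOrder());
        int keep = Math.max(0, maxNumberOfBackups);
        if (newestFirst.size() <= keep) {
            return new ArrayList<>(); // nothing to delete
        }
        return new ArrayList<>(newestFirst.subList(keep, newestFirst.size()));
    }

    public static void main(String[] args) {
        List<String> backups = Arrays.asList(
                "snapshot.20120101000000",
                "snapshot.20120301000000",
                "snapshot.20120201000000");
        System.out.println(selectForDeletion(backups, 2)); // prints [snapshot.20120101000000]
    }
}
```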
* SOLR-2202: Currency FieldType, with support for currencies and exchange rates
(Greg Fodor & Andrew Morrison via janhoy, rmuir, Uwe Schindler)
* SOLR-3026: eDismax: Locking down which fields can be explicitly queried (user fields aka uf)
(janhoy, hossmann, Tomás Fernández Löbbe)
* SOLR-2826: URLClassify Update Processor (janhoy)
* SOLR-2764: Create a NorwegianLightStemmer and NorwegianMinimalStemmer (janhoy)
* SOLR-3221: Added the ability to directly configure aspects of the concurrency
and thread-pooling used within distributed search in Solr. This allows for
finer-grained control and can be tuned by end users to target their own specific
requirements. This builds on the work of the HttpCommComponent and uses the same configuration
block to configure the thread pool. The default configuration has
the same behaviour as Solr 3.5, favouring throughput over latency. More
information can be found on the wiki (http://wiki.apache.org/solr/SolrConfigXml) (Greg Bowyer)
* SOLR-2001: The query component will substitute an empty query that matches
no documents if the query parser returns null. This also prevents an
exception from being thrown by the default parser if "q" is missing. (yonik)
- SOLR-435: if q is "" then it's also acceptable. (dsmiley, hoss)
* SOLR-2919: Added parametric tailoring options to ICUCollationKeyFilterFactory.
These can be used to customize range query/sort behavior, for example to
support numeric collation, ignore punctuation/whitespace, ignore accents but
not case, control whether upper/lowercase values are sorted first, etc. (rmuir)
* SOLR-2346: Add the ability to set content encoding explicitly via the content type
of the stream for the extracting request handler. This is convenient when Tika's
auto detector cannot detect the encoding, especially when the text file is too short
to detect it. (koji)
* SOLR-1499: Added SolrEntityProcessor that imports data from another Solr core
or instance based on a specified query.
(Lance Norskog, Erik Hatcher, Pulkit Singhal, Ahmet Arslan, Luca Cavanna,
Martijn van Groningen)
* SOLR-3190: Minor improvements to SolrEntityProcessor. Add more consistency
between solr parameters and parameters used in SolrEntityProcessor and
ability to specify a custom HttpClient instance.
(Luca Cavanna via Martijn van Groningen)
* SOLR-2382: Added pluggable cache support to DIH so that any Entity can be
made cache-able by adding the "cacheImpl" parameter. Include
"SortedMapBackedCache" to provide in-memory caching (as previously this was
the only option when using CachedSqlEntityProcessor). Users can provide
their own implementations of DIHCache for other caching strategies.
Deprecate CachedSqlEntityProcessor in favor of specifying "cacheImpl" with
SqlEntityProcessor. Make SolrWriter implement DIHWriter and allow the
possibility of pluggable Writers (DIH writing to something other than Solr).
(James Dyer, Noble Paul)
Optimizations
----------------------
* SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter
reportDocCount defaults to 'false'. Old behavior still possible by specifying this as 'true'
(Erick Erickson)
* SOLR-3012: Move System.getProperty("type") in postData() to main() and add type argument so that
the client applications of SimplePostTool can set content type via method argument. (koji)
* SOLR-2888: FSTSuggester refactoring: internal storage is now UTF-8,
external sorting (on disk) prevents OOMs even with large data sets
(the bottleneck is now FST construction), code cleanups and API cleanups.
(Dawid Weiss, Robert Muir)
Bug Fixes
----------------------
* SOLR-3187: SystemInfoHandler leaks filehandles (siren)
* LUCENE-3820: Fixed invalid position indexes by reimplementing PatternReplaceCharFilter.
This change also drops real support for boundary characters -- all input is prebuffered
for pattern matching. (Dawid Weiss)
* SOLR-3068: Fixed NPE in ThreadDumpHandler (siren)
* SOLR-2912: Fixed File descriptor leak in ShowFileRequestHandler (Michael Ryan, shalin)
* SOLR-2819: Improved speed of parsing hex entities in HTMLStripCharFilter
(Bernhard Berger, hossman)
* SOLR-2509: StringIndexOutOfBoundsException in the spellchecker collate when the term contains
a hyphen. (Thomas Gambier caught the bug, Steffen Godskesen did the patch, via Erick Erickson)
* SOLR-2955: Fixed IllegalStateException when querying with group.sort=score desc in sharded
environment. (Steffen Elberg Godskesen, Martijn van Groningen)
* SOLR-2956: Fixed inconsistencies in the flags (and flag key) reported by
the LukeRequestHandler (hossman)
* SOLR-1730: Made it clearer when a core failed to load as well as better logging when the
QueryElevationComponent fails to properly initialize (gsingers)
* SOLR-1520: QueryElevationComponent now supports non-string ids (gsingers)
* SOLR-3024: Fixed JSONTestUtil.matchObj, in previous releases it was not
respecting the 'delta' arg (David Smiley via hossman)
* SOLR-2542: Fixed DIH Context variables which were broken for all scopes other
than SCOPE_ENTITY (Linbin Chen & Frank Wesemann via hossman)
* SOLR-3042: Fixed Maven Jetty plugin configuration.
(David Smiley via Steve Rowe)
* SOLR-2970: CSV ResponseWriter returns fields defined as stored=false in schema (janhoy)
* LUCENE-3690, LUCENE-2208, SOLR-882, SOLR-42: Re-implemented
HTMLStripCharFilter as a JFlex-generated scanner and moved it to
lucene/contrib/analyzers/common/. See below for a list of bug fixes and
other changes. To get the same behavior as HTMLStripCharFilter in Solr
version 3.5 and earlier (including the bugs), use LegacyHTMLStripCharFilter,
which is the previous implementation.
Behavior changes from the previous version:
- Known offset bugs are fixed.
- The "Mark invalid" exceptions reported in SOLR-1283 are no longer
triggered (the bug is still present in LegacyHTMLStripCharFilter).
- The character entity "&apos;" is now always properly decoded.
- More cases of