Apache Solr Release Notes

Introduction
------------
Apache Solr is an open source enterprise search server based on the Apache Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Tomcat.

See http://lucene.apache.org/solr for more information.

Getting Started
---------------
You need a Java 1.6 VM or later installed.
In this release, there is an example Solr server including a bundled servlet container in the directory named "example".
See the tutorial at http://lucene.apache.org/solr/tutorial.html

$Id$

================== 5.0.0 ==================

(No changes)

================== 4.1.0 ==================

Detailed Change List
----------------------

New Features
----------------------

* SOLR-2255: Enhanced pivot faceting to use local-params in the same way that regular field value faceting can. This means support for excluding a filter query, using a different output key, and specifying 'threads' to do facet.method=fcs concurrently. PivotFacetHelper now extends SimpleFacet and the getFacetImplementation() extension hook was removed. (dsmiley)

* SOLR-3897: Added a highlighter parameter, "hl.preserveMulti", that returns all of the values of a multiValued field in their original order when highlighting. (Joel Bernstein via yonik)

* SOLR-3929: Support configuring IndexWriter max thread count in solrconfig. (phunt via Mark Miller)

* SOLR-3906: Add support for AnalyzingSuggester (LUCENE-3842), where the underlying analyzed form used for suggestions is separate from the returned text. (Robert Muir)

Optimizations
----------------------

* SOLR-3788: Admin Cores UI should redirect to newly created core details (steffkes)

* SOLR-3895: XML and XSLT UpdateRequestHandler should not try to resolve external entities. This improves the speed of loading e.g. XSL-transformed XHTML documents.
  (Martin Herfurt, uschindler, hossman)

* SOLR-3614: Fix XML parsing in XPathEntityProcessor to correctly expand named entities, but ignore external entities. (uschindler, hossman)

* SOLR-3734: Improve Schema-Browser handling for CopyField using dynamicFields (steffkes)

* SOLR-3941: The "commitOnLeader" part of distributed recovery can use openSearcher=false. (Tomas Fernandez Lobbe via Mark Miller)

Bug Fixes
----------------------

* SOLR-3560: Handle different types of Exception Messages for Logging UI (steffkes)

* SOLR-3637: Commit Status at Core-Admin UI is always false (steffkes)

* SOLR-3917: Partial State on Schema-Browser UI is not defined for Dynamic Fields & Types (steffkes)

* SOLR-3939: Consider a sync attempt from leader to replica that fails with a 404 to be a success. (Mark Miller, Joel Bernstein)

* SOLR-3940: Rejoining the leader election incorrectly triggers the code path for a fresh cluster start rather than failover. (Mark Miller)

* SOLR-3961: Fixed error using LimitTokenCountFilterFactory (Jack Krupansky, hossman)

Other Changes
----------------------

* SOLR-3899: SolrCore should not log at warning level when the index directory changes; it's an info event. (Tobias Bergman, Mark Miller)

* SOLR-3861: Refactor SolrCoreState so that it's managed by SolrCore. (Mark Miller, hossman)

* SOLR-3966: Eliminate superfluous warning from LanguageIdentifierUpdateProcessor (Markus Jelsma via hossman)

================== 4.0.0 ==================

Versions of Major Components
---------------------
Apache Tika 1.2
Carrot2 3.5.0
Velocity 1.6.4 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.3.6

Upgrading from Solr 4.0.0-BETA
----------------------

In order to better support distributed search mode, the TermVectorComponent's response format has been changed so that if the schema defines a uniqueKeyField, then that field value is used as the "key" for each document in its response section, instead of the internal Lucene doc id.
Users without a uniqueKeyField will continue to see the same response format. See SOLR-3229 for more details.

If you are using SolrCloud's distributed update request capabilities and a non-string type id field, you must re-index.

Upgrading from Solr 4.0.0-ALPHA
----------------------

Solr is now much stricter about requiring that the uniqueKeyField feature (if used) refer to a field which is not multiValued. If you upgrade from an earlier version of Solr and see an error that your uniqueKeyField "can not be configured to be multivalued" please add 'multiValued="false"' to the declaration for your uniqueKeyField. See SOLR-3682 for more details.

In addition, please review the notes above about upgrading from 4.0.0-BETA.

Upgrading from Solr 3.6
----------------------

* The Lucene index format has changed and as a result, once you upgrade, previous versions of Solr will no longer be able to read your indices. In a master/slave configuration, all searchers/slaves should be upgraded before the master. If the master were to be updated first, the older searchers would not be able to read the new index format.

* Setting abortOnConfigurationError=false is no longer supported (since it has never worked properly). Solr will now warn you if you attempt to set this configuration option at all. (see SOLR-1846)

* The default logic for the 'mm' param of the 'dismax' QParser has been changed. If no 'mm' param is specified (either in the query, or as a default in solrconfig.xml), then the effective value of the 'q.op' param (from the query, as a default in solrconfig.xml, or from the 'defaultOperator' option in schema.xml) determines the default: if q.op is effectively "AND" then mm=100%; if q.op is effectively "OR" then mm=0%. Users who wish to force the legacy behavior should set a default value for the 'mm' param in their solrconfig.xml file.

* The VelocityResponseWriter is no longer built into the core.
  Its JAR and dependencies now need to be added (via or solr/home lib inclusion), and it needs to be registered in solrconfig.xml like this:

* The update request parameter for choosing the Update Request Processor Chain has been renamed from "update.processor" to "update.chain". The old parameter had been deprecated (but still worked) since Solr 3.2, and is now removed entirely.

* The and sections of solrconfig.xml are discontinued and replaced with the section. There are also better defaults. When migrating, if you don't know what your old settings mean, simply delete both and sections. If you have customizations, put them in section - with the same syntax as before.

* Two of the SolrServer subclasses in SolrJ were renamed/replaced. CommonsHttpSolrServer is now HttpSolrServer, and StreamingUpdateSolrServer is now ConcurrentUpdateSolrServer.

* The PingRequestHandler no longer looks for a option in the (legacy) section of solrconfig.xml. Users who wish to take advantage of this feature should configure a "healthcheckFile" init param directly on the PingRequestHandler. As part of this change, relative file paths have been fixed to be resolved against the data dir. See the example solrconfig.xml and SOLR-1258 for more details.

* Due to low-level changes to support SolrCloud, the uniqueKey field can no longer be populated via or in the schema.xml. Users wishing to have Solr automatically generate a uniqueKey value when adding documents should instead use an instance of solr.UUIDUpdateProcessorFactory in their update processor chain. See SOLR-2796 for more details.

In addition, please review the notes above about upgrading from 4.0.0-BETA and 4.0.0-ALPHA.

Detailed Change List
----------------------

New Features
----------------------

* SOLR-3670: New CountFieldValuesUpdateProcessorFactory makes it easy to index the number of values in another field for later use at query time. (hossman)

* SOLR-2768: New "mod(x,y)" function for computing the modulus of two value sources.
  (hossman)

* SOLR-3238: Numerous small improvements to the Admin UI (steffkes)

* SOLR-3597: seems like a lot of wasted whitespace at the top of the admin screens (steffkes)

* SOLR-3304: Added Solr adapters for Lucene 4's new spatial module. With SpatialRecursivePrefixTreeFieldType ("location_rpt" in example schema), it is possible to index a variable number of points per document (and sort on them), index not just points but any Spatial4j-supported shape such as polygons, and query on these shapes too. Polygons require adding JTS to the classpath. (David Smiley)

* SOLR-3825: Added optional capability to log what ids are in a response (Scott Stults via gsingers)

* SOLR-3821: Added 'df' to the UI Query form (steffkes)

* SOLR-3822: Added hover titles to the edismax params on the UI Query form (steffkes)

Optimizations
----------------------

* SOLR-3715: Improve concurrency of the transaction log by removing synchronization around log record serialization. (yonik)

* SOLR-3807: During recovery we pause for a number of seconds after waiting for the leader to see a recovering state, so that any previous updates will have finished before our commit on the leader. This wait is not needed for peersync. (Mark Miller)

* SOLR-3837: When a leader is elected and asks replicas to sync back to it and that fails, we should ask those nodes to recover asynchronously rather than synchronously. (Mark Miller)

* SOLR-3709: Cache the URL list created from the ClusterState in CloudSolrServer on each request. (Mark Miller)

Bug Fixes
----------------------

* SOLR-3685: SolrCloud sometimes skipped a peersync attempt and replicated instead, due to tlog flags not being cleared when no updates were buffered during a previous replication. (Markus Jelsma, Mark Miller, yonik)

* SOLR-3229: Fixed TermVectorComponent to work with distributed search (Hang Xie, hossman)

* SOLR-3725: Fixed package-local-src-tgz target to not bring in unnecessary jars and binary contents.
  (Michael Dodsworth via rmuir)

* SOLR-3649: Fixed bug in JavabinLoader that caused deleteById(List ids) to not work in SolrJ (siren)

* SOLR-3730: Rollback is not implemented quite right and can cause corner-case failures in SolrCloud tests. (rmuir, Mark Miller)

* SOLR-2981: Fixed StatsComponent to no longer return duplicated information when requesting multiple stats.facet fields. (Roman Kliewer via hossman)

* SOLR-3743: Fixed issues with atomic updates and optimistic concurrency in conjunction with stored copyField targets by making real-time get never return copyField targets. (yonik)

* SOLR-3746: Proper error reporting if updateLog is configured without the necessary "_version_" field in schema.xml (hossman)

* SOLR-3745: Proper error reporting if SolrCloud mode is used without the necessary "_version_" field in schema.xml (hossman)

* SOLR-3770: Overseer may lose updates to cluster state (siren)

* SOLR-3721: Fix bug that could theoretically allow multiple recoveries to run briefly at the same time if the recovery thread join call was interrupted. (Per Steffensen, Mark Miller)

* SOLR-3782: A leader going down while updates are coming in can cause shard inconsistency. (Mark Miller)

* SOLR-3611: We do not show ZooKeeper data in the UI for a node that has children. (Mark Miller)

* SOLR-3789: Fix bug in SnapPuller that caused "internal" compression to fail. (siren)

* SOLR-3790: ConcurrentModificationException could be thrown when using hl.fl=*. Fixed in r1231606. (yonik, koji)

* SOLR-3668: DataImport: Specifying Custom Parameters (steffkes)

* SOLR-3793: UnInvertedField faceting cached big terms in the filter cache that ignored deletions, leading to duplicate documents in search later when a filter of the same term was specified.
  (Günter Hipler, hossman, yonik)

* SOLR-3679: Core Admin UI gives no feedback if "Add Core" fails (steffkes, hossman)

* SOLR-3795: Fixed LukeRequestHandler response to correctly return field name strings in copyDests and copySources arrays (hossman)

* SOLR-3699: Fixed some Directory leaks when there were errors during SolrCore or SolrIndexWriter initialization. (hossman)

* SOLR-3518: Include final 'hits' in log information when aggregating a distributed request (Markus Jelsma via hossman)

* SOLR-3628: SolrInputField and SolrInputDocument are now consistently backed by Collections passed in to setValue/setField, and defensively copy values from Collections passed to addValue/addField (Tom Switzer via hossman)

* SOLR-3595: CurrencyField now generates an appropriate error on schema init if it is configured as multiValued. This has never been properly supported, but previously failed silently in odd ways. (hossman)

* SOLR-3823: Fix 'bq' parsing in edismax. Please note that this required reverting the negative boost support added by SOLR-3278 (hossman)

* SOLR-3827: Fix shareSchema=true in solr.xml (Tomás Fernández Löbbe via hossman)

* SOLR-3809: Fixed config file replication when subdirectories are used (Emmanuel Espina via hossman)

* SOLR-3828: Fixed QueryElevationComponent so that using 'markExcludes' does not modify the result set or ranking of 'excluded' documents relative to not using elevation at all. (Alexey Serba via hossman)

* SOLR-3569: Fixed debug output on distributed requests when there are no results found. (David Bowen via hossman)

* SOLR-3811: Query Form using wrong values for dismax, edismax (steffkes)

* SOLR-3779: DataImportHandler's LineEntityProcessor, when used in conjunction with FileListEntityProcessor, would only process the first file. (Ahmet Arslan via James Dyer)

* SOLR-3791: CachedSqlEntityProcessor would throw a NullPointerException when a query returns a row with a NULL key.
  (Steffen Moelter via James Dyer)

* SOLR-3833: When an election is started because a leader went down, the new leader candidate should decline if the last state they published was not active. (yonik, Mark Miller)

* SOLR-3836: When doing peer sync, we should only count sync attempts that cannot reach the given host as success when the candidate leader is syncing with the replicas, not when replicas are syncing to the leader. (Mark Miller)

* SOLR-3835: In our leader election algorithm, if on connection loss we found we did not create our election node, we should retry, not throw an exception. (Mark Miller)

* SOLR-3834: A new leader on cluster startup should also run the leader sync process in case there was a bad cluster shutdown. (Mark Miller)

* SOLR-3772: On cluster startup, we should wait until we see all registered replicas before running the leader process, or, if they do not all come up, until N amount of time has passed. (Mark Miller)

* SOLR-3756: If we are elected the leader of a shard, but we fail to publish this for any reason, we should clean up and re-trigger a leader election. (Mark Miller)

* SOLR-3812: ConnectionLoss during recovery can cause lost updates, leading to shard inconsistency. (Mark Miller)

* SOLR-3813: When a new leader syncs, we need to ask all shards to sync back, not just those that are active. (Mark Miller)

* SOLR-3641: CoreContainer is not persisting the roles core attribute. (hossman, Mark Miller)

* SOLR-3527: SolrCmdDistributor drops some of the important commit attributes (maxOptimizeSegments, softCommit, expungeDeletes) when sending a commit to replicas. (Andy Laird, Tomas Fernandez Lobbe, Mark Miller)

* SOLR-3844: SolrCore reload can fail because it tries to remove the index write lock while already holding it. (Mark Miller)

* SOLR-3831: Atomic updates do not distribute correctly to other nodes. (Jim Musil, Mark Miller)

* SOLR-3465: Replication causes two searcher warmups.
  (Michael Garski, Mark Miller)

* SOLR-3645: /terms should default to distrib=false. (Nick Cotton, Mark Miller)

* SOLR-3759: Various fixes to the example-DIH configs (Ahmet Arslan, hossman)

* SOLR-3777: Dataimport-UI does not send unchecked checkboxes (Glenn MacStravic via steffkes)

* SOLR-3850: DataImportHandler "cacheKey" parameter was incorrectly renamed "cachePk" (James Dyer)

* SOLR-3087: Fixed DOMUtil so that code doing attribute validation will automatically ignore nodes in the reserved "xml" prefix. In particular this fixes some bugs related to XInclude and fieldTypes. (Amit Nithian, hossman)

* SOLR-3783: Fixed Pivot Faceting to work with facet.missing=true (hossman)

* SOLR-3869: A PeerSync attempt to its replicas by a candidate leader should not fail on o.a.http.conn.ConnectTimeoutException. (Mark Miller)

* SOLR-3875: Fixed index boosts on multi-valued fields when docBoost is used (hossman)

* SOLR-3878: Exception when using open-ended range query with CurrencyField (janhoy)

* SOLR-3891: CacheValue in CachingDirectoryFactory cannot be used outside of solr.core package. (phunt via Mark Miller)

* SOLR-3892: Inconsistent locking when accessing cache in CachingDirectoryFactory from RAMDirectoryFactory and MockDirectoryFactory. (phunt via Mark Miller)

* SOLR-3883: Distributed indexing forwards non-applicable request params. (Dan Sutton, Per Steffensen, yonik, Mark Miller)

* SOLR-3903: Fixed MissingFormatArgumentException in ConcurrentUpdateSolrServer (hossman)

* SOLR-3916: Fixed whitespace bug in parsing the fl param (hossman)

Other Changes
----------------------

* SOLR-3690: Fixed binary release packages to include dependencies needed for the solr-test-framework (hossman)

* SOLR-2857: The /update/json and /update/csv URLs were restored to aid in the migration of existing clients.
  (yonik)

* SOLR-3691: SimplePostTool: Mode for crawling/posting web pages. See http://wiki.apache.org/solr/ExtractingRequestHandler for examples (janhoy)

* SOLR-3707: Upgrade Solr to Tika 1.2 (janhoy)

* SOLR-2747: Updated changes2html.pl to handle Solr's CHANGES.txt; added target 'changes-to-html' to solr/build.xml. (Steve Rowe, Robert Muir)

* SOLR-3752: When a leader goes down, have the Overseer clear the leader state in cluster.json (Mark Miller)

* SOLR-3751: Add defensive checks for SolrCloud updates and requests that ensure the local state matches what we can tell the request expected. (Mark Miller)

* SOLR-3773: Hash based on the external String id rather than the indexed representation for distributed updates. (Michael Garski, yonik, Mark Miller)

* SOLR-3780: Maven build: Make solrj tests run separately from solr-core. (Steve Rowe)

* SOLR-3772: Optionally, on cluster startup, we can wait until we see all registered replicas before running the leader process, or, if they do not all come up, until N amount of time has passed. (Jan Høydahl, Per Steffensen, Mark Miller)

* SOLR-3750: Optionally, on session expiration, we can explicitly wait some time before running the leader sync process so that we are sure every node participates. (Per Steffensen, Mark Miller)

* SOLR-3824: Velocity: Error messages from search not displayed (janhoy)

* SOLR-3826: Test framework improvements for specifying coreName on initCore (Amit Nithian, hossman)

* SOLR-3749: Allow default UpdateLog syncLevel to be configured by solrconfig.xml (Raintung Li, Mark Miller)

* SOLR-3845: Rename numReplicas to replicationFactor in Collections API. (yonik, Mark Miller)

* SOLR-3815: SolrCloud: Add properties such as "range" to shards, which changes the clusterstate.json and puts the shard replicas under "replicas". (yonik)

* SOLR-3871: SyncStrategy should use an executor for the threads it creates to request recoveries. (Mark Miller)

* SOLR-3870: SyncStrategy should have a close so it can abort earlier on shutdown.
  (Mark Miller)

================== 4.0.0-BETA ==================

Versions of Major Components
---------------------
Apache Tika 1.1
Carrot2 3.5.0
Velocity 1.6.4 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.3.6

Upgrading from Solr 4.0.0-ALPHA
----------------------

Solr is now much stricter about requiring that the uniqueKeyField feature (if used) refer to a field which is not multiValued. If you upgrade from an earlier version of Solr and see an error that your uniqueKeyField "can not be configured to be multivalued" please add 'multiValued="false"' to the declaration for your uniqueKeyField. See SOLR-3682 for more details.

Detailed Change List
----------------------

New Features
----------------------

* LUCENE-4201: Added JapaneseIterationMarkCharFilterFactory to normalize Japanese iteration marks. (Robert Muir, Christian Moen)

* SOLR-1856: In Solr Cell, literals should override Tika-parsed values. Patch adds a param "literalsOverride" which defaults to true, but can be set to "false" to let Tika-parsed values be appended to literal values (Chris Harris, janhoy)

* SOLR-3488: Added a Collection management API for SolrCloud. (Tommaso Teofili, Sami Siren, yonik, Mark Miller)

* SOLR-3559: Full deleteByQuery support with SolrCloud distributed indexing. All replicas of a shard will be consistent, even if updates arrive in a different order on different replicas. (yonik)

* SOLR-1929: Index encrypted documents with ExtractingUpdateRequestHandler. By supplying resource.password= or specifying an external file with regular expressions matching file names, Solr will decrypt and index PDFs and DOCX formats. (janhoy, Yiannis Pericleous)

* SOLR-3562: Add options to remove instance dir or data dir on core unload. (Mark Miller, Per Steffensen)

* SOLR-2702: The default directory factory was changed to NRTCachingDirectoryFactory, which wraps the StandardDirectoryFactory and caches small files for improved Near Real-time (NRT) performance.
  (Mark Miller, yonik)

* SOLR-2616: Include a sample java util logging configuration file. (David Smiley, Mark Miller)

* SOLR-3460: Add cloud-scripts directory and a zkcli.sh|bat tool for easy scripting and interaction with ZooKeeper. (Mark Miller)

* SOLR-1725: StatelessScriptUpdateProcessorFactory allows users to implement the full ScriptUpdateProcessor API using any scripting language with a javax.script.ScriptEngineFactory (Uri Boness, ehatcher, Simon Rosenthal, hossman)

* SOLR-139: Changed updatable documents so that an update creates the document if it doesn't already exist. To assert that the document must exist, use the optimistic concurrency feature by specifying a _version_ of 1. (yonik)

* LUCENE-2510, LUCENE-4044: Migrated Solr's Tokenizer-, TokenFilter-, and CharFilterFactories to the lucene-analysis module. To add new analysis modules to Solr (like ICU, SmartChinese, Morfologik, ...), just drop the JAR files from Lucene's binary distribution into your Solr instance's lib folder. The factories are automatically made available with SPI. (Chris Male, Robert Muir, Uwe Schindler)

* SOLR-3634, SOLR-3635: CoreContainer and CoreAdminHandler will now remember and report back information about failures to initialize SolrCores. These failures will be accessible from the web UI and CoreAdminHandler STATUS command until they are "reset" by creating/renaming a SolrCore with the same name. (hossman, steffkes)

* SOLR-1280: Added commented-out example of the new script update processor to the example configuration. See http://wiki.apache.org/solr/ScriptUpdateProcessor (ehatcher)

* SOLR-3672: SimplePostTool: Improvements for posting files. Support for auto mode, recursive posting, and wildcards (janhoy)

Optimizations
----------------------

* SOLR-3708: Add hashCode to ClusterState so that structures built based on the ClusterState can be easily cached. (Mark Miller)

* SOLR-3709: Cache the URL list created from the ClusterState in CloudSolrServer on each request.
  (Mark Miller, yonik)

* SOLR-3710: Change CloudSolrServer so that update requests are only sent to leaders by default. (Mark Miller)

Bug Fixes
----------------------

* SOLR-3582: Our ZooKeeper watchers respond to session events as if they are change events, creating undesirable side effects. (Trym R. Møller, Mark Miller)

* SOLR-3467: ExtendedDismax escaping is missing several reserved characters (Michael Dodsworth via janhoy)

* SOLR-3587: After reloading a SolrCore, the original Analyzer is still used rather than a new one. (Alexey Serba, yonik, rmuir, Mark Miller)

* LUCENE-4185: Fix a bug where CharFilters were wrongly being applied twice. (Michael Froh, rmuir)

* SOLR-3610: After reloading a core, indexing would fail on any fields newly added to the schema. (Brent Mills, rmuir)

* SOLR-3377: edismax fails to correctly parse a fielded query wrapped by parens. This regression was introduced in 3.6. (Bernd Fehling, Jan Høydahl, yonik)

* SOLR-3621: Fix rare concurrency issue when opening a new IndexWriter for replication or rollback. (Mark Miller)

* SOLR-1781: Replication index directories not always cleaned up. (Markus Jelsma, Terje Sten Bjerkseth, Mark Miller)

* SOLR-3639: Update ZooKeeper to 3.3.6 for a variety of bug fixes. (Mark Miller)

* SOLR-3629: A typo in solr.xml persistence, when overriding the solrconfig.xml file name using the "config" attribute, prevented the override file from being used. (Ryan Zezeski, hossman)

* SOLR-3642: Correct broken check for multivalued fields in stats.facet (Yandong Yao, hossman)

* SOLR-3660: Velocity: Link to admin page broken (janhoy)

* SOLR-3658: Adding thousands of docs with one UpdateProcessorChain instance can briefly create spikes of threads in the thousands. (yonik, Mark Miller)

* SOLR-3656: A core reload now always uses the same dataDir. (Mark Miller, yonik)

* SOLR-3662: Core reload bugs: a reload always obtained a non-NRT searcher, which could go back in time with respect to the previous core's NRT searcher.
  Versioning did not work correctly across a core reload, and update handler synchronization was changed to synchronize on core state since more than one update handler can coexist for a single index during a reload. (yonik)

* SOLR-3663: There are a couple of bugs in the sync process when a leader goes down and a new leader is elected. (Mark Miller)

* SOLR-3623: Fixed inconsistent treatment of third-party dependencies for solr contribs analysis-extras & uima (hossman)

* SOLR-3652: Fixed range faceting to error instead of looping infinitely when 'gap' is zero, or effectively zero due to floating point arithmetic underflow. (hossman)

* SOLR-3648: Fixed VelocityResponseWriter template loading in SolrCloud mode. For the example configuration, this means /browse now works with SolrCloud. (janhoy, ehatcher)

* SOLR-3677: Fixed misleading error message in web UI to distinguish between no SolrCores loaded vs. no /admin/ handler available. (hossman, steffkes)

* SOLR-3428: SolrCmdDistributor flushAdds/flushDeletes can cause repeated adds/deletes to be sent (Mark Miller, Per Steffensen)

* SOLR-3647: DistributedQueue should use our Solr zk client rather than the standard zk client. Otherwise, ZooKeeper expiration can be permanent. (Mark Miller)

Other Changes
----------------------

* SOLR-3524: Make discarding punctuation configurable in JapaneseTokenizerFactory. The default is to discard punctuation, but this is overridable as an expert option. (Kazuaki Hiraga, Jun Ohtani via Christian Moen)

* SOLR-1770: Move the default core instance directory into a collection1 folder. (Mark Miller)

* SOLR-3355: Add shard and collection to SolrCore statistics. (Michael Garski, Mark Miller)

* SOLR-3575: solr.xml should default to persist=true (Mark Miller)

* SOLR-3563: Unloading all cores in a SolrCloud collection will now cause the removal of that collection's metadata from ZooKeeper.
  (Mark Miller, Per Steffensen)

* SOLR-3599: Add zkClientTimeout to solr.xml so that it's obvious how to change it and so that you can change it with a system property. (Mark Miller)

* SOLR-3609: Change Solr's expanded webapp directory to be at a consistent path called solr-webapp rather than a temporary directory. (Mark Miller)

* SOLR-3600: Raise the default zkClientTimeout from 10 seconds to 15 seconds. (Mark Miller)

* SOLR-3215: Clone SolrInputDocument during distributed indexing so that update processors after the distrib update process do not process the document twice. (Mark Miller)

* SOLR-3683: Improved error handling if an contains both an explicit class attribute, as well as nested factories. (hossman)

* SOLR-3682: Fail to parse schema.xml if uniqueKeyField is multivalued (hossman)

* SOLR-2115: DIH no longer requires the "config" parameter to be specified in solrconfig.xml. Instead, the configuration is loaded and parsed with every import. This allows the use of a different configuration with each import, and makes correcting configuration errors simpler. Also, the configuration itself can be passed using the "dataConfig" parameter rather than using a file (this previously worked in debug mode only). When configuration errors are encountered, the error message is returned in XML format. (James Dyer)

* SOLR-3439: Make SolrCell easier to use out of the box. Also improves "/browse" to display rich-text documents correctly, along with facets for author and content_type. With the new "content" field, highlighting of body is supported. See also SOLR-3672 for easier posting of a whole directory structure. (Jack Krupansky, janhoy)

* SOLR-3579: SolrCloud view should default to the graph view rather than tree view.
  (steffkes, Mark Miller)

================== 4.0.0-ALPHA ==================

More information about this release, including any errata related to the release notes, upgrade instructions, or other changes may be found online at:
   https://wiki.apache.org/solr/Solr4.0

Versions of Major Components
---------------------
Apache Tika 1.1
Carrot2 3.5.0
Velocity 1.6.4 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.3.4

Upgrading from Solr 3.6-dev
----------------------

* The Lucene index format has changed and as a result, once you upgrade, previous versions of Solr will no longer be able to read your indices. In a master/slave configuration, all searchers/slaves should be upgraded before the master. If the master were to be updated first, the older searchers would not be able to read the new index format.

* Setting abortOnConfigurationError=false is no longer supported (since it has never worked properly). Solr will now warn you if you attempt to set this configuration option at all. (see SOLR-1846)

* The default logic for the 'mm' param of the 'dismax' QParser has been changed. If no 'mm' param is specified (either in the query, or as a default in solrconfig.xml), then the effective value of the 'q.op' param (from the query, as a default in solrconfig.xml, or from the 'defaultOperator' option in schema.xml) determines the default: if q.op is effectively "AND" then mm=100%; if q.op is effectively "OR" then mm=0%. Users who wish to force the legacy behavior should set a default value for the 'mm' param in their solrconfig.xml file.

* The VelocityResponseWriter is no longer built into the core. Its JAR and dependencies now need to be added (via or solr/home lib inclusion), and it needs to be registered in solrconfig.xml like this:

* The update request parameter for choosing the Update Request Processor Chain has been renamed from "update.processor" to "update.chain". The old parameter had been deprecated (but still worked) since Solr 3.2, and is now removed entirely.
* The and sections of solrconfig.xml are discontinued and replaced with the section. There are also better defaults. When migrating, if you don't know what your old settings mean, simply delete both and sections. If you have customizations, put them in section - with the same syntax as before.

* Two of the SolrServer subclasses in SolrJ were renamed/replaced. CommonsHttpSolrServer is now HttpSolrServer, and StreamingUpdateSolrServer is now ConcurrentUpdateSolrServer.

* The PingRequestHandler no longer looks for a option in the (legacy) section of solrconfig.xml. Users who wish to take advantage of this feature should configure a "healthcheckFile" init param directly on the PingRequestHandler. As part of this change, relative file paths have been fixed to be resolved against the data dir. See the example solrconfig.xml and SOLR-1258 for more details.

* Due to low-level changes to support SolrCloud, the uniqueKey field can no longer be populated via or in the schema.xml. Users wishing to have Solr automatically generate a uniqueKey value when adding documents should instead use an instance of solr.UUIDUpdateProcessorFactory in their update processor chain. See SOLR-2796 for more details.

Detailed Change List
----------------------

New Features
----------------------

* SOLR-3272: Solr filter factory for MorfologikFilter (Polish lemmatisation). (Rafał Kuć via Dawid Weiss, Steven Rowe, Uwe Schindler)

* SOLR-571: The autowarmCount for LRUCaches (LRUCache and FastLRUCache) now supports "percentages", which are evaluated relative to the current size of the cache when warming happens. (Tomas Fernandez Lobbe and hossman)

* SOLR-1932: New relevancy function queries: termfreq, tf, docfreq, idf, norm, maxdoc, numdocs. (yonik)

* SOLR-1665: Add debug component options for timings, results and query info only (gsingers, hossman, yonik)

* SOLR-2112: SolrJ API now supports streaming results.
  (ryan)

* SOLR-792: Adding PivotFacetComponent for Hierarchical faceting (ehatcher, Jeremy Hinegardner, Thibaut Lassalle, ryan)

* LUCENE-2507, SOLR-2571, SOLR-2576: Added DirectSolrSpellChecker, which uses Lucene's DirectSpellChecker to retrieve correction candidates directly from the term dictionary using Levenshtein automata. (James Dyer, rmuir)

* SOLR-1873, SOLR-2358: SolrCloud - added shared/central config and core/shard management via zookeeper, built-in load balancing, and distributed indexing. (Jamie Johnson, Sami Siren, Ted Dunning, yonik, Mark Miller)
  Additional Work:
  - SOLR-2324: SolrCloud solr.xml parameters are not persisted by CoreContainer. (Massimo Schiavon, Mark Miller)
  - SOLR-2287: Allow users to query by multiple, compatible collections with SolrCloud. (Soheb Mahmood, Alex Cowell, Mark Miller)
  - SOLR-2622: ShowFileRequestHandler does not work in SolrCloud mode. (Stefan Matheis, Mark Miller)
  - SOLR-3108: Error in SolrCloud's replica lookup code when replicas are hosted in the same Solr instance. (Bruno Dumon, Sami Siren, Mark Miller)
  - SOLR-3080: Remove shard info from zookeeper when SolrCore is explicitly unloaded. (yonik, Mark Miller, siren)
  - SOLR-3437: Recovery issues a spurious commit to the cluster. (Trym R. Møller via Mark Miller)
  - SOLR-2822: Skip update processors already run on other nodes (hossman)

* SOLR-1566: Transforming documents in the ResponseWriters. This will allow for more complex results in responses and open the door for function queries as results. (ryan with patches from grant, noble, cmale, yonik, Jan Høydahl, Arul Kalaipandian, Luca Cavanna, hossman)
  - SOLR-2037: Thanks to SOLR-1566, documents boosted by the QueryElevationComponent can be marked as boosted. (gsingers, ryan, yonik)

* SOLR-2396: Add CollationField, which is much more efficient than the Solr 3.x CollationKeyFilterFactory, and also supports Locale-sensitive range queries.
  (rmuir)

* SOLR-2338: Add support for using <similarity/> in a schema's fieldType, for customizing scoring on a per-field basis. (hossman, yonik, rmuir)

* SOLR-2335: New 'field("...")' function syntax for referring to complex field names (containing whitespace or special characters) in functions.

* SOLR-2383: /browse improvements: generalize range and date facet display (Jan Høydahl via yonik)

* SOLR-2272: Pseudo-join queries / filters. Examples:
  - To restrict to the set of parents with at least one blue-eyed child:
      fq={!join from=parent to=name}eyes:blue
  - To restrict to the set of children with at least one blue-eyed parent:
      fq={!join from=name to=parent}eyes:blue
  (yonik)

* SOLR-1942: Added the ability to select postings format per fieldType in schema.xml as well as support custom Codecs in solrconfig.xml. (simonw via rmuir)

* SOLR-2136: Boolean type added to function queries, along with new functions exists(), if(), and(), or(), xor(), not(), def(), and true and false constants. (yonik)

* SOLR-2491: Add support for using spellcheck collation in conjunction with grouping. Note that the number of hits returned for collations is the number of ungrouped hits. (James Dyer via rmuir)

* SOLR-1298: Return FunctionQuery as pseudo field. The Solr 'fl' param now supports functions. For example: fl=id,sum(x,y) -- NOTE: only functions with fast random access are recommended. (yonik, ryan)

* SOLR-705: Optionally return shard info with each document in distributed search. Use fl=id,[shard] to return the shard url. (ryan)

* SOLR-2417: Add explain info directly to returned documents using ?fl=id,[explain] (ryan)

* SOLR-2533: Converted ValueSource.ValueSourceSortField over to new rewriteable Lucene SortFields. ValueSourceSortField instances must be rewritten before they can be used. This is done by SolrIndexSearcher when necessary. (Chris Male).

* SOLR-2193, SOLR-2565: You may now specify a 'soft' commit when committing.
  This will use Lucene's NRT feature to avoid guaranteeing documents are on stable storage in exchange for faster reopen times. There is also a new 'soft' autocommit tracker that can be configured. (Mark Miller, Robert Muir)

* SOLR-2399: Updated Solr Admin interface. New look and feel with per core administration and many new options. (Stefan Matheis via ryan)

* SOLR-1032: CSV handler now supports "literal.field_name=value" parameters. (Simon Rosenthal, ehatcher)

* SOLR-2656: Added realtime-get, which efficiently retrieves the latest stored fields for specified documents, even if they are not yet searchable (i.e. without reopening a searcher) (yonik)

* SOLR-2703: Added support for Lucene's "surround" query parser. (Simon Rosenthal, ehatcher)

* SOLR-2754: Added factories for several ranking algorithms:
  - BM25SimilarityFactory: Okapi BM25
  - DFRSimilarityFactory: Divergence from Randomness models
  - IBSimilarityFactory: Information-based models
  - LMDirichletSimilarityFactory: LM with Dirichlet smoothing
  - LMJelinekMercerSimilarityFactory: LM with Jelinek-Mercer smoothing
  (David Mark Nemeskey, Robert Muir)

* SOLR-2134: Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types (Ryan McKinley, Mike McCandless, Uwe Schindler, Erick Erickson)

* SOLR-2438: Added MultiTermAwareComponent to the various classes to allow automatic lowercasing for multiterm queries (wildcards, regex, prefix, range, etc). You can now optionally specify a "multiterm" analyzer in your schema.xml, but Solr should "do the right thing" if you don't specify one. (Pete Sturge, Erick Erickson, Mentoring from Seeley and Muir)

* SOLR-2481: Add support for commitWithin in DataImportHandler (Sami Siren via yonik)

* SOLR-2992: Add support for IndexWriter.prepareCommit() via prepareCommit=true on update URLs. (yonik)

* SOLR-2906: Added LFU cache options to Solr. (Shawn Heisey via Erick Erickson)

* SOLR-3069: Ability to add openSearcher=false to not open a searcher when doing a hard commit.
  commitWithin now only invokes a softCommit. (yonik)

* SOLR-2802: New FieldMutatingUpdateProcessor and Factory to simplify the development of UpdateProcessors that modify field values of documents as they are indexed. Also includes several useful new implementations:
  - RemoveBlankFieldUpdateProcessorFactory
  - TrimFieldUpdateProcessorFactory
  - HTMLStripFieldUpdateProcessorFactory
  - RegexReplaceProcessorFactory
  - FieldLengthUpdateProcessorFactory
  - ConcatFieldUpdateProcessorFactory
  - FirstFieldValueUpdateProcessorFactory
  - LastFieldValueUpdateProcessorFactory
  - MinFieldValueUpdateProcessorFactory
  - MaxFieldValueUpdateProcessorFactory
  - TruncateFieldUpdateProcessorFactory
  - IgnoreFieldUpdateProcessorFactory
  (hossman, janhoy)

* SOLR-3120: Optional post filtering for spatial queries bbox and geofilt for LatLonType. (yonik)

* SOLR-2459: Expose LogLevel selection with a RequestHandler rather than a servlet (Stefan Matheis, Upayavira, ryan)

* SOLR-3134: Include shard info in distributed response when shards.info=true (Russell Black, ryan)

* SOLR-2898: Support grouped faceting. (Martijn van Groningen)
  Additional Work:
  - SOLR-3406: Extended grouped faceting support to facet.query and facet.range parameters. (David Boychuck, Martijn van Groningen)

* SOLR-2949: QueryElevationComponent is now supported with distributed search. (Mark Miller, yonik)

* SOLR-3221: Added the ability to directly configure aspects of the concurrency and thread-pooling used within distributed search in Solr. This allows for finer-grained control and can be tuned by end users to target their own specific requirements. This builds on the work of the HttpCommComponent and uses the same configuration block to configure the thread pool. The default configuration has the same behaviour as Solr 3.5, favouring throughput over latency.
  More information can be found on the wiki (http://wiki.apache.org/solr/SolrConfigXml) (Greg Bowyer)

* SOLR-3278: Added negative boost support to the Extended Dismax Query Parser Boost Query (bq). (James Dyer)

* SOLR-3255: OpenExchangeRates.Org Exchange Rate Provider for CurrencyField (janhoy)

* SOLR-3358: Logging events are captured and available from the /admin/logging request handler. (ryan)

* SOLR-1535: PreAnalyzedField type provides the ability to index (and optionally store) field content that was already processed and split into tokens using some external processing chain. Serialization format is pluggable, and defaults to JSON. (ab)

* SOLR-3363: Consolidated Exceptions in Analysis Factories so they only throw InitializationExceptions (Chris Male)

* SOLR-2690: New support for a "TZ" request param which overrides the TimeZone used when rounding Dates in DateMath expressions for the entire request (all date range queries and date faceting are affected). The default TZ is still UTC. (David Schlotfeldt, hossman)

* SOLR-3402: Analysis Factories are now configured with their Lucene Version through setLuceneMatchVersion, rather than through the Map passed to init. Parsing and simple error checking for the Version is now done inside the code that creates the Analysis Factories. (Chris Male)

* SOLR-3178: Optimistic locking. If a _version_ is provided with an update that does not match the version in the index, an HTTP 409 error (Conflict) will result. (Per Steffensen, yonik)

* SOLR-139: Updateable documents. JSON Example:
    {"id":"mydoc", "f1":{"set":10}, "f2":{"add":20}}
  will result in field "f1" being set to 10, "f2" having an additional value of 20 added, and all other existing fields unchanged. All source fields must be stored for this feature to work correctly. (Ryan McKinley, Erik Hatcher, yonik)

* SOLR-2857: Support XML, CSV, JSON, and javabin in a single RequestHandler and choose the correct ContentStreamLoader based on the Content-Type header.
  This also deprecates the existing [Xml,JSON,CSV,Binary,Xslt]UpdateRequestHandler. (ryan)

* SOLR-2585: Context-Sensitive Spelling Suggestions & Collations. This adds support for the "spellcheck.alternativeTermCount" & "spellcheck.maxResultsForSuggest" parameters, letting users receive suggestions even when all the queried terms exist in the dictionary. This differs from "spellcheck.onlyMorePopular" in that the suggestions need not consist entirely of terms with a greater document frequency than the queried terms. (James Dyer)

* SOLR-2058: The Edismax query parser now allows "phrase slop" to be specified per-field on the pf/pf2/pf3 parameters using the optional "FieldName~slop^boost" syntax. The prior "FieldName^boost" syntax is still accepted; in such cases the value of the "ps" parameter serves as the default slop. (Ron Mayer via James Dyer)

* SOLR-3495: New UpdateProcessors have been added to create default values for configured fields. These work similarly to the <field default="..."/> option in schema.xml, but are applied in the UpdateProcessorChain, so they may be used prior to other UpdateProcessors, or to generate a uniqueKey field value when using the DistributedUpdateProcessor (i.e. SolrCloud):
  - TimestampUpdateProcessorFactory
  - UUIDUpdateProcessorFactory
  - DefaultValueUpdateProcessorFactory
  (hossman)

* SOLR-2993: Add WordBreakSolrSpellChecker to offer suggestions by combining adjacent query terms and/or breaking terms into multiple words. This spellchecker can be configured with a traditional checker (i.e. DirectSolrSpellChecker). The results are combined and collations can contain a mix of corrections from both spellcheckers. (James Dyer)

* SOLR-3508: Simplify JSON update format for deletes as well as allow version specification for optimistic locking. Examples:
  - {"delete":"myid"}
  - {"delete":["id1","id2","id3"]}
  - {"delete":{"id":"myid", "_version_":123456789}}
  (yonik)

* SOLR-3211: Allow parameter overrides in conjunction with "spellcheck.maxCollationTries".
  To do so, use parameters starting with "spellcheck.collateParam." For instance, to override the "mm" parameter, specify "spellcheck.collateParam.mm". This is helpful in cases where testing spellcheck collations for result counts should use different parameters from the main query (James Dyer)

* SOLR-2599: CloneFieldUpdateProcessorFactory provides similar functionality to schema.xml's <copyField/> declaration but as an update processor that can be combined with other processors in any order. (Jan Høydahl & hossman)

* SOLR-3351: eDismax: ps2 and ps3 params (janhoy)

* SOLR-3542: Add WeightedFragListBuilder for FVH and set it as the default fragListBuilder in the example solrconfig.xml. (Sebastian Lutze, koji)

* SOLR-2396: Add ICUCollationField to contrib/analysis-extras, which is much more efficient than the Solr 3.x ICUCollationKeyFilterFactory, and also supports Locale-sensitive range queries. (rmuir)

Optimizations
----------------------

* SOLR-1875: Per-segment field faceting for single valued string fields. Enable with facet.method=fcs, and control the number of threads used with the "threads" local param on the facet.field param. This algorithm will only be faster in the presence of rapid index changes. (yonik)

* SOLR-1904: When facet.enum.cache.minDf > 0 and the base doc set is a SortedIntSet, convert to HashDocSet for better performance. (yonik)

* SOLR-2092: Speed up single-valued and multi-valued "fc" faceting. Typical improvement is 5%, but can be much greater (up to 10x faster) when facet.offset is very large (deep paging). (yonik)

* SOLR-2193, SOLR-2565: The default Solr update handler has been improved so that it uses fewer locks, keeps the IndexWriter open rather than closing it on each commit (i.e. commits no longer wait for background merges to complete), works with SolrCore to provide faster 'soft' commits, and has an improved API that requires less instanceof special casing.
  (Mark Miller, Robert Muir)
  Additional Work:
  - SOLR-2697: commit and autocommit operations don't reset DirectUpdateHandler2.numDocsPending stats attribute. (Alexey Serba, Mark Miller)

* SOLR-2950: The QueryElevationComponent now avoids using the FieldCache and looking up every document id (gsingers, yonik)

Bug Fixes
----------------------

* SOLR-3139: Make ConcurrentUpdateSolrServer send UpdateRequest.getParams() as HTTP request params (siren)

* SOLR-3165: Cannot use DIH in Solrcloud + Zookeeper (Alexey Serba, Mark Miller, siren)

* SOLR-3068: Occasional NPE in ThreadDumpHandler (siren)

* SOLR-2762: FSTLookup could return duplicate results or one result fewer than requested. (David Smiley, Dawid Weiss)

* SOLR-2741: Bugs in facet range display in trunk (janhoy)

* SOLR-1908: Fixed SignatureUpdateProcessor to fail to initialize on invalid config. Specifically: a signatureField that does not exist, or overwriteDupes=true with a signatureField that is not indexed. (hossman)

* SOLR-1824: IndexSchema will now fail to initialize if there is a problem initializing one of the fields or field types. (hossman)

* SOLR-1928: TermsComponent didn't correctly break ties for non-text fields sorted by count. (yonik)

* SOLR-2107: MoreLikeThisHandler doesn't work with alternate qparsers. (yonik)

* SOLR-2108: Fixed false positives when using wildcard queries on fields with reversed wildcard support. For example, a query of *zemog* would match documents that contain 'gomez'. (Landon Kuhn via Robert Muir)

* SOLR-1962: SolrCore#initIndex should not use a mix of indexPath and newIndexPath (Mark Miller)

* SOLR-2275: fix DisMax 'mm' parsing to be tolerant of whitespace (Erick Erickson via hossman)

* SOLR-2193, SOLR-2565, SOLR-2651: SolrCores now properly share IndexWriters across SolrCore reloads. (Mark Miller, Robert Muir)
  Additional Work:
  - SOLR-2705: On reload, IndexWriterProvider holds onto the initial SolrCore it was created with.
    (Yury Kats, Mark Miller)

* SOLR-2682: Remove addException() in SimpleFacet. FacetComponent no longer catches and embeds exceptions that occur during facet processing; it throws HTTP 400 or 500 exceptions instead. (koji)

* SOLR-2654: Directories used by a SolrCore are now closed when they are no longer used. (Mark Miller)

* SOLR-2854: Now load URL content stream data (via stream.url) when called for during request handling, rather than loading URL content streams automatically regardless of use. (David Smiley and Ryan McKinley via ehatcher)

* SOLR-2829: Fix problem with false-positives due to incorrect equals methods. (Yonik Seeley, Hossman, Erick Erickson. Marc Tinnemeyer caught the bug)

* SOLR-2848: Removed 'instanceof AbstractLuceneSpellChecker' hacks from distributed spellchecking code, and added a merge() method to SolrSpellChecker instead. Previously if you extended SolrSpellChecker your spellchecker would not work in distributed fashion. (James Dyer via rmuir)

* SOLR-2509: StringIndexOutOfBoundsException in the spellchecker collate when the term contains a hyphen. (Thomas Gambier caught the bug, Steffen Godskesen did the patch, via Erick Erickson)

* SOLR-1730: Made it clearer when a core failed to load, and improved logging when the QueryElevationComponent fails to properly initialize (gsingers)

* SOLR-1520: QueryElevationComponent now supports non-string ids (gsingers)

* SOLR-3037: When using binary format in solrj the codec corrupts parameters (Sami Siren, Jörg Maier via yonik)

* SOLR-3062: A join in the main query was not respecting any filters pushed down to it via acceptDocs since LUCENE-1536. (Mike Hugo, yonik)

* SOLR-3214: If you use multiple fl entries rather than a comma separated list, all but the first entry can be ignored if you are using distributed search.
  (Tomas Fernandez Lobbe via Mark Miller)

* SOLR-3352: eDismax: pf2 should kick in for a query with 2 terms (janhoy)

* SOLR-3361: ReplicationHandler "maxNumberOfBackups" doesn't work if backups are triggered on commit (James Dyer, Tomas Fernandez Lobbe)

* SOLR-2605: fixed tracking of the 'defaultCoreName' in CoreContainer so that CoreAdminHandler could return consistent information regardless of whether there is a default core name or not. (steffkes, hossman)

* SOLR-3370: fixed CSVResponseWriter to respect globs in the 'fl' param (Keith Fligg via hossman)

* SOLR-3436: Group count incorrect when not all shards are queried in the second pass. (Francois Perron, Martijn van Groningen)

* SOLR-3454: Exception when using result grouping with main=true and using wt=javabin. (Ludovic Boutros, Martijn van Groningen)

* SOLR-3446: Better errors when PatternTokenizerFactory is configured with an invalid pattern, and include the 'name' whenever possible in plugin init error messages. (hossman)

* LUCENE-4075: Cleaner path usage in TestXPathEntityProcessor (Greg Bowyer via hossman)

* SOLR-2923: IllegalArgumentException when using useFilterForSortedQuery on an empty index. (Adrien Grand via Mark Miller)

* SOLR-2352: Fixed TermVectorComponent so that it will not fail if the fl param contains globs or pseudo-fields (hossman)

* SOLR-3541: add missing solrj dependencies to binary packages. (Thijs Vonk via siren)

* SOLR-3522: fixed parsing of the 'literal()' function (hossman)

* SOLR-3548: Fixed a bug in the cacheability of queries using the {!join} parser or the strdist() function, as well as some minor improvements to the hashCode implementation of {!bbox} and {!geofilt} queries. (hossman)

* SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories are respected now (Stanislaw Osinski, Dawid Weiss)

* SOLR-3430: Added a new DIH test against a real SQL database.
  Fixed problems revealed by this new test related to the expanded cache support added to 3.6/SOLR-2382 (James Dyer)

* SOLR-1958: When using the MailEntityProcessor, import would fail if fetchMailsSince was not specified. (Max Lynch via James Dyer)

Other Changes
----------------------

* SOLR-1846: Eliminate support for the abortOnConfigurationError option. It has never worked very well, and in recent versions of Solr hasn't worked at all. (hossman)

* SOLR-1889: The default logic for the 'mm' param of DismaxQParser and ExtendedDismaxQParser has been changed to be determined based on the effective value of the 'q.op' param (hossman)

* SOLR-1946: Misc improvements to the SystemInfoHandler: /admin/system (hossman)

* SOLR-2289: Tweak spatial coords for example docs so they are a bit more spread out (Erick Erickson via hossman)

* SOLR-2288: Small tweaks to eliminate compiler warnings, primarily using Generics where applicable in method/object declarations, and adding @SuppressWarnings("unchecked") when appropriate (hossman)

* SOLR-2375: Suggester Lookup implementations now store trie data and load it back on init. This means that large tries don't have to be rebuilt on every commit or core reload. (ab)

* SOLR-2413: Support for returning multi-valued fields w/o <arr> tag in the XMLResponseWriter was removed. The XMLResponseWriter no longer works with 'version' values less than 2.2. (ryan)

* SOLR-2423: FieldType argument changed from String to Object. Conversion from SolrInputDocument > Object > Fieldable is now managed by FieldType rather than DocumentBuilder. (ryan)

* SOLR-2461: QuerySenderListener and AbstractSolrEventListener are now public (hossman)

* LUCENE-2995: Moved some spellchecker and suggest APIs to modules/suggest: HighFrequencyDictionary, SortedIterator, TermFreqIterator, and the suggester APIs and implementations. (rmuir)

* SOLR-2576: Remove deprecated SpellingResult.add(Token, int). (James Dyer via rmuir)

* LUCENE-3232: Moved MutableValue classes to new 'common' module.
  (Chris Male)

* LUCENE-2883: FunctionQuery, DocValues (and its impls), ValueSource (and its impls) and BoostedQuery have been consolidated into the queries module. They can now be found at o.a.l.queries.function.

* SOLR-2027: FacetField.getValues() now returns an empty list if there are no values, instead of null (Chris Male)

* SOLR-1825: SolrQuery.addFacetQuery now enables facets automatically, like addFacetField (Chris Male)

* SOLR-2663: FieldTypePluginLoader has been refactored out of IndexSchema and made public. (hossman)

* SOLR-2331, SOLR-2691: Refactor CoreContainer's SolrXML serialization code and improve testing (Yury Kats, hossman, Mark Miller)

* SOLR-2698: Enhance CoreAdmin STATUS command to return index size. (Yury Kats, hossman, Mark Miller)

* SOLR-2654: The same Directory instance is now always used across a SolrCore so that it's easier to add other DirectoryFactory implementations without static caching hacks. (Mark Miller)

* LUCENE-3286: 'luke' ant target has been disabled due to incompatibilities with XML queryparser location (Chris Male)

* SOLR-1897: The data dir from the core descriptor should override the data dir from the solrconfig.xml rather than the other way round. (Mark Miller)

* SOLR-2756: Maven configuration: Excluded transitive stax:stax-api dependency from org.codehaus.woodstox:wstx-asl dependency. (David Smiley via Steve Rowe)

* SOLR-2588: Moved VelocityResponseWriter back to contrib module in order to remove it as a mandatory core dependency. (ehatcher)

* SOLR-2862: More explicit lexical resources location logged if Carrot2 clustering extension is used. Fixed Solr's implementation of IResource and IResourceLookup. (Dawid Weiss)

* SOLR-1123: Changed JSONResponseWriter to now use application/json as its Content-Type by default. However the Content-Type can be overridden and is set to text/plain in the example configuration. (Uri Boness, Chris Male)

* SOLR-2607: Removed deprecated client/ruby directory, which included solr-ruby and flare.
  (ehatcher)

* SOLR-3032: SolrException's logOnce and all the supporting structure are gone. abortOnConfigurationError is also gone as it is no longer referenced. Errors should be caught and logged at the top-most level or logged and NOT propagated up the chain. (Erick Erickson)

* SOLR-2105: Remove support for deprecated "update.processor" (since 3.2), in favor of "update.chain" (janhoy)

* SOLR-3005: Default QueryResponseWriters are now initialized via init() with an empty NamedList. (Gasol Wu, Chris Male)

* SOLR-2607: Removed obsolete client/ folder (ehatcher, Eric Pugh, janhoy)

* SOLR-3202, SOLR-3244: Dropping Support for JSP. New Admin UI is all client side (ryan, Aliaksandr Zhuhrou, Uwe Schindler)

* SOLR-3159: Upgrade example and tests to run with Jetty 8 (ryan)

* SOLR-3254: Upgrade Solr to Tika 1.1 (janhoy)

* SOLR-3329: Dropped getSourceID() from SolrInfoMBean and now use getClass().getPackage().getSpecificationVersion() for Version. (ryan)

* SOLR-3302: Upgraded SLF4j to version 1.6.4 (hossman)

* SOLR-3322: Add more context to IndexReaderFactory.newReader (ab)

* SOLR-3343: Moved FastWriter, FileUtils, RegexFileFilter, RTimer and SystemIdResolver from org.apache.solr.common to org.apache.solr.util (Chris Male)

* SOLR-3357: ResourceLoader.newInstance now accepts a Class representation of the expected instance type (Chris Male)

* SOLR-3388: HTTP caching is now disabled by default for RequestUpdateHandlers. (ryan)

* SOLR-3309: web.xml now specifies metadata-complete=true (which requires Servlet 2.5) to prevent servlet containers from scanning class annotations on startup. This allows for faster startup times on some servlet containers. (Bill Bell, hossman)

* SOLR-1893: Refactored some common code from LRUCache and FastLRUCache into SolrCacheBase (Tomás Fernández Löbbe via hossman)

* SOLR-3403: Deprecated Analysis Factories now log their own deprecation messages. No logging support is provided by Factory parent classes.
  (Chris Male)

* SOLR-1258: PingRequestHandler is now directly configured with a "healthcheckFile" instead of looking for the legacy <admin><healthcheck/></admin> syntax. Filenames specified as relative paths have been fixed so that they are resolved against the data dir instead of the CWD of the java process. (hossman)

* SOLR-3083: JMX beans now report Numbers as numeric values rather than String (Tagged Siteops, Greg Bowyer via ryan)

* SOLR-2796: Due to low level changes to support SolrCloud, the uniqueKey field can no longer be populated via <copyField/> or <field default=...> in the schema.xml.

* SOLR-3534: The Dismax and eDismax query parsers will fall back on the 'df' parameter when 'qf' is absent. If neither is present and the schema has no default search field, an exception is now thrown. (dsmiley)

* SOLR-3262: The "threads" feature of DIH is removed (deprecated in Solr 3.6) (James Dyer)

* SOLR-3422: Refactored DIH internal data classes. All entities in data-config.xml must have a name (James Dyer)

Documentation
----------------------

* SOLR-2232: Improved README info on solr.solr.home in examples (Eric Pugh and hossman)

================== 3.6.1 ==================
More information about this release, including any errata related to the release notes, upgrade instructions, or other changes may be found online at:
  https://wiki.apache.org/solr/Solr3.6.1

Bug Fixes
----------------------

* LUCENE-3969: Throw IAE on bad arguments that could cause confusing errors in PatternTokenizer. CommonGrams populates PositionLengthAttribute correctly. (Uwe Schindler, Mike McCandless, Robert Muir)

* SOLR-3361: ReplicationHandler "maxNumberOfBackups" doesn't work if backups are triggered on commit (James Dyer, Tomas Fernandez Lobbe)

* SOLR-3375: Fix charset problems with HttpSolrServer (Roger Håkansson, yonik, siren)

* SOLR-3436: Group count incorrect when not all shards are queried in the second pass. (Francois Perron, Martijn van Groningen)

* SOLR-3454: Exception when using result grouping with main=true and using wt=javabin.
  (Ludovic Boutros, Martijn van Groningen)

* SOLR-3489: Config file replication less error prone (Jochen Just via janhoy)

* SOLR-3477: SOLR does not start up when no cores are defined (Tomás Fernández Löbbe via tommaso)

* SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories are respected now (Stanislaw Osinski, Dawid Weiss)

* SOLR-3360: More DIH bug fixes for the deprecated "threads" parameter. (Mikhail Khludnev, Claudio R, via James Dyer)

* SOLR-3430: Added a new DIH test against a real SQL database. Fixed problems revealed by this new test related to the expanded cache support added to 3.6/SOLR-2382 (James Dyer)

* SOLR-3336: SolrEntityProcessor substitutes most variables at query time. (Michael Kroh, Lance Norskog, via Martijn van Groningen)

================== 3.6.0 ==================
More information about this release, including any errata related to the release notes, upgrade instructions, or other changes may be found online at:
  https://wiki.apache.org/solr/Solr3.6

Upgrading from Solr 3.5
----------------------

* SOLR-2983: As a consequence of moving the code which sets a MergePolicy from SolrIndexWriter to SolrIndexConfig, (custom) MergePolicies should now have an empty constructor; thus an IndexWriter should not be passed as a constructor parameter but instead set using the setIndexWriter() method.

* As the doGet() methods in SimplePostTool were changed to static, the client applications of this class need to be recompiled.

* In Solr version 3.5 and earlier, HTMLStripCharFilter had known bugs in the character offsets it provided, triggering e.g. exceptions in highlighting. HTMLStripCharFilter has been re-implemented, addressing this and other issues. See the entry for LUCENE-3690 in the Bug Fixes section below for a detailed list of changes. For people who depend on the behavior of HTMLStripCharFilter in Solr version 3.5 and earlier: the old implementation (bugs and all) is preserved as LegacyHTMLStripCharFilter.
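The offset bugs mentioned in the HTMLStripCharFilter note are easier to picture with a toy example: a char filter that removes markup must remember where each surviving character came from, so that a highlighter can map token offsets in the stripped text back to the original input. The class below is hypothetical (ToyHtmlStrip, startOffset and endOffset are illustrative names) and deliberately simplified: it records one original offset per output character and only drops <...> tags. It is not how Solr's HTMLStripCharFilter is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration of offset correction in a markup-stripping char filter.
 * Hypothetical and simplified: it only drops <...> tags and ignores entities,
 * comments, <script> bodies and quoted '>' inside attributes.
 * It is NOT Solr's HTMLStripCharFilter.
 */
public class ToyHtmlStrip {

    /** Stripped text plus the original offset of every character that survived. */
    public static final class Result {
        public final String text;
        private final int[] originalOffsets;

        Result(String text, int[] originalOffsets) {
            this.text = text;
            this.originalOffsets = originalOffsets;
        }

        /** Original offset of the character at position start in the stripped text. */
        public int startOffset(int start) {
            return originalOffsets[start];
        }

        /** Original (exclusive) end offset of a token ending at position end in the stripped text. */
        public int endOffset(int end) {
            return originalOffsets[end - 1] + 1;
        }
    }

    public static Result strip(String html) {
        StringBuilder out = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;              // a tag starts: drop it
            } else if (inTag) {
                if (c == '>') {
                    inTag = false;         // the tag ends
                }
            } else {
                out.append(c);             // keep text and record where it came from
                offsets.add(i);
            }
        }
        int[] mapped = new int[offsets.size()];
        for (int k = 0; k < mapped.length; k++) {
            mapped[k] = offsets.get(k);
        }
        return new Result(out.toString(), mapped);
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>";
        Result r = strip(html);
        // The token "world" occupies [6, 11) in the stripped text.
        int start = r.startOffset(6);
        int end = r.endOffset(11);
        System.out.println(r.text + " -> " + html.substring(start, end)); // Hello world -> world
    }
}
```

In this example the token "world" spans [6, 11) in the stripped text but [12, 17) in the original markup; an offset map that drifts out of sync with the input is the kind of defect that produces wrong highlight spans or exceptions.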

* As of Solr 3.6, the <indexDefaults> and <mainIndex> sections of solrconfig.xml are deprecated and replaced with a new <indexConfig> section. Read more in SOLR-1052 below.

* SOLR-3040: The DIH's admin UI (dataimport.jsp) now requires DIH request handlers to start with a '/'. (dsmiley)

* SOLR-3161: <requestDispatcher handleSelect="false"> is now the default. An existing config will probably work as-is because handleSelect was explicitly enabled in default configs. HandleSelect makes /select work as well as enables the 'qt' parameter. Instead, consider explicitly configuring /select as is done in the example solrconfig.xml, and register your other search handlers with a leading '/', which is a recommended practice. (David Smiley, Erik Hatcher)

* SOLR-3161: Don't use the 'qt' parameter with a leading '/'. It probably won't work in 4.0 and it's now limited in 3.6 to SearchHandler subclasses that aren't lazy-loaded.

* SOLR-2724: Specifying <defaultSearchField> and <solrQueryParser defaultOperator="..."/> in schema.xml is now considered deprecated. Instead you are encouraged to specify these via the "df" and "q.op" parameters in your request handler definition. (David Smiley)

* Bugs were found and fixed in the SignatureUpdateProcessor that previously caused some documents to produce the same signature even when the configured fields contained distinct (non-String) values. Users of SignatureUpdateProcessor are strongly advised to re-index, as document signatures may have now changed. (see SOLR-3200 & SOLR-3226 for details)

New Features
----------------------

* SOLR-2020: Add Java client that uses Apache Http Components http client (4.x). (Chantal Ackermann, Ryan McKinley, Yonik Seeley, siren)

* SOLR-2854: Now load URL content stream data (via stream.url) when called for during request handling, rather than loading URL content streams automatically regardless of use.
  (David Smiley and Ryan McKinley via ehatcher)

* SOLR-2904: BinaryUpdateRequestHandler should be able to accept multiple update requests from a stream (shalin)

* SOLR-1565: StreamingUpdateSolrServer supports the RequestWriter API and therefore the javabin update format (shalin)

* SOLR-2438: Added MultiTermAwareComponent to the various classes to allow automatic lowercasing for multiterm queries (wildcards, regex, prefix, range, etc). You can now optionally specify a "multiterm" analyzer in your schema.xml, but Solr should "do the right thing" if you don't specify one. (Pete Sturge, Erick Erickson, Mentoring from Seeley and Muir)

* SOLR-2919: Added support for localized range queries when the analysis chain uses CollationKeyFilter or ICUCollationKeyFilter. (Michael Sokolov, rmuir)

* SOLR-2982: Added BeiderMorseFilterFactory for Beider-Morse (BMPM) phonetic encoder. Upgrades commons-codec to version 1.6 (Brooke Schreier Ganz, rmuir)

* SOLR-1843: A new "rootName" attribute is now available when configuring <jmx/> in solrconfig.xml. If this attribute is set, Solr will use it as the root name for all MBeans Solr exposes via JMX. The default root name is "solr" followed by the core name. (Constantijn Visinescu, hossman)

* SOLR-2906: Added LFU cache options to Solr. (Shawn Heisey via Erick Erickson)

* SOLR-3036: Ability to specify overwrite=false on the URL for XML updates. (Sami Siren via yonik)

* SOLR-2603: Add the encoding function for alternate fields in highlighting. (Massimo Schiavon, koji)

* SOLR-1729: Evaluation of NOW for date math is done only once per request for consistency, and is also propagated to shards in distributed search. Adding a parameter NOW=<time_in_ms> to the request will override the current time.
  (Peter Sturge, yonik, Simon Willnauer)

* SOLR-1709: Distributed support for Date and Numeric Range Faceting
  (Peter Sturge, David Smiley, hossman, Simon Willnauer)

* SOLR-3054, LUCENE-3671: Add TypeTokenFilterFactory, which creates a
  TypeTokenFilter that filters tokens based on their TypeAttribute.
  (Tommaso Teofili via Uwe Schindler)

* LUCENE-3305, SOLR-3056: Added Kuromoji morphological analyzer for Japanese.
  See the 'text_ja' fieldtype in the example to get started.
  (Christian Moen, Masaru Hasegawa via Robert Muir)

* SOLR-1860: StopFilterFactory, CommonGramsFilterFactory, and
  CommonGramsQueryFilterFactory can optionally read stopwords in Snowball
  format (specify format="snowball"). (Robert Muir)

* SOLR-3105: ElisionFilterFactory optionally allows the parameter ignoreCase
  (default=false). (Robert Muir)

* LUCENE-3714: Add WFSTLookupFactory, a suggester that uses a weighted FST
  for more fine-grained suggestions. (Mike McCandless, Dawid Weiss, Robert Muir)

* SOLR-3143: Add SuggestQueryConverter, a QueryConverter intended for
  auto-suggesters. (Robert Muir)

* SOLR-3033: ReplicationHandler's backup command now supports a
  'maxNumberOfBackups' init param that can be used to delete all but the most
  recent N backups. (Torsten Krah, James Dyer)

* SOLR-2202: Currency FieldType, with support for currencies and exchange rates
  (Greg Fodor & Andrew Morrison via janhoy, rmuir, Uwe Schindler)

* SOLR-3026: eDismax: Locking down which fields can be explicitly queried
  (user fields aka uf) (janhoy, hossmann, Tomás Fernández Löbbe)

* SOLR-2826: URLClassify Update Processor (janhoy)

* SOLR-2764: Create a NorwegianLightStemmer and NorwegianMinimalStemmer (janhoy)

* SOLR-3221: Added the ability to directly configure aspects of the concurrency
  and thread-pooling used within distributed search in solr. This allows for
  finer-grained control and can be tuned by end users to target their own
  specific requirements.
  This builds on the work of the HttpCommComponent and uses the same
  configuration block to configure the thread pool. The default configuration
  has the same behaviour as solr 3.5, favouring throughput over latency. More
  information can be found on the wiki
  (http://wiki.apache.org/solr/SolrConfigXml) (Greg Bowyer)

* SOLR-2001: The query component will substitute an empty query that matches
  no documents if the query parser returns null. This also prevents an
  exception from being thrown by the default parser if "q" is missing. (yonik)
  - SOLR-435: if q is "" then it's also acceptable. (dsmiley, hoss)

* SOLR-2919: Added parametric tailoring options to ICUCollationKeyFilterFactory.
  These can be used to customize range query/sort behavior, for example to
  support numeric collation, ignore punctuation/whitespace, ignore accents but
  not case, control whether upper/lowercase values are sorted first, etc.
  (rmuir)

* SOLR-2346: Add the ability to set the content encoding explicitly via the
  content type of the stream for the extracting request handler. This is
  convenient when Tika's auto detector cannot detect the encoding, especially
  when the text file is too short for detection. (koji)

* SOLR-1499: Added SolrEntityProcessor that imports data from another Solr
  core or instance based on a specified query.
  (Lance Norskog, Erik Hatcher, Pulkit Singhal, Ahmet Arslan, Luca Cavanna,
  Martijn van Groningen)

* SOLR-3190: Minor improvements to SolrEntityProcessor. Add more consistency
  between solr parameters and parameters used in SolrEntityProcessor, and the
  ability to specify a custom HttpClient instance.
  (Luca Cavanna via Martijn van Groningen)

* SOLR-2382: Added pluggable cache support to DIH so that any Entity can be
  made cache-able by adding the "cacheImpl" parameter. Includes
  "SortedMapBackedCache" to provide in-memory caching (as previously this was
  the only option when using CachedSqlEntityProcessor). Users can provide their
  own implementations of DIHCache for other caching strategies.
  Deprecate CachedSqlEntityProcessor in favor of specifying "cacheImpl" with
  SqlEntityProcessor. Make SolrWriter implement DIHWriter and allow the
  possibility of pluggable Writers (DIH writing to something other than Solr).
  (James Dyer, Noble Paul)

Optimizations
----------------------
* SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New
  parameter reportDocCount defaults to 'false'. The old behavior is still
  available by setting it to 'true'. (Erick Erickson)

* SOLR-3012: Move System.getProperty("type") in postData() to main() and add
  a type argument so that client applications of SimplePostTool can set the
  content type via a method argument. (koji)

* SOLR-2888: FSTSuggester refactoring: internal storage is now UTF-8,
  external sorting (on disk) prevents OOMs even with large data sets
  (the bottleneck is now FST construction), code cleanups and API cleanups.
  (Dawid Weiss, Robert Muir)

Bug Fixes
----------------------
* SOLR-3187: SystemInfoHandler leaks filehandles (siren)

* LUCENE-3820: Fixed invalid position indexes by reimplementing
  PatternReplaceCharFilter. This change also drops real support for boundary
  characters -- all input is prebuffered for pattern matching. (Dawid Weiss)

* SOLR-3068: Fixed NPE in ThreadDumpHandler (siren)

* SOLR-2912: Fixed file descriptor leak in ShowFileRequestHandler
  (Michael Ryan, shalin)

* SOLR-2819: Improved speed of parsing hex entities in HTMLStripCharFilter
  (Bernhard Berger, hossman)

* SOLR-2509: StringIndexOutOfBoundsException in the spellchecker collate when
  the term contains a hyphen. (Thomas Gambier caught the bug, Steffen
  Godskesen did the patch, via Erick Erickson)

* SOLR-2955: Fixed IllegalStateException when querying with
  group.sort=score desc in a sharded environment.
  (Steffen Elberg Godskesen, Martijn van Groningen)

* SOLR-2956: Fixed inconsistencies in the flags (and flag key) reported by
  the LukeRequestHandler (hossman)

* SOLR-1730: Made it clearer when a core fails to load, and improved logging
  when the QueryElevationComponent fails to initialize properly (gsingers)

* SOLR-1520: QueryElevationComponent now supports non-string ids (gsingers)

* SOLR-3024: Fixed JSONTestUtil.matchObj, which in previous releases did not
  respect the 'delta' arg (David Smiley via hossman)

* SOLR-2542: Fixed DIH Context variables, which were broken for all scopes
  other than SCOPE_ENTITY (Linbin Chen & Frank Wesemann via hossman)

* SOLR-3042: Fixed Maven Jetty plugin configuration.
  (David Smiley via Steve Rowe)

* SOLR-2970: CSV ResponseWriter returns fields defined as stored=false in
  schema (janhoy)

* LUCENE-3690, LUCENE-2208, SOLR-882, SOLR-42: Re-implemented
  HTMLStripCharFilter as a JFlex-generated scanner and moved it to
  lucene/contrib/analyzers/common/. See below for a list of bug fixes and
  other changes. To get the same behavior as HTMLStripCharFilter in Solr
  version 3.5 and earlier (including the bugs), use LegacyHTMLStripCharFilter,
  which is the previous implementation.

  Behavior changes from the previous version:

  - Known offset bugs are fixed.
  - The "Mark invalid" exceptions reported in SOLR-1283 are no longer
    triggered (the bug is still present in LegacyHTMLStripCharFilter).
  - The character entity &apos; is now always properly decoded.
  - More cases of