Apache Solr Release Notes

Introduction
------------
Apache Solr is an open source enterprise search server based on the Apache
Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting,
faceted search, caching, replication, and a web administration interface. It
runs in a Java servlet container such as Tomcat.

See http://lucene.apache.org/solr for more information.


Getting Started
---------------
You need a Java 1.7 VM or later installed.
In this release, there is an example Solr server including a bundled
servlet container in the directory named "example".
See the tutorial at http://lucene.apache.org/solr/tutorial.html


$Id$

================== 5.0.0 ==================

Versions of Major Components
---------------------
Apache Tika 1.3
Carrot2 3.6.2
Velocity 1.7 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.4.5

Upgrading from Solr 4.x
----------------------

TBD...

Detailed Change List
----------------------

Other Changes
----------------------

* SOLR-4622: Hardcoded SolrCloud defaults for hostContext and hostPort that
  were deprecated in 4.3 have been removed completely. (hossman)

* SOLR-4792: Stop shipping a .war. (Robert Muir)

================== 4.4.0 ==================

Versions of Major Components
---------------------
Apache Tika 1.3
Carrot2 3.6.2
Velocity 1.7 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.4.5

Upgrading from Solr 4.3.0
----------------------

* TieredMergePolicy and the various subtypes of LogMergePolicy no longer have
  an explicit "setUseCompoundFile" method. Instead the behavior of new
  segments is determined by the IndexWriter configuration, and the MergePolicy
  is only consulted to determine if merged segments should use the compound
  file format (based on the value of "setNoCFSRatio"). If you have explicitly
  configured one of these classes using <mergePolicy> and include an init arg
  like this...
     <bool name="useCompoundFile">true</bool>
  ...this will now be treated as if you specified...
     <useCompoundFile>true</useCompoundFile>
  ...directly on the <indexConfig> (overriding any value already set using
  that syntax) and a warning will be logged advising you to update your
  configuration. Users with an explicitly declared <mergePolicy> are
  encouraged to review the current javadocs for their MergePolicy subclass and
  review their configured options carefully. See SOLR-4941, SOLR-4934 and
  LUCENE-5038 for more information.

* SOLR-4778: The signature of LogWatcher.registerListener has changed, from
  (ListenerConfig, CoreContainer) to (ListenerConfig). Users implementing
  their own LogWatcher classes will need to change their code accordingly.

* LUCENE-5063: ByteField and ShortField have been deprecated and will be
  removed in 5.0. If you are still using these field types, you should migrate
  your fields to TrieIntField.

Detailed Change List
----------------------

New Features
----------------------

* SOLR-3251: Dynamically add fields to schema. (Steve Rowe, Robert Muir, yonik)

* SOLR-4761, SOLR-4976: Add option to plug in a merged segment warmer into
  solrconfig.xml. Info about segments warmed in the background is available
  via infostream. (Mark Miller, Ryan Ernst, Mike McCandless, Robert Muir)

* SOLR-3240: Add "spellcheck.collateMaxCollectDocs" option so that when
  testing potential Collations against the index, SpellCheckComponent will
  only collect n documents, thereby estimating the hit-count. This is a
  performance optimization in cases where exact hit-counts are unnecessary.
  Also, when "collateExtendedResults" is false, this optimization is always
  made (James Dyer).

* SOLR-4785: New MaxScoreQParserPlugin returning max() instead of sum() of
  terms (janhoy)

* SOLR-4234: Add support for binary files in ZooKeeper. (Eric Pugh via Mark
  Miller)

* SOLR-4048: Add findRecursive method to NamedList. (Shawn Heisey)

* SOLR-4228: SolrJ's SolrPing object has new methods for ping, enable, and
  disable.
  (Shawn Heisey, hossman, Steve Rowe)

* SOLR-4893: Extend FieldMutatingUpdateProcessor.ConfigurableFieldNameSelector
  to enable checking whether a field matches any schema field. To select field
  names that don't match any fields or dynamic fields in the schema, add
  <bool name="fieldNameMatchesSchemaField">false</bool> to an update
  processor's configuration in solrconfig.xml. (Steve Rowe, hossman)

* SOLR-4921: Admin UI now supports adding documents to Solr (gsingers,
  steffkes)

* SOLR-4916: Add support to write and read Solr index files and transaction
  log files to and from HDFS. (phunt, Mark Miller, Greg Chanan)

* SOLR-4892: Add FieldMutatingUpdateProcessorFactory subclasses
  Parse{Date,Integer,Long,Float,Double,Boolean}UpdateProcessorFactory. These
  factories have a default selector that matches all fields that either don't
  match any schema field, or are in the schema with the corresponding
  typeClass. If they see a value that is not a CharSequence, or can't parse
  the value, they leave it as is. For multi-valued fields, these processors
  will not convert any values unless all are first successfully parsed, or
  already are instances of the target class. Ordering the processors, e.g.
  [Boolean, Long, Double, Date] will allow e.g. values ["2", "5", "8.6"] to be
  left alone by the Boolean and Long processors, but then converted by the
  Double processor. (Steve Rowe, hossman)

* SOLR-4972: Add PUT command to ZkCli tool. (Roman Shaposhnik via Mark Miller)

* SOLR-4973: Adding getter method for defaultCollection on CloudSolrServer.
  (Furkan KAMACI via Mark Miller)

Bug Fixes
----------------------

* SOLR-4333: edismax parser to not double-escape colons if already escaped by
  the client application (James Dyer, Robert J. van der Boon)

* SOLR-4776: Solrj doesn't return "between" count in range facets
  (Philip K. Warren via shalin)

* SOLR-4616: HitRatio on caches is now exposed over JMX MBeans as a float.
  (Greg Bowyer)

* SOLR-4803: Fixed core discovery mode (ie: new style solr.xml) to treat
  'collection1' as the default core name. (hossman)

* SOLR-4790: Throw an error if a core has the same name as another core, both
  old and new style solr.xml

* SOLR-4842: Fix facet.field local params from affecting other facet.field's.
  (ehatcher, hossman)

* SOLR-4814: If a SolrCore cannot be created it should remove any information
  it published about itself from ZooKeeper. (Mark Miller)

* SOLR-4863: Removed non-existent attribute sourceId from dynamic JMX stats
  to fix AttributeNotFoundException (suganuma, hossman via shalin)

* SOLR-4891: JsonLoader should preserve field value types from the JSON
  content stream. (Steve Rowe)

* SOLR-4805: SolrCore#reload should not call preRegister and publish a DOWN
  state to ZooKeeper. (Mark Miller, Jared Rodriguez)

* SOLR-4899: When reconnecting after ZooKeeper expiration, we need to be
  willing to wait forever, not just for 30 seconds. (Mark Miller)

* SOLR-4920: JdbcDataSource incorrectly suppresses exceptions when retrieving
  a connection from a JNDI context and falls back to trying to use
  DriverManager to obtain a connection. Additionally, if a SQLException is
  thrown while initializing a connection, such as in setAutoCommit(), the
  connection will not be closed. (Chris Eldredge via shalin)

* SOLR-4915: The root cause should be returned to the user when a SolrCore
  create call fails. (Mark Miller)

* SOLR-4925: Collection create throws NPE when 'numShards' param is missing
  (Noble Paul)

* SOLR-4910: persisting solr.xml is broken. More stringent testing of
  persistence fixed up a number of issues and several bugs with persistence.
  Among them are
  > no longer persisting implicit properties
  > should persist zkHost in the <solr> tag (user's list)
  > reloading a core that has transient="true" returned an error. reload
    should load a transient core if it's not yet loaded.
  > No longer persisting loadOnStartup or transient core properties if they
    were not specified in the original solr.xml
  > Testing flushed out the fact that you couldn't swap a core marked
    transient=true loadOnStartup=false because it hadn't been loaded yet.
  > SOLR-4862, CREATE fails to persist schema, config, and dataDir
  > SOLR-4363, not persisting coreLoadThreads in <solr> tag
  > SOLR-3900, logWatcher properties not persisted
  > SOLR-4850, cores defined as loadOnStartup=true, transient=false can't be
    searched
  (Erick Erickson)

* SOLR-4923: Commits to non leaders as part of a request that also contain
  updates can execute out of order. (hossman, Ricardo Merizalde, Mark Miller)

* SOLR-4932: persisting solr.xml saves some parameters it shouldn't when they
  weren't defined in the original. Benign since the default values are saved,
  but still incorrect. (Erick Erickson, thanks Shawn Heisey for helping test!)

* SOLR-4934, SOLR-4941: Fix handling of <mergePolicy> init arg
  "useCompoundFile" needed after changes in LUCENE-5038 (hossman)

* SOLR-4456: Admin UI: Displays dashboard even if Solr is down (steffkes)

* SOLR-4949: UI Analysis page dropping characters from input box (steffkes)

* SOLR-4960: Fix race conditions in shutdown of CoreContainer and getCore
  that could cause a request to attempt to use a core that has shut down.
  (yonik)

* SOLR-4926: Fixed rare replication bug that normally only manifested when
  using compound file format. (yonik, Mark Miller)

* SOLR-4974: Outgrowth of SOLR-4960 that includes transient cores and pending
  cores (Erick Erickson)

Optimizations
----------------------

* SOLR-4923: Commit to all nodes in a collection in parallel rather than
  locally and then to all other nodes.
  (hossman, Ricardo Merizalde, Mark Miller)

* SOLR-3838: Admin UI - Multiple filter queries are not supported in Query UI
  (steffkes)

* SOLR-4719: Admin UI - Default to wt=json on Query-Screen (steffkes)

* SOLR-4611: Admin UI - Analysis-Urls with empty parameters create empty
  result table (steffkes)

* SOLR-4955: Admin UI - Show address bar on top for Schema + Config (steffkes)

Other Changes
----------------------

* SOLR-4737: Update Guava to 14.0.1 (Mark Miller)

* SOLR-2079: Add option to pass HttpServletRequest in the SolrQueryRequest
  context map. (Tomás Fernández Löbbe via Robert Muir)

* SOLR-4738: Update Jetty to 8.1.10.v20130312 (Mark Miller, Robert Muir)

* SOLR-4749: Clean up and refactor CoreContainer code around solr.xml and
  SolrCore management. (Mark Miller)

* SOLR-4547: Move logging of filenames on commit from INFO to DEBUG.
  (Shawn Heisey, hossman)

* SOLR-4757: Change the example to use the new solr.xml format and core
  discovery by directory structure. (Mark Miller)

* SOLR-4759: Velocity (/browse) template cosmetic cleanup.
  (Mark Bennett, ehatcher)

* SOLR-4778: LogWatcher init code moved out of CoreContainer (Alan Woodward)

* SOLR-4784: Make class LuceneQParser public (janhoy)

* SOLR-4448: Allow the solr internal load balancer to be more easily
  pluggable. (Philip Hoy via Robert Muir)

* SOLR-4224: Refactor JavaBinCodec input stream definition to enhance reuse.
  (phunt via Mark Miller)

* SOLR-4931: SolrDeletionPolicy onInit and onCommit methods changed to
  override exact signatures (with generics) from IndexDeletionPolicy (shalin)

* SOLR-4942: test improvements to randomize use of compound files (hossman)

* SOLR-4966: CSS, JS and other files in webapp without license
  (uschindler, steffkes)

================== 4.3.1 ==================

Versions of Major Components
---------------------
Apache Tika 1.3
Carrot2 3.6.2
Velocity 1.7 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.4.5

Detailed Change List
----------------------

Bug Fixes
----------------------

* SOLR-4795: Sub shard leader should not accept any updates from parent after
  it goes active (shalin)

* SOLR-4798: shard splitting does not respect the router for the collection
  when executing the index split. One effect of this is that documents may be
  placed in the wrong shard when the default compositeId router is used in
  conjunction with IDs containing "!". (yonik)

* SOLR-4797: Shard splitting creates sub shards which have the wrong hash
  range in cluster state. This happens when numShards is not a power of two
  and router is compositeId. (shalin)

* SOLR-4791: solr.xml sharedLib does not work in 4.3.0 (Ryan Ernst, Jan
  Høydahl via Erick Erickson)

* SOLR-4806: Shard splitting does not abort if WaitForState times out (shalin)

* SOLR-4807: The zkcli script now works with log4j. The zkcli.bat script was
  broken on Windows in 4.3.0, now it works. (Shawn Heisey)

* SOLR-4813: Fix SynonymFilterFactory to allow init parameters for tokenizer
  factory used when parsing synonyms file. (Shingo Sasaki, hossman)

* SOLR-4829: Fix transaction log leaks (a failure to clean up some old logs)
  on a shard leader, or when unexpected exceptions are thrown during log
  recovery. (Steven Bower, Mark Miller, yonik)

* SOLR-4751: Fix replication problem of files in sub directory of conf
  directory. (Minoru Osuka via Koji)

* SOLR-4741: Deleting a collection should set DELETE_DATA_DIR to true.
  (Mark Miller)

* SOLR-4752: There are some minor bugs in the Collections API parameter
  validation. (Mark Miller)

* SOLR-4563: RSS DIH-example not working (janhoy)

* SOLR-4796: zkcli.sh should honor JAVA_HOME (Roman Shaposhnik via Mark Miller)

* SOLR-4734: Leader election fails with an NPE if there is no UpdateLog.
  (Mark Miller, Alexander Eibner)

* SOLR-4868: Setting the log level for the log4j root category results in
  adding a new category, the empty string. (Shawn Heisey)

* SOLR-4855: DistributedUpdateProcessor doesn't check for peer sync requests
  (shalin)

* SOLR-4867: Admin UI - setting loglevel on root throws RangeError (steffkes)

* SOLR-4870: RecentUpdates.update() does not increment numUpdates loop counter
  (Alexey Kudinov via shalin)

* SOLR-4877, LUCENE-5023: Removed SolrIndexSearcher#getDocSetNC()'s special
  case for handling TermQuery to prevent NullPointerException if reader does
  not have fields. (Bao Yang Yang, Uwe Schindler)

* SOLR-4881: Fix DocumentAnalysisRequestHandler to correctly use
  EmptyEntityResolver to prevent loading of external entities like
  UpdateRequestHandler does. (Hossman, Uwe Schindler)

* SOLR-4858: SolrCore reloading was broken when the UpdateLog was enabled.
  (Hossman, Anshum Gupta, Alexey Serba, Mark Miller, yonik)

* SOLR-4853: Fixed SolrJettyTestBase so it may be reused by end users (hossman)

* SOLR-4744: Update failure on sub shard is not propagated to clients by
  parent shard (Anshum Gupta, yonik, shalin)

Other Changes
----------------------

* SOLR-4760: Include core name in logs when loading schema.
  (Shawn Heisey)

================== 4.3.0 ==================

Versions of Major Components
---------------------
Apache Tika 1.3
Carrot2 3.6.2
Velocity 1.7 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.4.5

Upgrading from Solr 4.2.0
----------------------

* In the schema REST API, the output path for copyFields and dynamicFields
  has been changed from all lowercase "copyfields" and "dynamicfields" to
  camelCase "copyFields" and "dynamicFields", respectively, to align with all
  other schema REST API outputs, which use camelCase. The URL format remains
  the same: all resource names are lowercase. See SOLR-4623 for details.

* Slf4j/logging jars are no longer included in the Solr webapp. All logging
  jars are now in example/lib/ext. Changing logging impls is now as easy as
  updating the jars in this folder with those necessary for the logging impl
  you would like. If you are using another webapp container, these jars will
  need to go in the corresponding location for that container.
  In conjunction, the dist-excl-slf4j and dist-war-excl-slf4 build targets
  have been removed since they are redundant. See the Slf4j documentation,
  SOLR-3706, and SOLR-4651 for more details.

* The hardcoded SolrCloud defaults for 'hostContext="solr"' and
  'hostPort="8983"' have been deprecated and will be removed in Solr 5.0.
  Existing solr.xml files that do not have these options explicitly specified
  should be updated accordingly. See SOLR-4622 for more details.

Detailed Change List
----------------------

New Features
----------------------

* SOLR-4648 PreAnalyzedUpdateProcessorFactory allows using the functionality
  of PreAnalyzedField with other field types. See javadoc for details and
  examples. (Andrzej Bialecki)

* SOLR-4623: Provide REST API read access to all elements of the live schema.
  Add a REST API request to return the entire live schema, in JSON, XML, and
  schema.xml formats.
  Move REST API methods from package org.apache.solr.rest to
  org.apache.solr.rest.schema, and rename base functionality REST API classes
  to remove the current schema focus, to prepare for other non-schema REST
  APIs. Change output path for copyFields and dynamicFields from "copyfields"
  and "dynamicfields" (all lowercase) to "copyFields" and "dynamicFields",
  respectively, to align with all other REST API outputs, which use camelCase.
  (Steve Rowe)

* SOLR-4658: In preparation for REST API requests that can modify the schema,
  a "managed schema" is introduced.
  Add '<schemaFactory class="ManagedSchemaFactory" mutable="true"/>' to
  solrconfig.xml in order to use it, and to enable schema modifications via
  REST API requests. (Steve Rowe, Robert Muir)

* SOLR-4656: Added two new highlight parameters, hl.maxMultiValuedToMatch and
  hl.maxMultiValuedToExamine. maxMultiValuedToMatch stops looking for snippets
  after finding the specified number of matches, no matter how far into the
  multivalued field you've gone. maxMultiValuedToExamine stops looking for
  matches after the specified number of multiValued entries have been
  examined. If both are specified, the limit hit first stops the loop. Also
  this patch cuts down on the copying of the document entries during
  highlighting. These optimizations are probably unnoticeable unless there are
  a large number of entries in the multiValued field. Note that this will
  prevent the "best" match from being found if it appears later in the MV list
  than the cutoff specified by either of these params. (Erick Erickson)

* SOLR-4675: Improve PostingsSolrHighlighter to support per-field/query-time
  overrides and add additional configuration parameters. See the javadocs for
  more details and examples. (Robert Muir)

* SOLR-3755: A new collections api to add additional shards dynamically by
  splitting existing shards.
  (yonik, Anshum Gupta, shalin)

* SOLR-4530: DIH: Provide configuration to use Tika's IdentityHtmlMapper
  (Alexandre Rafalovitch via shalin)

* SOLR-4662: Discover SolrCores by directory structure rather than defining
  them in solr.xml. Also, change the format of solr.xml to be closer to that
  of solrconfig.xml. This version of Solr will ship the example in the old
  style, but you can manually try the new style. Solr 4.4 will ship with the
  new style, and Solr 5.0 will remove support for the old style.
  (Erick Erickson, Mark Miller)
  Additional Work:
  - SOLR-4347: Ensure that newly-created cores via Admin handler are persisted
    in solr.xml (Erick Erickson)
  - SOLR-1905: Cores created by the admin request handler should be persisted
    to solr.xml. Also fixed a problem whereby properties like
    solr.solr.datadir would be persisted to solr.xml. Also, cores that didn't
    happen to be loaded were not persisted. (Erick Erickson)

* SOLR-4717/SOLR-1351: SimpleFacets now work with localParams allowing
  faceting on the same field multiple ways (ryan, Uri Boness)

* SOLR-4671: CSVResponseWriter now supports pseudo fields. (ryan, nihed mbarek)

* SOLR-4358: HttpSolrServer sends the stream name and exposes
  'useMultiPartPost' (Karl Wright via ryan)

Bug Fixes
----------------------

* SOLR-4543: setting shardHandlerFactory in solr.xml/solr.properties does not
  work. (Ryan Ernst, Robert Muir via Erick Erickson)

* SOLR-4634: Fix scripting engine tests to work with Java 8's "Nashorn"
  Javascript implementation. (Uwe Schindler)

* SOLR-4636: If opening a reader fails for some reason when opening a
  SolrIndexSearcher, a Directory can be left unreleased. (Mark Miller)

* SOLR-4405: Admin UI - admin-extra files are not rendered into the core-menu
  (steffkes)

* SOLR-3956: Fixed group.facet=true to work with negative facet.limit
  (Chris van der Merwe, hossman)

* SOLR-4650: copyField doesn't work with source globs that don't match any
  explicit or dynamic fields. This regression was introduced in Solr 4.2.
  (Daniel Collins, Steve Rowe)

* SOLR-4641: Schema now throws exception on illegal field parameters.
  (Robert Muir)

* SOLR-3758: Fixed SpellCheckComponent to work consistently with distributed
  grouping (James Dyer)

* SOLR-4652: Fix broken behavior with shared libraries in resource loader for
  solr.xml plugins. (Ryan Ernst, Robert Muir, Uwe Schindler)

* SOLR-4664: ZkStateReader should update aliases on construction.
  (Mark Miller, Elodie Sannier)

* SOLR-4682: CoreAdminRequest.mergeIndexes can not merge multiple cores or
  indexDirs. (Jason.D.Cao via shalin)

* SOLR-4581: When faceting on numeric fields in Solr 4.2, negative values
  (constraints) were sorted incorrectly. (Alexander Buhr, shalin, yonik)

* SOLR-4699: The System admin handler should not assume a file system based
  data directory location. (Mark Miller)

* SOLR-4695: Fix core admin SPLIT action to be useful with non-cloud setups
  (shalin)

* SOLR-4680: Correct example spellcheck configuration's queryAnalyzerFieldType
  and use "text" field instead of narrower "name" field
  (ehatcher, Mark Bennett)

* SOLR-4702: Fix example /browse "Did you mean?" suggestion feature.
  (ehatcher, Mark Bennett)

* SOLR-4710: You cannot delete a collection fully from ZooKeeper unless all
  nodes are up and functioning correctly. (Mark Miller)

* SOLR-4487: SolrExceptions thrown by HttpSolrServer will now contain the
  proper HTTP status code returned by the remote server, even if that status
  code is not something Solr itself returned -- eg: from the Servlet
  Container, or an intermediate HTTP Proxy (hossman)

* SOLR-4661: Admin UI Replication details now correctly displays the current
  replicable generation/version of the master. (hossman)

* SOLR-4716,SOLR-4584: SolrCloud request proxying does not work on Tomcat and
  perhaps other non Jetty containers.
  (Po Rui, Yago Riveiro via Mark Miller)

* SOLR-4746: Distributed grouping used a NamedList instead of a
  SimpleOrderedMap for the top level group commands, causing output formatting
  differences compared to non-distributed grouping. (yonik)

* SOLR-4705: Fixed bug causing NPE when querying a single replica in SolrCloud
  using the shards param (Raintung Li, hossman)

* SOLR-4729: LukeRequestHandler: Using a dynamic copyField source that is not
  also a dynamic field triggers error message 'undefined field: "(glob)"'.
  (Adam Hahn, hossman, Steve Rowe)

Optimizations
----------------------

Other Changes
----------------------

* SOLR-4653: Solr configuration should log inaccessible/ non-existent relative
  paths in lib dir=... (Dawid Weiss)

* SOLR-4317: SolrTestCaseJ4: Can't avoid "collection1" convention
  (Tricia Jenkins, via Erick Erickson)

* SOLR-4571: SolrZkClient#setData should return Stat object. (Mark Miller)

* SOLR-4603: CachingDirectoryFactory should use an IdentityHashMap for
  byDirectoryCache. (Mark Miller)

* SOLR-4544: Refactor HttpShardHandlerFactory so load-balancing logic can be
  customized. (Ryan Ernst via Robert Muir)

* SOLR-4607: Use noggit 0.5 release jar rather than a forked copy.
  (Yonik Seeley, Robert Muir)

* SOLR-3706: Ship setup to log with log4j. (ryan, Mark Miller)

* SOLR-4651: Remove dist-excl-slf4j build target. (Shawn Heisey)

* SOLR-4622: The hardcoded SolrCloud defaults for 'hostContext="solr"' and
  'hostPort="8983"' have been deprecated and will be removed in Solr 5.0.
  Existing solr.xml files that do not have these options explicitly specified
  should be updated accordingly. (hossman)

* SOLR-4672: Requests attempting to use SolrCores which had init failures
  (that would be reported by CoreAdmin STATUS requests) now result in 500
  error responses with the details about the init failure, instead of 404
  error responses. (hossman)

* SOLR-4730: Make the wiki link more prominent in the release documentation.
  (Uri Laserson via Robert Muir)

================== 4.2.1 ==================

Versions of Major Components
---------------------
Apache Tika 1.3
Carrot2 3.6.2
Velocity 1.7 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.4.5

Detailed Change List
----------------------

Bug Fixes
----------------------

* SOLR-4567: copyField source glob matching explicit field(s) stopped working
  in Solr 4.2. (Alexandre Rafalovitch, Steve Rowe)

* SOLR-4475: Fix various places that still assume File based paths even when
  not using a file based DirectoryFactory. (Mark Miller)

* SOLR-4551: CachingDirectoryFactory needs to create CacheEntry's with the
  fullpath not path. (Mark Miller)

* SOLR-4555: When forceNew is used with CachingDirectoryFactory#get, the old
  CacheValue should give up its path as it will be used by a new Directory
  instance. (Mark Miller)

* SOLR-4578: CoreAdminHandler#handleCreateAction gets a SolrCore and does not
  close it in SolrCloud mode when a core with the same name already exists.
  (Mark Miller)

* SOLR-4574: The Collections API will silently return success on an unknown
  ACTION parameter. (Mark Miller)

* SOLR-4576: Collections API validation errors should cause an exception on
  clients and otherwise act as validation errors with the Core Admin API.
  (Mark Miller)

* SOLR-4577: The collections API should return responses (success or failure)
  for each node it attempts to work with. (Mark Miller)

* SOLR-4568: The lastPublished state check before becoming a leader is not
  working correctly. (Mark Miller)

* SOLR-4570: Even if an explicit shard id is used, ZkController#preRegister
  should still wait to see the shard id in its current ClusterState.
  (Mark Miller)

* SOLR-4585: The Collections API validates numShards with < 0 but should use
  <= 0. (Mark Miller)

* SOLR-4592: DefaultSolrCoreState#doRecovery needs to check the CoreContainer
  shutdown flag inside the recoveryLock sync block.
  (Mark Miller)

* SOLR-4595: CachingDirectoryFactory#close can throw a concurrent
  modification exception. (Mark Miller)

* SOLR-4573: Accessing Admin UI files in SolrCloud mode logs warnings.
  (Mark Miller, Phil John)

* SOLR-4594: StandardDirectoryFactory#remove accesses byDirectoryCache
  without a lock. (Mark Miller)

* SOLR-4597: CachingDirectoryFactory#remove should not attempt to empty/remove
  the index right away but flag for removal after close. (Mark Miller)

* SOLR-4598: The Core Admin unload command's option 'deleteDataDir', should
  use the DirectoryFactory API to remove the data dir. (Mark Miller)

* SOLR-4599: CachingDirectoryFactory calls close(Directory) on forceNew if the
  Directory has a refCnt of 0, but it should call closeDirectory(CacheValue).
  (Mark Miller)

* SOLR-4602: ZkController#unregister should cancel its election participation
  before asking the Overseer to delete the SolrCore information. (Mark Miller)

* SOLR-4601: A Collection that is only partially created and then deleted will
  leave pre allocated shard information in ZooKeeper. (Mark Miller)

* SOLR-4604: UpdateLog#init is over called on SolrCore#reload. (Mark Miller)

* SOLR-4605: Rollback does not work correctly. (Mark S, Mark Miller)

* SOLR-4609: The Collections API should only send the reload command to ACTIVE
  cores. (Mark Miller)

* SOLR-4297: Atomic update request containing null=true sets all subsequent
  fields to null (Ben Pennell, Rob, shalin)

* SOLR-4371: Admin UI - Analysis Screen shows empty result (steffkes)

* SOLR-4318: NPE encountered with querying with wildcards on a field that uses
  the DefaultAnalyzer (i.e. no analysis chain defined). (Erick Erickson)

* SOLR-4361: DataImportHandler would throw UnsupportedOperationException if
  handler-level parameters were specified containing periods in the name
  (James Dyer)

* SOLR-4538: Date Math expressions were being truncated to 32 characters when
  used in field:value queries in the lucene QParser.
  (hossman, yonik)

* SOLR-4617: SolrCore#reload needs to pass the deletion policy to the next
  SolrCore through its constructor rather than setting a field after.
  (Mark Miller)

* SOLR-4589: Fixed CPU spikes and poor performance in lazy field loading of
  multivalued fields. (hossman)

* SOLR-4608: Update Log replay and PeerSync replay should use the default
  processor chain to update the index. (Ludovic Boutros, yonik)

* SOLR-4625: The solr (lucene syntax) query parser lost top-level boost
  values and top-level phrase slops on queries produced by nested sub-parsers.
  (yonik)

* SOLR-4624: CachingDirectoryFactory does not need to support forceNew any
  longer and it appears to be causing a missing close directory bug. forceNew
  is no longer respected and will be removed in 4.3. (Mark Miller)

* SOLR-3819: Grouped faceting (group.facet=true) did not respect filter
  exclusions. (Petter Remen, yonik)

* SOLR-4637: Replication can sometimes wait until shutdown or core unload
  until removing some tmp directories. (Mark Miller)

* SOLR-4638: DefaultSolrCoreState#getIndexWriter(null) is a way to avoid
  creating the IndexWriter earlier than necessary, but it's not implemented
  quite right. (Mark Miller)

* SOLR-4640: CachingDirectoryFactory can fail to close directories in some
  race conditions. (Mark Miller)

* SOLR-4642: QueryResultKey is not calculating the correct hashCode for
  filters. (Joel Bernstein via Mark Miller)

Optimizations
----------------------

* SOLR-4569: waitForReplicasToComeUp should bail right away if it doesn't see
  the expected slice in the clusterstate rather than waiting. (Mark Miller)

* SOLR-4311: Admin UI - Optimize Caching Behaviour (steffkes)

Other Changes
----------------------

* SOLR-4537: Clean up schema information REST API. (Steve Rowe)

* SOLR-4596: DistributedQueue should ensure its full path exists in the
  constructor.
  (Mark Miller)

================== 4.2.0 ==================

Versions of Major Components
---------------------
Apache Tika 1.3
Carrot2 3.6.2
Velocity 1.7 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.4.5

Upgrading from Solr 4.1.0
----------------------

(No upgrade instructions yet)

Detailed Change List
----------------------

New Features
----------------------

* SOLR-4043: Add ability to get success/failure responses from Collections
  API. (Raintung Li, Mark Miller)

* SOLR-2827: RegexpBoost Update Processor (janhoy)

* SOLR-4370: Allow configuring commitWithin to do hard commits.
  (Mark Miller, Senthuran Sivananthan)

* SOLR-4451: SolrJ, and SolrCloud internals, now use SystemDefaultHttpClient
  under the covers -- allowing many HTTP connection related properties to be
  controlled via 'standard' java system properties. (hossman)

* SOLR-3855, SOLR-4490: Doc values support. (Adrien Grand, Robert Muir)

* SOLR-4417: Reopen the IndexWriter on SolrCore reload. (Mark Miller)

* SOLR-4477: Add support for queries (match-only) against docvalues fields.
  (Robert Muir)

* SOLR-4488: Return slave replication details for a master if the master has
  also acted like a slave. (Mark Miller)

* SOLR-4498: Add list command to ZkCLI that prints out the contents of
  ZooKeeper. (Roman Shaposhnik via Mark Miller)

* SOLR-4481: SwitchQParserPlugin registered by default as 'switch' using
  syntax: {!switch case=XXX case.foo=YYY case.bar=ZZZ default=QQQ}foo
  (hossman)

* SOLR-4078: Allow custom naming of SolrCloud nodes so that a new host:port
  combination can take over for a previous shard. (Mark Miller)

* SOLR-4210: Requests to a Collection that does not exist on the receiving
  node should be proxied to a suitable node. (Mark Miller, Po Rui, yonik)

* SOLR-1365: New SweetSpotSimilarityFactory allows customizable TF/IDF based
  Similarity when you know the optimal "Sweet Spot" of values for the field
  length and TF scoring factors.
  (hossman)

* SOLR-4138: CurrencyField fields can now be used in ValueSources to get the
  "raw" value (using the default number of fractional digits) in the default
  currency of the field type. There is also a new currency(field,[CODE])
  function for generating a ValueSource of the "natural" value, converted to
  an optionally specified currency to override the default for the field
  type. (hossman)

* SOLR-4503: Add REST API methods, via Restlet integration, for reading schema
  elements, at /schema/fields/, /schema/dynamicfields/, /schema/fieldtypes/,
  and /schema/copyfields/. (Steve Rowe)

Bug Fixes
----------------------

* SOLR-2850: Do not refine facets when minCount == 1
  (Matt Smith, lundgren via Adrien Grand)

* SOLR-4309: /browse: Improve JQuery autosuggest behavior (janhoy)

* SOLR-4330: group.sort is ignored when using group.truncate and ex/tag local
  params together (koji)

* SOLR-4321: Collections API will sometimes use a node more than once, even
  when more unused nodes are available.
  (Eric Falcao, Brett Hoerner, Mark Miller)

* SOLR-4345: Solr Admin UI doesn't work in IE 10 (steffkes)

* SOLR-4349: Admin UI - Query Interface does not work in IE (steffkes)

* SOLR-4359: The RecentUpdates#update method should treat a problem reading
  the next record the same as a problem parsing the record - log the exception
  and break. (Mark Miller)

* SOLR-4225: Term info page under schema browser shows incorrect count of
  terms (steffkes)

* SOLR-3926: Solr should support better way of finding active sorts
  (Eirik Lygre via Erick Erickson)

* SOLR-4342: Fix DataImportHandler stats to be a proper Map (hossman)

* SOLR-3967: langid.enforceSchema option checks source field instead of target
  field (janhoy)

* SOLR-4380: Replicate after startup option would not replicate until the
  IndexWriter was lazily opened. (Mark Miller, Gregg Donovan)

* SOLR-4400: Deadlock can occur in a rare race between committing and closing
  a SolrIndexWriter.
(Erick Erickson, Mark Miller) * SOLR-3655: A restarted node can briefly appear live and active before it really is in some cases. (Mark Miller) * SOLR-4426: NRTCachingDirectoryFactory does not initialize maxCachedMB and maxMergeSizeMB if <directoryFactory> is not present in solrconfig.xml (Jack Krupansky via shalin) * SOLR-4463: Fix SolrCoreState reference counting. (Mark Miller) * SOLR-4459: The Replication 'index move' rather than copy optimization doesn't kick in when using NRTCachingDirectory or the rate limiting feature. (Mark Miller) * SOLR-4421,SOLR-4165: On CoreContainer shutdown, all SolrCores should publish their state as DOWN. (Mark Miller, Markus Jelsma) * SOLR-4467: Ephemeral directory implementations may not recover correctly because the code to clear the tlog files on startup is off. (Mark Miller) * SOLR-4413: Fix SolrCore#getIndexDir() to return the current index directory. (Gregg Donovan, Mark Miller) * SOLR-4469: A new IndexWriter must be opened on SolrCore reload when the index directory has changed and the previous SolrCore's state should not be propagated. (Mark Miller, Gregg Donovan) * SOLR-4471: Replication occurs even when a slave is already up to date. (Mark Miller, Andre Charton) * SOLR-4484: ReplicationHandler#loadReplicationProperties still uses Files rather than the Directory to try and read the replication properties files. (Mark Miller) * SOLR-4352: /browse pagination now supports and preserves sort context (Eric Spiegelberg, Erik Hatcher) * LUCENE-4796, SOLR-4373: Fix concurrency issue in NamedSPILoader and AnalysisSPILoader when doing concurrent core loads in multicore Solr configs. (Uwe Schindler, Hossman) * SOLR-4504: Fixed CurrencyField range queries to correctly exclude documents w/o values (hossman) * SOLR-4480: A trailing + or - caused the edismax parser to throw an exception. (Fiona Tay, Jan Høydahl, yonik) * SOLR-4507: The Cloud tab does not show up in the Admin UI if you set zkHost in solr.xml. 
(Alfonso Presa, Mark Miller) * SOLR-4505: Possible deadlock around SolrCoreState update lock. (Erick Erickson, Mark Miller) * SOLR-4511: When a new index is replicated into place, we need to update the most recent replicatable index point without doing a commit. This is important for repeater use cases, as well as when nodes may switch master/slave roles. (Mark Miller, Raúl Grande) * SOLR-4515: CurrencyField's OpenExchangeRatesOrgProvider now requires a ratesFileLocation init param, since the previous global default no longer works (hossman) * SOLR-4518: Improved CurrencyField error messages when attempting to use a Currency that is not supported by the current JVM. (hossman) * SOLR-3798: Fix copyField implementation in IndexSchema to handle dynamic field references that aren't string-equal to the name of the referenced dynamic field. (Steve Rowe) * SOLR-4497: Collection Aliasing. (Mark Miller) Optimizations ---------------------- * SOLR-4339: Admin UI - Display Field-Flags on Schema-Browser (steffkes) * SOLR-4340: Admin UI - Analysis's Button Spinner goes wild (steffkes) * SOLR-4341: Admin UI - Plugins/Stats Page contains loooong Values which result in horizontal Scrollbar (steffkes) * SOLR-3915: Color Legend for Cloud UI (steffkes) * SOLR-4306: Utilize indexInfo=false when gathering core names in UI (steffkes) * SOLR-4284: Admin UI - make core list scrollable separate from the rest of the UI (steffkes) * SOLR-4364: Admin UI - Locale based number formatting (steffkes) * SOLR-4521: Stop using the 'force' option for recovery replication. This will keep some less common unnecessary replications from happening. (Mark Miller, Simon Scofield) * SOLR-4529: Improve Admin UI Dashboard legibility (Felix Buenemann via steffkes) * SOLR-4526: Admin UI depends on optional system info (Felix Buenemann via steffkes) Other Changes ---------------------- * SOLR-4259: Carrot2 dependency should be declared on the mini version, not the core. (Dawid Weiss). 
* SOLR-4348: Make the lock type configurable by system property by default. (Mark Miller) * SOLR-4353: Renamed example jetty context file to reduce confusion (hossman) * SOLR-4384: Make post.jar report timing information (Upayavira via janhoy) * SOLR-4415: Add 'state' to shards (default to 'active') and read/write them to ZooKeeper (Anshum Gupta via shalin) * SOLR-4394: Tests and example configs demonstrating SSL with both server and client certs (hossman) * SOLR-3060: SurroundQParserPlugin highlighting tests (Ahmet Arslan via hossman) * SOLR-2470: Added more tests for VelocityResponseWriter * SOLR-4471: Improve and clean up TestReplicationHandler. (Amit Nithian via Mark Miller) * SOLR-3843: Include lucene codecs jar and enable per-field postings and docvalues support in the schema.xml (Robert Muir, Steve Rowe) * SOLR-4511: Add new test for 'repeater' replication node. (Mark Miller) * SOLR-4458: Sort directions (asc, desc) are now case insensitive (Shawn Heisey via hossman) * SOLR-2996: A bare * without a field specification is treated as *:* by the lucene and edismax query parsers. (hossman, Jan Høydahl, Alan Woodward, yonik) * SOLR-4416: Upgrade to Tika 1.3. (Markus Jelsma via Mark Miller) * SOLR-4200: Reduce INFO level logging from CachingDirectoryFactory (Shawn Heisey via hossman) ================== 4.1.0 ================== Versions of Major Components --------------------- Apache Tika 1.2 Carrot2 3.6.2 Velocity 1.7 and Velocity Tools 2.0 Apache UIMA 2.3.1 Apache ZooKeeper 3.4.5 Upgrading from Solr 4.0.0 ---------------------- Custom java parsing plugins need to migrate from throwing the internal ParseException to throwing SyntaxError. BaseDistributedSearchTestCase now randomizes the servlet context it uses when creating Jetty instances.
Subclasses that assume a hard coded context of "/solr" should either be fixed to use the "String context" variable, or should take advantage of the new BaseDistributedSearchTestCase(String) constructor to explicitly specify a fixed servlet context path. See SOLR-4136 for details. Detailed Change List ---------------------- New Features ---------------------- * SOLR-2255: Enhanced pivot faceting to use local-params in the same way that regular field value faceting can. This means support for excluding a filter query, using a different output key, and specifying 'threads' to do facet.method=fcs concurrently. PivotFacetHelper now extends SimpleFacet and the getFacetImplementation() extension hook was removed. (dsmiley) * SOLR-3897: A highlighter parameter "hl.preserveMulti" to return all of the values of a multiValued field in their original order when highlighting. (Joel Bernstein via yonik) * SOLR-3929: Support configuring IndexWriter max thread count in solrconfig. (phunt via Mark Miller) * SOLR-3906: Add support for AnalyzingSuggester (LUCENE-3842), where the underlying analyzed form used for suggestions is separate from the returned text. (Robert Muir) * SOLR-3985: ExternalFileField caches can be reloaded on firstSearcher/ newSearcher events using the ExternalFileFieldReloader (Alan Woodward) * SOLR-3911: Make Directory and DirectoryFactory first class so that the majority of Solr's features work with any custom implementations. (Mark Miller) Additional Work: - SOLR-4032: Files larger than an internal buffer size fail to replicate. (Mark Miller, Markus Jelsma) - SOLR-4033: Consistently use the solrconfig.xml lockType everywhere. (Mark Miller, Markus Jelsma) - SOLR-4144: Replication using too much RAM. (yonik, Markus Jelsma) - SOLR-4187: NPE on Directory release (Mark Miller, Markus Jelsma) * SOLR-4051: Add <propertyWriter /> element to DIH's data-config.xml file, allowing the user to specify the location, filename and Locale for the "data-config.properties" file. 
Alternatively, users can specify their own property writer implementation for greater control. This new configuration element is optional, and defaults mimic prior behavior. The one exception is that the "root" locale is the default. Previously it was the machine's default locale. (James Dyer) * SOLR-4084: Add FuzzyLookupFactory, which is like AnalyzingSuggester except that it can tolerate typos in the input. (Areek Zillur via Robert Muir) * SOLR-4088: New and improved auto host detection strategy for SolrCloud. (Raintung Li via Mark Miller) * SOLR-3970: SystemInfoHandler now exposes more details about the JRE/VM/Java version in use. (hossman) * SOLR-4101: Add support for storing term offsets in the index via a 'storeOffsetsWithPositions' flag on field definitions in the schema. (Tom Winch, Alan Woodward) * SOLR-4093: Solr QParsers may now be directly invoked in the lucene query syntax without the _query_ magic field hack. Example: foo AND {!term f=myfield v=$qq} (yonik) * SOLR-4087: Add MAX_DOC_FREQ option to MoreLikeThis. (Andrew Janowczyk via Mark Miller) * SOLR-4114: Allow creating more than one shard per instance with the Collection API. (Per Steffensen, Mark Miller) * SOLR-3531: Allow configuring maxMergeSizeMB and maxCachedMB when using NRTCachingDirectoryFactory. (Andy Laird via Mark Miller) * SOLR-4118: Fix replicationFactor to align with industry usage. replicationFactor now means the total number of copies of a document stored in the collection (or the total number of physical indexes for a single logical slice of the collection). For example if replicationFactor=3 then for a given shard there will be a total of 3 replicas (one of which will normally be designated as the leader.) (yonik) * SOLR-4124: You should be able to set the update log directory with the CoreAdmin API the same way as the data directory. (Mark Miller) * SOLR-4028: When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist.
(Tomás Fernández Löbbe via Mark Miller) * SOLR-3948: Calculate/display deleted documents in admin interface. (Shawn Heisey via Mark Miller) * SOLR-4030: Allow rate limiting Directory IO based on the IO context. (Mark Miller, Radim Kolar) * SOLR-4166: LBHttpSolrServer ignores ResponseParser passed in constructor. (Steve Molloy via Mark Miller) * SOLR-4140: Allow access to the collections API through CloudSolrServer without referencing an existing collection. (Per Steffensen via Mark Miller) * SOLR-788: Distributed search support for MLT. (Matthew Woytowitz, Mike Anderson, Jamie Johnson, Mark Miller) * SOLR-4120: Collection API: Support for specifying a list of Solr addresses to spread a new collection across. (Per Steffensen via Mark Miller) * SOLR-4110: Configurable Content-Type headers for PHPResponseWriter and PHPSerializedResponseWriter. (Dominik Siebel via Mark Miller) * SOLR-1028: The ability to specify "transient" and "loadOnStartup" as new properties of <core> tags in solr.xml. Can specify "transientCacheSize" in the <cores> tag. Together these allow cores to be loaded only when needed and only transientCacheSize transient cores will be loaded at a time, the rest aged out on an LRU basis. * SOLR-4246: When update.distrib is set to skip update processors before the distributed update processor, always include the log update processor so forwarded updates will still be logged. (yonik) * SOLR-4230: The new Solr 4 spatial fields now work with the {!geofilt} and {!bbox} query parsers. The score local-param works too. (David Smiley) * SOLR-1972: Add extra statistics to RequestHandlers - 5 & 15-minute reqs/sec rolling averages; median, 75th, 95th, 99th, 99.9th percentile request times (Alan Woodward, Shawn Heisey, Adrien Grand, Uwe Schindler) * SOLR-4271: Add support for PostingsHighlighter. (Robert Muir) * SOLR-4255: The new Solr 4 spatial fields now have a 'filter' boolean local-param that can be set to false to not filter.
It's useful when there is already a spatial filter query but you also need to sort or boost by distance. (David Smiley) * SOLR-4265, SOLR-4283: Solr now parses request parameters (in URL or sent with POST using content-type application/x-www-form-urlencoded) in its dispatcher code. It no longer relies on special configuration settings in Tomcat or other web containers to enable UTF-8 encoding, which is mandatory for correct Solr behaviour. Query strings passed in via the URL need to be properly-%-escaped, UTF-8 encoded bytes, otherwise Solr refuses to handle the request. The maximum length of x-www-form-urlencoded POST parameters can now be configured through the requestDispatcher/requestParsers/@formdataUploadLimitInKB setting in solrconfig.xml (defaults to 2 MiB). Solr now works out of the box with e.g. Tomcat, JBoss,... (Uwe Schindler, Dawid Weiss, Alex Rocher) * SOLR-2201: DIH's "formatDate" function now supports a timezone as an optional fourth parameter (James Dyer, Mark Waddle) * SOLR-4302: New parameter 'indexInfo' (defaults to true) in CoreAdmin STATUS command can be used to omit index specific information (Shahar Davidson via shalin) * SOLR-2592: Collection specific document routing. The "compositeId" router is the default for collections with hash based routing (i.e. when numShards=N is specified on collection creation). Documents with ids sharing the same domain (prefix) will be routed to the same shard, allowing for efficient querying. Example: The following two documents will be indexed to the same shard since they share the same domain "customerB!". <code> {"id" : "customerB!doc1" [...] } {"id" : "customerB!doc2" [...] } </code> At query time, one can specify a "shard.keys" parameter that lists what shards the query should cover. http://.../query?q=my_query&shard.keys=customerB! Collections that do not specify numShards at collection creation time use custom sharding and default to the "implicit" router.
Document updates received by a shard will be indexed to that shard, unless a "_shard_" parameter or document field names a different shard. (Michael Garski, Dan Rosher, yonik) Optimizations ---------------------- * SOLR-3788: Admin Cores UI should redirect to newly created core details (steffkes) * SOLR-3895: XML and XSLT UpdateRequestHandler should not try to resolve external entities. This improves speed of loading e.g. XSL-transformed XHTML documents. (Martin Herfurt, uschindler, hossman) * SOLR-3614: Fix XML parsing in XPathEntityProcessor to correctly expand named entities, but ignore external entities. (uschindler, hossman) * SOLR-3734: Improve Schema-Browser Handling for CopyField using dynamicField's (steffkes) * SOLR-3941: The "commitOnLeader" part of distributed recovery can use openSearcher=false. (Tomás Fernández Löbbe via Mark Miller) * SOLR-4063: Allow CoreContainer to load multiple SolrCores in parallel rather than just serially. (Mark Miller) * SOLR-4199: When doing zk retries due to connection loss, rather than just retrying for 2 minutes, retry in proportion to the session timeout. (Mark Miller) * SOLR-4262: Replication Icon on Dashboard does not reflect Master-/Slave- State (steffkes) * SOLR-4264: Missing Error-Screen on UI's Cloud-Page (steffkes) * SOLR-4261: Percentage Infos on Dashboard have a fixed width (steffkes) * SOLR-3851: create a new core/delete an existing core should also update the main/left list of cores on the admin UI (steffkes) * SOLR-3840: XML query response display is unreadable in Solr Admin Query UI (steffkes) * SOLR-3982: Admin UI: Various Dataimport Improvements (steffkes) * SOLR-4296: Admin UI: Improve Dataimport Auto-Refresh (steffkes) * SOLR-3458: Allow multiple Items to stay open on Plugins-Page (steffkes) Bug Fixes ---------------------- * SOLR-4288: Improve logging for FileDataSource (basePath, relative resources). 
(Dawid Weiss) * SOLR-4007: Morfologik dictionaries not available in Solr field type due to class loader lookup problems. (Lance Norskog, Dawid Weiss) * SOLR-3560: Handle different types of Exception Messages for Logging UI (steffkes) * SOLR-3637: Commit Status at Core-Admin UI is always false (steffkes) * SOLR-3917: Partial State on Schema-Browser UI is not defined for Dynamic Fields & Types (steffkes) * SOLR-3939: Consider a sync attempt from leader to replica that fails due to 404 a success. (Mark Miller, Joel Bernstein) * SOLR-3940: Rejoining the leader election incorrectly triggers the code path for a fresh cluster start rather than fail over. (Mark Miller) * SOLR-3961: Fixed error using LimitTokenCountFilterFactory (Jack Krupansky, hossman) * SOLR-3933: Distributed commits are not guaranteed to be ordered within a request. (Mark Miller) * SOLR-3939: An empty or just replicated index cannot become the leader of a shard after a leader goes down. (Joel Bernstein, yonik, Mark Miller) * SOLR-3971: A collection that is created with numShards=1 turns into a numShards=2 collection after starting up a second core and not specifying numShards. (Mark Miller) * SOLR-3988: Fixed SolrTestCaseJ4.adoc(SolrInputDocument) to respect field and document boosts (hossman) * SOLR-3981: Fixed bug that resulted in document boosts being compounded in <copyField/> destination fields. (hossman) * SOLR-3920: Fix server list caching in CloudSolrServer when using more than one collection list with the same instance. (Grzegorz Sobczyk, Mark Miller) * SOLR-3938: prepareCommit command omits commitData causing a failure to trigger replication to slaves. (yonik) * SOLR-3992: QuerySenderListener doesn't populate document cache. (Shotaro Kamio, yonik) * SOLR-3995: Recovery may never finish on SolrCore shutdown if the last reference to a SolrCore is closed by the recovery process. (Mark Miller) * SOLR-3998: Atomic update on uniqueKey field itself causes duplicate document. 
(Eric Spencer, yonik) * SOLR-4001: In CachingDirectoryFactory#close, if there are still refs for a Directory outstanding, we need to wait for them to be released before closing. (Mark Miller) * SOLR-4005: If CoreContainer fails to register a created core, it should close it. (Mark Miller) * SOLR-4009: OverseerCollectionProcessor is not resilient to many error conditions and can stop running on errors. (Raintung Li, milesli, Mark Miller) * SOLR-4019: Log stack traces for 503/Service Unavailable SolrException if not thrown by PingRequestHandler. Do not log exceptions if a user tries to view a hidden file using ShowFileRequestHandler. (Tomás Fernández Löbbe via James Dyer) * SOLR-3589: Edismax parser does not honor mm parameter if analyzer splits a token. (Tom Burton-West, Robert Muir) * SOLR-4031: Upgrade to Jetty 8.1.7 to fix a bug where on very rare occasions the content of two concurrent requests gets mixed up. (Per Steffensen, yonik) * SOLR-4060: ReplicationHandler can try and do a snappull and open a new IndexWriter after shutdown has already occurred, leaving an IndexWriter that is not closed. (Mark Miller) * SOLR-4055: Fix a thread safety issue with the Collections API that could cause actions to be targeted at the wrong SolrCores. (Raintung Li, Per Steffensen via Mark Miller) * SOLR-3993: If multiple SolrCores for a shard coexist on a node, on cluster restart, leader election would stall until timeout, waiting to see all of the replicas come up. (Mark Miller, Alexey Kudinov) * SOLR-2045: Databases that require a commit to be issued before closing the connection on a non-read-only database leak connections. Also expanded the SqlEntityProcessor test to sometimes use Derby as well as HSQLDB (Derby is one db affected by this bug). (Fenlor Sebastia, James Dyer) * SOLR-4064: When there is an unexpected exception while trying to run the new leader process, the SolrCore will not correctly rejoin the election.
(Po Rui via Mark Miller) * SOLR-3989: SolrZkClient constructor dropped exception cause when throwing a new RuntimeException. (Colin Bartolome, yonik) * SOLR-4036: field aliases in fl should not cause properties of target field to be used. (Martin Koch, yonik) * SOLR-4003: The SolrZKClient clean method should not try and clear zk paths that start with /zookeeper, as this can fail and stop the removal of further nodes. (Mark Miller) * SOLR-4076: SolrQueryParser should run fuzzy terms through MultiTermAwareComponents to ensure that (for example) a fuzzy query of foobar~2 is equivalent to FooBar~2 on a field that includes lowercasing. (yonik) * SOLR-4081: QueryParsing.toString, used during debugQuery=true, did not correctly handle ExtendedQueries such as WrappedQuery (used when cache=false), spatial queries, and frange queries. (Eirik Lygre, yonik) * SOLR-3959: Ensure the internal comma separator of poly fields is escaped for CSVResponseWriter. (Areek Zillur via Robert Muir) * SOLR-4075: A logical shard that has had all of its SolrCores unloaded should be removed from the cluster state. (Mark Miller, Gilles Comeau) * SOLR-4034: Check if a collection already exists before trying to create a new one. (Po Rui, Mark Miller) * SOLR-4097: Race can cause NPE in logging line on first cluster state update. (Mark Miller) * SOLR-4099: Allow the collection api work queue to make forward progress even when its watcher is not fired for some reason. (Raintung Li via Mark Miller) * SOLR-3960: Fixed a bug where Distributed Grouping ignored PostFilters (Nathan Visagan, hossman) * SOLR-3842: DIH would not populate multivalued fields if the column name derives from a resolved variable (James Dyer) * SOLR-4117: Retrieving the size of the index may use the wrong index dir if you are replicating.
(Mark Miller, Markus Jelsma) * SOLR-2890: Fixed a bug that prevented omitNorms and omitTermFreqAndPositions options from being respected in some <fieldType/> declarations (hossman) * SOLR-4159: When we are starting a shard from rest, a potential leader should not consider its last published state when deciding if it can be the new leader. (Mark Miller) * SOLR-4158: When a core is registering in ZooKeeper it may not wait long enough to find the leader due to how long the potential leader waits to see replicas. (Mark Miller, Alain Rogister) * SOLR-4162: ZkCli usage examples are not correct because the zkhost parameter is not present and it is mandatory for all commands. (Tomás Fernández Löbbe via Mark Miller) * SOLR-4071: Validate that name is passed to Collections API create, and behave the same way as on startup when collection.configName is not explicitly passed. (Po Rui, Mark Miller) * SOLR-4127: Added explicit error message if users attempt Atomic document updates with either updateLog or DistribUpdateProcessor. (hossman) * SOLR-4136: Fix SolrCloud behavior when using "hostContext" containing "_" or "/" characters. This fix also makes SolrCloud more accepting of hostContext values with leading/trailing slashes. (hossman) * SOLR-4168: Ensure we are using the absolute latest index dir when getting list of files for replication. (Mark Miller) * SOLR-4171: CachingDirectoryFactory should not return any directories after it has been closed. (Mark Miller) * SOLR-4102: Fix UI javascript error if canonical hostname can not be resolved (steffkes via hossman) * SOLR-4178: ReplicationHandler should abort any current pulls and wait for its executor to stop during core close. (Mark Miller) * SOLR-3918: Fixed the 'dist-war-excl-slf4j' ant target to exclude all slf4j jars, so that the resulting war is usable as is provided the servlet container includes the correct slf4j api and impl jars.
(Shawn Heisey, hossman) * SOLR-4198: OverseerCollectionProcessor should implement ClosableThread. (Mark Miller) * SOLR-4213: Directories that are not shut down until DirectoryFactory#close do not have close listeners called on them. (Mark Miller) * SOLR-4134: Standard (XML) request writer cannot "set" multiple values into multivalued field with partial updates. (Luis Cappa Banda, Will Butler, shalin) * SOLR-3972: Fix ShowFileRequestHandler to not log a warning in the (expected) situation of a file not found. (hossman) * SOLR-4133: Cannot "set" field to null with partial updates when using the standard RequestWriter. (Will Butler, shalin) * SOLR-4223: "maxFormContentSize" in jetty.xml is not picked up by jetty 8 so set it via solr webapp context file. (shalin) * SOLR-4175: SearchComponent chain can't contain two components of the same class and use debugQuery. (Tomás Fernández Löbbe via ehatcher) * SOLR-4244: When coming back from session expiration we should not wait for the leader to see us in the down state if we are the node that must become the leader. (Mark Miller) * SOLR-4245: When a core is registering with ZooKeeper, the timeout to find the leader in the cluster state is 30 seconds rather than leaderVoteWait + extra time. (Mark Miller) * SOLR-4238: Fix jetty example requestLog config (jm via hossman) * SOLR-4251: Fix SynonymFilterFactory when an optional tokenizerFactory is supplied. (Chris Bleakley via rmuir) * SOLR-4253: Misleading resource loading warning from Carrot2 clustering component fixed (Stanisław Osiński) * SOLR-4257: PeerSync updates and Log Replay updates should not wait for a ZooKeeper connection in order to proceed. (yonik) * SOLR-4045: SOLR admin page returns HTTP 404 on core names containing a '.'
(dot) (steffkes) * SOLR-4176: analysis ui: javascript not properly handling URL decoding of input (steffkes) * SOLR-4079: Long core names break web gui appearance and functionality (steffkes) * SOLR-4263: Incorrect Link from Schema-Browser to Query Form for Top-Terms (steffkes) * SOLR-3829: Admin UI Logging events broken if schema.xml defines a catch-all dynamicField with type ignored (steffkes) * SOLR-4275: Fix TrieTokenizer to no longer throw StringIndexOutOfBoundsException in admin UI / AnalysisRequestHandler when you enter no number to tokenize. (Uwe Schindler) * SOLR-4279: Wrong exception message if _version_ field is multivalued (shalin) * SOLR-4170: The 'backup' ReplicationHandler command can sometimes use a stale index directory rather than the current one. (Mark Miller, Marcin Rzewucki) * SOLR-3876: Solr Admin UI is completely dysfunctional on IE 9 (steffkes) * SOLR-4112: Fixed DataImportHandler ZKAwarePropertiesWriter implementation so import works fine with SolrCloud clusters (Deniz Durmus, James Dyer, Erick Erickson, shalin) * SOLR-4291: Harden the Overseer work queue thread loop. (Mark Miller) * SOLR-3820: Solr Admin Query form is missing some edismax request parameters (steffkes) * SOLR-4217: post.jar no longer ignores -Dparams when -Durl is used. (Alexandre Rafalovitch, ehatcher) * SOLR-4303: On replication, if the generation of the master is lower than the slave we need to force a full copy of the index. (Mark Miller, Gregg Donovan) * SOLR-4266: HttpSolrServer does not release connection properly on exception when no response parser is used. (Steve Molloy via Mark Miller) * SOLR-2298: Updated JavaDoc for SolrDocument.addField and SolrInputDocument.addField to have more information on name and value parameters.
(Siva Natarajan) Other Changes ---------------------- * SOLR-4106: Javac/ ivy path warnings with morfologik fixed by upgrading to Morfologik 1.5.5 (Robert Muir, Dawid Weiss) * SOLR-3899: SolrCore should not log at warning level when the index directory changes - it's an info event. (Tobias Bergman, Mark Miller) * SOLR-3861: Refactor SolrCoreState so that it's managed by SolrCore. (Mark Miller, hossman) * SOLR-3966: Eliminate superfluous warning from LanguageIdentifierUpdateProcessor (Markus Jelsma via hossman) * SOLR-3932: SolrCmdDistributorTest either takes 3 seconds or 3 minutes. (yonik, Mark Miller) * SOLR-3856: New tests for SqlEntityProcessor/CachedSqlEntityProcessor (James Dyer) * SOLR-4067: ZkStateReader#getLeaderProps should not return props for a leader that it does not think is live. (Mark Miller) * SOLR-4086: DIH refactor of VariableResolver and Evaluator. VariableResolver and each built-in Evaluator are separate concrete classes. DateFormatEvaluator now defaults with the ROOT Locale. However, users may specify a different Locale using an optional new third parameter. (James Dyer) * SOLR-3602: Update ZooKeeper to 3.4.5 (Mark Miller) * SOLR-4095: DIH NumberFormatTransformer & DateFormatTransformer default to the ROOT Locale if none is specified. These previously used the machine's default. (James Dyer) * SOLR-4096: DIH FileDataSource & FieldReaderDataSource default to UTF-8 encoding if none is specified. These previously used the machine's default. (James Dyer) * SOLR-1916: DIH to not use Lucene-forbidden Java APIs (default encoding, locale, etc.) (James Dyer, Robert Muir) * SOLR-4111: SpellCheckCollatorTest#testContextSensitiveCollate to test against both DirectSolrSpellChecker & IndexBasedSpellChecker (Tomás Fernández Löbbe via James Dyer) * SOLR-2141: Better test coverage for Evaluators (James Dyer) * SOLR-4119: Update Guava to 13.0.1 (Mark Miller) * SOLR-4074: Raise default ramBufferSizeMB to 100 from 32. 
(yonik, Mark Miller) * SOLR-4062: The update log location in solrconfig.xml should default to ${solr.ulog.dir} rather than ${solr.data.dir:} (Mark Miller) * SOLR-4155: Upgrade Jetty to 8.1.8. (Robert Muir) * SOLR-2986: Add MoreLikeThis to warning about features that require uniqueKey. Also, change the warning to warn log level. (Shawn Heisey via Mark Miller) * SOLR-4163: README improvements (Shawn Heisey via hossman) * SOLR-4248: "ant eclipse" should declare .svn directories as derived. (Shawn Heisey via Mark Miller) * SOLR-3279: Upgrade Carrot2 to 3.6.2 (Stanisław Osiński) * SOLR-4254: Harden the 'leader requests replica to recover' code path. (Mark Miller, yonik) * SOLR-4226: Extract fl parsing code out of ReturnFields constructor. (Ryan Ernst via Robert Muir) * SOLR-4208: ExtendedDismaxQParserPlugin has been refactored to make subclassing easier. (Tomás Fernández Löbbe, hossman) * SOLR-3735: Relocate the example mime-to-extension mapping, and upgrade Velocity Engine to 1.7 (ehatcher) * SOLR-4287: Removed "apache-" prefix from Solr distribution and artifact filenames. (Ryan Ernst, Robert Muir, Steve Rowe) * SOLR-4016: Deduplication does not work with atomic/partial updates so disallow atomic update requests which change signature generating fields. (Joel Nothman, yonik, shalin) * SOLR-4308: Remove the problematic and now unnecessary log4j-over-slf4j. (Mark Miller) ================== 4.0.0 ================== Versions of Major Components --------------------- Apache Tika 1.2 Carrot2 3.5.0 Velocity 1.6.4 and Velocity Tools 2.0 Apache UIMA 2.3.1 Apache ZooKeeper 3.3.6 Upgrading from Solr 4.0.0-BETA ---------------------- In order to better support distributed search mode, the TermVectorComponent's response format has been changed so that if the schema defines a uniqueKeyField, then that field value is used as the "key" for each document in its response section, instead of the internal lucene doc id.
Users w/o a uniqueKeyField will continue to see the same response format. See SOLR-3229 for more details. If you are using SolrCloud's distributed update request capabilities and a non string type id field, you must re-index. Upgrading from Solr 4.0.0-ALPHA ---------------------- Solr is now much more strict about requiring that the uniqueKeyField feature (if used) must refer to a field which is not multiValued. If you upgrade from an earlier version of Solr and see an error that your uniqueKeyField "can not be configured to be multivalued" please add 'multiValued="false"' to the <field /> declaration for your uniqueKeyField. See SOLR-3682 for more details. In addition, please review the notes above about upgrading from 4.0.0-BETA Upgrading from Solr 3.6 ---------------------- * The Lucene index format has changed and as a result, once you upgrade, previous versions of Solr will no longer be able to read your indices. In a master/slave configuration, all searchers/slaves should be upgraded before the master. If the master were to be updated first, the older searchers would not be able to read the new index format. * Setting abortOnConfigurationError=false is no longer supported (since it has never worked properly). Solr will now warn you if you attempt to set this configuration option at all. (see SOLR-1846) * The default logic for the 'mm' param of the 'dismax' QParser has been changed. If no 'mm' param is specified (either in the query, or as a default in solrconfig.xml) then the effective value of the 'q.op' param (either in the query or as a default in solrconfig.xml or from the 'defaultOperator' option in schema.xml) is used to influence the behavior. If q.op is effectively "AND" then mm=100%. If q.op is effectively "OR" then mm=0%. Users who wish to force the legacy behavior should set a default value for the 'mm' param in their solrconfig.xml file. * The VelocityResponseWriter is no longer built into the core. 
Its JAR and dependencies now need to be added (via <lib> or solr/home lib inclusion), and it needs to be registered in solrconfig.xml like this: <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"/> * The update request parameter to choose Update Request Processor Chain is renamed from "update.processor" to "update.chain". The old parameter was deprecated but still working since Solr3.2, but is now removed entirely. * The <indexDefaults> and <mainIndex> sections of solrconfig.xml are discontinued and replaced with the <indexConfig> section. There are also better defaults. When migrating, if you don't know what your old settings mean, simply delete both <indexDefaults> and <mainIndex> sections. If you have customizations, put them in <indexConfig> section - with same syntax as before. * Two of the SolrServer subclasses in SolrJ were renamed/replaced. CommonsHttpSolrServer is now HttpSolrServer, and StreamingUpdateSolrServer is now ConcurrentUpdateSolrServer. * The PingRequestHandler no longer looks for a <healthcheck/> option in the (legacy) <admin> section of solrconfig.xml. Users who wish to take advantage of this feature should configure a "healthcheckFile" init param directly on the PingRequestHandler. As part of this change, relative file paths have been fixed to be resolved against the data dir. See the example solrconfig.xml and SOLR-1258 for more details. * Due to low level changes to support SolrCloud, the uniqueKey field can no longer be populated via <copyField/> or <field default=...> in the schema.xml. Users wishing to have Solr automatically generate a uniqueKey value when adding documents should instead use an instance of solr.UUIDUpdateProcessorFactory in their update processor chain. See SOLR-2796 for more details. 
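The new default for the dismax 'mm' param described in the upgrade notes above can be sketched as a small decision function. This is an illustration of the stated rule only, not Solr's actual implementation; the function name effective_mm and its arguments are hypothetical, and the fallback to "OR" when no operator is configured anywhere is an assumption.

```python
def effective_mm(mm=None, q_op=None, default_operator=None):
    """Illustrative only: mirrors the rule in the upgrade note, not Solr's source.

    mm               -- explicit 'mm' value (from the query or a solrconfig.xml default)
    q_op             -- 'q.op' value (from the query or a solrconfig.xml default)
    default_operator -- 'defaultOperator' option from schema.xml
    """
    if mm is not None:
        # An explicitly supplied mm is used as-is.
        return mm
    # Assumption: with nothing configured, the operator is treated as "OR".
    op = (q_op or default_operator or "OR").strip().upper()
    return "100%" if op == "AND" else "0%"
```

Under this sketch, calling effective_mm(q_op="AND") yields "100%", while effective_mm() yields "0%". Supplying any explicit mm value bypasses the q.op lookup, which corresponds to setting a default 'mm' in solrconfig.xml to keep the legacy behavior.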
In addition, please review the notes above about upgrading from 4.0.0-BETA, and 4.0.0-ALPHA Detailed Change List ---------------------- New Features ---------------------- * SOLR-3670: New CountFieldValuesUpdateProcessorFactory makes it easy to index the number of values in another field for later use at query time. (hossman) * SOLR-2768: new "mod(x,y)" function for computing the modulus of two value sources. (hossman) * SOLR-3238: Numerous small improvements to the Admin UI (steffkes) * SOLR-3597: seems like a lot of wasted whitespace at the top of the admin screens (steffkes) * SOLR-3304: Added Solr adapters for Lucene 4's new spatial module. With SpatialRecursivePrefixTreeFieldType ("location_rpt" in example schema), it is possible to index a variable number of points per document (and sort on them), index not just points but any Spatial4j supported shape such as polygons, and to query on these shapes too. Polygons require adding JTS to the classpath. (David Smiley) * SOLR-3825: Added optional capability to log what ids are in a response (Scott Stults via gsingers) * SOLR-3821: Added 'df' to the UI Query form (steffkes) * SOLR-3822: Added hover titles to the edismax params on the UI Query form (steffkes) Optimizations ---------------------- * SOLR-3715: improve concurrency of the transaction log by removing synchronization around log record serialization. (yonik) * SOLR-3807: Currently during recovery we pause for a number of seconds after waiting for the leader to see a recovering state so that any previous updates will have finished before our commit on the leader - we don't need this wait for peersync. (Mark Miller) * SOLR-3837: When a leader is elected and asks replicas to sync back to him and that fails, we should ask those nodes to recover asynchronously rather than synchronously. (Mark Miller) * SOLR-3709: Cache the url list created from the ClusterState in CloudSolrServer on each request.
(Mark Miller) Bug Fixes ---------------------- * SOLR-3685: Solr Cloud sometimes skipped peersync attempt and replicated instead due to tlog flags not being cleared when no updates were buffered during a previous replication. (Markus Jelsma, Mark Miller, yonik) * SOLR-3229: Fixed TermVectorComponent to work with distributed search (Hang Xie, hossman) * SOLR-3725: Fixed package-local-src-tgz target to not bring in unnecessary jars and binary contents. (Michael Dodsworth via rmuir) * SOLR-3649: Fixed bug in JavabinLoader that caused deleteById(List<String> ids) to not work in SolrJ (siren) * SOLR-3730: Rollback is not implemented quite right and can cause corner case fails in SolrCloud tests. (rmuir, Mark Miller) * SOLR-2981: Fixed StatsComponent to no longer return duplicated information when requesting multiple stats.facet fields. (Roman Kliewer via hossman) * SOLR-3743: Fixed issues with atomic updates and optimistic concurrency in conjunction with stored copyField targets by making real-time get never return copyField targets. (yonik) * SOLR-3746: Proper error reporting if updateLog is configured w/o necessary "_version_" field in schema.xml (hossman) * SOLR-3745: Proper error reporting if SolrCloud mode is used w/o necessary "_version_" field in schema.xml (hossman) * SOLR-3770: Overseer may lose updates to cluster state (siren) * SOLR-3721: Fix bug that could theoretically allow multiple recoveries to run briefly at the same time if the recovery thread join call was interrupted. (Per Steffensen, Mark Miller) * SOLR-3782: A leader going down while updates are coming in can cause shard inconsistency. (Mark Miller) * SOLR-3611: We do not show ZooKeeper data in the UI for a node that has children. (Mark Miller) * SOLR-3789: Fix bug in SnapPuller that caused "internal" compression to fail. (siren) * SOLR-3790: ConcurrentModificationException could be thrown when using hl.fl=*. Fixed in r1231606. 
(yonik, koji) * SOLR-3668: DataImport : Specifying Custom Parameters (steffkes) * SOLR-3793: UnInvertedField faceting cached big terms in the filter cache that ignored deletions, leading to duplicate documents in search later when a filter of the same term was specified. (Günter Hipler, hossman, yonik) * SOLR-3679: Core Admin UI gives no feedback if "Add Core" fails (steffkes, hossman) * SOLR-3795: Fixed LukeRequestHandler response to correctly return field name strings in copyDests and copySources arrays (hossman) * SOLR-3699: Fixed some Directory leaks when there were errors during SolrCore or SolrIndexWriter initialization. (hossman) * SOLR-3518: Include final 'hits' in log information when aggregating a distributed request (Markus Jelsma via hossman) * SOLR-3628: SolrInputField and SolrInputDocument are now consistently backed by Collections passed in to setValue/setField, and defensively copy values from Collections passed to addValue/addField (Tom Switzer via hossman) * SOLR-3595: CurrencyField now generates an appropriate error on schema init if it is configured as multiValued - this has never been properly supported, but previously failed silently in odd ways. (hossman) * SOLR-3823: Fix 'bq' parsing in edismax. Please note that this required reverting the negative boost support added by SOLR-3278 (hossman) * SOLR-3827: Fix shareSchema=true in solr.xml (Tomás Fernández Löbbe via hossman) * SOLR-3809: Fixed config file replication when subdirectories are used (Emmanuel Espina via hossman) * SOLR-3828: Fixed QueryElevationComponent so that using 'markExcludes' does not modify the result set or ranking of 'excluded' documents relative to not using elevation at all. (Alexey Serba via hossman) * SOLR-3569: Fixed debug output on distributed requests when there are no results found. 
(David Bowen via hossman) * SOLR-3811: Query Form using wrong values for dismax, edismax (steffkes) * SOLR-3779: DataImportHandler's LineEntityProcessor when used in conjunction with FileListEntityProcessor would only process the first file. (Ahmet Arslan via James Dyer) * SOLR-3791: CachedSqlEntityProcessor would throw a NullPointerException when a query returns a row with a NULL key. (Steffen Moelter via James Dyer) * SOLR-3833: When an election is started because a leader went down, the new leader candidate should decline if the last state they published was not active. (yonik, Mark Miller) * SOLR-3836: When doing peer sync, we should only count sync attempts that cannot reach the given host as success when the candidate leader is syncing with the replicas - not when replicas are syncing to the leader. (Mark Miller) * SOLR-3835: In our leader election algorithm, if on connection loss we found we did not create our election node, we should retry, not throw an exception. (Mark Miller) * SOLR-3834: A new leader on cluster startup should also run the leader sync process in case there was a bad cluster shutdown. (Mark Miller) * SOLR-3772: On cluster startup, we should wait until we see all registered replicas before running the leader process - or if they all do not come up, N amount of time. (Mark Miller) * SOLR-3756: If we are elected the leader of a shard, but we fail to publish this for any reason, we should clean up and re-trigger a leader election. (Mark Miller) * SOLR-3812: ConnectionLoss during recovery can cause lost updates, leading to shard inconsistency. (Mark Miller) * SOLR-3813: When a new leader syncs, we need to ask all shards to sync back, not just those that are active. (Mark Miller) * SOLR-3641: CoreContainer is not persisting roles core attribute. (hossman, Mark Miller) * SOLR-3527: SolrCmdDistributor drops some of the important commit attributes (maxOptimizeSegments, softCommit, expungeDeletes) when sending a commit to replicas.
(Andy Laird, Tomás Fernández Löbbe, Mark Miller) * SOLR-3844: SolrCore reload can fail because it tries to remove the index write lock while already holding it. (Mark Miller) * SOLR-3831: Atomic updates do not distribute correctly to other nodes. (Jim Musil, Mark Miller) * SOLR-3465: Replication causes two searcher warmups. (Michael Garski, Mark Miller) * SOLR-3645: /terms should default to distrib=false. (Nick Cotton, Mark Miller) * SOLR-3759: Various fixes to the example-DIH configs (Ahmet Arslan, hossman) * SOLR-3777: Dataimport-UI does not send unchecked checkboxes (Glenn MacStravic via steffkes) * SOLR-3850: DataImportHandler "cacheKey" parameter was incorrectly renamed "cachePk" (James Dyer) * SOLR-3087: Fixed DOMUtil so that code doing attribute validation will automatically ignore nodes in the reserved "xml" prefix - in particular this fixes some bugs related to xinclude and fieldTypes. (Amit Nithian, hossman) * SOLR-3783: Fixed Pivot Faceting to work with facet.missing=true (hossman) * SOLR-3869: A PeerSync attempt to its replicas by a candidate leader should not fail on o.a.http.conn.ConnectTimeoutException. (Mark Miller) * SOLR-3875: Fixed index boosts on multi-valued fields when docBoost is used (hossman) * SOLR-3878: Exception when using open-ended range query with CurrencyField (janhoy) * SOLR-3891: CacheValue in CachingDirectoryFactory cannot be used outside of solr.core package. (phunt via Mark Miller) * SOLR-3892: Inconsistent locking when accessing cache in CachingDirectoryFactory from RAMDirectoryFactory and MockDirectoryFactory. (phunt via Mark Miller) * SOLR-3883: Distributed indexing forwards non-applicable request params.
(Dan Sutton, Per Steffensen, yonik, Mark Miller) * SOLR-3903: Fixed MissingFormatArgumentException in ConcurrentUpdateSolrServer (hossman) * SOLR-3916: Fixed whitespace bug in parsing the fl param (hossman) Other Changes ---------------------- * SOLR-3690: Fixed binary release packages to include dependencies needed for the solr-test-framework (hossman) * SOLR-2857: The /update/json and /update/csv URLs were restored to aid in the migration of existing clients. (yonik) * SOLR-3691: SimplePostTool: Mode for crawling/posting web pages See http://wiki.apache.org/solr/ExtractingRequestHandler for examples (janhoy) * SOLR-3707: Upgrade Solr to Tika 1.2 (janhoy) * SOLR-2747: Updated changes2html.pl to handle Solr's CHANGES.txt; added target 'changes-to-html' to solr/build.xml. (Steve Rowe, Robert Muir) * SOLR-3752: When a leader goes down, have the Overseer clear the leader state in cluster.json (Mark Miller) * SOLR-3751: Add defensive checks for SolrCloud updates and requests that ensure the local state matches what we can tell the request expected. (Mark Miller) * SOLR-3773: Hash based on the external String id rather than the indexed representation for distributed updates. (Michael Garski, yonik, Mark Miller) * SOLR-3780: Maven build: Make solrj tests run separately from solr-core. (Steve Rowe) * SOLR-3772: Optionally, on cluster startup, we can wait until we see all registered replicas before running the leader process - or if they all do not come up, N amount of time. (Jan Høydahl, Per Steffensen, Mark Miller) * SOLR-3750: Optionally, on session expiration, we can explicitly wait some time before running the leader sync process so that we are sure every node participates. 
(Per Steffensen, Mark Miller) * SOLR-3824: Velocity: Error messages from search not displayed (janhoy) * SOLR-3826: Test framework improvements for specifying coreName on initCore (Amit Nithian, hossman) * SOLR-3749: Allow default UpdateLog syncLevel to be configured by solrconfig.xml (Raintung Li, Mark Miller) * SOLR-3845: Rename numReplicas to replicationFactor in Collections API. (yonik, Mark Miller) * SOLR-3815: SolrCloud - Add properties such as "range" to shards, which changes the clusterstate.json and puts the shard replicas under "replicas". (yonik) * SOLR-3871: SyncStrategy should use an executor for the threads it creates to request recoveries. (Mark Miller) * SOLR-3870: SyncStrategy should have a close so it can abort earlier on shutdown. (Mark Miller) ================== 4.0.0-BETA =================== Versions of Major Components --------------------- Apache Tika 1.1 Carrot2 3.5.0 Velocity 1.6.4 and Velocity Tools 2.0 Apache UIMA 2.3.1 Apache ZooKeeper 3.3.6 Upgrading from Solr 4.0.0-ALPHA ---------------------- Solr is now much more strict about requiring that the uniqueKeyField feature (if used) must refer to a field which is not multiValued. If you upgrade from an earlier version of Solr and see an error that your uniqueKeyField "can not be configured to be multivalued" please add 'multiValued="false"' to the <field /> declaration for your uniqueKeyField. See SOLR-3682 for more details. Detailed Change List ---------------------- New Features ---------------------- * LUCENE-4201: Added JapaneseIterationMarkCharFilterFactory to normalize Japanese iteration marks. (Robert Muir, Christian Moen) * SOLR-1856: In Solr Cell, literals should override Tika-parsed values. Patch adds a param "literalsOverride" which defaults to true, but can be set to "false" to let Tika-parsed values be appended to literal values (Chris Harris, janhoy) * SOLR-3488: Added a Collection management API for SolrCloud. 
(Tommaso Teofili, Sami Siren, yonik, Mark Miller) * SOLR-3559: Full deleteByQuery support with SolrCloud distributed indexing. All replicas of a shard will be consistent, even if updates arrive in a different order on different replicas. (yonik) * SOLR-1929: Index encrypted documents with ExtractingUpdateRequestHandler. By supplying resource.password=<mypw> or specifying an external file with regular expressions matching file names, Solr will decrypt and index PDFs and DOCX formats. (janhoy, Yiannis Pericleous) * SOLR-3562: Add options to remove instance dir or data dir on core unload. (Mark Miller, Per Steffensen) * SOLR-2702: The default directory factory was changed to NRTCachingDirectoryFactory which wraps the StandardDirectoryFactory and caches small files for improved Near Real-time (NRT) performance. (Mark Miller, yonik) * SOLR-2616: Include a sample java util logging configuration file. (David Smiley, Mark Miller) * SOLR-3460: Add cloud-scripts directory and a zkcli.sh|bat tool for easy scripting and interaction with ZooKeeper. (Mark Miller) * SOLR-1725: StatelessScriptUpdateProcessorFactory allows users to implement the full ScriptUpdateProcessor API using any scripting language with a javax.script.ScriptEngineFactory (Uri Boness, ehatcher, Simon Rosenthal, hossman) * SOLR-139: Change to updateable documents to create the document if it doesn't already exist. To assert that the document must exist, use the optimistic concurrency feature by specifying a _version_ of 1. (yonik) * LUCENE-2510, LUCENE-4044: Migrated Solr's Tokenizer-, TokenFilter-, and CharFilterFactories to the lucene-analysis module. To add new analysis modules to Solr (like ICU, SmartChinese, Morfologik,...), just drop in the JAR files from Lucene's binary distribution into your Solr instance's lib folder. The factories are automatically made available with SPI. 
(Chris Male, Robert Muir, Uwe Schindler) * SOLR-3634, SOLR-3635: CoreContainer and CoreAdminHandler will now remember and report back information about failures to initialize SolrCores. These failures will be accessible from the web UI and CoreAdminHandler STATUS command until they are "reset" by creating/renaming a SolrCore with the same name. (hossman, steffkes) * SOLR-1280: Added commented-out example of the new script update processor to the example configuration. See http://wiki.apache.org/solr/ScriptUpdateProcessor (ehatcher) * SOLR-3672: SimplePostTool: Improvements for posting files Support for auto mode, recursive and wildcards (janhoy) Optimizations ---------------------- * SOLR-3708: Add hashCode to ClusterState so that structures built based on the ClusterState can be easily cached. (Mark Miller) * SOLR-3709: Cache the url list created from the ClusterState in CloudSolrServer on each request. (Mark Miller, yonik) * SOLR-3710: Change CloudSolrServer so that update requests are only sent to leaders by default. (Mark Miller) Bug Fixes ---------------------- * SOLR-3582: Our ZooKeeper watchers respond to session events as if they are change events, creating undesirable side effects. (Trym R. Møller, Mark Miller) * SOLR-3467: ExtendedDismax escaping is missing several reserved characters (Michael Dodsworth via janhoy) * SOLR-3587: After reloading a SolrCore, the original Analyzer is still used rather than a new one. (Alexey Serba, yonik, rmuir, Mark Miller) * LUCENE-4185: Fix a bug where CharFilters were wrongly being applied twice. (Michael Froh, rmuir) * SOLR-3610: After reloading a core, indexing would fail on any newly added fields to the schema. (Brent Mills, rmuir) * SOLR-3377: edismax fails to correctly parse a fielded query wrapped by parens. This regression was introduced in 3.6. (Bernd Fehling, Jan Høydahl, yonik) * SOLR-3621: Fix rare concurrency issue when opening a new IndexWriter for replication or rollback. 
(Mark Miller) * SOLR-1781: Replication index directories not always cleaned up. (Markus Jelsma, Terje Sten Bjerkseth, Mark Miller) * SOLR-3639: Update ZooKeeper to 3.3.6 for a variety of bug fixes. (Mark Miller) * SOLR-3629: Typo in solr.xml persistence when overriding the solrconfig.xml file name using the "config" attribute prevented the override file from being used. (Ryan Zezeski, hossman) * SOLR-3642: Correct broken check for multivalued fields in stats.facet (Yandong Yao, hossman) * SOLR-3660: Velocity: Link to admin page broken (janhoy) * SOLR-3658: Adding thousands of docs with one UpdateProcessorChain instance can briefly create spikes of threads in the thousands. (yonik, Mark Miller) * SOLR-3656: A core reload now always uses the same dataDir. (Mark Miller, yonik) * SOLR-3662: Core reload bugs: a reload always obtained a non-NRT searcher, which could go back in time with respect to the previous core's NRT searcher. Versioning did not work correctly across a core reload, and update handler synchronization was changed to synchronize on core state since more than one update handler can coexist for a single index during a reload. (yonik) * SOLR-3663: There are a couple of bugs in the sync process when a leader goes down and a new leader is elected. (Mark Miller) * SOLR-3623: Fixed inconsistent treatment of third-party dependencies for solr contribs analysis-extras & uima (hossman) * SOLR-3652: Fixed range faceting to error instead of looping infinitely when 'gap' is zero -- or effectively zero due to floating point arithmetic underflow. (hossman) * SOLR-3648: Fixed VelocityResponseWriter template loading in SolrCloud mode. For the example configuration, this means /browse now works with SolrCloud. (janhoy, ehatcher) * SOLR-3677: Fixed misleading error message in web ui to distinguish between no SolrCores loaded vs. no /admin/ handler available.
(hossman, steffkes) * SOLR-3428: SolrCmdDistributor flushAdds/flushDeletes can cause repeated adds/deletes to be sent (Mark Miller, Per Steffensen) * SOLR-3647: DistributedQueue should use our Solr zk client rather than the std zk client. ZooKeeper expiration can be permanent otherwise. (Mark Miller) Other Changes ---------------------- * SOLR-3524: Make discarding punctuation configurable in JapaneseTokenizerFactory. The default is to discard punctuation, but this is overridable as an expert option. (Kazuaki Hiraga, Jun Ohtani via Christian Moen) * SOLR-1770: Move the default core instance directory into a collection1 folder. (Mark Miller) * SOLR-3355: Add shard and collection to SolrCore statistics. (Michael Garski, Mark Miller) * SOLR-3575: solr.xml should default to persist=true (Mark Miller) * SOLR-3563: Unloading all cores in a SolrCloud collection will now cause the removal of that collection's meta data from ZooKeeper. (Mark Miller, Per Steffensen) * SOLR-3599: Add zkClientTimeout to solr.xml so that it's obvious how to change it and so that you can change it with a system property. (Mark Miller) * SOLR-3609: Change Solr's expanded webapp directory to be at a consistent path called solr-webapp rather than a temporary directory. (Mark Miller) * SOLR-3600: Raise the default zkClientTimeout from 10 seconds to 15 seconds. (Mark Miller) * SOLR-3215: Clone SolrInputDocument when distrib indexing so that update processors after the distrib update process do not process the document twice. (Mark Miller) * SOLR-3683: Improved error handling if an <analyzer> contains both an explicit class attribute, as well as nested factories. (hossman) * SOLR-3682: Fail to parse schema.xml if uniqueKeyField is multivalued (hossman) * SOLR-2115: DIH no longer requires the "config" parameter to be specified in solrconfig.xml. Instead, the configuration is loaded and parsed with every import. 
This allows the use of a different configuration with each import, and makes correcting configuration errors simpler. Also, the configuration itself can be passed using the "dataConfig" parameter rather than using a file (this previously worked in debug mode only). When configuration errors are encountered, the error message is returned in XML format. (James Dyer) * SOLR-3439: Make SolrCell easier to use out of the box. Also improves "/browse" to display rich-text documents correctly, along with facets for author and content_type. With the new "content" field, highlighting of body is supported. See also SOLR-3672 for easier posting of a whole directory structure. (Jack Krupansky, janhoy) * SOLR-3579: SolrCloud view should default to the graph view rather than tree view. (steffkes, Mark Miller) ================== 4.0.0-ALPHA ================== More information about this release, including any errata related to the release notes, upgrade instructions, or other changes may be found online at: https://wiki.apache.org/solr/Solr4.0 Versions of Major Components --------------------- Apache Tika 1.1 Carrot2 3.5.0 Velocity 1.6.4 and Velocity Tools 2.0 Apache UIMA 2.3.1 Apache ZooKeeper 3.3.4 Upgrading from Solr 3.6-dev ---------------------- * The Lucene index format has changed and as a result, once you upgrade, previous versions of Solr will no longer be able to read your indices. In a master/slave configuration, all searchers/slaves should be upgraded before the master. If the master were to be updated first, the older searchers would not be able to read the new index format. * Setting abortOnConfigurationError=false is no longer supported (since it has never worked properly). Solr will now warn you if you attempt to set this configuration option at all. (see SOLR-1846) * The default logic for the 'mm' param of the 'dismax' QParser has been changed. 
If no 'mm' param is specified (either in the query, or as a default in solrconfig.xml) then the effective value of the 'q.op' param (either in the query or as a default in solrconfig.xml or from the 'defaultOperator' option in schema.xml) is used to influence the behavior. If q.op is effectively "AND" then mm=100%. If q.op is effectively "OR" then mm=0%. Users who wish to force the legacy behavior should set a default value for the 'mm' param in their solrconfig.xml file. * The VelocityResponseWriter is no longer built into the core. Its JAR and dependencies now need to be added (via <lib> or solr/home lib inclusion), and it needs to be registered in solrconfig.xml like this: <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"/> * The update request parameter to choose Update Request Processor Chain is renamed from "update.processor" to "update.chain". The old parameter was deprecated but still working since Solr3.2, but is now removed entirely. * The <indexDefaults> and <mainIndex> sections of solrconfig.xml are discontinued and replaced with the <indexConfig> section. There are also better defaults. When migrating, if you don't know what your old settings mean, simply delete both <indexDefaults> and <mainIndex> sections. If you have customizations, put them in <indexConfig> section - with same syntax as before. * Two of the SolrServer subclasses in SolrJ were renamed/replaced. CommonsHttpSolrServer is now HttpSolrServer, and StreamingUpdateSolrServer is now ConcurrentUpdateSolrServer. * The PingRequestHandler no longer looks for a <healthcheck/> option in the (legacy) <admin> section of solrconfig.xml. Users who wish to take advantage of this feature should configure a "healthcheckFile" init param directly on the PingRequestHandler. As part of this change, relative file paths have been fixed to be resolved against the data dir. See the example solrconfig.xml and SOLR-1258 for more details. 
* Due to low level changes to support SolrCloud, the uniqueKey field can no longer be populated via <copyField/> or <field default=...> in the schema.xml. Users wishing to have Solr automatically generate a uniqueKey value when adding documents should instead use an instance of solr.UUIDUpdateProcessorFactory in their update processor chain. See SOLR-2796 for more details. Detailed Change List ---------------------- New Features ---------------------- * SOLR-3272: Solr filter factory for MorfologikFilter (Polish lemmatisation). (Rafał Kuć via Dawid Weiss, Steven Rowe, Uwe Schindler). * SOLR-571: The autowarmCount for LRUCaches (LRUCache and FastLRUCache) now supports "percentages" which get evaluated relative to the current size of the cache when warming happens. (Tomás Fernández Löbbe and hossman) * SOLR-1932: New relevancy function queries: termfreq, tf, docfreq, idf, norm, maxdoc, numdocs. (yonik) * SOLR-1665: Add debug component options for timings, results and query info only (gsingers, hossman, yonik) * SOLR-2112: Solrj API now supports streaming results. (ryan) * SOLR-792: Adding PivotFacetComponent for Hierarchical faceting (ehatcher, Jeremy Hinegardner, Thibaut Lassalle, ryan) * LUCENE-2507, SOLR-2571, SOLR-2576: Added DirectSolrSpellChecker, which uses Lucene's DirectSpellChecker to retrieve correction candidates directly from the term dictionary using levenshtein automata. (James Dyer, rmuir) * SOLR-1873, SOLR-2358: SolrCloud - added shared/central config and core/shard management via zookeeper, built-in load balancing, and distributed indexing. (Jamie Johnson, Sami Siren, Ted Dunning, yonik, Mark Miller) Additional Work: - SOLR-2324: SolrCloud solr.xml parameters are not persisted by CoreContainer. (Massimo Schiavon, Mark Miller) - SOLR-2287: Allow users to query by multiple, compatible collections with SolrCloud. (Soheb Mahmood, Alex Cowell, Mark Miller) - SOLR-2622: ShowFileRequestHandler does not work in SolrCloud mode.
(Stefan Matheis, Mark Miller) - SOLR-3108: Error in SolrCloud's replica lookup code when replicas are hosted in same Solr instance. (Bruno Dumon, Sami Siren, Mark Miller) - SOLR-3080: Remove shard info from zookeeper when SolrCore is explicitly unloaded. (yonik, Mark Miller, siren) - SOLR-3437: Recovery issues a spurious commit to the cluster. (Trym R. Møller via Mark Miller) - SOLR-2822: Skip update processors already run on other nodes (hossman) * SOLR-1566: Transforming documents in the ResponseWriters. This will allow for more complex results in responses and open the door for function queries as results. (ryan with patches from grant, noble, cmale, yonik, Jan Høydahl, Arul Kalaipandian, Luca Cavanna, hossman) - SOLR-2037: Thanks to SOLR-1566, documents boosted by the QueryElevationComponent can be marked as boosted. (gsingers, ryan, yonik) * SOLR-2396: Add CollationField, which is much more efficient than the Solr 3.x CollationKeyFilterFactory, and also supports Locale-sensitive range queries. (rmuir) * SOLR-2338: Add support for using <similarity/> in a schema's fieldType, for customizing scoring on a per-field basis. (hossman, yonik, rmuir) * SOLR-2335: New 'field("...")' function syntax for referring to complex field names (containing whitespace or special characters) in functions. * SOLR-2383: /browse improvements: generalize range and date facet display (Jan Høydahl via yonik) * SOLR-2272: Pseudo-join queries / filters. Examples: - To restrict to the set of parents with at least one blue-eyed child: fq={!join from=parent to=name}eyes:blue - To restrict to the set of children with at least one blue-eyed parent: fq={!join from=name to=parent}eyes:blue (yonik) * SOLR-1942: Added the ability to select postings format per fieldType in schema.xml as well as support custom Codecs in solrconfig.xml.
(simonw via rmuir) * SOLR-2136: Boolean type added to function queries, along with new functions exists(), if(), and(), or(), xor(), not(), def(), and true and false constants. (yonik) * SOLR-2491: Add support for using spellcheck collation in conjunction with grouping. Note that the number of hits returned for collations is the number of ungrouped hits. (James Dyer via rmuir) * SOLR-1298: Return FunctionQuery as pseudo field. The solr 'fl' param now supports functions. For example: fl=id,sum(x,y) -- NOTE: only functions with fast random access are recommended. (yonik, ryan) * SOLR-705: Optionally return shard info with each document in distributed search. Use fl=id,[shard] to return the shard url. (ryan) * SOLR-2417: Add explain info directly to return documents using ?fl=id,[explain] (ryan) * SOLR-2533: Converted ValueSource.ValueSourceSortField over to new rewriteable Lucene SortFields. ValueSourceSortField instances must be rewritten before they can be used. This is done by SolrIndexSearcher when necessary. (Chris Male). * SOLR-2193, SOLR-2565: You may now specify a 'soft' commit when committing. This will use Lucene's NRT feature to avoid guaranteeing documents are on stable storage in exchange for faster reopen times. There is also a new 'soft' autocommit tracker that can be configured. (Mark Miller, Robert Muir) * SOLR-2399: Updated Solr Admin interface. New look and feel with per core administration and many new options. (Stefan Matheis via ryan) * SOLR-1032: CSV handler now supports "literal.field_name=value" parameters. (Simon Rosenthal, ehatcher) * SOLR-2656: realtime-get, efficiently retrieves the latest stored fields for specified documents, even if they are not yet searchable (i.e. without reopening a searcher) (yonik) * SOLR-2703: Added support for Lucene's "surround" query parser. 
(Simon Rosenthal, ehatcher) * SOLR-2754: Added factories for several ranking algorithms: - BM25SimilarityFactory: Okapi BM25 - DFRSimilarityFactory: Divergence from Randomness models - IBSimilarityFactory: Information-based models - LMDirichletSimilarity: LM with Dirichlet smoothing - LMJelinekMercerSimilarity: LM with Jelinek-Mercer smoothing (David Mark Nemeskey, Robert Muir) * SOLR-2134: Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types (Ryan McKinley, Mike McCandless, Uwe Schindler, Erick Erickson) * SOLR-2438: Added MultiTermAwareComponent to the various classes to allow automatic lowercasing for multiterm queries (wildcards, regex, prefix, range, etc). You can now optionally specify a "multiterm" analyzer in your schema.xml, but Solr should "do the right thing" if you don't specify <analyzer type="multiterm"> (Pete Sturge, Erick Erickson, Mentoring from Seeley and Muir) * SOLR-2481: Add support for commitWithin in DataImportHandler (Sami Siren via yonik) * SOLR-2992: Add support for IndexWriter.prepareCommit() via prepareCommit=true on update URLs. (yonik) * SOLR-2906: Added LFU cache options to Solr. (Shawn Heisey via Erick Erickson) * SOLR-3069: Ability to add openSearcher=false to not open a searcher when doing a hard commit. commitWithin now only invokes a softCommit. (yonik) * SOLR-2802: New FieldMutatingUpdateProcessor and Factory to simplify the development of UpdateProcessors that modify field values of documents as they are indexed.
Also includes several useful new implementations: - RemoveBlankFieldUpdateProcessorFactory - TrimFieldUpdateProcessorFactory - HTMLStripFieldUpdateProcessorFactory - RegexReplaceProcessorFactory - FieldLengthUpdateProcessorFactory - ConcatFieldUpdateProcessorFactory - FirstFieldValueUpdateProcessorFactory - LastFieldValueUpdateProcessorFactory - MinFieldValueUpdateProcessorFactory - MaxFieldValueUpdateProcessorFactory - TruncateFieldUpdateProcessorFactory - IgnoreFieldUpdateProcessorFactory (hossman, janhoy) * SOLR-3120: Optional post filtering for spatial queries bbox and geofilt for LatLonType. (yonik) * SOLR-2459: Expose LogLevel selection with a RequestHandler rather than a servlet (Stefan Matheis, Upayavira, ryan) * SOLR-3134: Include shard info in distributed response when shards.info=true (Russell Black, ryan) * SOLR-2898: Support grouped faceting. (Martijn van Groningen) Additional Work: - SOLR-3406: Extended grouped faceting support to facet.query and facet.range parameters. (David Boychuck, Martijn van Groningen) * SOLR-2949: QueryElevationComponent is now supported with distributed search. (Mark Miller, yonik) * SOLR-3221: Added the ability to directly configure aspects of the concurrency and thread-pooling used within distributed search in Solr. This allows for finer-grained control and can be tuned by end users to target their own specific requirements. This builds on the work of the HttpCommComponent and uses the same configuration block to configure the thread pool. The default configuration has the same behaviour as Solr 3.5, favouring throughput over latency. More information can be found on the wiki (http://wiki.apache.org/solr/SolrConfigXml) (Greg Bowyer) * SOLR-3278: Added negative boost support to the Extended Dismax Query Parser Boost Query (bq). (James Dyer) * SOLR-3255: OpenExchangeRates.Org Exchange Rate Provider for CurrencyField (janhoy) * SOLR-3358: Logging events are captured and available from the /admin/logging request handler.
(ryan) * SOLR-1535: PreAnalyzedField type provides functionality to index (and optionally store) field content that was already processed and split into tokens using some external processing chain. Serialization format is pluggable, and defaults to JSON. (ab) * SOLR-3363: Consolidated Exceptions in Analysis Factories so they only throw InitializationExceptions (Chris Male) * SOLR-2690: New support for a "TZ" request param which overrides the TimeZone used when rounding Dates in DateMath expressions for the entire request (all date range queries and date faceting are affected). The default TZ is still UTC. (David Schlotfeldt, hossman) * SOLR-3402: Analysis Factories are now configured with their Lucene Version through setLuceneMatchVersion, rather than through the Map passed to init. Parsing and simple error checking for the Version are now done inside the code that creates the Analysis Factories. (Chris Male) * SOLR-3178: Optimistic locking. If a _version_ is provided with an update that does not match the version in the index, an HTTP 409 error (Conflict) will result. (Per Steffensen, yonik) * SOLR-139: Updateable documents. JSON Example: {"id":"mydoc", "f1":{"set":10}, "f2":{"add":20}} will result in field "f1" being set to 10, "f2" having an additional value of 20 added, and all other existing fields unchanged. All source fields must be stored for this feature to work correctly. (Ryan McKinley, Erik Hatcher, yonik) * SOLR-2857: Support XML, CSV, JSON, and javabin in a single RequestHandler and choose the correct ContentStreamLoader based on Content-Type header. This also deprecates the existing [Xml,JSON,CSV,Binary,Xslt]UpdateRequestHandler. (ryan) * SOLR-2585: Context-Sensitive Spelling Suggestions & Collations. This adds support for the "spellcheck.alternativeTermCount" & "spellcheck.maxResultsForSuggest" parameters, letting users receive suggestions even when all the queried terms exist in the dictionary.
This differs from "spellcheck.onlyMorePopular" in that the suggestions need not consist entirely of terms with a greater document frequency than the queried terms. (James Dyer) * SOLR-2058: Edismax query parser to allow "phrase slop" to be specified per-field on the pf/pf2/pf3 parameters using optional "FieldName~slop^boost" syntax. The prior "FieldName^boost" syntax is still accepted. In such cases the value on the "ps" parameter serves as the default slop. (Ron Mayer via James Dyer) * SOLR-3495: New UpdateProcessors have been added to create default values for configured fields. These work similarly to the <field default="..."/> option in schema.xml, but are applied in the UpdateProcessorChain, so they may be used prior to other UpdateProcessors, or to generate a uniqueKey field value when using the DistributedUpdateProcessor (ie: SolrCloud): - TimestampUpdateProcessorFactory - UUIDUpdateProcessorFactory - DefaultValueUpdateProcessorFactory (hossman) * SOLR-2993: Add WordBreakSolrSpellChecker to offer suggestions by combining adjacent query terms and/or breaking terms into multiple words. This spellchecker can be configured with a traditional checker (ie: DirectSolrSpellChecker). The results are combined and collations can contain a mix of corrections from both spellcheckers. (James Dyer) * SOLR-3508: Simplify JSON update format for deletes as well as allow version specification for optimistic locking. Examples: - {"delete":"myid"} - {"delete":["id1","id2","id3"]} - {"delete":{"id":"myid", "_version_":123456789}} (yonik) * SOLR-3211: Allow parameter overrides in conjunction with "spellcheck.maxCollationTries". To do so, use parameters starting with "spellcheck.collateParam." For instance, to override the "mm" parameter, specify "spellcheck.collateParam.mm".
This is helpful in cases where testing spellcheck collations for result counts should use different parameters from the main query (James Dyer) * SOLR-2599: CloneFieldUpdateProcessorFactory provides similar functionality to schema.xml's <copyField/> declaration but as an update processor that can be combined with other processors in any order. (Jan Høydahl & hossman) * SOLR-3351: eDismax: ps2 and ps3 params (janhoy) * SOLR-3542: Add WeightedFragListBuilder for FVH and set it to default fragListBuilder in example solrconfig.xml. (Sebastian Lutze, koji) * SOLR-2396: Add ICUCollationField to contrib/analysis-extras, which is much more efficient than the Solr 3.x ICUCollationKeyFilterFactory, and also supports Locale-sensitive range queries. (rmuir) Optimizations ---------------------- * SOLR-1875: Per-segment field faceting for single valued string fields. Enable with facet.method=fcs, control the number of threads used with the "threads" local param on the facet.field param. This algorithm will only be faster in the presence of rapid index changes. (yonik) * SOLR-1904: When facet.enum.cache.minDf > 0 and the base doc set is a SortedIntSet, convert to HashDocSet for better performance. (yonik) * SOLR-2092: Speed up single-valued and multi-valued "fc" faceting. Typical improvement is 5%, but can be much greater (up to 10x faster) when facet.offset is very large (deep paging). (yonik) * SOLR-2193, SOLR-2565: The default Solr update handler has been improved so that it uses fewer locks, keeps the IndexWriter open rather than closing it on each commit (ie commits no longer wait for background merges to complete), works with SolrCore to provide faster 'soft' commits, and has an improved API that requires less instanceof special casing. (Mark Miller, Robert Muir) Additional Work: - SOLR-2697: commit and autocommit operations don't reset DirectUpdateHandler2.numDocsPending stats attribute. 
(Alexey Serba, Mark Miller) * SOLR-2950: The QueryElevationComponent now avoids using the FieldCache and looking up every document id (gsingers, yonik) Bug Fixes ---------------------- * SOLR-3139: Make ConcurrentUpdateSolrServer send UpdateRequest.getParams() as HTTP request params (siren) * SOLR-3165: Cannot use DIH in Solrcloud + Zookeeper (Alexey Serba, Mark Miller, siren) * SOLR-3068: Occasional NPE in ThreadDumpHandler (siren) * SOLR-2762: FSTLookup could return duplicate results or one result fewer than requested. (David Smiley, Dawid Weiss) * SOLR-2741: Bugs in facet range display in trunk (janhoy) * SOLR-1908: Fixed SignatureUpdateProcessor to fail to initialize on invalid config. Specifically: a signatureField that does not exist, or overwriteDupes=true with a signatureField that is not indexed. (hossman) * SOLR-1824: IndexSchema will now fail to initialize if there is a problem initializing one of the fields or field types. (hossman) * SOLR-1928: TermsComponent didn't correctly break ties for non-text fields sorted by count. (yonik) * SOLR-2107: MoreLikeThisHandler doesn't work with alternate qparsers. (yonik) * SOLR-2108: Fixed false positives when using wildcard queries on fields with reversed wildcard support. For example, a query of *zemog* would match documents that contain 'gomez'. (Landon Kuhn via Robert Muir) * SOLR-1962: SolrCore#initIndex should not use a mix of indexPath and newIndexPath (Mark Miller) * SOLR-2275: fix DisMax 'mm' parsing to be tolerant of whitespace (Erick Erickson via hossman) * SOLR-2193, SOLR-2565, SOLR-2651: SolrCores now properly share IndexWriters across SolrCore reloads. (Mark Miller, Robert Muir) Additional Work: - SOLR-2705: On reload, IndexWriterProvider holds onto the initial SolrCore it was created with. (Yury Kats, Mark Miller) * SOLR-2682: Remove addException() in SimpleFacet. FacetComponent no longer catches and embeds exceptions that occurred during facet processing; it throws HTTP 400 or 500 exceptions instead.
(koji) * SOLR-2654: Directories used by a SolrCore are now closed when they are no longer used. (Mark Miller) * SOLR-2854: Now load URL content stream data (via stream.url) when called for during request handling, rather than loading URL content streams automatically regardless of use. (David Smiley and Ryan McKinley via ehatcher) * SOLR-2829: Fix problem with false-positives due to incorrect equals methods. (Yonik Seeley, Hossman, Erick Erickson. Marc Tinnemeyer caught the bug) * SOLR-2848: Removed 'instanceof AbstractLuceneSpellChecker' hacks from distributed spellchecking code, and added a merge() method to SolrSpellChecker instead. Previously if you extended SolrSpellChecker your spellchecker would not work in distributed fashion. (James Dyer via rmuir) * SOLR-2509: StringIndexOutOfBoundsException in the spellchecker collate when the term contains a hyphen. (Thomas Gambier caught the bug, Steffen Godskesen did the patch, via Erick Erickson) * SOLR-1730: Made it clearer when a core failed to load as well as better logging when the QueryElevationComponent fails to properly initialize (gsingers) * SOLR-1520: QueryElevationComponent now supports non-string ids (gsingers) * SOLR-3037: When using binary format in solrj the codec screws up parameters (Sami Siren, Jörg Maier via yonik) * SOLR-3062: A join in the main query was not respecting any filters pushed down to it via acceptDocs since LUCENE-1536. (Mike Hugo, yonik) * SOLR-3214: If you use multiple fl entries rather than a comma-separated list, all but the first entry can be ignored if you are using distributed search.
(Tomás Fernández Löbbe via Mark Miller) * SOLR-3352: eDismax: pf2 should kick in for a query with 2 terms (janhoy) * SOLR-3361: ReplicationHandler "maxNumberOfBackups" doesn't work if backups are triggered on commit (James Dyer, Tomás Fernández Löbbe) * SOLR-2605: fixed tracking of the 'defaultCoreName' in CoreContainer so that CoreAdminHandler could return consistent information regardless of whether there is a default core name or not. (steffkes, hossman) * SOLR-3370: fixed CSVResponseWriter to respect globs in the 'fl' param (Keith Fligg via hossman) * SOLR-3436: Group count incorrect when not all shards are queried in the second pass. (Francois Perron, Martijn van Groningen) * SOLR-3454: Exception when using result grouping with main=true and using wt=javabin. (Ludovic Boutros, Martijn van Groningen) * SOLR-3446: Better errors when PatternTokenizerFactory is configured with an invalid pattern, and include the 'name' whenever possible in plugin init error messages. (hossman) * LUCENE-4075: Cleaner path usage in TestXPathEntityProcessor (Greg Bowyer via hossman) * SOLR-2923: IllegalArgumentException when using useFilterForSortedQuery on an empty index. (Adrien Grand via Mark Miller) * SOLR-2352: Fixed TermVectorComponent so that it will not fail if the fl param contains globs or pseudo-fields (hossman) * SOLR-3541: add missing solrj dependencies to binary packages. (Thijs Vonk via siren) * SOLR-3522: fixed parsing of the 'literal()' function (hossman) * SOLR-3548: Fixed a bug in the cacheability of queries using the {!join} parser or the strdist() function, as well as some minor improvements to the hashCode implementation of {!bbox} and {!geofilt} queries. (hossman) * SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories are respected now (Stanislaw Osinski, Dawid Weiss) * SOLR-3430: Added a new DIH test against a real SQL database.
Fixed problems revealed by this new test related to the expanded cache support added to 3.6/SOLR-2382 (James Dyer) * SOLR-1958: When using the MailEntityProcessor, import would fail if fetchMailsSince was not specified. (Max Lynch via James Dyer) * SOLR-4289: Admin UI - JVM memory bar - dark grey "used" width is too small (steffkes, elyograg) Other Changes ---------------------- * SOLR-1846: Eliminate support for the abortOnConfigurationError option. It has never worked very well, and in recent versions of Solr hasn't worked at all. (hossman) * SOLR-1889: The default logic for the 'mm' param of DismaxQParser and ExtendedDismaxQParser has been changed to be determined based on the effective value of the 'q.op' param (hossman) * SOLR-1946: Misc improvements to the SystemInfoHandler: /admin/system (hossman) * SOLR-2289: Tweak spatial coords for example docs so they are a bit more spread out (Erick Erickson via hossman) * SOLR-2288: Small tweaks to eliminate compiler warnings, primarily using Generics where applicable in method/object declarations, and adding @SuppressWarnings("unchecked") when appropriate (hossman) * SOLR-2375: Suggester Lookup implementations now store trie data and load it back on init. This means that large tries don't have to be rebuilt on every commit or core reload. (ab) * SOLR-2413: Support for returning multi-valued fields w/o <arr> tag in the XMLResponseWriter was removed. XMLResponseWriter no longer works with values less than 2.2 (ryan) * SOLR-2423: FieldType argument changed from String to Object. Conversion from SolrInputDocument > Object > Fieldable is now managed by FieldType rather than DocumentBuilder. (ryan) * SOLR-2461: QuerySenderListener and AbstractSolrEventListener are now public (hossman) * LUCENE-2995: Moved some spellchecker and suggest APIs to modules/suggest: HighFrequencyDictionary, SortedIterator, TermFreqIterator, and the suggester APIs and implementations.
(rmuir) * SOLR-2576: Remove deprecated SpellingResult.add(Token, int). (James Dyer via rmuir) * LUCENE-3232: Moved MutableValue classes to new 'common' module. (Chris Male) * LUCENE-2883: FunctionQuery, DocValues (and its impls), ValueSource (and its impls) and BoostedQuery have been consolidated into the queries module. They can now be found at o.a.l.queries.function. * SOLR-2027: FacetField.getValues() now returns an empty list if there are no values, instead of null (Chris Male) * SOLR-1825: SolrQuery.addFacetQuery now enables facets automatically, like addFacetField (Chris Male) * SOLR-2663: FieldTypePluginLoader has been refactored out of IndexSchema and made public. (hossman) * SOLR-2331,SOLR-2691: Refactor CoreContainer's SolrXML serialization code and improve testing (Yury Kats, hossman, Mark Miller) * SOLR-2698: Enhance CoreAdmin STATUS command to return index size. (Yury Kats, hossman, Mark Miller) * SOLR-2654: The same Directory instance is now always used across a SolrCore so that it's easier to add other DirectoryFactory's without static caching hacks. (Mark Miller) * LUCENE-3286: 'luke' ant target has been disabled due to incompatibilities with XML queryparser location (Chris Male) * SOLR-1897: The data dir from the core descriptor should override the data dir from the solrconfig.xml rather than the other way round. (Mark Miller) * SOLR-2756: Maven configuration: Excluded transitive stax:stax-api dependency from org.codehaus.woodstox:wstx-asl dependency. (David Smiley via Steve Rowe) * SOLR-2588: Moved VelocityResponseWriter back to contrib module in order to remove it as a mandatory core dependency. (ehatcher) * SOLR-2862: More explicit lexical resources location logged if Carrot2 clustering extension is used. Fixed solr. impl. of IResource and IResourceLookup. (Dawid Weiss) * SOLR-1123: Changed JSONResponseWriter to now use application/json as its Content-Type by default. 
However, the Content-Type can be overridden and is set to text/plain in the example configuration. (Uri Boness, Chris Male) * SOLR-2607: Removed deprecated client/ruby directory, which included solr-ruby and flare. (ehatcher) * SOLR-3032: logOnce has been removed from SolrException, along with all its supporting structure. abortOnConfigurationError is also gone as it is no longer referenced. Errors should be caught and logged at the top-most level or logged and NOT propagated up the chain. (Erick Erickson) * SOLR-2105: Remove support for deprecated "update.processor" (since 3.2), in favor of "update.chain" (janhoy) * SOLR-3005: Default QueryResponseWriters are now initialized via init() with an empty NamedList. (Gasol Wu, Chris Male) * SOLR-2607: Removed obsolete client/ folder (ehatcher, Eric Pugh, janhoy) * SOLR-3202, SOLR-3244: Dropping Support for JSP. New Admin UI is all client side (ryan, Aliaksandr Zhuhrou, Uwe Schindler) * SOLR-3159: Upgrade example and tests to run with Jetty 8 (ryan) * SOLR-3254: Upgrade Solr to Tika 1.1 (janhoy) * SOLR-3329: Dropped getSourceID() from SolrInfoMBean and using getClass().getPackage().getSpecificationVersion() for Version. (ryan) * SOLR-3302: Upgraded SLF4j to version 1.6.4 (hossman) * SOLR-3322: Add more context to IndexReaderFactory.newReader (ab) * SOLR-3343: Moved FastWriter, FileUtils, RegexFileFilter, RTimer and SystemIdResolver from org.apache.solr.common to org.apache.solr.util (Chris Male) * SOLR-3357: ResourceLoader.newInstance now accepts a Class representation of the expected instance type (Chris Male) * SOLR-3388: HTTP caching is now disabled by default for RequestUpdateHandlers. (ryan) * SOLR-3309: web.xml now specifies metadata-complete=true (which requires Servlet 2.5) to prevent servlet containers from scanning class annotations on startup. This allows for faster startup times on some servlet containers.
(Bill Bell, hossman) * SOLR-1893: Refactored some common code from LRUCache and FastLRUCache into SolrCacheBase (Tomás Fernández Löbbe via hossman) * SOLR-3403: Deprecated Analysis Factories now log their own deprecation messages. No logging support is provided by Factory parent classes. (Chris Male) * SOLR-1258: PingRequestHandler is now directly configured with a "healthcheckFile" instead of looking for the legacy <admin><healthcheck/></admin> syntax. Filenames specified as relative paths have been fixed so that they are resolved against the data dir instead of the CWD of the java process. (hossman) * SOLR-3083: JMX beans now report Numbers as numeric values rather than String (Tagged Siteops, Greg Bowyer via ryan) * SOLR-2796: Due to low level changes to support SolrCloud, the uniqueKey field can no longer be populated via <copyField/> or <field default=...> in the schema.xml. * SOLR-3534: The Dismax and eDismax query parsers will fall back on the 'df' parameter when 'qf' is absent. If neither is present and the schema has no default search field, an exception is now thrown. (dsmiley) * SOLR-3262: The "threads" feature of DIH is removed (deprecated in Solr 3.6) (James Dyer) * SOLR-3422: Refactored DIH internal data classes. All entities in data-config.xml must have a name (James Dyer) Documentation ---------------------- * SOLR-2232: Improved README info on solr.solr.home in examples (Eric Pugh and hossman) ================== 3.6.2 ================== Bug Fixes ---------------------- * SOLR-3790: ConcurrentModificationException could be thrown when using hl.fl=*. (yonik, koji) * SOLR-3589: Edismax parser does not honor mm parameter if analyzer splits a token.
(Tom Burton-West, Robert Muir) ================== 3.6.1 ================== More information about this release, including any errata related to the release notes, upgrade instructions, or other changes may be found online at: https://wiki.apache.org/solr/Solr3.6.1 Bug Fixes * LUCENE-3969: Throw IAE on bad arguments that could cause confusing errors in PatternTokenizer. CommonGrams populates PositionLengthAttribute correctly. (Uwe Schindler, Mike McCandless, Robert Muir) * SOLR-3361: ReplicationHandler "maxNumberOfBackups" doesn't work if backups are triggered on commit (James Dyer, Tomás Fernández Löbbe) * SOLR-3375: Fix charset problems with HttpSolrServer (Roger Håkansson, yonik, siren) * SOLR-3436: Group count incorrect when not all shards are queried in the second pass. (Francois Perron, Martijn van Groningen) * SOLR-3454: Exception when using result grouping with main=true and using wt=javabin. (Ludovic Boutros, Martijn van Groningen) * SOLR-3489: Config file replication less error prone (Jochen Just via janhoy) * SOLR-3477: SOLR does not start up when no cores are defined (Tomás Fernández Löbbe via tommaso) * SOLR-3470: contrib/clustering: custom Carrot2 tokenizer and stemmer factories are respected now (Stanislaw Osinski, Dawid Weiss) * SOLR-3360: More DIH bug fixes for the deprecated "threads" parameter. (Mikhail Khludnev, Claudio R, via James Dyer) * SOLR-3430: Added a new DIH test against a real SQL database. Fixed problems revealed by this new test related to the expanded cache support added to 3.6/SOLR-2382 (James Dyer) * SOLR-3336: SolrEntityProcessor substitutes most variables at query time. 
(Michael Kroh, Lance Norskog, via Martijn van Groningen) ================== 3.6.0 ================== More information about this release, including any errata related to the release notes, upgrade instructions, or other changes may be found online at: https://wiki.apache.org/solr/Solr3.6 Upgrading from Solr 3.5 ---------------------- * SOLR-2983: As a consequence of moving the code which sets a MergePolicy from SolrIndexWriter to SolrIndexConfig, (custom) MergePolicies should now have an empty constructor; thus an IndexWriter should not be passed as a constructor parameter but instead set using the setIndexWriter() method. * Because the doGet() methods in SimplePostTool were changed to static, client applications of this class need to be recompiled. * In Solr version 3.5 and earlier, HTMLStripCharFilter had known bugs in the character offsets it provided, triggering e.g. exceptions in highlighting. HTMLStripCharFilter has been re-implemented, addressing this and other issues. See the entry for LUCENE-3690 in the Bug Fixes section below for a detailed list of changes. For people who depend on the behavior of HTMLStripCharFilter in Solr version 3.5 and earlier: the old implementation (bugs and all) is preserved as LegacyHTMLStripCharFilter. * As of Solr 3.6, the <indexDefaults> and <mainIndex> sections of solrconfig.xml are deprecated and replaced with a new <indexConfig> section. Read more in SOLR-1052 below. * SOLR-3040: The DIH's admin UI (dataimport.jsp) now requires DIH request handlers to start with a '/'. (dsmiley) * SOLR-3161: <requestDispatcher handleSelect="false"> is now the default. An existing config will probably work as-is because handleSelect was explicitly enabled in default configs. handleSelect makes /select work and enables the 'qt' parameter. Instead, consider explicitly configuring /select as is done in the example solrconfig.xml, and register your other search handlers with a leading '/' which is a recommended practice.
(David Smiley, Erik Hatcher) * SOLR-3161: Don't use the 'qt' parameter with a leading '/'. It probably won't work in 4.0 and it's now limited in 3.6 to SearchHandler subclasses that aren't lazy-loaded. * SOLR-2724: Specifying <defaultSearchField> and <solrQueryParser defaultOperator="..."/> in schema.xml is now considered deprecated. Instead you are encouraged to specify these via the "df" and "q.op" parameters in your request handler definition. (David Smiley) * Bugs were found and fixed in the SignatureUpdateProcessor that previously caused some documents to produce the same signature even when the configured fields contained distinct (non-String) values. Users of SignatureUpdateProcessor are strongly advised to re-index, as document signatures may have changed. (see SOLR-3200 & SOLR-3226 for details) New Features ---------------------- * SOLR-2020: Add Java client that uses Apache Http Components http client (4.x). (Chantal Ackermann, Ryan McKinley, Yonik Seeley, siren) * SOLR-2854: Now load URL content stream data (via stream.url) when called for during request handling, rather than loading URL content streams automatically regardless of use. (David Smiley and Ryan McKinley via ehatcher) * SOLR-2904: BinaryUpdateRequestHandler should be able to accept multiple update requests from a stream (shalin) * SOLR-1565: StreamingUpdateSolrServer supports RequestWriter API and therefore, javabin update format (shalin) * SOLR-2438: Added MultiTermAwareComponent to the various classes to allow automatic lowercasing for multiterm queries (wildcards, regex, prefix, range, etc). You can now optionally specify a "multiterm" analyzer in your schema.xml, but Solr should "do the right thing" if you don't specify <analyzer type="multiterm"> (Pete Sturge, Erick Erickson, Mentoring from Seeley and Muir) * SOLR-2919: Added support for localized range queries when the analysis chain uses CollationKeyFilter or ICUCollationKeyFilter.
(Michael Sokolov, rmuir) * SOLR-2982: Added BeiderMorseFilterFactory for Beider-Morse (BMPM) phonetic encoder. Upgrades commons-codec to version 1.6 (Brooke Schreier Ganz, rmuir) * SOLR-1843: A new "rootName" attribute is now available when configuring <jmx/> in solrconfig.xml. If this attribute is set, Solr will use it as the root name for all MBeans Solr exposes via JMX. The default root name is "solr" followed by the core name. (Constantijn Visinescu, hossman) * SOLR-2906: Added LFU cache options to Solr. (Shawn Heisey via Erick Erickson) * SOLR-3036: Ability to specify overwrite=false on the URL for XML updates. (Sami Siren via yonik) * SOLR-2603: Add the encoding function for alternate fields in highlighting. (Massimo Schiavon, koji) * SOLR-1729: Evaluation of NOW for date math is done only once per request for consistency, and is also propagated to shards in distributed search. Adding a parameter NOW=<time_in_ms> to the request will override the current time. (Peter Sturge, yonik, Simon Willnauer) * SOLR-1709: Distributed support for Date and Numeric Range Faceting (Peter Sturge, David Smiley, hossman, Simon Willnauer) * SOLR-3054, LUCENE-3671: Add TypeTokenFilterFactory that creates TypeTokenFilter that filters tokens based on their TypeAttribute. (Tommaso Teofili via Uwe Schindler) * LUCENE-3305, SOLR-3056: Added Kuromoji morphological analyzer for Japanese. See the 'text_ja' fieldtype in the example to get started. (Christian Moen, Masaru Hasegawa via Robert Muir) * SOLR-1860: StopFilterFactory, CommonGramsFilterFactory, and CommonGramsQueryFilterFactory can optionally read stopwords in Snowball format (specify format="snowball"). (Robert Muir) * SOLR-3105: ElisionFilterFactory optionally allows the parameter ignoreCase (default=false). (Robert Muir) * LUCENE-3714: Add WFSTLookupFactory, a suggester that uses a weighted FST for more fine-grained suggestions. 
(Mike McCandless, Dawid Weiss, Robert Muir) * SOLR-3143: Add SuggestQueryConverter, a QueryConverter intended for auto-suggesters. (Robert Muir) * SOLR-3033: ReplicationHandler's backup command now supports a 'maxNumberOfBackups' init param that can be used to delete all but the most recent N backups. (Torsten Krah, James Dyer) * SOLR-2202: Currency FieldType, with support for currencies and exchange rates (Greg Fodor & Andrew Morrison via janhoy, rmuir, Uwe Schindler) * SOLR-3026: eDismax: Locking down which fields can be explicitly queried (user fields aka uf) (janhoy, hossmann, Tomás Fernández Löbbe) * SOLR-2826: URLClassify Update Processor (janhoy) * SOLR-2764: Create a NorwegianLightStemmer and NorwegianMinimalStemmer (janhoy) * SOLR-3221: Added the ability to directly configure aspects of the concurrency and thread-pooling used within distributed search in Solr. This allows for finer-grained control and can be tuned by end users to target their own specific requirements. This builds on the work of the HttpCommComponent and uses the same configuration block to configure the thread pool. The default configuration has the same behaviour as Solr 3.5, favouring throughput over latency. More information can be found on the wiki (http://wiki.apache.org/solr/SolrConfigXml) (Greg Bowyer) * SOLR-2001: The query component will substitute an empty query that matches no documents if the query parser returns null. This also prevents an exception from being thrown by the default parser if "q" is missing. (yonik) - SOLR-435: if q is "" then it's also acceptable. (dsmiley, hoss) * SOLR-2919: Added parametric tailoring options to ICUCollationKeyFilterFactory. These can be used to customize range query/sort behavior, for example to support numeric collation, ignore punctuation/whitespace, ignore accents but not case, control whether upper/lowercase values are sorted first, etc.
(rmuir) * SOLR-2346: Add the ability to set content encoding explicitly via the content type of the stream for the extracting request handler. This is convenient when Tika's auto detector cannot detect encoding, especially when the text file is too short to detect encoding. (koji) * SOLR-1499: Added SolrEntityProcessor that imports data from another Solr core or instance based on a specified query. (Lance Norskog, Erik Hatcher, Pulkit Singhal, Ahmet Arslan, Luca Cavanna, Martijn van Groningen) * SOLR-3190: Minor improvements to SolrEntityProcessor. Add more consistency between solr parameters and parameters used in SolrEntityProcessor and ability to specify a custom HttpClient instance. (Luca Cavanna via Martijn van Groningen) * SOLR-2382: Added pluggable cache support to DIH so that any Entity can be made cache-able by adding the "cacheImpl" parameter. Include "SortedMapBackedCache" to provide in-memory caching (as previously this was the only option when using CachedSqlEntityProcessor). Users can provide their own implementations of DIHCache for other caching strategies. Deprecate CachedSqlEntityProcessor in favor of specifying "cacheImpl" with SqlEntityProcessor. Make SolrWriter implement DIHWriter and allow the possibility of pluggable Writers (DIH writing to something other than Solr). (James Dyer, Noble Paul) Optimizations ---------------------- * SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter reportDocCount defaults to 'false'. Old behavior still possible by specifying this as 'true' (Erick Erickson) * SOLR-3012: Move System.getProperty("type") in postData() to main() and add type argument so that the client applications of SimplePostTool can set content type via method argument. (koji) * SOLR-2888: FSTSuggester refactoring: internal storage is now UTF-8, external sorting (on disk) prevents OOMs even with large data sets (the bottleneck is now FST construction), code cleanups and API cleanups.
(Dawid Weiss, Robert Muir) Bug Fixes ---------------------- * SOLR-3187: SystemInfoHandler leaks filehandles (siren) * LUCENE-3820: Fixed invalid position indexes by reimplementing PatternReplaceCharFilter. This change also drops real support for boundary characters -- all input is prebuffered for pattern matching. (Dawid Weiss) * SOLR-3068: Fixed NPE in ThreadDumpHandler (siren) * SOLR-2912: Fixed File descriptor leak in ShowFileRequestHandler (Michael Ryan, shalin) * SOLR-2819: Improved speed of parsing hex entities in HTMLStripCharFilter (Bernhard Berger, hossman) * SOLR-2509: StringIndexOutOfBoundsException in the spellchecker collate when the term contains a hyphen. (Thomas Gambier caught the bug, Steffen Godskesen did the patch, via Erick Erickson) * SOLR-2955: Fixed IllegalStateException when querying with group.sort=score desc in sharded environment. (Steffen Elberg Godskesen, Martijn van Groningen) * SOLR-2956: Fixed inconsistencies in the flags (and flag key) reported by the LukeRequestHandler (hossman) * SOLR-1730: Made it clearer when a core failed to load as well as better logging when the QueryElevationComponent fails to properly initialize (gsingers) * SOLR-1520: QueryElevationComponent now supports non-string ids (gsingers) * SOLR-3024: Fixed JSONTestUtil.matchObj; in previous releases it was not respecting the 'delta' arg (David Smiley via hossman) * SOLR-2542: Fixed DIH Context variables which were broken for all scopes other than SCOPE_ENTITY (Linbin Chen & Frank Wesemann via hossman) * SOLR-3042: Fixed Maven Jetty plugin configuration. (David Smiley via Steve Rowe) * SOLR-2970: CSV ResponseWriter returns fields defined as stored=false in schema (janhoy) * LUCENE-3690, LUCENE-2208, SOLR-882, SOLR-42: Re-implemented HTMLStripCharFilter as a JFlex-generated scanner and moved it to lucene/contrib/analyzers/common/. See below for a list of bug fixes and other changes.
  To get the same behavior as HTMLStripCharFilter in Solr version 3.5 and earlier (including the bugs), use LegacyHTMLStripCharFilter, which is the previous implementation.

  Behavior changes from the previous version:

  - Known offset bugs are fixed.
  - The "Mark invalid" exceptions reported in SOLR-1283 are no longer triggered (the bug is still present in LegacyHTMLStripCharFilter).
  - The character entity "&apos;" is now always properly decoded.
  - More cases of <script> tags are now properly stripped.
  - CDATA sections are now handled properly.
  - Valid tag name characters now include the supplementary Unicode characters from Unicode character classes [:ID_Start:] and [:ID_Continue:].
  - Uppercase character entities "&QUOT;", "&COPY;", "&GT;", "&LT;", "&REG;", and "&AMP;" are now recognized and handled as if they were in lowercase.
  - The REPLACEMENT CHARACTER U+FFFD is now used to replace numeric character entities for unpaired UTF-16 low and high surrogates (in the range [U+D800-U+DFFF]).
  - Properly paired numeric character entities for UTF-16 surrogates are now converted to the corresponding code units.
  - Opening tags with unbalanced quotation marks are now properly stripped.
  - Literal "<" and ">" characters in opening tags, regardless of whether they appear inside quotation marks, now inhibit recognition (and stripping) of the tags. The only exception to this is for values of event-handler attributes, e.g. "onClick", "onLoad", "onSelect".
  - A newline '\n' is substituted instead of a space for stripped HTML markup.
  - Nothing is substituted for opening and closing inline tags - they are simply removed. The list of inline tags is (case insensitively): <a>, <abbr>, <acronym>, <b>, <basefont>, <bdo>, <big>, <cite>, <code>, <dfn>, <em>, <font>, <i>, <img>, <input>, <kbd>, <label>, <q>, <s>, <samp>, <select>, <small>, <span>, <strike>, <strong>, <sub>, <sup>, <textarea>, <tt>, <u>, and <var>.
  - HTMLStripCharFilterFactory now handles HTMLStripCharFilter's "escapedTags" feature: opening and closing tags with the given names, including any attributes and their values, are left intact in the output.
  (Steve Rowe)

* LUCENE-3717: Fixed offset bugs in TrimFilter, WordDelimiterFilter, and HyphenatedWordsFilter where they would create invalid offsets in some situations, leading to problems in highlighting. (Robert Muir)

* SOLR-2280: commitWithin ignored for a delete query (Juan Grande via janhoy)

* SOLR-3073: Fixed 'Invalid UUID string' error when having a UUID field as the unique key and executing a distributed grouping request. (Devon Krisman, Martijn van Groningen)

* SOLR-3084: Fixed initialization error when using <queryResponseWriter default="true" ... /> (Bernd Fehling and hossman)

* SOLR-3109: Fixed numerous redundant shard requests when using distributed grouping. (rblack via Martijn van Groningen)

* SOLR-3052: Fixed typo in distributed grouping parameters. (Martijn van Groningen, Grant Ingersoll)

* SOLR-2909: Add support for ResourceLoaderAware tokenizerFactories in synonym filter factories. (Tom Klonikowski, Jun Ohtani via Koji Sekiguchi)

* SOLR-3168: ReplicationHandler "numberToKeep" & "maxNumberOfBackups" parameters would keep only 1 backup, even if more than 1 was specified (Neil Hooey, James Dyer)

* SOLR-3009: hitGrouped.vm isn't shipped with 3.x (ehatcher, janhoy)

* SOLR-3195: timeAllowed is ignored for grouping queries (Russell Black via Martijn van Groningen)

* SOLR-2124: Do not log stack traces for "Service Disabled" / 503 Exceptions (PingRequestHandler, etc) (James Dyer, others)

* SOLR-3260: DataImportHandler: ScriptTransformer gives better error messages when problems arise on initialization (no Script Engine, invalid script, etc). (James Dyer)

* SOLR-2959: edismax now respects the magic fields '_val_' and '_query_' (Michael Watts, hossman)

* SOLR-3074: fix SolrPluginUtils.docListToSolrDocumentList to respect the list of fields specified.
  This fix also deprecates DocumentBuilder.loadStoredFields which is not used anywhere in Solr, and was fundamentally broken/bizarre. (hossman, Ahmet Arslan)

* SOLR-2291: fix JSONWriter to respect field list when writing SolrDocuments (Ahmet Arslan via hossman)

* SOLR-3264: Fix CoreContainer and SolrResourceLoader logging to be more clear about when SolrCores are being created, and stop misleading people about SolrCore instanceDir's being the "Solr Home Dir" (hossman)

* SOLR-3046: Fix whitespace typo in DIH response "Time taken" (hossman)

* SOLR-3261: Fix edismax to respect query operators when literal colons are used in query string. (Juan Grande via hossman)

* SOLR-3226: Fix SignatureUpdateProcessor to no longer ignore non-String field values (Spyros Kapnissis, hossman)

* SOLR-3200: Fix SignatureUpdateProcessor "all fields" mode to use all fields of each document instead of the fields specified by the first document indexed (Spyros Kapnissis via hossman)

* SOLR-3316: Distributed grouping failed when rows parameter was set to 0 and sometimes returned a wrong hit count as matches. (Cody Young, Martijn van Groningen)

* SOLR-3107: contrib/langid: When using the LangDetect implementation of langid, set the random seed to 0, so that the same document is detected as the same language with the same probability every time. (Christian Moen via rmuir)

* SOLR-2937: Configuring the number of contextual snippets used for search results clustering. The hl.snippets parameter is now respected by the clustering plugin, can be overridden by carrot.summarySnippets if needed (Stanislaw Osinski).

* SOLR-2938: Clustering on multiple fields. The carrot.title and carrot.snippet can now take comma- or space-separated lists of field names to cluster (Stanislaw Osinski).

* SOLR-2939: Clustering of multilingual search results. The document's language field can be passed in the carrot.lang parameter, the carrot.lcmap parameter enables mapping of language codes to ISO 639 (Stanislaw Osinski).

* SOLR-2940: Passing values for custom Carrot2 fields to Clustering component. The custom field mappings are defined using the carrot.custom parameter (Stanislaw Osinski).

* SOLR-2941: NullPointerException on clustering component initialization when schema does not have a unique key field (Stanislaw Osinski).

* SOLR-2942: ClassCastException when passing non-textual fields to clustering component (Stanislaw Osinski).

Other Changes
----------------------

* SOLR-2922: Upgrade commons-io and commons-lang to 2.1 and 2.6, respectively. (koji)

* SOLR-2920: Refactor frequent conditional use of DefaultSolrParams and AppendedSolrParams into factory methods. (David Smiley via hossman)

* SOLR-3032: Deprecate logOnce from SolrException. logOnce and all the supporting structure will disappear in 4.0. Errors should be caught and logged at the top-most level or logged and NOT propagated up the chain. (Erick Erickson)

* SOLR-2718: Add ability to lazy load response writers, defined with startup="lazy". (ehatcher)

* SOLR-2901: Upgrade Solr to Tika 1.0 (janhoy)

* SOLR-3059: Example XSL stylesheet for indexing query result XML (janhoy)

* SOLR-3097, SOLR-3105: Add analysis configurations for different languages to the example. (Christian Moen, Robert Muir)

* SOLR-3005: Default QueryResponseWriters are now initialized via init() with an empty NamedList. (Gasol Wu, Chris Male)

* SOLR-3140: Upgrade schema version to 1.5, where omitNorms defaults to "true" for all primitive (non-analyzed) field types such as int, float, date, bool, string. (janhoy)

* SOLR-3077: Better error messages when attempting to use "blank" field names (Antony Stubbs via hossman)

* SOLR-2712: expecting fl=score to return all fields is now deprecated. In solr 4.0, this will only return the score. (ryan)

* SOLR-3156: Check for Lucene directory locks at startup. In previous versions this check was only performed when modifying the index (e.g. adding and deleting documents).
  (Luca Cavanna via Martijn van Groningen)

* SOLR-1052: Deprecated <indexDefaults> and <mainIndex> in solrconfig.xml. From now on, all settings go in the new <indexConfig> tag, and some defaults are changed: useCompoundFile=false, ramBufferSizeMB=32, lockType=native, so that the effect of NOT specifying <indexConfig> at all gives the same result as the example config used to give in 3.5 (janhoy, gsingers)

* SOLR-3294: In contrib/clustering/lib/, replaced the manually retrowoven Java 1.5-compatible carrot2-core-3.5.0.jar (which is not publicly available, except from the Solr Subversion repository), with newly released Java 1.5-compatible carrot2-core-3.5.0.1.jar (hosted on the Maven Central repository). Also updated dependencies jackson-core-asl and jackson-mapper-asl (both v1.5.2 -> v1.7.4). (Dawid Weiss, Steve Rowe)

* SOLR-3295: netcdf jar is excluded from the binary release (and disabled in ivy.xml) because it requires java 6. If you want to parse this content with extracting request handler and are willing to use java 6, just add the jar. (rmuir)

* SOLR-3142: DIH Imports no longer default optimize to true, instead false. If you want to force all segments to be merged into one, you can specify this parameter yourself. NOTE: this can be a very expensive operation and usually does not make sense for delta-imports. (Robert Muir)

Build
----------------------

* SOLR-2487: Add build target to package war without slf4j jars (janhoy)

* SOLR-3112: Fix tests not to write to src/test-files (Luca Cavanna via Robert Muir)

* LUCENE-3753: Restructure the Solr build system. (Steve Rowe)

* SOLR-3204: The packaged pre-release artifact of Commons CSV used the original package name (org.apache.commons.csv). This created a compatibility issue as the Apache Commons team works toward an official release of Commons CSV. The source of Commons CSV was added under a separate package name to the Solr source code.
  (Uwe Schindler, Chris Male, Emmanuel Bourg)

* LUCENE-3930: Changed build system to use Apache Ivy for retrieval of 3rd party JAR files. Please review README.txt for instructions. (Robert Muir, Chris Male, Uwe Schindler, Steven Rowe, Hossman)

================== 3.5.0 ==================

New Features
----------------------

* SOLR-2749: Add boundary scanners for FastVectorHighlighter. <boundaryScanner/> can be specified with a name in solrconfig.xml, and use hl.boundaryScanner=name parameter to specify the named <boundaryScanner/>. (koji)

* SOLR-2066,SOLR-2776: Added support for distributed grouping. (Martijn van Groningen, Jasper van Veghel, Matt Beaumont)

* SOLR-2769: Added factory for the new Hunspell stemmer capable of doing stemming for 99 languages (janhoy, cmale)

* SOLR-1979: New contrib "langid". Adds language identification capabilities as an Update Processor, using Tika's LanguageIdentifier or Cybozu language-detection library (janhoy, Tommaso Teofili, gsingers)

* SOLR-2818: Added before/after count response parsing support for range facets in SolrJ. (Bernhard Frauendienst via Martijn van Groningen)

* SOLR-2276: Add support for cologne phonetic to PhoneticFilterFactory. (Marc Pompl via rmuir)

* SOLR-1926: Add hl.q parameter. (koji)

* SOLR-2881: Numeric types now support sortMissingFirst/Last. This includes Trie and date types (Ryan McKinley, Mike McCandless, Uwe Schindler, Erick Erickson)

* SOLR-1023: StatsComponent now supports date fields and string fields. (Chris Male, Mark Holland, Gunnlaugur Thor Briem, Ryan McKinley)

* SOLR-2578: ReplicationHandler's backup command now supports a 'numberToKeep' request param that can be used to delete all but the most recent N backups.
  (James Dyer via hossman)

* SOLR-2839: Add alternative implementation to contrib/langid supporting 53 languages, based on http://code.google.com/p/language-detection/ (rmuir)

Optimizations
----------------------

* SOLR-2742: SolrJ: Provide commitWithinMs as optional parameter for all add() methods, making the feature more conveniently accessible for developers (janhoy)

Bug Fixes
----------------------

* SOLR-2748: The CommitTracker used for commitWithin or autoCommit by maxTime could commit too frequently and could block adds until a new searcher was registered. (yonik)

* SOLR-2726: Fixed NullPointerException when using spellcheck.q with Suggester. (Bernd Fehling, valentin via rmuir)

* SOLR-2772: Fixed Date parsing/formatting of years 0001-1000 (hossman)

* SOLR-2763: Extracting update request handler throws exception and returns 400 when zero-length file posted using multipart form post (janhoy)

* SOLR-2780: Fixed issue where multi select facets didn't respect group.truncate parameter. (Martijn van Groningen, Ramzi Alqrainy)

* SOLR-2793: In rare cases (most likely during shutdown), a SolrIndexSearcher can be left open if the executor rejects a task. (Mark Miller)

* SOLR-2791: Replication: abortfetch command is broken if replication was started by fetchindex command instead of a regular poll (Yury Kats via shalin)

* SOLR-2861: Fix extremely rare race condition on commit that can result in a NPE (yonik)

* SOLR-2813: Fix HTTP error codes returned when requests contain strings that can not be parsed as numbers for Trie fields. (Jeff Crump and hossman)

* SOLR-2902: The list of collations is wrongly parsed in SpellCheckResponse, causing a wrong number of collation results in the response.
  (Bastiaan Verhoef, James Dyer via Simon Willnauer)

* SOLR-2875: Fix the incorrect url in DIH example tika-data-config.xml (Shinichiro Abe via koji)

Other Changes
----------------------

* SOLR-2750: Make both "update.chain" and the deprecated "update.param" work consistently everywhere; see also SOLR-2105. (Mark Miller, janhoy)

* LUCENE-3410: Deprecated the WordDelimiterFilter constructors accepting multiple ints masquerading as booleans. Preferred constructor now accepts a single int bitfield (Chris Male)

* SOLR-2758: Moved ConcurrentLRUCache from o.a.s.common.util package in the solrj module to the o.a.s.util package in the Solr core module. (David Smiley via Steve Rowe)

* SOLR-2766: Package individual javadoc sites for solrj and test-framework. (Steve Rowe, Mike McCandless)

* SOLR-2771: Solr modules' tests should not depend on solr-core test classes; move BufferingRequestProcessor from solr-core tests to test-framework so that the Solr Cell module can use it. (janhoy, Steve Rowe)

* LUCENE-3457: Upgrade commons-compress to 1.2 (Doron Cohen)

* SOLR-2757: min() and max() functions now support an arbitrary number of ValueSources (Bill Bell via hossman)

* SOLR-2372: Upgrade Solr to Tika 0.10 (janhoy)

* SOLR-2792: Allow case insensitive Hunspell stemming (janhoy, rmuir)

* SOLR-2862: More explicit lexical resources location logged if Carrot2 clustering extension is used. Fixed solr. impl. of IResource and IResourceLookup. (Dawid Weiss)

* SOLR-2849: Fix dependencies in Maven POMs. (David Smiley via Steve Rowe)

* SOLR-2591: Remove commitLockTimeout option from solrconfig.xml (Luca Cavanna via Martijn van Groningen)

* SOLR-2746: Upgraded UIMA dependencies from *-2.3.1-SNAPSHOT.jar to *-2.3.1.jar.

================== 3.4.0 ==================

Upgrading from Solr 3.3
----------------------

* The Lucene index format has changed and as a result, once you upgrade, previous versions of Solr will no longer be able to read your indices.
  In a master/slave configuration, all searchers/slaves should be upgraded before the master. If the master were to be updated first, the older searchers would not be able to read the new index format.

* Previous versions of Solr silently allowed and ignored some contradictory properties specified in schema.xml. For example:
  - indexed="false" omitNorms="false"
  - indexed="false" omitTermFreqAndPositions="false"
  Field property validation has now been fixed, to ensure that contradictions like these now generate error messages. If users have existing schemas that generate one of these new "conflicting 'false' field options for non-indexed field" error messages the conflicting "omit*" properties can safely be removed, or changed to "true" for consistent behavior with previous Solr versions. Solr now reports an error on startup when these contradictory options are specified. See SOLR-2669.

* FacetComponent no longer catches and embeds exceptions that occurred during facet processing, it throws HTTP 400 or 500 exceptions instead.

New Features
----------------------

* SOLR-2540: CommitWithin as an Update Request parameter. You can now specify &commitWithin=N (ms) on the update request (janhoy)

* SOLR-2458: post.jar enhanced to handle JSON, CSV and <optimize> (janhoy)

* LUCENE-3234: add a new parameter hl.phraseLimit for FastVectorHighlighter speed up. (Mike Sokolov via koji)

* SOLR-2429: Ability to add cache=false to queries and query filters to avoid using the filterCache or queryCache. A cost may also be specified and is used to order the evaluation of non-cached filters from least to greatest cost. For very expensive query filters (cost >= 100) if the query implements the PostFilter interface, it will be used to obtain a Collector that is checked only for documents that match the main query and all other filters. The "frange" query now implements the PostFilter interface.
  (yonik)

* SOLR-2630: Added new XsltUpdateRequestHandler that works like XmlUpdateRequestHandler but allows transforming the POSTed XML document using XSLT. This makes it possible to POST arbitrary XML documents to the update handler, as long as you also provide an XSL to transform them to a valid Solr input document. (Upayavira, Uwe Schindler)

* SOLR-2615: Log individual updates (adds and deletes) at the FINE level before adding to the index. Fix a null pointer exception in logging when there was no unique key. (David Smiley via yonik)

* LUCENE-2048: Added omitPositions to the schema, so you can omit position information while still indexing term frequencies. (rmuir)

* SOLR-2584: add UniqFieldsUpdateProcessor that removes duplicate values in the specified fields. (Elmer Garduno, koji)

* SOLR-2670: Added NIOFSDirectoryFactory (yonik)

* SOLR-2523: Added support in SolrJ to easily interact with range facets. The range facet response can be parsed and is retrievable from the QueryResponse class. The SolrQuery class has convenient methods for using range facets. (Martijn van Groningen)

* SOLR-2637: Added support for group result parsing in SolrJ. (Tao Cheng, Martijn van Groningen)

* SOLR-2665: Added post group faceting. Facet counts are based on the most relevant document of each group matching the query. This feature has the same impact on the StatsComponent. (Martijn van Groningen)

* SOLR-2675: CoreAdminHandler now allows arbitrary properties to be specified when CREATEing a new SolrCore using property.* request params. (Yury Kats, hossman)

* SOLR-2714: JSON update format - "null" field values are now dropped instead of causing an exception. (Trygve Laugstøl, yonik)

Optimizations
----------------------

* LUCENE-3233: Improved memory usage, build time, and performance of SynonymFilterFactory. (Mike McCandless, Robert Muir)

Bug Fixes
----------------------

* SOLR-2625: TermVectorComponent throws NPE if TF-IDF option is used without DF option.
  (Daniel Erenrich, Simon Willnauer)

* SOLR-2631: PingRequestHandler should not be allowed to ping itself using the "qt" param, to prevent an infinite loop. (Edoardo Tosca, Uwe Schindler)

* SOLR-2636: Fix explain functionality for negative queries. (Tom Hill via yonik)

* SOLR-2538: Range Faceting on long/double fields could overflow if values bigger than the max int/float were used. (Erbi Hanka, hossman)

* SOLR-2230: CommonsHttpSolrServer.addFile could not be used to send multiple files in a single request. (Stephan Günther, hossman)

* SOLR-2541: PluginInfos was not correctly parsing <long/> tags when initializing plugins (Frank Wesemann, hossman)

* SOLR-2623: Solr JMX MBeans do not survive core reloads (Alexey Serba, shalin)

* Fixed grouping bug where zero documents were returned when start is bigger than rows and format is simple, even if there are documents to display. (Martijn van Groningen, Nikhil Chhaochharia)

* SOLR-2564: Fixed ArrayIndexOutOfBoundsException when using simple format and start > 0 (Martijn van Groningen, Matteo Melli)

* SOLR-2642: Fixed sorting by function when using grouping. (Thomas Heigl, Martijn van Groningen)

* SOLR-2535: REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings (David Smiley, Peter Wolanin via Erick Erickson)

* SOLR-2545: ExternalFileField file parsing would fail if any key contained an "=" character. It now only looks for the last "=" delimiter prior to the float value. (Markus Jelsma, hossman)

* SOLR-2662: When Solr is configured to have no queryResultCache, the "start" parameter was not honored and the documents returned were 0 through start+offset. (Markus Jelsma, yonik)

* SOLR-2669: Fix backwards validation of field properties in SchemaField.calcProps (hossman)

* SOLR-2676: Add "welcome-file-list" to solr.war so admin UI works correctly in servlet containers such as WebSphere that do not use a default list (Jay R.
  Jaeger, hossman)

* SOLR-2606: Fixed sort parsing of fields containing punctuation that failed due to sort by function changes introduced in SOLR-1297 (Mitsu Hadeishi, hossman)

* SOLR-2706: contrib/clustering: The carrot.lexicalResourcesDir parameter now works with absolute directories (Stanislaw Osinski)

* SOLR-2692: contrib/clustering: Typo in param name fixed: "carrot.fragzise" changed to "carrot.fragSize" (Stanislaw Osinski).

* SOLR-2644: When using DIH with threads=2 the default logging is set too high (Bill Bell via shalin)

* SOLR-2492: DIH does not commit if only deletes are processed (James Dyer via shalin)

* SOLR-2186: DataImportHandler's multi-threaded option throws NPE (Lance Norskog, Frank Wesemann, shalin)

* SOLR-2655: DIH multi threaded mode does not resolve attributes correctly (Frank Wesemann, shalin)

* SOLR-2695: DIH: Documents are collected in unsynchronized list in multi-threaded debug mode (Michael McCandless, shalin)

* SOLR-2668: DIH multithreaded mode does not rollback on errors from EntityProcessor (Frank Wesemann, shalin)

Other Changes
----------------------

* SOLR-2629: Eliminate deprecation warnings in some JSPs. (Bernd Fehling, hossman)

* SOLR-2743: Remove commons logging from contrib/extraction. (koji)

Build
----------------------

* SOLR-2452,SOLR-2653,LUCENE-3323,SOLR-2659,LUCENE-3329,SOLR-2666: Rewrote the Solr build system:
  - Integrated more fully with the Lucene build system: generalized the Lucene build system and eliminated duplication.
  - Converted all Solr contribs to the Lucene/Solr conventional src/ layout: java/, resources/, test/, and test-files/<contrib-name>.
  - Created a new Solr-internal module named "core" by moving the java/, test/, and test-files/ directories from solr/src/ to solr/core/src/.
  - Merged solr/src/webapp/src/ into solr/core/src/java/.
  - Eliminated solr/src/ by moving all its directories up one level; renamed solr/src/site/ to solr/site-src/ because solr/site/ already exists.
  - Merged solr/src/common/ into solr/solrj/src/java/.
  - Moved o.a.s.client.solrj.* and o.a.s.common.* tests from solr/src/test/ to solr/solrj/src/test/.
  - Made the solrj tests not depend on the solr core tests by moving some classes from solr/src/test/ to solr/test-framework/src/java/.
  - Each internal module (core/, solrj/, test-framework/, and webapp/) now has its own build.xml, from which it is possible to run module-specific targets. solr/build.xml delegates all build tasks (via <ant dir="internal-module-dir"> calls) to these modules' build.xml files.
  (Steve Rowe, Robert Muir)

* LUCENE-3406: Add ant target 'package-local-src-tgz' to Lucene and Solr to package sources from the local working copy. (Seung-Yeoul Yang via Steve Rowe)

Documentation
----------------------

================== 3.3.0 ==================

Upgrading from Solr 3.2.0
----------------------

* SolrCore's CloseHook API has been changed in a backward-incompatible way. It has been changed from an interface to an abstract class. Any custom components which use the SolrCore.addCloseHook method will need to be modified accordingly. To migrate, put your old CloseHook#close impl into CloseHook#preClose.

New Features
----------------------

* SOLR-2378: A new, automaton-based, implementation of suggest (autocomplete) component, offering an order of magnitude smaller memory consumption compared to ternary trees and jaspell and very fast lookups at runtime. (Dawid Weiss)

* SOLR-2400: Field- and DocumentAnalysisRequestHandler now provide a position history for each token, so you can follow the token through all analysis stages. The output contains a separate int[] attribute containing all positions from previous Tokenizers/TokenFilters (called "positionHistory"). (Uwe Schindler)

* SOLR-2524: (SOLR-236, SOLR-237, SOLR-1773, SOLR-1311) Grouping / Field collapsing using the Lucene grouping contrib. The search result can be grouped by field and query.
  (Martijn van Groningen, Emmanuel Keller, Shalin Shekhar Mangar, Koji Sekiguchi, Iván de Prado, Ryan McKinley, Marc Sturlese, Peter Karich, Bojan Smid, Charles Hornberger, Dieter Grad, Dmitry Lihachev, Doug Steigerwald, Karsten Sperling, Michael Gundlach, Oleg Gnatovskiy, Thomas Traeger, Harish Agarwal, yonik, Michael McCandless, Bill Bell)

* SOLR-1331 -- Added a srcCore parameter to CoreAdminHandler's mergeindexes action to merge one or more cores' indexes to a target core (shalin)

* SOLR-2610 -- Add an option to delete index through CoreAdmin UNLOAD action (shalin)

* SOLR-2480: Add ignoreTikaException flag to the extraction request handler so that users can ignore TikaException but index meta data. (Shinichiro Abe, koji)

* SOLR-2582: Use uniqueKey for error log in UIMAUpdateRequestProcessor. (Tommaso Teofili via koji)

Optimizations
----------------------

* SOLR-2567: Solr now defaults to TieredMergePolicy. See http://s.apache.org/merging for more information. (rmuir)

Bug Fixes
----------------------

* SOLR-2519: Improve text_* fieldTypes in example schema.xml: improve cross-language defaults for text_general; break out separate English-specific fieldTypes (Jan Høydahl, hossman, Robert Muir, yonik, Mike McCandless)

* SOLR-2462: Fix extremely high memory usage problems with spellcheck.collate. Separately, an additional spellcheck.maxCollationEvaluations (default=10000) parameter is added to avoid excessive CPU time in extreme cases (e.g. long queries with many misspelled words). (James Dyer via rmuir)

* SOLR-2579: UIMAUpdateRequestProcessor ignore error fails if text.length() < 100. (Elmer Garduno via koji)

* SOLR-2581: UIMAToSolrMapper wrongly instantiates Type with reflection.
  (Tommaso Teofili via koji)

* SOLR-2551: Check dataimport.properties for write access (if delta-import is supported in DIH configuration) before starting an import (C S, shalin)

Other Changes
----------------------

* SOLR-2571: Add a commented out example of the spellchecker's thresholdTokenFrequency parameter to the example solrconfig.xml, and also add a unit test for this feature. (James Dyer via rmuir)

* SOLR-2576: Deprecate SpellingResult.add(Token token, int docFreq), please use SpellingResult.addFrequency(Token token, int docFreq) instead. (James Dyer via rmuir)

* SOLR-2574: Upgrade slf4j to v1.6.1 (shalin)

* LUCENE-3204: The maven-ant-tasks jar is now included in the source tree; users of the generate-maven-artifacts target no longer have to manually place this jar in the Ant classpath. NOTE: when Ant looks for the maven-ant-tasks jar, it looks first in its pre-existing classpath, so any copies it finds will be used instead of the copy included in the Lucene/Solr source tree. For this reason, it is recommended to remove any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under ~/.ant/lib/ or under the Ant installation's lib/ directory. (Steve Rowe)

* SOLR-2611: Fix typos in the example configuration (Eric Pugh via rmuir)

================== 3.2.0 ==================

Versions of Major Components
---------------------
Apache Lucene trunk
Apache Tika 0.8
Carrot2 3.4.2

Upgrading from Solr 3.1
----------------------

* The updateRequestProcessorChain for a RequestHandler is now defined with update.chain rather than update.processor. The latter still works, but has been deprecated.

* <uimaConfig/> just beneath <config> ... </config> is no longer supported. It should move to UIMAUpdateRequestProcessorFactory setting. See contrib/uima/README.txt for more details. (SOLR-2436)

Detailed Change List
----------------------

New Features
----------------------

* SOLR-2496: Add ability to specify overwrite and commitWithin as request parameters (e.g.
  specified in the URL) when using the JSON update format, and added a simplified format for specifying multiple documents. Example: [{"id":"doc1"},{"id":"doc2"}] (yonik)

* SOLR-2113: Add TermQParserPlugin, registered as "term". This is useful when generating filter queries from terms returned from field faceting or the terms component. Example: fq={!term f=weight}1.5 (hossman, yonik)

* SOLR-1915: DebugComponent now supports using a NamedList to model Explanation objects in its responses instead of Explanation.toString (hossman)

* SOLR-2448: Search results clustering updates: bisecting k-means clustering algorithm added, loading of Carrot2 stop words from <solr.home>/conf/carrot2 (SOLR-2449), using Solr's stopwords.txt for clustering (SOLR-2450), output of cluster scores (SOLR-2505) (Stanislaw Osinski, Dawid Weiss).

* SOLR-2503: extend UIMAUpdateRequestProcessorFactory mapping function to map feature value to dynamicField. (koji)

* SOLR-2512: add ignoreErrors flag to UIMAUpdateRequestProcessorFactory so that users can ignore exceptions in AE. (Tommaso Teofili, koji)

Optimizations
----------------------

Bug Fixes
----------------------

* SOLR-2445: Change the default qt to blank in form.jsp, because there is no "standard" request handler unless you have it in your solrconfig.xml explicitly. (koji)

* SOLR-2455: Prevent double submit of forms in admin interface. (Jeffrey Chang via uschindler)

* SOLR-2464: Fix potential slowness in QueryValueSource (the query() function) when the query is very sparse and may not match any documents in a segment. (yonik)

* SOLR-2469: When using java replication with replicateAfter=startup, the first commit point on server startup is never removed. (yonik)

* SOLR-2466: SolrJ's CommonsHttpSolrServer would retry requests on failure, regardless of the configured maxRetries, due to HttpClient having its own retry mechanism by default. The retryCount of HttpClient is now set to 0, and SolrJ does the retry.
  (yonik)

* SOLR-2409: edismax parser - treat the text of a fielded query as a literal if the fieldname does not exist. For example Mission: Impossible should not search on the "Mission" field unless it's a valid field in the schema. (Ryan McKinley, yonik)

* SOLR-2403: facet.sort=index reported incorrect results for distributed search in a number of scenarios when facet.mincount>0. This patch also adds some performance/algorithmic improvements when (facet.sort=count && facet.mincount=1 && facet.limit=-1) and when (facet.sort=index && facet.mincount>0) (yonik)

* SOLR-2333: The "rename" core admin action does not persist the new name to solr.xml (Rasmus Hahn, Paul R. Brown via Mark Miller)

* SOLR-2390: Performance of usePhraseHighlighter is terrible on very large Documents, regardless of hl.maxDocCharsToAnalyze. (Mark Miller)

* SOLR-2474: The helper TokenStreams in analysis.jsp and AnalysisRequestHandlerBase did not clear all attributes so they displayed incorrect attribute values for tokens in later filter stages. (uschindler, rmuir, yonik)

* SOLR-2467: Fix <analyzer class="..." /> initialization so any errors are logged properly. (hossman)

* SOLR-2493: SolrQueryParser was fixed to not parse the SolrConfig DOM tree on each instantiation which is a huge slowdown. (Stephane Bailliez via uschindler)

* SOLR-2495: The JSON parser could hang on corrupted input and could fail to detect numbers that were too large to fit in a long. (yonik)

* SOLR-2520: Make JSON response format escape \u2029 as well as \u2028 in strings since those characters are not valid in javascript strings (although they are valid in JSON strings). (yonik)

* SOLR-2536: Add ReloadCacheRequestHandler to fix ExternalFileField bug (if reopenReaders set to true and no index segments have been changed, commit cannot trigger reload external file). (koji)

* SOLR-2539: VectorValueSource.floatVal incorrectly used byteVal on sub-sources.
(Tom Liu via yonik) * SOLR-2554: RandomSortField didn't work when used in a function query. (yonik) Other Changes ---------------------- * SOLR-2061: Pull base tests out into a new Solr Test Framework module, and publish binary, javadoc, and source test-framework jars. (Drew Farris, Robert Muir, Steve Rowe) * SOLR-2105: Rename RequestHandler param 'update.processor' to 'update.chain'. (Jan Høydahl via Mark Miller) * SOLR-2485: Deprecate BaseResponseWriter, GenericBinaryResponseWriter, and GenericTextResponseWriter. These classes will be removed in 4.0. (ryan) * SOLR-2451: Enhance assertJQ to allow individual tests to specify the tolerance delta used in numeric equalities. This allows for slight variance in asserting score comparisons in unit tests. (David Smiley, Chris Hostetter) * SOLR-2528: Remove default="true" from HtmlEncoder in example solrconfig.xml, because html encoding confuses non-ascii users. (koji) * SOLR-2387: add mock annotators for improved testing in contrib/uima, (Tommaso Teofili via rmuir) * SOLR-2436: move uimaConfig to under the uima's update processor in solrconfig.xml. (Tommaso Teofili, koji) Build ---------------------- * LUCENE-3006: Building javadocs will fail on warnings by default. Override with -Dfailonjavadocwarning=false (sarowe, gsingers) Documentation ---------------------- ================== 3.1.0 ================== Versions of Major Components --------------------- Apache Lucene 3.1.0 Apache Tika 0.8 Carrot2 3.4.2 Velocity 1.6.1 and Velocity Tools 2.0-beta3 Apache UIMA 2.3.1-SNAPSHOT Upgrading from Solr 1.4 ---------------------- * The Lucene index format has changed and as a result, once you upgrade, previous versions of Solr will no longer be able to read your indices. In a master/slave configuration, all searchers/slaves should be upgraded before the master. If the master were to be updated first, the older searchers would not be able to read the new index format. * The Solr JavaBin format has changed as of Solr 3.1. 
If you are using the JavaBin format, you will need to upgrade your SolrJ client. (SOLR-2034) * The experimental ALIAS command has been removed (SOLR-1637) * Using solr.xml is recommended for single cores also (SOLR-1621) * Old syntax of <highlighting> configuration in solrconfig.xml is deprecated (SOLR-1696) * The deprecated HTMLStripReader, HTMLStripWhitespaceTokenizerFactory and HTMLStripStandardTokenizerFactory were removed. To strip HTML tags, HTMLStripCharFilter should be used instead, and it works with any Tokenizer of your choice. (SOLR-1657) * Field compression is no longer supported. Fields that were formerly compressed will be uncompressed as index segments are merged. For shorter fields, this may actually be an improvement, as the compression used was not very good for short text. Some indexes may get larger though. * SOLR-1845: The TermsComponent response format was changed so that the "terms" container is a map instead of a named list. This affects response formats like JSON, but not XML. (yonik) * SOLR-1876: All Analyzers and TokenStreams are now final to enforce the decorator pattern. (rmuir, uschindler) * LUCENE-2608: Added the ability to specify the accuracy on a per request basis. It is recommended that implementations of SolrSpellChecker should change over to the new SolrSpellChecker methods using the new SpellingOptions class, but are not required to. While this change is backward compatible, the trunk version of Solr has already dropped support for all but the SpellingOptions method. (gsingers) * readercycle script was removed. (SOLR-2046) * In previous releases, sorting or evaluating function queries on fields that were "multiValued" (either by explicit declaration in schema.xml or by implicit behavior because the "version" attribute on the schema was less than 1.2) did not generally work, but it would sometimes silently act as if it succeeded and order the docs arbitrarily.
Solr will now fail on any attempt to sort, or apply a function to, multi-valued fields. * The DataImportHandler jars are no longer included in the solr WAR and should be added in Solr's lib directory, or referenced via the <lib> directive in solrconfig.xml. Detailed Change List ---------------------- New Features ---------------------- * SOLR-1302: Added several new distance based functions, including Great Circle (haversine), Manhattan, Euclidean and String (using the StringDistance methods in the Lucene spellchecker). Also added geohash(), deg() and rad() convenience functions. See http://wiki.apache.org/solr/FunctionQuery. (gsingers) * SOLR-1553: New dismax parser implementation (accessible as "edismax") that supports full lucene syntax, improved reserved char escaping, fielded queries, improved proximity boosting, and improved stopword handling. Note: status is experimental for now. (yonik) * SOLR-1574: Add many new functions from java Math (e.g. sin, cos) (yonik) * SOLR-1569: Allow functions to take in literal strings by modifying the FunctionQParser and adding LiteralValueSource (gsingers) * SOLR-1571: Added unicode collation support through Lucene's CollationKeyFilter (Robert Muir via shalin) * SOLR-785: Distributed Search support for SpellCheckComponent (Matthew Woytowitz, shalin) * SOLR-1625: Add regexp support for TermsComponent (Uri Boness via noble) * SOLR-1297: Add sort by Function capability (gsingers, yonik) * SOLR-1139: Add TermsComponent Query and Response Support in SolrJ (Matt Weber via shalin) * SOLR-1177: Distributed Search support for TermsComponent (Matt Weber via shalin) * SOLR-1621, SOLR-1722: Allow current single core deployments to be specified by solr.xml (Mark Miller, noble) * SOLR-1532: Allow StreamingUpdateSolrServer to use a provided HttpClient (Gabriele Renzi via shalin) * SOLR-1653: Add PatternReplaceCharFilter (koji) * SOLR-1131: FieldTypes can now output multiple Fields per Type and still be searched.
This can be handy for hiding the details of a particular implementation such as in the spatial case. (Chris Mattmann, shalin, noble, gsingers, yonik) * SOLR-1586: Add support for Geohash and Spatial Tile FieldType (Chris Mattmann, gsingers) * SOLR-1697: PluginInfo should load plugins w/o class attribute also (noble) * SOLR-1268: Incorporate FastVectorHighlighter (koji) * SOLR-1750: SolrInfoMBeanHandler added for simpler programmatic access to info currently available from registry.jsp and stats.jsp (ehatcher, hossman) * SOLR-1815: SolrJ now preserves the order of facet queries. (yonik) * SOLR-1677: Add support for choosing the Lucene Version for Lucene components within Solr. (Uwe Schindler, Mark Miller) * SOLR-1379: Add RAMDirectoryFactory for non-persistent in memory index storage. (Alex Baranov via yonik) * SOLR-1857: Synced Solr analysis with Lucene 3.1. Added KeywordMarkerFilterFactory and StemmerOverrideFilterFactory, which can be used to tune stemming algorithms. Added factories for Bulgarian, Czech, Hindi, Turkish, and Wikipedia analysis. Improved the performance of SnowballPorterFilterFactory. (rmuir) * SOLR-1657: Converted remaining TokenStreams to the Attributes-based API. All Solr TokenFilters now support custom Attributes, and some have improved performance: especially WordDelimiterFilter and CommonGramsFilter. (rmuir, cmale, uschindler) * SOLR-1740: ShingleFilterFactory supports the "minShingleSize" and "tokenSeparator" parameters for controlling the minimum shingle size produced by the filter, and the separator string that it uses, respectively. (Steven Rowe via rmuir) * SOLR-744: ShingleFilterFactory supports the "outputUnigramsIfNoShingles" parameter, to output unigrams if the number of input tokens is fewer than minShingleSize, and no shingles can be generated. (Chris Harris via Steven Rowe) * SOLR-1923: PhoneticFilterFactory now has support for the Caverphone algorithm. (rmuir) * SOLR-1957: The VelocityResponseWriter contrib moved to core. 
Example search UI now available at http://localhost:8983/solr/browse (ehatcher) * SOLR-1974: Add LimitTokenCountFilterFactory. (koji) * SOLR-1966: QueryElevationComponent can now return just the included results in the elevation file (gsingers, yonik) * SOLR-1556: TermVectorComponent now supports per field overrides. Also, it now throws an error if passed in fields do not exist and warnings if fields that do not have term vector options (termVectors, offsets, positions) that align with the schema declaration. It also will now return warnings about (gsingers) * SOLR-1985: FastVectorHighlighter: add wrapper class for Lucene's SingleFragListBuilder (koji) * SOLR-1984: Add HyphenationCompoundWordTokenFilterFactory. (PB via rmuir) * SOLR-397: Date Faceting now supports a "facet.date.include" param for specifying when the upper & lower end points of computed date ranges should be included in the range. Legal values are: "all", "lower", "upper", "edge", and "outer". For backwards compatibility the default value is the set: [lower,upper,edge], so that all ranges between start and end are inclusive of their endpoints, but the "before" and "after" ranges are not. * SOLR-945: JSON update handler that accepts add, delete, commit commands in JSON format. (Ryan McKinley, yonik) * SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField. autoGeneratePhraseQueries="true" (the default) causes the query parser to generate phrase queries if multiple tokens are generated from a single non-quoted analysis string. For example WordDelimiterFilter splitting text:pdp-11 will cause the parser to generate text:"pdp 11" rather than (text:PDP OR text:11). Note that autoGeneratePhraseQueries="true" tends to not work well for non whitespace delimited languages. (yonik) * SOLR-1925: Add CSVResponseWriter (use wt=csv) that returns the list of documents in CSV format. (Chris Mattmann, yonik) * SOLR-1240: "Range Faceting" has been added. 
This is a generalization of the existing "Date Faceting" logic so that it now supports all stock numeric field types that support range queries in addition to dates. facet.date is now deprecated in favor of this generalized mechanism. (Gijs Kunze, hossman) * SOLR-2021: Add SolrEncoder plugin to Highlighter. (koji) * SOLR-2030: Make FastVectorHighlighter use of SolrEncoder. (koji) * SOLR-2053: Add support for custom comparators in Solr spellchecker, per LUCENE-2479 (gsingers) * SOLR-2049: Add hl.multiValuedSeparatorChar for FastVectorHighlighter, per LUCENE-2603. (koji) * SOLR-2059: Add "types" attribute to WordDelimiterFilterFactory, which allows you to customize how WordDelimiterFilter tokenizes text with a configuration file. (Peter Karich, rmuir) * SOLR-2099: Add ability to throttle rsync based replication using rsync option --bwlimit. (Brandon Evans via koji) * SOLR-1316: Create autosuggest component. (Ankul Garg, Jason Rutherglen, Shalin Shekhar Mangar, Grant Ingersoll, Robert Muir, ab) * SOLR-1568: Added "native" filtering support for PointType, GeohashField. Added LatLonType with filtering support too. See http://wiki.apache.org/solr/SpatialSearch and the example. Refactored some items in Lucene spatial. Removed SpatialTileField as the underlying CartesianTier is broken beyond repair and is going to be moved. (gsingers) * SOLR-2128: Full parameter substitution for function queries. Example: q=add($v1,$v2)&v1=mul(popularity,5)&v2=20.0 (yonik) * SOLR-2133: Function query parser can now parse multiple comma separated value sources. It also now fails if there is extra unexpected text after parsing the functions, instead of silently ignoring it. This allows expressions like q=dist(2,vector(1,2),$pt)&pt=3,4 (yonik) * SOLR-2157: Suggester should return alpha-sorted results when onlyMorePopular=false (ab) * SOLR-2010: Added ability to verify that spell checking collations have actual results in the index.
(James Dyer via gsingers) * SOLR-2188: Added "maxTokenLength" argument to the factories for ClassicTokenizer, StandardTokenizer, and UAX29URLEmailTokenizer. (Steven Rowe) * SOLR-2129: Added a Solr module for dynamic metadata extraction/indexing with Apache UIMA. See contrib/uima/README.txt for more information. (Tommaso Teofili via rmuir) * SOLR-2325: Allow tagging and exclusion of main query for faceting. (yonik) * SOLR-2263: Add ability for RawResponseWriter to stream binary files as well as text files. (Eric Pugh via yonik) * SOLR-860: Add debug output for MoreLikeThis. (koji) * SOLR-1057: Add PathHierarchyTokenizerFactory. (ryan, koji) * SOLR-1804: Re-enabled clustering component on trunk, updated to latest version of Carrot2. No more LGPL run-time dependencies. This release of C2 also does not have a specific Lucene dependency. (Stanislaw Osinski, gsingers) * SOLR-2282: Add distributed search support for search result clustering. (Brad Giaccio, Dawid Weiss, Stanislaw Osinski, rmuir, koji) * SOLR-2210: Add icu-based tokenizer and filters to contrib/analysis-extras (rmuir) * SOLR-1336: Add SmartChinese (word segmentation for Simplified Chinese) tokenizer and filters to contrib/analysis-extras (rmuir) * SOLR-2211,LUCENE-2763: Added UAX29URLEmailTokenizerFactory, which implements UAX#29, a unicode algorithm with good results for most languages, as well as URL and E-mail tokenization according to the relevant RFCs. 
(Tom Burton-West via rmuir) * SOLR-2237: Added StempelPolishStemFilterFactory to contrib/analysis-extras (rmuir) * SOLR-1525: allow DIH to refer to core properties (noble) * SOLR-1547: DIH TemplateTransformer copy objects more intelligently when the template is a single variable (noble) * SOLR-1627: DIH VariableResolver should be fetched just in time (noble) * SOLR-1583: DIH Create DataSources that return InputStream (noble) * SOLR-1358: Integration of Tika and DataImportHandler (Akshay Ukey, noble) * SOLR-1654: TikaEntityProcessor example added DIHExample (Akshay Ukey via noble) * SOLR-1678: Move onError handling to DIH framework (noble) * SOLR-1352: Multi-threaded implementation of DIH (noble) * SOLR-1721: Add explicit option to run DataImportHandler in synchronous mode (Alexey Serba via noble) * SOLR-1737: Added FieldStreamDataSource (noble) Optimizations ---------------------- * SOLR-1679: Don't build up string messages in SolrCore.execute unless they are necessary for the current log level. (Fuad Efendi and hossman) * SOLR-1874: Optimize PatternReplaceFilter for better performance. (rmuir, uschindler) * SOLR-1968: speed up initial filter cache population for facet.method=enum and also big terms for multi-valued facet.method=fc. The resulting speedup for the first facet request is anywhere from 30% to 32x, depending on how many terms are in the field and how many documents match per term. (yonik) * SOLR-2089: Speed up UnInvertedField faceting (facet.method=fc for multi-valued fields) when facet.limit is both high, and a high enough percentage of the number of unique terms in the field. Extreme cases yield speedups over 3x. (yonik) * SOLR-2046: add common functions to scripts-util. (koji) * SOLR-1684: Switch clustering component to use the SolrIndexSearcher.doc(int, Set<String>) method b/c it can use the document cache (gsingers) * SOLR-2200: Improve the performance of DataImportHandler for large delta-import updates. 
(Mark Waddle via rmuir) Bug Fixes ---------------------- * SOLR-1769: Solr 1.4 Replication - Repeater throwing NullPointerException (Jörgen Rydenius via noble) * SOLR-1432: Make the new ValueSource.getValues(context,reader) delegate to the original ValueSource.getValues(reader) so custom sources will work. (yonik) * SOLR-1572: FastLRUCache correctly implemented the LRU policy only for the first 2B accesses. (yonik) * SOLR-1582: copyField was ignored for BinaryField types (gsingers) * SOLR-1563: Binary fields, including trie-based numeric fields, caused null pointer exceptions in the luke request handler. (yonik) * SOLR-1577: The example solrconfig.xml defaulted to a solr data dir relative to the current working directory, even if a different solr home was being used. The new behavior changes the default to a zero length string, which is treated the same as if no dataDir had been specified, hence the "data" directory under the solr home will be used. (yonik) * SOLR-1584: SolrJ - SolrQuery.setIncludeScore() incorrectly added fl=score to the parameter list instead of appending score to the existing field list. (yonik) * SOLR-1580: Solr Configuration ignores 'mergeFactor' parameter, always uses Lucene default. (Lance Norskog via Mark Miller) * SOLR-1593: ReverseWildcardFilter didn't work for surrogate pairs (i.e. code points outside of the BMP), resulting in incorrect matching. This change requires reindexing for any content with such characters. (Robert Muir, yonik) * SOLR-1596: A rollback operation followed by the shutdown of Solr or the close of a core resulted in a warning: "SEVERE: SolrIndexWriter was not closed prior to finalize()" although there were no other consequences. (yonik) * SOLR-1595: StreamingUpdateSolrServer used the platform default character set when streaming updates, rather than using UTF-8 as the HTTP headers indicated, leading to an encoding mismatch. 
(hossman, yonik) * SOLR-1587: A distributed search request with fl=score, didn't match the behavior of a non-distributed request since it only returned the id,score fields instead of all fields in addition to score. (yonik) * SOLR-1601: Schema browser does not indicate presence of charFilter. (koji) * SOLR-1615: Backslash escaping did not work in quoted strings for local param arguments. (Wojtek Piaseczny, yonik) * SOLR-1628: log contains incorrect number of adds and deletes. (Thijs Vonk via yonik) * SOLR-343: Date faceting now respects facet.mincount limiting (Uri Boness, Raiko Eckstein via hossman) * SOLR-1624: Highlighter only highlights values from the first field value in a multivalued field when term positions (term vectors) are stored. (Chris Harris via yonik) * SOLR-1635: Fixed error message when numeric values can't be parsed by DOMUtils - notably for plugin init params in solrconfig.xml. (hossman) * SOLR-1651: Fixed Incorrect dataimport handler package name in SolrResourceLoader (Akshay Ukey via shalin) * SOLR-1660: CapitalizationFilter crashes if you use the maxWordCountOption (Robert Muir via shalin) * SOLR-1667: PatternTokenizer does not reset attributes such as positionIncrementGap (Robert Muir via shalin) * SOLR-1711: SolrJ - StreamingUpdateSolrServer had a race condition that could halt the streaming of documents. The original patch to fix this (never officially released) introduced another hanging bug due to connections not being released. (Attila Babo, Erik Hetzner, Johannes Tuchscherer via yonik) * SOLR-1748, SOLR-1747, SOLR-1746, SOLR-1745, SOLR-1744: Streams and Readers retrieved from ContentStreams are not closed in various places, resulting in file descriptor leaks. 
(Christoff Brill, Mark Miller) * SOLR-1753: StatsComponent throws NPE when getting statistics for facets in distributed search (Janne Majaranta via koji) * SOLR-1736: In the slave, if moving a file does not succeed, copy the file (noble) * SOLR-1579: Fixes to XML escaping in stats.jsp (David Bowen and hossman) * SOLR-1777: fieldTypes with sortMissingLast=true or sortMissingFirst=true can result in incorrectly sorted results. (yonik) * SOLR-1798: Small memory leak (~100 bytes) in FastLRUCache for every commit. (yonik) * SOLR-1823: Fixed XMLResponseWriter (via XMLWriter) so it no longer throws a ClassCastException when a Map containing a non-String key is used. (Frank Wesemann, hossman) * SOLR-1797: fix ConcurrentModificationException and potential memory leaks in ResourceLoader. (yonik) * SOLR-1850: change KeepWordFilter so a new word set is not created for each instance (John Wang via yonik) * SOLR-1706: fixed WordDelimiterFilter for certain combinations of options where it would output incorrect tokens. (Robert Muir, Chris Male) * SOLR-1936: The JSON response format needed to escape unicode code point U+2028 - 'LINE SEPARATOR' (Robert Hofstra, yonik) * SOLR-1914: Change the JSON response format to output float/double values of NaN,Infinity,-Infinity as strings. (yonik) * SOLR-1948: PatternTokenizerFactory should use parent's args (koji) * SOLR-1870: Indexing documents using the 'javabin' format no longer fails with a ClassCastException when SolrInputDocuments contain field values which are Collections or other classes that implement Iterable. (noble, hossman) * SOLR-1981: Solr will now fail correctly if solr.xml attempts to specify multiple cores that have the same name (hossman) * SOLR-1791: Fix messed up core names on admin gui (yonik via koji) * SOLR-1995: Change date format from "hour in am/pm" to "hour in day" in CoreContainer and SnapShooter.
(Hayato Ito, koji) * SOLR-2008: avoid possible RejectedExecutionException w/autoCommit by making SolrCore close the UpdateHandler before closing the SearchExecutor. (NarasimhaRaju, hossman) * SOLR-2036: Avoid expensive fieldCache ram estimation for the admin stats page. (yonik) * SOLR-2047: ReplicationHandler should accept bool type for enable flag. (koji) * SOLR-1630: Fix spell checking collation issue related to token positions (rmuir, gsingers) * SOLR-2100: The replication handler backup command didn't save the commit point and hence could fail when a newer commit caused the older commit point to be removed before it was finished being copied. This did not affect normal master/slave replication. (Peter Sturge via yonik) * SOLR-2114: Fixed parsing error in hsin function. The function signature has changed slightly. (gsingers) * SOLR-2083: SpellCheckComponent misreports suggestions when distributed (James Dyer via gsingers) * SOLR-2111: Change exception handling in distributed faceting to work more like non-distributed faceting, change facet_counts/exception from a String to a List<String> to enable listing all exceptions that happened, and prevent an exception in one facet command from affecting another facet command. (yonik) * SOLR-2110: Remove the restriction on names for local params substitution/dereferencing. Properly encode local params in distributed faceting. (yonik) * SOLR-2135: Fix behavior of ConcurrentLRUCache when asking for getLatestAccessedItems(0) or getOldestAccessedItems(0). (David Smiley via hossman) * SOLR-2148: Highlighter doesn't support q.alt. (koji) * SOLR-2180: It was possible for EmbeddedSolrServer to leave searchers open if a request threw an exception. (yonik) * SOLR-2173: Suggester should always rebuild Lookup data if Lookup.load fails. (ab) * SOLR-2081: BaseResponseWriter.isStreamingDocs causes SingleResponseWriter.end to be called 2x (Chris A.
Mattmann via hossman) * SOLR-2219: The init() method of every SolrRequestHandler was being called twice. (ambikeshwar singh and hossman) * SOLR-2285: duplicate SolrEventListeners no longer created (hossman) * SOLR-1993: fix String cast assumption in JavaBinCodec - specific addresses "commitWithin" option on Update requests. (noble, hossman, and Maxim Valyanskiy) * SOLR-2261: fix velocity template layout.vm that referred to an older version of jquery. (Eric Pugh via rmuir) * SOLR-2307: fix bug in PHPSerializedResponseWriter (wt=phps) when dealing with SolrDocumentList objects -- ie: sharded queries. (Antonio Verni via hossman) * SOLR-2127: Fixed serialization of default core and indentation of solr.xml when serializing. (Ephraim Ofir, Mark Miller) * SOLR-2320: Fixed ReplicationHandler detail reporting for masters (hossman) * SOLR-482: Provide more exception handling in CSVLoader (gsingers) * SOLR-1283: HTMLStripCharFilter sometimes threw a "Mark Invalid" exception. (Julien Coloos, hossman, yonik) * SOLR-2085: Improve SolrJ behavior when FacetComponent comes before QueryComponent (Tomas Salfischberger via hossman) * SOLR-1940: Fix SolrDispatchFilter behavior when Content-Type is unknown (Lance Norskog and hossman) * SOLR-1983: snappuller fails when modifiedConfFiles is not empty and full copy of index is needed. (Alexander Kanarsky via yonik) * SOLR-2156: SnapPuller fails to clean Old Index Directories on Full Copy (Jayendra Patil via yonik) * SOLR-96: Fix XML parsing in XMLUpdateRequestHandler and DocumentAnalysisRequestHandler to respect charset from XML file and only use HTTP header's "Content-Type" as a "hint". (uschindler) * SOLR-2339: Fix sorting to explicitly generate an error if you attempt to sort on a multiValued field. (hossman) * SOLR-2348: Fix field types to explicitly generate an error if you attempt to get a ValueSource for a multiValued field. 
(hossman) * SOLR-2380: Distributed faceting could miss values when facet.sort=index and when facet.offset was greater than 0. (yonik) * SOLR-1656: XIncludes and other HREFs in XML files loaded by ResourceLoader are fixed to be resolved using the URI standard (RFC 2396). The system identifier is no longer a plain filename with path; it is initialized using a custom URI scheme "solrres:". This scheme is resolved using an EntityResolver that utilizes ResourceLoader (org.apache.solr.common.util.SystemIdResolver). This makes all relative paths in Solr's config files behave as expected. This change introduces some backwards breaks in the API: Some config classes (Config, SolrConfig, IndexSchema) were changed to take org.xml.sax.InputSource instead of InputStream. There may also be some backwards breaks in existing config files; it is recommended to check your config files / XSLTs and replace all XIncludes/HREFs that were hacked to use absolute paths to use relative ones. (uschindler) * SOLR-309: Fix FieldType so setting an analyzer on a FieldType that doesn't expect it will generate an error. Practically speaking this means that Solr will now correctly generate an error on initialization if the schema.xml contains an analyzer configuration for a fieldType that does not use TextField. (hossman) * SOLR-2192: StreamingUpdateSolrServer.blockUntilFinished was not thread safe and could throw an exception. (yonik) * SOLR-1692: Fix bug in clustering component relating to carrot.produceSummary option (gsingers) * SOLR-1756: The date.format setting for extraction request handler causes ClassCastException when enabled and the config code that parses this setting does not properly use the same iterator instance.
(Christoph Brill, Mark Miller) * SOLR-1638: Fixed NullPointerException during DIH import if uniqueKey is not specified in schema (Akshay Ukey via shalin) * SOLR-1639: Fixed misleading error message when dataimport.properties is not writable (shalin) * SOLR-1598: DIH: Reader used in PlainTextEntityProcessor is not explicitly closed (Sascha Szott via noble) * SOLR-1759: DIH: $skipDoc was not working correctly (Gian Marco Tagliani via noble) * SOLR-1762: DIH: DateFormatTransformer does not work correctly with non-default locale dates (tommy chheng via noble) * SOLR-1757: DIH multithreading sometimes throws NPE (noble) * SOLR-1766: DIH with threads enabled doesn't respond to the abort command (Michael Henson via noble) * SOLR-1767: dataimporter.functions.escapeSql() does not escape backslash character (Sean Timm via noble) * SOLR-1811: formatDate should use the current NOW value always (Sean Timm via noble) * SOLR-1794: Dataimport of CLOB fields fails when getCharacterStream() is defined in a superclass. (Gunnar Gauslaa Bergem via rmuir) * SOLR-2057: DataImportHandler never calls UpdateRequestProcessor.finish() (Drew Farris via koji) * SOLR-1973: Empty fields in XML update messages confuse DataImportHandler. (koji) * SOLR-2221: Use StrUtils.parseBool() to get values of boolean options in DIH. true/on/yes (for TRUE) and false/off/no (for FALSE) can be used for sub-options (debug, verbose, synchronous, commit, clean, optimize) for full/delta-import commands. (koji) * SOLR-2310: DIH: getTimeElapsedSince() returns incorrect hour value when the elapse is over 60 hours (tom liu via koji) * SOLR-2252: DIH: When a child entity in nested entities is rootEntity="true", delta-import doesn't work. (koji) * SOLR-2330: solrconfig.xml files in example-DIH are broken. (Matt Parker, koji) * SOLR-1191: resolve DataImportHandler deltaQuery column against pk when pk has a prefix (e.g. pk="book.id" deltaQuery="select id from ..."). 
More useful error reporting when no match found (previously failed with a NullPointerException in log and no clear user feedback). (gthb via yonik) * SOLR-2116: Fix TikaConfig classloader bug in TikaEntityProcessor (Martijn van Groningen via hossman) Other Changes ---------------------- * SOLR-1602: Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there (Chris A. Mattmann, ryan, hoss) * SOLR-1516: Addition of an abstract BaseResponseWriter class to simplify the development of QueryResponseWriter implementations. (Chris A. Mattmann via noble) * SOLR-1592: Refactor XMLWriter startTag to allow arbitrary attributes to be written (Chris A. Mattmann via noble) * SOLR-1561: Added Lucene 2.9.1 spatial contrib jar to lib. (gsingers) * SOLR-1570: Log warnings if uniqueKey is multi-valued or not stored (hossman, shalin) * SOLR-1558: QueryElevationComponent only works if the uniqueKey field is implemented using StrField. In previous versions of Solr no warning or error would be generated if you attempted to use QueryElevationComponent, it would just fail in unexpected ways. This has been changed so that it will fail with a clear error message on initialization. (hossman) * SOLR-1611: Added Lucene 2.9.1 collation contrib jar to lib (shalin) * SOLR-1608: Extract base class from TestDistributedSearch to make it easy to write test cases for other distributed components. (shalin) * Upgraded to Lucene 2.9-dev r888785 (shalin) * SOLR-1610: Generify SolrCache (Jason Rutherglen via shalin) * SOLR-1637: Remove ALIAS command * SOLR-1662: Added Javadocs in BufferedTokenStream and fixed incorrect cloning in TestBufferedTokenStream (Robert Muir, Uwe Schindler via shalin) * SOLR-1674: Improve analysis tests and cut over to new TokenStream API. (Robert Muir via Mark Miller) * SOLR-1661: Remove adminCore from CoreContainer . 
removed deprecated methods setAdminCore(), getAdminCore() (noble) * SOLR-1704: Google collections moved from clustering to core (noble) * SOLR-1268: Add Lucene 2.9-dev r888785 FastVectorHighlighter contrib jar to lib. (koji) * SOLR-1538: Reordering of object allocations in ConcurrentLRUCache to eliminate (an extremely small) potential for deadlock. (gabriele renzi via hossman) * SOLR-1588: Removed some very old dead code. (Chris A. Mattmann via hossman) * SOLR-1696: Deprecate old <highlighting> syntax and move configuration to HighlightComponent (noble) * SOLR-1727: SolrEventListener should extend NamedListInitializedPlugin (noble) * SOLR-1771: Improved error message when StringIndex cannot be initialized for a function query (hossman) * SOLR-1695: Improved error messages when adding a document that does not contain exactly one value for the uniqueKey field (hossman) * SOLR-1776: DismaxQParser and ExtendedDismaxQParser now use the schema.xml "defaultSearchField" as the default value for the "qf" param instead of failing with an error when "qf" is not specified. (hossman) * SOLR-1851: luceneAutoCommit no longer has any effect - it has been removed (Mark Miller) * SOLR-1865: SolrResourceLoader.getLines ignores Byte Order Markers (BOMs) at the beginning of input files; these are often created by editors such as Windows Notepad. (rmuir, hossman) * SOLR-1938: ElisionFilterFactory will use a default set of French contractions if you do not supply a custom articles file. (rmuir) * SOLR-2003: SolrResourceLoader will report any encoding errors, rather than silently using replacement characters for invalid inputs (blargy via rmuir) * SOLR-1804: Google collections updated to Google Guava (which is a superset of collections and contains bug fixes) (gsingers) * SOLR-2034: Switch to JavaBin codec version 2. Strings are now serialized as the number of UTF-8 bytes, followed by the bytes in UTF-8.
Previously Strings were serialized as the number of UTF-16 chars, followed by the bytes in Modified UTF-8. (hossman, yonik, rmuir) * SOLR-2013: Add mapping-FoldToASCII.txt to example conf directory. (Steven Rowe via koji) * SOLR-2213: Upgrade to jQuery 1.4.3 (Erick Erickson via ryan) * SOLR-1826: Add unit tests for highlighting with termOffsets=true and overlapping tokens. (Stefan Oestreicher via rmuir) * SOLR-2340: Add version infos to message in JavaBinCodec when throwing exception. (koji) * SOLR-2350: Since Solr no longer requires XML files to be in UTF-8 (see SOLR-96) SimplePostTool (aka: post.jar) has been improved to work with files of any mime-type or charset. (hossman) * SOLR-2365: Move DIH jars out of solr.war (David Smiley via yonik) * SOLR-2381: Include a patched version of Jetty (6.1.26 + JETTY-1340) to fix problematic UTF-8 handling for supplementary characters. (Bernd Fehling, uschindler, yonik, rmuir) * SOLR-2391: The preferred Content-Type for XML was changed to application/xml. XMLResponseWriter now only delivers using this type; updating documents and analyzing documents is still supported using text/xml as Content-Type, too. If you have clients that are hardcoded on text/xml as Content-Type, you have to change them. (uschindler, rmuir) * SOLR-2414: All ResponseWriters now use only ServletOutputStreams and wrap their own Writer around it when serializing. This fixes the bug in PHPSerializedResponseWriter that produced wrong string length if the servlet container had a broken UTF-8 encoding that was in fact CESU-8 (see SOLR-1091). The system property to enable the CESU-8 byte counting in PHPSerializedResponseWriter for broken servlet containers was therefore removed and is now ignored if set. Output is always UTF-8. (uschindler, yonik, rmuir) * SOLR-141: Errors and Exceptions are formatted by ResponseWriter.
(Mike Sokolov, Rich Cariens, Daniel Naber, ryan) * SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call * SOLR-1813: Add ICU4j to contrib/extraction libs and add tests for Arabic extraction (Robert Muir via gsingers) * SOLR-1821: Fix TimeZone-dependent test failure in TestEvaluatorBag. (Chris Male via rmuir) * SOLR-2367: Reduced noise in test output by ensuring the properties file can be written. (Gunnlaugur Thor Briem via rmuir) Build ---------------------- * SOLR-1522: Automated release signing process. (gsingers) * SOLR-1891: Make lucene-jars-to-solr fail if copying any of the jars fails, and update clean to remove the jars in that directory (Mark Miller) * LUCENE-2466: Commons-Codec was upgraded from 1.3 to 1.4. (rmuir) * SOLR-2042: Fixed some Maven deps (Drew Farris via gsingers) * LUCENE-2657: Switch from using Maven POM templates to full POMs when generating Maven artifacts (Steven Rowe) Documentation ---------------------- * SOLR-1590: Javadoc for XMLWriter#startTag (Chris A. Mattmann via hossman) * SOLR-1792: Documented peculiar behavior of TestHarness.LocalRequestFactory (hossman) ================== Release 1.4.0 ================== Release Date: See http://lucene.apache.org/solr for the official release date. Upgrading from Solr 1.3 ----------------------- There is a new default faceting algorithm for multiValued fields that should be faster for most cases. One can revert to the previous algorithm (which has also been improved somewhat) by adding facet.method=enum to the request. Searching and sorting are now done on a per-segment basis, meaning that the FieldCache entries used for sorting and for function queries are created and used per-segment and can be reused for segments that don't change between index updates. While generally beneficial, this can lead to increased memory usage over 1.3 in certain scenarios: 1) A single valued field that was used for both sorting and faceting in 1.3 would have used the same top level FieldCache entry. 
In 1.4, sorting will use entries at the segment level while faceting will still use entries at the top reader level, leading to increased memory usage. 2) Certain function queries such as ord() and rord() require a top level FieldCache instance and can thus lead to increased memory usage. Consider replacing ord() and rord() with alternatives, such as function queries based on ms() for date boosting. If you use custom Tokenizer or TokenFilter components in a chain specified in schema.xml, they must support reusability. If your Tokenizer or TokenFilter maintains state, it should implement reset(). If your TokenFilterFactory does not return a subclass of TokenFilter, then it should implement reset() and call reset() on its input TokenStream. TokenizerFactory implementations must now return a Tokenizer rather than a TokenStream. New users of Solr 1.4 will have omitTermFreqAndPositions enabled for non-text indexed fields by default, which avoids indexing term frequency, positions, and payloads, making the index smaller and faster. If you are upgrading from an earlier Solr release and want to enable omitTermFreqAndPositions by default, change the schema version from 1.1 to 1.2 in schema.xml. Remove any existing index and restart Solr to ensure that omitTermFreqAndPositions completely takes effect. The default QParserPlugin used by the QueryComponent for parsing the "q" param has been changed, to remove support for the deprecated use of ";" as a separator between the query string and the sort options when no "sort" param was used. Users who wish to continue using the semi-colon based method of specifying the sort options should explicitly set the defType param to "lucenePlusSort" on all requests. (The simplest way to do this is by specifying it as a default param for your request handlers in solrconfig.xml, see the example solrconfig.xml for sample syntax.) If spellcheck.extendedResults=true, the response format for suggestions has changed, see SOLR-1071. 
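For clients that would rather move off the semi-colon convention than set defType=lucenePlusSort, the legacy "q" value can be split into separate "q" and "sort" params before the request is sent. The following is a minimal client-side sketch, not Solr code: the class and method names (LegacySortSketch, splitLegacyQuery) are hypothetical, and it assumes the legacy string contains at most one semi-colon with no escaping or quoting involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LegacySortSketch {

    // Hypothetical helper: turns a 1.3-style "query;sort" string into
    // separate "q" and "sort" request params.
    // Assumption: at most one ';' appears, and it is not escaped or quoted.
    public static Map<String, String> splitLegacyQuery(String legacyQ) {
        Map<String, String> params = new LinkedHashMap<>();
        int idx = legacyQ.indexOf(';');
        if (idx < 0) {
            // No separator: the whole string is the query.
            params.put("q", legacyQ.trim());
            return params;
        }
        params.put("q", legacyQ.substring(0, idx).trim());
        String sort = legacyQ.substring(idx + 1).trim();
        if (!sort.isEmpty()) {
            // Only emit a sort param when something follows the ';'.
            params.put("sort", sort);
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(splitLegacyQuery("ipod;price desc"));
    }
}
```

Queries that legitimately contain a semi-colon would need real escaping rules, which this sketch deliberately leaves out.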
Use of the "charset" option when configuring the following Analysis Factories has been deprecated and will cause a warning to be logged. In future versions of Solr attempting to use this option will cause an error. See SOLR-1410 for more information. - GreekLowerCaseFilterFactory - RussianStemFilterFactory - RussianLowerCaseFilterFactory - RussianLetterTokenizerFactory DIH: Evaluator API has been changed in a non back-compatible way. Users who have developed custom Evaluators will need to change their code according to the new API for it to work. See SOLR-996 for details. DIH: The formatDate evaluator's syntax has been changed. The new syntax is formatDate(<variable>, '<format_string>'). For example, formatDate(x.date, 'yyyy-MM-dd'). In the old syntax, the date string was written without single quotes. The old syntax has been deprecated and will be removed in 1.5; until then, using the old syntax will log a warning. DIH: The Context API has been changed in a non back-compatible way. In particular, the Context.currentProcess() method now returns a String describing the type of the current import process instead of an int. Similarly, the public constants in Context viz. FULL_DUMP, DELTA_DUMP and FIND_DELTA are changed to a String type. See SOLR-969 for details. DIH: The EntityProcessor API has been simplified by moving logic for applying transformers and handling multi-row outputs from Transformers into an EntityProcessorWrapper class. The EntityProcessor#destroy is now called once per parent-row at the end of row (end of data). A new method EntityProcessor#close is added which is called at the end of import. DIH: In Solr 1.3, if the last_index_time was not available (first import) and a delta-import was requested, a full-import was run instead. This is no longer the case. In Solr 1.4 delta import is run with last_index_time as the epoch date (January 1, 1970, 00:00:00 GMT) if last_index_time is not available. 
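The last_index_time fallback described above can be pictured with a short Java sketch. This is not DIH's actual implementation: the class and method names (LastIndexTimeSketch, effectiveLastIndexTime, format) are hypothetical, and the timestamp pattern 'yyyy-MM-dd HH:mm:ss' is an assumption chosen only for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class LastIndexTimeSketch {

    // Hypothetical helper: when no last_index_time has been recorded yet
    // (first import), fall back to the epoch instead of forcing a full-import.
    public static Date effectiveLastIndexTime(Date stored) {
        return stored != null ? stored : new Date(0L);
    }

    // Renders the timestamp in GMT so the output does not depend on the
    // local time zone. The pattern is an illustrative assumption.
    public static String format(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        // prints "1970-01-01 00:00:00"
        System.out.println(format(effectiveLastIndexTime(null)));
    }
}
```

In practice this means a delta query that filters on last_index_time will match every row on the first run, so the first delta-import may touch as many rows as a full-import would.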
Versions of Major Components ---------------------------- Apache Lucene 2.9.1 (r832363 on 2.9 branch) Apache Tika 0.4 Carrot2 3.1.0 Lucene Information ---------------- Since Solr is built on top of Lucene, many people add customizations to Solr that are dependent on Lucene. Please see http://lucene.apache.org/java/2_9_0/, especially http://lucene.apache.org/java/2_9_0/changes/Changes.html for more information on the version of Lucene used in Solr. Detailed Change List ---------------------- New Features ---------------------- 1. SOLR-560: Use SLF4J logging API rather than JDK logging. The packaged .war file is shipped with a JDK logging implementation, so logging configuration for the .war should be identical to Solr 1.3. However, if you are using the .jar file, you can select which logging implementation to use by dropping a different binding. See: http://www.slf4j.org/ (ryan) 2. SOLR-617: Allow configurable index deletion policy and provide a default implementation which allows deletion of commit points on various criteria such as number of commits, age of commit point and optimized status. See http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/index/IndexDeletionPolicy.html (yonik, Noble Paul, Akshay Ukey via shalin) 3. SOLR-658: Allow Solr to load index from arbitrary directory in dataDir (Noble Paul, Akshay Ukey via shalin) 4. SOLR-793: Add 'commitWithin' argument to the update add command. This behaves similarly to the global autoCommit maxTime argument except that it is set for each request. (ryan) 5. SOLR-670: Add support for rollbacks in UpdateHandler. This allows user to rollback all changes since the last commit. (Noble Paul, koji via shalin) 6. SOLR-813: Adding DoubleMetaphone Filter and Factory. Similar to the PhoneticFilter, but this uses DoubleMetaphone specific calls (including alternate encoding) (Todd Feak via ryan) 7. SOLR-680: Add StatsComponent. 
This gets simple statistics on matched numeric fields, including: min, max, mean, median, stddev. (koji, ryan) - SOLR-1380: Added support for multi-valued fields (Harish Agarwal via gsingers) 8. SOLR-561: Added Replication implemented in Java as a request handler. Supports index replication as well as configuration replication and exposes detailed statistics and progress information on the Admin page. Works on all platforms. (Noble Paul, yonik, Akshay Ukey, shalin) 9. SOLR-746: Added "omitHeader" request parameter to omit the header from the response. (Noble Paul via shalin) 10. SOLR-651: Added TermVectorComponent for serving up term vector information, plus IDF. See http://wiki.apache.org/solr/TermVectorComponent (gsingers, Vaijanath N. Rao, Noble Paul) 12. SOLR-795: SpellCheckComponent supports building indices on optimize if configured in solrconfig.xml (Jason Rennie, shalin) 13. SOLR-667: A LRU cache implementation based upon ConcurrentHashMap and other techniques to reduce contention and synchronization overhead, to utilize multiple CPU cores more effectively. (Fuad Efendi, Noble Paul, yonik via shalin) 14. SOLR-465: Add configurable DirectoryProvider so that alternate Directory implementations can be specified via solrconfig.xml. The default DirectoryProvider will use NIOFSDirectory for better concurrency on non Windows platforms. (Mark Miller, TJ Laurenzo via yonik) 15. SOLR-822: Add CharFilter so that characters can be filtered (e.g. character normalization) before Tokenizer/TokenFilters. (koji) 16. SOLR-829: Allow slaves to request compressed files from master during replication (Simon Collins, Noble Paul, Akshay Ukey via shalin) 17. SOLR-877: Added TermsComponent for accessing Lucene's TermEnum capabilities. Useful for auto suggest and possibly distributed search. Not distributed search compliant. (gsingers) - Added mincount and maxcount options (Khee Chin via gsingers) 18. 
SOLR-538: Add maxChars attribute for copyField function so that the length limit for destination can be specified. (Georgios Stamatis, Lars Kotthoff, Chris Harris via koji) 19. SOLR-284: Added support for extracting content from binary documents like MS Word and PDF using Apache Tika. See also contrib/extraction/CHANGES.txt (Eric Pugh, Chris Harris, yonik, gsingers) 20. SOLR-819: Added factories for Arabic support (gsingers) 21. SOLR-781: Distributed search ability to sort field.facet values lexicographically. facet.sort values "true" and "false" are also deprecated and replaced with "count" and "lex". (Lars Kotthoff via yonik) 22. SOLR-821: Add support for replication to copy conf file to slave with a different name. This allows replication of solrconfig.xml (Noble Paul, Akshay Ukey via shalin) 23. SOLR-911: Add support for multi-select faceting by allowing filters to be tagged and facet commands to exclude certain filters. This patch also added the ability to change the output key for facets in the response, and optimized distributed faceting refinement by lowering parsing overhead and by making requests and responses smaller. 24. SOLR-876: WordDelimiterFilter now supports a splitOnNumerics option, as well as a list of protected terms. (Dan Rosher via hossman) 25. SOLR-928: SolrDocument and SolrInputDocument now implement the Map<String,?> interface. This should make plugging into other standard tools easier. (ryan) 26. SOLR-847: Enhance the snappull command in ReplicationHandler to accept masterUrl. (Noble Paul, Preetam Rao via shalin) 27. SOLR-540: Add support for globbing in field names to highlight. For example, hl.fl=*_text will highlight all fieldnames ending with _text. (Lars Kotthoff via yonik) 28. SOLR-906: Adding a StreamingUpdateSolrServer that writes update commands to an open HTTP connection. If you are using solrj for bulk update requests you should consider switching to this implementation. 
However, note that the error handling is not immediate as it is with the standard SolrServer. (ryan) 29. SOLR-865: Adding support for document updates in binary format and corresponding support in Solrj client. (Noble Paul via shalin) 30. SOLR-763: Add support for Lucene's PositionFilter (Mck SembWever via shalin) 31. SOLR-966: Enhance the map() function query to take in an optional default value (Noble Paul, shalin) 32. SOLR-820: Support replication on startup of master with new index. (Noble Paul, Akshay Ukey via shalin) 33. SOLR-943: Make it possible to specify dataDir in solr.xml and accept the dataDir as a request parameter for the CoreAdmin create command. (Noble Paul via shalin) 34. SOLR-850: Addition of timeouts for distributed searching. Configurable through 'shard-socket-timeout' and 'shard-connection-timeout' parameters in SearchHandler. (Patrick O'Leary via shalin) 35. SOLR-799: Add support for hash based exact/near duplicate document handling. (Mark Miller, yonik) 36. SOLR-1026: Add protected words support to SnowballPorterFilterFactory (ehatcher) 37. SOLR-739: Add support for OmitTf (Mark Miller via yonik) 38. SOLR-1046: Nested query support for the function query parser and lucene query parser (the latter existed as an undocumented feature in 1.3) (yonik) 39. SOLR-940: Add support for Lucene's Trie Range Queries by providing new FieldTypes in schema for int, float, long, double and date. Single-valued Trie based fields with a precisionStep will index multiple precisions and enable faster range queries. (Uwe Schindler, yonik, shalin) 40. SOLR-1038: Enhance CommonsHttpSolrServer to add docs in batch using an iterator API (Noble Paul via shalin) 41. SOLR-844: A SolrServer implementation to front-end multiple solr servers and provides load balancing and failover support (Noble Paul, Mark Miller, hossman via shalin) 42. SOLR-939: ValueSourceRangeFilter/Query - filter based on values in a FieldCache entry or on any arbitrary function of field values. 
(yonik) 43. SOLR-1095: Fixed performance problem in the StopFilterFactory and simplified code. Added tests as well. (gsingers) 44. SOLR-1096: Introduced httpConnTimeout and httpReadTimeout in replication slave configuration to avoid stalled replication. (Jeff Newburn, Noble Paul, shalin) 45. SOLR-1115: <bool>on</bool> and <bool>yes</bool> work as expected in solrconfig.xml. (koji) 46. SOLR-1099: A FieldAnalysisRequestHandler which provides the analysis functionality of the web admin page as a service. The AnalysisRequestHandler is renamed to DocumentAnalysisRequestHandler which is enhanced with query analysis and showMatch support. AnalysisRequestHandler is now deprecated. Support for both FieldAnalysisRequestHandler and DocumentAnalysisRequestHandler is also provided in the Solrj client. (Uri Boness, shalin) 47. SOLR-1106: Made CoreAdminHandler Actions pluggable so that additional actions may be plugged in or the existing ones can be overridden if needed. (Kay Kay, Noble Paul, shalin) 48. SOLR-1124: Add a top() function query that causes its argument to have its values derived from the top level IndexReader, even when invoked from a sub-reader. top() is implicitly used for the ord() and rord() functions. (yonik) 49. SOLR-1110: Support sorting on trie fields with Distributed Search. (Mark Miller, Uwe Schindler via shalin) 50. SOLR-1121: CoreAdminHandler should not need a core. This makes it possible to start a Solr server w/o a core. (noble) 51. SOLR-769: Added support for clustering in contrib/clustering. See http://wiki.apache.org/solr/ClusteringComponent for more info. (gsingers, Stanislaw Osinski) 52. SOLR-1175: disable/enable replication on master side. Added two commands 'enableReplication' and 'disableReplication' (noble) 53. SOLR-1179: DocSets can now be used as Lucene Filters via DocSet.getTopFilter() (yonik) 54. SOLR-1116: Add a Binary FieldType (noble) 55. 
SOLR-1051: Support the merge of multiple indexes as a CoreAdmin and an update command (Ning Li via shalin) 56. SOLR-1152: Snapshoot on ReplicationHandler should accept location as a request parameter (shalin) 57. SOLR-1204: Enhance SpellingQueryConverter to handle UTF-8 instead of ASCII only. Use the NMTOKEN syntax for matching field names. (Michael Ludwig, shalin) 58. SOLR-1189: Support providing username and password for basic HTTP authentication in Java replication (Matthew Gregg, shalin) 59. SOLR-243: Add configurable IndexReaderFactory so that alternate IndexReader implementations can be specified via solrconfig.xml. Note that using a custom IndexReader may be incompatible with ReplicationHandler (see comments in SOLR-1366). This should be treated as an experimental feature. (Andrzej Bialecki, hossman, Mark Miller, John Wang) 60. SOLR-1214: differentiate between solr home and instanceDir .deprecates the method SolrResourceLoader#locateInstanceDir() and it is renamed to locateSolrHome (noble) 61. SOLR-1216 : disambiguate the replication command names. 'snappull' becomes 'fetchindex' 'abortsnappull' becomes 'abortfetch' (noble) 62. SOLR-1145: Add capability to specify an infoStream log file for the underlying Lucene IndexWriter in solrconfig.xml. This is an advanced debug log file that can be used to aid developers in fixing IndexWriter bugs. See the commented out example in the example solrconfig.xml under the indexDefaults section. (Chris Harris, Mark Miller) 63. SOLR-1256: Show the output of CharFilters in analysis.jsp. (koji) 64. SOLR-1266: Added stemEnglishPossessive option (default=true) to WordDelimiterFilter that allows disabling of english possessive stemming (removal of trailing 's from tokens) (Robert Muir via yonik) 65. SOLR-1237: firstSearcher and newSearcher can now be identified via the CommonParams.EVENT (evt) parameter in a request. This allows a RequestHandler or SearchComponent to know when a newSearcher or firstSearcher event happened. 
QuerySenderListener is the only implementation in Solr that implements this, but outside implementations may wish to. See the AbstractSolrEventListener for a helper method. (gsingers) 66. SOLR-1343: Added HTMLStripCharFilter and marked HTMLStripReader, HTMLStripWhitespaceTokenizerFactory and HTMLStripStandardTokenizerFactory deprecated. To strip HTML tags, HTMLStripCharFilter can be used with an arbitrary Tokenizer. (koji) 67. SOLR-1275: Add expungeDeletes to DirectUpdateHandler2 (noble) 68. SOLR-1372: Enhance FieldAnalysisRequestHandler to accept field value from content stream (ehatcher) 69. SOLR-1370: Show the output of CharFilters in FieldAnalysisRequestHandler (koji) 70. SOLR-1373: Add Filter query to admin/form.jsp (Jason Rutherglen via hossman) 71. SOLR-1368: Add ms() function query for getting milliseconds from dates and for high precision date subtraction, add sub() for subtracting other arguments. (yonik) 72. SOLR-1156: Sort TermsComponent results by frequency (Matt Weber via yonik) 73. SOLR-1335 : load core properties from a properties file (noble) 74. SOLR-1385 : Add an 'enable' attribute to all plugins (noble) 75. SOLR-1414 : implicit core properties are not set for single core (noble) 76. SOLR-659 : Adds shards.start and shards.rows to distributed search to allow more efficient bulk queries (those that retrieve many or all documents). (Brian Whitman via yonik) 77. SOLR-1321: Add better support for efficient wildcard handling (Andrzej Bialecki, Robert Muir, gsingers) 78. SOLR-1326 : New interface PluginInfoInitialized for all types of plugin (noble) 79. SOLR-1447 : Simple property injection. <mergePolicy> & <mergeScheduler> syntaxes are now deprecated (Jason Rutherglen, noble) 80. SOLR-908 : CommonGramsFilterFactory/CommonGramsQueryFilterFactory for speeding up phrase queries containing common words by indexing n-grams and using them at query time. (Tom Burton-West, Jason Rutherglen via yonik) 81. 
SOLR-1292: Add FieldCache introspection to stats.jsp and JMX Monitoring via a new SolrFieldCacheMBean. (hossman) 82. SOLR-1167: Solr Config now supports XInclude for XML engines that can support it. (Bryan Talbot via gsingers) 83. SOLR-1478: Enable sort by Lucene docid. (ehatcher) 84. SOLR-1449: Add <lib> elements to solrconfig.xml for specifying additional classpath directories and regular expressions. (hossman via yonik) 85. SOLR-1128: Added metadata output to extraction request handler "extract only" option. (gsingers) 86. SOLR-1274: Added text serialization output for extractOnly (Peter Wolanin, gsingers) 87. SOLR-768: DIH: Set last_index_time variable in full-import command. (Wojtek Piaseczny, Noble Paul via shalin) 88. SOLR-811: Allow a "deltaImportQuery" attribute in SqlEntityProcessor which is used for delta imports instead of DataImportHandler manipulating the SQL itself. (Noble Paul via shalin) 89. SOLR-842: Better error handling in DataImportHandler with options to abort, skip and continue imports. (Noble Paul, shalin) 90. SOLR-833: DIH: A DataSource to read data from a field as a reader. This can be used, for example, to read XMLs residing as CLOBs or BLOBs in databases. (Noble Paul via shalin) 91. SOLR-887: A DIH Transformer to strip HTML tags. (Ahmed Hammad via shalin) 92. SOLR-886: DataImportHandler should rollback when an import fails or it is aborted (shalin) 93. SOLR-891: A DIH Transformer to read strings from Clob type. (Noble Paul via shalin) 94. SOLR-812: Configurable JDBC settings in JdbcDataSource including optimized defaults for read only mode. (David Smiley, Glen Newton, shalin) 95. SOLR-910: Add a few utility commands to the DIH admin page such as full import, delta import, status, reload config. (Ahmed Hammad via shalin) 96. SOLR-938: Add event listener API for DIH import start and end. (Kay Kay, Noble Paul via shalin) 97. SOLR-801: DIH: Add support for configurable pre-import and post-import delete query per root-entity. 
(Noble Paul via shalin) 98. SOLR-988: Add a new scope for session data stored in Context to store objects across imports. (Noble Paul via shalin) 99. SOLR-980: A PlainTextEntityProcessor which can read from any DataSource<Reader> and output a String. (Nathan Adams, Noble Paul via shalin) 100.SOLR-1003: XPathEntityprocessor must allow slurping all text from a given xml node and its children. (Noble Paul via shalin) 101.SOLR-1001: Allow variables in various attributes of RegexTransformer, HTMLStripTransformer and NumberFormatTransformer. (Fergus McMenemie, Noble Paul, shalin) 102.SOLR-989: DIH: Expose running statistics from the Context API. (Noble Paul, shalin) 103.SOLR-996: DIH: Expose Context to Evaluators. (Noble Paul, shalin) 104.SOLR-783: DIH: Enhance delta-imports by maintaining separate last_index_time for each entity. (Jon Baer, Noble Paul via shalin) 105.SOLR-1033: Current entity's namespace is made available to all DIH Transformers. This allows one to use an output field of TemplateTransformer in other transformers, among other things. (Fergus McMenemie, Noble Paul via shalin) 106.SOLR-1066: New methods in DIH Context to expose Script details. ScriptTransformer changed to read scripts through the new API methods. (Noble Paul via shalin) 107.SOLR-1062: A DIH LogTransformer which can log data in a given template format. (Jon Baer, Noble Paul via shalin) 108.SOLR-1065: A DIH ContentStreamDataSource which can accept HTTP POST data in a content stream. This can be used to push data to Solr instead of just pulling it from DB/Files/URLs. (Noble Paul via shalin) 109.SOLR-1061: Improve DIH RegexTransformer to create multiple columns from regex groups. (Noble Paul via shalin) 110.SOLR-1059: Special DIH flags introduced for deleting documents by query or id, skipping rows and stopping further transforms. Use $deleteDocById, $deleteDocByQuery for deleting by id and query respectively. Use $skipRow to skip the current row but continue with the document. 
Use $stopTransform to stop further transformers. New methods are introduced in Context for deleting by id and query. (Noble Paul, Fergus McMenemie, shalin) 111.SOLR-1076: JdbcDataSource should resolve DIH variables in all its configuration parameters. (shalin) 112.SOLR-1055: Make DIH JdbcDataSource easily extensible by making the createConnectionFactory method protected and return a Callable<Connection> object. (Noble Paul, shalin) 113.SOLR-1058: DIH: JdbcDataSource can lookup javax.sql.DataSource using JNDI. Use a jndiName attribute to specify the location of the data source. (Jason Shepherd, Noble Paul via shalin) 114.SOLR-1083: A DIH Evaluator for escaping query characters. (Noble Paul, shalin) 115.SOLR-934: A MailEntityProcessor to enable indexing mails from POP/IMAP sources into a solr index. (Preetam Rao, shalin) 116.SOLR-1060: A DIH LineEntityProcessor which can stream lines of text from a given file to be indexed directly or for processing with transformers and child entities. (Fergus McMenemie, Noble Paul, shalin) 117.SOLR-1127: Add support for DIH field name to be templatized. (Noble Paul, shalin) 118.SOLR-1092: Added a new DIH command named 'import' which does not automatically clean the index. This is useful and more appropriate when one needs to import only some of the entities. (Noble Paul via shalin) 119.SOLR-1153: DIH 'deltaImportQuery' is honored on child entities as well (noble) 120.SOLR-1230: Enhanced dataimport.jsp to work with all DataImportHandler request handler configurations, rather than just a hardcoded /dataimport handler. (ehatcher) 121.SOLR-1235: disallow period (.) in DIH entity names (noble) 122.SOLR-1234: Multiple DIH does not work because all of them write to dataimport.properties. 
Use the handler name as the properties file name (noble) 123.SOLR-1348: Support binary field type in convertType logic in DIH JdbcDataSource (shalin) 124.SOLR-1406: DIH: Make FileDataSource and FileListEntityProcessor to be more extensible (Luke Forehand, shalin) 125.SOLR-1437: DIH: XPathEntityProcessor can deal with xpath syntaxes such as //tagname , /root//tagname (Fergus McMenemie via noble) Optimizations ---------------------- 1. SOLR-374: Use IndexReader.reopen to save resources by re-using parts of the index that haven't changed. (Mark Miller via yonik) 2. SOLR-808: Write string keys in Maps as extern strings in the javabin format. (Noble Paul via shalin) 3. SOLR-475: New faceting method with better performance and smaller memory usage for multi-valued fields with many unique values but relatively few values per document. Controllable via the facet.method parameter - "fc" is the new default method and "enum" is the original method. (yonik) 4. SOLR-970: Use an ArrayList in SolrPluginUtils.parseQueryStrings since we know exactly how long the List will be in advance. (Kay Kay via hossman) 5. SOLR-1002: Change SolrIndexSearcher to use insertWithOverflow with reusable priority queue entries to reduce the amount of generated garbage during searching. (Mark Miller via yonik) 6. SOLR-971: Replace StringBuffer with StringBuilder for instances that do not require thread-safety. (Kay Kay via shalin) 7. SOLR-921: SolrResourceLoader must cache short class name vs fully qualified classname (Noble Paul, hossman via shalin) 8. SOLR-973: CommonsHttpSolrServer writes the xml directly to the server. (Noble Paul via shalin) 9. SOLR-1108: Remove un-needed synchronization in SolrCore constructor. (Noble Paul via shalin) 10. SOLR-1166: Speed up docset/filter generation by avoiding top-level score() call and iterating over leaf readers with TermDocs. (yonik) 11. 
SOLR-1169: SortedIntDocSet - a new small set implementation that saves memory over HashDocSet, is faster to construct, is ordered for easier implementation of skipTo, and is faster in the general case. (yonik) 12. SOLR-1165: Use Lucene Filters and pass them down to the Lucene search methods to filter earlier and improve performance. (yonik) 13. SOLR-1111: Use per-segment sorting to share fieldcache elements across unchanged segments. This saves memory and reduces commit times for incremental updates to the index. (yonik) 14. SOLR-1188: Minor efficiency improvement in TermVectorComponent related to ignoring positions or offsets (gsingers) 15. SOLR-1150: Load Documents for Highlighting one at a time rather than all at once to avoid OOM with many large Documents. (Siddharth Gargate via Mark Miller) 16. SOLR-1353: Implement and use reusable token streams for analysis. (Robert Muir, yonik) 17. SOLR-1296: Enables setting IndexReader's termInfosIndexDivisor via a new attribute to StandardIndexReaderFactory. Enables setting termIndexInterval to IndexWriter via SolrIndexConfig. (Jason Rutherglen, hossman, gsingers) 18. SOLR-846: DIH: Reduce memory consumption during delta import by removing keys when used (Ricky Leung, Noble Paul via shalin) 19. SOLR-974: DataImportHandler skips commit if no data has been updated. (Wojtek Piaseczny, shalin) 20. SOLR-1004: DIH: Check for abort more frequently during delta-imports. (Marc Sturlese, shalin) 21. SOLR-1098: DIH DateFormatTransformer can cache the format objects. (Noble Paul via shalin) 22. SOLR-1465: Replaced string concatenations with StringBuilder append calls in DIH XPathRecordReader. (Mark Miller, shalin) Bug Fixes ---------------------- 1. SOLR-774: Fixed logging level display (Sean Timm via Otis Gospodnetic) 2. SOLR-771: CoreAdminHandler STATUS should display 'normalized' paths (koji, hossman, shalin) 3. 
SOLR-532: WordDelimiterFilter now respects payloads and other attributes of the original Token by using Token.clone() (Tricia Williams, gsingers) 4. SOLR-805: DisMax queries are not being cached in QueryResultCache (Todd Feak via koji) 5. SOLR-751: WordDelimiterFilter didn't adjust the start offset of single tokens that started with delimiters, leading to incorrect highlighting. (Stefan Oestreicher via yonik) 7. SOLR-843: SynonymFilterFactory cannot handle multiple synonym files correctly (koji) 8. SOLR-840: BinaryResponseWriter does not handle incompatible data in fields (Noble Paul via shalin) 9. SOLR-803: CoreAdminRequest.createCore fails because name parameter isn't set (Sean Colombo via ryan) 10. SOLR-869: Fix file descriptor leak in SolrResourceLoader#getLines (Mark Miller, shalin) 11. SOLR-872: Better error message for incorrect copyField destination (Noble Paul via shalin) 12. SOLR-879: Enable position increments in the query parser and fix the example schema to enable position increments for the stop filter in both the index and query analyzers to fix the bug with phrase queries with stopwords. (yonik) 13. SOLR-836: Add missing "a" to the example stopwords.txt (yonik) 14. SOLR-892: Fix serialization of booleans for PHPSerializedResponseWriter (yonik) 15. SOLR-898: Fix null pointer exception for the JSON response writer based formats when nl.json=arrarr with null keys. (yonik) 16. SOLR-901: FastOutputStream ignores write(byte[]) call. (Noble Paul via shalin) 17. SOLR-807: BinaryResponseWriter writes fieldType.toExternal if it is not a supported type, otherwise it writes fieldType.toObject. This fixes the bug with encoding/decoding UUIDField. (koji, Noble Paul, shalin) 18. SOLR-863: SolrCore.initIndex should close the directory it gets for clearing the lock and use the DirectoryFactory. (Mark Miller via shalin) 19. SOLR-802: Fix a potential null pointer error in the distributed FacetComponent (David Bowen via ryan) 20. 
SOLR-346: Use perl regex to improve accuracy of finding latest snapshot in snapinstaller (billa) 21. SOLR-830: Use perl regex to improve accuracy of finding latest snapshot in snappuller (billa) 22. SOLR-897: Fixed Argument list too long error when there are lots of snapshots/backups (Dan Rosher via billa) 23. SOLR-925: Fixed highlighting on fields with multiValued="true" and termOffsets="true" (koji) 24. SOLR-902: FastInputStream#read(byte b[], int off, int len) gives incorrect results when amount left to read is less than buffer size (Noble Paul via shalin) 25. SOLR-978: Old files are not removed from slaves after replication (Jaco, Noble Paul, shalin) 26. SOLR-883: Implicit properties are not set for Cores created through CoreAdmin (Noble Paul via shalin) 27. SOLR-991: Better error message when parsing solrconfig.xml fails due to malformed XML. Error message notes the name of the file being parsed. (Michael Henson via shalin) 28. SOLR-1008: Fix stats.jsp XML encoding for <stat> item entries with ampersands in their names. (ehatcher) 29. SOLR-976: deleteByQuery is ignored when deleteById is placed prior to deleteByQuery in a <delete>. Now both delete by id and delete by query can be specified at the same time as follows. <delete> <id>05991</id><id>06000</id> <query>office:Bridgewater</query><query>office:Osaka</query> </delete> (koji) 30. SOLR-1016: HTTP 503 error changes 500 in SolrCore (koji) 31. SOLR-1015: Incomplete information in replication admin page and http command response when server is both master and slave i.e. when server is a repeater (Akshay Ukey via shalin) 32. SOLR-1018: Slave is unable to replicate when server acts as repeater (as both master and slave) (Akshay Ukey, Noble Paul via shalin) 33. SOLR-1031: Fix XSS vulnerability in schema.jsp (Paul Lovvik via ehatcher) 34. SOLR-1064: registry.jsp incorrectly displaying info for last core initialized regardless of what the current core is. (hossman) 35. 
SOLR-1072: absolute paths used in sharedLib attribute were incorrectly treated as relative paths. (hossman) 36. SOLR-1104: Fix some rounding errors in LukeRequestHandler's histogram (hossman) 37. SOLR-1125: Use query analyzer rather than index analyzer for queryFieldType in QueryElevationComponent (koji) 38. SOLR-1126: Replicated files have incorrect timestamp (Jian Han Guo, Jeff Newburn, Noble Paul via shalin) 39. SOLR-1094: Incorrect value of correctlySpelled attribute in some cases (David Smiley, Mark Miller via shalin) 40. SOLR-965: Better error message when <pingQuery> is not configured. (Mark Miller via hossman) 41. SOLR-1135: Java replication creates Snapshot in the directory where Solr was launched (Jianhan Guo via shalin) 42. SOLR-1138: Query Elevation Component now gracefully handles missing queries. (gsingers) 43. SOLR-929: LukeRequestHandler should return "dynamicBase" only if the field is dynamic. (Peter Wolanin, koji) 44. SOLR-1141: NullPointerException during snapshoot command in java based replication (Jian Han Guo, shalin) 45. SOLR-1078: Fixes to WordDelimiterFilter to avoid splitting or dropping international non-letter characters such as non spacing marks. (yonik) 46. SOLR-825, SOLR-1221: Enables highlighting for range/wildcard/fuzzy/prefix queries if using hl.usePhraseHighlighter=true and hl.highlightMultiTerm=true. Also make both options default to true. (Mark Miller, yonik) 47. SOLR-1174: Fix Logging admin form submit url for multicore. (Jacob Singh via shalin) 48. SOLR-1182: Fix bug in OrdFieldSource#equals which could cause a bug with OrdFieldSource caching on OrdFieldSource#hashcode collisions. (Mark Miller) 49. SOLR-1207: equals method should compare this and other of DocList in DocSetBase (koji) 50. SOLR-1242: Human readable JVM info from system handler does integer cutoff rounding, even when dealing with GB. Fixed to round to one decimal place. (Jay Hill, Mark Miller) 51. SOLR-1243: Admin RequestHandlers should not be cached over HTTP. 
(Mark Miller) 52. SOLR-1260: Fix implementations of set operations for DocList subclasses and fix a bug in HashDocSet construction when offset != 0. These bugs never manifested in normal Solr use and only potentially affect custom code. (yonik) 53. SOLR-1171: Fix LukeRequestHandler so it doesn't rely on SolrQueryParser and report incorrect stats when field names contain characters SolrQueryParser considers special. (hossman) 54. SOLR-1317: Fix CapitalizationFilterFactory to work when keep parameter is not specified. (ehatcher) 55. SOLR-1342: CapitalizationFilterFactory uses incorrect term length calculations. (Robert Muir via Mark Miller) 56. SOLR-1359: DoubleMetaphoneFilter didn't index original tokens if there was no alternative, and could incorrectly skip or reorder tokens. (yonik) 57. SOLR-1360: Prevent PhoneticFilter from producing duplicate tokens. (yonik) 58. SOLR-1371: LukeRequestHandler/schema.jsp errored if schema had no uniqueKey field. The new test for this also (hopefully) adds some future proofing against similar bugs. As a side effect QueryElevationComponentTest was refactored, and a bug in that test was found. (hossman) 59. SOLR-914: General finalize() improvements. No finalizer delegates to the respective close/destroy method without first checking if it's already been closed/destroyed; if it hasn't, a SEVERE error is logged first. (noble, hossman) 60. SOLR-1362: WordDelimiterFilter had inconsistent behavior when setting the position increment of tokens following a token consisting of all delimiters, and could additionally lose big position increments. (Robert Muir, yonik) 61. SOLR-1091: Jetty's use of CESU-8 for code points outside the BMP resulted in invalid output from the serialized PHP writer. (yonik) 62. SOLR-1103: LukeRequestHandler (and schema.jsp) have been fixed to include the "1" (ie: 2**0) bucket in the term histogram data. (hossman) 63. SOLR-1398: Add offset corrections in PatternTokenizerFactory.
(Anders Melchiorsen, koji) 64. SOLR-1400: Properly handle zero-length tokens in TrimFilter. This was not a bug in any released version. (Peter Wolanin, gsingers) 65. SOLR-1071: spellcheck.extendedResults returns an invalid JSON response when count > 1. To fix, the extendedResults format was changed. (Uri Boness, yonik) 66. SOLR-1381: Fixed improper handling of fields that have only term positions and not term offsets during Highlighting (Thorsten Fischer, gsingers) 67. SOLR-1427: Fixed registry.jsp issue with MBeans (gsingers) 68. SOLR-1468: SolrJ's XML response parsing threw an exception for null names, such as those produced when facet.missing=true (yonik) 69. SOLR-1471: Fixed issue with calculating missing values for facets in single valued cases in Stats Component. This is not correctly calculated for the multivalued case. (James Miller, gsingers) 70. SOLR-1481: Fixed omitHeader parameter for PHP ResponseWriter. (Jun Ohtani via billa) 71. SOLR-1448: Add weblogic.xml to solr webapp to enable correct operation in WebLogic. (Ilan Rabinovitch via yonik) 72. SOLR-1504: empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co. (koji) 73. SOLR-1394: HTMLStripCharFilter split tokens that contained entities and often calculated offsets incorrectly for entities. (Anders Melchiorsen via yonik) 74. SOLR-1517: Admin pages could stall waiting for localhost name resolution if reverse DNS wasn't configured; this was changed so the DNS resolution is attempted only once the first time an admin page is loaded. (hossman) 75. SOLR-1529: More than 8 deleteByQuery commands in a single request caused an error to be returned, although the deletes were still executed. (asmodean via yonik) 76. SOLR-800: Deep copy collections to avoid ConcurrentModificationException in XPathEntityprocessor while streaming (Kyle Morrison, Noble Paul via shalin) 77. 
SOLR-823: Request parameter variables ${dataimporter.request.xxx} are not resolved in DIH (Mck SembWever, Noble Paul, shalin) 78. SOLR-728: Add synchronization to avoid race condition of multiple DIH imports working concurrently (Walter Ferrara, shalin) 79. SOLR-742: Add ability to create dynamic fields with custom DataImportHandler transformers (Wojtek Piaseczny, Noble Paul, shalin) 80. SOLR-832: Rows parameter is not honored in DIH non-debug mode and can abort a running import in debug mode. (Akshay Ukey, shalin) 81. SOLR-838: The DIH VariableResolver obtained from a DataSource's context does not have current data. (Noble Paul via shalin) 82. SOLR-864: DataImportHandler does not catch and log Errors (shalin) 83. SOLR-873: Fix case-sensitive field names and columns (Jon Baer, shalin) 84. SOLR-893: Unable to delete documents via SQL and deletedPkQuery with deltaimport (Dan Rosher via shalin) 85. SOLR-888: DIH DateFormatTransformer cannot convert non-string type (Amit Nithian via shalin) 86. SOLR-841: DataImportHandler should throw exception if a field does not have column attribute (Michael Henson, shalin) 87. SOLR-884: CachedSqlEntityProcessor should check if the cache key is present in the query results (Noble Paul via shalin) 88. SOLR-985: Fix thread-safety issue with DIH TemplateString for concurrent imports with multiple cores. (Ryuuichi Kumai via shalin) 89. SOLR-999: DIH XPathRecordReader fails on XMLs with nodes mixed with CDATA content. (Fergus McMenemie, Noble Paul via shalin) 90. SOLR-1000: DIH FileListEntityProcessor should not apply fileName filter to directory names. (Fergus McMenemie via shalin) 91. SOLR-1009: Repeated column names result in duplicate values. (Fergus McMenemie, Noble Paul via shalin) 92. SOLR-1017: Fix DIH thread-safety issue with last_index_time for concurrent imports in multiple cores due to unsafe usage of SimpleDateFormat by multiple threads. (Ryuuichi Kumai via shalin) 93. 
SOLR-1024: Calling abort on DataImportHandler import commits data instead of calling rollback. (shalin) 94. SOLR-1037: DIH should not add null values in a row returned by EntityProcessor to documents. (shalin) 95. SOLR-1040: DIH XPathEntityProcessor fails with an xpath like /feed/entry/link[@type='text/html']/@href (Noble Paul via shalin) 96. SOLR-1042: Fix memory leak in DIH by making TemplateString non-static member in VariableResolverImpl (Ryuuichi Kumai via shalin) 97. SOLR-1053: IndexOutOfBoundsException in DIH SolrWriter.getResourceAsString when size of data-config.xml is a multiple of 1024 bytes. (Herb Jiang via shalin) 98. SOLR-1077: IndexOutOfBoundsException with useSolrAddSchema in DIH XPathEntityProcessor. (Sam Keen, Noble Paul via shalin) 99. SOLR-1080: DIH RegexTransformer should not replace if regex is not matched. (Noble Paul, Fergus McMenemie via shalin) 100.SOLR-1090: DataImportHandler should load the data-config.xml using UTF-8 encoding. (Rui Pereira, shalin) 101.SOLR-1146: ConcurrentModificationException in DataImporter.getStatusMessages (Walter Ferrara, Noble Paul via shalin) 102.SOLR-1229: Fixes for DIH deletedPkQuery, particularly when using transformed Solr unique id's (Lance Norskog, Noble Paul via ehatcher) 103.SOLR-1286: Fix the IH commit parameter always defaulting to "true" even if "false" is explicitly passed in. (Jay Hill, Noble Paul via ehatcher) 104.SOLR-1323: Reset XPathEntityProcessor's $hasMore/$nextUrl when fetching next URL (noble, ehatcher) 105.SOLR-1450: DIH: Jdbc connection properties such as batchSize are not applied if the driver jar is placed in solr_home/lib. (Steve Sun via shalin) 106.SOLR-1474: DIH Delta-import should run even if last_index_time is not set. (shalin) Other Changes ---------------------- 1. Upgraded to Lucene 2.4.0 (yonik) 2. SOLR-805: Upgraded to Lucene 2.9-dev (r707499) (koji) 3. DumpRequestHandler (/debug/dump): changed 'fieldName' to 'sourceInfo'. (ehatcher) 4. 
SOLR-852: Refactored common code in CSVRequestHandler and XMLUpdateRequestHandler (gsingers, ehatcher) 5. SOLR-871: Removed dependency on stax-utils.jar. If you are using solr.jar and running Java 6, you can also remove woodstox and geronimo. (ryan) 6. SOLR-465: Upgraded to Lucene 2.9-dev (r719351) (shalin) 7. SOLR-889: Upgraded to commons-io-1.4.jar and commons-fileupload-1.2.1.jar (ryan) 8. SOLR-875: Upgraded to Lucene 2.9-dev (r723985) and consolidated the BitSet implementations (Michael Busch, gsingers) 9. SOLR-819: Upgraded to Lucene 2.9-dev (r724059) to get access to Arabic public constructors (gsingers) 10. SOLR-900: Moved solrj into /src/solrj. The contents of solr-common.jar are now included in the solr-solrj.jar. (ryan) 11. SOLR-924: Code cleanup: make all existing finalize() methods call super.finalize() in a finally block. All current instances extend Object, so this doesn't fix any bugs, but helps protect against future changes. (Kay Kay via hossman) 12. SOLR-885: NamedListCodec is renamed to JavaBinCodec and returns Object instead of NamedList. (Noble Paul, yonik via shalin) 13. SOLR-84: Use new Solr logo in admin (Michiel via koji) 14. SOLR-981: groupId for Woodstox dependency in maven solrj changed to org.codehaus.woodstox (Tim Taranov via shalin) 15. Upgraded to Lucene 2.9-dev r738218 (yonik) 16. SOLR-959: Refactored TestReplicationHandler to remove hardcoded port numbers (hossman, Akshay Ukey via shalin) 17. Upgraded to Lucene 2.9-dev r742220 (yonik) 18. SOLR-1022: Better "ignored" field in example schema.xml (Peter Wolanin via hossman) 19. SOLR-967: New type-safe constructor for NamedList (Kay Kay via hossman) 20. SOLR-1036: Change default QParser from "lucenePlusSort" to "lucene" to reduce confusion of semicolon splitting behavior when no sort param is specified (hossman) 21. Upgraded to Lucene 2.9-dev r752164 (shalin) 22. SOLR-1068: Use fsync on replicated index and configuration files (yonik, Noble Paul, shalin) 23.
SOLR-952: Cleanup duplicated code in deprecated HighlightingUtils (hossman) 24. Upgraded to Lucene 2.9-dev r764281 (shalin) 25. SOLR-1079: Rename omitTf to omitTermFreqAndPositions (shalin) 26. SOLR-804: Added Lucene's misc contrib JAR (rev 764281). (gsingers) 27. Upgraded to Lucene 2.9-dev r768228 (shalin) 28. Upgraded to Lucene 2.9-dev r768336 (shalin) 29. SOLR-997: Wait for a longer time for slave to complete replication in TestReplicationHandler (Mark Miller via shalin) 30. SOLR-748: FacetComponent helper classes are made public as an experimental API. (Wojtek Piaseczny via shalin) 31. Upgraded to Lucene 2.9-dev 773862 (Mark Miller) 32. Upgraded to Lucene 2.9-dev r776177 (shalin) 33. SOLR-1149: Made QParserPlugin and related classes extendible as an experimental API. (Kaktu Chakarabati via shalin) 34. Upgraded to Lucene 2.9-dev r779312 (yonik) 35. SOLR-786: Refactor DisMaxQParser to allow overriding certain features of DisMaxQParser (Wojciech Biela via shalin) 36. SOLR-458: Add equals and hashCode methods to NamedList (Stefan Rinner, shalin) 37. SOLR-1184: Add option in solrconfig to open a new IndexReader rather than using reopen. Done mainly as a fail-safe in the case that a user runs into a reopen bug/issue. (Mark Miller) 38. SOLR-1215 use double quotes to enclose attributes in solr.xml (noble) 39. SOLR-1151: add dynamic copy field and maxChars example to example schema.xml. (Peter Wolanin, Mark Miller) 40. SOLR-1233: remove /select?qt=/whatever restriction on /-prefixed request handlers. (ehatcher) 41. SOLR-1257: logging.jsp has been removed and now passes through to the hierarchical log level tool added in Solr 1.3. Users still hitting "/admin/logging.jsp" should switch to "/admin/logging". (hossman) 42. Upgraded to Lucene 2.9-dev r794238. Other changes include: - LUCENE-1614 - Use Lucene's DocIdSetIterator.NO_MORE_DOCS as the sentinel value. - LUCENE-1630 - Add acceptsDocsOutOfOrder method to Collector implementations. 
- LUCENE-1673, LUCENE-1701 - Trie has moved to Lucene core and renamed to NumericRangeQuery. - LUCENE-1662, LUCENE-1687 - Replace usage of ExtendedFieldCache by FieldCache. (shalin) 42. SOLR-1241: Solr's CharFilter has been moved to Lucene. Remove CharFilter and related classes from Solr and use Lucene's corresponding code (koji via shalin) 43. SOLR-1261: Lucene trunk renamed RangeQuery & Co to TermRangeQuery (Uwe Schindler via shalin) 44. Upgraded to Lucene 2.9-dev r801856 (Mark Miller) 45. SOLR-1276: Added StatsComponentTest (Rafał Kuć, gsingers) 46. SOLR-1377: The TokenizerFactory API has changed to explicitly return a Tokenizer rather than a TokenStream (which may or may not be a Tokenizer). This change is required to take advantage of the Token reuse improvements in Lucene 2.9. (ryan) 47. SOLR-1410: Log a warning if the deprecated charset option is used on GreekLowerCaseFilterFactory, RussianStemFilterFactory, RussianLowerCaseFilterFactory or RussianLetterTokenizerFactory. (Robert Muir via hossman) 48. SOLR-1423: Due to LUCENE-1906, Solr's tokenizer should use Tokenizer.correctOffset() instead of CharStream.correctOffset(). (Uwe Schindler via koji) 49. SOLR-1319, SOLR-1345: Upgrade Solr Highlighter classes to new Lucene Highlighter API. This upgrade has resulted in a back compat break in the DefaultSolrHighlighter class - getQueryScorer is no longer protected. If you happened to be overriding that method in custom code, override getHighlighter instead. Also, HighlightingUtils#getQueryScorer has been removed as it was deprecated and backcompat has been broken with it anyway. (Mark Miller) 50. SOLR-1357 SolrInputDocument cannot process dynamic fields (Lars Grote via noble) 51. SOLR-1075: Upgrade to Tika 0.3. See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers) 52. SOLR-1310: Upgrade to Tika 0.4. Note there are some differences in detecting Languages now in extracting request handler.
See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c for discussion on language detection. See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers) 53. SOLR-782: DIH: Refactored SolrWriter to make it a concrete class and removed wrappers over SolrInputDocument. Refactored to load Evaluators lazily. Removed multiple document nodes in the configuration xml. Removed support for 'default' variables, they are automatically available as request parameters. (Noble Paul via shalin) 54. SOLR-964: DIH: XPathEntityProcessor now ignores DTD validations (Fergus McMenemie, Noble Paul via shalin) 55. SOLR-1029: DIH: Standardize Evaluator parameter parsing and added helper functions for parsing all evaluator parameters in a standard way. (Noble Paul, shalin) 56. SOLR-1081: Change DIH EventListener to be an interface so that components such as an EntityProcessor or a Transformer can act as an event listener. (Noble Paul, shalin) 57. SOLR-1027: DIH: Alias the 'dataimporter' namespace to a shorter name 'dih'. (Noble Paul via shalin) 58. SOLR-1084: Better error reporting when DIH entity name is a reserved word and data-config.xml root node is not <dataConfig>. (Noble Paul via shalin) 59. SOLR-1087: Deprecate 'where' attribute in CachedSqlEntityProcessor in favor of cacheKey and cacheLookup. (Noble Paul via shalin) 60. SOLR-969: Change the FULL_DUMP, DELTA_DUMP, FIND_DELTA constants in DIH Context to String. Change Context.currentProcess() to return a string instead of an integer. (Kay Kay, Noble Paul, shalin) 61. SOLR-1120: Simplified DIH EntityProcessor API by moving logic for applying transformers and handling multi-row outputs from Transformers into an EntityProcessorWrapper class. The behavior of the method EntityProcessor#destroy has been modified to be called once per parent-row at the end of row. A new method EntityProcessor#close is added which is called at the end of import. 
A new method Context#getResolvedEntityAttribute is added which returns the resolved value of an entity's attribute. Introduced a DocWrapper which takes care of maintaining document level session variables. (Noble Paul, shalin) 62. SOLR-1265: Add DIH variable resolving for URLDataSource properties like baseUrl. (Chris Eldredge via ehatcher) 63. SOLR-1269: Better error messages from DIH JdbcDataSource when JDBC Driver name or SQL is incorrect. (ehatcher, shalin) Build ---------------------- 1. SOLR-776: Added in ability to sign artifacts via Ant for releases (gsingers) 2. SOLR-854: Added run-example target (Mark Miller via ehatcher) 3. SOLR-1054:Fix dist-src target for DataImportHandler (Ryuuichi Kumai via shalin) 4. SOLR-1219: Added proxy.setup target (koji) 5. SOLR-1386: In build.xml, use longfile="gnu" in tar task to avoid warnings about long file names (Mark Miller via shalin) 6. SOLR-1441: Make it possible to run all tests in a package (shalin) Documentation ---------------------- 1. SOLR-789: The javadoc of RandomSortField is not readable (Nicolas Lalevée via koji) 2. SOLR-962: Note about null handling in ModifiableSolrParams.add javadoc (Kay Kay via hossman) 3. SOLR-1409: Added Solr Powered By Logos 4. SOLR-1369: Add HSQLDB Jar to example-DIH, unzip database and update instructions. ================== Release 1.3.0 ================== Upgrading from Solr 1.2 ----------------------- IMPORTANT UPGRADE NOTE: In a master/slave configuration, all searchers/slaves should be upgraded before the master! If the master were to be updated first, the older searchers would not be able to read the new index format. The Porter snowball based stemmers in Lucene were updated (LUCENE-1142), and are not guaranteed to be backward compatible at the index level (the stem of certain words may have changed). Re-indexing is recommended. Older Apache Solr installations can be upgraded by replacing the relevant war file with the new version. 
No changes to configuration files should be needed.

This version of Solr contains a new version of Lucene implementing an updated index format. This version of Solr/Lucene can still read and update indexes in the older formats, and will convert them to the new format on the first index change. Be sure to back up your index before upgrading in case you need to downgrade.

Solr now recognizes HTTP Request headers related to HTTP Caching (see RFC 2616 sec13) and will by default respond with "304 Not Modified" when appropriate. This should only affect users who access Solr via an HTTP Cache, or via a Web-browser that has an internal cache, but if you wish to suppress this behavior an '<httpCaching never304="true"/>' option can be added to your solrconfig.xml. See the wiki (or the example solrconfig.xml) for more details... http://wiki.apache.org/solr/SolrConfigXml#HTTPCaching

In Solr 1.2, DateField did not enforce the canonical representation of the ISO 8601 format when parsing incoming data, and did not generate the canonical format when generating dates from "Date Math" strings (particularly as it pertains to milliseconds ending in trailing zeros). As a result equivalent dates could not always be compared properly. This problem is corrected in Solr 1.3, but DateField users that might have been affected by indexing inconsistent formats of equivalent dates (ie: 1995-12-31T23:59:59Z vs 1995-12-31T23:59:59.000Z) may want to consider reindexing to correct these inconsistencies. Users who depend on some of the "broken" behavior of DateField in Solr 1.2 (specifically: accepting any input that ends in a 'Z') should consider using the LegacyDateField class as a possible alternative. Users that desire 100% backwards compatibility should consider using the Solr 1.2 version of DateField.
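
To illustrate the DateField note above, here is a minimal sketch of how a client might bring equivalent timestamps to a single form before indexing or comparing them. It is not Solr's own code: the class name CanonicalDate and the method canonicalize are hypothetical, it relies on java.time (a modern JDK convenience that Solr 1.3 did not use), and it assumes the canonical form drops trailing zeros in the fractional seconds, as the note implies.

```java
import java.time.Instant;

public class CanonicalDate {

    // Hypothetical helper, not part of Solr: reduce an ISO 8601 UTC timestamp
    // to one string form so that equivalent instants compare equal as text.
    // Assumption: the canonical form has no trailing zeros in the fraction,
    // and no "." at all when the fraction is zero.
    static String canonicalize(String input) {
        // Instant.parse throws DateTimeParseException on malformed input.
        String text = Instant.parse(input).toString();
        int dot = text.indexOf('.');
        if (dot < 0) {
            return text; // no fractional seconds to trim
        }
        // Fraction digits sit between the '.' and the trailing 'Z'.
        String fraction = text.substring(dot + 1, text.length() - 1);
        int end = fraction.length();
        while (end > 0 && fraction.charAt(end - 1) == '0') {
            end--;
        }
        if (end == 0) {
            return text.substring(0, dot) + "Z";
        }
        return text.substring(0, dot + 1) + fraction.substring(0, end) + "Z";
    }

    public static void main(String[] args) {
        // The two spellings from the upgrade note reduce to the same string.
        System.out.println(canonicalize("1995-12-31T23:59:59Z"));
        System.out.println(canonicalize("1995-12-31T23:59:59.000Z"));
        System.out.println(canonicalize("1995-12-31T23:59:59.500Z"));
    }
}
```

Normalizing on the client side like this only helps for newly indexed data; documents already indexed with inconsistent formats would still need the reindexing described above.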
Due to some changes in the lifecycle of TokenFilterFactories, users of Solr 1.2 who have written Java code which constructs new instances of StopFilterFactory, SynonymFilterFactory, or EnglishPorterFilterFactory will need to modify their code by adding a line like the following prior to using the factory object... factory.inform(SolrCore.getSolrCore().getSolrConfig().getResourceLoader()); These lifecycle changes do not affect people who use Solr "out of the box" or who have developed their own TokenFilterFactory plugins. More info can be found in SOLR-594. The python client that used to ship with Solr is no longer included in the distribution (see client/python/README.txt). Detailed Change List -------------------- New Features 1. SOLR-69: Adding MoreLikeThisHandler to search for similar documents using lucene contrib/queries MoreLikeThis. MoreLikeThis is also available from the StandardRequestHandler using ?mlt=true. (bdelacretaz, ryan) 2. SOLR-253: Adding KeepWordFilter and KeepWordFilterFactory. A TokenFilter that keeps tokens with text in the registered keeplist. This behaves like the inverse of StopFilter. (ryan) 3. SOLR-257: WordDelimiterFilter has a new parameter splitOnCaseChange, which can be set to 0 to disable splitting "PowerShot" => "Power" "Shot". (klaas) 4. SOLR-193: Adding SolrDocument and SolrInputDocument to represent documents outside of the lucene Document infrastructure. This class will be used by clients and for processing documents. (ryan) 5. SOLR-244: Added ModifiableSolrParams - a SolrParams implementation that helps you change values after initialization. (ryan) 6. SOLR-20: Added a java client interface with two implementations. One implementation uses commons httpclient to connect to solr via HTTP. The other connects to solr directly. Check client/java/solrj. This addition also includes tests that start jetty and test a connection using the full HTTP request cycle. (Darren Erik Vengroff, Will Johnson, ryan) 7.
SOLR-133: Added StaxUpdateRequestHandler that uses StAX for XML parsing. This implementation has much better error checking and lets you configure a custom UpdateRequestProcessor that can selectively process update requests depending on the request attributes. This class will likely replace XmlUpdateRequestHandler. (Thorsten Scherler, ryan) 8. SOLR-264: Added RandomSortField, a utility field with a random sort order. The seed is based on a hash of the field name, so a dynamic field of this type is useful for generating different random sequences. This field type should only be used for sorting or as a value source in a FunctionQuery (ryan, hossman, yonik) 9. SOLR-266: Adding show=schema to LukeRequestHandler to show the parsed schema fields and field types. (ryan) 10. SOLR-133: The UpdateRequestHandler now accepts multiple delete options within a single request. For example, sending: <delete><id>1</id><id>2</id></delete> will delete both 1 and 2. (ryan) 11. SOLR-269: Added UpdateRequestProcessor plugin framework. This provides a reasonable place to process documents after they are parsed and before they are committed to the index. This is a good place for custom document manipulation or document based authorization. (yonik, ryan) 12. SOLR-260: Converting to a standard PluginLoader framework. This reworks RequestHandlers, FieldTypes, and QueryResponseWriters to share the same base code for loading and initializing plugins. This adds a new configuration option to define the default RequestHandler and QueryResponseWriter in XML using default="true". (ryan) 13. SOLR-225: Enable pluggable highlighting classes. Allow configurable highlighting formatters and Fragmenters. (ryan) 14. 
SOLR-273/376/452/516: Added hl.maxAnalyzedChars highlighting parameter, defaulting to 50k, hl.alternateField, which allows the specification of a backup field to use as summary if no keywords are matched, and hl.mergeContiguous, which combines fragments if they are adjacent in the source document. (klaas, Grant Ingersoll, Koji Sekiguchi via klaas) 15. SOLR-291: Control maximum number of documents to cache for any entry in the queryResultCache via queryResultMaxDocsCached solrconfig.xml entry. (Koji Sekiguchi via yonik) 16. SOLR-240: New <lockType> configuration setting in <mainIndex> and <indexDefaults> blocks supports all Lucene builtin LockFactories. 'single' is recommended setting, but 'simple' is default for total backwards compatibility. (Will Johnson via hossman) 17. SOLR-248: Added CapitalizationFilterFactory that creates tokens with normalized capitalization. This filter is useful for facet display, but will not work with a prefix query. (ryan) SOLR-468: Change to the semantics to keep the original token, not the token in the Map. Also switched to use Lucene's new reusable token capabilities. (gsingers) 18. SOLR-307: Added NGramFilterFactory and EdgeNGramFilterFactory. (Thomas Peuss via Otis Gospodnetic) 19. SOLR-305: analysis.jsp can be given a fieldtype instead of a field name. (hossman) 20. SOLR-102: Added RegexFragmenter, which splits text for highlighting based on a given pattern. (klaas) 21. SOLR-258: Date Faceting added to SimpleFacets. Facet counts computed for ranges of size facet.date.gap (a DateMath expression) between facet.date.start and facet.date.end. (hossman) 22. SOLR-196: A PHP serialized "phps" response writer that returns a serialized array that can be used with the PHP function unserialize, and a PHP response writer "php" that may be used by eval. (Nick Jenkin, Paul Borgermans, Pieter Berkel via yonik) 23. 
SOLR-308: A new UUIDField class which accepts UUID string values, as well as the special value of "NEW" which triggers generation of a new random UUID. (Thomas Peuss via hossman) 24. SOLR-349: New FunctionQuery functions: sum, product, div, pow, log, sqrt, abs, scale, map. Constants may now be used as a value source. (yonik) 25. SOLR-359: Add field type className to Luke response, and enabled access to the detailed field information from the solrj client API. (Grant Ingersoll via ehatcher) 26. SOLR-334: Pluggable query parsers. Allows specification of query type and arguments as a prefix on a query string. (yonik) 27. SOLR-351: External Value Source. An external file may be used to specify the values of a field, currently usable as a ValueSource in a FunctionQuery. (yonik) 28. SOLR-395: Many new features for the spell checker implementation, including an extended response mode with much richer output, multi-word spell checking, and a bevy of new and renamed options (see the wiki). (Mike Krimerman, Scott Taber via klaas). 29. SOLR-408: Added PingRequestHandler and deprecated SolrCore.getPingQueryRequest(). Ping requests should be configured using standard RequestHandler syntax in solrconfig.xml rather than using the <pingQuery></pingQuery> syntax. (Karsten Sperling via ryan) 30. SOLR-281: Added a 'Search Component' interface and converted StandardRequestHandler and DisMaxRequestHandler to use this framework. (Sharad Agarwal, Henri Biestro, yonik, ryan) 31. SOLR-176: Add detailed timing data to query response output. The SearchHandler interface now returns how long each section takes. (klaas) 32. SOLR-414: Plugin initialization now supports SolrCore and ResourceLoader "Aware" plugins. Plugins that implement SolrCoreAware or ResourceLoaderAware are informed about the SolrCore/ResourceLoader. (Henri Biestro, ryan) 33. SOLR-350: Support multiple SolrCores running in the same solr instance and allow runtime management for any running SolrCore.
If a solr.xml file exists in solr.home, this file is used to instantiate multiple cores and enables runtime core manipulation. For more information see: http://wiki.apache.org/solr/CoreAdmin (Henri Biestro, ryan) 34. SOLR-447: Added a single request handler that will automatically register all standard admin request handlers. This replaces the need to register (and maintain) the set of admin request handlers. Assuming solrconfig.xml includes: <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" /> This will register: Luke/SystemInfo/PluginInfo/ThreadDump/PropertiesRequestHandler. (ryan) 35. SOLR-142: Added RawResponseWriter and ShowFileRequestHandler. This returns config files directly. If AdminHandlers are configured, this will be added automatically. The jsp files /admin/get-file.jsp and /admin/raw-schema.jsp have been deprecated. The deprecated <admin><gettableFiles> will be automatically registered with a ShowFileRequestHandler instance for backwards compatibility. (ryan) 36. SOLR-446: TextResponseWriter can write SolrDocuments and SolrDocumentLists the same way it writes Document and DocList. (yonik, ryan) 37. SOLR-418: Adding a query elevation component. This is an optional component to elevate some documents to the top positions (or exclude them) for a given query. (ryan) 38. SOLR-478: Added ability to get back unique key information from the LukeRequestHandler. (gsingers) 39. SOLR-127: HTTP Caching awareness. Solr now recognizes HTTP Request headers related to HTTP Caching (see RFC 2616 sec13) and will respond with "304 Not Modified" when appropriate. New options have been added to solrconfig.xml to influence this behavior. (Thomas Peuss via hossman) 40. SOLR-303: Distributed Search over HTTP. Specification of shards argument causes Solr to query those shards and merge the results into a single response.
Querying, field faceting (sorted only), query faceting, highlighting, and debug information are supported in distributed mode. (Sharad Agarwal, Patrick O'Leary, Sabyasachi Dalal, Stu Hood, Jayson Minard, Lars Kotthoff, ryan, yonik) 41. SOLR-356: Pluggable functions (value sources) that allow registration of new functions via solrconfig.xml (Doug Daniels via yonik) 42. SOLR-494: Added cool admin Ajaxed schema explorer. (Greg Ludington via ehatcher) 43. SOLR-497: Added date faceting to the QueryResponse in SolrJ and QueryResponseTest (Shalin Shekhar Mangar via gsingers) 44. SOLR-486: Binary response format, faster and smaller than XML and JSON response formats (use wt=javabin). BinaryResponseParser for utilizing the binary format via SolrJ and is now the default. (Noble Paul, yonik) 45. SOLR-521: StopFilterFactory support for "enablePositionIncrements" (Walter Ferrara via hossman) 46. SOLR-557: Added SolrCore.getSearchComponents() to return an unmodifiable Map. (gsingers) 47. SOLR-516: Added hl.maxAlternateFieldLength parameter, to set max length for hl.alternateField (Koji Sekiguchi via klaas) 48. SOLR-319: Changed SynonymFilterFactory to "tokenize" synonyms file. To use a tokenizer, specify "tokenizerFactory" attribute in <filter>. For example: <tokenizer class="solr.CJKTokenizerFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" expand="true" ignoreCase="true" tokenizerFactory="solr.CJKTokenizerFactory"/> (koji) 49. SOLR-515: Added SimilarityFactory capability to schema.xml, making config file parameters usable in the construction of the global Lucene Similarity implementation. (ehatcher) 50. SOLR-536: Add a DocumentObjectBinder to solrj that converts Objects to and from SolrDocuments. (Noble Paul via ryan) 51. SOLR-595: Add support for Field level boosting in the MoreLikeThis Handler. (Tom Morton, gsingers) 52. SOLR-572: Added SpellCheckComponent and org.apache.solr.spelling package to support more spell checking functionality. 
Also includes ability to add your own SolrSpellChecker implementation that plugs in. See http://wiki.apache.org/solr/SpellCheckComponent for more details (Shalin Shekhar Mangar, Bojan Smid, gsingers)

53. SOLR-679: Added accessor methods to Lucene based spell checkers (gsingers)

54. SOLR-423: Added Request Handler close hook notification so that RequestHandlers can be notified when a core is closing. (gsingers, ryan)

55. SOLR-603: Added ability to partially optimize. (gsingers)

56. SOLR-483: Add byte/short sorting support (gsingers)

57. SOLR-14: Add preserveOriginal flag to WordDelimiterFilter (Geoffrey Young, Trey Hyde, Ankur Madnani, yonik)

58. SOLR-502: Add search timeout support. (Sean Timm via yonik)

59. SOLR-605: Add the ability to register callbacks programmatically (ryan, Noble Paul)

60. SOLR-610: hl.maxAnalyzedChars can be -1 to highlight everything (Lars Kotthoff via klaas)

61. SOLR-522: Make analysis.jsp show payloads. (Tricia Williams via yonik)

62. SOLR-611: Expose sort_values returned by QueryComponent in SolrJ's QueryResponse (Dan Rosher via shalin)

63. SOLR-256: Support exposing Solr statistics through JMX (Sharad Agrawal, shalin)

64. SOLR-666: Expose warmup time in statistics for SolrIndexSearcher and LRUCache (shalin)

65. SOLR-663: Allow multiple files for stopwords, keepwords, protwords and synonyms (Otis Gospodnetic, shalin)

66. SOLR-469: Added DataImportHandler as a contrib project which makes indexing data from Databases, XML files and HTTP data sources into Solr quick and easy. Includes API and implementations for supporting multiple data sources, processors and transformers for importing data. Supports full data imports as well as incremental (delta) indexing. See http://wiki.apache.org/solr/DataImportHandler for more details. (Noble Paul, shalin)

67. SOLR-622: SpellCheckComponent supports auto-loading indices on startup and, optionally, (re)builds indices on newSearcher event, if configured in solrconfig.xml (shalin)

68.
SOLR-554: Hierarchical JDK log level selector for SOLR Admin replaces logging.jsp (Sean Timm via shalin)

69. SOLR-506: Emitting HTTP Cache headers can be enabled or disabled through configuration on a per-handler basis (shalin)

70. SOLR-716: Added support for properties in configuration files. Properties can be specified in solr.xml and can be used in solrconfig.xml and schema.xml (Henri Biestro, hossman, ryan, shalin)

71. SOLR-1129: Support binding dynamic fields to beans in SolrJ (Avlesh Singh, noble)

72. SOLR-920: Cache and reuse IndexSchema. A new attribute called 'shareSchema' has been added in solr.xml (noble)

73. SOLR-700: DIH: Allow configurable locales through a locale attribute in fields for NumberFormatTransformer. (Stefan Oestreicher, shalin)

Changes in runtime behavior

1. SOLR-559: use Lucene updateDocument, deleteDocuments methods. This removes the maxBufferedDeletes parameter added by SOLR-310 as Lucene now manages the deletes. This provides slightly better indexing performance and makes overwrites atomic, eliminating the possibility of a crash causing duplicates. (yonik)

2. SOLR-689 / SOLR-695: If you have used "MultiCore" functionality in an unreleased version of 1.3-dev, many classes and configs have been renamed for the official 1.3 release. Specifically, solr.xml has replaced multicore.xml, and uses a slightly different syntax. The solrj classes MultiCore{Request/Response/Params} have been renamed to CoreAdmin{Request/Response/Params} (hossman, ryan, Henri Biestro)

3. SOLR-647: reference count the SolrCore uses to prevent a premature close while a core is still in use. (Henri Biestro, Noble Paul, yonik)

4. SOLR-737: SolrQueryParser now uses a ConstantScoreQuery for wildcard queries, which prevents an exception from being thrown when the number of matching terms exceeds the BooleanQuery clause limit. (yonik)

Optimizations

1. SOLR-276: improve JSON writer speed. (yonik)

2.
SOLR-310: bound and reduce memory usage by providing <maxBufferedDeletes> parameter, which flushes deletes without forcing the user to use <commit/> for this purpose. (klaas)

3. SOLR-348: short-circuit faceting if less than mincount docs match. (yonik)

4. SOLR-354: Optimize removing all documents. Now when a delete by query of *:* is issued, the current index is removed. (yonik)

5. SOLR-377: Speed up response writers. (yonik)

6. SOLR-342: Added support into the SolrIndexWriter for using several new features of the new Lucene IndexWriter, including: setRAMBufferSizeMB(), setMergePolicy(), setMergeScheduler(). Also, added support to specify Lucene's autoCommit functionality (not to be confused with Solr's similarly named autoCommit functionality) via the <luceneAutoCommit> config item. See the test and example solrconfig.xml <indexDefaults> section for usage. Performance during indexing should be significantly increased by moving up to Lucene 2.3 due to Lucene's new indexing capabilities. Furthermore, the setRAMBufferSizeMB makes it more logical to decide on tuning factors related to indexing. For best performance, leave the mergePolicy and mergeScheduler as the defaults and set ramBufferSizeMB instead of maxBufferedDocs. The best value for this depends on the types of documents in use. 32 should be a good starting point, but reports have shown up to 48 MB provides good results. Note, it is acceptable to set both ramBufferSizeMB and maxBufferedDocs, and Lucene will flush based on whichever limit is reached first. (gsingers)

7. SOLR-330: Converted TokenStreams to use Lucene's new char array based capabilities. (gsingers)

8. SOLR-624: Only take snapshots if there are differences to the index (Richard Trey Hyde via gsingers)

9. SOLR-587: Delete by Query performance greatly improved by using new underlying Lucene IndexWriter implementation. (yonik)

10. SOLR-730: Use read-only IndexReaders that don't synchronize isDeleted().
This will speed up function queries and *:* queries as well as improve their scalability on multi-CPU systems. (Mark Miller via yonik)

Bug Fixes

1. Make TextField respect sortMissingFirst and sortMissingLast fields. (J.J. Larrea via yonik)

2. autoCommit/maxDocs was not working properly when large autoCommit/maxTime was specified (klaas)

3. SOLR-283: autoCommit was not working after delete. (ryan)

4. SOLR-286: ContentStreamBase was not using default encoding for getBytes() (Toru Matsuzawa via ryan)

5. SOLR-292: Fix MoreLikeThis facet counting. (Pieter Berkel via ryan)

6. SOLR-297: Fix bug in RequiredSolrParams where requiring a field specific param would fail if a general default value had been supplied. (hossman)

7. SOLR-331: Fix WordDelimiterFilter handling of offsets for synonyms or other injected tokens that can break highlighting. (yonik)

8. SOLR-282: Snapshooter does not work on Solaris and OS X since the cp command there does not have the -l option. Also updated commit/optimize related scripts to handle both old and new response format. (bill)

9. SOLR-294: Logging of elapsed time broken on Solaris because the date command there does not support the %s output format. (bill)

10. SOLR-136: Snappuller - "date -d" and locales don't mix. (Jürgen Hermann via bill)

11. SOLR-333: Changed distributiondump.jsp to use Solr HOME instead of CWD to set path.

12. SOLR-393: Removed duplicate contentType from raw-schema.jsp. (bill)

13. SOLR-413: Requesting a large number of documents to be returned (limit) can result in an out-of-memory exception, even for a small index. (yonik)

14. The CSV loader incorrectly threw an exception when given header=true (the default). (ryan, yonik)

15. SOLR-449: the python and ruby response writers are now able to correctly output NaN and Infinity in their respective languages. (klaas)

16. SOLR-42: HTMLStripReader tokenizers now preserve correct source offsets for highlighting. (Grant Ingersoll via yonik)

17.
SOLR-481: Handle UnknownHostException in _info.jsp (gsingers)

18. SOLR-324: Add proper support for Long and Doubles in sorting, etc. (gsingers)

19. SOLR-496: Cache-Control max-age changed to Long so Expires calculation won't cause overflow. (Thomas Peuss via hossman)

20. SOLR-535: Fixed typo (Tokenzied -> Tokenized) in schema.jsp (Thomas Peuss via billa)

21. SOLR-529: Better error messages from SolrQueryParser when field isn't specified and there is no defaultSearchField in schema.xml (Lars Kotthoff via hossman)

22. SOLR-530: Better error messages/warnings when parsing schema.xml: field using bogus fieldtype and multiple copyFields to a non-multiValue field. (Shalin Shekhar Mangar via hossman)

23. SOLR-528: Better error message when defaultSearchField is bogus or not indexed. (Lars Kotthoff via hossman)

24. SOLR-533: Fixed tests so they don't use hardcoded port numbers. (hossman)

25. SOLR-400: SolrExceptionTest should now handle using OpenDNS as a DNS provider (gsingers)

26. SOLR-541: Legacy XML update support (provided by SolrUpdateServlet when no RequestHandler is mapped to "/update") now logs errors correctly. (hossman)

27. SOLR-267: Changed logging to report number of hits, and also provide a mechanism to add log messages to be output by the SolrCore via a NamedList toLog member variable. (Will Johnson, yseeley, gsingers)
    - SOLR-267: Removed adding values to the HTTP headers in SolrDispatchFilter (gsingers)

28. SOLR-509: Moved firstSearcher event notification to the end of the SolrCore constructor (Koji Sekiguchi via gsingers)

29. SOLR-470, SOLR-552, SOLR-544, SOLR-701: Multiple fixes to DateField regarding lenient parsing of optional milliseconds, and correct formatting using the canonical representation. LegacyDateField has been added for people who have come to depend on the existing broken behavior. (hossman, Stefan Oestreicher)

30. SOLR-539: Fix for non-atomic long counters and a cast fix to avoid divide by zero. (Sean Timm via Otis Gospodnetic)

31.
SOLR-514: Added explicit media-type with UTF* charset to *.xsl files that don't already have one. (hossman)

32. SOLR-505: Give RequestHandlers the possibility to suppress the generation of HTTP caching headers. (Thomas Peuss via Otis Gospodnetic)

33. SOLR-553: Handle highlighting of phrase terms better when hl.usePhraseHighlighter=true URL param is used. (Bojan Smid via Otis Gospodnetic)

34. SOLR-590: Limitation in pgrep on Linux platform breaks script-utils fixUser. (Hannes Schmidt via billa)

35. SOLR-597: SolrServlet no longer "caches" SolrCore. This was causing problems in Resin, and could potentially cause problems for customized usages of SolrServlet.

36. SOLR-585: Now sets the QParser on the ResponseBuilder (gsingers)

37. SOLR-604: If the spellchecking path is relative, make it relative to the Solr Data Directory. (Shalin Shekhar Mangar via gsingers)

38. SOLR-584: Make stats.jsp and stats.xsl more robust. (Yousef Ourabi and hossman)

39. SOLR-443: SolrJ: Declare UTF-8 charset on POSTed parameters to avoid problems with servlet containers that default to latin-1 and allow switching of the exact POST mechanism for parameters via useMultiPartPost in CommonsHttpSolrServer. (Lars Kotthoff, Andrew Schurman, ryan, yonik)

40. SOLR-556: multi-valued fields always highlighted in disparate snippets (Lars Kotthoff via klaas)

41. SOLR-501: Fix admin/analysis.jsp UTF-8 input for some other servlet containers such as Tomcat. (Hiroaki Kawai, Lars Kotthoff via yonik)

42. SOLR-616: SpellChecker accuracy configuration is not applied for FileBasedSpellChecker. Apply it for both FileBasedSpellChecker and IndexBasedSpellChecker. (shalin)

43. SOLR-648: SpellCheckComponent throws NullPointerException on using spellcheck.q request parameter after restarting Solr, if reload is called but build is not called. (Jonathan Lee, shalin)

44. SOLR-598: DebugComponent now always occurs last in the SearchHandler list unless the components are explicitly declared. (gsingers)

45.
SOLR-676: DataImportHandler should use UpdateRequestProcessor API instead of directly using UpdateHandler. (shalin)

46. SOLR-696: Fixed bug in NamedListCodec in regards to serializing Iterable objects. (gsingers)

47. SOLR-669: snappuller fix for FreeBSD/Darwin (Richard "Trey" Hyde via Otis Gospodnetic)

48. SOLR-606: Fixed spell check collation offset issue. (Stefan Oestreicher, Geoffrey Young, gsingers)

49. SOLR-589: Improved handling of badly formatted query strings (Sean Timm via Otis Gospodnetic)

50. SOLR-749: Allow QParser and ValueSourceParsers to be extended with same name (hossman, gsingers)

51. SOLR-704: DIH NumberFormatTransformer can silently ignore part of the string while parsing. Now it tries to use the complete string for parsing. Failure to do so will result in an exception. (Stefan Oestreicher via shalin)

52. SOLR-729: DIH Context.getDataSource(String) gives current entity's DataSource instance regardless of argument. (Noble Paul, shalin)

53. SOLR-726: DIH: Jdbc Drivers and DataSources fail to load if placed in multicore sharedLib or core's lib directory. (Walter Ferrara, Noble Paul, shalin)

Other Changes

1. SOLR-135: Moved common classes to org.apache.solr.common and altered the build scripts to make two jars: apache-solr-1.3.jar and apache-solr-1.3-common.jar. This common.jar can be used in client code; it does not have lucene or junit dependencies. The original classes have been replaced with a @Deprecated extended class and are scheduled to be removed in a later release. While this change does not affect API compatibility, it is recommended to update references to these deprecated classes. (ryan)

2. SOLR-268: Tweaks to post.jar so it prints the error message from Solr. (Brian Whitman via hossman)

3. Upgraded to Lucene 2.2.0; June 18, 2007.

4. SOLR-215: Static access to SolrCore.getSolrCore() and SolrConfig.config have been deprecated in order to support multiple loaded cores. (Henri Biestro via ryan)

5.
SOLR-367: The create method in all TokenFilter and Tokenizer Factories provided by Solr now declare their specific return types instead of just using "TokenStream" (hossman)

6. SOLR-396: Hooks added to the build system for automatic generation of (stub) Tokenizer and TokenFilter Factories. Also: new Factories for all Tokenizers and TokenFilters provided by the lucene-analyzers-2.2.0.jar -- includes support for German, Chinese, Russian, Dutch, Greek, Brazilian, Thai, and French. (hossman)

7. Upgraded to commons-CSV r609327, which fixes escaping bugs and introduces new escaping and whitespace handling options to increase compatibility with different formats. (yonik)

8. Upgraded to Lucene 2.3.0; Jan 23, 2008.

9. SOLR-451: Changed analysis.jsp to use POST instead of GET, also made the input area a bit bigger (gsingers)

10. Upgrade to Lucene 2.3.1

11. SOLR-531: Different exit code for rsyncd-start and snappuller if disabled (Thomas Peuss via billa)

12. SOLR-550: Clarified DocumentBuilder addField javadocs (gsingers)

13. Upgrade to Lucene 2.3.2

14. SOLR-518: Changed luke.xsl to use divs w/css for generating histograms instead of SVG (Thomas Peuss via hossman)

15. SOLR-592: Added ShardParams interface and changed several string literals to references to constants in CommonParams. (Lars Kotthoff via Otis Gospodnetic)

16. SOLR-520: Deprecated unused LengthFilter since already core in Lucene-Java (hossman)

17. SOLR-645: Refactored SimpleFacetsTest (Lars Kotthoff via hossman)

18. SOLR-591: Changed Solrj default value for facet.sort to true (Lars Kotthoff via Shalin)

19. Upgraded to Lucene 2.4-dev (r669476) to support SOLR-572 (gsingers)

20. SOLR-636: Improve/simplify example configs; and make index.jsp links more resilient to configs loaded via an InputStream (Lars Kotthoff, hossman)

21. SOLR-682: Scripts now support FreeBSD (Richard Trey Hyde via gsingers)

22. SOLR-489: Added in deprecation comments. (Sean Timm, Lars Kotthoff via gsingers)

23.
SOLR-692: Migrated to stable released builds of StAX API 1.0.1 and StAX 1.2.0 (shalin)

24. Upgraded to Lucene 2.4-dev (r686801) (yonik)

25. Upgraded to Lucene 2.4-dev (r688745) 27-Aug-2008 (yonik)

26. Upgraded to Lucene 2.4-dev (r691741) 03-Sep-2008 (yonik)

27. Replaced the StAX reference implementation with the geronimo StAX API jar, and the Woodstox StAX implementation. (yonik)

Build

1. SOLR-411: Changed the names of the Solr JARs to use the de facto standard JAR names based on project-name-version.jar. This yields, for example:
      apache-solr-common-1.3-dev.jar
      apache-solr-solrj-1.3-dev.jar
      apache-solr-1.3-dev.jar

2. SOLR-479: Added clover code coverage targets for committers and the nightly build. Requires the Clover library, as licensed to Apache and only available privately. To run:
      ant -Drun.clover=true clean clover test generate-clover-reports

3. SOLR-510: Nightly release includes client sources. (koji)

4. SOLR-563: Modified the build process to build contrib projects (Shalin Shekhar Mangar via Otis Gospodnetic)

5. SOLR-673: Modify build file to create javadocs for core, solrj, contrib and "all inclusive" (shalin)

6. SOLR-672: Nightly release includes contrib sources. (Jeremy Hinegardner, shalin)

7. SOLR-586: Added ant target and POM files for building maven artifacts of the Solr core, common, client and contrib. The target can publish artifacts with source and javadocs. (Spencer Crissman, Craig McClanahan, shalin)

================== Release 1.2 ==================

Upgrading from Solr 1.1
-------------------------------------

IMPORTANT UPGRADE NOTE: In a master/slave configuration, all searchers/slaves should be upgraded before the master! If the master were to be updated first, the older searchers would not be able to read the new index format.

Older Apache Solr installations can be upgraded by replacing the relevant war file with the new version. No changes to configuration files should be needed.
This version of Solr contains a new version of Lucene implementing an updated index format. This version of Solr/Lucene can still read and update indexes in the older formats, and will convert them to the new format on the first index change. One change in the new index format is that all "norms" are kept in a single file, greatly reducing the number of files per segment. Users of compound file indexes will want to consider converting to the non-compound format for faster indexing and slightly better search concurrency.

The JSON response format for facets has changed to make it easier for clients to retain sorted order. Use json.nl=map explicitly in clients to get the old behavior, or add it as a default to the request handler in solrconfig.xml.

The Lucene based Solr query syntax is slightly more strict. A ':' in a field value must be escaped or the whole value must be quoted.

The Solr "Request Handler" framework has been updated in two key ways: First, if a Request Handler is registered in solrconfig.xml with a name starting with "/" then it can be accessed using a path-based URL, instead of using the legacy "/select?qt=name" URL structure. Second, the Request Handler framework has been extended making it possible to write Request Handlers that process streams of data for doing updates, and there is a new-style Request Handler for XML updates given the name of "/update" in the example solrconfig.xml. Existing installations without this "/update" handler will continue to use the old update servlet and should see no changes in behavior. For new-style update handlers, errors are now reflected in the HTTP status code, Content-type checking is more strict, and the response format has changed and is controllable via the wt parameter.

Detailed Change List
--------------------

New Features

1. SOLR-82: Default field values can be specified in the schema.xml. (Ryan McKinley via hossman)

2. SOLR-89: Two new TokenFilters with corresponding Factories...
    * TrimFilter - Trims leading and trailing whitespace from Tokens
    * PatternReplaceFilter - applies a Pattern to each token in the stream, replacing match occurrences with a specified replacement.
    (hossman)

3. SOLR-91: allow configuration of a limit of the number of searchers that can be warming in the background. This can be used to avoid out-of-memory errors, or contention caused by more and more searchers warming in the background. An error is thrown if the limit specified by maxWarmingSearchers in solrconfig.xml is exceeded. (yonik)

4. SOLR-106: New faceting parameters that allow specification of a minimum count for returned facets (facet.mincount), paging through facets (facet.offset, facet.limit), and explicit sorting (facet.sort). facet.zeros is now deprecated. (yonik)

5. SOLR-80: Negative queries are now allowed everywhere. Negative queries are generated and cached as their positive counterpart, speeding generation and generally resulting in smaller sets to cache. Set intersections in SolrIndexSearcher are more efficient, starting with the smallest positive set, subtracting all negative sets, then intersecting with all other positive sets. (yonik)

6. SOLR-117: Limit field faceting to constraints with a prefix specified by facet.prefix or f.<field>.facet.prefix. (yonik)

7. SOLR-107: JAVA API: Change NamedList to use Java5 generics and implement Iterable<Map.Entry> (Ryan McKinley via yonik)

8. SOLR-104: Support for "Update Plugins" -- RequestHandlers that want access to streams of data for doing updates. ContentStreams can come from the raw POST body, multi-part form data, or remote URLs. Included in this change is a new SolrDispatchFilter that allows RequestHandlers registered with names that begin with a "/" to be accessed using a URL structure based on that name. (Ryan McKinley via hossman)

9. SOLR-126: DirectUpdateHandler2 supports autocommitting after a specified time (in ms), using <autoCommit><maxTime>10000</maxTime></autoCommit>.
(Ryan McKinley via klaas).

10. SOLR-116: IndexInfoRequestHandler added. (Erik Hatcher)

11. SOLR-79: Add system property ${<sys.prop>[:<default>]} substitution for configuration files loaded, including schema.xml and solrconfig.xml. (Erik Hatcher with inspiration from Andrew Saar)

12. SOLR-149: Changes to make Solr more easily embeddable, in addition to logging which request handler handled each request. (Ryan McKinley via yonik)

13. SOLR-86: Added standalone Java-based command-line updater. (Erik Hatcher via Bertrand Delacretaz)

14. SOLR-152: DisMaxRequestHandler now supports configurable alternate behavior when q is not specified. A "q.alt" param can be specified using SolrQueryParser syntax as a mechanism for specifying what query the dismax handler should execute if the main user query (q) is blank. (Ryan McKinley via hossman)

15. SOLR-158: new "qs" (Query Slop) param for DisMaxRequestHandler allows for specifying the amount of default slop to use when parsing explicit phrase queries from the user. (Adam Hiatt via hossman)

16. SOLR-81: SpellCheckerRequestHandler that uses the SpellChecker from the Lucene contrib. (Otis Gospodnetic and Adam Hiatt)

17. SOLR-182: allow lazy loading of request handlers on first request. (Ryan McKinley via yonik)

18. SOLR-81: More SpellCheckerRequestHandler enhancements, including support for relative or absolute directory path configurations, as well as RAM based directory. (hossman)

19. SOLR-197: New parameters for input: stream.contentType for specifying or overriding the content type of input, and stream.file for reading local files. (Ryan McKinley via yonik)

20. SOLR-66: CSV data format for document additions and updates. (yonik)

21. SOLR-184: add echoHandler=true to responseHeader, support echoParams=all (Ryan McKinley via ehatcher)

22. SOLR-211: Added a regex PatternTokenizerFactory. This extracts tokens from the input string using a regex Pattern. (Ryan McKinley)

23.
SOLR-162: Added a "Luke" request handler and other admin helpers. This exposes the system status through the standard requestHandler framework. (ryan)

24. SOLR-212: Added a DirectSolrConnection class. This lets you access solr using the standard request/response formats, but does not require an HTTP connection. It is designed for embedded applications. (ryan)

25. SOLR-204: The request dispatcher (added in SOLR-104) can handle calls to /select. This offers uniform error handling for /update and /select. To enable this behavior, you must add:
      <requestDispatcher handleSelect="true" >
    to your solrconfig.xml. See the example solrconfig.xml for details. (ryan)

26. SOLR-170: StandardRequestHandler now supports a "sort" parameter. Using the ';' syntax is still supported, but it is recommended to transition to the new syntax. (ryan)

27. SOLR-181: The index schema now supports "required" fields. Attempts to add a document without a required field will fail, returning a descriptive error message. By default, the uniqueKey field is a required field. This can be disabled by setting required=false in schema.xml. (Greg Ludington via ryan)

28. SOLR-217: Fields configured in the schema to be neither indexed nor stored will now be quietly ignored by Solr when Documents are added. The example schema has a comment explaining how this can be used to ignore any "unknown" fields. (Will Johnson via hossman)

29. SOLR-227: If schema.xml defines multiple fieldTypes, fields, or dynamicFields with the same name, a severe error will be logged rather than quietly continuing. Depending on the <abortOnConfigurationError> settings, this may halt the server. Likewise, if solrconfig.xml defines multiple RequestHandlers with the same name it will also add an error. (ryan)

30. SOLR-226: Added support for dynamic field as the destination of a copyField using glob (*) replacement. (ryan)

31.
SOLR-224: Adding a PhoneticFilterFactory that uses apache commons codec language encoders to build phonetically similar tokens. This currently supports: DoubleMetaphone, Metaphone, Soundex, and RefinedSoundex (ryan)

32. SOLR-199: new n-gram tokenizers available via NGramTokenizerFactory and EdgeNGramTokenizerFactory. (Adam Hiatt via yonik)

33. SOLR-234: TrimFilter can update the Token's startOffset and endOffset if updateOffsets="true". By default the Token offsets are unchanged. (ryan)

34. SOLR-208: new example_rss.xsl and example_atom.xsl to provide more examples for people about the Solr XML response format and how they can transform it to suit different needs. (Brian Whitman via hossman)

35. SOLR-249: Deprecated SolrException( int, ... ) constructors in favor of constructors that take an ErrorCode enum. This will ensure that all SolrExceptions use a valid HTTP status code. (ryan)

36. SOLR-386: Abstracted SolrHighlighter and moved existing implementation to DefaultSolrHighlighter. Adjusted SolrCore and solrconfig.xml so that highlighter is configurable via a class attribute. Allows users to use their own highlighter implementation. (Tricia Williams via klaas)

Changes in runtime behavior

1. Highlighting using DisMax will only pick up terms from the main user query, not boost or filter queries (klaas).

2. SOLR-125: Change default of json.nl to flat, change so that json.nl only affects items where order matters (facet constraint listings). Fix JSON output bug for null values. Internal JAVA API: change most uses of NamedList to SimpleOrderedMap. (yonik)

3. A new method "getSolrQueryParser" has been added to the IndexSchema class for retrieving a new SolrQueryParser instance with all options specified in the schema.xml's <solrQueryParser> block set. The documentation for the SolrQueryParser constructor and its use of IndexSchema have also been clarified. (Erik Hatcher and hossman)

4.
DisMaxRequestHandler's bq, bf, qf, and pf parameters can now accept multiple values (klaas).

5. Queries are rewritten before highlighting is performed. This enables proper highlighting of prefix and wildcard queries (klaas).

6. A meaningful exception is raised when attempting to add a doc missing a unique id if it is declared in the schema and allowDups=false. (ryan via klaas)

7. SOLR-183: Exceptions with error code 400 are raised when numeric argument parsing fails. RequiredSolrParams class added to facilitate checking for parameters that must be present. (Ryan McKinley, J.J. Larrea via yonik)

8. SOLR-179: By default, solr will abort after any severe initialization errors. This behavior can be disabled by setting:
      <abortOnConfigurationError>false</abortOnConfigurationError>
    in solrconfig.xml (ryan)

9. The example solrconfig.xml maps /update to XmlUpdateRequestHandler using the new request dispatcher (SOLR-104). This requires posted content to have a valid contentType:
      curl -H 'Content-type:text/xml; charset=utf-8'
    The response format matches that of /select and returns standard error codes. To enable solr1.1 style /update, do not map "/update" to any handler in solrconfig.xml (ryan)

10. SOLR-231: If a charset is not specified in the contentType, ContentStream.getReader() will use UTF-8 encoding. (ryan)

11. SOLR-230: More options for post.jar to support stdin, xml on the commandline, and deferring commits. Tutorial modified to take advantage of these options so there is no need for curl. (hossman)

12. SOLR-128: Upgraded Jetty to the latest stable release 6.1.3 (ryan)

Optimizations

1. SOLR-114: HashDocSet specific implementations of union() and andNot() for a 20x performance improvement for those set operations, and a new hash algorithm speeds up exists() by 10% and intersectionSize() by 8%. (yonik)

2. SOLR-115: Solr now uses BooleanQuery.clauses() instead of BooleanQuery.getClauses() in any situation where there is no risk of modifying the original query. (hossman)

3.
SOLR-221: Speed up sorted faceting on multivalued fields by ~60% when the base set consists of a relatively large portion of the index. (yonik)

4. SOLR-221: Added a facet.enum.cache.minDf parameter which avoids using the filterCache for terms that match few documents, trading decreased memory usage for increased query time. (yonik)

Bug Fixes

1. SOLR-87: Parsing of synonym files did not correctly handle escaped whitespace such as \r\n\t\b\f. (yonik)

2. SOLR-92: DOMUtils.getText (used when parsing config files) did not work properly with many DOM implementations when dealing with "Attributes". (Ryan McKinley via hossman)

3. SOLR-9, SOLR-99: Tighten up sort specification error checking, throw exceptions for missing sort specifications or a sort on a non-indexed field. (Ryan McKinley via yonik)

4. SOLR-145: Fix for bug introduced in SOLR-104 where some Exceptions were being ignored by all "out of the box" RequestHandlers. (hossman)

5. SOLR-166: JNDI solr.home code refactoring. SOLR-104 moved some JNDI related code to the init method of a Servlet Filter - according to the Servlet Spec, all Filters should be initialized prior to initializing any Servlets, but this is not the case in at least one Servlet Container (Resin). This "bug fix" refactors this JNDI code so that it should be executed the first time any attempt is made to use the solr.home dir. (Ryan McKinley via hossman)

6. SOLR-173: Bug fix to SolrDispatchFilter to reduce the "too many open files" problem: SolrDispatchFilter was not closing requests when finished. Also modified ResponseWriters to only fetch a Searcher reference if necessary for writing out DocLists. (Ryan McKinley via hossman)

7. SOLR-168: Fix display positioning of multiple tokens at the same position in analysis.jsp (yonik)

8. SOLR-167: The SynonymFilter sometimes generated incorrect offsets when multi token synonyms were matched in the source text. (yonik)

9. SOLR-188: bin scripts do not support non-default webapp names.
Added "-U" option to specify a full path to the update url, overriding the "-h" (hostname), "-p" (port) and "-w" (webapp name) parameters. (Jeff Rodenburg via billa)

10. SOLR-198: RunExecutableListener always waited for the process to finish, even when wait="false" was set. (Koji Sekiguchi via yonik)

11. SOLR-207: Changed distribution scripts to remove recursive find and avoid use of "find -maxdepth" on platforms where it is not supported. (yonik)

12. SOLR-222: Changing writeLockTimeout in solrconfig.xml did not change the effective timeout. (Koji Sekiguchi via yonik)

13. Changed the SOLR-104 RequestDispatcher so that /select?qt=xxx can not access handlers that start with "/". This makes path based authentication possible for path based request handlers. (ryan)

14. SOLR-214: Some servlet containers (including Tomcat and Resin) do not obey the specified charset. Rather than letting the container handle it, solr now uses the charset from the header contentType to decode posted content. Using the contentType: "text/xml; charset=utf-8" will force utf-8 encoding. If you do not specify a contentType, it will use the platform default. (Koji Sekiguchi via ryan)

15. SOLR-241: Undefined system properties used in configuration files now cause a clear message to be logged rather than an obscure exception thrown. (Koji Sekiguchi via ehatcher)

Other Changes

1. Updated to Lucene 2.1

2. Updated to Lucene 2007-05-20_00-04-53

================== Release 1.1.0 ==================

Status
------

This is the first release since Solr joined the Incubator, and brings many new features and performance optimizations including highlighting, faceted browsing, and JSON/Python/Ruby response formats.

Upgrading from previous Solr versions
-------------------------------------

Older Apache Solr installations can be upgraded by replacing the relevant war file with the new version. No changes to configuration files are needed and the index format has not changed.
The default version of the Solr XML response syntax has been changed to 2.2. Behavior can be preserved for those clients not explicitly specifying a version by adding a default to the request handler in solrconfig.xml.

By default, Solr will no longer use a searcher that has not fully warmed, and requests will block in the meantime. To change back to the previous behavior of using a cold searcher in the event there is no other warm searcher, see the useColdSearcher config item in solrconfig.xml.

The XML response format when adding multiple documents to the collection in a single <add> command has changed to return a single <result>.

Detailed Change List
--------------------

New Features
1. added support for setting Lucene's positionIncrementGap
2. Admin: new statistics for SolrIndexSearcher
3. Admin: caches now show config params on stats page
3. max() function added to FunctionQuery suite
4. postOptimize hook, mirroring the functionality of the postCommit hook, but only called on an index optimize.
5. Ability to HTTP POST query requests to /select in addition to HTTP-GET
6. The default search field may now be overridden by requests to the standard request handler using the df query parameter. (Erik Hatcher)
7. Added DisMaxRequestHandler and SolrPluginUtils. (Chris Hostetter)
8. Support for customizing the QueryResponseWriter per request (Mike Baranczak / SOLR-16 / hossman)
9. Added KeywordTokenizerFactory (hossman)
10. copyField accepts dynamicfield-like names as the source. (Darren Erik Vengroff via yonik, SOLR-21)
11. new DocSet.andNot(), DocSet.andNotSize() (yonik)
12. Ability to store term vectors for fields. (Mike Klaas via yonik, SOLR-23)
13. New abstract BufferedTokenStream for people who want to write Tokenizers or TokenFilters that require arbitrary buffering of the stream. (SOLR-11 / yonik, hossman)
14. New RemoveDuplicatesToken - useful in situations where synonyms, stemming, or word-delimiting produce identical tokens at the same position.
(SOLR-11 / yonik, hossman)
15. Added highlighting to SolrPluginUtils and implemented in StandardRequestHandler and DisMaxRequestHandler (SOLR-24 / Mike Klaas via hossman,yonik)
16. SnowballPorterFilterFactory language is configurable via the "language" attribute, with the default being "English". (Bertrand Delacretaz via yonik, SOLR-27)
17. ISOLatin1AccentFilterFactory, instantiates ISOLatin1AccentFilter to remove accents. (Bertrand Delacretaz via yonik, SOLR-28)
18. JSON, Python, Ruby QueryResponseWriters: use wt="json", "python" or "ruby" (yonik, SOLR-31)
19. Make web admin pages return UTF-8, change Content-type declaration to include a space between the mime-type and charset (Philip Jacob, SOLR-35)
20. Made query parser default operator configurable via schema.xml: <solrQueryParser defaultOperator="AND|OR"/> The default operator remains "OR".
21. JAVA API: new version of SolrIndexSearcher.getDocListAndSet() which takes flags (Greg Ludington via yonik, SOLR-39)
22. A HyphenatedWordsFilter, a text analysis filter used during indexing to rejoin words that were hyphenated and split by a newline. (Boris Vitez via yonik, SOLR-41)
23. Added a CompressableField base class which allows fields of derived types to be compressed using the compress=true setting. The field type also gains the ability to specify a size threshold at which field data is compressed. (klaas, SOLR-45)
24. Simple faceted search support for fields (enumerating terms) and arbitrary queries added to both StandardRequestHandler and DisMaxRequestHandler. (hossman, SOLR-44)
25. In addition to specifying default RequestHandler params in the solrconfig.xml, support has been added for configuring values to be appended to the multi-val request params, as well as for configuring invariant params that cannot be overridden in the query. (hossman, SOLR-46)
26. Default operator for query parsing can now be specified with q.op=AND|OR from the client request, overriding the schema value. (ehatcher)
27.
New XSLTResponseWriter does server side XSLT processing of XML Response. In the process, an init(NamedList) method was added to QueryResponseWriter which works the same way as SolrRequestHandler. (Bertrand Delacretaz / SOLR-49 / hossman)
28. json.wrf parameter adds a wrapper-function around the JSON response, useful in AJAX with dynamic script tags for specifying a JavaScript callback function. (Bertrand Delacretaz via yonik, SOLR-56)
29. autoCommit can be specified every so many documents added (klaas, SOLR-65)
30. ${solr.home}/lib directory can now be used for specifying "plugin" jars (hossman, SOLR-68)
31. Support for "Date Math" relative "NOW" when specifying values of a DateField in a query -- or when adding a document. (hossman, SOLR-71)
32. useColdSearcher control in solrconfig.xml prevents the first searcher from being used before it's done warming. This can help prevent thrashing on startup when multiple requests hit a cold searcher. The default is "false", preventing use before warm. (yonik, SOLR-77)

Changes in runtime behavior
1. classes reorganized into different packages, package names changed to Apache
2. force read of document stored fields in QuerySenderListener
3. Solr now looks in ./solr/conf for config, ./solr/data for data; configurable via the solr.solr.home system property
4. Highlighter params changed to be prefixed with "hl."; allow fragmentsize customization and per-field overrides on many options (Andrew May via klaas, SOLR-37)
5. Default param values for DisMaxRequestHandler should now be specified using a '<lst name="defaults">...</lst>' init param. For backwards compatibility, all init params will be used as defaults if an init param with that name does not exist. (hossman, SOLR-43)
6. The DisMaxRequestHandler now supports multiple occurrences of the "fq" param. (hossman, SOLR-44)
7. FunctionQuery.explain now uses ComplexExplanation to provide more accurate score explanations when composed in a BooleanQuery. (hossman, SOLR-25)
8.
Document update handling locking is much sparser, allowing performance gains through multiple threads. Large commits also might be faster (klaas, SOLR-65)
9. Lazy field loading can be enabled via a solrconfig directive. This will be faster when not all stored fields are needed from a document (klaas, SOLR-52)
10. Made admin JSPs return XML and transform them with new XSL stylesheets (Otis Gospodnetic, SOLR-58)
11. If the "echoParams=explicit" request parameter is set, request parameters are copied to the output. In an XML output, they appear in a new <lst name="params"> list inside the new <lst name="responseHeader"> element, which replaces the old <responseHeader>. Adding a version=2.1 parameter to the request produces the old format, for backwards compatibility (bdelacretaz and yonik, SOLR-59).

Optimizations
1. getDocListAndSet can now generate both a DocList and a DocSet from a single lucene query.
2. BitDocSet.intersectionSize(HashDocSet) no longer generates an intermediate set
3. OpenBitSet completed, replaces BitSet as the implementation for BitDocSet. Iteration is faster, and BitDocSet.intersectionSize(BitDocSet) and unionSize are between 3 and 4 times faster. (yonik, SOLR-15)
4. much faster unionSize when one of the sets is a HashDocSet: O(smaller_set_size)
5. Optimized getDocSet() for term queries resulting in a 36% speedup of facet.field queries where DocSets aren't cached (for example, if the number of terms in the field is larger than the filter cache.) (yonik)
6. Optimized facet.field faceting by as much as 500 times when the field has a single token per document (not multiValued & not tokenized) by using the Lucene FieldCache entry for that field to tally term counts. The first request utilizing the FieldCache will take longer than subsequent ones.

Bug Fixes
1. Fixed delete-by-id for field types whose indexed form is different from the printable form (mainly sortable numeric types).
2.
Added escaping of attribute values in the XML response (Erik Hatcher)
3. Added empty extractTerms() to FunctionQuery to enable use in a MultiSearcher (Yonik)
4. WordDelimiterFilter sometimes lost token positionIncrement information
5. Fix reverse sorting for fields where sortMissingFirst=true (Rob Staveley, yonik)
6. Worked around a Jetty bug that caused invalid XML responses for fields containing non ASCII chars. (Bertrand Delacretaz via yonik, SOLR-32)
7. WordDelimiterFilter can throw exceptions if configured with both generate and catenate off. (Mike Klaas via yonik, SOLR-34)
8. Escape '>' in XML output (because ]]> is illegal in CharData)
9. field boosts weren't being applied and doc boosts were being applied to fields (klaas)
10. Multiple-doc update generates well-formed xml (klaas, SOLR-65)
11. Better parsing of pingQuery from solrconfig.xml (hossman, SOLR-70)
12. Fixed bug with "Distribution" page introduced when Versions were added to "Info" page (hossman)
13. Fixed HTML escaping issues with user input to analysis.jsp and action.jsp (hossman, SOLR-74)

Other Changes
1. Upgrade to Lucene 2.0 nightly build 2006-06-22, lucene SVN revision 416224, http://svn.apache.org/viewvc/lucene/java/trunk/CHANGES.txt?view=markup&pathrev=416224
2. Modified admin styles to improve display in Internet Explorer (Greg Ludington via billa, SOLR-6)
3. Upgrade to Lucene 2.0 nightly build 2006-07-15, lucene SVN revision 422302,
4. Included unique key field name/value (if available) in log message of add (billa, SOLR-18)
5. Updated to Lucene 2.0 nightly build 2006-09-07, SVN revision 462111
6. Added javascript to catch empty query in admin query forms (Tomislav Nakic-Alfirevic via billa, SOLR-48)
7. backslash escape * in ssh command used in snappuller for zsh compatibility, SOLR-63
8. check solr return code in admin scripts, SOLR-62
9. Updated to Lucene 2.0 nightly build 2006-11-15, SVN revision 475069
10. Removed src/apps containing the legacy "SolrTest" app (hossman, SOLR-3)
11.
Simplified index.jsp and form.jsp, primarily by removing/hiding XML specific params, and adding an option to pick the output type. (hossman)
12. Added new numeric build property "specversion" to allow clean MANIFEST.MF files (hossman)
13. Added Solr/Lucene versions to "Info" page (hossman)
14. Explicitly set mime-type of .xsl files in web.xml to application/xslt+xml (hossman)
15. Config parsing should now work using DOM Level 2 parsers -- Solr previously relied on getTextContent which is a DOM Level 3 addition (Alexander Saar via hossman, SOLR-78)

2006/01/17 Solr open sourced, moves to Apache Incubator