Apache Solr Release Notes

Introduction
------------
Apache Solr is an open source enterprise search server based on the Apache
Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting,
faceted search, caching, replication, and a web administration interface.
It runs in a Java servlet container such as Tomcat.

See http://lucene.apache.org/solr for more information.

Getting Started
---------------
You need a Java 1.6 VM or later installed.
In this release, there is an example Solr server including a bundled
servlet container in the directory named "example".
See the tutorial at http://lucene.apache.org/solr/tutorial.html

$Id$

================== 5.0.0 ==================

(No changes)

================== 4.0.0-BETA ===================

Versions of Major Components
---------------------
Apache Tika 1.1
Carrot2 3.5.0
Velocity 1.6.4 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.3.5

Detailed Change List
----------------------

New Features
----------------------

* LUCENE-4201: Added JapaneseIterationMarkCharFilterFactory to normalize
  Japanese iteration marks. (Robert Muir, Christian Moen)

* SOLR-1856: In Solr Cell, literals should override Tika-parsed values.
  Patch adds a param "literalsOverride" which defaults to true, but can be
  set to "false" to let Tika-parsed values be appended to literal values
  (Chris Harris, janhoy)

* SOLR-3488: Added a Collection management API for SolrCloud.
  (Tommaso Teofili, Sami Siren, yonik, Mark Miller)

* SOLR-3559: Full deleteByQuery support with SolrCloud distributed indexing.
  All replicas of a shard will be consistent, even if updates arrive in a
  different order on different replicas. (yonik)

* SOLR-1929: Index encrypted documents with ExtractingUpdateRequestHandler.
  By supplying resource.password= or specifying an external file with
  regular expressions matching file names, Solr will decrypt and index PDFs
  and DOCX formats.
  (janhoy, Yiannis Pericleous)

* SOLR-3562: Add options to remove instance dir or data dir on core unload.
  (Mark Miller, Per Steffensen)

* SOLR-2702: The default directory factory was changed to
  NRTCachingDirectoryFactory which wraps the StandardDirectoryFactory and
  caches small files for improved Near Real-time (NRT) performance.
  (Mark Miller, yonik)

* SOLR-2616: Include a sample java util logging configuration file.
  (David Smiley, Mark Miller)

* SOLR-3460: Add cloud-scripts directory and a zkcli.sh|bat tool for easy
  scripting and interaction with ZooKeeper. (Mark Miller)

* SOLR-1725: StatelessScriptUpdateProcessorFactory allows users to implement
  the full ScriptUpdateProcessor API using any scripting language with a
  javax.script.ScriptEngineFactory
  (Uri Boness, ehatcher, Simon Rosenthal, hossman)

* SOLR-139: Change to updateable documents to create the document if it
  doesn't already exist. To assert that the document must exist, use the
  optimistic concurrency feature by specifying a _version_ of 1. (yonik)

Bug Fixes
----------------------

* SOLR-3582: Our ZooKeeper watchers respond to session events as if they are
  change events, creating undesirable side effects.
  (Trym R. Møller, Mark Miller)

* SOLR-3467: ExtendedDismax escaping is missing several reserved characters
  (Michael Dodsworth via janhoy)

* SOLR-3587: After reloading a SolrCore, the original Analyzer is still used
  rather than a new one. (Alexey Serba, yonik, rmuir, Mark Miller)

* LUCENE-4185: Fix a bug where CharFilters were wrongly being applied twice.
  (Michael Froh, rmuir)

* SOLR-3610: After reloading a core, indexing would fail on any newly added
  fields to the schema. (Brent Mills, rmuir)

* SOLR-3377: edismax fails to correctly parse a fielded query wrapped by
  parens. This regression was introduced in 3.6.
  (Bernd Fehling, Jan Høydahl, yonik)

* SOLR-3621: Fix rare concurrency issue when opening a new IndexWriter for
  replication or rollback.
  (Mark Miller)

* SOLR-1781: Replication index directories not always cleaned up.
  (Terje Sten Bjerkseth, Mark Miller)

* SOLR-3639: Update ZooKeeper to 3.3.5 for a variety of bug fixes.
  (Mark Miller)

* SOLR-3629: Typo in solr.xml persistence when overriding the solrconfig.xml
  file name using the "config" attribute prevented the override file from
  being used. (Ryan Zezeski, hossman)

* SOLR-3642: Correct broken check for multivalued fields in stats.facet
  (Yandong Yao, hossman)

Other Changes
----------------------

* SOLR-3524: Make discarding punctuation configurable in
  JapaneseTokenizerFactory. The default is to discard punctuation, but this
  is overridable as an expert option.
  (Kazuaki Hiraga, Jun Ohtani via Christian Moen)

* SOLR-1770: Move the default core instance directory into a collection1
  folder. (Mark Miller)

* SOLR-3355: Add shard and collection to SolrCore statistics.
  (Michael Garski, Mark Miller)

* SOLR-3575: solr.xml should default to persist=true (Mark Miller)

* SOLR-3563: Unloading all cores in a SolrCloud collection will now cause
  the removal of that collection's meta data from ZooKeeper.
  (Mark Miller, Per Steffensen)

* SOLR-3599: Add zkClientTimeout to solr.xml so that it's obvious how to
  change it and so that you can change it with a system property.
  (Mark Miller)

* SOLR-3609: Change Solr's expanded webapp directory to be at a consistent
  path called solr-webapp rather than a temporary directory. (Mark Miller)

* SOLR-3600: Raise the default zkClientTimeout from 10 seconds to 15 seconds.
  (Mark Miller)

* SOLR-3215: Clone SolrInputDocument when distrib indexing so that update
  processors after the distrib update process do not process the document
  twice.
  (Mark Miller)

================== 4.0.0-ALPHA ==================

More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
   https://wiki.apache.org/solr/Solr4.0

Versions of Major Components
---------------------
Apache Tika 1.1
Carrot2 3.5.0
Velocity 1.6.4 and Velocity Tools 2.0
Apache UIMA 2.3.1
Apache ZooKeeper 3.3.4

Upgrading from Solr 3.6-dev
----------------------

* The Lucene index format has changed and as a result, once you upgrade,
  previous versions of Solr will no longer be able to read your indices.
  In a master/slave configuration, all searchers/slaves should be upgraded
  before the master. If the master were to be updated first, the older
  searchers would not be able to read the new index format.

* Setting abortOnConfigurationError=false is no longer supported
  (since it has never worked properly). Solr will now warn you if
  you attempt to set this configuration option at all. (see SOLR-1846)

* The default logic for the 'mm' param of the 'dismax' QParser has
  been changed. If no 'mm' param is specified (either in the query,
  or as a default in solrconfig.xml) then the effective value of the
  'q.op' param (either in the query or as a default in solrconfig.xml
  or from the 'defaultOperator' option in schema.xml) is used to
  influence the behavior. If q.op is effectively "AND" then mm=100%.
  If q.op is effectively "OR" then mm=0%. Users who wish to force the
  legacy behavior should set a default value for the 'mm' param in
  their solrconfig.xml file.

* The VelocityResponseWriter is no longer built into the core. Its JAR and
  dependencies now need to be added (via <lib> or solr/home lib inclusion),
  and it needs to be registered in solrconfig.xml like this:
    <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"/>

* The update request parameter to choose Update Request Processor Chain is
  renamed from "update.processor" to "update.chain". The old parameter was
  deprecated but still working since Solr 3.2, but is now removed entirely.

* The <indexDefaults> and <mainIndex> sections of solrconfig.xml are
  discontinued and replaced with the <indexConfig> section. There are also
  better defaults. When migrating, if you don't know what your old settings
  mean, simply delete both <indexDefaults> and <mainIndex> sections. If you
  have customizations, put them in the <indexConfig> section - with the same
  syntax as before.

* Two of the SolrServer subclasses in SolrJ were renamed/replaced.
  CommonsHttpSolrServer is now HttpSolrServer, and
  StreamingUpdateSolrServer is now ConcurrentUpdateSolrServer.

* The PingRequestHandler no longer looks for a <healthcheck/> option in the
  (legacy) <admin> section of solrconfig.xml. Users who wish to take
  advantage of this feature should configure a "healthcheckFile" init param
  directly on the PingRequestHandler. As part of this change, relative file
  paths have been fixed to be resolved against the data dir. See the example
  solrconfig.xml and SOLR-1258 for more details.

* Due to low level changes to support SolrCloud, the uniqueKey field can no
  longer be populated via <copyField/> or <field default=...> in the
  schema.xml. Users wishing to have Solr automatically generate a uniqueKey
  value when adding documents should instead use an instance of
  solr.UUIDUpdateProcessorFactory in their update processor chain. See
  SOLR-2796 for more details.

Detailed Change List
----------------------

New Features
----------------------

* SOLR-3272: Solr filter factory for MorfologikFilter (Polish lemmatisation).
  (Rafał Kuć via Dawid Weiss, Steven Rowe, Uwe Schindler).

* SOLR-571: The autowarmCount for LRUCaches (LRUCache and FastLRUCache) now
  supports "percentages" which get evaluated relative to the current size of
  the cache when warming happens.
  (Tomas Fernandez Lobbe and hossman)

* SOLR-1932: New relevancy function queries: termfreq, tf, docfreq, idf,
  norm, maxdoc, numdocs. (yonik)

* SOLR-1665: Add debug component options for timings, results and query info
  only (gsingers, hossman, yonik)

* SOLR-2112: Solrj API now supports streaming results.
  (ryan)

* SOLR-792: Adding PivotFacetComponent for Hierarchical faceting
  (ehatcher, Jeremy Hinegardner, Thibaut Lassalle, ryan)

* LUCENE-2507, SOLR-2571, SOLR-2576: Added DirectSolrSpellChecker, which uses
  Lucene's DirectSpellChecker to retrieve correction candidates directly from
  the term dictionary using Levenshtein automata. (James Dyer, rmuir)

* SOLR-1873, SOLR-2358: SolrCloud - added shared/central config and
  core/shard management via zookeeper, built-in load balancing, and
  distributed indexing.
  (Jamie Johnson, Sami Siren, Ted Dunning, yonik, Mark Miller)
  Additional Work:
    SOLR-2324: SolrCloud solr.xml parameters are not persisted by
    CoreContainer. (Massimo Schiavon, Mark Miller)
    SOLR-2287: Allow users to query by multiple, compatible collections with
    SolrCloud. (Soheb Mahmood, Alex Cowell, Mark Miller)
    SOLR-2622: ShowFileRequestHandler does not work in SolrCloud mode.
    (Stefan Matheis, Mark Miller)
    SOLR-3108: Error in SolrCloud's replica lookup code when replicas are
    hosted in same Solr instance. (Bruno Dumon, Sami Siren, Mark Miller)
    SOLR-3080: Remove shard info from zookeeper when SolrCore is explicitly
    unloaded. (yonik, Mark Miller, siren)
    SOLR-3437: Recovery issues a spurious commit to the cluster.
    (Trym R. Møller via Mark Miller)
    SOLR-2822: Skip update processors already run on other nodes (hossman)

* SOLR-1566: Transforming documents in the ResponseWriters. This will allow
  for more complex results in responses and open the door for function
  queries as results.
  (ryan with patches from grant, noble, cmale, yonik, Jan Høydahl,
  Arul Kalaipandian, Luca Cavanna, hossman)
    SOLR-2037: Thanks to SOLR-1566, documents boosted by the
    QueryElevationComponent can be marked as boosted. (gsingers, ryan, yonik)

* SOLR-2396: Add CollationField, which is much more efficient than
  the Solr 3.x CollationKeyFilterFactory, and also supports
  Locale-sensitive range queries.
  (rmuir)

* SOLR-2338: Add support for using <similarity/> in a schema's fieldType,
  for customizing scoring on a per-field basis. (hossman, yonik, rmuir)

* SOLR-2335: New 'field("...")' function syntax for referring to complex
  field names (containing whitespace or special characters) in functions.

* SOLR-2383: /browse improvements: generalize range and date facet display
  (Jan Høydahl via yonik)

* SOLR-2272: Pseudo-join queries / filters. Examples:
  To restrict to the set of parents with at least one blue-eyed child:
    fq={!join from=parent to=name}eyes:blue
  To restrict to the set of children with at least one blue-eyed parent:
    fq={!join from=name to=parent}eyes:blue
  (yonik)

* SOLR-1942: Added the ability to select postings format per fieldType in
  schema.xml as well as support custom Codecs in solrconfig.xml.
  (simonw via rmuir)

* SOLR-2136: Boolean type added to function queries, along with
  new functions exists(), if(), and(), or(), xor(), not(), def(),
  and true and false constants. (yonik)

* SOLR-2491: Add support for using spellcheck collation in conjunction
  with grouping. Note that the number of hits returned for collations
  is the number of ungrouped hits. (James Dyer via rmuir)

* SOLR-1298: Return FunctionQuery as pseudo field. The solr 'fl' param
  now supports functions. For example: fl=id,sum(x,y) -- NOTE: only
  functions with fast random access are recommended. (yonik, ryan)

* SOLR-705: Optionally return shard info with each document in distributed
  search. Use fl=id,[shard] to return the shard url. (ryan)

* SOLR-2417: Add explain info directly to return documents using
  ?fl=id,[explain] (ryan)

* SOLR-2533: Converted ValueSource.ValueSourceSortField over to new
  rewriteable Lucene SortFields. ValueSourceSortField instances must be
  rewritten before they can be used. This is done by SolrIndexSearcher when
  necessary. (Chris Male).

* SOLR-2193, SOLR-2565: You may now specify a 'soft' commit when committing.
  This will use Lucene's NRT feature to avoid guaranteeing documents are on
  stable storage in exchange for faster reopen times. There is also a new
  'soft' autocommit tracker that can be configured.
  (Mark Miller, Robert Muir)

* SOLR-2399: Updated Solr Admin interface. New look and feel with per core
  administration and many new options. (Stefan Matheis via ryan)

* SOLR-1032: CSV handler now supports "literal.field_name=value" parameters.
  (Simon Rosenthal, ehatcher)

* SOLR-2656: realtime-get, efficiently retrieves the latest stored fields
  for specified documents, even if they are not yet searchable (i.e. without
  reopening a searcher) (yonik)

* SOLR-2703: Added support for Lucene's "surround" query parser.
  (Simon Rosenthal, ehatcher)

* SOLR-2754: Added factories for several ranking algorithms:
    BM25SimilarityFactory: Okapi BM25
    DFRSimilarityFactory: Divergence from Randomness models
    IBSimilarityFactory: Information-based models
    LMDirichletSimilarity: LM with Dirichlet smoothing
    LMJelinekMercerSimilarity: LM with Jelinek-Mercer smoothing
  (David Mark Nemeskey, Robert Muir)

* SOLR-2134: Trie* fields should support sortMissingLast=true, and deprecate
  Sortable* Field Types
  (Ryan McKinley, Mike McCandless, Uwe Schindler, Erick Erickson)

* SOLR-2438: Added MultiTermAwareComponent to the various classes to allow
  automatic lowercasing for multiterm queries (wildcards, regex, prefix,
  range, etc). You can now optionally specify a "multiterm" analyzer in your
  schema.xml, but Solr should "do the right thing" if you don't specify one
  (Pete Sturge, Erick Erickson, Mentoring from Seeley and Muir)

* SOLR-2481: Add support for commitWithin in DataImportHandler
  (Sami Siren via yonik)

* SOLR-2992: Add support for IndexWriter.prepareCommit() via
  prepareCommit=true on update URLs. (yonik)

* SOLR-2906: Added LFU cache options to Solr.
  (Shawn Heisey via Erick Erickson)

* SOLR-3069: Ability to add openSearcher=false to not open a searcher when
  doing a hard commit.
  commitWithin now only invokes a softCommit. (yonik)

* SOLR-2802: New FieldMutatingUpdateProcessor and Factory to simplify the
  development of UpdateProcessors that modify field values of documents as
  they are indexed. Also includes several useful new implementations:
    RemoveBlankFieldUpdateProcessorFactory
    TrimFieldUpdateProcessorFactory
    HTMLStripFieldUpdateProcessorFactory
    RegexReplaceProcessorFactory
    FieldLengthUpdateProcessorFactory
    ConcatFieldUpdateProcessorFactory
    FirstFieldValueUpdateProcessorFactory
    LastFieldValueUpdateProcessorFactory
    MinFieldValueUpdateProcessorFactory
    MaxFieldValueUpdateProcessorFactory
    TruncateFieldUpdateProcessorFactory
    IgnoreFieldUpdateProcessorFactory
  (hossman, janhoy)

* SOLR-3120: Optional post filtering for spatial queries bbox and geofilt
  for LatLonType. (yonik)

* SOLR-2459: Expose LogLevel selection with a RequestHandler rather than a
  servlet (Stefan Matheis, Upayavira, ryan)

* SOLR-3134: Include shard info in distributed response when
  shards.info=true (Russell Black, ryan)

* SOLR-2898: Support grouped faceting. (Martijn van Groningen)
  Additional Work:
    SOLR-3406: Extended grouped faceting support to facet.query and
    facet.range parameters. (David Boychuck, Martijn van Groningen)

* SOLR-2949: QueryElevationComponent is now supported with distributed
  search. (Mark Miller, yonik)

* SOLR-3221: Added the ability to directly configure aspects of the
  concurrency and thread-pooling used within distributed search in solr.
  This allows for finer-grained control and can be tuned by end users to
  target their own specific requirements. This builds on the work of the
  HttpCommComponent and uses the same configuration block to configure the
  thread pool. The default configuration has the same behaviour as solr 3.5,
  favouring throughput over latency. More information can be found on the
  wiki (http://wiki.apache.org/solr/SolrConfigXml) (Greg Bowyer)

* SOLR-3278: Negative boost support to the Extended Dismax Query Parser
  Boost Query (bq).
  (James Dyer)

* SOLR-3255: OpenExchangeRates.Org Exchange Rate Provider for CurrencyField
  (janhoy)

* SOLR-3358: Logging events are captured and available from the
  /admin/logging request handler. (ryan)

* SOLR-1535: PreAnalyzedField type provides functionality to index (and
  optionally store) field content that was already processed and split into
  tokens using some external processing chain. Serialization format is
  pluggable, and defaults to JSON. (ab)

* SOLR-3363: Consolidated Exceptions in Analysis Factories so they only
  throw InitializationExceptions (Chris Male)

* SOLR-2690: New support for a "TZ" request param which overrides the
  TimeZone used when rounding Dates in DateMath expressions for the entire
  request (all date range queries and date faceting are affected). The
  default TZ is still UTC. (David Schlotfeldt, hossman)

* SOLR-3402: Analysis Factories are now configured with their Lucene Version
  through setLuceneMatchVersion, rather than through the Map passed to init.
  Parsing and simple error checking for the Version is now done inside
  the code that creates the Analysis Factories. (Chris Male)

* SOLR-3178: Optimistic locking. If a _version_ is provided with an update
  that does not match the version in the index, an HTTP 409 error (Conflict)
  will result. (Per Steffensen, yonik)

* SOLR-139: Updateable documents. JSON Example:
    {"id":"mydoc", "f1":{"set":10}, "f2":{"add":20}}
  will result in field "f1" being set to 10, "f2" having an additional value
  of 20 added, and all other existing fields unchanged. All source fields
  must be stored for this feature to work correctly.
  (Ryan McKinley, Erik Hatcher, yonik)

* SOLR-2857: Support XML, CSV, JSON, and javabin in a single RequestHandler
  and choose the correct ContentStreamLoader based on Content-Type header.
  This also deprecates the existing
  [Xml,JSON,CSV,Binary,Xslt]UpdateRequestHandler. (ryan)

* SOLR-2585: Context-Sensitive Spelling Suggestions & Collations.
  This adds support for the "spellcheck.alternativeTermCount" &
  "spellcheck.maxResultsForSuggest" parameters, letting users receive
  suggestions even when all the queried terms exist in the dictionary. This
  differs from "spellcheck.onlyMorePopular" in that the suggestions need not
  consist entirely of terms with a greater document frequency than the
  queried terms. (James Dyer)

* SOLR-2058: Edismax query parser to allow "phrase slop" to be specified
  per-field on the pf/pf2/pf3 parameters using optional
  "FieldName~slop^boost" syntax. The prior "FieldName^boost" syntax is still
  accepted. In such cases the value on the "ps" parameter serves as the
  default slop. (Ron Mayer via James Dyer)

* SOLR-3495: New UpdateProcessors have been added to create default values
  for configured fields. These work similarly to the <field default="..."/>
  option in schema.xml, but are applied in the UpdateProcessorChain, so they
  may be used prior to other UpdateProcessors, or to generate a uniqueKey
  field value when using the DistributedUpdateProcessor (ie: SolrCloud)
    TimestampUpdateProcessorFactory
    UUIDUpdateProcessorFactory
    DefaultValueUpdateProcessorFactory
  (hossman)

* SOLR-2993: Add WordBreakSolrSpellChecker to offer suggestions by combining
  adjacent query terms and/or breaking terms into multiple words. This
  spellchecker can be configured with a traditional checker (ie:
  DirectSolrSpellChecker). The results are combined and collations can
  contain a mix of corrections from both spellcheckers. (James Dyer)

* SOLR-3508: Simplify JSON update format for deletes as well as allow
  version specification for optimistic locking. Examples:
    {"delete":"myid"}
    {"delete":["id1","id2","id3"]}
    {"delete":{"id":"myid", "_version_":123456789}}
  (yonik)

* SOLR-3211: Allow parameter overrides in conjunction with
  "spellcheck.maxCollationTries". To do so, use parameters starting with
  "spellcheck.collateParam." For instance, to override the "mm" parameter,
  specify "spellcheck.collateParam.mm".
  This is helpful in cases where testing spellcheck collations for result
  counts should use different parameters from the main query (James Dyer)

* SOLR-2599: CloneFieldUpdateProcessorFactory provides similar functionality
  to schema.xml's <copyField/> declaration but as an update processor that
  can be combined with other processors in any order.
  (Jan Høydahl & hossman)

* SOLR-3351: eDismax: ps2 and ps3 params (janhoy)

* SOLR-3542: Add WeightedFragListBuilder for FVH and set it to default
  fragListBuilder in example solrconfig.xml. (Sebastian Lutze, koji)

Optimizations
----------------------

* SOLR-1875: Per-segment field faceting for single valued string fields.
  Enable with facet.method=fcs, control the number of threads used with
  the "threads" local param on the facet.field param. This algorithm will
  only be faster in the presence of rapid index changes. (yonik)

* SOLR-1904: When facet.enum.cache.minDf > 0 and the base doc set is a
  SortedIntSet, convert to HashDocSet for better performance. (yonik)

* SOLR-2092: Speed up single-valued and multi-valued "fc" faceting. Typical
  improvement is 5%, but can be much greater (up to 10x faster) when
  facet.offset is very large (deep paging). (yonik)

* SOLR-2193, SOLR-2565: The default Solr update handler has been improved so
  that it uses fewer locks, keeps the IndexWriter open rather than closing it
  on each commit (ie commits no longer wait for background merges to
  complete), works with SolrCore to provide faster 'soft' commits, and has an
  improved API that requires less instanceof special casing.
  (Mark Miller, Robert Muir)
  Additional Work:
    SOLR-2697: commit and autocommit operations don't reset
    DirectUpdateHandler2.numDocsPending stats attribute.
    (Alexey Serba, Mark Miller)

* SOLR-2950: The QueryElevationComponent now avoids using the FieldCache and
  looking up every document id (gsingers, yonik)

Bug Fixes
----------------------

* SOLR-3139: Make ConcurrentUpdateSolrServer send UpdateRequest.getParams()
  as HTTP request params (siren)

* SOLR-3165: Cannot use DIH in Solrcloud + Zookeeper
  (Alexey Serba, Mark Miller, siren)

* SOLR-3068: Occasional NPE in ThreadDumpHandler (siren)

* SOLR-2762: FSTLookup could return duplicate results or one result fewer
  than requested. (David Smiley, Dawid Weiss)

* SOLR-2741: Bugs in facet range display in trunk (janhoy)

* SOLR-1908: Fixed SignatureUpdateProcessor to fail to initialize on
  invalid config. Specifically: a signatureField that does not exist,
  or overwriteDupes=true with a signatureField that is not indexed.
  (hossman)

* SOLR-1824: IndexSchema will now fail to initialize if there is a
  problem initializing one of the fields or field types. (hossman)

* SOLR-1928: TermsComponent didn't correctly break ties for non-text
  fields sorted by count. (yonik)

* SOLR-2107: MoreLikeThisHandler doesn't work with alternate qparsers.
  (yonik)

* SOLR-2108: Fixed false positives when using wildcard queries on fields
  with reversed wildcard support. For example, a query of *zemog* would
  match documents that contain 'gomez'. (Landon Kuhn via Robert Muir)

* SOLR-1962: SolrCore#initIndex should not use a mix of indexPath and
  newIndexPath (Mark Miller)

* SOLR-2275: fix DisMax 'mm' parsing to be tolerant of whitespace
  (Erick Erickson via hossman)

* SOLR-2193, SOLR-2565, SOLR-2651: SolrCores now properly share IndexWriters
  across SolrCore reloads. (Mark Miller, Robert Muir)
  Additional Work:
    SOLR-2705: On reload, IndexWriterProvider holds onto the initial SolrCore
    it was created with. (Yury Kats, Mark Miller)

* SOLR-2682: Remove addException() in SimpleFacet. FacetComponent no longer
  catches and embeds exceptions that occurred during facet processing; it
  throws HTTP 400 or 500 exceptions instead.
  (koji)

* SOLR-2654: Directories used by a SolrCore are now closed when they are no
  longer used. (Mark Miller)

* SOLR-2854: Now load URL content stream data (via stream.url) when called
  for during request handling, rather than loading URL content streams
  automatically regardless of use.
  (David Smiley and Ryan McKinley via ehatcher)

* SOLR-2829: Fix problem with false-positives due to incorrect
  equals methods. (Yonik Seeley, Hossman, Erick Erickson.
  Marc Tinnemeyer caught the bug)

* SOLR-2848: Removed 'instanceof AbstractLuceneSpellChecker' hacks from
  distributed spellchecking code, and added a merge() method to
  SolrSpellChecker instead. Previously if you extended SolrSpellChecker your
  spellchecker would not work in distributed fashion. (James Dyer via rmuir)

* SOLR-2509: StringIndexOutOfBoundsException in the spellchecker collate when
  the term contains a hyphen. (Thomas Gambier caught the bug, Steffen
  Godskesen did the patch, via Erick Erickson)

* SOLR-1730: Made it clearer when a core failed to load as well as better
  logging when the QueryElevationComponent fails to properly initialize
  (gsingers)

* SOLR-1520: QueryElevationComponent now supports non-string ids (gsingers)

* SOLR-3037: When using binary format in solrj the codec screws up parameters
  (Sami Siren, Jörg Maier via yonik)

* SOLR-3062: A join in the main query was not respecting any filters pushed
  down to it via acceptDocs since LUCENE-1536. (Mike Hugo, yonik)

* SOLR-3214: If you use multiple fl entries rather than a comma separated
  list, all but the first entry can be ignored if you are using distributed
  search.
  (Tomas Fernandez Lobbe via Mark Miller)

* SOLR-3352: eDismax: pf2 should kick in for a query with 2 terms (janhoy)

* SOLR-3361: ReplicationHandler "maxNumberOfBackups" doesn't work if backups
  are triggered on commit (James Dyer, Tomas Fernandez Lobbe)

* SOLR-2605: fixed tracking of the 'defaultCoreName' in CoreContainer so that
  CoreAdminHandler could return consistent information regardless of whether
  there is a default core name or not. (steffkes, hossman)

* SOLR-3370: fixed CSVResponseWriter to respect globs in the 'fl' param
  (Keith Fligg via hossman)

* SOLR-3436: Group count incorrect when not all shards are queried in the
  second pass. (Francois Perron, Martijn van Groningen)

* SOLR-3454: Exception when using result grouping with main=true and using
  wt=javabin. (Ludovic Boutros, Martijn van Groningen)

* SOLR-3446: Better errors when PatternTokenizerFactory is configured with
  an invalid pattern, and include the 'name' whenever possible in plugin init
  error messages. (hossman)

* LUCENE-4075: Cleaner path usage in TestXPathEntityProcessor
  (Greg Bowyer via hossman)

* SOLR-2923: IllegalArgumentException when using useFilterForSortedQuery on
  an empty index. (Adrien Grand via Mark Miller)

* SOLR-2352: Fixed TermVectorComponent so that it will not fail if the fl
  param contains globs or pseudo-fields (hossman)

* SOLR-3541: add missing solrj dependencies to binary packages.
  (Thijs Vonk via siren)

* SOLR-3522: fixed parsing of the 'literal()' function (hossman)

* SOLR-3548: Fixed a bug in the cachability of queries using the {!join}
  parser or the strdist() function, as well as some minor improvements to
  the hashCode implementation of {!bbox} and {!geofilt} queries. (hossman)

Other Changes
----------------------

* SOLR-1846: Eliminate support for the abortOnConfigurationError
  option. It has never worked very well, and in recent versions of
  Solr hasn't worked at all.
  (hossman)

* SOLR-1889: The default logic for the 'mm' param of DismaxQParser and
  ExtendedDismaxQParser has been changed to be determined based on the
  effective value of the 'q.op' param (hossman)

* SOLR-1946: Misc improvements to the SystemInfoHandler: /admin/system
  (hossman)

* SOLR-2289: Tweak spatial coords for example docs so they are a bit
  more spread out (Erick Erickson via hossman)

* SOLR-2288: Small tweaks to eliminate compiler warnings, primarily
  using Generics where applicable in method/object declarations, and
  adding @SuppressWarnings("unchecked") when appropriate (hossman)

* SOLR-2375: Suggester Lookup implementations now store trie data
  and load it back on init. This means that large tries don't have to be
  rebuilt on every commit or core reload. (ab)

* SOLR-2413: Support for returning multi-valued fields w/o <arr> tag
  in the XMLResponseWriter was removed. XMLResponseWriter no longer
  works with version values less than 2.2 (ryan)

* SOLR-2423: FieldType argument changed from String to Object.
  Conversion from SolrInputDocument > Object > Fieldable is now managed
  by FieldType rather than DocumentBuilder. (ryan)

* SOLR-2461: QuerySenderListener and AbstractSolrEventListener are
  now public (hossman)

* LUCENE-2995: Moved some spellchecker and suggest APIs to modules/suggest:
  HighFrequencyDictionary, SortedIterator, TermFreqIterator, and the
  suggester APIs and implementations. (rmuir)

* SOLR-2576: Remove deprecated SpellingResult.add(Token, int).
  (James Dyer via rmuir)

* LUCENE-3232: Moved MutableValue classes to new 'common' module.
  (Chris Male)

* LUCENE-2883: FunctionQuery, DocValues (and its impls), ValueSource (and its
  impls) and BoostedQuery have been consolidated into the queries module.
  They can now be found at o.a.l.queries.function.

* SOLR-2027: FacetField.getValues() now returns an empty list if there are
  no values, instead of null (Chris Male)

* SOLR-1825: SolrQuery.addFacetQuery now enables facets automatically, like
  addFacetField (Chris Male)

* SOLR-2663: FieldTypePluginLoader has been refactored out of IndexSchema
  and made public. (hossman)

* SOLR-2331, SOLR-2691: Refactor CoreContainer's SolrXML serialization code
  and improve testing (Yury Kats, hossman, Mark Miller)

* SOLR-2698: Enhance CoreAdmin STATUS command to return index size.
  (Yury Kats, hossman, Mark Miller)

* SOLR-2654: The same Directory instance is now always used across a SolrCore
  so that it's easier to add other DirectoryFactory implementations without
  static caching hacks. (Mark Miller)

* LUCENE-3286: 'luke' ant target has been disabled due to incompatibilities
  with XML queryparser location (Chris Male)

* SOLR-1897: The data dir from the core descriptor should override the data
  dir from the solrconfig.xml rather than the other way round. (Mark Miller)

* SOLR-2756: Maven configuration: Excluded transitive stax:stax-api
  dependency from org.codehaus.woodstox:wstx-asl dependency.
  (David Smiley via Steve Rowe)

* SOLR-2588: Moved VelocityResponseWriter back to contrib module in order to
  remove it as a mandatory core dependency. (ehatcher)

* SOLR-2862: More explicit lexical resources location logged if Carrot2
  clustering extension is used. Fixed solr. impl. of IResource and
  IResourceLookup. (Dawid Weiss)

* SOLR-1123: Changed JSONResponseWriter to now use application/json as its
  Content-Type by default. However the Content-Type can be overwritten and
  is set to text/plain in the example configuration.
  (Uri Boness, Chris Male)

* SOLR-2607: Removed deprecated client/ruby directory, which included
  solr-ruby and flare. (ehatcher)

* SOLR-3032: logOnce from SolrException and all the supporting structure
  are gone. abortOnConfigurationError is also gone as it is no longer
  referenced.
  Errors should be caught and logged at the top-most level or logged and NOT
  propagated up the chain. (Erick Erickson)

* SOLR-2105: Remove support for deprecated "update.processor" (since 3.2),
  in favor of "update.chain" (janhoy)

* SOLR-3005: Default QueryResponseWriters are now initialized via init()
  with an empty NamedList. (Gasol Wu, Chris Male)

* SOLR-2607: Removed obsolete client/ folder (ehatcher, Eric Pugh, janhoy)

* SOLR-3202, SOLR-3244: Dropping Support for JSP. New Admin UI is all
  client side (ryan, Aliaksandr Zhuhrou, Uwe Schindler)

* SOLR-3159: Upgrade example and tests to run with Jetty 8 (ryan)

* SOLR-3254: Upgrade Solr to Tika 1.1 (janhoy)

* SOLR-3329: Dropped getSourceID() from SolrInfoMBean and using
  getClass().getPackage().getSpecificationVersion() for Version. (ryan)

* SOLR-3302: Upgraded SLF4j to version 1.6.4 (hossman)

* SOLR-3322: Add more context to IndexReaderFactory.newReader (ab)

* SOLR-3343: Moved FastWriter, FileUtils, RegexFileFilter, RTimer and
  SystemIdResolver from org.apache.solr.common to org.apache.solr.util
  (Chris Male)

* SOLR-3357: ResourceLoader.newInstance now accepts a Class representation
  of the expected instance type (Chris Male)

* SOLR-3388: HTTP caching is now disabled by default for
  RequestUpdateHandlers. (ryan)

* SOLR-3309: web.xml now specifies metadata-complete=true (which requires
  Servlet 2.5) to prevent servlet containers from scanning class annotations
  on startup. This allows for faster startup times on some servlet
  containers. (Bill Bell, hossman)

* SOLR-1893: Refactored some common code from LRUCache and FastLRUCache into
  SolrCacheBase (Tomás Fernández Löbbe via hossman)

* SOLR-3403: Deprecated Analysis Factories now log their own deprecation
  messages. No logging support is provided by Factory parent classes.
  (Chris Male)

* SOLR-1258: PingRequestHandler is now directly configured with a
  "healthcheckFile" instead of looking for the legacy
  <admin><healthcheck/></admin> syntax.
  Filenames specified as relative paths have been fixed so that they are resolved against the data dir instead of the CWD of the java process. (hossman)

* SOLR-3083: JMX beans now report Numbers as numeric values rather than String (Tagged Siteops, Greg Bowyer via ryan)

* SOLR-2796: Due to low level changes to support SolrCloud, the uniqueKey field can no longer be populated via <copyField/> or <field default=...> in the schema.xml.

* SOLR-3534: The Dismax and eDismax query parsers will fall back on the 'df' parameter when 'qf' is absent. If neither is present and the schema has no default search field, an exception is now thrown. (dsmiley)

Documentation
----------------------

* SOLR-2232: Improved README info on solr.solr.home in examples (Eric Pugh and hossman)

================== 3.6.0 ==================

Upgrading from Solr 3.5
----------------------

* SOLR-2983: As a consequence of moving the code which sets a MergePolicy from SolrIndexWriter to SolrIndexConfig, (custom) MergePolicies should now have an empty constructor; thus an IndexWriter should not be passed as a constructor parameter but instead set using the setIndexWriter() method.

* As the doGet() methods in SimplePostTool were changed to static, the client applications of this class need to be recompiled.

* In Solr version 3.5 and earlier, HTMLStripCharFilter had known bugs in the character offsets it provided, triggering e.g. exceptions in highlighting. HTMLStripCharFilter has been re-implemented, addressing this and other issues. See the entry for LUCENE-3690 in the Bug Fixes section below for a detailed list of changes. For people who depend on the behavior of HTMLStripCharFilter in Solr version 3.5 and earlier: the old implementation (bugs and all) is preserved as LegacyHTMLStripCharFilter.

* As of Solr 3.6, the <indexDefaults> and <mainIndex> sections of solrconfig.xml are deprecated and replaced with a new <indexConfig> section. Read more in SOLR-1052 below.

* SOLR-3040: The DIH's admin UI (dataimport.jsp) now requires DIH request handlers to start with a '/'.
  (dsmiley)

* SOLR-3161: <requestDispatcher handleSelect="false"> is now the default. An existing config will probably work as-is because handleSelect was explicitly enabled in default configs. handleSelect makes /select work and also enables the 'qt' parameter. Instead, consider explicitly configuring /select as is done in the example solrconfig.xml, and register your other search handlers with a leading '/', which is a recommended practice. (David Smiley, Erik Hatcher)

* SOLR-3161: Don't use the 'qt' parameter with a leading '/'. It probably won't work in 4.0 and it's now limited in 3.6 to SearchHandler subclasses that aren't lazy-loaded.

* Bugs were found and fixed in the SignatureUpdateProcessor that previously caused some documents to produce the same signature even when the configured fields contained distinct (non-String) values. Users of SignatureUpdateProcessor are strongly advised to re-index, as document signatures may have changed. (see SOLR-3200 & SOLR-3226 for details)

* SOLR-2724: Specifying <defaultSearchField> and <solrQueryParser defaultOperator="..."/> in schema.xml is now considered deprecated. Instead you are encouraged to specify these via the "df" and "q.op" parameters in your request handler definition. (David Smiley)

New Features
----------------------

* SOLR-2020: Add Java client that uses Apache Http Components http client (4.x). (Chantal Ackermann, Ryan McKinley, Yonik Seeley, siren)

* SOLR-2854: Now load URL content stream data (via stream.url) when called for during request handling, rather than loading URL content streams automatically regardless of use. (David Smiley and Ryan McKinley via ehatcher)

* SOLR-2904: BinaryUpdateRequestHandler should be able to accept multiple update requests from a stream (shalin)

* SOLR-1565: StreamingUpdateSolrServer supports RequestWriter API and therefore, javabin update format (shalin)

* SOLR-2438: Added MultiTermAwareComponent to the various classes to allow automatic lowercasing for multiterm queries (wildcards, regex, prefix, range, etc).
  You can now optionally specify a "multiterm" analyzer in your schema.xml, but Solr should "do the right thing" if you don't specify one (Pete Sturge, Erick Erickson, mentoring from Seeley and Muir)

* SOLR-2919: Added support for localized range queries when the analysis chain uses CollationKeyFilter or ICUCollationKeyFilter. (Michael Sokolov, rmuir)

* SOLR-2982: Added BeiderMorseFilterFactory for Beider-Morse (BMPM) phonetic encoder. Upgrades commons-codec to version 1.6 (Brooke Schreier Ganz, rmuir)

* SOLR-1843: A new "rootName" attribute is now available when configuring <jmx /> in solrconfig.xml. If this attribute is set, Solr will use it as the root name for all MBeans Solr exposes via JMX. The default root name is "solr" followed by the core name. (Constantijn Visinescu, hossman)

* SOLR-2906: Added LFU cache options to Solr. (Shawn Heisey via Erick Erickson)

* SOLR-3036: Ability to specify overwrite=false on the URL for XML updates. (Sami Siren via yonik)

* SOLR-2603: Add the encoding function for alternate fields in highlighting. (Massimo Schiavon, koji)

* SOLR-1729: Evaluation of NOW for date math is done only once per request for consistency, and is also propagated to shards in distributed search. Adding a parameter NOW=<time_in_ms> to the request will override the current time. (Peter Sturge, yonik, Simon Willnauer)

* SOLR-1709: Distributed support for Date and Numeric Range Faceting (Peter Sturge, David Smiley, hossman, Simon Willnauer)

* SOLR-3054, LUCENE-3671: Add TypeTokenFilterFactory that creates TypeTokenFilter that filters tokens based on their TypeAttribute. (Tommaso Teofili via Uwe Schindler)

* LUCENE-3305, SOLR-3056: Added Kuromoji morphological analyzer for Japanese. See the 'text_ja' fieldtype in the example to get started. (Christian Moen, Masaru Hasegawa via Robert Muir)

* SOLR-1860: StopFilterFactory, CommonGramsFilterFactory, and CommonGramsQueryFilterFactory can optionally read stopwords in Snowball format (specify format="snowball").
  (Robert Muir)

* SOLR-3105: ElisionFilterFactory optionally allows the parameter ignoreCase (default=false). (Robert Muir)

* LUCENE-3714: Add WFSTLookupFactory, a suggester that uses a weighted FST for more fine-grained suggestions. (Mike McCandless, Dawid Weiss, Robert Muir)

* SOLR-3143: Add SuggestQueryConverter, a QueryConverter intended for auto-suggesters. (Robert Muir)

* SOLR-3033: ReplicationHandler's backup command now supports a 'maxNumberOfBackups' init param that can be used to delete all but the most recent N backups. (Torsten Krah, James Dyer)

* SOLR-2202: Currency FieldType, with support for currencies and exchange rates (Greg Fodor & Andrew Morrison via janhoy, rmuir, Uwe Schindler)

* SOLR-3026: eDismax: Locking down which fields can be explicitly queried (user fields aka uf) (janhoy, hossmann, Tomás Fernández Löbbe)

* SOLR-2826: URLClassify Update Processor (janhoy)

* SOLR-2764: Create a NorwegianLightStemmer and NorwegianMinimalStemmer (janhoy)

* SOLR-3221: Added the ability to directly configure aspects of the concurrency and thread-pooling used within distributed search in Solr. This allows for finer-grained control and can be tuned by end users to target their own specific requirements. This builds on the work of the HttpCommComponent and uses the same configuration block to configure the thread pool. The default configuration has the same behaviour as Solr 3.5, favouring throughput over latency. More information can be found on the wiki (http://wiki.apache.org/solr/SolrConfigXml) (Greg Bowyer)

* SOLR-2001: The query component will substitute an empty query that matches no documents if the query parser returns null. This also prevents an exception from being thrown by the default parser if "q" is missing. (yonik)
  SOLR-435: if q is "" then it's also acceptable. (dsmiley, hoss)

Optimizations
----------------------

* SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter reportDocCount defaults to 'false'.
  The old behavior is still possible by specifying this as 'true' (Erick Erickson)

* SOLR-3012: Move System.getProperty("type") in postData() to main() and add type argument so that the client applications of SimplePostTool can set content type via method argument. (koji)

* SOLR-2888: FSTSuggester refactoring: internal storage is now UTF-8, external sorting (on disk) prevents OOMs even with large data sets (the bottleneck is now FST construction), code cleanups and API cleanups. (Dawid Weiss, Robert Muir)

Bug Fixes
----------------------

* SOLR-3187: SystemInfoHandler leaks file handles (siren)

* LUCENE-3820: Fixed invalid position indexes by reimplementing PatternReplaceCharFilter. This change also drops real support for boundary characters -- all input is prebuffered for pattern matching. (Dawid Weiss)

* SOLR-3068: Fixed NPE in ThreadDumpHandler (siren)

* SOLR-2912: Fixed file descriptor leak in ShowFileRequestHandler (Michael Ryan, shalin)

* SOLR-2819: Improved speed of parsing hex entities in HTMLStripCharFilter (Bernhard Berger, hossman)

* SOLR-2509: StringIndexOutOfBoundsException in the spellchecker collate when the term contains a hyphen. (Thomas Gambier caught the bug, Steffen Godskesen did the patch, via Erick Erickson)

* SOLR-2955: Fixed IllegalStateException when querying with group.sort=score desc in sharded environment.
  (Steffen Elberg Godskesen, Martijn van Groningen)

* SOLR-2956: Fixed inconsistencies in the flags (and flag key) reported by the LukeRequestHandler (hossman)

* SOLR-1730: Made it clearer when a core failed to load as well as better logging when the QueryElevationComponent fails to properly initialize (gsingers)

* SOLR-1520: QueryElevationComponent now supports non-string ids (gsingers)

* SOLR-3024: Fixed JSONTestUtil.matchObj; in previous releases it did not respect the 'delta' arg (David Smiley via hossman)

* SOLR-2542: Fixed DIH Context variables which were broken for all scopes other than SCOPE_ENTITY (Linbin Chen & Frank Wesemann via hossman)

* SOLR-3042: Fixed Maven Jetty plugin configuration. (David Smiley via Steve Rowe)

* SOLR-2970: CSV ResponseWriter returns fields defined as stored=false in schema (janhoy)

* LUCENE-3690, LUCENE-2208, SOLR-882, SOLR-42: Re-implemented HTMLStripCharFilter as a JFlex-generated scanner. See below for a list of bug fixes and other changes. To get the same behavior as HTMLStripCharFilter in Solr version 3.5 and earlier (including the bugs), use LegacyHTMLStripCharFilter, which is the previous implementation.

  Behavior changes from the previous version:

  - Known offset bugs are fixed.
  - The "Mark invalid" exceptions reported in SOLR-1283 are no longer triggered (the bug is still present in LegacyHTMLStripCharFilter).
  - The character entity "&apos;" is now always properly decoded.
  - More cases of