Apache Solr Release Notes
Introduction
------------
Apache Solr is an open source enterprise search server based on the Apache Lucene Java
search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search,
caching, replication, and a web administration interface. It runs in a Java
servlet container such as Tomcat.
See http://lucene.apache.org/solr for more information.
Getting Started
---------------
You need a Java 1.6 VM or later installed.
In this release, there is an example Solr server including a bundled
servlet container in the directory named "example".
See the tutorial at http://lucene.apache.org/solr/tutorial.html
$Id$
================== 4.0.0-dev ==================
Versions of Major Components
---------------------
Apache Tika 0.8
Carrot2 3.5.0
Velocity 1.6.4 and Velocity Tools 2.0
Apache UIMA 2.3.1-SNAPSHOT
Upgrading from Solr 3.3-dev
----------------------
* The Lucene index format has changed and as a result, once you upgrade,
previous versions of Solr will no longer be able to read your indices.
In a master/slave configuration, all searchers/slaves should be upgraded
before the master. If the master were to be updated first, the older
searchers would not be able to read the new index format.
* Setting abortOnConfigurationError=false is no longer supported
(since it has never worked properly). Solr will now warn you if
you attempt to set this configuration option at all. (see SOLR-1846)
* The default logic for the 'mm' param of the 'dismax' QParser has
been changed. If no 'mm' param is specified (either in the query,
or as a default in solrconfig.xml) then the effective value of the
'q.op' param (either in the query or as a default in solrconfig.xml
or from the 'defaultOperator' option in schema.xml) is used to
influence the behavior. If q.op is effectively "AND" then mm=100%.
If q.op is effectively "OR" then mm=0%. Users who wish to force the
legacy behavior should set a default value for the 'mm' param in
their solrconfig.xml file.
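The new default for 'mm' can be summarized in a few lines. The following is only an illustrative sketch in Python, not Solr's implementation; the function name effective_mm is hypothetical, and q_op stands for the already-resolved effective value of 'q.op'.

```python
def effective_mm(q_op, mm=None):
    """Return the 'mm' value that applies under the new default logic.

    q_op -- effective value of 'q.op' ("AND" or "OR"), however it was resolved
    mm   -- explicit 'mm' value from the query or solrconfig.xml, or None
    """
    if mm is not None:
        # An explicit 'mm' always wins, which is how legacy behavior is forced.
        return mm
    return "100%" if q_op.upper() == "AND" else "0%"
```

Passing an explicit mm (for example a default configured in solrconfig.xml) bypasses the q.op-based logic entirely.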
Detailed Change List
----------------------
New Features
----------------------
* SOLR-571: The autowarmCount for LRUCaches (LRUCache and FastLRUCache) now
supports "percentages" which get evaluated relative to the current size of
the cache when warming happens.
(Tomas Fernandez Lobbe and hossman)
* SOLR-1932: New relevancy function queries: termfreq, tf, docfreq, idf,
norm, maxdoc, numdocs. (yonik)
* SOLR-1665: Add debug component options for timings, results and query info only (gsingers, hossman, yonik)
* SOLR-2001: The query component will substitute an empty query that matches
no documents if the query parser returns null. This also prevents an
exception from being thrown by the default parser if "q" is missing. (yonik)
* SOLR-2112: Solrj API now supports streaming results. (ryan)
* SOLR-792: Adding PivotFacetComponent for Hierarchical faceting
(erik, Jeremy Hinegardner, Thibaut Lassalle, ryan)
* LUCENE-2507, SOLR-2571, SOLR-2576: Added DirectSolrSpellChecker, which uses Lucene's
DirectSpellChecker to retrieve correction candidates directly from the term dictionary using
levenshtein automata. (James Dyer, rmuir)
* SOLR-1873: SolrCloud - added shared/central config and core/shard management via zookeeper,
built-in load balancing, and infrastructure for future SolrCloud work. (yonik, Mark Miller)
Additional Work:
SOLR-2324: SolrCloud solr.xml parameters are not persisted by CoreContainer.
(Massimo Schiavon, Mark Miller)
* SOLR-1729: Evaluation of NOW for date math is done only once per request for
consistency, and is also propagated to shards in distributed search.
Adding a parameter NOW=<time_in_ms> to the request will override the
current time. (Peter Sturge, yonik)
* SOLR-1566: Transforming documents in the ResponseWriters. This will allow
for more complex results in responses and open the door for function queries
as results. (ryan with patches from grant, noble, cmale, yonik, Jan Høydahl)
* SOLR-2396: Add CollationField, which is much more efficient than
the Solr 3.x CollationKeyFilterFactory, and also supports
Locale-sensitive range queries. (rmuir)
* SOLR-2338: Add support for using <similarity/> in a schema's fieldType,
for customizing scoring on a per-field basis. (hossman, yonik, rmuir)
* SOLR-2335: New 'field("...")' function syntax for referring to complex
field names (containing whitespace or special characters) in functions.
* SOLR-1709: Distributed support for Date and Numeric Range Faceting
(Peter Sturge, David Smiley, hossman)
* SOLR-2383: /browse improvements: generalize range and date facet display
(Jan Høydahl via yonik)
* SOLR-2272: Pseudo-join queries / filters. Examples:
To restrict to the set of parents with at least one blue-eyed child:
fq={!join from=parent to=name}eyes:blue
To restrict to the set of children with at least one blue-eyed parent:
fq={!join from=name to=parent}eyes:blue
(yonik)
* SOLR-1942: Added the ability to select codec per fieldType in schema.xml
as well as support custom CodecProviders in solrconfig.xml.
NOTE: IndexReaderFactory now has a codecProvider that should be passed
to IndexReader.open (in the case you have a custom IndexReaderFactory).
(simonw via rmuir)
* SOLR-2136: Boolean type added to function queries, along with
new functions exists(), if(), and(), or(), xor(), not(), def(),
and true and false constants. (yonik)
* SOLR-2491: Add support for using spellcheck collation in conjunction
with grouping. Note that the number of hits returned for collations
is the number of ungrouped hits. (James Dyer via rmuir)
* SOLR-1298: Return FunctionQuery as pseudo field. The solr 'fl' param
now supports functions. For example: fl=id,sum(x,y) -- NOTE: only
functions with fast random access are recommended. (yonik, ryan)
* SOLR-705: Optionally return shard info with each document in distributed
search. Use fl=id,[shard] to return the shard url. (ryan)
* SOLR-2417: Add explain info directly to return documents using
?fl=id,[explain] (ryan)
* SOLR-2533: Converted ValueSource.ValueSourceSortField over to new rewriteable Lucene
SortFields. ValueSourceSortField instances must be rewritten before they can be used.
This is done by SolrIndexSearcher when necessary. (Chris Male).
* SOLR-2193, SOLR-2565: You may now specify a 'soft' commit when committing. This will
use Lucene's NRT feature to avoid guaranteeing documents are on stable storage in exchange
for faster reopen times. There is also a new 'soft' autocommit tracker that can be
configured. (Mark Miller, Robert Muir)
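As an illustration of the percentage form of autowarmCount from SOLR-571 above, the sketch below shows one way such a value could be resolved against the current cache size. It is written in Python for brevity and is not Solr's code: the function name items_to_warm is hypothetical, and rounding down and capping at the cache size are assumptions.

```python
def items_to_warm(autowarm_count: str, current_size: int) -> int:
    """Resolve an autowarmCount setting to a number of cache entries.

    autowarm_count -- either an absolute count ("128") or a percentage ("50%")
    current_size   -- number of entries in the cache when warming happens
    """
    value = autowarm_count.strip()
    if value.endswith("%"):
        percent = int(value[:-1])
        # Assumption: fractional results are rounded down.
        count = current_size * percent // 100
    else:
        count = int(value)
    # Assumption: never warm more entries than the cache currently holds.
    return min(count, current_size)
```

With this sketch, "50%" on a cache holding 200 entries resolves to 100 entries, while an absolute value such as "128" is used as given unless the cache is smaller.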
Optimizations
----------------------
* SOLR-1875: Per-segment field faceting for single valued string fields.
Enable with facet.method=fcs, control the number of threads used with
the "threads" local param on the facet.field param. This algorithm will
only be faster in the presence of rapid index changes. (yonik)
* SOLR-1904: When facet.enum.cache.minDf > 0 and the base doc set is a
SortedIntSet, convert to HashDocSet for better performance. (yonik)
* SOLR-1843: A new "rootName" attribute is now available when
configuring <jmx/> in solrconfig.xml. If this attribute is set,
Solr will use it as the root name for all MBeans Solr exposes via
JMX. The default root name is "solr" followed by the core name.
(Constantijn Visinescu, hossman)
* SOLR-2092: Speed up single-valued and multi-valued "fc" faceting. Typical
improvement is 5%, but can be much greater (up to 10x faster) when facet.offset
is very large (deep paging). (yonik)
* SOLR-2193, SOLR-2565: The default Solr update handler has been improved so
that it uses fewer locks, keeps the IndexWriter open rather than closing it
on each commit (i.e. commits no longer wait for background merges to complete),
works with SolrCore to provide faster 'soft' commits, and has an improved API
that requires less instanceof special casing. (Mark Miller, Robert Muir)
Bug Fixes
----------------------
* SOLR-1908: Fixed SignatureUpdateProcessor to fail to initialize on
invalid config. Specifically: a signatureField that does not exist,
or overwriteDupes=true with a signatureField that is not indexed.
(hossman)
* SOLR-1824: IndexSchema will now fail to initialize if there is a
problem initializing one of the fields or field types. (hossman)
* SOLR-1928: TermsComponent didn't correctly break ties for non-text
fields sorted by count. (yonik)
* SOLR-2107: MoreLikeThisHandler doesn't work with alternate qparsers. (yonik)
* SOLR-2108: Fixed false positives when using wildcard queries on fields with reversed
wildcard support. For example, a query of *zemog* would match documents that contain
'gomez'. (Landon Kuhn via Robert Muir)
* SOLR-1962: SolrCore#initIndex should not use a mix of indexPath and newIndexPath (Mark Miller)
* SOLR-2275: fix DisMax 'mm' parsing to be tolerant of whitespace
(Erick Erickson via hossman)
* SOLR-2193, SOLR-2565: SolrCores now properly share IndexWriters across SolrCore reloads.
(Mark Miller, Robert Muir)
Other Changes
----------------------
* SOLR-1846: Eliminate support for the abortOnConfigurationError
option. It has never worked very well, and in recent versions of
Solr hasn't worked at all. (hossman)
* SOLR-1889: The default logic for the 'mm' param of DismaxQParser and
ExtendedDismaxQParser has been changed to be determined based on the
effective value of the 'q.op' param (hossman)
* SOLR-1946: Misc improvements to the SystemInfoHandler: /admin/system
(hossman)
* SOLR-2289: Tweak spatial coords for example docs so they are a bit
more spread out (Erick Erickson via hossman)
* SOLR-2288: Small tweaks to eliminate compiler warnings, primarily
using Generics where applicable in method/object declarations, and
adding @SuppressWarnings("unchecked") when appropriate (hossman)
* SOLR-2375: Suggester Lookup implementations now store trie data
and load it back on init. This means that large tries don't have to be
rebuilt on every commit or core reload. (ab)
* SOLR-2413: Support for returning multi-valued fields w/o <arr> tag
in the XMLResponseWriter was removed. XMLResponseWriter
no longer works with values less than 2.2 (ryan)
* SOLR-2423: FieldType argument changed from String to Object
Conversion from SolrInputDocument > Object > Fieldable is now managed
by FieldType rather than DocumentBuilder. (ryan)
* SOLR-2461: QuerySenderListener and AbstractSolrEventListener are
now public (hossman)
* LUCENE-2995: Moved some spellchecker and suggest APIs to modules/suggest:
HighFrequencyDictionary, SortedIterator, TermFreqIterator, and the
suggester APIs and implementations. (rmuir)
* SOLR-2576: Remove deprecated SpellingResult.add(Token, int).
(James Dyer via rmuir)
* LUCENE-3232: Moved MutableValue classes to new 'common' module. (Chris Male)
* LUCENE-2883: FunctionQuery, DocValues (and its impls), ValueSource (and its
impls) and BoostedQuery have been consolidated into the queries module. They
can now be found at o.a.l.queries.function.
Documentation
----------------------
* SOLR-2232: Improved README info on solr.solr.home in examples
(Eric Pugh and hossman)
======================= 3.x (not yet released) ================
New Features
----------------------
* SOLR-2458: post.jar enhanced to handle JSON, CSV and <optimize> (janhoy)
* LUCENE-3234: add a new parameter hl.phraseLimit for FastVectorHighlighter speed up.
(Mike Sokolov via koji)
* SOLR-2429: Ability to add cache=false to queries and query filters to avoid
using the filterCache or queryCache. A cost may also be specified and is used
to order the evaluation of non-cached filters from least to greatest cost.
For very expensive query filters (cost >= 100) if the query implements
the PostFilter interface, it will be used to obtain a Collector that is
checked only for documents that match the main query and all other filters.
The "frange" query now implements the PostFilter interface. (yonik)
* SOLR-2630: Added new XsltUpdateRequestHandler that works like
XmlUpdateRequestHandler but allows transforming the POSTed XML document
using XSLT. This makes it possible to POST arbitrary XML documents to the update
handler, as long as you also provide an XSL to transform them to a valid
Solr input document. (Upayavira, Uwe Schindler)
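The evaluation order described for SOLR-2429 above can be pictured with a small model. This is a sketch in Python, not Solr's implementation: the Filter class, its is_post_filter flag (standing in for "implements the PostFilter interface"), and plan_filters are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    name: str
    cache: bool = True            # cache=false disables filterCache use
    cost: int = 0                 # only meaningful for non-cached filters
    is_post_filter: bool = False  # stands in for "implements PostFilter"


def plan_filters(filters):
    """Split filters into (cached, non-cached in cost order, post filters)."""
    cached = [f for f in filters if f.cache]
    # Non-cached filters are evaluated from least to greatest cost.
    uncached = sorted((f for f in filters if not f.cache), key=lambda f: f.cost)
    # Very expensive filters (cost >= 100) that support it run last, and only
    # on documents matching the main query and all other filters.
    post = [f for f in uncached if f.cost >= 100 and f.is_post_filter]
    regular = [f for f in uncached if not (f.cost >= 100 and f.is_post_filter)]
    return cached, regular, post
```

Note that in this model a filter with cost >= 100 that does not support post filtering simply stays in the cost-ordered group.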
Optimizations
----------------------
Bug Fixes
----------------------
* SOLR-2625: TermVectorComponent throws NPE if TF-IDF option is used without DF
option. (Daniel Erenrich, Simon Willnauer)
* SOLR-2631: PingRequestHandler should not allow pinging itself using the "qt"
param, to prevent an infinite loop. (Edoardo Tosca, Uwe Schindler)
* SOLR-2636: Fix explain functionality for negative queries. (Tom Hill via yonik)
* SOLR-2538: Range Faceting on long/double fields could overflow if values
bigger than the max int/float were used.
(Erbi Hanka, hossman)
* SOLR-2230: CommonsHttpSolrServer.addFile could not be used to send
multiple files in a single request.
(Stephan Günther, hossman)
Other Changes
----------------------
Build
----------------------
Documentation
----------------------
================== 3.3.0 ==================
Upgrading from Solr 3.2.0
----------------------
* SolrCore's CloseHook API has been changed in a backward-incompatible way. It
has been changed from an interface to an abstract class. Any custom
components which use the SolrCore.addCloseHook method will need to
be modified accordingly. To migrate, put your old CloseHook#close impl into
CloseHook#preClose.
New Features
----------------------
* SOLR-2378: A new, automaton-based, implementation of suggest (autocomplete)
component, offering an order of magnitude smaller memory consumption
compared to ternary trees and jaspell and very fast lookups at runtime.
(Dawid Weiss)
* SOLR-2400: Field- and DocumentAnalysisRequestHandler now provide a position
history for each token, so you can follow the token through all analysis stages.
The output contains a separate int[] attribute containing all positions from
previous Tokenizers/TokenFilters (called "positionHistory").
(Uwe Schindler)
* SOLR-2524: (SOLR-236, SOLR-237, SOLR-1773, SOLR-1311) Grouping / Field collapsing
using the Lucene grouping contrib. The search result can be grouped by field and query.
(Martijn van Groningen, Emmanuel Keller, Shalin Shekhar Mangar, Koji Sekiguchi,
Iván de Prado, Ryan McKinley, Marc Sturlese, Peter Karich, Bojan Smid,
Charles Hornberger, Dieter Grad, Dmitry Lihachev, Doug Steigerwald,
Karsten Sperling, Michael Gundlach, Oleg Gnatovskiy, Thomas Traeger,
Harish Agarwal, yonik, Michael McCandless, Bill Bell)
* SOLR-1331: Added a srcCore parameter to CoreAdminHandler's mergeindexes action
to merge one or more cores' indexes to a target core (shalin)
* SOLR-2610: Add an option to delete index through CoreAdmin UNLOAD action (shalin)
Optimizations
----------------------
* SOLR-2567: Solr now defaults to TieredMergePolicy. See http://s.apache.org/merging
for more information. (rmuir)
Bug Fixes
----------------------
* SOLR-2519: Improve text_* fieldTypes in example schema.xml: improve
cross-language defaults for text_general; break out separate
English-specific fieldTypes (Jan Høydahl, hossman, Robert Muir,
yonik, Mike McCandless)
* SOLR-2462: Fix extremely high memory usage problems with spellcheck.collate.
Separately, an additional spellcheck.maxCollationEvaluations (default=10000)
parameter is added to avoid excessive CPU time in extreme cases (e.g. long
queries with many misspelled words). (James Dyer via rmuir)
Other Changes
----------------------
* SOLR-2571: Add a commented out example of the spellchecker's thresholdTokenFrequency
parameter to the example solrconfig.xml, and also add a unit test for this feature.
(James Dyer via rmuir)
* SOLR-2576: Deprecate SpellingResult.add(Token token, int docFreq), please use
SpellingResult.addFrequency(Token token, int docFreq) instead.
(James Dyer via rmuir)
* SOLR-2574: Upgrade slf4j to v1.6.1 (shalin)
* LUCENE-3204: The maven-ant-tasks jar is now included in the source tree;
users of the generate-maven-artifacts target no longer have to manually
place this jar in the Ant classpath. NOTE: when Ant looks for the
maven-ant-tasks jar, it looks first in its pre-existing classpath, so
any copies it finds will be used instead of the copy included in the
Lucene/Solr source tree. For this reason, it is recommended to remove
any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under
~/.ant/lib/ or under the Ant installation's lib/ directory. (Steve Rowe)
* SOLR-2611: Fix typos in the example configuration (Eric Pugh via rmuir)
================== 3.2.0 ==================
Versions of Major Components
---------------------
Apache Lucene trunk
Apache Tika 0.8
Carrot2 3.4.2
Upgrading from Solr 3.1
----------------------
* The updateRequestProcessorChain for a RequestHandler is now defined
with update.chain rather than update.processor. The latter still works,
but has been deprecated.
Detailed Change List
----------------------
New Features
----------------------
* SOLR-2496: Add ability to specify overwrite and commitWithin as request
parameters (e.g. specified in the URL) when using the JSON update format,
and added a simplified format for specifying multiple documents.
Example: [{"id":"doc1"},{"id":"doc2"}]
(yonik)
* SOLR-2113: Add TermQParserPlugin, registered as "term". This is useful
when generating filter queries from terms returned from field faceting or
the terms component. Example: fq={!term f=weight}1.5 (hossman, yonik)
* SOLR-1915: DebugComponent now supports using a NamedList to model
Explanation objects in its responses instead of
Explanation.toString (hossman)
Optimizations
----------------------
Bug Fixes
----------------------
* SOLR-2445: Change the default qt to blank in form.jsp, because there is no "standard"
request handler unless you have it in your solrconfig.xml explicitly. (koji)
* SOLR-2455: Prevent double submit of forms in admin interface.
(Jeffrey Chang via uschindler)
* SOLR-2464: Fix potential slowness in QueryValueSource (the query() function) when
the query is very sparse and may not match any documents in a segment. (yonik)
* SOLR-2469: When using java replication with replicateAfter=startup, the first
commit point on server startup is never removed. (yonik)
* SOLR-2466: SolrJ's CommonsHttpSolrServer would retry requests on failure, regardless
of the configured maxRetries, due to HttpClient having its own retry mechanism
by default. The retryCount of HttpClient is now set to 0, and SolrJ does
the retry. (yonik)
* SOLR-2409: edismax parser - treat the text of a fielded query as a literal if the
fieldname does not exist. For example Mission: Impossible should not search on
the "Mission" field unless it's a valid field in the schema. (Ryan McKinley, yonik)
* SOLR-2403: facet.sort=index reported incorrect results for distributed search
in a number of scenarios when facet.mincount>0. This patch also adds some
performance/algorithmic improvements when (facet.sort=count && facet.mincount=1
&& facet.limit=-1) and when (facet.sort=index && facet.mincount>0) (yonik)
* SOLR-2333: The "rename" core admin action does not persist the new name to solr.xml
(Rasmus Hahn, Paul R. Brown via Mark Miller)
* SOLR-2390: Performance of usePhraseHighlighter is terrible on very large Documents,
regardless of hl.maxDocCharsToAnalyze. (Mark Miller)
* SOLR-2474: The helper TokenStreams in analysis.jsp and AnalysisRequestHandlerBase
did not clear all attributes so they displayed incorrect attribute values for tokens
in later filter stages. (uschindler, rmuir, yonik)
* SOLR-2467: Fix <analyzer class="..." /> initialization so any errors
are logged properly. (hossman)
* SOLR-2493: SolrQueryParser was fixed to not parse the SolrConfig DOM tree on each
instantiation which is a huge slowdown. (Stephane Bailliez via uschindler)
* SOLR-2495: The JSON parser could hang on corrupted input and could fail
to detect numbers that were too large to fit in a long. (yonik)
* SOLR-2520: Make JSON response format escape \u2029 as well as \u2028
in strings since those characters are not valid in javascript strings
(although they are valid in JSON strings). (yonik)
* SOLR-2536: Add ReloadCacheRequestHandler to fix ExternalFileField bug (if reopenReaders
set to true and no index segments have been changed, commit cannot trigger a reload of the
external file). (koji)
* SOLR-2539: VectorValueSource.floatVal incorrectly used byteVal on sub-sources.
(Tom Liu via yonik)
* SOLR-2554: RandomSortField didn't work when used in a function query. (yonik)
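The escaping issue behind SOLR-2520 above can be reproduced outside Solr. The sketch below uses Python's standard json module rather than Solr's JSON writer, and the helper name to_js_safe_json is hypothetical; it only illustrates why U+2028 and U+2029 need explicit escaping when JSON text is embedded in JavaScript source.

```python
import json


def to_js_safe_json(value) -> str:
    """Serialize to JSON, escaping U+2028 and U+2029 explicitly."""
    # With ensure_ascii=False the two separators are emitted as raw characters,
    # which is valid JSON but was not valid inside a JavaScript string literal.
    text = json.dumps(value, ensure_ascii=False)
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")
```

The escaped output is still valid JSON, so a JSON parser decodes it back to the original string.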
Other Changes
----------------------
* SOLR-2061: Pull base tests out into a new Solr Test Framework module,
and publish binary, javadoc, and source test-framework jars.
(Drew Farris, Robert Muir, Steve Rowe)
* SOLR-2105: Rename RequestHandler param 'update.processor' to 'update.chain'.
(Jan Høydahl via Mark Miller)
* SOLR-2485: Deprecate BaseResponseWriter, GenericBinaryResponseWriter, and
GenericTextResponseWriter. These classes will be removed in 4.0. (ryan)
* SOLR-2451: Enhance assertJQ to allow individual tests to specify the
tolerance delta used in numeric equalities. This allows for slight
variance in asserting score comparisons in unit tests.
(David Smiley, Chris Hostetter)
* SOLR-2528: Remove default="true" from HtmlEncoder in example solrconfig.xml,
because html encoding confuses non-ascii users. (koji)
Build
----------------------
* LUCENE-3006: Building javadocs will fail on warnings by default. Override with -Dfailonjavadocwarning=false (sarowe, gsingers)
Documentation
----------------------
================== 3.1.0 ==================
Versions of Major Components
---------------------
Apache Lucene 3.1.0
Apache Tika 0.8
Carrot2 3.4.2
Velocity 1.6.1 and Velocity Tools 2.0-beta3
Apache UIMA 2.3.1-SNAPSHOT
Upgrading from Solr 1.4
----------------------
* The Lucene index format has changed and as a result, once you upgrade,
previous versions of Solr will no longer be able to read your indices.
In a master/slave configuration, all searchers/slaves should be upgraded
before the master. If the master were to be updated first, the older
searchers would not be able to read the new index format.
* The Solr JavaBin format has changed as of Solr 3.1. If you are using the
JavaBin format, you will need to upgrade your SolrJ client. (SOLR-2034)
* The experimental ALIAS command has been removed (SOLR-1637)
* Using solr.xml is recommended for single cores also (SOLR-1621)
* Old syntax of <highlighting> configuration in solrconfig.xml
is deprecated (SOLR-1696)
* The deprecated HTMLStripReader, HTMLStripWhitespaceTokenizerFactory and
HTMLStripStandardTokenizerFactory were removed. To strip HTML tags,
HTMLStripCharFilter should be used instead, and it works with any
Tokenizer of your choice. (SOLR-1657)
* Field compression is no longer supported. Fields that were formerly
compressed will be uncompressed as index segments are merged. For
shorter fields, this may actually be an improvement, as the compression
used was not very good for short text. Some indexes may get larger though.
* SOLR-1845: The TermsComponent response format was changed so that the
"terms" container is a map instead of a named list. This affects
response formats like JSON, but not XML. (yonik)
* SOLR-1876: All Analyzers and TokenStreams are now final to enforce
the decorator pattern. (rmuir, uschindler)
* LUCENE-2608: Added the ability to specify the accuracy on a per request basis.
It is recommended that implementations of SolrSpellChecker should change over to the new SolrSpellChecker
methods using the new SpellingOptions class, but are not required to. While this change is
backward compatible, the trunk version of Solr has already dropped support for all but the SpellingOptions method. (gsingers)
* readercycle script was removed. (SOLR-2046)
* In previous releases, sorting or evaluating function queries on
fields that were "multiValued" (either by explicit declaration in
schema.xml or by implicit behavior because the "version" attribute on
the schema was less than 1.2) did not generally work, but it would
sometimes silently act as if it succeeded and order the docs
arbitrarily. Solr will now fail on any attempt to sort, or apply a
function to, multi-valued fields.
* The DataImportHandler jars are no longer included in the solr
WAR and should be added in Solr's lib directory, or referenced
via the <lib> directive in solrconfig.xml.
Detailed Change List
----------------------
New Features
----------------------
* SOLR-1302: Added several new distance based functions, including
Great Circle (haversine), Manhattan, Euclidean and String (using the
StringDistance methods in the Lucene spellchecker).
Also added geohash(), deg() and rad() convenience functions.
See http://wiki.apache.org/solr/FunctionQuery. (gsingers)
* SOLR-1553: New dismax parser implementation (accessible as "edismax")
that supports full lucene syntax, improved reserved char escaping,
fielded queries, improved proximity boosting, and improved stopword
handling. Note: status is experimental for now. (yonik)
* SOLR-1574: Add many new functions from java Math (e.g. sin, cos) (yonik)
* SOLR-1569: Allow functions to take in literal strings by modifying the
FunctionQParser and adding LiteralValueSource (gsingers)
* SOLR-1571: Added unicode collation support through Lucene's CollationKeyFilter
(Robert Muir via shalin)
* SOLR-785: Distributed Search support for SpellCheckComponent
(Matthew Woytowitz, shalin)
* SOLR-1625: Add regexp support for TermsComponent (Uri Boness via noble)
* SOLR-1297: Add sort by Function capability (gsingers, yonik)
* SOLR-1139: Add TermsComponent Query and Response Support in SolrJ (Matt Weber via shalin)
* SOLR-1177: Distributed Search support for TermsComponent (Matt Weber via shalin)
* SOLR-1621, SOLR-1722: Allow current single core deployments to be specified by solr.xml (Mark Miller, noble)
* SOLR-1532: Allow StreamingUpdateSolrServer to use a provided HttpClient (Gabriele Renzi via shalin)
* SOLR-1653: Add PatternReplaceCharFilter (koji)
* SOLR-1131: FieldTypes can now output multiple Fields per Type and still be searched. This can be handy for hiding the details of a particular
implementation such as in the spatial case. (Chris Mattmann, shalin, noble, gsingers, yonik)
* SOLR-1586: Add support for Geohash and Spatial Tile FieldType (Chris Mattmann, gsingers)
* SOLR-1697: PluginInfo should load plugins w/o class attribute also (noble)
* SOLR-1268: Incorporate FastVectorHighlighter (koji)
* SOLR-1750: SolrInfoMBeanHandler added for simpler programmatic access
to info currently available from registry.jsp and stats.jsp
(ehatcher, hossman)
* SOLR-1815: SolrJ now preserves the order of facet queries. (yonik)
* SOLR-1677: Add support for choosing the Lucene Version for Lucene components within
Solr. (Uwe Schindler, Mark Miller)
* SOLR-1379: Add RAMDirectoryFactory for non-persistent in memory index storage.
(Alex Baranov via yonik)
* SOLR-1857: Synced Solr analysis with Lucene 3.1. Added KeywordMarkerFilterFactory
and StemmerOverrideFilterFactory, which can be used to tune stemming algorithms.
Added factories for Bulgarian, Czech, Hindi, Turkish, and Wikipedia analysis. Improved the
performance of SnowballPorterFilterFactory. (rmuir)
* SOLR-1657: Converted remaining TokenStreams to the Attributes-based API. All Solr
TokenFilters now support custom Attributes, and some have improved performance:
especially WordDelimiterFilter and CommonGramsFilter. (rmuir, cmale, uschindler)
* SOLR-1740: ShingleFilterFactory supports the "minShingleSize" and "tokenSeparator"
parameters for controlling the minimum shingle size produced by the filter, and
the separator string that it uses, respectively. (Steven Rowe via rmuir)
* SOLR-744: ShingleFilterFactory supports the "outputUnigramsIfNoShingles"
parameter, to output unigrams if the number of input tokens is fewer than
minShingleSize, and no shingles can be generated.
(Chris Harris via Steven Rowe)
* SOLR-1923: PhoneticFilterFactory now has support for the
Caverphone algorithm. (rmuir)
* SOLR-1957: The VelocityResponseWriter contrib moved to core.
Example search UI now available at http://localhost:8983/solr/browse
(ehatcher)
* SOLR-1974: Add LimitTokenCountFilterFactory. (koji)
* SOLR-1966: QueryElevationComponent can now return just the included results in the elevation file (gsingers, yonik)
* SOLR-1556: TermVectorComponent now supports per field overrides. Also, it now throws an error
if passed in fields do not exist, and returns warnings for fields that do not have
term vector options (termVectors, offsets, positions) that align with the schema
declaration. (gsingers)
* SOLR-1985: FastVectorHighlighter: add wrapper class for Lucene's SingleFragListBuilder (koji)
* SOLR-1984: Add HyphenationCompoundWordTokenFilterFactory. (PB via rmuir)
* SOLR-397: Date Faceting now supports a "facet.date.include" param
for specifying when the upper & lower end points of computed date
ranges should be included in the range. Legal values are: "all",
"lower", "upper", "edge", and "outer". For backwards compatibility
the default value is the set: [lower,upper,edge], so that all ranges
between start and end are inclusive of their endpoints, but the
"before" and "after" ranges are not.
* SOLR-945: JSON update handler that accepts add, delete, commit
commands in JSON format. (Ryan McKinley, yonik)
* SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField.
autoGeneratePhraseQueries="true" (the default) causes the query parser to
generate phrase queries if multiple tokens are generated from a single
non-quoted analysis string. For example WordDelimiterFilter splitting text:pdp-11
will cause the parser to generate text:"pdp 11" rather than (text:PDP OR text:11).
Note that autoGeneratePhraseQueries="true" tends to not work well for non whitespace
delimited languages. (yonik)
* SOLR-1925: Add CSVResponseWriter (use wt=csv) that returns the list of documents
in CSV format. (Chris Mattmann, yonik)
* SOLR-1240: "Range Faceting" has been added. This is a generalization
of the existing "Date Faceting" logic so that it now supports
all stock numeric field types that support range queries in addition
to dates. facet.date is now deprecated in favor of this generalized mechanism.
(Gijs Kunze, hossman)
* SOLR-2021: Add SolrEncoder plugin to Highlighter. (koji)
* SOLR-2030: Make FastVectorHighlighter use of SolrEncoder. (koji)
* SOLR-2053: Add support for custom comparators in Solr spellchecker, per LUCENE-2479 (gsingers)
* SOLR-2049: Add hl.multiValuedSeparatorChar for FastVectorHighlighter, per LUCENE-2603. (koji)
* SOLR-2059: Add "types" attribute to WordDelimiterFilterFactory, which
allows you to customize how WordDelimiterFilter tokenizes text with
a configuration file. (Peter Karich, rmuir)
* SOLR-2099: Add ability to throttle rsync based replication using rsync option --bwlimit.
(Brandon Evans via koji)
* SOLR-1316: Create autosuggest component.
(Ankul Garg, Jason Rutherglen, Shalin Shekhar Mangar, Grant Ingersoll, Robert Muir, ab)
* SOLR-1568: Added "native" filtering support for PointType, GeohashField. Added LatLonType with filtering support too. See
http://wiki.apache.org/solr/SpatialSearch and the example. Refactored some items in Lucene spatial.
Removed SpatialTileField as the underlying CartesianTier is broken beyond repair and is going to be moved. (gsingers)
* SOLR-2128: Full parameter substitution for function queries.
Example: q=add($v1,$v2)&v1=mul(popularity,5)&v2=20.0
(yonik)
* SOLR-2133: Function query parser can now parse multiple comma separated
value sources. It also now fails if there is extra unexpected text
after parsing the functions, instead of silently ignoring it.
This allows expressions like q=dist(2,vector(1,2),$pt)&pt=3,4 (yonik)
* SOLR-2157: Suggester should return alpha-sorted results when onlyMorePopular=false (ab)
* SOLR-2010: Added ability to verify that spell checking collations have
actual results in the index. (James Dyer via gsingers)
* SOLR-2188: Added "maxTokenLength" argument to the factories for ClassicTokenizer,
StandardTokenizer, and UAX29URLEmailTokenizer. (Steven Rowe)
* SOLR-2129: Added a Solr module for dynamic metadata extraction/indexing with Apache UIMA.
See contrib/uima/README.txt for more information. (Tommaso Teofili via rmuir)
* SOLR-2325: Allow tagging and exclusion of main query for faceting. (yonik)
* SOLR-2263: Add ability for RawResponseWriter to stream binary files as well as
text files. (Eric Pugh via yonik)
* SOLR-860: Add debug output for MoreLikeThis. (koji)
* SOLR-1057: Add PathHierarchyTokenizerFactory. (ryan, koji)
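As a rough illustration of the parameter substitution described in SOLR-2128
above, the following Python sketch expands $name references textually and
builds the request query string. This is only a mental model: Solr resolves
the references inside its function query parser rather than by string
replacement, and the helper name "dereference" and its max_depth guard are
assumptions made for this example.

```python
import re
from urllib.parse import urlencode, parse_qs

def dereference(expr, params, max_depth=10):
    """Replace each $name in expr with params[name], repeating so that
    references nested inside substituted values are expanded too."""
    pattern = re.compile(r"\$(\w+)")
    for _ in range(max_depth):
        expanded = pattern.sub(lambda m: params[m.group(1)], expr)
        if expanded == expr:
            return expanded
        expr = expanded
    raise ValueError("too many levels of parameter substitution")

# Request parameters taken from the SOLR-2128 example.
params = {"q": "add($v1,$v2)", "v1": "mul(popularity,5)", "v2": "20.0"}

# What the function query effectively evaluates after substitution.
resolved = dereference(params["q"], params)

# URL-encoded form of the same parameters, ready to append to a request URL.
query_string = urlencode(params)
```

Keeping v1 and v2 as separate request parameters means a client can change
one operand without rewriting the whole function expression.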
Optimizations
----------------------
* SOLR-1679: Don't build up string messages in SolrCore.execute unless they
are necessary for the current log level.
(Fuad Efendi and hossman)
* SOLR-1874: Optimize PatternReplaceFilter for better performance. (rmuir, uschindler)
* SOLR-1968: speed up initial filter cache population for facet.method=enum and
also big terms for multi-valued facet.method=fc. The resulting speedup
for the first facet request is anywhere from 30% to 32x, depending on how many
terms are in the field and how many documents match per term. (yonik)
* SOLR-2089: Speed up UnInvertedField faceting (facet.method=fc for
multi-valued fields) when facet.limit is both high, and a high enough
percentage of the number of unique terms in the field. Extreme cases
yield speedups over 3x. (yonik)
* SOLR-2046: add common functions to scripts-util. (koji)
Bug Fixes
----------------------
* SOLR-1769: Solr 1.4 Replication - Repeater throwing NullPointerException (Jörgen Rydenius via noble)
* SOLR-1432: Make the new ValueSource.getValues(context,reader) delegate
to the original ValueSource.getValues(reader) so custom sources
will work. (yonik)
* SOLR-1572: FastLRUCache correctly implemented the LRU policy only
for the first 2B accesses. (yonik)
* SOLR-1582: copyField was ignored for BinaryField types (gsingers)
* SOLR-1563: Binary fields, including trie-based numeric fields, caused null
pointer exceptions in the luke request handler. (yonik)
* SOLR-1577: The example solrconfig.xml defaulted to a solr data dir
relative to the current working directory, even if a different solr home
was being used. The new behavior changes the default to a zero length
string, which is treated the same as if no dataDir had been specified,
hence the "data" directory under the solr home will be used. (yonik)
* SOLR-1584: SolrJ - SolrQuery.setIncludeScore() incorrectly added
fl=score to the parameter list instead of appending score to the
existing field list. (yonik)
* SOLR-1580: Solr Configuration ignores 'mergeFactor' parameter, always
uses Lucene default. (Lance Norskog via Mark Miller)
* SOLR-1593: ReverseWildcardFilter didn't work for surrogate pairs
(i.e. code points outside of the BMP), resulting in incorrect
matching. This change requires reindexing for any content with
such characters. (Robert Muir, yonik)
* SOLR-1596: A rollback operation followed by the shutdown of Solr
or the close of a core resulted in a warning:
"SEVERE: SolrIndexWriter was not closed prior to finalize()" although
there were no other consequences. (yonik)
* SOLR-1595: StreamingUpdateSolrServer used the platform default character
set when streaming updates, rather than using UTF-8 as the HTTP headers
indicated, leading to an encoding mismatch. (hossman, yonik)
* SOLR-1587: A distributed search request with fl=score, didn't match
the behavior of a non-distributed request since it only returned
the id,score fields instead of all fields in addition to score. (yonik)
* SOLR-1601: Schema browser does not indicate presence of charFilter. (koji)
* SOLR-1615: Backslash escaping did not work in quoted strings
for local param arguments. (Wojtek Piaseczny, yonik)
* SOLR-1628: log contains incorrect number of adds and deletes.
(Thijs Vonk via yonik)
* SOLR-343: Date faceting now respects facet.mincount limiting
(Uri Boness, Raiko Eckstein via hossman)
* SOLR-1624: Highlighter only highlights values from the first field value
in a multivalued field when term positions (term vectors) are stored.
(Chris Harris via yonik)
* SOLR-1635: Fixed error message when numeric values can't be parsed by
DOMUtils - notably for plugin init params in solrconfig.xml.
(hossman)
* SOLR-1651: Fixed Incorrect dataimport handler package name in SolrResourceLoader
(Akshay Ukey via shalin)
* SOLR-1660: CapitalizationFilter crashes if you use the maxWordCountOption
(Robert Muir via shalin)
* SOLR-1667: PatternTokenizer does not reset attributes such as positionIncrementGap
(Robert Muir via shalin)
* SOLR-1711: SolrJ - StreamingUpdateSolrServer had a race condition that
could halt the streaming of documents. The original patch to fix this
(never officially released) introduced another hanging bug due to
connections not being released.
(Attila Babo, Erik Hetzner, Johannes Tuchscherer via yonik)
* SOLR-1748, SOLR-1747, SOLR-1746, SOLR-1745, SOLR-1744: Streams and Readers
retrieved from ContentStreams are not closed in various places, resulting
in file descriptor leaks.
(Christoff Brill, Mark Miller)
* SOLR-1753: StatsComponent throws NPE when getting statistics for facets in distributed search
(Janne Majaranta via koji)
* SOLR-1736: In the slave, if moving a file does not succeed, copy the file. (noble)
* SOLR-1579: Fixes to XML escaping in stats.jsp
(David Bowen and hossman)
* SOLR-1777: fieldTypes with sortMissingLast=true or sortMissingFirst=true can
result in incorrectly sorted results. (yonik)
* SOLR-1798: Small memory leak (~100 bytes) in fastLRUCache for every
commit. (yonik)
* SOLR-1823: Fixed XMLResponseWriter (via XMLWriter) so it no longer throws
a ClassCastException when a Map containing a non-String key is used.
(Frank Wesemann, hossman)
* SOLR-1797: fix ConcurrentModificationException and potential memory
leaks in ResourceLoader. (yonik)
* SOLR-1850: change KeepWordFilter so a new word set is not created for
each instance (John Wang via yonik)
* SOLR-1706: fixed WordDelimiterFilter for certain combinations of options
where it would output incorrect tokens. (Robert Muir, Chris Male)
* SOLR-1936: The JSON response format needed to escape unicode code point
U+2028 - 'LINE SEPARATOR' (Robert Hofstra, yonik)
* SOLR-1914: Change the JSON response format to output float/double
values of NaN,Infinity,-Infinity as strings. (yonik)
* SOLR-1948: PatternTokenizerFactory should use parent's args (koji)
* SOLR-1870: Indexing documents using the 'javabin' format no longer
fails with a ClassCastException when SolrInputDocuments contain field
values which are Collections or other classes that implement
Iterable. (noble, hossman)
* SOLR-1981: Solr will now fail correctly if solr.xml attempts to
specify multiple cores that have the same name (hossman)
* SOLR-1791: Fix messed up core names on admin gui (yonik via koji)
* SOLR-1995: Change date format from "hour in am/pm" to "hour in day"
in CoreContainer and SnapShooter. (Hayato Ito, koji)
* SOLR-2008: avoid possible RejectedExecutionException w/autoCommit
by making SolrCore close the UpdateHandler before closing the
SearchExecutor. (NarasimhaRaju, hossman)
* SOLR-2036: Avoid expensive fieldCache ram estimation for the
admin stats page. (yonik)
* SOLR-2047: ReplicationHandler should accept bool type for enable flag. (koji)
* SOLR-1630: Fix spell checking collation issue related to token positions (rmuir, gsingers)
* SOLR-2100: The replication handler backup command didn't save the commit
point and hence could fail when a newer commit caused the older commit point
to be removed before it was finished being copied. This did not affect
normal master/slave replication. (Peter Sturge via yonik)
* SOLR-2114: Fixed parsing error in hsin function. The function signature has changed slightly. (gsingers)
* SOLR-2083: SpellCheckComponent misreports suggestions when distributed (James Dyer via gsingers)
* SOLR-2111: Change exception handling in distributed faceting to work more
like non-distributed faceting, change facet_counts/exception from a String
to a List<String> to enable listing all exceptions that happened, and
prevent an exception in one facet command from affecting another
facet command. (yonik)
* SOLR-2110: Remove the restriction on names for local params
substitution/dereferencing. Properly encode local params in
distributed faceting. (yonik)
* SOLR-2135: Fix behavior of ConcurrentLRUCache when asking for
getLatestAccessedItems(0) or getOldestAccessedItems(0).
(David Smiley via hossman)
* SOLR-2148: Highlighter doesn't support q.alt. (koji)
* SOLR-2180: It was possible for EmbeddedSolrServer to leave searchers
open if a request threw an exception. (yonik)
* SOLR-2173: Suggester should always rebuild Lookup data if Lookup.load fails. (ab)
* SOLR-2081: BaseResponseWriter.isStreamingDocs causes
SingleResponseWriter.end to be called 2x
(Chris A. Mattmann via hossman)
* SOLR-2219: The init() method of every SolrRequestHandler was being
called twice. (ambikeshwar singh and hossman)
* SOLR-2285: duplicate SolrEventListeners no longer created (hossman)
* SOLR-1993: fix String cast assumption in JavaBinCodec - specifically
addresses the "commitWithin" option on Update requests.
(noble, hossman, and Maxim Valyanskiy)
* SOLR-2261: fix velocity template layout.vm that referred to an older
version of jquery. (Eric Pugh via rmuir)
* SOLR-2307: fix bug in PHPSerializedResponseWriter (wt=phps) when
dealing with SolrDocumentList objects -- ie: sharded queries.
(Antonio Verni via hossman)
* SOLR-2127: Fixed serialization of default core and indentation of solr.xml when serializing.
(Ephraim Ofir, Mark Miller)
* SOLR-2320: Fixed ReplicationHandler detail reporting for masters
(hossman)
* SOLR-482: Provide more exception handling in CSVLoader (gsingers)
* SOLR-1283: HTMLStripCharFilter sometimes threw a "Mark Invalid" exception.
(Julien Coloos, hossman, yonik)
* SOLR-2085: Improve SolrJ behavior when FacetComponent comes before
QueryComponent (Tomas Salfischberger via hossman)
* SOLR-1940: Fix SolrDispatchFilter behavior when Content-Type is
unknown (Lance Norskog and hossman)
* SOLR-1983: snappuller fails when modifiedConfFiles is not empty and
full copy of index is needed. (Alexander Kanarsky via yonik)
* SOLR-2156: SnapPuller fails to clean Old Index Directories on Full Copy
(Jayendra Patil via yonik)
* SOLR-96: Fix XML parsing in XMLUpdateRequestHandler and
DocumentAnalysisRequestHandler to respect charset from XML file and only
use HTTP header's "Content-Type" as a "hint". (uschindler)
* SOLR-2339: Fix sorting to explicitly generate an error if you
attempt to sort on a multiValued field. (hossman)
* SOLR-2348: Fix field types to explicitly generate an error if you
attempt to get a ValueSource for a multiValued field. (hossman)
* SOLR-2380: Distributed faceting could miss values when facet.sort=index
and when facet.offset was greater than 0. (yonik)
* SOLR-1656: XIncludes and other HREFs in XML files loaded by ResourceLoader
are fixed to be resolved using the URI standard (RFC 2396). The system
identifier is no longer a plain filename with path, it gets initialized
using a custom URI scheme "solrres:". This scheme is resolved using a
EntityResolver that utilizes ResourceLoader
(org.apache.solr.common.util.SystemIdResolver). This makes all relative
paths in Solr's config files behave as expected. This change
introduces some backwards breaks in the API: Some config classes
(Config, SolrConfig, IndexSchema) were changed to take
org.xml.sax.InputSource instead of InputStream. There may also be some
backwards breaks in existing config files; it is recommended to check
your config files / XSLTs and replace all XIncludes/HREFs that were
hacked to use absolute paths to use relative ones. (uschindler)
* SOLR-309: Fix FieldType so setting an analyzer on a FieldType that
doesn't expect it will generate an error. Practically speaking this
means that Solr will now correctly generate an error on
initialization if the schema.xml contains an analyzer configuration
for a fieldType that does not use TextField. (hossman)
* SOLR-2192: StreamingUpdateSolrServer.blockUntilFinished was not
thread safe and could throw an exception. (yonik)
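Two of the JSON fixes listed above (SOLR-1936 and SOLR-1914) are easy to
picture with a small Python sketch. It writes non-finite floats as strings and
escapes U+2028 in the output text. It only mirrors the behavior those entries
describe and is not Solr's JSON writer; the helper names "json_safe" and
"to_json" are hypothetical.

```python
import json
import math

def json_safe(value):
    """Recursively replace NaN and infinite floats with their string names."""
    if isinstance(value, float) and not math.isfinite(value):
        if math.isnan(value):
            return "NaN"
        return "Infinity" if value > 0 else "-Infinity"
    if isinstance(value, dict):
        return {key: json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [json_safe(item) for item in value]
    return value

def to_json(value):
    """Serialize to JSON text that contains no raw U+2028 character."""
    # allow_nan=False makes json.dumps raise if a non-finite float slipped through.
    text = json.dumps(json_safe(value), ensure_ascii=False, allow_nan=False)
    # U+2028 is legal inside a JSON string but breaks JavaScript parsers
    # that evaluate the response as source code, so emit it escaped.
    return text.replace("\u2028", "\\u2028")

doc = {"score": float("nan"), "max": float("inf"), "title": "a\u2028b"}
encoded = to_json(doc)
decoded = json.loads(encoded)
```

A client that needs the numeric values back has to convert the strings
"NaN", "Infinity" and "-Infinity" itself after parsing.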
Other Changes
----------------------
* SOLR-1602: Refactor SOLR package structure to include o.a.solr.response
and move QueryResponseWriters in there
(Chris A. Mattmann, ryan, hoss)
* SOLR-1516: Addition of an abstract BaseResponseWriter class to simplify the
development of QueryResponseWriter implementations.
(Chris A. Mattmann via noble)
* SOLR-1592: Refactor XMLWriter startTag to allow arbitrary attributes to be written
(Chris A. Mattmann via noble)
* SOLR-1561: Added Lucene 2.9.1 spatial contrib jar to lib. (gsingers)
* SOLR-1570: Log warnings if uniqueKey is multi-valued or not stored (hossman, shalin)
* SOLR-1558: QueryElevationComponent only works if the uniqueKey field is
implemented using StrField. In previous versions of Solr no warning or
error would be generated if you attempted to use QueryElevationComponent,
it would just fail in unexpected ways. This has been changed so that it
will fail with a clear error message on initialization. (hossman)
* SOLR-1611: Added Lucene 2.9.1 collation contrib jar to lib (shalin)
* SOLR-1608: Extract base class from TestDistributedSearch to make
it easy to write test cases for other distributed components. (shalin)
* Upgraded to Lucene 2.9-dev r888785 (shalin)
* SOLR-1610: Generify SolrCache (Jason Rutherglen via shalin)
* SOLR-1637: Remove ALIAS command
* SOLR-1662: Added Javadocs in BufferedTokenStream and fixed incorrect cloning
in TestBufferedTokenStream (Robert Muir, Uwe Schindler via shalin)
* SOLR-1674: Improve analysis tests and cut over to new TokenStream API.
(Robert Muir via Mark Miller)
* SOLR-1661: Remove adminCore from CoreContainer. Removed deprecated methods setAdminCore(), getAdminCore() (noble)
* SOLR-1704: Google collections moved from clustering to core (noble)
* SOLR-1268: Add Lucene 2.9-dev r888785 FastVectorHighlighter contrib jar to lib. (koji)
* SOLR-1538: Reordering of object allocations in ConcurrentLRUCache to eliminate
(an extremely small) potential for deadlock.
(gabriele renzi via hossman)
* SOLR-1588: Removed some very old dead code.
(Chris A. Mattmann via hossman)
* SOLR-1696: Deprecate old <highlighting> syntax and move configuration to HighlightComponent (noble)
* SOLR-1727: SolrEventListener should extend NamedListInitializedPlugin (noble)
* SOLR-1771: Improved error message when StringIndex cannot be initialized
for a function query (hossman)
* SOLR-1695: Improved error messages when adding a document that does not
contain exactly one value for the uniqueKey field (hossman)
* SOLR-1776: DismaxQParser and ExtendedDismaxQParser now use the schema.xml
"defaultSearchField" as the default value for the "qf" param instead of failing
with an error when "qf" is not specified. (hossman)
* SOLR-1851: luceneAutoCommit no longer has any effect - it has been removed (Mark Miller)
* SOLR-1865: SolrResourceLoader.getLines ignores Byte Order Markers (BOMs) at the
beginning of input files, these are often created by editors such as Windows
Notepad. (rmuir, hossman)
* SOLR-1938: ElisionFilterFactory will use a default set of French contractions
if you do not supply a custom articles file. (rmuir)
* SOLR-2003: SolrResourceLoader will report any encoding errors, rather than
silently using replacement characters for invalid inputs (blargy via rmuir)
* SOLR-1804: Google collections updated to Google Guava (which is a superset of collections and contains bug fixes) (gsingers)
* SOLR-2034: Switch to JavaBin codec version 2. Strings are now serialized
as the number of UTF-8 bytes, followed by the bytes in UTF-8. Previously
Strings were serialized as the number of UTF-16 chars, followed by the
bytes in Modified UTF-8. (hossman, yonik, rmuir)
* SOLR-2013: Add mapping-FoldToASCII.txt to example conf directory.
(Steven Rowe via koji)
* SOLR-2213: Upgrade to jQuery 1.4.3 (Erick Erickson via ryan)
* SOLR-1826: Add unit tests for highlighting with termOffsets=true
and overlapping tokens. (Stefan Oestreicher via rmuir)
* SOLR-2340: Add version infos to message in JavaBinCodec when throwing
exception. (koji)
* SOLR-2350: Since Solr no longer requires XML files to be in UTF-8
(see SOLR-96) SimplePostTool (aka: post.jar) has been improved to
work with files of any mime-type or charset. (hossman)
* SOLR-2365: Move DIH jars out of solr.war (David Smiley via yonik)
* SOLR-2381: Include a patched version of Jetty (6.1.26 + JETTY-1340)
to fix problematic UTF-8 handling for supplementary characters.
(Bernd Fehling, uschindler, yonik, rmuir)
* SOLR-2391: The preferred Content-Type for XML was changed to
application/xml. XMLResponseWriter now only delivers using this
type; updating documents and analyzing documents is still supported
using text/xml as Content-Type, too. If you have clients that are
hardcoded on text/xml as Content-Type, you have to change them.
(uschindler, rmuir)
* SOLR-2414: All ResponseWriters now use only ServletOutputStreams
and wrap their own Writer around it when serializing. This fixes
the bug in PHPSerializedResponseWriter that produced wrong string
length if the servlet container had a broken UTF-8 encoding that was
in fact CESU-8 (see SOLR-1091). The system property to enable the
CESU-8 byte counting in PHPSerializedResponseWriter for broken
servlet containers was therefore removed and is now ignored if set.
Output is always UTF-8. (uschindler, yonik, rmuir)
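The length differences behind SOLR-2034 and SOLR-2414 above show up as soon
as a string contains a character outside the BMP. The Python sketch below
counts UTF-16 code units, UTF-8 bytes, and CESU-8 bytes for one sample string.
It is not the JavaBin codec or the PHP writer; the helper names are
hypothetical, and the CESU-8 count assumes each supplementary character is
written as two 3-byte surrogates.

```python
def utf16_units(text):
    """Number of UTF-16 code units (what Java's String.length() reports)."""
    return len(text.encode("utf-16-le")) // 2

def utf8_bytes(text):
    """Number of bytes in standard UTF-8."""
    return len(text.encode("utf-8"))

def cesu8_bytes(text):
    """Number of bytes in CESU-8: supplementary characters take 6 bytes
    (two surrogates of 3 bytes each) instead of 4."""
    return sum(6 if ord(ch) > 0xFFFF else len(ch.encode("utf-8")) for ch in text)

# 'a' (ASCII), U+00E9 (two UTF-8 bytes), U+1D11E (outside the BMP)
sample = "a\u00e9\U0001D11E"

lengths = {
    "utf16_units": utf16_units(sample),
    "utf8_bytes": utf8_bytes(sample),
    "cesu8_bytes": cesu8_bytes(sample),
}
```

For this sample the three counts all differ, which is why a length prefix is
only meaningful if writer and reader agree on the unit being counted.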
Build
----------------------
* SOLR-1522: Automated release signing process. (gsingers)
* SOLR-1891: Make lucene-jars-to-solr fail if copying any of the jars fails, and
update clean to remove the jars in that directory (Mark Miller)
* LUCENE-2466: Commons-Codec was upgraded from 1.3 to 1.4. (rmuir)
* SOLR-2042: Fixed some Maven deps (Drew Farris via gsingers)
* LUCENE-2657: Switch from using Maven POM templates to full POMs when
generating Maven artifacts (Steven Rowe)
Documentation
----------------------
* SOLR-1590: Javadoc for XMLWriter#startTag
(Chris A. Mattmann via hossman)
* SOLR-1792: Documented peculiar behavior of TestHarness.LocalRequestFactory
(hossman)
================== Release 1.4.0 ==================
Release Date: See http://lucene.apache.org/solr for the official release date.
Upgrading from Solr 1.3
-----------------------
There is a new default faceting algorithm for multiValued fields that should be
faster for most cases. One can revert to the previous algorithm (which has
also been improved somewhat) by adding facet.method=enum to the request.
Searching and sorting is now done on a per-segment basis, meaning that
the FieldCache entries used for sorting and for function queries are
created and used per-segment and can be reused for segments that don't
change between index updates. While generally beneficial, this can lead
to increased memory usage over 1.3 in certain scenarios:
1) A single valued field that was used for both sorting and faceting
in 1.3 would have used the same top level FieldCache entry. In 1.4,
sorting will use entries at the segment level while faceting will still
use entries at the top reader level, leading to increased memory usage.
2) Certain function queries such as ord() and rord() require a top level
FieldCache instance and can thus lead to increased memory usage. Consider
replacing ord() and rord() with alternatives, such as function queries
based on ms() for date boosting.
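A date boost built on ms() is commonly written with the reciprocal function,
for example recip(ms(NOW,mydatefield),3.16e-11,1,1). The Python sketch below
reproduces that arithmetic under stated assumptions: recip(x,m,a,b) is taken
to compute a/(m*x+b), the field name mydatefield is hypothetical, and the
constant m is chosen as one over the number of milliseconds in a year.

```python
MS_PER_YEAR = 365 * 24 * 3600 * 1000  # milliseconds in a non-leap year

def recip(x, m, a, b):
    """Reciprocal function assumed here: a / (m*x + b)."""
    return a / (m * x + b)

def date_boost(age_ms, m=1.0 / MS_PER_YEAR):
    """Boost for a document whose date is age_ms milliseconds in the past."""
    return recip(age_ms, m, 1.0, 1.0)

boost_new = date_boost(0)                        # document dated now
boost_one_year = date_boost(MS_PER_YEAR)         # one year old
boost_three_years = date_boost(3 * MS_PER_YEAR)  # three years old
```

With these constants a brand-new document gets a boost of 1, a one-year-old
document about one half, and a three-year-old document about one quarter, and
no top level FieldCache entry of the kind ord() and rord() need is involved.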
If you use custom Tokenizer or TokenFilter components in a chain specified in
schema.xml, they must support reusability. If your Tokenizer or TokenFilter
maintains state, it should implement reset(). If your TokenFilterFactory does
not return a subclass of TokenFilter, then it should implement reset() and call
reset() on its input TokenStream. TokenizerFactory implementations must
now return a Tokenizer rather than a TokenStream.
New users of Solr 1.4 will have omitTermFreqAndPositions enabled for non-text
indexed fields by default, which avoids indexing term frequency, positions, and
payloads, making the index smaller and faster. If you are upgrading from an
earlier Solr release and want to enable omitTermFreqAndPositions by default,
change the schema version from 1.1 to 1.2 in schema.xml. Remove any existing
index and restart Solr to ensure that omitTermFreqAndPositions completely takes
effect.
The default QParserPlugin used by the QueryComponent for parsing the "q" param
has been changed, to remove support for the deprecated use of ";" as a separator
between the query string and the sort options when no "sort" param was used.
Users who wish to continue using the semi-colon based method of specifying the
sort options should explicitly set the defType param to "lucenePlusSort" on all
requests. (The simplest way to do this is by specifying it as a default param
for your request handlers in solrconfig.xml, see the example solrconfig.xml for
sample syntax.)
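Clients that prefer to stop using the semi-colon syntax rather than set
defType=lucenePlusSort can split the legacy value themselves. The Python
sketch below is a minimal illustration under stated assumptions: the helper
name "split_legacy_query" is hypothetical, and it splits on the first ";"
only, so it does not handle semi-colons inside quoted phrases or escaped
semi-colons.

```python
from urllib.parse import urlencode

def split_legacy_query(q):
    """Turn a legacy 'query;sort' value into separate q and sort params."""
    query, separator, sort = q.partition(";")
    params = {"q": query.strip()}
    # Only add a sort param when something follows the separator.
    if separator and sort.strip():
        params["sort"] = sort.strip()
    return params

legacy = "inStock:true; price desc"
params = split_legacy_query(legacy)
query_string = urlencode(params)
```

A value without a semi-colon passes through unchanged as the q param, so the
helper can be applied to every outgoing request during a migration.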
If spellcheck.extendedResults=true, the response format for suggestions
has changed, see SOLR-1071.
Use of the "charset" option when configuring the following Analysis
Factories has been deprecated and will cause a warning to be logged.
In future versions of Solr attempting to use this option will cause an
error. See SOLR-1410 for more information.
* GreekLowerCaseFilterFactory
* RussianStemFilterFactory
* RussianLowerCaseFilterFactory
* RussianLetterTokenizerFactory
Versions of Major Components
----------------------------
Apache Lucene 2.9.1 (r832363 on 2.9 branch)
Apache Tika 0.4
Carrot2 3.1.0
Lucene Information
----------------
Since Solr is built on top of Lucene, many people add customizations to Solr
that are dependent on Lucene. Please see http://lucene.apache.org/java/2_9_0/,
especially http://lucene.apache.org/java/2_9_0/changes/Changes.html for more
information on the version of Lucene used in Solr.
Detailed Change List
----------------------
New Features
----------------------
1. SOLR-560: Use SLF4J logging API rather than JDK logging. The packaged .war file is
shipped with a JDK logging implementation, so logging configuration for the .war should
be identical to solr 1.3. However, if you are using the .jar file, you can select
which logging implementation to use by dropping a different binding.
See: http://www.slf4j.org/ (ryan)
2. SOLR-617: Allow configurable index deletion policy and provide a default implementation which
allows deletion of commit points on various criteria such as number of commits, age of commit
point and optimized status.
See http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/index/IndexDeletionPolicy.html
(yonik, Noble Paul, Akshay Ukey via shalin)
3. SOLR-658: Allow Solr to load index from arbitrary directory in dataDir
(Noble Paul, Akshay Ukey via shalin)
4. SOLR-793: Add 'commitWithin' argument to the update add command. This behaves
similar to the global autoCommit maxTime argument except that it is set for
each request. (ryan)
5. SOLR-670: Add support for rollbacks in UpdateHandler. This allows user to rollback all changes
since the last commit. (Noble Paul, koji via shalin)
6. SOLR-813: Adding DoubleMetaphone Filter and Factory. Similar to the PhoneticFilter,
but this uses DoubleMetaphone specific calls (including alternate encoding)
(Todd Feak via ryan)
7. SOLR-680: Add StatsComponent. This gets simple statistics on matched numeric fields,
including: min, max, mean, median, stddev. (koji, ryan)
7.1 SOLR-1380: Added support for multi-valued fields (Harish Agarwal via gsingers)
8. SOLR-561: Added Replication implemented in Java as a request handler. Supports index replication
as well as configuration replication and exposes detailed statistics and progress information
on the Admin page. Works on all platforms. (Noble Paul, yonik, Akshay Ukey, shalin)
9. SOLR-746: Added "omitHeader" request parameter to omit the header from the response.
(Noble Paul via shalin)
10. SOLR-651: Added TermVectorComponent for serving up term vector information, plus IDF.
See http://wiki.apache.org/solr/TermVectorComponent (gsingers, Vaijanath N. Rao, Noble Paul)
12. SOLR-795: SpellCheckComponent supports building indices on optimize if configured in solrconfig.xml
(Jason Rennie, shalin)
13. SOLR-667: A LRU cache implementation based upon ConcurrentHashMap and other techniques to reduce
contention and synchronization overhead, to utilize multiple CPU cores more effectively.
(Fuad Efendi, Noble Paul, yonik via shalin)
14. SOLR-465: Add configurable DirectoryProvider so that alternate Directory
implementations can be specified via solrconfig.xml. The default
DirectoryProvider will use NIOFSDirectory for better concurrency
on non Windows platforms. (Mark Miller, TJ Laurenzo via yonik)
15. SOLR-822: Add CharFilter so that characters can be filtered (e.g. character normalization)
before Tokenizer/TokenFilters. (koji)
16. SOLR-829: Allow slaves to request compressed files from master during replication
(Simon Collins, Noble Paul, Akshay Ukey via shalin)
17. SOLR-877: Added TermsComponent for accessing Lucene's TermEnum capabilities.
Useful for auto suggest and possibly distributed search. Not distributed search compliant. (gsingers)
- Added mincount and maxcount options (Khee Chin via gsingers)
18. SOLR-538: Add maxChars attribute for copyField function so that the length limit for destination
can be specified.
(Georgios Stamatis, Lars Kotthoff, Chris Harris via koji)
19. SOLR-284: Added support for extracting content from binary documents like MS Word and PDF using Apache Tika. See also contrib/extraction/CHANGES.txt (Eric Pugh, Chris Harris, yonik, gsingers)
20. SOLR-819: Added factories for Arabic support (gsingers)
21. SOLR-781: Distributed search ability to sort field.facet values
lexicographically. facet.sort values "true" and "false" are
also deprecated and replaced with "count" and "lex".
(Lars Kotthoff via yonik)
22. SOLR-821: Add support for replication to copy conf file to slave with a different name. This allows replication
of solrconfig.xml
(Noble Paul, Akshay Ukey via shalin)
23. SOLR-911: Add support for multi-select faceting by allowing filters to be
tagged and facet commands to exclude certain filters. This patch also
added the ability to change the output key for facets in the response, and
optimized distributed faceting refinement by lowering parsing overhead and
by making requests and responses smaller.
24. SOLR-876: WordDelimiterFilter now supports a splitOnNumerics
option, as well as a list of protected terms.
(Dan Rosher via hossman)
25. SOLR-928: SolrDocument and SolrInputDocument now implement the Map<String,?>
interface. This should make plugging into other standard tools easier. (ryan)
26. SOLR-847: Enhance the snappull command in ReplicationHandler to accept masterUrl.
(Noble Paul, Preetam Rao via shalin)
27. SOLR-540: Add support for globbing in field names to highlight.
For example, hl.fl=*_text will highlight all fieldnames ending with
_text. (Lars Kotthoff via yonik)
28. SOLR-906: Adding a StreamingUpdateSolrServer that writes update commands to
an open HTTP connection. If you are using solrj for bulk update requests
you should consider switching to this implementation. However, note that
the error handling is not immediate as it is with the standard SolrServer.
(ryan)
29. SOLR-865: Adding support for document updates in binary format and corresponding support in Solrj client.
(Noble Paul via shalin)
30. SOLR-763: Add support for Lucene's PositionFilter (Mck SembWever via shalin)
31. SOLR-966: Enhance the map() function query to take in an optional default value (Noble Paul, shalin)
32. SOLR-820: Support replication on startup of master with new index. (Noble Paul, Akshay Ukey via shalin)
33. SOLR-943: Make it possible to specify dataDir in solr.xml and accept the dataDir as a request parameter for
the CoreAdmin create command. (Noble Paul via shalin)
34. SOLR-850: Addition of timeouts for distributed searching. Configurable through 'shard-socket-timeout' and
'shard-connection-timeout' parameters in SearchHandler. (Patrick O'Leary via shalin)
35. SOLR-799: Add support for hash based exact/near duplicate document
handling. (Mark Miller, yonik)
36. SOLR-1026: Add protected words support to SnowballPorterFilterFactory (ehatcher)
37. SOLR-739: Add support for OmitTf (Mark Miller via yonik)
38. SOLR-1046: Nested query support for the function query parser
and lucene query parser (the latter existed as an undocumented
feature in 1.3) (yonik)
39. SOLR-940: Add support for Lucene's Trie Range Queries by providing new FieldTypes in
schema for int, float, long, double and date. Single-valued Trie based
fields with a precisionStep will index multiple precisions and enable
faster range queries. (Uwe Schindler, yonik, shalin)
40. SOLR-1038: Enhance CommonsHttpSolrServer to add docs in batch using an iterator API (Noble Paul via shalin)
41. SOLR-844: A SolrServer implementation to front-end multiple solr servers and provides load balancing and failover
support (Noble Paul, Mark Miller, hossman via shalin)
42. SOLR-939: ValueSourceRangeFilter/Query - filter based on values in a FieldCache entry or on any arbitrary function of field values. (yonik)
43. SOLR-1095: Fixed performance problem in the StopFilterFactory and simplified code. Added tests as well. (gsingers)
44. SOLR-1096: Introduced httpConnTimeout and httpReadTimeout in replication slave configuration to avoid stalled
replication. (Jeff Newburn, Noble Paul, shalin)
45. SOLR-1115: <bool>on</bool> and <bool>yes</bool> work as expected in solrconfig.xml. (koji)
46. SOLR-1099: A FieldAnalysisRequestHandler which provides the analysis functionality of the web admin page as
a service. The AnalysisRequestHandler is renamed to DocumentAnalysisRequestHandler which is enhanced with
query analysis and showMatch support. AnalysisRequestHandler is now deprecated. Support for both
FieldAnalysisRequestHandler and DocumentAnalysisRequestHandler is also provided in the Solrj client.
(Uri Boness, shalin)
47. SOLR-1106: Made CoreAdminHandler Actions pluggable so that additional actions may be plugged in or the existing
ones can be overridden if needed. (Kay Kay, Noble Paul, shalin)
48. SOLR-1124: Add a top() function query that causes its argument to
have its values derived from the top level IndexReader, even when
invoked from a sub-reader. top() is implicitly used for the
ord() and rord() functions. (yonik)
49. SOLR-1110: Support sorting on trie fields with Distributed Search. (Mark Miller, Uwe Schindler via shalin)
50. SOLR-1121: CoreAdminHandler should not need a core. This makes it possible to start a Solr server w/o a core. (noble)
51. SOLR-769: Added support for clustering in contrib/clustering. See http://wiki.apache.org/solr/ClusteringComponent for more info. (gsingers, Stanislaw Osinski)
52. SOLR-1175: Disable/enable replication on master side. Added two commands, 'enableReplication' and 'disableReplication'. (noble)
53. SOLR-1179: DocSets can now be used as Lucene Filters via
DocSet.getTopFilter() (yonik)
54. SOLR-1116: Add a Binary FieldType (noble)
55. SOLR-1051: Support the merge of multiple indexes as a CoreAdmin and an update command (Ning Li via shalin)
56. SOLR-1152: Snapshoot on ReplicationHandler should accept location as a request parameter (shalin)
57. SOLR-1204: Enhance SpellingQueryConverter to handle UTF-8 instead of ASCII only.
Use the NMTOKEN syntax for matching field names.
(Michael Ludwig, shalin)
58. SOLR-1189: Support providing username and password for basic HTTP authentication in Java replication
(Matthew Gregg, shalin)
59. SOLR-243: Add configurable IndexReaderFactory so that alternate IndexReader implementations
can be specified via solrconfig.xml. Note that using a custom IndexReader may be incompatible
with ReplicationHandler (see comments in SOLR-1366). This should be treated as an experimental feature.
(Andrzej Bialecki, hossman, Mark Miller, John Wang)
60. SOLR-1214: Differentiate between solr home and instanceDir. Deprecates the method SolrResourceLoader#locateInstanceDir(),
which is renamed to locateSolrHome. (noble)
61. SOLR-1216: Disambiguate the replication command names. 'snappull' becomes 'fetchindex' and 'abortsnappull' becomes 'abortfetch'. (noble)
62. SOLR-1145: Add capability to specify an infoStream log file for the underlying Lucene IndexWriter in solrconfig.xml.
This is an advanced debug log file that can be used to aid developers in fixing IndexWriter bugs. See the commented
out example in the example solrconfig.xml under the indexDefaults section.
(Chris Harris, Mark Miller)
63. SOLR-1256: Show the output of CharFilters in analysis.jsp. (koji)
64. SOLR-1266: Added stemEnglishPossessive option (default=true) to WordDelimiterFilter
that allows disabling of English possessive stemming (removal of trailing 's from tokens)
(Robert Muir via yonik)
65. SOLR-1237: firstSearcher and newSearcher can now be identified via the CommonParams.EVENT (evt) parameter
in a request. This allows a RequestHandler or SearchComponent to know when a newSearcher or firstSearcher
event happened. QuerySenderListener is the only implementation in Solr that implements this, but outside
implementations may wish to. See the AbstractSolrEventListener for a helper method. (gsingers)
66. SOLR-1343: Added HTMLStripCharFilter and marked HTMLStripReader, HTMLStripWhitespaceTokenizerFactory and
HTMLStripStandardTokenizerFactory deprecated. To strip HTML tags, HTMLStripCharFilter can be used
with an arbitrary Tokenizer. (koji)
67. SOLR-1275: Add expungeDeletes to DirectUpdateHandler2 (noble)
68. SOLR-1372: Enhance FieldAnalysisRequestHandler to accept field value from content stream (ehatcher)
69. SOLR-1370: Show the output of CharFilters in FieldAnalysisRequestHandler (koji)
70. SOLR-1373: Add Filter query to admin/form.jsp
(Jason Rutherglen via hossman)
71. SOLR-1368: Add ms() function query for getting milliseconds from dates and for
high precision date subtraction, add sub() for subtracting other arguments.
(yonik)
72. SOLR-1156: Sort TermsComponent results by frequency (Matt Weber via yonik)
73. SOLR-1335 : load core properties from a properties file (noble)
74. SOLR-1385 : Add an 'enable' attribute to all plugins (noble)
75. SOLR-1414 : implicit core properties are not set for single core (noble)
76. SOLR-659 : Adds shards.start and shards.rows to distributed search
to allow more efficient bulk queries (those that retrieve many or all
documents). (Brian Whitman via yonik)
77. SOLR-1321: Add better support for efficient wildcard handling (Andrzej Bialecki, Robert Muir, gsingers)
78. SOLR-1326 : New interface PluginInfoInitialized for all types of plugin (noble)
79. SOLR-1447 : Simple property injection. <mergePolicy> & <mergeScheduler> syntaxes are now deprecated
(Jason Rutherglen, noble)
80. SOLR-908 : CommonGramsFilterFactory/CommonGramsQueryFilterFactory for
speeding up phrase queries containing common words by indexing
n-grams and using them at query time.
(Tom Burton-West, Jason Rutherglen via yonik)
81. SOLR-1292: Add FieldCache introspection to stats.jsp and JMX Monitoring via
a new SolrFieldCacheMBean. (hossman)
82. SOLR-1167: Solr Config now supports XInclude for XML engines that can support it. (Bryan Talbot via gsingers)
83. SOLR-1478: Enable sort by Lucene docid. (ehatcher)
84. SOLR-1449: Add <lib> elements to solrconfig.xml to specify additional
classpath directories and regular expressions. (hossman via yonik)
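As a rough illustration of the SOLR-1216 rename listed above, the sketch below maps the legacy replication command names to their replacements and builds a request URL. The base URL, the "/replication" handler path, and the helper names are assumptions made for illustration only; the command names are the ones given in the entry.

```python
from urllib.parse import urlencode

# Legacy -> new replication command names, as listed under SOLR-1216.
RENAMED_COMMANDS = {
    "snappull": "fetchindex",
    "abortsnappull": "abortfetch",
}


def replication_url(base_url, command):
    """Build a replication request URL, translating legacy command names.

    base_url and the "/replication" path are hypothetical; adjust them to
    match the handler registered in your own solrconfig.xml.
    """
    command = RENAMED_COMMANDS.get(command, command)
    query = urlencode({"command": command})
    return base_url.rstrip("/") + "/replication?" + query


if __name__ == "__main__":
    print(replication_url("http://localhost:8983/solr", "snappull"))
```

Commands that were not renamed (for example 'enableReplication' from SOLR-1175) pass through unchanged.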
Optimizations
----------------------
1. SOLR-374: Use IndexReader.reopen to save resources by re-using parts of the
index that haven't changed. (Mark Miller via yonik)
2. SOLR-808: Write string keys in Maps as extern strings in the javabin format. (Noble Paul via shalin)
3. SOLR-475: New faceting method with better performance and smaller memory usage for
multi-valued fields with many unique values but relatively few values per document.
Controllable via the facet.method parameter - "fc" is the new default method and "enum"
is the original method. (yonik)
4. SOLR-970: Use an ArrayList in SolrPluginUtils.parseQueryStrings
since we know exactly how long the List will be in advance.
(Kay Kay via hossman)
5. SOLR-1002: Change SolrIndexSearcher to use insertWithOverflow
with reusable priority queue entries to reduce the amount of
generated garbage during searching. (Mark Miller via yonik)
6. SOLR-971: Replace StringBuffer with StringBuilder for instances that do not require thread-safety.
(Kay Kay via shalin)
7. SOLR-921: SolrResourceLoader must cache short class name vs fully qualified classname
(Noble Paul, hossman via shalin)
8. SOLR-973: CommonsHttpSolrServer writes the xml directly to the server.
(Noble Paul via shalin)
9. SOLR-1108: Remove un-needed synchronization in SolrCore constructor.
(Noble Paul via shalin)
10. SOLR-1166: Speed up docset/filter generation by avoiding top-level
score() call and iterating over leaf readers with TermDocs. (yonik)
11. SOLR-1169: SortedIntDocSet - a new small set implementation
that saves memory over HashDocSet, is faster to construct,
is ordered for easier implementation of skipTo, and is faster
in the general case. (yonik)
12. SOLR-1165: Use Lucene Filters and pass them down to the Lucene
search methods to filter earlier and improve performance. (yonik)
13. SOLR-1111: Use per-segment sorting to share fieldcache elements
across unchanged segments. This saves memory and reduces
commit times for incremental updates to the index. (yonik)
14. SOLR-1188: Minor efficiency improvement in TermVectorComponent related to ignoring positions or offsets (gsingers)
15. SOLR-1150: Load Documents for Highlighting one at a time rather than
all at once to avoid OOM with many large Documents. (Siddharth Gargate via Mark Miller)
16. SOLR-1353: Implement and use reusable token streams for analysis. (Robert Muir, yonik)
17. SOLR-1296: Enables setting IndexReader's termInfosIndexDivisor via a new attribute to StandardIndexReaderFactory. Enables
setting termIndexInterval to IndexWriter via SolrIndexConfig. (Jason Rutherglen, hossman, gsingers)
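The facet.method switch described under SOLR-475 is an ordinary request parameter. A minimal sketch of building such a request follows; the host, the "/select" path, the match-all query, and the field name "cat" are placeholders, and only facet.method with its values "fc" and "enum" comes from the entry above.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, method="fc"):
    """Return a select URL that facets on `field` with the given method.

    Only the two method values named in the SOLR-475 entry are accepted
    here; this check is a choice made for the sketch.
    """
    if method not in ("fc", "enum"):
        raise ValueError("facet.method must be 'fc' or 'enum'")
    params = [
        ("q", "*:*"),            # placeholder query
        ("rows", "0"),           # only facet counts are of interest
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", method),
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Request the original ("enum") method explicitly.
    print(facet_query_url("http://localhost:8983/solr", "cat", method="enum"))
```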
Bug Fixes
----------------------
1. SOLR-774: Fixed logging level display (Sean Timm via Otis Gospodnetic)
2. SOLR-771: CoreAdminHandler STATUS should display 'normalized' paths (koji, hossman, shalin)
3. SOLR-532: WordDelimiterFilter now respects payloads and other attributes of the original Token by
using Token.clone() (Tricia Williams, gsingers)
4. SOLR-805: DisMax queries are not being cached in QueryResultCache (Todd Feak via koji)
5. SOLR-751: WordDelimiterFilter didn't adjust the start offset of single
tokens that started with delimiters, leading to incorrect highlighting.
(Stefan Oestreicher via yonik)
7. SOLR-843: SynonymFilterFactory cannot handle multiple synonym files correctly (koji)
8. SOLR-840: BinaryResponseWriter does not handle incompatible data in fields (Noble Paul via shalin)
9. SOLR-803: CoreAdminRequest.createCore fails because name parameter isn't set (Sean Colombo via ryan)
10. SOLR-869: Fix file descriptor leak in SolrResourceLoader#getLines (Mark Miller, shalin)
11. SOLR-872: Better error message for incorrect copyField destination (Noble Paul via shalin)
12. SOLR-879: Enable position increments in the query parser and fix the
example schema to enable position increments for the stop filter in
both the index and query analyzers to fix the bug with phrase queries
with stopwords. (yonik)
13. SOLR-836: Add missing "a" to the example stopwords.txt (yonik)
14. SOLR-892: Fix serialization of booleans for PHPSerializedResponseWriter
(yonik)
15. SOLR-898: Fix null pointer exception for the JSON response writer
based formats when nl.json=arrarr with null keys. (yonik)
16. SOLR-901: FastOutputStream ignores write(byte[]) call. (Noble Paul via shalin)
17. SOLR-807: BinaryResponseWriter writes fieldType.toExternal if it is not a supported type,
otherwise it writes fieldType.toObject. This fixes the bug with encoding/decoding UUIDField.
(koji, Noble Paul, shalin)
18. SOLR-863: SolrCore.initIndex should close the directory it gets for clearing the lock and
use the DirectoryFactory. (Mark Miller via shalin)
19. SOLR-802: Fix a potential null pointer error in the distributed FacetComponent
(David Bowen via ryan)
20. SOLR-346: Use perl regex to improve accuracy of finding latest snapshot in snapinstaller (billa)
21. SOLR-830: Use perl regex to improve accuracy of finding latest snapshot in snappuller (billa)
22. SOLR-897: Fixed Argument list too long error when there are lots of snapshots/backups (Dan Rosher via billa)
23. SOLR-925: Fixed highlighting on fields with multiValued="true" and termOffsets="true" (koji)
24. SOLR-902: FastInputStream#read(byte b[], int off, int len) gives incorrect results when amount left to read is less
than buffer size (Noble Paul via shalin)
25. SOLR-978: Old files are not removed from slaves after replication (Jaco, Noble Paul, shalin)
26. SOLR-883: Implicit properties are not set for Cores created through CoreAdmin (Noble Paul via shalin)
27. SOLR-991: Better error message when parsing solrconfig.xml fails due to malformed XML. Error message notes the name
of the file being parsed. (Michael Henson via shalin)
28. SOLR-1008: Fix stats.jsp XML encoding for <stat> item entries with ampersands in their names. (ehatcher)
29. SOLR-976: deleteByQuery is ignored when deleteById is placed prior to deleteByQuery in a <delete>.
Now both delete by id and delete by query can be specified at the same time as follows. (koji)
<delete>
<id>05991</id><id>06000</id>
<query>office:Bridgewater</query><query>office:Osaka</query>
</delete>
30. SOLR-1016: HTTP 503 error changes 500 in SolrCore (koji)
31. SOLR-1015: Incomplete information in replication admin page and http command response when server
is both master and slave i.e. when server is a repeater (Akshay Ukey via shalin)
32. SOLR-1018: Slave is unable to replicate when server acts as repeater (as both master and slave)
(Akshay Ukey, Noble Paul via shalin)
33. SOLR-1031: Fix XSS vulnerability in schema.jsp (Paul Lovvik via ehatcher)
34. SOLR-1064: registry.jsp incorrectly displaying info for last core initialized
regardless of what the current core is. (hossman)
35. SOLR-1072: absolute paths used in sharedLib attribute were
incorrectly treated as relative paths. (hossman)
36. SOLR-1104: Fix some rounding errors in LukeRequestHandler's histogram (hossman)
37. SOLR-1125: Use query analyzer rather than index analyzer for queryFieldType in QueryElevationComponent
(koji)
38. SOLR-1126: Replicated files have incorrect timestamp (Jian Han Guo, Jeff Newburn, Noble Paul via shalin)
39. SOLR-1094: Incorrect value of correctlySpelled attribute in some cases (David Smiley, Mark Miller via shalin)
40. SOLR-965: Better error message when <pingQuery> is not configured.
(Mark Miller via hossman)
41. SOLR-1135: Java replication creates Snapshot in the directory where Solr was launched (Jianhan Guo via shalin)
42. SOLR-1138: Query Elevation Component now gracefully handles missing queries. (gsingers)
43. SOLR-929: LukeRequestHandler should return "dynamicBase" only if the field is dynamic.
(Peter Wolanin, koji)
44. SOLR-1141: NullPointerException during snapshoot command in java based replication (Jian Han Guo, shalin)
45. SOLR-1078: Fixes to WordDelimiterFilter to avoid splitting or dropping
international non-letter characters such as non spacing marks. (yonik)
46. SOLR-825, SOLR-1221: Enables highlighting for range/wildcard/fuzzy/prefix queries if using hl.usePhraseHighlighter=true
and hl.highlightMultiTerm=true. Also make both options default to true. (Mark Miller, yonik)
47. SOLR-1174: Fix Logging admin form submit url for multicore. (Jacob Singh via shalin)
48. SOLR-1182: Fix bug in OrdFieldSource#equals which could cause a bug with OrdFieldSource caching
on OrdFieldSource#hashcode collisions. (Mark Miller)
49. SOLR-1207: equals method should compare this and other of DocList in DocSetBase (koji)
50. SOLR-1242: Human readable JVM info from system handler does integer cutoff rounding, even when dealing
with GB. Fixed to round to one decimal place. (Jay Hill, Mark Miller)
51. SOLR-1243: Admin RequestHandlers should not be cached over HTTP. (Mark Miller)
52. SOLR-1260: Fix implementations of set operations for DocList subclasses
and fix a bug in HashDocSet construction when offset != 0. These bugs
never manifested in normal Solr use and only potentially affect
custom code. (yonik)
53. SOLR-1171: Fix LukeRequestHandler so it doesn't rely on SolrQueryParser
and report incorrect stats when field names contain characters
SolrQueryParser considers special.
(hossman)
54. SOLR-1317: Fix CapitalizationFilterFactory to work when keep parameter is not specified.
(ehatcher)
55. SOLR-1342: CapitalizationFilterFactory uses incorrect term length calculations.
(Robert Muir via Mark Miller)
56. SOLR-1359: DoubleMetaphoneFilter didn't index original tokens if there was no
alternative, and could incorrectly skip or reorder tokens. (yonik)
57. SOLR-1360: Prevent PhoneticFilter from producing duplicate tokens. (yonik)
58. SOLR-1371: LukeRequestHandler/schema.jsp errored if schema had no
uniqueKey field. The new test for this also (hopefully) adds some
future proofing against similar bugs in the future. As a side
effect QueryElevationComponentTest was refactored, and a bug in
that test was found. (hossman)
59. SOLR-914: General finalize() improvements. No finalizer delegates
to the respective close/destroy method w/o first checking if it's
already been closed/destroyed; if it hasn't, a SEVERE error is
logged first. (noble, hossman)
60. SOLR-1362: WordDelimiterFilter had inconsistent behavior when setting
the position increment of tokens following a token consisting of all
delimiters, and could additionally lose big position increments.
(Robert Muir, yonik)
61. SOLR-1091: Jetty's use of CESU-8 for code points outside the BMP
resulted in invalid output from the serialized PHP writer. (yonik)
62. SOLR-1103: LukeRequestHandler (and schema.jsp) have been fixed to
include the "1" (ie: 2**0) bucket in the term histogram data.
(hossman)
63. SOLR-1398: Add offset corrections in PatternTokenizerFactory.
(Anders Melchiorsen, koji)
64. SOLR-1400: Properly handle zero-length tokens in TrimFilter. This
was not a bug in any released version. (Peter Wolanin, gsingers)
65. SOLR-1071: spellcheck.extendedResults returns an invalid JSON response
when count > 1. To fix, the extendedResults format was changed.
(Uri Boness, yonik)
66. SOLR-1381: Fixed improper handling of fields that have only term positions and not term offsets during Highlighting (Thorsten Fischer, gsingers)
67. SOLR-1427: Fixed registry.jsp issue with MBeans (gsingers)
68. SOLR-1468: SolrJ's XML response parsing threw an exception for null
names, such as those produced when facet.missing=true (yonik)
69. SOLR-1471: Fixed issue with calculating missing values for facets in single valued cases in Stats Component.
This is not correctly calculated for the multivalued case. (James Miller, gsingers)
70. SOLR-1481: Fixed omitHeader parameter for PHP ResponseWriter. (Jun Ohtani via billa)
71. SOLR-1448: Add weblogic.xml to solr webapp to enable correct operation in
WebLogic. (Ilan Rabinovitch via yonik)
72. SOLR-1504: empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co.
(koji)
73. SOLR-1394: HTMLStripCharFilter split tokens that contained entities and
often calculated offsets incorrectly for entities.
(Anders Melchiorsen via yonik)
74. SOLR-1517: Admin pages could stall waiting for localhost name resolution
if reverse DNS wasn't configured; this was changed so the DNS resolution
is attempted only once the first time an admin page is loaded.
(hossman)
75. SOLR-1529: More than 8 deleteByQuery commands in a single request
caused an error to be returned, although the deletes were
still executed. (asmodean via yonik)
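The combined delete message shown under SOLR-976 above can be generated programmatically. The following is a sketch using only the Python standard library; the function name is hypothetical, and posting the message to an update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_delete(ids=(), queries=()):
    """Return a <delete> message combining delete-by-id and delete-by-query."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = str(doc_id)
    for query in queries:
        ET.SubElement(root, "query").text = query
    # encoding="unicode" yields a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_delete(["05991", "06000"],
                       ["office:Bridgewater", "office:Osaka"]))
```

Building the message through an XML library rather than string concatenation means characters such as & or < in a query are escaped properly.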
Other Changes
----------------------
1. Upgraded to Lucene 2.4.0 (yonik)
2. SOLR-805: Upgraded to Lucene 2.9-dev (r707499) (koji)
3. DumpRequestHandler (/debug/dump): changed 'fieldName' to 'sourceInfo'. (ehatcher)
4. SOLR-852: Refactored common code in CSVRequestHandler and XMLUpdateRequestHandler (gsingers, ehatcher)
5. SOLR-871: Removed dependency on stax-utils.jar. If you are using solr.jar and running
java 6, you can also remove woodstox and geronimo. (ryan)
6. SOLR-465: Upgraded to Lucene 2.9-dev (r719351) (shalin)
7. SOLR-889: Upgraded to commons-io-1.4.jar and commons-fileupload-1.2.1.jar (ryan)
8. SOLR-875: Upgraded to Lucene 2.9-dev (r723985) and consolidated the BitSet implementations (Michael Busch, gsingers)
9. SOLR-819: Upgraded to Lucene 2.9-dev (r724059) to get access to Arabic public constructors (gsingers)
10. SOLR-900: Moved solrj into /src/solrj. The contents of solr-common.jar is now included
in the solr-solrj.jar. (ryan)
11. SOLR-924: Code cleanup: make all existing finalize() methods call
super.finalize() in a finally block. All current instances extend
Object, so this doesn't fix any bugs, but helps protect against
future changes. (Kay Kay via hossman)
12. SOLR-885: NamedListCodec is renamed to JavaBinCodec and returns Object instead of NamedList.
(Noble Paul, yonik via shalin)
13. SOLR-84: Use new Solr logo in admin (Michiel via koji)
14. SOLR-981: groupId for Woodstox dependency in maven solrj changed to org.codehaus.woodstox (Tim Taranov via shalin)
15. Upgraded to Lucene 2.9-dev r738218 (yonik)
16. SOLR-959: Refactored TestReplicationHandler to remove hardcoded port numbers (hossman, Akshay Ukey via shalin)
17. Upgraded to Lucene 2.9-dev r742220 (yonik)
18. SOLR-1022: Better "ignored" field in example schema.xml (Peter Wolanin via hossman)
19. SOLR-967: New type-safe constructor for NamedList (Kay Kay via hossman)
20. SOLR-1036: Change default QParser from "lucenePlusSort" to "lucene" to
reduce confusion of semicolon splitting behavior when no sort param is
specified (hossman)
21. Upgraded to Lucene 2.9-dev r752164 (shalin)
22. SOLR-1068: Use fsync on replicated index and configuration files (yonik, Noble Paul, shalin)
23. SOLR-952: Cleanup duplicated code in deprecated HighlightingUtils (hossman)
24. Upgraded to Lucene 2.9-dev r764281 (shalin)
25. SOLR-1079: Rename omitTf to omitTermFreqAndPositions (shalin)
26. SOLR-804: Added Lucene's misc contrib JAR (rev 764281). (gsingers)
27. Upgraded to Lucene 2.9-dev r768228 (shalin)
28. Upgraded to Lucene 2.9-dev r768336 (shalin)
29. SOLR-997: Wait for a longer time for slave to complete replication in TestReplicationHandler
(Mark Miller via shalin)
30. SOLR-748: FacetComponent helper classes are made public as an experimental API.
(Wojtek Piaseczny via shalin)
31. Upgraded to Lucene 2.9-dev r773862 (Mark Miller)
32. Upgraded to Lucene 2.9-dev r776177 (shalin)
33. SOLR-1149: Made QParserPlugin and related classes extendible as an experimental API.
(Kaktu Chakarabati via shalin)
34. Upgraded to Lucene 2.9-dev r779312 (yonik)
35. SOLR-786: Refactor DisMaxQParser to allow overriding certain features of DisMaxQParser
(Wojciech Biela via shalin)
36. SOLR-458: Add equals and hashCode methods to NamedList (Stefan Rinner, shalin)
37. SOLR-1184: Add option in solrconfig to open a new IndexReader rather than
using reopen. Done mainly as a fail-safe in the case that a user runs into
a reopen bug/issue. (Mark Miller)
38. SOLR-1215: Use double quotes to enclose attributes in solr.xml (noble)
39. SOLR-1151: add dynamic copy field and maxChars example to example schema.xml.
(Peter Wolanin, Mark Miller)
40. SOLR-1233: remove /select?qt=/whatever restriction on /-prefixed request handlers.
(ehatcher)
41. SOLR-1257: logging.jsp has been removed and now passes through to the
hierarchical log level tool added in Solr 1.3. Users still
hitting "/admin/logging.jsp" should switch to "/admin/logging".
(hossman)
42. Upgraded to Lucene 2.9-dev r794238. Other changes include:
LUCENE-1614 - Use Lucene's DocIdSetIterator.NO_MORE_DOCS as the sentinel value.
LUCENE-1630 - Add acceptsDocsOutOfOrder method to Collector implementations.
LUCENE-1673, LUCENE-1701 - Trie has moved to Lucene core and renamed to NumericRangeQuery.
LUCENE-1662, LUCENE-1687 - Replace usage of ExtendedFieldCache by FieldCache.
(shalin)
42. SOLR-1241: Solr's CharFilter has been moved to Lucene. Remove CharFilter and related classes
from Solr and use Lucene's corresponding code (koji via shalin)
43. SOLR-1261: Lucene trunk renamed RangeQuery & Co to TermRangeQuery (Uwe Schindler via shalin)
44. Upgraded to Lucene 2.9-dev r801856 (Mark Miller)
45. SOLR-1276: Added StatsComponentTest (Rafał Kuć, gsingers)
46. SOLR-1377: The TokenizerFactory API has changed to explicitly return a Tokenizer
rather than a TokenStream (that may or may not be a Tokenizer). This change
is required to take advantage of the Token reuse improvements in Lucene 2.9. (ryan)
47. SOLR-1410: Log a warning if the deprecated charset option is used
on GreekLowerCaseFilterFactory, RussianStemFilterFactory,
RussianLowerCaseFilterFactory or RussianLetterTokenizerFactory.
(Robert Muir via hossman)
48. SOLR-1423: Due to LUCENE-1906, Solr's tokenizer should use Tokenizer.correctOffset() instead of CharStream.correctOffset().
(Uwe Schindler via koji)
49. SOLR-1319, SOLR-1345: Upgrade Solr Highlighter classes to new Lucene Highlighter API. This upgrade has
resulted in a back compat break in the DefaultSolrHighlighter class - getQueryScorer is no longer
protected. If you happened to be overriding that method in custom code, override getHighlighter instead.
Also, HighlightingUtils#getQueryScorer has been removed as it was deprecated and backcompat has been
broken with it anyway. (Mark Miller)
50. SOLR-1357: SolrInputDocument cannot process dynamic fields (Lars Grote via noble)
Build
----------------------
1. SOLR-776: Added in ability to sign artifacts via Ant for releases (gsingers)
2. SOLR-854: Added run-example target (Mark Miller via ehatcher)
3. SOLR-1054: Fix dist-src target for DataImportHandler (Ryuuichi Kumai via shalin)
4. SOLR-1219: Added proxy.setup target (koji)
5. SOLR-1386: In build.xml, use longfile="gnu" in tar task to avoid warnings about long file names
(Mark Miller via shalin)
6. SOLR-1441: Make it possible to run all tests in a package (shalin)
Documentation
----------------------
1. SOLR-789: The javadoc of RandomSortField is not readable (Nicolas Lalevée via koji)
2. SOLR-962: Note about null handling in ModifiableSolrParams.add javadoc
(Kay Kay via hossman)
3. SOLR-1409: Added Solr Powered By Logos
================== Release 1.3.0 ==================
Upgrading from Solr 1.2
-----------------------
IMPORTANT UPGRADE NOTE: In a master/slave configuration, all searchers/slaves
should be upgraded before the master! If the master were to be updated
first, the older searchers would not be able to read the new index format.
The Porter snowball based stemmers in Lucene were updated (LUCENE-1142),
and are not guaranteed to be backward compatible at the index level
(the stem of certain words may have changed). Re-indexing is recommended.
Older Apache Solr installations can be upgraded by replacing
the relevant war file with the new version. No changes to configuration
files should be needed.
This version of Solr contains a new version of Lucene implementing
an updated index format. This version of Solr/Lucene can still read
and update indexes in the older formats, and will convert them to the new
format on the first index change. Be sure to back up your index before
upgrading in case you need to downgrade.
Solr now recognizes HTTP Request headers related to HTTP Caching (see
RFC 2616 sec13) and will by default respond with "304 Not Modified"
when appropriate. This should only affect users who access Solr via
an HTTP Cache, or via a Web-browser that has an internal cache, but if
you wish to suppress this behavior an '<httpCaching never304="true"/>'
option can be added to your solrconfig.xml. See the wiki (or the
example solrconfig.xml) for more details...
http://wiki.apache.org/solr/SolrConfigXml#HTTPCaching
In Solr 1.2, DateField did not enforce the canonical representation of
the ISO 8601 format when parsing incoming data, and did not generate
the canonical format when generating dates from "Date Math" strings
(particularly as it pertains to milliseconds ending in trailing zeros)
-- As a result equivalent dates could not always be compared properly.
This problem is corrected in Solr 1.3, but DateField users that might
have been affected by indexing inconsistent formats of equivalent
dates (ie: 1995-12-31T23:59:59Z vs 1995-12-31T23:59:59.000Z) may want
to consider reindexing to correct these inconsistencies. Users who
depend on some of the "broken" behavior of DateField in Solr 1.2
(specifically: accepting any input that ends in a 'Z') should consider
using the LegacyDateField class as a possible alternative. Users that
desire 100% backwards compatibility should consider using the Solr 1.2
version of DateField.
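To make the DateField inconsistency concrete, here is a small sketch of normalizing date strings before comparing or reindexing them. It assumes that the canonical form drops trailing zeros in the fractional seconds (and the decimal point when nothing remains), as the example pair above suggests; the function name and regular expression are illustrative and are not Solr's implementation.

```python
import re

# Matches e.g. 1995-12-31T23:59:59Z or 1995-12-31T23:59:59.000Z
_DATE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$"
)


def canonical_date(value):
    """Strip trailing zeros from fractional seconds (assumed canonical form)."""
    match = _DATE_RE.match(value)
    if match is None:
        raise ValueError("not an ISO 8601 UTC timestamp: %r" % value)
    seconds, fraction = match.groups()
    fraction = (fraction or "").rstrip("0")
    return seconds + ("." + fraction if fraction else "") + "Z"


if __name__ == "__main__":
    # Both spellings of the same instant normalize to one string.
    print(canonical_date("1995-12-31T23:59:59.000Z"))
    print(canonical_date("1995-12-31T23:59:59Z"))
```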
Due to some changes in the lifecycle of TokenFilterFactories, users of
Solr 1.2 who have written Java code which constructs new instances of
StopFilterFactory, SynonymFilterFactory, or EnglishPorterFilterFactory
will need to modify their code by adding a line like the following
prior to using the factory object...
factory.inform(SolrCore.getSolrCore().getSolrConfig().getResourceLoader());
These lifecycle changes do not affect people who use Solr "out of the
box" or who have developed their own TokenFilterFactory plugins. More
info can be found in SOLR-594.
The python client that used to ship with Solr is no longer included in
the distribution (see client/python/README.txt).
Detailed Change List
--------------------
New Features
1. SOLR-69: Adding MoreLikeThisHandler to search for similar documents using
lucene contrib/queries MoreLikeThis. MoreLikeThis is also available from
the StandardRequestHandler using ?mlt=true. (bdelacretaz, ryan)
2. SOLR-253: Adding KeepWordFilter and KeepWordFilterFactory. A TokenFilter
that keeps tokens with text in the registered keeplist. This behaves like
the inverse of StopFilter. (ryan)
3. SOLR-257: WordDelimiterFilter has a new parameter splitOnCaseChange,
which can be set to 0 to disable splitting "PowerShot" => "Power" "Shot".
(klaas)
4. SOLR-193: Adding SolrDocument and SolrInputDocument to represent documents
outside of the lucene Document infrastructure. This class will be used
by clients and for processing documents. (ryan)
5. SOLR-244: Added ModifiableSolrParams - a SolrParams implementation that
helps you change values after initialization. (ryan)
6. SOLR-20: Added a java client interface with two implementations. One
implementation uses commons httpclient to connect to solr via HTTP. The
other connects to solr directly. Check client/java/solrj. This addition
also includes tests that start jetty and test a connection using the full
HTTP request cycle. (Darren Erik Vengroff, Will Johnson, ryan)
7. SOLR-133: Added StaxUpdateRequestHandler that uses StAX for XML parsing.
This implementation has much better error checking and lets you configure
a custom UpdateRequestProcessor that can selectively process update
requests depending on the request attributes. This class will likely
replace XmlUpdateRequestHandler. (Thorsten Scherler, ryan)
8. SOLR-264: Added RandomSortField, a utility field with a random sort order.
The seed is based on a hash of the field name, so a dynamic field
of this type is useful for generating different random sequences.
This field type should only be used for sorting or as a value source
in a FunctionQuery (ryan, hossman, yonik)
9. SOLR-266: Adding show=schema to LukeRequestHandler to show the parsed
schema fields and field types. (ryan)
10. SOLR-133: The UpdateRequestHandler now accepts multiple delete options
within a single request. For example, sending:
<delete><id>1</id><id>2</id></delete> will delete both 1 and 2. (ryan)
11. SOLR-269: Added UpdateRequestProcessor plugin framework. This provides
a reasonable place to process documents after they are parsed and
before they are committed to the index. This is a good place for custom
document manipulation or document based authorization. (yonik, ryan)
12. SOLR-260: Converting to a standard PluginLoader framework. This reworks
RequestHandlers, FieldTypes, and QueryResponseWriters to share the same
base code for loading and initializing plugins. This adds a new
configuration option to define the default RequestHandler and
QueryResponseWriter in XML using default="true". (ryan)
13. SOLR-225: Enable pluggable highlighting classes. Allow configurable
highlighting formatters and Fragmenters. (ryan)
14. SOLR-273/376/452/516: Added hl.maxAnalyzedChars highlighting parameter, defaulting
to 50k, hl.alternateField, which allows the specification of a backup
field to use as summary if no keywords are matched, and hl.mergeContiguous,
which combines fragments if they are adjacent in the source document.
(klaas, Grant Ingersoll, Koji Sekiguchi via klaas)
15. SOLR-291: Control maximum number of documents to cache for any entry
in the queryResultCache via queryResultMaxDocsCached solrconfig.xml
entry. (Koji Sekiguchi via yonik)
16. SOLR-240: New <lockType> configuration setting in <mainIndex> and
<indexDefaults> blocks supports all Lucene builtin LockFactories.
'single' is recommended setting, but 'simple' is default for total
backwards compatibility.
(Will Johnson via hossman)
17. SOLR-248: Added CapitalizationFilterFactory that creates tokens with
normalized capitalization. This filter is useful for facet display,
but will not work with a prefix query. (ryan)
SOLR-468: Change to the semantics to keep the original token, not the
token in the Map. Also switched to use Lucene's new reusable token
capabilities. (gsingers)
18. SOLR-307: Added NGramFilterFactory and EdgeNGramFilterFactory.
(Thomas Peuss via Otis Gospodnetic)
19. SOLR-305: analysis.jsp can be given a fieldtype instead of a field
name. (hossman)
20. SOLR-102: Added RegexFragmenter, which splits text for highlighting
based on a given pattern. (klaas)
21. SOLR-258: Date Faceting added to SimpleFacets. Facet counts
computed for ranges of size facet.date.gap (a DateMath expression)
between facet.date.start and facet.date.end. (hossman)
22. SOLR-196: A PHP serialized "phps" response writer that returns a
serialized array that can be used with the PHP function unserialize,
and a PHP response writer "php" that may be used by eval.
(Nick Jenkin, Paul Borgermans, Pieter Berkel via yonik)
23. SOLR-308: A new UUIDField class which accepts UUID string values,
as well as the special value of "NEW" which triggers generation of
a new random UUID.
(Thomas Peuss via hossman)
24. SOLR-349: New FunctionQuery functions: sum, product, div, pow, log,
sqrt, abs, scale, map. Constants may now be used as a value source.
(yonik)
25. SOLR-359: Add field type className to Luke response, and enabled access
to the detailed field information from the solrj client API.
(Grant Ingersoll via ehatcher)
26. SOLR-334: Pluggable query parsers. Allows specification of query
type and arguments as a prefix on a query string. (yonik)
27. SOLR-351: External Value Source. An external file may be used
to specify the values of a field, currently usable as
a ValueSource in a FunctionQuery. (yonik)
28. SOLR-395: Many new features for the spell checker implementation, including
an extended response mode with much richer output, multi-word spell checking,
and a bevy of new and renamed options (see the wiki).
(Mike Krimerman, Scott Taber via klaas).
29. SOLR-408: Added PingRequestHandler and deprecated SolrCore.getPingQueryRequest().
Ping requests should be configured using standard RequestHandler syntax in
solrconfig.xml rather than using the <pingQuery></pingQuery> syntax.
(Karsten Sperling via ryan)
30. SOLR-281: Added a 'Search Component' interface and converted StandardRequestHandler
and DisMaxRequestHandler to use this framework.
(Sharad Agarwal, Henri Biestro, yonik, ryan)
31. SOLR-176: Add detailed timing data to query response output. The SearchHandler
interface now returns how long each section takes. (klaas)
32. SOLR-414: Plugin initialization now supports SolrCore and ResourceLoader "Aware"
plugins. Plugins that implement SolrCoreAware or ResourceLoaderAware are
informed about the SolrCore/ResourceLoader. (Henri Biestro, ryan)
33. SOLR-350: Support multiple SolrCores running in the same Solr instance and allow
runtime management of any running SolrCore. If a solr.xml file exists
in solr.home, this file is used to instantiate multiple cores and enable runtime
core manipulation. For more information see: http://wiki.apache.org/solr/CoreAdmin
(Henri Biestro, ryan)
34. SOLR-447: Added a single request handler that will automatically register all
standard admin request handlers. This replaces the need to register (and maintain)
the set of admin request handlers. Assuming solrconfig.xml includes:
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
This will register: Luke/SystemInfo/PluginInfo/ThreadDump/PropertiesRequestHandler.
(ryan)
35. SOLR-142: Added RawResponseWriter and ShowFileRequestHandler. This returns config
files directly. If AdminHandlers are configured, this will be added automatically.
The jsp files /admin/get-file.jsp and /admin/raw-schema.jsp have been deprecated.
The deprecated <admin><gettableFiles> will be automatically registered with
a ShowFileRequestHandler instance for backwards compatibility. (ryan)
36. SOLR-446: TextResponseWriter can write SolrDocuments and SolrDocumentLists the
same way it writes Document and DocList. (yonik, ryan)
37. SOLR-418: Adding a query elevation component. This is an optional component to
elevate some documents to the top positions (or exclude them) for a given query.
(ryan)
38. SOLR-478: Added ability to get back unique key information from the LukeRequestHandler.
(gsingers)
39. SOLR-127: HTTP Caching awareness. Solr now recognizes HTTP Request
headers related to HTTP Caching (see RFC 2616 sec13) and will respond
with "304 Not Modified" when appropriate. New options have been added
to solrconfig.xml to influence this behavior.
(Thomas Peuss via hossman)
40. SOLR-303: Distributed Search over HTTP. Specification of shards
argument causes Solr to query those shards and merge the results
into a single response. Querying, field faceting (sorted only),
query faceting, highlighting, and debug information are supported
in distributed mode.
(Sharad Agarwal, Patrick O'Leary, Sabyasachi Dalal, Stu Hood,
Jayson Minard, Lars Kotthoff, ryan, yonik)
41. SOLR-356: Pluggable functions (value sources) that allow
registration of new functions via solrconfig.xml
(Doug Daniels via yonik)
42. SOLR-494: Added cool admin Ajaxed schema explorer.
(Greg Ludington via ehatcher)
43. SOLR-497: Added date faceting to the QueryResponse in SolrJ
and QueryResponseTest (Shalin Shekhar Mangar via gsingers)
44. SOLR-486: Binary response format, faster and smaller
than XML and JSON response formats (use wt=javabin).
BinaryResponseParser provides access to the binary format via SolrJ
and is now the default.
(Noble Paul, yonik)
45. SOLR-521: StopFilterFactory support for "enablePositionIncrements"
(Walter Ferrara via hossman)
46. SOLR-557: Added SolrCore.getSearchComponents() to return an unmodifiable Map. (gsingers)
47. SOLR-516: Added hl.maxAlternateFieldLength parameter, to set max length for hl.alternateField
(Koji Sekiguchi via klaas)
48. SOLR-319: Changed SynonymFilterFactory to "tokenize" synonyms file.
To use a tokenizer, specify "tokenizerFactory" attribute in <filter>.
For example:
<tokenizer class="solr.CJKTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" expand="true"
ignoreCase="true" tokenizerFactory="solr.CJKTokenizerFactory"/>
(koji)
49. SOLR-515: Added SimilarityFactory capability to schema.xml,
making config file parameters usable in the construction of
the global Lucene Similarity implementation.
(ehatcher)
50. SOLR-536: Add a DocumentObjectBinder to solrj that converts Objects to and
from SolrDocuments. (Noble Paul via ryan)
51. SOLR-595: Add support for Field level boosting in the MoreLikeThis Handler.
(Tom Morton, gsingers)
52. SOLR-572: Added SpellCheckComponent and org.apache.solr.spelling package to support more spell
checking functionality. Also includes ability to add your own SolrSpellChecker implementation that
plugs in. See http://wiki.apache.org/solr/SpellCheckComponent for more details
(Shalin Shekhar Mangar, Bojan Smid, gsingers)
53. SOLR-679: Added accessor methods to Lucene based spell checkers (gsingers)
54. SOLR-423: Added Request Handler close hook notification so that RequestHandlers can be notified
when a core is closing. (gsingers, ryan)
55. SOLR-603: Added ability to partially optimize. (gsingers)
56. SOLR-483: Add byte/short sorting support (gsingers)
57. SOLR-14: Add preserveOriginal flag to WordDelimiterFilter
(Geoffrey Young, Trey Hyde, Ankur Madnani, yonik)
58. SOLR-502: Add search timeout support. (Sean Timm via yonik)
59. SOLR-605: Add the ability to register callbacks programmatically (ryan, Noble Paul)
60. SOLR-610: hl.maxAnalyzedChars can be -1 to highlight everything (Lars Kotthoff via klaas)
61. SOLR-522: Make analysis.jsp show payloads. (Tricia Williams via yonik)
62. SOLR-611: Expose sort_values returned by QueryComponent in SolrJ's QueryResponse
(Dan Rosher via shalin)
63. SOLR-256: Support exposing Solr statistics through JMX (Sharad Agrawal, shalin)
64. SOLR-666: Expose warmup time in statistics for SolrIndexSearcher and LRUCache (shalin)
65. SOLR-663: Allow multiple files for stopwords, keepwords, protwords and synonyms
(Otis Gospodnetic, shalin)
66. SOLR-469: Added DataImportHandler as a contrib project which makes indexing data from Databases,
XML files and HTTP data sources into Solr quick and easy. Includes API and implementations for
supporting multiple data sources, processors and transformers for importing data. Supports full
data imports as well as incremental (delta) indexing. See http://wiki.apache.org/solr/DataImportHandler
for more details. (Noble Paul, shalin)
67. SOLR-622: SpellCheckComponent supports auto-loading indices on startup and optionally, (re)builds
indices on newSearcher event, if configured in solrconfig.xml (shalin)
68. SOLR-554: Hierarchical JDK log level selector for SOLR Admin replaces logging.jsp
(Sean Timm via shalin)
69. SOLR-506: Emitting HTTP Cache headers can be enabled or disabled through configuration on a
per-handler basis (shalin)
70. SOLR-716: Added support for properties in configuration files. Properties can be specified in
solr.xml and can be used in solrconfig.xml and schema.xml (Henri Biestro, hossman, ryan, shalin)
71. SOLR-1129: Support binding dynamic fields to beans in SolrJ (Avlesh Singh, noble)
72. SOLR-920: Cache and reuse IndexSchema. A new attribute called 'shareSchema' has been added to solr.xml (noble)
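As an illustration of the date faceting parameters from item 21 (SOLR-258), the
following Python sketch assembles a request query string. Only facet.date.start,
facet.date.end and facet.date.gap are named in the entry above; facet=true and
facet.date are the companion parameters that switch faceting on and select the
field. The field name "timestamp" and the DateMath values are assumptions made
up for this example.

```python
from urllib.parse import urlencode, parse_qs


def date_facet_params(field, start, end, gap):
    """Return request parameters for faceting `field` over date ranges."""
    return {
        "q": "*:*",
        "facet": "true",
        "facet.date": field,        # field to facet on (hypothetical name below)
        "facet.date.start": start,  # lower bound of the first range
        "facet.date.end": end,      # upper bound of the last range
        "facet.date.gap": gap,      # size of each range, a DateMath expression
    }


params = date_facet_params("timestamp", "NOW/DAY-7DAYS", "NOW/DAY", "+1DAY")
query_string = urlencode(params)   # percent-encodes '+', '/', ':' and '*'
decoded = parse_qs(query_string)   # round trip to check nothing was lost
print(query_string)
```

The query string would be appended to the URL of whichever request handler
serves searches in a given installation.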
Changes in runtime behavior
1. SOLR-559: use Lucene updateDocument, deleteDocuments methods. This
removes the maxBufferedDeletes parameter added by SOLR-310 as Lucene
now manages the deletes. This provides slightly better indexing
performance and makes overwrites atomic, eliminating the possibility of
a crash causing duplicates. (yonik)
2. SOLR-689 / SOLR-695: If you have used "MultiCore" functionality in an unreleased
version of 1.3-dev, many classes and configs have been renamed for the official
1.3 release. Specifically, solr.xml has replaced multicore.xml, and uses a slightly
different syntax. The solrj classes: MultiCore{Request/Response/Params} have been
renamed: CoreAdmin{Request/Response/Params} (hossman, ryan, Henri Biestro)
3. SOLR-647: reference count the SolrCore uses to prevent a premature
close while a core is still in use. (Henri Biestro, Noble Paul, yonik)
4. SOLR-737: SolrQueryParser now uses a ConstantScoreQuery for wildcard
queries, which prevents an exception from being thrown when the number
of matching terms exceeds the BooleanQuery clause limit. (yonik)
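The reference counting described in item 3 (SOLR-647) can be pictured with the
Python sketch below. It is not Solr's implementation: the class name
RefCountedCore and its methods are hypothetical, and the sketch only shows the
general idea that resources are released when the last holder closes.

```python
import threading


class RefCountedCore:
    """Hypothetical stand-in for a core that is released on the last close()."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refs = 1           # the creator holds the first reference
        self.released = False

    def open(self):
        """Take an additional reference; fails once the core is released."""
        with self._lock:
            if self._refs <= 0:
                raise RuntimeError("core already closed")
            self._refs += 1
        return self

    def close(self):
        """Drop one reference and release resources when none remain."""
        with self._lock:
            if self._refs <= 0:
                raise RuntimeError("close() called too many times")
            self._refs -= 1
            if self._refs == 0:
                self.released = True


core = RefCountedCore()
in_flight = core.open()          # e.g. a request that is still using the core
core.close()                     # the owner closes; the request keeps it alive
still_usable = not core.released
in_flight.close()                # last reference dropped, resources released
```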
Optimizations
1. SOLR-276: improve JSON writer speed. (yonik)
2. SOLR-310: bound and reduce memory usage by providing <maxBufferedDeletes> parameter,
which flushes deletes without forcing the user to use <commit/> for this purpose.
(klaas)
3. SOLR-348: short-circuit faceting if less than mincount docs match. (yonik)
4. SOLR-354: Optimize removing all documents. Now when a delete by query
of *:* is issued, the current index is removed. (yonik)
5. SOLR-377: Speed up response writers. (yonik)
6. SOLR-342: Added support into the SolrIndexWriter for using several new features of the new
LuceneIndexWriter, including: setRAMBufferSizeMB(), setMergePolicy(), setMergeScheduler().
Also, added support to specify Lucene's autoCommit functionality (not to be confused with Solr's
similarly named autoCommit functionality) via the <luceneAutoCommit> config. item. See the test
and example solrconfig.xml <indexDefaults> section for usage. Performance during indexing should
be significantly increased by moving up to 2.3 due to Lucene's new indexing capabilities.
Furthermore, the setRAMBufferSizeMB makes it more logical to decide on tuning factors related to
indexing. For best performance, leave the mergePolicy and mergeScheduler as the defaults and set
ramBufferSizeMB instead of maxBufferedDocs. The best value for this depends on the types of
documents in use. 32 should be a good starting point, but reports have shown up to 48 MB provides
good results. Note, it is acceptable to set both ramBufferSizeMB and maxBufferedDocs, and Lucene
will flush based on whichever limit is reached first. (gsingers)
7. SOLR-330: Converted TokenStreams to use Lucene's new char array based
capabilities. (gsingers)
8. SOLR-624: Only take snapshots if there are differences to the index (Richard Trey Hyde via gsingers)
9. SOLR-587: Delete by Query performance greatly improved by using
new underlying Lucene IndexWriter implementation. (yonik)
10. SOLR-730: Use read-only IndexReaders that don't synchronize
isDeleted(). This will speed up function queries and *:* queries
as well as improve their scalability on multi-CPU systems.
(Mark Miller via yonik)
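Item 6 (SOLR-342) notes that when both ramBufferSizeMB and maxBufferedDocs are
set, a flush happens on whichever limit is reached first. The Python sketch
below models that rule only. The function name and the use of ">=" as the
threshold test are assumptions for illustration, not Lucene's actual code.

```python
def should_flush(buffered_docs, ram_used_mb,
                 max_buffered_docs=None, ram_buffer_size_mb=None):
    """Return True when either configured limit has been reached.

    A limit left as None is treated as not configured.
    """
    docs_hit = max_buffered_docs is not None and buffered_docs >= max_buffered_docs
    ram_hit = ram_buffer_size_mb is not None and ram_used_mb >= ram_buffer_size_mb
    return docs_hit or ram_hit


# RAM limit reached first: many small buffers are not needed to trigger a flush.
ram_first = should_flush(500, 32.0, max_buffered_docs=1000, ram_buffer_size_mb=32)
# Document limit reached first: small documents fill the count before the RAM.
docs_first = should_flush(1000, 4.0, max_buffered_docs=1000, ram_buffer_size_mb=32)
# Neither limit reached yet.
neither = should_flush(10, 1.0, max_buffered_docs=1000, ram_buffer_size_mb=32)
```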
Bug Fixes
1. Make TextField respect sortMissingFirst and sortMissingLast fields.
(J.J. Larrea via yonik)
2. autoCommit/maxDocs was not working properly when large autoCommit/maxTime
was specified (klaas)
3. SOLR-283: autoCommit was not working after delete. (ryan)
4. SOLR-286: ContentStreamBase was not using default encoding for getBytes()
(Toru Matsuzawa via ryan)
5. SOLR-292: Fix MoreLikeThis facet counting. (Pieter Berkel via ryan)
6. SOLR-297: Fix bug in RequiredSolrParams where requiring a field
specific param would fail if a general default value had been supplied.
(hossman)
7. SOLR-331: Fix WordDelimiterFilter handling of offsets for synonyms or
other injected tokens that can break highlighting. (yonik)
8. SOLR-282: Snapshooter does not work on Solaris and OS X since the cp command
there does not have the -l option. Also updated commit/optimize related
scripts to handle both old and new response format. (bill)
9. SOLR-294: Logging of elapsed time broken on Solaris because the date command
there does not support the %s output format. (bill)
10. SOLR-136: Snappuller - "date -d" and locales don't mix. (Jürgen Hermann via bill)
11. SOLR-333: Changed distributiondump.jsp to use Solr HOME instead of CWD to set path.
12. SOLR-393: Removed duplicate contentType from raw-schema.jsp. (bill)
13. SOLR-413: Requesting a large number of documents to be returned (limit)
can result in an out-of-memory exception, even for a small index. (yonik)
14. The CSV loader incorrectly threw an exception when given
header=true (the default). (ryan, yonik)
15. SOLR-449: the python and ruby response writers are now able to correctly
output NaN and Infinity in their respective languages. (klaas)
16. SOLR-42: HTMLStripReader tokenizers now preserve correct source
offsets for highlighting. (Grant Ingersoll via yonik)
17. SOLR-481: Handle UnknownHostException in _info.jsp (gsingers)
18. SOLR-324: Add proper support for Long and Doubles in sorting, etc. (gsingers)
19. SOLR-496: Cache-Control max-age changed to Long so Expires
calculation won't cause overflow. (Thomas Peuss via hossman)
20. SOLR-535: Fixed typo (Tokenzied -> Tokenized) in schema.jsp (Thomas Peuss via billa)
21. SOLR-529: Better error messages from SolrQueryParser when field isn't
specified and there is no defaultSearchField in schema.xml
(Lars Kotthoff via hossman)
22. SOLR-530: Better error messages/warnings when parsing schema.xml:
field using bogus fieldtype and multiple copyFields to a non-multiValue
field. (Shalin Shekhar Mangar via hossman)
23. SOLR-528: Better error message when defaultSearchField is bogus or not
indexed. (Lars Kotthoff via hossman)
24. SOLR-533: Fixed tests so they don't use hardcoded port numbers.
(hossman)
25. SOLR-400: SolrExceptionTest should now handle using OpenDNS as a DNS provider (gsingers)
26. SOLR-541: Legacy XML update support (provided by SolrUpdateServlet
when no RequestHandler is mapped to "/update") now logs error correctly.
(hossman)
27. SOLR-267: Changed logging to report number of hits, and also provide a mechanism to add log
messages to be output by the SolrCore via a NamedList toLog member variable.
(Will Johnson, yseeley, gsingers)
SOLR-267: Removed adding values to the HTTP headers in SolrDispatchFilter (gsingers)
28. SOLR-509: Moved firstSearcher event notification to the end of the SolrCore constructor
(Koji Sekiguchi via gsingers)
29. SOLR-470, SOLR-552, SOLR-544, SOLR-701: Multiple fixes to DateField
regarding lenient parsing of optional milliseconds, and correct
formatting using the canonical representation. LegacyDateField has
been added for people who have come to depend on the existing
broken behavior. (hossman, Stefan Oestreicher)
30. SOLR-539: Fix for non-atomic long counters and a cast fix to avoid divide
by zero. (Sean Timm via Otis Gospodnetic)
31. SOLR-514: Added explicit media-type with UTF* charset to *.xsl files that
don't already have one. (hossman)
32. SOLR-505: Give RequestHandlers the possibility to suppress the generation
of HTTP caching headers. (Thomas Peuss via Otis Gospodnetic)
33. SOLR-553: Handle highlighting of phrase terms better when
hl.usePhraseHighlighter=true URL param is used.
(Bojan Smid via Otis Gospodnetic)
34. SOLR-590: Limitation in pgrep on Linux platform breaks script-utils fixUser.
(Hannes Schmidt via billa)
35. SOLR-597: SolrServlet no longer "caches" SolrCore. This was causing
problems in Resin, and could potentially cause problems for customized
usages of SolrServlet.
36. SOLR-585: Now sets the QParser on the ResponseBuilder (gsingers)
37. SOLR-604: If the spellchecking path is relative, make it relative to the Solr Data Directory.
(Shalin Shekhar Mangar via gsingers)
38. SOLR-584: Make stats.jsp and stats.xsl more robust.
(Yousef Ourabi and hossman)
39. SOLR-443: SolrJ: Declare UTF-8 charset on POSTed parameters
to avoid problems with servlet containers that default to latin-1
and allow switching of the exact POST mechanism for parameters
via useMultiPartPost in CommonsHttpSolrServer.
(Lars Kotthoff, Andrew Schurman, ryan, yonik)
40. SOLR-556: multi-valued fields always highlighted in disparate snippets
(Lars Kotthoff via klaas)
41. SOLR-501: Fix admin/analysis.jsp UTF-8 input for some other servlet
containers such as Tomcat. (Hiroaki Kawai, Lars Kotthoff via yonik)
42. SOLR-616: SpellChecker accuracy configuration is not applied for FileBasedSpellChecker.
Apply it for FileBasedSpellChecker and IndexBasedSpellChecker both.
(shalin)
43. SOLR-648: SpellCheckComponent throws NullPointerException on using spellcheck.q request
parameter after restarting Solr, if reload is called but build is not called.
(Jonathan Lee, shalin)
44. SOLR-598: DebugComponent now always occurs last in the SearchHandler list unless the
components are explicitly declared. (gsingers)
45. SOLR-676: DataImportHandler should use UpdateRequestProcessor API instead of directly
using UpdateHandler. (shalin)
46. SOLR-696: Fixed bug in NamedListCodec in regards to serializing Iterable objects. (gsingers)
47. SOLR-669: snappuller fix for FreeBSD/Darwin (Richard "Trey" Hyde via Otis Gospodnetic)
48. SOLR-606: Fixed spell check collation offset issue. (Stefan Oestreicher, Geoffrey Young, gsingers)
49. SOLR-589: Improved handling of badly formatted query strings (Sean Timm via Otis Gospodnetic)
50. SOLR-749: Allow QParser and ValueSourceParsers to be extended with same name (hossman, gsingers)
Other Changes
1. SOLR-135: Moved common classes to org.apache.solr.common and altered the
build scripts to make two jars: apache-solr-1.3.jar and
apache-solr-1.3-common.jar. This common.jar can be used in client code;
It does not have lucene or junit dependencies. The original classes
have been replaced with a @Deprecated extended class and are scheduled
to be removed in a later release. While this change does not affect API
compatibility, it is recommended to update references to these
deprecated classes. (ryan)
2. SOLR-268: Tweaks to post.jar so it prints the error message from Solr.
(Brian Whitman via hossman)
3. Upgraded to Lucene 2.2.0; June 18, 2007.
4. SOLR-215: Static access to SolrCore.getSolrCore() and SolrConfig.config
have been deprecated in order to support multiple loaded cores.
(Henri Biestro via ryan)
5. SOLR-367: The create method in all TokenFilter and Tokenizer Factories
provided by Solr now declare their specific return types instead of just
using "TokenStream" (hossman)
6. SOLR-396: Hooks added to build system for automatic generation of (stub)
Tokenizer and TokenFilter Factories.
Also: new Factories for all Tokenizers and TokenFilters provided by the
lucene-analyzers-2.2.0.jar -- includes support for German, Chinese,
Russian, Dutch, Greek, Brazilian, Thai, and French. (hossman)
7. Upgraded to commons-CSV r609327, which fixes escaping bugs and
introduces new escaping and whitespace handling options to
increase compatibility with different formats. (yonik)
8. Upgraded to Lucene 2.3.0; Jan 23, 2008.
9. SOLR-451: Changed analysis.jsp to use POST instead of GET, also made the input area a
bit bigger (gsingers)
10. Upgrade to Lucene 2.3.1
11. SOLR-531: Different exit code for rsyncd-start and snappuller if disabled (Thomas Peuss via billa)
12. SOLR-550: Clarified DocumentBuilder addField javadocs (gsingers)
13. Upgrade to Lucene 2.3.2
14. SOLR-518: Changed luke.xsl to use divs w/css for generating histograms
instead of SVG (Thomas Peuss via hossman)
15. SOLR-592: Added ShardParams interface and changed several string literals
to references to constants in CommonParams.
(Lars Kotthoff via Otis Gospodnetic)
16. SOLR-520: Deprecated unused LengthFilter since already core in
Lucene-Java (hossman)
17. SOLR-645: Refactored SimpleFacetsTest (Lars Kotthoff via hossman)
18. SOLR-591: Changed Solrj default value for facet.sort to true (Lars Kotthoff via Shalin)
19. Upgraded to Lucene 2.4-dev (r669476) to support SOLR-572 (gsingers)
20. SOLR-636: Improve/simplify example configs; and make index.jsp
links more resilient to configs loaded via an InputStream
(Lars Kotthoff, hossman)
21. SOLR-682: Scripts now support FreeBSD (Richard Trey Hyde via gsingers)
22. SOLR-489: Added in deprecation comments. (Sean Timm, Lars Kotthoff via gsingers)
23. SOLR-692: Migrated to stable released builds of StAX API 1.0.1 and StAX 1.2.0 (shalin)
24. Upgraded to Lucene 2.4-dev (r686801) (yonik)
25. Upgraded to Lucene 2.4-dev (r688745) 27-Aug-2008 (yonik)
26. Upgraded to Lucene 2.4-dev (r691741) 03-Sep-2008 (yonik)
27. Replaced the StAX reference implementation with the geronimo
StAX API jar, and the Woodstox StAX implementation. (yonik)
Build
1. SOLR-411. Changed the names of the Solr JARs to use the de facto standard JAR names based on
project-name-version.jar. This yields, for example:
apache-solr-common-1.3-dev.jar
apache-solr-solrj-1.3-dev.jar
apache-solr-1.3-dev.jar
2. SOLR-479: Added clover code coverage targets for committers and the nightly build. Requires
the Clover library, as licensed to Apache and only available privately. To run:
ant -Drun.clover=true clean clover test generate-clover-reports
3. SOLR-510: Nightly release includes client sources. (koji)
4. SOLR-563: Modified the build process to build contrib projects
(Shalin Shekhar Mangar via Otis Gospodnetic)
5. SOLR-673: Modify build file to create javadocs for core, solrj, contrib and "all inclusive" (shalin)
6. SOLR-672: Nightly release includes contrib sources. (Jeremy Hinegardner, shalin)
7. SOLR-586: Added ant target and POM files for building maven artifacts of the Solr core, common,
client and contrib. The target can publish artifacts with source and javadocs.
(Spencer Crissman, Craig McClanahan, shalin)
================== Release 1.2 ==================
Upgrading from Solr 1.1
-------------------------------------
IMPORTANT UPGRADE NOTE: In a master/slave configuration, all searchers/slaves
should be upgraded before the master! If the master were to be updated
first, the older searchers would not be able to read the new index format.
Older Apache Solr installations can be upgraded by replacing
the relevant war file with the new version. No changes to configuration
files should be needed.
This version of Solr contains a new version of Lucene implementing
an updated index format. This version of Solr/Lucene can still read
and update indexes in the older formats, and will convert them to the new
format on the first index change. One change in the new index format
is that all "norms" are kept in a single file, greatly reducing the number
of files per segment. Users of compound file indexes will want to consider
converting to the non-compound format for faster indexing and slightly better
search concurrency.
The JSON response format for facets has changed to make it easier for
clients to retain sorted order. Use json.nl=map explicitly in clients
to get the old behavior, or add it as a default to the request handler
in solrconfig.xml
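For clients that stay on the new default, the Python sketch below shows one way
to consume facet counts while keeping their order. It assumes the new layout is
a single list alternating name and count; the sample values are invented for
the example and the helper name flat_to_pairs is hypothetical.

```python
def flat_to_pairs(flat):
    """Turn [name1, count1, name2, count2, ...] into [(name1, count1), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of elements")
    # Even positions hold names, odd positions hold the matching counts.
    return list(zip(flat[0::2], flat[1::2]))


flat_counts = ["electronics", 14, "memory", 3, "hard drive", 2]  # invented data
pairs = flat_to_pairs(flat_counts)   # order of the response is preserved
as_map = dict(pairs)                 # equivalent to what json.nl=map would give
```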
The Lucene based Solr query syntax is slightly more strict.
A ':' in a field value must be escaped or the whole value must be quoted.
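The two options above can be sketched in Python as follows. The helper names
and the example field "url" are hypothetical, and the sketch deals only with
':' and with quoting; other special characters of the query syntax are out of
scope here.

```python
def escape_colons(value):
    """Backslash-escape every ':' in a field value."""
    return value.replace(":", "\\:")


def quote_value(value):
    """Wrap the whole value in double quotes, escaping embedded
    backslashes and double quotes so the quoting stays balanced."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


field, value = "url", "http://example.com"
escaped_query = f"{field}:{escape_colons(value)}"   # url:http\://example.com
quoted_query = f"{field}:{quote_value(value)}"      # url:"http://example.com"
```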
The Solr "Request Handler" framework has been updated in two key ways:
First, if a Request Handler is registered in solrconfig.xml with a name
starting with "/" then it can be accessed using path-based URL, instead of
using the legacy "/select?qt=name" URL structure. Second, the Request
Handler framework has been extended making it possible to write Request
Handlers that process streams of data for doing updates, and there is a
new-style Request Handler for XML updates given the name of "/update" in
the example solrconfig.xml. Existing installations without this "/update"
handler will continue to use the old update servlet and should see no
changes in behavior. For new-style update handlers, errors are now
reflected in the HTTP status code, Content-type checking is more strict,
and the response format has changed and is controllable via the wt
parameter.
Detailed Change List
--------------------
New Features
1. SOLR-82: Default field values can be specified in the schema.xml.
(Ryan McKinley via hossman)
2. SOLR-89: Two new TokenFilters with corresponding Factories...
* TrimFilter - Trims leading and trailing whitespace from Tokens
* PatternReplaceFilter - applies a Pattern to each token in the
stream, replacing match occurrences with a specified replacement.
(hossman)
3. SOLR-91: allow configuration of a limit of the number of searchers
that can be warming in the background. This can be used to avoid
out-of-memory errors, or contention caused by more and more searchers
warming in the background. An error is thrown if the limit specified
by maxWarmingSearchers in solrconfig.xml is exceeded. (yonik)
4. SOLR-106: New faceting parameters that allow specification of a
minimum count for returned facets (facet.mincount), paging through facets
(facet.offset, facet.limit), and explicit sorting (facet.sort).
facet.zeros is now deprecated. (yonik)
5. SOLR-80: Negative queries are now allowed everywhere. Negative queries
are generated and cached as their positive counterpart, speeding
generation and generally resulting in smaller sets to cache.
Set intersections in SolrIndexSearcher are more efficient,
starting with the smallest positive set, subtracting all negative
sets, then intersecting with all other positive sets. (yonik)
6. SOLR-117: Limit a field faceting to constraints with a prefix specified
by facet.prefix or f.<field>.facet.prefix. (yonik)
7. SOLR-107: JAVA API: Change NamedList to use Java5 generics
and implement Iterable<Map.Entry> (Ryan McKinley via yonik)
8. SOLR-104: Support for "Update Plugins" -- RequestHandlers that want
access to streams of data for doing updates. ContentStreams can come
from the raw POST body, multi-part form data, or remote URLs.
Included in this change is a new SolrDispatchFilter that allows
RequestHandlers registered with names that begin with a "/" to be
accessed using a URL structure based on that name.
(Ryan McKinley via hossman)
9. SOLR-126: DirectUpdateHandler2 supports autocommitting after a specified time
(in ms), using <autoCommit><maxTime>10000</maxTime></autoCommit>.
(Ryan McKinley via klaas).
10. SOLR-116: IndexInfoRequestHandler added. (Erik Hatcher)
11. SOLR-79: Add system property ${<sys.prop>[:<default>]} substitution for
configuration files loaded, including schema.xml and solrconfig.xml.
(Erik Hatcher with inspiration from Andrew Saar)
12. SOLR-149: Changes to make Solr more easily embeddable, in addition
to logging which request handler handled each request.
(Ryan McKinley via yonik)
13. SOLR-86: Added standalone Java-based command-line updater.
(Erik Hatcher via Bertrand Delecretaz)
14. SOLR-152: DisMaxRequestHandler now supports configurable alternate
behavior when q is not specified. A "q.alt" param can be specified
using SolrQueryParser syntax as a mechanism for specifying what query
the dismax handler should execute if the main user query (q) is blank.
(Ryan McKinley via hossman)
15. SOLR-158: new "qs" (Query Slop) param for DisMaxRequestHandler
allows for specifying the amount of default slop to use when parsing
explicit phrase queries from the user.
(Adam Hiatt via hossman)
16. SOLR-81: SpellCheckerRequestHandler that uses the SpellChecker from
the Lucene contrib.
(Otis Gospodnetic and Adam Hiatt)
17. SOLR-182: allow lazy loading of request handlers on first request.
(Ryan McKinley via yonik)
18. SOLR-81: More SpellCheckerRequestHandler enhancements, including
support for relative or absolute directory path configurations, as
well as RAM based directory. (hossman)
19. SOLR-197: New parameters for input: stream.contentType for specifying
or overriding the content type of input, and stream.file for reading
local files. (Ryan McKinley via yonik)
20. SOLR-66: CSV data format for document additions and updates. (yonik)
21. SOLR-184: add echoHandler=true to responseHeader, support echoParams=all
(Ryan McKinley via ehatcher)
22. SOLR-211: Added a regex PatternTokenizerFactory. This extracts tokens
from the input string using a regex Pattern. (Ryan McKinley)
23. SOLR-162: Added a "Luke" request handler and other admin helpers.
This exposes the system status through the standard requestHandler
framework. (ryan)
24. SOLR-212: Added a DirectSolrConnection class. This lets you access
solr using the standard request/response formats, but does not require
an HTTP connection. It is designed for embedded applications. (ryan)
25. SOLR-204: The request dispatcher (added in SOLR-104) can handle
calls to /select. This offers uniform error handling for /update and
/select. To enable this behavior, you must add:
<requestDispatcher handleSelect="true" > to your solrconfig.xml
See the example solrconfig.xml for details. (ryan)
26. SOLR-170: StandardRequestHandler now supports a "sort" parameter.
Using the ';' syntax is still supported, but it is recommended to
transition to the new syntax. (ryan)
27. SOLR-181: The index schema now supports "required" fields. Attempts
to add a document without a required field will fail, returning a
descriptive error message. By default, the uniqueKey field is
a required field. This can be disabled by setting required=false
in schema.xml. (Greg Ludington via ryan)
28. SOLR-217: Fields configured in the schema to be neither indexed nor
stored will now be quietly ignored by Solr when Documents are added.
The example schema has a comment explaining how this can be used to
ignore any "unknown" fields.
(Will Johnson via hossman)
29. SOLR-227: If schema.xml defines multiple fieldTypes, fields, or
dynamicFields with the same name, a severe error will be logged rather
than quietly continuing. Depending on the <abortOnConfigurationError>
settings, this may halt the server. Likewise, if solrconfig.xml
defines multiple RequestHandlers with the same name it will also add
an error. (ryan)
30. SOLR-226: Added support for dynamic field as the destination of a
copyField using glob (*) replacement. (ryan)
31. SOLR-224: Adding a PhoneticFilterFactory that uses apache commons codec
language encoders to build phonetically similar tokens. This currently
supports: DoubleMetaphone, Metaphone, Soundex, and RefinedSoundex (ryan)
32. SOLR-199: new n-gram tokenizers available via NGramTokenizerFactory
and EdgeNGramTokenizerFactory. (Adam Hiatt via yonik)
33. SOLR-234: TrimFilter can update the Token's startOffset and endOffset
if updateOffsets="true". By default the Token offsets are unchanged.
(ryan)
34. SOLR-208: new example_rss.xsl and example_atom.xsl to provide more
examples for people about the Solr XML response format and how they
can transform it to suit different needs.
(Brian Whitman via hossman)
35. SOLR-249: Deprecated SolrException( int, ... ) constructors in favor
of constructors that takes an ErrorCode enum. This will ensure that
all SolrExceptions use a valid HTTP status code. (ryan)
36. SOLR-386: Abstracted SolrHighlighter and moved existing implementation
to DefaultSolrHighlighter. Adjusted SolrCore and solrconfig.xml so
that highlighter is configurable via a class attribute. Allows users
to use their own highlighter implementation. (Tricia Williams via klaas)
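The ${<sys.prop>[:<default>]} substitution from item 11 (SOLR-79) can be
illustrated with the Python sketch below. It is not Solr's implementation: the
function name is hypothetical, the property name solr.data.dir is only an
example, and raising an error for a property that has neither a value nor a
default is an assumption of this sketch.

```python
import re

# ${name} or ${name:default}; the default may itself contain ':' characters.
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def substitute(text, props):
    """Replace each placeholder in `text` using `props`, else its default."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for property {name!r}")
    return _PLACEHOLDER.sub(repl, text)


line = "<dataDir>${solr.data.dir:./solr/data}</dataDir>"
with_default = substitute(line, {})
with_property = substitute(line, {"solr.data.dir": "/var/solr/data"})
```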
Changes in runtime behavior
1. Highlighting using DisMax will only pick up terms from the main
user query, not boost or filter queries (klaas).
2. SOLR-125: Change default of json.nl to flat, change so that
json.nl only affects items where order matters (facet constraint
listings). Fix JSON output bug for null values. Internal JAVA API:
change most uses of NamedList to SimpleOrderedMap. (yonik)
3. A new method "getSolrQueryParser" has been added to the IndexSchema
class for retrieving a new SolrQueryParser instance with all options
specified in the schema.xml's <solrQueryParser> block set. The
documentation for the SolrQueryParser constructor and its use of
IndexSchema have also been clarified.
(Erik Hatcher and hossman)
4. DisMaxRequestHandler's bq, bf, qf, and pf parameters can now accept
multiple values (klaas).
5. Queries are rewritten before highlighting is performed. This enables
proper highlighting of prefix and wildcard queries (klaas).
6. A meaningful exception is raised when attempting to add a doc missing
a unique id if it is declared in the schema and allowDups=false.
(ryan via klaas)
7. SOLR-183: Exceptions with error code 400 are raised when
numeric argument parsing fails. RequiredSolrParams class added
to facilitate checking for parameters that must be present.
(Ryan McKinley, J.J. Larrea via yonik)
8. SOLR-179: By default, solr will abort after any severe initialization
errors. This behavior can be disabled by setting:
<abortOnConfigurationError>false</abortOnConfigurationError>
in solrconfig.xml (ryan)
9. The example solrconfig.xml maps /update to XmlUpdateRequestHandler using
the new request dispatcher (SOLR-104). This requires posted content to
have a valid contentType: curl -H 'Content-type:text/xml; charset=utf-8'
The response format matches that of /select and returns standard error
codes. To enable solr1.1 style /update, do not map "/update" to any
handler in solrconfig.xml (ryan)
10. SOLR-231: If a charset is not specified in the contentType,
ContentStream.getReader() will use UTF-8 encoding. (ryan)
11. SOLR-230: More options for post.jar to support stdin, xml on the
commandline, and deferring commits. Tutorial modified to take
advantage of these options so there is no need for curl.
(hossman)
12. SOLR-128: Upgraded Jetty to the latest stable release 6.1.3 (ryan)
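
As an illustration of items 9 and 10 above, a client posting XML to /update
can declare the content type and charset explicitly. The following Python
sketch only builds the request and does not send it. The host, port, and
document body are assumptions made for the example, not values taken from
these notes.

```python
from urllib.request import Request

# Hypothetical Solr location; adjust to your own deployment.
UPDATE_URL = "http://localhost:8983/solr/update"

# Example <add> payload; the field name and value are illustrative only.
payload = '<add><doc><field name="id">doc1</field></doc></add>'.encode("utf-8")

# Declare the content type and charset explicitly, as item 9 requires.
request = Request(
    UPDATE_URL,
    data=payload,
    headers={"Content-Type": "text/xml; charset=utf-8"},
    method="POST",
)

# urllib normalizes header names to "Content-type" internally.
print(request.get_method(), request.full_url)
print(request.get_header("Content-type"))
```

Stating the charset in the header avoids depending on whatever encoding is
assumed when the header omits it.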
Optimizations
1. SOLR-114: HashDocSet specific implementations of union() and andNot()
for a 20x performance improvement for those set operations, and a new
hash algorithm speeds up exists() by 10% and intersectionSize() by 8%.
(yonik)
2. SOLR-115: Solr now uses BooleanQuery.clauses() instead of
BooleanQuery.getClauses() in any situation where there is no risk of
modifying the original query.
(hossman)
3. SOLR-221: Speed up sorted faceting on multivalued fields by ~60%
when the base set consists of a relatively large portion of the
index. (yonik)
4. SOLR-221: Added a facet.enum.cache.minDf parameter which avoids
using the filterCache for terms that match few documents, trading
decreased memory usage for increased query time. (yonik)
Bug Fixes
1. SOLR-87: Parsing of synonym files did not correctly handle escaped
whitespace such as \r\n\t\b\f. (yonik)
2. SOLR-92: DOMUtils.getText (used when parsing config files) did not
work properly with many DOM implementations when dealing with
"Attributes". (Ryan McKinley via hossman)
3. SOLR-9,SOLR-99: Tighten up sort specification error checking, throw
exceptions for missing sort specifications or a sort on a non-indexed
field. (Ryan McKinley via yonik)
4. SOLR-145: Fix for bug introduced in SOLR-104 where some Exceptions
were being ignored by all "out of the box" RequestHandlers. (hossman)
5. SOLR-166: JNDI solr.home code refactoring. SOLR-104 moved
some JNDI related code to the init method of a Servlet Filter -
according to the Servlet Spec, all Filters should be initialized
prior to initializing any Servlets, but this is not the case in at
least one Servlet Container (Resin). This "bug fix" refactors
this JNDI code so that it should be executed the first time any
attempt is made to use the solr.home dir.
(Ryan McKinley via hossman)
6. SOLR-173: Bug fix to SolrDispatchFilter to reduce the "too many open
files" problem: SolrDispatchFilter was not closing requests
when finished. Also modified ResponseWriters to only fetch a Searcher
reference if necessary for writing out DocLists.
(Ryan McKinley via hossman)
7. SOLR-168: Fix display positioning of multiple tokens at the same
position in analysis.jsp (yonik)
8. SOLR-167: The SynonymFilter sometimes generated incorrect offsets when
multi token synonyms were matched in the source text. (yonik)
9. SOLR-188: bin scripts do not support non-default webapp names. Added "-U"
option to specify a full path to the update url, overriding the
"-h" (hostname), "-p" (port) and "-w" (webapp name) parameters.
(Jeff Rodenburg via billa)
10. SOLR-198: RunExecutableListener always waited for the process to
finish, even when wait="false" was set. (Koji Sekiguchi via yonik)
11. SOLR-207: Changed distribution scripts to remove recursive find
and avoid use of "find -maxdepth" on platforms where it is not
supported. (yonik)
12. SOLR-222: Changing writeLockTimeout in solrconfig.xml did not
change the effective timeout. (Koji Sekiguchi via yonik)
13. Changed the SOLR-104 RequestDispatcher so that /select?qt=xxx can not
access handlers that start with "/". This makes path based authentication
possible for path based request handlers. (ryan)
14. SOLR-214: Some servlet containers (including Tomcat and Resin) do not
obey the specified charset. Rather than letting the container handle
it, solr now uses the charset from the header contentType to decode posted
content. Using the contentType: "text/xml; charset=utf-8" will force
utf-8 encoding. If you do not specify a contentType, it will use the
platform default. (Koji Sekiguchi via ryan)
15. SOLR-241: Undefined system properties used in configuration files now
cause a clear message to be logged rather than an obscure exception thrown.
(Koji Sekiguchi via ehatcher)
Other Changes
1. Updated to Lucene 2.1
2. Updated to Lucene 2007-05-20_00-04-53
================== Release 1.1.0 ==================
Status
------
This is the first release since Solr joined the Incubator, and brings many
new features and performance optimizations including highlighting,
faceted browsing, and JSON/Python/Ruby response formats.
Upgrading from previous Solr versions
-------------------------------------
Older Apache Solr installations can be upgraded by replacing
the relevant war file with the new version. No changes to configuration
files are needed and the index format has not changed.
The default version of the Solr XML response syntax has been changed to 2.2.
Behavior can be preserved for those clients not explicitly specifying a
version by adding a default to the request handler in solrconfig.xml.
By default, Solr will no longer use a searcher that has not fully warmed,
and requests will block in the meantime. To change back to the previous
behavior of using a cold searcher in the event there is no other
warm searcher, see the useColdSearcher config item in solrconfig.xml.
The XML response format when adding multiple documents to the collection
in a single <add> command has changed to return a single <result>.
Detailed Change List
--------------------
New Features
1. added support for setting Lucene's positionIncrementGap
2. Admin: new statistics for SolrIndexSearcher
3. Admin: caches now show config params on stats page
3. max() function added to FunctionQuery suite
4. postOptimize hook, mirroring the functionality of the postCommit hook,
but only called on an index optimize.
5. Ability to HTTP POST query requests to /select in addition to HTTP-GET
6. The default search field may now be overridden by requests to the
standard request handler using the df query parameter. (Erik Hatcher)
7. Added DisMaxRequestHandler and SolrPluginUtils. (Chris Hostetter)
8. Support for customizing the QueryResponseWriter per request
(Mike Baranczak / SOLR-16 / hossman)
9. Added KeywordTokenizerFactory (hossman)
10. copyField accepts dynamicfield-like names as the source.
(Darren Erik Vengroff via yonik, SOLR-21)
11. new DocSet.andNot(), DocSet.andNotSize() (yonik)
12. Ability to store term vectors for fields. (Mike Klaas via yonik, SOLR-23)
13. New abstract BufferedTokenStream for people who want to write
Tokenizers or TokenFilters that require arbitrary buffering of the
stream. (SOLR-11 / yonik, hossman)
14. New RemoveDuplicatesToken - useful in situations where
synonyms, stemming, or word-delimiter-ing produce identical tokens at
the same position. (SOLR-11 / yonik, hossman)
15. Added highlighting to SolrPluginUtils and implemented in StandardRequestHandler
and DisMaxRequestHandler (SOLR-24 / Mike Klaas via hossman,yonik)
16. SnowballPorterFilterFactory language is configurable via the "language"
attribute, with the default being "English". (Bertrand Delacretaz via yonik, SOLR-27)
17. ISOLatin1AccentFilterFactory, instantiates ISOLatin1AccentFilter to remove accents.
(Bertrand Delacretaz via yonik, SOLR-28)
18. JSON, Python, Ruby QueryResponseWriters: use wt="json", "python" or "ruby"
(yonik, SOLR-31)
19. Make web admin pages return UTF-8, change Content-type declaration to include a
space between the mime-type and charset (Philip Jacob, SOLR-35)
20. Made query parser default operator configurable via schema.xml:
<solrQueryParser defaultOperator="AND|OR"/>
The default operator remains "OR".
21. JAVA API: new version of SolrIndexSearcher.getDocListAndSet() which takes
flags (Greg Ludington via yonik, SOLR-39)
22. A HyphenatedWordsFilter, a text analysis filter used during indexing to rejoin
words that were hyphenated and split by a newline. (Boris Vitez via yonik, SOLR-41)
23. Added a CompressableField base class which allows fields of derived types to
be compressed using the compress=true setting. The field type also gains the
ability to specify a size threshold at which field data is compressed.
(klaas, SOLR-45)
24. Simple faceted search support for fields (enumerating terms)
and arbitrary queries added to both StandardRequestHandler and
DisMaxRequestHandler. (hossman, SOLR-44)
25. In addition to specifying default RequestHandler params in the
solrconfig.xml, support has been added for configuring values to be
appended to the multi-val request params, as well as for configuring
invariant params that cannot be overridden in the query. (hossman, SOLR-46)
26. Default operator for query parsing can now be specified with q.op=AND|OR
from the client request, overriding the schema value. (ehatcher)
27. New XSLTResponseWriter does server side XSLT processing of XML Response.
In the process, an init(NamedList) method was added to QueryResponseWriter
which works the same way as SolrRequestHandler.
(Bertrand Delacretaz / SOLR-49 / hossman)
28. json.wrf parameter adds a wrapper-function around the JSON response,
useful in AJAX with dynamic script tags for specifying a JavaScript
callback function. (Bertrand Delacretaz via yonik, SOLR-56)
29. autoCommit can be specified every so many documents added (klaas, SOLR-65)
30. ${solr.home}/lib directory can now be used for specifying "plugin" jars
(hossman, SOLR-68)
31. Support for "Date Math" relative "NOW" when specifying values of a
DateField in a query -- or when adding a document.
(hossman, SOLR-71)
32. useColdSearcher control in solrconfig.xml prevents the first searcher
from being used before it's done warming. This can help prevent
thrashing on startup when multiple requests hit a cold searcher.
The default is "false", preventing use before warm. (yonik, SOLR-77)
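
As a rough illustration of how several of the request parameters listed
above (df, wt, json.wrf, q.op) combine in one query, the following Python
sketch builds a /select URL. The host, port, field name, and callback name
are hypothetical and chosen only for the example.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust to your own deployment.
SELECT_URL = "http://localhost:8983/solr/select"

params = {
    "q": "solr search",
    "df": "title",                 # item 6: override the default search field (field name assumed)
    "q.op": "AND",                 # item 26: override the schema's default operator
    "wt": "json",                  # item 18: select the JSON response writer
    "json.wrf": "handleResults",   # item 28: wrapper function (callback name assumed)
}

# urlencode percent-encodes values and joins the pairs with "&".
url = SELECT_URL + "?" + urlencode(params)
print(url)
```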
Changes in runtime behavior
1. classes reorganized into different packages, package names changed to Apache
2. force read of document stored fields in QuerySenderListener
3. Solr now looks in ./solr/conf for config, ./solr/data for data
(configurable via the solr.solr.home system property)
4. Highlighter params changed to be prefixed with "hl."; allow fragmentsize
customization and per-field overrides on many options
(Andrew May via klaas, SOLR-37)
5. Default param values for DisMaxRequestHandler should now be specified
using a '<lst name="defaults">...</lst>' init param. For backwards
compatibility, all init params will be used as defaults if an init param
with that name does not exist. (hossman, SOLR-43)
6. The DisMaxRequestHandler now supports multiple occurrences of the "fq"
param. (hossman, SOLR-44)
7. FunctionQuery.explain now uses ComplexExplanation to provide more
accurate score explanations when composed in a BooleanQuery.
(hossman, SOLR-25)
8. Document update handling locking is much sparser, allowing performance gains
through multiple threads. Large commits also might be faster (klaas, SOLR-65)
9. Lazy field loading can be enabled via a solrconfig directive. This will be faster when
not all stored fields are needed from a document (klaas, SOLR-52)
10. Made admin JSPs return XML and transform them with new XSL stylesheets
(Otis Gospodnetic, SOLR-58)
11. If the "echoParams=explicit" request parameter is set, request parameters are copied
to the output. In an XML output, they appear in new <lst name="params"> list inside
the new <lst name="responseHeader"> element, which replaces the old <responseHeader>.
Adding a version=2.1 parameter to the request produces the old format, for backwards
compatibility (bdelacretaz and yonik, SOLR-59).
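
To make item 11 concrete, the following Python sketch reads echoed
parameters out of a response shaped as that item describes. The sample
document is hand-written for illustration: the nesting of the two <lst>
elements follows the description above, while the <str> child elements and
their values are assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; not captured from a real Solr server.
sample = """<response>
  <lst name="responseHeader">
    <lst name="params">
      <str name="q">solr</str>
      <str name="echoParams">explicit</str>
    </lst>
  </lst>
</response>"""

root = ET.fromstring(sample)

# Locate the "params" list nested inside the "responseHeader" list.
params_node = root.find("./lst[@name='responseHeader']/lst[@name='params']")

# Collect each echoed parameter as name -> value.
echoed = {child.get("name"): child.text for child in params_node}
print(echoed)
```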
Optimizations
1. getDocListAndSet can now generate both a DocList and a DocSet from a
single lucene query.
2. BitDocSet.intersectionSize(HashDocSet) no longer generates an intermediate
set
3. OpenBitSet completed, replaces BitSet as the implementation for BitDocSet.
Iteration is faster, and BitDocSet.intersectionSize(BitDocSet) and unionSize
are between 3 and 4 times faster. (yonik, SOLR-15)
4. much faster unionSize when one of the sets is a HashDocSet: O(smaller_set_size)
5. Optimized getDocSet() for term queries resulting in a 36% speedup of facet.field
queries where DocSets aren't cached (for example, if the number of terms in the field
is larger than the filter cache.) (yonik)
6. Optimized facet.field faceting by as much as 500 times when the field has
a single token per document (not multiValued & not tokenized) by using the
Lucene FieldCache entry for that field to tally term counts. The first request
utilizing the FieldCache will take longer than subsequent ones.
Bug Fixes
1. Fixed delete-by-id for field types whose indexed form is different
from the printable form (mainly sortable numeric types).
2. Added escaping of attribute values in the XML response (Erik Hatcher)
3. Added empty extractTerms() to FunctionQuery to enable use in
a MultiSearcher (Yonik)
4. WordDelimiterFilter sometimes lost token positionIncrement information
5. Fix reverse sorting for fields where sortMissingFirst=true
(Rob Staveley, yonik)
6. Worked around a Jetty bug that caused invalid XML responses for fields
containing non ASCII chars. (Bertrand Delacretaz via yonik, SOLR-32)
7. WordDelimiterFilter can throw exceptions if configured with both
generate and catenate off. (Mike Klaas via yonik, SOLR-34)
8. Escape '>' in XML output (because ]]> is illegal in CharData)
9. field boosts weren't being applied and doc boosts were being applied to fields (klaas)
10. Multiple-doc update generates well-formed xml (klaas, SOLR-65)
11. Better parsing of pingQuery from solrconfig.xml (hossman, SOLR-70)
12. Fixed bug with "Distribution" page introduced when Versions were
added to "Info" page (hossman)
13. Fixed HTML escaping issues with user input to analysis.jsp and action.jsp
(hossman, SOLR-74)
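
Items 2 and 8 above both concern escaping in XML output. The following
Python sketch shows the general technique with the standard library; it is
not Solr's Java implementation, and the sample strings are made up for the
example.

```python
from xml.sax.saxutils import escape, quoteattr

# Character data: '>' is escaped too, so "]]>" never appears literally.
text = "a ]]> b & c < d"
escaped_text = escape(text)
print(escaped_text)

# Attribute value: quoteattr escapes markup characters and adds the quotes.
attr = 'say "hi" & <bye>'
quoted_attr = quoteattr(attr)
print(quoted_attr)
```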
Other Changes
1. Upgrade to Lucene 2.0 nightly build 2006-06-22, lucene SVN revision 416224,
http://svn.apache.org/viewvc/lucene/java/trunk/CHANGES.txt?view=markup&pathrev=416224
2. Modified admin styles to improve display in Internet Explorer (Greg Ludington via billa, SOLR-6)
3. Upgrade to Lucene 2.0 nightly build 2006-07-15, lucene SVN revision 422302,
4. Included unique key field name/value (if available) in log message of add (billa, SOLR-18)
5. Updated to Lucene 2.0 nightly build 2006-09-07, SVN revision 462111
6. Added javascript to catch empty query in admin query forms (Tomislav Nakic-Alfirevic via billa, SOLR-48)
7. backslash escape * in ssh command used in snappuller for zsh compatibility, SOLR-63
8. check solr return code in admin scripts, SOLR-62
9. Updated to Lucene 2.0 nightly build 2006-11-15, SVN revision 475069
10. Removed src/apps containing the legacy "SolrTest" app (hossman, SOLR-3)
11. Simplified index.jsp and form.jsp, primarily by removing/hiding XML
specific params, and adding an option to pick the output type. (hossman)
12. Added new numeric build property "specversion" to allow clean
MANIFEST.MF files (hossman)
13. Added Solr/Lucene versions to "Info" page (hossman)
14. Explicitly set mime-type of .xsl files in web.xml to
application/xslt+xml (hossman)
15. Config parsing should now work using DOM Level 2 parsers -- Solr
previously relied on getTextContent which is a DOM Level 3 addition
(Alexander Saar via hossman, SOLR-78)
2006/01/17 Solr open sourced, moves to Apache Incubator