<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="Apache Forrest" name="Generator">
<meta name="Forrest-version" content="0.9">
<meta name="Forrest-skin-name" content="lucene">
<title>Introduction to The Solr Enterprise Search Server</title>
<link type="text/css" href="skin/basic.css" rel="stylesheet">
<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
<link type="text/css" href="skin/profile.css" rel="stylesheet">
<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
<link rel="shortcut icon" href="images/favicon.ico">
</head>
<body onload="init()">
<script type="text/javascript">ndeSetTextSize();</script>
<div id="top">
<!--+
|breadtrail
+-->
<div class="breadtrail">
<a href="http://www.apache.org/">apache</a> &gt; <a href="http://lucene.apache.org/">lucene</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
</div>
<!--+
|header
+-->
<div class="header">
<!--+
|start group logo
+-->
<div class="grouplogo">
<a href="http://lucene.apache.org/"><img class="logoImage" alt="Lucene" src="images/lucene_green_150.gif" title="Apache Lucene"></a>
</div>
<!--+
|end group logo
+-->
<!--+
|start Project Logo
+-->
<div class="projectlogo">
<a href="http://lucene.apache.org/solr/"><img class="logoImage" alt="Solr" src="images/solr.jpg" title="Solr Description"></a>
</div>
<!--+
|end Project Logo
+-->
<!--+
|start Search
+-->
<div class="searchbox">
<form id="searchform" action="http://search.lucidimagination.com/p:solr" method="get" class="roundtopsmall">
<input onFocus="getBlank (this, 'Search the site with Solr');" size="25" name="q" id="query" type="text" value="Search the site with Solr">&nbsp;
<input onclick="selectProvider(this.form)" name="Search" value="Search" type="submit">
@
<select id="searchProvider" name="searchProvider"><option value="any">select provider</option><option value="lucid">Lucid Find</option><option value="sl">Search-Lucene</option></select>
</form>
</div>
<!--+
|end search
+-->
<!--+
|start Tabs
+-->
<ul id="tabs">
<li class="current">
<a class="selected" href="index.html">Main</a>
</li>
<li>
<a class="unselected" href="http://wiki.apache.org/solr/FrontPage">Wiki</a>
</li>
</ul>
<!--+
|end Tabs
+-->
</div>
</div>
<div id="main">
<div id="publishedStrip">
<!--+
|start Subtabs
+-->
<div id="level2tabs"></div>
<!--+
|end Endtabs
+-->
<script type="text/javascript"><!--
document.write("Last Published: " + document.lastModified);
// --></script>
</div>
<!--+
|breadtrail
+-->
<div class="breadtrail">
&nbsp;
</div>
<!--+
|start Menu, mainarea
+-->
<!--+
|start Menu
+-->
<div id="menu">
<div onclick="SwitchMenu('menu_1.1', 'skin/')" id="menu_1.1Title" class="menutitle">About</div>
<div id="menu_1.1" class="menuitemgroup">
<div class="menuitem">
<a href="index.html" title="Welcome to Solr">Welcome</a>
</div>
<div class="menuitem">
<a href="http://lucene.apache.org/java/docs/whoweare.html" title="Lucene/Solr Committers">Who We Are</a>
</div>
</div>
<div onclick="SwitchMenu('menu_selected_1.2', 'skin/')" id="menu_selected_1.2Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
<div id="menu_selected_1.2" class="selectedmenuitemgroup" style="display: block;">
<div class="menupage">
<div class="menupagetitle">Features</div>
</div>
<div class="menuitem">
<a href="tutorial.html">Tutorial</a>
</div>
<div class="menuitem">
<a href="http://wiki.apache.org/solr/FrontPage">Docs (Wiki)</a>
</div>
<div class="menuitem">
<a href="http://wiki.apache.org/solr/FAQ">FAQ</a>
</div>
<div class="menuitem">
<a href="api/index.html">All Javadoc</a>
</div>
<div class="menuitem">
<a href="api/solrj/index.html">SolrJ Javadoc</a>
</div>
<div class="menuitem">
<a href="api/test-framework/index.html">Test Framework Javadoc</a>
</div>
</div>
<div onclick="SwitchMenu('menu_1.3', 'skin/')" id="menu_1.3Title" class="menutitle">Resources</div>
<div id="menu_1.3" class="menuitemgroup">
<div class="menuitem">
<a href="http://www.apache.org/dyn/closer.cgi/lucene/solr/">Download</a>
</div>
<div class="menuitem">
<a href="mailing_lists.html">Mailing Lists</a>
</div>
<div class="menuitem">
<a href="issue_tracking.html">Issue Tracking</a>
</div>
<div class="menuitem">
<a href="version_control.html">Version Control</a>
</div>
</div>
<div onclick="SwitchMenu('menu_1.4', 'skin/')" id="menu_1.4Title" class="menutitle">Related Projects</div>
<div id="menu_1.4" class="menuitemgroup">
<div class="menuitem">
<a href="http://lucene.apache.org/java/">Lucene Java</a>
</div>
<div class="menuitem">
<a href="http://lucene.apache.org/nutch/">Nutch</a>
</div>
<div class="menuitem">
<a href="http://incubator.apache.org/connectors/">ManifoldCF</a>
</div>
</div>
<script type="text/javascript">
function selectProvider(form) {
provider = form.elements['searchProvider'].value;
if (provider == "any") {
if (Math.random() > 0.5) {
provider = "lucid";
} else {
provider = "sl";
}
}
if (provider == "lucid") {
form.action = "http://search.lucidimagination.com/p:solr";
} else if (provider == "sl") {
form.action = "http://search-lucene.com/solr";
}
days = 365; // cookie will be valid for a year
date = new Date();
date.setTime(date.getTime() + (days * 24 * 60 * 60 * 1000));
expires = "; expires=" + date.toGMTString();
document.cookie = "searchProvider=" + provider + expires + "; path=/";
}
if (document.cookie.length>0) {
cStart=document.cookie.indexOf("searchProvider=");
if (cStart!=-1) {
cStart=cStart + "searchProvider=".length;
cEnd=document.cookie.indexOf(";", cStart);
if (cEnd==-1) {
cEnd=document.cookie.length;
}
provider = unescape(document.cookie.substring(cStart,cEnd));
document.forms['searchform'].elements['searchProvider'].value = provider;
}
}
</script>
<div id="credit"></div>
<div id="roundbottom">
<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
<!--+
|alternative credits
+-->
<div id="credit2">
<a href="http://www.packtpub.com/solr-1-4-enterprise-search-server?utm_source=http%3A%2F%2Flucene.apache.org%2Fsolr%2F&amp;utm_medium=spons&amp;utm_content=pod&amp;utm_campaign=mdb_000275"><img border="0" title="Book: Solr 1.4 Enterprise Search Server" alt="Book: Solr 1.4 Enterprise Search Server - logo" src="images/solr-book-image.jpg" style="width: 150px;height: 185px;"></a><a href="http://www.packtpub.com/solr-3-1-enterprise-search-server-cookbook/book"><img border="0" title="Book: Apache Solr 3.1 Cookbook" alt="Book: Apache Solr 3.1 Cookbook - logo" src="images/solr_31_cookbook.jpg" style="width: 150px;height: 180px;"></a><a href="http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr/Reference-Guide?sc=AP"><img border="0" title="LucidWorks for Solr Certified Distribution Reference Guide" alt="LucidWorks for Solr Certified Distribution Reference Guide - logo" src="images/lucidworks_reference_guide.png" style="width: 150px;height: 185px;"></a>
</div>
</div>
<!--+
|end Menu
+-->
<!--+
|start content
+-->
<div id="content">
<div title="Portable Document Format" class="pdflink">
<a class="dida" href="features.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
PDF</a>
</div>
<h1>Introduction to The Solr Enterprise Search Server</h1>
<div id="minitoc-area">
<ul class="minitoc">
<li>
<a href="#Solr+in+a+Nutshell">Solr in a Nutshell</a>
</li>
<li>
<a href="#Solr+Uses+the+Lucene+Search+Library+and+Extends+it%21">Solr Uses the Lucene Search Library and Extends it!</a>
</li>
<li>
<a href="#Detailed+Features">Detailed Features</a>
<ul class="minitoc">
<li>
<a href="#Schema">Schema</a>
</li>
<li>
<a href="#Query">Query</a>
</li>
<li>
<a href="#Core">Core</a>
</li>
<li>
<a href="#Caching">Caching</a>
</li>
<li>
<a href="#Replication">Replication</a>
</li>
<li>
<a href="#Admin+Interface">Admin Interface</a>
</li>
</ul>
</li>
</ul>
</div>
<a name="N1000E"></a><a name="Solr+in+a+Nutshell"></a>
<h2 class="boxed">Solr in a Nutshell</h2>
<div class="section">
<p>
Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON or binary over HTTP. You query it via HTTP GET and receive XML, JSON, or binary results.
</p>
<ul>
<li> Advanced Full-Text Search Capabilities </li>
<li> Optimized for High Volume Web Traffic </li>
<li> Standards Based Open Interfaces - XML, JSON and HTTP </li>
<li> Comprehensive HTML Administration Interfaces </li>
<li> Server statistics exposed over JMX for monitoring </li>
<li> Scalability - Efficient Replication to other Solr Search Servers </li>
<li> Flexible and Adaptable with XML configuration </li>
<li> Extensible Plugin Architecture </li>
</ul>
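<p>
As a minimal sketch of the XML update format mentioned above, a document could be sent to the server over HTTP in a message like the following. The field names and values are illustrative assumptions; which fields are accepted depends on the schema in use.
</p>
<pre class="code">
&lt;add&gt;
  &lt;doc&gt;
    &lt;!-- "id" and "name" are assumed to be fields defined in the schema --&gt;
    &lt;field name="id"&gt;SOLR1000&lt;/field&gt;
    &lt;field name="name"&gt;Solr, the Enterprise Search Server&lt;/field&gt;
  &lt;/doc&gt;
&lt;/add&gt;
</pre>
<p>
A separate <code>&lt;commit/&gt;</code> message makes newly added documents visible to searches.
</p>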
</div>
<a name="N10033"></a><a name="Solr+Uses+the+Lucene+Search+Library+and+Extends+it%21"></a>
<h2 class="boxed">Solr Uses the Lucene Search Library and Extends it!</h2>
<div class="section">
<ul>
<li> A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys </li>
<li> Powerful Extensions to the Lucene Query Language </li>
<li> Faceted Search and Filtering </li>
<li> Geospatial Search </li>
<li> Advanced, Configurable Text Analysis </li>
<li> Highly Configurable and User Extensible Caching </li>
<li> Performance Optimizations </li>
<li> External Configuration via XML </li>
<li> An Administration Interface </li>
<li> Monitorable Logging </li>
<li> Fast Incremental Updates and Index Replication </li>
<li> Highly Scalable Distributed search with sharded index across multiple hosts </li>
<li> JSON, XML, CSV/delimited-text, and binary update formats </li>
<li> Easy ways to pull in data from databases and XML files from local disk and HTTP sources </li>
<li> Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika </li>
<li> Apache UIMA integration for configurable metadata extraction </li>
<li> Multiple search indices </li>
</ul>
</div>
<a name="N10070"></a><a name="Detailed+Features"></a>
<h2 class="boxed">Detailed Features</h2>
<div class="section">
<a name="N10076"></a><a name="Schema"></a>
<h3 class="boxed">Schema</h3>
<ul>
<li>Defines the field types and fields of documents</li>
<li>Can drive more intelligent processing</li>
<li>Declarative Lucene Analyzer specification</li>
<li>Dynamic Fields enable on-the-fly addition of new fields</li>
<li>CopyField functionality allows indexing a single field multiple ways, or combining multiple fields into a single searchable field</li>
<li>Explicit types eliminate the need for guessing types of fields</li>
<li>External file-based configuration of stopword lists, synonym lists, and protected word lists</li>
<li>Many additional text analysis components including word splitting, regex and sounds-like filters</li>
</ul>
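<p>
The schema features listed above might look roughly like the following fragment of a <code>schema.xml</code> file. This is a sketch, not a complete schema: the field names, the <code>text</code> type definition, and the referenced <code>string</code> and <code>int</code> types are assumptions chosen for illustration.
</p>
<pre class="code">
&lt;!-- Declarative analyzer specification with external word lists --&gt;
&lt;fieldType name="text" class="solr.TextField" positionIncrementGap="100"&gt;
  &lt;analyzer&gt;
    &lt;tokenizer class="solr.WhitespaceTokenizerFactory"/&gt;
    &lt;filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/&gt;
    &lt;filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/&gt;
    &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;

&lt;!-- Explicitly typed fields; "string" and "int" are assumed to be defined elsewhere --&gt;
&lt;field name="id" type="string" indexed="true" stored="true" required="true"/&gt;
&lt;field name="title" type="text" indexed="true" stored="true"/&gt;
&lt;field name="text" type="text" indexed="true" stored="false" multiValued="true"/&gt;

&lt;!-- Dynamic field: any field name ending in _i is treated as an int --&gt;
&lt;dynamicField name="*_i" type="int" indexed="true" stored="true"/&gt;

&lt;!-- CopyField: combine fields into a single searchable field --&gt;
&lt;copyField source="title" dest="text"/&gt;

&lt;uniqueKey&gt;id&lt;/uniqueKey&gt;
</pre>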
<a name="N10097"></a><a name="Query"></a>
<h3 class="boxed">Query</h3>
<ul>
<li>HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby, PHP, Velocity, binary)</li>
<li>Sort by any number of fields, and by complex functions of numeric fields</li>
<li>Advanced DisMax query parser for high relevancy results from user-entered queries</li>
<li>Highlighted context snippets</li>
<li>Faceted Searching based on unique field values, explicit queries, date ranges, and numeric ranges</li>
<li>Multi-Select Faceting by tagging and selectively excluding filters</li>
<li>Spelling suggestions for user queries</li>
<li>More Like This suggestions for a given document</li>
<li>Function Query - influence the score by user-specified complex functions of
numeric fields or query relevancy scores</li>
<li>Range filter over Function Query results</li>
<li>Date Math - specify dates relative to "NOW" in queries and updates</li>
<li>Dynamic search results clustering using Carrot2</li>
<li>Numeric field statistics such as min, max, average, standard deviation </li>
<li>Combine queries derived from different syntaxes</li>
<li>Auto-suggest functionality for completing user queries</li>
<li>Allow configuration of top results for a query, overriding normal scoring and sorting</li>
<li>Performance Optimizations</li>
</ul>
<a name="N100D3"></a><a name="Core"></a>
<h3 class="boxed">Core</h3>
<ul>
<li>Dynamically create and delete document collections without restarting</li>
<li>Pluggable query handlers and extensible XML data format</li>
<li>Pluggable user functions for Function Query</li>
<li>Customizable component based request handler with distributed search support</li>
<li>Document uniqueness enforcement based on unique key field</li>
<li>Duplicate document detection, including fuzzy near duplicates</li>
<li>Custom index processing chains, allowing document manipulation before indexing</li>
<li>User configurable commands triggered on index changes</li>
<li>Ability to control where docs with the sort field missing will be placed</li>
<li>"Luke" request handler for corpus information</li>
</ul>
<a name="N100FA"></a><a name="Caching"></a>
<h3 class="boxed">Caching</h3>
<ul>
<li>Configurable Query Result, Filter, and Document cache instances</li>
<li>Pluggable Cache implementations, including a lock free, high concurrency implementation</li>
<li>Cache warming in background
<ul>
<li> When a new searcher is opened, configurable searches are run against
it to warm it up and avoid slow first hits. During warming, the current
searcher continues to handle live requests.
</li>
</ul>
</li>
<li>Autowarming in background
<ul>
<li>The most recently accessed items in the caches of the current
searcher are re-populated in the new searcher, enabling high cache hit
rates across index/searcher changes.</li>
</ul>
</li>
<li>Fast/small filter implementation</li>
<li>User level caching with autowarming support</li>
</ul>
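<p>
As a sketch of how the cache and warming features above are typically expressed, the following <code>solrconfig.xml</code> fragment configures the three built-in caches and a warming query. The sizes, autowarm counts, and the warming query itself are illustrative assumptions, not recommended values.
</p>
<pre class="code">
&lt;query&gt;
  &lt;!-- FastLRUCache is the high-concurrency implementation; autowarmCount controls
       how many recently used entries are re-populated in a new searcher --&gt;
  &lt;filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/&gt;
  &lt;queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/&gt;
  &lt;!-- The document cache is not autowarmed, since internal document ids change between searchers --&gt;
  &lt;documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/&gt;

  &lt;!-- Searches run against each new searcher before it starts serving requests --&gt;
  &lt;listener event="newSearcher" class="solr.QuerySenderListener"&gt;
    &lt;arr name="queries"&gt;
      &lt;lst&gt;
        &lt;str name="q"&gt;solr&lt;/str&gt;
        &lt;str name="sort"&gt;price asc&lt;/str&gt;
      &lt;/lst&gt;
    &lt;/arr&gt;
  &lt;/listener&gt;
&lt;/query&gt;
</pre>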
<a name="N1011F"></a><a name="Replication"></a>
<h3 class="boxed">Replication</h3>
<ul>
<li>Efficient distribution of index parts that have changed</li>
<li>Pull strategy allows for easy addition of searchers</li>
<li>Configurable distribution interval allows tradeoff between timeliness and cache utilization</li>
<li>Replication and automatic reloading of configuration files</li>
</ul>
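<p>
A minimal sketch of the pull-based replication described above, assuming the HTTP-based <code>ReplicationHandler</code> is used. The host name, poll interval, and list of configuration files are hypothetical placeholders; the two handler definitions belong in the <code>solrconfig.xml</code> of different servers.
</p>
<pre class="code">
&lt;!-- On the indexing server (master) --&gt;
&lt;requestHandler name="/replication" class="solr.ReplicationHandler"&gt;
  &lt;lst name="master"&gt;
    &lt;str name="replicateAfter"&gt;commit&lt;/str&gt;
    &lt;!-- Configuration files to distribute along with the index --&gt;
    &lt;str name="confFiles"&gt;schema.xml,stopwords.txt&lt;/str&gt;
  &lt;/lst&gt;
&lt;/requestHandler&gt;

&lt;!-- On each searcher (slave) --&gt;
&lt;requestHandler name="/replication" class="solr.ReplicationHandler"&gt;
  &lt;lst name="slave"&gt;
    &lt;!-- "master-host" is a placeholder host name --&gt;
    &lt;str name="masterUrl"&gt;http://master-host:8983/solr/replication&lt;/str&gt;
    &lt;!-- How often to poll for changes, in HH:mm:ss --&gt;
    &lt;str name="pollInterval"&gt;00:00:60&lt;/str&gt;
  &lt;/lst&gt;
&lt;/requestHandler&gt;
</pre>
<p>
A longer poll interval means searchers see updates later but keep their warmed caches for longer, which is the tradeoff noted in the list above.
</p>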
<a name="N10134"></a><a name="Admin+Interface"></a>
<h3 class="boxed">Admin Interface</h3>
<ul>
<li>Comprehensive statistics on cache utilization, updates, and queries</li>
<li>Interactive schema browser that includes index statistics</li>
<li>Replication monitoring</li>
<li>Full logging control</li>
<li>Text analysis debugger, showing result of every stage in an analyzer</li>
<li>Web Query Interface with debugging output
<ul>
<li>parsed query output</li>
<li>Lucene explain() document score detailing</li>
<li>score explanation for documents outside of the requested range, to debug why a given document wasn't ranked higher</li>
</ul>
</li>
</ul>
</div>
</div>
<!--+
|end content
+-->
<div class="clearboth">&nbsp;</div>
</div>
<div id="footer">
<!--+
|start bottomstrip
+-->
<div class="lastmodified">
<script type="text/javascript"><!--
document.write("Last Published: " + document.lastModified);
// --></script>
</div>
<div class="copyright">
Copyright &copy;
2007 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
</div>
<!--+
|end bottomstrip
+-->
</div>
</body>
</html>