Performance Benchmarks
The purpose of these user-submitted performance figures is to give
current and potential users of Lucene a sense of how well Lucene
scales. If the requirements for an upcoming project are similar to an
existing benchmark, you will also have something to work with when
designing the system architecture for the application.
If you've conducted performance tests with Lucene, we'd appreciate it
if you would submit these figures for display on this page. Post these
figures to the lucene-user mailing list using this template.
User-submitted Benchmarks
These benchmarks have been kindly submitted by Lucene users for
reference purposes.
We make NO guarantees regarding their accuracy or
validity.
We strongly recommend you conduct your own
performance benchmarks before deciding on a particular
hardware/software setup (and hopefully submit
these figures to us).
Hamish Carpenter's benchmarks
Hardware Environment
- Dedicated machine for indexing: yes
- CPU: Intel x86 P4 1.5 GHz
- RAM: 512 MB DDR
- Drive configuration: IDE 7200 rpm, RAID-1
Software environment
- Java Version: 1.3.1 IBM JITC Enabled
- Java VM:
- OS Version: Debian Linux 2.4.18-686
- Location of index: local
Lucene indexing variables
- Number of source documents: Random generator, set to make 1M
documents in 2 x 500,000 batches.
- Total filesize of source documents: > 1 GB if stored
- Average filesize of source documents: 1 KB
- Source documents storage location: Filesystem
- File type of source documents: Generated
- Parser(s) used, if any:
- Analyzer(s) used: Default
- Number of fields per document: 11
- Type of fields: 1 date, 1 id, 9 text
- Index persistence: FSDirectory
Figures
- Time taken (in ms/s as an average of at least 3
indexing runs):
- Time taken / 1000 docs indexed: 49 seconds
- Memory consumption:
Notes
- Notes:
A Windows client ran a random document generator, which created
documents based on some arrays of values and an excerpt (approx. 1 KB)
from a text file of the Bible (King James Version).
These were submitted via a socket connection (open throughout the
indexing process).
The index writer was not closed between index calls.
This created a 400 MB index in 23 files (after optimization).
Query details:
Set up a threaded class to start x number of simultaneous threads to
search the index created above.
Query: +Domain:sos +(+((Name:goo*^2.0 Name:plan*^2.0) (Teaser:goo* Teaser:plan*) (Details:goo* Details:plan*)) -Cancel:y) +DisplayStartDate:[mkwsw2jk0-mq3dj1uq0] +EndDate:[mq3dj1uq0-ntlxuggw0]
This query matched 34000 documents, and I limited the returned
documents to 5.
This uses Peter Halacsy's IndexSearcherCache, slightly modified to be
a singleton that returns cached searchers for a given directory. This
solved an initial problem with too many open files and running out of
Linux file handles for them.
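The cached-searcher approach described above can be illustrated with a small sketch. This is not Peter Halacsy's IndexSearcherCache; the class name SearcherCache, the opener function and the generic type S (standing in for a searcher type such as Lucene's IndexSearcher) are assumptions made for illustration. It also uses java.util.concurrent, which is newer than the Java 1.3.1 used in this benchmark.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical per-directory cache: hands out one shared searcher per index
 * path so that concurrent queries do not each reopen the index files.
 * S stands in for a searcher type such as Lucene's IndexSearcher.
 */
final class SearcherCache<S> {
    private final ConcurrentMap<String, S> searchers = new ConcurrentHashMap<>();
    private final Function<String, S> opener;

    /** opener is whatever call opens a searcher for an index directory (assumed). */
    SearcherCache(Function<String, S> opener) {
        this.opener = opener;
    }

    /** Returns the cached searcher for the directory, opening it on first use. */
    S get(String indexDirectory) {
        return searchers.computeIfAbsent(indexDirectory, opener);
    }

    /** Number of searchers currently held open. */
    int size() {
        return searchers.size();
    }
}
```

With one application-wide instance of such a cache, every search thread asking for the same directory would share a single searcher, which keeps the number of open index files independent of the number of threads.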
Threads | Avg time per query (ms)
1       | 1009
2       | 2043
3       | 3087
4       | 4045
...     | ...
10      | 10091
I removed the two date range terms from the query and it made a HUGE
difference in performance. With 4 threads the average time dropped to
900 ms!
Other query optimizations made little difference.
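The threaded search test described above could be reproduced with a harness along the following lines. This is a sketch, not the class Hamish used: ConcurrentQueryTimer and averageMillis are hypothetical names, each thread runs the query only once, and the query itself is passed in as a Runnable so that any search call can be plugged in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical harness: runs one query per thread and averages the latency. */
final class ConcurrentQueryTimer {

    /**
     * Starts the given number of threads, releases them together, lets each
     * run the query once, and returns the mean time per query in milliseconds.
     */
    static double averageMillis(int threads, Runnable query) {
        CountDownLatch start = new CountDownLatch(1);
        AtomicLong totalNanos = new AtomicLong();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await();                  // wait until all threads exist
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long begin = System.nanoTime();
                query.run();                        // the search under test
                totalNanos.addAndGet(System.nanoTime() - begin);
            });
            workers.add(t);
            t.start();
        }

        start.countDown();                          // release all threads together
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for queries", e);
            }
        }
        return totalNanos.get() / 1_000_000.0 / threads;
    }
}
```

Calling averageMillis with 1, 2, 3, ... threads and the same query would produce a table of the same shape as the one above.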
Hamish can be contacted at hamish at catalyst.net.nz.
Justin Greene's benchmarks
Hardware Environment
- Dedicated machine for indexing: No, but nominal usage at time of
indexing.
- CPU: Compaq ProLiant 1850R/600, 2 x PIII 600
- RAM: 1 GB, 256 MB allocated to JVM.
- Drive configuration: RAID 5 on Fibre Channel array
Software environment
- Java Version: 1.3.1_06
- Java VM:
- OS Version: Windows NT 4 SP6
- Location of index: local
Lucene indexing variables
- Number of source documents: about 60K
- Total filesize of source documents: 6.5GB
- Average filesize of source documents: 100K (6.5GB/60K documents)
- Source documents storage location: filesystem on NTFS
- File type of source documents:
- Parser(s) used, if any: Currently the only parser used is the
Quiotix HTML parser.
- Analyzer(s) used: SimpleAnalyzer
- Number of fields per document: 8
- Type of fields: All strings, and all are stored
and indexed.
- Index persistence: FSDirectory
Figures
- Time taken (in ms/s as an average of at least 3 indexing runs):
1 hour 12 minutes, 1 hour 14 minutes and 1 hour 17 minutes. Note that
the number and size of documents change daily.
- Time taken / 1000 docs indexed:
- Memory consumption: JVM is given 256MB and uses it
all.
Notes
- Notes:
We have 10 threads reading files from the filesystem, parsing and
analyzing them, and then pushing them onto a queue, and a single
thread popping them from the queue and indexing. Note that we are
indexing email messages and are storing the entire plaintext of the
message in the index. If the message contains an attachment and we do
not have a filter for the attachment (i.e. we do not do PDFs yet), we
discard the data.
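The arrangement described above, with several parsing threads feeding a single indexing thread through a queue, might look roughly like this. It is a sketch under assumptions, not Justin's code: IndexingPipeline, the parser and indexer callbacks, and the queue bound of 100 are all hypothetical, and it relies on java.util.concurrent, which is newer than the Java 1.3.1 used in this benchmark.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical pipeline: many parser threads feed one indexing thread. */
final class IndexingPipeline {

    /**
     * Parses every source on a pool of parser threads and hands each parsed
     * document to the indexer on the calling thread only.
     */
    static <S, D> void run(List<S> sources, int parserThreads,
                           Function<S, D> parser, Consumer<D> indexer) {
        // Bounded so parsers cannot run far ahead of the indexer (size is an assumption).
        BlockingQueue<D> queue = new LinkedBlockingQueue<>(100);
        ExecutorService parsers = Executors.newFixedThreadPool(parserThreads);

        for (S source : sources) {
            parsers.submit(() -> {
                try {
                    queue.put(parser.apply(source));   // blocks while the queue is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        parsers.shutdown();                            // no new tasks; queued ones still run

        try {
            while (true) {
                D doc = queue.poll(100, TimeUnit.MILLISECONDS);
                if (doc != null) {
                    indexer.accept(doc);               // single consumer does all indexing
                } else if (parsers.isTerminated() && queue.isEmpty()) {
                    break;                             // all parsers done and queue drained
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            parsers.shutdownNow();
        }
    }
}
```

Keeping the indexing step on one thread means the indexer callback needs no synchronization of its own, while the parser threads overlap file reading and parsing with indexing.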
Justin can be contacted at tvxh-lw4x at spamex.com.