CLucene README
==============

------------------------------------------------------
CLucene is a C++ port of Lucene.
It is a high-performance, full-featured text search
engine written in C++. In the limited benchmarks under
Performance below, CLucene ran faster than Java Lucene.
------------------------------------------------------

CLucene has contributions from many people; see AUTHORS.

CLucene is distributed under the GNU Lesser General Public License (LGPL)
*or*
the Apache License, Version 2.0.
See LGPL.license and APACHE.license for the respective license texts.
Read COPYING for more about the licensing.

Installation
------------
* For Linux, Mac OS X, Cygwin and MinGW build information, read INSTALL.
* Boost.Jam files are provided in the root directory and subdirectories.
* Microsoft Visual Studio (6 & 7) project files are provided in the win32 folder.

Mailing List
------------
Questions and discussion should be directed to the CLucene mailing list
at clucene-developers@lists.sourceforge.net.
Find subscription instructions at
http://lists.sourceforge.net/lists/listinfo/clucene-developers
Suggestions and bug reports can be submitted to our bug tracker
(http://sourceforge.net/tracker/?group_id=80013&atid=558446).

The latest version
------------------
Details of the latest version can be found on the CLucene SourceForge project
web site: http://www.sourceforge.net/projects/clucene

Documentation
-------------
Documentation is provided at http://clucene.sourceforge.net/doc/doxygen/html/
You can also build your own documentation by running doxygen from the root directory
of the CLucene source tree.
CLucene is a very close port of Java Lucene, so the Java docs at
http://lucene.apache.org/java/ are also a useful reference.

Performance
-----------
Very little benchmarking has been done on CLucene. Andi Vajda posted some
limited statistics on the CLucene list a while ago, with the following results.

There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
6108kb of HTML text.
org.apache.lucene.demo.IndexFiles with java and gcj:
on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
 . running with java 1.4.1_01-99 : 20379 ms
 . running with gcj 3.3.2 -O2    : 17842 ms
 . running clucene 0.8.9's demo  : 9930 ms

I recently ran some more tests and came up with these rough results:
663mb (797 files) of Gutenberg texts
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
Jlucene: 646453ms. peak mem usage ~72mb, avg ~14mb ram
Clucene: 232141ms. peak mem usage ~60mb, avg ~4mb ram

Searching the index using 10,000 single word queries
Jlucene: ~60078ms and used ~13mb ram
Clucene: ~48359ms and used ~4.2mb ram

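To put the timings above side by side, the run times can be divided to get rough speedup factors. The sketch below only redoes that arithmetic with the figures quoted in this section; the function names are illustrative and are not part of CLucene.

```cpp
#include <cstdio>

// Ratio of a baseline run time to the CLucene run time, both in milliseconds.
// A value above 1.0 means CLucene finished sooner in that test.
double speedup(double baseline_ms, double clucene_ms) {
    return baseline_ms / clucene_ms;
}

// Print the ratios for the figures quoted in the Performance section.
void print_speedups() {
    std::printf("IndexFiles, java vs clucene : %.2fx\n", speedup(20379, 9930));
    std::printf("IndexFiles, gcj vs clucene  : %.2fx\n", speedup(17842, 9930));
    std::printf("Gutenberg indexing          : %.2fx\n", speedup(646453, 232141));
    std::printf("10,000 single word queries  : %.2fx\n", speedup(60078, 48359));
}
```

Calling print_speedups() from a main() gives roughly 2.1x and 1.8x for the IndexFiles runs, 2.8x for the Gutenberg indexing run and 1.2x for the search run. These are single runs on different machines, so the ratios are indicative only.
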
Platform notes
--------------

'Too many open files'
Some platforms don't provide enough file handles to run CLucene properly.
To solve this, increase the open file limit:

On Solaris:
    ulimit -n 1024           (in the shell, for the current session)
    set rlim_fd_cur=1024     (system-wide, in /etc/system)

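On POSIX systems a program can also raise its own soft limit at startup, before opening an index. The following is a minimal sketch, assuming getrlimit and setrlimit with RLIMIT_NOFILE are available on the platform; raise_open_file_limit is a hypothetical helper and is not part of CLucene.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_NOFILE (POSIX)
#include <cstdio>          // std::perror

// Hypothetical helper, not part of CLucene: raise the soft limit on open
// file descriptors to at least `wanted`, capped at the hard limit.
// Returns the soft limit in effect afterwards, or 0 on error.
rlim_t raise_open_file_limit(rlim_t wanted) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        std::perror("getrlimit");
        return 0;
    }
    if (lim.rlim_cur >= wanted) {
        return lim.rlim_cur;  // already high enough, nothing to change
    }
    // An unprivileged process may raise its soft limit only up to the hard limit.
    rlim_t target = wanted;
    if (lim.rlim_max != RLIM_INFINITY && target > lim.rlim_max) {
        target = lim.rlim_max;
    }
    lim.rlim_cur = target;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) {
        std::perror("setrlimit");
        return 0;
    }
    return lim.rlim_cur;
}
```

A caller might invoke raise_open_file_limit(1024) once at startup and print a warning if the returned value is lower than requested. If the hard limit itself is too low, it still has to be raised by the administrator, for example with the Solaris settings above.
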
Acknowledgments
---------------

The Apache Lucene project is the basis for this software, so the biggest
acknowledgment goes to that project.

We wish to acknowledge the following copyrighted works that
make up portions of the CLucene software:

CLucene relies heavily on the use of autoconf and libtool to provide
a build environment.