| 1 | CLucene README | 
|---|
| 2 | ============== | 
|---|
| 3 |  | 
|---|
| 4 | ------------------------------------------------------ | 
|---|
| 5 | CLucene is a C++ port of Lucene. | 
|---|
| 6 | It is a high-performance, full-featured text search | 
|---|
| 7 | engine written in C++. In the limited benchmarks listed under | 
|---|
| 8 | Performance, CLucene runs faster than Java Lucene. | 
|---|
| 9 | ------------------------------------------------------ | 
|---|
| 10 |  | 
|---|
| 11 | CLucene has contributions from many people; see AUTHORS. | 
|---|
| 12 |  | 
|---|
| 13 | CLucene is distributed under the GNU Lesser General Public License (LGPL) | 
|---|
| 14 | *or* | 
|---|
| 15 | the Apache License, Version 2.0 | 
|---|
| 16 | See the LGPL.license and APACHE.license for the respective license information. | 
|---|
| 17 | Read COPYING for more about the license. | 
|---|
| 18 |  | 
|---|
| 19 | Installation | 
|---|
| 20 | ------------ | 
|---|
| 21 | * For Linux, Mac OS X, Cygwin and MinGW build information, read INSTALL. | 
|---|
| 22 | * Boost.Jam files are provided in the root directory and subdirectories. | 
|---|
| 23 | * Microsoft Visual Studio (6 and 7) project files are provided in the win32 folder. | 
|---|
| 24 |  | 
|---|
| 25 | Mailing List | 
|---|
| 26 | ------------ | 
|---|
| 27 | Questions and discussion should be directed to the CLucene mailing list | 
|---|
| 28 | at clucene-developers@lists.sourceforge.net | 
|---|
| 29 | Find subscription instructions at | 
|---|
| 30 | http://lists.sourceforge.net/lists/listinfo/clucene-developers | 
|---|
| 31 | Suggestions and bug reports can be made on our bug tracking database | 
|---|
| 32 | (http://sourceforge.net/tracker/?group_id=80013&atid=558446) | 
|---|
| 33 |  | 
|---|
| 34 | The latest version | 
|---|
| 35 | ------------------ | 
|---|
| 36 | Details of the latest version can be found on the CLucene SourceForge project | 
|---|
| 37 | web site: http://www.sourceforge.net/projects/clucene | 
|---|
| 38 |  | 
|---|
| 39 | Documentation | 
|---|
| 40 | ------------- | 
|---|
| 41 | Documentation is provided at http://clucene.sourceforge.net/doc/doxygen/html/ | 
|---|
| 42 | You can also build your own documentation by running doxygen from the root directory | 
|---|
| 43 | of clucene. | 
|---|
| 44 | CLucene is a very close port of Java Lucene, so you can also consult the | 
|---|
| 45 | Javadocs at http://lucene.apache.org/java/ | 
|---|
| 46 |  | 
|---|
| 47 |  | 
|---|
| 48 | Performance | 
|---|
| 49 | ----------- | 
|---|
| 50 | Very little benchmarking has been done on CLucene. Andi Vajda posted some | 
|---|
| 51 | limited statistics on the CLucene list a while ago, with the following results. | 
|---|
| 52 |  | 
|---|
| 53 | There are 250 HTML files under $JAVA_HOME/docs/api/java/util, about | 
|---|
| 54 | 6108 KB of HTML text. | 
|---|
| 55 | org.apache.lucene.demo.IndexFiles with java and gcj, and the CLucene demo, | 
|---|
| 56 | on Mac OS X 10.3.1 (Panther), PowerBook G4 1 GHz, 1 GB RAM: | 
|---|
| 57 | . running with java 1.4.1_01-99 : 20379 ms | 
|---|
| 58 | . running with gcj 3.3.2 -O2    : 17842 ms | 
|---|
| 59 | . running clucene 0.8.9's demo  :  9930 ms | 
|---|
| 60 |  | 
|---|
| 61 | I recently did some more tests and came up with these rough results: | 
|---|
| 62 | 663 MB (797 files) of Gutenberg texts | 
|---|
| 63 | on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields | 
|---|
| 64 |  Jlucene: 646453 ms. peak mem usage ~72 MB, avg ~14 MB RAM | 
|---|
| 65 |  Clucene: 232141 ms. peak mem usage ~60 MB, avg ~4 MB RAM | 
|---|
| 66 |  | 
|---|
| 67 | Searching the index using 10,000 single-word queries | 
|---|
| 68 |  Jlucene: ~60078 ms and used ~13 MB RAM | 
|---|
| 69 |  Clucene: ~48359 ms and used ~4.2 MB RAM | 
|---|
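
To put the timings above side by side, the small sketch below computes the
ratio of each baseline time to the matching CLucene time. It is only
arithmetic on the figures quoted in this section, not a benchmark harness.
The function names are hypothetical, and it assumes the CLucene indexing
figure of 232141 is in milliseconds like the Jlucene figure next to it.

```cpp
#include <cstdio>

// Hypothetical helper: how many times faster CLucene was than the baseline,
// i.e. baseline time divided by CLucene time (both in milliseconds).
double speedup(double baseline_ms, double clucene_ms) {
    return baseline_ms / clucene_ms;
}

// Prints the ratios for the timings quoted in the Performance section.
void print_speedups() {
    // Andi Vajda's demo indexing run (250 HTML files).
    std::printf("demo indexing vs java : %.2fx\n", speedup(20379.0, 9930.0));
    std::printf("demo indexing vs gcj  : %.2fx\n", speedup(17842.0, 9930.0));
    // Gutenberg texts; the CLucene unit is assumed to be ms.
    std::printf("text indexing         : %.2fx\n", speedup(646453.0, 232141.0));
    // 10,000 single-word queries.
    std::printf("searching             : %.2fx\n", speedup(60078.0, 48359.0));
}
```

Calling print_speedups() from main shows that indexing was roughly 2x to
2.8x faster in these runs, while searching was only about 1.2x faster.
These are single rough runs on different machines, so treat the ratios as
indicative only.
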
| 70 |  | 
|---|
| 71 | Platform notes | 
|---|
| 72 | -------------- | 
|---|
| 73 |  | 
|---|
| 74 | 'Too many open files' | 
|---|
| 75 | Some platforms don't provide enough file handles to run CLucene properly. | 
|---|
| 76 | To solve this, increase the open file limit: | 
|---|
| 77 |  | 
|---|
| 78 | On Solaris: | 
|---|
| 79 | ulimit -n 1024          (in the shell, for the current session) | 
|---|
| 80 | set rlim_fd_cur=1024    (in /etc/system, system-wide default) | 
|---|
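
An application can also try to raise its own limit at startup instead of
relying on the shell. The sketch below uses the POSIX getrlimit/setrlimit
calls with RLIMIT_NOFILE. It is not part of CLucene: the function name
raise_open_file_limit is hypothetical, and it assumes a POSIX platform
(it will not build with Visual Studio).

```cpp
#include <sys/resource.h>  // POSIX: getrlimit, setrlimit, RLIMIT_NOFILE
#include <cstdio>

// Hypothetical helper (not a CLucene API): make sure the soft limit on open
// file descriptors is at least `wanted`, capped at the hard limit.
// Returns the soft limit in effect afterwards, or 0 if a system call failed.
rlim_t raise_open_file_limit(rlim_t wanted) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        std::perror("getrlimit");
        return 0;
    }
    // Nothing to do if the current soft limit is already high enough.
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted) {
        return lim.rlim_cur;
    }
    // An unprivileged process may raise its soft limit only up to the hard limit.
    rlim_t target = wanted;
    if (lim.rlim_max != RLIM_INFINITY && target > lim.rlim_max) {
        target = lim.rlim_max;
    }
    lim.rlim_cur = target;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) {
        std::perror("setrlimit");
        return 0;
    }
    return lim.rlim_cur;
}
```

Calling raise_open_file_limit(1024) before opening an index mirrors the
ulimit -n 1024 setting. If the returned value is below 1024, the hard limit
is too low and has to be raised by an administrator.
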
| 81 |  | 
|---|
| 82 | Acknowledgments | 
|---|
| 83 | --------------- | 
|---|
| 84 |  | 
|---|
| 85 | The Apache Lucene project is the basis for this software, so the biggest | 
|---|
| 86 | acknowledgment goes to that project. | 
|---|
| 87 |  | 
|---|
| 88 | We wish to acknowledge the following copyrighted works that | 
|---|
| 89 | make up portions of the CLucene software: | 
|---|
| 90 |  | 
|---|
| 91 | CLucene relies heavily on the use of autoconf and libtool to provide | 
|---|
| 92 | a build environment. | 
|---|