Wednesday, June 6, 2007

Hibernate Search freshly baked features

I had to release Hibernate Search Beta3 early after we discovered a fairly severe bug in Beta2. But I had time to inject some new features. After those introduced in Beta2, that a fairly good week :)

batch size limit on object indexing
If you don't pay attention when initially indexing (or reindexing) your data, you may face out of memory exceptions. The old solution was to execute indexing in several smaller transactions, but the code ended up being fairly complex. Here is the new solution:
hibernate.search.worker.batch_size=5000

int batchSize=5000;
//scroll will load objects as needed
ScrollableResults results = fullTextSession.createCriteria( Email.class )
.scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
index++;
fullTextSession.index( results.get(0) ); //index each element
if (index % batchSize == 0) s.clear(); //clear every batchSize
}
wrap that into one transaction and you are good to go.

Native Lucene
The APIs were never officially published (until beta3), but Hibernate Search lets you fall back to native Lucene when needed. All the needed APIs are held by SearchFactory.

DirectoryProvider provider = searchFactory.getDirectoryProvider(Order.class);
org.apache.lucene.store.Directory directory = provider.getDirectory();
This one is the brute force and gives you access to the Lucene Directory containing Orders. A smarter way, if you intend to execute a search query, is to use the ReaderProvider
DirectoryProvider clientProvider = searchFactory.getDirectoryProvider(Client.class);
IndexReader reader = searchFactory.getReaderProvider().openReader(clientProvider);

try {
//do read-only operations on the reader
}
finally {
readerProvider.closeReader(reader);
}
Smarter because you share the same IndexReaders as Hibernate Search, hence avoid the unnecessary IndexReader opening and warm up.

Finally you can optimize a Lucene Index (roughly a defragmentation)

SearchFactory searchFactory = fullTextSession.getSearchFactory();
searchFactory.optimize(Order.class);
//or searchFactory.optimize();

3 comments:

Anonymous said...

s/indexation/indexing/ ;)

Cheers from Paris.

Unknown said...

Ah year, I always confuse the economic notion and the data ordering one, it's fixed :)

Bloggery Bog said...

thanks, that saved my day. i constantly ran into out of heapspace exceptions :-)