Using Zend Search Lucene in a symfony app


If you’re like me, you’ve probably followed the Askeet tutorial on Search in order to create a decent search engine for your web app. It’s fairly straightforward, but the authors hinted that once Zend Search Lucene (ZSL) was released, it might be the way to go. Well, we are in luck: ZSL is available, so let’s dive right in.

If you aren’t using symfony, have a look at this article from the Zend Developer Zone. It covers just enough to get you started. If you are using symfony, just follow along and we’ll get you where you need to go.

Obtaining Zend Search Lucene

First, download the Zend Framework (ZF). It is supposed to be fairly “easy” to install, so let’s put that to the test. Open your ZF archive and copy Zend.php, Zend/Exception.php, and the Zend/Search directory into your symfony project’s lib folder:

cp Zend.php $SF_PROJECT/lib
mkdir $SF_PROJECT/lib/Zend
cp -r Zend/Search $SF_PROJECT/lib/Zend
cp Zend/Exception.php $SF_PROJECT/lib/Zend
chmod -R a+r $SF_PROJECT/lib/Zend*

Index Something

We’ll deviate slightly from food-themed tutorials and do something generic. Let’s try a user search where we can find a user by their name or email address. It’s fairly simple to accomplish and hardly requires ZSL, but by using ZSL we can easily extend it to a full-text search of a user’s profile or any other textual data.

Each “thing” stored in the index is a “document” in ZSL, specifically a Zend_Search_Lucene_Document. Each document then consists of several “fields” (Zend_Search_Lucene_Field objects). In our example, our document will be an individual user and the fields will be relevant attributes of the user (username, first name, last name, email, the text of their profile).

We’re going to write a general re-indexing tool: something that will index all users.

In our userActions class let’s add the following action:
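A minimal sketch of such an action might look like the following. It assumes the Zend files copied earlier are on the include path and that User is a Propel model with a generated UserPeer class. The getter names and field names are illustrative assumptions, not taken from a real schema.

```php
<?php
// Sketch only. Assumes lib/ is on the include path and that User is a
// Propel model; the getters and field names are illustrative guesses.
require_once 'Zend/Search/Lucene.php';

class userActions extends sfActions
{
  public function executeReindex()
  {
    // Passing true as the second argument creates a fresh index.
    $index = new Zend_Search_Lucene(sfConfig::get('app_search_user_index_file'), true);

    foreach (UserPeer::doSelect(new Criteria()) as $user)
    {
      $doc = new Zend_Search_Lucene_Document();

      // Keyword: indexed as a single untokenized term and stored.
      $doc->addField(Zend_Search_Lucene_Field::Keyword('userid', (string) $user->getId()));

      // Text: tokenized, indexed and stored, so results can display it.
      $doc->addField(Zend_Search_Lucene_Field::Text('username', $user->getUsername()));
      $doc->addField(Zend_Search_Lucene_Field::Text('firstname', $user->getFirstName()));
      $doc->addField(Zend_Search_Lucene_Field::Text('lastname', $user->getLastName()));
      $doc->addField(Zend_Search_Lucene_Field::Text('email', $user->getEmail()));

      // UnStored: tokenized and indexed but not stored; used for matching only.
      $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', implode(' ', array(
        $user->getUsername(),
        $user->getFirstName(),
        $user->getLastName(),
        $user->getEmail(),
      ))));

      $index->addDocument($doc);
    }

    // Flush pending documents to disk.
    $index->commit();

    return sfView::NONE; // nothing to render for this action
  }
}
```

The id field is named userid rather than id in this sketch because search hits already expose an id property holding ZSL’s internal document number.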

The code should be fairly easy to follow. First of all, we require the necessary libraries for Lucene. On the next line we create the index:

app_search_user_index_file is a symfony configuration setting that you define in your app.yml. It defines the path you want to use for your index (ZSL stores the index as a directory of files at that path). /tmp/lucene.user.index works for our purposes. The second parameter tells Lucene we are creating a new index.
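As a rough illustration of how the setting and the constructor fit together, the snippet below shows an assumed app.yml layout in comments and the two ways of calling the constructor. The fallback path passed to sfConfig::get() is only a convenience for this sketch.

```php
<?php
// Assumed app.yml layout; symfony joins the nested keys with underscores
// and prefixes them with "app_":
//
//   all:
//     search:
//       user_index_file: /tmp/lucene.user.index
//
require_once 'Zend/Search/Lucene.php';

// The second argument to sfConfig::get() is a default used when the key is missing.
$path = sfConfig::get('app_search_user_index_file', '/tmp/lucene.user.index');

// With true as the second argument a new index is created at $path.
$index = new Zend_Search_Lucene($path, true);

// With a single argument an existing index is opened instead.
$index = new Zend_Search_Lucene($path);
```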

We then loop through all the users and create a document for each one. For every search-relevant attribute that a user might have, we add a field to the document. Note the last field:
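One way to build the value for that catch-all field is a small helper like the one below. buildContents() is a hypothetical name introduced for this sketch. It joins whatever attributes you pass in and skips empty ones.

```php
<?php
// Hypothetical helper: joins the searchable attributes into one string
// for the catch-all "contents" field, skipping empty values.
function buildContents(array $parts)
{
  $kept = array();
  foreach ($parts as $part)
  {
    $part = trim((string) $part);
    if ($part !== '')
    {
      $kept[] = $part;
    }
  }

  return implode(' ', $kept);
}

// Inside the indexing loop (with the ZSL classes loaded) it could be used as:
// $doc->addField(Zend_Search_Lucene_Field::UnStored('contents',
//   buildContents(array($user->getUsername(), $user->getEmail()))));
```

An UnStored field is indexed but not kept in the document, which suits a field that exists only for matching.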

By default, ZSL searches the “contents” field. In this example we want people to be able to type in someone’s name, email, or username without having to specify which field to search, so we gather all of those values into that one field.

Find those users

Finding users is equally straightforward. We make a new action called search:
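A minimal sketch of such a search action is shown here. The request parameter name q is an assumption, as is the expectation that the index has already been built at the configured path.

```php
<?php
// Sketch only. Assumes the index already exists and that the search form
// submits its text in a request parameter named "q" (hypothetical).
require_once 'Zend/Search/Lucene.php';

class userActions extends sfActions
{
  public function executeSearch()
  {
    // Always give the template an array to loop over.
    $this->hits = array();

    // Lowercase and trim the query before handing it to ZSL.
    $query = strtolower(trim((string) $this->getRequestParameter('q')));

    if ($query !== '')
    {
      // A single argument opens the existing index instead of creating one.
      $index = new Zend_Search_Lucene(sfConfig::get('app_search_user_index_file'));

      // find() returns an array of hits, best matches first.
      $this->hits = $index->find($query);
    }
  }
}
```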

If we have a query, we open the ZSL index (note that the constructor takes only one parameter this time, so the existing index is opened rather than recreated). We then pass the query to the find method and store the results in the $hits array. Note that the query was first cleaned with strtolower, since ZSL is case sensitive.

The template takes care of the rest:
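A bare-bones searchSuccess.php could look something like this. It assumes the action exposes a $hits array and that each document was indexed with a stored username field. Both are assumptions for this sketch.

```php
<?php // searchSuccess.php: sketch only, field names are assumptions ?>
<?php if (count($hits) > 0): ?>
  <ul>
  <?php foreach ($hits as $hit): ?>
    <li>
      <?php // Stored fields can be read as properties of the hit. ?>
      <?php echo htmlspecialchars($hit->username) ?>
      (score: <?php echo round($hit->score, 2) ?>)
    </li>
  <?php endforeach; ?>
  </ul>
<?php else: ?>
  <p>No users found.</p>
<?php endif; ?>
```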

Fairly simple… but it could use some cleaning up (enjoy).

What about new users?

Regularly reindexing might be nice in terms of keeping the search index optimized, but it’s lousy if you want to be able to search the network immediately when new people join. So why not automatically re-index each user every time they are created or every time one of their indexed attributes changes?

This should be fairly simple by adding to the User class:
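One possible shape for this, assuming User extends a Propel-generated BaseUser with setters such as setEmail(), is to override each setter for an indexed attribute. Only one setter is shown; the others would follow the same pattern.

```php
<?php
// Sketch only. Assumes a Propel-generated BaseUser with getEmail()/setEmail().
class User extends BaseUser
{
  // True when an indexed attribute has changed since the last save.
  protected $reindex = false;

  public function setEmail($v)
  {
    // Flag the user for re-indexing only when the value really changes.
    if ($v !== $this->getEmail())
    {
      $this->reindex = true;
    }

    parent::setEmail($v);
  }

  // Repeat the same override for the other indexed attributes.
}
```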

We have an attribute called $reindex. When it is false we don’t need to worry about indexes. When something significant changes, like an update to your name or email address, then we set $reindex to true. Then when we save:
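A hedged sketch of the save() override follows. It assumes the index already exists, that each document carries a Keyword field named userid (a hypothetical name), and a ZSL release that provides delete() and the term query classes used here.

```php
<?php
// Sketch only. Assumes an existing index and a "userid" Keyword field.
require_once 'Zend/Search/Lucene.php';

class User extends BaseUser
{
  protected $reindex = false;

  public function save($con = null)
  {
    // New users always need indexing, whatever the flag says.
    $needsIndex = $this->reindex || $this->isNew();

    // Save first so that a new user has an id to index.
    $affected = parent::save($con);

    if ($needsIndex)
    {
      $index = new Zend_Search_Lucene(sfConfig::get('app_search_user_index_file'));

      // Remove any older copy of this user from the index.
      $term  = new Zend_Search_Lucene_Index_Term((string) $this->getId(), 'userid');
      $query = new Zend_Search_Lucene_Search_Query_Term($term);
      foreach ($index->find($query) as $hit)
      {
        $index->delete($hit->id);
      }

      $index->addDocument($this->generateZSLDocument());
      $index->commit();
      $this->reindex = false;
    }

    return $affected;
  }
}
```

The old document is deleted before the new one is added because ZSL has no in-place update for a document. A term query is used for the lookup so that the id is matched exactly rather than passed through the query parser.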

We’re calling a new function called generateZSLDocument. It might look familiar:
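Such a method might look roughly like this. The getter names and field names are assumptions for illustration and should be adapted to your own schema.

```php
<?php
// Sketch only. Getter and field names are hypothetical.
require_once 'Zend/Search/Lucene.php';

class User extends BaseUser
{
  public function generateZSLDocument()
  {
    $doc = new Zend_Search_Lucene_Document();

    // Keyword: stored and indexed as one term, useful for exact lookups.
    $doc->addField(Zend_Search_Lucene_Field::Keyword('userid', (string) $this->getId()));

    // Text: tokenized, indexed and stored.
    $doc->addField(Zend_Search_Lucene_Field::Text('username', $this->getUsername()));
    $doc->addField(Zend_Search_Lucene_Field::Text('email', $this->getEmail()));

    // UnStored catch-all field, searched when no field is specified.
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', implode(' ', array(
      $this->getUsername(),
      $this->getFirstName(),
      $this->getLastName(),
      $this->getEmail(),
    ))));

    return $doc;
  }
}
```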

Now, whenever a user is updated, so is our index. Additionally we can modify our reindex action:
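With the document-building logic living on the model, the reindex action could shrink to something like the sketch below. It makes the same assumptions as before: a Propel UserPeer class and a generateZSLDocument() method on User.

```php
<?php
// Sketch only. Assumes User::generateZSLDocument() exists as described.
require_once 'Zend/Search/Lucene.php';

class userActions extends sfActions
{
  public function executeReindex()
  {
    // Recreate the index from scratch.
    $index = new Zend_Search_Lucene(sfConfig::get('app_search_user_index_file'), true);

    foreach (UserPeer::doSelect(new Criteria()) as $user)
    {
      // Each user knows how to describe itself to the index.
      $index->addDocument($user->generateZSLDocument());
    }

    $index->commit();

    return sfView::NONE; // nothing to render for this action
  }
}
```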

That’s a lot easier to deal with.

…and beyond

I hope this article helps some of you jumpstart your symfony apps. Really cool, easy-to-implement search is here. We no longer have to stick with shoddy solutions like HT://Dig or spend time rolling our own full-text search, as the symfony team diligently showed us we could. But there is a lot more ground to cover, including optimization techniques and best practices.

Let me know what you think, and if you use this in any of your apps.