<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

 <title>Dave Dash</title>
 <link href="http://davedash.com/tag/index/atom.xml" rel="self"/>
 <link href="http://davedash.com/tag/index"/>
 <updated>2010-08-29T14:12:50-07:00</updated>
 <id>http://davedash.com/</id>
 <author>
   <name>Dave Dash</name>
   <email>dd+atom1@davedash.com</email>
 </author>

 
 <entry>
   <title>The Lucene Search Index and symfony</title>
   <link href="http://davedash.com/2007/04/23/the-lucene-search-index-and-symfony/"/>
   <updated>2007-04-23T00:00:00-07:00</updated>
   <id>http://davedash.com/2007/04/23/the-lucene-search-index-and-symfony</id>
   <content type="html">&lt;p&gt;[tags]Zend, Zend Search Lucene, Search, Lucene, php, symfony, zsl, index[/tags]&lt;/p&gt;

&lt;p&gt;This article follows up on &lt;a href=&quot;http://spindrop.us/2007/04/10/sfzendplugin/&quot;&gt;sfZendPlugin&lt;/a&gt;, where we learned a newer way of obtaining the &lt;a href=&quot;http://framework.zend.com/&quot;&gt;Zend Framework&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this tutorial we're going to delve into the Lucene index.  &lt;a href=&quot;http://framework.zend.com/manual/en/zend.search.html&quot;&gt;Zend Search Lucene&lt;/a&gt; relies on building a Lucene index: a directory of files that can be written and queried by Apache Lucene or any of its ports.  In our example we'll build a search for user profiles.&lt;/p&gt;


&lt;p&gt;We'll store the location of this index in our &lt;code&gt;app.yml&lt;/code&gt; so we can refer to it throughout our app&lt;sup id=&quot;fnr_lucene_index1&quot;&gt;&lt;a href=&quot;#fn_lucene_index1&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Here's an example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;all:
  search:
    user_index: /tmp/myapp.user.lucene.index
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, whenever we need to refer to the index, we can call &lt;code&gt;sfConfig::get('app_search_user_index')&lt;/code&gt;.&lt;/p&gt;
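
&lt;p&gt;It can help to put that lookup in one place if several parts of the app need the index.  The following is a minimal sketch.  It assumes the Zend Framework classes are already loadable, for example via sfZendPlugin.  The &lt;code&gt;UserPeer::getUserIndex()&lt;/code&gt; name is hypothetical.  The method opens the index if its directory already exists and creates the index otherwise:&lt;/p&gt;

&lt;div&gt;&lt;textarea name=&quot;code&quot; class=&quot;php&quot;&gt;
    // Hypothetical UserPeer helper: returns the user index, creating it on first use.
    public static function getUserIndex()
    {
        $path = sfConfig::get('app_search_user_index');

        // Zend_Search_Lucene::open() throws if there is no index at $path,
        // so create() the index when its directory isn't there yet.
        // (Assumes the directory only exists once an index was created in it.)
        if (file_exists($path))
        {
            return Zend_Search_Lucene::open($path);
        }

        return Zend_Search_Lucene::create($path);
    }
&lt;/textarea&gt;&lt;/div&gt;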

&lt;h3&gt;Index Something&lt;/h3&gt;

&lt;p&gt;Let's build a user search that finds a user by name or email address.  It's simple enough that it hardly requires &lt;a href=&quot;http://framework.zend.com/manual/en/zend.search.html&quot;&gt;&lt;acronym title=&quot;Zend Search Lucene&quot;&gt;ZSL&lt;/acronym&gt;&lt;/a&gt;, but by using &lt;acronym title=&quot;Zend Search Lucene&quot;&gt;ZSL&lt;/acronym&gt; we can easily extend it later to do a full-text search of a user's profile or any other textual data.&lt;/p&gt;

&lt;p&gt;Each &quot;thing&quot; stored in the index is a Lucene &quot;document&quot;.  Each document then consists of several &quot;fields&quot; (&lt;code&gt;Zend_Search_Lucene_Field&lt;/code&gt; objects).  In our example, each document will be an individual user and the fields will be relevant attributes of the user (username, first name, last name, email, the text of their profile).&lt;/p&gt;

&lt;p&gt;First we need to populate our index.  We may also want to periodically reindex all the users at once to keep search performance up.  Since reindexing involves multiple users, it makes sense to put a static &lt;code&gt;reindex&lt;/code&gt; method in our &lt;code&gt;UserPeer&lt;/code&gt; class&lt;sup id=&quot;fnr_lucene_index2&quot;&gt;&lt;a href=&quot;#fn_lucene_index2&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;div&gt;&lt;textarea name=&quot;code&quot; class=&quot;php&quot;&gt;
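class UserPeer extends BaseUserPeer
{
    public static function reindex()
    {
        // Create (or overwrite) the index at the path configured in app.yml
        $index = Zend_Search_Lucene::create(sfConfig::get('app_search_user_index'));

        // Fetch every user and add one Lucene document per user
        $users = UserPeer::doSelect(new Criteria());
        foreach ($users as $user)
        {
            $index-&gt;addDocument($user-&gt;generateZSLDocument());
        }

        // Write the changes to disk
        $index-&gt;commit();

        return $index;
    }
}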
&lt;/textarea&gt;&lt;/div&gt;


&lt;p&gt;Very simply, we're creating a new index, getting all the users, adding a document to the index for each one, and then committing the index to disk.  You may have noticed an unfamiliar method, &lt;code&gt;User::generateZSLDocument()&lt;/code&gt;.  This method contains all the magic.  To avoid repeating ourselves, we keep the details of building a Lucene document in the &lt;code&gt;User&lt;/code&gt; class itself.  Let's look at it:&lt;/p&gt;

&lt;div&gt;&lt;textarea name=&quot;code&quot; class=&quot;php&quot;&gt;
    public function generateZSLDocument()
    {
        $doc = new Zend_Search_Lucene_Document();

        // Keyword fields are stored and indexed as a single, untokenized term
        $doc-&gt;addField(Zend_Search_Lucene_Field::Keyword('uid', $this-&gt;getId()));
        $doc-&gt;addField(Zend_Search_Lucene_Field::Keyword('username', $this-&gt;getUsername()));
        $doc-&gt;addField(Zend_Search_Lucene_Field::Keyword('email', $this-&gt;getEmail()));

        // Text fields are stored, indexed and tokenized into words
        $doc-&gt;addField(Zend_Search_Lucene_Field::Text('firstname', $this-&gt;getFirstname()));
        $doc-&gt;addField(Zend_Search_Lucene_Field::Text('lastname', $this-&gt;getLastname()));

        /* An unstored "contents" field aggregating all of the data is no
         * longer needed in Zend Search Lucene, but it's included here.
         * UnStored fields are indexed and tokenized but not stored.
         */
        $contents = implode(' ', array($this-&gt;getEmail(), $this-&gt;getFirstname(),
            $this-&gt;getLastname(), $this-&gt;getUsername()));
        $doc-&gt;addField(Zend_Search_Lucene_Field::UnStored('contents', $contents));

        return $doc;
    }
&lt;/textarea&gt;&lt;/div&gt;


&lt;p&gt;We're really just dumping the relevant search terms into this document.  The beauty of keeping this code in the &lt;code&gt;User&lt;/code&gt; class is that we can reuse it later when we need to index a single &lt;code&gt;User&lt;/code&gt; at a time.&lt;/p&gt;
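
&lt;p&gt;Reindexing a single user can be sketched as follows.  The code is a hypothetical &lt;code&gt;UserPeer::updateIndex()&lt;/code&gt; method and assumes the Zend Framework 1.x API.  Lucene has no in-place update, so the method follows the delete-then-add approach.  It marks any existing document whose &lt;code&gt;uid&lt;/code&gt; matches as deleted, then adds a fresh one built by &lt;code&gt;generateZSLDocument()&lt;/code&gt;:&lt;/p&gt;

&lt;div&gt;&lt;textarea name=&quot;code&quot; class=&quot;php&quot;&gt;
    // Hypothetical UserPeer method: (re)index one user without rebuilding everything.
    public static function updateIndex(User $user)
    {
        $index = Zend_Search_Lucene::open(sfConfig::get('app_search_user_index'));

        // Look up documents for this user by the exact value of the 'uid' keyword field
        $term = new Zend_Search_Lucene_Index_Term((string) $user-&gt;getId(), 'uid');
        foreach ($index-&gt;termDocs($term) as $docId)
        {
            // Mark the stale document as deleted (by internal document number)
            $index-&gt;delete($docId);
        }

        // Add a fresh document and write the changes to disk
        $index-&gt;addDocument($user-&gt;generateZSLDocument());
        $index-&gt;commit();
    }
&lt;/textarea&gt;&lt;/div&gt;

&lt;p&gt;Calling this from the model's &lt;code&gt;save()&lt;/code&gt; or from an action after editing a profile keeps the index current between full reindexes.&lt;/p&gt;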

&lt;p&gt;A couple of things to note.  &lt;code&gt;Zend_Search_Lucene_Field::Keyword&lt;/code&gt; stores a value as a single, untokenized term, so we can look it up later by its exact value.  We store &lt;code&gt;User::id&lt;/code&gt; in a field called &lt;code&gt;uid&lt;/code&gt; because &lt;code&gt;id&lt;/code&gt; is effectively reserved.  On a search hit, &lt;a href=&quot;http://framework.zend.com/manual/en/zend.search.html&quot;&gt;Zend Search Lucene&lt;/a&gt; uses &lt;code&gt;$hit-&gt;id&lt;/code&gt; for the internal document number, so a field named &lt;code&gt;id&lt;/code&gt; can't be read that way.&lt;/p&gt;
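
&lt;p&gt;The following minimal sketch covers the other half: turning search hits back into &lt;code&gt;User&lt;/code&gt; objects through the &lt;code&gt;uid&lt;/code&gt; field.  The &lt;code&gt;UserPeer::search()&lt;/code&gt; name is hypothetical.  &lt;code&gt;retrieveByPK()&lt;/code&gt; is the standard Propel-generated peer method.&lt;/p&gt;

&lt;div&gt;&lt;textarea name=&quot;code&quot; class=&quot;php&quot;&gt;
    // Hypothetical UserPeer method: run a query and return matching User objects.
    public static function search($query)
    {
        $index = Zend_Search_Lucene::open(sfConfig::get('app_search_user_index'));

        $users = array();
        // find() parses the query string and returns hits sorted by score
        foreach ($index-&gt;find($query) as $hit)
        {
            // $hit-&gt;id is Lucene's internal number; our primary key is in 'uid'
            $user = UserPeer::retrieveByPK($hit-&gt;uid);
            if ($user)
            {
                $users[] = $user;
            }
        }

        return $users;
    }
&lt;/textarea&gt;&lt;/div&gt;

&lt;p&gt;A plain query such as &lt;code&gt;UserPeer::search('dash')&lt;/code&gt; searches every indexed field.  A field-prefixed query such as &lt;code&gt;UserPeer::search('lastname:dash')&lt;/code&gt; restricts the search to a single field.&lt;/p&gt;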

&lt;p&gt;In a batch script or a reindex action we can now just call &lt;code&gt;UserPeer::reindex()&lt;/code&gt; and have a working search index for our users.&lt;/p&gt;
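
&lt;p&gt;A batch script for this might look like the following sketch, which uses the standard symfony 1.0 batch bootstrap.  It makes three assumptions:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;the file name is &lt;code&gt;batch/reindex_users.php&lt;/code&gt;;&lt;/li&gt;
    &lt;li&gt;the application is named &lt;code&gt;frontend&lt;/code&gt;;&lt;/li&gt;
    &lt;li&gt;the Zend classes are loadable, for example via sfZendPlugin.&lt;/li&gt;
&lt;/ul&gt;

&lt;div&gt;&lt;textarea name=&quot;code&quot; class=&quot;php&quot;&gt;
&lt;?php
// batch/reindex_users.php (hypothetical name): rebuild the user search index.

// symfony 1.0 batch bootstrap; 'frontend' is an assumed application name
define('SF_ROOT_DIR',    realpath(dirname(__FILE__).'/..'));
define('SF_APP',         'frontend');
define('SF_ENVIRONMENT', 'prod');
define('SF_DEBUG',       false);

require_once(SF_ROOT_DIR.DIRECTORY_SEPARATOR.'apps'.DIRECTORY_SEPARATOR.SF_APP.DIRECTORY_SEPARATOR.'config'.DIRECTORY_SEPARATOR.'config.php');

// Connect to the database so UserPeer can fetch the users
$databaseManager = new sfDatabaseManager();
$databaseManager-&gt;initialize();

// Rebuild the index and report how many documents it now holds
$index = UserPeer::reindex();
echo 'Indexed '.$index-&gt;numDocs()." users\n";
&lt;/textarea&gt;&lt;/div&gt;

&lt;p&gt;Run it with &lt;code&gt;php batch/reindex_users.php&lt;/code&gt;, either by hand or from cron.&lt;/p&gt;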

&lt;div id=&quot;footnotes&quot;&gt;
    &lt;hr/&gt;
    &lt;ol&gt;
        &lt;li id=&quot;fn_lucene_index1&quot;&gt;Storing things in &lt;code&gt;app.yml&lt;/code&gt; is great for indexes that don't need to be searched in multiple applications. &lt;a href=&quot;#fnr_lucene_index1&quot; class=&quot;footnoteBackLink&quot;  title=&quot;Jump back to footnote 1 in the text.&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/li&gt;
        &lt;li id=&quot;fn_lucene_index2&quot;&gt;
Since we're using a Lucene index, which has an openly documented file format, we aren't limited to Zend Search Lucene or Apache Lucene (Java).  We can mix and match, reading from and writing to the same index with different implementations.  For very large indexes (65,000+ documents), I rewrote a Java application to index all the documents at once, since PHP would time out during such a task.
&lt;a href=&quot;#fnr_lucene_index2&quot; class=&quot;footnoteBackLink&quot;  title=&quot;Jump back to footnote 2 in the text.&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/li&gt;
    &lt;/ol&gt;
&lt;/div&gt;

</content>
 </entry>
 

</feed>
