<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

 <title>Dave Dash</title>
 <link href="http://davedash.com/tag/amo/atom.xml" rel="self"/>
 <link href="http://davedash.com/tag/amo"/>
 <updated>2012-01-17T21:54:19-08:00</updated>
 <id>http://davedash.com/</id>
 <author>
   <name>Dave Dash</name>
   <email>dd+atom1@davedash.com</email>
 </author>

 
 <entry>
   <title>Testing Redis in Django</title>
   <link href="http://davedash.com/2010/12/07/testing-redis-in-django/"/>
   <updated>2010-12-07T00:00:00-08:00</updated>
   <id>http://davedash.com/2010/12/07/testing-redis-in-django</id>
   <content type="html">&lt;p&gt;For the &lt;a href=&quot;http://addons.mozilla.org/en-US/firefox/&quot;&gt;Firefox Add-ons&lt;/a&gt; site we've been using &lt;a href=&quot;http://code.google.com/p/redis/&quot;&gt;redis&lt;/a&gt; here and there, mostly
for caching, but lately also for a few things we'd love to persist.&lt;/p&gt;

&lt;p&gt;Unfortunately, relying on redis means we need to be able to test it.  Since
redis touches some core components of the site, we can't just raise a
&lt;code&gt;SkipTest&lt;/code&gt; like we would for Sphinx search related tests.  I also don't want to
rely on our developers to have redis installed in order to run the
test-suite.&lt;/p&gt;

&lt;p&gt;So I built a simple &lt;a href=&quot;https://github.com/mozilla/nuggets/blob/master/redisutils.py#L47&quot;&gt;Mock Redis client&lt;/a&gt;.  It's part of our
&lt;code&gt;redisutils.py&lt;/code&gt; that handles connections to redis.  If a test's &lt;code&gt;setUp&lt;/code&gt; method
calls &lt;code&gt;mock_redis&lt;/code&gt; you'll get this phony object that can do a few minimal
redis-like operations.&lt;/p&gt;
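
&lt;p&gt;To give a feel for the idea, here is a minimal sketch of such a phony client.  It is not the code from &lt;code&gt;redisutils.py&lt;/code&gt;; the class body and the choice of commands (&lt;code&gt;get&lt;/code&gt;, &lt;code&gt;set&lt;/code&gt;, &lt;code&gt;incr&lt;/code&gt;, &lt;code&gt;sadd&lt;/code&gt;, &lt;code&gt;smembers&lt;/code&gt;) are assumptions picked for illustration, backed by a plain dictionary.&lt;/p&gt;

```python
class MockRedis(object):
    """A tiny in-memory stand-in for a redis client (illustrative only)."""

    def __init__(self):
        self.kv = {}

    def get(self, key):
        # Real redis returns nil for a missing key; None plays that role here.
        return self.kv.get(key)

    def set(self, key, value):
        self.kv[key] = value
        return True

    def incr(self, key):
        # Missing keys count from zero, as in redis.
        self.kv[key] = int(self.kv.get(key, 0)) + 1
        return self.kv[key]

    def sadd(self, key, member):
        # Returns 1 if the member was newly added, 0 if it was already there.
        members = self.kv.setdefault(key, set())
        added = member not in members
        members.add(member)
        return int(added)

    def smembers(self, key):
        # Hand back a copy so callers cannot mutate the stored set.
        return set(self.kv.get(key, set()))


client = MockRedis()
client.set("greeting", "hello")
client.incr("hits")
client.incr("hits")
client.sadd("tags", "amo")
```

&lt;p&gt;A test can then swap an object like this in wherever the real connection would be used, so nobody needs a redis server just to run the suite.&lt;/p&gt;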

&lt;p&gt;It works great for our specific cases, but feel free to fork it and make it
better.&lt;/p&gt;

&lt;p&gt;Note: This &lt;code&gt;MockRedis&lt;/code&gt; is specifically designed to work with &lt;a href=&quot;http://www.djangoproject.com/&quot;&gt;Django&lt;/a&gt;.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Pythonic string formatting in Javascript</title>
   <link href="http://davedash.com/2010/11/19/pythonic-string-formatting-in-javascript/"/>
   <updated>2010-11-19T00:00:00-08:00</updated>
   <id>http://davedash.com/2010/11/19/pythonic-string-formatting-in-javascript</id>
   <content type="html">&lt;p&gt;We do a lot of string manipulation on the &lt;a href=&quot;https://addons.mozilla.org/&quot;&gt;Firefox Addons&lt;/a&gt; site.  A lot of
it has to do with localization, so one thing that comes up often is the need to
format strings.  Here's a little snippet that gives you Python-like string
formatting:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;javascript&quot;&gt;    &lt;span class=&quot;cm&quot;&gt;/* Python(ish) string formatting:&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;     * &amp;gt;&amp;gt;&amp;gt; format(&amp;#39;{0}&amp;#39;, [&amp;#39;zzz&amp;#39;])&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;     * &amp;quot;zzz&amp;quot;&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;     * &amp;gt;&amp;gt;&amp;gt; format(&amp;#39;{x}&amp;#39;, {x: 1})&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;     * &amp;quot;1&amp;quot;&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;     */&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kd&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;re&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sr&quot;&gt;/\{([^}]+)\}/g&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;replace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;re&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;match&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;){&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;match&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
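
&lt;p&gt;For comparison, here is a rough Python port of the same regex-replace trick, next to the built-in &lt;code&gt;str.format&lt;/code&gt; it imitates.  The name &lt;code&gt;format_like_js&lt;/code&gt; is made up for this sketch, and converting list keys to integers is an assumption needed because Python lists, unlike JavaScript arrays, cannot be indexed with strings.&lt;/p&gt;

```python
import re

PLACEHOLDER = re.compile(r"\{([^}]+)\}")


def format_like_js(s, args):
    """Replace each {key} in s with args[key], like the JavaScript above."""
    def lookup(match):
        key = match.group(1)
        # JavaScript indexes arrays with string keys; Python lists need an int.
        if isinstance(args, (list, tuple)):
            key = int(key)
        return str(args[key])
    return PLACEHOLDER.sub(lookup, s)


positional = format_like_js("{0} and {1}", ["zzz", "yyy"])
named = format_like_js("{x}", {"x": 1})
builtin = "{0} and {1}".format("zzz", "yyy")
```

&lt;p&gt;One difference worth knowing: a missing key raises an error in this Python version, while the JavaScript version inserts the text "undefined".&lt;/p&gt;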



</content>
 </entry>
 
 <entry>
   <title>Faceted Search on Input</title>
   <link href="http://davedash.com/2010/10/29/faceted-search-on-input/"/>
   <updated>2010-10-29T00:00:00-07:00</updated>
   <id>http://davedash.com/2010/10/29/faceted-search-on-input</id>
   <content type="html">&lt;p&gt;So one trick with &lt;a href=&quot;http://sphinxsearch.com/&quot;&gt;Sphinx search&lt;/a&gt; is &lt;a href=&quot;http://en.wikipedia.org/wiki/Faceted_search&quot;&gt;faceted search&lt;/a&gt;.  It's somewhat
crudely implemented, by batching queries together, but it does the job well.  In
the case of &lt;a href=&quot;http://input.mozilla.com/&quot;&gt;Firefox Input&lt;/a&gt; it cuts the number of queries considerably (our
search result pages now take one batched Sphinx query and one database query
instead of 5 database queries).&lt;/p&gt;

&lt;div class=&quot;side&quot;&gt;
&lt;a href=&quot;http://www.flickr.com/photos/davedash/5126379671/&quot;
   title=&quot;Add-on Search Results for shopping :: Add-ons for Firefox&quot;&gt;
   &lt;img src=&quot;http://farm5.static.flickr.com/4041/5126379671_33b3e472d5_m.jpg&quot;
    width=&quot;240&quot; height=&quot;172&quot; alt=&quot;Add-on Search Results for shopping&quot; /&gt;&lt;/a&gt;
&lt;/div&gt;


&lt;p&gt;Faceted search is search with filters to help narrow down a result set.  I'll
give you three examples: &lt;a href=&quot;http://addons.mozilla.org/&quot;&gt;Firefox Add-ons&lt;/a&gt;, which I wrote;
&lt;a href=&quot;http://www.sittercity.com/search-sitters.html?ct=101&amp;amp;zip=95126&quot;&gt;Sitter City&lt;/a&gt;, which gives you a lot of ways to narrow down on the perfect
baby sitter; and &lt;a href=&quot;http://ebay.com/&quot;&gt;eBay&lt;/a&gt;, which lets you narrow down on auction items.&lt;/p&gt;

&lt;p&gt;For &lt;a href=&quot;http://input.mozilla.com/&quot;&gt;Input&lt;/a&gt; we ask for the following when we do a search:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many opinions match the term we are searching for, taking into
account any preferences we have already specified (feeling, locale, operating
system, date range, etc.)?&lt;/li&gt;
&lt;li&gt;How many opinions show a positive sentiment, and how many show a negative
sentiment?&lt;/li&gt;
&lt;li&gt;What is the breakdown of languages for the opinion results (i.e. how many
are en-US, de, fr, etc.)?&lt;/li&gt;
&lt;li&gt;How many people are on Mac, Linux or Windows?&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;We can batch these four queries into a single Sphinx request.&lt;/p&gt;
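
&lt;p&gt;To make the four questions concrete, here is a small in-memory sketch of what they return.  It does not talk to Sphinx at all; the sample opinions, the field names and the &lt;code&gt;faceted_search&lt;/code&gt; helper are all made up for illustration.  It only shows the shape of the answer: one total plus one tally per facet.&lt;/p&gt;

```python
from collections import Counter

# Made-up sample data; the field names are assumptions for this sketch.
opinions = [
    {"text": "firefox crashes on startup", "positive": False, "locale": "en-US", "os": "mac"},
    {"text": "love the new firefox", "positive": True, "locale": "en-US", "os": "win"},
    {"text": "firefox ist schnell", "positive": True, "locale": "de", "os": "linux"},
    {"text": "thunderbird is slow", "positive": False, "locale": "en-US", "os": "win"},
]


def faceted_search(term, rows):
    """Return the match count plus one tally per facet, in a single pass."""
    facets = {"positive": Counter(), "locale": Counter(), "os": Counter()}
    total = 0
    for row in rows:
        if term not in row["text"]:
            continue
        total += 1
        for name, tally in facets.items():
            tally[row[name]] += 1
    return total, facets


total, facets = faceted_search("firefox", opinions)
```

&lt;p&gt;Here &lt;code&gt;total&lt;/code&gt; answers the first question, and the three counters answer the sentiment, language and operating system questions.&lt;/p&gt;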

&lt;p&gt;Here's &lt;a href=&quot;http://github.com/davedash/reporter/commit/348018&quot;&gt;our implementation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Having done this twice, I do recognize that there is a lot of room for making
the code a bit more reusable.  But overall it runs fairly well.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Trimming Whitespace in Django Forms</title>
   <link href="http://davedash.com/2010/08/18/trimming-whitespace-in-django-forms/"/>
   <updated>2010-08-18T00:00:00-07:00</updated>
   <id>http://davedash.com/2010/08/18/trimming-whitespace-in-django-forms</id>
   <content type="html">&lt;p&gt;I've been using frameworks for a number of years.  So I expect a lot of things
to happen &quot;for free&quot; in Django.  One is whitespace removal.  In &lt;a href=&quot;http://delicious.com/&quot;&gt;Delicious&lt;/a&gt;
we had a lot of data in our database with leading and trailing whitespace.  On
the frontend we moved to symfony (actually ysymfony) and that prevented a lot
of this.&lt;/p&gt;

&lt;p&gt;So I was quite surprised that &lt;a href=&quot;http://code.djangoproject.com/ticket/6362&quot;&gt;this is not the case with Django&lt;/a&gt;.  So I
decided we could solve this at the form level, and released a
&lt;a href=&quot;http://github.com/mozilla/happyforms&quot;&gt;ridiculously simple library&lt;/a&gt;.  After some googling, I found that I was
&lt;a href=&quot;http://www.peterbe.com/plog/automatically-strip-whitespace-in-django-forms&quot;&gt;not the first to do this&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Feel free to use this, fork it, submit pull requests, etc.  I suspect in the
future we'll handle other global form filtering - like stripping high order
Unicode since MySQL is often not a fan.&lt;/p&gt;
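
&lt;p&gt;The core of the idea fits in a few lines.  The sketch below is framework-free and is not the library's actual code; &lt;code&gt;strip_whitespace&lt;/code&gt; is a hypothetical helper that shows the kind of trimming a form base class could apply to submitted data before validation.&lt;/p&gt;

```python
def strip_whitespace(data):
    """Return a copy of submitted form data with string values trimmed."""
    cleaned = {}
    for name, value in data.items():
        # Only strings are trimmed; numbers, None, etc. pass through untouched.
        cleaned[name] = value.strip() if isinstance(value, str) else value
    return cleaned


submitted = {"name": "  Dave Dash ", "homepage": "http://davedash.com/\n", "age": 30}
cleaned = strip_whitespace(submitted)
```

&lt;p&gt;Returning a copy rather than editing the input in place keeps the raw submission around in case you need to inspect it.&lt;/p&gt;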
</content>
 </entry>
 
 <entry>
   <title>The Perils of One Giant Fixture</title>
   <link href="http://davedash.com/2010/08/12/the-perils-of-one-giant-fixture/"/>
   <updated>2010-08-12T00:00:00-07:00</updated>
   <id>http://davedash.com/2010/08/12/the-perils-of-one-giant-fixture</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;/static/images/2010/08/12/time.jpg&quot; alt=&quot;Timing&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A while back, I thought it would be good to consolidate all the data used in
testing the django-layer of &lt;a href=&quot;https://addons.mozilla.org/&quot;&gt;AMO&lt;/a&gt; into a single data fixture.
Unfortunately we have 600 tests, each of which was now loading and unloading large
amounts of data every time it ran.  This made our tests take 20
minutes.&lt;/p&gt;

&lt;p&gt;I decided to cut this down quite a bit by using smaller fixture files.  Each
fixture file tries to hold a single primary object (e.g. an Addon or a
Collection or a User) and its associated supporting objects.  It's far from
perfect, but the tests now run in under 10 minutes.&lt;/p&gt;

&lt;p&gt;The other side effect is tests will be simpler.  They'll only include the
addons needed to generate an effect, and if something can't be done easily with
the fixtures in place, we can always alter the data during the test.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Django: Model Inheritance or Related Tables wrt AMO</title>
   <link href="http://davedash.com/2009/12/15/django-model-inheritance-or-related-tables-wrt-amo/"/>
   <updated>2009-12-15T00:00:00-08:00</updated>
   <id>http://davedash.com/2009/12/15/django-model-inheritance-or-related-tables-wrt-amo</id>
   <content type="html">&lt;p&gt;When I attended DjangoCon this year, I lamented that our flagship web property was difficult to test, and not fun to develop.  I figured DjangoCon was a way to placate me, and Django might mean something for some of the smaller projects at Mozilla.  However, Wil Clouser, our lead web developer, &lt;a href=&quot;http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/&quot;&gt;announced development changes&lt;/a&gt; for &lt;a href=&quot;http://addons.mozilla.org&quot;&gt;addons.mozilla.org&lt;/a&gt; (AMO) that say we'll be moving to Django.&lt;/p&gt;

&lt;p&gt;Wil was open to Django and knew that's what we in the dev team wanted.  Jeff spawned our foray into a new AMO with &lt;a href=&quot;http://github.com/jbalogh/zamboni&quot;&gt;Zamboni&lt;/a&gt;.  I've been working on some grunt-work tasks inside and outside of Django.&lt;/p&gt;

&lt;p&gt;One of those tasks is building a transparent layer in Django to keep users logged in from our PHP-based site.  That kind of problem almost immediately forces you to ask one of the most fundamental questions you ask when using any framework:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;How much do I change my app, in order to accommodate the framework?&lt;/p&gt;&lt;/blockquote&gt;

&lt;!--more--&gt;


&lt;p&gt;More specifically:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;Should I use the &lt;code&gt;django.contrib.auth&lt;/code&gt; User module, and to what extent?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The more we looked into which features of Django we might want to use, the clearer it became that &lt;code&gt;django.contrib.auth&lt;/code&gt; was heavily tied into other things we wanted, so it made sense for us to use it.  The next question is whether we try the &lt;a href=&quot;http://scottbarnham.com/blog/2008/08/21/extending-the-django-user-model-with-inheritance/&quot;&gt;inheritance approach&lt;/a&gt; or treat our legacy users table as a sort of User Profile and use the User module via the &lt;a href=&quot;http://www.b-list.org/weblog/2007/feb/20/about-model-subclassing/&quot;&gt;related table approach&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Using model-inheritance seems real nice, because we can pretend that our legacy user is the same thing as a &lt;code&gt;django.contrib.auth&lt;/code&gt; User - but this isn't true:&lt;/p&gt;

&lt;p&gt;Looking at our &lt;code&gt;users&lt;/code&gt; table more closely:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;mysql&amp;gt; explain users;
+-------------------------+---------------------+------+-----+---------------------+----------------+
| Field                   | Type                | Null | Key | Default             | Extra          |
+-------------------------+---------------------+------+-----+---------------------+----------------+
| id                      | int(11) unsigned    | NO   | PRI | NULL                | auto_increment |
| email                   | varchar(255)        | YES  | UNI | NULL                |                |
| password                | varchar(255)        | NO   |     |                     |                |
| firstname               | varchar(255)        | NO   |     |                     |                |
| lastname                | varchar(255)        | NO   |     |                     |                |
| nickname                | varchar(255)        | YES  | MUL | NULL                |                |
| bio                     | int(11) unsigned    | YES  | MUL | NULL                |                |
| emailhidden             | tinyint(1) unsigned | NO   |     | 0                   |                |
| sandboxshown            | tinyint(1) unsigned | NO   |     | 0                   |                |
| homepage                | varchar(255)        | YES  |     | NULL                |                |
| display_collections     | tinyint(1) unsigned | NO   |     | 0                   |                |
| display_collections_fav | tinyint(1) unsigned | NO   |     | 0                   |                |
| confirmationcode        | varchar(255)        | NO   |     |                     |                |
| resetcode               | varchar(255)        | NO   |     |                     |                |
| resetcode_expires       | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| notifycompat            | tinyint(1) unsigned | NO   | MUL | 1                   |                |
| notifyevents            | tinyint(1) unsigned | NO   | MUL | 1                   |                |
| deleted                 | tinyint(1)          | YES  |     | 0                   |                |
| created                 | datetime            | NO   | MUL | 0000-00-00 00:00:00 |                |
| modified                | datetime            | NO   |     | 0000-00-00 00:00:00 |                |
| notes                   | text                | YES  |     | NULL                |                |
| location                | varchar(255)        | NO   |     |                     |                |
| occupation              | varchar(255)        | NO   |     |                     |                |
| picture_type            | varchar(25)         | NO   |     |                     |                |
| averagerating           | varchar(255)        | YES  |     | NULL                |                |
+-------------------------+---------------------+------+-----+---------------------+----------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can very easily argue that this is a profile table, which happens to have credential information thrown in.&lt;/p&gt;

&lt;p&gt;I can see that over time I'll just struggle to make our legacy User act like a Django User, whereas a UserProfile is fairly standard.&lt;/p&gt;

&lt;p&gt;Had I been writing this app from scratch, I would have chosen the UserProfile route.  This is extra data which takes up a lot of space, and changes far more often than user credentials.  Changing 4M+ rows sucks; by making users our UserProfile table, any changes to that table don't tie up the table used for sign-ins.&lt;/p&gt;

&lt;p&gt;I'm curious what other people who port their apps to Django have done.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>AMO Search: Powered by Sphinx</title>
   <link href="http://davedash.com/2009/09/30/amo-search-powered-by-sphinx/"/>
   <updated>2009-09-30T00:00:00-07:00</updated>
   <id>http://davedash.com/2009/09/30/amo-search-powered-by-sphinx</id>
   <content type="html">&lt;p&gt;Last night, I gave a talk at the &lt;a href=&quot;https://wiki.mozilla.org/AddonMeetups:2009:Chicago&quot;&gt;Addons Meetup&lt;/a&gt; at Threadless HQ in Chicago on the new search engine powering &lt;a href=&quot;http://addons.mozilla.org/&quot;&gt;addons.mozilla.org&lt;/a&gt;.  I'll recap the technical portion of the talk and give a bit more detail.&lt;/p&gt;

&lt;p&gt;First, I'd like to thank Harper and Threadless.  It was a great location in the greatest city in the universe.  Before and after the meetup, Harper was just an all-around great guy to hang with, and the Threadless headquarters was a nice hangout place for meeting people interested in addons.&lt;/p&gt;

&lt;p&gt;Shortly after my talk, our Engineering Ops team deployed the new AMO 5.1 complete with a new Sphinx powered search engine.&lt;/p&gt;

&lt;p&gt;So let's talk about search.  Note: parts of this are a rehash of my talk, so feel free to skip around.&lt;/p&gt;

&lt;!--more--&gt;


&lt;h3&gt;A bit about addons&lt;/h3&gt;

&lt;p&gt;Addons is a huge growing space.  Arguably it's Mozilla's best kept secret.  Sure readers of this blog probably know what Addons are, but ask people who aren't as web-savvy.  Most people don't know what a browser is - and it's hard to explain it to people without getting technical.&lt;/p&gt;

&lt;p&gt;We can just skip that step.  Because Addons are small things that people can easily &quot;get&quot;.&lt;/p&gt;

&lt;p&gt;&quot;It's an easy way to customize the internet when you're surfing.&quot;&lt;/p&gt;

&lt;p&gt;While perhaps not technically correct, it's one way of explaining it to people.  Maybe a better way is just showing people what they can do with addons.&lt;/p&gt;

&lt;p&gt;On my flight out to Chicago, I talked to a person on the plane who didn't know what a browser was, but after showing her &lt;a href=&quot;http://addons.mozilla.org/&quot;&gt;AMO&lt;/a&gt; she was really intrigued.&lt;/p&gt;

&lt;p&gt;If everyday non-technical people can realize the potential of addons, it's only a matter of time before they start knocking down the doors to AMO.&lt;/p&gt;

&lt;p&gt;So we better be prepared to handle them, and get them what they want.&lt;/p&gt;

&lt;h3&gt;The technical details of addons.mozilla.org&lt;/h3&gt;

&lt;p&gt;Every time you open Firefox, it pings &lt;a href=&quot;http://addons.mozilla.org/&quot;&gt;AMO&lt;/a&gt; to see if there are any updates to any of the addons that happen to be installed.  Over a third of the people using Firefox have at least one addon, and Firefox is roughly 22% of the browser market.  That means roughly 7% (a third of 22%) of people opening their browsers are pinging our servers for updates.&lt;/p&gt;

&lt;p&gt;Needless to say it's a lot of traffic, and to support it we need a fair amount of hardware.  AMO is clearly the largest site in the Mozilla universe in both respects.&lt;/p&gt;

&lt;p&gt;Some stats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 MySQL master&lt;/li&gt;
&lt;li&gt;4 MySQL slaves&lt;/li&gt;
&lt;li&gt;2 memcached servers&lt;/li&gt;
&lt;li&gt;2 Sphinx indexer/search daemons&lt;/li&gt;
&lt;li&gt;24 web frontends&lt;/li&gt;
&lt;li&gt;Multiple Zeus ZXTM clusters all&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Most of this is standard, and we'll talk about Sphinx later, but Zeus is amazing.  I didn't know what Zeus was until earlier this year when I interviewed with Mozilla's VP of Engineering Operations.  All our requests get cached, so many of our hits actually land on our Zeus cluster and not our web servers.&lt;/p&gt;

&lt;p&gt;To see just how amazing they are, read &lt;a href=&quot;http://blog.mozilla.com/mrz/&quot;&gt;mrz's ops blog&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Why search matters&lt;/h3&gt;

&lt;p&gt;If you have any kind of custom content and unique meta data a custom search solution is a must.  Browsing through a site isn't going to cut it.  Browsing is dead.  Search is how you find things on a web site.  On &lt;a href=&quot;http://addons.mozilla.org/&quot;&gt;AMO&lt;/a&gt; you may see an addon that's featured somewhere, or you might want to see what's out there, but the right search query will find you the right addon in two clicks.&lt;/p&gt;

&lt;h3&gt;Improve Search&lt;/h3&gt;

&lt;p&gt;So my first job on AMO was to &lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=498999&quot;&gt;improve addons search&lt;/a&gt;.  It was a vague request, born out of frustration with what we had.  It wasn't that one specific thing was broken, say how certain things were indexed, or Unicode not working, or results not being sorted.  We may have had all those problems, but as a product, search needed to be replaced.&lt;/p&gt;

&lt;p&gt;To me it meant that we needed some framework that would allow developers to quickly debug and fix any future search calamities at a moment's notice.&lt;/p&gt;

&lt;p&gt;So here were the goals I made for myself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do something that sucks less than what we’ve got&lt;/li&gt;
&lt;li&gt;Do something that makes it easier to suck less in the future&lt;/li&gt;
&lt;li&gt;Do something that’s easy to use for our operations team, web developers and most importantly, end-users&lt;/li&gt;
&lt;li&gt;Reduce strain on our databases, developers and operations teams&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;Complex Data&lt;/h3&gt;

&lt;p&gt;Our data set is small (we have 5,000 addons), but there's a lot of secondary meta data about the addons that we track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Addons work in 1 or more locales (e.g. en-US, fr, de, etc)&lt;/li&gt;
&lt;li&gt;Addons are optionally platform specific (Linux, OS X, etc)&lt;/li&gt;
&lt;li&gt;Addons work with one or more products (Firefox, Thunderbird, Seamonkey, Sunbird or Fennec)&lt;/li&gt;
&lt;li&gt;Addons come in multiple flavors (extensions, themes, dictionaries and more)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;We want to index all this data.  Unfortunately, getting at much of this data involves either numerous queries or numerous joins, which put a strain on MySQL.  How much strain?&lt;/p&gt;

&lt;p&gt;At peak we get about 10 search queries per second.  If we do something smarter this won't have to cause a lot of strain.&lt;/p&gt;

&lt;h3&gt;Using Sphinx&lt;/h3&gt;

&lt;p&gt;Sphinx is an open source search indexer and daemon.  It's used by Craigslist, the Pirate Bay and &lt;a href=&quot;http://support.mozilla.com&quot;&gt;Mozilla Support&lt;/a&gt;.  It was very easy to use and despite a complicated set of data and business logic, Sphinx was up to the task.&lt;/p&gt;

&lt;h3&gt;The challenges&lt;/h3&gt;

&lt;p&gt;We needed to search for addons in several languages.  So indexing just addons wouldn't work; we need to make sure we have every translation of every addon indexed.  For those counting, we have 5,000 addons, but 18,000 translations of addons.&lt;/p&gt;

&lt;p&gt;All the joining and filtering that needed to be done for our old search still needs to be done, but we can do this all in one shot by using a mysql view.  This view is a flat list of each translated addon as well as all meta data associated with it.  This then gets fed into the sphinx indexer.&lt;/p&gt;

&lt;p&gt;Along the way we ran into some issues which used to be dealt with outside of mysql, such as comparing versions.  It was gross and quite a hack, so we turned the variety of &lt;a href=&quot;http://spindrop.us/2009/08/07/v-is-for-version-hell/&quot;&gt;acceptable version strings into integers&lt;/a&gt;.&lt;/p&gt;
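
&lt;p&gt;As a rough illustration of the idea (the linked post has the real scheme), here is one way a dotted version string could be packed into a single sortable integer.  Everything in it is an assumption made for this sketch: the &lt;code&gt;version_int&lt;/code&gt; name, the three-part limit, the two digits per part, and treating &lt;code&gt;*&lt;/code&gt; as the largest value a part can hold.  It does not handle alpha or beta suffixes.&lt;/p&gt;

```python
def version_int(version, parts=3, width=2):
    """Pack a dotted version such as "3.5.2" into one comparable integer."""
    pieces = version.split(".")[:parts]
    # Pad short versions so "3.6" is treated as "3.6.0".
    pieces += ["0"] * (parts - len(pieces))
    total = 0
    for piece in pieces:
        # A wildcard part sorts above every concrete number in that slot.
        value = 10 ** width - 1 if piece == "*" else int(piece)
        total = total * 10 ** width + value
    return total


exact = version_int("3.5.2")
wildcard = version_int("3.5.*")
short = version_int("3.6")
```

&lt;p&gt;Once versions are plain integers, a range check such as "does this addon support my Firefox version" becomes an ordinary numeric comparison that the database or the search index can do directly.&lt;/p&gt;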

&lt;p&gt;We also learned that stemming wasn't as good an idea as we assumed it would be.  Stemming is great for searching through lots of text, but a great deal of addon searches were really just searches for product names, so we opted for substring searches.  We'll see how that fares.  There is probably room for improvement.&lt;/p&gt;

&lt;p&gt;Much of this, however, involved knowing our data, and knowing how it will be used by our users.  Once we got that down, we could hammer it all out using Sphinx.&lt;/p&gt;

&lt;h3&gt;Wins&lt;/h3&gt;

&lt;p&gt;So Sphinx gains us a bit architecturally.  We have a complicated query, but it only gets run once every 5 minutes versus the 180,000 times it was run &quot;on demand.&quot;&lt;/p&gt;

&lt;p&gt;Indexing happens rather quickly, just over a minute.&lt;/p&gt;

&lt;p&gt;The API was a breeze to work with, and was easy to drop into our own codebase.&lt;/p&gt;

&lt;p&gt;Because of our relatively small data set, and quick indexing, we're able to scale this simply by cloning and load balancing.  Meaning, we just need to scale for traffic, but addon growth (which is slower than traffic growth) we can safely not worry about for a while.&lt;/p&gt;

&lt;p&gt;Our ops team can monitor the sphinx clusters and just deploy additional nodes as needed.&lt;/p&gt;

&lt;h3&gt;Building a platform&lt;/h3&gt;

&lt;p&gt;What we've done is built a foundation for search.  Not all the problems are gone, but a lot of the problems that our QA team finds are able to be resolved quickly.  We have a nice pile of unit tests as well that help us keep our results in check when we start tweaking dials.&lt;/p&gt;

&lt;p&gt;We even have the groundwork for some nifty advanced search syntax, that hopefully we can inject into future releases of AMO.&lt;/p&gt;

&lt;p&gt;Enjoy.  And if you find anything, &lt;a href=&quot;http://bit.ly/search-bugs&quot;&gt;let me know&lt;/a&gt;.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Question: Building a Better Search Engine</title>
   <link href="http://davedash.com/2009/06/18/question-building-a-better-search-engine/"/>
   <updated>2009-06-18T00:00:00-07:00</updated>
   <id>http://davedash.com/2009/06/18/question-building-a-better-search-engine</id>
   <content type="html">&lt;p&gt;So I finally have one of those jobs where I can tell people almost every little detail about what I'm doing and I'm encouraged to talk to people on the intar-webs and solicit opinions.&lt;/p&gt;

&lt;p&gt;Uh - this is more or less how I've operated at previous jobs, just now I can be overt about it.&lt;/p&gt;

&lt;p&gt;So my &lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=498999&quot;&gt;new task&lt;/a&gt; is to work on improving the &lt;a href=&quot;http://addons.mozilla.org&quot;&gt;addons.mozilla.org&lt;/a&gt; search engine.  I've built various &quot;search engines&quot; over time in PHP, powered by Lucene, and most recently in Python using an inverted index.&lt;/p&gt;

&lt;p&gt;One tool that I've been looking at briefly is &lt;a href=&quot;http://sphinxsearch.com/&quot;&gt;Sphinx&lt;/a&gt;.  While my record count is low (5-10K), Sphinx basically bakes in a lot of the things I would want in a search engine.  Indexing, merging, etc.&lt;/p&gt;

&lt;p&gt;Since I'm fairly new to the add-ons team I'm still understanding the basics of what we need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast automated indexing of addons for Firefox, Thunderbird and any other Mozilla product&lt;/li&gt;
&lt;li&gt;Quick result sets&lt;/li&gt;
&lt;li&gt;Easy deployability&lt;/li&gt;
&lt;li&gt;Extensible&lt;/li&gt;
&lt;li&gt;Customized ranking&lt;/li&gt;
&lt;li&gt;Filtering (e.g. by Firefox version, etc).&lt;/li&gt;
&lt;li&gt;Basics: Stemming and stop-words&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Whether it's Sphinx, Lucene or some home-grown solution, I have all that to support.  But this should be fairly straightforward.  What are people's thoughts?&lt;/p&gt;
</content>
 </entry>
 

</feed>

