<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

 <title>Dave Dash</title>
 <link href="http://davedash.com/tag/sphinx/atom.xml" rel="self"/>
 <link href="http://davedash.com/tag/sphinx"/>
 <updated>2012-01-17T21:54:19-08:00</updated>
 <id>http://davedash.com/</id>
 <author>
   <name>Dave Dash</name>
   <email>dd+atom1@davedash.com</email>
 </author>

 
 <entry>
   <title>Faceted Search on Input</title>
   <link href="http://davedash.com/2010/10/29/faceted-search-on-input/"/>
   <updated>2010-10-29T00:00:00-07:00</updated>
   <id>http://davedash.com/2010/10/29/faceted-search-on-input</id>
   <content type="html">&lt;p&gt;So one trick with &lt;a href=&quot;http://sphinxsearch.com/&quot;&gt;Sphinx search&lt;/a&gt; is &lt;a href=&quot;http://en.wikipedia.org/wiki/Faceted_search&quot;&gt;faceted search&lt;/a&gt;.  It's somewhat
crudely implemented, by batching queries together, but does the job well.  In
the case of &lt;a href=&quot;http://input.mozilla.com/&quot;&gt;Firefox Input&lt;/a&gt; it cuts the number of queries considerably (our
search result pages now take one batched Sphinx query and one database query
instead of 5 database queries).&lt;/p&gt;

&lt;div class=&quot;side&quot;&gt;
&lt;a href=&quot;http://www.flickr.com/photos/davedash/5126379671/&quot;
   title=&quot;Add-on Search Results for shopping :: Add-ons for Firefox&quot;&gt;
   &lt;img src=&quot;http://farm5.static.flickr.com/4041/5126379671_33b3e472d5_m.jpg&quot;
    width=&quot;240&quot; height=&quot;172&quot; alt=&quot;Add-on Search Results for shopping&quot; /&gt;&lt;/a&gt;
&lt;/div&gt;


&lt;p&gt;Faceted search is search with filters to help narrow down a result set.  I'll
give you three examples: &lt;a href=&quot;http://addons.mozilla.org/&quot;&gt;Firefox Add-ons&lt;/a&gt;, which I wrote;
&lt;a href=&quot;http://www.sittercity.com/search-sitters.html?ct=101&amp;amp;zip=95126&quot;&gt;Sitter City&lt;/a&gt;, which gives you a lot of ways to narrow down to the perfect
babysitter; and &lt;a href=&quot;http://ebay.com/&quot;&gt;eBay&lt;/a&gt;, which lets you narrow down auction items.&lt;/p&gt;

&lt;p&gt;For &lt;a href=&quot;http://input.mozilla.com/&quot;&gt;Input&lt;/a&gt; we ask for the following when we do a search:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many opinions match the term we are searching for, taking into
account any preferences we have already specified (feeling, locale, operating
system, date range, etc.)?&lt;/li&gt;
&lt;li&gt;How many opinions show a positive sentiment, and how many show a negative
sentiment?&lt;/li&gt;
&lt;li&gt;What is the breakdown of languages for the opinion results (i.e. how many
are en-US, de, fr, etc.)?&lt;/li&gt;
&lt;li&gt;How many people are on Mac, Linux or Windows?&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;We can batch these four queries into a single Sphinx request.&lt;/p&gt;

&lt;p&gt;Here's &lt;a href=&quot;http://github.com/davedash/reporter/commit/348018&quot;&gt;our implementation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Having done this twice, I do recognize that there is a lot of room for making
the code a bit more reusable.  But overall it runs fairly well.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Counting Sphinx groupBy Queries</title>
   <link href="http://davedash.com/2010/10/15/counting-sphinx-groupby-queries/"/>
   <updated>2010-10-15T00:00:00-07:00</updated>
   <id>http://davedash.com/2010/10/15/counting-sphinx-groupby-queries</id>
   <content type="html">&lt;p&gt;I quickly implemented Sphinx on Input.  While revisiting it, I saw that we try
to answer this type of question:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;Of the results displayed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many are happy and how many are sad?&lt;/li&gt;
&lt;li&gt;How many are for Windows, Linux or Mac?&lt;/li&gt;
&lt;li&gt;How many are for English, French or Japanese?&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Answering these involves faceted search.  Unfortunately this is a bit
awkward to do using Sphinx.  For the first example, happy or sad, you would have
to run the query like so:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take the query, remove any filters on &lt;em&gt;happiness&lt;/em&gt; and do a group by on
happy opinions&lt;/li&gt;
&lt;li&gt;Restore any filters on happiness and run the query as normal.&lt;/li&gt;
&lt;li&gt;Return both the results, and the aggregate data from step 1.&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;Doing the group by is easy, but you only get to know how many feelings there
are and what they were.  In our case: happy and sad.  What we really want is
how many results of our original search were happy and how many were sad.&lt;/p&gt;

&lt;p&gt;I assumed something like this would work:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sphinx.SetSelect('feeling, @count')
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;@count&lt;/code&gt; is one of those magic variables that Sphinx uses.  Unfortunately this
doesn't work.  &lt;code&gt;COUNT(*)&lt;/code&gt; doesn't work either.  Here's what did:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sphinx.SetSelect('feeling, SUM(1) AS count')
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Not the straightforward MySQL-ish syntax I've come to expect from Sphinx, but
it works.&lt;/p&gt;
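
&lt;p&gt;To make the three steps above concrete, here's a rough sketch in plain Python rather than Sphinx.  The sample opinions, their field names and the &lt;code&gt;facet_counts&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt; helpers are all made up for illustration; the sketch only mimics the logic of dropping one filter, grouping, and counting.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from collections import Counter

# Made-up sample data; in practice these rows live in the Sphinx index.
OPINIONS = [
    {'feeling': 'happy', 'os': 'mac', 'locale': 'en-US'},
    {'feeling': 'sad', 'os': 'mac', 'locale': 'de'},
    {'feeling': 'happy', 'os': 'linux', 'locale': 'en-US'},
    {'feeling': 'sad', 'os': 'windows', 'locale': 'fr'},
]


def matches(opinion, filters):
    # True when the opinion satisfies every active filter.
    return all(opinion[key] == value for key, value in filters.items())


def facet_counts(opinions, filters, facet):
    # Step 1: drop any filter on the facet itself, then group and count.
    others = {k: v for k, v in filters.items() if k != facet}
    return Counter(o[facet] for o in opinions if matches(o, others))


def search(opinions, filters, facet):
    # Step 2: run the query as normal, with every filter restored.
    results = [o for o in opinions if matches(o, filters)]
    # Step 3: return both the results and the aggregate data from step 1.
    return results, facet_counts(opinions, filters, facet)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Searching for happy opinions on Mac still reports how many Mac opinions were happy and how many were sad, which is exactly the breakdown a facet needs.&lt;/p&gt;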
</content>
 </entry>
 
 <entry>
   <title>Firefox Input, powered by Sphinx</title>
   <link href="http://davedash.com/2010/07/06/firefox-input%2C-powered-by-sphinx/"/>
   <updated>2010-07-06T00:00:00-07:00</updated>
   <id>http://davedash.com/2010/07/06/firefox-input,-powered-by-sphinx</id>
   <content type="html">&lt;p&gt;Thursday, I decided to take a half-day for my sanity, but saw an email about
how Whoosh wasn't going to cut it for &lt;a href=&quot;http://aakash.doesthings.com/2010/06/25/hi-my-name-is-firefox-input/&quot;&gt;Firefox Input&lt;/a&gt;.  I was CC'd about
this and there was mention that Sphinx might be possible.&lt;/p&gt;

&lt;p&gt;Sphinx is my hammer, and everything is a nail.  So I said, let's do this.
That translated into me spending my weekend, soothing &lt;a href=&quot;/tag/baby&quot;&gt;my newborn&lt;/a&gt; and
working on Sphinx.  Luckily this was easy, since &lt;a href=&quot;https://addons.mozilla.org/en-US/firefox/&quot;&gt;AMO&lt;/a&gt; and &lt;a href=&quot;http://support.mozilla.com/en-US/kb/&quot;&gt;SUMO&lt;/a&gt;
are both running Sphinx in a similar &lt;a href=&quot;http://fredericiana.com/2010/06/23/under-the-hood-of-firefox-input/&quot;&gt;Django environment&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In order to move quickly, I copied code from the &lt;a href=&quot;http://github.com/jbalogh/zamboni/&quot;&gt;Zamboni&lt;/a&gt; project to
&lt;a href=&quot;http://github.com/fwenzel/reporter&quot;&gt;Firefox Input&lt;/a&gt;.  Even our deployment into staging and production wasn't
done by our usual &quot;Sphinx guy&quot; in IT.  Ultimately, everything landed in place.&lt;/p&gt;

&lt;p&gt;So &lt;a href=&quot;http://input.mozilla.com/&quot;&gt;try it out&lt;/a&gt; and file bugs or let me know if searches don't go as
planned.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Alphabetical sorting in Sphinx</title>
   <link href="http://davedash.com/2010/04/21/alphabetical-sorting-in-sphinx/"/>
   <updated>2010-04-21T00:00:00-07:00</updated>
   <id>http://davedash.com/2010/04/21/alphabetical-sorting-in-sphinx</id>
   <content type="html">&lt;p&gt;Sphinx 0.9.9 is great at searching full text, but treating actual strings as attributes takes some work.&lt;/p&gt;

&lt;p&gt;Initially I employed the strategy of indexing my full text fields &lt;em&gt;and&lt;/em&gt; storing them as attributes.  E.g.:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sql_query = SELECT name, name AS name_ord FROM documents
sql_attr_str2ordinal = name_ord
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This stores each attribute in lexical order.  Meaning if your names are Apple, Aardvark, Button, Choco-room they would be given the ordinal 2, 1, 3, 4 respectively.&lt;/p&gt;

&lt;p&gt;However, this is case-sensitive.  So trying this approach:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sql_query = SELECT name, UPPER(name) AS name_ord FROM documents
sql_attr_str2ordinal = name_ord
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Will allow for case-insensitive alphabetical sorting in Sphinx.&lt;/p&gt;
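
&lt;p&gt;A quick way to see the difference is to mimic the ordinals in plain Python.  This is only an illustration, assuming a simple byte-wise string comparison; the &lt;code&gt;ordinals&lt;/code&gt; helper is made up, and Sphinx computes the real values at indexing time.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def ordinals(names):
    # Rank each distinct name by its position in sorted order, starting at 1.
    ranked = sorted(set(names))
    return [ranked.index(name) + 1 for name in names]


names = ['Apple', 'Aardvark', 'Button', 'Choco-room']
print(ordinals(names))                       # [2, 1, 3, 4]

mixed = ['apple', 'Button']
print(ordinals(mixed))                       # [2, 1]: 'B' sorts before 'a'
print(ordinals([n.upper() for n in mixed]))  # [1, 2]
&lt;/code&gt;&lt;/pre&gt;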
</content>
 </entry>
 
 <entry>
   <title>Making our tests run thrice as fast</title>
   <link href="http://davedash.com/2010/03/16/making-our-tests-run-thrice-as-fast/"/>
   <updated>2010-03-16T00:00:00-07:00</updated>
   <id>http://davedash.com/2010/03/16/making-our-tests-run-thrice-as-fast</id>
   <content type="html">&lt;p&gt;I've written a faster version of &lt;a href=&quot;http://github.com/jbalogh/test-utils/blob/c4c31905a95e59dcc8919c1030b23848ad7fbca6/test_utils/__init__.py#L57&quot;&gt;TransactionTestCase&lt;/a&gt; and packaged it with &lt;a href=&quot;http://github.com/jbalogh/test-utils&quot;&gt;test_utils&lt;/a&gt;.  It's mysql specific since it relies on &lt;code&gt;SET FOREIGN_KEY_CHECKS=0&lt;/code&gt; to flush the database.&lt;/p&gt;

&lt;p&gt;The long story...&lt;/p&gt;

&lt;!-- more--&gt;


&lt;h3&gt;Why speed matters&lt;/h3&gt;

&lt;p&gt;We're closing in on 300 tests for &lt;a href=&quot;http://github.com/jbalogh/zamboni/&quot;&gt;Zamboni&lt;/a&gt;.  As of yesterday, running our entire test suite took approximately 5 minutes.  If you run tests before a code review, during a code review, and before you push to master, you've spent about 15 minutes running tests for a single feature or bug fix.  We have about 5 developers, so this cycle happens many times in a work day.  In that time many sandwiches can be made and consumed.&lt;/p&gt;

&lt;p&gt;Even shortcuts, like running a subset of tests will only go so far, and ultimately we do want to validate that all our tests pass for any code-change.&lt;/p&gt;

&lt;h3&gt;Testing Sphinx search with &lt;code&gt;TransactionTestCase&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;Django recently sped up testing by running tests in a transaction.  However, this means that data never gets committed to the database and therefore external tools, like the Sphinx indexer, will never see any of that data.  So we resort to &lt;code&gt;TransactionTestCase&lt;/code&gt; which &lt;em&gt;will&lt;/em&gt; commit the data.&lt;/p&gt;

&lt;p&gt;Unfortunately &lt;code&gt;TransactionTestCase&lt;/code&gt; is painfully slow.  The accepted practice is to only use &lt;code&gt;TestCase&lt;/code&gt; if you want your tests to be fast.  So, I decided to complain to &lt;a href=&quot;http://blog.ianbicking.org/&quot;&gt;one of our new hires&lt;/a&gt; and he and I decided to tinker in mysql to figure out what was slow.  We discovered the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;delete from [table]&lt;/code&gt; is slow&lt;/li&gt;
&lt;li&gt;&lt;code&gt;truncate [table]&lt;/code&gt; is slow&lt;/li&gt;
&lt;li&gt;... unless you &lt;code&gt;SET FOREIGN_KEY_CHECKS=0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;So we decided we should do our own tear down.  After some tinkering with &lt;code&gt;cProfile&lt;/code&gt; I discovered that &lt;code&gt;TransactionTestCase&lt;/code&gt; does a (slow) database &lt;code&gt;flush&lt;/code&gt; on setup for a test case.  This wouldn't do.&lt;/p&gt;

&lt;h3&gt;Making our own &lt;code&gt;TransactionTestCase&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;I decided to make our own &lt;code&gt;TransactionTestCase&lt;/code&gt; and it would just run &lt;code&gt;SET FOREIGN_KEY_CHECKS=0&lt;/code&gt; and &lt;code&gt;TRUNCATE&lt;/code&gt; on each table at tear down time.  It would also not do a &lt;code&gt;flush&lt;/code&gt; on set up.&lt;/p&gt;
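
&lt;p&gt;Roughly, the tear down boils down to something like the sketch below.  It is not the actual &lt;code&gt;test_utils&lt;/code&gt; code: &lt;code&gt;truncate_all&lt;/code&gt; is a made-up helper, it assumes a MySQL DB-API cursor and a list of table names you supply yourself, and turning the checks back on at the end is an extra precaution not described above.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def truncate_all(cursor, tables):
    # Turn off foreign key checks so TRUNCATE is fast and order-independent.
    cursor.execute('SET FOREIGN_KEY_CHECKS=0')
    for table in tables:
        # Table names come from our own schema, not from user input.
        cursor.execute('TRUNCATE TABLE `%s`' % table)
    # Assumption: turn the checks back on for whatever runs next.
    cursor.execute('SET FOREIGN_KEY_CHECKS=1')
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In a Django test case this could be called from &lt;code&gt;tearDown&lt;/code&gt; with a cursor from &lt;code&gt;django.db.connection&lt;/code&gt;.&lt;/p&gt;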

&lt;p&gt;We write our tests with the idea that they clean up after themselves, rather than cleaning up after the previous test.  This is a requirement for us since &lt;code&gt;django-nose&lt;/code&gt; doesn't reorder tests (nor should it) and a standard &lt;code&gt;django.test.TestCase&lt;/code&gt; assumes a clean database.&lt;/p&gt;

&lt;p&gt;Looking at a single test &lt;code&gt;test_sphinx_indexer&lt;/code&gt;, using &lt;code&gt;django.test.TransactionTestCase&lt;/code&gt; took ~30 seconds.  Using our new &lt;code&gt;TransactionTestCase&lt;/code&gt; it takes ~4 seconds!&lt;/p&gt;

&lt;h3&gt;Fast tests are good&lt;/h3&gt;

&lt;p&gt;We can now run our 275 tests in ~100 seconds versus the ~300 seconds it used to take.  Furthermore, skipping our Sphinx tests (which are the only tests that use &lt;code&gt;TransactionTestCase&lt;/code&gt;) only saves us ~10 seconds.  That's not a lot of overhead for better coverage.&lt;/p&gt;

&lt;p&gt;This took me the better part of a day, but solving this now means we're far more likely to run our Sphinx tests every time rather than skip them.  Our QA team will assure you that search is probably the most regression-prone part of our site, so running these tests is vital to quality.&lt;/p&gt;

&lt;p&gt;If you need to use &lt;code&gt;TransactionTestCase&lt;/code&gt; in mysql, &lt;a href=&quot;http://github.com/jbalogh/test-utils&quot;&gt;give ours a try&lt;/a&gt;.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>AMO Search: Powered by Sphinx</title>
   <link href="http://davedash.com/2009/09/30/amo-search-powered-by-sphinx/"/>
   <updated>2009-09-30T00:00:00-07:00</updated>
   <id>http://davedash.com/2009/09/30/amo-search-powered-by-sphinx</id>
   <content type="html">&lt;p&gt;Last night, I gave a talk at the &lt;a href=&quot;https://wiki.mozilla.org/AddonMeetups:2009:Chicago&quot;&gt;Addons Meetup&lt;/a&gt; at Threadless HQ in Chicago on the new search engine powering &lt;a href=&quot;http://addons.mozilla.org/&quot;&gt;addons.mozilla.org&lt;/a&gt;.  I'll recap the technical portion of the talk and give a bit more detail.&lt;/p&gt;

&lt;p&gt;First, I'd like to thank Harper and Threadless.  It was a great location in the greatest city in the universe.  Before and after the meetup, Harper was just an all-around great guy to hang with and the threadless headquarters was a nice hangout place for meeting people interested in addons.&lt;/p&gt;

&lt;p&gt;Shortly after my talk, our Engineering Ops team deployed the new AMO 5.1 complete with a new Sphinx powered search engine.&lt;/p&gt;

&lt;p&gt;So let's talk about search.  Note: parts of this are a rehash of my talk, so feel free to skip around.&lt;/p&gt;

&lt;!--more--&gt;


&lt;h3&gt;A bit about addons&lt;/h3&gt;

&lt;p&gt;Addons is a huge growing space.  Arguably it's Mozilla's best kept secret.  Sure readers of this blog probably know what Addons are, but ask people who aren't as web-savvy.  Most people don't know what a browser is - and it's hard to explain it to people without getting technical.&lt;/p&gt;

&lt;p&gt;We can just skip that step.  Because Addons are small things that people can easily &quot;get&quot;.&lt;/p&gt;

&lt;p&gt;&quot;It's an easy way to customize the internet when you're surfing.&quot;&lt;/p&gt;

&lt;p&gt;While perhaps not technically correct, it's one way of explaining it to people.  Maybe a better way is just showing people what they can do with addons.&lt;/p&gt;

&lt;p&gt;On my flight out to Chicago, I talked to a person on the plane who didn't know what a browser was, but after showing her &lt;a href=&quot;http://addons.mozilla.org/&quot;&gt;AMO&lt;/a&gt; she was really intrigued.&lt;/p&gt;

&lt;p&gt;If everyday non-technical people can realize the potential of addons, it's only a matter of time before they start knocking down the doors to AMO.&lt;/p&gt;

&lt;p&gt;So we better be prepared to handle them, and get them what they want.&lt;/p&gt;

&lt;h3&gt;The technical details of addons.mozilla.org&lt;/h3&gt;

&lt;p&gt;Every time you open Firefox, it pings &lt;a href=&quot;http://addons.mozilla.org/&quot;&gt;AMO&lt;/a&gt; to see if there are any updates to any of the addons that happen to be installed.  Over a third of the people using Firefox have at least one addon, and Firefox is roughly 22% of the browser market.  That means roughly 7% of people opening their browsers are pinging our servers for updates.&lt;/p&gt;

&lt;p&gt;Needless to say it's a lot of traffic, and to support it we need a fair amount of hardware.  AMO is clearly the largest site in the Mozilla universe in both respects.&lt;/p&gt;

&lt;p&gt;Some stats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 MySQL master&lt;/li&gt;
&lt;li&gt;4 MySQL slaves&lt;/li&gt;
&lt;li&gt;2 memcached servers&lt;/li&gt;
&lt;li&gt;2 Sphinx indexer/search daemons&lt;/li&gt;
&lt;li&gt;24 Web Frontend&lt;/li&gt;
&lt;li&gt;Multiple Zeus ZXTM clusters all&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Most of this is standard, and we'll talk about Sphinx later, but Zeus is amazing.  I didn't know what Zeus was until earlier this year when I interviewed with Mozilla's VP of Engineering Operations.  All our requests get cached, so many of our hits actually land on our Zeus cluster and not our web servers.&lt;/p&gt;

&lt;p&gt;To see just how amazing they are, read &lt;a href=&quot;http://blog.mozilla.com/mrz/&quot;&gt;mrz's ops blog&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Why search matters&lt;/h3&gt;

&lt;p&gt;If you have any kind of custom content and unique meta data a custom search solution is a must.  Browsing through a site isn't going to cut it.  Browsing is dead.  Search is how you find things on a web site.  On &lt;a href=&quot;http://addons.mozilla.org/&quot;&gt;AMO&lt;/a&gt; you may see an addon that's featured somewhere, or you might want to see what's out there, but the right search query will find you the right addon in two clicks.&lt;/p&gt;

&lt;h3&gt;Improve Search&lt;/h3&gt;

&lt;p&gt;So my first job on AMO was to &lt;a href=&quot;https://bugzilla.mozilla.org/show_bug.cgi?id=498999&quot;&gt;improve addons search&lt;/a&gt;.  It was a vague request and born out of frustration with what we had.  It wasn't just that certain things weren't indexed, or Unicode didn't work, or results weren't sorted.  We may have had all those problems, but as a product, search needed to be replaced.&lt;/p&gt;

&lt;p&gt;To me it meant that we needed some framework that would allow developers to quickly debug and fix any future search calamities at a moment's notice.&lt;/p&gt;

&lt;p&gt;So here were the goals I made for myself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do something that sucks less than what we’ve got&lt;/li&gt;
&lt;li&gt;Do something that makes it easier to suck less in the future&lt;/li&gt;
&lt;li&gt;Do something that’s easy to use for our operations team, web developers and most importantly, end-users&lt;/li&gt;
&lt;li&gt;Reduce strain on our databases, developers and operations teams&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;Complex Data&lt;/h3&gt;

&lt;p&gt;Our data set is small (we have 5,000 addons), but there's a lot of secondary meta data about the addons that we track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Addons work in 1 or more locales (e.g. en-US, fr, de, etc)&lt;/li&gt;
&lt;li&gt;Addons are optionally platform specific (Linux, OS X, etc)&lt;/li&gt;
&lt;li&gt;Addons work with one or more products (Firefox, Thunderbird, Seamonkey, Sunbird or Fennec)&lt;/li&gt;
&lt;li&gt;Addons come in multiple flavors (extensions, themes, dictionaries and more)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;We want to index all this data.  Unfortunately, getting at much of this data involves either numerous queries or numerous joins, which put a strain on MySQL.  How much strain?&lt;/p&gt;

&lt;p&gt;At peak we get about 10 search queries per second.  If we do something smarter this won't have to cause a lot of strain.&lt;/p&gt;

&lt;h3&gt;Using Sphinx&lt;/h3&gt;

&lt;p&gt;Sphinx is an open source search indexer and daemon.  It's used by Craigslist, the Pirate Bay and &lt;a href=&quot;http://support.mozilla.com&quot;&gt;Mozilla Support&lt;/a&gt;.  It was very easy to use and despite a complicated set of data and business logic, Sphinx was up to the task.&lt;/p&gt;

&lt;h3&gt;The challenges&lt;/h3&gt;

&lt;p&gt;We needed to search for addons in several languages.  Indexing just addons wouldn't work; we need to make sure we have every translation of every addon indexed.  For those counting, we have 5,000 addons, but 18,000 translations of addons.&lt;/p&gt;

&lt;p&gt;All the joining and filtering that needed to be done for our old search still needs to be done, but we can do this all in one shot by using a mysql view.  This view is a flat list of each translated addon as well as all meta data associated with it.  This then gets fed into the sphinx indexer.&lt;/p&gt;

&lt;p&gt;Along the way we ran into some issues which used to be dealt with outside of mysql, such as comparing versions.  It was gross and quite a hack, so we turned the variety of &lt;a href=&quot;http://spindrop.us/2009/08/07/v-is-for-version-hell/&quot;&gt;acceptable version strings into integers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We also learned that stemming wasn't as good an idea as we assumed it would be.  Stemming was great for searching through lots of text, but a great deal of addon searches were really just searches for product names, so we opted for substring searches.  We'll see how that fares.  There is probably room for improvement.&lt;/p&gt;

&lt;p&gt;Much of this, however, involved knowing our data, and knowing how it will be used by our users.  Once we got that down, we could hammer it all out using Sphinx.&lt;/p&gt;

&lt;h3&gt;Wins&lt;/h3&gt;

&lt;p&gt;So Sphinx gains us a bit architecturally.  We have a complicated query, but it only gets run once every 5 minutes versus the 180,000 times it was run &quot;on demand.&quot;&lt;/p&gt;

&lt;p&gt;Indexing happens rather quickly, just over a minute.&lt;/p&gt;

&lt;p&gt;The API was a breeze to work with, and was easy to drop into our own codebase.&lt;/p&gt;

&lt;p&gt;Because of our relatively small data set, and quick indexing, we're able to scale this simply by cloning and load balancing.  Meaning, we just need to scale for traffic, but addon growth (which is slower than traffic growth) we can safely not worry about for a while.&lt;/p&gt;

&lt;p&gt;Our ops team can monitor the sphinx clusters and just deploy additional nodes as needed.&lt;/p&gt;

&lt;h3&gt;Building a platform&lt;/h3&gt;

&lt;p&gt;What we've done is built a foundation for search.  Not all the problems are gone, but a lot of the problems that our QA team finds are able to be resolved quickly.  We have a nice pile of unit tests as well that help us keep our results in check when we start tweaking dials.&lt;/p&gt;

&lt;p&gt;We even have the groundwork for some nifty advanced search syntax, that hopefully we can inject into future releases of AMO.&lt;/p&gt;

&lt;p&gt;Enjoy.  And if you find anything, &lt;a href=&quot;http://bit.ly/search-bugs&quot;&gt;let me know&lt;/a&gt;.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>V is for Version Hell</title>
   <link href="http://davedash.com/2009/08/07/v-is-for-version-hell/"/>
   <updated>2009-08-07T00:00:00-07:00</updated>
   <id>http://davedash.com/2009/08/07/v-is-for-version-hell</id>
   <content type="html">&lt;p&gt;Versioning is quite difficult to deal with.  Versions are nearly-numbers, but
you can't quite sort them using standard numerical algorithms.&lt;/p&gt;

&lt;p&gt;While the following is true:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;1.1 &amp;lt; 1.2
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The following is also true:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;1.2 &amp;lt; 1.18 &amp;lt; 1.20
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &quot;.&quot; is not a decimal point but a separator.&lt;/p&gt;

&lt;p&gt;Mozilla uses a modestly complicated &lt;a href=&quot;https://developer.mozilla.org/en/Toolkit_version_format&quot;&gt;versioning system&lt;/a&gt; that involves stars,
plusses, and sometimes &quot;x&quot;.&lt;/p&gt;

&lt;p&gt;I found a very convoluted way to translate these versions into large integers.
The versions for applications in the AMO database have four parts at most, they
are potentially alpha or beta and potentially a pre-release.  In some cases we
have multiple versions represented with &lt;code&gt;.*&lt;/code&gt;, &lt;code&gt;.x&lt;/code&gt; or &lt;code&gt;+&lt;/code&gt; at the end.&lt;/p&gt;

&lt;!--more--&gt;


&lt;p&gt;The &lt;a href=&quot;https://developer.mozilla.org/en/Toolkit_version_format&quot;&gt;Toolkit docs&lt;/a&gt; let us translate &quot;+&quot; to mean &quot;pre-release of the next
version&quot;.  E.g. 1.0+ is 1.1pre0.  Since my primary purpose of all this is for
sorting, &lt;code&gt;.*&lt;/code&gt; and &lt;code&gt;.x&lt;/code&gt; may as well just be a very large &quot;version part.&quot;  Since
all the version parts I deal with are a maximum of 2-digits, I turned &lt;code&gt;.*&lt;/code&gt; and
&lt;code&gt;.x&lt;/code&gt; into &lt;code&gt;.99&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;3.5.* =&amp;gt; '03'+'05'+'99' =&amp;gt; 030599
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We also need to deal with versions that may be alpha, beta or not.  If
everything else is equal:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;3.5a &amp;lt; 3.5a5 &amp;lt; 3.5b &amp;lt; 3.5b2 &amp;lt; 3.5 &amp;lt; 3.5+
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We assign a single integer to represent a version's &quot;non-alphaness&quot;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;a =&amp;gt; 0
b =&amp;gt; 1
non alpha/beta =&amp;gt; 2
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We assume that &lt;code&gt;3.5a = 3.5a1&lt;/code&gt;.  Therefore:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;3.5a =&amp;gt; 3.5.0a1 =&amp;gt; '03'+'05'+'00'+'0'+'01' =&amp;gt; 030500001
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Similarly if it's a pre-release we assign a 0 or 1 to represent
&quot;non-pre-releaseness&quot;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;3.5a pre2 =&amp;gt; 3.5.0a1pre2
=&amp;gt; '03'+'05'+'00'+'0'+'01'+'0'+'02'
=&amp;gt; 030500001002
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So what does this get us?  Integers which we can use for comparison, sorting,
etc.  It's a one time calculation for each version and we can do some nice SQL
statements in AMO like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;mysql&amp;gt; SELECT version,version_int FROM appversions WHERE application_id = 1 ORDER BY version_int LIMIT 15;
+---------+--------------+
| version | version_int  |
+---------+--------------+
| 0.3     |  30000200100 |
| 0.6     |  60000200100 |
| 0.7     |  70000200100 |
| 0.7+    |  80000200000 |
| 0.8     |  80000200100 |
| 0.8+    |  90000200000 |
| 0.9     |  90000200100 |
| 0.9.0+  |  90100200000 |
| 0.9.1+  |  90200200000 |
| 0.9.2+  |  90300200000 |
| 0.9.3   |  90300200100 |
| 0.9.3+  |  90400200000 |
| 0.9.x   |  99900200100 |
| 0.9+    | 100000200000 |
| 0.10    | 100000200100 |
+---------+--------------+
15 rows in set (0.00 sec)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I can now index these integers using Sphinx and do some very easy searches for
addons based on version number.&lt;/p&gt;
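
&lt;p&gt;For the curious, here is a rough Python sketch of the conversion.  It is not the code AMO uses: &lt;code&gt;version_int&lt;/code&gt; is a made-up helper, and the field layout (four two-digit parts, an alpha/beta flag and number, then a pre-release flag and number) is inferred from the table above.  It assumes every version part fits in two digits.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re

# Hypothetical pattern; it covers only the version shapes discussed above.
_PATTERN = re.compile(
    r'^(\d+(?:\.\d+)*)'   # numeric parts, e.g. 0.9.3
    r'(\.[*x])?'          # optional trailing wildcard part
    r'(?:([ab])(\d*))?'   # optional alpha/beta marker and number
    r'(?:pre(\d*))?'      # optional pre-release marker and number
    r'(\+)?$'             # optional trailing plus
)


def version_int(version):
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError('unsupported version: %r' % version)
    numbers, wildcard, alpha, alpha_ver, pre_ver, plus = match.groups()
    parts = [int(p) for p in numbers.split('.')]
    is_pre = pre_ver is not None
    if plus:
        # '0.7+' sorts as a pre-release of the next version, '0.8pre0'.
        parts[-1] += 1
        is_pre = True
    if wildcard:
        # '.*' and '.x' become a very large version part.
        parts.append(99)
    parts = (parts + [0, 0, 0, 0])[:4]
    alpha_flag = {'a': 0, 'b': 1}.get(alpha, 2)
    # A bare 'a' or 'b' counts as a1 or b1.
    alpha_num = int(alpha_ver) if alpha_ver else (1 if alpha else 0)
    pre_num = int(pre_ver) if pre_ver else 0
    digits = ''.join('%02d' % p for p in parts)
    digits += '%d%02d' % (alpha_flag, alpha_num)
    digits += '%d%02d' % (0 if is_pre else 1, pre_num)
    return int(digits)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Sorting a list of version strings with &lt;code&gt;sorted(versions, key=version_int)&lt;/code&gt; should then give the same order as the query above.&lt;/p&gt;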
</content>
 </entry>
 

</feed>

