Dave Dash
2021-01-03T23:49:19+00:00
http://davedash.com/
Dave Dash
dd+atom1@davedash.com
Pinternationalization
2012-06-04T00:00:00+00:00
http://davedash.com/2012/06/04/pinternationalization
<p>We <a href="http://blog.pinterest.com/post/24408983821/pinterest-en-espanol">translated our site to <strong>Spanish</strong></a> and will <strong>continue to translate it into other languages</strong> in the future.</p>
<p>One day, I came into work and they said, “You are in charge of Internationalization now.” I won’t lie, it wasn’t the most exciting news–for most developers, <em>localization is a daunting task</em>. I brought this upon myself after emailing my fellow engineers about <a href="http://playdoh.readthedocs.org/en/latest/userguide/l10n.html#good-practices">how to do localization</a>. At <a href="http://webdev.mozilla.org">Mozilla WebDev</a> localization is a core-competency. You work closely with a team of volunteer translators and you appropriately extract a variety of messages. Mozilla even built some useful tools to help with this.</p>
<p>I adapted this process for Pinterest and I’ve come to <em>enjoy</em> localization.</p>
<h2 id="how-localization-works">How localization works.</h2>
<p>In general, localizing a web site involves a few steps:</p>
<ul>
<li><strong>Message marking:</strong> Any message on the site (e.g. “Hello Dave”, “Login”, “Repin”) has to be “marked” as localizable.</li>
<li><strong>Message extraction:</strong> We have a tool that extracts any messages found in our codebase and builds a translation template.</li>
<li><strong>Translation:</strong> This involves taking a file filled with English messages and adding a translation for each one in multiple languages.</li>
<li><strong>Compilation:</strong> Each message is compiled into a binary format that allows for fast translation lookups.</li>
</ul>
<p>It seems deceptively simple, but it can get very complicated as you’ll see.</p>
<h2 id="how-it-worked-at-mozilla">How it worked at Mozilla</h2>
<p>At Mozilla any text we wrote <em>had</em> to be localized. It’s very much a global
organization.</p>
<p>All text had to be wrapped in special tags.
These special tags served two purposes:</p>
<ol>
<li>They look up the translation in a message database.</li>
<li>They let our <a href="https://github.com/clouserw/tower">internal tools</a> find these messages, so our translators can translate them.</li>
</ol>
<p>Step 2 is what we call extraction. Some teams at Mozilla would automate this process and automatically email the localizers that the messages are ready for translation.</p>
<p>We had <a href="https://localize.mozilla.org/">a tool</a> that allowed volunteers to begin translating. These translators had leaders and the leaders worked with people at Mozilla to make sure the process was working.</p>
<p>The translated strings would automatically be saved to our code repository and we’d then compile them before we deployed a web site.</p>
<h2 id="how-it-works-for-now-at-pinterest">How it works (for now) at Pinterest</h2>
<p>The Mozilla process worked well, but there were a lot of <a href="http://mozweb.readthedocs.org/en/latest/l10n.html">awkward steps</a> that I didn’t want to replicate. Unfortunately we still had to mark up strings. It’s a lot more difficult to do this with a social networking site (versus the Mozilla web sites) because you get messages like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Bob and 6 others liked your pin.
</code></pre></div></div>
<p>This involved writing uniquely translatable messages for:</p>
<ul>
<li>Bob liked your pin.</li>
<li>Bob and 1 other liked your pin.</li>
<li>Bob and <code class="language-plaintext highlighter-rouge">n</code> others liked your pin.</li>
</ul>
<p>The hardest part was that Pinterest was built without translation in mind, so assumptions were made about how we could dynamically construct sentences. For example (pseudo-code):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>message = "Bob"
if others:
    message += " and n other"
    if n != 1:
        message += "s"
message += " liked your pin"
</code></pre></div></div>
<p>Localizing those four fragments (“Bob”, “and n other”, “s” and “liked your pin”) wouldn’t work as different languages have different rules for plurality, ordering of subjects and general sentence construction.</p>
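<p>One way to avoid stitching fragments together is to hand the translator whole sentences and let gettext pick the plural form. The sketch below uses Python’s standard <code class="language-plaintext highlighter-rouge">gettext</code> module. The function name <code class="language-plaintext highlighter-rouge">liked_message</code> and the placeholder names are made up for illustration, and <code class="language-plaintext highlighter-rouge">NullTranslations</code> stands in for a real compiled catalog, so it simply returns the English strings.</p>

```python
import gettext

# Assumption: a real site would load a compiled catalog for the user's
# locale here. NullTranslations just echoes the English source strings.
translations = gettext.NullTranslations()


def liked_message(name, others):
    """Build the notification as one whole, translatable sentence."""
    if others == 0:
        template = translations.gettext("%(name)s liked your pin.")
        return template % {"name": name}
    # ngettext receives both English forms plus the count; the catalog
    # decides which plural form the target language needs.
    template = translations.ngettext(
        "%(name)s and %(count)d other liked your pin.",
        "%(name)s and %(count)d others liked your pin.",
        others,
    )
    return template % {"name": name, "count": others}


print(liked_message("Bob", 6))
```

<p>Because each sentence is a single message with named placeholders, a translator is free to reorder the words or use more than two plural forms.</p>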
<p>I spent a lot of time correcting these types of messages as well as doing in-person code reviews in order to help other developers construct their own sentences. This is an ongoing process and probably the most difficult part of translating.</p>
<p>For a lot of these strings it was a really simple change; I employed a lot of <a href="https://gist.github.com/2474435">vim macros</a> to assist me (<code class="language-plaintext highlighter-rouge">set paste</code> is your friend).</p>
<p>A tool that could find unmarked messages in our codebase would have been immensely useful. Instead I <a href="https://gist.github.com/2586745">wrote a script</a> to build a “!!!YELLING!!!” translation which uppercases everything and adds exclamations. This is similar to a strategy that many teams use, including the <a href="http://micropipes.com/blog/2012/05/31/adding-a-debug-language-to-%C8%A7%E1%B8%93%E1%B8%93-%C7%BF%C6%9Es-%E1%B8%BF%C7%BFzill%C8%A7-%C7%BFr%C9%A0/">Firefox Add-ons</a> team, for finding untranslated text.</p>
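<p>The core of such a debug translation can be sketched in a few lines. This is not the linked script, just a minimal illustration; the function name <code class="language-plaintext highlighter-rouge">yell</code> and the placeholder pattern are assumptions. The one subtlety is leaving format placeholders alone, since uppercasing <code class="language-plaintext highlighter-rouge">%(name)s</code> would break string interpolation.</p>

```python
import re

# Assumption: messages use Python %-style placeholders such as %s, %d
# or %(name)s. The capturing group keeps them in the re.split() output.
PLACEHOLDER = re.compile(r"(%\([^)]+\)[sd]|%[sd])")


def yell(message):
    """Return a loud pseudo-translation that is easy to spot on a page."""
    parts = PLACEHOLDER.split(message)
    shouted = "".join(
        part if PLACEHOLDER.fullmatch(part) else part.upper()
        for part in parts
    )
    return "!!!" + shouted + "!!!"


print(yell("Hello %(name)s"))
```

<p>Any text that still shows up in lowercase with this “language” selected was never marked for translation.</p>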
<h3 id="translation">Translation</h3>
<p>My colleague, Sarah, has been building a translation team. Together we figured out how we wanted to start. We decided to hire native translators who are familiar with our site. The feedback and discussion we’ve received for Latin American Spanish has been great. From them we’re able to identify things that are difficult to translate, and help build better message strings and context so that a translator can effectively help write copy for our site.</p>
<p>Context can be screenshots (using <a href="http://getcloudapp.com/">CloudApp</a> heavily) or just a lengthy comment explaining where a word is used in the site.</p>
<p>To help facilitate the actual translation we employ <a href="http://transifex.net">Transifex</a> as a translation hub:</p>
<ul>
<li>We take our extracted message template and upload it to Transifex.</li>
<li>Transifex merges those strings into templates for each language.</li>
<li>Translators can download those language files and translate them, or they can translate them on the Transifex site directly.</li>
<li>We download the translated strings.</li>
</ul>
<p>We automate this process and upload our messages weekly and download translations twice daily. This means developers just need to worry about marking new strings; everything routine is done for them.</p>
<h3 id="lessons">Lessons</h3>
<p><strong>This process has gone well.</strong> We really liked focusing on <em>one language</em> to begin. It helped us narrow our focus. We learned that picking a <em>small set of translators</em> and eliminating as many of the levels as you can between engineer and translator is very useful. Keeping that loop tight allows you to keep translation quality high.</p>
<p>We had a lot of outside help. My former team, the web devs at Mozilla, specifically <a href="http://micropipes.com/blog/">Wil Clouser</a>, helped create some useful tools like <a href="https://github.com/clouserw/tower">tower</a>. This utilizes <a href="http://www.gnu.org/software/gettext/">gettext</a> and <a href="http://babel.edgewall.org/">babel</a> which actually make light work of supporting internationalization. A lot of people, including Dan from Dropbox and Dimitri from Transifex have given us a lot of great advice, both technically and operationally. We’ve also had a lot of help from translators, volunteers and friendly Pinterest users, so thanks!</p>
<p>I’m excited to have Pinterest available to Spanish speakers (European Spanish is coming soon), and I’m excited to continue to bring this to the rest of the world.</p>
An Adventure
2012-03-19T00:00:00+00:00
http://davedash.com/2012/03/19/an-adventure
<p>I have left the Mozilla Corporation this month. A lot of people have asked me
why I decided to leave. To some, a decision like that is unfathomable.
Mozilla has been my favorite place to work ever, so leaving was a difficult
choice. I’ve done some things I’m proud of:</p>
<ul>
<li>I built out the add-ons search (twice).</li>
<li>I helped rewrite our add-ons website in Django… and then all our sites.</li>
<li>I helped us ship solid code with continuous integration.</li>
<li>I helped grow our team with some solid people (about 20 people were brought
in or interviewed by me).</li>
<li>I helped speed up our recruitment process.</li>
<li>I helped smooth out <a href="http://mozweb.readthedocs.org/">our on-boarding process</a>.</li>
<li>I helped <a href="http://blog.mozilla.com/webdev/2011/09/28/pyladies-and-djangocon-2011-2/">get a few members of PyLadies to DjangoCon</a>.</li>
</ul>
<p>Mozilla made it very easy to do things like this. It encourages this type of
behavior. I’ve had a lot of good mentors, and a lot of great peers.</p>
<p>I also learned quite a bit of python. Hiring bright python engineers who are
happy to share their knowledge helps.</p>
<p>The folks at <a href="http://pinterest.com/">Pinterest</a> proposed an adventure for me. One where I’ll get
to learn quite a bit on the fly, but also take similar initiatives at making
engineering fun and productive. I’ll also be working on a product used by
many people, but more importantly a product that’s used by quite a few of my
friends.</p>
Yes, I like it too!
2012-03-07T00:00:00+00:00
http://davedash.com/2012/03/07/yes,-i-like-it-too!
<p><img src="http://www.asofterworld.com/clean/shades.jpg" alt="When I overhear someone say, That's fucking gay..." /></p>
<p>It’s time to grow up, Mozilla.</p>
TechShop Offsite
2012-03-01T00:00:00+00:00
http://davedash.com/2012/03/01/techshop-offsite
<p>For the Mozilla Flux Web Development team offsite, we went to TechShop and lasered some cool stuff.</p>
<div>
<a href="http://www.flickr.com/photos/morgamic/sets/72157629121855200/with/6796875162/" title="Build Your Dreams Here by morgamic, on Flickr"><img src="http://farm8.staticflickr.com/7209/6796875162_38de6bd2f9.jpg" width="500" height="331" alt="Build Your Dreams Here" /></a>
</div>
<p>TechShop staff was totally accommodating. It’s definitely a great place for anybody to check out, or any team to do a team building exercise.</p>
Naming things and a recursion
2012-01-05T00:00:00+00:00
http://davedash.com/2012/01/05/naming-things-and-a-recursion
<p>Most Mozilla webdev projects have an awful project structure, and it’s
partially my fault. I’m <a href="https://github.com/mozilla/playdoh/pull/67">attempting to fix that</a>, but I
cringe every time someone creates a new <a href="http://playdoh.rtfd.org/">playdoh</a>
(Mozilla’s Django template)
based project.</p>
<h3 id="the-typical-python-project">The typical python project</h3>
<p>Your typical python project looks like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/my_project
    someotherstuff/
    docs/
    theactualthingicareabout/
    setup.py
    LICENSE
</code></pre></div></div>
<h3 id="moztrosity">MOZtrosity</h3>
<p>We didn’t have a good guide when we first started writing Django projects, so
we opted for something like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>theactualthingicareabout/
    apps/
        foo/
        bar/
    __init__.py
    urls.py
    settings.py
    LICENSE
</code></pre></div></div>
<p>In other words, the Django project, which is a Python package, is the top level
of the checkout. If you check this out into a directory whose name isn’t a valid package name, e.g. you do something like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone github.com/davedash/myawesomeproject.git will.not.work\!
</code></pre></div></div>
<p>Bad things will happen.</p>
<h3 id="so">So?</h3>
<p>To some people this seems like an easy thing to work around, but when it takes
three of my excellent coworkers a week to diagnose an issue, where this ended
up being the root cause… well it becomes a higher priority issue.</p>
<p>So here’s what happened this week, when we tried to deploy
<a href="https://github.com/mozilla/lumbergh/">the new careers site</a> to a VM.</p>
<p>Our ops team sensibly checked out the project like so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/mozilla/lumbergh.git careers
</code></pre></div></div>
<p>They did everything right. Sure, they were creative and chose <code class="language-plaintext highlighter-rouge">careers</code> over
the default <code class="language-plaintext highlighter-rouge">lumbergh</code>, but they knew the shortcomings of our system and picked
a name that would resolve as a valid python package.</p>
<p>Unfortunately we’d hit some <em>recursion error</em> anytime we tried to hit a URL.
So we knew there was an issue with the URL resolver, but we couldn’t figure it
out.</p>
<p>Here’s what the project layout looked like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>careers/                # I could be called anything, but they chose careers
    __init__.py
    apps/
        careers/        # I'm going to cause problems,
                        # but neither devs nor ops will suspect a thing! mwahaha
            __init__.py
            models.py
            urls.py
            views.py
    settings.py
    urls.py
</code></pre></div></div>
<p>other files.</p>
<p>We configure our <code class="language-plaintext highlighter-rouge">apps/</code> directory to be part of our Python path (<code class="language-plaintext highlighter-rouge">sys.path</code>) so we can
do things like <code class="language-plaintext highlighter-rouge">from careers import views</code>… you can probably see where this
is going.</p>
<p>Here’s the main <code class="language-plaintext highlighter-rouge">urls.py</code> <a href="https://github.com/mozilla/lumbergh/blob/master/urls.py">1</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>...
urlpatterns = patterns('',
    (r'', include('careers.urls')),
)
...
</code></pre></div></div>
<p>The main <code class="language-plaintext highlighter-rouge">urls.py</code> includes <code class="language-plaintext highlighter-rouge">careers.urls</code> which, if you look at the above
project layout, can resolve to two different Python modules:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">careers/urls.py</code></li>
<li><code class="language-plaintext highlighter-rouge">careers/apps/careers/urls.py</code></li>
</ul>
<p>Python chose the first, and therefore <code class="language-plaintext highlighter-rouge">urls.py</code> kept calling upon itself.</p>
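<p>The shadowing is easy to reproduce outside of Django. The sketch below is only an illustration, not the actual careers site: it builds a throwaway directory tree with the same shape, puts both the parent of the checkout and <code class="language-plaintext highlighter-rouge">apps/</code> on <code class="language-plaintext highlighter-rouge">sys.path</code> (the ordering is an assumption), and asks Python which <code class="language-plaintext highlighter-rouge">careers.urls</code> it finds. The <code class="language-plaintext highlighter-rouge">WHICH</code> marker is made up for the demo.</p>

```python
import importlib
import os
import sys
import tempfile

# Recreate the layout: <root>/careers/ is the project checkout and
# <root>/careers/apps/careers/ is the app that shares its name.
root = tempfile.mkdtemp()
project = os.path.join(root, "careers")
app = os.path.join(project, "apps", "careers")
os.makedirs(app)

for package_dir, label in ((project, "project"), (app, "app")):
    open(os.path.join(package_dir, "__init__.py"), "w").close()
    with open(os.path.join(package_dir, "urls.py"), "w") as handle:
        handle.write("WHICH = %r\n" % label)

# Both directories are importable roots; the first entry wins.
sys.path[:0] = [root, os.path.join(project, "apps")]
importlib.invalidate_caches()

urls = importlib.import_module("careers.urls")
print(urls.WHICH)  # the project-level urls.py shadows the app's
```

<p>With the project’s own <code class="language-plaintext highlighter-rouge">urls.py</code> winning the lookup, an <code class="language-plaintext highlighter-rouge">include('careers.urls')</code> inside it points back at itself.</p>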
<h3 id="so-what-did-we-learn">So what did we learn?</h3>
<p>Do better.</p>
<p>First of all, we need a better project layout. This will continue to cause
problems for even the brightest developers.</p>
<p>Secondly, if you don’t do this, at least name apps carefully.
Django’s app model can be a bit much for
non third party apps. Sometimes there’s one app which spans the entire
project, and it’s tempting to call it the same name as the project
(e.g. <code class="language-plaintext highlighter-rouge">careers</code>), but sometimes a lamer, more generic name like <code class="language-plaintext highlighter-rouge">common</code> is
better.</p>
<p>But really, the second point is moot if we just clean up.</p>
Choosing a New Web Stack
2011-10-09T00:00:00+00:00
http://davedash.com/2011/10/09/choosing-a-new-web-stack
<p>I like building web sites, a lot. Usually every few years I need to re-evaluate
the stack I use for a side-project. Joshua’s
<a href="http://stackparts.com/">Stack Parts</a> site is handy for this.</p>
<p>At Mozilla Webdev we stick to redis + mysql + elasticsearch + celery + rabbitmq
+ memcache + git + <a href="/tutorial/virtualenv">virtualenv</a> + python + django + jinja2 + modwsgi + commander
+ puppet + apache + less + jquery as our go-to stack. It’s tried and true and
it’s been working and evolving for two years.</p>
<p>So I’m at re-evaluation time. The first element of the stack I needed to
decide upon was the web framework. I initially thought I’d use Django, and
maybe alternate a few supporting libraries just to color my experience. But
Flask caught my attention.</p>
<p>Flask is from Pocoo who have given me great things like:</p>
<ul>
<li>lodgeit</li>
<li>Werkzeug</li>
<li>Jinja2</li>
<li>Sphinx</li>
</ul>
<p>It was a microframework, which meant that it didn’t contain as many things as
Django, but at the same time, I didn’t use that much of Django.</p>
<p>Flask was a nice way to stay mostly in my comfort-zone, and in some ways, focus
me on just writing an app, and not working in a framework. Since it’s python,
if I start to miss Django, I can probably rewrite my code without too much
effort.</p>
<p>Overall I’m excited, and I just got past, “Hello World.”</p>
<p>I’m not sure what my stack will look like, but I imagine it will evolve into:</p>
<p>postgresql + memcache + git + flask + jinja2 + gunicorn + fabric + puppet +
nginx + less + backbonejs + jquery</p>
<p>This will give me a chance to learn more about things I’m interested in, and
utilize what I think might be better options along the stack.</p>
Interviewing: How You Think
2011-08-28T00:00:00+00:00
http://davedash.com/2011/08/28/interviewing:-how-you-think
<p>My friend and Web Dev colleague, James Socol, wrote a piece recently about
<a href="http://coffeeonthekeyboard.com/so-you-want-me-to-hire-you-606/">some hiring things</a>. I think it’s a good idea, and it would certainly make
my job easier. It reminded me that I’m putting off a few posts about hiring.</p>
<p>I want to touch a bit on the “build/implement/write an X” style technical
questions. For example:</p>
<ul>
<li>Implement a linked list</li>
<li>Write a url parser</li>
<li>Make an image resizer in JavaScript</li>
</ul>
<p>Often we get candidates who are exceptionally smart on paper, have great Github
accounts, but fall flat on the in person implementation.</p>
<p>For some people this is <em>hard</em>. So let’s get over the <em>nerves</em>. I’m not
going to tell you <strong>it’s silly to be nervous</strong> or that
<strong>I’m as nervous as you</strong> (I’m not), or
anything that you already know. I will tell you this. If you notice yourself
not thinking as sharply as you normally do, <strong>count backwards from 10</strong>.
Then carry on. Repeat if
necessary. Usually our interviews are 45 minutes long. 10 seconds of you
counting backwards isn’t going to kill us for time.</p>
<p>Hopefully this will calm you. If this doesn’t… well it was worth a shot.</p>
<p>So let’s get down to <a href="http://en.wiktionary.org/wiki/brass_tacks?rdfrom=Brass_tacks">brass tacks</a>. I want to know how you implement
things. I want to know <em>how you think</em>. I don’t care as much about the
answer, but the correct answer is comforting.</p>
<p>If I tell you “implement an image resizer in JavaScript” and you start diving
into JavaScript code and get flustered, I can’t really help you.
If you can do that
without getting flustered and produce something that works, I’ll be impressed,
but somewhat alarmed.</p>
<p>What I really want is you to <strong>step back from the problem</strong>.
I want you to show me what the resizer will look like and how it will behave.
You don’t need to go into too many details, but <em>enough details so I know you
understand the problem</em>.</p>
<p>If the problem is easily testable (like writing a url parser),
it wouldn’t hurt to have some tests to start.</p>
<p>Then I want you to write the code backwards from high level to low level.
Write the HTML, write the initialization code, and finally write the event
handlers.</p>
<p>This is not only easier to follow in an interview,
it’s basic problem solving,
because you end up breaking the problem into smaller chunks.</p>
DjangoCon Testing Tutorial
2011-08-10T00:00:00+00:00
http://davedash.com/2011/08/10/djangocon-testing-tutorial
<div class="side">
<img src="/static/images/2011/08/10/djangocon.png" width="287" height="184" alt="DjangoCon 2011" />
</div>
<p>If you want to learn all you can about testing anything in your Django App, see
<a href="http://djangocon.us/schedule/presentations/30/">my tutorial</a> at <a href="http://djangocon.us/">DjangoCon</a>.
It’s on September 5th and will be 3 hours long. With seven sign-ups so far,
it will be very hands-on.</p>
<p>Here’s what I think I will cover, but I may change this depending on what the
audience wants:</p>
<ul>
<li>Testing issues
<ul>
<li>ask people to fill out etherpad with issues they’ve run into</li>
<li>ask someone to rank them in order of complexity</li>
</ul>
</li>
<li>List an outline of topics
<ul>
<li>post them on etherpad</li>
<li>have people + them if they are interested</li>
</ul>
</li>
<li>Testing overview
<ul>
<li>We started in late 2009 or early 2010</li>
<li>Our largest project has 2500 tests</li>
<li>Our next largest has 1100</li>
<li>We have pretty good coverage</li>
</ul>
</li>
<li>How testing works in Django
<ul>
<li>I’m not 100% sure on this</li>
<li>Test runner sets up a new database</li>
<li>Test runner finds and runs tests</li>
<li>Tests run class setup</li>
<li>Test runs each test in a test case
<ul>
<li>Load fixtures</li>
<li>Tests run setup</li>
<li>Tests run the test</li>
<li>Tests run teardown</li>
</ul>
</li>
<li>Tests run class Teardown</li>
<li>You get an F if you’re bad and a . if you’re not.</li>
<li>Now that you know it, you can hack it.</li>
</ul>
</li>
<li>How we’ve hacked testing
<ul>
<li>2500 tests is a lot</li>
<li>We no longer recreate the database when you run the test suite</li>
<li>In each test case we just load the fixtures once.</li>
<li>We rearrange the tests so things with the same fixture set run together</li>
</ul>
</li>
<li>Testing tools that we use at Mozilla
<ul>
<li>nose/django_nose</li>
<li>nose plugins
<ul>
<li>nicedots</li>
<li>progressive</li>
</ul>
</li>
<li>coverage
<ul>
<li>git + whatchangedpy</li>
</ul>
</li>
</ul>
</li>
<li>Testing everything, no excuses
<ul>
<li>100% Coverage isn’t important</li>
<li>80% is nice</li>
<li>Good coverage on tricky things is important</li>
<li>Some coverage on everything is important</li>
<li>External</li>
<li>If you start depending on APIs, Search or different tools you need to be able to test for them.</li>
<li>Writing these test cases will take less time than this tutorial</li>
<li>It will save you so much headache in the future.</li>
<li>The same headaches you save yourself by writing “normal” tests</li>
<li>Mock easy things
<ul>
<li>use a decorator on any test/view that might use redis</li>
<li>if redis isn’t setup, use the mock client</li>
<li>mock client doesn’t support everything,
<ul>
<li>just what I need to get my tests running</li>
<li>feel free to extend it if you use it</li>
</ul>
</li>
<li>Testing Redis</li>
</ul>
</li>
<li>Setup/Teardown for complicated tools
<ul>
<li>Good for search and APIs</li>
<li>Raise SkipTest (nose) if the developer doesn’t want to run these tests</li>
<li>Non realtime tools
<ul>
<li>Testing Sphinx search</li>
<li>SetupClass
<ul>
<li>load fixtures</li>
<li>run indexer</li>
<li>run server</li>
</ul>
</li>
<li>Sphinx server now available for all tests in your test case</li>
<li>Teardown
<ul>
<li>stop server</li>
</ul>
</li>
</ul>
</li>
<li>Real time tools
<ul>
<li>Nicer, data can be added in post_save signals or elsewhere in your app</li>
<li>Testing LDAP
<ul>
<li>Setup
<ul>
<li>Remove LDAP files</li>
<li>Load an ldif</li>
<li>Start slapd</li>
</ul>
</li>
<li>Your code can now touch LDAP</li>
</ul>
</li>
<li>Testing ElasticSearch
<ul>
<li>We leave ES running all the time.</li>
<li>Setup
<ul>
<li>Checks for ES support or SkipTest</li>
<li>Deletes index</li>
<li>Creates index</li>
</ul>
</li>
<li>You can now read/write to ES</li>
<li>Teardown
<ul>
<li>Deletes index</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Fixtures
<ul>
<li>Fixture Magic</li>
<li>Model Maker</li>
</ul>
</li>
<li>pitfalls
<ul>
<li>dates</li>
<li>using PDB</li>
</ul>
</li>
</ul>
</li>
</ul>
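<p>The “class setup, skip if unavailable, class teardown” pattern from the outline can be sketched with nothing but the standard library. This uses <code class="language-plaintext highlighter-rouge">unittest.SkipTest</code> rather than nose, and the environment variable <code class="language-plaintext highlighter-rouge">RUN_SEARCH_TESTS</code>, the class name and the in-memory “index” are all stand-ins invented for the example; a real test case would start or reset the search backend instead.</p>

```python
import os
import unittest


class SearchTestCase(unittest.TestCase):
    """Expensive setup happens once per class, not once per test."""

    @classmethod
    def setUpClass(cls):
        # Assumption: developers opt in to the slow tests with an
        # environment variable. Everyone else gets a skip, not a failure.
        if not os.environ.get("RUN_SEARCH_TESTS"):
            raise unittest.SkipTest("search backend not configured")
        cls.index = {}  # stand-in for "delete index, create index"

    @classmethod
    def tearDownClass(cls):
        cls.index = None  # stand-in for "delete index" or "stop server"

    def test_roundtrip(self):
        self.index["doc"] = "value"
        self.assertEqual(self.index["doc"], "value")


def run(case):
    """Run one test case class and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

<p>Raising the skip in the class-level setup means none of the tests in the class run, and the teardown is not attempted either.</p>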
Better querying for ElasticSearch
2011-05-17T00:00:00+00:00
http://davedash.com/2011/05/17/better-querying-for-elasticsearch
<p>I wrote <a href="/2011/03/25/filter-queries-using-pyes/">about how to write filter queries using <code class="language-plaintext highlighter-rouge">pyes</code></a>.
Unfortunately after using ElasticSearch in <a href="http://builder.addons.mozilla.org">the Add-ons Builder</a>, I realized
that our code would become unwieldy and hard to read if we kept using straight
up <code class="language-plaintext highlighter-rouge">pyes</code>.</p>
<p>I prefer to write APIs that are natural and conform to how I think, not ones
that simply mirror another system.</p>
<p>So rather than this:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"> <span class="n">filters</span> <span class="o">=</span> <span class="p">[</span><span class="n">TermFilter</span><span class="p">(</span><span class="s">"platform"</span><span class="p">,</span> <span class="s">"all"</span><span class="p">),</span>
<span class="n">TermFilter</span><span class="p">(</span><span class="s">"product"</span><span class="p">,</span> <span class="s">"firefox"</span><span class="p">),</span>
<span class="n">TermFilter</span><span class="p">(</span><span class="s">"version"</span><span class="p">,</span> <span class="s">"4.0"</span><span class="p">)]</span>
<span class="nb">filter</span> <span class="o">=</span> <span class="n">ANDFilter</span><span class="p">(</span><span class="n">filters</span><span class="p">)</span>
<span class="n">q</span> <span class="o">=</span> <span class="n">FilteredQuery</span><span class="p">(</span><span class="n">MatchAllQuery</span><span class="p">(),</span> <span class="nb">filter</span><span class="p">).</span><span class="n">search</span><span class="p">()</span>
<span class="n">q</span><span class="p">.</span><span class="n">facet</span><span class="p">.</span><span class="n">add_term_facet</span><span class="p">(</span><span class="s">'type'</span><span class="p">)</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">es</span><span class="p">.</span><span class="n">search</span><span class="p">(</span><span class="n">q</span><span class="p">)</span></code></pre></figure>
<p>I made <a href="https://github.com/davedash/elasticutils/">something simpler</a>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"> <span class="kn">from</span> <span class="nn">elasticutils</span> <span class="kn">import</span> <span class="n">S</span>
<span class="n">results</span> <span class="o">=</span> <span class="p">(</span><span class="n">S</span><span class="p">(</span><span class="n">platform</span><span class="o">=</span><span class="s">'all'</span><span class="p">,</span> <span class="n">product</span><span class="o">=</span><span class="s">'firefox'</span><span class="p">,</span> <span class="n">version</span><span class="o">=</span><span class="s">'4.0'</span><span class="p">)</span>
<span class="p">.</span><span class="n">facet</span><span class="p">(</span><span class="s">'type'</span><span class="p">).</span><span class="n">get_results</span><span class="p">)</span></code></pre></figure>
<p>Here were the design thoughts:</p>
<ul>
<li>I wanted something easy to remember, <code class="language-plaintext highlighter-rouge">S</code> for search.</li>
<li>I wanted smart defaults: by default <code class="language-plaintext highlighter-rouge">S()</code> matches all documents, unless you
give it a query term.</li>
<li>I didn’t want to write python that looked like Java, or JSON or even a
<code class="language-plaintext highlighter-rouge">dict</code>.</li>
<li>I wanted to write something that felt like the Django-ORM</li>
<li>Ultimately I want code that I enjoy writing.</li>
</ul>
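<p>To make the “feels like the Django ORM” idea concrete, here is a toy chainable builder. It is not the ElasticUtils implementation; the class <code class="language-plaintext highlighter-rouge">MiniS</code> and its <code class="language-plaintext highlighter-rouge">describe()</code> method are hypothetical, and it only collects filters and facets into a plain dictionary instead of talking to ElasticSearch.</p>

```python
class MiniS:
    """Toy query builder: every call returns a new object, so chains
    never modify the query they started from (as with Django QuerySets)."""

    def __init__(self, **filters):
        self.filters = dict(filters)
        self.facets = []

    def _clone(self):
        clone = MiniS(**self.filters)
        clone.facets = list(self.facets)
        return clone

    def filter(self, **filters):
        clone = self._clone()
        clone.filters.update(filters)
        return clone

    def facet(self, field):
        clone = self._clone()
        clone.facets.append(field)
        return clone

    def describe(self):
        # Hypothetical stand-in for building the real search request.
        return {"filters": dict(self.filters), "facets": list(self.facets)}


base = MiniS(platform="all", product="firefox")
query = base.filter(version="4.0").facet("type")
print(query.describe())
```

<p>Keyword arguments keep the call site short, and returning copies lets a base query be reused safely across views.</p>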
<p>So here it is, I expect it to power Firefox Add-ons, the Add-ons Builder and
Firefox Input shortly.</p>
<p>This is all part of <a href="https://github.com/davedash/elasticutils/">ElasticUtils</a>.
Let me know if you are using it, and pull requests are welcome!</p>
Filter Queries using pyes
2011-03-25T00:00:00+00:00
http://davedash.com/2011/03/25/filter-queries-using-pyes
<p>I’ve been having a tough time navigating the Elastic Search docs, but some
sleuthing in the test suite for <code class="language-plaintext highlighter-rouge">pyes</code> has proved helpful.</p>
<p>If I have documents that I’d like filtered by let’s say <code class="language-plaintext highlighter-rouge">product</code>, <code class="language-plaintext highlighter-rouge">version</code>
and <code class="language-plaintext highlighter-rouge">platform</code>, I can construct a query like so:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"> <span class="n">filters</span> <span class="o">=</span> <span class="p">[</span><span class="n">TermFilter</span><span class="p">(</span><span class="s">"platform"</span><span class="p">,</span> <span class="s">"all"</span><span class="p">),</span>
<span class="n">TermFilter</span><span class="p">(</span><span class="s">"product"</span><span class="p">,</span> <span class="s">"firefox"</span><span class="p">),</span>
<span class="n">TermFilter</span><span class="p">(</span><span class="s">"version"</span><span class="p">,</span> <span class="s">"4.0"</span><span class="p">)]</span>
<span class="nb">filter</span> <span class="o">=</span> <span class="n">ANDFilter</span><span class="p">(</span><span class="n">filters</span><span class="p">)</span>
<span class="n">q</span> <span class="o">=</span> <span class="n">FilteredQuery</span><span class="p">(</span><span class="n">MatchAllQuery</span><span class="p">(),</span> <span class="nb">filter</span><span class="p">)</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">es</span><span class="p">.</span><span class="n">search</span><span class="p">(</span><span class="n">q</span><span class="p">)</span></code></pre></figure>
<p>There is perhaps a more succinct way of doing this, but this serves my
purposes.</p>
<p>Let’s say you need facets as well:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"> <span class="n">filters</span> <span class="o">=</span> <span class="p">[</span><span class="n">TermFilter</span><span class="p">(</span><span class="s">"platform"</span><span class="p">,</span> <span class="s">"all"</span><span class="p">),</span>
<span class="n">TermFilter</span><span class="p">(</span><span class="s">"product"</span><span class="p">,</span> <span class="s">"firefox"</span><span class="p">),</span>
<span class="n">TermFilter</span><span class="p">(</span><span class="s">"version"</span><span class="p">,</span> <span class="s">"4.0"</span><span class="p">)]</span>
<span class="nb">filter</span> <span class="o">=</span> <span class="n">ANDFilter</span><span class="p">(</span><span class="n">filters</span><span class="p">)</span>
<span class="n">q</span> <span class="o">=</span> <span class="n">FilteredQuery</span><span class="p">(</span><span class="n">MatchAllQuery</span><span class="p">(),</span> <span class="nb">filter</span><span class="p">).</span><span class="n">search</span><span class="p">()</span>
<span class="n">q</span><span class="p">.</span><span class="n">facet</span><span class="p">.</span><span class="n">add_term_facet</span><span class="p">(</span><span class="s">'type'</span><span class="p">)</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">es</span><span class="p">.</span><span class="n">search</span><span class="p">(</span><span class="n">q</span><span class="p">)</span></code></pre></figure>
How we slug at Mozilla
2011-03-24T00:00:00+00:00
http://davedash.com/2011/03/24/how-we-slug-at-mozilla
<p>One problem we find with slug generators is that they do an awful job with Unicode.
For a string like this: <code class="language-plaintext highlighter-rouge">Bän...g (bang)</code> you get something like
<code class="language-plaintext highlighter-rouge">bng---g--bang-</code> or at best <code class="language-plaintext highlighter-rouge">bang-bang</code>. But it’s 2011, and URLs can have
Unicode. Here’s what we really want: <code class="language-plaintext highlighter-rouge">bäng-bang</code>.</p>
<p>In some cases transliteration might be acceptable. But if we look at Django’s
approach it fails at Russian. Here’s a comparison with ours for the Russian
phrase “Быстрее и лучше!” (“Faster and better!”):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> from django.template.defaultfilters import slugify as djslugify
>>> from slugify import slugify
>>> str = u'Быстрее и лучше!'
>>> print djslugify(str)
>>> print slugify(str)
быстрее-и-лучше
</code></pre></div></div>
<p>As you can see, the built-in Django <code class="language-plaintext highlighter-rouge">slugify</code> could be disastrous: here it prints nothing at all. So take
a look at <a href="https://github.com/mozilla/unicode-slugify">ours</a>. If you have some more test cases, please fork it.</p>
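<p>To make the idea concrete, here is a minimal Python 3 sketch of a Unicode-preserving slug function. It is not the code from <code class="language-plaintext highlighter-rouge">unicode-slugify</code>; the name <code class="language-plaintext highlighter-rouge">unicode_slugify</code> and the exact rules (drop punctuation, turn whitespace into hyphens) are assumptions chosen to reproduce the two examples above.</p>

```python
import re
import unicodedata


def unicode_slugify(text):
    """Hypothetical helper: build a slug while keeping non-ASCII letters."""
    # Normalize so that composed and decomposed forms of "ä" compare equal.
    text = unicodedata.normalize("NFKC", text).lower()
    # Drop everything that is not a letter, digit, underscore, space or hyphen.
    # In Python 3, \w matches Unicode letters such as "ä" and "ы".
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse runs of whitespace, underscores and hyphens into one hyphen.
    text = re.sub(r"[\s_-]+", "-", text)
    return text.strip("-")


print(unicode_slugify("Bän...g (bang)"))
print(unicode_slugify("Быстрее и лучше!"))
```

<p>The key design choice is to filter by Unicode character class rather than by an ASCII whitelist, so letters from any script survive.</p>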
Data Anonymous
2011-03-02T00:00:00+00:00
http://davedash.com/2011/03/02/data-anonymous
<p>I wrote a simple database <a href="https://github.com/davedash/mysql-anonymous">scrubber script</a>. It takes a <code class="language-plaintext highlighter-rouge">yaml</code> file that
describes what scrubbing needs doing and then outputs <code class="language-plaintext highlighter-rouge">sql</code> that you can send
to <code class="language-plaintext highlighter-rouge">mysql</code>. It’s dreadfully simple and I’d like to see if others can make use
of it.</p>
<p>At Mozilla we have a lot of contributors and would like them to have access to
realistic data since many of our bugs are based on certain states within the
data.</p>
Bulk load ElasticSearch using pyes
2011-02-25T00:00:00+00:00
http://davedash.com/2011/02/25/bulk-load-elasticsearch-using-pyes
<p>When indexing a lot of data, you can save time by bulk loading data.</p>
<p>With <code class="language-plaintext highlighter-rouge">pyes</code> you can do the following:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyes</span> <span class="kn">import</span> <span class="n">ES</span>
<span class="n">es</span> <span class="o">=</span> <span class="n">ES</span><span class="p">()</span>
<span class="n">es</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="s">'my-index'</span><span class="p">,</span> <span class="s">'my-type'</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">es</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="s">'my-index'</span><span class="p">,</span> <span class="s">'my-type'</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">es</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="s">'my-index'</span><span class="p">,</span> <span class="s">'my-type'</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">es</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="s">'my-index'</span><span class="p">,</span> <span class="s">'my-type'</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span></code></pre></figure>
<p>This will make 4 independent network calls.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyes</span> <span class="kn">import</span> <span class="n">ES</span>
<span class="n">es</span> <span class="o">=</span> <span class="n">ES</span><span class="p">()</span>
<span class="n">es</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="s">'my-index'</span><span class="p">,</span> <span class="s">'my-type'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">bulk</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">es</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="s">'my-index'</span><span class="p">,</span> <span class="s">'my-type'</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">bulk</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">es</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="s">'my-index'</span><span class="p">,</span> <span class="s">'my-type'</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">bulk</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">es</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="s">'my-index'</span><span class="p">,</span> <span class="s">'my-type'</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">bulk</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">es</span><span class="p">.</span><span class="n">refresh</span><span class="p">()</span></code></pre></figure>
<p>With <code class="language-plaintext highlighter-rouge">bulk=True</code>, this will happen in one call. This is handy for those “reindex all the items we
can” weekends.</p>
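<p>Why this saves time can be shown with a toy model. The class below is not <code class="language-plaintext highlighter-rouge">pyes</code> code and makes no claim about its internals; <code class="language-plaintext highlighter-rouge">ToyIndexer</code> and its methods are hypothetical stand-ins that only count simulated network calls.</p>

```python
class ToyIndexer:
    """Hypothetical stand-in for a search client; counts simulated calls."""

    def __init__(self):
        self.network_calls = 0
        self.pending = []   # documents waiting for the next bulk request
        self.stored = []    # documents the "server" has received

    def _send(self, docs):
        # One simulated round trip, however many documents it carries.
        self.network_calls += 1
        self.stored.extend(docs)

    def index(self, doc, bulk=False):
        if bulk:
            self.pending.append(doc)
        else:
            self._send([doc])

    def flush(self):
        # Ship everything that was queued in a single request.
        if self.pending:
            self._send(self.pending)
            self.pending = []


one_by_one = ToyIndexer()
for doc_id in range(1, 5):
    one_by_one.index({"id": doc_id})

batched = ToyIndexer()
for doc_id in range(1, 5):
    batched.index({"id": doc_id}, bulk=True)
batched.flush()

print(one_by_one.network_calls, batched.network_calls)
```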
Installing ElasticSearch plugins
2011-02-24T00:00:00+00:00
http://davedash.com/2011/02/24/installing-elasticsearch-plugins
<p>I’m slowly trying to familiarize myself with ElasticSearch and the <code class="language-plaintext highlighter-rouge">pyes</code>
python interface. ElasticSearch uses a lot of plugins, and while the plugin
system is easy to use, it’s not obvious where to find the plugins.</p>
<p>They are <a href="http://elasticsearch.googlecode.com/svn/plugins/">here</a>.</p>
<p>If you want to install the attachments plugin, you can do:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bin/plugin install mapper-attachments
</code></pre></div></div>
<p>And voilà it’s installed.</p>
Interviews with Fibonacci
2011-01-28T00:00:00+00:00
http://davedash.com/2011/01/28/interviews-with-fibonacci
<p>I have the privilege of interviewing many of the people who wish to be web
developers at Mozilla. I unfortunately witness
<a href="http://www.codinghorror.com/blog/2007/02/why-cant-programmers-program.html">programmers not being able to program</a>. My interviews are 45 minutes
long. I tend to ask people about their history, experiences and about what
they know. Then I give them some time to write some code.</p>
<h3 id="my-question">My Question</h3>
<p>I use questions relating to the Fibonacci sequence. E.g.:</p>
<ul>
<li>Print <code class="language-plaintext highlighter-rouge">n</code> numbers of the Fibonacci sequence</li>
<li>Give me a <code class="language-plaintext highlighter-rouge">list</code> (if the person knows python) of <code class="language-plaintext highlighter-rouge">n</code> numbers of the Fibonacci
sequence.</li>
<li>In rare cases, print the <code class="language-plaintext highlighter-rouge">n</code>th number of the Fibonacci sequence.</li>
</ul>
<p>In some cases I dumb it down… a lot:</p>
<blockquote>
<p>Give me <code class="language-plaintext highlighter-rouge">n</code> integers. In other words <code class="language-plaintext highlighter-rouge">f(5) = [1 1 1 1 1]</code>. I just want to
know if you can write a <code class="language-plaintext highlighter-rouge">for</code>-loop.</p>
</blockquote>
<p>Here’s the sequence:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1 1 2 3 5 8 13 21 34 .. k[n-2] k[n-1] k[n]
</code></pre></div></div>
<p>The <strong>first two numbers are <code class="language-plaintext highlighter-rouge">1</code> and <code class="language-plaintext highlighter-rouge">1</code></strong>. Each successive number is the
<strong>sum of the two preceding numbers</strong>. In other words:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>k[n] = k[n-2] + k[n-1]
</code></pre></div></div>
<p>I don’t care too much about efficiency. I often don’t care if it works, if
they’re trying.</p>
<h3 id="this-is-a-low-bar">This is a low bar</h3>
<p>A good <strong>number of people solve this with no issues</strong>. They calculate the
sequence and print out the numbers, or store them in a <code class="language-plaintext highlighter-rouge">list</code>. I run the code,
we optimize it slightly, we move on.</p>
<p>Unfortunately, <strong>I expect <em>everyone</em> I interview to easily solve this</strong>. Quite
honestly, <strong>if they can’t solve this they should not have even made it past the
phone screen</strong>. It’s harsh and the only exception is pressure.</p>
<h3 id="pressure">Pressure</h3>
<p>Interview pressure is a real thing:</p>
<ul>
<li>Interviewees at Mozilla are generally excited by the prospect.</li>
<li>People want to appear intelligent to their peers.</li>
<li>They had too much coffee at their hotel. There’s a lot to think about,
<strong>it’s not a natural environment</strong>.</li>
</ul>
<p>I try to solve this by using a comfortable room with couches. I also refrain
from yelling at the candidates like a Tiger Mom. Being nervous, however, is
difficult.</p>
<p>This problem gives a lot of people coder’s block.</p>
<p>My advice to anyone doing technical interviews:</p>
<blockquote>
<p><em>Practice.</em></p>
</blockquote>
<p>Have your friends, family, fiancé, whoever ask you programming questions. You
can compile a huge list, and just have them pick one at random. Make the
process seem natural.</p>
<h3 id="excitement">Excitement</h3>
<p>The best <strong>people who solve these problems</strong> are the ones who
<strong>think problems are exciting</strong>. They enjoy a challenge. They also find the
interview exciting, and they <strong>enter with a good attitude</strong>.</p>
<p>It’s <strong>not about intelligence</strong>. Intelligence is rarely a factor. It’s about
effort, and effort is easy to exert if you find something exciting.</p>
<p>If they are excited to ace a simple interview question, I can usually rely on
them to do their jobs, and in many cases turn to them to come up with solutions
to problems I might be running into.</p>
<p>The excited interviewee asks about what the solution should entail, and the
rules:</p>
<ul>
<li>Are we starting at zero or one?</li>
<li>Do you want a <code class="language-plaintext highlighter-rouge">list</code>, or a generator?</li>
<li>Do you want to print this out?</li>
</ul>
<p>They eagerly churn out code. They check it with some basic tests. They let me
know it works with confidence. Occasionally they stumble and sometimes come
out with the wrong answer, but with some assistance they can usually figure
it out. I can use their failures as a segue to talk about testing and quality
assurance.</p>
<h3 id="how-do-you-think">How do you think?</h3>
<p>I don’t ask this problem because I’m dying to know the 12th number of the
Fibonacci sequence. I already know that’s 144. I want to know how people
solve problems. How do they take a set of requirements and implement them?</p>
<p>Our jobs as software developers are to take requirements or problems, find a
suitable solution and write code to solve it.</p>
<p>This is my approach. I know the first number is 1 and the next number is 1,
therefore I can store those and get the third number by adding them. I shift
the variables around in a for loop and I can get this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cur = next = 1
for 1..n:
    print cur
    last = cur
    cur = next
    next += last
</code></pre></div></div>
<p>That’s just pseudo-code. I’m not a math and/or computer science genius. I
can, however, take requirements and imagine how I would solve them if I had to
do it by hand, and then transcribe that process.</p>
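<p>For reference, the pseudo-code above translates almost line for line into runnable Python. This is just one possible version; the function name <code class="language-plaintext highlighter-rouge">fibonacci</code> and the choice to return a <code class="language-plaintext highlighter-rouge">list</code> are mine.</p>

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1."""
    numbers = []
    cur, nxt = 1, 1
    for _ in range(n):
        numbers.append(cur)
        # Shift the window forward: the old "next" becomes current,
        # and the new "next" is the sum of the previous pair.
        cur, nxt = nxt, cur + nxt
    return numbers


print(fibonacci(12))
```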
<p>Likewise if someone tells me, we need to know the average rating that people
gave Firefox in the area of start-up time each day, I can figure out that
problem and then implement it using software.</p>
<p>So now I have to preface my interviews with, “If you read my post about the
Fibonacci sequence, let me know.”</p>
Testing Redis in Django
2010-12-07T00:00:00+00:00
http://davedash.com/2010/12/07/testing-redis-in-django
<p>For the <a href="http://addons.mozilla.org/en-US/firefox/">Firefox Add-ons</a> site we’ve been using <a href="http://code.google.com/p/redis/">redis</a> here and there, mostly
for cache, but lately for a few things we’d love to persist.</p>
<p>Unfortunately relying on redis does mean we need to be able to test it. Since
redis touches some of our core components of the site, we can’t just raise a
<code class="language-plaintext highlighter-rouge">SkipTest</code> like we would for Sphinx search related tests. I also don’t want to
rely on our developers to have redis installed in order to run the
test-suite.</p>
<p>So I built a simple <a href="https://github.com/mozilla/nuggets/blob/master/redisutils.py#L47">Mock Redis client</a>. It’s part of our
<code class="language-plaintext highlighter-rouge">redisutils.py</code> that handles connections to redis. If a test’s <code class="language-plaintext highlighter-rouge">setUp</code> method
calls <code class="language-plaintext highlighter-rouge">mock_redis</code> you’ll get this phony object that can do a few minimal
redis-like operations.</p>
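<p>To give a feel for the approach, here is a rough, dictionary-backed sketch of such a phony client. It is not the class from <code class="language-plaintext highlighter-rouge">redisutils.py</code>; the name <code class="language-plaintext highlighter-rouge">TinyMockRedis</code> and the handful of methods it supports are assumptions picked for illustration.</p>

```python
class TinyMockRedis:
    """Hypothetical in-memory stand-in for a few redis-like operations."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        # Like redis, a missing key yields None rather than an error.
        return self._data.get(key)

    def sadd(self, key, *values):
        # Sets are created on first use.
        self._data.setdefault(key, set()).update(values)

    def smembers(self, key):
        return set(self._data.get(key, set()))

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


redis = TinyMockRedis()
redis.set("greeting", "hello")
redis.sadd("addon:3615:tags", "shopping", "tabs")
print(redis.get("greeting"), sorted(redis.smembers("addon:3615:tags")))
```

<p>A test’s <code class="language-plaintext highlighter-rouge">setUp</code> can hand an object like this to the code under test, so no redis server has to be running.</p>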
<p>It works great for our specific cases, but feel free to fork it and make it
better.</p>
<p>Note: This <code class="language-plaintext highlighter-rouge">MockRedis</code> is specifically designed to work with <a href="http://www.djangoproject.com/">django</a>.</p>
Pythonic string formatting in Javascript
2010-11-19T00:00:00+00:00
http://davedash.com/2010/11/19/pythonic-string-formatting-in-javascript
<p>We do a lot of string manipulation on the <a href="https://addons.mozilla.org/">Firefox Addons</a> site. A lot of
it has to do with localization, so one thing that comes up is being able to
format strings. Here’s a little snippet to give yourself Python-like string
formatting:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"> <span class="cm">/* Python(ish) string formatting:
* >>> format('{0}', ['zzz'])
* "zzz"
* >>> format('{x}', {x: 1})
* "1"
*/</span>
<span class="kd">function</span> <span class="nx">format</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span> <span class="nx">args</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">re</span> <span class="o">=</span> <span class="sr">/</span><span class="se">\{([^</span><span class="sr">}</span><span class="se">]</span><span class="sr">+</span><span class="se">)\}</span><span class="sr">/g</span><span class="p">;</span>
<span class="k">return</span> <span class="nx">s</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="nx">re</span><span class="p">,</span> <span class="kd">function</span><span class="p">(</span><span class="nx">_</span><span class="p">,</span> <span class="nx">match</span><span class="p">){</span> <span class="k">return</span> <span class="nx">args</span><span class="p">[</span><span class="nx">match</span><span class="p">];</span> <span class="p">});</span>
<span class="p">}</span></code></pre></figure>
Using git to borrow from the future
2010-11-09T00:00:00+00:00
http://davedash.com/2010/11/09/using-git-to-borrow-from-the-future
<p>One of the great features of <code class="language-plaintext highlighter-rouge">git</code> is the ability to re-order commits, break
commits into parts, and merge commits together.</p>
<p>Assuming that my <code class="language-plaintext highlighter-rouge">master</code> branch is a pristine copy of the site and an ancestor
of <code class="language-plaintext highlighter-rouge">mybranch</code> we can re-order commits by running:</p>
<p><code class="language-plaintext highlighter-rouge">git rebase -i master</code></p>
<p>This will take all the commits in your current branch (<code class="language-plaintext highlighter-rouge">mybranch</code>) that are
built upon <code class="language-plaintext highlighter-rouge">master</code> and allow you to reorder or edit them individually,
remove them or squash them.</p>
<p>For example you might get:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pick 123abcd New feature supreme
pick 123abce Whitepsace fixes
pick 2222222 Rename functions.
pick 123abcf rebase me
</code></pre></div></div>
<p>The last commit listed is the latest commit and is where <code class="language-plaintext highlighter-rouge">mybranch</code>’s <code class="language-plaintext highlighter-rouge">HEAD</code>
points to.</p>
<p>You can edit this like so:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pick 123abce Whitepsace fixes
pick 2222222 Rename functions.
pick 123abcd New feature supreme
f 123abcf rebase me
</code></pre></div></div>
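<p>This will re-order history so the first three commits are applied in the new order and the “rebase me”
commit just gets rolled into “New feature supreme” (<code class="language-plaintext highlighter-rouge">f</code> is short for <code class="language-plaintext highlighter-rouge">fixup</code>). Note that since this is a
rebase, the commit hashes will change. Let’s say history is now this (reverse
chronological):</p>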
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>323abcd New feature supreme
3222222 Rename functions.
323abce Whitepsace fixes
</code></pre></div></div>
<p>Great? Almost.</p>
<p>There’s a likelihood that your app’s unit tests will not pass after the commit
“Rename functions”. Some functions may have been renamed somewhere later in
<code class="language-plaintext highlighter-rouge">mybranch</code> possibly in “rebase me” which is now a part of “New feature
supreme.”</p>
<p>This is a mess, but we can run <code class="language-plaintext highlighter-rouge">git rebase -i master</code> again and edit the
“Rename functions.” commit. If you run a test-suite and things fail you can
“borrow from the future”. You see at this point <code class="language-plaintext highlighter-rouge">Rename functions</code> is
something that happened in the past. <code class="language-plaintext highlighter-rouge">mybranch</code>’s head is now
<code class="language-plaintext highlighter-rouge">New feature supreme</code> which is the future. We can pick and choose little
changes with some <code class="language-plaintext highlighter-rouge">git</code>-fu.</p>
<p>While editing the <code class="language-plaintext highlighter-rouge">Rename functions.</code> commit, we might notice that we forgot to
rename a call, but we remembered to rename this call at some point in
<code class="language-plaintext highlighter-rouge">mybranch</code>. We can simply do this:</p>
<p><code class="language-plaintext highlighter-rouge">git checkout -p mybranch [paths]</code></p>
<p>This will let you interactively select chunks of code from the head of
<code class="language-plaintext highlighter-rouge">mybranch</code> and put it into your specific commit <code class="language-plaintext highlighter-rouge">Rename functions.</code>.</p>
<p>You can narrow this down by specifying some <code class="language-plaintext highlighter-rouge">paths</code>.</p>
<p>Once we finish rebasing we’ll have a commit history that is logically ordered
and passes all tests.</p>
Faceted Search on Input
2010-10-29T00:00:00+00:00
http://davedash.com/2010/10/29/faceted-search-on-input
<p>So one trick with <a href="http://sphinxsearch.com/">Sphinx search</a> is <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>. It’s somewhat
crudely implemented, by batching queries together, but does the job well. In
the case of <a href="http://input.mozilla.com/">Firefox Input</a> it eliminates quite a few queries (our
search result pages now take one batched Sphinx query and one database query
instead of 5 database queries).</p>
<div class="side">
<a href="http://www.flickr.com/photos/davedash/5126379671/" title="Add-on Search Results for shopping :: Add-ons for Firefox">
<img src="http://farm5.static.flickr.com/4041/5126379671_33b3e472d5_m.jpg" width="240" height="172" alt="Add-on Search Results for shopping" /></a>
</div>
<p>Faceted search is search with filters to help narrow down a result set. I’ll
give you three examples: <a href="http://addons.mozilla.org/">Firefox Add-ons</a>, which I wrote;
<a href="http://www.sittercity.com/search-sitters.html?ct=101&zip=95126">Sitter City</a>, which gives you a lot of ways to narrow down to the perfect
babysitter; and <a href="http://ebay.com/">ebay</a>, which lets you narrow down auction items.</p>
<p>For <a href="http://input.mozilla.com/">Input</a> we ask for the following when we do a search:</p>
<ul>
<li>How many opinions match the term for which we are searching, taking into
account any preferences we have already specified (feeling, locale, operating
system, date range, etc.)?</li>
<li>How many opinions show a positive sentiment, and how many show a negative
sentiment?</li>
<li>What is the breakdown of languages for the opinion results (i.e. how many
are en-US, de, fr, etc.)?</li>
<li>How many people are on Mac, Linux or Windows?</li>
</ul>
<p>We can batch these four queries into a single Sphinx request.</p>
<p>Here’s <a href="http://github.com/davedash/reporter/commit/348018">our implementation</a>.</p>
<p>Having done this twice, I do recognize that there is a lot of room for making
the code a bit more reusable. But overall it runs fairly well.</p>
Counting Sphinx groupBy Queries
2010-10-15T00:00:00+00:00
http://davedash.com/2010/10/15/counting-sphinx-groupby-queries
<p>I quickly implemented Sphinx on Input. While revisiting it, I saw that we try
to answer this type of question:</p>
<blockquote>
<p>Of the results displayed:</p>
<ul>
<li>How many are happy and how many are sad?</li>
<li>How many are for Windows, Linux or Mac?</li>
<li>How many are for English, French or Japanese?</li>
</ul>
</blockquote>
<p>Answering these involves faceted search. Unfortunately this is a bit
awkward to do using Sphinx. For the first example, happy or sad, you would have
to run the query like so:</p>
<ol>
<li>Take the query, remove any filters on <em>happiness</em> and do a group by on
happy opinions</li>
<li>Restore any filters on happiness and run the query as normal.</li>
<li>Return both the results, and the aggregate data from step 1.</li>
</ol>
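<p>The three steps above can be sketched in plain Python over an in-memory list, without Sphinx at all. Everything here is made up for illustration: the sample <code class="language-plaintext highlighter-rouge">OPINIONS</code>, the helper <code class="language-plaintext highlighter-rouge">run_query</code> and the function <code class="language-plaintext highlighter-rouge">search_with_facet</code> are hypothetical names, and substring matching stands in for real full-text search.</p>

```python
from collections import Counter

# Made-up sample data standing in for the search index.
OPINIONS = [
    {"text": "firefox starts fast", "feeling": "happy", "os": "mac"},
    {"text": "firefox crashes a lot", "feeling": "sad", "os": "windows"},
    {"text": "firefox tabs are great", "feeling": "happy", "os": "windows"},
    {"text": "thunderbird is fine", "feeling": "happy", "os": "linux"},
]


def run_query(term, filters):
    """Return opinions containing term that match every attribute filter."""
    return [
        o for o in OPINIONS
        if term in o["text"] and all(o[a] == v for a, v in filters.items())
    ]


def search_with_facet(term, filters, facet):
    # Step 1: drop any filter on the facet attribute and group by it.
    without_facet = {a: v for a, v in filters.items() if a != facet}
    counts = Counter(o[facet] for o in run_query(term, without_facet))
    # Step 2: restore the filter and run the query as normal.
    results = run_query(term, filters)
    # Step 3: return both the results and the aggregate data.
    return results, dict(counts)


results, feelings = search_with_facet("firefox", {"feeling": "happy"}, "feeling")
print(len(results), feelings)
```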
<p>Doing the group by is easy, but you only get to know how many feelings there
are and what they are. In our case: happy and sad. What we really want is
how many of our original search results were happy and how many were sad.</p>
<p>I assumed something like this would work:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sphinx.SetSelect('feeling, @count')
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">@count</code> is one of those magic variables that Sphinx uses. Unfortunately this
doesn’t work. <code class="language-plaintext highlighter-rouge">COUNT(*)</code> doesn’t work either. Here’s what did:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sphinx.SetSelect('feeling, SUM(1) AS count')
</code></pre></div></div>
<p>Not the straightforward MySQL-ish syntax I’ve come to expect from Sphinx, but
it works.</p>
Trimming Whitespace in Django Forms
2010-08-18T00:00:00+00:00
http://davedash.com/2010/08/18/trimming-whitespace-in-django-forms
<p>I’ve been using frameworks for a number of years. So I expect a lot of things
to happen “for free” in Django. One is whitespace removal. In <a href="http://delicious.com/">Delicious</a>
we had a lot of data in our database with leading and trailing whitespace. On
the frontend we moved to symfony (actually ysymfony) and that prevented a lot
of this.</p>
<p>So I was quite surprised that <a href="http://code.djangoproject.com/ticket/6362">this is not the case with Django</a>. So I
decided we could solve this at the form level, and released a
<a href="http://github.com/mozilla/happyforms">ridiculously simple library</a>. After some googling, I found that I was
<a href="http://www.peterbe.com/plog/automatically-strip-whitespace-in-django-forms">not the first to do this</a>.</p>
<p>Feel free to use this, fork it, submit pull requests, etc. I suspect in the
future we’ll handle other global form filtering - like stripping high order
Unicode since MySQL is often not a fan.</p>
The Perils of One Giant Fixture
2010-08-12T00:00:00+00:00
http://davedash.com/2010/08/12/the-perils-of-one-giant-fixture
<p><img src="/static/images/2010/08/12/time.jpg" alt="Timing" /></p>
<p>A while back, I thought it would be good to consolidate all the data used in
testing the django-layer of <a href="https://addons.mozilla.org/">AMO</a> into a single data fixture.
Unfortunately we have 600 tests, which were now loading and unloading large
amounts of data each time the test would run. This made our tests take 20
minutes.</p>
<p>I decided to cut this down quite a bit, by using smaller fixture files. Each
fixture file attempts to be a singular primary object (e.g. an Addon or a
Collection or a User) and its associated supporting objects. It’s far from
perfect, but it’s achieved tests that run in under 10 minutes.</p>
<p>The other side effect is tests will be simpler. They’ll only include the
addons needed to generate an effect, and if something can’t be done easily with
the fixtures in place, we can always alter the data during the test.</p>
The Python textcluster Package
2010-07-08T00:00:00+00:00
http://davedash.com/2010/07/08/the-python-textcluster-package
<p>Earlier I wrote about <a href="http://davedash.com/2010/03/18/finding-the-most-common-firefox-issues/">finding the most common Firefox issues</a>. I had
wanted to automate that process and continually find these issues.
Unfortunately I never had time to do this.</p>
<p>When they announced <a href="http://aakash.doesthings.com/2010/06/25/hi-my-name-is-firefox-input/">Firefox Input</a>, I thought about doing this again…
just with Firefox Input data but then I went on paternity leave and time kind
of crept away. But I mentioned the idea this week and it piqued some interest.</p>
<p>So I found myself with a bit of time to work on it. The first stage was
releasing a python library called <a href="http://github.com/davedash/textcluster"><code class="language-plaintext highlighter-rouge">textcluster</code></a>.</p>
<p><a href="http://github.com/davedash/textcluster"><code class="language-plaintext highlighter-rouge">textcluster</code></a> takes the <a href="http://davedash.com/2010/03/18/finding-the-most-common-firefox-issues/">work I did earlier</a> and makes it a bit more
general purpose. The idea is I can do something like this:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">docs</span> <span class="o">=</span> <span class="p">(</span>
    <span class="s">'Every good boy does fine.'</span><span class="p">,</span>
    <span class="s">'Every good girl does well.'</span><span class="p">,</span>
    <span class="s">'Cats eat rats.'</span><span class="p">,</span>
    <span class="s">"Rats don't sleep."</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">c</span> <span class="o">=</span> <span class="n">Corpus</span><span class="p">()</span>
<span class="k">for</span> <span class="n">doc</span> <span class="ow">in</span> <span class="n">docs</span><span class="p">:</span>
    <span class="n">c</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">doc</span><span class="p">)</span>
<span class="k">print</span> <span class="n">c</span><span class="p">.</span><span class="n">cluster</span><span class="p">()</span></code></pre></figure>
<p>Which results in:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[
(
"Rats don't sleep.",
{'Cats eat rats.': 0.21353467285253394}
),
(
'Every good girl does well.',
{'Every good boy does fine.': 0.32030200927880093}
)
]
</code></pre></div></div>
<p>The number is the “similarity” between the strings relative to the entire
document corpus.</p>
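<p>One common way to get a similarity that is “relative to the entire corpus” is TF-IDF weighting with cosine similarity. The sketch below shows that technique in plain Python; it is an assumption about the general approach, not the code inside <code class="language-plaintext highlighter-rouge">textcluster</code>, and its numbers will not match the output above.</p>

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    return [w.strip(".,!?\"'").lower() for w in text.split()]


def tfidf_vectors(docs):
    """Weight each term by its frequency in the document and rarity in the corpus."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    total = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            t: (c / len(tokens)) * math.log(total / doc_freq[t])
            for t, c in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values()))
    norm *= math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


docs = (
    "Every good boy does fine.",
    "Every good girl does well.",
    "Cats eat rats.",
    "Rats don't sleep.",
)
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[2], vectors[3]))
```

<p>Because the weights depend on how many documents contain each word, adding or removing documents changes every score, which is why the similarity is only meaningful within one corpus.</p>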
<p>My next trick is to see if I can run this memory-intensive calculation over a
data-set of 25,000 submitted opinions. If I can, we can get some interesting
data about what people think of the new <a href="http://www.mozilla.com/en-US/firefox/all-beta.html">Firefox beta</a>.</p>
Firefox Input, powered by Sphinx
2010-07-06T00:00:00+00:00
http://davedash.com/2010/07/06/firefox-input,-powered-by-sphinx
<p>Thursday, I decided to take a half-day for my sanity, but saw an email about
how Whoosh wasn’t going to cut it for <a href="http://aakash.doesthings.com/2010/06/25/hi-my-name-is-firefox-input/">Firefox Input</a>. I was CC’d about
this and there was mention that Sphinx might be possible.</p>
<p>Sphinx is my hammer, and everything is a nail. So I said, let’s do this.
That translated into me spending my weekend, soothing <a href="/tag/baby">my newborn</a> and
working on Sphinx. Luckily this was easy, since <a href="https://addons.mozilla.org/en-US/firefox/">AMO</a> and <a href="http://support.mozilla.com/en-US/kb/">SUMO</a>
are both running Sphinx in a similar <a href="http://fredericiana.com/2010/06/23/under-the-hood-of-firefox-input/">Django environment</a>.</p>
<p>In order to move quickly, I copied code from the <a href="http://github.com/jbalogh/zamboni/">Zamboni</a> project to
<a href="http://github.com/fwenzel/reporter">Firefox Input</a>. Even our deployment into staging and production wasn’t
done by our usual “Sphinx guy” in IT. Ultimately, everything landed in place.</p>
<p>So <a href="http://input.mozilla.com/">try it out</a> and file bugs or let me know if searches don’t go as
planned.</p>
Your objects, and all their friends
2010-06-25T00:00:00+00:00
http://davedash.com/2010/06/25/your-objects,-and-all-their-friends
<p>This is complicated.</p>
<p>In my ever-evolving quest to <a href="/2010/03/05/django-fixture-magic-testing-issues-with-real-data/">get data out of the AMO database</a> for tests, I
found myself not just extracting a single object, but a list of complicated
requirements in order to fully replicate behavior in production in a testing
environment.</p>
<p>For <a href="https://addons.mozilla.org/en-US/firefox/">AMO</a> we can use <a href="http://pypi.python.org/pypi/django-fixture-magic">fixture magic</a> to dump a single add-on and all of
its <em>database</em> dependencies so that it will insert safely into a
test-database. But we need more than just valid data. We need some supporting
data. For an add-on to be browsable and searchable it needs to have a valid
version and the version needs to have a valid file.</p>
<p>In our app we can check for these things by using this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>my_addon.current_version.files.all()[0]
</code></pre></div></div>
<p>Of course we need to check that <code class="language-plaintext highlighter-rouge">my_addon.current_version</code> exists and that
<code class="language-plaintext highlighter-rouge">files.all()</code> has at least one object. This ends up being a lot of work if you
just know the <code class="language-plaintext highlighter-rouge">id</code> of the add-on object.</p>
<p>So what I want is something simple, like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./manage.py custom_dump addon 3615
</code></pre></div></div>
<p>And it should get me everything I need to test add-on 3615, including a Version
object and any files associated with the version.</p>
<p>Turns out this <em>just works</em>. It works if you define the following settings:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1">## Fixture Magic
</span><span class="n">CUSTOM_DUMPS</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">'addon'</span><span class="p">:</span> <span class="p">{</span> <span class="c1"># ./manage.py custom_dump addon id
</span> <span class="s">'primary'</span><span class="p">:</span> <span class="s">'addons.addon'</span><span class="p">,</span> <span class="c1"># This is our reference model.
</span> <span class="s">'dependents'</span><span class="p">:</span> <span class="p">[</span> <span class="c1"># These are items we wish to dump.
</span> <span class="c1"># Magic turns this into current_version.files.all()[0].
</span> <span class="s">'current_version.files.all.0'</span><span class="p">,</span>
<span class="p">],</span>
<span class="s">'order'</span><span class="p">:</span> <span class="p">(</span><span class="s">'app1.model1'</span><span class="p">,</span> <span class="s">'app2.model2'</span><span class="p">,),</span> <span class="c1"># stuff gets sorted
</span> <span class="s">'excludes'</span><span class="p">:</span> <span class="p">{</span>
<span class="s">'app1.model1'</span><span class="p">:</span> <span class="p">(</span><span class="s">'fields'</span><span class="p">,</span> <span class="s">'to'</span><span class="p">,</span> <span class="s">'hide'</span><span class="p">,),</span>
<span class="p">},</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Using this we’re able to find out that <code class="language-plaintext highlighter-rouge">addon</code> means an <code class="language-plaintext highlighter-rouge">addons.addon</code> object
and that you want the <code class="language-plaintext highlighter-rouge">addon</code> object with an id of <code class="language-plaintext highlighter-rouge">3615</code>. From there we’ll
try looking for dependent objects. Using some black magic we can turn:
<code class="language-plaintext highlighter-rouge">current_version.files.all.0</code> into
<code class="language-plaintext highlighter-rouge">addon.objects.get(pk=3615).current_version.files.all()[0]</code>. This gives us a
file.</p>
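<p>The dotted-path lookup can be sketched in a few lines of plain Python. This is a minimal illustration of the idea, not necessarily how Fixture Magic implements it; <code class="language-plaintext highlighter-rouge">resolve_path</code> and the <code class="language-plaintext highlighter-rouge">Fake*</code> classes are hypothetical stand-ins for the real models.</p>

```python
def resolve_path(obj, path):
    """Follow a dotted path such as 'current_version.files.all.0' from obj."""
    for part in path.split('.'):
        if part.isdigit():
            # A numeric segment is treated as an index: ...all()[0]
            obj = obj[int(part)]
        else:
            obj = getattr(obj, part)
            # Call methods such as all() as we walk the path
            if callable(obj):
                obj = obj()
    return obj


# Hypothetical stand-ins for the real Addon, Version and related manager
class FakeManager(object):
    def __init__(self, items):
        self._items = items

    def all(self):
        return list(self._items)


class FakeVersion(object):
    def __init__(self, files):
        self.files = FakeManager(files)


class FakeAddon(object):
    def __init__(self, version):
        self.current_version = version


addon = FakeAddon(FakeVersion(['file-1.xpi', 'file-2.xpi']))
first_file = resolve_path(addon, 'current_version.files.all.0')
```

<p>Here <code class="language-plaintext highlighter-rouge">first_file</code> ends up being the first item returned by <code class="language-plaintext highlighter-rouge">all()</code>, which mirrors <code class="language-plaintext highlighter-rouge">current_version.files.all()[0]</code>.</p>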
<p>If we mimic our <code class="language-plaintext highlighter-rouge">dump_object</code> command we can get the <code class="language-plaintext highlighter-rouge">file</code> into the database
and everything that the file needs to be valid. This in turn gives us enough
data (usually) to begin testing a single <code class="language-plaintext highlighter-rouge">addon</code>.</p>
<p>So have fun with this. If your database is remotely complicated, this can
save you some time replicating it during testing.</p>
<p>Also note that you can re-order models and exclude certain fields. This can
make your fixtures very easy to load.</p>
So your Wordpress has been hacked
2010-05-19T00:00:00+00:00
http://davedash.com/2010/05/19/so-youre-wordpress-has-been-hacked
<p>Last week, someone informed me that my blog had been hacked:</p>
<p>
<a href="http://www.flickr.com/photos/davedash/4621504223/" title="My blog got hacked by davedash, on Flickr">
<img src="http://farm5.static.flickr.com/4063/4621504223_210d430c1f_m.jpg" width="240" height="111" alt="My blog got hacked" /></a>
</p>
<p>I’m not quite sure what the vector was. Wordpress wasn’t very secure and I
didn’t take too many measures to harden it. A coworker of mine (on our security
team) decided it might be fun to have a look at the infected Wordpress
Installation.</p>
<h3 id="heres-how-the-hack-works">Here’s how the hack works</h3>
<ul>
<li>Your blog appears normal to you and your visitors.</li>
<li>Some rogue PHP code detects if Google is crawling your site and modifies
the text and links so it looks like your website is a Viagra pharmacy.</li>
<li>The links go to other infected blogs and thus build up page rank for this
ring of blogs. So the upside is that your blog may be a top result… for
<em>VIAGRA</em>.</li>
</ul>
<h3 id="prevention">Prevention</h3>
<p>Here are some tips for prevention, but you can find a lot more by googling for
Wordpress hacks. My solutions are more technical:</p>
<ul>
<li>Don’t use Wordpress - I recently switched to Jekyll since it was conceptually
easier to understand, and it’s coder-friendly.</li>
<li>Remove all users other than your own.</li>
<li>Change your password.</li>
<li>Check your code into git so you can see what files have changed.</li>
<li>Prevent Wordpress from writing to your webroot.</li>
</ul>
<h3 id="restoration">Restoration</h3>
<p>Here’s what you’ll need to do to de-spam yourself:</p>
<ol>
<li>Verify that you are still spammed by using
Google Webmaster Tools|Labs|Fetch as Googlebot.</li>
<li>Back up your blog and database.</li>
<li>Move your Wordpress installation to a new directory.</li>
<li>Install Wordpress from scratch.</li>
<li>Remove all users except for yourself.</li>
<li>Change your password.</li>
<li>Copy your theme to your new installation.</li>
<li>Install <em>only</em> the plugins you need.</li>
</ol>
<p>By step 4, you should be able to verify, using <em>Fetch as Googlebot</em>,
that your website is no longer an online pharmacy.</p>
<p>Good luck.</p>
Test Driven Confidence
2010-04-20T00:00:00+00:00
http://davedash.com/2010/04/20/test-driven-confidence
<p>If you’re already testing your web applications, you can skip this post.</p>
<p>One of the bugs I am working on for <a href="https://addons.mozilla.org/">AMO</a> involves porting a small, but moderately complicated checkbox from our PHP site and rewriting it for Django.</p>
<p>I decided to look at the existing implementation and found it to not work correctly at all. This was frustrating, especially since I had verified that my own code worked, and QA had verified that it worked as well.</p>
<p>This is frustrating on many levels. Chances are some minor assumption I made changed, and thus broke this functionality. Discovering regressions is never fun, and fixing them can be long and tedious if you can’t automatically verify that everything is working correctly.</p>
<p>Lucky for me, coming up with tests is easy: you do what you would do to verify that the code satisfies the requirements, and then you code it. Sometimes the tests can take longer than writing the actual code, but ultimately you can ship with confidence. You can be confident that your feature won’t break in the future without immediate notice, and you can be confident that your new code won’t break anything else.</p>
Finding the most common Firefox issues
2010-03-18T00:00:00+00:00
http://davedash.com/2010/03/18/finding-the-most-common-firefox-issues
<p>Cheng Wang of the Mozilla Support team, a few months back, decided to present on some design ideas for <a href="http://support.mozilla.com/en-US/kb/">Firefox Support</a>. One of the issues he noted was that there are a lot of repeated issues and that it would be useful to group them. Grouping them lets you see how often something occurs, and also lets you see how urgent it might be.</p>
<p>Luckily grouping and clustering text is something computers can do. So I wrote <a href="http://github.com/davedash/SUMO-issues">this utility</a> that does just that.</p>
<p>I ran this script over a sampling of data from the last week:</p>
<ul>
<li>Firefox won’t start after update. (65 related issues)
<ul>
<li>5.6: Firefox updated, Gmail not delivering mails</li>
<li>5.6: How to change My Profile when Firefox won’t load?</li>
<li>7.5: Once I close firefox, cannot start firefox again except system restart</li>
<li>5.6: When intalling updates Firefox uninstalls itself</li>
<li>16.8: firefox won’t start after update 3.6</li>
<li>11.2: Upgraded to Firefox 3.6 and now it won’t start</li>
<li>14.9: Firefox won’t start with most extensions</li>
</ul>
</li>
<li>How do I add a bookmark to more than one folder? (64 related issues)
<ul>
<li>8.9: How do I get my bookmarks on the bookmarks toolbar to show up as an icon only with no text?</li>
<li>7.5: Bookmarks lost after upgrade and cannot save new bookmarks</li>
<li>7.5: why do i have to add the .com now to addy’s?</li>
<li>8.7: When I open sidebar to edit bookmarks, I only see the folder for Bookmarks Toolbar. I do not see a folder just called Bookmarks nor do I see my list of bookmarks, that separately appear under bookmarks menu at top of screen</li>
<li>7.5: All my impoted bookmarks go to the same webpage</li>
</ul>
</li>
<li>How do I remove the “ask toolbar”? (50 related issues)
<ul>
<li>14.9: How do I remove an unwanted toolbar?</li>
<li>5.6: how to remove temporary video files from computer</li>
<li>7.5: I have no Toolbars or searchbar and i cant bring them back</li>
<li>7.5: nowhere says how to REMOVE a toolbar - only how to add or modify one</li>
</ul>
</li>
<li>not able to open youtube videos (45 related issues)
<ul>
<li>5.6: Cannot open bookmark/history sidebar</li>
<li>5.6: After working well for years Firefox will now not open</li>
<li>6.7: opening bookmarks do not open in new tab</li>
<li>5.6: I can’t watch videos on youtube with firefox, but on internet explorer i can</li>
</ul>
</li>
<li>I cannot download Firefox 3.6. I’ve tried erasing the download file. I cannot get beyond logging out of Firefox. (44 related issues)
<ul>
<li>8.4: when downloading files firefox download manager will freeze and i will have to start over the file download</li>
<li>5.6: Firefox will not let me download anything! Can someone help?</li>
<li>6.3: cannot download epixHD.com: not compatible with firefox 3.6</li>
<li>5.0: Several tabs are coming up when i try to downloads things</li>
<li>5.0: Firefox wont open since I downloaded the 3.6 update.</li>
</ul>
</li>
</ul>
<p>The number before each related issue is a score of how strongly it relates to the main issue.</p>
<p>The full sample is 352 clusters from an original 3000+ issues. That’s a lot less stuff to go through. We can tune this to have either fewer clusters with more related issues in each, or more clusters, which might result in more accuracy.</p>
<p>Despite the inaccuracy of clustering we can make some general observations:</p>
<ul>
<li>Firefox not starting is a big issue.</li>
<li>Bookmarks are either confusing or broken.</li>
<li>People don’t like toolbars</li>
<li>Opening things is hard</li>
<li>Downloading things or Firefox is hard</li>
</ul>
<p>Hopefully we can fine tune these reports and have them run regularly… maybe automatically posting to Tumblr?</p>
A few weeks in Chrome
2010-03-17T00:00:00+00:00
http://davedash.com/2010/03/17/a-few-weeks-in-chrome
<p>A number of weeks ago I got annoyed with Firefox and decided to use Chrome for a while. This reminded me of the olden days where I used Netscape for a while, and then IE6 came out, and then Phoenix came out all the while I’d keep switching to the newest shiniest thing (note: I’m not sure about the timeline of all the browsers either).</p>
<p>My browser of choice since Firefox was released has been Firefox. For some time - nothing shiny in browser-land was coming out. Little UI things in Safari kept me away (and the lack of extensions), but Chrome finally showed promise. WebKit, out of process plugins, process separated tabs and now extensions. This was great.</p>
<p>I immediately felt like I was going to really love Chrome, and be <em>that</em> guy at the office (I work at Mozilla) who insists on using Chrome (just like I was <em>that</em> guy at Yahoo! who used Google for everything). I also wanted to answer the question as to why so many people really like Firefox in spite of Chrome’s amazing speed – even many Googlers will admit to preferring Firefox.</p>
<p>Overall I’m happy with Chrome, but I’m switching back to Firefox for now. Here are some things I observed:</p>
<ul>
<li>“<code class="language-plaintext highlighter-rouge">/</code>” in Firefox lets you search, which to me seems more natural than Ctrl-F. I am pleased that Chrome supports a lot of Firefox’s shortcuts, like <code class="language-plaintext highlighter-rouge">Cmd-1..9</code> for switching tabs, or <code class="language-plaintext highlighter-rouge">Cmd-Shift-T</code> for reopening a closed tab.</li>
<li>No titlebar… I kind of miss it.</li>
<li>XML is way easier to work with in Firefox. It’s collapsible and always looks pretty.</li>
<li>Certain sites don’t work well in Chrome, like Rypple or the AmericanExpress web site. Rypple surprisingly enough is built using the Google Web Toolkit. I really wish there was a “FirefoxTab” that would open certain sites in Firefox instead.</li>
<li>A number of Jetpacks and extensions either exist only for Firefox or have Chrome counterparts that are severely lacking.
<ul>
<li>The Jetpacks for Mozilla’s Bugzilla instance are awesome.</li>
<li>The Delicious and AdBlock extensions on Chrome aren’t nearly as good as the ones for Firefox.</li>
</ul>
</li>
<li>Firebug is much better than the Chrome developer tools. For example, you can adjust CSS values instantly.</li>
<li>Extensions die… and don’t come back without restart and appear to never have been installed unless you remember them crashing.</li>
<li>AwesomeBar (the location bar in Firefox) queries your history much better than the OmniBar (the location bar in Chrome)
<ul>
<li>At first I thought this was because of a sparse history, but after several weeks I still have a hard time finding sites I’ve been to.</li>
<li>Chrome will show you a handful of results, and then let you know there are more results, but that takes you to a new screen which is a jarring UI.</li>
</ul>
</li>
<li>Chrome can be slow. The extensions can take a while, and even switching between tabs can be slow. At this point startup time can be a moot point.</li>
</ul>
<p>Overall this was a healthy exercise, since I really like to be up on new browsers, and Chrome really seems like it can be a good browser for many people. I’ll probably try it again after the next major Chrome update.</p>
Making our tests run thrice as fast
2010-03-16T00:00:00+00:00
http://davedash.com/2010/03/16/making-our-tests-run-thrice-as-fast
<p>I’ve written a faster version of <a href="http://github.com/jbalogh/test-utils/blob/c4c31905a95e59dcc8919c1030b23848ad7fbca6/test_utils/__init__.py#L57">TransactionTestCase</a> and packaged it with <a href="http://github.com/jbalogh/test-utils">test_utils</a>. It’s MySQL-specific since it relies on <code class="language-plaintext highlighter-rouge">SET FOREIGN_KEY_CHECKS=0</code> to flush the database.</p>
<p>The long story…</p>
<!-- more-->
<h3 id="why-speed-matters">Why speed matters</h3>
<p>We’re closing in on 300 tests for <a href="http://github.com/jbalogh/zamboni/">Zamboni</a>. As of yesterday, to run our entire test suite it would have taken approximately 5 minutes. If you run tests before code-reviews, during a code-review, and before you push to master - you’ve spent about 15 minutes doing tests for a single feature or bug-fix. We have about 5 developers, so this cycle happens many times in a work day. In that time many sandwiches can be made and consumed.</p>
<p>Even shortcuts, like running a subset of tests will only go so far, and ultimately we do want to validate that all our tests pass for any code-change.</p>
<h3 id="testing-sphinx-search-with-transactiontestcase">Testing Sphinx search with <code class="language-plaintext highlighter-rouge">TransactionTestCase</code></h3>
<p>Django recently sped up testing by running tests in a transaction. However, this means that data never gets committed to the database and therefore external tools, like the Sphinx indexer, will never see any of that data. So we resort to <code class="language-plaintext highlighter-rouge">TransactionTestCase</code> which <em>will</em> commit the data.</p>
<p>Unfortunately <code class="language-plaintext highlighter-rouge">TransactionTestCase</code> is painfully slow. The accepted practice is to only use <code class="language-plaintext highlighter-rouge">TestCase</code> if you want your tests to be fast. So, I decided to complain to <a href="http://blog.ianbicking.org/">one of our new hires</a> and he and I decided to tinker in MySQL to figure out what was slow. We discovered the following:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">delete from [table]</code> is slow</li>
<li><code class="language-plaintext highlighter-rouge">truncate [table]</code> is slow</li>
<li>… unless you <code class="language-plaintext highlighter-rouge">SET FOREIGN_KEY_CHECKS=0</code></li>
</ul>
<p>So we decided we should do our own tear down. After some tinkering with <code class="language-plaintext highlighter-rouge">cProfile</code> I discovered that <code class="language-plaintext highlighter-rouge">TransactionTestCase</code> does a (slow) database <code class="language-plaintext highlighter-rouge">flush</code> on setup for a test case. This wouldn’t do.</p>
<h3 id="making-our-own-transactiontestcase">Making our own <code class="language-plaintext highlighter-rouge">TransactionTestCase</code></h3>
<p>I decided to make our own <code class="language-plaintext highlighter-rouge">TransactionTestCase</code> and it would just run <code class="language-plaintext highlighter-rouge">SET FOREIGN_KEY_CHECKS=0</code> and <code class="language-plaintext highlighter-rouge">TRUNCATE</code> on each table at tear down time. It would also not do a <code class="language-plaintext highlighter-rouge">flush</code> on set up.</p>
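<p>The tear down boils down to a handful of SQL statements. Here is a rough sketch, assuming a DB-API style cursor and a list of trusted table names; <code class="language-plaintext highlighter-rouge">fast_flush</code> and <code class="language-plaintext highlighter-rouge">RecordingCursor</code> are hypothetical names, and turning the checks back on at the end is my own addition rather than something taken from <code class="language-plaintext highlighter-rouge">test_utils</code>.</p>

```python
def fast_flush(cursor, tables):
    """Empty every table in `tables` without paying for foreign key checks."""
    cursor.execute('SET FOREIGN_KEY_CHECKS=0')
    try:
        for table in tables:
            # Table names cannot be bound as query parameters,
            # so only pass names you trust.
            cursor.execute('TRUNCATE `%s`' % table)
    finally:
        # Assumption: restore the checks so later queries behave normally
        cursor.execute('SET FOREIGN_KEY_CHECKS=1')


class RecordingCursor(object):
    """Hypothetical stand-in for a real cursor; it only records statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


cursor = RecordingCursor()
fast_flush(cursor, ['addons', 'versions'])
```

<p>With a real MySQL connection you would pass its cursor instead of the recording one, and call this from the test case’s tear down.</p>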
<p>We write our tests with the idea that they clean up after themselves, rather than having them clean up after the previous test. This is a requirement for us since <code class="language-plaintext highlighter-rouge">django-nose</code> doesn’t reorder tests (nor should it) and a standard <code class="language-plaintext highlighter-rouge">django.test.TestCase</code> assumes a clean database.</p>
<p>Looking at a single test <code class="language-plaintext highlighter-rouge">test_sphinx_indexer</code>, using <code class="language-plaintext highlighter-rouge">django.test.TransactionTestCase</code> took ~30 seconds. Using our new <code class="language-plaintext highlighter-rouge">TransactionTestCase</code> it takes ~4 seconds!</p>
<h3 id="fast-tests-are-good">Fast tests are good</h3>
<p>We can now run our 275 tests in ~100 seconds versus the ~300 seconds it used to take. Furthermore, skipping our Sphinx tests (which are the only tests that use <code class="language-plaintext highlighter-rouge">TransactionTestCase</code>) only saves us ~10 seconds. That’s not a lot of overhead for better coverage.</p>
<p>This took me the better part of a day, but solving this now means we’re going to run our Sphinx tests routinely rather than skip them. Our QA team will assure you that search is probably the most regression-prone part of our site, so running these tests is vital to quality.</p>
<p>If you need to use <code class="language-plaintext highlighter-rouge">TransactionTestCase</code> in MySQL, <a href="http://github.com/jbalogh/test-utils">give ours a try</a>.</p>
λ^2: safely doing class based views in Django
2010-03-09T00:00:00+00:00
http://davedash.com/2010/03/09/2-safely-doing-class-based-views-in-django
<p>When I started rewriting the API for <a href="https://addons.mozilla.org/">addons.mozilla.org</a>, my views were mostly the same: get some data and render it as either JSON or XML. I also wanted all my API methods to take an <code class="language-plaintext highlighter-rouge">api_version</code> parameter, so I decided class based views would be best. This way my classes could just inherit from a base class.</p>
<p>To do this I had to implement a <a href="http://github.com/davedash/zamboni/blob/b5a147820840e66b542691e7239f15eccdebeec9/apps/api/views.py#L39"><code class="language-plaintext highlighter-rouge">__call__</code> method</a>. This works fine, except I wanted to store things into the class. After all, the whole point of my use of classes was to keep the code a bit more compact, and cleaner. So, why pass the api_version around everywhere? Unfortunately thread-safety comes into play, and you need a separate instance of your class for each request.</p>
<!--more-->
<h3 id="λ">λ</h3>
<p>Django’s <code class="language-plaintext highlighter-rouge">urlpatterns</code> expects a callable object. So you can’t safely give it a single shared instance of <code class="language-plaintext highlighter-rouge">AddonDetailView()</code>. But you could give it a callable that creates an instance of <code class="language-plaintext highlighter-rouge">AddonDetailView()</code> and passes it <code class="language-plaintext highlighter-rouge">*args</code> and <code class="language-plaintext highlighter-rouge">**kwargs</code>. Luckily Python has <code class="language-plaintext highlighter-rouge">lambda</code> functions. You can <a href="http://github.com/davedash/zamboni/blob/b5a147820840e66b542691e7239f15eccdebeec9/apps/api/urls.py#L10">note how we solved that in our <code class="language-plaintext highlighter-rouge">urlpatterns</code></a>.</p>
<h3 id="λ-λ">λ λ</h3>
<p>But wrapping all your urls with <code class="language-plaintext highlighter-rouge">lambda</code> is tedious and remembering to pass <code class="language-plaintext highlighter-rouge">*args</code> and <code class="language-plaintext highlighter-rouge">**kwargs</code> is error prone.</p>
<p>So let’s make a <code class="language-plaintext highlighter-rouge">lambda</code> function that returns… a <code class="language-plaintext highlighter-rouge">lambda</code> function that <a href="http://github.com/davedash/zamboni/blob/609ec5467dd6db6a6647f375e95abced5203a1b2/apps/api/urls.py#L9">turns an instance of our class into a callable</a>.</p>
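<p>Stripped of Django, the double lambda might look something like this. It is a sketch of the idea rather than a copy of the linked code; the body of <code class="language-plaintext highlighter-rouge">AddonDetailView</code>, the <code class="language-plaintext highlighter-rouge">class_view</code> name and the fake request are assumptions for illustration.</p>

```python
class AddonDetailView(object):
    """Hypothetical class-based view; per-request state lives on the instance."""

    def __call__(self, request, addon_id, api_version=1.0):
        # Safe to store on self because every request gets its own instance
        self.api_version = api_version
        return self.render(addon_id)

    def render(self, addon_id):
        return 'addon %s via API v%s' % (addon_id, self.api_version)


# A lambda that returns a lambda: each call to the inner lambda builds a
# fresh instance of cls and immediately calls it with the view arguments.
class_view = lambda cls: lambda *args, **kwargs: cls()(*args, **kwargs)

detail = class_view(AddonDetailView)
response = detail('fake-request', 3615, api_version=1.5)
```

<p>In a URLconf you would hand <code class="language-plaintext highlighter-rouge">class_view(AddonDetailView)</code> to the pattern in place of a plain view function.</p>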
<p>We can now return to coding and not think about thread safety.</p>
<p>λλλ</p>
django-fixture-magic: Testing issues with real data.
2010-03-05T00:00:00+00:00
http://davedash.com/2010/03/05/django-fixture-magic-testing-issues-with-real-data
<p>I just released <a href="http://github.com/davedash/django-fixture-magic">Fixture Magic</a>.</p>
<p>When dealing with legacy data, you’ll run into all kinds of edge cases. Perhaps, an object might not display correctly unless it has the right parameters, or if it has null parameters it might not display at all. So when testing <a href="http://djangoproject.com/">Django</a>, it’s nice to actually use non-dummy data.</p>
<p>Luckily Django has a way of pulling real data out of your database using <code class="language-plaintext highlighter-rouge">django.core.serializers</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from addons.models import Addon
a = Addon.objects.get(id=3615)
from django.core.serializers import serialize
jsonize = lambda a: serialize("json", a, indent=4)
jsonize([a])
</code></pre></div></div>
<p>This solution runs well in a Django shell and can be lots of fun for the whole family… until things get complicated.
<!--more--></p>
<h3 id="serializing-alone-isnt-enough">Serializing alone isn’t enough.</h3>
<p>Serializing a fixture with foreign keys means you’ll have an un-loadable fixture unless you serialize the dependent fixtures. Even for one or two foreign keys, this can be a pain. For <a href="http://addons.mozilla.org/">addons.mozilla.org</a>, we have a spidery-web of dependencies: <code class="language-plaintext highlighter-rouge">File</code>s need a <code class="language-plaintext highlighter-rouge">Version</code> which needs an <code class="language-plaintext highlighter-rouge">Addon</code> which needs <code class="language-plaintext highlighter-rouge">Translation</code>s.</p>
<p>Thus begat the <code class="language-plaintext highlighter-rouge">dump_object</code> management command. Give it an app, model name and a <code class="language-plaintext highlighter-rouge">pk</code> and it will give you not only a serialized JSON of that object, but all the objects that it requires.</p>
<p>Example:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./manage.py dump_object files.file 64874 64876 > my_new_fixture.json
</code></pre></div></div>
<p>This looks for the <code class="language-plaintext highlighter-rouge">File</code> model in the <code class="language-plaintext highlighter-rouge">files</code> app and pulls out of the database the <code class="language-plaintext highlighter-rouge">File</code> instances with <code class="language-plaintext highlighter-rouge">pk</code>s of <code class="language-plaintext highlighter-rouge">64874</code> and <code class="language-plaintext highlighter-rouge">64876</code>. It then recursively searches for any required objects.</p>
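<p>The recursive search is essentially a depth-first walk that emits dependencies before the objects that need them, so the fixture loads cleanly. A toy version, assuming the foreign keys have already been reduced to a plain dict (the <code class="language-plaintext highlighter-rouge">requires</code> graph, the <code class="language-plaintext highlighter-rouge">collect</code> helper and every id other than 64874 and 3615 are made up for illustration):</p>

```python
def collect(key, requires):
    """Return `key` plus everything it needs, dependencies first."""
    ordered = []
    seen = set()

    def visit(current):
        if current in seen:
            return
        seen.add(current)
        # Walk the things this object points at before adding the object itself
        for dependency in requires.get(current, ()):
            visit(dependency)
        ordered.append(current)

    visit(key)
    return ordered


# Hypothetical graph: each object maps to the objects its foreign keys point at
requires = {
    ('files.file', 64874): [('versions.version', 100)],
    ('versions.version', 100): [('addons.addon', 3615)],
    ('addons.addon', 3615): [('translations.translation', 1)],
}
dump_order = collect(('files.file', 64874), requires)
```

<p>The translation comes out first and the file last, which is the order a loader needs.</p>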
<h3 id="too-much-serial">Too much serial</h3>
<p>If you create a lot of fixtures, you’ll eventually have overlapping serialized objects. In <code class="language-plaintext highlighter-rouge">addons.mozilla.org</code> we have <code class="language-plaintext highlighter-rouge">Addon</code>s, <code class="language-plaintext highlighter-rouge">Version</code>s (which depend on <code class="language-plaintext highlighter-rouge">Addon</code>s) and <code class="language-plaintext highlighter-rouge">AddonCategory</code>s (which depend on <code class="language-plaintext highlighter-rouge">Addon</code>s and <code class="language-plaintext highlighter-rouge">Category</code>s). If we wanted to serialize a specific <code class="language-plaintext highlighter-rouge">Addon</code>, its dependent <code class="language-plaintext highlighter-rouge">Version</code>s and <code class="language-plaintext highlighter-rouge">AddonCategory</code>s, it makes sense to start with <code class="language-plaintext highlighter-rouge">dump_object</code>ing the related <code class="language-plaintext highlighter-rouge">Version</code> and then <code class="language-plaintext highlighter-rouge">dump_object</code>ing the <code class="language-plaintext highlighter-rouge">AddonCategory</code>. Both <code class="language-plaintext highlighter-rouge">dump_object</code> commands will fetch the <code class="language-plaintext highlighter-rouge">Addon</code> in question, resulting in duplicated data.</p>
<p>To combat this we can use <code class="language-plaintext highlighter-rouge">merge_fixtures</code> to dedupe our fixtures:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./manage.py dump_object versions.version 64874 > 1.json
./manage.py dump_object categories.addoncategory > 2.json
./manage.py merge_fixtures 1.json 2.json > happy_fixture.json
</code></pre></div></div>
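<p>Deduping comes down to remembering which <code class="language-plaintext highlighter-rouge">(model, pk)</code> pairs have already been seen. The following is a small sketch of that idea on already-parsed fixtures, not the command’s actual source; the sample data is invented.</p>

```python
import json


def merge_fixtures(*fixtures):
    """Combine fixture lists, keeping the first copy of each (model, pk) pair."""
    seen = set()
    merged = []
    for fixture in fixtures:
        for obj in fixture:
            key = (obj['model'], obj['pk'])
            if key not in seen:
                seen.add(key)
                merged.append(obj)
    return merged


# Invented sample data: both dumps include the same Addon
one = json.loads(
    '[{"model": "addons.addon", "pk": 3615, "fields": {}},'
    ' {"model": "versions.version", "pk": 64874, "fields": {}}]'
)
two = json.loads(
    '[{"model": "addons.addon", "pk": 3615, "fields": {}},'
    ' {"model": "categories.addoncategory", "pk": 7, "fields": {}}]'
)
merged = merge_fixtures(one, two)
```

<p>The merged list keeps a single copy of the shared <code class="language-plaintext highlighter-rouge">Addon</code> and preserves the original ordering of everything else.</p>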
<p>This should make creating test data slightly less painful. So <a href="http://github.com/davedash/django-fixture-magic">give it a try</a>.</p>
Palm Pré: A retraction, I really like it now
2010-01-11T00:00:00+00:00
http://davedash.com/2010/01/11/palm-pre-a-retraction-i-really-like-it-now
<p>So before I went on a trip to Minnesota last month, I decided maybe I would give the Palm Pré another shot. After all, my parents have no internet access, so having the Pré… if I could overcome <a href="/2009/11/19/palm-pre-always-hot/">my issues</a>, might be a welcome distraction.</p>
<p>Before I packed it, I updated to WebOS 1.3.x (a few days later I updated to 1.3.5) and I was blown away. The horsepower was increased by utilizing the GPU. The following problems were fixed:</p>
<ul>
<li>The device was no longer hot all the time.</li>
<li>Shutdown and startup were long, but not nearly as long as before.</li>
<li>Render times were quicker.</li>
<li>All the elements usually rendered quickly in an app.</li>
<li>Network was fairly steady.</li>
<li>Phone calls also seemed fairly drop-free.</li>
</ul>
<p>All these improvements helped me get over:</p>
<ul>
<li>The tiny keyboard… not so bad in practice.</li>
<li>No soft keyboard - I missed it, but I could deal without it.</li>
</ul>
<p>Overall the device was great, it was fast enough to use, and most of the errors were annoying, but things I could deal with. Cut and paste could be improved, and I wish the USB connector was the same as the one for HTC devices (I can’t keep micro or mini USB types straight).</p>
<p>So I love the device, and Murphy’s Law dictates if work gives you a phone you don’t like you get to keep it… until you start liking it again. So I sent the phone back into rotation for other people at Mozilla to try. Have at it.</p>
Google Chrome Extensions Puzzle
2010-01-06T00:00:00+00:00
http://davedash.com/2010/01/06/google-chrome-extensions-puzzle
<div style="float:left; margin-right:1em"><a href="http://www.flickr.com/photos/44124375866@N01/4252390433" title="View 'puzzle' on Flickr.com"><div style="text-align:center;"><img src="http://farm3.static.flickr.com/2684/4252390433_b49093b583_m.jpg" alt="puzzle" border="0" width="161" height="240" /></div></a>
</div>
<p>I went to Add-on-Con some weeks back to represent my employer, the Mozilla Corporation.</p>
<p>One of the goodies you got as a registrant was a jigsaw puzzle from the Google Chrome Extensions team.</p>
<p>Perfect, my wife and I love solving jigsaw puzzles. We finally finished a few days ago. Anybody who has started at all will realize the puzzle is of a QR-code. The QR-code is an extension that will eventually lead you to a prize. It was a bit of a mini-puzzle not nearly as difficult as finding the QR code.</p>
<p>Finding a QR code scanner was a bit difficult, though: I had to borrow an HTC Magic from <a href="http://fligtar.com/">Justin Scott</a> and install a decent barcode scanner.</p>
Django: Model Inheritance or Related Tables wrt AMO
2009-12-15T00:00:00+00:00
http://davedash.com/2009/12/15/django-model-inheritance-or-related-tables-wrt-amo
<p>When I attended DjangoCon this year, I lamented that our flagship web property was difficult to test, and not fun to develop. I figured DjangoCon was a way to placate me, and Django might mean something for some of the smaller projects at Mozilla. However, Wil Clouser, our lead web developer, <a href="http://micropipes.com/blog/2009/11/17/amo-development-changes-in-2010/">announced development changes</a> for <a href="http://addons.mozilla.org">addons.mozilla.org</a> (AMO) that say we’ll be moving to Django.</p>
<p>Wil was open to Django and knew that’s what we in the dev team wanted. Jeff spawned our foray into a new AMO with <a href="http://github.com/jbalogh/zamboni">Zamboni</a>. I’ve been working on some grunt-work tasks inside and outside of Django.</p>
<p>One of those tasks is building a transparent layer in Django to keep users logged in from our PHP-based site. That kind of problem almost immediately forces you to ask one of the most fundamental questions you ask when using any framework:</p>
<blockquote>
<p>How much do I change my app, in order to accommodate the framework?</p>
</blockquote>
<!--more-->
<p>More specifically:</p>
<blockquote>
<p>Should I use the <code class="language-plaintext highlighter-rouge">django.contrib.auth</code> User module, and to what extent?</p>
</blockquote>
<p>The more we looked into what features of Django we might want to use, <code class="language-plaintext highlighter-rouge">django.contrib.auth</code> was heavily tied into other things we wanted, so it made sense for us to use it. The next question is whether we try the <a href="http://scottbarnham.com/blog/2008/08/21/extending-the-django-user-model-with-inheritance/">inheritance approach</a> or do we treat our legacy users table as a sort of User Profile and utilize the User module using the <a href="http://www.b-list.org/weblog/2007/feb/20/about-model-subclassing/">related table approach</a>?</p>
<p>Using model-inheritance seems real nice, because we can pretend that our legacy user is the same thing as a <code class="language-plaintext highlighter-rouge">django.contrib.auth</code> User - but this isn’t true:</p>
<p>Looking at our <code class="language-plaintext highlighter-rouge">users</code> table more closely:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mysql> explain users;
+-------------------------+---------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+---------------------+------+-----+---------------------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(255) | YES | UNI | NULL | |
| password | varchar(255) | NO | | | |
| firstname | varchar(255) | NO | | | |
| lastname | varchar(255) | NO | | | |
| nickname | varchar(255) | YES | MUL | NULL | |
| bio | int(11) unsigned | YES | MUL | NULL | |
| emailhidden | tinyint(1) unsigned | NO | | 0 | |
| sandboxshown | tinyint(1) unsigned | NO | | 0 | |
| homepage | varchar(255) | YES | | NULL | |
| display_collections | tinyint(1) unsigned | NO | | 0 | |
| display_collections_fav | tinyint(1) unsigned | NO | | 0 | |
| confirmationcode | varchar(255) | NO | | | |
| resetcode | varchar(255) | NO | | | |
| resetcode_expires | datetime | NO | | 0000-00-00 00:00:00 | |
| notifycompat | tinyint(1) unsigned | NO | MUL | 1 | |
| notifyevents | tinyint(1) unsigned | NO | MUL | 1 | |
| deleted | tinyint(1) | YES | | 0 | |
| created | datetime | NO | MUL | 0000-00-00 00:00:00 | |
| modified | datetime | NO | | 0000-00-00 00:00:00 | |
| notes | text | YES | | NULL | |
| location | varchar(255) | NO | | | |
| occupation | varchar(255) | NO | | | |
| picture_type | varchar(25) | NO | | | |
| averagerating | varchar(255) | YES | | NULL | |
+-------------------------+---------------------+------+-----+---------------------+----------------+
</code></pre></div></div>
<p>You can very easily argue that this is a profile table, which happens to have credential information thrown in.</p>
<p>I can see that, over time, I’ll just struggle to make our legacy User act like a Django User, whereas a UserProfile is fairly standard.</p>
<p>Had I been writing this app from scratch, I would have chosen the UserProfile route. This is extra data which takes up a lot of space, and changes far more often than user credentials. Changing 4M+ rows sucks. By making <code class="language-plaintext highlighter-rouge">users</code> our UserProfile table, any changes to that table don’t tie up the table used for sign-ins.</p>
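<p>To make the related table idea concrete, here is a minimal sketch using Python’s built-in <code class="language-plaintext highlighter-rouge">sqlite3</code> rather than Django itself. The column lists are trimmed down and the <code class="language-plaintext highlighter-rouge">user_profile</code> name is hypothetical; it only illustrates that a schema change to the profile table never touches the narrow table used for sign-ins.</p>

```python
import sqlite3


def build_schema(conn):
    # Hypothetical, trimmed-down schema: credentials live in a narrow table,
    # everything else lives in a profile table that points back at it.
    conn.executescript("""
        CREATE TABLE auth_user (
            id INTEGER PRIMARY KEY,
            email TEXT UNIQUE,
            password TEXT NOT NULL
        );
        CREATE TABLE user_profile (
            user_id INTEGER PRIMARY KEY REFERENCES auth_user(id),
            nickname TEXT,
            homepage TEXT,
            location TEXT NOT NULL DEFAULT ''
        );
    """)


def add_user(conn, email, password, nickname):
    # One row of credentials plus one related profile row.
    cur = conn.execute(
        "INSERT INTO auth_user (email, password) VALUES (?, ?)", (email, password))
    conn.execute(
        "INSERT INTO user_profile (user_id, nickname) VALUES (?, ?)",
        (cur.lastrowid, nickname))
    return cur.lastrowid


def columns(conn, table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
build_schema(conn)
uid = add_user(conn, "dave@example.com", "hashed-password", "dd")

# A profile-only schema change leaves the sign-in table untouched.
conn.execute("ALTER TABLE user_profile ADD COLUMN occupation TEXT NOT NULL DEFAULT ''")
```

<p>In Django terms the profile table would be a model with a one-to-one link to the auth User, but the trade-off is the same: two tables and a join, in exchange for a credentials table that rarely changes.</p>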
<p>I’m curious what other people who port their apps to Django have done.</p>
Palm Pre: Always hot
2009-11-19T00:00:00+00:00
http://davedash.com/2009/11/19/palm-pre-always-hot
<p>So I borrowed a Palm Pré that we had at Mozilla to see what it was like. I was at first very excited, I remember before the Pre was released there was a lot of talk about how awesome-fantastic it was going to be. The stories of awesomeness sort of died, and I had thought nothing of it.</p>
<p>Immediately upon using the Pre I figured out why. In short, it’s a crappy phone. It makes a very good attempt to do a lot, but it does them with such piss-poor performance, that nothing good is noticed.</p>
<p>I am disappointed. It’s not even in the same class as an iPhone - maybe a future generation of Palm devices will be, but not this one. I was hoping WebOS would be a good alternative to the iPhone. It looks like Google will be doing that, though their phones haven’t impressed me much either. I am hoping that maybe this phone is just a dud.</p>
<p>Here’s what I didn’t like:</p>
<ul>
<li>The Palm was always hot.</li>
<li>The first run experience is painfully slow.</li>
<li>The first run was an indicator of things to come, startup and shutdown are ridiculously slow.</li>
<li>Every application is slow to render.</li>
<li>Not all elements of an app render.</li>
<li>The keys are too small. Some people aren’t migrating from a Treo and aren’t used to mini keys.</li>
<li>No soft keyboard.</li>
<li>The palm website doesn’t let you use plus-style addressing</li>
<li>Media Mode was not self explanatory - and forced the phone to not work.</li>
<li>Network would constantly drop out. Couldn’t use a lot of the data features.</li>
<li>Phone calls didn’t work so great.</li>
<li>Did I mention it was ass slow, even the dialing program was slow.</li>
<li>The battery dies quickly</li>
<li>I could only cut/paste when composing, but I couldn’t cut a string of text from an email.</li>
<li>Felt too much like an old palm</li>
</ul>
<p>Despite the sadness there were a few good things:</p>
<ul>
<li>When it did fetch email, and other notices, it displayed them nicely</li>
<li>The unification of Facebook and Gmail was pretty cool - it also made me want to trim some of those friends from highschool off my facebook - I ain’t ever gonna call em.</li>
<li>The Icons were pretty.</li>
<li>The card interface was interesting.</li>
<li>The travel charger could be modified to work in non US chargers fairly easily.</li>
</ul>
<p>All in all, I’m glad that I had a chance to try out this device. It showed me that user interfaces, above all, need to be very fast and responsive. Furthermore, everything you try to do should be done exceptionally well. I’m hopeful that software updates can alleviate some of the problems, but I think the root of the problem is slow hardware.</p>
AMO Search: Powered by Sphinx
2009-09-30T00:00:00+00:00
http://davedash.com/2009/09/30/amo-search-powered-by-sphinx
<p>Last night, I gave a talk at the <a href="https://wiki.mozilla.org/AddonMeetups:2009:Chicago">Addons Meetup</a> at Threadless HQ in Chicago on the new search engine powering <a href="http://addons.mozilla.org/">addons.mozilla.org</a>. I’ll recap the technical portion of the talk and give a bit more details.</p>
<p>First, I’d like to thank Harper and Threadless. It was a great location in the greatest city in the universe. Before and after the meetup, Harper was just an all-around great guy to hang with and the Threadless headquarters was a nice hangout place for meeting people interested in addons.</p>
<p>Shortly after my talk, our Engineering Ops team deployed the new AMO 5.1 complete with a new Sphinx powered search engine.</p>
<p>So let’s talk about search. Note: parts of this are a rehash of my talk, so feel free to skip around.</p>
<!--more-->
<h3 id="a-bit-about-addons">A bit about addons</h3>
<p>Addons is a huge growing space. Arguably it’s Mozilla’s best kept secret. Sure readers of this blog probably know what Addons are, but ask people who aren’t as web-savvy. Most people don’t know what a browser is - and it’s hard to explain it to people without getting technical.</p>
<p>We can just skip that step. Because Addons are small things that people can easily “get”.</p>
<p>“It’s an easy way to customize the internet when you’re surfing.”</p>
<p>While perhaps not technically correct, it’s one way of explaining it to people. Maybe a better way is just showing people what they can do with addons.</p>
<p>On my flight out to Chicago, I talked to a person on the plane who didn’t know what a browser was, but after showing her <a href="http://addons.mozilla.org/">AMO</a> she was really intrigued.</p>
<p>If everyday non-technical people can realize the potential of addons, it’s only a matter of time before they start knocking down the doors to AMO.</p>
<p>So we better be prepared to handle them, and get them what they want.</p>
<h3 id="the-technical-details-of-addonsmozillaorg">The technical details of addons.mozilla.org</h3>
<p>Every time you open Firefox, it pings <a href="http://addons.mozilla.org/">AMO</a> to see if there are any updates to any of the addons that happen to be installed. Over a third of the people using Firefox have at least one addon, and Firefox is roughly 22% of the browser market. That means roughly 7% of people opening their browsers are pinging our servers for updates.</p>
<p>Needless to say it’s a lot of traffic, and to support it we need a fair amount of hardware. AMO is clearly the largest site in the Mozilla universe in both respects.</p>
<p>Some stats:</p>
<ul>
<li>1 mySQL master</li>
<li>4 mySQL slaves</li>
<li>2 memcached servers</li>
<li>2 Sphinx indexer/search daemons</li>
<li>24 Web Frontend</li>
<li>Multiple Zeus ZXTM clusters all</li>
</ul>
<p>Most of this is standard, and we’ll talk about Sphinx later, but Zeus is amazing. I didn’t know what Zeus was until earlier this year when I interviewed with Mozilla’s VP of Engineering Operations. All our requests get cached, so many of our hits actually hit our Zeus cluster and not our web servers.</p>
<p>To see just how amazing they are read our <a href="http://blog.mozilla.com/mrz/">mrz’s ops blog</a>.</p>
<h3 id="why-search-matters">Why search matters</h3>
<p>If you have any kind of custom content and unique meta data a custom search solution is a must. Browsing through a site isn’t going to cut it. Browsing is dead. Search is how you find things on a web site. On <a href="http://addons.mozilla.org/">AMO</a> you may see an addon that’s featured somewhere, or you might want to see what’s out there, but the right search query will find you the right addon in two clicks.</p>
<h3 id="improve-search">Improve Search</h3>
<p>So my first job on AMO was to <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=498999">improve addons search</a>. It was a vague request, born out of frustration with what we had. It wasn’t a problem with how certain things were indexed, or that unicode didn’t work, or that results weren’t sorted. We may have had all those problems, but as a product, search needed to be replaced.</p>
<p>To me it meant that we needed some framework that would allow developers to quickly debug and fix any future search calamities at a moment’s notice.</p>
<p>So here were the goals I made for myself:</p>
<ul>
<li>Do something that sucks less than what we’ve got</li>
<li>Do something that makes it easier to suck less in the future</li>
<li>Do something that’s easy to use for our operations team, web developers and most importantly, end-users</li>
<li>Reduce strain on our databases, developers and operations teams</li>
</ul>
<h3 id="complex-data">Complex Data</h3>
<p>Our data set is small (we have 5,000 addons), but there’s a lot of secondary meta data about the addons that we track:</p>
<ul>
<li>Addons work in 1 or more locales (e.g. en-US, fr, de, etc)</li>
<li>Addons are optionally platform specific (Linux, OS X, etc)</li>
<li>Addons work with one or more products (Firefox, Thunderbird, Seamonkey, Sunbird or Fennec)</li>
<li>Addons come in multiple flavors (extensions, themes, dictionaries and more)</li>
</ul>
<p>We want to index all this data. Unfortunately, getting at much of this data involves either numerous queries or numerous joins, which put a strain on mysql. How much strain?</p>
<p>At peak we get about 10 search queries per second. If we do something smarter this won’t have to cause a lot of strain.</p>
<h3 id="using-sphinx">Using Sphinx</h3>
<p>Sphinx is an open source search indexer and daemon. It’s used by Craigslist, the Pirate Bay and <a href="http://support.mozilla.com">Mozilla Support</a>. It was very easy to use and despite a complicated set of data and business logic, Sphinx was up to the task.</p>
<h3 id="the-challenges">The challenges</h3>
<p>We needed to search for addons in several languages. So indexing just addons wouldn’t work, we need to make sure we have every translation of every addon indexed. For those counting, we have 5,000 addons, but 18,000 translations of addons.</p>
<p>All the joining and filtering that needed to be done for our old search still needs to be done, but we can do this all in one shot by using a mysql view. This view is a flat list of each translated addon as well as all meta data associated with it. This then gets fed into the sphinx indexer.</p>
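<p>As a rough illustration of that flattening step, here is a sketch using Python’s built-in <code class="language-plaintext highlighter-rouge">sqlite3</code> instead of mysql. The three base tables and their columns are made up for the example and the real AMO schema is far larger; the point is only that the view yields one flat row per translated addon, which is the shape an indexer wants to read.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified stand-ins for the real tables.
    CREATE TABLE addons (id INTEGER PRIMARY KEY, addontype TEXT);
    CREATE TABLE translations (addon_id INTEGER, locale TEXT, name TEXT);
    CREATE TABLE addon_apps (addon_id INTEGER, app TEXT);

    -- One flat row per (addon, locale), with the metadata folded in.
    CREATE VIEW translated_addons AS
    SELECT a.id AS addon_id, t.locale, t.name, a.addontype,
           (SELECT group_concat(app) FROM addon_apps p
             WHERE p.addon_id = a.id) AS apps
      FROM addons a
      JOIN translations t ON t.addon_id = a.id;

    INSERT INTO addons VALUES (1, 'extension');
    INSERT INTO translations VALUES (1, 'en-US', 'Adblock'), (1, 'fr', 'Adblock (fr)');
    INSERT INTO addon_apps VALUES (1, 'firefox');
""")

# The indexer only ever needs this one query.
rows = conn.execute(
    "SELECT addon_id, locale, name, addontype, apps "
    "FROM translated_addons ORDER BY locale"
).fetchall()
```

<p>One addon with two translations comes back as two rows, which matches the 5,000 addons versus 18,000 translations split above.</p>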
<p>Along the way we ran into some issues which used to be dealt with outside of mysql, such as comparing versions. It was gross and quite a hack, so we turned the variety of <a href="http://spindrop.us/2009/08/07/v-is-for-version-hell/">acceptable version strings into integers</a>.</p>
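<p>The idea of turning version strings into integers can be sketched in a few lines of Python. This is not the exact scheme from the linked post, just a simplified stand-in: it assumes at most four numeric components of two digits each, and it ignores the meaning of suffixes like “b2”, which real addon versions need to handle.</p>

```python
import re


def version_int(version, parts=4, width=2):
    """Pack up to `parts` numeric components into one sortable integer."""
    numbers = [int(n) for n in re.findall(r"\d+", version)][:parts]
    # Missing trailing components count as zero, so "3.5" == "3.5.0.0".
    numbers += [0] * (parts - len(numbers))
    result = 0
    for n in numbers:
        # Clamp so one oversized component cannot spill into its neighbour.
        result = result * 10 ** width + min(n, 10 ** width - 1)
    return result
```

<p>With this, <code class="language-plaintext highlighter-rouge">version_int("3.10")</code> compares greater than <code class="language-plaintext highlighter-rouge">version_int("3.9")</code>, which a plain string comparison gets wrong, and the comparison can happen inside the database as an ordinary integer range check.</p>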
<p>We also learned that stemming wasn’t as good an idea as we assumed it would be. Stemming was great for searching through lots of text, but a great deal of addon searches were really just searches for product names, so we opted for substring searches. We’ll see how that fares. There is probably room for improvement.</p>
<p>Much of this, however, involved knowing our data, and knowing how it will be used by our users. Once we got that down, we could hammer it all out using Sphinx.</p>
<h3 id="wins">Wins</h3>
<p>So Sphinx gains us a bit architecturally. We have a complicated query, but it only gets run once every 5 minutes versus the 180,000 times it was run “on demand.”</p>
<p>Indexing happens rather quickly, just over a minute.</p>
<p>The API was a breeze to work with, and was easy to drop into our own codebase.</p>
<p>Because of our relatively small data set, and quick indexing, we’re able to scale this simply by cloning and load balancing. That means we just need to scale for traffic; addon growth (which is slower than traffic growth) is something we can safely not worry about for a while.</p>
<p>Our ops team can monitor the sphinx clusters and just deploy additional nodes as needed.</p>
<h3 id="building-a-platform">Building a platform</h3>
<p>What we’ve done is built a foundation for search. Not all the problems are gone, but a lot of the problems that our QA team finds are able to be resolved quickly. We have a nice pile of unit tests as well that help us keep our results in check when we start tweaking dials.</p>
<p>We even have the groundwork for some nifty advanced search syntax, that hopefully we can inject into future releases of AMO.</p>
<p>Enjoy. And if you find anything, <a href="http://bit.ly/search-bugs">let me know</a>.</p>
mySQL and the grand regexp retardedness with lettercasing
2009-09-19T00:00:00+00:00
http://davedash.com/2009/09/19/mysql-and-the-grand-regexp-retardedness-with-lettercasing
<p>I wanted to find a list of Firefox addons that had smushed text in their title.
E.g. FireBug or StumbleUpon. The normal porter stemming algorithm that Sphinx
uses does not turn “StumbleUpon” into “stumbl upon” as it would with “Stumble
Upon”. I was hoping for, and unfortunately could not find a method to do a
regular expression search/replace using mysql. If I could, I could have Sphinx
read “StumbleUpon” as “Stumble Upon” and all would be well (although in theory
this would backfire).</p>
<p>So my Plan B was to get a list of common smushed named addons (I’d say
camelCase, but camelCase is different from SmushedText). Naturally I used my
exceptional skill at regular expressions to concoct this query:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mysql> SELECT name FROM translated_addons WHERE name REGEXP '[a-z][A-Z][a-z]' = 1 LIMIT 10;
+------------------------+
| name |
+------------------------+
| Orbit Grey |
| Phoenity |
| Pinball |
| Qute |
| FirefoxModern |
| Adblock |
| Add Bookmark Here |
| All-in-One Gestures |
| Bookmarks Synchronizer |
| Browser Uptime |
+------------------------+
10 rows in set (41.28 sec)
</code></pre></div></div>
<p>Wait… none of these match. I scratched my head for a bit and then thought,
oh wait, mysql is case insensitive, maybe it’s turning <code class="language-plaintext highlighter-rouge">[a-z][A-Z][a-z]</code> into
<code class="language-plaintext highlighter-rouge">[a-z][a-z][a-z]</code> - stupid, but consistent with mysql. Then I pulled my
other regexp card out of my sleeve, character classes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mysql> SELECT name FROM translated_addons WHERE name REGEXP '[[:lower:]][[:upper:]][[:lower:]]' = 1 LIMIT 10;
+------------------------+
| name |
+------------------------+
| Orbit Grey |
| Phoenity |
| Pinball |
| Qute |
| FirefoxModern |
| Adblock |
| Add Bookmark Here |
| All-in-One Gestures |
| Bookmarks Synchronizer |
| Browser Uptime |
+------------------------+
10 rows in set (12.96 sec)
</code></pre></div></div>
<p>No difference. Time to pull out the
<a href="http://dev.mysql.com/doc/refman/5.1/en/regexp.html">mysql documentation</a>:</p>
<blockquote>
<p>REGEXP is not case sensitive, except when used with binary strings.</p>
</blockquote>
<p>ORLY?</p>
<p>Case-insensitive regular expressions when looking for <code class="language-plaintext highlighter-rouge">[[:upper:]]</code> or
<code class="language-plaintext highlighter-rouge">[[:lower:]]</code>? Fine… I’ll add some syntax to make you work right:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mysql> SELECT DISTINCT name FROM translated_addons WHERE name REGEXP BINARY '[[:lower:]][[:upper:]][[:lower:]]' = 1 LIMIT 10;
+---------------------------+
| name |
+---------------------------+
| FirefoxModern |
| ChatZilla |
| ChromEdit |
| CuteMenus |
| DownloadWith |
| easyGestures |
| JavaScript Console Status |
| LinkVisitor |
| OpenBook |
| QuickNote |
+---------------------------+
10 rows in set (9.68 sec)
</code></pre></div></div>
<p>That’s more like it!</p>
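<p>The same effect is easy to reproduce outside of mysql. The sketch below uses Python’s <code class="language-plaintext highlighter-rouge">re</code> module, with <code class="language-plaintext highlighter-rouge">re.IGNORECASE</code> standing in for mysql’s case-insensitive comparison; that is only an analogy, not a description of how mysql implements <code class="language-plaintext highlighter-rouge">REGEXP</code>.</p>

```python
import re

# Same shape as the first query: lowercase, uppercase, lowercase.
PATTERN = r"[a-z][A-Z][a-z]"


def matches_insensitive(name):
    # Roughly what the non-BINARY queries were doing: case is ignored,
    # so any three consecutive letters match.
    return re.search(PATTERN, name, re.IGNORECASE) is not None


def matches_sensitive(name):
    # Roughly what REGEXP BINARY does: the case of each letter matters.
    return re.search(PATTERN, name) is not None
```

<p>“Pinball” matches the case-insensitive version and not the case-sensitive one, while “ChatZilla” matches both, which lines up with the two result sets above.</p>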
<p>Unfortunately there are about 2609 addons matching this query and since I can’t
automatically fix these in mysql, I’ll need to do some work:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. Create a new table for additional indexable data.
2. Upon creation of any new addons with names that have SmushedText - store the "un smushed text".
3. Index this "extras" field in Sphinx.
</code></pre></div></div>
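<p>Step 2 could look something like the following Python sketch. The function names are hypothetical and this is not the code that ended up in AMO; it just shows one way to compute the “un smushed text” outside of mysql, using the same lowercase/uppercase/lowercase boundary as the query above.</p>

```python
import re

# A zero-width position that has a lowercase letter before it and an
# uppercase letter followed by a lowercase letter after it.
_SMUSH_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z][a-z])")


def unsmush(name):
    """Insert a space at every smushed word boundary."""
    return _SMUSH_BOUNDARY.sub(" ", name)


def extra_index_text(name):
    """Return the un-smushed form to store, or None if nothing changes."""
    spaced = unsmush(name)
    return spaced if spaced != name else None
```

<p>This turns “StumbleUpon” into “Stumble Upon” and leaves “Adblock” alone. It also turns “JavaScript Console Status” into “Java Script Console Status”, which is the kind of backfiring mentioned earlier, so the extra text is probably best indexed alongside the original name rather than instead of it.</p>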
<p>Bug: <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=517699">517699</a></p>
Getting started with pipe viewer
2009-09-16T00:00:00+00:00
http://davedash.com/2009/09/16/getting-started-with-pipe-viewer
<p>Despite working on slimming the <code class="language-plaintext highlighter-rouge">addons.mozilla.org</code> database through dieting and exercise - I still have to occasionally do long running database tasks. So I finally tried out <a href="http://www.ivarch.com/programs/pv.shtml">pipe viewer</a>. As someone who’s impatient this has been awesome. Here are some quick examples:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@ml-db10 sun]# pv -cN source < addons_remora.2009.09.15.sql.gz | gunzip|pv -cN gunzip > addons_remora.2009.09.15.sql
gunzip: 10.1GB 0:06:48 [25.5MB/s] [ <=> ]
source: 3.47GB 0:06:48 [8.72MB/s] [======================>] 100%
</code></pre></div></div>
<p>Here we are calling pipe viewer with an argument that says to title this progress meter as <code class="language-plaintext highlighter-rouge">source</code>, and feeding it the gzip’d file. Pipe viewer will output two things: the progress, and the actual file. We pipe that file into <code class="language-plaintext highlighter-rouge">gunzip</code> to unzip it, and back into another instance of pipe viewer (again with a title, of <code class="language-plaintext highlighter-rouge">gunzip</code>) and the standard output gets redirected to our destination file.</p>
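<p>If it helps to see the idea without the shell plumbing, here is a small Python sketch of what a tool like this does conceptually: pass the bytes through untouched while reporting progress on a separate stream. It is an illustration only, not how <code class="language-plaintext highlighter-rouge">pv</code> is implemented, and the function name and parameters are made up.</p>

```python
import io
import sys


def copy_with_progress(src, dst, total=None, label="copy",
                       chunk_size=64 * 1024, out=sys.stderr):
    """Copy src to dst unchanged, writing a progress line to `out` per chunk."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        if total:
            # A percentage is only possible when the total size is known.
            out.write(f"\r{label}: {copied} bytes ({100 * copied // total}%)")
        else:
            out.write(f"\r{label}: {copied} bytes")
    out.write("\n")
    return copied
```

<p>This also suggests why only the <code class="language-plaintext highlighter-rouge">source</code> meter above shows a percentage: the size of the file on disk is known up front, while the size of what comes out of <code class="language-plaintext highlighter-rouge">gunzip</code> is not.</p>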
<p>Now a simpler example is checking the progress of loading a large sql file into mysql:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@ml-db10 sun]# pv -cN sql < addons_remora.2009.09.15.sql | mysql -uroot addons_remora -p$PWD
sql: 2.55GB 0:18:19 [5.68MB/s] [=====> ] 25% ETA 0:54:30
</code></pre></div></div>
<p>We could have probably combined all this, however:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@ml-db10 sun]# pv -cN source < addons_remora.2009.09.15.sql.gz | gunzip|pv -cN gunzip | mysql -u root addons_remora -p$PWD
</code></pre></div></div>
<p>Armed with this knowledge you can determine whether to grab a soda, a sandwich or a 2-hour lunch.</p>
DjangoCon wrapup
2009-09-15T00:00:00+00:00
http://davedash.com/2009/09/15/djangocon-wrapup
<p>I went to <a href="http://djangocon.org/">DjangoCon</a> this past week for work. Django is one of my favorite frameworks. I dropped PHP and the symfony framework to learn python and Django and I haven’t looked back. I think for Mozilla’s webdev team it would be the framework of choice. We have 100s of sites in many frameworks, but not a lot of reusability. Django apps are built to be reusable. If you build correctly you don’t have to refactor, it’s already done.<!--more--></p>
<p>Here’s a collection of notes I collected through the conference.</p>
<h3 id="day-one">Day one</h3>
<h4 id="keynote---avi-bryant">Keynote - Avi Bryant</h4>
<blockquote>
<p>Frameworks lock us into RDBMS = bad</p>
</blockquote>
<p>This keynote mentioned the limits of modern frameworks and modern web development. Essentially frameworks are great for getting started, but as a site grows, the framework gets replaced little by little. Sometimes it can get in the way - such as with limitation of database choices.</p>
<h4 id="ur-doing-it-wrong---james-bennet">UR doing it wrong - James Bennett</h4>
<p>James outlined a few key problems that many Django developers run into:</p>
<ul>
<li>learning python as you go
<ul>
<li>doesn’t work unless you know some programming upfront</li>
<li>do the python tutorial</li>
<li>read python in a nutshell or dive into python</li>
</ul>
</li>
<li>Things you should know:
<ul>
<li>subclasses</li>
<li>super()</li>
<li>slides went too fast… hopefully they’ll be posted</li>
</ul>
</li>
</ul>
<p>All in all RTFM for python and Django :)</p>
<p>Learn about other py packages… like twisted. If Twisted Matrix was implemented in Ruby it would be advertised as the second coming of Christ.</p>
<p>Bennett’s Django App review smoketests:</p>
<ul>
<li>installable via pip, easy_install or setup.py
<ul>
<li>read distutils-guide</li>
<li>stay away from setuptools</li>
</ul>
</li>
<li>have a README</li>
<li>INSTALL file list deps</li>
<li>Write DOCUMENTATION
<ul>
<li>use sphinx.pocoo.org</li>
<li>store it in your package <em>and</em> upload package docs</li>
</ul>
</li>
<li>LICENSE (most Django apps use BSD)</li>
<li>Write unit tests</li>
<li>django-lint - to look over code (like pep8.py)</li>
</ul>
<p>pro-django is a decent book, but not written by Bennett.</p>
<h4 id="testing---eric-holscher">Testing - Eric Holscher</h4>
<ul>
<li>Django 1.1 encourages you to test by auto-creating tests.py.</li>
<li>Support for:
<ul>
<li>Unittests</li>
<li>Doctest</li>
<li>Tests done in a db transaction</li>
</ul>
</li>
<li>Test Driven Documentation (TDD + DDD)</li>
<li>Doctest
<ul>
<li>easy</li>
<li>can’t use PDB</li>
<li>Hides certain failures</li>
</ul>
</li>
<li>Unittests via Django TestCase
<ul>
<li>XUnit</li>
<li>setup/Teardown</li>
<li>adds db fixtures</li>
<li>assertions</li>
<li>mail testing/inbox testing</li>
<li>url testing</li>
</ul>
</li>
<li>TestCase
<ul>
<li>Browserless Request/Response testing</li>
<li>Similar to sfBrowser in symfony</li>
</ul>
</li>
<li>Google Summer of Code (for Django 1.2)
<ul>
<li>Coverage reports!</li>
</ul>
</li>
<li>I need to learn PDB</li>
</ul>
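<p>To make the doctest versus unittest notes above a bit more concrete, here is a small sketch using only the Python standard library (Django’s <code class="language-plaintext highlighter-rouge">TestCase</code> builds on <code class="language-plaintext highlighter-rouge">unittest.TestCase</code>, adding fixtures and the test client). The <code class="language-plaintext highlighter-rouge">slugify</code> function is a made-up example, not Django’s.</p>

```python
import doctest
import io
import unittest


def slugify(title):
    """Lowercase a title and join its words with hyphens.

    >>> slugify("AMO Search")
    'amo-search'
    """
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def setUp(self):
        # setUp runs before every test method in the class.
        self.title = "  Getting  Started with pipe viewer "

    def test_collapses_whitespace(self):
        self.assertEqual(slugify(self.title), "getting-started-with-pipe-viewer")


def run_doctest():
    """Run the example embedded in slugify's docstring; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        slugify.__doc__, {"slugify": slugify}, "slugify", None, 0)
    results = doctest.DocTestRunner(verbose=False).run(test)
    return results.failed, results.attempted


def run_unittest():
    """Run SlugifyTest quietly; return True if every test passed."""
    suite = unittest.TestLoader().loadTestsFromTestCase(SlugifyTest)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.wasSuccessful()
```

<p>The doctest doubles as documentation, which is the appeal; the unittest version gets setup hooks and richer assertions in exchange for more ceremony.</p>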
<h4 id="deploying-django--">Deploying Django -</h4>
<p>Run mod_wsgi in daemon mode.</p>
<h3 id="day-2">Day 2</h3>
<h4 id="keynote---ian-bicking"><a href="http://blog.ianbicking.org/2009/09/10/a-new-self-definition-for-foss/">Keynote - Ian Bicking</a></h4>
<p>GNU Manifesto:</p>
<blockquote>
<p>I consider that the golden rule requires that if I like a program I must share it with other people who like it. Software sellers want to divide the users and conquer them, making each user agree not to share with others. I refuse to break solidarity with other users in this way. I cannot in good conscience sign a nondisclosure agreement or a software license agreement. …</p>
</blockquote>
<blockquote>
<p>So that I can continue to use computers without dishonor, I have decided to put together a sufficient body of free software so that I will be able to get along without any software that is not free.</p>
</blockquote>
<ul>
<li>GNU manifesto was the idea of sharing software amongst friends</li>
<li>GNU has purpose - BSD, etc is just a rule - free to share</li>
<li>Free is not just the absence of copyright</li>
<li>Free is not a reaction to existing rules, but a golden rule</li>
<li>Not just a fight against MS</li>
<li>Need to find morality (the why) within the practical (the law, or what you can do)</li>
<li>Open sourcing closed source code isn’t building open source</li>
<li>This might apply to Mozilla… as webkit has taken off more than Gecko.</li>
<li>Open source is person to person not company to company - despite sponsorship.</li>
</ul>
<h3 id="using-django-in-non-standard-ways---eric-florenzano">Using Django in Non-standard ways - Eric Florenzano</h3>
<ul>
<li>Django loosely coupled</li>
<li>Replace templating with Jinja 2</li>
<li>Copy Django methods into djangoext to easily customize Django behavior</li>
<li>Not using django.contrib.auth
<ul>
<li>reasons: writing a fb app - no auth needed</li>
<li>no shoehorning needed - saves time - less overhead</li>
</ul>
</li>
<li>skip the orm?
<ul>
<li>legacy dbs</li>
<li>non standard or db (or non-relational database)</li>
<li>no database</li>
</ul>
</li>
<li>wsgi middleware has some cool shit
<ul>
<li>repoze.bitblt: autoscales images</li>
<li>repoze.squeeze: will concat js/css on the fly based on statistical analysis</li>
</ul>
</li>
<li>non standard Django based apps
<ul>
<li>YARDBird - IRCBot framework</li>
<li>djng micro framework</li>
<li>Jngo- singlefile cms</li>
</ul>
</li>
<li>using admin in a nonstandard way is hard/impossible coupled with ORM and auth</li>
</ul>
<h4 id="real-time-web-and-other-buzzwords---chris-wanstrath">Real-time web and other Buzzwords - Chris Wanstrath</h4>
<ul>
<li>more than just getting your rss feeds faster</li>
<li>push vs. pull</li>
<li>1 persisting connection vs polling</li>
<li>comet/flash-xml/or html5 web socket</li>
<li>Orbited - open source python comet server</li>
<li>zeddicus - does the business logic</li>
<li>Orbited has its own js libs - it’s a simple port/socket thing for your server code to deal with - not request/response.</li>
<li>all connections are persisting: browser/Orbited, Orbited/zeddicus</li>
<li>You can even use Orbited to connect straight to IRC and write a client in JS</li>
<li>Jetty also is good for comet</li>
</ul>
<p>Also:</p>
<ul>
<li>see webhooks</li>
<li>see PubSubHubbub</li>
</ul>
<h4 id="pluggable-reusable-django-apps-a-use-case-and-proposed-solution---shawn-rider-and-nowell-strite"><a href="http://www.slideshare.net/nowells/djangocon-09-presentation-pluggable-applications">Pluggable, Reusable Django Apps: A Use Case and Proposed Solution</a> - Shawn Rider and Nowell Strite</h4>
<ul>
<li>PBS moved from perl to django - built a lot of reusable apps</li>
<li>convincing your superiors
<ul>
<li>need a good story -</li>
<li>existing base of python helped</li>
<li>With Django easy to do things right without doing things slow</li>
<li>be really good…</li>
</ul>
</li>
<li>built a lot of apps to be very reusable, and pluggable based on requirements PBS had</li>
</ul>
<h3 id="day-3">Day 3</h3>
<h4 id="keynote---ted-leung---sun"><a href="http://www.slideshare.net/twleung/djangocon-2009-keynote">Keynote</a> - Ted Leung - Sun</h4>
<ul>
<li>Django jobs are a growing market</li>
<li>Preferred by startups</li>
<li>Bespin/wave - cool</li>
<li>APIs are big… still</li>
<li>Physically impossible to create purely server-side interactions that are usable enough - rely on rest/comet/ajax/etc to bridge gap</li>
</ul>
<h4 id="scaling-django-mike-malone"><a href="http://immike.net/files/scaling_django_dc09.pdf">Scaling Django</a> Mike Malone</h4>
<ul>
<li>MM from Pownce (now sixapart)</li>
<li>Slides started out as “Building Scalable Web Applications”</li>
<li>Django didn’t get in the way too much when it came to scaling</li>
<li>Django had tons of caching support</li>
<li>Cached objects by hand (memcached) and object ID lists</li>
<li>Use memcached for sessions too</li>
<li>use signals to signal cache invalidation</li>
<li>race conditions…</li>
<li>Queue shit… gearman, rabbit mq, etc.</li>
<li>Memcached incr/decr operators are awesome</li>
<li>See gh/mmalone/django-caching</li>
<li>See gh:…/django-multidb</li>
<li>to combat slave lag use a memcache key to alternate between master or slave</li>
</ul>
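<p>The last note, about using a cache key to pick between master and slave, could be sketched like this. It is my reading of the idea rather than the code from the talk: after a user writes, a short-lived key says “read from the master for a while”. A plain dict stands in for memcached here, and the class name, key format and five second window are all assumptions.</p>

```python
import time


class ReadRouter:
    """Send a user's reads to the master for a short window after they write."""

    def __init__(self, cache, lag_window=5.0, clock=time.monotonic):
        self.cache = cache            # stand-in for memcached: key -> expiry time
        self.lag_window = lag_window  # assumed upper bound on replication lag, in seconds
        self.clock = clock

    def record_write(self, user_id):
        # With real memcached this would be a set() with an expiry time.
        self.cache[f"recent-write:{user_id}"] = self.clock() + self.lag_window

    def db_for_read(self, user_id):
        expires = self.cache.get(f"recent-write:{user_id}")
        if expires is not None and self.clock() < expires:
            return "master"
        return "slave"
```

<p>The user who just posted sees their own write straight away, while everyone else keeps reading from the slaves.</p>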
<h4 id="gearman---working-later---chris-heisel"><a href="http://heisel.org/blog/2009/09/11/gearman/">Gearman - working later</a> - Chris Heisel</h4>
<ul>
<li>Gearman - a work later alt to rabbit mq</li>
<li>Makes the most sense for something like cesium, with a bazillion worker <strike>bees</strike> foxes feeding off a single queue</li>
</ul>
<p>Also at the con, I talked to someone about rebuilding large apps… and they took a PHP app and used URL rewriting and a lot of PHP/Python glue code to build a seamless transitory app. The rule was that all new functionality was done up in python while the old app was in maintenance mode.</p>
<p>More talks <a href="http://djangocon.pbworks.com/Slides">here</a>!</p>
Snow Leopard for Macports and Mysql users
2009-09-02T00:00:00+00:00
http://davedash.com/2009/09/02/snow-leopard-for-macports-and-mysql-users
<p>I use mysql and macports on OSX and both were broken when I upgraded to Snow Leopard.</p>
<p>Mysql was a quick fix:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ln -s /usr/local/mysql-5.1.35-osx10.5-x86 /usr/local/mysql
</code></pre></div></div>
<p>(your installed version might be different). It turns out a symlink was removed during the Snow Leopard upgrade.</p>
<p>As for MacPorts, I had to install Xcode from the Snow Leopard CD, install the Snow Leopard version of MacPorts and then follow <a href="http://trac.macports.org/wiki/Migration">this migration guide</a>.</p>
Fun with HTML5, Blockchalk, Bookmarklets and Google Maps
2009-09-01T00:00:00+00:00
http://davedash.com/2009/09/01/fun-with-html5-blockchalk-bookmarklets-and-google-maps
<p>I’ve been filing away neat things that I’ve learned. Like:</p>
<ul>
<li>Firefox supports GeoLocation (which varies in accuracy, but is really accurate for me)</li>
<li>Stephen Hood released the <a href="http://blockchalk.com/developers">Blockchalk API</a></li>
<li><a href="http://twitter.com/joshu/status/3679085168">Google Maps lets you use GeoRSS feeds as a term</a></li>
</ul>
<p>This solves the problem I had with BlockChalk: I wanted a way to see what’s going on near where I am, and I don’t have an iPhone.</p>
<p>So I wrote two bookmarklets:</p>
<p><a href="javascript:navigator.geolocation.getCurrentPosition(function(p){window.location='http://maps.google.com/maps?q=http://blockchalk.com/api/v0.6/chalks/'+p.coords.latitude+','+p.coords.longitude;})">Blockchalk Me</a> which will list Blockchalk listings near you</p>
<p>and</p>
<p><a href="javascript:c=gApplication.getMap().getCenter();window.location='http://maps.google.com/maps?q=http://blockchalk.com/api/v0.6/chalks/'+c.lat()+','+c.lng()">Blockchalk this Google Map</a> which only works if Google Maps is open. It will load Blockchalks that are near the center of the open Google Map.</p>
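<p>Both bookmarklets boil down to the same string concatenation: append a latitude and longitude to the Blockchalk feed URL, then hand that feed URL to Google Maps as the search term. Here is the same thing as a tiny Python function, in case you want to build the link somewhere other than a browser. The function name is made up, and like the bookmarklets it does no URL encoding.</p>

```python
def blockchalk_map_url(latitude, longitude):
    """Build a Google Maps URL that uses the Blockchalk feed as its search term."""
    feed = f"http://blockchalk.com/api/v0.6/chalks/{latitude},{longitude}"
    return "http://maps.google.com/maps?q=" + feed
```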
<p>Unfortunately Blockchalk doesn’t have a lot of data yet, and will return no results if there’s nothing within a mile radius. Hopefully a radius parameter will be included for the API call.</p>
<p>So there are no guarantees on the first bookmarklet, but the second bookmarklet should yield nice results for <a href="http://maps.google.com/maps?q=37.74339,-122.428924">this location</a>.</p>
<p>Enjoy.</p>
git svn rebase... forever?
2009-08-12T00:00:00+00:00
http://davedash.com/2009/08/12/git-svn-rebase-forever
<p>While working on <a href="http://addons.mozilla.org/">addons.mozilla.org</a> I ran into an issue of <code class="language-plaintext highlighter-rouge">git svn rebase</code> continually asking me to merge a file, over and over.</p>
<p>I had a branch open for a bug. In that branch I wrote a library. While that bug was under review, I had to use that library in a new branch for another bug - and had to develop on it a bit.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git checkout -b bug1 master
$ vi libs/mylib.php # make the lib
$ git add .
$ git commit -m "my new lib"
$ git checkout -b bug2 master
$ git checkout bug1 libs/mylib.php # copies this file from one branch to the next
$ git commit -m "lib copied over"
$ vi libs/mylib.php # hack on the lib
$ git commit -am "awesomized lib" # -a stages the edit to the tracked file
$ git svn dcommit # push it up
$ git checkout bug1
$ git svn rebase #... oh shit
</code></pre></div></div>
<p>So the rebase was happening. Git was replaying the commits from bug1, one at a time, on top of the changes from bug2 that were already upstream, and asking me to merge things manually at each step. I thought something weird was happening since “libs/mylib.php” kept needing manual merging. Then I noticed that git was applying a series of patches, and that eventually this would resolve and the branch would be rebased.</p>
<p>Don’t lose hope, <code class="language-plaintext highlighter-rouge">git svn rebase</code> will finish.</p>
V is for Version Hell
2009-08-07T00:00:00+00:00
http://davedash.com/2009/08/07/v-is-for-version-hell
<p>Versioning is quite difficult to deal with. Versions are nearly-numbers, but
you can’t quite sort them using standard numerical algorithms.</p>
<p>While the following is true:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1.1 < 1.2
</code></pre></div></div>
<p>The following is also true:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1.2 < 1.18 < 1.20
</code></pre></div></div>
<p>The “.” is not a decimal point but a separator.</p>
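<p>To illustrate the separator point, here is a minimal Python sketch that compares plain dotted versions as tuples of integers rather than as decimals. The helper name <code class="language-plaintext highlighter-rouge">version_key</code> is made up, and it only handles purely numeric parts, not the stars, plusses and letters discussed below.</p>

```python
def version_key(version):
    """Split a dotted version into a tuple of integers for comparison."""
    return tuple(int(part) for part in version.split("."))


versions = ["1.20", "1.2", "1.18", "1.1"]

# Tuples compare part by part, so (1, 2) < (1, 18) < (1, 20).
print(sorted(versions, key=version_key))  # ['1.1', '1.2', '1.18', '1.20']

# Treating the same strings as decimals gets the order wrong.
print(float("1.2") < float("1.18"))  # False
```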
<p>Mozilla uses a modestly complicated <a href="https://developer.mozilla.org/en/Toolkit_version_format">versioning system</a> that involves stars,
plusses, and sometimes “x”.</p>
<p>I found a very convoluted way to translate these versions into large integers.
The versions for applications in the AMO database have four parts at most, they
are potentially alpha or beta and potentially a pre-release. In some cases we
have multiple versions represented with <code class="language-plaintext highlighter-rouge">.*</code>, <code class="language-plaintext highlighter-rouge">.x</code> or <code class="language-plaintext highlighter-rouge">+</code> at the end.</p>
<!--more-->
<p>The <a href="https://developer.mozilla.org/en/Toolkit_version_format">Toolkit docs</a> let us translate “+” to mean “pre-release of the next
version”. E.g. 1.0+ is 1.1pre0. Since my primary purpose of all this is for
sorting, <code class="language-plaintext highlighter-rouge">.*</code> and <code class="language-plaintext highlighter-rouge">.+</code> may as well just be a very large “version part.” Since
all the version parts I deal with are a maximum of 2-digits, I turned <code class="language-plaintext highlighter-rouge">.*</code> and
<code class="language-plaintext highlighter-rouge">.+</code> into <code class="language-plaintext highlighter-rouge">.99</code>.</p>
<p>For example:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3.5+ => '03'+'05'+'99' => 030599
</code></pre></div></div>
<p>We also need to deal with versions that may be alpha, beta or not. If
everything else is equal:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3.5a < 3.5a5 < 3.5b < 3.5b2 < 3.5 < 3.5+
</code></pre></div></div>
<p>We assign a single integer to represent a version’s “non-alphaness”:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>a => 0
b => 1
non alpha/beta => 2
</code></pre></div></div>
<p>We assume that <code class="language-plaintext highlighter-rouge">3.5a = 3.5a1</code>. Therefore:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3.5a => 3.5.0a1 => '03'+'05'+'00'+'0'+'01' => 030500001
</code></pre></div></div>
<p>Similarly if it’s a pre-release we assign a 0 or 1 to represent
“non-pre-releaseness”:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>3.5a pre2 => 3.5.0a1pre2
=> '03'+'05'+'00'+'0'+'01'+'0'+'02'
=> 030500001002
</code></pre></div></div>
<p>So what does this get us? Integers which we can use for comparison, sorting,
etc. It’s a one time calculation for each version and we can do some nice SQL
statements in AMO like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mysql> SELECT version,version_int FROM appversions WHERE application_id = 1 ORDER BY version_int LIMIT 15;
+---------+--------------+
| version | version_int  |
+---------+--------------+
| 0.3     |  30000200100 |
| 0.6     |  60000200100 |
| 0.7     |  70000200100 |
| 0.7+    |  80000200000 |
| 0.8     |  80000200100 |
| 0.8+    |  90000200000 |
| 0.9     |  90000200100 |
| 0.9.0+  |  90100200000 |
| 0.9.1+  |  90200200000 |
| 0.9.2+  |  90300200000 |
| 0.9.3   |  90300200100 |
| 0.9.3+  |  90400200000 |
| 0.9.x   |  99900200100 |
| 0.9+    | 100000200000 |
| 0.10    | 100000200100 |
+---------+--------------+
15 rows in set (0.00 sec)
</code></pre></div></div>
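<p>As a rough illustration, here is a Python sketch that reproduces the integers in the sample rows above. It is not the AMO code. The digit layout is inferred from those rows: four two-digit parts, one alpha/beta digit, a two-digit alpha/beta number, one pre-release digit and a two-digit pre-release number. In those rows a trailing <code class="language-plaintext highlighter-rouge">+</code> appears to be stored as pre-release 0 of the next version, while <code class="language-plaintext highlighter-rouge">.x</code> becomes 99, so the sketch follows that. The regular expression and the name <code class="language-plaintext highlighter-rouge">version_int</code> (borrowed from the column name) are assumptions.</p>

```python
import re

# Hypothetical pattern for the version strings discussed in this post.
VERSION_RE = re.compile(
    r"^(?P<parts>\d+(?:\.(?:\d+|\*|x))*)"      # dotted parts: 3.5, 0.9.x, 1.*
    r"(?P<plus>\+)?"                           # optional trailing "+"
    r"(?:(?P<stage>[ab])(?P<stage_num>\d*))?"  # optional a, a5, b2
    r"\s*(?:pre(?P<pre_num>\d*))?$"            # optional pre, pre2
)


def version_int(version):
    """Encode a version string as a sortable integer (assumed layout)."""
    match = VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError("unsupported version: %r" % version)

    # ".*" and ".x" become the largest two-digit part, 99.
    parts = [99 if p in ("*", "x") else int(p)
             for p in match.group("parts").split(".")]

    is_pre = match.group("pre_num") is not None
    pre_num = int(match.group("pre_num") or 0)
    if match.group("plus"):
        # "+" is read as pre-release 0 of the next version: 1.0+ -> 1.1pre0.
        parts[-1] += 1
        is_pre, pre_num = True, 0

    if len(parts) > 4 or max(parts) > 99:
        raise ValueError("version part out of range: %r" % version)
    parts += [0] * (4 - len(parts))  # pad to four parts

    stage = match.group("stage")
    stage_flag = {"a": 0, "b": 1, None: 2}[stage]
    # A bare "a" or "b" counts as a1 or b1; final releases use 00.
    stage_num = int(match.group("stage_num") or 1) if stage else 0

    digits = "%02d%02d%02d%02d" % tuple(parts)
    digits += "%d%02d" % (stage_flag, stage_num)
    digits += "%d%02d" % (0 if is_pre else 1, pre_num)
    return int(digits)


for v in ["0.10", "0.9+", "0.9.x", "0.7+", "0.3"]:
    print(v, version_int(v))
```

<p>Sorting a list with <code class="language-plaintext highlighter-rouge">sorted(versions, key=version_int)</code> then gives the same order as the <code class="language-plaintext highlighter-rouge">ORDER BY version_int</code> query.</p>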
<p>I can now index these integers using Sphinx and do some very easy searches for
addons based on version number.</p>
Have unique descriptive page titles
2009-07-31T00:00:00+00:00
http://davedash.com/2009/07/31/have-unique-descriptive-page-titles
<div style="float:right"><a href="http://www.flickr.com/photos/44124375866@N01/3764074726" title="View 'Sphinx - Free open-source SQL full-text search engine - (Build 20090715083437)' on Flickr.com"><div style="text-align:center;"><img src="http://farm4.static.flickr.com/3588/3764074726_0c02ffd18c.jpg" alt="Sphinx - Free open-source SQL full-text search engine - (Build 20090715083437)" border="0" width="483" height="338" /></div></a>
</div>
<p>One of my internet pet peeves is people using the same page title for every page on their web site. Take a look at <a href="http://www.google.com/search?hl=en&q=+site:www.sphinxsearch.com+sphinx+api+php">this search for Sphinx</a>. As you can see, virtually all the links for Sphinx are titled “Sphinx - Free open-source SQL full-text search engine”, which blows for usability when it comes to searching, or even managing the various pages you might have open in your web browser.</p>
<p>To get an idea of which page I want, I need to look at the abstract, which may or may not give me a clue. Even the forum posts, which usually have subjects, have their <code class="language-plaintext highlighter-rouge">&lt;title&gt;</code> set to the site-wide default.</p>
<!--more-->
<p>The first step in solving this is recognizing that you have a problem in the first place. So I wrote <a href="http://github.com/davedash/Title-Variance/tree">a tool</a> in Python to measure how unique your page titles are:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% python measure.py sphinxsearch.com
1050 titles found for sphinxsearch.com
483 unique titles found for sphinxsearch.com
46% of the pages on sphinxsearch.com have unique titles
</code></pre></div></div>
<p>This is fairly telling. It means over half the pages on sphinxsearch.com have a generic title.</p>
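<p>The percentage itself is simple arithmetic. Here is a minimal Python sketch, assuming “unique titles” means the number of distinct title strings among all pages found. It is not the code from the tool linked above: the function name <code class="language-plaintext highlighter-rouge">title_uniqueness</code> is made up, and apart from the Sphinx default title the sample titles are invented.</p>

```python
def title_uniqueness(titles):
    """Return (total, distinct, percent) for a list of page titles."""
    total = len(titles)
    distinct = len(set(titles))
    # Integer percentage, rounded down; 0 for an empty list.
    percent = 100 * distinct // total if total else 0
    return total, distinct, percent


# Made-up sample: three pages share the site-wide default title.
default = "Sphinx - Free open-source SQL full-text search engine"
titles = [default, default, default, "Docs", "Forum"]

total, distinct, percent = title_uniqueness(titles)
print("%d titles found" % total)
print("%d unique titles found" % distinct)
print("%d%% of the pages have unique titles" % percent)
```

<p>With the counts reported above, 483 distinct titles out of 1050 works out to 46% by the same formula.</p>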
<p>This site fared a bit better:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% python measure.py spindrop.us
988 titles found for spindrop.us
822 unique titles found for spindrop.us
83% of the pages on spindrop.us have unique titles
</code></pre></div></div>
<p>So please, think about people trying to use the information on your site. Design your templates in such a way that you can come up with unique titles.</p>
<p>Feel free to expand on this tool; it could easily output the offending pages or titles.</p>
Comprehensive list of international dialing codes
2009-07-01T00:00:00+00:00
http://davedash.com/2009/07/01/list-of-comprehensive-international-dialing-codes
<p>I get bored with mundane tasks. So I create little adventures for myself. I had to create a list of countries and country codes to use on the <a href="http://mozilla.com/mobile">Firefox mobile home page</a>. The first few lists were incomplete, so I made my own by parsing a list provided by the International Telecommunication Union. I stripped it down to simple forms of the country names and removed codes that are very rare (satellite phones).</p>
<p>Well, this could be a boring task for most people, so I placed it on <a href="http://github.com/davedash/International-Dialing-Codes/">GitHub</a>. Feel free to use this in your own projects; I even include a Perl one-liner in the README to convert this into a drop-down HTML list.</p>
Question: Building a Better Search Engine
2009-06-18T00:00:00+00:00
http://davedash.com/2009/06/18/question-building-a-better-search-engine
<p>So I finally have one of those jobs where I can tell people almost every little detail about what I’m doing and I’m encouraged to talk to people on the intar-webs and solicit opinions.</p>
<p>Uh - this is more or less how I’ve operated at previous jobs, just now I can be overt about it.</p>
<p>So my <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=498999">new task</a> is to work on improving the <a href="http://addons.mozilla.org">addons.mozilla.org</a> search engine. I’ve built various “search engines” over time in PHP, powered by Lucene, and most recently in Python using an inverted index.</p>
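<p>For anyone unfamiliar with the term, an inverted index maps each word to the set of documents that contain it, so a query becomes a set intersection. Here is a minimal Python sketch of the idea. It is not any of the engines mentioned above, and the function names and sample documents are made up.</p>

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # .get avoids adding empty entries to the index for unknown words.
    return set.intersection(*(index.get(term, set()) for term in terms))


# Made-up sample documents keyed by id.
docs = {
    1: "Tabs in the sidebar",
    2: "Download manager for Firefox",
    3: "Sidebar download helper",
}
index = build_index(docs)
print(search(index, "download sidebar"))  # {3}
```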
<p>One tool that I’ve been looking at briefly is <a href="http://sphinxsearch.com/">Sphinx</a>. While my record count is low (5-10K), Sphinx basically bakes in a lot of the things I would want in a search engine. Indexing, merging, etc.</p>
<p>Since I’m fairly new to the add-ons team I’m still understanding the basics of what we need:</p>
<ul>
<li>Fast automated indexing of addons for Firefox, Thunderbird and any other Mozilla product</li>
<li>Quick result sets</li>
<li>Easy deployability</li>
<li>Extendible</li>
<li>Customized ranking</li>
<li>Filtering (e.g. by Firefox version, etc).</li>
<li>Basics: Stemming and stop-words</li>
</ul>
<p>Whether it’s Sphinx, Lucene or some homegrown solution, I have all that to support. But this should be fairly straightforward. What are people’s thoughts?</p>
From Delicious to Mozilla
2009-05-26T00:00:00+00:00
http://davedash.com/2009/05/26/from-delicious-to-mozilla
<p>Today I said my good-byes to Delicious.com and Yahoo! and tonight I went to the <a href="http://blog.mozilla.com/addons/2009/05/26/add-ons-meetup-tonight/">Addons Meetup</a> @ Mozilla to get a sneak peek at what I’ll be working on in less than two weeks.</p>
<p>I was thrilled. I had no idea how many people to expect, but the Mozilla living room was packed - and most people were there the whole time. Real developers with really cool addons giving feedback to <a href="http://addons.mozilla.org/">addons.mozilla.org</a> directly. No matter how many blog comments, forum posts and customer care emails I responded to at Delicious, nothing beats the real insight and instant feedback you get from meeting a group of users face to face.</p>
<p>My brain, because of Delicious, is always in data mining and analysis mode, so through each presentation and each question asked, it was churning through things that I could build to bring some level of utility to the community.</p>
<p>I’m also happy to be joining an organization where everything is open sourced and available for comment. So I’m hoping to post a lot more on some of the cool tricks I do at Mozilla.</p>