At Mozilla Web we developed a pattern that we liked for
testing templates. We would make a fake request to a URL, get the content from the response object (we used Django) and then search for elements using pyquery. pyquery uses a syntax similar to jQuery, so it wasn’t very difficult for those of us who knew jQuery to carefully write selectors that would find the exact piece of text we were looking for.
Let’s do a more complicated example using my current home page, and try to grab a specific piece of text from it.
Let’s install pyquery and requests. Requests is a library that makes light work of fetching web pages:
pip install pyquery requests
Now let’s explore:
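A minimal sketch of that first exploratory step. The `fetch_html` helper and the `https://example.com/` URL are placeholders (the real home page URL isn’t given here); `requests.get`, `raise_for_status()` and `.text` are standard requests API.

```python
import requests


def fetch_html(url):
    """Fetch a page and return its body decoded as text."""
    # A timeout keeps an unresponsive server from hanging the script.
    response = requests.get(url, timeout=10)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Usage would be something like `html = fetch_html("https://example.com/")` followed by `print(html[:200])` to eyeball the start of the page.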
Great, it looks like we were able to fetch some HTML, and hopefully it’s my web page.
Let’s see if we can load that into pyquery:
We can use jQuery-style selectors to get HTML elements out of my home page. Let’s take a wild guess and assume I put this text in a paragraph tag:
It appears there are 4 paragraph tags. No problem: this is Python, so the result can be indexed like a list.
Oh, brilliant, we just need the third paragraph tag:
This is simple stuff. Let’s try something a bit more advanced and get the date of each post I’ve listed under “Recent Posts”:
Clearly I don’t blog nearly enough. Note the argument passed to doc is
'span.date'. If you don’t speak CSS, it means I’m looking for a <span>
element with a class of date. Similarly, the doc object we created can take
any CSS selector that pyquery supports (most of CSS3, via the cssselect
library) as a string argument.