(No boats were harmed, involved, or even really alluded to in the making of this post)
Two things came through my RSS reader recently that resonated with me particularly. The first, a blog post by Martin Keegan, “Intellectual Debt”, says:
I think it’s possible to accumulate “intellectual debt”. Thoughts and ideas that you’ve had, worked on, developed, talked about, but have not written up and published. You can have an idea, but until you’ve tried to write it up properly such that someone else could read and criticise it, you can’t be sure that it actually makes sense.
The second:
Whether or not it gets used, you have to finish something. The worst thing you can do is start a bunch of things, get halfway through, quit and start something else. You’re not going to be happy. Ship stuff.
Both of these hit home hard. I’ve got a variety of half-finished, un-written-up side projects that have suffered an undue amount of neglect due to client work and life. I suspect I’m far from alone in this.
Right now I’m taking some holiday time, and since several schools of thought seem to suggest a public commitment can aid in achieving a goal, here’s the list of things I’m going to try to ship over the coming weeks:
"Learning Me A Haskell" - an informal "worked introduction" to writing a puzzle solver in Haskell
"Jeeves-Door" - hooking up my doorbell to a Raspberry Pi for tweets and texts and photos when someone comes a-ringing
"Jeeves-Bell" - replacing the guts of my analogue alarm clock with a Raspberry Pi for better alarm control and scheduling
A campaign to encourage people to cite their sources when publishing
There’s more I’ve got queued, but this holiday time is finite, and this feels like the right amount to be “challenging but achievable”.
If like me you’ve got a personal project backlog, why not join me on a “shipping spree” - I’d love to hear from you if you do.
I’ve finally found the time to sit down and start using Vagrant for Real Things. For the unaware, Vagrant is essentially a tool for managing development VMs - excellent for such things as managing a local development environment, or developing and testing Chef/Puppet configuration. For more detail see the excellent set of slides by Vagrant author Mitchell Hashimoto - Develop and Test Configuration Management Scripts with Vagrant.
Something I swiftly ran into was that I have several manifests written for Puppet 3. Among the things introduced since Puppet 2.7 are the addition to the DSL of unless as a synonym for if !, and the introduction of Hiera as a “first class citizen”.
So, I want to use Puppet 3, but I really don’t want to have to go and rebuild existing Vagrant boxes. I have Puppet modules for ensuring that I’m running the latest stable Puppet, but that’s not going to work when the existing install can’t parse said modules.
It turns out you can have multiple Provisioners in Vagrant. So, while in general Puppet (or Chef if you’re of that persuasion) is The Right Way to provision things, we can add a Shell provisioner to run before the Puppet provisioner and ensure the VM is running Puppet 3.
upgrade-puppet.sh
Note: Everything I’m doing at the moment is Ubuntu-based; this script is Debian/Ubuntu specific, but should be fairly trivial to adapt to the (supported) distro of your choosing
PuppetLabs provides packages to enable their apt repositories for Debian and Ubuntu, so it’s a simple matter of extracting the OS code name, helpfully provided by /etc/lsb-release as $DISTRIB_CODENAME (Update: this doesn’t work for Debian; better to use lsb_release --codename --short, after installing the lsb-release package that provides it), and using that to fetch and install the right package, before updating the package indexes and upgrading Puppet:
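A sketch of what the script looks like; the repository package URL follows the puppetlabs-release-$CODENAME.deb pattern PuppetLabs use, but do check it against their current docs:

```bash
#!/usr/bin/env bash
# upgrade-puppet.sh - a sketch; Debian/Ubuntu specific
set -e

# Grab the codename (e.g. "precise") - on Debian, install lsb-release and
# use `lsb_release --codename --short` instead
. /etc/lsb-release

# Enable the PuppetLabs apt repository for this release...
wget -q "http://apt.puppetlabs.com/puppetlabs-release-${DISTRIB_CODENAME}.deb" -O /tmp/puppetlabs-release.deb
dpkg -i /tmp/puppetlabs-release.deb

# ...then update the package indexes and upgrade Puppet
apt-get update -qq
apt-get install -y puppet
```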
With that in place, all that remains is to get Vagrant to use it as a shell provisioner. Just drop the below line into your Vagrantfile, somewhere before your Puppet provisioner:
Puppet 3 Provisioner
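A sketch of how the two provisioners fit together, assuming the script above lives next to the Vagrantfile (the Puppet manifest details are placeholders for whatever your project uses):

```ruby
# Upgrade to Puppet 3 first...
config.vm.provision :shell, :path => "upgrade-puppet.sh"

# ...then run the Puppet provisioner as usual
config.vm.provision :puppet do |puppet|
  puppet.manifests_path = "puppet/manifests"
  puppet.manifest_file  = "site.pp"
end
```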
Once I’d upgraded to Puppet 3, I noticed a few warnings appear across my boxes. Naturally, I wanted to squash these.
FQDN
Something (I haven’t quite established what) was checking the fqdn fact. All the boxes I use seem not to set an FQDN, for example:
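Inside one of my boxes (the box hostname here is purely illustrative):

```
$ hostname
precise64
$ hostname --fqdn
precise64
```

No domain part in sight.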
It’s a simple matter of setting config.vm.hostname to a valid FQDN, for example:
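For instance (the FQDN itself is just a placeholder):

```ruby
config.vm.hostname = "devbox.example.com"
```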
(If you’re using Vagrant v1 configuration, you’ll want config.vm.host_name (note the underscore))
Hiera
Puppet 3 now has Hiera built in, and while its default configuration seems fairly sane and reasonable, it still regards the lack of an explicit configuration file as warning-worthy. So, use the Puppet provisioner’s options setting to explicitly set Puppet’s hiera_config option, for example:
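Something along these lines; the manifest details and the hiera.yaml path are placeholders for whatever your project uses:

```ruby
config.vm.provision :puppet do |puppet|
  puppet.manifests_path = "puppet/manifests"
  puppet.manifest_file  = "site.pp"
  # Point Puppet at an explicit Hiera configuration to silence the warning
  puppet.options        = "--hiera_config /vagrant/hiera.yaml"
end
```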
Conclusion
Sure, perhaps you’re better off building a new base box, but if you’re not ready to do that, this should hopefully come in useful!
This is nice. This is so much nicer than the other all-too-common model:
Find an appropriate contact method, be it a web form or email address somewhere on the site
Email them a description of the issue
Wait
Sometimes I get a reply. Sometimes I don’t. It’s all too fire-and-forget. What GitHub gives me is visibility and openness. There’s now a public URL for my pull request / issue. The “open requests” count increments. This doesn’t sound like much, but it’s important. Anyone visiting the repository sees the count. Anyone can see the issue.
Why does openness matter?
Tom Preston-Werner, one of the GitHub founders, covers why people should open-source code very nicely in his blog post, “Open Source (Almost) Everything”, with the caveat of:
Don’t open source anything that represents core business value
What I’m saying is that it’s not just code. Open your documentation, and open your processes. I want to know that you’ll respond to fixes and issues and questions. I want to see how you respond, and how you react. I want to see that you care about your users. If you’ve just dumped code onto the internet under the guise of “openness”, and all feedback is routed to /dev/null, I want to see that.
There are lots of things that factor into my decision whether or not to invest in a product or technology. Openness helps.
Even if you’re a primarily closed-source company, what do you have to lose by open-sourcing your (presumably already freely-downloadable) documentation? Maybe you open it up and no-one interacts with it. Is that a cost? Maybe it’ll encourage it to get a bit more love ;)
With the tools currently available it’s now easier than ever to be more open, and I’m starting to wonder to what extent pointless closedness should be viewed as a weakness.
Addendum - Is this GitHub specific?
Not at all. Replace “GitHub” with BitBucket if you like, or Google Code Project Hosting (though I’d like a nice ‘pull-request’ UI or similar). GitHub just has “the nicest and easiest” (read: “my favourite”) UI for code hosting and basic issue tracking.
Features include reputation tracking and graphing, to see how you’re doing compared to your friends and rivals, and a detailed comparison tool, so you can see exactly what badges someone has compared to you, and vice versa.
If you’re a keen StackExchange user who likes to know how they’re doing compared to other users you might know, this is the tool for you.
(Note, if you haven’t read it already, I recommend my previous article on Django and Static Files to get an understanding of the fundamentals)
For pretty much every Django project I deploy, I use Amazon’s Simple Storage Service (S3) to host my static files. If you aren’t particularly familiar with it, then the salient points are:
Storage costs approximately $0.13 per month per GB stored up to 1 TB
Inbound data transfer is free
Outbound data transfer is free up to 1 GB per month
Outbound data transfer between 1 GB and 10 TB per month costs approximately $0.13 per GB
The cost to the average reader: under $0.15, or free if they're covered by the AWS Free Usage Tier
Why don’t I use nginx or Apache or whatever webserver I have in front of my Django deployment for static file hosting? Three things:
Specialisation - while I have no doubts about the abilities of nginx and Apache to host static files, S3 will inevitably do it far better for far less effort, and it means one less thing for them to do
I frequently deploy to Heroku where I don't have access to the configuration of the httpd layer
I find it pretty simple - not much more than half a dozen lines added to `settings.py`
So, first things first: you’ll need django-storages for the STATICFILES_STORAGE class (see my previous article for the role of storages), and boto, the (excellent) Python AWS library that the unsurprisingly-named S3BotoStorage uses to communicate with S3.
Assuming you’re inside a virtualenv, this should be pretty straightforward:
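Nothing more exotic than pip:

```
$ pip install django-storages boto
```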
(Also if you have a requirements.txt file or setup.py, don’t forget to update them!)
Now that’s all installed, add it to INSTALLED_APPS in your settings.py:
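The app django-storages provides is registered as storages:

```python
INSTALLED_APPS = (
    # ...your existing apps...
    'storages',
)
```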
You’ll need an S3 bucket to push files to, so head over to the AWS Management Console for S3 and “Create Bucket”, giving it some appropriate name, and picking the most appropriate geographical region for you. You’ll be offered the option to set up logging et cetera, but can happily skip this by just clicking “Create”.
You’ll also need to get this name into your settings. We’ll do this from os.environ, because we’ve all read the twelve-factor opinions on config, right? (Go read it, so that if you turn up on IRC for help and I ask you to pastebin your settings.py, you don’t need to go on an extensive redacting spree / expose sensitive information / both)
If you’re on Heroku, you’ll want to add that to your config:
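Using the heroku command-line client, that’s something like (the bucket name being whatever you picked above):

```
$ heroku config:set AWS_STORAGE_BUCKET_NAME=my-example-bucket
```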
Finally, all you need is to have Django use the right static configuration. I tend to wrap this in an if not DEBUG block because I don’t want it while developing (I include the AWS_STORAGE_BUCKET_NAME in that block too, so I don’t need to be too specific about my environment at dev time):
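Something along these lines; the AWS credential settings are the standard boto/django-storages ones, pulled from the environment in the same way as the bucket name:

```python
import os  # at the top of settings.py, if it isn't there already

if not DEBUG:
    # Have collectstatic push files to S3 via django-storages' boto backend
    STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

    AWS_STORAGE_BUCKET_NAME = os.environ['AWS_STORAGE_BUCKET_NAME']
    AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
    AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']

    # ...and serve them straight from the bucket
    STATIC_URL = 'https://%s.s3.amazonaws.com/' % AWS_STORAGE_BUCKET_NAME
```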
Voila. Now, with DEBUG set to False, I just need to collectstatic and my static files will be uploaded to S3:
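That’s just the usual management command:

```
$ python manage.py collectstatic
```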
And there you have it. Hopefully you shouldn’t have any problems following this guide, but if you have any questions, issues, or feedback (always appreciated!) then please leave a comment, find me on IRC, or catch me on Twitter.
Did this help? Check out my book
Ok so I'm still writing the book so you can't buy it just yet. But if you want to make sure that you're serving your static files in the best way possible, you'll want it:
Django’s handling of static files is great, but sometimes causes confusion. If you’re wondering how it all fits together, what some of the settings mean, or just want some example uses, then keep reading.
Introduction
A typical Django project will have multiple sets of static files. The two common sources are applications with a static directory for media specific to them, and a similarly-named directory somewhere in the project for media tying the whole project together.
Ultimately, you want these all to end up in one place, to be served to the end user. This is where the collectstatic command comes in; as the name suggests, it’ll collect all your static files together into that one place. Of course, if you have DEBUG set to True in your settings, runserver will happily handle all this for you, but that won’t be the case for your final deployment (and nor should it be!)
So how does this all happen?
Configuration
Finders
First of all, where to find the static files?
By default settings.py will be created with a STATICFILES_FINDERS setting, with a value of:
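At the time of writing, that default looks like this (give or take a commented-out DefaultStorageFinder):

```python
STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
)
```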
These will be used to find the source static files. The AppDirectoriesFinder is responsible for picking up $app_name/static/, while the FileSystemFinder uses the directories specified in the STATICFILES_DIRS tuple.
You’ll probably want STATICFILES_DIRS to look something like the below:
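For example (the PROJECT_ROOT helper is just a convention of mine, not something Django gives you):

```python
import os

# Directory containing settings.py; adjust to taste
PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))

STATICFILES_DIRS = (
    # Project-wide static media, as opposed to per-app static/ directories
    os.path.join(PROJECT_ROOT, 'static'),
)
```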
Storage
The STATICFILES_STORAGE setting controls how the files are aggregated together. The default value is django.contrib.staticfiles.storage.StaticFilesStorage which will copy the collected files to the directory specified by STATIC_ROOT.
Do not confuse STATIC_ROOT, to where static files are collected, with the aforementioned STATICFILES_DIRS; the former is output, the latter are inputs. They should not overlap. This is a common mistake.
Update: To be absolutely clear, STATIC_ROOT should live outside of your Django project - it’s the directory to where your static files are collected, for use by a local webserver or similar; Django’s involvement with that directory should end once your static files have been collected there
URL
Last but not least, STATIC_URL should be the URL at which a user / client / browser can reach the static files that have been aggregated by collectstatic.
If you’re using the default StaticFilesStorage, then this will be the location where your nginx (or similar) instance is serving up STATIC_ROOT, e.g. the default /static/, or, better, something like http://static.example.com/. If you’re using Amazon S3 this will be http://your_s3_bucket.s3.amazonaws.com/. Essentially, this is wholly dependent on whatever technique you’re using to host your static files. It’s a URL, and not a file path.
Common Mistakes
Overlap between STATICFILES_DIRS and STATIC_ROOT - the former is a set of places to look for static files, the latter is where they’re stored
Incorrect STATIC_URL - it’s a URL, not a file path
Incorrect configuration of whatever you’re using to host your static content; this is why I use S3 - in my experience it’s the least effort to get working
Having STATIC_ROOT inside your project directory. While not strictly a mistake, it’s not where it belongs, and is generally a sign of other misunderstandings
Examples
All of these will assume you’ve left STATICFILES_FINDERS as its default, and STATICFILES_DIRS as described above; I’ve never yet had a reason, across dozens of projects, for these not to be the case
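Local webserver

The simplest case: leave STATICFILES_STORAGE at its default and have nginx (or similar) serve up STATIC_ROOT. The paths and URL here are just placeholders:

```python
# collectstatic copies everything here; nginx serves this directory...
STATIC_ROOT = '/var/www/example.com/static/'

# ...at this URL
STATIC_URL = 'http://static.example.com/'
```

Amazon S3

Alternatively, push everything to an S3 bucket and serve it from there. A minimal sketch; the bucket name is a placeholder, and in practice you’ll want your AWS credentials configured too:

```python
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

AWS_STORAGE_BUCKET_NAME = 'my-example-bucket'

STATIC_URL = 'https://%s.s3.amazonaws.com/' % AWS_STORAGE_BUCKET_NAME
```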
Note this uses django-storages, which is a nice wide-ranging collection of custom backends for STATICFILES_STORAGE described above.
I frequently deploy to Heroku where custom httpd configuration is nontrivial, so often use this method.
Conclusion
Static file handling is important to get right, and straightforward once you know how, but easy to get wrong. I hope this clarifies things.
As ever, drop me a line if you have any queries, questions or complaints.
Did this help? Check out my book
Ok so I'm still writing the book so you can't buy it just yet. But if you want to make sure that you're serving your static files in the best way possible, you'll want it:
The folks over at NHS Hack Day put together a presentation on SlideShare with a list of all the projects, and of the judges - check it out for a good overview of the kind of things that people put together.
A side project of mine that I’m working on at the moment is StackCompare, an app for StackExchange users to compare their reputation and badges to those of their friends. One feature I wanted to add was a graph of reputation over time.
Step One - Data Aggregation
First things first, get the data into some sort of database. The data is a time series (a set of tuples of the form (timestamp, data)), so my thoughts immediately went to setting up my own OpenTSDB instance. However, where possible I’d rather use a hosted solution at this stage of development, and some googling led me to TempoDB, a hosted time-series database service. Currently TempoDB seems quite early-stage (a warning sign to me is the lack of any mention of pricing…) but works quite nicely, with decent documentation and a Python client.
First, some placeholder HTML (with some slightly ugly hardcoded sizes…):
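Roughly this; the element id and hardcoded dimensions are just what I happened to pick:

```html
<div id="reputation-graph" style="width: 600px; height: 300px;"></div>
```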
Then, a little Javascript to populate it:
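A sketch using Flot; the URL and element id are assumptions matching the placeholder above:

```javascript
$(function () {
    // Fetch the [timestamp, reputation] pairs from the server and plot them,
    // treating the x-axis values as times
    // (newer Flot versions need the time plugin for mode: "time")
    $.getJSON("/reputation/data/", function (series) {
        $.plot($("#reputation-graph"), [series], {
            xaxis: { mode: "time" }
        });
    });
});
```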
All that was needed to finish it off was some short Python code to massage the data from TempoDB into the right format for Flot (slightly paraphrased):
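Roughly the following; I’m assuming each TempoDB datapoint exposes a ts datetime and a value, and Flot wants Javascript-style millisecond timestamps:

```python
import calendar

def to_flot(datapoints):
    # Flot expects a list of [timestamp_in_ms, value] pairs
    return [
        [calendar.timegm(point.ts.utctimetuple()) * 1000, point.value]
        for point in datapoints
    ]
```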
So last weekend I found myself up at 0600 on a Saturday, of my own free will. Why? I’d signed up for NHS Hack Day, a two-day event essentially throwing a bunch of NHS-types and tech-types together in a room and seeing what comes out. My domain knowledge is pretty limited - I know the NHS exists and like it, and several friends are doctors or working on becoming so - but I like building things, particularly for people with a clear idea of what they want.
Having arrived on the first day, I listened to the other presentations, and while they all sounded very good, Renal Patient View still seemed very appealing, so it was time to meet the others similarly interested: Dr Grant Hill-Cawthorne, Dr Zeinab Abdi, Ayesha Garrett, and my friends Jeff Snyder and Grey Baker. Our first challenge: getting the software built. With my ever-rusting Java experience casting me as ‘the Java guy’ of the team, we perhaps weren’t best placed to be doing this, but we had clue, coffee, and enthusiasm; how bad could it go…! Well, several hours later, we had it building…
The term ‘Open Source’ is used in a lot of ways by a lot of people. Renal Patient View is an Open Source project; you can find its code currently at SourceForge (in the process of being moved to GitHub). However, it’s one thing to be able to access the code, and another to be able to actually do anything with it. One of our biggest, yet least visible, contributions from the weekend, we felt, was identifying and documenting the build/deployment process for other people to use, making Renal Patient View ‘even more Open Source’. Now, in no way should this be taken as a criticism of the Renal Patient View developers; making a relocatable build system requires genuinely hard work, and it’s to the credit of the RPV team that the source is even available for us to work with - the most important first step! I’m glad we were able to contribute in a way that will aid anyone coming to the code in future.
Meanwhile, once we’d got development environments up and running, it was time to add some features! After some brainstorming, we aimed at revamping the design and copy, adding some graphing for blood component levels for user results, and adding notification emails for patients when their test results arrived. See them demonstrated in the presentation below:
While we didn’t win, we made the shortlist, and here we are (apart from Grey, who was called away to Silicon Milkroundabout).
In terms of what we were able to produce in a weekend, in a language and framework we were unfamiliar with, in a domain that half of us knew little about, I feel we made incredible progress. It was great meeting and working with the other people on the team, and I learnt an incredible amount.
As for the rest of the event, it was truly excellent. An incredible gathering of bright, intelligent, motivated and interesting people, who produced some great things. Credit to Dr. Carl Reynolds and all the other volunteers and organisers for an absolutely great weekend. Subsequent hack days are being organised for (currently, I believe) Liverpool and Oxford - I thoroughly encourage anyone reading to attend, and I’ll hopefully see you there!