Last Monday I travelled to London to give a talk at a meetup on GitHub for Government, hosted by FutureGov in trendy Shoreditch. GitHub is, of course, the flavour of the month in collaborative software development. The reason for the event was the presence of Ben Balter, GitHub’s Government czar (not his official job title – but close enough), who is tasked with building a community of “open government” enthusiasts around the world doing cool and crafty things with GitHub.
I’ve spoken and written before about the value of openness in science in general, and I feel that sharing our code is an important part of that. So I volunteered to give a short talk on the evening showing some of the ways that I myself, and the astronomy community in general, are embracing open collaborative software development. My slides are embedded above from Slideshare and in this post I’ll just summarise what I talked about.
In astronomy we’ve long had a pretty good culture of openness compared with other sciences: telescope data are very often made publicly available (after a proprietary period) in data archives that now host hundreds of terabytes. Over 90% of our scientific literature gets posted to arXiv, making the vast majority of our papers publicly accessible. The benefits of this have been demonstrated time and again: public archives lead to data being reused repeatedly, effectively giving more bang per telescope buck. Just having our imagery out there on the web, free and in the open, leads to creativity in wholly unexpected places like fashion design (and no, I do not get paid to post these pictures, and no, I am not above accepting freebies and yes, I would happily give public talks in astronomy-themed sportswear).
Software has always lagged behind in our openness, as I suspect many of us don’t consider our code to form an important part of our science, or we think we’re too rubbish at it to share. The fact that simulators, whose science is code, have historically been better at sharing or publishing their codes, would support that theory. As a result, lots of effort is needlessly duplicated, and those who are really good at writing code aren’t getting much credit for their work. But as we enter into an era of data-driven science, that is beginning to change.
In the Big Data paradigm, software sits far more “front & centre” in the research process. Software is no longer a clunky “just good enough” tool; with Big Data, clever software is clever science. And as teams are also becoming bigger and more globally distributed, having good platforms for collaborative software development is increasingly important. GitHub is emerging as an extremely popular platform in the astrophysics community. Being able to work together effectively on software projects gives better, faster scientific return; builds a global coding community; helps talented coders develop their careers; and helps with transparency and reproducibility of research.
My own experience is small-scale and simple, but effective. After getting my paper on triggered star formation accepted for publication in ApJ in 2012, I posted my analysis code to GitHub, and advertised the link in the paper. Earlier this year, Chris Beaumont (U Hawaii/Harvard) contacted me, wanting to use the code for a paper he was writing on machine learning techniques for better bubble classifications in the Milky Way Project data. With his excellent Python and machine learning skills, he changed a few hundred lines of code, got rid of some ugly loops I’d coded in, and sent me a pull request. My code now runs insanely fast, Chris’ excellent paper is in peer review with my name as co-author, and I can unleash my code on much larger datasets than was previously practical. Public code: 3x win.
Organisations too are embracing openness in software development. Earlier this year, the Zooniverse announced that from now on, all new Zooniverse projects will get their own GitHub repository, making them effectively open source. Arfon Smith, who was the technical lead for the Zooniverse at the time, listed four main reasons: it fits with the Zooniverse’s attitude of transparency and openness; it may help other individuals or teams to learn how to build similar tools, be it for science or some other purpose; it allows people all around the world to contribute to the Zooniverse projects, for example by translating the sites into their own language; and finally, it allows the developers to show off their work. I think this last point is often neglected. Put differently, GitHub is a perfect developer’s portfolio.
GitHub obviously thought the Zooniverse were doin’ it right, and hired Arfon as their new Science Guy.
A final example I talked about is Astropy, the community-driven effort to develop a Python library of astronomy research tools. When Python first came onto the scene, lots of scientists saw its potential. But with all that lovely functionality in the IDL Astrolib or in IRAF, changing to Python seemed like such an effort. A number of good packages emerged pretty quickly, but no definitive library. Astropy, led by Tom Robitaille (MPIA), Erik Tollerud (Yale) and Perry Greenfield (STScI), is changing that.
Together these guys are leading what may well be the largest collaborative software development project astronomy has ever seen, with over 50 contributors all over the world and, amazingly, no official funding. The service this team are providing to the community is awesome: not only are they delivering great software, they are building a global community of talented coders in astronomy, and giving lots of early career scientists the opportunity to develop and show their skills.
Most of the things I talked about at this event are not “my” projects; they are my friends’ and colleagues’ efforts, which I’m lucky enough to hear about at dotAstronomy, over coffee or a beer. It’s clear to me that over the last couple of years, astronomers have really begun to embrace the idea of open software development, and it’s exciting to see this cultural change. GitHub may not be the only way to enable this, but the easy and effective workflow it offers certainly seems to be one of the driving forces behind this movement.