Astronomers Heart Open Coding

Last Monday I travelled to London to give a talk at a meetup on GitHub for Government, hosted by FutureGov in trendy Shoreditch. GitHub is, of course, the flavour of the month in collaborative software development. The reason for the event was the presence of Ben Balter, GitHub’s Government czar (not his official job title – but close enough), who is tasked with building a community of “open government” enthusiasts around the world doing cool and crafty things with GitHub.

I’ve spoken and written before about the value of openness in science in general, and I feel that sharing our code is an important part of that. So I volunteered to give a short talk on the evening showing some of the ways that I myself, and the astronomy community in general, are embracing open collaborative software development. My slides are embedded above from Slideshare and in this post I’ll just summarise what I talked about.

In astronomy we’ve long had a pretty good culture of openness compared with other sciences: telescope data are very often made publicly available (after a proprietary period) in data archives that now host hundreds of terabytes. Over 90% of our scientific literature gets posted to arXiv, making the vast majority of our papers publicly accessible. The benefits of this have been demonstrated time and again: public archives lead to data being reused many times over, effectively giving more bang per telescope buck. Just having our imagery out there on the web, free and in the open, leads to creativity in wholly unexpected places like fashion design (and no, I do not get paid to post these pictures, and no, I am not above accepting freebies and yes, I would happily give public talks in astronomy-themed sportswear).

Software has always lagged behind in our openness, as I suspect many of us don’t consider our code to form an important part of our science, or we think we’re too rubbish at it to share. The fact that simulators, whose science is code, have historically been better at sharing or publishing their codes, would support that theory. As a result, lots of effort is needlessly duplicated, and those who are really good at writing code aren’t getting much credit for their work. But as we enter into an era of data-driven science, that is beginning to change.

In the Big Data paradigm, software sits far more “front & centre” in the research process. Software is no longer a clunky “just good enough” tool; with Big Data, clever software is clever science. And as teams are also becoming bigger and more globally distributed, having good platforms for collaborative software development is increasingly important. GitHub is emerging as an extremely popular platform in the astrophysics community. Being able to work together effectively on software projects gives better, faster scientific return; builds a global coding community; helps talented coders develop their careers; and helps with transparency and reproducibility of research.

My own experience is small-scale, simple but effective. After getting my paper on triggered star formation accepted for publication in ApJ in 2012, I posted my analysis code to GitHub, and advertised the link in the paper. Earlier this year, Chris Beaumont (U Hawaii/Harvard) contacted me, wanting to use the code for a paper he was writing on machine learning techniques for better bubble classifications in the Milky Way Project data. With his excellent Python and machine learning skills, he changed a few hundred lines of code, got rid of some ugly loops I’d coded in, and sent me a pull request. My code now runs insanely fast, Chris’ excellent paper is in peer review with my name as co-author, and I can unleash my code on much larger datasets than was previously practical. Public code: 3x win.

Organisations too are embracing openness in software development. Earlier this year, the Zooniverse announced that from now on, all new Zooniverse projects will get their own GitHub repository, making them effectively open source. Arfon Smith, who was the technical lead for the Zooniverse at the time, listed four main reasons: it fits with the Zooniverse’s attitude of transparency and openness; it may help other individuals or teams to learn how to build similar tools, be it for science or some other purpose; it allows people all around the world to contribute to the Zooniverse projects, for example by translating the sites into their own language; and finally, it allows the developers to show off their work. I think this last point is often neglected. Put differently, GitHub is a perfect developer’s portfolio.

GitHub obviously thought the Zooniverse were doin’ it right, and hired Arfon as their new Science Guy.

A final example I talked about is Astropy, the community-driven effort to develop a Python library of astronomy research tools. When Python first came onto the scene, lots of scientists saw its potential. But with all that lovely functionality in the IDL Astrolib or in IRAF, changing to Python seemed like such an effort. A number of good packages emerged pretty quickly, but no definitive library. Astropy, led by Tom Robitaille (MPIA), Erik Tollerud (Yale) and Perry Greenfield (STScI), is changing that.

Together these guys are leading what may well be the largest collaborative software development project astronomy has ever seen, with over 50 contributors all over the world and, amazingly, no official funding. The service this team is providing to the community is awesome: not only are they delivering great software, they are building a global community of talented coders in astronomy, and giving lots of early career scientists the opportunity to develop and show their skills.

Most of the things I talked about at this event are not “my” projects; they are my friends’ and colleagues’ efforts, which I’m lucky enough to hear about at dotAstronomy, over coffee or a beer. It’s clear to me that over the last couple of years, astronomers have really embraced the idea of open software development, and it’s exciting to see this cultural change. GitHub may not be the only way to enable this, but the easy and effective workflow it offers certainly seems to be one of the driving forces behind this movement.

Topcat, Top Dog

Astrobetter has a guest post by Niall “in the gutter” Deacon of the University of Hawaii on one of my favourite pieces of astronomical software, Topcat. Developed as part of the UK’s Virtual Observatory program Astrogrid, Topcat gives astronomers Tools for Operations on Catalogues and Tables. That doesn’t sound very sexy, but for anyone who deals with data from large public surveys or needs to cross-match several large datasets, Topcat is the grease in the cogs of their productivity.
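For readers who haven’t met the term, here is a rough idea of what a positional cross-match does. This is only a minimal sketch in plain Python, not how Topcat itself works: the function names `angular_separation_deg` and `cross_match` are made up for illustration, the catalogues are assumed to be simple lists of (RA, Dec) pairs in degrees, and the brute-force loop would be far too slow for the large surveys where Topcat earns its keep.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).

    All inputs are in degrees.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, max_sep_arcsec=1.0):
    """Nearest-neighbour match of cat_a against cat_b.

    Returns one entry per source in cat_a: the index of the closest
    source in cat_b within max_sep_arcsec, or None if there is none.
    Brute force, O(len(cat_a) * len(cat_b)).
    """
    matches = []
    for ra_a, dec_a in cat_a:
        best_idx = None
        best_sep = max_sep_arcsec / 3600.0  # search radius in degrees
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_idx, best_sep = j, sep
        matches.append(best_idx)
    return matches


# Two toy catalogues with invented coordinates (degrees).
catalogue_a = [(10.0, 20.0), (150.0, -30.0)]
catalogue_b = [(150.0001, -30.0), (10.0, 20.0002), (200.0, 5.0)]
matched = cross_match(catalogue_a, catalogue_b, max_sep_arcsec=1.0)
```

Real tools avoid the double loop by indexing the sky first, which is what makes matching very large catalogues practical.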

Niall recorded a cool screencast to show off some of Topcat’s functionality, which I’ve embedded above. There’s also some useful discussion in the comments on Astrobetter, including one from Mark Taylor, who actually wrote Topcat.

I use Topcat almost exclusively in conjunction with image viewer Aladin. Connecting the two via the SAMP protocol, which is done at the click of a button, allows you to send targets back and forth between the two, visualise catalog data or create tables from image data. Recent versions of DS9 are also VO-enabled, though I find the VO functionality of Aladin, i.e. searching catalogs, images and archives, more efficient and versatile.

Making my software open

After thinking about software development in astronomy and talking about it with friends at work and on this blog, I thought it was about time I put my money where my mouth is. I too write software – in fact, the bulk of my work here in Leiden has been based around code I’ve written over the past 2 years for the METIS project (in IDL). The code basically calculates the sensitivity of METIS on the E-ELT, that is, the minimum flux it will be able to detect at a particular signal-to-noise ratio (S/N) in a given exposure time over its wavelength range, in various modes of observation. You can find the full package with background info on my brand-new GitHub page, and a paper is in preparation (to be presented at SPIE 2010) for your referencing pleasure.
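To give a flavour of what a sensitivity calculation involves, here is a heavily simplified sketch in Python. It is not the METIS code (which is written in IDL and models far more): it assumes pure Poisson noise from the source plus a single background term, with no read noise, dark current or wavelength dependence, and the function names and count rates are invented for illustration. Under those assumptions, S/N = S·t / sqrt((S + B)·t), where S and B are the source and background count rates and t is the exposure time, and the minimum detectable rate follows from solving that as a quadratic in S.

```python
import math


def snr(source_rate, background_rate, exposure_time):
    """S/N for Poisson-limited counts from source plus background.

    Rates are in counts per second, exposure_time in seconds.
    """
    total_counts = (source_rate + background_rate) * exposure_time
    return source_rate * exposure_time / math.sqrt(total_counts)


def min_detectable_rate(target_snr, background_rate, exposure_time):
    """Smallest source rate that reaches target_snr in exposure_time.

    Positive root of  t*S**2 - n**2*S - n**2*B = 0,
    obtained by squaring the S/N expression above.
    """
    n2 = target_snr ** 2
    discriminant = n2 ** 2 + 4.0 * exposure_time * n2 * background_rate
    return (n2 + math.sqrt(discriminant)) / (2.0 * exposure_time)


# Hypothetical numbers: 10-sigma detection in one hour against a
# background of 500 counts per second.
limit = min_detectable_rate(target_snr=10.0, background_rate=500.0,
                            exposure_time=3600.0)
check = snr(limit, 500.0, 3600.0)  # should recover the target S/N
```

Converting a limiting count rate like this into a physical flux needs the telescope area, throughput and bandpass, which is where most of the real work in an instrument simulator lies.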

On Software in Astronomy

Importance of the Hubble archive. The number of archival papers has exceeded the number of PI-led papers since 2006 (from White et al., 2009).

I’ve been giving some thought to software development in astronomy, which is a difficult topic. All astronomers agree that good data processing, and hence good software, is crucial to doing rigorous science. To interpret observational data, to translate electrons on a detector to scientific knowledge, requires a solid understanding of the instrument, the observing conditions, and of the exact process with which the data were treated. Many large ground- and space-based observatories, like those run by ESO, Gemini and NASA, strive to provide the community with “science-ready” data. This means that the data are processed to remove all instrumental signatures, allowing astronomers to dive straight into the analysis.

The rationale is that providing science-ready data essentially makes them usable by a much wider community than those involved in the observing campaign, or those used to working with a given instrument. Indeed, a big driver behind the global Virtual Observatory initiative is the “democratisation of astronomy” by providing anyone in the world with ready-to-use astronomical data, irrespective of their location or affiliation to large organisations.

Bringing open source to astronomy

A very interesting paper was posted on astro-ph this week on software development in astronomy. Authored by Benjamin Weiner of Steward Observatory in Arizona and many colleagues, the paper is one of many on the State of the Profession submitted to the 2010 Decadal Survey for astronomy and astrophysics (lots of interesting papers in this category, check out the full list here). The position paper describes a problem that I think is well known in the astronomy community: that software development for instruments and large simulations is not adequately funded, and that the developers do not get the recognition they deserve for their extremely valuable work. They call for changes in the way that software development is tackled in research. I entirely agree.
