Milky Way Project: Data Release 1

Spitzer's view of the central regions of our Galaxy (NASA/Milky Way Project)

Since its launch just over a year ago, the Milky Way Project, the citizen science initiative to identify bubbles in the interstellar medium of our Galaxy, has gathered an amazing number of classifications: over half a million bubbles drawn by around 35,000 users. Before Christmas we reached a major milestone when we submitted the project's first scientific paper to Monthly Notices of the Royal Astronomical Society (MNRAS).

Following some nice iterations (never said I didn’t like peer review….) with the referee for the paper, and coverage by the BBC at the AAS conference in Austin, TX, in January, we posted the paper to the arXiv a couple of weeks ago. From there it’s available for anyone to download and read. The paper was formally accepted today (yay!) but we haven’t uploaded the final revision to the arXiv yet. Keep an eye out for it in the replacements section if you’re interested: we did improve it significantly with the input of the referee.

As the project was only made possible by NASA publicly sharing the data from the Spitzer Space Telescope, we have of course made our first data catalogues publicly available as well on a dedicated site and on FigShare.

What’s in the paper?

As is customary, we give a description of the science questions we’re addressing in the opening section, with references to previous work. What’s the nature of these bubble-shaped structures we see littered around the Galaxy, and why are we interested in the HII regions they enclose? We describe the typical appearance of bubbles in our data colour scheme: bright green rims with filaments or secondary bubbles, often with toroidal red emission in their interior, and what these properties might physically signify.

The second introductory section describes our approach, data selection, the Milky Way Project interface and the design of the drawing tools we included.  If you’ve classified bubbles, this will be familiar stuff. Then we go on to the data processing and the science.

One of the biggest lessons I’ve learnt from MWP is that the citizen science approach taken by the Zooniverse projects, while it provides a clever solution to some specific problems, doesn’t lead to easy answers. What really happens is that a visual task, which is hard to perform with a computer, is transformed into a data task, which computers are good at. Basically, images are converted into numbers via human brains. Lots of brains, and lots of numbers. How to make sense of all that?

Computers are really only as clever as we make them, so we had to come up with an intelligent way of processing the classifications into a reliable catalogue of bubble objects. In section 3 of the paper we show in detail how we achieved that. We wrote an algorithm that goes through the entire set of images and finds clusters of classifications. When a cluster of 5 or more drawings is found, their coordinates are added to a separate table. The algorithm examines progressively smaller areas of the sky until no bubble drawings are left to be found, and then it moves on to the next section.
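To give a flavour of what "finding clusters of drawings" means in practice, here is a minimal Python sketch. It is not the algorithm from the paper: it is a simple greedy grouping that I'm using purely as an illustration, and the function name `find_clusters`, the `radius` tolerance and the (x, y) tuple format are all assumptions. The only number taken from the text above is the threshold of 5 drawings.

```python
import math

MIN_DRAWINGS = 5  # threshold quoted in the post; everything else here is illustrative


def find_clusters(drawings, radius, min_drawings=MIN_DRAWINGS):
    """Greedily group drawing centres that lie within `radius` of a seed drawing.

    `drawings` is a list of (x, y) centres in any consistent unit.
    Returns only the groups with at least `min_drawings` members.
    """
    remaining = list(drawings)
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        members = [seed]
        rest = []
        for centre in remaining:
            separation = math.hypot(centre[0] - seed[0], centre[1] - seed[1])
            if separation <= radius:
                members.append(centre)
            else:
                rest.append(centre)
        remaining = rest
        # Groups that are too small are dropped, not added to the table.
        if len(members) >= min_drawings:
            clusters.append(members)
    return clusters
```

With five drawings bunched around one spot and three around another, only the first group survives the cut. A greedy scheme like this depends on the order of the drawings and on the chosen radius, which is one reason more sophisticated cluster finders are worth exploring.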

Finding clusters of objects in images of the sky is a very common task in astrophysics – finding stellar clusters or clusters of galaxies in large survey sets, for example – and lots of algorithms exist for doing this. Experimenting with more sophisticated cluster-finding methods is something we’d like to do for a follow-up paper. But no method is perfect: there will always be false positive detections or missed clusters, and for our first data release we were satisfied with the results of our simple approach.

This process leaves us with clusters of bubble drawings that we had to process into one bubble, with a position and size. But is each drawing of the same quality? Do we just average the numbers, or do we try to assess how good or bad a drawing is in relation to the others? How do we deal with outliers? Just like we’d have to do with an instrument, we have to calibrate the classifications.

The solution we used is to discard every user’s first ten classifications, even our own, and from there on in we assigned points to users depending on how many bubbles they’d drawn using the full toolset. Essentially we assume that users who frequently adjust the shapes and sizes of the bubble drawings are precise and careful in their approach, and the more they do this the higher we weight their work.
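As a rough illustration of the two steps described here, the Python sketch below drops each user's first ten classifications and then combines the remaining drawings in a cluster with a weighted average. The record layout (dicts with `user`, `x`, `y` and `size` keys), the function names and the weight values are hypothetical; the paper describes how the actual user scores were derived.

```python
from collections import defaultdict

BURN_IN = 10  # number of initial classifications discarded per user (from the post)


def drop_burn_in(classifications, burn_in=BURN_IN):
    """Keep only classifications made after each user's first `burn_in` ones.

    `classifications` must be in chronological order; each item is a dict
    with at least a "user" key (a hypothetical layout for this sketch).
    """
    seen = defaultdict(int)
    kept = []
    for item in classifications:
        seen[item["user"]] += 1
        if seen[item["user"]] > burn_in:
            kept.append(item)
    return kept


def weighted_bubble(drawings, weights):
    """Combine the drawings in one cluster into a single position and size.

    `weights` maps user name to a positive score; higher scores count more.
    """
    total = sum(weights[d["user"]] for d in drawings)
    return {
        key: sum(weights[d["user"]] * d[key] for d in drawings) / total
        for key in ("x", "y", "size")
    }
```

For example, if one user with weight 1 draws a bubble at x = 0 and another with weight 2 draws it at x = 3, the combined position is x = 2, pulled towards the more heavily weighted drawing.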

These strategies weren’t picked out of thin air: we benefited a lot from the experience with previous Zooniverse projects, like Galaxy Zoo, and we tested any new methods quite extensively before accepting them into our “baseline” processing algorithm. And as I pointed out in previous posts on the MWP blog, these strategies are no reflection on the “right-ness” or “wrong-ness” of the drawings; we simply try to benchmark them against one another. After the first 10 drawings, every single one counts and contributes to the final catalogue.

What’s in the catalogues?

The entry page to the data catalogue for now contains just the bubble catalogues. The link prompts you to download a zip file containing two separate csv files (comma-separated values files), which you can open in a simple text editor, load into a spreadsheet in Excel or Numbers, or read with a more sophisticated table application like TopCat (if you’re nifty with code you can obviously write your own scripts too).

The first catalogue contains the “large bubbles”, or the bubbles that were drawn with the full toolset on the webpage, that lets the user adjust sizes and shapes. The “small bubbles” catalogue contains those objects that were too small to draw with the tools, which were simply marked with a box. The files contain 3744 large and 1362 small bubbles; 5106 objects in total.
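If you'd rather script it, a few lines of Python with the standard `csv` module are enough to load one of the files. This is only a sketch: the file names in the comments are placeholders (use whatever the files in the zip are actually called), and I'm assuming the first row of each file holds the column names.

```python
import csv


def read_catalogue(path):
    """Read one catalogue csv file into a list of dicts, one per row.

    Assumes the first row is a header with the column names.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


# Example usage (placeholder file names, not the real ones):
# large = read_catalogue("large_bubbles.csv")
# small = read_catalogue("small_bubbles.csv")
# print(len(large), len(small))
# print(list(large[0].keys()))  # shows which columns the file provides
```

If each row describes one bubble, the two lengths should match the counts quoted above.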

If you want to explore the data visually, Rob created a Data Explorer page you can access from the Data webpage, which is really neat.

What will the catalogues be used for?

The original papers by Ed Churchwell and his collaborators from 2006 and 2007, which presented the first bubble  catalogues containing just a few hundred objects, have racked up almost 150 citations between them in 6 years, so there’s certainly interest in the community for this type of object.

Other scientists have used the data in particular to compare the infrared properties of these expanding HII regions with data from other wavelengths – radio, submillimetre, far-infrared or optical – to get more comprehensive pictures of specific star forming regions (e.g. Bik et al 2009, Marco & Negueruela 2011). The data have also frequently been referenced in theoretical work simulating the expansion of these shells into the dense interstellar medium (e.g. Dale et al 2009), how newly born massive stars affect their surroundings, and so on.

Authors have studied the rims of bubbles to locate potential areas of star formation that were triggered by the expansion of the shell into the surrounding dense cloud material (e.g. Watson et al 2010, Thompson et al 2012, Rodon et al 2010). This is a very active field of research, as this mechanism can potentially explain how star formation can propagate through a galaxy. But finding the observational evidence is tricky.

So we know there’s an interest in bubbles in general, and the large sample size our citizen science approach has given us should make the catalogue particularly attractive for statistical follow-up studies.

And finally….

Rob Simpson, who leads the project at the University of Oxford and who really bore the brunt of the work for the project setup and this paper, will be on the BBC’s The Sky at Night in early March. He’ll talk about the Zooniverse projects he’s been working on and I expect he’ll talk about Milky Way Project as well. I hope the programme comes to BBC iPlayer soon so I can enjoy it here in these foreign climes.


R. J. Simpson, M. S. Povich, S. Kendrew, C. J. Lintott, E. Bressert, K. Arvidsson, C. Cyganowski, S. Maddison, K. Schawinski, R. Sherman, A. M. Smith, & G. Wolf-Chase (2012). The Milky Way Project First Data Release: A Bubblier Galactic Disk. MNRAS. arXiv:1201.6357v1

