Once upon a time, an awesome lady I met on the internet, who blogs about challenging her own comfort zone, asked me if I might ever want to guest post. From what I can tell, she started a company, redirected her efforts there, and is sort of MIA on the internet now. I hope it's her dream come true. Anyway, I had a hard time coming up with a challenge for my own comfort zone, but finally settled on "attend a hackathon".
This past Saturday, I attended a hackathon hosted by DataRescue Boston at MIT. DataRescue is a movement of volunteers working to archive taxpayer-funded scientific data in the event that an antagonistic administration were to limit access to it.
I arrived at MIT half an hour late, because I screwed up, and was relieved to discover that it was still trickle-in time. There was a ton of coffee, bagels/toppings, OJ, fruit salad and awesome vegetarian breakfast burritos from Feed the People. I checked in, got a nametag, sticker and level keychain (!!!!) and was offered the option of opting out of photography at the event.
I went and washed the MBTA slime off my hands, and sat down at a round table just as the program began. The organizers introduced themselves and the volunteer guides/managers, then covered the purpose and motivation for DataRescue, the facility information, the schedule and an outline of the roles folks could choose for the event. Two things that I really liked were the inclusion of facility information (bathroom and info table locations, etc.) and the pre-existing "Attendee Info Packet" they'd set up online with info in greater detail.
People chose their own "track" from:
Surveyors: Research government departments not already being archived (without touching the data)
Seeders: Use the information supplied by the surveyors to identify and queue data that needs to be archived
Harvesters: Download data, document download process, and upload to the archive
Storytellers: Document the event for communication to the public and the press
The roles/tracks were something I had researched ahead of time, but didn't quite understand, so I want to dwell on them for a moment. First of all, I'm a "technical" person, aka I code, but given the importance of the mission, I went with the intention of "going where I was most needed". It turns out that was unnecessary: because of the size and parallelization of the national movement, there is already plenty going on at every stage of the pipeline. In addition, the software and workflow are designed so that at every stage (each takes 2-3 hours) the data is annotated and repackaged for the next person.
I was plagued by a phantom 404 (even after I had checked the URLs!), which ate up a lot of my time. I eventually tracked it down: the library I used to check the URLs preprocessed them, but the library I used to download didn't. I finished my download and archiving at home later. I now have the info and logins to contribute independently of an event, and I hope to do so.
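For the curious, here is a rough sketch of how that kind of mismatch can happen. It uses only the Python standard library. The `normalize_url` helper and the example URL are made up for illustration: the helper stands in for whatever cleanup a URL-checking library might do internally, and it is not the code or the libraries from the event.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def normalize_url(raw: str) -> str:
    """Hypothetical stand-in for the cleanup a URL-checking library might do."""
    parts = urlsplit(raw.strip())
    # Percent-encode unsafe characters, leaving existing %XX escapes alone.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


def needs_normalizing(raw: str) -> bool:
    """True if a checker and a downloader could end up requesting different URLs."""
    return normalize_url(raw) != raw


# Made-up URL with a space in the path.
raw = "https://example.gov/data/annual report.csv"
print(normalize_url(raw))
print(needs_normalizing(raw))
```

If the checker quietly requests the cleaned-up form while the downloader sends the raw string, the check can pass and the download can still fail. Running every URL through the same cleanup step before both the check and the download is one way to avoid that.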
I enjoyed the event a lot - I was worried about disorganization and drudgery, but the DataRescue folks (Boston organizers and above) were very organized, and the distributed pipeline meant that one person getting stuck didn't hold up the rest of the group. There's another event coming up at Northeastern, which I recommend checking out.
Random comments on the event: There were gender-neutral and accessible bathrooms, as well as vegetarian and gluten-free food options. There was no ASL/captioning, but all of the information was available in static written form. I'd say at least 1/3 of the attendees were women.