A few recent posts from around the web have gotten me thinking about how the concerns of cyberinfrastructure play out in local laboratories:
- Jonathan Eisen, a biologist at UC Davis, posted on The Tree of Life about his quest to find an electronic lab notebook, and the ensuing discussion suggests that, while it’s possible to kludge together something that works, there aren’t many options specifically designed to support the day-to-day needs and constraints of an academic research laboratory. (And just try to find ones that play well with other information systems inside and outside the lab!)
- Richard Apodaca at Depth-First wants to stop talking about “electronic laboratory notebooks” and instead use the phrase “networked laboratory information.” He suggests that consideration of this new mental model would “start out with identifying the many forms of information we create and use, and the needs of those doing the creating and using. It would then move on to how best to share this information within our organization, and with our customers and partners in a secure manner.”
- Titus Brown has posted a wonderfully tongue-in-cheek Data Management Plan on his blog, Daily Life in an Ivory Basement:
“I will store all data on at least one, and possibly up to 50, hard drives in my lab. The directory structure will be custom, not self-explanatory, and in no way documented or described. Students working with the data will be encouraged to make their own copies and modify them as they please, in order to ensure that no one can ever figure out what the actual real raw data is. Backups will rarely, if ever, be done.”
These posts highlight a tension that arises when individuals and small laboratories do science in a computerized, networked, big-science world. We hear a lot about how building massive databases and supercomputers is increasingly important for doing cutting-edge science. The NSF, NIH, DOE, and many other agencies and organizations are putting significant funding and attention toward creating large, centralized scientific resources. But I wonder if this focus on the centralized portion of infrastructure sometimes comes at the expense of supporting local practice.
For example, Brown’s satire is written in response to the NSF’s new policy requiring grant proposals to include data management plans. At least as it is described in the press release, the focus of the new policy is on “community access to data” and “open sharing of research data.” It seems that for the NSF, data management is only important insofar as it supports the one-way movement of data out of the lab and into the community. This is a shortsighted view of data management.
In a recent article, Karen Baker and Lynn Yarmey present a much more nuanced and complex understanding of data management for big science. They see data repositories existing within different “spheres-of-context.” For example, a local repository might be found in a particular laboratory or small group, where it is intended to support data use in the context of a specific set of research questions. On the other hand, a large remote archive might be aimed at preserving data for future reuse. Whereas the NSF policy treats the local context (e.g., the laboratory) as a pit stop on the road to a shared database, Baker and Yarmey remind us that laboratories are more than data factories, and that the data management challenges are about more than simply enabling data aggregation. Data management policies need to consider how data move through and around the entire “web of repositories.”
I think the spheres-of-context concept can help us think not just about repositories, but about the entire range of cyberinfrastructure. In the same way that the electricity infrastructure needs both power plants and wall outlets, cyberinfrastructures need both the local and the community contexts. Our investments in cyberinfrastructure won’t have the transformational impact we want unless we also pay attention to supporting new scientific practices in day-to-day laboratory life, and to meaningfully connecting those local practices with collective scientific activities.
Baker, K. S., & Yarmey, L. (2009). Data stewardship: Environmental data curation and a web-of-repositories. International Journal of Digital Curation, 4(2), 12-27.
Government-wide emphasis on community access to data supports substantive push toward more open sharing of research data