While I have been involved in a number of different research projects, there are several consistent themes that drive my research:
- Data and Information Management
  - How are data created, managed, shared, and used?
- Distributed Collaborative Work
  - How do people collaborate over distance, what makes that work successful (or not), and how can we develop strategies and information technologies that make collaboration easier and more effective?
- Large-scale Information Infrastructures
  - How do the “cyberinfrastructures” that provide the foundations for Big Data science and engineering come into being, and how do they support new forms of knowledge production?
- Values in Design
  - How are values embedded in the artifacts we create? How do we support value awareness, negotiation, and reflection in our design processes? How do we manage ethical and privacy considerations in the design of information systems?
I use sociotechnical theories to illuminate the complex interplay of social and technological phenomena. I primarily use qualitative methods (especially interviews and ethnographic observations of collaborative work practices), although I also conduct experimental studies and surveys to validate emerging theory. My research has appeared primarily in the Human-Computer Interaction domain, especially in the Computer Supported Cooperative Work (CSCW) conferences and journals. I am also involved with Social Studies of Science (SSS) and Science, Technology and Society (STS) communities.
Some Key Findings From My Work
Data Is Sociotechnical.
Data play an important role in the production of knowledge, but this is not data’s only function. In “Data at Work” [C3], Jeremy Birnholtz and I wrote about some of the social roles that data play in three different scientific communities: earthquake engineering, HIV/AIDS research, and space physics. For example, we found that access to data frequently denotes one’s position within a scientific community, and that allowing students to take control of data is a powerful signal that they are gaining full scientific status. We also provide an economic framework for understanding the potential positive and negative impacts of data sharing. We conclude that successful data sharing systems will need to support the social functions of data as well as the scientific ones.
Databases Are Sites for the Negotiation of Scientific Values.
In a 2009 article [C5], Charlotte Lee and I discuss how genetic sequence databases support collaboration in the nascent and highly interdisciplinary field of metagenomics. Metagenomics researchers rely on large centralized collections of community-submitted genetic sequence data, both to provide shared data for (re)analysis and to provide a baseline for analyzing new data. As part of a 3-year study of cyberinfrastructure development, we examined the work of developing large, high-performance scientific databases for metagenomics. We classified these databases as boundary negotiating artifacts (Lee, 2007). In other words, in the process of developing the databases, it was necessary to bring multiple stakeholder communities together to make decisions about what kinds of data the databases would contain, how the data should be described, and what tools should be provided to query, manipulate, and analyze the data. These seemingly technical discussions became opportunities for members of different scientific communities to argue about which scientific questions were most pressing, how data should be collected and annotated, and which analysis methods were most valid. Through this process, the database became a site where the communities’ scientific values could be negotiated, codified, and enacted.
“Synergizing” Is a Fundamental Infrastructure Design Process.
Designing infrastructure presents a paradox. Infrastructures are emergent systems, growing out of the interweaving of formal structures and local work practices (Star & Ruhleder, 1996). Such distributed, emergent systems of systems seem resistant to a classic sense of Design that involves designers enacting their intentions through the creation of artifacts. At the same time, large numbers of people claim to be designing infrastructures. In a 2010 paper with Charlotte Lee and Eric Baumer [J2], we develop a concept called synergizing (borrowing a term from the participants in our ethnographic study) to describe the work of cyberinfrastructure development. Synergizing involves identifying and enacting productive collaborative relationships in pursuit of greater combined effects than individuals, groups, or organizations could achieve on their own. Synergizing consists of two key sub-processes. Leveraging refers to using an existing relationship with a person, artifact, or organization as a resource to build or strengthen a productive relationship with another person, artifact, or organization. Aligning refers to the work that is necessary to make these relationships productive within an infrastructural context. The synergizing concept highlights that, in an infrastructural context, the object of design is less the components that make up the infrastructure, and more the relationships among components. In other words, instead of asking “what should we make,” infrastructure designers are guided by the question, “with what/whom do we want to connect?”
Sustaining Cyberinfrastructure Requires Adapting to Change.
Infrastructures are expected to last for long periods of time, but this is a challenge for cyberinfrastructures, which are often funded as short-term projects. In “Sustaining the Development of Cyberinfrastructure” [C9], Toni Ferro, Charlotte Lee, and I wrote about how one organization met the challenges of sustaining a scientific infrastructure through a number of potentially disruptive events. We show that “maintenance” work is better thought of as ongoing redevelopment as technologies, organizations, and science change. As such, flexibility becomes an important design requirement. In another paper [C10], Charlotte Lee and I explored how three genomic databases dealt with the challenge of adapting to new science. Metagenomics is a new science that uses genomic data, but its level of analysis shifts from the organism to the population of organisms. This requires new database structures and new forms of metadata that could not be accommodated by the old systems. We document three strategies for dealing with legacy systems—work-arounds, extensions, and from-scratch development—and explore the design tradeoffs of each strategy.