THEORY AND FRAMEWORKS

A Mathematical Framework for Modeling and Analyzing Migration Time (Full Paper)
Feng Luan, Mads Nygard and Thomas Mestl
Abstract. File format obsolescence has so far been considered the major risk in long-term storage of digital objects. There are, however, growing indications that file transfer may be a real threat, as the migration time, i.e., the time required to migrate petabytes of data, may easily span years. Because hardware support is usually limited to 3-4 years, a situation can emerge in which a new migration has to be started although the previous one is not yet finished. This paper chooses a process modeling approach to obtain estimates of upper and lower bounds for the required migration time. The advantage is that information about potential bottlenecks can be acquired. Our theoretical considerations are validated by migration tests at the National Library of Norway (NB) as well as at our department.

Digital Libraries for Scientific Data Discovery and Reuse: From Vision to Practical Reality (Full Paper)
Jillian Wallis, Matthew Mayernik, Christine Borgman and Alberto Pepe
Abstract. Science and technology research is becoming not only more distributed and collaborative, but more highly instrumented. Digital libraries provide a means to capture, manage, and access the data deluge that results from these research enterprises. We have conducted research on data practices and participated in developing data management services for the Center for Embedded Networked Sensing since its founding in 2002 as a National Science Foundation Science and Technology Center. Over the course of 8 years, our digital library strategy has shifted dramatically in response to changing technologies, practices, and policies. We report on the development of several DL systems and on the lessons learned, which include the difficulty of anticipating data requirements from nascent technologies, building systems for highly diverse work practices and data types, the need to bind together multiple single-purpose systems, the lack of incentives to manage and share data, the complementary nature of research and development in understanding practices, and sustainability.

Ensemble PDP-8: Eight Principles for Distributed Portals (Short Paper)
Edward Fox, Yinlin Chen, Monika Akbar, Clifford Shaffer, Stephen Edwards, Peter Brusilovsky, Dan Garcia, Lois Delcambre, Felicia Decker, David Archer, Richard Furuta, Frank Shipman, Stephen Carpenter and Lillian Cassel
Abstract. Ensemble, the National Science Digital Library (NSDL) Pathways project for Computing, builds upon a diverse group of prior NSDL, DL-I, and other projects. Ensemble has shaped its activities according to principles related to design, development, implementation, and operation of distributed portals. Here we articulate 8 key principles for distributed portals (PDPs). While our focus is on education and pedagogy, we expect that our experiences will generalize to other digital library application domains. These principles inform, facilitate, and enhance the Ensemble R&D and production activities. They allow us to provide a broad range of services, from personalization to coordination across communities. The eight PDPs can be briefly summarized as: (1) Articulation across communities using ontologies. (2) Browsing tailored to collections. (3) Integration across interfaces and virtual environments. (4) Metadata interoperability and integration. (5) Social graph construction using logging and metrics. (6) Superimposed information and annotation integrated across distributed systems. (7) Streamlined user access with IDs. (8) Web 2.0 multiple social network system interconnection.

Discovering Australia’s Research Data (Short Paper)
Stefanie Kethers, Xiaobin Shen, Andrew Treloar and Ross Wilkinson
Abstract. Access to data crucial to research is often slow and difficult. When research problems cross disciplinary boundaries, these difficulties are exacerbated. This paper argues that it is important to make it easier to find and access data that might be found in an institution, in a disciplinary data store, in a government department, or held privately. We explore how to meet ad hoc needs that cannot easily be supported by a disciplinary ontology, and argue that web pages that describe data collections with rich links and rich text are valuable. We describe the approach followed by the Australian National Data Service (ANDS) in making such pages available. Finally, we discuss how we plan to evaluate this approach.