Data Management and Archiving: Beginning and Intermediate Level

Nicholas Thieberger

Course days, time, and location:
6/22, 6/23, 6/24, & 6/28
10:00 - 11:45
Lillis Business Center, Room 282

Course Information:
This course will focus on the creation of good data from linguistic fieldwork. The basic principle advocated is to create data once, in an appropriate format, so that it can be reused many times. From the recording through analysis to the archive and community-focused outputs, how can we keep track of what we have done and what stage of processing each item is at? How can we transform data from the output of one tool to the input of another? What tools and processes can we use, and what does each of them do? This course will contextualise some of the other courses at Infield, showing how the various tools being taught fit into a workflow and stressing the importance of allowing the underlying data to flow between tools and then into an archive. Topics to be covered in this course include:


The linguistic fieldwork workflow

Processes and current tools

What is 'well-formed data'?

Language documentation requires archives


Course Documents:
Managing Data 1  
Managing Data 2  
Managing Data 3  
Managing Data 4  

Instructor(s) Bio:
Nick Thieberger works with Warnman, an Indigenous language from Western Australia, and South Efate, a language from central Vanuatu. For South Efate he developed a method for citing archival recordings created during fieldwork, presenting a DVD of playable example sentences and texts in the language together with the published grammar. In 2003 he helped establish the Pacific And Regional Archive for Digital Sources in Endangered Cultures and continues as the project officer with this multi-institutional archiving project, which holds 4.4 TB of data, including 2,440 hours of digitised audio files. He leads a team that is building EOPAS, an online database for presentation of interlinear glossed text with media. In 2008 he established Kaipuleohone, the linguistic archive at the University of Hawai'i. He is interested in developments in e-humanities methods and their potential to improve research practice, and he is now developing methods for creation of reusable data sets from fieldwork on previously unrecorded languages. He is the technology editor for the journal Language Documentation and Conservation. He is an Australian Research Council QEII Fellow at the University of Melbourne and an Assistant Professor in the Department of Linguistics at the University of Hawai'i at Mānoa.

Updated July 19, 2010 at 10:16 pm