
As part of Good Systems, a UT Bridging Barriers Grand Challenge, “AI4AV: Building and Testing Machine Learning Methods for Metadata Generation in Audiovisual Collections” is developing a methodology and workflow for libraries, archives, and museums (LAMs) to use machine learning and supercomputing resources to generate metadata for AV materials in the humanities. In the process, the project will address research questions around defining and evaluating a “good” system for introducing AI for AV to information professionals.

History: Audiovisual (AV) materials play a fundamental role as historical and scientific records, providing evidence of activity across the globe, from endangered languages to rare bird calls to sonifications of polar ice caps melting underwater. The number of these documentary records is increasing exponentially in every field in the humanities and the sciences, yet the professionals tasked with preserving these materials and making them useful to scholars and the general public often lack the knowledge and resources to do so.

What is a good system for those responsible for managing and preserving these assets? Generating metadata, which is essential for indexing and searchability, requires too much time if done manually. Using machine learning to generate metadata is promising, but information professionals must still overcome a host of technological and social challenges.

The project’s research questions around defining and evaluating this “good” system ask:

  1. What kinds of technical understanding and human-computer interfaces best facilitate the use of machine learning in LAMs? 
  2. Can we design storage and processing workflows that incorporate centralized technologies, produce useful results, and adhere to LAM standards and values? 
  3. What are the perceived local and global issues for sustaining these workflows in terms of cost, time, dependencies, and usefulness? 

Answering these questions will aid in designing and prototyping a good system, one in which “good” means understandable, efficient, and scalable tools that libraries, archives, and museums can use to unlock and improve access to their valuable audiovisual collections.

Overview: The project deliverables will include:

  1. A suite of tested, API-driven, executable code bases implementing an interface to the DeepSpeech deep neural network speech-to-text engine, together with documentation and a plan for full-scale implementation (a minimal transcription sketch follows this list); 
  2. A shareable workshop framework for introducing LAM professionals to advanced technologies such as DeepSpeech and other ML tools; 
  3. A suite of elastic workflows based on use cases at UT LAMs for exploring potential significant patterns in audiovisual collections with documentation, tutorials, and sample datasets; 
  4. A team of curators with an advanced understanding of using machine learning and HPC resources; 
  5. A white paper describing our methodology for developing and evaluating the above as well as conference papers and research articles that address our research questions. 
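
As a rough illustration of what the code bases in deliverable 1 might wrap, not the project’s actual implementation, here is a minimal sketch that transcribes one audio file with Mozilla DeepSpeech’s Python API. The model, scorer, and WAV file names are placeholders for files the user would supply.

```python
# Minimal sketch: transcribing one audio file with Mozilla DeepSpeech's
# Python API (deepspeech 0.9.x). The model, scorer, and WAV paths below
# are placeholders; the pretrained files come from the DeepSpeech releases
# page, and the audio must be 16 kHz, 16-bit, mono to match that model.
import wave

import numpy as np
from deepspeech import Model

model = Model("deepspeech-0.9.3-models.pbmm")                 # acoustic model
model.enableExternalScorer("deepspeech-0.9.3-models.scorer")  # optional language model

with wave.open("interview.wav", "rb") as wav:                 # placeholder AV file
    frames = wav.readframes(wav.getnframes())
audio = np.frombuffer(frames, dtype=np.int16)

transcript = model.stt(audio)  # plain-text transcript for indexing and search
print(transcript)
```

A full workflow would batch calls like this across HPC resources and map the resulting transcripts into LAM metadata records; the sketch shows only the core speech-to-text step.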

People: AI4AV team members are:

Tanya Clement, Associate Professor of English, UT Austin;

Aaron Choate, Director of Digital Strategies; Chair of the Digital Strategies Council for the Directors of Campus Collections, UT Libraries;

Maria Esteva, Research Associate and Data Archivist, Texas Advanced Computing Center;

Weijia Xu, Research Scientist Manager, Scalable Computational Intelligence, Texas Advanced Computing Center;

Hannah Hopkins, Graduate Research Assistant, English Department and School of Information, UT Austin.