PCC Wikidata Pilot: Expanding Authority Control to Identity Management

By Mollie Echeverria, Melanie Wacker, Alex Whelan

Authority control

One of the core functions of library cataloging is ensuring that all resources by or about the same agent can be found under the same access point or heading. This process is referred to as authority control. Authority control is commonly managed through the use of authority records, which include the preferred unique heading for a particular agent, along with variant forms of the name for cross-referencing purposes. These authority records are collected in indexes referred to as authority files.

In the U.S., the primary authority file used by libraries is the Library of Congress Name Authority File (LC/NAF). Records in the LC/NAF are created by catalogers at the Library of Congress and by institutions participating in the Program for Cooperative Cataloging Name Authority Cooperative Program (PCC/NACO). Over the years, Library of Congress catalogers and NACO-trained catalogers at other PCC member libraries have contributed about 10 million authority records to the LC/NAF (John Riemer. Wikidata Pilot Meeting: Background and Goals (August 27, 2020). https://wiki.lyrasis.org/display/pccidmgt/Wikidata+Pilot+Kick-off+meetings (accessed May 24, 2021)).

 

Figure 1: LC/NAF record for Gay Officers Action League

 

Figure 2: Screenshot of NACO program home page

Challenges of NACO Cataloging

To contribute authority records to the NAF, catalogers at NACO institutions must complete intensive training. Because of the volume of training required, generally only a small number of library staff at a given institution are able to become NACO catalogers.

Due to the relatively small number of catalogers trained to contribute to NACO, many agents in libraries’ bibliographic catalogs may be left without controlled name authority records. This lack of controlled access points can create a host of obstacles, including publications by different authors with the same name all being attributed to one person, publications by corporate bodies that have changed names not being linked to one another, and variant forms of an author’s name not showing up in searches.

Besides the limited pool of catalogers able to contribute to NACO cataloging, the formatting of names in the LC/NAF can create additional obstacles. Contemporary OPACs rely on matching text strings, a holdover from the card catalog era. The access point for the agent’s name must be formulated according to current standards and must always match the “preferred label” exactly.

Enter Wikidata

A few years ago, the PCC began an effort to put its name authority work on a more sustainable footing. As outlined in PCC Strategic Directions, 2018-2022, the PCC is now seeking to move the library community from static, record-based authority work toward more flexible, metadata-based identity management. It seeks to facilitate the implementation of new tools and technologies like linked open data and encourages collaboration with a more diverse variety of outside communities.

The PCC Wikidata Pilot, started in 2020, is the latest outcome of this work. Wikidata offers many possible benefits as a tool for PCC member organizations. Where the LC/NAF holds about 10 million records, Wikidata contains over 89 million entities, and because it is open to all, it has a much larger group of contributors from a far more diverse range of backgrounds.

PCC Wikidata Pilot at Columbia University Libraries

Columbia University Libraries’ Original & Special Materials Cataloging Department (OSMC) was immediately intrigued by the possibilities offered by the PCC Wikidata Pilot, and a team formed around the project consisting of Mollie Echeverria, Matthew Haugen, Ryan Mendenhall, Melanie Wacker, and Alex Whelan.

Among a number of project ideas, we had an immediate problem on our hands that we hoped this new Wikidata pilot could help us solve. Over the past three years, the Columbia Libraries have been digitizing a large number of audiovisual materials from our collections as part of the Mellon Audio and Moving Image (AMI) Project. Included in these materials are oral history interviews from the Oral History Archives at the Columbia Center for Oral History (CCOH). At CUL, oral histories are cataloged using individual MARC records, which are then fed into CLIO, the Oral History Portal, OCLC WorldCat, and eventually, in converted form, the Digital Library Collections.

Figure 3: Screenshot of the Time-Based Media Initiative landing page at the Columbia Digital Library Collections site

In year 1 of the AMI project, Alex Whelan, CUL’s Time Based Media Metadata Librarian, was trained to provide the needed MARC bibliographic and authority records. In the following year, however, Alex was needed primarily for work on audiovisual materials from CUL’s archival collections, and the oral history processing was taken on by Metadata Operations Specialist Mollie Echeverria.

Mollie was familiar with MARC bibliographic cataloging, but NACO work was outside of her training and her responsibilities. Initially, Alex continued to provide the necessary authority work, but since he was not the one working with the actual materials, this proved problematic and labor-intensive.

The OSMC AMI team developed the idea that Mollie could create Wikidata entries. In these entries, she could capture the information about a specific interviewee at the time of cataloging. Alex (or any other NACO-trained cataloger at OSMC) could then use this information as a basis for minimal NACO records for use in our catalogs and databases. The Wikidata entry and the related name authority record would each contain the other’s identifier, thereby linking the two descriptions.

 

Figure 4: View of the editing interface in OCLC Connexion for the authority record for Jane Abell
Figure 5: View of the “identifiers” panel, including the LC/NAF identifier, in a Wikidata item

Workflow

Wikidata proved to be a low-barrier tool that our project team found easy to learn. Next, we had to develop a workflow. Mollie created a spreadsheet in which she enters the uncontrolled name and some basic information, such as the related CLIO ID of the oral history interview. She then searches Wikidata for an existing entry or, if none exists, creates a new Wikidata item. The identifiers for these Wikidata items are also added to the spreadsheet.

Figure 6: Internal tracking spreadsheet for CUL’s Wikidata item creation
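
The "search before creating" step could also be scripted. The following is a minimal sketch, not part of our actual workflow, which was done by hand in the Wikidata interface. It assumes the public Wikidata `wbsearchentities` API; the function names, the placeholder User-Agent string, and the sample response are all hypothetical and only illustrate the shape of the data.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(name, language="en", limit=5):
    """Build a wbsearchentities URL that looks for items matching a name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def summarize_candidates(response):
    """Reduce an API response to (QID, label, description) tuples for review."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]


def fetch_candidates(name):
    """Call the live API (network access required; User-Agent is a placeholder)."""
    request = Request(
        build_search_url(name),
        headers={"User-Agent": "authority-sketch/0.1 (example)"},
    )
    with urlopen(request) as reply:
        return summarize_candidates(json.load(reply))


# Hand-written sample in the shape the API returns; this is not real data.
sample_response = {
    "search": [
        {
            "id": "Q0000001",
            "label": "Jane Example",
            "description": "hypothetical interviewee",
        }
    ]
}

for qid, label, description in summarize_candidates(sample_response):
    print(qid, label, description)
```

An empty candidate list would suggest that a new item is needed, although a cataloger should still review near matches and variant spellings before creating one.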

In addition, we are linking the Wikidata items to our project page using a specific property (P5008), which automatically updates our list of items created as part of this project. A NACO cataloger then uses this information to create a new authority record, formulating the correct access point and cross-references so that the record can function in our systems while linking back to the fuller Wikidata entry instead of repeating all of its information. The new NACO identifier, in turn, is added to the Wikidata item.

Figure 7: Wikidata items created by CUL catalogers during the PCC Wikidata Pilot
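
A list like this can be retrieved from the Wikidata Query Service. The sketch below is an illustration rather than the query behind our project page. It builds a SPARQL query for items whose P5008 value is a given project item and optionally returns the Library of Congress authority ID (P244), so that items still waiting for a NACO record stand out. The project QID is a parameter because it is not given here; `Q0` in the usage line is a placeholder, and the function names are hypothetical.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def focus_list_query(project_qid):
    """Return SPARQL listing items on a project's focus list (P5008)."""
    if not (project_qid.startswith("Q") and project_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {project_qid!r}")
    return f"""
SELECT ?item ?itemLabel ?lcnaf WHERE {{
  ?item wdt:P5008 wd:{project_qid} .
  OPTIONAL {{ ?item wdt:P244 ?lcnaf . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


def query_url(project_qid):
    """Build a GET URL for the query service that asks for JSON results."""
    params = {"query": focus_list_query(project_qid), "format": "json"}
    return f"{SPARQL_ENDPOINT}?{urlencode(params)}"


# "Q0" is a placeholder; substitute the QID of the actual project item.
print(focus_list_query("Q0"))
```

Rows with an empty `?lcnaf` column would be items that do not yet carry an LC/NAF identifier.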

While Wikidata allows us to record all kinds of detailed information about an agent, we wanted to be careful about what should be included. First, the basic idea was to reduce the time that authority work takes, so we needed to ensure that each entry did not turn into a research project of its own. Other PCC project participants had also reported that, because of the large number of Wikidata contributors, new items are enhanced by others fairly quickly. Second, we wanted to pay attention to the privacy concerns of the individuals described. We followed the Wikidata guidelines for the description of living people and created a set of core elements based on them.

Figure 8: Draft of core set of elements used by CUL for the PCC Wikidata Pilot
Figure 9: CUL mapping of Wikidata properties to MARC-21 authority format
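
To show what a property-to-field mapping of this kind can look like in practice, here is a small sketch. It does not reproduce the CUL mapping in Figure 9. The four property pairings below are assumptions chosen for illustration from commonly used Wikidata properties and MARC 21 authority fields, the 024 line with `$2 wikidata` is likewise an assumption about how the item identifier might be recorded, and the function name and sample values are hypothetical.

```python
# Illustrative only: Wikidata property -> (MARC authority tag, subfield code).
ILLUSTRATIVE_MAP = {
    "P569": ("046", "f"),  # date of birth  -> 046 $f
    "P570": ("046", "g"),  # date of death  -> 046 $g
    "P19": ("370", "a"),   # place of birth -> 370 $a
    "P106": ("374", "a"),  # occupation     -> 374 $a
}


def to_marc_fields(qid, claims):
    """Turn simple {property: value} claims into MARC-like field strings.

    Properties outside the mapping are skipped, which mirrors the idea of
    recording only a small core set of elements.
    """
    # Record the Wikidata item ID itself as an identifier field.
    fields = {"024": [f"$a {qid}", "$2 wikidata"]}
    for prop, value in claims.items():
        if prop not in ILLUSTRATIVE_MAP:
            continue
        tag, code = ILLUSTRATIVE_MAP[prop]
        fields.setdefault(tag, []).append(f"${code} {value}")
    # Emit one line per tag, in tag order.
    return [f"{tag} {' '.join(subfields)}" for tag, subfields in sorted(fields.items())]


# Made-up values for a hypothetical item.
for line in to_marc_fields("Q0000001", {"P569": "1920", "P106": "Journalists"}):
    print(line)
```

Keeping the mapping in a single table makes it easy to review against privacy guidelines: removing a property from the table removes it from every record the sketch would produce.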

Future Directions

Since Wikidata also surfaces information to a broader audience, we can use it to highlight our collections and the agents from underrepresented groups connected to them. David Olson, CUL’s Oral History Archivist, has pointed to a group of African-American newspapers that are featured in the Black Journalist Oral History Collection and that should be represented both in the NACO file and on Wikidata. Alex Whelan has started the groundwork of identifying both the NACO practices for newspapers and the set of Wikidata properties needed to create more detailed descriptions there.

This workflow has proven to be easy to implement and opens the door to other projects where NACO catalogers could collaborate with archivists, curators, or graduate students, thereby making our work more inclusive. We are looking forward to all the possibilities.
