
hornc
Collaborator

@hornc hornc commented Sep 12, 2025

closes #7724

Closes #

Technical

Testing

Screenshot

Stakeholders

@Copilot Copilot AI review requested due to automatic review settings September 12, 2025 04:25
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements reading Library of Congress Name Authority File (LC NAF) control numbers from MARC records. It extracts authority control identifiers from subfield 0 in author fields and stores them as remote IDs to enable better author matching and deduplication.

  • Adds parsing of LC NAF identifiers from MARC subfield 0
  • Updates author records to include remote_ids field with lc_naf values
  • Includes comprehensive test coverage for the new functionality
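
As a rough illustration of the extraction described above, here is a minimal sketch. It is not the code from `openlibrary/catalog/marc/parse.py`: the regex, the function names `extract_lc_naf` and `build_author`, and the assumption that subfield 0 carries an `id.loc.gov` name-authority URI are all hypothetical, and the identifier in the usage example is a made-up placeholder. Only the `remote_ids` / `lc_naf` key names come from this PR.

```python
import re

# Hypothetical pattern (an assumption, not the PR's regex): capture the
# identifier at the end of an id.loc.gov name-authority URI.
LC_NAF_URI = re.compile(r"id\.loc\.gov/authorities/names/(n[a-z]?\d+)")


def extract_lc_naf(subfield_0_values):
    """Return the first LC NAF identifier found in the subfield 0 values, or None."""
    for value in subfield_0_values:
        match = LC_NAF_URI.search(value)
        if match:
            return match.group(1)
    return None


def build_author(name, subfield_0_values):
    """Build an author dict, adding remote_ids only when an LC NAF ID is present."""
    author = {"name": name}
    lc_naf = extract_lc_naf(subfield_0_values)
    if lc_naf:
        author["remote_ids"] = {"lc_naf": lc_naf}
    return author


# Usage with a placeholder identifier (not a real authority record):
author = build_author(
    "Example, Author",
    ["http://id.loc.gov/authorities/names/n00000001"],
)
print(author)
```

In this sketch, authors without a recognizable identifier simply get no `remote_ids` key, so existing expected outputs for records lacking subfield 0 would stay unchanged.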

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| openlibrary/catalog/marc/parse.py | Adds regex pattern and parsing logic to extract LC NAF IDs from author fields |
| openlibrary/catalog/marc/tests/test_parse.py | Adds test case to verify LC NAF ID extraction functionality |
| openlibrary/catalog/marc/tests/test_data/bin_expect/880_arabic_french_many_linkages.json | Updates expected test output to include remote_ids for authors |
| openlibrary/catalog/marc/tests/test_data/bin_expect/880_Nihon_no_chasho.json | Updates expected test output to include remote_ids for authors |

@hornc hornc changed the title from "read LC author name authority control from MARC" to "read Library of Congress author name authority control from MARC" Sep 12, 2025
@mekarpeles mekarpeles merged commit 1547af5 into internetarchive:master Sep 12, 2025
5 checks passed
Development

Successfully merging this pull request may close these issues.

Add author authority control metadata from MARC

2 participants