Labels
- help wanted (Open to participation from the community)
- ✨ goal: improvement (Improvement to an existing feature)
- 🏁 status: ready for work (Ready for work)
- 💻 aspect: code (Concerns the software code in the repository)
- 🟩 priority: low (Low priority and doesn't need to be rushed)
Problem
The project currently has GitHub and GCS as automated data sources, but not Openverse. Openverse provides a large collection of openly licensed media, and adding it would increase the breadth and depth of this data observatory.
Description
Openverse aggregates data from several other repositories of openly licensed media, such as Flickr. It provides:
- quantity of records: millions
- types of metadata available: `source`, `license`, `license_version`, `media_type`, etc.
- API documentation links: https://api.openverse.engineering/v1/, https://api.openverse.org/v1/images/
- API requirements and limitations:
  - no API key required
  - rate limits are low but sufficient for batch collection
  - supports filtering by `license` and `media_type`
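As a rough illustration of the filtering described above, here is a minimal sketch of how a fetch script might build a license-filtered query and read a total from the response. The helper names `build_query_url` and `extract_count` are hypothetical. The `page_size` parameter and the `result_count` response field are assumptions about the API that should be checked against the documentation linked above, as is selecting the media type through the `images/` path.

```python
from urllib.parse import urlencode

# Base URL taken from the documentation link in this issue.
# Assumption: the media type is selected by the path segment ("images/").
BASE_URL = "https://api.openverse.org/v1/images/"


def build_query_url(license_code, page_size=1):
    """Build a search URL filtered by license (hypothetical helper).

    Assumption: the API accepts `license` and `page_size` query parameters.
    """
    params = {"license": license_code, "page_size": page_size}
    return f"{BASE_URL}?{urlencode(params)}"


def extract_count(payload):
    """Read the total number of matching records from a decoded response.

    Assumption: the JSON response carries a `result_count` field.
    Returns 0 when the field is missing.
    """
    return int(payload.get("result_count", 0))
```

Keeping URL construction and response parsing in small pure functions means both can be tested without touching the network, which matters given the low rate limits.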
Alternatives
- Work on another source
Additional context
- Still understanding the project and solving this issue with one simple PR at a time
- Openverse is compatible with the project structure for tracking CC Legal tools usage
Implementation
- I will be implementing this feature
- Focus on a single non-monolithic script: `scripts/1-fetch/openverse_fetch.py`
- Design the script to run from the repository via `pipenv`
- Include `--enable-save` and `--enable-git` for consistent behavior with other scripts
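The two flags listed above could be wired up roughly as follows. This is a sketch using the standard library's `argparse`, not the project's actual argument handling. The function name `parse_arguments`, the help texts, and the rule that `--enable-git` requires `--enable-save` are assumptions made for illustration.

```python
import argparse


def parse_arguments(argv=None):
    """Parse command-line flags for the fetch script (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Fetch Openverse data (sketch)"
    )
    parser.add_argument(
        "--enable-save",
        action="store_true",
        help="Enable saving results (default: disabled)",
    )
    parser.add_argument(
        "--enable-git",
        action="store_true",
        help="Enable git actions such as commit and push (default: disabled)",
    )
    args = parser.parse_args(argv)
    # Assumption: git actions are only meaningful when results are saved.
    if args.enable_git and not args.enable_save:
        parser.error("--enable-git requires --enable-save")
    return args
```

With both flags off by default, a plain run under `pipenv` would neither write files nor touch git unless explicitly asked to.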
TimidRobot
Status
Backlog