Add ISO-639-1 language code validation to backend #2602

Merged
tw4l merged 5 commits into main from issue-2599-backend-lang-validation
May 13, 2025

Conversation

tw4l
Member

@tw4l tw4l commented May 12, 2025

Fixes #2599

Also adds the new APIErrorDetail code to the workflow editor in the frontend.

@tw4l tw4l requested review from ikreymer and SuaYoo May 12, 2025 16:50
@tw4l tw4l marked this pull request as ready for review May 12, 2025 16:55
@ikreymer
Member

Should we add a migration that replaces any invalid lang codes with en?

Member

@ikreymer ikreymer left a comment

Looks good! Just need to add migration to validate/fix existing data.

@tw4l
Member Author

tw4l commented May 12, 2025

Added a migration, but it still needs some testing. It should also be merged after #2601 due to migration order.

@tw4l tw4l requested a review from ikreymer May 13, 2025 15:11
@tw4l tw4l force-pushed the issue-2599-backend-lang-validation branch from 3960e90 to 3b6d013 Compare May 13, 2025 18:12
@tw4l
Member Author

tw4l commented May 13, 2025

@ikreymer Reworked the migration and tested it locally. It now includes crawls as well :) I didn't find a library that would give us a good ISO-639-1 list in a convenient format, so I hardcoded it. I think that should be okay, since changes are infrequent and almost never affect the most commonly used languages.
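
For readers following along, a minimal sketch of what validation against a hardcoded list could look like. This is not the code from this PR: the names `ISO_639_1_SUBSET`, `is_valid_lang`, and `validate_lang` are hypothetical, the set is a deliberately abbreviated subset of the standard, and exact-match (case-sensitive) comparison is an assumption.

```python
# Hypothetical sketch, not the PR's implementation.
# Abbreviated subset of ISO 639-1 two-letter codes; the full standard
# defines many more entries than are listed here.
ISO_639_1_SUBSET = frozenset({"en", "fr", "de", "es", "ja", "zh", "pt", "ar"})


def is_valid_lang(code: str) -> bool:
    """Return True if `code` exactly matches an entry in the hardcoded set."""
    return code in ISO_639_1_SUBSET


def validate_lang(code: str) -> str:
    """Return `code` unchanged if valid, otherwise raise ValueError."""
    if not is_valid_lang(code):
        raise ValueError(f"invalid language code: {code!r}")
    return code


if __name__ == "__main__":
    # "f" and "e" are the invalid values used in the test requests below.
    for candidate in ("en", "f", "e"):
        print(candidate, is_valid_lang(candidate))
```

A frozen set keeps lookups constant-time and makes the list immutable at runtime, which suits data that changes as rarely as this.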

For testing, I used the API off of main (without the backend validation added in this branch) to create some invalid data, e.g.:

curl -X POST -H "Authorization: Bearer TOKEN" -H "Content-type: application/json" http://localhost:30870/api/orgs/OID/crawlconfigs/ --data "{\"runNow\": true, \"name\": \"lang test\", \"config\": {\"seeds\": [{\"url\": \"https://webrecorder.net/\"}], \"limit\": 3, \"lang\": \"f\"}}"

curl -X POST -H "Authorization: Bearer TOKEN" -H "Content-type: application/json" http://localhost:30870/api/orgs/OID/defaults/crawling --data "{\"lang\": \"e\"}"

I then switched to this branch and ran the migration. I verified in mongo that the invalid language codes were present on the workflow, crawl, and org before the migration, and that they were fixed afterward.
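
The fix-up step described above can be sketched on plain dictionaries, without a database. This is an illustration under stated assumptions, not the migration from this PR: `fix_lang`, `VALID_LANGS`, and `DEFAULT_LANG` are hypothetical names, the fallback to `en` follows the suggestion made earlier in the review rather than confirmed behavior, and the `crawlingDefaults` field name for the org document is a guess. Only the `config.lang` shape comes from the request body shown above.

```python
# Hypothetical sketch of the migration's core logic on in-memory documents.
VALID_LANGS = frozenset({"en", "fr", "de", "es"})  # abbreviated subset
DEFAULT_LANG = "en"  # assumption: fallback suggested in the review


def fix_lang(doc: dict, path: tuple) -> bool:
    """Replace an invalid language code at a nested `path` with DEFAULT_LANG.

    Returns True if the document was changed, False if the field is
    missing or already holds a valid code.
    """
    parent = doc
    for key in path[:-1]:
        parent = parent.get(key)
        if not isinstance(parent, dict):
            return False  # intermediate key missing, nothing to fix
    lang = parent.get(path[-1])
    if lang is None or lang in VALID_LANGS:
        return False
    parent[path[-1]] = DEFAULT_LANG
    return True


if __name__ == "__main__":
    workflow = {"name": "lang test", "config": {"lang": "f"}}
    org = {"crawlingDefaults": {"lang": "e"}}  # field name is an assumption
    print(fix_lang(workflow, ("config", "lang")), workflow)
    print(fix_lang(org, ("crawlingDefaults", "lang")), org)
```

Returning a changed/unchanged flag makes it easy to issue a database write only for documents that actually need one.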

@tw4l tw4l merged commit 1492397 into main May 13, 2025
27 checks passed
@tw4l tw4l deleted the issue-2599-backend-lang-validation branch May 13, 2025 20:54
Development

Successfully merging this pull request may close these issues.

Add backend validation of language codes
3 participants