Comparing changes
base repository: webrecorder/browsertrix-crawler
base: v1.6.4
head repository: webrecorder/browsertrix-crawler
compare: v1.7.0
- 16 commits
- 21 files changed
- 4 contributors
Commits on Jul 1, 2025
-
base: bump to brave 1.80.113 (#857)
version: bump to 1.7.0-beta.0
tests: update deprecated command to work with latest minio
Commit eb374fa
-
Add option to save local/sessionStorage (#856)
If --saveStorage is set, localStorage and sessionStorage will be serialized with the WARC record for the page. If a page redirects, track what the current page URL is and save storage as part of the page's WARC record. Fixes #855
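As a rough illustration of what serializing localStorage and sessionStorage for a page could look like, here is a minimal sketch. It is not the crawler's actual implementation: the `StorageLike` interface, the helper names, and the JSON field names (`url`, `local`, `session`) are all assumptions made for this example.

```typescript
// Minimal structural type covering the parts of the DOM Storage API used here.
interface StorageLike {
  readonly length: number;
  key(index: number): string | null;
  getItem(key: string): string | null;
}

// Copy every key/value pair of a Storage-like object into a plain object.
function dumpStorage(storage: StorageLike): Record<string, string> {
  const out: Record<string, string> = {};
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    if (key === null) continue;
    const value = storage.getItem(key);
    if (value !== null) out[key] = value;
  }
  return out;
}

// Hypothetical payload shape: the field names are assumptions for this sketch.
// `url` should be the current page URL, i.e. the final URL after any redirect.
function serializeStorage(
  url: string,
  local: StorageLike,
  session: StorageLike,
): string {
  return JSON.stringify({
    url,
    local: dumpStorage(local),
    session: dumpStorage(session),
  });
}

// Tiny in-memory stand-in for window.localStorage / window.sessionStorage.
function fakeStorage(pairs: [string, string][]): StorageLike {
  const map = new Map<string, string>(pairs);
  const keys = Array.from(map.keys());
  return {
    length: keys.length,
    key: (i) => keys[i] ?? null,
    getItem: (k) => map.get(k) ?? null,
  };
}
```

In a browser context the two `StorageLike` arguments would be `window.localStorage` and `window.sessionStorage`; the in-memory stand-in only exists so the sketch can run outside a page.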
Commit 687f08b
Commits on Jul 3, 2025
-
Support downloading seed file from URL (#852)
Fixes #841
Crawler work toward long URL lists in Browsertrix. This PR moves seed handling from the arg parser's validation step to the crawler's bootstrap step so that the seed file can be fetched asynchronously from a URL.
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
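The reason seed handling has to move out of a synchronous validation step can be sketched as follows. This is only an illustration: the function names and the injected `fetchText` parameter are hypothetical, and the one-URL-per-line seed file format is an assumption of this sketch.

```typescript
// Split a seed file body into seed URLs: one per line, blank lines ignored.
function parseSeedList(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Load seeds during an async bootstrap step. `fetchText` is injected so the
// sketch does not depend on any particular HTTP client (an assumption made
// for illustration); it should resolve to the body of the seed file.
async function loadSeedsFromUrl(
  seedFileUrl: string,
  fetchText: (url: string) => Promise<string>,
): Promise<string[]> {
  const body = await fetchText(seedFileUrl);
  return parseSeedList(body);
}
```

Because `loadSeedsFromUrl` returns a promise, it can only be awaited from an async code path, which a purely synchronous argument validator does not offer.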
Commit 2af94ff
Commits on Jul 4, 2025
-
Use consistent profile directory name (merge 1.6.4 change) (#859)
- Use `TMPDIR/btrixProfile` as consistent profile directory name
- Avoid accumulation of temp profile dirs if the crawler is restarted multiple times, e.g. if the tmp dir is mapped to /crawls (as it is in Browsertrix now), this prevents a proliferation of /crawls/tmp/profile-* dirs for each crawler restart
- Change released in 1.6.4, merging into main
Commit c84f58f
Commits on Jul 8, 2025
-
async fetch: allow retrying async fetch if interrupted (#863)
- retry if 'truncated' set, or if size mismatch, or other exception occurs
- retry only for network load and async fetch, not for response fetch
- set max retries to 2 (same as default for pages currently)
- fixes #831
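The retry conditions listed above can be sketched roughly like this. The `FetchOutcome` shape, the `-1` convention for an unknown expected size, and the function names are assumptions for illustration only; the limit of 2 retries is the one figure taken from the commit message.

```typescript
// Hypothetical summary of one fetch attempt (field names are assumptions).
interface FetchOutcome {
  truncated: boolean; // body was cut off mid-transfer
  expectedSize: number; // declared size in bytes, or -1 if unknown
  actualSize: number; // bytes actually received
}

// Max retries of 2, as stated in the commit message.
const MAX_RETRIES = 2;

// An attempt counts as incomplete if it was truncated or the sizes disagree.
function isIncomplete(outcome: FetchOutcome): boolean {
  const sizeMismatch =
    outcome.expectedSize >= 0 && outcome.expectedSize !== outcome.actualSize;
  return outcome.truncated || sizeMismatch;
}

// Run `attempt` once, then retry up to MAX_RETRIES more times if the result
// is incomplete or the attempt throws. Rethrows the last error on exhaustion.
async function fetchWithRetry(
  attempt: () => Promise<FetchOutcome>,
): Promise<FetchOutcome> {
  let lastError: unknown = new Error("no attempts made");
  for (let tries = 0; tries <= MAX_RETRIES; tries++) {
    try {
      const outcome = await attempt();
      if (!isIncomplete(outcome)) return outcome;
      lastError = new Error("incomplete response");
    } catch (e) {
      lastError = e;
    }
  }
  throw lastError;
}
```

With `MAX_RETRIES = 2` the loop makes at most three attempts in total: the initial one plus two retries.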
Commit 6244515
-
Support option to fail crawl on content check (#861)
- add --failOnContentCheck for quick fail if content check in behavior fails
- expose __bx_contentCheckFailed to cause an immediate failure from behavior
- only allow failing crawl due to content check from within awaitPageLoad() callback
- set a 'failReason' key to track that crawl failed due to a particular content check reason
- deps: update to browsertrix-behaviors 0.9.0, update to wabac.js (2.23.6)
- fixes #860
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Commit 549d655
Commits on Jul 21, 2025
-
Fix docs mistaking --waitUntil with --pageLoadTimeout (#864)
Fixes #853
Corrects a documentation inaccuracy pointed out by a user.
Commit acae515
Commits on Jul 23, 2025
-
- bump brave to 1.80.122
- bump wabac.js to 2.23.8
- bump RWP to 2.3.15
- bump browsertrix-behaviors to 0.9.1
Commit 96fd229
-
url queueing: log skipped URLs as errors if depth === 0 (#868)
- will ensure seeds from URL list are reported as errors if skipped
- also set logging context to 'scope' instead of 'links'
- fixes #866
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
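A minimal sketch of the rule described above, where a skipped URL at depth 0 (a seed) is logged as an error under the 'scope' context. The entry shape and function name are hypothetical, and the "debug" level used for non-seed URLs is an assumption of this sketch, since the commit message only specifies the depth 0 case.

```typescript
type LogLevel = "error" | "debug";

// Hypothetical log entry shape for a skipped URL.
interface SkipLogEntry {
  level: LogLevel;
  context: "scope";
  url: string;
  depth: number;
}

// Seeds have depth 0, so a skipped seed is surfaced as an error.
// The "debug" level for deeper links is an assumption for illustration.
function skippedUrlLogEntry(url: string, depth: number): SkipLogEntry {
  return {
    level: depth === 0 ? "error" : "debug",
    context: "scope",
    url,
    depth,
  };
}
```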
Commit 1a4341b
-
Add documentation for --failOnContentCheck and update CLI options i…
Commit 66402c2
Commits on Jul 25, 2025
-
Capitalization fix for log messages (#870)
Capitalizes "URL" in log messages.
Commit bc4d649
Commits on Jul 29, 2025
-
quickfix: WACZ upload retry support: (#871)
- if an upload fails and the crawler restarts on error, exit with 'interrupt' to allow for automatic restart (e.g. in the Browsertrix app)
- otherwise, a failed upload will exit the crawl with no WACZ, resulting in overall crawl failure
Commit 0652a3f
-
Don't trim to limit if limit is default of 0 (#873)
Fixes #872
Fix for restarting crawl from saved state, where the default `--limit` value of 0 was incorrectly preventing any URLs from being re-queued.
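The class of bug described here can be illustrated with a small sketch in which a limit of 0 means "no limit" and must not be used as a slice length. The function name and signature are hypothetical and are not taken from the crawler's code; treating negative values like 0 is also an assumption of this sketch.

```typescript
// Trim a list of queued URLs to at most `limit` entries.
// A limit of 0 (the default) means "no limit", so nothing is trimmed.
function trimToLimit<T>(queued: T[], limit: number): T[] {
  if (limit <= 0) {
    return queued.slice(); // unlimited: keep everything
  }
  return queued.slice(0, limit);
}
```

Without the guard, `queued.slice(0, 0)` would return an empty list, which matches the symptom described: no URLs re-queued when `--limit` is left at its default.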
Commit aba065c
Commits on Jul 30, 2025
-
behavior logging: remove last line dupe check for behavior logs (#874)
Shouldn't skip multiple log messages, as this is unexpected behavior for user-defined behaviors.
Commit 18fe5a9
Commits on Jul 31, 2025
-
Commit 5c7ff3d
-
Commit a6ad6a0
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff v1.6.4...v1.7.0