Comparing changes

base repository: webrecorder/browsertrix-crawler
base: v1.6.1
head repository: webrecorder/browsertrix-crawler
compare: v1.6.2
  • 7 commits
  • 12 files changed
  • 2 contributors

Commits on May 12, 2025

  1. lang code fixes: (#834)

    - validate --lang values, fail immediately with invalid ISO 639-1
    language code
    - ignore --lang value when using profile, print warning that profile
    language takes precedence
    - fixes #833
    ikreymer authored May 12, 2025
    71de8d6
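
The --lang handling described in #834 could look roughly like the sketch below. It is an illustration only, not the crawler's code: `resolveLang`, `profileInUse`, `warn`, and the shortened code list are all hypothetical names, and the order of the two checks is an assumption.

```typescript
// Hypothetical subset of ISO 639-1 codes; a real check would use the full list.
const ISO_639_1_SAMPLE = new Set(["de", "en", "es", "fr", "ja", "pt", "zh"]);

// Returns the language to apply, or undefined when nothing should be set.
// Throws right away on a code that is not in the list (fail-fast validation).
// All parameter names here are illustrative, not the crawler's real API.
function resolveLang(
  lang: string | undefined,
  profileInUse: boolean,
  warn: (msg: string) => void = console.warn,
): string | undefined {
  if (lang === undefined) {
    return undefined;
  }
  if (!ISO_639_1_SAMPLE.has(lang)) {
    throw new Error(`Invalid ISO 639-1 language code: ${lang}`);
  }
  if (profileInUse) {
    // The profile already carries a language, so --lang is dropped with a warning.
    warn("--lang is ignored because the profile language takes precedence");
    return undefined;
  }
  return lang;
}
```

Failing at argument-parsing time keeps a mistyped code from surfacing only after the browser has launched.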

Commits on May 20, 2025

  1. Add WARC-Protocol header (#715)

    - add WARC-Protocol repeated header(s) for HTTP, TLS as per iipc/warc-specifications#42
    - also set HTTP/1.0 on WARC record if actually http/1.0, otherwise keep HTTP/1.1
    
    ---------
    Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
    ikreymer and tw4l authored May 20, 2025
    e72b343
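
As a rough sketch of the idea in #715: one WARC-Protocol header is emitted per protocol observed for the response, and the HTTP version written into the record is downgraded to HTTP/1.0 only when the exchange really was HTTP/1.0. The function names, the input shapes (`"h2"`, `"TLS 1.3"`), and the exact header value format (`tls/1.3`) are assumptions made for illustration; iipc/warc-specifications#42 is the authority on the values.

```typescript
// Hypothetical description of what the browser reported for one response.
interface ProtocolInfo {
  httpProtocol: string; // assumed form: "http/1.0", "http/1.1", "h2", ...
  tlsProtocol?: string; // assumed form: "TLS 1.2", "TLS 1.3"; absent for plain HTTP
}

// Builds the repeated WARC-Protocol header lines, one per protocol.
function warcProtocolHeaders(info: ProtocolInfo): string[] {
  const headers = [`WARC-Protocol: ${info.httpProtocol.toLowerCase()}`];
  if (info.tlsProtocol) {
    // "TLS 1.3" -> "tls/1.3" (assumed value format)
    const tls = info.tlsProtocol.toLowerCase().replace(" ", "/");
    headers.push(`WARC-Protocol: ${tls}`);
  }
  return headers;
}

// Picks the HTTP version for the status line stored in the WARC record:
// HTTP/1.0 only if that was the actual protocol, otherwise HTTP/1.1.
function httpVersionForRecord(httpProtocol: string): string {
  return httpProtocol.toLowerCase() === "http/1.0" ? "HTTP/1.0" : "HTTP/1.1";
}
```

For example, an HTTP/2 response over TLS 1.3 would yield two WARC-Protocol lines under these assumptions, while its stored status line would still read HTTP/1.1.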

Commits on May 28, 2025

  1. tmpdir: use os.tmpdir() instead of hardcoded '/tmp' (#842)

    allows for customizing tmp directory with TMPDIR env var
    ikreymer authored May 28, 2025
    52235ab
  2. Remove hardcoded /tmp prefix from path (#843)

    Fast-follow to #842 to fix a typo
    tw4l authored May 28, 2025
    46a02d1
  3. optimization: normalize dedup status: treat 0 (response code not yet known) or 206 as 200… (#835)
    
    Avoids fetching duplicate content when it is fetched through a different
    code path (e.g. autoplay behavior calling fetch vs. video playing automatically)
    ikreymer authored May 28, 2025
    7bf10f7
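
The dedup normalization in #835 can be pictured with a minimal sketch like the one below. It assumes the dedup check is keyed on URL plus status, which the commit message does not state; `dedupKey`, `shouldFetch`, and the in-memory `seen` set are hypothetical stand-ins for whatever the crawler actually uses.

```typescript
// Treat 0 (status not yet known) and 206 (partial content) as 200, so the same
// resource reached through different code paths maps to one dedup entry.
function normalizeDedupStatus(status: number): number {
  return status === 0 || status === 206 ? 200 : status;
}

// Hypothetical key format: normalized status plus URL.
function dedupKey(url: string, status: number): string {
  return `${normalizeDedupStatus(status)}:${url}`;
}

// Illustrative in-memory store of keys already seen.
const seen = new Set<string>();

// Returns true the first time a (URL, normalized status) pair is seen.
function shouldFetch(url: string, status: number): boolean {
  const key = dedupKey(url, status);
  if (seen.has(key)) {
    return false;
  }
  seen.add(key);
  return true;
}
```

Under these assumptions, a video first seen as a 206 range response and later requested by a behavior before its status is known (0) would count as the same entry, so it is not fetched twice.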

Commits on May 29, 2025

  1. remove early serialization which may result in missing WARC-Protocol and security metadata (#844)
    
    - drop early serialization in handleFetchResponse(), which can result in
    writing the WARC record too early, before the WARC-Protocol and other data
    is available. (It was added previously for requests loaded via browser
    context / service worker that did not get a 'loadingFinished' message, but
    these will now still be closed in awaitPageResources())
    - don't log the 'skipping URL from unknown frame' warning: it is often
    spurious, because the frame can be added in a subsequent message and the
    response is *not* skipped.
    ikreymer authored May 29, 2025
    178b10a

Commits on Jun 3, 2025

  1. deps: bump brave 1.79.118 (#845)

    bump version to 1.6.2
    ikreymer authored Jun 3, 2025
    a5936b5