-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Refactor buildpack data store and bin/report
#1878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The Node.js buildpack had a set of Bash helper utils for managing a key-value store of metadata, that could be added to during the build, and then output when bin/report was run. When the Python buildpack added support for bin/report, its implementation was based on the Node.js one, with some further small improvements. However, there remains a number of issues with the design: 1. It handles different data types (string vs int vs float vs bool) at `bin/report` time, so requires a hardcoded mapping of values which can (and does) get out of sync. (Plus doesn't work well for some cases, like jvm-common being wrapped by the other JVM related buildpacks.) 2. It tries to escape newlines and carriage returns, but there are a number of edge cases still not handled properly which result in some fields not being sent to Honeycomb properly. In order to work around the above, Manuel created a jq based approach for use by the JVM buildpacks: https://github.com/heroku/heroku-buildpack-jvm-common/blob/4fd6e4f8f258a5766a8b2e9ffec2690675b00879/lib/metrics.sh However, that doesn't have the full API of the existing solution (it's missing the ability to fetch previous values; something which Python needs), and I spotted a few more places where things can be optimised. In addition, now that we're using jq and JSON internally in the buildpack, it makes sense for us to update cytokine to support the build report being emitted as JSON too (in addition to YAML), particularly since cytokine then submits the report to Vacuole as JSON afterwards. And in fact, since JSON is roughly a subset of YAML (at least for the simple data types and schema we're using for the build report), we can start emitting the build report as JSON now, rather than needing to wait for the cytokine change to land first. (We'll still want to update cytokine to use a JSON parser though, just to avoid any edge cases in the future). The new implementation: - Sets the data type at the call site where the metadata is set, rather than relying on a hardcoded mapping at time of retrieval. - Uses jq to do the string escaping, so much less fragile than the hand-rolled implementation. - Uses a simplified initialisation/setup API, which avoids the need to call both `init` and `setup`. - Has been rewritten to avoid unnecessary processes calls, redirections, and patterns that required disabling a number of shellcheck warnings. I've confirmed that metrics still appear in Honeycomb with the changes here. After this merges, I'll open PRs against some of the other language buildpacks that don't yet implement `bin/report`, using the improved implementation from here (with simplifications since other buildpacks won't need the legacy data store handling that Python needs to support caches written by older buildpack versions). GUS-W-19344754.
f3cb879
to
2c81271
Compare
schneems
approved these changes
Aug 21, 2025
Co-authored-by: Richard Schneeman <richard.schneeman+foo@gmail.com> Signed-off-by: Ed Morley <501702+edmorley@users.noreply.github.com>
dzuelke
approved these changes
Aug 21, 2025
I've also updated the PR to:
|
bin/report
bin/report
Merged
schneems
added a commit
to heroku/heroku-buildpack-ruby
that referenced
this pull request
Aug 21, 2025
The python buildpack was modified to use JSON instead of YAML as it's easier to use `jq` to format and handle issues such as escaping. This preserves the same API, but changes the format and adopts the strategy from heroku/heroku-buildpack-python#1878.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The Node.js buildpack has a set of Bash helper utils for managing a key-value store of build data/metadata, that could be added to during the build, and then output when
bin/report
was run. When the Python buildpack added support forbin/report
, its implementation was based on the Node.js one, with some further small improvements.However, there remains a number of issues with the design:
bin/report
time, so requires a hardcoded mapping of values which can (and does) get out of sync. (Plus doesn't work well for some cases, like jvm-common being wrapped by the other JVM related buildpacks.)In order to work around the above, Manuel created a jq based approach for use by the JVM buildpacks:
https://github.com/heroku/heroku-buildpack-jvm-common/blob/4fd6e4f8f258a5766a8b2e9ffec2690675b00879/lib/metrics.sh
However, that doesn't have the full API of the existing solution (it's missing the ability to fetch previous values; something which Python needs), and I spotted a few more places where things can be optimised.
In addition, now that we're using jq and JSON internally in the buildpack, it makes sense for us to update cytokine to support the build report being emitted as JSON too (in addition to YAML), particularly since cytokine then submits the report to Vacuole as JSON afterwards.
And in fact, since JSON is roughly a subset of YAML (at least for the simple data types and schema we're using for the build report), we can start emitting the build report as JSON now, rather than needing to wait for the cytokine change to land first. (We'll still want to update cytokine to use a JSON parser though, just to avoid any edge cases in the future).
The new implementation:
I've confirmed that metrics still appear in Honeycomb with the changes here.
After this merges, I'll open PRs against some of the other language buildpacks that don't yet implement
bin/report
, using the improved implementation from here (with simplifications since other buildpacks won't need the legacy data store handling that Python needs to support caches written by older buildpack versions). And once Manuel is back, I'll discuss with him backporting some of the improvements to the JVM buildpacks too.GUS-W-19344754.