
Implement a way to be able to debug cache misses that can be always on without too much overhead #18643

Open
lberki opened this issue Jun 12, 2023 · 20 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request

Comments

@lberki
Contributor

lberki commented Jun 12, 2023

Description of the feature request:

Sometimes, it's useful to be able to dig into why a particular action was not cached by remote execution.

The usual culprits are a changed input file, different input files, or a different command line. Bazel provides ample tools to query what it would run (aquery) and what it did (--execution_log_*_file or the BEP), but none of these is amenable to post-facto debugging:

  • aquery requires extra conscious effort
  • The execution logs are too large to be always uploaded
  • The BEP does not contain enough information

So it would be nice to be able to do some sort of logging to answer questions like "why did the build done two days ago not reuse the cached actions from that other build done 30 minutes before it?"

This doesn't necessarily need to be done within Bazel (although that's the particular hammer I have); I could also imagine logging on the remote execution side, where it would be recorded which actions are executed by which Bazel invocations.

What underlying problem are you trying to solve with this feature?

Being able to tell why a particular action was not cached so that future cache hit rates are higher.

Which operating system are you running Bazel on?

No response

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@lberki lberki added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Jun 12, 2023
@coeuvre coeuvre added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Jun 13, 2023
@tbaing
Contributor

tbaing commented Jun 13, 2023

Our exec logs turn out to be ~120GB (and ~150GB once run through the parser), so we'd appreciate anything that can significantly reduce the amount of output while still letting us answer questions about which actions needed a rebuild (rather than being reusable from cache) and which dependencies drove that need.

@lberki
Contributor Author

lberki commented Jun 14, 2023

cc @ramymedhat

@meisterT
Member

meisterT commented Jun 20, 2023

OK, I looked into this a bit and found some issues and some avenues.

Reducing the size

There are a couple of things we could do to reduce the size of the output:

  • use compression; zstd has proven most useful a couple of times, both because of its low CPU cost and its good compression ratio
  • use an indexed format for artifacts instead of listing each artifact in full wherever it appears as an input or output
  • instead of listing inputs as a flat list, include the nested set structure, again indexed; this is non-trivial and would incur additional CPU cost, so I would like to avoid it
  • a leaner proto with fewer fields populated (removing command lines, listed outputs, textual status, progress message)

I tried zstd compression outside of Bazel (but this can easily be done within Bazel as well if we decide this is the way to go) and also did some dirty hacks to understand the benefits of artifact indexing and a leaner proto.

large build:

what                                 without compression   zstd compressed
before                               99GB                  20GB
artifact indexing                    5.1GB                 610MB
artifact indexing plus lean fields   2.3GB                 450MB

medium build:

what                                 without compression   zstd compressed
before                               1.3GB                 65MB
artifact indexing                    217MB                 14MB
artifact indexing plus lean fields   47MB                  9MB

The path indexing contains the exact same information as before, but needs to be reconstructed before comparison. The lean format does leave out information, but most of it is probably not relevant; one exception is command lines, where I would store one high-level hash. You can still see that the command line has changed, and you can reconstruct it, e.g. using aquery. A tool could do this, see more below.
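
To make the artifact indexing idea concrete, here is a minimal sketch (in Python, purely illustrative and not Bazel's actual implementation; the zstd step assumes the python-zstandard package): each artifact is written once into a shared table and spawns refer to it by a small integer index, so a path that appears as an input to thousands of actions is stored only once.

    import json
    import zstandard  # pip package "zstandard"; an assumption, not part of Bazel

    artifact_index = {}  # (path, digest) -> integer id
    artifact_table = []  # id -> {"path": ..., "digest": ...}

    def intern_artifact(path, digest):
        key = (path, digest)
        if key not in artifact_index:
            artifact_index[key] = len(artifact_table)
            artifact_table.append({"path": path, "digest": digest})
        return artifact_index[key]

    def encode_spawn(mnemonic, inputs, outputs):
        # inputs/outputs are lists of (path, digest) pairs; only indices are stored per spawn
        return {
            "mnemonic": mnemonic,
            "inputs": [intern_artifact(p, d) for p, d in inputs],
            "outputs": [intern_artifact(p, d) for p, d in outputs],
        }

    spawns = [encode_spawn("Javac", [("a/Foo.java", "abc123")], [("a/Foo.class", "def456")])]
    log = {"artifacts": artifact_table, "spawns": spawns}
    compressed = zstandard.ZstdCompressor().compress(json.dumps(log).encode())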

Performance overhead

I noticed that action execution was quite a bit slower with the execution log turned on (regardless of which variant). I added more profiling to the JSON trace profile to better understand it (and will send a PR to add this permanently). For the large build from above, it took ~75 minutes with the exec log and ~7 minutes without.

I think we should move to a producer/consumer pattern where we put spawns to be processed into a queue and write the file in a separate thread. We will have to see how much memory overhead this pattern adds.

I am not sure whether that is enough to reduce the overhead to acceptable levels.
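
For illustration, a minimal sketch of the producer/consumer pattern described above (an assumed structure in Python, not Bazel's actual code): execution threads only enqueue an already-serialized record, and a single background thread does the writing, so file I/O stays off the critical path; the bounded queue caps the memory overhead.

    import queue
    import threading

    log_queue = queue.Queue(maxsize=10_000)  # bounded, to cap memory overhead
    _SENTINEL = object()

    def _writer(path):
        with open(path, "ab") as f:
            while True:
                record = log_queue.get()
                if record is _SENTINEL:
                    return
                f.write(record)  # record is already-serialized bytes

    writer = threading.Thread(target=_writer, args=("exec.log",), daemon=True)
    writer.start()

    def on_spawn_executed(serialized_record: bytes):
        # Called from action-execution threads; only blocks if the queue is full.
        log_queue.put(serialized_record)

    def finish_build():
        log_queue.put(_SENTINEL)
        writer.join()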

Different formats

We should use the opportunity to clean up the different logging options anyway. Currently we have:

  • --experimental_execution_log_file: “unordered” spawn protos
  • --execution_log_binary_file: first writes the file from above to /tmp, then parses it again and sorts it into a stable order, which is CPU- and memory-intensive (e.g. see improve usability of --experimental_execution_log_file #6807). In my opinion, this is not something that should live in Bazel and should be removed.
  • --execution_log_json_file: same as --execution_log_binary_file, and should also go away, at least in its current form with sorting. @aherrmann mentions in Documentation issue: Debugging remote cache hits for remote execution #12770 that the JSON format has the advantage of being usable with jq and being human-readable. So we might want to keep the option to output both binary and JSON around in some form.
  • --experimental_execution_log_spawn_metrics: adds spawn metrics to the profile. If we add an option to produce a leaner exec log as mentioned above, we could consider removing this option and always writing the metrics unless the lean log format is requested.

Perhaps it would also make sense to add an option to only write the exec log for specific actions (basically filtering on mnemonics and targets, see also #12075) and to only include high-level information for the other actions? This would help address not only size issues but also performance concerns.

How to compare protos

For large exec logs, a tool to compare them is essential. @coeuvre created https://github.com/coeuvre/bazel-find-non-deterministic-actions - it helps find non-deterministic actions, but is built on top of the JSON output. The tool would need to be changed to support the lean proto format.

Then we could also let the tool spit out other helpful commands like bazel aquery for further debugging.
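
As a rough sketch of what such a tool could do (Python; the field names listedOutputs, inputs, path and digest.hash are assumptions about the JSON-converted log and may not match the schema exactly): key each spawn by its first listed output and report which inputs changed between two builds.

    def index_spawns(spawns):
        # spawns: list of dicts parsed from a JSON-converted execution log
        return {s["listedOutputs"][0]: s for s in spawns if s.get("listedOutputs")}

    def diff_logs(old_spawns, new_spawns):
        old, new = index_spawns(old_spawns), index_spawns(new_spawns)
        for key in sorted(old.keys() & new.keys()):
            old_in = {i["path"]: i.get("digest", {}).get("hash") for i in old[key].get("inputs", [])}
            new_in = {i["path"]: i.get("digest", {}).get("hash") for i in new[key].get("inputs", [])}
            changed = sorted(p for p in old_in.keys() & new_in.keys() if old_in[p] != new_in[p])
            added = sorted(new_in.keys() - old_in.keys())
            removed = sorted(old_in.keys() - new_in.keys())
            if changed or added or removed:
                print(f"{key}: changed={changed} added={added} removed={removed}")

A tool along these lines could then print the corresponding bazel aquery invocation for each flagged spawn, as suggested above.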

Related issues

Next steps

I am aware that I am abusing this feature request a little bit to take a step back and understand the bigger picture. I want to give people some time to voice their opinions here before proceeding, so please let me know what you think.

copybara-service bot pushed a commit that referenced this issue Jun 21, 2023
This might help to understand when it is costly, see also #18643

PiperOrigin-RevId: 542194475
Change-Id: I85932d8a5d7a832ee4675722642deb675f7db81f
@sheldonneuberger-sc

@meisterT thanks for that detailed outline; it aligns with what I see too: massive exec logs (50GB+), and the build is slower when the exec log is on. I'd prefer to have exec logging always on, but this makes it unworkable.

@chancilasnap

We would prefer this data to be visible through BES somehow, be it as a CAS artifact (like the profile) or via special messages. The BES also prints actions (generally on failure only), but it can emit all actions in a slightly different format. It's annoying that the BES actions and exec log spawns don't match, since the exec log seems richer.

I noticed that action execution was quite a bit slower with the execution log turned on (regardless of which variant). I added more profiling to the JSON trace profile to better understand it (and will send a PR to add this permanently). For the large build from above, it took ~75 minutes with the exec log and ~7 minutes without.

I think we should move to a producer/consumer pattern where we put spawns to be processed into a queue and write the file in a separate thread. We will have to see how much memory overhead this pattern adds.

I am not sure whether that is enough to reduce the overhead to acceptable levels.

Are exec logs written in the threads running the actions? It seems like that could have huge contention issues.

@tbaing
Contributor

tbaing commented Sep 1, 2023

@meisterT +1 on the thanks for the detailed analysis.

Reducing the size

For our use case, most of the content is inputs and actual_outputs, so artifact indexing seems most promising for our needs.

Leaner fields could also be useful, but we'd see the most benefit from a focus on the per-input/output fields rather than the per-action ones. Hashing the command line and env vars could save some space, but reconstructing those things may be somewhat annoying (the command line requires you to check out exactly that source code to reconstruct it via aquery) or hard (env vars on a CI bot that's been deleted are going to take some non-trivial work to recreate, probably more than they're worth). For my $0.02 I'd just leave them alone. And I find that many of those one-per-action lines are useful for understanding the context ("did we rebuild this action despite no changed inputs because it's cacheable: true?"), so for always-on usage I'd hesitate to cut those things.

Whereas I'd happily drop the hash_function_name and size_bytes from inputs/outputs. Set the default hash function at the top of the file and only list it per artifact if it doesn't match; I don't care about size, because all I really care about is "did its hash change between invocations?" Maybe have a flag to control whether to cut the per-action content, so people who find it useful can leave it on while cutting the per-artifact lines?

Compression SGTM.

Performance overhead

Anything that reduces the overhead of creating this content is welcome. Having less content to create, and compressing that content, might help if the bottleneck is the writes to disk.

Different formats

Your observation that we could have a single (efficient to create and store) output format that then gets post-processed into other formats (including stable ones) - whether for comparison against another log or just for reading to understand what happened in the current execution - seems right to me.

How to compare protos

I agree that a tool is essential. The one you referenced is a good start, but it would benefit from enhancements such as analyzing which changed inputs are outputs from another action in the current run. An action for which all changed inputs are outputs from other actions is far less interesting when you're trying to understand why you rebuilt something you didn't expect to, so the tool would benefit from the ability to 1) focus on the non-transitive changes, and 2) walk easily down through the transitive changes to find interesting non-transitive changes. The ability to filter in/out branches of that graph would also be helpful; I could imagine saying "I want to see all non-transitive changes, but not the ones that chain up into dependency B, or only the ones that chain up into A and C." I could also imagine that it might be valuable to understand cases where a deeply nested dependency was rebuilt due to changed inputs but produced unmodified output, and so didn't contribute to rebuilds of artifacts higher in the dependency graph, and that the tool might help show that there are multiple distinct subsets of the graph.

I'm not sure I agree that the tool shouldn't work from the JSON content. Having the analysis tool highlight aspects of the content in human-readable formats, and then being able to go find the raw content in human-readable format, seems like a better workflow than having the tool be the only way to interact with the content. The tool should handle all the variations in the data model (no matter how lean/rich the user chooses to capture), but JSON seems like an OK format for the reasons you touched on in your "Different formats" section.

In analyzing execution logs this week, I've found some value in having a file per action, for easy comparison or examination of that particular action, versus having to diff the entire file or open it and search down to the specific part I'm interested in. It probably wouldn't matter with a 5MB file, and it was annoying but manageable with a 1GB file, but it's high-friction with our full 100+GB files. Doing that could also help with the memory usage required for sorting (my experience was that I needed the JVM to have an Xmx around 100GB for a 100GB sorted exec log, due, I believe, to the sort at the end), since sorting would then just be a matter of reading an index to determine how to concatenate the individual files rather than something that has to happen in memory.
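
The "file per action" idea could look roughly like this (Python; it assumes the log has already been converted to a list of JSON spawn objects, and listedOutputs is an assumed field name): one sorted, pretty-printed file per spawn, so two builds can be compared action-by-action with plain diff and no in-memory sort of the whole log is needed.

    import hashlib
    import json
    import os

    def split_per_action(spawns, out_dir):
        os.makedirs(out_dir, exist_ok=True)
        for spawn in spawns:
            # Name each file after a stable key: the first listed output, if any.
            key = (spawn.get("listedOutputs") or ["no-listed-output"])[0]
            name = hashlib.sha256(key.encode()).hexdigest()[:16] + ".json"
            with open(os.path.join(out_dir, name), "w") as f:
                json.dump(spawn, f, indent=2, sort_keys=True)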

@minor-fixes
Contributor

+1 on data available through BES, or streaming through some remote protocol.

There's already a good deal of challenge around getting the bazel client to perform well (in the face of memory issues, startup times, dependency fetching, etc.) - only offering this info via client-generated local files could very well mean that users need to supply:

  • mechanisms to unconditionally dump these files ("always on" in the FR) without filling up local disk, etc.
  • mechanisms to exfiltrate these files from automated build machines, as well as user machines
  • tuning around automated bazel runs (if running this feature unconditionally incurs a CPU/build time penalty), which are already challenging (see above)

The various streaming protocols that bazel supports (BES, RBE) have been a godsend for doing this sort of analysis/reporting across builds - I'm sure that if this info were available in the build event stream (maybe with verbosity flags/options as needed) then the community would write the backends to store/aggregate the data appropriately.

@tbaing
Contributor

tbaing commented Sep 15, 2023

While trying to debug what changed between two builds, I discovered something else that makes the current implementation hard to use. When an action is not a cache hit, we get information about the actual_outputs produced by the action, including and especially the hashes of each file generated. However, when an action is a cache hit, that content is not provided; instead, a digest block is given. So when diffing two successive runs where the first built the artifact and the second reused that artifact from the cache, we see irrelevant differences because the second run doesn't have this information, which makes it harder to zero in on the actual differences between the files.

I'm fine with having differences in the runner line (local vs. remote cache hit) and the remote_cache_hit line, because those are easy to grep out or ignore, but the differences in the actual_outputs make it much harder to grab two of these logs and compare them against each other, because the differences are in content that we actually want to inspect for changes.

@acecilia

acecilia commented Oct 23, 2023

We were discussing an idea today at the BazelCon Community Day to be able to test cache determinism in every pull request. Leaving it here in case it is useful.

Assuming the remote cache is enabled and contains artifacts for the latest commit in the main git branch (the usual):

  1. Get the list of targets that missed the cache
    • Perform the usual test build of your targets
    • Out of that build, generate a list of the targets that missed the cache
    • => So CI usage/capacity stays the same, no need to perform extra builds
  2. Get the list of targets that changed, based on the git diff between the current branch and the target branch of the PR
    • Get the files that changed between the current branch and the target branch of the PR
    • Calculate the targets that are associated with those changed files (maybe with bazel-diff?)
  3. Compare the above two target lists (see the sketch below). If the list of cache misses is not a subset of the list generated from the git diff, the pull request is introducing non-determinism
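
A minimal sketch of step 3 (Python; the file names cache_misses.txt and git_affected.txt are hypothetical outputs of steps 1 and 2, one target label per line):

    def check_determinism(cache_miss_targets, git_affected_targets):
        # A PR is suspicious if some target missed the cache without a corresponding source change.
        unexplained = set(cache_miss_targets) - set(git_affected_targets)
        for target in sorted(unexplained):
            print(f"cache miss not explained by the git diff: {target}")
        return not unexplained

    with open("cache_misses.txt") as f, open("git_affected.txt") as g:
        build_is_deterministic = check_determinism(f.read().split(), g.read().split())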

copybara-service bot pushed a commit that referenced this issue Nov 7, 2023
…optimize creation of unsorted logs.

Since we're about to introduce a new log format (see #18643), let's take this opportunity to clean up the command line API.

Avoid a pointless copy of the execution log, if no sorting is requested, by writing directly into the requested output binary path. If a temporary file is necessary, write it to a well-known path in the output base, and delete it after conversions are successful; this ensures that large temporary files don't accumulate in /tmp.

RELNOTES[INC]: Delete the --experimental_execution_log_file flag. Use --execution_log_binary_file in conjunction with --noexecution_log_sort instead.

PiperOrigin-RevId: 580132026
Change-Id: Icca827260176b3ea804384d2bed4a56ce85cfc14
tjgq added a commit to tjgq/bazel that referenced this issue Nov 9, 2023
…ag, and optimize creation of unsorted logs.

Since we're about to introduce a new log format (see bazelbuild#18643), let's take this opportunity to clean up the command line API.

Avoid a pointless copy of the execution log, if no sorting is requested, by writing directly into the requested output binary path. If a temporary file is necessary, write it to a well-known path in the output base, and delete it after conversions are successful; this ensures that large temporary files don't accumulate in /tmp.

RELNOTES[INC]: Delete the --experimental_execution_log_file flag. Use --execution_log_binary_file in conjunction with --noexecution_log_sort instead.

PiperOrigin-RevId: 580132026
Change-Id: Icca827260176b3ea804384d2bed4a56ce85cfc14
keertk pushed a commit that referenced this issue Nov 10, 2023
…ag, and optimize creation of unsorted logs. (#20132)

Since we're about to introduce a new log format (see
#18643), let's take this
opportunity to clean up the command line API.

Avoid a pointless copy of the execution log, if no sorting is requested,
by writing directly into the requested output binary path. If a
temporary file is necessary, write it to a well-known path in the output
base, and delete it after conversions are successful; this ensures that
large temporary files don't accumulate in /tmp.

RELNOTES[INC]: Delete the --experimental_execution_log_file flag. Use
--execution_log_binary_file in conjunction with --noexecution_log_sort
instead.

PiperOrigin-RevId: 580132026
Change-Id: Icca827260176b3ea804384d2bed4a56ce85cfc14
copybara-service bot pushed a commit to bazelbuild/intellij that referenced this issue Nov 14, 2023
They're fairly verbose and represent a significant portion of the log, much more so once we switch to a compact format. It should be possible to uniquely identify spawns through a combination of mnemonic, target label and output path.

Progress on bazelbuild/bazel#18643.

PiperOrigin-RevId: 582273738
copybara-service bot pushed a commit that referenced this issue Nov 14, 2023
They're fairly verbose and represent a significant portion of the log, much more so once we switch to a compact format. It should be possible to uniquely identify spawns through a combination of mnemonic, target label and output path.

Progress on #18643.

PiperOrigin-RevId: 582273738
Change-Id: Ife9edc6aaf51c96e4ac42a84f69371ab4aaa50eb
copybara-service bot pushed a commit to bazelbuild/intellij that referenced this issue Nov 14, 2023
We now unconditionally emit the full spawn metrics, which already include the wall time; the extra field is wasteful and potentially confusing.

Progress on bazelbuild/bazel#18643.

PiperOrigin-RevId: 582282615
copybara-service bot pushed a commit that referenced this issue Nov 14, 2023
We now unconditionally emit the full spawn metrics, which already include the wall time; the extra field is wasteful and potentially confusing.

Progress on #18643.

PiperOrigin-RevId: 582282615
Change-Id: I73f87f3e1c54679506acde860991ef57947262d3
copybara-service bot pushed a commit to bazelbuild/intellij that referenced this issue Nov 14, 2023
… reflects its purpose.

Also amend misleading or plainly incorrect comments. In particular, the runner field is always populated, even for cache hits, and spawns are only missing when the persistent action cache is hit ("local" is ambiguous, as we also use it in certain places to refer to the disk cache).

Note that, while the runner and cache_hit fields currently overlap in purpose, their separate existence is better aligned with potential future plans to promote cache lookups to a strategy (see bazelbuild/bazel#19904).

Progress on bazelbuild/bazel#18643.

PiperOrigin-RevId: 582294420
copybara-service bot pushed a commit that referenced this issue Nov 14, 2023
… reflects its purpose.

Also amend misleading or plainly incorrect comments. In particular, the runner field is always populated, even for cache hits, and spawns are only missing when the persistent action cache is hit ("local" is ambiguous, as we also use it in certain places to refer to the disk cache).

Note that, while the runner and cache_hit fields currently overlap in purpose, their separate existence is better aligned with potential future plans to promote cache lookups to a strategy (see #19904).

Progress on #18643.

PiperOrigin-RevId: 582294420
Change-Id: I75fc9a42b8a38c2233cc033e97d7c87264fb11d7
keertk pushed a commit that referenced this issue Nov 14, 2023
We now unconditionally emit the full spawn metrics, which already
include the wall time; the extra field is wasteful and potentially
confusing.

Progress on #18643.

PiperOrigin-RevId: 582282615
Change-Id: I73f87f3e1c54679506acde860991ef57947262d3
tjgq added a commit to tjgq/bazel that referenced this issue Nov 14, 2023
…curately reflects its purpose.

Also amend misleading or plainly incorrect comments. In particular, the runner field is always populated, even for cache hits, and spawns are only missing when the persistent action cache is hit ("local" is ambiguous, as we also use it in certain places to refer to the disk cache).

Note that, while the runner and cache_hit fields currently overlap in purpose, their separate existence is better aligned with potential future plans to promote cache lookups to a strategy (see bazelbuild#19904).

Progress on bazelbuild#18643.

PiperOrigin-RevId: 582294420
Change-Id: I75fc9a42b8a38c2233cc033e97d7c87264fb11d7
keertk pushed a commit that referenced this issue Nov 14, 2023
…curately reflects its purpose. (#20201)

Also amend misleading or plainly incorrect comments. In particular, the
runner field is always populated, even for cache hits, and spawns are
only missing when the persistent action cache is hit ("local" is
ambiguous, as we also use it in certain places to refer to the disk
cache).

Note that, while the runner and cache_hit fields currently overlap in
purpose, their separate existence is better aligned with potential
future plans to promote cache lookups to a strategy (see
#19904).

Progress on #18643.

PiperOrigin-RevId: 582294420
Change-Id: I75fc9a42b8a38c2233cc033e97d7c87264fb11d7
copybara-service bot pushed a commit to bazelbuild/intellij that referenced this issue Nov 16, 2023
Although this is a breaking change, I find it unlikely that anyone requests both, given how unwieldy these files are.

This avoids future complexity around format conversions as we add more of them (specifically: in order to add a new format while keeping the old formats equally performant, the choice of "raw" format would likely have to vary, so there would be additional pairs to convert between).

The current conversion code can also be simplified, as there's now at most one conversion to make, but I prefer doing that separately.

Related to bazelbuild/bazel#18643.

PiperOrigin-RevId: 582994996
copybara-service bot pushed a commit that referenced this issue Nov 16, 2023
Although this is a breaking change, I find it unlikely that anyone requests both, given how unwieldy these files are.

This avoids future complexity around format conversions as we add more of them (specifically: in order to add a new format while keeping the old formats equally performant, the choice of "raw" format would likely have to vary, so there would be additional pairs to convert between).

The current conversion code can also be simplified, as there's now at most one conversion to make, but I prefer doing that separately.

Related to #18643.

PiperOrigin-RevId: 582994996
Change-Id: I8e0bc228c73a0da7f3e7b830ade2cab6c14319ff
tjgq added a commit to tjgq/bazel that referenced this issue Nov 16, 2023
…tion log.

Although this is a breaking change, I find it unlikely that anyone requests both, given how unwieldy these files are.

This avoids future complexity around format conversions as we add more of them (specifically: in order to add a new format while keeping the old formats equally performant, the choice of "raw" format would likely have to vary, so there would be additional pairs to convert between).

The current conversion code can also be simplified, as there's now at most one conversion to make, but I prefer doing that separately.

Related to bazelbuild#18643.

PiperOrigin-RevId: 582994996
Change-Id: I8e0bc228c73a0da7f3e7b830ade2cab6c14319ff
keertk pushed a commit that referenced this issue Nov 16, 2023
…tion log. (#20222)

Although this is a breaking change, I find it unlikely that anyone
requests both, given how unwieldy these files are.

This avoids future complexity around format conversions as we add more
of them (specifically: in order to add a new format while keeping the
old formats equally performant, the choice of "raw" format would likely
have to vary, so there would be additional pairs to convert between).

The current conversion code can also be simplified, as there's now at
most one conversion to make, but I prefer doing that separately.

Related to #18643.

PiperOrigin-RevId: 582994996
Change-Id: I8e0bc228c73a0da7f3e7b830ade2cab6c14319ff
mai93 pushed a commit to bazelbuild/intellij that referenced this issue Nov 21, 2023
They're fairly verbose and represent a significant portion of the log, much more so once we switch to a compact format. It should be possible to uniquely identify spawns through a combination of mnemonic, target label and output path.

Progress on bazelbuild/bazel#18643.

(cherry picked from commit b7e97fd)
mai93 pushed a commit to bazelbuild/intellij that referenced this issue Nov 21, 2023
We now unconditionally emit the full spawn metrics, which already include the wall time; the extra field is wasteful and potentially confusing.

Progress on bazelbuild/bazel#18643.

(cherry picked from commit ab4d2a1)
mai93 pushed a commit to bazelbuild/intellij that referenced this issue Nov 21, 2023
… reflects its purpose.

Also amend misleading or plainly incorrect comments. In particular, the runner field is always populated, even for cache hits, and spawns are only missing when the persistent action cache is hit ("local" is ambiguous, as we also use it in certain places to refer to the disk cache).

Note that, while the runner and cache_hit fields currently overlap in purpose, their separate existence is better aligned with potential future plans to promote cache lookups to a strategy (see bazelbuild/bazel#19904).

Progress on bazelbuild/bazel#18643.

(cherry picked from commit 0bc8e5f)
mai93 pushed a commit to bazelbuild/intellij that referenced this issue Nov 21, 2023
Although this is a breaking change, I find it unlikely that anyone requests both, given how unwieldy these files are.

This avoids future complexity around format conversions as we add more of them (specifically: in order to add a new format while keeping the old formats equally performant, the choice of "raw" format would likely have to vary, so there would be additional pairs to convert between).

The current conversion code can also be simplified, as there's now at most one conversion to make, but I prefer doing that separately.

Related to bazelbuild/bazel#18643.

(cherry picked from commit 557dab5)
@meisterT meisterT added P2 We'll consider working on this in future. (Assignee optional) and removed P3 We're not considering working on this, but happy to review a PR. (No assignee) labels Jan 12, 2024
@tjgq
Contributor

tjgq commented Feb 14, 2024

Status update:

Bazel 7.1.0 will introduce an --experimental_execution_log_compact_file flag, which is similar to the existing --execution_log_{binary,json}_file, but produces the log in a more efficient binary/compressed format (see flag description for details).

We designed this new format so that the log can be produced with minimal runtime overhead (aiming for 2-3% at most, often less) and with a much smaller file size (a reduction of 100x relative to the old format can be expected, sometimes more).

As a drawback, the new format is not designed with human readability in mind, and is expected to be analyzed with the aid of specialized tools. As a starting point, 7.1.0 will also provide an offline conversion tool (//src/tools/execlog:converter) that can convert the new format into the old ones, so that existing tools built around the old formats continue to work. Over time, we expect tools to acquire the ability to consume the new format natively.

We look forward to your feedback on this new format during the 7.x cycle, and we might still make some minor adjustments in response to it (hence the experimental nature), although we'll endeavor to minimize disruption to early adopters.

We expect Bazel 8.0.0 to declare the new format to be stable and to deprecate the older formats.

@sluongng
Contributor

As a drawback, the new format is not designed with human readability in mind, and is expected to be analyzed with the aid of specialized tools. As a starting point, 7.1.0 will also provide an offline conversion tool (//src/tools/execlog:converter) that can convert the new format into the old ones, so that existing tools built around the old formats continue to work. Over time, we expect tools to acquire the ability to consume the new format natively.

We have seen this pattern with recent log files. They are written as protobuf, so they are not friendly for human consumption.

What if we included a bazel log subcommand to support these logs collectively, instead of shipping separate tools? This would also let us store these logs in zstd-compressed form, which is appendable and saves more space.
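
To illustrate the appendability point (a sketch assuming the python-zstandard package; not something Bazel ships): each record can be written as an independent zstd frame appended to the same file, and a frame-aware reader decodes the whole file as one stream.

    import zstandard as zstd

    def append_record(path, record: bytes):
        with open(path, "ab") as f:
            f.write(zstd.ZstdCompressor().compress(record))  # one complete frame per record

    def read_all(path) -> bytes:
        out = bytearray()
        with open(path, "rb") as f:
            # read_across_frames lets the reader continue past each frame boundary
            with zstd.ZstdDecompressor().stream_reader(f, read_across_frames=True) as reader:
                while chunk := reader.read(65536):
                    out.extend(chunk)
        return bytes(out)

    append_record("exec.log.zst", b"spawn 1\n")
    append_record("exec.log.zst", b"spawn 2\n")
    assert read_all("exec.log.zst") == b"spawn 1\nspawn 2\n"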

For reference, buck2 comes with

> buck2 log
buck2-log
Commands for interacting with buck2 logs

USAGE:
    buck2 log [OPTIONS] <SUBCOMMAND>

OPTIONS:
    -h, --help                Print help information
    -v, --verbose <NUMBER>    How verbose buck should be while logging. Values: 0 = Quiet, errors
                              only; 1 = default; 2 = more info about errors; 3 = more info about
                              everything [default: 1]

SUBCOMMANDS:
    cmd                  Show buck command line arguments from selected invocation
    critical-path        Show the critical path for a selected build
    help                 Print this message or the help of the given subcommand(s)
    path                 Output the path to the selected log
    replay               Replay an event log
    show                 Outputs the log in JSON format from selected invocation
    show-user            Converts the event log from a selected invocation into a user event
                             log, in JSONL format
    what-failed          Outputs every command that failed in the selected invocation
    what-materialized    Outputs materializations from selected invocation
    what-ran             Output everything Buck2 ran from selected invocation
    what-up              Show the spans that were open when the log ended
    what-uploaded        Outputs stats about uploads to RE from the selected invocation
exit 2

which is a lot more user-friendly.


Separate question: given that this is supposed to be "compact", could we include this log in BES?

@lberki
Contributor Author

lberki commented Feb 15, 2024

I think that's a reasonable idea (cc @tjgq @coeuvre); getting the separate tools is a pain, and since Bazel needs to link all the code handling these logs anyway, I hope it would not be too much bloat.

@brentleyjones
Contributor

I also want to ask what Son asked at the end: can we get this referenced in the BES and uploaded to the remote cache? Similar to the execution deps graph when that is enabled?

@tjgq
Contributor

tjgq commented Feb 15, 2024

Yes, we can include it in the BES. I'll prepare a PR.

@sluongng
Contributor

sluongng commented Apr 8, 2024

@tjgq I started to play with the new compact exec log today and it's very nice.

One small piece of feedback is that the --experimental_execution_log_compact_file flag should accept an empty string. In that case, it should simply write the file to the output base with a default name and upload the file to BES. That's how it's done for the trace profile today: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/runtime/BlazeRuntime.java;l=328-347;drc=3f9d80c35d88a280c2d53682f3d201b4733a3fff

@BalestraPatrick
Member

+1 to also adding a way to upload this by default without providing a value to the flag.

@brentleyjones
Contributor

@tjgq and I discussed this, and he's not sure it would ever be fine for this to always be collected by default. But there could be a flag, like --enable_execution_log_compact_file_by_default, which people could flip to make it behave like the timing profile does today.

@sluongng
Contributor

Since this is an experimental flag, I think we can just change it to a tri-state flag:

  • Unset: disable logging.
  • Set, no value: log to bazel_out
  • Set, with value: log to specified path

The default value could still be unset.
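
As an analogy for the tri-state behavior (a Python argparse sketch, not Bazel's actual option parser; the flag name is just illustrative):

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--execution_log_compact_file",
        nargs="?",          # value is optional
        default=None,       # flag absent: logging disabled
        const="<default>",  # flag present without a value: use a default path under the output base
    )

    for argv in ([], ["--execution_log_compact_file"], ["--execution_log_compact_file=/tmp/exec.log"]):
        print(argv, "->", parser.parse_args(argv).execution_log_compact_file)
    # []                                              -> None
    # ['--execution_log_compact_file']                -> <default>
    # ['--execution_log_compact_file=/tmp/exec.log']  -> /tmp/exec.log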


From my observation, this flag mostly benefits folks who rely on Bazel's incremental state tracking. For folks who rely on a separate query system (i.e. bazel-diff or target-determinator) to decide what to build and test, the number of actions is usually higher and thus the log size is also a bit larger (1-2 GB for some runs). To be clear, I don't think the increase in action count is caused by using a separate query system, but there usually is some correlation there.

Therefore, I do agree that we should not enable the compact log by default, at least not yet.

@brentleyjones
Contributor

brentleyjones commented Jun 26, 2024

There would be no way to unset it then in a different --config, since you can't distinguish null from an empty string on the command line.

@sluongng
Contributor

🤔 Perhaps you could explicitly unset it with --no<flag>. Or we could accept a boolean value such as true/false/1/0 to distinguish unset from set-but-use-default-path?
