Request for Machine Specification Recommendations for Image Build. #41
Hi hahishi, |
Thank you for your prompt and detailed response. I really appreciate your input, which has significantly helped shape our approach to this issue. We will try a larger instance as suggested. 😄 I believe it would be beneficial to update the relevant part of aws environment somehow; building the image is resource intensive, and an update should serve as a valuable resource for others in the future, providing guidance and clarity on similar issues. |
I am using
|
Here is the output from The high load average and high I/O wait suggest that it is spending a lot of time waiting for I/O.
|
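For anyone wanting to quantify the "high I/O wait" observation on a Linux build host, here is a minimal sketch. It assumes the standard `/proc/stat` layout (the fifth counter on the aggregate `cpu` line is iowait); the function names are hypothetical and not part of this project.

```python
import os
import time


def parse_cpu_ticks(cpu_line: str) -> list:
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = cpu_line.split()
    if len(fields) < 6 or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_share(before: str, after: str) -> float:
    """Fraction of CPU time spent in iowait between two /proc/stat samples."""
    deltas = [b - a for a, b in zip(parse_cpu_ticks(before), parse_cpu_ticks(after))]
    # Columns: user nice system idle iowait irq softirq steal.
    # Guest columns are already counted in user/nice, so only the first eight are summed.
    total = sum(deltas[:8])
    return deltas[4] / total if total > 0 else 0.0


def sample_iowait(interval_s: float = 1.0) -> float:
    """Linux only: take two /proc/stat samples and return the iowait share."""
    def read_cpu_line() -> str:
        with open("/proc/stat") as stat:
            return stat.readline()

    before = read_cpu_line()
    time.sleep(interval_s)
    return iowait_share(before, read_cpu_line())


def load_per_cpu() -> float:
    """1-minute load average divided by the number of CPUs (Unix only)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)
```

Calling `sample_iowait()` and `load_per_cpu()` a few times during the build gives a rough picture: a load per CPU well above 1 together with a large iowait share would be consistent with the build waiting on disk rather than on CPU.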
Thanks for your feedback. We will update aws environment to reflect these requirements. One thing you can try is setting a couple of bazel flags to encourage it to schedule more concurrent actions:
https://bazel.build/versions/6.3.0/docs/user-manual?hl=en#local-resources However, we don't often build with a 0% cache hit rate -- so given your feedback, it might be useful to further bump up your machine to Then instead of seeing
you'll see 72 actions running, since you'll have more vCPUs, which will increase the build speed. |
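As a rough illustration of the local-resources suggestion, the sketch below assembles Bazel's `--local_cpu_resources`, `--local_ram_resources`, and `--jobs` options from a machine's CPU count. The helper name and the example values are assumptions, and the thread does not say how this project's build scripts pass extra flags through to bazel, so treat this as a starting point only.

```python
import os


def bazel_resource_flags(cpus=None, ram_mb=None, jobs=None):
    """Build a list of Bazel flags that raise local concurrency.

    cpus   -- CPUs Bazel may assume are available (defaults to the host count)
    ram_mb -- RAM in MB Bazel may assume is available (omitted if None)
    jobs   -- maximum concurrent actions (defaults to the CPU count)
    """
    cpus = cpus or os.cpu_count() or 1
    flags = [
        f"--local_cpu_resources={cpus}",
        f"--jobs={jobs if jobs is not None else cpus}",
    ]
    if ram_mb is not None:
        flags.append(f"--local_ram_resources={ram_mb}")
    return flags


# Illustrative values only: 72 matches the action count mentioned above.
print(" ".join(bazel_resource_flags(cpus=72, ram_mb=100000)))
```

The resulting string can be appended to a `bazel build` invocation or placed in a `.bazelrc` entry, wherever the build setup allows extra flags.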
I am reposting this communication as a follow-up to my earlier messages regarding a critical issue we have identified in our build process, which we suspect might be related to external bandwidth throttling. After a thorough investigation and multiple tests across different instance types, we have consistently encountered significant delays in our build times. This issue adversely impacts our development and deployment and, if unresolved, poses a substantial risk to our project timelines. The evidence indicates that the issue is unrelated to the instance type or our internal network configurations. We have observed consistent bandwidth limitations during the build process. These limitations occur irrespective of the instance's capacity, suggesting an external constraint on the network bandwidth. We'd like to ask for your help in investigating this issue. We would like to understand if any bandwidth limitations imposed on your end could be causing this bottleneck. Any information or insights you can give would be extremely helpful in helping us resolve this matter. We are open to exploring potential solutions or workarounds that you might suggest. We appreciate your quick attention to this matter and look forward to your support in resolving this issue. Thank you for your cooperation and support. |
Hi, I have a similar issue when building the AMI: it has already taken about 6 hours to build the image. I'm wondering how long it took @hanishi to successfully build the image. The information you provided really helped me understand the issue, and I would appreciate any follow-ups. Thank you! |
@yw63 |
Hi @hanishi, Thanks again for sharing your issue. |
@lx3-g In a recent development, the AWS support team shared their findings with us after independently conducting the download and AMI creation process. They recently conducted tests on their end using a c5d.18xlarge instance, as suggested, and similarly experienced the prolonged 8-hour process with intermittent failures. Given the critical nature of this bottleneck and its apparent impact on our project, we would appreciate your expertise and assistance more urgently than ever. We have ruled out many potential internal causes and believe the issue may involve external bandwidth throttling or other constraints beyond our direct control. We look forward to any support or insights you can offer. Thank you. |
I created a fresh machine -- c5d.24xlarge
Note that I built for AWS, and not local as in example -- since AWS build takes longer.
which is 27 minutes. I didn't do anything extra beyond spinning up a standard AWS EC2 instance. |
Hello @lx3-g, I'm using a MacBook M1 2020 (16 GB memory) to build the image locally by following the instructions on this page. I'm on the branch release-0.15 (commit 05b6890) and I use Docker Desktop 4.28.0. The build speed was OK up to 6k actions, then it slowed down greatly. It has taken around 3 hours so far and the build is still at 12k actions.
Do you know what could be the bottleneck? Is it realistic to expect to build this project on Macbook M1? If yes, how many jobs should I force |
Hi @xinkuifeng, we can't really comment on whether a MacBook M1 is realistic because we don't have one with the same setup to evaluate. However, I just ran the build on a GCP VM with 8 vCPUs, 32 GB RAM, and a 300 GB SSD, and it finished in about an hour. (The VM is brand new, so there are not a lot of other CPU-heavy activities, and that may or may not have helped.) |
Hey @peiwenhu ,
Thanks for this info! Is it possible to consider supporting the setup Macbook + Docker Desktop for local deployment? Without the local dev environment ready, it could be hard to contribute. In my use case, I noticed the container CPU usage on Docker Desktop is often less than 1 core (<100%) and in theory, I could go 8 cores. I tried:
before launching the build. However, it does not seem to change the container CPU usage. |
Hi @xinkuifeng, unfortunately we don't have the capacity to support this specific setup. Perhaps you can try to locate the problem with a simpler repro case, such as replacing Docker Desktop with plain Docker and running the build, or running some other simpler bazel build (such as the official example) inside Docker Desktop, and see whether either gives any hint. |
Hi @peiwenhu, thanks for replying! Got it and will find alternatives. |
Hey @peiwenhu I'm trying to get this running on my Mac as well and am wondering if I'm barking up the wrong tree, as I'm not even getting to the slow build times that @xinkuifeng is describing (although I have seen that on AWS instances I've tried this on). I've tried this a few different ways:
(1) ran into little issues immediately, so I moved onto trying to use Linux based containers, but even there I'm getting a lot of small errors that might be telling me this isn't the correct path. For instance, I got past a few issues by mounting a directory for the workspace, making sure my Docker was up to date, etc...but it seems like now I'm getting odd issues with the install script where it can't find files that seem to be there, I see it look for files in the install directory that are being written to the workspace directory. I'll come back to this later, but I'm curious if this is just not supported...if so, I'm trying to understand what local development is intended to be supported. |
Hey @thegreatfatzby I would expect that you could encounter at least 2 small errors when building this project on Mac directly:
because bash/zsh on macOS are slightly different from GNU bash on Ubuntu. I didn't try to build this project inside a docker container. Running an Ubuntu VM inside macOS to build this project is definitely a viable solution. |
@xinkuifeng very encouraging to hear that, I take it that path succeeded for you...I don't mind spelunking but I was beginning to worry I was in the wrong rabbithole. Did you have to make small adjustments to the scripts? For instance, I just was able to get past one issue by adding a "mkdir ${WS_TMP_IMAGE_DIR} " in the get_builder_image_tagged phase, which has led to my next issue which is similar...did you have to do anything interesting with the scripts, docker commands, etc? I |
For Path 1
I didn't change the scripts. I installed utilities from GNU:
Starting from there, the project can be built. But I would not suggest taking this path, as the build is far too slow (>> 6 hours). The root cause could be Bazel's sandboxing strategy (creating symlinks on macOS is far more costly than doing it on Ubuntu).
For Path 3
Things work out of the box. You don't need to change any scripts. |
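The symlink-cost hypothesis for Path 1 is unconfirmed in this thread, but it can be sanity-checked with a small micro-benchmark run on both macOS and Ubuntu. The sketch below is a rough measure only: it does not reproduce Bazel's sandbox, and the function name and link count are arbitrary choices.

```python
import os
import tempfile
import time


def time_symlinks(count: int = 10000) -> float:
    """Create `count` symlinks to one file in a temp directory; return elapsed seconds."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        with open(target, "w") as handle:
            handle.write("x")
        start = time.perf_counter()
        for index in range(count):
            os.symlink(target, os.path.join(tmp, f"link_{index}"))
        # The directory and all links are removed when the context exits.
        return time.perf_counter() - start


elapsed = time_symlinks(1000)
print(f"1000 symlinks created in {elapsed:.4f} s")
```

A large gap between the two systems on the same hardware class would support the hypothesis; a small gap would suggest looking elsewhere.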
@xinkuifeng understood on number one, no issue going through an Ubuntu container.... That is really interesting I must be doing something silly... I had to step out so I don't have the commands in front of me but I basically just docker pulled the base Ubuntu, installed some basic docker and other stuff, tried it both with cloning directly into the image and also mounting from the host machine, and also mounting the docker socket as well as something for the workspace... Maybe I'm using the wrong image? |
I don't know. As I said, I never tried to use the docker container with Ubuntu as the base image (Path 2). |
Ah apologies, I heard what I wanted to, thanks. |
Thanks for pointing it out, I've updated the link. And yeah, it was exactly the same. |
Hi all, we have now set up Cloud Build (GCP) and CodeBuild (AWS). For both platforms, the build time should be within 2 hours. See cloud build docs for more details. |
Hi,
I am currently working on building this project following the instruction for the aws environment
and have encountered significant build times and resource usage issues. I seek recommendations on the EC2 instance that is required to handle this task efficiently.
I am currently using c5.4xlarge and it looks to me that compilation speed has decelerated significantly or stopped.
I appreciate any advice or suggestions.