Add std::thread::available_concurrency #74480
r? @dtolnay (rust_highfive has picked a reviewer for you, use r? to override) |
|
To clarify, would this report the number of "logical" cores on CPUs that support hyperthreading? That's probably the right choice if the API intends to answer the question "how much parallelism is reasonable". |
@luser Yes I believe it should. Since I didn't write the backing impl I had to verify, but the Perhaps we should clarify in the docs what we mean by "hardware threads"? |
That is a great point. Related to this, is it worthwhile to have two functions that distinguish between the number of logical cores and the number of hardware cores? |
b2e2c28 to 129d6d8
|
Would you mind retaining the original code that used Also, my impression is that all platform-specific code should be in the Might also want to address, the |
|
"Hardware threads" is the correct term, but it can be quite confusing for some people. To make it easier, you can view it as:
|
|
This doesn't seem to have support for target_os="none", unless I'm totally missing how cfg(target_os) works? |
|
What @mati865 says is accurate. Running the
    fn main() {
        println!("hardware threads {}", num_cpus::get());
        println!("hardware cores {}", num_cpus::get_physical());
    }
@dfamonteiro I'd like to keep the scope of this PR minimal; but it's certainly possible an API along the lines of |
129d6d8 to 8b93bcd
8b93bcd to a548bfb
num_cpus does not actually tell you the number of hardware threads, it tells you the maximum scheduling capacity available to the current process, which can be lower than the number of hardware threads if CPU affinities, cgroups or similar things are applied. This probably is what most thread pools actually want. Returning the number of hardware threads would lead to oversubscription in containers. Java does the same: https://www.docker.com/blog/improved-docker-container-integration-with-java-10/ |
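As a rough illustration of the "what thread pools actually want" point, here is a minimal sketch of sizing a pool from the scheduling capacity reported to the process. It assumes a toolchain where this API is available under its later stabilized name, `std::thread::available_parallelism` (the function discussed in this PR was still called `available_concurrency`). The helper `worker_count` is hypothetical and only for illustration; the fallback of 1 is a choice made by this caller, not something the API does.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper: pick a worker count for a thread pool.
/// `requested` is an optional user override; it is clamped so the pool
/// never exceeds the parallelism reported for the current process.
fn worker_count(requested: Option<usize>) -> usize {
    // Estimate of the parallelism available to this process. If the
    // platform cannot answer, this caller chooses to fall back to 1.
    let available = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);

    match requested {
        // Keep the override within 1..=available to avoid oversubscription.
        Some(n) => n.clamp(1, available),
        None => available,
    }
}

fn main() {
    println!("default workers: {}", worker_count(None));
    println!("capped override: {}", worker_count(Some(1024)));
}
```

Clamping the override rather than trusting it is just one possible policy; a pool that deliberately oversubscribes for I/O-bound work would skip the clamp.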
This
For the same reason, it seems undesirable to return |
6e49069 to ba6c552
|
Updated with @SimonSapin and Leo Le Boueter's feedback; we now return |
|
There needs to be a loud section about the limitations of this API on Windows. Specifically, it only returns the number of logical processors in the current processor group, which is limited to at most 64 logical processors. Supporting more than 64 logical processors requires explicit support for enumerating processor groups and assigning threads to logical processors in other groups. https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups |
|
@retep998 Rather than having that limitation, we should be calling the APIs that let us figure out the total number of hardware threads, in all processor groups. |
ba6c552 to b75ad69
|
@joshtriplett @retep998 Added an entry on the tracking issue under "known issues" noting that the number of threads reported on Windows is at most 64. Also tracking @luser's report that the interaction with CPU affinity on Linux doesn't seem right. |
|
@joshtriplett and place a warning that users will be limited to 64 hardware threads unless they apply a Windows-specific workaround. I suppose they would rather use a crate that does it for them... |
|
@joshtriplett Threads will not run on other processor groups by default. Normally threads only have access to the logical processors in the processor group for that process, so |
|
@retep998 Is it possible for a thread's affinity to be for multiple processor groups at once, or does a thread have to be limited to a single processor group at a time no matter how it's started or modified? If the former, we could use that and start threads that can run anywhere by default. If the latter, do Rayon and other libraries use the appropriate APIs at the moment to start threads on every hardware thread across the system? Long-term, I'd like us to have NUMA-aware APIs as well, but for the moment, it'd be nice to have APIs for the simple case of "start as many hardware threads as the system has". |
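For the simple case described here, a minimal sketch of "start roughly as many threads as the system reports" might look like the following. It assumes the later stabilized name `std::thread::available_parallelism` and scoped threads (`std::thread::scope`), neither of which existed under those names when this PR was opened. `parallel_sum` is a hypothetical helper, and the sketch makes no attempt to deal with Windows processor groups or NUMA placement.

```rust
use std::thread;

/// Hypothetical helper: sum a slice using about one worker per unit of
/// parallelism reported for the current process.
fn parallel_sum(data: &[u64]) -> u64 {
    // Fall back to a single worker if the platform cannot report a value.
    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    // Ceiling division so every element lands in some chunk; `max(1)`
    // avoids calling `chunks(0)`, which would panic, on empty input.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn one scoped thread per chunk; each returns a partial sum.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // Join all workers and combine their partial sums.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    println!("sum = {}", parallel_sum(&data));
}
```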
|
@yoshuawuyts: |
|
IIRC we don't have a FreeBSD builder on the try build, so let's just re-add this to the queue :) |
|
|
|
@bors r- This will still fail:
The semicolon in the unsafe block makes the If you have Docker, I would encourage trying to test using that. |
|
Alternatively, you can use Miri: |
9508c32 to 42a9706
@ehuss Ah yeah, good catch. I thought that followed from an earlier error, but I got it wrong. That should be fixed now. Apparently I'd also missed the same issue for OpenBSD, which is fixed now too.
Do we have instructions on how to run this available? The Windows instructions for the compiler ended up being based on my notes, and researching that took a fair bit of work. If we don't have instructions on how to compile locally on BSD on Docker yet we should probably open up an issue for this somewhere (not sure if |
https://rustc-dev-guide.rust-lang.org/tests/intro.html#testing-with-docker-images The command for freebsd is |
|
@RalfJung that seemed to work successfully for both FreeBSD and OpenBSD! Though from the output it seems like it may only have checked
    PS C:\Users\yoshu\Code\rust> rustup override set nightly-x86_64-pc-windows-gnu # miri doesn't seem to work yet on msvc
    PS C:\Users\yoshu\Code\rust> rustup component add miri
    PS C:\Users\yoshu\Code\rust> $env:XARGO_RUST_SRC='C:\Users\yoshu\Code\rust\library'
    PS C:\Users\yoshu\Code\rust> cargo miri setup --target x86_64-unknown-freebsd
    PS C:\Users\yoshu\Code\rust> cargo miri setup --target x86_64-unknown-openbsd
@ehuss dang, that seems to assume a Linux environment and I'm running on Windows. I don't think I'll be able to get that to work without provisioning a new environment (VM or otherwise), which I don't have the bandwidth for right now. Given all known issues have been addressed at this point, can we give this a shot at running through bors again? |
42a9706 to 3717646
It should check core, std, and even test. And the output looks like that for me:
(Ignore the "compiling", that's a cargo bug: rust-lang/cargo#7921) |
|
@bors r=dtolnay |
|
|
|
|
test new available_concurrency function Cc rust-lang/rust#74480
This PR adds a counterpart to C++'s std::thread::hardware_concurrency to Rust, tracking issue #74479.
cc/ @rust-lang/libs
Motivation
Being able to know how many hardware threads a platform supports is a core part of building multi-threaded code. In C++11 this has become available through the std::thread::hardware_concurrency API. Currently in Rust most of the ecosystem depends on the num_cpus crate (no. 35 in top 500 crates) to provide this functionality. This PR proposes an API to provide access to the number of hardware threads available on a given platform.
edit (2020-07-24): The purpose of this PR is to provide a hint for how many threads to spawn to saturate the processor. There's value in introducing APIs for NUMA and Windows processor groups, but those are intentionally out of scope for this PR. See: #74480 (comment).
Naming
Discussing the naming of the API on Zulip surfaced two options:
std::thread::hardware_concurrency
std::thread::hardware_threads
Both options seemed acceptable, but overall people seem to gravitate the most towards hardware_threads. Additionally, @jonas-schievink pointed out that the "hardware threads" terminology is well-established and is used in, among others, the RISC-V specification (page 20):
It's also worth noting that the original paper introducing C++'s std::thread submodule unfortunately doesn't feature any discussion on the naming of hardware_concurrency, so we can't use that to help inform our decision here.
Return type
An important consideration @joshtriplett brought up is that we don't want to default to 1 for platforms where the number of available threads cannot be retrieved. Instead we want to inform users that we don't know, and allow them to handle that case. That is why this PR uses Option<NonZeroUsize> as its return type, where None is returned on platforms where we don't know the number of hardware threads available.
The reasoning for NonZeroUsize vs usize is that if the number of threads for a platform is known, it will always be at least 1. As evidenced by the example, the NonZero* family of APIs may currently not be the most ergonomic to use, but improving their ergonomics is something that I think we can address separately.
Implementation
@Mark-Simulacrum pointed out that most of the code we wanted to expose here was already available under libtest. So this PR mostly moves the internal code of libtest into a public API.
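To make the "Return type" reasoning concrete, here is a minimal sketch of how a caller might consume an `Option<NonZeroUsize>`-shaped result. The function `hardware_threads_hint` is a hypothetical stand-in for the API described in this PR; it is implemented on top of `std::thread::available_parallelism` (the later stabilized name, assumed to be available) purely so the example runs, and `threads_or` is likewise an illustrative name.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical stand-in for the API shape described under "Return type":
/// `None` means the platform could not report a value.
fn hardware_threads_hint() -> Option<NonZeroUsize> {
    thread::available_parallelism().ok()
}

/// The caller, not the library, decides what to do when the value is unknown.
fn threads_or(fallback: NonZeroUsize) -> NonZeroUsize {
    hardware_threads_hint().unwrap_or(fallback)
}

fn main() {
    let fallback = NonZeroUsize::new(4).expect("4 is non-zero");

    match hardware_threads_hint() {
        Some(n) => println!("platform reports {} threads", n.get()),
        None => println!("unknown; falling back to {}", fallback),
    }

    // `NonZeroUsize::get` converts back to a plain `usize` for APIs
    // that expect one; the value is statically known to be at least 1.
    let n: usize = threads_or(fallback).get();
    println!("using {} worker threads", n);
}
```

Because the value is a `NonZeroUsize`, code that divides work by the thread count does not need a separate zero check; the cost is the extra `.get()` call at the boundary.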