Stephen Xiang [Fri, 14 Jun 2019 02:52:43 +0000 (10:52 +0800)]
cpu: improve cpu info virtualization
This commit introduces several improvements to CPU views based on
quotas:
- fall back to cpuacct.usage_percpu if cpuacct.usage_all does not exist
- correct CPU usage
- correct CPU usage in partial-CPU cases where quota/period does not
yield an integer (see the sketch after this list)
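A minimal sketch of the rounding this implies, assuming a hypothetical
helper name and quota/period values read from cpu.cfs_quota_us and
cpu.cfs_period_us; the real lxcfs code differs in detail:

    #include <stdint.h>

    /* Number of CPUs a container should see under a CFS quota. When
     * quota/period is not an integer, round up so that e.g.
     * quota=150000, period=100000 yields 2 CPUs, not 1. */
    static int max_cpu_count(int64_t quota, int64_t period)
    {
        if (quota <= 0 || period <= 0)
            return 0; /* no limit configured */

        /* integer ceiling of quota / period */
        return (quota + period - 1) / period;
    }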
Signed-off-by: Stephen Xiang <BurningXFlame@gmail.com>
[christian.brauner@ubuntu.com: squashed commits and fixed up commit message]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Abstract must_strcat from must_strcat_pid. Replace stringbuild with must_strcat.
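A rough sketch of the abstraction, as an illustration only (the actual
lxcfs signature and growth strategy may differ):

    #include <stdlib.h>
    #include <string.h>

    /* Append str to a growable heap buffer; abort on allocation
     * failure ("must" semantics), so callers need no error handling.
     * must_strcat_pid() then reduces to formatting a pid + append. */
    static void must_strcat(char **src, size_t *sz, size_t *asz,
                            const char *str)
    {
        size_t len = strlen(str);

        if (*sz + len + 1 > *asz) {
            *asz = *sz + len + 1 + 256; /* grow with some slack */
            *src = realloc(*src, *asz);
            if (!*src)
                abort();
        }
        memcpy(*src + *sz, str, len + 1); /* include the NUL */
        *sz += len;
    }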
Signed-off-by: Stephen Xiang <BurningXFlame@gmail.com>
Jakub Skokan [Wed, 4 Jul 2018 15:48:52 +0000 (17:48 +0200)]
meminfo: read shmem from cgroup parameter memory.stat
Shmem was passed as-is from the host, but other fields (such as
Cached and SReclaimable) were not, which caused htop to show incorrect
memory usage.
If `total_shmem` is found in `memory.stat`, it is used, otherwise `Shmem`
is reported as zero.
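A minimal sketch of that fallback; the parsing helper below is
illustrative, not the actual lxcfs routine:

    #include <stdio.h>
    #include <inttypes.h>

    /* Scan an open memory.stat for total_shmem; if the key is not
     * present, the value stays 0, matching the behaviour above. */
    static uint64_t read_total_shmem(FILE *memstat)
    {
        char line[256];
        uint64_t shmem = 0;

        while (fgets(line, sizeof(line), memstat))
            if (sscanf(line, "total_shmem %" SCNu64, &shmem) == 1)
                break;

        return shmem;
    }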
Signed-off-by: Jakub Skokan <jakub.skokan@havefun.cz>
Jakub Skokan [Mon, 18 Jun 2018 08:50:02 +0000 (10:50 +0200)]
Per-container CPU usage in /proc/stat
Containers can see the utilization of all available CPUs, even when a
CPU is used by other containers or by the host. The contents
of `/proc/stat` are shared across the system, except that CPUs
excluded by cpuset are hidden. This commit attempts to fix that, but
at a cost.
CPU usage is read from the cpuacct cgroup, but that accounts only for
the `user` and `system` fields of `/proc/stat`. Idle time can be
calculated, but the other fields cannot, so they are always set to 0.
Additionally, idle time is based on values from the host, so even
a freshly started container has a large initial idle time.
If the cpuacct cgroup is not present, or reading from it fails, LXCFS
will return CPU usage stats from the host as before.
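A sketch of the tradeoff this describes, assuming an illustrative
helper and an idle value computed elsewhere; the field layout follows
/proc/stat, not the exact lxcfs implementation:

    #include <stdio.h>

    /* Emit a /proc/stat-style "cpu" line from cpuacct.stat, which
     * only provides user and system ticks; every other field except
     * the supplied idle estimate is zeroed. */
    static int print_cpu_line(FILE *out, FILE *cpuacct_stat,
                              unsigned long idle)
    {
        unsigned long user = 0, sys = 0;

        if (fscanf(cpuacct_stat, "user %lu system %lu", &user, &sys) != 2)
            return -1; /* caller falls back to the host /proc/stat */

        /* cpu user nice system idle iowait irq softirq steal ... */
        fprintf(out, "cpu %lu 0 %lu %lu 0 0 0 0 0 0\n", user, sys, idle);
        return 0;
    }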
Signed-off-by: Jakub Skokan <jakub.skokan@havefun.cz>
zhang2639 [Thu, 24 May 2018 10:42:37 +0000 (18:42 +0800)]
Calculate and read the average load.
Use a load daemon to calculate the loadavg and proc_loadavg_read() to
read it.
calc_pid : find the process pids from the cgroup path of a container.
calc_load : calculate the loadavg of a container (sketched after this
list).
refresh_load : refresh the loadavg of a container.
load_begin : traverse the hash table and update it.
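The core of calc_load mirrors the kernel's fixed-point loadavg decay;
a sketch with the standard kernel constants for a 5-second sampling
interval (the surrounding daemon plumbing is omitted):

    #define FSHIFT  11               /* bits of precision */
    #define FIXED_1 (1 << FSHIFT)    /* 1.0 in fixed point */
    #define EXP_1   1884             /* 1/exp(5s/1min) in fixed point */
    #define EXP_5   2014             /* 1/exp(5s/5min) */
    #define EXP_15  2037             /* 1/exp(5s/15min) */

    /* load' = load * exp + active * (FIXED_1 - exp) */
    static unsigned long calc_load(unsigned long load, unsigned long exp,
                                   unsigned long active)
    {
        unsigned long newload;

        active = active > 0 ? active * FIXED_1 : 0;
        newload = load * exp + active * (FIXED_1 - exp);
        if (active >= load)
            newload += FIXED_1 - 1; /* round up while load is rising */

        return newload / FIXED_1;
    }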
zhang2639 [Thu, 24 May 2018 10:27:34 +0000 (18:27 +0800)]
Use hash table to store load information
struct load_head{} contains three locks for thread synchronization
and a pointer to the hash list.
struct load_node{} contains container-specific information and
pointers to the hash node (both structs are sketched after this list).
static struct load_head *load_hash[LOAD_SIZE] is the hash table.
calc_hash : compute the hash of a given container.
init_load : initialize the hash table.
insert_node : insert a container node into the hash table.
locate_node : find a given container node.
del_node : delete a given container node and return the node that
follows it.
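A sketch of the structures and hashing described above; field and
lock names are illustrative and may not match the lxcfs source
exactly:

    #include <pthread.h>

    #define LOAD_SIZE 100 /* number of hash buckets; value assumed */

    struct load_node {
        char *cg;                 /* cgroup path of the container */
        unsigned long avenrun[3]; /* 1/5/15-minute load, fixed point */
        unsigned int run_pid;
        unsigned int total_pid;
        unsigned int last_pid;
        int cfd;                  /* fd of the cgroup mount */
        struct load_node *next;   /* hash-chain links */
        struct load_node **pre;
    };

    struct load_head {
        pthread_mutex_t lock;     /* protects insert/delete on the chain */
        pthread_mutex_t rdlock;   /* serializes readers */
        pthread_rwlock_t rilock;  /* guards reader initialization */
        struct load_node *next;
    };

    static struct load_head *load_hash[LOAD_SIZE];

    /* Map a container's cgroup name to a bucket; a simple string
     * hash stands in for the real calc_hash. */
    static int calc_hash(const char *name)
    {
        unsigned int hash = 0;

        while (*name)
            hash = hash * 31 + (unsigned char)*name++;

        return hash % LOAD_SIZE;
    }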
Aaron Sokoloski [Mon, 4 Dec 2017 18:30:37 +0000 (12:30 -0600)]
Change MemAvailable figure in /proc/meminfo to include cache memory -- Fixes #175 I think.
MemAvailable represents roughly how much more memory we can use before
we start swapping. Page cache memory can be reclaimed if it's needed
for something else, so it should count as available memory. This
change should also fix the "available" column of the "free" command,
as well as the "avail Mem" value in "top", both of which come from
MemAvailable.
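In code terms the adjustment is roughly the following, with variable
names taken from this discussion rather than the actual lxcfs source:

    /* Approximate MemAvailable for a cgroup: free memory within the
     * limit plus reclaimable page cache. Before this change only the
     * free figure (memlimit - memusage) was reported. */
    static unsigned long mem_available(unsigned long memlimit,
                                       unsigned long memusage,
                                       unsigned long cached)
    {
        return memlimit - memusage + cached;
    }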
Note that this isn't perfectly accurate. On a physical machine, the
value for MemAvailable is the result of a calculation that takes into
account that when memory gets low (but before it's completely
exhausted), kswapd wakes up and starts paging things out. See:
I tried to think of a way to be more exact, but this calculation
includes figures that we don't have available for a given cgroup
hierarchy, such as reclaimable slab memory and the low watermark for
zones. So it's not really feasible to reproduce it exactly.
Anyway, since the kernel calculation itself is just an estimation, it
doesn't seem too bad that we're a little bit off. Adding in the
amount of memory used for page cache seems much better than what we
were doing before (just copying the free memory figure), because that
can be wrong by gigabytes.
Aaron Sokoloski [Sat, 2 Dec 2017 18:43:06 +0000 (12:43 -0600)]
Fix inaccurate values in /proc/meminfo for containers with child cgroups
The values for Cached, Active, Inactive, Active(anon), Inactive(anon),
Active(file), Inactive(file), and Unevictable are derived/computed
from these values in the relevant memory.stat: cache, active_anon,
inactive_anon, active_file, inactive_file, and unevictable.
However, these values apply only to the cgroup of the lxc container
itself. If your container uses memory cgroups internally, and thus
the container cgroup has children, their memory is not counted.
In order to take the memory usage of child cgroups into account, we
need to look at the "total_"-prefixed versions of these values.
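A minimal sketch of the switch to the hierarchical counters; the
parsing is simplified, but "cache"/"total_cache" are the real cgroup
v1 memory.stat keys:

    #include <stdio.h>
    #include <inttypes.h>

    /* "total_cache" includes this cgroup and all of its children,
     * whereas plain "cache" counts only the cgroup itself. */
    static void parse_memstat_line(const char *line, uint64_t *cached,
                                   uint64_t *unevictable)
    {
        sscanf(line, "total_cache %" SCNu64, cached);
        sscanf(line, "total_unevictable %" SCNu64, unevictable);
    }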
In order to enable proper unprivileged cgroup delegation on newer kernels we
need to delegate not just the "cgroup.procs" file to the user but also
"cgroup.threads". But don't report an error in case it doesn't exist.
Also delegate "cgroup.subtree_control" to enable users to hand over
controllers to descendant cgroups.
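A sketch of the delegation step under these assumptions (path
handling and uid lookup simplified; the function name is
illustrative):

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>

    /* Hand the delegation files to the user; a missing file such as
     * cgroup.threads on older kernels is tolerated (ENOENT). */
    static int delegate_files(const char *cgpath, uid_t uid, gid_t gid)
    {
        const char *files[] = { "cgroup.procs", "cgroup.threads",
                                "cgroup.subtree_control" };
        char path[4096];

        for (size_t i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
            snprintf(path, sizeof(path), "%s/%s", cgpath, files[i]);
            if (chown(path, uid, gid) < 0 && errno != ENOENT)
                return -1;
        }

        return 0;
    }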
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
yuwang.yuwang [Fri, 20 Oct 2017 06:28:03 +0000 (14:28 +0800)]
Fix wrong calculation of swaptotal and swapfree
It made the value of (memswlimit - memlimit) the swaptotal, which is
wrong, because swap usage in a cgroup/container can range over
[0, memswlimit]; if the memory usage (excluding swap) of all tasks in
the cgroup/container is very small, swaptotal can reach memswlimit.
So make swaptotal min(host swaptotal, memswlimit), as sketched below.
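In code, the corrected calculation amounts to a min(); the helper
name is illustrative:

    /* swaptotal reported to the container: the memsw limit, capped
     * by the swap the host actually has. */
    static unsigned long container_swaptotal(unsigned long host_swaptotal,
                                             unsigned long memswlimit)
    {
        return host_swaptotal < memswlimit ? host_swaptotal : memswlimit;
    }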
In order not to require users to manually list all cgroup controllers
in their PAM configuration, add an "all" option that effectively just
sets all controllers to read-write.
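A typical PAM entry might then look like this (exact module options
depend on the pam_cgfs version; shown as an assumed example):

    session optional pam_cgfs.so -c all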
When doing subsequent reads of uptime on an open file handle
in the form:
read
lseek 0L, SEEK_SET
read
the second (and later) reads caused the error
"failed to write to cache" to be printed. This
happens for example with "top", which would then print the error:
bad data in /proc/uptime
To fix this problem, use the whole size of the buffer instead of
d->size, because the latter is set on the first read (see the sketch
below).
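A sketch of the fix in the cache-write path, with an illustrative
struct standing in for the real file handle bookkeeping:

    #include <string.h>

    struct file_info {
        char *buf;
        int size;   /* bytes produced by the first read */
        int buflen; /* allocated length of buf */
    };

    static int write_to_cache(struct file_info *d, const char *data,
                              int len)
    {
        /* Before the fix the bound was d->size, which the first read
         * had already set, so every later read appeared too large. */
        if (len > d->buflen)
            return -1; /* "failed to write to cache" */

        memcpy(d->buf, data, len);
        d->size = len;

        return 0;
    }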
Serge Hallyn [Sun, 18 Jun 2017 19:43:22 +0000 (14:43 -0500)]
(temporarily?) revert the virtualization of btime field in /proc/stat
Closes #189
This seems to be responsible for corrupting STIME in the process list
inside containers. Hopefully we can find a reasonable way to fix
both, but compared to an unvirtualized btime field, a bogus STIME
field is the greater evil here.