[TriLUG] Trying to understand top/htop/free vs other tools
Ron Kelley via TriLUG
trilug at trilug.org
Sat Jul 15 20:26:36 EDT 2017
Greetings all,
We are running a number of Ubuntu 16.04 servers (kernel 4.4.0.57), and I think one of our apps has a memory leak. Over time, swap usage grows unusually large, and I can't find the culprit. During our last test, the server ran for 15 days and used 5.5GB of swap even though plenty of RAM was still available (buff/cache and avail Mem). I have set vm.swappiness=1 and vm.vfs_cache_pressure=50 so that the kernel uses as much RAM as possible before swapping out.
In an effort to identify the potential rogue program, I did a little google-fu and came up with these hits:
* https://stackoverflow.com/questions/479953/how-to-find-out-which-processes-are-swapping-in-linux
* https://www.cyberciti.biz/faq/linux-which-process-is-using-swap
After running those scripts, I compared the output with top/htop/free and was very surprised. Top/htop/free show 3.3GB of swap in use, but the other tools show only 1.5GB. I don't know which tool to trust for accurate results.
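For anyone who wants to reproduce the comparison, here is a minimal sketch of what it boils down to. It assumes the per-process scripts sum the VmSwap field from /proc/<pid>/status, and that top/free report swap in use as SwapTotal minus SwapFree from /proc/meminfo. The function names are my own, made up for illustration, and the script should be run as root so that every process is readable.

```python
#!/usr/bin/env python3
"""Compare system-wide swap use with the sum of per-process VmSwap values."""
import glob
import re


def parse_kib(text, field):
    """Return the value in KiB of a 'Field:   123 kB' line, or None if absent."""
    match = re.search(r"^%s:\s+(\d+)\s+kB" % re.escape(field), text, re.MULTILINE)
    return int(match.group(1)) if match else None


def system_swap_used_kib(meminfo_text):
    """Swap in use, computed as SwapTotal - SwapFree (assumed to match top/free)."""
    return parse_kib(meminfo_text, "SwapTotal") - parse_kib(meminfo_text, "SwapFree")


def per_process_swap_kib(status_paths):
    """Sum VmSwap over the given /proc/<pid>/status files."""
    total = 0
    for path in status_paths:
        try:
            with open(path) as handle:
                value = parse_kib(handle.read(), "VmSwap")
        except OSError:
            continue  # process exited in the meantime, or file is not readable
        total += value or 0  # kernel threads have no VmSwap line
    return total


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        system = system_swap_used_kib(handle.read())
    processes = per_process_swap_kib(glob.glob("/proc/[0-9]*/status"))
    print("system-wide swap used : %d KiB" % system)
    print("sum of VmSwap         : %d KiB" % processes)
    print("unaccounted           : %d KiB" % (system - processes))
```

The last line is the number I am trying to explain: swap that the kernel counts as used but that no process reports in VmSwap.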
Specs: Ubuntu 16.04, kernel 4.4.0.57, 8GB RAM, 19GB swap, LXD 2.13, VMware VM
Top example:
----------------
top - 20:07:13 up 17:13,  2 users,  load average: 0.59, 0.49, 0.46
Tasks: 1033 total,   1 running, 1032 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.7 us,  1.2 sy,  0.0 ni, 96.2 id,  0.2 wa,  0.0 hi,  0.8 si,  0.0 st
KiB Mem :  8175076 total,   569992 free,  2347856 used,  5257228 buff/cache
KiB Swap: 19737596 total, 16387340 free,  3350256 used.  3144184 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26709 101001    20   0  496780  66964  45468 S   4.0  0.8   0:00.60 php-fpm7.0
26674 100027    20   0 1380356 325748   3228 S   2.6  4.0  37:38.24 mysqld
26704 101001    20   0  495228  70692  51048 S   2.6  0.9   0:01.82 php-fpm7.0
smem example:
---------------
  PID User     Command                         Swap      USS      PSS      RSS
...
20747 101001   nginx: worker process           6.8M    47.8M    48.3M    49.5M
11922 101001   nginx: worker process          13.8M    53.7M    54.6M    56.1M
26817 101001   php-fpm: pool www               4.0M    63.8M    68.0M    73.9M
24402 101001   nginx: worker process          12.9M    70.2M    70.6M    71.6M
 1897 root     /usr/share/metricbeat/bin/m   800.0K    88.4M    88.4M    89.9M
 5356 101001   nginx: worker process           6.9M    92.1M    93.7M    96.7M
21907 101001   nginx: worker process          13.1M   100.5M   101.2M   102.8M
11623 101001   nginx: worker process          12.6M   110.9M   112.3M   115.4M
26674 100027   /usr/sbin/mysqld              256.7M   319.3M   319.3M   319.6M
-------------------------------------------------------------------------------
  883 16                                       1.5G     2.5G     2.6G     3.2G
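To put the two outputs in the same units, here is the arithmetic as a short sketch. It assumes smem's "G" suffix means GiB (1024-based), which I have not verified. The variable names are only for illustration.

```python
KIB_PER_GIB = 1024 * 1024

top_swap_used_kib = 3350256   # "KiB Swap: ... 3350256 used" from the top output
smem_swap_total_gib = 1.5     # Swap column total from smem (assumed to be GiB)

# Convert top's figure to GiB and take the difference.
top_swap_used_gib = top_swap_used_kib / KIB_PER_GIB
gap_gib = top_swap_used_gib - smem_swap_total_gib

print("top : %.2f GiB" % top_swap_used_gib)
print("smem: %.2f GiB" % smem_swap_total_gib)
print("gap : %.2f GiB" % gap_gib)
```

Under that assumption, top's figure works out to roughly 3.2 GiB, so about 1.7 GiB of swap is not attributed to any process in the smem listing.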
Has anyone else seen this before?
Thanks,
-Ron