[TriLUG] Trying to understand top/htop/free vs other tools

Ron Kelley via TriLUG trilug at trilug.org
Sat Jul 15 20:26:36 EDT 2017


Greetings all,

We are running a number of Ubuntu 16.04 servers (4.4.0.57 kernel), and I think one of our apps has a memory leak.  Over time, swap usage grows unusually large, and I can’t seem to find the culprit.  During our last test, the server ran for 15 days and used 5.5GB of swap even though it had plenty of RAM available (buff/cache and avail Mem).  I have set vm.swappiness=1 and vm.vfs_cache_pressure=50 to push the kernel to use as much RAM as possible before swapping out.


In an effort to identify the potential rogue program, I did a little google-fu and came up with these hits:
* https://stackoverflow.com/questions/479953/how-to-find-out-which-processes-are-swapping-in-linux
* https://www.cyberciti.biz/faq/linux-which-process-is-using-swap 

After running those scripts, I compared the output with top/htop/free and was very surprised.  Top/htop/free show 3.3GB of swap in use, but the other tools show only 1.5GB.  I don’t know which tool to rely on for accurate results.
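
For anyone who wants to reproduce the comparison, here is a minimal sketch of the two measurements side by side.  It assumes that free/top derive "used" swap as SwapTotal minus SwapFree from /proc/meminfo, and that the per-process scripts sum the VmSwap field of /proc/<pid>/status.  The function names are my own and are not taken from the linked scripts.  According to proc(5), VmSwap does not include swapped-out shmem pages, so the per-process sum can legitimately come out lower than the system-wide figure; whether that explains the whole gap here is not something this sketch can confirm.

```python
#!/usr/bin/env python3
"""Compare system-wide swap usage with the per-process VmSwap sum (KiB)."""
import glob
import re


def read_text(path):
    """Return the file contents, or '' if the file vanished or is unreadable."""
    try:
        with open(path, errors="replace") as handle:
            return handle.read()
    except OSError:
        return ""


def field_kib(text, name):
    """Return the number from a 'Name:   123 kB' line, or 0 if absent."""
    pattern = r"^%s:\s+(\d+)\s+kB" % re.escape(name)
    match = re.search(pattern, text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def system_swap_used_kib(meminfo_text):
    """SwapTotal - SwapFree (assumed to be what free and top report)."""
    total = field_kib(meminfo_text, "SwapTotal")
    free = field_kib(meminfo_text, "SwapFree")
    return total - free


def per_process_swap_kib(status_paths):
    """Sum VmSwap over the given /proc/<pid>/status files."""
    return sum(field_kib(read_text(path), "VmSwap") for path in status_paths)


if __name__ == "__main__":
    used = system_swap_used_kib(read_text("/proc/meminfo"))
    summed = per_process_swap_kib(glob.glob("/proc/[0-9]*/status"))
    print("system-wide swap used : %d KiB" % used)
    print("sum of per-process    : %d KiB" % summed)
    print("unattributed          : %d KiB" % (used - summed))
```

Run as root if possible so that no process is skipped for permission reasons.  Processes that exit between the directory listing and the read are silently counted as zero.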


Specs: Ubuntu 16.04, kernel 4.4.0.57, 8GB RAM, 19GB swap, LXD 2.13, VMware VM


Top example:
----------------
top - 20:07:13 up 17:13,  2 users,  load average: 0.59, 0.49, 0.46
Tasks: 1033 total,   1 running, 1032 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.7 us,  1.2 sy,  0.0 ni, 96.2 id,  0.2 wa,  0.0 hi,  0.8 si,  0.0 st
KiB Mem :  8175076 total,   569992 free,  2347856 used,  5257228 buff/cache
KiB Swap: 19737596 total, 16387340 free,  3350256 used.  3144184 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
26709 101001    20   0  496780  66964  45468 S   4.0  0.8   0:00.60 php-fpm7.0
26674 100027    20   0 1380356 325748   3228 S   2.6  4.0  37:38.24 mysqld
26704 101001    20   0  495228  70692  51048 S   2.6  0.9   0:01.82 php-fpm7.0



smem example:
---------------
  PID User     Command                         Swap      USS      PSS      RSS
...
20747 101001   nginx: worker process           6.8M    47.8M    48.3M    49.5M
11922 101001   nginx: worker process          13.8M    53.7M    54.6M    56.1M
26817 101001   php-fpm: pool www               4.0M    63.8M    68.0M    73.9M
24402 101001   nginx: worker process          12.9M    70.2M    70.6M    71.6M
 1897 root     /usr/share/metricbeat/bin/m   800.0K    88.4M    88.4M    89.9M
 5356 101001   nginx: worker process           6.9M    92.1M    93.7M    96.7M
21907 101001   nginx: worker process          13.1M   100.5M   101.2M   102.8M
11623 101001   nginx: worker process          12.6M   110.9M   112.3M   115.4M
26674 100027   /usr/sbin/mysqld              256.7M   319.3M   319.3M   319.6M
-------------------------------------------------------------------------------
  883 16                                       1.5G     2.5G     2.6G     3.2G



Has anyone else seen this before?



Thanks,

-Ron

