[TriLUG] (no subject)
Joseph S. Tate
dragonstrider at gmail.com
Mon Feb 27 01:51:18 EST 2012
I'm running into a problem that's really kicking my tail: when my server
comes under high network load, I get connection timeouts. These don't
just happen on the port carrying the load, but on ssh's port too.
What can I do to track down why the connections are failing?
I've got a web server that's running varnish + nginx + a python app on the
same box.
netstat -nt shows the number of connections topping out at about 500. Most
of them (75% or more) are in TIME_WAIT.
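
For reference, a per-state breakdown like the one described above can be tallied with a few lines of Python. This is only a sketch: it assumes the usual Linux net-tools layout for `netstat -nt` (state in the sixth column), the function names are made up for illustration, and the SAMPLE text is invented data using documentation-range addresses, not output from the server in question.

```python
from collections import Counter

# Invented sample in the usual Linux `netstat -nt` layout (not real data).
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:80           198.51.100.7:51234      TIME_WAIT
tcp        0      0 192.0.2.10:80           198.51.100.8:40022      TIME_WAIT
tcp        0      0 192.0.2.10:80           198.51.100.9:33110      TIME_WAIT
tcp        0      0 192.0.2.10:22           198.51.100.3:60122      ESTABLISHED
"""


def tally_tcp_states(netstat_output):
    """Count TCP connections per state from `netstat -nt` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Connection rows start with "tcp" or "tcp6"; headers do not.
        # Assumption: the state is the sixth whitespace-separated column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


def time_wait_share(counts):
    """Return the fraction of tallied connections that are in TIME_WAIT."""
    total = sum(counts.values())
    return counts["TIME_WAIT"] / total if total else 0.0


if __name__ == "__main__":
    counts = tally_tcp_states(SAMPLE)
    for state, n in counts.most_common():
        print(f"{state:12s} {n}")
    print(f"TIME_WAIT share: {time_wait_share(counts):.0%}")
```

To use it on a live box, the text passed to tally_tcp_states would be the captured output of `netstat -nt` rather than SAMPLE.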
vmstat shows CPU to be ok, with some interrupts and context switching going
on. Swapping is negligible, and IO is at "barely there" levels.
# vmstat -S M 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0    232   2419    133   2775    0    0     1     5    0    0  6  1 93  0
 0  0    232   2419    133   2775    0    0     0     0  549  619  3  0 97  0
 0  0    232   2419    133   2775    0    0     0   402 1051 1044 18  0 81  0
 0  0    232   2419    133   2775    0    0     0    56  961 1034  9  0 89  1
 0  0    232   2419    133   2775    0    0     1     5    0    0  6  1 93  0
 0  0    232   2419    133   2775    0    0     0     0  549  619  3  0 97  0
 0  0    232   2419    133   2775    0    0     0   402 1051 1044 18  0 81  0
 0  0    232   2419    133   2775    0    0     0    56  961 1034  9  0 89  1
 1  0    232   2419    133   2775    0    0     0     0  898  962 14  0 86  0
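
The vmstat figures above can be boiled down to a few worst-case numbers with a short script. This is a rough sketch, assuming the 16-column layout shown in this post (newer vmstat builds may add an "st" column, which this parser would not match). The function names are hypothetical, and SAMPLE holds the distinct rows from the output above. The first data row is skipped by default because vmstat reports averages since boot there.

```python
# Column names as they appear in the vmstat header in this post.
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

# Distinct rows copied from the vmstat output in the post.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0    232   2419    133   2775    0    0     1     5    0    0  6  1 93  0
 0  0    232   2419    133   2775    0    0     0     0  549  619  3  0 97  0
 0  0    232   2419    133   2775    0    0     0   402 1051 1044 18  0 81  0
 0  0    232   2419    133   2775    0    0     0    56  961 1034  9  0 89  1
 1  0    232   2419    133   2775    0    0     0     0  898  962 14  0 86  0
"""


def parse_vmstat(text):
    """Turn vmstat data rows into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only all-numeric rows with the expected column count,
        # which drops both header lines.
        if len(fields) == len(VMSTAT_COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(VMSTAT_COLUMNS, map(int, fields))))
    return rows


def summarize(rows, skip_first=True):
    """Report lowest idle % and peak interrupt / context-switch rates."""
    # The first vmstat row is an average since boot, not a live sample.
    samples = rows[1:] if skip_first else rows
    if not samples:
        return {}
    return {
        "min_idle": min(r["id"] for r in samples),
        "max_in": max(r["in"] for r in samples),
        "max_cs": max(r["cs"] for r in samples),
    }


if __name__ == "__main__":
    print(summarize(parse_vmstat(SAMPLE)))
```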
ifconfig shows no errors on my public network interface.
I've got shorewall as a firewall management tool.
The app's database is on a separate server, but that server doesn't look
loaded either. Varnish is showing 99% or better cache hit ratios, so not
much is hitting my python app. What does reach the backend universally
returns within 30 seconds according to varnishhist.
Is there anything else I can look at? Any more information you need to help
diagnose this?
--
Joseph Tate
Personal e-mail: jtate AT dragonstrider DOT com
Web: http://www.dragonstrider.com