It's not uncommon to find that a previously adequate hosting solution starts to seem slower and less responsive as site(s) hosted on it gain in complexity and popularity. Before you can construct a plan of action to start improving performance, you need to find out what is draining your system resources.
Linux has numerous tools to show which resources are being used, and what is using them. Let's look at a few of them...
Measuring resource utilisation
VMSTAT - Reports virtual memory statistics:
root@server [~]> vmstat 3 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 4 1684 170176 49644 2469352 0 0 5 7 0 0 7 5 38 50 0
0 6 1684 169976 49652 2469356 0 0 3 932 1334 2665 17 12 19 52 0
0 5 1684 170164 49656 2469388 0 0 3 861 1364 890 1 1 19 79 0
2 3 1684 170312 49684 2469380 0 0 13 1111 1438 1848 2 1 29 68 0
1 1 1684 170432 49724 2469412 0 0 35 833 1663 5837 7 7 46 40 0
1 1 1684 169928 49724 2470612 0 0 283 13 2183 13366 17 12 49 21 0
0 6 1684 169516 49732 2471036 0 0 220 1013 2176 11199 14 7 35 44 0
0 5 1684 169524 49732 2471048 0 0 1 935 1266 179 0 0 18 82 0
0 7 1684 169588 49744 2471048 0 0 4 988 1263 205 0 0 16 84 0
0 4 1684 169668 49744 2471068 0 0 3 908 1293 302 1 1 33 66 0
0 5 1684 169684 49748 2471076 0 0 3 920 1272 160 0 1 24 75 0
Above is an example of the output of the vmstat command. The two numbers following its invocation set the delay between updates, in seconds, and the number of reports (lines) to generate. The example shown here generates a total of 10 reports, with a 3 second interval between each.
The first line should generally be ignored because it contains the averages since the computer was booted.
The output is split into several columns, which are grouped into 'procs', 'memory', 'swap', 'io', 'system' and 'cpu'. The 'procs' group deals with process queues: the first column (r) shows the number of processes in the run queue and the next column (b) gives the number of processes that are blocked on I/O (disk activity). Processes blocked on I/O sleep until the I/O they're waiting on is returned.
The memory group contains the columns swpd, free, buff and cache. All the measurements here are given in kilobytes (base 2) unless otherwise specified. The first two columns list the amount of swap in use and the amount of free memory. The second two columns deal with buffers and cache.
Buffers temporarily store blocks of data that have been read from, or will be committed to, disk; sometimes the data here is kept for longer to avoid unnecessary disk activity. The cache is used for actual files being kept in memory, and other things like RAM disks.
Between these, the swap group shows memory swapped in from (si) and out to (so) disk per second, the io group shows blocks received from (bi) and sent to (bo) block devices per second, and the system group shows interrupts (in) and context switches (cs) per second. The final group shows how CPU time is being spent, as percentages, and contains 5 columns: us(er), sy(stem), id(le), wa(iting on I/O) and, although not on all platforms, st(eal). The user column is for anything that runs outside of the kernel, and system is purely for processing time used by the kernel. Idle measures how much of the time your processor(s) aren't in use, and waiting on I/O is the time spent blocked on reads/writes. Steal, which may or may not be present, is only of relevance to certain virtual hosting environments; it shows time during which the virtual machine had something waiting to run but wasn't given the opportunity to do so.
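As a rough sketch of how these columns can be put to use, the shell function below averages the wa column over a vmstat run and discards the first (since-boot) sample. The name avg_iowait is made up for this example, and the function assumes the header layout shown above; it reads vmstat output from standard input rather than running vmstat itself.

```shell
# avg_iowait: read `vmstat <delay> <count>` output on stdin and print the
# mean of the 'wa' column, skipping the first (since-boot) sample.
# Hypothetical helper; assumes the column header row starts with "r b".
avg_iowait() {
    awk '
        # Locate the "wa" column from the header row instead of hard-coding it.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) if ($i == "wa") col = i
            next
        }
        # Data rows start with a number; the first one is the since-boot average.
        col && $1 ~ /^[0-9]+$/ {
            if (seen++) { sum += $col; n++ }
        }
        END { if (n) printf "%.1f\n", sum / n }
    '
}
```

It could be used as: vmstat 3 10 | avg_iowait. A figure that stays high alongside a large b column would suggest that disk I/O, rather than CPU, is the bottleneck.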
In addition to vmstat, the following tools will also help you to determine which resources are heavily used.
DF - Report of disk space usage
To get a quick overview of the space used and available on all mounted filesystems, use the df command. You'll likely find that many of the filesystems it lists are not conventional, disk-based filesystems; they're pseudo filesystems that act as an abstract interface through which kernel information and configuration are made accessible as files. Such ideas are core to the Unix philosophy, although as Linux has grown this goal has, for the most part, been maintained rather than extended.
The df command is usually used with the -h switch, to ensure you get 'human readable' output from it, e.g.
root@server [~]> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-slash
5.8G 3.8G 1.7G 70% /
/dev/mapper/VolGroup00-var
20G 6.2G 13G 34% /var
/dev/mapper/VolGroup00-home
34G 22G 11G 69% /home
/dev/mapper/VolGroup00-temp
9.7G 38M 9.2G 1% /tmp
/dev/sda2 99M 24M 70M 26% /boot
tmpfs 1014M 0 1014M 0% /dev/shm
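To pick out only the filesystems that are getting full, the df output can be filtered. The sketch below is one way of doing it; the function name full_filesystems and the default threshold of 80% are arbitrary choices for this example. It expects the output of df -P (the POSIX format, which keeps each filesystem on a single line, unlike the wrapped lines above) on standard input, and it will misreport mount points that contain spaces.

```shell
# full_filesystems: read `df -P` output on stdin and print the mount point and
# usage of every filesystem at or above a threshold percentage.
# Hypothetical helper; the default limit of 80 is an arbitrary choice.
full_filesystems() {
    awk -v limit="${1:-80}" '
        NR > 1 {
            use = $5
            sub(/%$/, "", use)              # "70%" -> "70"
            if (use + 0 >= limit + 0) print $6, $5
        }
    '
}
```

For example, df -P | full_filesystems 60 would list only the filesystems that are at least 60% full.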
FREE - Displays amount of free and used memory in the system
Using the free command you can quickly get an overview of all memory, and virtual memory, being used by the operating system at a particular moment. Like df it's a very simple command; simply use the -m switch to show the output in megabytes (rather than the default kilobytes).
root@server [~]> free -m
total used free shared buffers cached
Mem: 2026 1934 92 0 224 426
-/+ buffers/cache: 1282 743
Swap: 1023 89 934
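The "-/+ buffers/cache" row above counts buffers and cache as memory that could be handed back to applications, which is why its free figure (743) is roughly the sum of free, buffers and cached from the first row (92 + 224 + 426 = 742, the difference being rounding). As a minimal sketch, the same figure can be derived from /proc/meminfo; the function name reclaimable_mb is made up for this example, and it reads meminfo-style text from standard input.

```shell
# reclaimable_mb: read /proc/meminfo-style text on stdin and print
# MemFree + Buffers + Cached in megabytes (the file reports kilobytes).
# Hypothetical helper, not part of any package.
reclaimable_mb() {
    awk '
        $1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { kb += $2 }
        END { printf "%d\n", kb / 1024 }
    '
}
```

On a Linux system it would be called as: reclaimable_mb < /proc/meminfo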
BWM-NG - Bandwidth Monitor NG (Next Generation), a live bandwidth monitor
Before moving on to identifying the cause I'll briefly cover network usage monitoring. Using ifconfig you can see which network interfaces are configured and how much traffic they have handled. For this purpose you'll want to pay attention to the line containing "RX bytes" and "TX bytes", for received and transmitted bytes respectively.
For a continually updating overview (in kilobytes per second) of the interfaces present, try using the tool bwm-ng. Here's some example output from it:
bwm-ng v0.6 (probing every 0.500s), press 'h' for help
input: /proc/net/dev type: rate
| iface Rx Tx Total
==============================================================================
lo: 0.00 KB/s 0.00 KB/s 0.00 KB/s
eth0: 1.14 KB/s 0.55 KB/s 1.70 KB/s
------------------------------------------------------------------------------
total: 1.14 KB/s 0.55 KB/s 1.70 KB/s
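As the output above notes, bwm-ng takes its figures from /proc/net/dev. If the tool isn't installed, a rough equivalent can be sketched by sampling that file twice and comparing the byte counters. The function names iface_bytes and iface_rate are made up for this example, and the field positions assume the usual Linux layout (receive bytes first after the interface name, transmit bytes ninth).

```shell
# iface_bytes: read /proc/net/dev-style text on stdin and print the received
# and transmitted byte counters for one interface (default eth0).
# Hypothetical helper; assumes the standard Linux column order.
iface_bytes() {
    awk -v ifc="${1:-eth0}" '
        {
            sub(/:/, " ")                   # some kernels print "eth0:1234" with no space
            if ($1 == ifc) print $2, $10    # $2 = RX bytes, $10 = TX bytes
        }
    '
}

# iface_rate: sample the live counters one second apart and print the rates.
iface_rate() {
    before=$(iface_bytes "$1" < /proc/net/dev)
    sleep 1
    after=$(iface_bytes "$1" < /proc/net/dev)
    echo "$before $after" |
        awk 'NF == 4 { printf "rx %.2f KB/s  tx %.2f KB/s\n", ($3 - $1) / 1024, ($4 - $2) / 1024 }'
}
```

Running iface_rate eth0 prints one line of receive and transmit rates; it prints nothing if the interface doesn't exist.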
If you require further information concerning your system I'd suggest that you install the sysstat package and get acquainted with its 'sar' tool. For information on sar please consult its manual page, available at http://linux.die.net/man/1/sar
Identifying the cause
Now that you know what is happening you will want to isolate the cause.
Using ps you can list processes on the system that meet a specific criterion and choose the order by which they're sorted.
Here's an example (output truncated to save space)
root@server [~]> ps axu
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 18930 0.3 2.5 67672 52672 ? S 08:23 0:14 MailScanner: checking with SpamAssassin
root 18954 0.3 2.5 67552 52616 ? S 08:23 0:14 MailScanner: waiting for messages
root 18956 0.0 1.2 36000 26880 ? S 08:23 0:01 MailWatch SQL
apache 21604 0.1 3.8 96396 79464 ? S Aug03 1:55 /usr/sbin/httpd
apache 21605 0.1 4.0 102592 83832 ? S Aug03 1:55 /usr/sbin/httpd
apache 21606 0.1 3.8 97024 80204 ? S Aug03 1:46 /usr/sbin/httpd
Essentially the "axu" part lists all processes (a), even those without consoles associated with them, like daemons (x), and lists them in the 'user' format (u). A 'k' switch can be added to order the output; for example, "ps axu k pcpu" orders it by percentage of CPU (pcpu). For more information on ps consult its manual page, available at http://ss64.com/bash/ps.html
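The same kind of ordering can also be applied after the fact, for instance to ps output saved to a file, or on a system whose ps lacks the k switch. The sketch below assumes the 'user' format columns shown above (%CPU third, %MEM fourth, RSS sixth); the function name top_by is made up for this example.

```shell
# top_by: read `ps aux` output on stdin, keep the header line, and print the
# N rows with the highest value in the chosen column.
# Hypothetical helper; column 3 (%CPU) and 5 rows are the defaults.
top_by() {
    col=${1:-3}
    count=${2:-5}
    IFS= read -r header || return 0       # first line is the column header
    printf '%s\n' "$header"
    sort -k "${col},${col}nr" | head -n "$count"
}
```

For example, ps aux | top_by 3 10 lists the ten busiest processes by CPU, and ps aux | top_by 6 5 the five with the largest RSS.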
PIDSTAT - Reports statistics for Linux tasks
Sorting the ps output by CPU is an easy way of determining which tasks are using the most CPU, and often those same tasks will be the ones causing the most I/O, during which CPU cycles are wasted waiting. To measure I/O usage properly you can use a command like pidstat.
pidstat is part of the sysstat package. This tool allows you to monitor various things on a per-process level. The -d switch asks pidstat to output disk statistics and -p is used to specify the process ID (pid) to collect data for; the argument ALL can be used in place of a pid to specify all processes. Here's an example (output truncated to save space):
root@server:~$ pidstat -d -p ALL
Linux 2.6.27-17-generic (adrianj-desktop) 04/08/10 _i686_
09:41:33 PID kB_rd/s kB_wr/s kB_ccwr/s Command
09:41:33 1 3.86 0.74 0.04 init
09:41:33 2 0.00 0.00 0.00 kthreadd
09:41:33 3 0.00 0.00 0.00 migration/0
09:41:33 4 0.00 0.00 0.00 ksoftirqd/0
Here you'll see 6 columns: a timestamp, the PID, the number of kilobytes read per second, the number of kilobytes written per second, the number of kilobytes per second whose writes were cancelled, and finally the command name.
For ordered output you'll probably want to run a command like the following. It cuts the header off, combines the read and write fields, and orders the output by the combined I/O field.
pidstat -d -p ALL | awk 'FNR>3{ $3="\t"$3+$4"\t"; $4=""; print $0; }' | sort -n -k3
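A slightly more readable variant of the same idea is sketched below. It assumes the six-column layout shown above (time, PID, kB_rd/s, kB_wr/s, kB_ccwr/s, Command); other sysstat versions may print additional columns, in which case the field numbers would need adjusting. The function name io_by_process is made up for this example.

```shell
# io_by_process: read `pidstat -d -p ALL` output on stdin and print
# "<read+write kB/s> <PID> <command>", busiest process last.
# Hypothetical helper; assumes PID is field 2, kB_rd/s field 3, kB_wr/s field 4.
io_by_process() {
    # Header and banner lines are skipped because their second field is not a PID.
    awk '$2 ~ /^[0-9]+$/ { printf "%.2f\t%s\t%s\n", $3 + $4, $2, $NF }' | sort -n
}
```

It would be used as: pidstat -d -p ALL | io_by_process | tail -5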
To measure the amount of memory being used on a per-process basis you can use the ps command from earlier but have it sort by 'rss' for physical (resident) memory, or by 'vsz' for the total virtual memory size, which also covers memory that is swapped out or allocated but not yet used.
The following commands show the top 5 processes by virtual and physical memory usage, respectively.
ps aux k vsz | tail -5
ps aux k rss | tail -5
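When a service runs as many processes, such as the httpd children in the earlier listing, it can be more useful to total the memory per command name. The following is a minimal sketch that assumes the 'user' format columns (RSS sixth, command eleventh); the function name rss_by_command is made up for this example. Because RSS counts shared pages once per process, the totals will overstate real usage for processes that share memory.

```shell
# rss_by_command: read `ps aux` output on stdin and print the summed RSS
# (in kilobytes) for each command name, largest last.
# Hypothetical helper; groups on the first word of the COMMAND column.
rss_by_command() {
    awk 'NR > 1 { rss[$11] += $6 }
         END { for (c in rss) printf "%d\t%s\n", rss[c], c }' | sort -n
}
```

It would be used as: ps aux | rss_by_command | tail -5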
Finally, to determine what has been exhausting disk space you have 3 main tools: du, find and lsof. du can be used to determine exactly where space is being used, find can search for files that match specific criteria, and lsof is used to determine which files are in use, and what is using them.
Here are some example commands using these tools.
DU - Summarizes disk usage
Check the size of the files and directories inside /dir
du -sch /dir/*
Check the size of the subdirectories of /dir without traversing filesystems (useful when / is filling up but other filesystems such as /home would slow down the previous command).
du -chx --max-depth 1 /dir
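Since du lists directories in the order it visits them, it often helps to sort the result by size. The sketch below wraps the previous command for that purpose; the function name biggest_dirs is made up for this example, and it relies on the same --max-depth option used above.

```shell
# biggest_dirs: list the N largest directories one level below a directory
# (plus the directory's own total), sizes in kilobytes, largest first,
# without crossing filesystem boundaries.
# Hypothetical helper; defaults to the current directory and 10 entries.
biggest_dirs() {
    du -xk --max-depth=1 "${1:-.}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, biggest_dirs /var 5 shows the total for /var followed by its four largest subdirectories.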
FIND - Searches for files
List files in /dir larger than 4 megabytes in size.
find /dir -size +4M -ls
Show the number of files in /home owned by nobody.
find /home/ -user nobody | wc -l
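If you don't yet know which user to look for, the count can be produced for every owner at once. The following is a sketch that assumes GNU find, since -printf is a GNU extension; the function name files_by_owner is made up for this example.

```shell
# files_by_owner: count the regular files under a directory per owning user,
# most files first. -xdev keeps the search on a single filesystem.
# Hypothetical helper; requires GNU find for -printf.
files_by_owner() {
    find "${1:-.}" -xdev -type f -printf '%u\n' | sort | uniq -c | sort -rn
}
```

For example, files_by_owner /home prints one line per user with the number of files they own.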
LSOF - Displays a LiSt of Open Files
Shows the process that is currently using this file, if one exists.
lsof /tmp/a_large_file_perhaps
Same as the previous command, but will check all items in the directory.
lsof +d /tmp/a_large_directory_perhaps
Same as the previous command but will traverse into subdirectories.
lsof +D /tmp/a_large_directory_of_large_directories_maybe
IFTOP - Displays bandwidth usage on an interface
Finally I'll cover detailed network usage monitoring. The commands we'll be using are iftop and tcpdump. The iftop command gives you a nice, graphical overview of the connections pushing the most data through your chosen interface. Below is how you would invoke it for interface eth0.
iftop -i eth0
TCPDUMP - Dump traffic on a network
Tcpdump allows you to filter and dump the individual packets coming through an interface. Here are a few examples that you can adapt for various common uses. Once you're happy with the output you've seen, press Ctrl+C in the terminal to exit.
Show packets for any interface that involve port 80 (HTTP).
tcpdump -ptqi any port 80
Show packets for connections on interface eth0 that have a destination of 10.0.0.5 port 80.
tcpdump -ptqi eth0 dst host 10.0.0.5 and dst port 80
Show packets for connections on interface eth0 that have a destination of 1.2.3.4.
tcpdump -ptqi eth0 dst host 1.2.3.4
Show packets for connections on interface eth0 that have a source of 10.0.0.5 and are destined for port 389 (LDAP).
tcpdump -ptqi eth0 src host 10.0.0.5 and dst port 389
If you remove the dst/src part of the host or port definitions they will apply to either.
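To illustrate how the direction qualifier slots into a filter, here is a small sketch that builds the expression from its parts. The function name pcap_filter is made up for this example; it only prints a filter string and applies the same direction to both host and port, so mixed filters like the LDAP example above still need to be written by hand.

```shell
# pcap_filter: build a tcpdump filter expression from a direction
# ("src", "dst", or "" for either), a host, and an optional port.
# Hypothetical helper; it prints the expression and does not run tcpdump.
pcap_filter() {
    dir=$1
    host=$2
    port=$3
    filter="${dir:+$dir }host $host"
    if [ -n "$port" ]; then
        filter="$filter and ${dir:+$dir }port $port"
    fi
    printf '%s\n' "$filter"
}
```

For example, tcpdump -ptqi eth0 "$(pcap_filter dst 10.0.0.5 80)" is equivalent to the second command above, while passing an empty direction gives "host 10.0.0.5 and port 80", which matches traffic in either direction.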
Wednesday, September 8, 2010