Thursday, May 20, 2010

Identifying Disk Usage and Reclaiming Space

As web sites become larger, fuelled by the availability of ever-faster internet connections, disk space can rapidly become depleted. Anything from excessive FTP uploads to large log files can quickly start to eat into your available disk space, so it's useful to know how to identify the cause and how to reclaim that space.

The df command (an old Unix command that stands for Disk Free) can be used to get a quick overview of the server's disk usage. Run it and you will see something like this:

: chrish@delli@12:56 ; df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              13G   12G  1.6G  88% /
tmpfs                 980M     0  980M   0% /lib/init/rw
varrun                980M  384K  980M   1% /var/run
varlock               980M     0  980M   0% /var/lock
udev                  980M  2.8M  977M   1% /dev
tmpfs                 980M  164K  980M   1% /dev/shm
/dev/sda1             236M   31M  194M  14% /boot
/dev/sda5              19G   13G  5.9G  68% /mnt/storage1

Note: By default, df reports sizes in 1-kilobyte blocks (or, on some systems, 512-byte blocks). If you use the -h switch, as shown above, it prints sizes in a human-readable format (e.g., 1K, 234M, 2G).
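If you'd like to keep an eye on this from a script, the Use% figure can be pulled out of df's output. The following is a minimal sketch, assuming GNU or POSIX df; the path and the 80% threshold are illustrative values of my own, not recommendations. The -P switch asks for POSIX output, which keeps each filesystem on a single line and puts the Use% figure in the fifth column.

```shell
#!/usr/bin/env bash
# Minimal sketch: report how full the filesystem holding $path is, and warn
# above $threshold percent. Both values are illustrative assumptions.
path=/
threshold=80

# df -P prints one line per filesystem (long device names don't wrap);
# line 2 is the data line and field 5 is the Use% column, e.g. "88%".
usage=$(df -P "$path" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }')

if [ "$usage" -ge "$threshold" ]; then
    echo "WARNING: $path is ${usage}% full"
else
    echo "$path is ${usage}% full"
fi
```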

From the listing above, you can see that the / partition is starting to run low on space. While 1.6 gigabytes free isn't critical, let's look into what is using the space and see if we can free up some more.

For this we will use the du (Disk Usage) command. Rather than reporting how full each filesystem is, it reports how much disk space individual files and directories take up.

The usual format for this command is something like:

du -sch /*

We've added a few switches to the command, which are explained below:

-s Without this option, du recurses into any directories given as arguments and reports the size of every subdirectory inside them; with it, you get a single total per argument (this is equivalent to setting --max-depth=0).
-c Provides a grand total at the end of the list, useful for checking that the figures match the totals you expect.
-h Like df, this option displays the sizes in a more readable format.

When you have large filesystems mounted on directories within a disk that is running out of space, it's good practice to stop du from traversing into them; to accomplish this, use the -x switch. This forces du to stay on the filesystem(s) that your path arguments correspond to. However, -x alone won't fix "du -sch /*" if any of the directories in / are on separate filesystems (like /boot): the shell hands each of those directories to du as an argument in its own right, so du still reports them. To get the same information, but reporting only on the filesystem mounted at /, use:

du -hx --max-depth 1 /
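The output of du isn't sorted by size, so on a busy filesystem it helps to sort it. Below is a small sketch wrapped in a shell function; largest_dirs is a hypothetical name, and sort -h (which understands the K/M/G suffixes that -h produces) assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: list the largest directories one level below $1 on a single
# filesystem, smallest first, so the biggest users of space end up at the
# bottom. largest_dirs is a hypothetical helper, not a standard command.
largest_dirs() {
    local dir="${1:-/}"     # directory to examine (default: /)
    local count="${2:-10}"  # number of lines to show
    # -x: don't cross into other filesystems; --max-depth=1: one level only.
    # Permission errors are discarded; sort -h orders sizes like 2.8M < 1.6G.
    du -hx --max-depth=1 -- "$dir" 2>/dev/null | sort -h | tail -n "$count"
}

# Example (run as root for complete figures):
# largest_dirs / 10
```

The final line is always the total for the directory you passed in, since du reports the starting directory itself as well as its subdirectories.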

If you're wondering why "/*" would include filesystems mounted in directories inside /, it is because this type of argument makes use of shell 'globbing'. Globbing is where the shell (e.g. Bash) expands a wildcard pattern into the list of matching paths and passes them to the program (in this case du).

If you're still a little confused, try running the following command.

echo /*
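To see the expansion happen, the sketch below uses a hypothetical helper, show_args, that simply reports the arguments it receives. Called with an unquoted /*, it gets one argument per entry in / (hidden entries are excluded by default), just as du would; quoting the pattern stops the shell from expanding it.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints how many arguments it was
# given and then each one on its own line.
show_args() {
    echo "received $# argument(s):"
    printf '  %s\n' "$@"
}

show_args /*     # unquoted: the shell expands the glob before the call
show_args '/*'   # quoted: the function receives the literal string /*
```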

Before moving on to how you might want to deal with these files, we'll take a quick look at the find command. The find command searches a directory tree and lists the files that meet certain criteria. If you were to run the following command, it would eventually list every file and directory on your system:

find /

To help us find files that we might want to consider removing, the following two commands may be of use.

find /directory/containing/uploads -type f -size +100M -ls
find /directory/containing/uploads -type f -mtime +90 -ls

Both of these commands search /directory/containing/uploads and its subdirectories. The '-type f' part tells find to return only regular files (rather than directories, for instance), as we are only really interested in files. The first command lists all files larger than 100 megabytes, and the second lists all files that haven't been modified in the last 90 days. The -ls action at the end prints more information about each file (such as its size, owner, permissions and modification time) rather than just its name.
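When the size check turns up lots of files, it can help to print each file's exact size, sort by it, and stop find from wandering into other mounted filesystems (the same concern as du's -x switch, handled here by find's -xdev). This is a sketch assuming GNU find, since -printf is a GNU extension; big_files is a hypothetical name, and the threshold uses find's own -size syntax.

```shell
#!/usr/bin/env bash
# Sketch (GNU find): list regular files under $1 larger than $2, printing
# "size-in-bytes<TAB>path", sorted so the largest files come last.
# big_files is a hypothetical helper; the default threshold mirrors the text.
big_files() {
    local dir="$1"
    local min_size="${2:-+100M}"   # find -size syntax, e.g. +100M, +500k
    # -xdev: stay on the filesystem that $dir is on.
    find "$dir" -xdev -type f -size "$min_size" -printf '%s\t%p\n' | sort -n
}

# Example:
# big_files /directory/containing/uploads +100M
```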

If you are happy to move the files it flags up, you can make find do this as well. Using the -exec action, you can ask find to run a command on each file. Here's an example that moves the files larger than 100 megabytes to another directory. Be sure not to omit the \; at the end: the semicolon marks the end of the command, and the backslash stops the shell from treating it as its own command separator.

find /directory/containing/uploads -type f -size +100M -exec mv '{}' /mnt/large_external_harddrive/ \;
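Because -exec mv acts immediately, it can be worth seeing the list before anything moves. The sketch below prints the matching files, asks for confirmation, and then uses the '+' terminator, which passes many file names to a single mv invocation instead of running mv once per file. It assumes GNU find and GNU mv (whose -t option names the target directory up front); move_big_files is a hypothetical name. Files with the same name in different subdirectories could collide in the single destination directory, so check the list first.

```shell
#!/usr/bin/env bash
# Sketch (GNU find and mv): show which files under $1 are larger than $3,
# ask for confirmation, then move them into directory $2.
# move_big_files is a hypothetical helper.
move_big_files() {
    local src="$1" dest="$2" min_size="${3:-+100M}"
    local answer

    echo "Files that would be moved to $dest:"
    find "$src" -type f -size "$min_size" -print

    read -r -p "Move these files? [y/N] " answer
    [[ $answer == [yY] ]] || return 0

    # mv -t DEST names the target first, so find can append file names;
    # '{} +' batches many names into each mv call instead of one per file.
    find "$src" -type f -size "$min_size" -exec mv -t "$dest" -- {} +
}

# Example:
# move_big_files /directory/containing/uploads /mnt/large_external_harddrive +100M
```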

Now that you have an idea of what is using the space, let's cover the basic short-term fixes available. Here I'll briefly cover moving, removing and compressing data.

Please bear in mind that by removing the wrong files and directories you risk not only losing valuable data but also the integrity of your system. If in doubt, always check before removing a file or a directory.

If you have enough space available on another filesystem, and you do not want to remove what you have isolated as the most obvious cause, feel free to move (mv) the files or directories to another location.

mv /var/log/big.log /mnt/large_external_harddrive/

If you want, you can create a symbolic link from the old location to where the file now resides; this way it will continue to be accessible from the old path.

ln -s /mnt/large_external_harddrive/big.log /var/log/big.log
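The move and the link can be wrapped together so that neither step gets forgotten. This is a sketch; relocate is a hypothetical helper name, and it assumes the destination directory is given as an absolute path, because a relative symbolic link target is resolved relative to the link's own directory. It also refuses to overwrite a file that already exists at the destination.

```shell
#!/usr/bin/env bash
# Sketch: move file $1 into directory $2 and leave a symbolic link at the
# old path. relocate is a hypothetical helper; $2 should be an absolute path.
relocate() {
    local src="$1" dest_dir="$2"
    local target
    target="$dest_dir/$(basename -- "$src")"

    # Don't clobber something that is already at the destination.
    if [ -e "$target" ]; then
        echo "relocate: $target already exists" >&2
        return 1
    fi

    mv -- "$src" "$target" || return 1
    ln -s -- "$target" "$src"
}

# Example:
# relocate /var/log/big.log /mnt/large_external_harddrive
```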

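To truncate a file (reduce it to zero size), redirect into it. Like so:

echo > /var/log/big.log

Strictly speaking, this leaves a single newline character in the file, since that is what echo prints. To empty the file completely, use the redirection on its own: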

> /var/log/big.log

Truncating the file, rather than removing it, saves you the hassle of recreating it afterwards with the correct name and file permissions.
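The difference between these forms is easy to check on a scratch file. The sketch below uses a temporary file as a stand-in for /var/log/big.log, and also shows truncate -s 0, a GNU coreutils command that does the same job. The ': >' form (the shell's no-op builtin followed by a redirection) is a portable spelling of the bare redirection; in zsh, for instance, a bare '> file' behaves differently. Unlike removing a file, truncating it frees the space straight away, even if a program still has the file open.

```shell
#!/usr/bin/env bash
# Sketch: compare ways of emptying a file in place, using a temporary file
# as a stand-in for something like /var/log/big.log.
f=$(mktemp)

printf 'line one\nline two\n' > "$f"
echo > "$f"                 # echo writes a newline, so 1 byte remains
after_echo=$(wc -c < "$f")

printf 'line one\nline two\n' > "$f"
: > "$f"                    # no-op command plus redirection: 0 bytes
after_colon=$(wc -c < "$f")

printf 'line one\nline two\n' > "$f"
truncate -s 0 "$f"          # GNU coreutils: set the size to 0 bytes
after_truncate=$(wc -c < "$f")

echo "echo: $after_echo  colon: $after_colon  truncate: $after_truncate"
rm -f -- "$f"
```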

To compress data, use the gzip command: simply point it at a file. For instance, the following command replaces the file it is pointed at with a compressed version, /var/log/big.log.gz.

gzip /var/log/big.log

This isn't much use when the filesystem is already full, though: gzip writes the compressed file before removing the original, so there must be enough free space for both to exist at once.
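A rough pre-check is to compare the file's size with the free space on its filesystem before compressing in place. The sketch below is deliberately conservative and is not a guarantee: it assumes the compressed copy could be about as large as the original (data that is already compressed barely shrinks), so it only succeeds when there is room for a full second copy. fits_in_place is a hypothetical name; df -Pk reports free space in 1024-byte blocks in its fourth column.

```shell
#!/usr/bin/env bash
# Rough sketch: succeed if the filesystem holding file $1 has more free space
# than the file's own size, i.e. room for a worst-case compressed copy.
# fits_in_place is a hypothetical helper.
fits_in_place() {
    local f="$1"
    local size_kb avail_kb
    # File size in bytes, rounded up to 1024-byte blocks.
    size_kb=$(( ($(wc -c < "$f") + 1023) / 1024 ))
    # df -P -k: POSIX output in 1024-byte blocks; field 4 is "Available".
    avail_kb=$(df -Pk -- "$(dirname -- "$f")" | awk 'NR == 2 { print $4 }')
    [ "$size_kb" -lt "$avail_kb" ]
}

# Example:
# if fits_in_place /var/log/big.log; then gzip /var/log/big.log; fi
```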

However, we can use gzip to write the compressed file to a different location, such as another filesystem; this is also handy when that filesystem doesn't have enough space to hold the uncompressed file. The following example creates the file /mnt/large_external_harddrive/big.log.gz. Bear in mind that with the -c switch, gzip writes to standard output and does not remove the original file.

gzip -c /var/log/big.log > /mnt/large_external_harddrive/big.log.gz
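Since -c leaves the original in place, you'll usually want to delete it once the compressed copy is safely written. A cautious way to do that is to test the new archive with gzip -t first. In this sketch, compress_elsewhere is a hypothetical helper name; if the file is a live log, bear in mind the note below about programs that still hold removed files open.

```shell
#!/usr/bin/env bash
# Sketch: compress file $1 into directory $2 (e.g. on another filesystem),
# verify the result, and only then remove the original.
# compress_elsewhere is a hypothetical helper.
compress_elsewhere() {
    local src="$1" dest_dir="$2"
    local dest
    dest="$dest_dir/$(basename -- "$src").gz"

    # -c writes to standard output, so the original is left untouched here.
    gzip -c -- "$src" > "$dest" || return 1
    # -t tests the archive's integrity; stop if it fails.
    gzip -t -- "$dest" || return 1
    rm -- "$src"
}

# Example:
# compress_elsewhere /var/log/big.log /mnt/large_external_harddrive
```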

Finally, a note concerning log files: if you remove a log file without restarting the program that is writing to it, you may find that df doesn't report the space as free. This is because, although the file's link (its name) no longer exists in the filesystem, the program still has the file open, and the blocks associated with it remain marked as used until the last process holding it closes it.

Therefore, if you have to remove a current Apache log file, for instance, be sure to restart Apache afterwards.
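To find out which programs are holding on to deleted files, lsof +L1 (open files with a link count below one) is a common approach where lsof is installed. Without it, on Linux you can read the same information from /proc, where a descriptor pointing at a removed file has " (deleted)" appended to its link target. The sketch below, under the hypothetical name deleted_open_files, does just that; run it as root to see other users' processes, and expect a few harmless entries such as anonymous memfd files.

```shell
#!/usr/bin/env bash
# Linux-only sketch: print "PID<TAB>command<TAB>target" for every open file
# descriptor that points at a file which has since been deleted.
# deleted_open_files is a hypothetical helper.
deleted_open_files() {
    local fd target pid
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors we can't read or that vanished mid-scan.
        target=$(readlink -- "$fd" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)')
                pid=${fd#/proc/}
                pid=${pid%%/*}
                printf '%s\t%s\t%s\n' "$pid" \
                    "$(cat "/proc/$pid/comm" 2>/dev/null)" "$target"
                ;;
        esac
    done
}

# Example (as root):
# deleted_open_files
```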
