Monday, February 1, 2010

File Compression

While larger, cheaper hard drives have made a shortage of disk space less of a problem, it's still often necessary to compress files or directories to stop them filling up specific partitions. For example, the /var partition can easily be filled by the unchecked growth of log files. The smaller size of compressed files also makes it easier and faster to move them between servers or PCs.

The two most commonly used open-source file compression tools are gzip and bzip2. While the ubiquitous zip format is also available, it's usually only used when the files are likely to be shared with Windows-based systems.

Gzip is the de facto standard. Bzip2 generally creates smaller compressed files, but it is more memory intensive and can take significantly longer to compress. Bzip2's performance is asymmetric: decompression is much faster than compression, but still slower than gzip's. So, unless space is at a particular premium, it's better to use gzip.

The basic syntax is very simple. For example, to compress the file foo.txt you would run:

gzip foo.txt

This would create a file called foo.txt.gz and remove the original foo.txt.

The default compression level is -6, which is biased towards high compression at the expense of speed. This can be changed to any level from -1 to -9: the -1 or --fast switch gives the fastest compression with the least size reduction, and the -9 or --best switch gives the best possible compression at the expense of speed.

So, if you wanted the maximum compression possible, you would run:

gzip -9 foo.txt (or gzip --best foo.txt)
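To see what the compression level does on your own data, a quick comparison along these lines can help. This is only a sketch: the sample file is generated test data, and the names foo-fast.gz and foo-best.gz are made up for the illustration. It uses gzip's -c switch, which writes the compressed data to standard output and leaves the original file alone.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a compressible sample file (hypothetical test data).
i=0
while [ "$i" -lt 5000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > foo.txt

# -c sends the compressed stream to stdout, so foo.txt stays in place.
gzip -1 -c foo.txt > foo-fast.gz
gzip -9 -c foo.txt > foo-best.gz

# Record the size of each file in bytes.
orig_size=$(wc -c < foo.txt)
fast_size=$(wc -c < foo-fast.gz)
best_size=$(wc -c < foo-best.gz)

echo "original: $orig_size bytes"
echo "gzip -1:  $fast_size bytes"
echo "gzip -9:  $best_size bytes"
```

How big the gap between -1 and -9 is depends heavily on the data, so it's worth trying this on a real file before settling on a level.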

To decompress the file you would typically run:

gzip -d foo.txt.gz (or gunzip foo.txt.gz)
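If you want to convince yourself that nothing is lost along the way, a round trip like the following is one way to check. It is a minimal sketch on a throwaway file; foo.orig is just a reference copy made for the comparison, and the -t switch (which tests the integrity of a compressed file) is an extra not covered above.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'hello, compression\n' > foo.txt
cp foo.txt foo.orig        # reference copy for the comparison below

gzip foo.txt               # foo.txt is replaced by foo.txt.gz
gzip -t foo.txt.gz         # -t checks the compressed file's integrity
gzip -d foo.txt.gz         # foo.txt.gz is replaced by foo.txt

# cmp -s is silent and only sets the exit status.
if cmp -s foo.txt foo.orig; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "round trip: $roundtrip"
```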

Bzip2 uses the same basic syntax. To compress a file:

bzip2 foo.txt

This would create a file called foo.txt.bz2

To decompress the file, run:

bzip2 -d foo.txt.bz2 (or bunzip2 foo.txt.bz2)
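Since both tools take the same switches, it's easy to compare them side by side on the same file. The sketch below does that for size only. The file sample.log and its contents are invented for the example, and because bzip2 isn't installed on every system, the script checks for it first.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data standing in for a real log file.
i=0
while [ "$i" -lt 5000 ]; do
    echo "entry $i: GET /index.html 200"
    i=$((i + 1))
done > sample.log

# -c writes to stdout, so sample.log is kept for the second tool.
gzip -c sample.log > sample.log.gz
gz_size=$(wc -c < sample.log.gz)
echo "gzip:  $gz_size bytes"

# bzip2 may be missing on minimal systems, so check before using it.
if command -v bzip2 >/dev/null 2>&1; then
    bzip2 -c sample.log > sample.log.bz2
    bz_size=$(wc -c < sample.log.bz2)
    echo "bzip2: $bz_size bytes"
else
    echo "bzip2: not installed, skipped"
fi
```

To compare speed as well, you could prefix each compression command with time, where your shell provides it.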

If the compression target consists of multiple files, e.g. the contents of a directory, they must first be combined into a single archive file using the tar command.

The basic tar format is:

tar -cvf foo.tar foo/

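NB: When the switches are bundled together like this, 'f' must ALWAYS come last, because it must be immediately followed by the name of the archive file, i.e. foo.tar. Otherwise tar will take the wrong argument as the archive name.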

This will collect together the contents of the directory foo/ and create a single file called foo.tar. This file will be roughly the same size as the contents of the original directory, because tar does not compress the data. It must then be compressed using either gzip or bzip2. Using gzip you would run:

gzip foo.tar

This would create a file called foo.tar.gz (referred to as a tarball), which would be smaller than the original directory or tar file.
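The two-step process can be tried out on a scratch directory to see the sizes for yourself. This is only an illustrative sketch: the directory foo/ and the file data.txt are created by the script, and the v switch is left out to keep the output short.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample directory containing one compressible file.
mkdir foo
i=0
while [ "$i" -lt 5000 ]; do
    echo "record $i: some repetitive sample text"
    i=$((i + 1))
done > foo/data.txt

# Step 1: bundle the directory into an uncompressed archive.
tar -cf foo.tar foo/
tar_size=$(wc -c < foo.tar)

# Step 2: compress the archive; foo.tar is replaced by foo.tar.gz.
gzip foo.tar
tgz_size=$(wc -c < foo.tar.gz)

echo "foo.tar:    $tar_size bytes"
echo "foo.tar.gz: $tgz_size bytes"
```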

Usually the tar file creation and the compression are done in one command using tar. Using our earlier example of foo/, to tar and gzip the directory using just the tar command, we would run:

tar -zcvf foo.tar.gz foo/

The -z switch (--gzip is also valid) tells tar to compress the file using gzip. If you want to use bzip2 you would replace this with either -j or --bzip2.
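For completeness, here is a rough sketch of the whole life cycle of a tarball: creating it, listing its contents and unpacking it again. The -t (list) and -x (extract) modes and the -C switch (extract into a given directory) aren't covered above but are standard tar options. The directory names and file contents are made up for the example.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample directory with two small files.
mkdir foo
echo "first file"  > foo/a.txt
echo "second file" > foo/b.txt

# Create the compressed tarball in one step (v omitted to keep it quiet).
tar -zcf foo.tar.gz foo/

# List the contents without extracting anything.
listing=$(tar -ztf foo.tar.gz)
echo "$listing"

# Extract into a separate directory so the original foo/ is untouched.
mkdir restored
tar -zxf foo.tar.gz -C restored

restored_text=$(cat restored/foo/a.txt)
echo "$restored_text"
```

Note that 'f' is still the last switch in every bundle, directly before the archive name.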
