
How to Archive, Compress and Extract files in Linux

Easy guide to archiving, compressing and extracting files in Linux

By , 24th July 2016 in Linux

It's often useful to be able to archive multiple files into a single file, for backup or organisation reasons. These archives are often compressed to save disk space. Read on and find out how to archive, compress and extract files in Linux.

The most basic, and probably also the most powerful, archiving tool available in Linux is the humble TAR command. TAR started life as the Tape ARchiver, used simply to send a stream of files to a sequential tape archive, but it can just as easily write the archive to an ordinary file in the file system.

Let's have a look at creating an archive of my home directory to back up all of its data.

We can use the DU (Disk Usage) command to see the current size of the directory and we will compare this to the size of the generated archives.

du -sh .

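Executing this command in my home directory shows 288K used. The -s flag asks DU for a single summary total instead of a line for every subdirectory, and -h prints the size in human-readable units such as K and M.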

Creating Archives with TAR

Let's go ahead and create an archive of these files.

tar -cvf /tmp/$USER.tar $HOME

This command has a bunch of flags and parameters; let's have a look at each of them and see what they do.

  • -c create archive
  • -v verbose (see what's happening)
  • -f specify the archive file (given as the next argument)

The parameters break down as follows. $USER is an environment variable containing the current user's name, and $HOME is an environment variable containing the current user's home directory path. For me, this command creates an archive called tim.tar in /tmp containing my home directory.
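To try these flags without touching your real home directory, here is a minimal sketch that builds a throwaway directory with mktemp and archives it. The directory and file names (data, notes.txt, backup.tar) are made up for the example. The extra -C option, a standard TAR option, makes TAR change into the given directory first, so the archive stores short relative paths.

```shell
#!/bin/sh
set -e

# Build a small throwaway directory to archive (hypothetical names)
demo=$(mktemp -d)
mkdir -p "$demo/data/docs"
echo "hello" > "$demo/data/notes.txt"
echo "report" > "$demo/data/docs/report.txt"

# Size of the directory before archiving
du -sh "$demo/data"

# -c create, -v verbose, -f archive file name
# -C "$demo" changes into $demo first, so member names start with data/
tar -cvf "$demo/backup.tar" -C "$demo" data

# Size of the resulting archive
du -h "$demo/backup.tar"
```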

Go ahead and run this command. Remember when we checked the size of the home directory earlier? We can do the same with the newly created archive.

du -h /tmp/tim.tar

For my archive, it shows 192K in use, which is a little smaller. This isn't due to compression (TAR on its own doesn't compress anything); it is a result of the file system block size (more on this in another tutorial). Essentially, the smallest amount of space a file can occupy on disk is one block, typically 4K. So a tiny file, say only a few characters long, still takes up 4K of disk space, and a 5K file occupies one 4K block fully plus part of a second 4K block, most of which is wasted space. TAR, by contrast, pads each file only to a multiple of 512 bytes, so an archive of many small files wastes less space than the files themselves.
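You can see the gap between a file's real length and the space it occupies on disk with stat. This sketch assumes GNU coreutils stat: %s is the size in bytes, %b the number of allocated blocks and %B the size of each of those blocks, while stat -f with %S reports the file system's block size. The exact allocation depends on the file system, so your numbers may differ; the file name is hypothetical.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)

# A 5-byte file: "tiny" plus a newline
echo "tiny" > "$demo/tiny.txt"

# Apparent size in bytes vs. space actually allocated on disk
stat -c 'size: %s bytes, allocated: %b blocks of %B bytes' "$demo/tiny.txt"
du -h "$demo/tiny.txt"

# Block size of the file system holding the file
stat -f -c 'file system block size: %S bytes' "$demo"
```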

Testing TAR Archives

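Now let's test the archive, because it's all well and good backing up your data, but if the archive is corrupt there is little point. To view or test an archive we can also use the TAR command, this time with -t, which lists every file in the archive. If TAR cannot read the archive, it reports an error instead.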

tar -tf /tmp/tim.tar
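In a backup script, the exit status of tar -t is the easiest thing to check: it is zero when TAR could read the whole listing and non-zero when it hits something it cannot parse. A minimal sketch, using a hypothetical throwaway archive plus a file that is deliberately not a TAR archive at all:

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
mkdir "$demo/data"
echo "hello" > "$demo/data/notes.txt"
tar -cf "$demo/good.tar" -C "$demo" data

# A file that is not a tar archive, to show a failing check
yes 'not a tar archive' | head -c 2048 > "$demo/bogus.tar"

# -t lists the members; the exit status says whether tar could read the archive
for archive in "$demo/good.tar" "$demo/bogus.tar"; do
    if tar -tf "$archive" > /dev/null 2>&1; then
        echo "$archive: OK"
    else
        echo "$archive: FAILED"
    fi
done
```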

Expanding Archives

To extract the archive into the current directory, we swap -c for -x (extract) and keep the verbose and file flags:

tar -xvf /tmp/tim.tar

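This unpacks all the files into the current directory. Because TAR strips the leading / from member names when it creates an archive, the files are recreated under a relative path such as home/tim/ beneath wherever you run the command, rather than over the top of your real home directory.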
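If you would rather not unpack into the current directory, TAR's -C option extracts into a directory of your choice. This sketch uses throwaway, hypothetical paths: it round-trips a small directory and uses diff -r to confirm the restored copy matches the original.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
mkdir -p "$demo/data/docs" "$demo/restore"
echo "hello" > "$demo/data/notes.txt"
echo "report" > "$demo/data/docs/report.txt"
tar -cf "$demo/backup.tar" -C "$demo" data

# -x extract, -v verbose, -f archive; -C unpacks into $demo/restore
tar -xvf "$demo/backup.tar" -C "$demo/restore"

# diff -r prints nothing and exits 0 when both trees are identical
diff -r "$demo/data" "$demo/restore/data" && echo "restore matches original"
```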

Compressing Archives

TAR files can also be compressed using the gzip or bzip2 commands to shrink the backups down. This is great for transmitting online or fitting more data onto backup devices. The commands in the rest of this section assume you are working in /tmp, where the archive was created.

Using gzip to Compress the Archive

gzip tim.tar

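Well, that was easy, wasn't it! This creates a compressed file called tim.tar.gz. NOTE: gzip removes the original tim.tar and replaces it with the compressed version. How much smaller the .gz file is depends on the data: text compresses well, while files that are already compressed (JPEG photos, videos, other archives) barely shrink at all.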
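To see how much gzip saves without losing the uncompressed copy, you can write the compressed data to a separate file with -c (write to standard output), and then check it with gzip -t, which tests the compressed file's integrity. The sample data (a file of sequential numbers) and file names are made up, and compress far better than many real files will.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
mkdir "$demo/data"
# Sequential numbers: plain text that compresses well (made-up sample data)
seq 1 20000 > "$demo/data/numbers.txt"
tar -cf "$demo/backup.tar" -C "$demo" data

# -c writes to stdout, so backup.tar is left in place
gzip -c "$demo/backup.tar" > "$demo/backup.tar.gz"

# -t tests the compressed file; exit status 0 means it is intact
gzip -t "$demo/backup.tar.gz" && echo "gzip archive OK"

# Compare sizes in bytes
wc -c < "$demo/backup.tar"
wc -c < "$demo/backup.tar.gz"
```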

Using bzip2 to Compress the Archive

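bzip2 is an alternative to gzip that usually compresses better, at the cost of being slower. The two commands are used in the same way, but their file formats are not interchangeable: a .bz2 file has to be uncompressed with bzip2 or bunzip2, not gunzip.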

Creating the bzip2 archive is as easy as running this command, which, like gzip, replaces tim.tar with tim.tar.bz2:

bzip2 tim.tar
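To compare the two compressors on your own data, you can compress the same archive both ways, again using -c so the original stays put. bzip2 -t checks integrity just like gzip -t. This is a sketch with hypothetical file names and made-up data; which output ends up smaller depends on what you feed it.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
mkdir "$demo/data"
seq 1 20000 > "$demo/data/numbers.txt"
tar -cf "$demo/backup.tar" -C "$demo" data

# Compress the same archive both ways (-c writes to stdout, keeping backup.tar)
gzip -c "$demo/backup.tar" > "$demo/backup.tar.gz"
bzip2 -c "$demo/backup.tar" > "$demo/backup.tar.bz2"

# Integrity check for the bzip2 file
bzip2 -t "$demo/backup.tar.bz2" && echo "bzip2 archive OK"

# Byte counts side by side
for f in "$demo/backup.tar" "$demo/backup.tar.gz" "$demo/backup.tar.bz2"; do
    printf '%10d  %s\n' "$(wc -c < "$f")" "$f"
done
```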

Uncompressing Archives

Uncompressing an archive, often called unzipping, is the reverse process. It restores the original .tar file in the same directory as the compressed file.

To uncompress gzip archives

This is the opposite of the gzip command. It recreates the original tim.tar and removes the .gz file.

gunzip tim.tar.gz

To uncompress bzip archives

bunzip2 tim.tar.bz2

Again, this recreates the original tim.tar and removes the .bz2 file.
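Compression is lossless, so a compress-then-uncompress round trip should give back exactly the same .tar file. This sketch (throwaway, hypothetical paths) keeps a pristine copy and uses cmp to confirm that both gzip/gunzip and bzip2/bunzip2 restore it byte for byte, while replacing files in place as described above.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
mkdir "$demo/data"
seq 1 1000 > "$demo/data/numbers.txt"
tar -cf "$demo/backup.tar" -C "$demo" data

# Keep a pristine copy to compare against
cp "$demo/backup.tar" "$demo/original.tar"

# Each command replaces its input file
gzip "$demo/backup.tar"         # creates backup.tar.gz, removes backup.tar
gunzip "$demo/backup.tar.gz"    # creates backup.tar, removes backup.tar.gz
cmp "$demo/backup.tar" "$demo/original.tar" && echo "gzip round trip OK"

bzip2 "$demo/backup.tar"        # creates backup.tar.bz2, removes backup.tar
bunzip2 "$demo/backup.tar.bz2"  # creates backup.tar, removes backup.tar.bz2
cmp "$demo/backup.tar" "$demo/original.tar" && echo "bzip2 round trip OK"
```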

Streamlining Compression

It's a bit of a pain having to issue two commands to archive and compress files. Luckily TAR can do the compression for us: the -z flag passes the archive through gzip and -j passes it through bzip2. The same flags work when extracting.

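# Create a gzip-compressed archive of the home directory
tar -cvzf tim.tar.gz $HOME
# Create a bzip2-compressed archive of the home directory
tar -cvjf tim.tar.bz2 $HOME
# Extract a gzip-compressed archive
tar -xvzf tim.tar.gz
# Extract a bzip2-compressed archive
tar -xvjf tim.tar.bz2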
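The -z and -j flags save you from writing the pipe yourself. Spelled out by hand, the gzip version looks like the sketch below: given -f -, TAR writes the archive to standard output when creating and reads it from standard input when extracting. The paths are throwaway examples.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/restore"
echo "hello" > "$demo/data/notes.txt"

# Create: tar writes the archive to stdout (-f -), gzip compresses the stream
tar -cf - -C "$demo" data | gzip > "$demo/piped.tar.gz"

# Extract: gzip -dc decompresses to stdout, tar reads the archive from stdin
gzip -dc "$demo/piped.tar.gz" | tar -xf - -C "$demo/restore"

# The result is an ordinary .tar.gz, so tar -tzf can list it
tar -tzf "$demo/piped.tar.gz"
```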

Using CPIO for Archiving

CPIO (CoPy Input Output) is another general file archiver utility. Like TAR it does not compress by default, but its output can be piped through gzip or bzip2 to create compressed archives.

CPIO reads the names of the files to archive from standard input (STDIN), which means that you can use commands such as find to choose exactly what is included in the archive.

find . -name '*.pdf' | cpio -o > /tmp/pdf.cpio

This runs find to search the current directory and its subdirectories for PDF files, and CPIO archives them into /tmp/pdf.cpio. -o (copy-out mode) writes the archive to standard output, which we redirect into the file.

Expanding the archive is also as easy as typing

cpio -id < /tmp/pdf.cpio

In this case -i (copy-in mode) reads the archive from standard input, and -d creates directories as needed if they don't already exist. Files are extracted relative to the current directory.

Imaging with DD

dd (often nicknamed Disk Duplicator) is a tool for archiving and backing up complete partitions or entire disks. This is called imaging or cloning a hard drive. Disk images can be used to back up drives, create snapshots or create ISO images of CDs and DVDs. The images are exact byte-for-byte copies of the original device. Partition and ISO images can be mounted through a loop device like any other file system, whereas a whole-disk image also contains the partition table, so its partitions have to be mapped before they can be mounted.

Create an ISO image from a CD

dd if=/dev/sr0 of=cdimage.iso

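if= specifies the input source and of= specifies the output file. You may need to change sr0 to your CD drive's device name depending on your distro. Reading raw devices normally requires root privileges, so you may need to run these commands with sudo.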
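Because dd simply copies bytes from if= to of=, you can practise on ordinary files before pointing it at a real device. This sketch creates a small made-up "disk" file filled with random data, images it with dd, and checks with cmp that the copy is identical. bs (block size) and count are standard dd operands; the file names are hypothetical.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)

# A 4 MiB stand-in for a real device (64 blocks of 64 KiB of random bytes)
dd if=/dev/urandom of="$demo/fake-disk.img" bs=65536 count=64

# Image it exactly as you would /dev/sr0 or /dev/sda: if= input, of= output
dd if="$demo/fake-disk.img" of="$demo/copy.img" bs=65536

# cmp is silent and exits 0 when the two files are identical
cmp "$demo/fake-disk.img" "$demo/copy.img" && echo "image is an exact copy"
```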

Create an image of a hard drive

dd if=/dev/sda of=harddrive.img

Create an image of a partition

dd if=/dev/sda1 of=partition.img
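A raw image is as large as the device or partition it came from, empty space included. One common approach is to pipe dd's output through gzip as it is written, and reverse the pipe to restore. The sketch below shows the pattern on a made-up, mostly empty file (names hypothetical). On a real system the input would be a device such as /dev/sda1, run as root, and remember that writing an image back to a device overwrites everything on it.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)

# A 4 MiB stand-in "partition": mostly zeros, like a largely empty file system
dd if=/dev/zero of="$demo/partition.img" bs=65536 count=64
# Overwrite the first few bytes without truncating the file
echo "some real data" | dd of="$demo/partition.img" conv=notrunc

# Image and compress in one go: dd writes to stdout, gzip compresses the stream
dd if="$demo/partition.img" bs=65536 | gzip > "$demo/partition.img.gz"

# Restore: decompress to stdout and let dd write the bytes out
gunzip -c "$demo/partition.img.gz" | dd of="$demo/restored.img" bs=65536

cmp "$demo/partition.img" "$demo/restored.img" && echo "restored image matches"
ls -l "$demo/partition.img" "$demo/partition.img.gz"
```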
