How to Archive, Compress and Extract files in Linux
The most basic, and probably also the most powerful, archiving tool available in Linux is the humble TAR command. TAR started as the Tape ARchiver and was used to send a stream of files to a sequential tape drive. It can also write the archive to an ordinary file in the file system.
Let's have a look at creating an archive of my current home directory to back up all the data.
We can use the DU (Disk Usage) command to see the current size of the directory and we will compare this to the size of the generated archives.
du -sh .
Executing this command in my home directory shows 288K used.
Creating Archives with TAR
Let's go ahead and create an archive of these files.
tar -cvf /tmp/$USER.tar $HOME
This command has a bunch of flags and parameters. Let's have a look at each of them and see what they do.
- -c create archive
- -v verbose (see what's happening)
- -f specify the file
The parameters are broken down as follows. $USER is an environment variable which contains the current user's name, and $HOME is an environment variable containing the path to the current user's home directory. For me, this command creates an archive called tim.tar in /tmp, and it archives my home directory.
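If you would rather experiment before archiving your real home directory, here is a minimal sketch that uses the same flags on a scratch directory. The directory and file names (docs, a.txt, b.txt) are made up for illustration, and -C is an extra tar option that changes directory before adding files so the stored paths are relative.

```shell
# Scratch directory with two sample files (hypothetical names, not from the article)
work=$(mktemp -d)
mkdir "$work/docs"
printf 'first file\n'  > "$work/docs/a.txt"
printf 'second file\n' > "$work/docs/b.txt"

# -c create, -f name the archive file; -v is left out to keep the output quiet
# -C changes into $work first so the stored paths are relative (docs/...)
tar -cf "$work/docs.tar" -C "$work" docs

# Show the size of the new archive
du -h "$work/docs.tar"
```

Adding -v back in prints each file name as it goes into the archive.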
We can go ahead and run this command and see what happens. Remember before when we saw the size of the current home directory? We can do the same with the newly created archive.
du -h /tmp/tim.tar
For my archive, it shows that 192K is in use, which is a little smaller. This isn't due to compression, however; it is a result of the file system block size (more on this in another tutorial). Essentially, the minimum amount of space that a file can occupy on disk is one block, typically 4k. So a tiny file, say only a few characters in length, will still take up 4k of disk space, and a 5k file will fill one 4k block completely and use a second 4k block that is mostly wasted space. Inside a TAR archive each file is only padded to a multiple of 512 bytes, so far less space is lost.
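The block size effect can be seen on a single tiny file. This is only a sketch: the file name is made up, and the allocated figure depends on your file system, so no particular number is promised here.

```shell
# Scratch file containing just five bytes ("tiny" plus a newline)
work=$(mktemp -d)
printf 'tiny\n' > "$work/tiny.txt"

# Bytes of actual content
bytes=$(wc -c < "$work/tiny.txt")
# Space allocated on disk in kilobytes (file system dependent)
allocated=$(du -k "$work/tiny.txt" | cut -f1)

echo "content: $bytes bytes, allocated: $allocated KB"
```

On many Linux file systems the allocated figure comes out at 4, matching the 4k block size mentioned above.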
Testing TAR Archives
Now let's test the archive. It's all well and good backing up your data, but if the archive is corrupt there is little point. To view or test an archive we can again use the TAR command, this time with the -t (list) flag.
tar -tf /tmp/tim.tar
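In a backup script you probably want to act on the result rather than read the listing by eye. A minimal sketch, assuming tar returns a non-zero exit status when it cannot read an archive (the file names good.tar and bad.tar are made up for the demonstration):

```shell
work=$(mktemp -d)
printf 'sample\n' > "$work/a.txt"
tar -cf "$work/good.tar" -C "$work" a.txt
# Deliberately broken "archive" for comparison
printf 'this is not a tar file\n' > "$work/bad.tar"

for archive in "$work/good.tar" "$work/bad.tar"; do
    # Discard the listing; only the exit status matters here
    if tar -tf "$archive" > /dev/null 2>&1; then
        echo "$(basename "$archive"): readable"
    else
        echo "$(basename "$archive"): FAILED"
    fi
done
```

Keep in mind that a clean listing only shows the archive structure can be read; it does not compare the contents against the original files.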
To expand the archive into the current directory, we swap -c for -x (extract) and keep the verbose and file flags.
tar -xvf /tmp/tim.tar
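This will unpack all the files into the current directory. GNU tar strips the leading / from paths when it creates an archive, so an archive of, for example, /home/tim unpacks to ./home/tim rather than over the top of the originals.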
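If you want the files to land somewhere other than the current directory, tar's -C option names the target. A small sketch on scratch data (the src and restore directory names are made up):

```shell
work=$(mktemp -d)
mkdir "$work/src" "$work/restore"
printf 'important\n' > "$work/src/data.txt"
tar -cf "$work/backup.tar" -C "$work" src

# -x extract, -f archive file, -C directory to unpack into
tar -xf "$work/backup.tar" -C "$work/restore"

# Show what was restored
ls -R "$work/restore"
```

The target directory has to exist already; tar will not create it for you.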
TAR files can also be compressed using the gzip and bzip2 commands to shrink the backups down to a much smaller file size. This is great for transmitting online or fitting more data onto backup devices.
Using gzip to Compress the Archive
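A minimal sketch of the gzip step, run on a scratch copy so nothing real is touched. The scratch directory and notes.txt are made up; on the archive created earlier the equivalent would be gzip /tmp/tim.tar.

```shell
work=$(mktemp -d)
printf 'sample data\n' > "$work/notes.txt"
tar -cf "$work/tim.tar" -C "$work" notes.txt

# Compress the archive in place; tim.tar is replaced by tim.tar.gz
gzip "$work/tim.tar"

ls "$work"
```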
Well, that was easy, wasn't it! This has now created an archive called tim.tar.gz. NOTE: This will remove the original archive and replace it with the gzip archive. The gzip archive is now a lot smaller than the original archive.
Using bzip2 to Compress the Archive
Bzip2 is an alternative to gzip which usually offers slightly better compression at the cost of speed. The two are used in exactly the same way.
Creating the bzip2 archive is as easy as running this command:
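A minimal sketch, assuming the bzip2 package is installed, again on a scratch copy with made-up file names. On the real archive the equivalent would be bzip2 /tmp/tim.tar.

```shell
work=$(mktemp -d)
printf 'sample data\n' > "$work/notes.txt"
tar -cf "$work/tim.tar" -C "$work" notes.txt

# Same usage as gzip: tim.tar is replaced by tim.tar.bz2
bzip2 "$work/tim.tar"

ls "$work"
```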
Uncompressing an archive, often called unzipping, is the reverse process. It restores the original .tar file in the same directory as the compressed file.
To uncompress gzip archives
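A minimal sketch of the round trip on a scratch copy (file names made up); on the real archive the equivalent would be gunzip /tmp/tim.tar.gz.

```shell
work=$(mktemp -d)
printf 'sample data\n' > "$work/notes.txt"
tar -cf "$work/tim.tar" -C "$work" notes.txt
gzip "$work/tim.tar"

# Restore tim.tar from tim.tar.gz (gunzip behaves like gzip -d)
gunzip "$work/tim.tar.gz"

ls "$work"
```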
This is the opposite of the gzip command. It recreates the original archive and removes the .gz archive.
To uncompress bzip2 archives
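The same round trip with bzip2, again as a sketch on scratch data and assuming the bzip2 package is installed. On the real archive the equivalent would be bunzip2 /tmp/tim.tar.bz2.

```shell
work=$(mktemp -d)
printf 'sample data\n' > "$work/notes.txt"
tar -cf "$work/tim.tar" -C "$work" notes.txt
bzip2 "$work/tim.tar"

# Restore tim.tar from tim.tar.bz2 (bunzip2 behaves like bzip2 -d)
bunzip2 "$work/tim.tar.bz2"

ls "$work"
```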
Again, this will recreate the original archive and remove the .bz2 file.
It's a bit of a pain having to issue two commands to archive and compress files. Luckily TAR can run gzip or bzip2 for us: add the -z flag for gzip or the -j flag for bzip2 when creating or extracting an archive.
tar -cvzf tim.tar.gz $HOME
tar -cvjf tim.tar.bz2 $HOME
tar -xvzf tim.tar.gz
tar -xvjf tim.tar.bz2
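To see what the -z flag buys you, here is a sketch that archives the same scratch directory with and without compression and compares the sizes. The sample files are just generated number lists, chosen because repetitive text compresses well; real data will give different ratios.

```shell
work=$(mktemp -d)
mkdir "$work/docs"
# Five files of repetitive text so the size difference is obvious
for i in 1 2 3 4 5; do
    seq 1 2000 > "$work/docs/file$i.txt"
done

tar -cf  "$work/docs.tar"    -C "$work" docs   # no compression
tar -czf "$work/docs.tar.gz" -C "$work" docs   # gzip via -z

# Compare sizes in bytes
wc -c "$work/docs.tar" "$work/docs.tar.gz"

# Extract the compressed archive in one step as well
mkdir "$work/restore"
tar -xzf "$work/docs.tar.gz" -C "$work/restore"
```

Swapping -z for -j does the same with bzip2, provided it is installed.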
Using CPIO for Archiving
CPIO (CoPy In, copy Out) is another general file archiver utility. Like TAR it does not compress by default, but its output can be compressed with gzip or bzip2.
CPIO has the ability to read in the directory and path names of the files to archive from the STDIN pipe, which means that you can use commands such as find to specify what is included in the archive.
find . -name '*.pdf' | cpio -o > /tmp/pdf.cpio
This will run the find command to search for all the PDF files under the current directory, which are then archived to the /tmp/pdf.cpio archive file. -o selects copy-out mode, which creates an archive.
Expanding the archive is also as easy as typing
cpio -id < /tmp/pdf.cpio
In this case -i selects copy-in mode, which extracts an archive, and -d specifies that directories should be created if they don't already exist.
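Putting both halves together, here is a sketch of a full round trip on scratch data, assuming cpio is installed. All the file and directory names are made up. Running find from inside the source directory keeps the stored paths relative, so the archive can be unpacked anywhere.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/reports" "$work/restore"
printf 'pdf one\n'   > "$work/src/reports/a.pdf"
printf 'pdf two\n'   > "$work/src/b.pdf"
printf 'ignore me\n' > "$work/src/notes.txt"

# Archive only the PDF files; find supplies the list on STDIN
(cd "$work/src" && find . -name '*.pdf' | cpio -o > "$work/pdf.cpio")

# Unpack elsewhere; -d recreates ./reports as needed
(cd "$work/restore" && cpio -id < "$work/pdf.cpio")

# Show what came back
find "$work/restore" -type f
```

Only the two PDF files should come back; notes.txt never made it into the archive because find did not list it.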
Imaging with DD
dd (often expanded as Disk Duplicator) is a tool for archiving and backing up complete partitions or entire disks. This is called imaging or cloning a hard drive. Disk images can be used to back up drives, create snapshots or create ISO images of CDs and DVDs. The images created are exact byte-for-byte copies of the original device, and they can be mounted like any other device.
Create an ISO image from a CD
dd if=/dev/sr0 of=cdimage.iso
if= specifies the input source and of= specifies the output file. Unlike most commands, dd's parameters take no leading dash. Take care not to swap the two, because dd overwrites the output without asking. You may need to change sr0 to your CD drive device depending on your distro.
Create an image of a hard drive
dd if=/dev/sda of=harddrive.img
Create an image of a partition
dd if=/dev/sda1 of=partition.img
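Reading from real devices normally needs root, so here is a safe sketch that practises the same syntax on an ordinary file standing in for a device. The file names are made up, and bs= is an extra dd parameter that sets the block size used for each read and write.

```shell
work=$(mktemp -d)
# Stand-in for a device: 1 MiB of zeros plus a short marker at the end
dd if=/dev/zero of="$work/source.img" bs=1024 count=1024 2>/dev/null
printf 'marker' >> "$work/source.img"

# Image it: if= input, of= output, bs= block size per read/write
dd if="$work/source.img" of="$work/copy.img" bs=4096 2>/dev/null

# cmp -s exits 0 only when the two files are byte-for-byte identical
if cmp -s "$work/source.img" "$work/copy.img"; then
    echo "image matches source"
fi
```

The same comparison is a sensible check after imaging a real disk, as long as the source is not being written to in the meantime.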