My *nix world

Creating multi-volume tar.bz2 archives

One thing I love about Linux is its flexibility. Linux is like Lego: many pieces that you can stick together to build something no one thought of before.

The following is just one example of that flexibility. Many people have done this before, so I claim no credit for it 😉

If you only want to read about creating multi-volume tar.bz2 (or even tar.gz) files, skip ahead to the "Creating multi-volume tar.bz2" section. Otherwise just keep reading; it will help you better understand why and how we create multi-volume tar.bz2 archives on Linux.

How do you create an archive of a specific file or folder? That's simple: TAR it!

tar -cf my-archive.tar source-file-or-folder

Each file in your folder is a Lego piece; stick many pieces together and you get a TAR. A TAR archive is nothing more than a plain file that bundles one or more files/folders, without any compression. Of course, it also contains a header for each member (its name, size, permissions and so on), so that tar can tell where each file starts and ends.
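To see the Lego pieces inside a TAR, here is a minimal sketch that builds a throwaway folder (demo-src, a.txt and b.txt are made-up names for illustration), archives it and lists the members tar recorded in its headers. It assumes GNU tar and bash, as found on most Linux distributions.

```shell
#!/bin/bash
set -eu
work=$(mktemp -d)                 # scratch directory for the demo
cd "$work"
mkdir demo-src                    # hypothetical sample folder
printf 'hello\n' > demo-src/a.txt
printf 'world\n' > demo-src/b.txt

tar -cf my-archive.tar demo-src   # bundle the folder, no compression
tar -tvf my-archive.tar           # list the members described by the headers

# No compression: the file contents are stored verbatim inside the archive
grep -q 'hello' my-archive.tar && echo "a.txt is stored as-is"

# The archive is made of 512-byte blocks (headers + padded file data)
echo "archive size: $(wc -c < my-archive.tar) bytes"
```

With GNU tar the printed size is typically 10240 bytes even for this tiny folder, because tar pads the archive to a full record (20 blocks of 512 bytes by default).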

OK, how about a compressed archive? But first, what is a compressed archive?

Well, a compressed archive is just a plain (uncompressed) TAR archive that we put into a "vise" (a squeezing tool) which tries to compress it as much as possible. Linux has many vises: one called zip, another called bzip2, another called gzip, and so on. Each of them works a bit differently from the others, so you can choose the one that fits your needs. Let's call these vises compression filters, shall we?

To create a .tar.bz2 compressed archive you just need a TAR (and I've shown you how to make one) and a bzip2 filter. All you do is send your uncompressed archive (the TAR) through the compression filter (bzip2), which squeezes all those bytes and outputs the compressed form of the original archive. Save that output into a file and you have a compressed tar.bz2 file (the -c option tells bzip2 to write to standard output instead of replacing the input file with a .bz2 version):

bzip2 -c uncompressed-file > compressed-file.bz2

We can combine these two commands to create the archive and compress it in a single step:

tar -cf - source-file-or-folder | bzip2 > compressed-file.bz2

If you look at the command above, you will notice that right after the -cf options (--create and --file) I placed the symbol "-" instead of "my-archive.tar". This tells tar not to save the resulting archive on disk but to write it, byte by byte, to its standard output. You will also notice that I did not give bzip2 any "uncompressed-file". That is because the "|" symbol (a pipe) tells the shell to connect the standard output of tar to the standard input of the next command (i.e. bzip2); when bzip2 gets no file name, it reads from standard input and writes the compressed data to standard output, which we redirect into compressed-file.bz2. In a word, instead of writing an intermediate TAR to disk, the archive streams through memory: tar produces it, bzip2 squeezes it, and only the compressed result is saved to the compressed-file.bz2 file.
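The pipe also works in reverse, which gives a quick way to check the compressed archive without ever writing an intermediate .tar to disk. A minimal sketch, assuming GNU tar and bzip2 (source-folder and numbers.txt are hypothetical sample data):

```shell
#!/bin/bash
set -eu
work=$(mktemp -d)
cd "$work"
mkdir source-folder
seq 1 100000 > source-folder/numbers.txt    # compressible sample data

# tar writes the archive to stdout ("-"); bzip2 reads stdin and writes stdout
tar -cf - source-folder | bzip2 > compressed-file.bz2

# Reverse direction: bzip2 -dc decompresses to stdout, tar -tf - reads the archive from stdin
bzip2 -dc compressed-file.bz2 | tar -tf -

# bzip2 -t only tests the compressed file's integrity; nothing is extracted
bzip2 -t compressed-file.bz2 && echo "compressed-file.bz2 is intact"
echo "original: $(wc -c < source-folder/numbers.txt) bytes, compressed: $(wc -c < compressed-file.bz2) bytes"
```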

The tar command also has options that tell it to create a compressed archive directly, and this is where the -j (bzip2) and -z (gzip) options come to the rescue:

tar -cjf compressed-file.bz2 source-file-or-folder

Of course, you can also use some other (custom) compression filter that nobody knows about, and tell tar to pipe the archive it creates through that program with the -I (--use-compress-program) option, like this:

tar -I /bin/my-super-duper-filter -cf - source-file-or-folder > compressed-file.bz2
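my-super-duper-filter above is just a placeholder name. To try -I without writing a filter of your own, you can plug in any program that reads standard input, writes standard output and decompresses when given -d (GNU tar passes -d when reading the archive back). A sketch using plain bzip2 as the "custom" filter, assuming GNU tar:

```shell
#!/bin/bash
set -eu
work=$(mktemp -d)
cd "$work"
mkdir source-folder
seq 1 50000 > source-folder/numbers.txt     # hypothetical sample data

# -j: tar runs bzip2 for us
tar -cjf with-j.tar.bz2 source-folder
# -I: tar pipes the archive through the program we name (here plain bzip2)
tar -I bzip2 -cf - source-folder > with-I.tar.bz2

# Both archives should list the same members (tar runs "bzip2 -d" to read with-I.tar.bz2)
diff <(tar -tjf with-j.tar.bz2) <(tar -I bzip2 -tf with-I.tar.bz2) && echo "same members"
```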

 

Creating multi-volume tar.bz2

What if you have a 50 GiB folder/file and you want to store it on storage devices that can each hold less than 1 GiB of data? Well, you can create several archives, can't you? But what if you want to do that automatically, without having to analyze your data to decide what goes where, whether you have reached the storage limit, and so on? Linux offers so many options here that it is hard to know where to start.

For instance, you could create a 50 GiB uncompressed tar archive and then split it (with the split command) into 1 GiB pieces. Of course, that would be a bad idea, because you would need that much free disk space just for the intermediate archive, but in theory you could do it. A better option is to tell tar itself to split the archive into multiple volumes of 1 GiB each (the -L value is given in KiB, and 1 GiB = 1048576 KiB):
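The disk-space problem only appears if the full .tar is written to disk before splitting. As a hedged alternative (not the approach this article builds on), tar can stream through bzip2 straight into GNU split, so only the compressed pieces ever land on disk. Sizes are scaled down here and the part-file names are hypothetical. The catch: the pieces are slices of one compressed stream, so restoring needs all of them, concatenated in order.

```shell
#!/bin/bash
set -eu
work=$(mktemp -d)
cd "$work"
mkdir source-folder
head -c 300000 /dev/urandom > source-folder/random.bin   # incompressible sample data

# tar -> bzip2 -> split: no intermediate .tar or .tar.bz2 is written to disk.
# -b 100K gives 100 KiB pieces here; for the 1 GiB scenario you would use -b 1G.
tar -cf - source-folder | bzip2 -9 | split -b 100K -d - my-archive.tar.bz2.part-
ls -l my-archive.tar.bz2.part-*

# Restore: concatenate the pieces in order and reverse the pipe
mkdir restored
cat my-archive.tar.bz2.part-* | bzip2 -dc | tar -xf - -C restored
cmp source-folder/random.bin restored/source-folder/random.bin && echo "restored OK"
```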

tar -ML 1048576 -cf my-archive.tar source-file-or-folder

When tar reaches the limit of 1048576 KiB (i.e. 1 GiB), it pauses and prompts you for the next volume; you answer with n followed by the new file name (for example n my-archive-2.tar) and tar continues writing into that file. So this requires user intervention. What if we want to do it programmatically, for example from a cron job?

That complicates the story a little. If we knew from the beginning how large the source file/folder is, we could calculate up front how many volumes will be created and feed tar the answers to its volume prompts through its standard input, like this:

printf 'n my-archive.tar-%02d.tar\n' {2..50} | tar -ML 1048576 -cvf my-archive.tar source-file-or-folder

The printf command prints 49 lines, from n my-archive.tar-02.tar to n my-archive.tar-50.tar (printf reuses its format string for each of the 49 numbers produced by {2..50}). At every volume prompt, tar reads the next line from its standard input and uses that name for the next volume. If we don't know up front that there will be 49 extra volumes (50 in total), we can write a small script (let's name it multivol-tar.sh) that works this out for us and then runs the command above.
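To watch the trick work without waiting for a 50 GiB archive, here is a scaled-down sketch: a 60000-byte random file (hypothetical sample data) archived into 20 KiB volumes, with printf supplying the n answers. It assumes GNU tar; 2>/dev/null only hides tar's volume prompts. Reading the set back works the same way, since tar -x asks for the next volume at the same kind of prompt.

```shell
#!/bin/bash
set -eu
work=$(mktemp -d)
cd "$work"
mkdir source-folder
head -c 60000 /dev/urandom > source-folder/data.bin   # big enough to need several 20 KiB volumes

# One "n <name>" answer per volume prompt; spare answers are simply never read
printf 'n my-archive.tar-%02d.tar\n' {2..10} |
  tar -ML 20 -cf my-archive.tar source-folder 2>/dev/null
ls -l my-archive.tar*

# Extraction prompts for the next volume too, so feed it the same answers
mkdir restored
printf 'n my-archive.tar-%02d.tar\n' {2..10} |
  tar -xM -f my-archive.tar -C restored 2>/dev/null
cmp source-folder/data.bin restored/source-folder/data.bin && echo "restored OK"
```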

The script below can be called with the following syntax:

multivol-tar.sh source-file-or-folder my-archive 1073741824

where 1073741824 is the volume size in bytes (i.e. 1024^3 bytes = 1 GiB)

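#!/bin/bash
src="$1"; prefix="$2"; vol_size="$3"
# estimated archive size: apparent data size (GNU du) + ~1 KiB of tar header/padding per entry
est_len=$(( $(du -sb "$src" | cut -f1) + $(find "$src" | wc -l) * 1024 ))
# round up and keep one spare volume; names tar never asks for are simply not read
vol_count=$(( est_len / vol_size + 2 ))
for i in $(seq 2 "$vol_count"); do printf 'n %s-%d.tar\n' "$prefix" "$i"; done |
  tar -ML $(( vol_size / 1024 )) -cvf "$prefix-1.tar" "$src"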

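GNU tar refuses to combine multi-volume mode (-M) with compression options such as -j or -z, so if you also want the volumes compressed, append the following line to the end of the script. It compresses each volume with bzip2 (turning my-archive-1.tar into my-archive-1.tar.bz2, and so on), but only if tar finished successfully:

[ $? -eq 0 ] && for f in "$2"-*.tar; do bzip2 -9 "$f"; done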

The script measures the size of the source in bytes and divides it by the volume size (the script's third parameter) to get the number of volumes. It then prints one "n name" line per additional volume to tar's standard input, where tar picks up the names of the following volumes. Once tar has created all the volumes, each one goes through the compression filter, and that's it!
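To get the data back, the volumes have to be decompressed first, because tar reads the multi-volume set as plain .tar files. A scaled-down, self-contained sketch (hypothetical names and sizes; GNU tar and bzip2 assumed) that creates compressed volumes the way the script does and then restores them:

```shell
#!/bin/bash
set -eu
work=$(mktemp -d)
cd "$work"
mkdir source-folder
head -c 60000 /dev/urandom > source-folder/data.bin

# Create: 20 KiB volumes named my-archive-1.tar, my-archive-2.tar, ...
printf 'n my-archive-%d.tar\n' $(seq 2 10) |
  tar -ML 20 -cf my-archive-1.tar source-folder 2>/dev/null
for f in my-archive-*.tar; do bzip2 -9 "$f"; done       # leaves my-archive-N.tar.bz2

# Restore: decompress every volume (-k keeps the .bz2 files), then extract the set
for f in my-archive-*.tar.bz2; do bzip2 -dk "$f"; done
mkdir restored
printf 'n my-archive-%d.tar\n' $(seq 2 10) |
  tar -xM -f my-archive-1.tar -C restored 2>/dev/null
cmp source-folder/data.bin restored/source-folder/data.bin && echo "restored OK"
```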

Oh, by the way: if you want to run all of this on Windows (yes, Linux tools "work" on Windows through Cygwin/MinGW, just as Windows programs work on Linux through Wine/Mono), install Cygwin and run the command (from the Windows command prompt, for instance) like this:

C:\cygwin\bin\bash.exe --login -c "the-linux-command-i-shown-you-before"

Yes, put the same Linux command as before between the quotes. Note that Windows does not lay out its file system the way Unix does, so inside "the-linux-command-i-shown-you-before" you should adjust the paths accordingly.

Example:

C:\cygwin\bin\bash.exe --login -c "printf 'n /cygdrive/c/WINDOWS/Temp/myarchive-%d.tar\n' {1..2} | tar -ML 153600 -cvf /cygdrive/c/WINDOWS/Temp/myarchive.tar /cygdrive/d/MyApps/Data 2>/dev/null && for f in `ls /cygdrive/c/WINDOWS/Temp/myarchive*.tar`;do bzip2 -9 -fqvk $f;done"

Note that in Cygwin you can access the C: and D: drives through Cygwin's internal mount points, /cygdrive/c/ and /cygdrive/d/ respectively.

If you found this article interesting, don't forget to rate it. It shows me that you care, and I will keep writing about these things.


Eugen Mihailescu

Founder/programmer/one-man-show at Cubique Software
Always looking to learn more about the *nix world and about the fundamental concepts of math, physics and electronics. I am also passionate about programming, databases and systems administration. 16+ yrs experience in software development, designing enterprise systems, IT support and troubleshooting.