cmd:gzip, how much worse it is to compress again

$ dir bin*
-rw-------   1 suntong  glan       356561 Feb 21 17:01 bin.tgz
$ mv bin.tgz bin
$ gzip bin
$ dir bin*
-rw-------   1 suntong  glan       356449 Feb 21 17:01 bin.gz
Tip !!

So it doesn't hurt to gzip an already-compressed file again.
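
To see how little such a second pass gains, gzip -l lists the compressed and uncompressed sizes and the ratio for a .gz file; a quick check on the bin.gz above:

$ gzip -l bin.gz

A ratio close to 0% (or even negative) means the input was effectively already compressed.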

cmd:bzip2 

Usage 

cd ~/www/gens
tar -cvf - rget | bzip2 > /tmp/rget.tbz
bzip2 -t /tmp/rget.tbz
bzip2 -d -c /tmp/rget.tbz | tar -tvf -
bzip2 -d -c /tmp/rget.tbz | tar -xvf -
bakd rget -d /tmp
$ dir /tmp/rget.*
-rw-rw----    1 tong     tong          506 Apr 18 17:53 /tmp/rget.tbz
-rw-rw----    1 tong     tong          473 Apr 18 17:55 /tmp/rget.tgz
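
If the installed GNU tar was built with bzip2 support, the pipe can be skipped; a sketch of the same backup using tar's -j switch (older GNU tar releases spelled it -I or --bzip2):

cd ~/www/gens
tar -cjvf /tmp/rget.tbz rget
tar -tjvf /tmp/rget.tbz
tar -xjvf /tmp/rget.tbz -C /tmp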

Info 

bzip2 is a freely available, patent free (see below), high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.

Source 

http://sources.redhat.com/bzip2/index.html

DESCRIPTION 

bzip2 compresses files using the Burrows-Wheeler block sorting text compression algorithm, and Huffman coding. Compression is generally considerably better than that achieved by more conventional LZ77/LZ78-based compressors, and approaches the performance of the PPM family of statistical compressors.

The command-line options are deliberately very similar to those of GNU gzip, but they are not identical.
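
For everyday use the flags do line up with gzip; a minimal sketch (the file names are placeholders):

bzip2 -9 -k file        # compress to file.bz2, keep the original
bzip2 -t file.bz2       # test integrity
bzip2 -d file.bz2       # decompress back to file
bzcat file.bz2 | less   # read without writing a decompressed copy

One difference worth remembering: bzip2 has no -r, so directory trees have to be walked with tar or find first.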

MEMORY MANAGEMENT 

bzip2 compresses large files in blocks. The block size affects both the compression ratio achieved, and the amount of memory needed for compression and decompression. The flags -1 through -9 specify the block size to be 100,000 bytes through 900,000 bytes (the default) respectively.

In general, try to use the largest block size memory constraints allow, since that maximises the compression achieved. Compression and decompression speed are virtually unaffected by block size.
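
To see the trade-off on a particular file, the block size can be forced at both ends of the range; a sketch with a hypothetical big.tar:

bzip2 -1 -c big.tar > big-100k.tar.bz2   # 100k blocks, smallest memory footprint
bzip2 -9 -c big.tar > big-900k.tar.bz2   # 900k blocks (the default), best ratio
ls -l big-100k.tar.bz2 big-900k.tar.bz2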

RECOVERING DATA FROM DAMAGED FILES 

bzip2 compresses files in blocks, usually 900kbytes long. Each block is handled independently. If a media or transmission error causes a multi-block .bz2 file to become damaged, it may be possible to (partially) recover (good) data from the undamaged blocks in the file.

The compressed representation of each block is delimited by a 48-bit pattern, which makes it possible to find the block boundaries with reasonable certainty. Each block also carries its own 32-bit CRC, so damaged blocks can be distinguished from undamaged ones.

bzip2recover is a simple program whose purpose is to search for blocks in .bz2 files, and write each block out into its own .bz2 file. You can then use bzip2 -t to test the integrity of the resulting files, and decompress those which are undamaged.

bzip2recover should be of most use dealing with large .bz2 files, as these will contain many blocks. It is clearly futile to use it on damaged single-block files, since a damaged block cannot be recovered. If you wish to minimise any potential data loss through media or transmission errors, you might consider compressing with a smaller block size.
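
A sketch of the recovery procedure on a hypothetical damaged backup.tar.bz2 (the exact rec*.bz2 names bzip2recover writes vary slightly between versions):

bzip2recover backup.tar.bz2                 # writes one rec*.bz2 file per block found
bzip2 -t rec*backup.tar.bz2                 # see which blocks are still intact
bzip2 -dc rec*backup.tar.bz2 | tar -tvf -   # recover what tar can still read

Any rec*.bz2 that fails the test should be deleted before the last step, so that only the good blocks are decompressed.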

cmd:unarj 

Info 

Source 

http://w3.linux.tucows.com/files/console/file/unarj-2.43.tar.gz

Related Urls 

Build & Installation 

Steps 
tfe ~/dl/mustH_b/va/unarj-2.43.tar.gz
make clean
make
make INSTALLDIR=/opt/bin install
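
Once built, unarj only extracts; a minimal sketch against a hypothetical foo.arj archive:

/opt/bin/unarj l foo.arj    # list the archive contents
/opt/bin/unarj t foo.arj    # test the archive
/opt/bin/unarj x foo.arj    # extract with full pathnames (e extracts flat)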

Compressed backups 

http://www.faqs.org/docs/linux_admin/x2717.html

Backups take a lot of space, which can cost quite a lot of money. To reduce the space needed, the backups can be compressed. There are several ways of doing this. Some programs have support for compression built in; for example, the --gzip (-z) option for GNU tar pipes the whole backup through the gzip compression program before writing it to the backup medium.
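
A sketch of such a single-stream compressed backup with GNU tar (the paths are placeholders):

tar --gzip -cvf /tmp/home.tgz /home/tong    # -z is the short form; the whole archive becomes one gzip stream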

Unfortunately, compressed backups can cause trouble. Due to the nature of how compression works, if a single bit is wrong, all the rest of the compressed data will be unusable. Some backup programs have built-in error correction, but no method can handle a large number of errors. This means that if the backup is compressed the way GNU tar does it, with the whole output compressed as a unit, a single error makes the rest of the backup unusable. Backups must be reliable, so this method of compression is not a good idea.

An alternative way is to compress each file separately. This still means that one file is lost, but all the other files are unharmed. The lost file would have been corrupted anyway, so this situation is not much worse than not using compression at all. The afio program (a variant of cpio) can do this.
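
A sketch of such a per-file compressed backup, assuming afio is installed; -o writes an archive from the file list on stdin and -Z runs each file through gzip individually:

find /home/tong -print | afio -oZ /tmp/home.afio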

Compression takes some time, which may make the backup program unable to write data fast enough for a tape drive. This can be avoided by buffering the output (either internally, if the backup program is smart enough, or by using another program), but even that might not work well enough. This should only be a problem on slow computers.
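
One way to decouple compression speed from tape speed is an external buffering program; a sketch using mbuffer (the buffer size and tape device are assumptions):

tar -cvf - /home/tong | bzip2 | mbuffer -m 128M -o /dev/st0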

From "The Linux System Administrator's Guide" Version 0.7 Chapter 12.

documented on: 2005.05.05