which is better: find|cpio or tar 

http://aa11.cjb.net/hpux_admin/1997/0249.html

I asked "which is better (and WHY?): find|cpio or tar for copying a disk to a slightly smaller drive."

About 15 replies have arrived, most of which pointed out my lack of "-depth" in my example. Valid point; mea culpa. -depth will preserve the time-stamps of directories. Two folks also added the "u" (unconditional copy) option, which was immaterial in my case, since i was going to a blank disk.

Mark Jones at Motorola suggested the obvious test (not done due to other time pressures) and replied that tar cannot handle very long pathnames, which -is- a very good reason to use the find|cpio method.

> From: Mark Jones <mjones@pencom.com>
>   To answer your question, try both with the time command:
>     time tar -cf - * | (cd /other_place; tar -xf - )
>     time find . -xdev -depth -print | cpio -pmdux /other_place
>
>  Tar has a limitation on the number of characters it can
>  read in a path.  Since we have long paths here, we always
>  use cpio.  I don't know if that limitation was fixed
>  in 10.X.  We are still running 9.05.

(as it happened, my copy-over was on a v9.01 system)
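
(if long pathnames are a worry, a quick -- untested by me -- check for
any over tar's classic 100-character header limit would be something
like:)

   find . -print | awk 'length($0) > 100'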

Tom Coates (tom_coates@trimble.com) preferred find|cpio for these reasons:

>  I doubt there would be much difference in speed, since most of the
>  time is probably taken up with just transferring the data.  This
>  seems even more likely since you are copying an entire disk.
>
>  I've always used find|cpio, as it gives very good control over
>  preserving file modification dates, etc.  Also, once you get good
>  at writing find commands, you can filter the copied files to get
>  only what you want.  I've had fits in the past with tar, trying to get
>  it to copy several trees from different locations, involving symbolic
>  links into a single new location.  Probably the nicest feature of
>  find|cpio is that you can work out the find command first, to see
>  exactly what is going to be copied.  Then you just repeat the command
>  piped to cpio to do the actual copying.
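
(a sketch of that work-out-the-find-first approach; the "core" filter
here is just a made-up example of something you might exclude:)

   cd /old_disk
   find . -xdev -depth ! -name core -print          # dry run: eyeball the list
   find . -xdev -depth ! -name core -print | cpio -pdxm /new_disk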

Chris Marble (chris_marble@hmc.edu) suggested adding the "-depth", and said:

>  I think it's just personal taste.  I always recommend the
>  cpio command and have been posting it for about 2 1/2 years when
>  anyone asks.  On my SGI systems I use dump and restore to copy disks.
>  I think the cpio could handle CDFs (Context Dependent Files)
>  better than anything else.  But CDFs don't exist in HP-UX 10.

Tony Kruse (akruse1@ford.com) suggested a third method (which had crossed my mind, but not in the multi-reader sense he mentions):

>  I always use
>  fbackup -c /usr/adm/fbackupfiles/backup.config -i /usr -f - | \
>       (cd /mnt; frecover -Xrf -)
>  since I can specify 6 filesystem readers to keep the 1 fbackup
>  writer busy in my backup.config file.
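
(the backup.config line that does that is, i believe, the
"readerprocesses" keyword -- check fbackup(1M) to be sure:)

   readerprocesses 6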

One reader didn't notice that i was going to a -smaller- disk and suggested "dd" (a block-for-block image of the larger disk simply won't fit); another warned me that tar might not do symbolic links (it does).

For what it's worth, my find|cpio of a 1.8-gigs-used disk on a 9000/710 v9.01 took about 1.5 hours -- roughly a third of a megabyte per second. (Digital Equip DSP3210 to HP C2490A)

One thing find|cpio did NOT do "properly": in a large number of instances it couldn't/wouldn't set a file's group to match the original. Usually the old group was "other", and it ended up "sys". Tar, by contrast, slavishly sets the owner/group to their original numeric values, even if those don't exist on the target machine (in a cross-machine operation).
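
(a crude way to audit for such mismatches after the copy, while both
disks are still mounted, is to diff recursive listings:)

   (cd /old_disk; ls -lR) > /tmp/old.lst
   (cd /new_disk; ls -lR) > /tmp/new.lst
   diff /tmp/old.lst /tmp/new.lst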

dick seymour

p.s.: in OpenVMS the command to use is "BACKUP/image in-drive out-drive"

My original posting (-depth added):

>I've got two similar, but not-exactly-equal, disks... and i want to
> copy the filesystem (in this instance: not root, not needing to be bootable)
> from one to the other.
>
>Which method is "better", and why?
>
>  cd /old_disk ; find . -depth -print | cpio -pdxm /new_disk
>
> or
>
>  cd /old_disk ; tar -cf - . | (cd /new_disk; tar -xf - )
>
>In previous exercises like this, i've happily used "tar", and the
> results seemed to perform properly.  However i've noticed postings
> here recommending the "find" route... which i'm using at the
> moment to migrate users from a screeching 2.1 gig disk to a new 2.06
> gig disk.
>
>So, what's the difference? Which is faster? Which is fraught with peril?
> "tar" would seem to expend more-than-necessary CPU cycles, since its
>  original goal was an archive file with internal structure, but i'm
>  primarily concerned with wall-clock time.