...making Linux just a little more fun!


The Answer Gang

By Jim Dennis, Karl-Heinz Herrmann, Breen, Chris, and... the Editors of Linux Gazette... and You!



(?) tar and find

From anonymous

Answered By: Thomas Adam

I'd like to tar up the contents of /var/www but I'd like to exclude a couple of directories.

I usually use

tar -zcvf www.tar.gz /var/www

but that does everything.

Ideas, please?

(!) [Thomas] GNU tar lets you give an exclude wildcard directly on the command line, rather than reading the exclusions from a file (which is what its -X/--exclude-from option does):
tar -czvf foo.tar.gz --exclude='*foo*' /var/www
... would exclude every file and directory whose name matches that wildcard.
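For the original question (skip a couple of specific directories), the --exclude option can simply be repeated, once per directory. What follows is a minimal sketch assuming GNU tar; "cache" and "logs" are hypothetical directory names standing in for whatever you want to leave out, and the script builds a throwaway tree in a temporary directory so it is safe to run:

```shell
#!/bin/sh
# Sketch only: assumes GNU tar. "cache" and "logs" are hypothetical names.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# A small stand-in for /var/www.
mkdir -p "$demo/www/html" "$demo/www/cache" "$demo/www/logs"
echo 'hello' > "$demo/www/html/index.html"
echo 'junk'  > "$demo/www/cache/page.tmp"
echo 'log'   > "$demo/www/logs/access.log"

# -C makes the stored names relative ("www/..."); each --exclude names one
# directory to skip. Both options are placed before the name they apply to.
tar -czf "$demo/www.tar.gz" -C "$demo" \
    --exclude='www/cache' --exclude='www/logs' www

# List what was archived: only www/, www/html/ and www/html/index.html.
tar -tzf "$demo/www.tar.gz"
```

An excluded directory is skipped as a whole, so tar never descends into it.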
Of course, if you need finer control than a single wildcard gives you, this is where you really want to use find. Here's an example. I have a directory "tar" which contains some files and two directories:
[n6tadam@station tar]$ ls -lFh
total 20K
-rw-r--r--  1 n6tadam n6tadam    4 Jan 17 15:05 a
-rw-r--r--  1 n6tadam n6tadam   34 Jan 17 15:31 b
-rw-r--r--  1 n6tadam n6tadam   32 Jan 17 15:31 c
drwxr-xr-x  2 n6tadam n6tadam 4.0K Jan 17 15:05 foo/
drwxr-xr-x  2 n6tadam n6tadam 4.0K Jan 17 15:04 foo2/
Now let us assume that I only want to tar the files a, b and c, and exclude ./foo and ./foo2 (./foo{,2} in brace-expansion shorthand). What you really want is to generate the list of files with find first. To begin with, here's how to exclude a single directory:
find . -path './foo' -prune -o -print
... and note the syntax. The "." means the search starts in the current directory, so we need to be in the directory we want to search. The -path test matches a shell-style pattern against the whole pathname as find reports it (here "./foo"); "." and "/" get no special treatment, so they simply have to match literally. When it matches, -prune returns true and tells find not to descend into that directory (note that -prune has no effect if -depth is in use). Then "-o" (OR) says: for everything else, -print the results [1].
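To see why the explicit -print matters, here is a small sketch (assuming GNU find) that recreates a tree like the one above in a temporary directory. It uses the common variant that adds -type f, and runs it with and without -print. Without it, find wraps the whole expression in an implicit -print, and because -prune is true for ./foo, that directory name is still listed, though nothing beneath it is:

```shell
#!/bin/sh
# Sketch: assumes GNU find; the tree mirrors the example listing above.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
touch a b c
mkdir foo foo2
touch foo/x foo2/d foo2/e foo2/f

# Explicit -print: ./foo is pruned and never printed.
with_print=$(find . -path './foo' -prune -o -type f -print)

# No explicit action: find applies -print to the whole OR expression, and
# "-path ./foo -prune" is true for ./foo, so ./foo itself is printed
# (its contents are still skipped, because it was pruned).
without_print=$(find . -path './foo' -prune -o -type f)

echo "with -print:";    echo "$with_print"
echo "without -print:"; echo "$without_print"
```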
On running that command:
[n6tadam@station tar]$ find . -path './foo' -prune -o -print
.
./a
./b
./c
./foo2
./foo2/d
./foo2/e
./foo2/f
... you'll see ./foo has been excluded. But how do you match more than one exclusion? I might not want ./foo or ./foo2 to be in my tar archive. Ok, here's how:
find . \( -path "./foo" -prune -o -path "./foo2" \) -prune -o -print
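The parentheses group the two -path tests so that the trailing -prune covers whichever one matches. They aren't strictly necessary in this example (chaining -path "./foo" -prune -o -path "./foo2" -prune -o -print works just as well), and the first -prune inside them is redundant. It's just an extension of the command we saw earlier.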
[n6tadam@station tar]$ find . \( -path "./foo" -prune -o -path "./foo2" \)  -prune -o -print
.
./a
./b
./c
... which leaves us with the desired result.
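A slightly tidier way to write the same thing is to OR the two -path tests together inside the parentheses and prune once. A sketch, assuming GNU find, run against a throwaway copy of the example tree:

```shell
#!/bin/sh
# Sketch: one -prune covering both directories; tree mirrors the example.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
touch a b c
mkdir foo foo2
touch foo/x foo2/d foo2/e foo2/f

# \( A -o B \) is true for ./foo or ./foo2; -prune then stops find from
# descending into whichever one matched, and "-o -print" lists the rest.
result=$(find . \( -path './foo' -o -path './foo2' \) -prune -o -print)
echo "$result"
```

The output is the same four names as above (".", "./a", "./b" and "./c"), in whatever order the filesystem returns them.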
Now the fun stuff. How do you get tar to archive the files find gives us? For ease of use, we'll modify our find command to print full pathnames rather than names relative to "./" (which isn't at all helpful to us):
[n6tadam@station tar]$ find $(pwd) \( -path "$(pwd)/foo" -prune -o -path "$(pwd)/foo2" \) -prune -o -print
/tmp/tar
/tmp/tar/a
/tmp/tar/b
/tmp/tar/c
Now we can see what's what. You might think that it's then just a case of doing:
find $(pwd) \( -path "$(pwd)/foo" -prune -o -path "$(pwd)/foo2" \) -prune -o -print -exec tar -czvf ./foofile.tgz {} \;
... but in fact that won't work, because "-exec ... \;" runs the command once for every name find prints, like this:
tar -czvf ./foofile.tgz /tmp/tar
tar -czvf ./foofile.tgz /tmp/tar/a
tar -czvf ./foofile.tgz /tmp/tar/b
tar -czvf ./foofile.tgz /tmp/tar/c
... but there are two things wrong with this. First, it passes "/tmp/tar" itself to tar as an entry, and because tar recurses into any directory it's given, that alone would archive everything under /tmp/tar, foo and foo2 included. That's not what we want: we *don't* want tar's recursive behaviour here, since it would undo all the work the find command did (more about that in a minute).
The second problem is that each tar invocation creates the archive afresh, replacing the previous one rather than adding to it. Ouch! So if you were to look at that tar file afterwards, all you would see is "/tmp/tar/c", since that was the last file tar was run on.
Tar does have a way to add files to an existing archive: the "-r" (--append) option ("-A", by contrast, concatenates whole archives). But GNU tar can't append to a compressed archive such as our .tgz, so that's no use to us here.
In any case, "-exec ... \;" is a poor fit here: it starts a separate copy of tar for every file encountered, and each one overwrites the archive the previous one wrote.
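A quick way to convince yourself of the overwriting problem is to run the per-file -exec version on a throwaway directory and count the members of the result. A sketch, assuming GNU find and tar; the directory and archive names are made up, and a plain .tar is used since compression doesn't change the point:

```shell
#!/bin/sh
# Sketch: shows that "-exec tar -c ... {} \;" leaves only the last file.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo/src"
touch "$demo/src/a" "$demo/src/b" "$demo/src/c"

# One tar process per file; each -c recreates the archive from scratch.
# 2>/dev/null hides GNU tar's "Removing leading '/'" notices.
find "$demo/src" -type f -exec tar -cf "$demo/out.tar" {} \; 2>/dev/null

# Despite three files, the archive holds a single member.
members=$(tar -tf "$demo/out.tar" | wc -l | tr -d ' ')
echo "members: $members"
```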
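So, we'll use xargs instead. It reads the names on its standard input and adds them to the end of a single command line, so that what actually runs is something like this (provided the list fits on one command line; if it doesn't, xargs runs tar more than once, and each run overwrites the archive again):
tar -czvf ./foofile.tgz /tmp/tar /tmp/tar/a /tmp/tar/b /tmp/tar/c
Which is almost exactly what we want. But we still have to stop tar from recursing into that "/tmp/tar" entry, and there's an option to tar to do just that: "--no-recursion". The directory entry itself is still stored, but not its contents.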
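Here is a small sketch of what --no-recursion does, assuming GNU tar (the paths are made up): given a directory name, tar stores the directory entry itself but none of its contents, so every file you want must be named explicitly, which is exactly the job find is doing for us.

```shell
#!/bin/sh
# Sketch: compare GNU tar with and without --no-recursion.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo/dir"
touch "$demo/dir/a" "$demo/dir/b"

# Default: naming a directory archives it and everything under it.
tar -cf "$demo/recursive.tar" -C "$demo" dir
# --no-recursion: only the names actually given are stored.
tar -cf "$demo/flat.tar" -C "$demo" --no-recursion dir dir/a

echo "recursive:"; tar -tf "$demo/recursive.tar"
echo "flat:";      tar -tf "$demo/flat.tar"
```

The first archive holds dir/, dir/a and dir/b; the second holds only dir/ and dir/a.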
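The other thing to take into account is filenames. Even if you're sure the filenames are sane, it is still good practice to assume the worst: by default xargs splits its input on blanks and newlines, and treats quotes and backslashes specially, so a name containing a space would be split in two. Modifying our find command, we can have it terminate each filename with a null character ('\0') instead of a newline; a null is the one character that can never appear in a pathname. The "-print0" option to find does this: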
find $(pwd) \( -path "$(pwd)/foo" -prune -o -path "$(pwd)/foo2" \) -prune -o -print0
... which'll give us:
/tmp/tar/tmp/tar/a/tmp/tar/b/tmp/tar/c
On a terminal that looks useless, since the null separators are invisible. But "xargs -0" splits its input on exactly those nulls, so that's not a problem: it's just a means of protecting the filenames so that they're not mangled.
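To see the mangling that -print0 and xargs -0 protect against, here's a sketch with a made-up filename containing a space, assuming GNU find and xargs. Plain xargs splits the name into two arguments; the null-separated pipeline keeps it whole. (printf '%s\n' just prints each argument on its own line, so we can see how many there were.)

```shell
#!/bin/sh
# Sketch: assumes GNU find and xargs; "my file" is a made-up name.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
touch 'my file'

# Newline-separated names: xargs splits on the blank -> two arguments.
plain=$(find . -type f -print | xargs printf '%s\n')
# Null-separated names: xargs -0 splits only on '\0' -> one argument.
safe=$(find . -type f -print0 | xargs -0 printf '%s\n')

echo "plain xargs:"; echo "$plain"
echo "xargs -0:";    echo "$safe"
```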
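So if we piece my ramblings together, the command you'll actually want to use is:
find "$(pwd)" -path "$(pwd)/foo" -prune -o -path "$(pwd)/foo2" -prune -o -print0 | xargs -0t tar --no-recursion -PSczf "$(pwd)/foofile.tgz"
Note the "-t" option to xargs(1). This just prints out the command (as you might have typed it on the command line) before it is executed. Among the tar options, -P keeps the leading "/" on the stored names (GNU tar strips it otherwise) and -S handles sparse files efficiently. One caveat: the archive is written inside the directory being searched, so on a second run the old foofile.tgz will show up in find's output as well; writing the archive somewhere outside the tree avoids that.
As a check, we can now verify that the command worked: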
[n6tadam@station tar]$ tar -tzf ./foofile.tgz
/tmp/tar/
/tmp/tar/a
/tmp/tar/b
/tmp/tar/c
... and yep, it did. So you can go ahead and modify it at will. Is this any easier than creating a file with a list of entries? Probably not...
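One thing to watch with the xargs approach: if the list of names is too long for one command line, xargs runs tar more than once, and each "tar -c" overwrites the archive. With GNU tar you can sidestep xargs altogether and hand tar the null-separated list on its standard input, using --null together with -T - (--files-from=-). This is a swapped-in alternative rather than the command used above; here is a sketch on a throwaway copy of the example tree, assuming GNU find and GNU tar:

```shell
#!/bin/sh
# Sketch: assumes GNU find and GNU tar. tar reads the names itself, so
# there is only ever one tar process, however many files there are.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/tar/foo" "$demo/tar/foo2"
touch "$demo/tar/a" "$demo/tar/b" "$demo/tar/c" "$demo/tar/foo2/d"
cd "$demo/tar"

# -T - reads the file list from stdin; --null says it is '\0'-separated.
# The archive is written outside the searched tree so find can't list it.
find . \( -path './foo' -o -path './foo2' \) -prune -o -print0 |
    tar -czf "$demo/foofile.tgz" --null --no-recursion -T -

tar -tzf "$demo/foofile.tgz"
```

Because the list never goes through a command line, its length doesn't matter.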
-- Thomas Adam
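[1] Some older, non-GNU versions of find require an explicit -print. It matters here with GNU find too: without it, find applies an implicit -print to the whole expression, and because -prune is true for ./foo, that directory name would still be listed (though not its contents).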

This page edited and maintained by the Editors of Linux Gazette
HTML script maintained by Heather Stern of Starshine Technical Services, http://www.starshine.org/


Each TAG thread Copyright © its authors, 2005

Published in issue 111 of Linux Gazette February 2005
