find(1): emphasizing portability and the fine details of a few basic issues
2009-03-11 (see recent changes)
An often recommended way to limit find(1) to one
level (i.e., not descending into directories) is using the
expression "-maxdepth 1".
This expression was introduced by GNU findutils. FreeBSD 4.1, NetBSD 2.0, OpenBSD 2.0 and the AST toolchest adopted this. Mac OS X implements it since 10.2 (Darwin 6), switching from NetBSD find to FreeBSD 4.5 find.
But it's also possible to do this with the traditional find [1].
[1] You need a find(1) that offers the expression "-prune". Nowadays this is generally available, because it was already introduced with 4.3BSD-Reno and SVR4 (i.e., about 1989). An exception to this rule is the busybox multi-call binary, which emphasizes minimalistic replacements: it has implemented -prune since 2007-06-01, and there it's a compile-time option.
The portable way to avoid descending is
$ find . ! -name . -prune <remaining expressions>
$ find /etc/. ! -name . -prune <remaining expressions>
Usually, the second works as well (but I know of Cray Unicos as an exception).
(See further below for "find /etc [...]" and "find /etc/ [...]".)
The order of arguments is crucial here, because they are not options but expressions.
The explanation sounds obvious: find never
lists the '..' entry. If you also exclude the
'.' entry and then apply "-prune" to all
the remaining entries, find certainly won't descend anymore.
If you ever need the "." entry included (pointed out by Stephane Chazelas in comp.unix.shell),
then use "find . \( -name . -o -prune \)".
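Both variants can be tried out with a minimal sketch like the following. It assumes a scratch directory created with mktemp(1), which itself is not available on all the old systems mentioned here, and the file names are made up for illustration.

```shell
# Scratch tree; the names "a", "sub" and "b" are hypothetical.
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/sub"
: > "$tmp/a"
: > "$tmp/sub/b"

# Only the entries directly below ".": "sub" is listed, but not entered.
one_level=$(cd "$tmp" && find . ! -name . -prune -print | LC_ALL=C sort)

# The same, but with the "." entry itself included.
with_dot=$(cd "$tmp" && find . \( -name . -o -prune \) -print | LC_ALL=C sort)

printf 'without dot:\n%s\n' "$one_level"
printf 'with dot:\n%s\n' "$with_dot"
rm -rf "$tmp"
```

In both cases "sub/b" should not show up, because "sub" is pruned before find can descend into it.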
About the portability:
$ find /etc/. ! -name /etc/. -prune <remaining expressions>
$ find /etc ! -name /etc -prune <remaining expressions>
and so on; because here the string is compared literally (and not with
an internal "basename" mechanism).
The results of this call will look like
"/etc/./hosts", which resolves without any problems on Unix.
Here, be careful when additionally applying a pattern match on the result of the "-prune". At least the GNU find versions from 3.8 up to the stable 4.1 (and up to the alpha release 4.1.5) suffer from a bug which prevents this from working:
$ find . ! -name . -prune -name '<pattern>' <remaining expressions>
There, "-name" is applied not only to the result of "-prune" but to all existing entries,
because it is "optimized" to the left of "-prune".
This violates the fundamental left-to-right order of evaluation.
By the way: if you negate the second "-name", then the bug is apparently
not triggered anymore.
Thus, undoing the negation by simply
doubling it looks like a proper workaround:
$ find . ! -name . -prune ! \( ! -name '<pattern>' \) <remaining expressions>
(See <3D7355C4.23L11TSIK@bigfoot.de> and
<3DA23EDA.LHE11XWCK@bigfoot.de>.)
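On a find without this bug, the plain and the doubly negated form should select the same entries. The following is only a sketch to check that on a given system; it assumes mktemp(1) and uses hypothetical file names.

```shell
# Scratch tree; "adir/avocado" would match 'a*' but sits one level too deep.
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/adir"
: > "$tmp/apple"
: > "$tmp/banana"
: > "$tmp/adir/avocado"

# Straightforward form: prune first, then match the pattern.
plain=$(cd "$tmp" && find . ! -name . -prune -name 'a*' -print | LC_ALL=C sort)

# Workaround form: the pattern match is negated twice.
doubled=$(cd "$tmp" && find . ! -name . -prune ! \( ! -name 'a*' \) -print | LC_ALL=C sort)

printf 'plain:\n%s\n' "$plain"
printf 'doubled:\n%s\n' "$doubled"
rm -rf "$tmp"
```

If "plain" additionally lists "./adir/avocado", the find at hand evaluates "-name" before "-prune".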
"find /etc ! -name etc -prune <remaining expressions>"
This won't omit files like "/etc/etc/file".
Thanks to Stéphane Chazelas for reminding me of this.
And interestingly, on SCO OpenServer 5.0.6, this requires
"! -name /etc" instead of
"! -name etc", for top level directories (only).
"find /etc/ ! -name etc -prune <remaining expressions>"
This won't omit files like "/etc/etc/file" either; and
the actual output even varies, because on many systems
you mustn't append a slash to the path argument. This depends on the
find-internal "basename" implementation, which is often simpler than
the libc implementation:
The above will fail with a trailing slash for example on AIX 3.2, FreeBSD 4.3, 4.5, GNU-findutils-4.1, Irix 6.5, 5.3, NetBSD 1.5.2, OpenBSD 2.9, OpenServer 5.0.6, Solaris 2.1-2.9 and SunOS 4.1.4.
It will work properly on AIX 4.3, HP-UX 10.x, NetBSD 1.5 and OSF1/V4, /V5.
On some of the affected versions listed above (all Solaris, FreeBSD 4.3, OpenBSD 2.9, Irix 5/6), you can even illustrate the "basename bug" with an interesting workaround. Both of the following examples yield the same correct result, although an empty argument is provided in the second case:
$ find /etc ! -name etc -prune -print
$ find /etc/ ! -name '' -prune -print
This interesting side effect is the main reason why I mention these two unportable variants.
Why use find at all? It's helpful for handing over the results of
find to other commands (like
chown, chmod, etc.). Otherwise,
you would have to fiddle around with more complicated shell pattern
matching [2] or the like.
[2] See also <8ned6b$k8$1@nnrp1.deja.com> and ff., "Command to find out if a directory is empty", in comp.unix.shell, for a discussion of patterns like ".??* .[!.]* ? *".
find
Another frequently mentioned feature of GNU findutils [3]
is a special combination
with xargs, "[...] -print0 | xargs -0".
The purpose is to increase performance by avoiding a fork/exec
for each single argument;
and the use of null-terminated
filenames avoids problems with unexpected filenames.
[3] NetBSD 1.0, FreeBSD 2.0.5 (and thus Mac OS X from the start), OpenBSD 2.1, the AST toolchest and HP-UX 11.23 incorporated this, too.
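What "unexpected filenames" means here can be illustrated with a file name that contains a newline. This is a sketch only: it assumes a find with "-print0", an xargs with "-0" and mktemp(1), none of which are available everywhere, and the file names are hypothetical.

```shell
# Two files, one of them with a newline embedded in its name.
tmp=$(mktemp -d) || exit 1
: > "$tmp/plain"
: > "$tmp/with
newline"

# Line-oriented output: the second name is split over two lines.
naive=$(find "$tmp" -type f -print | wc -l | tr -d ' ')

# Null-terminated output: xargs hands over exactly one argument per file.
safe=$(find "$tmp" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "lines seen with -print: $naive"
echo "arguments seen with -print0 | xargs -0: $safe"
rm -rf "$tmp"
```

The line count is off by one for the two files, while the null-terminated variant counts them correctly.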
However, various find implementations know about the expression
+ (instead of ;) in connection with -exec.
This increases performance in the same way, but obsoletes xargs.
It's much simpler then:
$ find . -name xxx -exec command {} +
This has become a standard: it's specified in the
SUSv3,
aka IEEE 1003.1-2001/2004.
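The difference between ";" and "+" can be made visible by counting how often the command is started. A minimal sketch, assuming a find that implements "-exec ... {} +" and mktemp(1); the file names are hypothetical:

```shell
# Five empty files in a scratch directory.
tmp=$(mktemp -d) || exit 1
for i in 1 2 3 4 5; do
  : > "$tmp/file$i"
done

# With ";" the command is started once per file.
per_file=$(find "$tmp" -type f -exec sh -c 'echo call' sh {} \; | wc -l | tr -d ' ')

# With "+" the file names are collected and handed over in one call
# (or in a few calls, if the argument list would become too long).
batched=$(find "$tmp" -type f -exec sh -c 'echo call' sh {} + | wc -l | tr -d ' ')

echo "calls with ';': $per_file"
echo "calls with '+': $batched"
rm -rf "$tmp"
```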
find . -exec echo x {} +" works correct.
Some systems don't provide this feature, although you might have expected it. Either this feature
can be deactivated at compile time or - more likely - find originates from pre-SVR4 on these systems:
Having "-exec cmd {} +" in mind, the usefulness and elegance of "-print0"
is questionable,
because most utilities (including the whole traditional
toolchest) don't handle null-terminated output.
Stéphane Chazelas pointed out that at least
GNU grep (-Z), GNU sort (-z), perl (-0) and zsh (via IFS)
understand this output. Apart from these, some utilities from the new AT&T ast toolchest
also understand
it and the bash "read" built-in knows an
appropriate option (-d delimiter) since release 2.04.
Another reason to avoid xargs: possible problems with the character set encoding of arguments for xargs.
For example, some xargs implementations (e.g. SunOS 5) yield an error, if you run them in a multi-byte locale
with path names encoded in a single-byte locale (spotted by S.Chazelas and G.Clare in c.u.s).
-name expression shall do matching
like pattern matching, not like filename expansion.
find . -name '*'"
matches ".". Think of "case filename in pattern) ...".
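The difference between the two kinds of matching can be sketched as follows. The file names are hypothetical, mktemp(1) is assumed, and the find result depends on the implementation at hand, so no particular output is promised for it.

```shell
# One dot file and one ordinary file.
tmp=$(mktemp -d) || exit 1
: > "$tmp/.hidden"
: > "$tmp/visible"

# Filename expansion in the shell: a leading dot has to be matched explicitly.
globbed=$(cd "$tmp" && echo *)

# Pattern matching as in "case": the dot is not special.
case_match=no
case .hidden in
  *) case_match=yes ;;
esac

# find with pattern matching semantics is expected to behave like "case",
# that is, to list "." and "./.hidden" as well.
found=$(cd "$tmp" && find . -name '*' -print | LC_ALL=C sort)

echo "shell glob: $globbed"
echo "case matches .hidden: $case_match"
printf 'find -name:\n%s\n' "$found"
rm -rf "$tmp"
```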
On Version 7, find implements the match with
glob(3) and thus handles the dot special.
The same applies to System III and at least until 4.3BSD-Tahoe.
In the BSD line, find on 4.3BSD-Reno switched to
fnmatch(3), so the dot is not special anymore.
What about later implementations?
The dot is special on
* before the dot,
that is, '*.' matches '.'.
The dot isn't special on:
"-print"?
Originally, find(1) required "-print".
However, it was changed to be the default action on 4.3BSD-Reno
and in POSIX. Thus it was still required on earlier SVR4-derived
systems. I had a look at some implementations:
It is required on:
AIX 3.2, IRIX 5.3, SunOS 4.1.4, SunOS 5.1-5.4, Ultrix 4.5,
and also MUNIX 3.2, EP/IX 2.2.1AA
It's not required on:
Free BSD variants (Free-, Net-, OpenBSD) and BSDi/OS 4.1 and Minix 3.1.1,
GNU, AIX 4.3, HPUX 8 ff., IRIX 6.5, OpenServer 5.0.6 / 6.0.0, OSF1/V4 ff.,
SunOS 5.5 ff., Unicos 9.0.2.2, UnixWare 2,
busybox-1.01, sfind v1.1, AST find
"-ls"?
Originally, find didn't know "-ls".
It was introduced with 4.3BSD (early '86).
However, it has never been implemented in the vanilla SysV line (up to SVR4.2).
So it was added individually on non-BSDs.
It is for instance available on:
Free-, Net- and OpenBSD as well as GNU
(all from their beginning),
BSDi/OS 4.1, OSF1/V4, AIX 3.2, SunOS 5.1, EP-IX 2.2.1AA (/bsd43/bin/find),
sfind and AST find (since 98-11-11)
It is not available on:
HP-UX 11.31, Irix 6.5, SCO OpenServer 5.0.6 / 6.0.0, Reliant Unix,
Unicos 9.0.2.2, EP/IX 2.2.1AA (/usr/bin/find), Ultrix 4.5, Minix 3.1.1.
By the way, the resulting output imitates "ls -ldis":
it includes inode number and size in blocks
(with system specific blocksize), sometimes it's documented as kilobytes.
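Where "-ls" is missing, a similar listing can be approximated by calling ls itself. This is only a sketch of such a substitute, not something the implementations above do internally; it assumes a find with "-exec ... {} +" (use "\;" otherwise) and mktemp(1), and the file name is hypothetical.

```shell
# One file in a scratch directory.
tmp=$(mktemp -d) || exit 1
: > "$tmp/file"

# "ls -ldis" prints the inode number and the size in blocks in front of
# the usual long listing; -d keeps ls from listing directory contents.
listing=$(find "$tmp" -type f -exec ls -ldis {} +)

printf '%s\n' "$listing"
rm -rf "$tmp"
```

The exact column layout and the block size depend on the ls at hand.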
"-path"?
Originally, find didn't know "-path".
It is for example available on:
GNU (since initial revision),
NetBSD (1.2, 12/93, sync with 4.4BSD),
FreeBSD and OpenBSD (initial revisions),
HP-UX at least since 10.x,
sfind (since 0.93, 7/'04), AST find.
It is not available on:
SunOS 5.10 (bin and xpg4), IRIX 6.5, AIX 6.1, OpenServer 6.0.0 (bin and u95), OpenUnix 8, OSF1 V5.1, Ultrix 4.2, Minix 2.0.
-printf"?find -print") with
Some implementations that support the options -H/-L:
Several modern implementations substitute {} even if it is embedded in a string like this
find . -exec echo xx{}xx \;
However, the traditional find requires {} to be a separate argument.
The new behaviour was introduced around 386BSD,
according to the FreeBSD manpage archive.
POSIX/SUS only specifies {} standing alone, but allows an embedded {}
as implementation-defined behaviour.
These variants for example accept an embedded {}:
· all traditional and free BSDs, BSDi/OS, Mac OS X
· GNU (probably from the beginning)
· AST (probably from the beginning)
· busybox since 1.1.0-pre1, 20051003 (where -exec was implemented itself)
· OpenServer 5.0.6
· OpenServer 6.0.0 (bin and posix)
but these don't accept an embedded {}:
· SunOS 5.9
· IRIX 6.5.22
· HP-UX 11.11 / 11.31
· sfind-1.1
· OpenServer 6.0.0 (u95-find)
· EP/IX 2.2.1
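If the file name has to end up inside a longer string regardless of the implementation, one way is to keep {} as a separate argument and to let a shell do the embedding. A minimal sketch, assuming a shell with the modern handling of the arguments after "-c" and mktemp(1); the file name is hypothetical:

```shell
# One file in a scratch directory.
tmp=$(mktemp -d) || exit 1
: > "$tmp/file"

# {} stands alone; the inner shell receives the name as $1 and embeds it.
embedded=$(cd "$tmp" && find . -type f -exec sh -c 'printf "xx%sxx\n" "$1"' sh {} \;)

echo "$embedded"
rm -rf "$tmp"
```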
Consider the call
$SHELL -c 'command' arg0 arg1 ...
Most shells set $0 to arg0 and $1 to arg1, which means:
However: early variants [4]
of both the Almquist shell and the Korn shell (before ksh88f)
implement this differently: $0 is set to $SHELL
and $1 is set to arg0. Here you need:
$SHELL -c 'command' arg1 arg2 ...
This means: If you can rely on modern shell implementations, you will use "find ... -exec sh -c '...' find-sh {}".
But what if you need a portable call?
Back to find: if you only want to make use of one argument
per call (that is, $1 but not $2, $3, etc.),
you can work around by merging both variants:
find ... -exec sh -c '...' {} {}
But if you want to use more parameters (or just "$@")
then you have to decide on one of both variants.
$SHELL -c 'shift $1; command' 2 1 arg1 arg2 ...
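How the "2 1" in front of the real arguments works can be traced with a small sketch. It is run here with "sh" and only shows the behaviour of the shell at hand; the comments describe what is expected from a shell with the modern behaviour.

```shell
# Modern behaviour: the first argument after the command string becomes $0.
modern=$(sh -c 'echo "$0 $1"' arg0 arg1)

# Modern shell: $0=2, $1=1, so "shift 1" drops the "1".
# Old shell:    $1=2, $2=1, so "shift 2" drops both numbers.
# Either way "$@" starts with arg1 afterwards.
robust=$(sh -c 'shift $1; echo "$@"' 2 1 arg1 arg2)

# Merged variant for a single argument: $1 is "arg1" in both cases.
merged=$(sh -c 'echo "$1"' arg1 arg1)

echo "modern: $modern"
echo "robust: $robust"
echo "merged: $merged"
```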
Example: You want to move all files in one directory to another directory,
but get an "arg list too long" due to ARG_MAX (too many files) with "mv * /dir".
For reasons of speed you might want to avoid a for-loop if you have
a really high number of files. Instead, you will use "find" with its + notation.
You will need the shell to hand over the last argument,
the target directory, to the "mv" command separately [5].
Here are two solutions; the first assumes a modern shell implementation,
the second is completely robust:
find . ! -name . -prune -exec sh -c 'mv "$@" targetdirectory/' sh {} +
find . ! -name . -prune -exec sh -c 'shift $1; mv "$@" targetdirectory/' 2 1 {} +
You do need the shell, because the following is not a valid syntax (it was not allowed to minimize possible confusion with a valid argument "+"):
find ... -exec mv {} target_directory +
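The first solution can be tried out on a scratch tree. This sketch assumes a modern shell, a find with "-exec ... {} +" and mktemp(1); the directory names "src" and "dst" are hypothetical, and the target is passed as an extra argument in front of {} instead of being written into the command string.

```shell
# Source directory with three files, and an empty target directory.
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/src" "$tmp/dst"
for i in 1 2 3; do
  : > "$tmp/src/f$i"
done
dst=$tmp/dst

# The inner shell takes the target from $1, drops it from the list,
# and calls mv once with all collected names plus the target as last argument.
(cd "$tmp/src" &&
  find . ! -name . -prune -type f \
    -exec sh -c 'dst=$1; shift; mv "$@" "$dst"/' sh "$dst" {} +)

moved=$(ls "$tmp/dst" | wc -l | tr -d ' ')
left=$(ls "$tmp/src" | wc -l | tr -d ' ')
echo "moved: $moved, left behind: $left"
rm -rf "$tmp"
```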
[4] Shell implementations with the old behaviour about arg0:
Variants older than ksh88f exist for example on
· HP-UX 8-11 ("ksh" but not "sh"): ksh88c
· AIX 3.2: ksh88d
· Ultrix 4.5: ksh88a-c?
· Unicos 9: ksh88e
Older variants of ash exist on
· all traditional BSDs that come with an ash (4.3BSD-Net/2 ... 4.4BSD-Lite2)
· FreeBSD before 2.1.0 (10/95)
· NetBSD before 1.2 (10/96)
· Minix before 3.1.3 (5/06)
[5] GNU mv knows an option to prepend the target, which allows the following:
find [...] -exec mv --target-directory=dir {} +
Gunnar Ritter's Heirloom Toolchest implements several traditional variants and POSIX/SUS.
The AT&T AST toolchest is a POSIX/SUS implementation with numerous extensions.
The busybox toolchest aims at tiny implementations.
Jörg Schilling's sfind is a POSIX/SUS implementation.
[sco6] SCO OpenServer 6.0.0 provides three find variants
(thanks to Rodolfo Martín for access):
- /usr/bin/find
- /usr/bin/posix/find
- /u95/bin/find, which is interestingly more POSIX compliant than the posix variant, for example concerning pattern matching