84

Using find . -print0 seems to be the only safe way of obtaining a list of files in bash due to the possibility of filenames containing spaces, newlines, quotation marks etc.

However, I'm having a hard time actually making find's output useful within bash or with other command line utilities. The only way I have managed to make use of the output is by piping it to perl, and setting perl's input record separator ($/) to the null byte:

find . -print0 | perl -e '$/="\0"; @files=<>; print $#files;'

This example prints the number of files found, avoiding the danger of newlines in filenames corrupting the count, as would occur with:

find . | wc -l

As most command line programs do not support null-delimited input, I figure the best thing would be to capture the output of find . -print0 in a bash array, like I have done in the perl snippet above, and then continue with the task, whatever it may be.

How can I do this?

This doesn't work:

find . -print0 | ( IFS=$'\0' ; array=( $( cat ) ) ; echo ${#array[@]} )

A much more general question might be: How can I do useful things with lists of files in bash?

3 Comments
  • What do you mean by doing useful things? Commented Jul 12, 2009 at 22:04
  • Oh, you know, the usual things arrays are useful for: finding out their size; iterating over their contents; printing them out backwards; sorting them. That kind of thing. There is a wealth of utilities in unix for doing these things with data: wc, bash's for-loops, tac and sort respectively; but these all seem useless when dealing with lists which might have spaces or newlines in them, i.e. filenames. Piping data around with null-valued input-field-separators seems to be the solution, but very few utilities can handle this. Commented Jul 12, 2009 at 23:14
  • Here's an essay on how to properly handle filenames in shell, with lots of specifics: http://www.dwheeler.com/essays/filenames-in-shell.html Commented May 23, 2010 at 16:54

13 Answers

113

Shamelessly stolen (with some changes) from Greg's BashFAQ:

a=()
while IFS= read -r -d '' file; do
    a+=("$file")        # or however you want to process each file
done < <(find /tmp -type f -print0)

Note that the redirection construct used here (cmd1 < <(cmd2)) is similar to, but not quite the same as, the more usual pipeline (cmd2 | cmd1). If the commands involved are shell builtins (e.g. while), the pipeline version executes them in subshells, and any variables they set (e.g. the array a) are lost when those subshells exit. cmd1 < <(cmd2) only runs cmd2 in a subshell, so the array lives past its construction. Warning: this form of redirection is available only in bash (not even in bash's sh-emulation mode), so you must start your script with #!/bin/bash.
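The difference is easy to demonstrate. Here is a minimal sketch, using a throwaway directory so the counts are predictable; the pipeline version ends up with an empty array, while the process-substitution version keeps all three names, including the one containing a newline:

```shell
#!/bin/bash
# Sketch: why the pipeline loses the array but < <() keeps it.
dir=$(mktemp -d)
touch "$dir/one" "$dir/two with space" "$dir/three"$'\n'"newline"

a=()
find "$dir" -type f -print0 | while IFS= read -r -d '' f; do a+=("$f"); done
echo "${#a[@]}"    # prints 0: the while loop ran in a pipeline subshell

a=()
while IFS= read -r -d '' f; do a+=("$f"); done < <(find "$dir" -type f -print0)
echo "${#a[@]}"    # prints 3: the array outlives the loop

rm -rf "$dir"
```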

Also, because the file processing step (in this case, just a+=("$file"), but you might want to do something fancier directly in the loop) has its input redirected, it cannot use any commands that might read from stdin. To avoid this limitation, I tend to use:

a=()
while IFS= read -r -d '' file <&3; do
    a+=("$file")         # or however you want to process each file
done 3< <(find /tmp -type f -print0)

...which passes the file list via unit 3, rather than stdin.
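A quick way to convince yourself the loop handles awkward names is to fake find's NUL-separated output with printf (a sketch, no filesystem involved):

```shell
#!/bin/bash
# Sketch: feed three NUL-terminated names (one with a space, one with a
# newline) through fd 3 and confirm all three land in the array intact.
a=()
while IFS= read -r -d '' file <&3; do
    a+=("$file")
done 3< <(printf '%s\0' 'plain' 'with space' $'with\nnewline')
echo "${#a[@]}"    # prints 3
```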


10 Comments

Ahhh almost there... this is the best answer yet. However, I've just tried it on a directory containing a file with a newline in its name, and upon inspecting that element using echo ${a[1]}, the newline seems to have become a space (0x20). Any idea why this is happening?
What version of bash are you running? I've had trouble with older versions (unfortunately I don't remember precisely which) not dealing with newlines and deletes (\177) in strings. IIRC, even x="$y" wouldn't always work right with these characters. I just tested with bash 2.05b.0 and 3.2.17 (the oldest and newest I have handy); both handled newlines properly, but v2.05b.0 ate the delete character.
-d '' is equivalent to -d $'\0'.
An easier way to add an element to the end of an array is: arr+=("$file")
@CMCDragonkai: readarray was added in bash version 4, which was barely out when I wrote this answer. And some OSes (cough macOS cough) still use bash v3, so it still isn't safe to assume readarray is available. As a result, I haven't actually done the work needed to figure out the possible gotchas with readarray and how to avoid them. If I get around to it, I'll update the answer.
16

Since Bash 4.4, the builtin mapfile has the -d switch (to specify a delimiter, similar to the -d switch of the read builtin), and the delimiter can be the null byte. Hence, a nice answer to the question in the title

Capturing output of find . -print0 into a bash array

is:

mapfile -d '' ary < <(find . -print0)
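For instance, faking find's NUL-separated output with printf (a sketch; note that a name containing a newline stays in one array element):

```shell
#!/bin/bash
# Sketch: mapfile -d '' splits on NUL bytes, so embedded newlines survive.
mapfile -d '' ary < <(printf '%s\0' 'a' 'b c' $'d\ne')
echo "${#ary[@]}"    # prints 3
```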

4 Comments

That looks much more elegant and worked like a charm for locate, too: mapfile -d '' list < <(locate -b -0 -r "$1$").
This answer is correct and elegant, though I made the mistake of re-ordering the arguments to mapfile: mapfile ary -d '' does not do the same thing.
Is mapfile -d '' equivalent to mapfile -d $'\0'?
@luckman212 yes, but $'\0' does nothing more than '' (and actually, $'\0' doesn't really make sense, since variables can't hold null bytes: $'\0' just expands to the empty string '').
6

Maybe you are looking for xargs:

find . -print0 | xargs -r0 do_something_useful

The option -n 1 could be useful for you too; it makes xargs invoke do_something_useful with only one file argument at a time (-n counts arguments, which is what you want with NUL-delimited input, where -L's line counting does not apply).
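For example, a sketch of the per-file form (-r, a GNU extension, skips running the command when the list is empty; -n 1 requests one argument per invocation):

```shell
#!/bin/bash
# Sketch: run a command once per file, with NUL-delimited names,
# using a throwaway directory so the example is self-contained.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b c"
find "$dir" -type f -print0 | xargs -0 -n 1 echo "found:"
rm -rf "$dir"
```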

2 Comments

This isn't quite what I was after, because there is no opportunity to do array-like things with the list, such as sorting: you must use each element as and when it appears out of the find command. If you could elaborate on this example, with the "do_something_useful" part being a bash array-push operation, then this might be what I'm after.
Why not use -exec at that point?
6

The main problem is that the NUL delimiter (\0) is useless here, because it isn't possible to assign a NUL value to IFS. So, as good programmers, we make sure the input to our program is something it can handle.

First we create a little program, which does this part for us:

#!/bin/bash
printf "%s" "$@" | base64

...and call it base64str (don't forget chmod +x)

Second we can now use a simple and straightforward for-loop:

for i in `find -type f -exec base64str '{}' \;`
do 
  file="`echo -n "$i" | base64 -d`"
  # do something with file
done

So the trick is that a base64 string contains no characters that cause trouble for bash. Of course, xxd or something similar could also do the job.
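The same idea works without the helper script. Here is an inline sketch (tr -d '\n' unwraps base64's line-wrapped output so each encoded name is a single word; base64 -d is the GNU coreutils spelling, some BSD/macOS versions use -D):

```shell
#!/bin/bash
# Sketch: encode each name so word splitting is harmless, decode in the loop.
for enc in $(find . -type f -exec sh -c 'printf "%s" "$1" | base64 | tr -d "\n"; echo' _ '{}' \;); do
    file=$(printf '%s' "$enc" | base64 -d)
    printf 'got: %s\n' "$file"    # "$file" is the original name, spaces and all
done
```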

3 Comments

One must ensure that the part of the filesystem that find is processing does not change from when find is invoked until when the script completes. If this is not the case, a race condition results, which can be exploited to invoke commands on the wrong files. For instance a directory to be deleted (say /tmp/junk) could be replaced by a symlink to /home by an unprivileged user. If the find command was running as root, and it was find -type d -exec rm -rf '{}' \;, this would delete all users' home folders.
read -r -d '' will read everything up to the next NUL into "$REPLY". There's no need to care about IFS.
That depends on your shell I guess? With bash 5.2.15 read -r -d '' yields bash: warning: command substitution: ignored null byte in input
4

Yet another way of counting files:

find /DIR -type f -print0 | tr -dc '\0' | wc -c 
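To see why this is robust, note that each name contributes exactly one NUL byte, which tr then isolates for wc to count. A quick sketch with printf standing in for find:

```shell
#!/bin/bash
# Sketch: two NUL-terminated names, one containing a newline; tr keeps
# only the NUL bytes and wc -c counts them, so the newline can't skew it.
printf '%s\0' 'a' $'b\nc' | tr -dc '\0' | wc -c    # counts the 2 names
```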


2

Gordon Davisson's answer is great for bash. However, a useful shortcut exists for zsh users:

First, place your string in a variable:

A="$(find /tmp -type f -print0)"

Next, split this variable and store it in an array:

B=( ${(s/^@/)A} )

There is a trick: ^@ is the NUL character. To type it, press Ctrl+V followed by Ctrl+@.

You can check that each entry of $B contains the right value:

for i in "$B[@]"; echo \"$i\"

Careful readers may notice that the call to find can be avoided in most cases by using zsh's recursive glob syntax **. For example:

B=( /tmp/**/* )


1

I think more elegant solutions exist, but I'll toss this one in. This will also work for filenames with spaces and/or newlines:

i=0;
for f in *; do
  array[$i]="$f"
  ((i++))
done

You can then e.g. list the files one by one (in this case in reverse order):

for ((i = $i - 1; i >= 0; i--)); do
  ls -al "${array[$i]}"
done

This page gives a nice example, and for more see Chapter 26 in the Advanced Bash-Scripting Guide.

1 Comment

This (and other similar examples below) is almost what I'm after - but with a big problem: it only works for globs of the current directory. I would like to be able to manipulate completely arbitrary lists of files; the output of "find" for example, which lists directories recursively, or any other list. What if my list was: ( /tmp/foo.jpg | /home/alice/bar.jpg | /home/bob/my holiday/baz.jpg | /tmp/new\nline/grault.jpg ), or any other totally arbitrary list of files (of course, potentially with spaces and newlines in them)?
1

You can safely do the count with this:

find . -exec echo ';' | wc -l

(It prints a newline for every file/dir found, and then count the newlines printed out...)

1 Comment

It is much faster to use the -printf option instead of -exec for every file: find . -printf "\n" | wc -l
1

Avoid xargs if you can:

man ruby | less -p 777 
IFS=$'\777' 
#array=( $(find ~ -maxdepth 1 -type f -exec printf "%s\777" '{}' \; 2>/dev/null) ) 
array=( $(find ~ -maxdepth 1 -type f -exec printf "%s\777" '{}' + 2>/dev/null) ) 
echo ${#array[@]} 
printf "%s\n" "${array[@]}" | nl 
echo "${array[0]}" 
IFS=$' \t\n' 

1 Comment

Why do you set IFS to \777?
1

I am new, but I believe this is an answer; hope it helps someone:

STYLE="$HOME/.fluxbox/styles/"

declare -a array1

LISTING=`find "$HOME/.fluxbox/styles/" -maxdepth 1 -type f -print0`


echo $LISTING
array1=( `echo $LISTING`)
TAR_SOURCE=`echo ${array1[@]}`

#tar czvf ~/FluxieStyles.tgz $TAR_SOURCE


1

Old question, but no one suggested this simple method, so I thought I would. Granted, if your filenames contain an ETX character, this doesn't solve your problem, but I suspect it serves for any real-world scenario. Trying to use NUL seems to run afoul of the default IFS handling rules. Season to your taste with find options and error handling.

savedFS="$IFS"
IFS=$'\x3'
filenames=(`find wherever -printf %p$'\x3'`)
IFS="$savedFS"

2 Comments

What does mean ETX? Maybe filename EXTension or perhaps End of Text...
ETX is ASCII character #3, indicated here as '\x3'. "End of Text"
0

This is similar to Stephan202's version, but the files (and directories) are put into an array all at once. The for loop here is just to "do useful things":

files=(*)                        # put files in current directory into an array
i=0
for file in "${files[@]}"
do
    echo "File ${i}: ${file}"    # do something useful 
    let i++
done

To get a count:

echo ${#files[@]}


-3

Bash has never been good at handling filenames (or any text, really), because it uses whitespace as a list delimiter.

I'd recommend using python with the sh library instead.

