show od and head -c; warn explicitly about null bytes

edited Apr 7, 2011 at 22:27

If you want to stick with shell utilities, you can use head to extract a number of bytes, and od to convert a byte into a number.

export LC_ALL=C    # make sure we aren't in a multibyte locale
n=$(head -c 1 | od -An -t u1)
string=$(head -c $n)

However, this does not work for binary data. There are two problems:

Command substitution $(…) strips final newlines in the command output. There's a fairly easy workaround: make sure the output ends in a character other than a newline, then strip that one character.
```
  string=$(head -c $n; echo .); string=${string%.}
```

Bash, like most shells, is bad at dealing with null bytes. As of bash 4.1, null bytes are simply dropped from the result of the command substitution. Dash 0.5.5 and pdksh 5.2 have the same behavior, and AT&T ksh stops reading at the first null byte. In general, shells and their utilities aren't geared towards dealing with binary files. (Zsh is the exception: it's designed to support null bytes.)

If you have binary data, you'll want to switch to a language like Perl or Python.

<input_file perl -e '
  read STDIN, $c, 1 or die $!;    # read length byte
  $n = read STDIN, $s, ord($c);   # read data
  die $! if !defined $n;
  die "Input file too short" if ($n != ord($c));
  # Process $s here
'

Note that if you write string=$(perl …) and the binary data happens to end in a newline, the shell will strip it. I recommend doing the post-processing in Perl; but if you want to stick with the shell, you can tell Perl to append a character and have the shell strip it:

string=$(<input_file perl -e '…; print "$s."') && string=${string%.}

Python equivalent:

string=$(<input_file python -c '
  import sys
  n = ord(sys.stdin.read(1))      # read length byte
  s = sys.stdin.read(n)           # read data
  if len(s) < n: raise ValueError("input file too short")
  print s + "."
') && string=${string%.}

Shell utilities aren't geared towards dealing with binary files. While what you want is technically possible (using head to extract a fixed number of characters and od to convert the length byte to a number), it'll be a lot easier in Perl or Python.

<input_file perl -e '
  read STDIN, $c, 1 or die $!;    # read length byte
  $n = read STDIN, $s, ord($c);   # read data
  die $! if !defined $n;
  die "Input file too short" if ($n != ord($c));
  # post-process $s here
'

string=$(<input_file perl -e '…; print "$s."') && string=${string%.}

Python equivalent:

string=$(<input_file python -c '
  import sys
  n = ord(sys.stdin.read(1))
  s = sys.stdin.read(n)
  if len(s) < n: raise ValueError("input file too short")
  print s + "."
') && string=${string%.}

If you want to stick with shell utilities, you can use head to extract a number of bytes, and od to convert a byte into a number.

export LC_ALL=C    # make sure we aren't in a multibyte locale
n=$(head -c 1 | od -An -t u1)
string=$(head -c $n)

However, this does not work for binary data. There are two problems:

Command substitution $(…) strips final newlines in the command output. There's a fairly easy workaround: make sure the output ends in a character other than a newline, then strip that one character.
```
  string=$(head -c $n; echo .); string=${string%.}
```

Bash, like most shells, is bad at dealing with null bytes. As of bash 4.1, null bytes are simply dropped from the result of the command substitution. Dash 0.5.5 and pdksh 5.2 have the same behavior, and AT&T ksh stops reading at the first null byte. In general, shells and their utilities aren't geared towards dealing with binary files. (Zsh is the exception: it's designed to support null bytes.)
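To see the first problem and its workaround in action, here is a small sketch that drives a shell from Python and compares what survives command substitution with and without the trailing sentinel character. It assumes a POSIX `sh` is on the PATH; the helper name `run_sh` is made up for this illustration. It deliberately does not try to demonstrate the null-byte problem, since that behavior differs from shell to shell.

```python
import subprocess


def run_sh(script):
    """Run a script with sh and return its raw stdout (hypothetical helper)."""
    result = subprocess.run(["sh", "-c", script], check=True, stdout=subprocess.PIPE)
    return result.stdout


# Plain command substitution: the two final newlines are stripped.
plain = run_sh(r'''s=$(printf 'ab\n\n'); printf '%s' "$s"''')

# Sentinel trick: append a dot inside the substitution, strip it afterwards.
guarded = run_sh(r'''s=$(printf 'ab\n\n'; echo .); s=${s%.}; printf '%s' "$s"''')

print(repr(plain))    # b'ab'
print(repr(guarded))  # b'ab\n\n'
```

The sentinel works because only trailing newlines are stripped: the newline that `echo .` adds is removed, the dot protects everything before it, and `${s%.}` then removes the dot itself.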

If you have binary data, you'll want to switch to a language like Perl or Python.

<input_file perl -e '
  read STDIN, $c, 1 or die $!;    # read length byte
  $n = read STDIN, $s, ord($c);   # read data
  die $! if !defined $n;
  die "Input file too short" if ($n != ord($c));
  # Process $s here
'

<input_file python -c '
  import sys
  n = ord(sys.stdin.read(1))      # read length byte
  s = sys.stdin.read(n)           # read data
  if len(s) < n: raise ValueError("input file too short")
  # Process s here
'
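The Python snippet above relies on Python 2, where reading from `sys.stdin` yields byte strings. As a rough sketch of the same idea for Python 3 (an adaptation, not part of the original answer), the reading can be done on a binary stream such as `sys.stdin.buffer`. The function name `read_length_prefixed` is an assumption made for this example, and the demo uses an in-memory stream so that it runs without any input file.

```python
import io


def read_length_prefixed(stream):
    """Read one record: a single length byte followed by that many data bytes."""
    header = stream.read(1)            # read length byte
    if len(header) != 1:
        raise ValueError("input file too short")
    n = header[0]                      # indexing bytes yields an int in Python 3
    data = stream.read(n)              # read data
    if len(data) < n:
        raise ValueError("input file too short")
    return data


# Demo on an in-memory binary stream; the payload contains a null byte
# and ends in a newline, the two cases that trip up the shell approach.
payload = b"a\x00b\n"
sample = io.BytesIO(bytes([len(payload)]) + payload + b"rest of file")
record = read_length_prefixed(sample)
print(repr(record))  # b'a\x00b\n'
```

With real input, the call would look like `read_length_prefixed(sys.stdin.buffer)` after `import sys`, with the file redirected to standard input as in the snippets above.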

answered Apr 7, 2011 at 7:29

Gilles 'SO- stop being evil'

<input_file perl -e '
  read STDIN, $c, 1 or die $!;    # read length byte
  $n = read STDIN, $s, ord($c);   # read data
  die $! if !defined $n;
  die "Input file too short" if ($n != ord($c));
  # post-process $s here
'

string=$(<input_file perl -e '…; print "$s."') && string=${string%.}

Python equivalent:

string=$(<input_file python -c '
  import sys
  n = ord(sys.stdin.read(1))
  s = sys.stdin.read(n)
  if len(s) < n: raise ValueError("input file too short")
  print s + "."
') && string=${string%.}