0

I want to extract data from a file that looks like this:

BK20120802130531:/home/michael/Scripts/usb_backup.sh
BK20120802130531:/home/michael/Scripts/yad_0.17.1.1-1_i386.deb
BK20120802130731:/home/michael/Scripts/gbk.sh
BK20120802130131:/home/michael/Scripts/alt-notify-send.sh
BK20120802130131:/home/michael/Scripts/bk.bak
BK20120802130131:/home/michael/Scripts/bk.sh
BK20120802130131:/home/michael/Scripts/demande_password.sh

The idea is to display the following on screen (without creating a temporary file or modifying the original file):

alt-notify-send.sh
/home/michael/Scripts
bk.bak
/home/michael/Scripts
bk.sh
/home/michael/Scripts
demande_password.sh
/home/michael/Scripts
gbk.sh
/home/michael/Scripts
usb_backup.sh
/home/michael/Scripts
yad_0.17.1.1-1_i386.deb
/home/michael/Scripts

To sum up:

  1. Strip the characters before ':'
  2. Put each filename before its corresponding directory
  3. Sort the filenames alphabetically
  4. Put a newline between each filename and its corresponding directory

I managed to do all this, but there is still an ugly part in my code concerning point #4:

cut -f 2 -d ':' $big_file | \
sort -u | \
while read file ; do
   echo "$(basename "$file")zipzapzupzop$(dirname "$file")" # <-- ugly thing #1
done | \
sort -dfb | \
while read line ; do
   echo $line
done | \
sed 's/zipzapzupzop/\n/' # <-- ugly thing #2

At first, I had written:

echo "$(basename "$file")\n$(dirname "$file")"

in place of ugly thing #1, so that I could do

echo -e "$line"

in the second while loop. However, the read command strips the backslash from the '\n' string each time, so I get

alt-notify-send.shn/home/michael/Scripts
bk.bakn/home/michael/Scripts
bk.shn/home/michael/Scripts
demande_password.shn/home/michael/Scripts
gbk.shn/home/michael/Scripts
usb_backup.shn/home/michael/Scripts
yad_0.17.1.1-1_i386.debn/home/michael/Scripts

I tried to protect the '\' character with another '\', but the result is the same.

man read

is of no help either. So, is there a proper way to do this?

1
  • echo "$(basename "$file") doesn't quote $file, the second double quote ends the quoting. You need to escape the internal quotes or use single quotes. Commented Aug 21, 2012 at 7:46

4 Answers

1

read is a shell builtin, and man read may be giving you the docs for the (mostly unrelated) syscall.

read -r will prevent read from processing \ sequences.
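To make the difference concrete, here is a minimal sketch comparing read with and without -r. The sample string and the variable names (line, plain, raw) are made up for illustration; only the shell builtins read and printf are used.

```shell
# Hypothetical sample: a filename and a directory joined by a literal backslash-n.
line='bk.sh\n/home/michael/Scripts'

# Without -r, read treats the backslash as an escape character and drops it.
plain=$(printf '%s\n' "$line" | { read x; printf '%s' "$x"; })

# With -r (and an empty IFS to keep surrounding blanks), the line is kept verbatim.
raw=$(printf '%s\n' "$line" | { IFS= read -r x; printf '%s' "$x"; })

printf 'read:    %s\n' "$plain"
printf 'read -r: %s\n' "$raw"

# printf %b expands the \n escape, giving the filename and directory on two lines.
printf '%b\n' "$raw"
```

Note that %b expands every backslash escape in its argument, so this trick is only safe if the paths themselves contain no backslashes.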

The whole thing could have been done with a single awk script though (asort is a GNU awk extension, so this needs gawk):

awk '
    {
        start = index($0, ":") + 1
        end = match($0, "[^/]*$")
        out[NR] = substr($0, end) "\n" substr($0, start, end - start - 1)
    }
    END {
        asort(out)
        for (i = 1; i <= NR; i++)
            print out[i]
    }'

0

If you don't need to handle spaces in filenames, you can do this:

cat $bigfile | sed 's/.*://' | while read file; do
  echo "$(basename $file) $(dirname $file)"
done | sort | awk '{print $1"\n"$2}'
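If spaces in filenames do matter, one possible variant is sketched here. It assumes instead that no path contains a tab character, and it uses shell parameter expansion in place of basename and dirname. The function name list_files is hypothetical.

```shell
# list_files INDEX_FILE
# Prints each filename followed by its directory on the next line,
# sorted by filename. Assumes no path contains a tab character.
list_files() {
    sed 's/^[^:]*://' "$1" |          # drop everything up to the first ':'
    sort -u |                         # remove duplicate paths
    while IFS= read -r file; do       # IFS= and -r keep the line verbatim
        # ${file##*/} is the part after the last '/', ${file%/*} the part before it.
        # Note: for a file directly under '/', ${file%/*} is empty, unlike dirname.
        printf '%s\t%s\n' "${file##*/}" "${file%/*}"
    done |
    sort -f |                         # case-insensitive sort, filename first
    tr '\t' '\n'                      # split each pair onto two lines
}
```

It would be called as list_files "$bigfile". Because the tab is the only separator, spaces inside names pass through untouched.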

2 Comments

Useless use of cat: sed 's/.*://' $bigfile, or < $bigfile sed 's/.*://'
@chepner: I think it makes the pipeline cleaner, but others may disagree.
0

You can do it with the following pipeline (should be on one line, I've split it and added comments for readability):

| sed -e 's/^[^:]*://'             # Remove from start of line to first ':'
      -e 's?/\([^/]*$\)? \1?'      # Replace final '/' with a space
| sort -k2                         # Sort on column 2 (filename)
| awk '{print $2"\n"$1}'           # Reverse fields

See the following transcript:

echo 'BK20120802130531:/home/michael/Scripts/usb_backup.sh
BK20120802130531:/home/michael/Scripts/yad_0.17.1.1-1_i386.deb
BK20120802130731:/home/michael/Scripts/gbk.sh
BK20120802130131:/home/michael/Scripts/alt-notify-send.sh
BK20120802130131:/home/michael/Scripts/bk.bak
BK20120802130131:/home/michael/Scripts/bk.sh
BK20120802130131:/home/michael/Scripts/demande_password.sh'
    | sed -e 's/^[^:]*://'
          -e 's?/\([^/]*$\)? \1?'
    | sort -k2
    | awk '{print $2"\n"$1}'

alt-notify-send.sh
/home/michael/Scripts
bk.bak
/home/michael/Scripts
bk.sh
/home/michael/Scripts
demande_password.sh
/home/michael/Scripts
gbk.sh
/home/michael/Scripts
usb_backup.sh
/home/michael/Scripts
yad_0.17.1.1-1_i386.deb
/home/michael/Scripts

Just keep in mind that this may not work as expected with paths containing spaces, since both sort -k2 and awk split fields on whitespace.

0

Assuming you do not have hash characters (#) in your filenames, you could use this pipeline:

cut -d: -f2- infile               \
| sed -r 's,(.*)/([^/]*)$,\2#\1,' \
| sort -t'#'                      \
| tr '#' '\n'
  • cut removes the first part.
  • sed splits the path, swaps filename and directory and delimits them with a #.
  • sort sorts the hash-delimited text.
  • tr finally replaces each hash with a newline.
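Since the pipeline silently produces wrong output if a path does contain a '#', it may be worth checking that assumption first. The following is only a sketch: the function names check_no_hash and swap_sorted are made up, and the sed expression is rewritten in basic regex syntax so that it does not depend on the -r flag.

```shell
# check_no_hash INDEX_FILE
# Succeeds only if no path (the part after the first ':') contains '#'.
check_no_hash() {
    ! cut -d: -f2- "$1" | grep -q '#'
}

# swap_sorted INDEX_FILE
# Runs the '#'-delimited pipeline, but refuses to run on unsafe input.
swap_sorted() {
    if ! check_no_hash "$1"; then
        echo "error: a path in $1 contains '#'" >&2
        return 1
    fi
    cut -d: -f2- "$1" |
    sed 's,\(.*\)/\([^/]*\)$,\2#\1,' |   # filename#directory
    sort -t'#' |
    tr '#' '\n'
}
```

With this guard, swap_sorted infile prints the same result as the pipeline above when the input is safe, and returns a non-zero status otherwise.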

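If you know the number of path elements, you can use the simpler version (the filename is the fifth '/'-separated field here, because the leading '/' makes the first field empty; sed needs -r for the unescaped parentheses):

cut -d: -f2- infile \
| sort -t/ -k5,5    \
| sed -r 's,(.*)/([^/]*)$,\2\n\1,'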
