
I have multiple folders with a lot of files. Each folder has txt files with the same names, and I want to merge files with the same name into one txt file.

Example:

folder/
     -sub1
     -sub2
     -sub3
      .
      .
      .
     -sub28

Each subfolder contains multiple files:

EAF001.ID001.txt  EAF001.ID002.txt  EAF001.ID003.txt  EAF001.ID004.txt
EAF001.ID005.txt  EAF001.ID006.txt  EAF001.ID007.txt  EAF001.ID008.txt
EAF001.ID009.txt  EAF001.ID010.txt  EAF001.ID011.txt  EAF001.ID012.txt
EAF001.ID013.txt  EAF001.ID014.txt  EAF001.ID015.txt  EAF001.ID016.txt

I want to merge files with the same name:

EAF001.ID001.merge.txt  EAF001.ID002.merge.txt  EAF001.ID003.merge.txt  EAF001.ID004.merge.txt
EAF001.ID005.merge.txt  EAF001.ID006.merge.txt  EAF001.ID007.merge.txt  EAF001.ID008.merge.txt
EAF001.ID009.merge.txt  EAF001.ID010.merge.txt  EAF001.ID011.merge.txt  EAF001.ID012.merge.txt
EAF001.ID013.merge.txt  EAF001.ID014.merge.txt  EAF001.ID015.merge.txt  EAF001.ID016.merge.txt

Any help would be much appreciated.

2 Answers

export dir='/path/to/folder'   # exported so the sh spawned by -exec can see it

find "$dir" -mindepth 2 -type f -name 'EAF*.txt' \
  -exec sh -c 'for f; do
                 bn=$(basename "$f" .txt)
                 cat "$f" >> "$dir/$bn.merged.txt"
               done' sh {} +

The -mindepth 2 option excludes files in the /path/to/folder directory itself from being processed (i.e. it finds only files in sub-directories), so that it doesn't concatenate the output files onto themselves if they already exist.
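A quick way to see the difference, using the layout from the question (on a second run, merged output would already be sitting in $dir itself):

find "$dir" -type f -name 'EAF*.txt'              # would also match $dir/EAF001.ID001.merged.txt
find "$dir" -mindepth 2 -type f -name 'EAF*.txt'  # matches only files under $dir/sub*/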

This appends each file to the corresponding .merged.txt output file whether there are duplicate filenames or not.
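Note that because of the >> append, running the command a second time would double every merged file. One way to make it safely re-runnable is to remove the old output first, e.g. (assuming the .merged.txt suffix used above):

rm -f "$dir"/EAF*.merged.txt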

If you only want duplicated filenames to be merged:

declare -A counts   # associative array: filename -> number of occurrences
dir='/path/to/folder'

# Count how many copies of each filename there are. Key on the basename,
# not the full path, so that the same name in different subfolders is
# counted as a duplicate.
while IFS= read -d '' -r f; do
  name=$(basename "$f")
  let counts[$name]++
done < <(find "$dir" -mindepth 2 -type f -name 'EAF*.txt' -print0)

# Concatenate only the duplicates. This pass also runs in the current
# shell rather than via -exec bash -c, because bash cannot export an
# associative array to a child process.
while IFS= read -d '' -r f; do
  name=$(basename "$f")
  if [ "${counts[$name]}" -gt 1 ]; then
    cat "$f" >> "$dir/${name%.txt}.merged.txt"
  fi
done < <(find "$dir" -mindepth 2 -type f -name 'EAF*.txt' -print0)

This requires bash or some other shell that supports associative arrays and process substitution (i.e. not POSIX sh).
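If you are limited to POSIX sh, a rough alternative is to let sort and uniq find the duplicated names. This is only a sketch: it assumes filenames contain no newlines, and it re-runs find once per duplicated name.

dir='/path/to/folder'

# list every basename, keep only those that occur more than once
find "$dir" -mindepth 2 -type f -name 'EAF*.txt' -exec basename {} \; |
  sort | uniq -d |
  while IFS= read -r name; do
    # append all files with this name; $0 inside sh -c is the output file
    find "$dir" -mindepth 2 -type f -name "$name" \
      -exec sh -c 'cat "$@" >> "$0"' "$dir/${name%.txt}.merged.txt" {} +
  done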


You can loop over the txt files and count how many files share each name using find and wc. If the count is greater than 1, append the file to the corresponding merge.txt output file.

#!/bin/bash

output_dir="output"
rm -rf "$output_dir"   # start from a clean output directory
mkdir "$output_dir"

for file in */*.txt; do
  file_name=$(basename "$file" .txt)
  # count how many subfolders contain a file with this name
  duplicate_names_count=$(find . -type f -name "$file_name.txt" | wc -l)
  if [ "$duplicate_names_count" -gt 1 ]; then
    cat "$file" >> "$output_dir/${file_name}.merge.txt"
  fi
done
  • works like a charm! Thank you! Commented Apr 25, 2023 at 1:57
  • You really don't want to run find once for every filename in that loop; that will be extremely slow (and the more files and directories there are, the slower it gets). Run find only once before the for loop and use it to populate an associative array (with key = filename, value = count), then use that as a lookup table for the filename counts. Something like: declare -A counts; while read -d '' -r f; do let counts["$(basename "$f")"]++; done < <(find . -type f -name 'EAF*.txt' -print0). Then in the main loop, check if "${counts[$(basename "$file")]}" -gt 1. Commented Apr 25, 2023 at 3:15
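A complete version of that lookup-table suggestion might look like the following sketch (keyed on the basename so the same name in different subfolders actually counts as a duplicate; the output directory name is carried over from the answer above):

#!/bin/bash

output_dir="output"
rm -rf "$output_dir"
mkdir "$output_dir"

# Run find once up front and build a lookup table: filename -> count.
declare -A counts
while IFS= read -d '' -r f; do
  name=$(basename "$f")
  let counts[$name]++
done < <(find . -type f -name 'EAF*.txt' -print0)

# The main loop now only does array lookups; no extra find processes.
for file in */*.txt; do
  name=$(basename "$file")
  if [ "${counts[$name]}" -gt 1 ]; then
    cat "$file" >> "$output_dir/${name%.txt}.merge.txt"
  fi
done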
