
I have multiple folders with a lot of files. Each folder has txt files with the same names, and I want to merge files with the same name into one txt file.

Example:

folder/
     -sub1
     -sub2
     -sub3
      .
      .
      .
     -sub28

Each subfolder contains multiple files:

EAF001.ID001.txt  EAF001.ID002.txt  EAF001.ID003.txt  EAF001.ID004.txt
EAF001.ID005.txt  EAF001.ID006.txt  EAF001.ID007.txt  EAF001.ID008.txt
EAF001.ID009.txt  EAF001.ID010.txt  EAF001.ID011.txt  EAF001.ID012.txt
EAF001.ID013.txt  EAF001.ID014.txt  EAF001.ID015.txt  EAF001.ID016.txt

I want to merge files with the same name:

EAF001.ID001.merge.txt  EAF001.ID002.merge.txt  EAF001.ID003.merge.txt  EAF001.ID004.merge.txt
EAF001.ID005.merge.txt  EAF001.ID006.merge.txt  EAF001.ID007.merge.txt  EAF001.ID008.merge.txt
EAF001.ID009.merge.txt  EAF001.ID010.merge.txt  EAF001.ID011.merge.txt  EAF001.ID012.merge.txt
EAF001.ID013.merge.txt  EAF001.ID014.merge.txt  EAF001.ID015.merge.txt  EAF001.ID016.merge.txt

Any help would be much appreciated.

2 Answers

export dir='/path/to/folder'   # exported so the sh spawned by -exec can see it

find "$dir" -mindepth 2 -type f -name 'EAF*.txt' \
  -exec sh -c 'for f; do
                 bn=$(basename "$f" .txt)
                 cat "$f" >> "$dir/$bn.merged.txt"
               done' sh {} +

The -mindepth 2 option excludes files in the /path/to/folder directory itself from being processed (i.e. it finds only files in sub-directories), so that it doesn't concatenate the output files onto themselves if they already exist.
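A quick way to see the difference, using the layout from the question (on a second run, merged output would already be sitting in $dir itself):

find "$dir" -type f -name 'EAF*.txt'              # would also match $dir/EAF001.ID001.merged.txt
find "$dir" -mindepth 2 -type f -name 'EAF*.txt'  # matches only files under $dir/sub*/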

This appends each file to the corresponding .merged.txt output file whether there are duplicate filenames or not.
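Note that because of the >> append, running the command a second time would double every merged file. One way to make it safely re-runnable is to remove the old output first, e.g. (assuming the .merged.txt suffix used above):

rm -f "$dir"/EAF*.merged.txt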

If you only want duplicated filenames to be merged:

declare -A counts   # associative array: filename -> number of occurrences
dir='/path/to/folder'

# Count how many copies of each filename there are. Key on the basename,
# not the full path, so that the same name in different subfolders is
# counted as a duplicate.
while IFS= read -d '' -r f; do
  name=$(basename "$f")
  let counts[$name]++
done < <(find "$dir" -mindepth 2 -type f -name 'EAF*.txt' -print0)

# Concatenate only the duplicates. This pass also runs in the current
# shell rather than via -exec bash -c, because bash cannot export an
# associative array to a child process.
while IFS= read -d '' -r f; do
  name=$(basename "$f")
  if [ "${counts[$name]}" -gt 1 ]; then
    cat "$f" >> "$dir/${name%.txt}.merged.txt"
  fi
done < <(find "$dir" -mindepth 2 -type f -name 'EAF*.txt' -print0)

This requires bash or some other shell that supports associative arrays and process substitution (i.e. not POSIX sh).
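If you are limited to POSIX sh, a rough alternative is to let sort and uniq find the duplicated names. This is only a sketch: it assumes filenames contain no newlines, and it re-runs find once per duplicated name.

dir='/path/to/folder'

# list every basename, keep only those that occur more than once
find "$dir" -mindepth 2 -type f -name 'EAF*.txt' -exec basename {} \; |
  sort | uniq -d |
  while IFS= read -r name; do
    # append all files with this name; $0 inside sh -c is the output file
    find "$dir" -mindepth 2 -type f -name "$name" \
      -exec sh -c 'cat "$@" >> "$0"' "$dir/${name%.txt}.merged.txt" {} +
  done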


You can loop over the txt files and count how many files share each name using find and wc. If the count is greater than 1, append the file to the corresponding merge.txt output file.

#!/bin/bash

output_dir="output"
rm -rf "$output_dir"   # start from a clean output directory
mkdir "$output_dir"

for file in */*.txt; do
  file_name=$(basename "$file" .txt)
  # count how many subfolders contain a file with this name
  duplicate_names_count=$(find . -type f -name "$file_name.txt" | wc -l)
  if [ "$duplicate_names_count" -gt 1 ]; then
    cat "$file" >> "$output_dir/${file_name}.merge.txt"
  fi
done
  • works like a charm! Thank you! Commented Apr 25, 2023 at 1:57
  • You really don't want to run find once for every filename in that loop; that will be extremely slow (and the more files and directories there are, the slower it gets). Run find only once before the for loop and use it to populate an associative array (with key = filename, value = count), then use that as a lookup table for the filename counts. Something like: declare -A counts; while read -d '' -r f; do let counts["$(basename "$f")"]++; done < <(find . -type f -name 'EAF*.txt' -print0). Then in the main loop, check if "${counts[$(basename "$file")]}" -gt 1. Commented Apr 25, 2023 at 3:15
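A complete version of that lookup-table suggestion might look like the following sketch (keyed on the basename so the same name in different subfolders actually counts as a duplicate; the output directory name is carried over from the answer above):

#!/bin/bash

output_dir="output"
rm -rf "$output_dir"
mkdir "$output_dir"

# Run find once up front and build a lookup table: filename -> count.
declare -A counts
while IFS= read -d '' -r f; do
  name=$(basename "$f")
  let counts[$name]++
done < <(find . -type f -name 'EAF*.txt' -print0)

# The main loop now only does array lookups; no extra find processes.
for file in */*.txt; do
  name=$(basename "$file")
  if [ "${counts[$name]}" -gt 1 ]; then
    cat "$file" >> "$output_dir/${name%.txt}.merge.txt"
  fi
done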
