
Apologies if this has been answered, I'm somewhat new to Linux but I didn't see anything here that was on target.

Anyway, I'm running this command:

find 2013-12-28 -name '*.gz' | xargs zcat | gzip > /fast/me/2013-12-28.csv.gz

The issue is that I need to run this command for about 250 distinct dates, so doing this one at a time is quite tedious.

What I want to do is have a script that will increment the date by 1 day after the "find" and in the file name. I really don't even know what this would look like, what commands to use, etc.

Background:

The find command is being run in a folder that's full of folders, one per day of data. Each day's folder contains 24 subfolders, and each subfolder holds about 100 gzipped CSV files. Running find from two levels up is necessary because it has to scan through every subfolder to combine all the data. The end result is that all the gzipped files are combined into one large gzipped file.
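To make the layout concrete, here is a miniature reproduction of the structure described above (all names are illustrative, scaled down to two subfolders and one file each) that the single-date pipeline can be tested against:

```shell
#!/bin/bash
# Sketch: recreate a tiny version of the described layout in a scratch
# directory and combine one day's files, using the naming from the question.
set -e
tmp=$(mktemp -d)
cd "$tmp"

mkdir -p 2013-12-28/0-0 2013-12-28/0-1
echo "a,1" | gzip > 2013-12-28/0-0/00:00:00.csv.gz
echo "b,2" | gzip > 2013-12-28/0-1/00:05:00.csv.gz

# find descends into every subfolder; zcat decompresses each file, and
# the concatenated stream is recompressed into one archive
find 2013-12-28 -name '*.gz' | xargs zcat | gzip > 2013-12-28.csv.gz

zcat 2013-12-28.csv.gz   # prints both rows; order depends on find
```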

If anyone can help it would be hugely appreciated, otherwise I have about 250 more commands to execute, which obviously will suck.

  • Do these top-level folders contain the dates in their names? That would make things easier. Commented Dec 4, 2014 at 21:00
  • Hi eigenchris, yes they do. Commented Dec 4, 2014 at 21:02
  • Hi eigenchris, yes the top level folders are all named like "2014-01-01", "2014-01-02", etc. The subfolders are named "0-0", "0-1", "0-2", etc. The actual files look like "00:00:00.csv.gz", "00:05:00.csv.gz", "00:10:00.csv.gz", etc. Commented Dec 4, 2014 at 21:06
  • Are there actually dates being left out (excluded)? Commented Dec 4, 2014 at 21:12

2 Answers


What about something like this?

next_date="2013-12-28"
for i in {1..250}; do
    find "$next_date" -name '*.gz' | xargs zcat | gzip > "/fast/me/$next_date.csv.gz"
    next_date=$(date -d "$next_date +1 day" +%Y-%m-%d)
done

It should iterate through 250 dates, ending with:

2014-08-25
2014-08-26
2014-08-27
2014-08-28
2014-08-29
2014-08-30
2014-08-31
2014-09-01
2014-09-02
2014-09-03
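The relative-date arithmetic that drives the loop can be sanity-checked on its own. This assumes GNU date (the `-d` flag with relative-date strings is a GNU coreutils feature, not POSIX):

```shell
#!/bin/bash
# Check GNU date's relative-date arithmetic across a year boundary
d="2013-12-31"
next=$(date -d "$d +1 day" +%Y-%m-%d)
echo "$next"   # 2014-01-01
```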

3 Comments

  • Hmm, it made the files OK, but it didn't seem to execute the find command correctly. The output files should be between 500 MB and 1.5 GB, but the ones it made are empty.
  • Thanks for that though! Awesome stuff :)
  • OK, it worked! Thanks so much! I just had to remove the quotes in the 4th line around $next_date.

jmunsch's solution works very well if the dates are sequential. Otherwise you could do this:

(edited to replace dash characters with colons)

for folderName in $(find . -mindepth 1 -maxdepth 1 -type d)  # assumes folder names contain no whitespace
do
   date=$(basename "$folderName")
   dateWithColons=$(echo "$date" | sed "s#-#:#g")  # this will replace - with :
   find "$folderName" -name '*.gz' | xargs zcat | gzip > "/fast/me/$dateWithColons.csv.gz"
done
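Since the date folders sit directly under the working directory, a plain glob does the same job without `find`'s word-splitting risk, and bash parameter expansion can replace the `sed` call. A sketch, demonstrated in a scratch directory (the question's /fast/me is replaced here with a local "out" folder):

```shell
#!/bin/bash
# Glob-based variant: loop over immediate subdirectories with a glob;
# a trailing slash in the pattern matches directories only.
set -e
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p 2014-01-02/0-0 out
echo "x,9" | gzip > 2014-01-02/0-0/00:00:00.csv.gz

for folderName in 2*/; do               # 2*/ matches the date folders here
    date=${folderName%/}                # strip the trailing slash
    dateWithColons=${date//-/:}         # bash substitution instead of sed
    find "$folderName" -name '*.gz' | xargs zcat | gzip > "out/$dateWithColons.csv.gz"
done

ls out   # 2014:01:02.csv.gz
```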

Comments
