I have a bash script that takes a large CSV and splits it into smaller CSVs, based on this blog https://medium.com/swlh/automatic-s3-file-splitter-620d04b6e81c. It works well and is fast because it never downloads the CSVs to disk, which is great for a Lambda. However, the split CSVs have no header row; only the originating CSV does. This is a problem for me because Apache PySpark cannot read a set of files where one has a header row and the rest do not.
I want to add a header row to each CSV written.
What the code does
INFILE
- "s3://test-bucket/test.csv"
OUTFILES - split into 300K-line chunks
- "s3://dest-test-bucket/test.00.csv"
- "s3://dest-test-bucket/test.01.csv"
- "s3://dest-test-bucket/test.02.csv"
- "s3://dest-test-bucket/test.03.csv"
You can pass a dash to aws s3 cp to stream: a dash as the source writes the object to standard output (stdout), and a dash as the destination uploads from standard input (stdin). I don't know if injecting a header row is even possible with an open file stream.
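For reference, this is how the dash behaves on each side of aws s3 cp (the bucket and key names here are just placeholders):

# Dash as source: stream the object to stdout without touching disk
aws s3 cp s3://test-bucket/test.csv - | head -n 1

# Dash as destination: upload whatever arrives on stdin
printf 'a,b,c\n' | aws s3 cp - s3://test-bucket/tiny.csv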
Original code that works
LINECOUNT=300000
INFILE=s3://"${S3_BUCKET}"/"${FILENAME}"
OUTFILE=s3://"${DEST_S3_BUCKET}"/"${FILENAME%%.*}"
# Stream the object to stdout; split pipes each 300K-line chunk to the
# --filter command's stdin, which uploads it straight back to S3. $FILE is
# the output name split generates; the trailing echo prints the name so
# the FILES array captures the list of files written.
FILES=($(aws s3 cp "${INFILE}" - | split -d -l ${LINECOUNT} --filter "aws s3 cp - \"${OUTFILE}_\$FILE.csv\" | echo \"\$FILE.csv\""))
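To make the mechanics concrete: GNU split runs the --filter command once per chunk, feeding the chunk to the command's stdin and setting $FILE to the generated output name. A local, S3-free equivalent (the chunk_ file names are just for illustration):

seq 1 10 | split -d -l 3 --filter 'cat > "chunk_$FILE.txt" && echo "$FILE"'

This writes chunk_x00.txt through chunk_x03.txt and echoes each $FILE, which is what the array capture above relies on.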
This was my attempt to prepend the header variable to each outgoing file stream, but it did not work.
LINECOUNT=300000
INFILE=s3://"${S3_BUCKET}"/"${FILENAME}"
OUTFILE=s3://"${DEST_S3_BUCKET}"/"${FILENAME%%.*}"
# Capture the first line of the source object as the header.
HEADER=$(aws s3 cp "${INFILE}" - | head -n 1)
# Intended to emit the header before each upload, but the echo appears to go
# to the filter's stdout (and into the FILES array), not into the stream
# that aws s3 cp - uploads.
FILES=($(aws s3 cp "${INFILE}" - | split -d -l ${LINECOUNT} --filter "echo ${HEADER}; aws s3 cp - \"${OUTFILE}_\$FILE.csv\" | echo \"\$FILE.csv\""))
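For what it's worth, the direction I was hoping for is something like the sketch below. It is untested, assumes the header contains no characters that need shell escaping, and would duplicate the header on the first chunk (which already starts with the original header row):

# Group header + chunk so both flow into the upload's stdin
FILES=($(aws s3 cp "${INFILE}" - | split -d -l ${LINECOUNT} --filter "{ echo \"${HEADER}\"; cat; } | aws s3 cp - \"${OUTFILE}_\$FILE.csv\" && echo \"\$FILE.csv\""))

The { echo ...; cat; } group runs with the chunk as its stdin, so the header and then the chunk data should both flow into the upload. Is something along these lines workable with the open stream?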