0

I have a large csv file containing data like:

2018-09, 100, A, 2018-10, 50, M, 2018-11, 69, H,....

and so on. (continuous stream without separate rows)

I would want to convert it into dataframe, which would look something like

Col1     Col2  Col3
2018-09  100   A
2018-10  50    M
2018-11  69    H

This is a simplified version of the actual data. Please advice what would be the best way to approach it.

Edit: To clarify, my csv file doesn't have separate lines for each row. All the data is on one row.

4
  • pandas.read_csv Commented Nov 9, 2018 at 14:42
  • df = pd.read_csv('pandas_dataframe_importing_csv/example.csv', names=['UID', 'First Name', 'Last Name', 'Age', 'Pre-Test Score', 'Post-Test Score']) Commented Nov 9, 2018 at 14:43
  • 1
    it seems like new line format is unclear in the csv file. Is that the case? Commented Nov 9, 2018 at 15:17
  • Yes. My csv file doesn't have separate lines, for each row. Commented Nov 9, 2018 at 15:24

2 Answers 2

3

One solution is to split your single row into chunks via the csv module and this algorithm, then feed to pd.DataFrame constructor. Note your dataframe will be of dtype object, so you'll have to cast numeric series types explicitly afterwards.

from io import StringIO
import pandas as pd
import csv

x = StringIO("""2018-09, 100, A, 2018-10, 50, M, 2018-11, 69, H""")

# define chunking algorithm
def chunks(L, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(L), n):
        yield L[i:i + n]

# replace x with open('file.csv', 'r')
with x as fin:
    reader = csv.reader(fin, skipinitialspace=True)
    data = list(chunks(next(iter(reader)), 3))

# read dataframe
df = pd.DataFrame(data)

print(df)

         0    1  2
0  2018-09  100  A
1  2018-10   50  M
2  2018-11   69  H
Sign up to request clarification or add additional context in comments.

Comments

2
data = pd.read_csv('tmp.txt', sep=',\s *', header=None).values
pd.DataFrame(data.reshape(-1, 3), columns=['Col1', 'Col2', 'Col3'])

returns

      Col1 Col2 Col3
0  2018-09  100    A
1  2018-10   50    M
2  2018-11   69    H

1 Comment

Can you provide a little explanation? What does reshape do with -1? What does sep=',\s *' do? Are numeric types preserved? I actually like this solution more than mine so +1.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.