
The input is given as:

rec = [b'1674278797,14.33681', b'1674278798,6.03617', b'1674278799,12.78418']

I want to get a DataFrame like:

df
    timestamp       val
0  1674278797  14.33681
1  1674278798   6.03617
2  1674278799  12.78418

What is the most efficient way? Thanks!

If I could convert rec to something like

[[1674278797,14.33681], [1674278798,6.03617], [1674278799,12.78418]]

it would be easy, since I could call

df = pd.DataFrame(rec, columns=['timestamp','val'])

But I don't know how to do the conversion quickly.

By the way, I got rec from a Redis list. I can change the format of each element (for example, b'1674278797,14.33681' is one element) if necessary.

  • Where did this input come from? It looks like an ASCII string, or rather the lines of a file, was read as raw bytes instead of as text. It's far easier to let Pandas load the original data than to make it work with the converted data. pd.read_csv(the_original_file) would just work. Commented Feb 2, 2023 at 7:50
  • The data source is a recorder. It generates one row every 1 to 2 seconds. Only the most recent 24 hours of data are useful. Several clients access the data, so I use a Redis list to store the records. Commented Feb 2, 2023 at 9:11

2 Answers


If you can't directly handle the original input, you can use:

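import pandas as pd

# Decode each record, split on the comma, then cast the string columns to numbers
# (convert_dtypes would leave them as strings, so astype is used instead).
(pd.Series([x.decode('utf-8') for x in rec])
   .str.split(',', expand=True)
   .set_axis(['timestamp', 'val'], axis=1)
   .astype({'timestamp': 'int64', 'val': 'float64'})
)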

Or:

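import io
import pandas as pd

# Join the decoded records into one CSV text and let read_csv infer the numeric types.
pd.read_csv(io.StringIO('\n'.join([x.decode('utf-8') for x in rec])),
            header=None, names=['timestamp', 'val'])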

Output:

    timestamp       val
0  1674278797  14.33681
1  1674278798   6.03617
2  1674278799  12.78418
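
A possible variation on the read_csv approach, sketched under the assumption that every element is UTF-8 encoded bytes with no embedded newline: the raw bytes can be joined and handed to read_csv through io.BytesIO, which skips the per-element decode. The sample rec is copied from the question, and the name df is only for illustration.

```python
import io

import pandas as pd

# Sample records copied from the question; a real list would come from Redis.
rec = [b'1674278797,14.33681', b'1674278798,6.03617', b'1674278799,12.78418']

# Join the raw bytes with newlines and let read_csv parse them directly,
# so no per-element decode step is needed.
df = pd.read_csv(io.BytesIO(b'\n'.join(rec)),
                 header=None, names=['timestamp', 'val'])

# read_csv infers numeric column types from the text.
print(df.dtypes)
```

Whether this is faster than decoding first depends on the size of the list, so it is worth timing on real data.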


You can do this in one line:

pd.DataFrame([x.decode().split(",") for x in rec], columns=["timestamp","val"])

Returns

    timestamp       val
0  1674278797  14.33681
1  1674278798   6.03617
2  1674278799  12.78418

Both columns contain strings at this point. To convert them to numeric data types, add .astype({"timestamp": "int64", "val": "float64"}) to the end of the line.
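
Putting the one-liner and the cast together, a minimal sketch using the sample rec from the question (the name df is only for illustration):

```python
import pandas as pd

# Sample records copied from the question.
rec = [b'1674278797,14.33681', b'1674278798,6.03617', b'1674278799,12.78418']

# Decode and split each record, build the frame, then cast both columns
# from strings to numeric types.
df = pd.DataFrame(
    [x.decode().split(",") for x in rec],
    columns=["timestamp", "val"],
).astype({"timestamp": "int64", "val": "float64"})

print(df.dtypes)
```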

1 Comment

Very simple and clear. Thanks!
