1

I have a log file from a device with some information for analysis, and I am using Pandas and Python for reading this file for some analysis purposes. So, let's suppose below is the content of the file.

20:03:42.769 0xAA 0x99->0x98 A200000000000000000088DD
20:03:42.769 0x11 0x00->0x00 B2
20:03:42.815 0xAA 0x98->0x99 5203001C0C1C0C0E0E
20:03:42.831 0x11 0x00->0x00 EE
20:03:50.726 0xAA 0x99->0x98 A200000000000000000088DD
20:03:50.819 0xAA 0x99->0x98 A200000000000000000088DD
20:03:50.895 0xAA 0x99->0x98 A200000000000000000088DD
20:03:54.841 0xAA 0x98->0x99 1202
20:03:54.857 0x11 0x00->0x00 E0
20:03:54.935 0xAA 0x99->0x98 120300410102000C070A00
20:03:54.935 0x11 0x00->0x00 AB
20:03:55.013 0xAA 0x99->0x98 120301050000050C000003

And I want to separate the information into individual columns, like below.

20:03:42.769 0xAA 0x99->0x98 A2 00 00 00 00 00 00 00 00 00 88 DD
20:03:42.769 0x11 0x00->0x00 B2
20:03:42.815 0xAA 0x98->0x99 52 03 00 1C 0C 1C 0C 0E 0E
20:03:42.831 0x11 0x00->0x00 EE
20:03:50.726 0xAA 0x99->0x98 A2 00 00 00 00 00 00 00 00 00 88 DD
20:03:50.819 0xAA 0x99->0x98 A2 00 00 00 00 00 00 00 00 00 88 DD
20:03:50.895 0xAA 0x99->0x98 A2 00 00 00 00 00 00 00 00 00 88 DD
20:03:54.841 0xAA 0x98->0x99 12 02
20:03:54.857 0x11 0x00->0x00 E0
20:03:54.935 0xAA 0x99->0x98 12 03 00 41 01 02 00 0C 07 0A 00 
20:03:54.935 0x11 0x00->0x00 AB
20:03:55.013 0xAA 0x99->0x98 12 03 01 05 00 00 05 0C 00 00 03 

With pd.read_csv, I can separate the data, but how to split the last column into fixed-width with 2 characters in each column, the same thing happen with pd.read_fwf

           Time Header   Src->Dest                      Data
0  20:03:42.769   0xAA  0x99->0x98  A200000000000000000088DD
1  20:03:42.769   0x11  0x00->0x00                        B2
2  20:03:42.815   0xAA  0x98->0x99        5203001C0C1C0C0E0E
3  20:03:42.831   0x11  0x00->0x00                        EE
4  20:03:50.726   0xAA  0x99->0x98  A200000000000000000088DD

The problem is how to split the last column data further with 2 characters in each column.

I also tried to do this also.

colspecs = [ (0,12), (13,17), (18,28), (29,31), (31,33) ]
df = pd.read_fwf("SampleLog.txt", colspecs = colspecs, names=["Time", "Header", "Src->Dest", "C1", "C2"])

And I get this.

           Time Header   Src->Dest  C1   C2
0  20:03:42.769   0xAA  0x99->0x98  A2  0.0
1  20:03:42.769   0x11  0x00->0x00  B2  NaN
2  20:03:42.815   0xAA  0x98->0x99  52  3.0
3  20:03:42.831   0x11  0x00->0x00  EE  NaN
4  20:03:50.726   0xAA  0x99->0x98  A2  0.0

Here it looks like this will work, but the problem is I am getting float values in column "C2" and I will also get the same behavior in other columns if I add. The reason for this behavior is due to the "NaN", which converts complete column into float/double type. I tried replacing "NaN" with something else, but not able to use it. Can someone please help me in solving this problem which I am facing?

1 Answer 1

2

You can try the default guess and parse the data:

df = pd.read_fwf('a.csv', names=["Time", "Header", "Src->Dest", "C1"])
(df.drop('C1', axis=1)
   .join(df['C1'].str.extractall('(.{2})')[0].unstack('match'))
)

Output:

            Time    Header  Src->Dest   0   1   2   3   4   5   6   7   8   9   10  11
0   20:03:42.769    0xAA    0x99->0x98  A2  00  00  00  00  00  00  00  00  00  88  DD
1   20:03:42.769    0x11    0x00->0x00  B2  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2   20:03:42.815    0xAA    0x98->0x99  52  03  00  1C  0C  1C  0C  0E  0E  NaN NaN NaN
3   20:03:42.831    0x11    0x00->0x00  EE  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4   20:03:50.726    0xAA    0x99->0x98  A2  00  00  00  00  00  00  00  00  00  88  DD
5   20:03:50.819    0xAA    0x99->0x98  A2  00  00  00  00  00  00  00  00  00  88  DD
6   20:03:50.895    0xAA    0x99->0x98  A2  00  00  00  00  00  00  00  00  00  88  DD
7   20:03:54.841    0xAA    0x98->0x99  12  02  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8   20:03:54.857    0x11    0x00->0x00  E0  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9   20:03:54.935    0xAA    0x99->0x98  12  03  00  41  01  02  00  0C  07  0A  00  NaN
10  20:03:54.935    0x11    0x00->0x00  AB  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11  20:03:55.013    0xAA    0x99->0x98  12  03  01  05  00  00  05  0C  00  00  03  NaN
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.