I am trying to convert XML (https://issat.ttn.tn/cu/export/akouda.php) to a CSV file with the following code:

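import requests
import pandas as pd
from html import unescape

url = "https://issat.ttn.tn/cu/export/akouda.php"

# Unescape the HTML entities, then drop the first 5 and last 6 characters
# (presumably a <pre>...</pre> wrapper around the XML).
s = unescape(requests.get(url).text)[5:-6]

# Select every child of <phases> together with every <time> node.
df = pd.read_xml(s, xpath="//phases/* | //time")
#df["value"] = df["value"].ffill()
df.to_csv('output0.csv')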

Here is part of the output:

,value,phases,id,act_energy,react_energy,current_inst,voltage_inst,power_inst,power_fact,thd
0,2022-04-14 15:45:00,,,,,,,,,
1,,,0.0,0.3000000000001819,0.4324445747717669,2.0,241.7,0.27,0.57,27.39
2,,,1.0,0.0,0.0,13.06,242.5,0.66,0.2,22.69
3,,,2.0,0.0,0.0,1.07,243.7,0.15,0.58,48.05
4,2022-04-14 15:30:00,,,,,,,,,
5,,,0.0,0.2999999999999545,0.108885460271677,1.02,240.4,0.23,0.94,23.7
6,,,1.0,0.0,0.0,14.54,241.0,0.86,0.24,23.99
7,,,2.0,0.0,0.0,1.07,243.5,0.15,0.59,48.08
8,2022-04-14 15:15:00,,,,,,,,,
9,,,0.0,0.3999999999998636,0.5618044649492236,0.7,243.1,0.1,0.58,42.46
10,,,1.0,0.0,0.0,17.82,241.9,1.99,0.46,33.59
11,,,2.0,0.0,0.0,1.08,246.3,0.15,0.58,51.09
12,2022-04-14 15:00:00,,,,,,,,,
13,,,0.0,0.6000000000001364,0.8427066974243144,0.71,241.7,0.1,0.58,44.02
14,,,1.0,0.0,0.0,18.74,240.5,2.21,0.49,31.3
15,,,2.0,0.0,0.0,1.08,245.3,0.15,0.58,51.77

I need to:

  1. Remove rows such as 0, 4, 8 and 12, which hold a date but no readings.
  2. Keep only the rows where id = 1.
  3. Remove the phases column.

Can anyone help, please?

2 Answers

Consider running two read_xml calls with different xpath values, using attrs_only for the <time> nodes. Because the two results line up row for row (one <phase> with @id=1 for each <time>), you can join them:

...
time_df = pd.read_xml(s, xpath="//time", attrs_only=True, names=["time"])
phase_df = pd.read_xml(s, xpath="//phase[@id=1]")

time_phase_df = time_df.join(phase_df)
time_phase_df
                     time  id  act_energy  ...  power_inst  power_fact    thd
0     2022-04-15 00:00:00   1           0  ...        0.84        0.28  22.35
1     2022-04-14 23:45:00   1           0  ...        0.83        0.28  23.16
2     2022-04-14 23:30:00   1           0  ...        0.83        0.28  22.43
3     2022-04-14 23:15:00   1           0  ...        0.83        0.28  22.56
4     2022-04-14 23:00:00   1           0  ...        0.82        0.28  22.57
                  ...  ..         ...  ...         ...         ...    ...
1289  2022-04-01 02:15:00   1           0  ...        0.69        0.25  22.70
1290  2022-04-01 02:00:00   1           0  ...        0.69        0.25  22.66
1291  2022-04-01 01:45:00   1           0  ...        0.69        0.25  22.46
1292  2022-04-01 01:30:00   1           0  ...        0.69        0.25  22.00
1293  2022-04-01 01:25:00   1           0  ...        0.69        0.25  22.34
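
The join above relies on the two frames having the same row order. As a rough cross-check, the same pairing can be made explicit with only the standard library by walking each <time> node and picking its phase with id 1. This is a minimal sketch: the SAMPLE layout (attributes on <time> and <phase>, the <data> root) is an assumption about the feed, not a verified copy of it, and phase_rows is a hypothetical helper name.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the real feed's layout is assumed, not verified.
# The power_inst values are copied from the output shown in the question.
SAMPLE = """\
<data>
  <time value="2022-04-14 15:45:00">
    <phases>
      <phase id="0" power_inst="0.27"/>
      <phase id="1" power_inst="0.66"/>
      <phase id="2" power_inst="0.15"/>
    </phases>
  </time>
  <time value="2022-04-14 15:30:00">
    <phases>
      <phase id="0" power_inst="0.23"/>
      <phase id="1" power_inst="0.86"/>
      <phase id="2" power_inst="0.15"/>
    </phases>
  </time>
</data>
"""


def phase_rows(xml_text, phase_id="1"):
    """Return one dict per <time> node, holding the matching phase's attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for time_el in root.iter("time"):
        for phase_el in time_el.iter("phase"):
            # Attribute values are strings, so compare against a string id.
            if phase_el.get("id") == phase_id:
                rows.append({"time": time_el.get("value"), **phase_el.attrib})
    return rows


rows = phase_rows(SAMPLE)

# Write the rows as CSV into an in-memory buffer (swap in a real file as needed).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Because each timestamp is read from the same <time> node as its phase, a missing phase in the feed would drop that row instead of shifting every later timestamp by one.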

As of pandas 1.5, read_xml also supports parsing dates:

time_df = pd.read_xml(
    # parse_dates refers to the column name after renaming via names
    s, xpath="//time", attrs_only=True, names=["time"], parse_dates=["time"]
)

5 Comments

Thanks a lot, the code runs perfectly now.
If you can, I also need to multiply the values in the power_inst column by 1000. What do I need to write?
After read_xml, simply update the column: time_phase_df["power_inst"] = time_phase_df["power_inst"].mul(1000)
That is perfect, you really are a superhero. If I want to update three columns (power_inst, act_energy, react_energy), can I write it like this: time_phase_df["power_inst, act_energy ,react_energy "] = time_phase_df["power_inst, act_energy ,react_energy "].mul(1000) ?
Not quite. Consider researching how to update multiple columns in pandas data frames. Happy coding!
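
For readers landing on the same follow-up: one common pandas idiom is to select the columns with a list of names (not a single comma-separated string) and assign the scaled result back. A minimal sketch, using a small hand-built frame in place of the real time_phase_df; the sample values are taken from the id == 1 rows in the question, and the cols name is only illustrative.

```python
import pandas as pd

# Stand-in for time_phase_df; values come from the id == 1 rows in the question.
time_phase_df = pd.DataFrame(
    {
        "time": ["2022-04-14 15:45:00", "2022-04-14 15:30:00"],
        "act_energy": [0.0, 0.0],
        "react_energy": [0.0, 0.0],
        "power_inst": [0.66, 0.86],
    }
)

# A list of column names selects several columns at once.
cols = ["power_inst", "act_energy", "react_energy"]

# Multiply every selected column by 1000 and write the result back.
time_phase_df[cols] = time_phase_df[cols].mul(1000)

print(time_phase_df)
```

A single string such as "power_inst, act_energy ,react_energy " is treated as one column label, which is why the attempt in the comment above raises a KeyError.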

Try:

import requests
import pandas as pd
from html import unescape

url = "https://issat.ttn.tn/cu/export/akouda.php"

s = unescape(requests.get(url).text)[5:-6]

df = pd.read_xml(s, xpath="//phases/* | //time")

df["value"] = df["value"].ffill()
df = df.drop(columns="phases")
# if you want only id==1 you can skip this:
# df = df[~df.isna().any(axis=1)]
print(df[df["id"] == 1])

Prints:

                    value   id  act_energy  react_energy  current_inst  voltage_inst  power_inst  power_fact    thd
2     2022-04-14 23:15:00  1.0         0.0           0.0         12.06         241.0        0.83        0.28  22.56
6     2022-04-14 23:00:00  1.0         0.0           0.0         12.04         240.5        0.82        0.28  22.57
10    2022-04-14 22:45:00  1.0         0.0           0.0         12.04         240.2        0.82        0.28  22.56
14    2022-04-14 22:30:00  1.0         0.0           0.0         12.03         240.1        0.82        0.28  22.24
18    2022-04-14 22:15:00  1.0         0.0           0.0         12.01         240.1        0.82        0.28  22.52
22    2022-04-14 22:00:00  1.0         0.0           0.0         12.00         239.8        0.82        0.28  22.74
26    2022-04-14 21:45:00  1.0         0.0           0.0         11.96         239.9        0.82        0.28  22.58

...
