I am a beginner trying to parse data from .xml files that have the structure below.

<parking id="pucpr">
  <space id="1" occupied="0">
    <rotatedRect>
      <center x="300" y="207" />
      <size w="55" h="32" />
      <angle d="-74" />
    </rotatedRect>
    <contour>
      <point x="278" y="230" />
      <point x="290" y="186" />
      <point x="324" y="185" />
      <point x="308" y="230" />
    </contour>
  </space>
  <space id="2" occupied="0">
    <rotatedRect>
      <center x="332" y="209" />
      <size w="56" h="33" />
      <angle d="-77" />
    </rotatedRect>
    <contour>
      <point x="325" y="185" />
      <point x="355" y="185" />
      <point x="344" y="233" />
      <point x="310" y="233" />
    </contour>
  </space>
.
.
.
</parking>

There are hundreds of such files in different folders. I wrote the code below to parse data from all of those .xml files.

import xml.etree.ElementTree as ET
import os
import xlsxwriter

data_path = '/Users/jaehyunlee/Desktop/for_test'

# Read full directory and file name in the folder
for path, dirs, files in os.walk(data_path):
    for file in files:
        if os.path.splitext(file)[1].lower() == '.xml': # filtering only for .xml files
            full_path = os.path.join(path, file)

            # Parsing data from .xml file
            tree = ET.parse(full_path)
            root = tree.getroot()

            for space in root.iter('space'):
                car = space.attrib["occupied"]
                car_int = int(car)

The problem occurs when I try to read the value of the 'occupied' attribute. When I run the code, it raises KeyError: 'occupied'. Other attributes, such as 'x', 'y', 'w', and 'h', work fine. Could someone help?

P.S. When I process a single .xml file on its own, this error does not occur. It only happens when I iterate over all the files in the folder.

Comments

  • Maybe first check print(space) and print(space.attrib). Commented Jan 19, 2020 at 2:15
  • Did you check the data in the files? Maybe there is a <space> without occupied. You could check if "occupied" in space.attrib: or use space.attrib.get("occupied", default_value) to get a default value when "occupied" is missing. Commented Jan 19, 2020 at 2:23
  • You should print the file name to see which file causes the problem. Commented Jan 19, 2020 at 2:24
  • @furas Thank you. I checked the files and found that some of them have no 'occupied' attribute :) Commented Jan 20, 2020 at 11:54

1 Answer

I found that this KeyError occurs because some of the files do not contain the 'occupied' attribute. To avoid the error and keep iterating, I added an 'if' check inside the 'for' loop.

import xml.etree.ElementTree as ET
import os
import xlsxwriter

data_path = '/Users/jaehyunlee/Desktop/for_test'

# Read full directory and file name in the folder
for path, dirs, files in os.walk(data_path):
    for file in files:
        if os.path.splitext(file)[1].lower() == '.xml': # filtering only for .xml files
            full_path = os.path.join(path, file)

            # Parsing data from .xml file
            tree = ET.parse(full_path)
            root = tree.getroot()

            for space in root:
                if 'occupied' in space.attrib:
                    car = space.attrib['occupied'] 
                    car_int = int(car)
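
As a variation on the check above, and following the suggestions in the comments, here is a minimal sketch that uses Element.get(), which returns None instead of raising KeyError when an attribute is absent. It also records which spaces lack 'occupied' so the affected data can be tracked down. The function name read_occupancy and the inline sample string are hypothetical and only there for illustration. With real files, ET.parse(full_path).getroot() would take the place of ET.fromstring().

```python
import xml.etree.ElementTree as ET


def read_occupancy(xml_text):
    """Return (values, missing_ids) for every <space> element in xml_text.

    Hypothetical helper: values holds the 'occupied' flags as ints,
    missing_ids holds the 'id' of each <space> that has no 'occupied'.
    """
    root = ET.fromstring(xml_text)
    values, missing_ids = [], []
    for space in root.iter('space'):
        occupied = space.get('occupied')  # None when the attribute is absent
        if occupied is None:
            missing_ids.append(space.get('id'))
        else:
            values.append(int(occupied))
    return values, missing_ids


# Made-up sample: the second <space> deliberately lacks 'occupied'.
sample = '''<parking id="pucpr">
  <space id="1" occupied="0" />
  <space id="2" />
  <space id="3" occupied="1" />
</parking>'''

values, missing_ids = read_occupancy(sample)
print(values)       # [0, 1]
print(missing_ids)  # ['2']
```

Collecting the ids of the incomplete spaces, rather than silently skipping them, makes it easier to decide later whether those entries should be dropped or given a default value.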