0

Object: Match strings in file to string in XML. Replace Match with comments

cat File.txt

RHO_BID_RT
RHO_ASK_RT

XML FILE CONTENTS

<field name="RHO_BID_RT" type="float" id="0x01D3" sequence="1"/>
<field name="RHO_ASK_RT" type="float" id="0x01D4" sequence="1"/>

INTENDED RESULTS in XML CONTENTS

 <!-- Removed RHO_BID_RT-->
 <!-- Removed RHO_ASK_RT-->

CODE

import re

word_file = 'File.txt'
xml_file  = '../file.xml'

with open(word_file) as words:
    regex = r'<[^>]+ *field name="({})"[^>]+>'.format(
        '|'.join(word.rstrip() for word in words)
    )

with open(xml_file) as xml:
    for line in xml:
        line = re.sub(regex, r'<!!-- REMOVED \1 -->', line.rstrip())
        print(line)
2
  • did you mean '<!! REMOVED' or did you mean '<!-- REMOVED'? Also, what problem are you having with it? Commented Feb 26, 2015 at 16:53
  • Yes I did mean '<!-- REMOVED but do not think this is root issue for me Commented Feb 26, 2015 at 16:53

1 Answer 1

1

Use an XML parser, like lxml.

The idea is to read a list of words and construct an xpath expression that would check the name attribute to be one of these words. Then, replace the elements by calling replace():

from lxml import etree
from lxml.etree import Comment


with open('words.txt') as f:
    words = [line.strip() for line in f]

xpath = '//field[{}]'.format(" or ".join(['@name = "%s"' % word for word in words]))

tree = etree.parse('input.xml')
root = tree.getroot()

for element in tree.xpath(xpath):
    root.replace(element, Comment('REMOVED'))

print etree.tostring(tree)

For the following contents of input.xml:

<fields>
    <field name="RHO_BID_RT" type="float" id="0x01D3" sequence="1"/>
    <field name="RHO_ASK_RT" type="float" id="0x01D4" sequence="1"/>
</fields>

and words.txt:

RHO_BID_RT
RHO_ASK_RT

it prints:

<fields>
    <!--REMOVED-->
    <!--REMOVED-->
</fields>

Alternatively, construct a set of words and check the name attribute value in a loop:

from lxml import etree
from lxml.etree import Comment


with open('words.txt') as f:
    words = set([line.strip() for line in f])

tree = etree.parse('input.xml')
root = tree.getroot()

for element in tree.xpath('//field[@name]'):
    if element.attrib['name'] in words:
        root.replace(element, Comment('REMOVED'))

print etree.tostring(tree)
Sign up to request clarification or add additional context in comments.

1 Comment

Appears I do not have this module. I will see what can be done. thank you.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.