I have a C# program that sends an XML string like this:

<?xml version="1.0" encoding="utf-16" standalone="no"?>
<ScoreList>
  <Player UserName="Player1" Score="10" />
  <Player UserName="Player2" Score="20" />
</ScoreList>

But when I receive it in my Python program, it looks like this:

   b'<?xml version="1.0" encoding="utf-16" standalone="no"?>
   \r\n<ScoreList>\r\n  
   <Player UserName="Player1" Score="10" />
   \r\n  <Player UserName="Player2" Score="20" />
   \r\n</ScoreList>' 

I'm sending it to a server with this C# code:

Byte[] sendBytes = Encoding.BigEndianUnicode.GetBytes(doc);
netStream.Write(sendBytes, 0, sendBytes.Length);

And receiving it with this code on the Python (version 3.5) end:

self.data = self.request.recv(1024).strip()

Then, when I try to parse it using this code:

tree = ET.fromstring(self.data)

I get the error:

 File "<string>", line None
 xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 1

Any advice on where I'm going wrong, or what I could try to fix this?

  • I don't know much about Python, but it seems you are trying to parse XML with a string function. I believe there must be some XML parsing functions: docs.python.org/2/library/xml.etree.elementtree.html Commented Mar 1, 2019 at 1:45
  • @Hemant Sakta I'm using a string function because the XML is turned into a string on the C# side, and then I want to change it back to XML on the Python side. Commented Mar 1, 2019 at 10:26

1 Answer


It looks as if you are calling str on a bytes instance somewhere in your code.

Consider this xml fragment:

>>> x = '<foo>Hello world</foo>'

If it is being sent across the network, it must be encoded as bytes.

>>> bs = x.encode('utf-8')
>>> bs
b'<foo>Hello world</foo>'

ElementTree will accept the UTF-8 encoded bytes as is, or you can decode them before passing them to ElementTree:

>>> decoded = bs.decode('utf-8')
>>> decoded
'<foo>Hello world</foo>'
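
As a quick check of the point above, here is a minimal sketch showing that `ET.fromstring` produces the same element from the UTF-8 bytes and from the decoded string. The variable names (`payload`, `encoded`, and so on) are illustrative only.

```python
import xml.etree.ElementTree as ET

payload = '<foo>Hello world</foo>'
encoded = payload.encode('utf-8')  # what would travel over the network

# Parse the raw UTF-8 bytes directly...
from_bytes = ET.fromstring(encoded)
# ...or decode them to str first and parse the text.
from_text = ET.fromstring(encoded.decode('utf-8'))

print(from_bytes.tag, from_bytes.text)
print(from_text.tag, from_text.text)
```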

However if you call str on the bytes, you'll get the repr of the bytes, which will include the leading b:

>>> stringified = str(bs)
>>> stringified
"b'<foo>Hello world</foo>'"

ElementTree will not accept this input:

>>> ET.fromstring(stringified)
Traceback (most recent call last):
  ...
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 1

To fix this, you need to look at how self.data is being constructed. Make sure that you are calling decode() on the bytes that you receive, rather than str().
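
For the data in the question, the decode step might look roughly like the sketch below. It assumes the bytes on the wire really are UTF-16 big-endian without a byte order mark, which is what `Encoding.BigEndianUnicode.GetBytes` should produce. The helper name `parse_scores` and the locally built `raw` value are hypothetical stand-ins for the real socket data.

```python
import xml.etree.ElementTree as ET


def parse_scores(raw):
    """Decode UTF-16 big-endian bytes and return (user name, score) pairs."""
    # Assumption: the sender used UTF-16 big-endian with no BOM.
    text = raw.decode('utf-16-be')
    root = ET.fromstring(text)
    return [(p.get('UserName'), int(p.get('Score')))
            for p in root.iter('Player')]


# Stand-in for the bytes that self.request.recv() would deliver.
doc = ('<?xml version="1.0" encoding="utf-16" standalone="no"?>\r\n'
       '<ScoreList>\r\n'
       '  <Player UserName="Player1" Score="10" />\r\n'
       '  <Player UserName="Player2" Score="20" />\r\n'
       '</ScoreList>')
raw = doc.encode('utf-16-be')

scores = parse_scores(raw)
print(scores)
```

One thing to watch for under the same assumption: calling `strip()` on the raw bytes before decoding can remove half of a two-byte UTF-16 code unit (for example the `0x0A` of a trailing newline), which makes the decode fail. Stripping the decoded string instead avoids that.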


1 Comment

Thank you. My issue was the way I had written the decode.
