
Is there a way to parse Unicode escape codes in VBScript? I'd like to replace every code like "\u00f1" in a string with its corresponding character.

1 Answer


The Unescape function does that*, but it requires the Unicode characters to be encoded in the %uxxxx format (where xxxx is the four-digit hexadecimal character code). So you'll first need to replace the \uxxxx codes with their %uxxxx equivalents. Here's an example:

str = "\u0044\u006F \u0063\u0061\u0074\u0073 \u0065\u0061\u0074 \u0062\u0061\u0074\u0073\u003f"

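' Match a backslash followed by "u" and four hex digits; group 1 captures "uxxxx"
Set re = New RegExp
re.Pattern = "\\(u[a-f\d]{4})"
re.IgnoreCase = True   ' accept both \u00F1 and \u00f1
re.Global = True       ' replace every occurrence, not just the first

' Turn each \uxxxx into %uxxxx, then let Unescape decode the result
str2 = Unescape(re.Replace(str, "%$1"))
MsgBox str2            ' displays "Do cats eat bats?"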

* Note that Unescape also replaces %xx codes in the string with the corresponding characters (for example, %20 becomes a space). So if a %xx sequence can legitimately appear in your string, you'll have to write your own replacement function. Such a function could do the following:

  • search for occurrences of \uxxxx-like substrings in your input string,
  • extract the character code from each match and convert it from hexadecimal to decimal,
  • call ChrW to convert the decimal character code to the corresponding Unicode character,
  • replace each \uxxxx match with the corresponding character.
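The steps above can be sketched as follows. DecodeUnicodeEscapes is a hypothetical helper name, and the sketch assumes every escape is a backslash, "u", and exactly four hex digits, as in the question. Unlike the Unescape approach, it leaves %xx sequences untouched:

```vbscript
' Hypothetical helper: decodes \uxxxx escapes without touching %xx sequences.
' Assumes every escape is a backslash, "u", and exactly four hex digits.
Function DecodeUnicodeEscapes(s)
    Dim re, matches, m, result, lastPos
    Set re = New RegExp
    re.Pattern = "\\u([0-9a-f]{4})"   ' group 1 captures only the four hex digits
    re.IgnoreCase = True
    re.Global = True

    Set matches = re.Execute(s)
    result = ""
    lastPos = 1                        ' 1-based position just after the previous match
    For Each m In matches
        ' Copy the text between the previous match and this one (FirstIndex is 0-based)
        result = result & Mid(s, lastPos, m.FirstIndex + 1 - lastPos)
        ' "&H" + hex digits converts hexadecimal to a number; ChrW turns it into a character
        result = result & ChrW(CLng("&H" & m.SubMatches(0)))
        lastPos = m.FirstIndex + m.Length + 1
    Next
    ' Append whatever follows the last match
    DecodeUnicodeEscapes = result & Mid(s, lastPos)
End Function
```

For example, DecodeUnicodeEscapes("Espa\u00f1a, 100%20") returns "España, 100%20", whereas the Unescape version would also turn the %20 into a space.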

2 Comments

What does re.Replace(str, "%$1") do? What's the meaning of "%$1"?
@Carlos: This code performs a replacement on the str string using the regular expression re. It replaces every occurrence of the u[a-f\d]{4} pattern (that is, uxxxx) preceded by \ with the same text preceded by %. In the replacement string, $1 stands for the text captured by the first parenthesized group in the pattern, (u[a-f\d]{4}), so "%$1" means "a % followed by the matched uxxxx".
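A minimal sketch of that intermediate step, assuming the script is run with cscript under Windows Script Host (so WScript.Echo prints to the console). It reuses the answer's pattern and prints the string before and after Unescape:

```vbscript
' Same pattern as in the answer: group 1 captures "u" plus four hex digits
Dim re, s, escaped
Set re = New RegExp
re.Pattern = "\\(u[a-f\d]{4})"
re.IgnoreCase = True
re.Global = True

s = "\u0048\u0069"
' "%$1" = a literal % followed by whatever group 1 captured
escaped = re.Replace(s, "%$1")
WScript.Echo escaped             ' %u0048%u0069
WScript.Echo Unescape(escaped)   ' Hi
```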
