Decode UTF-8 with Javascript

Question

I have Javascript in an XHTML web page that is passing UTF-8 encoded strings. It needs to continue to pass the UTF-8 version, as well as decode it. How is it possible to decode a UTF-8 string for display?

<script type="text/javascript">
// <![CDATA[
function updateUser(usernameSent){
    var usernameReceived = usernameSent; // Current value: GrÃƒÂ¶ÃƒÂŸe
    var usernameDecoded = usernameReceived;  // Decode to: Größe
    var html2id = '';
    html2id += 'Encoded: ' + usernameReceived + '<br />Decoded: ' + usernameDecoded;
    document.getElementById('userId').innerHTML = html2id;
}
// ]]>
</script>

This is not a problem you use JavaScript to solve. The way to solve it would be to add an appropriate meta tag like <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" /> and XML declaration like <?xml version="1.0" encoding="UTF-8"?>. — icktoofay
– icktoofay, Commented Nov 13, 2012 at 6:53
What? As long as your webpage is encoded in UTF-8, js will treat strings as UTF-8 encoded, and encodeURIComponent() and decodeURIComponent() will assume the data is UTF-8 encoding. — xiaoyi
– xiaoyi, Commented Nov 13, 2012 at 7:07
"GrÃƒÂ¶ÃƒÂŸe" is not UTF-8 (well, it may be, but not intrinsically), it's a mess. It's already broken. Several times, apparently. It doesn't need to be "decoded", wherever it's failing and becomes broken needs to be fixed. Give more context information, otherwise it's hard to help. — deceze
– deceze ♦, Commented Nov 13, 2012 at 7:18
Don't randomly apply utf8_encode. Do you need it? Do you know why you need it? — deceze
– deceze ♦, Commented Nov 13, 2012 at 7:50
The "it" in "user tries to use it" refers to UTF-8? Then you don't need utf8_encode. Not necessarily. utf8_encode transforms the encoding of a string from ISO 8859-1 to UTF-8. It tries to do that even if the string is already UTF-8. UTF-8 "Größe" → utf8_encode → "GrÃ¶Ãe" → utf8_encode "GrÃÂ¶ÃÂe". If you apply it when you don't need it, your string screws up. — deceze
– deceze ♦, Commented Nov 13, 2012 at 9:13
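The double-encoding chain described in the comment above can be reproduced directly in JavaScript. The following is a minimal sketch, not code from the thread: `misreadAsLatin1` and `repairOnce` are hypothetical helper names, and the repair step assumes every mangled character is at most U+00FF (true for ISO 8859-1 misreads such as utf8_encode, but not for Windows-1252 misreads, which can introduce characters like ƒ or Ÿ).

```javascript
// Hypothetical helpers illustrating one round of mis-decoding and its reversal.

// Encode to UTF-8, then wrongly treat each byte as one ISO 8859-1 character.
function misreadAsLatin1(str) {
  const bytes = new TextEncoder().encode(str);
  return String.fromCharCode(...bytes);
}

// Undo one round: turn each character back into a byte and decode as UTF-8.
// Assumes every character is <= U+00FF; throws a TypeError if the
// resulting bytes are not valid UTF-8.
function repairOnce(str) {
  const bytes = Uint8Array.from(str, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

const once = misreadAsLatin1("Größe");  // mangled once
const twice = misreadAsLatin1(once);    // mangled twice
console.log(repairOnce(repairOnce(twice)) === "Größe"); // true
```

Repairing on the client only hides the problem; as the comments say, the better fix is to stop the extra encoding step where it happens.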

Anna · Accepted Answer · 2019-11-02 14:50:14Z

192

To answer the original question: here is how you decode utf-8 in javascript:

http://ecmanaut.blogspot.ca/2006/07/encoding-decoding-utf8-in-javascript.html

Specifically,

function encode_utf8(s) {
  return unescape(encodeURIComponent(s));
}

function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}

We have been using this in our production code for 6 years, and it has worked flawlessly.

Note, however, that escape() and unescape() are deprecated. See this.
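As noted in the comments on this answer, the decode call throws on input that is not a valid UTF-8 byte string, so it is usually wrapped in try/catch. A minimal sketch of such a wrapper follows; `safeDecodeUtf8` is a hypothetical name, and the fallback behaviour (returning the input unchanged) is an assumption, not something the answer prescribes.

```javascript
// A sketch: wraps the decodeURIComponent(escape(s)) idiom so that
// malformed input does not throw. `safeDecodeUtf8` is a hypothetical name.
function safeDecodeUtf8(s) {
  try {
    // escape() turns each non-alphanumeric character <= 0xFF into %XX;
    // decodeURIComponent() then reads those %XX bytes as UTF-8.
    return decodeURIComponent(escape(s));
  } catch (e) {
    if (e instanceof URIError) {
      return s; // not a UTF-8 byte string: return the input unchanged
    }
    throw e;
  }
}

console.log(safeDecodeUtf8("Gr\u00C3\u00B6\u00C3\u009Fe")); // Größe
console.log(safeDecodeUtf8("Größe")); // Größe (already decoded, returned as-is)
```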

edited Nov 2, 2019 at 14:50

Anna

answered Dec 3, 2012 at 20:53

CpnCrunch

13 Comments

Jarrett Mattson Over a year ago

I've tried using the decodeURIComponent(escape(usernameReceived)) and decodeURIComponent(usernameReceived), but neither are transforming usernameReceived. Can you show some functional code?

CpnCrunch Over a year ago

Here is my code: s = decodeURIComponent( escape( s )); Note that you have to put it in a try/catch block.

Joy George Kunjikkuru Over a year ago

This works for me, but as you know, the escape method is deprecated. We are using TypeScript, and it's not there by default. So what is the best alternative to escape? encodeURI and encodeURIComponent can't replace escape in this scenario, as they produce different output.

David Spector Over a year ago

I've been asked to add a comment because I downvoted this. But all I can say is that since escape is deprecated, this answer is not acceptable. Why is escape deprecated if it performs an important function? And why is there no native UTF-8 support in JavaScript? And why does no one care (last comment was two years ago).

GetFree Over a year ago

When a deprecated functionality is actually useful, the best way to prevent it from being removed is to keep using it instead of refraining from using it. Browser vendors use usage statistics to determine when to remove a feature.

Community · Accepted Answer · 2017-05-23 12:18:25Z

36

This should work:

// http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt

/* utf.js - UTF-8 <=> UTF-16 convertion
 *
 * Copyright (C) 1999 Masanao Izumo <[email protected]>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free.  You can redistribute it and/or modify it.
 */

function Utf8ArrayToStr(array) {
    var out, i, len, c;
    var char2, char3;

    out = "";
    len = array.length;
    i = 0;
    while(i < len) {
    c = array[i++];
    switch(c >> 4)
    { 
      case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
        // 0xxxxxxx
        out += String.fromCharCode(c);
        break;
      case 12: case 13:
        // 110x xxxx   10xx xxxx
        char2 = array[i++];
        out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
        break;
      case 14:
        // 1110 xxxx  10xx xxxx  10xx xxxx
        char2 = array[i++];
        char3 = array[i++];
        out += String.fromCharCode(((c & 0x0F) << 12) |
                       ((char2 & 0x3F) << 6) |
                       ((char3 & 0x3F) << 0));
        break;
    }
    }

    return out;
}

Check out the JSFiddle demo.

Also see the related questions: here and here

edited May 23, 2017 at 12:18

CommunityBot

answered Mar 13, 2014 at 8:34

Albert

2 Comments

user1804599 Over a year ago

Upvote for actually understanding what decoding UTF-8 is.

Fuhrmanator Over a year ago

Some archaeology on the source: web-archive-org.translate.goog/web/20121116231954/http://…

Jonathan · Accepted Answer · 2021-02-16 14:08:36Z

35

Perhaps using TextDecoder will be sufficient.

Not supported in IE though.

var decoder = new TextDecoder('utf-8'),
    decodedMessage;

decodedMessage = decoder.decode(message.data);

Handling non-UTF8 text

In this example, we decode the Russian text "Привет, мир!", which means "Hello, world." In our TextDecoder() constructor, we specify the Windows-1251 character encoding, which is appropriate for Cyrillic script.

    let win1251decoder = new TextDecoder('windows-1251');
    let bytes = new Uint8Array([207, 240, 232, 226, 229, 242, 44, 32, 236, 232, 240, 33]);
    console.log(win1251decoder.decode(bytes)); // Привет, мир!

The interface for the TextDecoder is described here.

Retrieving a byte array from a string is equally simple:

const decoder = new TextDecoder();
const encoder = new TextEncoder();

const byteArray = encoder.encode('Größe');
// converted it to a byte array

// now we can decode it back to a string if desired
console.log(decoder.decode(byteArray));

If your bytes are in a different encoding, you must account for that when decoding: pass the encoding's label to the TextDecoder constructor, which accepts any one of the valid encodings listed here. TextEncoder itself always produces UTF-8.
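TextDecoder works on bytes (an ArrayBuffer or typed array), not on strings, and by default it silently replaces invalid sequences. The sketch below shows the default behaviour next to the constructor's `fatal` option; `decodeOrNull` is a hypothetical helper name, and returning null on failure is just one possible design.

```javascript
const bytesOk = new Uint8Array([71, 114, 195, 182, 195, 159, 101]); // "Größe" in UTF-8
const bytesBad = new Uint8Array([195, 40]); // 0xC3 must be followed by a continuation byte

// Default decoder: each invalid sequence becomes U+FFFD (the replacement character).
const lenient = new TextDecoder("utf-8");
console.log(lenient.decode(bytesOk));                // Größe
console.log(lenient.decode(bytesBad) === "\uFFFD("); // true

// With fatal: true, invalid input throws a TypeError instead.
const strict = new TextDecoder("utf-8", { fatal: true });

// Hypothetical helper: returns null when the bytes are not valid UTF-8.
function decodeOrNull(bytes) {
  try {
    return strict.decode(bytes);
  } catch (e) {
    if (e instanceof TypeError) return null;
    throw e;
  }
}

console.log(decodeOrNull(bytesOk));  // Größe
console.log(decodeOrNull(bytesBad)); // null
```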

edited Feb 16, 2021 at 14:08

answered Nov 17, 2016 at 9:53

Jonathan

7 Comments

user5066707 Over a year ago

@ÁlvaroGonzález But it works and might be standard (future browsers will need to support this too, okay?)

Tim Perry Over a year ago

Nowadays this is not experimental, has great support in all modern browsers, and is absolutely the right choice for everybody (unless you still have to support IE)

Jamie Hutber Over a year ago

Where do i get the message.data from?

Jonathan Over a year ago

@JamieHutber Perhaps you are looking for this?: developer.mozilla.org/en-US/docs/Web/API/TextDecoder

Juan Vilar Over a year ago

this does not work for strings, only array buffers.

Community · Accepted Answer · 2017-07-06 13:58:31Z

11

An update to @Albert's answer that adds a case for 4-byte sequences such as emoji.

function Utf8ArrayToStr(array) {
    var out, i, len, c;
    var char2, char3, char4;

    out = "";
    len = array.length;
    i = 0;
    while (i < len) {
        c = array[i++];
        switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                out += String.fromCharCode(c);
                break;
            case 12: case 13:
                // 110x xxxx   10xx xxxx
                char2 = array[i++];
                out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
                break;
            case 14:
                // 1110 xxxx  10xx xxxx  10xx xxxx
                char2 = array[i++];
                char3 = array[i++];
                out += String.fromCharCode(((c & 0x0F) << 12) |
                                           ((char2 & 0x3F) << 6) |
                                           ((char3 & 0x3F) << 0));
                break;
            case 15:
                // 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
                char2 = array[i++];
                char3 = array[i++];
                char4 = array[i++];
                out += String.fromCodePoint(((c & 0x07) << 18) |
                                            ((char2 & 0x3F) << 12) |
                                            ((char3 & 0x3F) << 6) |
                                            (char4 & 0x3F));
                break;
        }
    }

    return out;
}

edited Jul 6, 2017 at 13:58

CommunityBot

answered Feb 25, 2017 at 7:27

lauthu

1 Comment

some Over a year ago

Note: This works on well-formed UTF-8 input but breaks without notice under some conditions. For example, it assumes that the correct number of bytes is left and that each of them is a valid continuation byte (0b10xxxxxx). Also, case 15 should match only 0b11110xxx, or it can decode an illegal code point.
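The failure modes listed in this comment can be checked for explicitly. The following is a sketch of a stricter decoder under the assumption that throwing a RangeError is acceptable; `utf8ArrayToStrStrict` is a hypothetical name, and the overlong and surrogate checks go slightly beyond what the comment mentions.

```javascript
// A sketch of a strict decoder; `utf8ArrayToStrStrict` is a hypothetical name.
function utf8ArrayToStrStrict(bytes) {
  let out = "";
  let i = 0;
  while (i < bytes.length) {
    const start = i;
    const lead = bytes[i++];
    let extra, codePoint, min;
    if (lead < 0x80) {                    // 0xxxxxxx
      extra = 0; codePoint = lead; min = 0;
    } else if ((lead & 0xE0) === 0xC0) {  // 110xxxxx
      extra = 1; codePoint = lead & 0x1F; min = 0x80;
    } else if ((lead & 0xF0) === 0xE0) {  // 1110xxxx
      extra = 2; codePoint = lead & 0x0F; min = 0x800;
    } else if ((lead & 0xF8) === 0xF0) {  // 11110xxx
      extra = 3; codePoint = lead & 0x07; min = 0x10000;
    } else {
      throw new RangeError("Invalid lead byte at index " + start);
    }
    // Enough bytes left for this sequence?
    if (i + extra > bytes.length) {
      throw new RangeError("Truncated sequence at index " + start);
    }
    for (let k = 0; k < extra; k++) {
      const b = bytes[i++];
      // Every continuation byte must look like 10xxxxxx.
      if ((b & 0xC0) !== 0x80) {
        throw new RangeError("Invalid continuation byte at index " + (i - 1));
      }
      codePoint = (codePoint << 6) | (b & 0x3F);
    }
    // Reject overlong forms, surrogates, and values above U+10FFFF.
    const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
    if (codePoint < min || codePoint > 0x10FFFF || isSurrogate) {
      throw new RangeError("Overlong or out-of-range sequence at index " + start);
    }
    out += String.fromCodePoint(codePoint);
  }
  return out;
}

console.log(utf8ArrayToStrStrict([71, 114, 195, 182, 195, 159, 101])); // Größe
```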

Matthew Voss · Accepted Answer · 2017-02-19 18:18:22Z

10

Here is a solution that handles all Unicode code points, including upper (4-byte) values, and is supported by all modern browsers (IE and others > 5.5). It uses decodeURIComponent(), but NOT the deprecated escape/unescape functions:

function utf8_to_str(a) {
    for(var i=0, s=''; i<a.length; i++) {
        var h = a[i].toString(16)
        if(h.length < 2) h = '0' + h
        s += '%' + h
    }
    return decodeURIComponent(s)
}

Tested and available on GitHub

To create UTF-8 from a string:

function utf8_from_str(s) {
    for(var i=0, enc = encodeURIComponent(s), a = []; i < enc.length;) {
        if(enc[i] === '%') {
            a.push(parseInt(enc.substr(i+1, 2), 16))
            i += 3
        } else {
            a.push(enc.charCodeAt(i++))
        }
    }
    return a
}

Tested and available on GitHub

edited Feb 19, 2017 at 18:18

answered Feb 15, 2017 at 5:35

Matthew Voss

1 Comment

David Spector Over a year ago

Would appreciate detailed specification of the arguments and results. Unicode confuses me terribly.

Olle Tiinus · Accepted Answer · 2019-02-26 10:33:24Z

9

This is what I found after a more specific Google search than just "UTF-8 encode/decode". For those who are looking for a library to convert between encodings, here you go.

https://github.com/inexorabletash/text-encoding

var uint8array = new TextEncoder().encode(str);
var str = new TextDecoder(encoding).decode(uint8array);

Paste from repo readme

All encodings from the Encoding specification are supported:

utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874 windows-1250 windows-1251 windows-1252 windows-1253 windows-1254 windows-1255 windows-1256 windows-1257 windows-1258 x-mac-cyrillic gb18030 hz-gb-2312 big5 euc-jp iso-2022-jp shift_jis euc-kr replacement utf-16be utf-16le x-user-defined

(Some encodings may be supported under other names, e.g. ascii, iso-8859-1, etc. See Encoding for additional labels for each encoding.)

answered Feb 26, 2019 at 10:33

Olle Tiinus

1 Comment

henrry Over a year ago

This is the best way that works for me, thanks. For more info, see developer.mozilla.org/en-US/docs/Web/API/TextDecoder/…

user9642681 · Accepted Answer · 2018-04-13 16:39:56Z

7

// String to Utf8 ByteBuffer

function strToUTF8(str){
  return Uint8Array.from(encodeURIComponent(str).replace(/%(..)/g,(m,v)=>{return String.fromCodePoint(parseInt(v,16))}), c=>c.codePointAt(0))
}

// Utf8 ByteArray to string

function UTF8toStr(ba){
  return decodeURIComponent(ba.reduce((p,c)=>p+'%'+c.toString(16).padStart(2,'0'),''))
}

answered Apr 13, 2018 at 16:39

user9642681

2 Comments

David Spector Over a year ago

Could someone please test this? Also, please document the argument and return value in detail, to help those of us confused by Unicode. Thanks.

Chris G Over a year ago

not working for me

fakedrake · Accepted Answer · 2016-11-14 17:48:37Z

6

@Albert's solution was the closest, I think, but it can only parse up to 3-byte UTF-8 characters:

function utf8ArrayToStr(array) {
  var out, i, len, c;
  var char2, char3;

  out = "";
  len = array.length;
  i = 0;

  // XXX: Invalid bytes are ignored
  while(i < len) {
    c = array[i++];
    if (c >> 7 == 0) {
      // 0xxx xxxx
      out += String.fromCharCode(c);
      continue;
    }

    // Invalid starting byte
    if (c >> 6 == 0x02) {
      continue;
    }

    // #### MULTIBYTE ####
    // How many bytes left for this character?
    var extraLength = null;
    if (c >> 5 == 0x06) {
      extraLength = 1;
    } else if (c >> 4 == 0x0e) {
      extraLength = 2;
    } else if (c >> 3 == 0x1e) {
      extraLength = 3;
    } else if (c >> 2 == 0x3e) {
      extraLength = 4;
    } else if (c >> 1 == 0x7e) {
      extraLength = 5;
    } else {
      continue;
    }

    // Do we have enough bytes in our data?
    if (i+extraLength > len) {
      var leftovers = array.slice(i-1);

      // If there is an invalid byte in the leftovers we might want to
      // continue from there.
      for (; i < len; i++) if (array[i] >> 6 != 0x02) break;
      if (i != len) continue;

      // All leftover bytes are valid.
      return {result: out, leftovers: leftovers};
    }
    // Remove the UTF-8 prefix from the char (res)
    var mask = (1 << (8 - extraLength - 1)) - 1,
        res = c & mask, nextChar, count;

    for (count = 0; count < extraLength; count++) {
      nextChar = array[i++];

      // Is the char valid multibyte part?
      if (nextChar >> 6 != 0x02) {break;};
      res = (res << 6) | (nextChar & 0x3f);
    }

    if (count != extraLength) {
      i--;
      continue;
    }

    if (res <= 0xffff) {
      out += String.fromCharCode(res);
      continue;
    }

    res -= 0x10000;
    var high = ((res >> 10) & 0x3ff) + 0xd800,
        low = (res & 0x3ff) + 0xdc00;
    out += String.fromCharCode(high, low);
  }

  return {result: out, leftovers: []};
}

This returns {result: "parsed string", leftovers: [list of invalid bytes at the end]} in case you are parsing the string in chunks.

EDIT: fixed the issue that @unhammer found.
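For the chunked-input case this answer targets, the built-in TextDecoder offers a comparable mechanism through the `stream` option of `decode()`, which buffers an incomplete trailing sequence until the next chunk arrives. A minimal sketch, assuming the chunks are Uint8Array instances; `decodeChunks` is a hypothetical name.

```javascript
// Sketch: decode UTF-8 that arrives in arbitrary chunks.
// `decodeChunks` is a hypothetical helper name.
function decodeChunks(chunks) {
  const decoder = new TextDecoder("utf-8");
  let out = "";
  for (const chunk of chunks) {
    // stream: true keeps an incomplete trailing sequence buffered
    // inside the decoder instead of emitting a replacement character.
    out += decoder.decode(chunk, { stream: true });
  }
  // A final call without arguments flushes the decoder; any bytes still
  // dangling at this point come out as U+FFFD.
  out += decoder.decode();
  return out;
}

// The 4-byte sequence for "🚅" split across two chunks.
const parts = [new Uint8Array([240, 159]), new Uint8Array([154, 133])];
console.log(decodeChunks(parts)); // 🚅
```

Unlike the function above, this does not hand the leftover bytes back to the caller; the decoder object keeps them internally between calls.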

edited Nov 14, 2016 at 17:48

answered Jan 21, 2016 at 14:50

fakedrake

3 Comments

unhammer Over a year ago

When I try this with [195,165] I get {"result":"","leftovers":[195, 165]} while @Albert's gives "å"

fakedrake Over a year ago

You are right, I fixed it in my project but not in this post. Sorry about my neglect.

unhammer Over a year ago

No problem, seems to work now :-) Kinda funny that it already got two upvotes before anyone tested it though :-) Now utf8ArrayToStr([240,159,154,133]) gives me my "🚅"

MCCCS · Accepted Answer · 2019-01-24 13:45:55Z

1

Using my 1.6KB library, you can do

ToString(FromUTF8(Array.from(usernameReceived)))

answered Jan 24, 2019 at 13:45

MCCCS

Yordan Nedelchev · Accepted Answer · 2020-03-27 19:32:38Z

This is a solution with extensive error reporting.

It takes a UTF-8 encoded byte array (represented as an array of numbers, each an integer between 0 and 255 inclusive) and produces a JavaScript string of Unicode characters.

function getNextByte(value, startByteIndex, startBitsStr, 
                     additional, index) 
{
    if (index >= value.length) {
        var startByte = value[startByteIndex];
        throw new Error("Invalid UTF-8 sequence. Byte " + startByteIndex 
            + " with value " + startByte + " (" + String.fromCharCode(startByte) 
            + "; binary: " + toBinary(startByte)
            + ") starts with " + startBitsStr + " in binary and thus requires " 
            + additional + " bytes after it, but we only have " 
            + (value.length - startByteIndex) + ".");
    }
    var byteValue = value[index];
    checkNextByteFormat(value, startByteIndex, startBitsStr, additional, index);
    return byteValue;
}

function checkNextByteFormat(value, startByteIndex, startBitsStr, 
                             additional, index) 
{
    if ((value[index] & 0xC0) != 0x80) {
        var startByte = value[startByteIndex];
        var wrongByte = value[index];
        throw new Error("Invalid UTF-8 byte sequence. Byte " + startByteIndex 
             + " with value " + startByte + " (" +String.fromCharCode(startByte) 
             + "; binary: " + toBinary(startByte) + ") starts with " 
             + startBitsStr + " in binary and thus requires " + additional 
             + " additional bytes, each of which should start with 10 in binary."
             + " However byte " + (index - startByteIndex) 
             + " after it with value " + wrongByte + " (" 
             + String.fromCharCode(wrongByte) + "; binary: " + toBinary(wrongByte)
             +") does not start with 10 in binary.");
    }
}

function fromUtf8 (str) {
        var value = [];
        var destIndex = 0;
        for (var index = 0; index < str.length; index++) {
            var code = str.charCodeAt(index);
            if (code <= 0x7F) {
                value[destIndex++] = code;
            } else if (code <= 0x7FF) {
                value[destIndex++] = ((code >> 6 ) & 0x1F) | 0xC0;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0xFFFF) {
                value[destIndex++] = ((code >> 12) & 0x0F) | 0xE0;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0x1FFFFF) {
                value[destIndex++] = ((code >> 18) & 0x07) | 0xF0;
                value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0x03FFFFFF) {
                value[destIndex++] = ((code >> 24) & 0x03) | 0xF8;
                value[destIndex++] = ((code >> 18) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0x7FFFFFFF) {
                value[destIndex++] = ((code >> 30) & 0x01) | 0xFC;
                value[destIndex++] = ((code >> 24) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 18) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else {
                throw new Error("Unsupported Unicode character \"" 
                    + str.charAt(index) + "\" with code " + code + " (binary: " 
                    + toBinary(code) + ") at index " + index
                    + ". Cannot represent it as UTF-8 byte sequence.");
            }
        }
        return value;
    }
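The functions above call `toBinary`, which is not defined in the code shown. One plausible definition is sketched below; this is purely an assumption about what the author intended (a binary rendering of the byte or code value for the error messages), not the author's own helper.

```javascript
// Hypothetical helper (not from the answer): renders a non-negative integer
// in binary, left-padded with zeros to a multiple of 8 digits.
function toBinary(n) {
  const bits = n.toString(2);
  const width = Math.ceil(bits.length / 8) * 8;
  return bits.padStart(width, "0");
}

console.log(toBinary(0xC3)); // 11000011
console.log(toBinary(5));    // 00000101
```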

Vadim Shvetsov · Accepted Answer · 2020-09-24 02:11:43Z

1

You can use decodeURI for this.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURI

As simple as this:

decodeURI('https://developer.mozilla.org/ru/docs/JavaScript_%D1%88%D0%B5%D0%BB%D0%BB%D1%8B');
// "https://developer.mozilla.org/ru/docs/JavaScript_шеллы"

Consider wrapping the call in a try/catch block so that a URIError does not go unhandled.

It is also supported in all browsers.
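A minimal sketch of the try/catch guard mentioned above, assuming the input is a percent-encoded string (decodeURI does nothing useful with raw bytes or mojibake text). `tryDecodeURI` is a hypothetical name, and returning null on failure is an illustrative choice.

```javascript
// Sketch of a guarded decode; `tryDecodeURI` is a hypothetical name.
function tryDecodeURI(uri) {
  try {
    return decodeURI(uri);
  } catch (e) {
    // Thrown for malformed percent-escapes or escapes that are not valid UTF-8.
    if (e instanceof URIError) return null;
    throw e;
  }
}

console.log(tryDecodeURI("JavaScript_%D1%88%D0%B5%D0%BB%D0%BB%D1%8B")); // JavaScript_шеллы
console.log(tryDecodeURI("%E4")); // null

// decodeURI leaves escapes of reserved characters alone;
// decodeURIComponent decodes them too.
console.log(decodeURI("a%2Fb"));          // a%2Fb
console.log(decodeURIComponent("a%2Fb")); // a/b
```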

answered Sep 24, 2020 at 2:11

Vadim Shvetsov

Royer Adames · Accepted Answer · 2022-04-21 00:04:53Z

1

const decoder = new TextDecoder();
console.log(decoder.decode(new Uint8Array([97])));

MDN resource link

answered Apr 21, 2022 at 0:04

Royer Adames

Kasparow · Accepted Answer · 2018-03-02 16:26:27Z

0

I reckon the easiest way would be to use the built-in JS functions decodeURI() / encodeURI().

function updateUser(usernameSent) {
  var usernameEncoded = usernameSent; // Current value: utf8
  var usernameDecoded = decodeURI(usernameEncoded);  // Decoded
  // do stuff
}

answered Mar 2, 2018 at 16:26

Kasparow

1 Comment

David Spector Over a year ago

Sounds easy. Too easy. Did you test this?

Sergio Abreu · Accepted Answer · 2024-09-28 01:39:41Z

 const maxUnicode = 0x10FFFF,        
       maxMsg = "Maximum utf-8 is 0xF48FBFBF and value is U10FFFF",
       invalidUtf8 = "Not valid UTF-8 bytes to decode",
       regValue = new RegExp("(0x[a-fA-F0-9]+|\\d+)");

/* DECODING utf-8 valid bytes */

function utf8_decode(v){
  var a = 0, base = 64;

  if( typeof v == 'string' && v.match(regValue) ){
     v = eval(v);
  }

  if( v < 0xC280){
    a = v * 1;
  } else if ( v >= 0xC280 && v <= 0xDFBF ){
    // Bits: 110x xxxx 10xx xxxx
    a = v & 0x3F;
    a += (v>>8 & 0x1F) * base;
  } else if ( v >= 0xE0A080 && v <= 0xEFBFBF ){
    // Bits: 1110 xxxx 10xx xxxx 10xx xxxx
    a = v & 0x3F;
    a += (v>>8 & 0x3F) * base;
    a += (v>>16 & 0x0F) * Math.pow(base,2);
  } else if ( v >= 0x0F0908080 && v <= 0xF48FBFBF ){
    // Bits: 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
    a = v & 0x3F;
    a += (v>>8 & 0x3F) * base;
    a += (v>>16 & 0x3F) * Math.pow(base,2);
    a += (v>>24 & 0x07) * Math.pow(base,3);
  } else {

    if( v <= maxUnicode){
      window.alert( invalidUtf8 );
      console.warn( invalidUtf8 );
    } else {
      window.alert( maxMsg );
      console.warn( maxMsg );
    }
  }

  return '\\U' + Number(a).toString(16).toUpperCase();

}


/* ENCODING */

function utf8_encode(v){
  var a = 0, base = 256;

  if( typeof v == 'string' && v.match(regValue) ){
      v = eval(v);
  }

  if( v < 0x80){
    a = v * 1;
  } else if ( v >= 0x80 && v <= 0x7FF ){
    // Bits: 110x xxxx 10xx xxxx
    a = (v & 63)|128;
    a += (v>>6 & 63) * base;
    a += 0xC000
  } else if ( v >= 0x800 && v <= 0x0FFFF ){
    // Bits: 1110 xxxx 10xx xxxx 10xx xxxx
    a = (v & 63)|128;
    a += ((v>>6 & 63)|128) * base;
    a += (v>>12 & 63) * Math.pow(base,2);
    a += 0xE00000
  } else if ( v >= 0x10000 && v <= 0x10FFFF ){
    // Bits: 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
    a = (v & 63)|128;
    a += ((v>>6 & 63)|128) * base;
    a += ((v>>12 & 63)|128) * Math.pow(base,2);
    a += (v>>18 & 63) * Math.pow(base,3);
    a += 0xF0000000
  } else{
    if( v > maxUnicode){
      window.alert( maxMsg );
      console.warn( maxMsg );
    }
  }

  return '\\x' + Number(a).toString(16).toUpperCase();

 }

 /* SOME TESTING */

 window.alert( 'Decode from utf-8 0xF48FBFBF: ' + utf8_decode( 0XF48FBFBF ) + '\n' +
               'Encode Unicode U1F600 as utf-8: ' + utf8_encode( 0x1F600 ) );

 while(true){
    var s = prompt("Inform 'enc'/'dec', a space and number or hex.\n" +
                   "Valid examples: enc 0x1F600, dec 0xC3A7, enc 255 :");
    if( ! s){
      console.log('Bye bye' );
      break;
    } else {
      v = s.match(regValue);
      if(v){
        v = v[0];
      } else {
        window.alert("Bad string, number or prefixed 0x hexa is required");
        continue;
      }
      if ( s.match(/enc\s/) ){
        window.alert('Encoding ' + v + ' as utf-8: ' + utf8_encode( v ) );
      } else if ( s.match(/dec\s/) ){
        window.alert('Decoding ' + v + ': ' + utf8_decode( v ) );
      }
    }
 }
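The same code point to UTF-8 hex conversion can also be sketched on top of the built-in TextEncoder and TextDecoder, without eval or manual bit arithmetic. This is an alternative illustration, not a rewrite of the answer's code; `utf8Hex` and `codePointFromUtf8Hex` are hypothetical names, and the input to the second function is assumed to be an even-length hex string holding exactly one encoded character.

```javascript
// Hypothetical helpers built on the Encoding API.

// Code point (number) -> UTF-8 bytes as an uppercase hex string.
function utf8Hex(codePoint) {
  // String.fromCodePoint throws a RangeError above 0x10FFFF.
  const bytes = new TextEncoder().encode(String.fromCodePoint(codePoint));
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

// UTF-8 bytes as a hex string -> code point (number).
// Throws a TypeError if the bytes are not valid UTF-8.
function codePointFromUtf8Hex(hex) {
  const bytes = Uint8Array.from(hex.match(/../g), (h) => parseInt(h, 16));
  const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  return text.codePointAt(0);
}

console.log(utf8Hex(0x1F600));                              // F09F9880
console.log(codePointFromUtf8Hex("F48FBFBF").toString(16)); // 10ffff
console.log(codePointFromUtf8Hex("C3A7").toString(16));     // e7
```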

Adween · Accepted Answer · 2015-12-17 12:08:30Z

-3

I searched for a simple solution and this works well for me:

//input data
view = new Uint8Array(data);

//output string
serialString = ua2text(view);

//convert UTF8 to string
function ua2text(ua) {
    s = "";
    for (var i = 0; i < ua.length; i++) {
        s += String.fromCharCode(ua[i]);
    }
    return s;               
}

Only issue I have is sometimes I get one character at a time. This might be by design with my source of the arraybuffer. I'm using https://github.com/xseignard/cordovarduino to read serial data on an android device.

edited Dec 17, 2015 at 12:08

Adween

answered Aug 12, 2015 at 13:41

Evan Grant

1 Comment

phihag Over a year ago

This does not actually decode UTF-8. For example, C3 BC should be decoded as ü, but your answer returns Ã¼.
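The difference this comment points out can be seen with the two bytes it mentions. A short sketch, using TextDecoder for the UTF-8 side:

```javascript
const twoBytes = new Uint8Array([0xC3, 0xBC]); // UTF-8 encoding of "ü"

// One character per byte: yields two Latin-1 characters.
const perByte = String.fromCharCode(...twoBytes);
console.log(perByte); // Ã¼

// Decoding the bytes as UTF-8: yields a single character.
const decodedText = new TextDecoder("utf-8").decode(twoBytes);
console.log(decodedText); // ü
```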

geremews · Accepted Answer · 2021-05-06 19:36:36Z

-3

Preferably, as others have suggested, use the Encoding API. But if you need to support IE (for some strange reason) MDN recommends this repo FastestSmallestTextEncoderDecoder

If you need to make use of the polyfill library:

    import {encode, decode} from "fastestsmallesttextencoderdecoder";

Then (regardless of the polyfill) for encoding and decoding:

    // takes in USVString and returns a Uint8Array object
    const encoded = new TextEncoder().encode('€')
    console.log(encoded);

    // takes in an ArrayBuffer or an ArrayBufferView and returns a DOMString
    const decoded = new TextDecoder().decode(encoded);
    console.log(decoded);

edited May 6, 2021 at 19:36

answered May 5, 2021 at 20:02

geremews

1 Comment

10 Rep Over a year ago

A link to a solution is welcome, but please ensure your answer is useful without it: add context around the link so your fellow users will have some idea what it is and why it is there, then quote the most relevant part of the page you are linking to in case the target page is unavailable. Answers that are little more than a link may be deleted.
