I am comparing two file names in JavaScript that are presumably encoded differently, hoping to find matches:
- One is an actual file name from an unzipped archive (read using https://stuk.github.io/jszip/)
- One is a name extracted from a bplist (an iOS archive format, parsed with https://github.com/joeferner/node-bplist-parser)
Analysis
When I compare the log output in the JavaScript console, these file names look identical:
15 - Beschänkt und gsägnet - PLAYBACKVERSION.mp3
15 - Beschänkt und gsägnet - PLAYBACKVERSION.mp3
Note the German umlauts.
Now, when I just copy and paste these strings into Notepad++ and enable the hex editor, it looks like this:
- In the first case the A-Umlaut is encoded with 3 (three) bytes
- In the second case the A-Umlaut is encoded with only 2 (two) bytes.
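The byte counts above can be reproduced without a hex editor. The following is only a minimal sketch: it assumes the three-byte variant is a plain "a" followed by a combining diaeresis (U+0308) and the two-byte variant is the precomposed character U+00E4, which I have not confirmed for the actual files. The helpers `codePoints` and `utf8Length` are hypothetical names introduced for illustration.

```javascript
// Assumption: the two variants are the decomposed and precomposed forms of "ä".
const decomposed = "a\u0308";  // "a" + COMBINING DIAERESIS (U+0308)
const precomposed = "\u00E4";  // LATIN SMALL LETTER A WITH DIAERESIS

// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
function utf8Length(str) {
  return new TextEncoder().encode(str).length;
}

console.log(codePoints(decomposed), utf8Length(decomposed));   // two code points, 3 bytes
console.log(codePoints(precomposed), utf8Length(precomposed)); // one code point, 2 bytes
console.log(decomposed === precomposed);                       // false
```

Running `codePoints` on the two real file names should show whether they differ in this way or in some other way.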
Question
How can I safely compare those two strings? Is there a general "unencode" method in JavaScript that can handle these cases? Or must I guess each encoding and then compare explicitly?
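One candidate is the built-in `String.prototype.normalize`. The sketch below assumes the difference is one of Unicode normalization form (decomposed versus precomposed characters) rather than a different character encoding. The function name `sameFileName` and the two sample strings are hypothetical stand-ins for the real values.

```javascript
// Compare two names after bringing both into the same normalization form (NFC).
// Assumption: the strings differ only in how accented characters are composed.
function sameFileName(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

// Hypothetical sample data written with explicit escapes so the difference is visible.
const threeByteVariant = "15 - Bescha\u0308nkt und gsa\u0308gnet - PLAYBACKVERSION.mp3";
const twoByteVariant = "15 - Besch\u00E4nkt und gs\u00E4gnet - PLAYBACKVERSION.mp3";

console.log(threeByteVariant === twoByteVariant);            // false
console.log(sameFileName(threeByteVariant, twoByteVariant)); // true
```

If the names differ for another reason, for example because bytes were decoded with the wrong character set, normalization alone would not make them equal.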
Note
- I am specifically asking for a solution in JavaScript
- This question, "Compare strings with different encodings", although similar, is not actually about encoding
