46

When I evaluate "Ł" > "Z" in JavaScript, it returns true. In Unicode order it should of course be false. How do I fix this? My site uses UTF-8.
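The true result is expected given how the relational operators work. JavaScript compares strings UTF-16 code unit by code unit, not by any alphabet. "Ł" is U+0141 and "Z" is U+005A, so "Ł" compares as greater. A quick check:

```javascript
// String comparison with < and > compares UTF-16 code units, not letters.
const a = "Ł";
const b = "Z";

console.log(a.charCodeAt(0).toString(16)); // "141" (U+0141 LATIN CAPITAL LETTER L WITH STROKE)
console.log(b.charCodeAt(0).toString(16)); // "5a"  (U+005A LATIN CAPITAL LETTER Z)
console.log(a > b);                        // true, because 0x141 > 0x5A
```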

8
  • 1
    What are you trying to do exactly? Maybe there are workarounds. Commented Sep 2, 2010 at 19:51
  • 1
    I'm trying to sort a table by user names, and some of them contain letters like "Ł". Commented Sep 2, 2010 at 19:56
  • 1
    In other words, it must come right after L? I.e. ..J,K,L,Ł,M,N,O..? Commented Sep 2, 2010 at 20:12
  • 7
    The term you're looking for is "collation", and it is notoriously hard. There is no such thing as "Unicode order"; Unicode explicitly recognizes the fact that different locales have different orders. See unicode.org/reports/tr10 - "does not provide for the following features: ... Linguistic applicability" Commented Sep 3, 2010 at 7:26
  • 2
    I have changed “UTF-8” in the question title to “Unicode”, since the issue does not depend on a particular transfer encoding. (Besides, JavaScript internally uses UTF-16, not UTF-8, even if the HTML document’s encoding is UTF-8.) Commented May 12, 2014 at 19:04
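The point that collation depends on the locale can be shown with a minimal sketch. It assumes a runtime with full ICU locale data, as current browsers and Node.js builds ship by default. With reduced locale data the Swedish collator may fall back to a default order.

```javascript
// The same pair of letters collates differently depending on the locale.
const german = new Intl.Collator("de");
const swedish = new Intl.Collator("sv");

// German sorts "ä" together with "a", so it comes before "z".
console.log(german.compare("ä", "z") < 0);  // true
// Swedish treats "ä" as a separate letter after "z" (... x, y, z, å, ä, ö).
console.log(swedish.compare("ä", "z") > 0); // true
```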

6 Answers

41

You can use Intl.Collator, or String.prototype.localeCompare with a locale argument, both specified by the ECMAScript Internationalization API:

"Ł".localeCompare("Z", "pl");              // -1
new Intl.Collator("pl").compare("Ł","Z");  // -1

A negative result (-1 here) means that Ł comes before Z, as you want.

Note, though, that it only works in the latest browsers.
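A sketch of the table-sorting case from the question, with made-up names. Creating one Intl.Collator and passing its compare function to sort() avoids resolving the locale on every comparison, which MDN recommends when sorting large arrays. The compare getter returns a function bound to the collator, so it can be passed directly.

```javascript
// One collator, reused for every comparison during the sort.
const collator = new Intl.Collator("pl");

const names = ["Zenon", "Łukasz", "Marek", "Lena"];
const sorted = [...names].sort(collator.compare);

console.log(sorted.join(", ")); // Lena, Łukasz, Marek, Zenon
```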


1 Comment

What if you have to sort a list of words from several locales, for instance people's names from various countries? This will not work then, will it?
20

Here is an example for the French alphabet that could help you build a custom sort:

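var alpha = function(alphabet, dir, caseSensitive){
  dir = dir || 1;                          // 1 = ascending, -1 = descending
  caseSensitive = caseSensitive || false;
  return function(a, b){
    var pos = 0,
      min = Math.min(a.length, b.length);
    if(!caseSensitive){
      a = a.toLowerCase();
      b = b.toLowerCase();
    }
    // Skip the common prefix.
    while(pos < min && a.charAt(pos) === b.charAt(pos)){ pos++; }
    // Identical strings (after optional lowercasing) compare as equal.
    if(a.charAt(pos) === b.charAt(pos)){ return 0; }
    // Order by position in the alphabet string. Characters missing from it
    // get indexOf() === -1; charAt() past the end returns "" (index 0).
    return alphabet.indexOf(a.charAt(pos)) > alphabet.indexOf(b.charAt(pos)) ?
      dir : -dir;
  };
};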

To use it on an array of strings a:

a.sort(
  alpha('ABCDEFGHIJKLMNOPQRSTUVWXYZaàâäbcçdeéèêëfghiïîjklmnñoôöpqrstuûüvwxyÿz')
);

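Pass 1 or -1 as the second parameter of alpha() to sort ascending or descending.
Pass true as the third parameter to sort case-sensitively.

Characters that are not in the alphabet string get index -1 and therefore sort before every listed character, so you may need to add digits and special characters to the alphabet.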
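The same custom-alphabet idea, sketched for the asker's Polish case. This is a variant, not the alpha() function above: it precomputes a Map of letter ranks, returns 0 for equal strings, sorts a shorter prefix first, and (as a design choice of this sketch) places characters missing from the alphabet after all listed letters. The POLISH alphabet string and the alphabetCompare name are assumptions for illustration.

```javascript
// Hypothetical lowercase Polish alphabet (q, v and x added for foreign names).
const POLISH = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż";

function alphabetCompare(alphabet) {
  // Precompute letter -> rank once instead of calling indexOf() per comparison.
  const rank = new Map([...alphabet].map((ch, i) => [ch, i]));
  // Characters missing from the alphabet sort after all listed letters,
  // ordered among themselves by code unit.
  const keyOf = (ch) => (rank.has(ch) ? rank.get(ch) : alphabet.length + ch.charCodeAt(0));

  return (a, b) => {
    const x = a.toLowerCase();
    const y = b.toLowerCase();
    const min = Math.min(x.length, y.length);
    for (let i = 0; i < min; i++) {
      if (x[i] !== y[i]) return keyOf(x[i]) - keyOf(y[i]);
    }
    return x.length - y.length; // a proper prefix sorts first; equal strings give 0
  };
}

const names = ["Zenon", "Łukasz", "Marek", "Lena"];
console.log(names.sort(alphabetCompare(POLISH)).join(", ")); // Lena, Łukasz, Marek, Zenon
```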

3 Comments

If you are using this code, also see: stackoverflow.com/questions/3630645/…
Eek! Do you really have to go through all that? What about forgetting to put it into Normalization Form D first? Does PHP truly have nothing equivalent to Perl’s Unicode::Collate and Unicode::Collate::Locale modules? REALLY? It seems like utter madness to try to reïmplement all that on one’s own!
@tchrist, this is JavaScript, not PHP, and it is what it is.
14

You may be able to build your own sorting function using localeCompare(), which, at least according to the MDC article on the topic, should sort things correctly.
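A minimal sketch of such a function for the table-sorting case in the question. The rows and the sortByName helper are hypothetical. It relies on the locales argument of localeCompare() from the ECMAScript Internationalization API; older browsers accepted only the string to compare and used the default locale.

```javascript
// Hypothetical table rows; only the `name` field matters for sorting.
const rows = [
  { id: 1, name: "Zenon" },
  { id: 2, name: "Łukasz" },
  { id: 3, name: "lena" },
];

// Returns a sorted copy, comparing names with the given locale's rules.
function sortByName(list, locale) {
  return [...list].sort((a, b) => a.name.localeCompare(b.name, locale));
}

console.log(sortByName(rows, "pl").map((r) => r.name).join(", ")); // lena, Łukasz, Zenon
```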

If that doesn't work out, here is an interesting SO question where the OP employs string replacement to build a "brute-force" sorting mechanism.

Also in that question, the OP shows how to build a custom textExtract function for the jQuery tablesorter plugin that does locale-aware sorting - maybe also worth a look.
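A minimal sketch of the string-replacement ("brute-force") idea mentioned above, applied to Polish. It is not the code from the linked question; the REPLACEMENTS table and function names are assumptions. Each letter with a diacritic becomes its base letter followed by "\uffff", so it sorts after every word using the base letter but before the next letter (l < ł < m). "ż" gets a longer suffix so that ź < ż. Characters not in the table are compared by code unit.

```javascript
// Hypothetical mapping: base letter + "\uffff" (the highest BMP code unit).
const REPLACEMENTS = {
  "ą": "a\uffff", "ć": "c\uffff", "ę": "e\uffff", "ł": "l\uffff", "ń": "n\uffff",
  "ó": "o\uffff", "ś": "s\uffff", "ź": "z\uffff", "ż": "z\uffff\uffff",
};

// Build a sort key that plain code-unit comparison orders correctly.
function sortKey(s) {
  return s.normalize("NFC").toLowerCase().replace(/[ąćęłńóśźż]/g, (ch) => REPLACEMENTS[ch]);
}

function bruteForceCompare(a, b) {
  const ka = sortKey(a);
  const kb = sortKey(b);
  return ka < kb ? -1 : ka > kb ? 1 : 0;
}

const words = ["Żaba", "Zenon", "Źdźbło", "Łukasz", "Lena", "Marek"];
console.log(words.sort(bruteForceCompare).join(", ")); // Lena, Łukasz, Marek, Zenon, Źdźbło, Żaba
```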

Edit: As a totally far-out idea (I have no idea whether this is feasible at all, especially because of performance concerns): if you are working with PHP/MySQL on the back end anyway, you could send an Ajax query to a MySQL instance and have it do the sorting. MySQL is great at sorting locale-aware data, because you can force sorting operations into a specific collation using e.g. ORDER BY xyz COLLATE utf8_polish_ci, COLLATE utf8_german2_ci, and so on. Those collations would take care of all sorting woes at once.

6 Comments

Thanks for the links. It's a bit of a shame that JavaScript doesn't support this in the core language, but it's still a working solution.
Be careful with localeCompare() in IE6: blog.schmichael.com/2008/07/14/javascript-collation-fail
@BalusC the comments in that article claim that it's in fact Wine's fault, not IE6's. Can't find anything else on the issue to confirm or disprove it, and I'm too lazy to build a test case right now... @Tomasz if you go this route, it would be interesting to hear whether things work well in IE6.
Oh, I didn't see that comment before. Anyway, to avoid unforeseen browser inconsistencies (I still don't have a solid feeling about localeCompare()), I'd implement a custom one like Tomalak did in your linked topic.
@BalusC I agree that's probably the best and most solid way to go.
11

Mic's code, improved to handle characters that are not in the alphabet:

var alpha = function(alphabet, dir, caseSensitive){
  dir = dir || 1;
  caseSensitive = caseSensitive || false;
  // Returns true if letter a should sort after letter b.
  function compareLetters(a, b) {
    var ia = alphabet.indexOf(a);
    var ib = alphabet.indexOf(b);
    if(ia === -1 || ib === -1) {
      // Only b is in the alphabet: compare a against 'a' instead.
      if(ib !== -1)
        return a > 'a';
      // Only a is in the alphabet: compare 'a' against b instead.
      if(ia !== -1)
        return 'a' > b;
      // Neither is in the alphabet: plain code-unit comparison.
      return a > b;
    }
    return ia > ib;
  }
  return function(a, b){
    var pos = 0;
    var min = Math.min(a.length, b.length);
    if(!caseSensitive){
      a = a.toLowerCase();
      b = b.toLowerCase();
    }
    while(pos < min && a.charAt(pos) === b.charAt(pos)){ pos++; }
    if(a.charAt(pos) === b.charAt(pos)){ return 0; } // identical strings
    return compareLetters(a.charAt(pos), b.charAt(pos)) ? dir : -dir;
  };
};

function assert(bCondition, sErrorMessage) {
  if (!bCondition) {
    throw new Error(sErrorMessage);
  }
}

assert(alpha("bac")("a", "b") === 1, "b comes before a");
assert(alpha("abc")("ac", "a") === 1, "shorter string comes before longer string");
assert(alpha("abc")("1abc", "0abc") === 1, "non-mentioned chars are compared as normal");
assert(alpha("abc")("0abc", "1abc") === -1, "non-mentioned chars are compared as normal [2]");
assert(alpha("abc")("0abc", "bbc") === -1, "non-mentioned chars are compared with mentioned chars in a special way");
assert(alpha("abc")("zabc", "abc") === 1, "non-mentioned chars are compared with mentioned chars in a special way [2]");
assert(alpha("abc")("abc", "ABC") === 0, "equal strings compare as 0");

Comments

0

You have to keep two sort-key strings. One is for the primary order, where German ä = a (primary ä->a) and French é = e (primary é->e), and one is for the secondary order, where ä comes after a (secondary ä->azzzz) or é comes after e (secondary é->ezzzz). Czech in particular has some letters that are just variations of another letter (áéí…), whereas others stand in their own right in the alphabet (ABCČD…GHChI…RŘSŠT…). On top of that, digraphs have to be treated as single letters (primary ch->hzzzz). It is not a trivial problem, and there should be a solution built into JS.
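A minimal sketch of the two-key idea, under stated assumptions. The primary key strips combining diacritics after NFD decomposition and lowercases, so ä = a and é = e. Only on a primary tie are the decomposed strings compared by code unit, which puts accented letters after plain ones (and, as a side effect of code units, uppercase before lowercase). The function names are hypothetical. Letters that do not decompose, such as Ł, and digraphs such as Czech ch are not handled; they would still need explicit mappings like the ones described above.

```javascript
// Primary key: combining marks stripped and case folded, so "ä" == "a", "é" == "e".
function primaryKey(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Compare primary keys first; only on a tie, compare the NFD strings.
// Combining marks (U+0300..U+036F) have higher code units than ASCII letters,
// which here makes "résumé" sort after "resume".
function twoLevelCompare(a, b) {
  const pa = primaryKey(a);
  const pb = primaryKey(b);
  if (pa !== pb) return pa < pb ? -1 : 1;
  const sa = a.normalize("NFD");
  const sb = b.normalize("NFD");
  if (sa === sb) return 0;
  return sa < sb ? -1 : 1;
}

const words = ["rose", "r\u00e9sum\u00e9", "resume", "Resume"];
console.log(words.sort(twoLevelCompare).join(", ")); // Resume, resume, résumé, rose
```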

2 Comments

Improve your answer by using the default format options.
The solution is: myArray.sort(new Intl.Collator('pl').compare);
-1

Funny: I had to think about this problem too and ended up searching here, when it came to mind that I could use my own JavaScript module. I wrote a module that generates clean URLs, so it has to transliterate the input string (http://pid.github.io/speakingurl/).

var mySlug = require('speakingurl').createSlug({
    maintainCase: true,
    separator: " "
});

var input = "Schöner Titel läßt grüßen!? Bel été !";
var result;

result = mySlug(input);
console.log(result); // Output: "Schoener Titel laesst gruessen bel ete"

Now you can sort by these results. For example, you can store the original title in a field called "title" and the result of mySlug in a field called "title_sort" that you use for sorting.
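The "title" / "title_sort" idea can also be sketched without the library. The TRANSLIT table below is a hypothetical German transliteration, not speakingurl's actual table; any remaining accents are stripped after NFD decomposition. As with the library, "ä" becomes "ae" and is therefore mixed in with the a-words.

```javascript
// Hypothetical German transliteration table (not speakingurl's real one).
const TRANSLIT = { "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss" };

// Precompute a sort key: transliterate, then strip remaining accents (é -> e).
function toSortKey(title) {
  return title
    .normalize("NFC")
    .toLowerCase()
    .replace(/[äöüß]/g, (ch) => TRANSLIT[ch])
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "");
}

// Store the original title next to its sort key, then sort by the key.
const rows = ["Zebra", "Äpfel", "Affe"].map((title) => ({ title, title_sort: toSortKey(title) }));
rows.sort((a, b) => (a.title_sort < b.title_sort ? -1 : a.title_sort > b.title_sort ? 1 : 0));

console.log(rows.map((r) => r.title).join(", ")); // Äpfel, Affe, Zebra
```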

1 Comment

It is almost a good solution. The problem is that "ä" gets mixed in with "a", when it should be kept separate.
