3

I am looking for a proven algorithm for production. I did see this example but I'm not finding much else on the web or in books.

i.e. file_10.txt > file_2.txt

1
  • What natural language are you interested in? Commented Aug 27, 2009 at 21:32

4 Answers 4

10

Here is a (tested) comparison function that does the job. It understands only unsigned integers, not signed integers or floating point:

#include <stdlib.h>
#include <ctype.h>

/* like strcmp but compare sequences of digits numerically */
int strcmpbynum(const char *s1, const char *s2) {
  for (;;) {
    if (*s2 == '\0')
      return *s1 != '\0';
    else if (*s1 == '\0')
      return 1;
    else if (!(isdigit(*s1) && isdigit(*s2))) {
      if (*s1 != *s2)
        return (int)*s1 - (int)*s2;
      else
        (++s1, ++s2);
    } else {
      char *lim1, *lim2;
      unsigned long n1 = strtoul(s1, &lim1, 10);
      unsigned long n2 = strtoul(s2, &lim2, 10);
      if (n1 > n2)
        return 1;
      else if (n1 < n2)
        return -1;
      s1 = lim1;
      s2 = lim2;
    }
  }
}

If you want to use it with qsort, use this auxiliary function:

static int compare(const void *p1, const void *p2) {
  const char * const *ps1 = p1;
  const char * const *ps2 = p2;
  return strcmpbynum(*ps1, *ps2);
}

And you can do something on the order of

qsort(lines, next, sizeof(lines[0]), compare);
Sign up to request clarification or add additional context in comments.

4 Comments

I think the first "return 1" should be "return -1". Consider strcmpbynum ("", "x").
Contrary to the claim, it was not tested or perhaps the claim should be taken literally: > It understands only unsigned integers godbolt.org/z/rr6-jN int main() { int rc1 = strcmpbynum("foo1", "foo1_2"); int rc2 = strcmpbynum("foo1_2", "foo1"); int r1 = strcmp("foo1", "foo1_2"); int r2 = strcmp("foo1_2", "foo1"); printf("r1 = %d, r2 = %d, rc1 = %d, rc2 = %d\n", r1, r2, rc1, rc2); // we only care that signs match if (r1 * rc1 <= 0) return -1; if (r2 * rc2 <= 0) return -2; }
The claim should most definitely be taken literally.
Without the correction suggested by @DariusBacon, this answer is incorrect. Because both strcmpbynum ("", "x") and strcmpbynum ("x", "") return true, which obviously screw up the comparison and order.
2

The basic sort function would be standard C qsort(). It is parameterized to take a comparison function, and the comparison function is what you would need to write to do the natural ordering.

Your cross-referenced question includes a C implementation of a comparison function.

A Google search 'natural sort c' shows a SourceForge implementation.

1 Comment

I did see the source fourge example, I am on windows. Hoping for more of a well tested and used compare.
1

I assume you already know the C standard library qsort() function:

void qsort(void *base,
           size_t nel,
           size_t width,
           int (*compar)(const void *, const void *);

That last parameter is a function pointer, which means you can pass any function to it. You could use strcmp(), in fact, but that would give you ASCIIbetical, and you specifically want a natural sort.

In that case, you could write one pretty easily:

#include <ctype.h>

int natural(const char *a, const char *b)
{
    if(isalpha(*a) && isalpha(*b))
      {
        // compare two letters
      }
    else
      {
        if(isalpha(*a))
          {
            // compare a letter to a digit (or other non-letter)
          }
        else if(isalpha(*b))
          {
            // compare a digit/non-letter to a letter
          }
        else
          {
            // compare two digits/non-letters
          }
      }
}

Some of the elses could be cleared up if you just return early, but there's a basic structure. Check ctype.h for functions like isalpha() (if a character is part of the alphabet), isdigit(), isspace(), and more.

2 Comments

A nice idea, but a bit oversimplified. It's not clear why I bothered writing and testing code...
It's hard to answer the question if we don't know what he means by "natural" and he ignored the (well designed and perfectly portable) version that was linked.
1

Here's a version for Qt, that also supports unicode:

int strcmpbynum(const QString& s1, const QString &s2) {
int i1 = 0; // index in string
int i2 = 0;
while (true) {
    if (s2.length() == i2) // both strings identical or s1 larger than s2
        return s1.length() == i1 ? 0 : 1;
    if (s1.length() == i1) return -1; // s1 smaller than s2

    unsigned short u1 = s1[i1].unicode();
    unsigned short u2 = s2[i2].unicode();

    if (u1 >= '0' && u1 <= '9' && u2 >= '0' && u2 <= '9') {
        // parse both numbers completely and compare them
        quint64 n1 = 0; // the parsed number
        quint64 n2 = 0;
        int l1 = 0; // length of the number
        int l2 = 0;
        do {
            ++l1;
            n1 = n1 * 10 + u1 - '0';
            if (++i1 == s1.length()) break;
            u1 = s1[i1].unicode();
        } while (u1 >= '0' && u1 <= '9');
        do {
            ++l2;
            n2 = n2 * 10 + u2 - '0';
            if (++i2 == s2.length()) break;
            u2 = s2[i2].unicode();
        } while (u2 >= '0' && u2 <= '9');
        // compare two numbers
        if (n1 < n2) return -1;
        if (n1 > n2) return 1;
        // only accept identical numbers if they also have the same length
        // (same number of leading zeros)
        if (l1 < l2) return -1;
        if (l1 > l2) return 1;
    } else {
        // compare digit with non-digit or two non-digits
        if (u1 < u2) return -1;
        if (u1 > u2) return 1;
        ++i1;
        ++i2;
    }
}

}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.