
I am trying to encode a request. The request is as follows:

https://www.overpass-api.de/api/interpreter?data=area["name"="Nicaragua"]["admin_level"="2"]->.boundaryarea;(node["type"="route"]["route"="bus"](area.boundaryarea);way["type"="route"]["route"="bus"](area.boundaryarea);>;relation["type"="route"]["route"="bus"](area.boundaryarea);>>;);out meta;

As you can see, it contains a lot of special characters. If I give this URL to curl as is, it won't process it because of some of those characters. So I decided to encode the URL, both with my own method and with curl's method. Here is the code sample that encodes it with curl:

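std::string d = ...;
CURL *curl = curl_easy_init();
if (curl) {
  // curl_easy_escape() percent-encodes every character except
  // A-Z, a-z, 0-9, '-', '.', '_' and '~', so ':', '/', '?' and '=' are escaped too.
  char *output = curl_easy_escape(curl, d.c_str(), static_cast<int>(d.length()));
  if (output) {
    printf("Encoded: %s\n", output);
    curl_free(output);
  }
  curl_easy_cleanup(curl);
}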

This encodes the whole request, resulting in something like

https%3A%2F%2Fwww.overpass-api.de%2Fapi%2Finterpreter%3Fdata%3D ...

If I then give that to curl, it fails and says that it cannot resolve the host, which makes sense to me. So I decided to check what Chrome does when encoding it, using the dev tools. This is what it looks like:

https://www.overpass-api.de/api/interpreter?data=area[%22name%22=%22Nicaragua%22][%22admin_level%22=%222%22]-%3E.boundaryarea;(node[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);way[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);%3E;relation[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);%3E%3E;);out%20meta;

And if I give this to curl as is, it processes it properly.

Why are some characters encoded and not the rest? And why does curl accept it this way?

EDIT: and more importantly, how can I replicate that in my code?


2 Answers


You must escape the individual parts of the URI. Have a look at JavaScript's encodeURI() and encodeURIComponent() functions; this is the way to go.

I am using the following function, which mimics JavaScript's encodeURIComponent, to encode the individual parts:

#include <iomanip>
#include <sstream>
#include <string>

std::string encodeURIComponent(std::string const &value)
{
    std::ostringstream oss;
    oss << std::hex;
    for(auto c : value){
      int uc = static_cast<unsigned char>(c);
      if(((0x30 <= uc) && (uc <= 0x39)) || ((0x41 <= uc) && (uc <= 0x5A)) || ((0x61 <= uc) && (uc <= 0x7A))){
        oss << c;
        continue;
      }
      switch(c){
      case '-': oss << c; break;
      case '_': oss << c; break;
      case '.': oss << c; break;
      case '!': oss << c; break;
      case '~': oss << c; break;
      case '*': oss << c; break;
      case '\'': oss << c; break;
      case '(': oss << c; break;
      case ')': oss << c; break;
      default:
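          // setfill('0') is needed so bytes below 0x10 become e.g. "%0A" rather than "% A"
          oss << std::uppercase << '%' << std::setw(2) << std::setfill('0') << uc << std::nouppercase;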
          break;
      }
    }
    return oss.str();
}

Do not escape the entire URL as a single string. Escape only the individual pieces that actually need to be escaped, like query parameters. But even then, in name=value pairs, escape the name and value separately as needed, otherwise the delimiting = within the name=value pair, and the delimiting & between pairs, will get escaped, which you don't want to happen.
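To illustrate the point about name=value pairs, here is a minimal sketch that encodes each name and each value on its own and only then joins them with the `=` and `&` delimiters. The helper names `encode_component` and `build_query` are made up for this illustration, and the encoder is deliberately strict (it keeps only the RFC 3986 unreserved characters), so its output is more heavily escaped than what a browser produces.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: percent-encode everything except the RFC 3986
// "unreserved" characters (ALPHA / DIGIT / "-" / "." / "_" / "~").
std::string encode_component(const std::string &s)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char ch : s)
    {
        bool unreserved = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') ||
                          (ch >= '0' && ch <= '9') ||
                          ch == '-' || ch == '.' || ch == '_' || ch == '~';
        if (unreserved)
        {
            out += static_cast<char>(ch);
        }
        else
        {
            out += '%';
            out += hex[ch >> 4];
            out += hex[ch & 0xF];
        }
    }
    return out;
}

// Hypothetical helper: join name=value pairs with '&'. Names and values are
// encoded separately, so the '=' and '&' delimiters themselves stay literal.
std::string build_query(const std::vector<std::pair<std::string, std::string>> &params)
{
    std::string query;
    for (const auto &p : params)
    {
        if (!query.empty())
            query += '&';
        query += encode_component(p.first) + "=" + encode_component(p.second);
    }
    return query;
}
```

With this sketch, a pair whose name is `a&b` and whose value is `1=2` becomes `a%26b=1%3D2`: the `&` and `=` inside the data are escaped, while the `=` that separates name from value is not.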

Try something more like this:

std::string query_encode(const std::string &s)
{
    std::string ret;

    // curl_easy_escape() escapes way more than it needs to in
    // a URL Query component! Which is not TECHNICALLY wrong, but
    // it won't produce the output you are expecting...
    /*
    char *output = curl_easy_escape(curl, s.c_str(), s.length());
    if (output) {
        ret = output;
        curl_free(output);
    }
    */

    #define IS_BETWEEN(ch, low, high) (ch >= low && ch <= high)
    #define IS_ALPHA(ch) (IS_BETWEEN(ch, 'A', 'Z') || IS_BETWEEN(ch, 'a', 'z'))
    #define IS_DIGIT(ch) IS_BETWEEN(ch, '0', '9')
    #define IS_HEXDIG(ch) (IS_DIGIT(ch) || IS_BETWEEN(ch, 'A', 'F') || IS_BETWEEN(ch, 'a', 'f'))

    for(size_t i = 0; i < s.size();)
    {
        char ch = s[i++];

        if (IS_ALPHA(ch) || IS_DIGIT(ch))
        {
            ret += ch;
        }
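        // Bounds check first, so a '%' near the end of the string never reads past it.
        else if ((ch == '%') && (i + 1 < s.size()) && IS_HEXDIG(s[i+0]) && IS_HEXDIG(s[i+1]))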
        {
            ret += s.substr(i-1, 3);
            i += 2;
        }
        else
        {
            switch (ch)
            {
                case '-':
                case '.':
                case '_':
                case '~':
                case '!':
                case '$':
                case '&':
                case '\'':
                case '(':
                case ')':
                case '*':
                case '+':
                case ',':
                case ';':
                case '=':
                case ':':
                case '@':
                case '/':
                case '?':
                case '[':
                case ']':
                    ret += ch;
                    break;

                default:
                {
                    static const char hex[] = "0123456789ABCDEF";
                    char pct[] = "%  ";
                    pct[1] = hex[(ch >> 4) & 0xF];
                    pct[2] = hex[ch & 0xF];
                    ret.append(pct, 3);
                    break;
                }
            }
        }
    }

    return ret;
}

std::string d = "https://www.overpass-api.de/api/interpreter?data=" + query_encode("area[\"name\"=\"Nicaragua\"][\"admin_level\"=\"2\"]->.boundaryarea;(node[\"type\"=\"route\"][\"route\"=\"bus\"](area.boundaryarea);way[\"type\"=\"route\"][\"route\"=\"bus\"](area.boundaryarea);>;relation[\"type\"=\"route\"][\"route\"=\"bus\"](area.boundaryarea);>>;);out meta;");

std::cout << "Encoded: " + d + "\n";


Output:

https://www.overpass-api.de/api/interpreter?data=area[%22name%22=%22Nicaragua%22][%22admin_level%22=%222%22]-%3E.boundaryarea;(node[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);way[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);%3E;relation[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);%3E%3E;);out%20meta;
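One way to sanity-check output like this is to decode it again and compare it with the original query text. The following is a minimal sketch of a percent-decoder; `hex_value` and `percent_decode` are names invented for this example, and a `%` that is not followed by two hex digits is simply copied through unchanged, which is an assumption about how malformed input should be handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if it is not a hex digit.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: turn every "%XX" triplet back into the byte it stands for.
std::string percent_decode(const std::string &s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        // Only treat '%' as an escape when two more characters follow it.
        if (s[i] == '%' && i + 2 < s.size())
        {
            int hi = hex_value(s[i + 1]);
            int lo = hex_value(s[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out += static_cast<char>(hi * 16 + lo);
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        out += s[i];
    }
    return out;
}
```

Decoding `[%22name%22=%22Nicaragua%22]` with this sketch gives back `["name"="Nicaragua"]`, which matches the start of the original query.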

Why some characters are encoded and not the rest?

The rules are covered by RFC 3986, in particular Section 2 "Characters" and its sub-sections 2.1 - 2.5. The Query component is covered by Section 3.4.
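As a rough illustration of those sections, the character classes from RFC 3986 can be written down as a small classifier. This is only a sketch: the names `UriCharClass`, `classify` and `allowed_in_query` are invented here. It follows the grammar strictly, where a query may contain unreserved characters, sub-delims, `:`, `@`, `/` and `?`. Note that `[` and `]` are gen-delims and are not in that set, yet Chrome leaves them unescaped in the URL shown in the question, which is why the function above passes them through as well.

```cpp
#include <string>

// Character classes as named in RFC 3986, Section 2.
enum class UriCharClass { Unreserved, SubDelim, GenDelim, Other };

// Hypothetical helper: classify a single character.
UriCharClass classify(char c)
{
    static const std::string unreserved_marks = "-._~";
    static const std::string sub_delims = "!$&'()*+,;=";
    static const std::string gen_delims = ":/?#[]@";

    bool alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
    if (alnum || unreserved_marks.find(c) != std::string::npos)
        return UriCharClass::Unreserved;
    if (sub_delims.find(c) != std::string::npos)
        return UriCharClass::SubDelim;
    if (gen_delims.find(c) != std::string::npos)
        return UriCharClass::GenDelim;
    return UriCharClass::Other; // e.g. '"', '>', ' ' and all non-ASCII bytes
}

// Hypothetical helper: may this character appear literally in a query
// component according to Section 3.4 (query = *( pchar / "/" / "?" ))?
bool allowed_in_query(char c)
{
    UriCharClass cls = classify(c);
    if (cls == UriCharClass::Unreserved || cls == UriCharClass::SubDelim)
        return true;
    return c == ':' || c == '@' || c == '/' || c == '?';
}
```

Under these rules, `"`, `>` and the space character fall into no allowed class, which is consistent with them being the only characters that appear percent-encoded (`%22`, `%3E`, `%20`) in the output above.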
