1

I am putting together a bash script that parses a URL into its components. I am stuck on how to add an array value to a key within a JSON body.

Attempted Approach:

I have parsed the following URL: https://bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders

This URL's path is:

URL_PATH: v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders

The path is split into an array of parts using:

IFS='/' read -ra URL_PATH_PARTS <<< "$URL_PATH"

URL_PATH_PARTS [4]: v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders

I want to add an array value to JSON that is formatted as follows:

{
  ...
  "parts": ["v2020", "folders", "8d55e749-bbd7-e811-9c19-3ca82a1e3f41", "folders"]
}

However, it currently looks like this, and I am not sure how best to take the next step:

{
  ...
  "parts": "[v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders]"
}

Bash code parsing URL into its components:

#!/usr/bin/env bash

HREF='https://bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders'
# remove quotes
HREF=$(echo $HREF | tr -d '"')
echo "  HREF: $HREF"

# extract the PROTOCOL
URL_PROTOCOL=$(echo $HREF | grep :// | sed -e's,^\(.*://\).*,\1,g')
echo "  URL_PROTOCOL: $URL_PROTOCOL"

# extract the PROTOCOL SCHEME
URL_SCHEME=`echo ${URL_PROTOCOL::-3}`
echo "  URL_SCHEME: $URL_SCHEME"

# remove the PROTOCOL -- updated
URL=$(echo $HREF | sed -e s,$URL_PROTOCOL,,g)
echo "  URL: $URL"

# extract the host and port -- updated
URL_HOSTPORT=$(echo $URL | sed -e s,$user@,,g | cut -d/ -f1)
echo "  URL_HOSTPORT: $URL_HOSTPORT"

# by request host without port
URL_HOST="$(echo $URL_HOSTPORT | sed -e 's,:.*,,g')"
echo "  URL_HOST: $URL_HOST"

# by request - try to extract the port
URL_PORT="$(echo $URL_HOSTPORT | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
echo "  URL_PORT: $URL_PORT"

# Extract the path
URL_PATH="$(echo $URL | grep / | cut -d/ -f2-)"
echo "  URL_PATH: $URL_PATH"

IFS='/' read -ra URL_PATH_PARTS <<< "$URL_PATH"
echo "  URL_PATH_PARTS [${#URL_PATH_PARTS[@]}]: ${URL_PATH_PARTS[@]}"

URL_COMPONENTS="{ \
    \"protocol\": \"$URL_PROTOCOL\", \
    \"scheme\": \"$URL_SCHEME\", \
    \"url\": \"$URL\", \
    \"host\": \"$URL_HOST\", \
    \"path\": \"$URL_PATH\", \
    \"parts\": \"[${URL_PATH_PARTS[@]}]\" \
}"

echo -e "\n  URL_COMPONENTS:"
echo $URL_COMPONENTS |
    jq '.'

Console Response

  HREF: https://bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
  URL_PROTOCOL: https://
  URL_SCHEME: https
  URL: bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
  URL_HOST: bar.foo.com
  URL_PATH: v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
  URL_PATH_PARTS [4]: v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders

  URL_COMPONENTS:
{
  "protocol": "https://",
  "scheme": "https",
  "url": "bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders",
  "host": "bar.foo.com",
  "path": "v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders",
  "parts": "[v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders]"
}

Thank you

Appreciative of all feedback and suggestions!

4
  • Why are you using both sed and jq? If you're using jq to manipulate JSON, you should have absolutely no reason whatsoever to use sed for the same purpose (which it's far less suited to). Commented Oct 17, 2019 at 19:30
  • (Also, note that when you're using bash specifically, it has built-in regex support, which is generally many orders-of-magnitude faster than spinning up an external command like sed; see BashFAQ #100 for a general introduction to doing native string manipulation in bash). Commented Oct 17, 2019 at 19:31
  • Use variable substitutions instead of running commands in sub-shells. For example, instead of URL_HOST="$(echo $URL_HOSTPORT | sed -e 's,:.*,,g')", use URL_HOST=${URL_HOSTPORT%:*}. Commented Oct 17, 2019 at 19:34
  • See github.com/stedolan/jq/issues/537#issuecomment-51635126, from 2014, showing how to do the whole thing natively in jq alone. Commented Oct 17, 2019 at 19:34
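
The variable-substitution suggestion in the comments above could look roughly like the following. It is a minimal sketch rather than a full URL parser: it assumes the URL has a scheme, no user@ part, and no query string or fragment. The variable names simply mirror those in the question.

```shell
#!/usr/bin/env bash

HREF='https://bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders'

URL_SCHEME=${HREF%%://*}        # everything before the first "://"
URL=${HREF#*://}                # everything after the first "://"
URL_HOSTPORT=${URL%%/*}         # everything up to the first "/"
URL_HOST=${URL_HOSTPORT%:*}     # strip a trailing ":port", if any

# Only set the port when a ":" is actually present.
URL_PORT=
[[ $URL_HOSTPORT == *:* ]] && URL_PORT=${URL_HOSTPORT##*:}

# Only set the path when a "/" follows the host.
URL_PATH=
[[ $URL == */* ]] && URL_PATH=${URL#*/}

printf '  %s: %s\n' URL_SCHEME "$URL_SCHEME" URL_HOST "$URL_HOST" \
    URL_PORT "$URL_PORT" URL_PATH "$URL_PATH"
```

No sub-shell or external command is started here, which is the point the comments make about sed and echo pipelines.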

3 Answers

2

Don't bother with the array. Use variable substitution:

URL_PATH_PARTS=${URL_PATH//\// }        # Replace slashes with spaces
SPACES="${URL_PATH_PARTS//[^ ]} "       # Append space to avoid fence-post error.
echo "  URL_PATH_PARTS [${#SPACES}]: ${URL_PATH_PARTS}"

...

 \"parts\": [ \"${URL_PATH_PARTS// /\", \"}\" ] \

The substitution on that line replaces each space with '", "'.

You could also do away with the intermediate 'URL_PATH_PARTS' variable (and lose some readability):

SLASHES="${URL_PATH//[^\/]}/"       # Append slash to avoid fence-post error.
echo "  URL_PATH_PARTS [${#SLASHES}]: ${URL_PATH//\// }"

...

 \"parts\": [ \"${URL_PATH//\//\", \"}\" ] \

The substitution on that line replaces each slash with '", "'.
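
Put together, the substitution approach from this answer might look like the sketch below. It assumes URL_PATH holds the path from the question and that the path segments contain no spaces, double quotes, or backslashes, since nothing here escapes them. SEP, PARTS_JSON, and PART_COUNT are names introduced only for this illustration.

```shell
#!/usr/bin/env bash

URL_PATH='v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders'

# Replace every "/" with '", "' and wrap the result in '[ "' ... '" ]'.
SEP='", "'
PARTS_JSON="[ \"${URL_PATH//\//$SEP}\" ]"

# Count the parts: keep only the slashes, then add one for the last segment.
SLASHES=${URL_PATH//[^\/]}
PART_COUNT=$(( ${#SLASHES} + 1 ))

echo "  URL_PATH_PARTS [$PART_COUNT]: $PARTS_JSON"
```

Keeping the separator in its own variable avoids having to escape quotes inside the substitution, at the cost of one extra name.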

1

The current code uses \"parts\": \"[${URL_PATH_PARTS[@]}]\" for the path parts. A possible solution is to iterate over the elements, building a combined string with quotes and a ',' separator:

PP=
for P1 in "${URL_PATH_PARTS[@]}" ; do
  # Add ',' unless this is first item
  [ "$PP" ] && PP="$PP, "
  PP=$PP\"$P1\"
done

Then, in URL_COMPONENTS, replace

\"parts\": \"[${URL_PATH_PARTS[@]}]\"

with

\"parts\": [ $PP ]

2 Comments

There's far more to correctly generating valid JSON than just adding commas (and not quoting your array expansions means that a URI that contains wildcard characters can inject filenames from the current directory into your result).
@CharlesDuffy thanks for pointing out the unintended array expansion in the "for P1" statement.
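
To illustrate the point raised in the comment above, here is a minimal sketch of the same loop with a quoted array expansion and with escaping for the characters that most often break hand-built JSON. json_escape is a hypothetical helper written for this example, not part of bash or jq. It only covers backslash, double quote, newline, carriage return, and tab; other control characters would still need handling, which is one reason jq is likely the safer tool. The sample array values are made up to exercise the escaping.

```shell
#!/usr/bin/env bash

# Hypothetical helper: escape a string for use inside a JSON string literal.
# Covers only backslash, double quote, newline, carriage return and tab.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# Made-up sample values: a plain segment, a quote, a glob character, a backslash.
URL_PATH_PARTS=(v2020 'fold"ers' '*' 'a\b')

PP=
for P1 in "${URL_PATH_PARTS[@]}"; do   # quoted: no word splitting or globbing
  # Add ', ' unless this is the first item
  [ -n "$PP" ] && PP+=", "
  PP+="\"$(json_escape "$P1")\""
done
PARTS_JSON="[ $PP ]"

printf '%s\n' "$PARTS_JSON"
```

Because the expansion is quoted, the '*' element stays a literal asterisk instead of being replaced by filenames from the current directory.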
1

Thanks @CharlesDuffy, @dash-o, @AndrewVickers

I tried out all your suggestions.

The approach I ended up taking uses joelpurra/jq-hopkok.

Bash Code

#!/usr/bin/env bash

URL='"https://apiuatna11.springcm.com/v201411/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders"'

# URL to components (the embedded double quotes make the input a JSON string)
echo "$URL" | ./jq-hopkok/src/url/to-components.sh

JSON response

{
  "value": "https://apiuatna11.springcm.com/v201411/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders",
  "valid": true,
  "scheme": {
    "value": "https",
    "valid": true
  },
  "domain": {
    "value": "apiuatna11.springcm.com",
    "components": [
      "apiuatna11.springcm.com",
      "springcm.com",
      "com"
    ],
    "tld": "com",
    "valid": true
  },
  "port": {
    "value": null,
    "separator": false,
    "valid": true
  },
  "path": {
    "value": "/v201411/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders",
    "components": [
      "v201411",
      "folders",
      "8d55e749-bbd7-e811-9c19-3ca82a1e3f41",
      "folders"
    ],
    "valid": true
  },
  "query": {
    "value": null,
    "separator": false,
    "components": [],
    "valid": true
  },
  "fragment": {
    "value": null,
    "separator": false,
    "valid": true
  }
}
