I am putting together a bash script that parses a URL into its components. I am stuck on how to add an array value to a key in a JSON body.
Attempted Approach:
I have parsed the following URL:
https://bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
This URL's path is:
URL_PATH: v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
The path is split into an array of parts using:
IFS='/' read -ra URL_PATH_PARTS <<< "$URL_PATH"
URL_PATH_PARTS [4]: v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders
I want to add an array value to JSON that is formatted as follows:
{
...
"parts": ["v2020", "folders", "8d55e749-bbd7-e811-9c19-3ca82a1e3f41", "folders"]
}
However, it currently looks like this, and I am not sure how best to take the next step:
{
...
"parts": "[v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders]"
}
Bash code parsing URL into its components:
#!/usr/bin/env bash
HREF='https://bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders'
# remove quotes
HREF=$(echo $HREF | tr -d '"')
echo " HREF: $HREF"
# extract the PROTOCOL
URL_PROTOCOL=$(echo $HREF | grep :// | sed -e's,^\(.*://\).*,\1,g')
echo " URL_PROTOCOL: $URL_PROTOCOL"
# extract the PROTOCOL SCHEME
URL_SCHEME=${URL_PROTOCOL%://}
echo " URL_SCHEME: $URL_SCHEME"
# remove the PROTOCOL -- updated
URL=$(echo $HREF | sed -e s,$URL_PROTOCOL,,g)
echo " URL: $URL"
# extract the host and port -- updated
URL_HOSTPORT=$(echo "$URL" | sed -e 's,^[^/]*@,,' | cut -d/ -f1)
echo " URL_HOSTPORT: $URL_HOSTPORT"
# by request host without port
URL_HOST="$(echo $URL_HOSTPORT | sed -e 's,:.*,,g')"
echo " URL_HOST: $URL_HOST"
# by request - try to extract the port
URL_PORT="$(echo $URL_HOSTPORT | sed -e 's,^.*:,:,g' -e 's,.*:\([0-9]*\).*,\1,g' -e 's,[^0-9],,g')"
echo " URL_PORT: $URL_PORT"
# Extract the path
URL_PATH="$(echo $URL | grep / | cut -d/ -f2-)"
echo " URL_PATH: $URL_PATH"
IFS='/' read -ra URL_PATH_PARTS <<< "$URL_PATH"
echo " URL_PATH_PARTS [${#URL_PATH_PARTS[@]}]: ${URL_PATH_PARTS[@]}"
URL_COMPONENTS="{ \
\"protocol\": \"$URL_PROTOCOL\", \
\"scheme\": \"$URL_SCHEME\", \
\"url\": \"$URL\", \
\"host\": \"$URL_HOST\", \
\"path\": \"$URL_PATH\", \
\"parts\": \"[${URL_PATH_PARTS[@]}]\" \
}"
echo -e "\n URL_COMPONENTS:"
echo "$URL_COMPONENTS" | jq '.'
Console Response
HREF: https://bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
URL_PROTOCOL: https://
URL_SCHEME: https
URL: bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
URL_HOST: bar.foo.com
URL_PATH: v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders
URL_PATH_PARTS [4]: v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders
URL_COMPONENTS:
{
"protocol": "https://",
"scheme": "https",
"url": "bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders",
"host": "bar.foo.com",
"path": "v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders",
"parts": "[v2020 folders 8d55e749-bbd7-e811-9c19-3ca82a1e3f41 folders]"
}
Thank you
Appreciative of all feedback and suggestions!
[...] sed and jq? If you're using jq to manipulate JSON, you should have absolutely no reason whatsoever to use sed for the same purpose (which it's far less suited to).
[...] sed; see BashFAQ #100 for a general introduction to doing native string manipulation in bash).
[...] URL_HOST="$(echo $URL_HOSTPORT | sed -e 's,:.*,,g')", use URL_HOST=${URL_HOSTPORT%:*}.
[...] jq alone.
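The comments point toward two changes: split the URL with bash parameter expansion rather than sed, and leave all JSON construction (including the array) to jq. A minimal sketch of that idea follows. It assumes jq 1.6 or later (for --args and $ARGS.positional) and a URL of the form scheme://host[:port]/path with no user info and no IPv6 literal. The function names parse_url and url_components_json are made up for illustration; the variable names are the ones from the question.

```shell
#!/usr/bin/env bash
# Sketch only: parse_url and url_components_json are hypothetical helper names.
# Assumes scheme://host[:port]/path (no "user@" prefix, no IPv6 literal).

parse_url() {
  local href=$1 rest
  URL_SCHEME=${href%%://*}        # text before the first "://"
  rest=${href#*://}               # text after the first "://"
  URL_HOSTPORT=${rest%%/*}        # everything up to the first "/"
  URL_HOST=${URL_HOSTPORT%:*}     # strip a trailing ":port" if present
  URL_PORT=''
  if [[ $URL_HOSTPORT == *:* ]]; then
    URL_PORT=${URL_HOSTPORT##*:}  # text after the last ":"
  fi
  URL_PATH=''
  if [[ $rest == */* ]]; then
    URL_PATH=${rest#*/}           # text after the first "/"
  fi
  # Split the path on "/" into a bash array, as in the question.
  IFS='/' read -ra URL_PATH_PARTS <<< "$URL_PATH"
}

url_components_json() {
  # --arg binds each value as a JSON string; everything after --args is
  # exposed to the filter as the string array $ARGS.positional.
  jq -n \
    --arg scheme "$URL_SCHEME" \
    --arg host "$URL_HOST" \
    --arg path "$URL_PATH" \
    '{scheme: $scheme, host: $host, path: $path, parts: $ARGS.positional}' \
    --args "${URL_PATH_PARTS[@]}"
}

parse_url 'https://bar.foo.com/v2020/folders/8d55e749-bbd7-e811-9c19-3ca82a1e3f41/folders'
if command -v jq >/dev/null 2>&1; then
  url_components_json
else
  echo 'jq not found; skipping JSON output' >&2
fi
```

Because jq receives each path part as a separate argument, "parts" comes out as a real JSON array of strings instead of one quoted string, and jq takes care of quoting and escaping, so no hand-built JSON is needed.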