I want to make a cURL GET request. The following URL should be used:

https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi' -H 'Host: iant.toulouse.inra.fr' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' --compressed -H 'Referer: https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB84Qfsf&__wb_main_menu=Genome&__wb_function=$parent' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' --data '__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand

At the end of the URL there are some values that I want to turn into variables, so that the URL changes depending on the input and a different resource is requested.

This is the end of the URL; $ab, $start, $end and $strand are the variables, and all of them are strings:

...2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand

I came across "urlencode" and thought of storing my URL as one big string in a variable and passing it to urlencode, but I am not sure how to do that.

I tried this (or rather, I am looking for something like this):

#!/bin/bash
[...]
cURL="https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi' -H 'Host: iant.toulouse.inra.fr' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' --compressed -H 'Referer: https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB84Qfsf&__wb_main_menu=Genome&__wb_function=$parent' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' --data '__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"

# storing HTTP response code in variable response. Only if the
# response code is OK (200), we move on
  response=$(curl -I --header 'Accept: text/html' "https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB8jqwTM&__wb_main_menu=Genome&__wb_function=$location" | head -n1 | awk '{print $2}')

  echo "$response"

# getting information via curl request
  if [ "$response" = 200 ] ; then
    info=$(curl -G "$(urlencode "$cURL")")
  fi

  echo "$info"

For my response-code check, passing $location directly seems to work, but with more variables I get an error (response code 100, whereas the code check gives me 200).

Am I misunderstanding something fundamental about curl/urlencode? What did I miss?

Thanks in advance for your time and effort :)

UPDATE

#!/bin/sh
# handling command-line input
file=$1
ecf=$2


# iterating through file and pulling out
# information for the GET- and POST-request

while read -r line
  do
    parent=$(echo "$line" | awk '{print substr($1,2,3)}')
    start=$(echo "$line" | awk '{print substr($2,2,6)}')
    end=$(echo "$line" | awk '{print substr($3,2,6)}')
    strand=$(echo "$line" | awk '{print substr($4,2,1)}')
    locus=$(echo "$line" | awk '{print substr($6,2,8)}')

# depending on $parent, the right insertion for the URL is generated
    if [ "$parent" = "SMc" ] ; then
      location="Genome"
      ab="SMc"
    elif [ "$parent" = "SMa" ] ; then
      location="PrintPsyma"
      ab="pSymA"
    elif [ "$parent" = "SMb" ] ; then
      location="PrintPsymb"
      ab="pSymB"
    fi
# building variables for curl content request


  options=( --compressed)

  headers=(
    -H 'Host: iant.toulouse.inra.fr'
    -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0'
    -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3'
    -H 'Referer: https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB84Qfsf&__wb_main_menu=Genome&__wb_function=$parent'
    -H 'Content-Type: application/x-www-form-urlencoded'
    -H 'Connection: keep-alive'
    -H 'Upgrade-Insecure-Requests: 1'
    -H 'Pragma: no-cache'
    -H 'Cache-Control: no-cache'
  )

    url='https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi'

    ab=$(urlencode "${ab}")
    start=$(urlencode "${start}")
    end=$(urlencode "${end}")
    strand=$(urlencode "${strand}")
    data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"




# storing HTTP response code in variable response. Only if the
# response code is OK (200), we move on
    response=$(curl -I --header 'Accept: text/html' "https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi?__wb_cookie=&__wb_cookie_name=auth.rhime&__wb_cookie_path=/bacteria/annotation/cgi&__wb_session=WB8jqwTM&__wb_main_menu=Genome&__wb_function=$location" | head -n1 | awk '{print $2}')

    echo "$response"

# getting information via curl request
    if [ "$response" = 200 ] ; then
        info=$(curl -G "${options[@]}" "${headers[@]}" --data "${data}" "${url}")
    fi

    echo "$info"

done < "$file"

Answer

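You need to separate the concepts. The string you put in the cURL variable is not a URL: it is a URL plus a set of headers, the request parameters, and one option for compression. These are all different things, and the shell does not re-parse the quotes inside a variable when it expands it, so they cannot simply be stored together in one string.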

Define them separately like this:

url='https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi'
headers=(
    -H 'Host: iant.toulouse.inra.fr'
    -H 'User-Agent: ...'
    -H 'Accept: ...'
    -H 'Accept-Language: ...'
    # ... other headers from your example ...
)
options=(
    --compressed
)
data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"

Then run curl like this:

curl -G "${options[@]}" "${headers[@]}" --data "${data}" "${url}"

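This expands to the correct curl command: "${headers[@]}" expands each array element as a separate argument, so a header such as 'User-Agent: ...' reaches curl as one argument even though it contains spaces. The -G option makes curl append the --data string to the URL as a query string and send a GET request.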
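To see why the single cURL string cannot work, this bash sketch prints each argument a command would receive, one per line. show_args is a hypothetical stand-in for curl, so nothing is sent over the network, and the header is shortened for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for curl: prints every argument it receives as <...>.
show_args() {
    printf '<%s>\n' "$@"
}

# One big string: the single quotes inside it are NOT re-parsed by the shell,
# so the unquoted expansion is simply split on whitespace.
cmd_string="-H 'Accept-Language: de' --compressed"
echo "From a string:"
show_args $cmd_string      # 4 arguments: -H, 'Accept-Language:, de', --compressed

# An array: each element becomes exactly one argument.
cmd_array=(-H 'Accept-Language: de' --compressed)
echo "From an array:"
show_args "${cmd_array[@]}"   # 3 arguments: -H, Accept-Language: de, --compressed
```

With the string, curl would receive a header literally starting with a quote character and a stray argument de', which it would treat as a second URL.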

About the urlencode part: you need to encode each of $ab, $start, $end and $strand separately. If you insert them into the string first and then encode the whole string, all the special characters in it, like & and =, will be encoded too, and sequences that are already encoded, like %2F in your example, will be encoded a second time (becoming %252F).
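Note that urlencode is not a bash builtin; the code here assumes some such command is installed. If it is not, a minimal pure-bash sketch of one could look like the following. It is a hypothetical helper that assumes ASCII input (multibyte characters are not handled), keeps the RFC 3986 unreserved characters and percent-encodes everything else. It also reproduces the double-encoding problem described above.

```shell
#!/bin/bash
# Hypothetical pure-bash urlencode; assumes ASCII input.
# Keeps unreserved characters (A-Z a-z 0-9 . ~ _ -) and escapes the rest.
urlencode() {
    local s=$1 out='' c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [a-zA-Z0-9.~_-]) out+=$c ;;
            # "'$c" makes printf use the character's numeric code, e.g. / -> %2F
            *) printf -v c '%%%02X' "'$c"; out+=$c ;;
        esac
    done
    printf '%s\n' "$out"
}

ab='SMc'   # placeholder value
# Encoding the assembled string escapes & and = and double-encodes %2F:
urlencode "%2Fdb%2F$ab.genomic&begin=1"   # %252Fdb%252FSMc.genomic%26begin%3D1
# Encoding each value separately leaves the query structure intact:
echo "%2Fdb%2F$(urlencode "$ab").genomic&begin=$(urlencode 1)"   # %2Fdb%2FSMc.genomic&begin=1
```

Encoding the values individually also matters for strand: with this sketch a value of + becomes %2B, whereas an unencoded + in a form-encoded query is normally decoded by the server as a space.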

To keep the code tidy, you can encode them beforehand:

ab=$(urlencode "${ab}")
start=$(urlencode "${start}")
end=$(urlencode "${end}")
strand=$(urlencode "${strand}")
data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$ab.genomic&begin=$start&end=$end&strand=$strand"

...or do it inline, which is more cumbersome:

data="__wb_function=PortalExtractSeq&mode=run&species=rhime&fastafile=%2Fwww%2Fbacteria%2Fannotation%2F%2Fsite%2Fprj%2Frhime%2F%2Fdb%2F$(urlencode "${ab}").genomic&begin=$(urlencode "${start}")&end=$(urlencode "${end}")&strand=$(urlencode "${strand}")"
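As an alternative that avoids a separate urlencode command, curl can do the encoding itself: --data-urlencode "name=content" percent-encodes only the content part, and with -G all --data/--data-urlencode pieces are joined with & and appended to the URL. The following is a minimal sketch of that approach; the values of ab, start, end and strand are placeholders, and the command is only printed (shell-quoted with printf %q) rather than executed, so it can be compared with the original request before sending it. Passing the fastafile path unencoded should yield the same %2F-encoded value as the original request.

```shell
#!/bin/bash
# Sketch: let curl percent-encode the values itself via --data-urlencode.
# ab, start, end and strand are placeholder values for illustration only.
ab='SMc' start='1' end='100' strand='+'

url='https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi'

# --data is sent as-is; --data-urlencode "name=content" encodes only the content.
# With -G, curl joins all pieces with '&' and appends them to the URL.
params=(
    --data '__wb_function=PortalExtractSeq&mode=run&species=rhime'
    --data-urlencode "fastafile=/www/bacteria/annotation//site/prj/rhime//db/$ab.genomic"
    --data-urlencode "begin=$start"
    --data-urlencode "end=$end"
    --data-urlencode "strand=$strand"
)

# Dry run: print the shell-quoted command instead of executing it.
printf '%q ' curl -G --compressed "${params[@]}" "$url"
echo
```

To actually send the request, call curl -G --compressed "${headers[@]}" "${params[@]}" "$url" directly instead of printing it.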

I hope this helps.

Comments

Thank you! It definitely does. I got the "URL" by copying it from the developer tools in my browser: I made the request manually, selected it in the list, and used the "copy as cURL address" option. So I thought that what I got would work right away ^^".
Somehow, the parentheses in options=(--compressed) give me an error: "(" unexpected. Where does that come from?
@Shushiro Any chance that you have a space character before or after the = operator?
I checked the spaces; there are none. Could it have to do with a loop? I went further and am now reading the parameters from an input file (looping with a while loop). If you are interested, I have updated the code.
You're using sh, not bash. Change the shebang in your script from #!/bin/sh to #!/bin/bash and it should work. The syntax I used to define an array is bash-specific; your question's subject and tags made me think I could use it.
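If the script has to stay on #!/bin/sh, one option is that POSIX sh has no arrays, but the positional parameters ("$@") can hold one list of arguments with the same quoting guarantees. Below is a minimal sketch showing only two of the headers; note that set -- overwrites $1, $2, ..., so it must come after file=$1 and ecf=$2 have been saved.

```shell
#!/bin/sh
# POSIX sh has no arrays; "set --" reuses the positional parameters as a list.
# This overwrites $1, $2, ..., so read the script arguments (file=$1) before it.
set -- --compressed \
    -H 'Host: iant.toulouse.inra.fr' \
    -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3'

# "$@" expands to one word per element, like "${headers[@]}" in bash.
printf '<%s>\n' "$@"

# The real request would then be (data and url defined as in the script):
# curl -G "$@" --data "$data" "$url"
```

Only one such list is available at a time, which is why switching the shebang to bash is usually the simpler fix.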