I've been trying to access a website that has no API. I want to retrieve my current "queue" from the website, but it won't let me access that part of the site unless I'm logged in. Here is my code:

login_data = { 
    'action': 'https://www.crunchyroll.com/?a=formhandler',
    'name': 'my_username',
    'password': 'my_password' 
}



import requests

with requests.Session() as s:
    s.post('https://www.crunchyroll.com/login', data=login_data)
    ck = s.cookies
    r = s.get('https://www.crunchyroll.com/home/queue')
    print r.text

Right now, I get this page:

<html lang="en">
  <head>
    <title>Redirecting...</title>
    <meta http-equiv="refresh" content="0;url=http://www.crunchyroll.com/home/queue" />
  </head>
  <body>
    <script type="text/javascript">
      document.location.href="http:\/\/www.crunchyroll.com\/home\/queue";
    </script>
  </body>
</html>

I think it should work, but I'm only getting the redirect page. How am I supposed to get past that?

Thanks !

  • Have you tried doing s.get('http://www.crunchyroll.com/home/queue') instead? Since that is where you are being redirected. Commented Aug 14, 2014 at 13:17
  • Yeah, but how can I access the page if it's not stored in a response? Commented Aug 14, 2014 at 13:26
  • I tried it, but now it redirects me to the login page instead. Commented Aug 14, 2014 at 13:48
  • I see now, the login POST is not working, that's why any page you GET redirects to login. See my final answer below. Commented Aug 14, 2014 at 14:45

1 Answer

The redirect is happening because you are not logging into the site properly - you have the wrong form URL for the POST request, and you're not POSTing all the form data the site is expecting.

You can figure out what is required to login by looking at the source code for https://www.crunchyroll.com/login. The parts that matter are the <form> tag and <input> tags:

<form id="RpcApiUser_Login" method="post" action="https://www.crunchyroll.com/?a=formhandler">
<input type="hidden" name="formname" value="RpcApiUser_Login" />
<input type="text" name="name" value="my_user_name_goes_here" />
<input type="password" name="password" value="my_password_goes_here" />
</form>

What this means is that when you click Submit, the browser sends a POST request to the URL https://www.crunchyroll.com/?a=formhandler, with key/value pairs of data like formname=RpcApiUser_Login. To replicate this in Python you need to POST all the same pairs of data to that URL.
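If you would rather not read the fields off the page by hand, the same information can be pulled out of the HTML with the standard library. This is only a rough sketch: the class name FormFieldParser and the helper parse_login_form are made up for illustration, the sample markup is the trimmed form shown above rather than the live page, and it assumes Python 3 (html.parser).

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the action URL of the first <form> and every named <input>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            # The action attribute is the URL the browser POSTs to
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            # Each named input becomes one key/value pair in the POST body
            self.fields[attrs["name"]] = attrs.get("value") or ""


def parse_login_form(html):
    """Hypothetical helper: return (action_url, {field_name: default_value})."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.action, parser.fields


# Trimmed copy of the login form from the answer, not the live page
sample = '''<form id="RpcApiUser_Login" method="post" action="https://www.crunchyroll.com/?a=formhandler">
<input type="hidden" name="formname" value="RpcApiUser_Login" />
<input type="text" name="name" value="" />
<input type="password" name="password" />
</form>'''

action, fields = parse_login_form(sample)
print(action)
print(fields)
```

The hidden formname input shows up in the result alongside name and password, which is how you can tell it has to be part of the POST data too.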

To learn more about CGI programming like this, look here.

Try this Python code, it works:

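import requests

# Form fields the login page submits, including the hidden 'formname' input
login_data = {
    'name': 'my_username',
    'password': 'my_password',
    'formname': 'RpcApiUser_Login'
}

with requests.Session() as s:
    # POST to the form's action URL so the session stores the login cookies
    s.post('https://www.crunchyroll.com/?a=formhandler', data=login_data)
    # The same session sends those cookies along with this request
    r = s.get('http://www.crunchyroll.com/home/queue')
    print(r.text)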
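One thing worth knowing: requests follows HTTP redirects for you, but it does not act on a <meta http-equiv="refresh"> tag or run JavaScript, so a "Redirecting..." page like the one in the question comes back as ordinary HTML. If you want to detect that case in code, here is a minimal sketch. The function name find_meta_refresh is made up, and the regular expression only assumes the attribute order seen in the page from the question, so treat it as a starting point rather than a general HTML parser.

```python
import re

# Matches the meta-refresh tag in the form shown in the question:
# <meta http-equiv="refresh" content="0;url=..." />
META_REFRESH = re.compile(
    r'<meta\s+http-equiv="refresh"\s+content="\d+;\s*url=([^"]+)"',
    re.IGNORECASE,
)


def find_meta_refresh(html):
    """Return the target URL of a meta-refresh redirect, or None if absent."""
    match = META_REFRESH.search(html)
    return match.group(1) if match else None


# The redirect page from the question, shortened
redirect_page = '''<html lang="en">
  <head>
    <title>Redirecting...</title>
    <meta http-equiv="refresh" content="0;url=http://www.crunchyroll.com/home/queue" />
  </head>
</html>'''

print(find_meta_refresh(redirect_page))
```

If calling this on r.text returns a URL instead of None, you received the redirect page rather than the queue, which usually means the login POST did not take.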

2 Comments

Thanks! I've got a few more questions: 1) Why do we have to send the form data directly to their formhandler? 2) I don't quite get why we have to drop the 's' at the end of 'https'? 3) How did you know that 'formname' needed the value 'RpcApiUser_Login'? When I look at the source code, I would have used 'id' or 'value' instead of 'formname'.
1) Because that is what the form on the crunchyroll login page does - I just copied how their page works. 2) Because when you go to the crunchyroll site in a web browser, after login it redirects you to http - I just copied how their site works. 3) I added more explanation about how you can figure this stuff out from the source code. Hope this helps.
