
I am parsing data from a website but getting the error "IndexError: list index out of range". At the time of debugging I could see all the values. Previously it worked completely fine, but now it suddenly fails and I can't understand why I am getting this error.

str2 = cols[1].text.strip()

IndexError: list index out of range

Here is my code.

import requests
import DivisionModel
from bs4 import BeautifulSoup
from time import sleep


class DivisionParser:

    def __init__(self, zoneName, zoneUrl):
        self.zoneName = zoneName
        self.zoneUrl = zoneUrl

    def getDivision(self):

        response = requests.get(self.zoneUrl)
        soup = BeautifulSoup(response.content, 'html5lib')
        table = soup.findAll('table', id='mytable')
        rows = table[0].findAll('tr')

        division = []
        for row in rows:
            if row.text.find('T No.') == -1:
                cols = row.findAll('td')

                str1 = cols[0].text.strip()
                str2 = cols[1].text.strip()
                str3 = cols[2].text.strip()
                strurl = cols[2].findAll('a')[0].get('href')
                str4 = cols[3].text.strip()
                str5 = cols[4].text.strip()
                str6 = cols[5].text.strip()
                str7 = cols[6].text.strip()

                divisionModel = DivisionModel.DivisionModel(self.zoneName, str2, str3, strurl, str4, str5, str6, str7)
                division.append(divisionModel)
        return division


These are the values at the time of debugging:

str1 = {str} '1'
str2 = {str} 'BHUSAWAL DIVN-ENGINEERING'
str3 = {str} 'DRMWBSL692019t1'
str4 = {str} 'Bhusawal Division - TRR/P- 44.898Tkms & 2.225Tkms on 9 Bridges total 47.123Tkms on ADEN MMR &'
str5 = {str} 'Open'
str6 = {str} '23/12/2019 15:00'
str7 = {str} '5'
strurl = {str} '/works/pdfdocs/122019/51822293/viewNitPdf_3021149.pdf'
  • Well, obviously len(cols) < 2. We don't have the input to your program, which could explain why this is the case, so just look into it yourself and decide what to do with it (e.g., remove those specific rows, fix them, etc.). Commented Dec 18, 2019 at 11:46
  • @goodvibration Please go through the problem again: at the time of debugging I got all the values, every value each time, until the loop was exhausted. Commented Dec 18, 2019 at 11:49
  • So you disagree with the fact that an IndexError on the line cols[1].text.strip() implies that len(cols) < 2? Commented Dec 18, 2019 at 11:56
  • @aviboy2006 Thank you, I am new here. Commented Dec 18, 2019 at 12:01
  • Have you considered that sometimes the values will not be returned correctly from the website, and perhaps while debugging everything went fine, but at runtime the server might fail to handle the request and return a null response? Commented Dec 18, 2019 at 12:17
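The point made in the comments can be illustrated without any scraping at all: if a row has fewer cells than expected, indexing `cols[1]` must raise. A minimal stand-in sketch using plain lists of cell texts instead of bs4 tags (`second_cell` is a name made up for this example; the single-item row mimics a placeholder row such as "No Result"):

```python
def second_cell(cols):
    """Return cols[1] if present, else None (cols stands in for a row's <td> texts)."""
    try:
        return cols[1]
    except IndexError:
        return None

print(second_cell(["1", "BHUSAWAL DIVN-ENGINEERING"]))  # normal row -> second cell text
print(second_cell(["No Result"]))                       # short row -> None
```

The same IndexError fires regardless of what the debugger showed earlier; it only depends on how many cells the row being processed actually has.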

2 Answers


When parsing data from the website, I check for "T No." in a row and then read all the values from the td cells. The website developer put "No Result" into some td rows, so at runtime my loop could not get all the values and threw the "list index out of range" error.

Well, thanks to all for the help.

class DivisionParser:

    def __init__(self, zoneName, zoneUrl):
        self.zoneName = zoneName
        self.zoneUrl = zoneUrl

    def getDivision(self):
        try:
            response = requests.get(self.zoneUrl)
            soup = BeautifulSoup(response.content, 'html5lib')
            table = soup.findAll('table', id='mytable')
            rows = table[0].findAll('tr')
        except IndexError:
            # no matching table in the response: back off and return nothing
            sleep(2)
            return []

        division = []
        for row in rows:
            if row.text.find('T No.') == -1:
                try:
                    cols = row.findAll('td')

                    str1 = cols[0].text.strip()
                    str2 = cols[1].text.strip()
                    str3 = cols[2].text.strip()
                    strurl = cols[2].findAll('a')[0].get('href')
                    str4 = cols[3].text.strip()
                    str5 = cols[4].text.strip()
                    str6 = cols[5].text.strip()
                    str7 = cols[6].text.strip()
                    divisionModel = DivisionModel.DivisionModel(self.zoneName, str2, str3, strurl,
                                                                str4, str5, str6, str7)
                    division.append(divisionModel)
                except IndexError:
                    # "No Result" rows have fewer cells than expected; skip them
                    print("No Result")
        return division
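Instead of catching IndexError inside the loop, the same rows could be skipped with an explicit length check up front. A minimal sketch of that variant (`EXPECTED_COLS` and `parse_row` are hypothetical names invented for this example; the real table has 7 td cells per data row, per the code above):

```python
# Hypothetical alternative to the try/except above: skip incomplete rows with
# an explicit length check instead of catching IndexError.
EXPECTED_COLS = 7  # the table normally has 7 <td> cells per data row

def parse_row(cells):
    """Return the stripped cell texts if the row is complete, else None."""
    if len(cells) < EXPECTED_COLS:
        return None  # placeholder rows such as "No Result" end up here
    return [c.strip() for c in cells]

print(parse_row(["No Result"]))  # -> None
print(parse_row(["1", "NAME", "ID", "Desc", "Open", "23/12/2019 15:00", "5"]))
```

Either approach works; the length check makes the "skip short rows" intent explicit, while try/except also covers other malformed rows.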


As a general rule, whatever comes from the cold and hostile outside world is totally unreliable. Here:

    response = requests.get(self.zoneUrl)
    soup = BeautifulSoup(response.content, 'html5lib')

you seem to suffer from the terrible delusion that the response will always be what you expect. Hint: it won't. It is guaranteed that sometimes the response will be something different: the site could be down, or it could have decided to blacklist your IP because they don't like having you scrape their data, or whatever.

IOW, you really want to check the response's status code AND the response content. Actually, you want to be prepared for just about anything. FWIW, since you don't specify a timeout, your code could stay frozen forever waiting for a response.

so actually what you want here is along the lines of:

try:
    response = requests.get(yoururl, timeout=some_appropriate_value)
    # cf requests doc
    response.raise_for_status()
# cf requests doc
except requests.exceptions.RequestException as e:
    # nothing else you can do here - depending on
    # the context (script ? library code ?),
    # you either want to re-raise the exception,
    # raise your own exception, or, well, just
    # show the error message and exit.
    # Only you can decide what's the appropriate course
    print("couldn't fetch {}: {}".format(yoururl, e))
    return

if not response.headers['content-type'].startswith("text/html"):
    # idem - not what you expected, and you can't do much
    # except mentioning the fact to the caller one way
    # or another. Here I just print the error and return,
    # but if this is library code you want to raise an exception
    # instead
    print("{} returned non text/html content {}".format(yoururl, response.headers['content-type']))
    print("response content:\n\n{}\n".format(response.text))
    return

# etc...

requests has some rather exhaustive docs; I suggest you read more than the quickstart to learn to use it properly. And that's only half the job: even if you do get a 200 response with no redirections and the right content type, it doesn't mean the markup is what you expect, so here again you have to double-check what you get from BeautifulSoup, for example here:

table = soup.findAll('table', id='mytable')
rows = table[0].findAll('tr')

There's absolutely no guarantee that the markup contains any table with a matching id (nor any table at all, FWIW), so you have to either check beforehand or handle exceptions:

tables = soup.findAll('table', id='mytable')
if not tables:
    # oops, no matching tables ?
    print("no table 'mytable' found in markup")
    print("markup:\n{}\n".format(response.text))
    return
rows = tables[0].findAll('tr')
# idem, the table might be empty, etc etc

One of the fun things with programming is that handling the nominal case is often rather straightforward - but then you have to handle all the possible corner cases, and this usually requires as much or more code than the nominal case ;-)
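Putting the soup-level advice above into one place, a defensive version of the "find the table, get the rows" step might look like this. This is only a sketch under the question's assumptions (a table with id `mytable`; `extract_rows` is a name invented for this example), not the answer's definitive implementation:

```python
from bs4 import BeautifulSoup

def extract_rows(html, table_id="mytable"):
    """Return the <tr> elements of the matching table, or [] if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table", id=table_id)
    if not tables:
        # no matching table: report the problem instead of raising IndexError
        print("no table {!r} found in markup".format(table_id))
        return []
    return tables[0].find_all("tr")

print(len(extract_rows("<table id='mytable'><tr><td>1</td></tr></table>")))  # 1
print(len(extract_rows("<p>site is down</p>")))                              # 0
```

The caller then has to decide what an empty result means in context: retry, log, or give up.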

4 Comments

Well, I have hard-coded the website code to strip it down, so I am pretty sure about the values. What I am facing here is how to check that the values are being assigned to the variables so my code will run error-free. Can you please help me with some example code?
Your code will never be guaranteed to "run error free". What you have to understand is that the response can be just about anything. As I already said, you must handle possible corner cases first at the request level: add a timeout, have a try/except clause around the requests.get call (only catching requests exceptions, of course), check the response status, content-type, etc. (cf. the requests documentation). Then you have to handle errors at the soup level, starting with checking what soup.findAll() returns instead of blindly believing the markup is what you expect.
I edited my answer with a bit more detail, but by all means do not blindly copy-paste my example code. First, because proper error handling depends a lot on the context, and only you know in which context your code is used; and second, because it's only a very incomplete example of what can be done. You have to read the docs for your libs, test things out, think about each thing that can go wrong, and consider the whole context to do the right thing.
Thank you @bruno desthuilliers for your suggestion. I used the try/except method and it solved my problem.
