
I am parsing data from a website but getting the error "IndexError: list index out of range". At the time of debugging I could see all the values. Previously it worked completely fine, but now it suddenly fails and I can't understand why I am getting this error.

str2 = cols[1].text.strip()

IndexError: list index out of range

Here is my code.

import requests
import DivisionModel
from bs4 import BeautifulSoup
from time import sleep


class DivisionParser:

    def __init__(self, zoneName, zoneUrl):
        self.zoneName = zoneName
        self.zoneUrl = zoneUrl

    def getDivision(self):

        response = requests.get(self.zoneUrl)
        soup = BeautifulSoup(response.content, 'html5lib')
        table = soup.findAll('table', id='mytable')
        rows = table[0].findAll('tr')

        division = []
        for row in rows:
            if row.text.find('T No.') == -1:
                cols = row.findAll('td')

                str1 = cols[0].text.strip()
                str2 = cols[1].text.strip()
                str3 = cols[2].text.strip()
                strurl = cols[2].findAll('a')[0].get('href')
                str4 = cols[3].text.strip()
                str5 = cols[4].text.strip()
                str6 = cols[5].text.strip()
                str7 = cols[6].text.strip()

                divisionModel = DivisionModel.DivisionModel(self.zoneName, str2, str3, strurl, str4, str5, str6, str7)
                division.append(divisionModel)
        return division


These are the values at the time of debugging:

str1 = {str} '1'
str2 = {str} 'BHUSAWAL DIVN-ENGINEERING'
str3 = {str} 'DRMWBSL692019t1'
str4 = {str} 'Bhusawal Division - TRR/P- 44.898Tkms & 2.225Tkms on 9 Bridges total 47.123Tkms on ADEN MMR &'
str5 = {str} 'Open'
str6 = {str} '23/12/2019 15:00'
str7 = {str} '5'
strurl = {str} '/works/pdfdocs/122019/51822293/viewNitPdf_3021149.pdf'
  • Well, obviously len(cols) < 2. We don't have the input to your program, which could explain why this is the case, so just look into it yourself and decide what to do with it (e.g., remove those specific rows, fix them, etc.). Commented Dec 18, 2019 at 11:46
  • @goodvibration Please go through the problem again: at the time of debugging I got all the values, every value each time, until the loop was exhausted. Commented Dec 18, 2019 at 11:49
  • So you disagree with the fact that an IndexError on the line cols[1].text.strip() implies that len(cols) < 2? Commented Dec 18, 2019 at 11:56
  • @aviboy2006 Thank you, I am new here. Commented Dec 18, 2019 at 12:01
  • Have you considered that sometimes the values will not be returned correctly from the website, and perhaps while debugging everything went fine, but at runtime the server might fail to handle the request and return a null response? Commented Dec 18, 2019 at 12:17
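The point made in the comments can be illustrated without any scraping at all: if a row has fewer cells than expected, indexing `cols[1]` must raise. A minimal stand-in sketch using plain lists of cell texts instead of bs4 tags (`second_cell` is a name made up for this example; the single-item row mimics a placeholder row such as "No Result"):

```python
def second_cell(cols):
    """Return cols[1] if present, else None (cols stands in for a row's <td> texts)."""
    try:
        return cols[1]
    except IndexError:
        return None

print(second_cell(["1", "BHUSAWAL DIVN-ENGINEERING"]))  # normal row -> second cell text
print(second_cell(["No Result"]))                       # short row -> None
```

The same IndexError fires regardless of what the debugger showed earlier; it only depends on how many cells the row being processed actually has.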

2 Answers


When parsing data from the website, I check for "T No." in a row and then read all the values from the td cells. The website developer put "No Result" into some td rows, so at runtime my loop could not get all the values and threw the "list index out of range" error.

Well, thanks to all for the help.

class DivisionParser:

    def __init__(self, zoneName, zoneUrl):
        self.zoneName = zoneName
        self.zoneUrl = zoneUrl

    def getDivision(self):
        try:
            response = requests.get(self.zoneUrl)
            soup = BeautifulSoup(response.content, 'html5lib')
            table = soup.findAll('table', id='mytable')
            rows = table[0].findAll('tr')
        except IndexError:
            # no matching table in the response: back off and return nothing
            sleep(2)
            return []

        division = []
        for row in rows:
            if row.text.find('T No.') == -1:
                try:
                    cols = row.findAll('td')

                    str1 = cols[0].text.strip()
                    str2 = cols[1].text.strip()
                    str3 = cols[2].text.strip()
                    strurl = cols[2].findAll('a')[0].get('href')
                    str4 = cols[3].text.strip()
                    str5 = cols[4].text.strip()
                    str6 = cols[5].text.strip()
                    str7 = cols[6].text.strip()
                    divisionModel = DivisionModel.DivisionModel(self.zoneName, str2, str3, strurl,
                                                                str4, str5, str6, str7)
                    division.append(divisionModel)
                except IndexError:
                    # "No Result" rows have fewer cells than expected; skip them
                    print("No Result")
        return division
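Instead of catching IndexError inside the loop, the same rows could be skipped with an explicit length check up front. A minimal sketch of that variant (`EXPECTED_COLS` and `parse_row` are hypothetical names invented for this example; the real table has 7 td cells per data row, per the code above):

```python
# Hypothetical alternative to the try/except above: skip incomplete rows with
# an explicit length check instead of catching IndexError.
EXPECTED_COLS = 7  # the table normally has 7 <td> cells per data row

def parse_row(cells):
    """Return the stripped cell texts if the row is complete, else None."""
    if len(cells) < EXPECTED_COLS:
        return None  # placeholder rows such as "No Result" end up here
    return [c.strip() for c in cells]

print(parse_row(["No Result"]))  # -> None
print(parse_row(["1", "NAME", "ID", "Desc", "Open", "23/12/2019 15:00", "5"]))
```

Either approach works; the length check makes the "skip short rows" intent explicit, while try/except also covers other malformed rows.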


As a general rule, whatever comes from the cold and hostile outside world is totally unreliable. Here:

    response = requests.get(self.zoneUrl)
    soup = BeautifulSoup(response.content, 'html5lib')

you seem to suffer from the terrible delusion that the response will always be what you expect. Hint: it won't. It is guaranteed that sometimes the response will be something different: the site could be down, or it could have decided to blacklist your IP because they don't like having you scrape their data, or whatever.

IOW, you really want to check the response's status code AND the response content. Actually, you want to be prepared for just about anything. FWIW, since you don't specify a timeout, your code could stay frozen forever waiting for a response.

so actually what you want here is along the lines of:

try:
    response = requests.get(yoururl, timeout=some_appropriate_value)
    # cf requests doc
    response.raise_for_status()
# cf requests doc
except requests.exceptions.RequestException as e:
    # nothing else you can do here - depending on
    # the context (script ? library code ?),
    # you either want to re-raise the exception,
    # raise your own exception, or, well, just
    # show the error message and exit.
    # Only you can decide what's the appropriate course
    print("couldn't fetch {}: {}".format(yoururl, e))
    return

if not response.headers['content-type'].startswith("text/html"):
    # idem - not what you expected, and you can't do much
    # except mentioning the fact to the caller one way
    # or another. Here I just print the error and return,
    # but if this is library code you want to raise an exception
    # instead
    print("{} returned non text/html content {}".format(yoururl, response.headers['content-type']))
    print("response content:\n\n{}\n".format(response.text))
    return

# etc...

requests has some rather exhaustive docs; I suggest you read more than the quickstart to learn to use it properly. And that's only half the job: even if you do get a 200 response with no redirections and the right content type, it doesn't mean the markup is what you expect, so here again you have to double-check what you get from BeautifulSoup, for example here:

table = soup.findAll('table', id='mytable')
rows = table[0].findAll('tr')

There's absolutely no guarantee that the markup contains any table with a matching id (nor any table at all, FWIW), so you have to either check beforehand or handle exceptions:

tables = soup.findAll('table', id='mytable')
if not tables:
    # oops, no matching tables ?
    print("no table 'mytable' found in markup")
    print("markup:\n{}\n".format(response.text))
    return
rows = tables[0].findAll('tr')
# idem, the table might be empty, etc etc

One of the fun things with programming is that handling the nominal case is often rather straightforward - but then you have to handle all the possible corner cases, and this usually requires as much or more code than the nominal case ;-)
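Putting the soup-level advice above into one place, a defensive version of the "find the table, get the rows" step might look like this. This is only a sketch under the question's assumptions (a table with id `mytable`; `extract_rows` is a name invented for this example), not the answer's definitive implementation:

```python
from bs4 import BeautifulSoup

def extract_rows(html, table_id="mytable"):
    """Return the <tr> elements of the matching table, or [] if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table", id=table_id)
    if not tables:
        # no matching table: report the problem instead of raising IndexError
        print("no table {!r} found in markup".format(table_id))
        return []
    return tables[0].find_all("tr")

print(len(extract_rows("<table id='mytable'><tr><td>1</td></tr></table>")))  # 1
print(len(extract_rows("<p>site is down</p>")))                              # 0
```

The caller then has to decide what an empty result means in context: retry, log, or give up.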

4 Comments

Well, I have hard-coded the website code to strip it down, so I am pretty sure about the values. What I am facing here is how to check that the values are being assigned to the variables so my code will run error-free. Can you please help me with some example code?
Your code will never be guaranteed to "run error free". What you have to understand is that the response can be just about anything. As I already said, you must handle possible corner cases first at the request level: add a timeout, have a try/except clause around the requests.get call (only catching requests exceptions, of course), check the response status, content-type, etc. (cf. the requests documentation). Then you have to handle errors at the soup level, starting with checking what soup.findAll() returns instead of blindly believing the markup is what you expect.
I edited my answer with a bit more detail, but by all means do not blindly copy-paste my example code. First, because proper error handling depends a lot on the context, and only you know in which context your code is used; and second, because it's only a very incomplete example of what can be done. You have to read the docs for your libs, test things out, think about each thing that can go wrong, and consider the whole context to do the right thing.
Thank you @bruno desthuilliers for your suggestion. I used the try/except method and it solved my problem.
