1

The class below is driving me crazy. It is stuck on the for loop. I'm unsure why it will not access the last element in self.job_ids. This should be working. Any ideas why this for loop does not work inside the class but it works perfectly find outside the class?

import subprocess

class job_runner():

    def __init__(self, user_id='staffner'):
        self.job_ids = ['12054807', '12054808', '12054809', '12054810', '12054811', '12054812', '12054813', '10', '100']

        self.user_id = user_id

    def update_running_jobs(self):
        '''
            () ->
            Remove job ids from self.job_ids which  completed
        '''
        # find currently running jobs
        curr_running = self.find_running_jobs()

        #iterate over self.job_ids and see if they are in the currently running jobs
        working_ids = self.job_ids
        print 'start working ids:'
        print working_ids
        for j_id in working_ids:

            # for some reason I can not access the last id in the list
            print j_id

            if j_id not in curr_running:
                self.job_ids.remove(j_id)
        print 'job_ids'
        print self.job_ids

    def find_running_jobs(self):
        '''
            () -> list running ids

            Find what job ids are still running on the high performance cluster
        '''
        proc = subprocess.Popen(['squeue -u %s --Format=arrayjobid'%(self.user_id)], stdout=subprocess.PIPE, shell=True)
        out, err = proc.communicate()

        if err == None:

            out_list = out.replace('ARRAY_JOB_ID', '').replace(' ', '').split('\n')

            # filter out any empty strings
            out_list = filter(None, out_list)
            return out_list

        else:
            return False

curr_jobs = job_runner()

curr_jobs.update_running_jobs()

Here's the output (as you can see 100 is never accessed):

start working ids:
['12054807', '12054808', '12054809', '12054810', '12054811', '12054812', '12054813', '10', '100']
12054807
12054808
12054809
12054810
12054811
12054812
12054813
10
job_ids
['12054807', '12054808', '12054809', '12054810', '12054811', '12054812', '12054813', '100']
7
  • It works fine on my machine Commented Aug 29, 2017 at 16:29
  • Is there any reason why it wouldn't work on one machine but works on another? Commented Aug 29, 2017 at 16:33
  • 1
    If I remove find_running_jobs() call and also it's related loop from the code it shows all the items, but running your complete code, my machine print 100 but doesnt print 10 !!! Commented Aug 29, 2017 at 16:34
  • 5
    Because you are modifying the container while iterating over it. You set ` working_ids = self.job_ids` But then in the loop you self.job_ids.remove(j_id) Commented Aug 29, 2017 at 16:38
  • 3
    The issue here is a shallow vs. deep copy of the list. Modifying working_ids is also modifying original list, so you are changing same list you are iterating over as they point to the same object. Commented Aug 29, 2017 at 16:39

1 Answer 1

4

You should change:

working_ids = self.job_ids

to:

working_ids = self.job_ids[:]  # create a copy of job_ids and use it

Explanation: you're modifying the list while iterating it which causes unexpected results, changing the code you'll iterate a copy of the list.

Sign up to request clarification or add additional context in comments.

3 Comments

Thank you so much. I thought I was making a copy.
@Samantha just remember that assignment never, ever copies in Python.
"... never ever copies data" - it copies a reference to the object

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.