1

I'm writing a function that recursively traverses the file system, and returns a list of all files with the .txt extension.

The pass_test_func parameter is just a function that can be run and checked (i.e. is the file greater than 100 bytes, etc) - The nothing function (set as its default), simply returns its argument.

My implementation:

def visit(dname, pass_test_func=nothing):           
    directory = os.listdir(dname)                   
    byte_list = []
    for file in directory:
        file_dir = os.path.join(dname, file)
        if os.path.isfile(file_dir) and file_dir.lower().endswith('.txt'):
            size = os.path.getsize(file_dir)
            if pass_test_func(size):
                byte_list.append(str(size) + ' ' + file_dir)
        elif os.path.isdir(file_dir):
            visit(file_dir, pass_test_func)
    return byte_list

My problem is that when I recursively call visit in the following lines

elif os.path.isdir(file_dir):
                visit(file_dir, pass_test_func)

the byte_list is cleared to empty again. I understand why this is happening, but have no idea how I would fix it. The list has to be defined within the definition of visit, so whenever I use recursion it will always be reset no matter what right? Maybe some other data structure is better suited, like a tuple or dictionary?

3
  • You should know that someone put effort into a solution to your problem already: docs.python.org/3/library/os.html#os.scandir Commented Nov 27, 2017 at 18:04
  • @KlausD.: Or on older Python, you can use the PyPI scandir module (which os.scandir was based on). On Windows, this will reduce the system call (and associated I/O) overhead from three calls per directory (check if file, check if directory, list directory) + two calls per file (check if file, get size) to just one check per directory; on Linux, it can't avoid the stat for the size check, but it's still a reduction to one call per directory plus one per file (the type check is provided free on the DirEntry object). Commented Nov 27, 2017 at 18:19
  • @KlausD.: Mind you, in most cases, you're probably better off just using os.walk (which is os.scandir based once os.scandir is available), since it handles a lot of stuff for you. The whole function simplifies to def visit(dname, pass_test_func=nothing): return [os.path.join(root, f) for root, _, files in os.walk(dname) for f in files if pass_test_func(os.path.getsize(os.path.join(root, f)))] with os.walk (one-lined because this is a comment; you could easily write a proper generator function that doesn't re-join the root and f over and over). Commented Nov 27, 2017 at 18:23

2 Answers 2

3

Your function returns byte_list, so just append the returned value when you make your recursive call, instead of throwing it away as you currently do:

elif os.path.isdir(file_dir):    
    byte_list += visit(file_dir, pass_test_func)
Sign up to request clarification or add additional context in comments.

Comments

2

Add an optional argument that can be used in the recursive case:

# Using * makes byte_list keyword-only, so it can't be passed by normal callers by accident
def visit(dname, pass_test_func=nothing, *, byte_list=None):           
    directory = os.listdir(dname)           

    # When not passed explicitly, initialize as empty list
    if byte_list is None:
        byte_list = []
    for file in directory:
        file_dir = os.path.join(dname, file)
        if os.path.isfile(file_dir) and file_dir.lower().endswith('.txt'):
            size = os.path.getsize(file_dir)
            if pass_test_func(size):
                byte_list.append(str(size) + ' ' + file_dir)
        elif os.path.isdir(file_dir):
            # Pass explicitly to recursive call
            visit(file_dir, pass_test_func, byte_list=byte_list)
    return byte_list

As an alternative, as suggested by Blorgbeard, since you return byte_list, use that for your visit calls, changing only a single line in your original code:

        visit(file_dir, pass_test_func)

to:

        byte_list += visit(file_dir, pass_test_func)

This creates additional temporary lists, but that's usually not a big deal.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.