I'm writing a function that recursively traverses the file system and returns a list of all files with the .txt extension.
The pass_test_func parameter is a function that is called on each file's size to decide whether the file should be included (e.g. is the file greater than 100 bytes?). Its default, the nothing function, simply returns its argument.
My implementation:
    def visit(dname, pass_test_func=nothing):
        directory = os.listdir(dname)
        byte_list = []
        for file in directory:
            file_dir = os.path.join(dname, file)
            if os.path.isfile(file_dir) and file_dir.lower().endswith('.txt'):
                size = os.path.getsize(file_dir)
                if pass_test_func(size):
                    byte_list.append(str(size) + ' ' + file_dir)
            elif os.path.isdir(file_dir):
                visit(file_dir, pass_test_func)
        return byte_list
My problem is that when I recursively call visit in the following lines
            elif os.path.isdir(file_dir):
                visit(file_dir, pass_test_func)
byte_list is reset to empty again. I understand why this is happening, but have no idea how to fix it. The list has to be defined inside visit, so whenever I use recursion it will always be reset no matter what, right? Maybe some other data structure is better suited, like a tuple or dictionary?
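One possible fix, sketched under the assumption that nothing is defined as described (it returns its argument): each call to visit builds and returns its own list, so the recursive call's return value only needs to be merged into the caller's list instead of being thrown away. A list is still a fine data structure for this. The local names name and path are illustrative renames.

```python
import os


def nothing(x):
    # Assumed definition, per the description: returns its argument unchanged.
    # Note that a size of 0 is falsy, so empty files fail this default test.
    return x


def visit(dname, pass_test_func=nothing):
    byte_list = []
    for name in os.listdir(dname):
        path = os.path.join(dname, name)
        if os.path.isfile(path) and path.lower().endswith('.txt'):
            size = os.path.getsize(path)
            if pass_test_func(size):
                byte_list.append(str(size) + ' ' + path)
        elif os.path.isdir(path):
            # Keep the results of the recursive call by merging the
            # sub-list it returns into this call's list.
            byte_list.extend(visit(path, pass_test_func))
    return byte_list
```

Called as visit(some_dir, lambda size: size > 100), this should return entries for the .txt files larger than 100 bytes anywhere below some_dir.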
scandir module (which os.scandir was based on). On Windows, this will reduce the system call (and associated I/O) overhead from three calls per directory (check if file, check if directory, list directory) + two calls per file (check if file, get size) to just one check per directory; on Linux, it can't avoid the stat for the size check, but it's still a reduction to one call per directory plus one per file (the type check is provided free on the DirEntry object).

os.walk (which is os.scandir based once os.scandir is available), since it handles a lot of stuff for you. The whole function simplifies to def visit(dname, pass_test_func=nothing): return [os.path.join(root, f) for root, _, files in os.walk(dname) for f in files if pass_test_func(os.path.getsize(os.path.join(root, f)))] with os.walk (one-lined because this is a comment; you could easily write a proper generator function that doesn't re-join the root and f over and over).
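As a rough sketch of what the generator versions suggested above might look like: the function names walk_txt and scandir_txt are hypothetical, nothing is assumed to return its argument, and both variants keep the .txt filter and the "size path" string format from the question. The os.scandir variant assumes Python 3.6+ for the with statement. Neither descends into symlinked directories, which differs from os.path.isdir in the original.

```python
import os


def nothing(x):
    # Assumed definition: returns its argument unchanged.
    return x


def walk_txt(dname, pass_test_func=nothing):
    # os.walk handles the recursion; each path is joined only once.
    for root, _dirs, files in os.walk(dname):
        for name in files:
            if not name.lower().endswith('.txt'):
                continue
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if pass_test_func(size):
                yield str(size) + ' ' + path


def scandir_txt(dname, pass_test_func=nothing):
    # DirEntry objects carry the type information from the directory
    # listing, so is_dir()/is_file() usually need no extra system call.
    with os.scandir(dname) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                yield from scandir_txt(entry.path, pass_test_func)
            elif entry.is_file() and entry.name.lower().endswith('.txt'):
                size = entry.stat().st_size
                if pass_test_func(size):
                    yield str(size) + ' ' + entry.path
```

Either generator can be turned into a list with list(walk_txt(some_dir)), or consumed lazily in a for loop when the tree is large.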