Python list files in directory recursive

Recursively listing files in Python

I’m a complete amateur at Python. Other than hello world programs, this is the first thing I’ve ever done in Python. I could not find a good way to make os.walk behave the way I want, so I wrote my own function for it. I’m curious which of the two versions below is the better approach.

    import os

    def atDirList(startDir, maxDepth=0, minDepth=0, curDepth=0):
        output = []
        curDir = []
        curDir = os.listdir(startDir)
        if curDepth >= minDepth:
            for item in curDir:
                fullItem = os.path.join(startDir,item)
                if os.path.isfile(fullItem):
                    output.append(fullItem)
        if curDepth+1

    import os

    def atDirList(startDir, maxDepth=0, minDepth=0, curDepth=0):
        output = []
        curDir = []
        curDir = os.listdir(startDir)
        if curDepth >= minDepth:
            for item in curDir:
                fullItem = os.path.join(startDir,item)
                if os.path.isfile(fullItem) and curDepth >= minDepth:
                    output.append(fullItem)
                elif os.path.isdir(fullItem) and curDepth+1

Basically, of those two, is it better to check the if once and traverse the list twice, or to traverse the list once and check the if on every entry?

@boardrider Reading about MCVE is going to be counter-productive for Code Review. We need the context here, as opposed to Stack Overflow, which only cares about the actual problem.

@vnp minDepth is due to how some of the directories are laid out. In this case it's for generating a list of mp3s, and in some cases there will be a main directory with samples, and subdirectories with the full files.

@Graipher What context do you need in this case? I'm generating directory listings (possibly later with the addition of filtering by file extension). I'm trying to figure out which is the better way to write that function.

1 Answer

In Python you usually want to avoid a recursive solution because of the maximum recursion depth. Here this will probably not be a problem, because the limit is 1000 by default (so unless your directory hierarchy is close to 1000 levels deep, you are fine).

However, I would at least make your functions generators. They have the advantage that you don't need to build the whole list in one go; elements are produced one at a time. For this, Python 3 even has an additional nice feature, yield from. Note that you will have to call list on the result if you need the whole list at once (which negates any positive effect of using a generator), but you can iterate directly over the output of a generator.
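As a small illustration of how yield from delegates to a recursive generator, here is a minimal sketch that is unrelated to the file system; the flatten helper and the tree data are made up for the example.

```python
def flatten(nested):
    """Yield the leaves of an arbitrarily nested list, one at a time."""
    for item in nested:
        if isinstance(item, list):
            # Delegate to the recursive call; everything it yields is
            # passed straight through to our own caller.
            yield from flatten(item)
        else:
            yield item


tree = [1, [2, [3, 4]], 5]

# Iterate lazily, one element at a time...
for leaf in flatten(tree):
    print(leaf)

# ...or materialise everything at once when a list is really needed.
leaves = list(flatten(tree))
```

The directory functions below follow the same shape: files play the role of the leaves and subdirectories the role of the nested lists.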

So, your second function would become:

    def atDirList(startDir, maxDepth=0, minDepth=0, curDepth=0):
        if curDepth >= minDepth:
            for item in os.listdir(startDir):
                fullItem = os.path.join(startDir, item)
                try:
                    if os.path.isfile(fullItem) and curDepth >= minDepth:
                        yield fullItem
                    elif os.path.isdir(fullItem) and curDepth + 1

Note that I also added a try..except block to skip over files for which you don't have enough permissions to check if they are a file or directory.
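The listing above is cut off in this copy, so here is a rough, self-contained sketch of the same idea rather than the original code. It makes several assumptions of its own: the name iter_files is hypothetical, depth 0 means files directly inside the start directory, recursion always continues down to max_depth regardless of min_depth, and the error being skipped is os.listdir failing on an unreadable directory.

```python
import os


def iter_files(start_dir, max_depth=0, min_depth=0, cur_depth=0):
    """Yield paths of files found between min_depth and max_depth.

    Hypothetical helper, not the original atDirList. Depth 0 means
    files located directly inside start_dir.
    """
    try:
        names = os.listdir(start_dir)
    except OSError:
        # Unreadable directory (for example, permission denied): skip it.
        return
    for name in names:
        full = os.path.join(start_dir, name)
        if os.path.isfile(full):
            # Only report files once we are deep enough.
            if cur_depth >= min_depth:
                yield full
        elif os.path.isdir(full) and cur_depth + 1 <= max_depth:
            # Descend one level and pass the results through.
            yield from iter_files(full, max_depth, min_depth, cur_depth + 1)
```

Because the depth check for files sits inside the loop, a call such as iter_files(path, max_depth=2, min_depth=1) can start from curDepth 0 and still skip the files in the top directory.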

As an alternative solution, I would propose using glob, which basically shell-expands a string like "/home/graipher/*/*" to a list of all files and directories matching this pattern. This has two caveats: first, we still need to separate the files from the directories (easily achieved with filter), and second, it ignores files whose names start with a . (hidden files).

    import glob
    import os


    def glob_list(start, max_depth=0, min_depth=0):
        # start out at least `min_depth` levels deep
        current_dir = os.path.join(start, *"*" * min_depth)
        for depth in range(min_depth, max_depth + 1):
            # go one level deeper
            current_dir = os.path.join(current_dir, "*")
            # print(current_dir)
            yield from filter(os.path.isfile, glob.iglob(current_dir))


    if __name__ == "__main__":
        for file_name in glob_list(os.path.expanduser("~"), max_depth=2, min_depth=1):
            print(file_name)

Here I changed the parameter names to conform to Python's official style-guide, PEP8, by using lower_case instead of camelCase .

The *"*" * min_depth part is probably slightly complicated, because of all the stars. Let's break it down:

A sequence (like a list or a string) multiplied by an integer n is just the sequence repeated n times:

    >>> "a" * 3
    'aaa'
    >>> min_depth = 2
    >>> "*" * min_depth
    '**'

A * can be used to unpack an iterable (a tuple, a list or, as here, a string) into function arguments, like in this function that takes an arbitrary number of parameters:

    >>> def f(*args):
    ...     print(args)
    ...
    >>> f(*"aaa")
    ('a', 'a', 'a')
    >>> f(*"**")
    ('*', '*')

Putting this together, this just joins the base path with min_depth levels of stars (the loop in glob_list then appends one more level on each pass):

    >>> os.path.join("/home/graipher", *"*" * min_depth)
    '/home/graipher/*/*'

Possible bug:

After having written this alternative implementation I noticed some strange behavior of your (second, I did not check the first) function. When minDepth is greater than zero, you would not expect any files from the base directory. However, they are still included (note that you need to start with a curDepth = minDepth , otherwise your script will never run).

This is what I would write with the additional constraints of having to return a list and possibly have a simple pattern (simple enough to be expressed with wildcard characters or an additional filter function):

    import glob
    import os


    def glob_list(start, max_depth=0, min_depth=0, pattern="*", func=os.path.isfile):
        output = []
        for depth in range(min_depth, max_depth + 1):
            # one "*" path component per directory level, then the pattern
            path = os.path.join(start, *("*" * depth), pattern)
            output.extend(filter(func, glob.iglob(path)))
        return output
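To see how the pattern and depth arguments interact, here is a small usage sketch. The function is repeated so the snippet runs on its own; the temporary directory layout and the file names (sample.mp3, album/track.mp3, notes.txt) are invented for the example, loosely following the mp3 use case mentioned in the comments.

```python
import glob
import os
import tempfile


def glob_list(start, max_depth=0, min_depth=0, pattern="*", func=os.path.isfile):
    # Same function as above, repeated so this snippet is self-contained.
    output = []
    for depth in range(min_depth, max_depth + 1):
        path = os.path.join(start, *("*" * depth), pattern)
        output.extend(filter(func, glob.iglob(path)))
    return output


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: a sample at the top, the full file one level down.
    os.makedirs(os.path.join(root, "album"))
    for rel in ("sample.mp3", "notes.txt", os.path.join("album", "track.mp3")):
        open(os.path.join(root, rel), "w").close()

    # Only .mp3 files exactly one level below the start directory.
    found = glob_list(root, max_depth=1, min_depth=1, pattern="*.mp3")
    print(found)
```

With min_depth=1 the sample in the top directory is left out, and the pattern keeps notes.txt out at every level.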


Python 3: List the Contents of a Directory, Including Recursively

This article shows how to list the files and directories inside a directory using Python 3. Throughout this article, we'll refer to the following example directory structure:

We'll assume the code examples will be saved in script.py above, and will be run from inside the mydir directory so that the relative path '.' always refers to mydir .

Using pathlib (Python 3.4 and up)

Non-Recursive

iterdir

To list the contents of a directory using Python 3.4 or higher, we can use the built-in pathlib library's iterdir() to iterate through the contents. In our example directory, we can write in script.py :

    from pathlib import Path

    for p in Path('.').iterdir():
        print(p)

When we run from inside mydir , we should see output like:

Because iterdir is non-recursive, it only lists the immediate contents of mydir and not the contents of subdirectories (like a1.html ).

Note that each item returned by iterdir is also a pathlib.Path , so we can call any pathlib.Path method on the object. For example, to resolve each item as an absolute path, we can write in script.py :

    from pathlib import Path

    for p in Path('.').iterdir():
        print(p.resolve())

This will list the resolved absolute path of each item instead of just the filenames.

Because iterdir returns a generator object (meant to be used in loops), if we want to store the results in a list variable, we can write:

    from pathlib import Path

    files = list(Path('.').iterdir())
    print(files)

glob

We can also use pathlib.Path.glob to list all files (the equivalent of iterdir ):

    from pathlib import Path

    for p in Path('.').glob('*'):
        print(p)

Filename Pattern Matching with glob

If we want to filter our results using Unix glob command-style pattern matching, glob can handle that too. For example, if we only want to list .html files, we would write in script.py :

    from pathlib import Path

    for p in Path('.').glob('*.html'):
        print(p)

As with iterdir , glob returns a generator object, so we'll have to use list() if we want to convert it to a list:

    from pathlib import Path

    files = list(Path('.').glob('*.html'))
    print(files)

Recursive

To recursively list the entire directory tree rooted at a particular directory (including the contents of subdirectories), we can use rglob . In script.py , we can write:

    from pathlib import Path

    for p in Path('.').rglob('*'):
        print(p)

This time, when we run script.py from inside mydir , we should see output like:

rglob is the equivalent of calling glob with **/ at the beginning of the path, so the following code is equivalent to the rglob code we just saw:

    from pathlib import Path

    for p in Path('.').glob('**/*'):
        print(p)

Filename Pattern Matching with rglob

Just as with glob , rglob also allows glob-style pattern matching, but automatically does so recursively. In our example, to list all *.html files in the directory tree rooted at mydir , we can write in script.py :

    from pathlib import Path

    for p in Path('.').rglob('*.html'):
        print(p)

This should display all and only .html files, including those inside subdirectories:

Since rglob is the same as calling glob with **/ , we could also just use glob to achieve the same result:

    from pathlib import Path

    for p in Path('.').glob('**/*.html'):
        print(p)
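rglob yields directories as well as files and always descends the whole tree. As a sketch of how the results could be narrowed to files within a given depth, the hypothetical helper below filters on is_file() and on the number of path components relative to the root. The name files_up_to_depth and the convention that depth 0 means "directly inside the root" are assumptions made for this example.

```python
from pathlib import Path


def files_up_to_depth(root, max_depth):
    """Yield files under root that sit at most max_depth directories deep.

    Hypothetical helper: depth 0 means files directly inside root.
    """
    root = Path(root)
    for p in root.rglob('*'):
        # Number of directories between root and this entry.
        depth = len(p.relative_to(root).parts) - 1
        if p.is_file() and depth <= max_depth:
            yield p
```

Calling list(files_up_to_depth('.', 1)) from inside mydir would then return the files in mydir and in its immediate subdirectories. Note that this only filters the output; rglob still walks the entire tree, so it saves no work on deep hierarchies.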

Not Using pathlib

Non-Recursive

os.listdir

On any version of Python 3, we can use the built-in os library to list directory contents. In script.py , we can write:

    import os

    for filename in os.listdir('.'):
        print(filename)

Unlike with pathlib , os.listdir simply returns filenames as strings, so we can't call methods like .resolve() on the result items. To get full paths, we have to build them manually:

    import os

    root = '.'
    for filename in os.listdir(root):
        relative_path = os.path.join(root, filename)
        absolute_path = os.path.abspath(relative_path)
        print(absolute_path)

Another difference from pathlib is that os.listdir returns a list of strings, so we don't need to call list() on the result to convert it to a list:

    import os

    files = os.listdir('.')  # files is a list
    print(files)

glob

Also available on all versions of Python 3 is the built-in glob library, which provides Unix glob command-style filename pattern matching.

To list all items in a directory (equivalent to os.listdir ), we can write in script.py :

    import glob

    for filename in glob.glob('./*'):
        print(filename)

This will produce output like:

Note that the root directory ( '.' in our example) is simply included in the path pattern passed into glob.glob() .

Filename Pattern Matching with glob

To list only .html files, we can write in script.py :

    import glob

    for filename in glob.glob('./*.html'):
        print(filename)

Recursive

glob.glob only gained a recursive option (recursive=True, used with ** in the pattern) in Python 3.5, and pathlib.Path.rglob already covers recursive pattern matching on Python 3.4 and up, so we'll skip recursive examples of glob.glob here.

os.walk

On any version of Python 3, we can use os.walk to list all the contents of a directory recursively.

os.walk() returns a generator object that can be used with a for loop. Each iteration yields a 3-tuple that represents a directory in the directory tree:

- current_dir: the path of the directory that the current iteration represents;
- subdirs: list of names (strings) of immediate subdirectories of current_dir; and
- files: list of names (strings) of files inside current_dir.
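Since files contains bare names, a common pattern is to join each name onto current_dir to get a usable path. The sketch below does that and, as an optional extra, stops descending past a given depth by emptying subdirs in place, which os.walk honours in its default top-down mode. The helper name walk_files and its max_depth parameter are assumptions made for this example.

```python
import os


def walk_files(root, max_depth=None):
    """Yield the path of every file below root (hypothetical helper).

    If max_depth is given, directories deeper than that are not visited;
    depth 0 means root itself.
    """
    for current_dir, subdirs, files in os.walk(root):
        rel = os.path.relpath(current_dir, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if max_depth is not None and depth >= max_depth:
            # Clearing the list in place tells os.walk not to descend further.
            subdirs[:] = []
        for filename in files:
            # files holds bare names, so join them onto current_dir.
            yield os.path.join(current_dir, filename)
```

Unlike filtering the output afterwards, pruning subdirs means the deeper directories are never read at all.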

In our example, we can write in script.py :

    import os

    for current_dir, subdirs, files in os.walk('.'):
        # Current Iteration Directory
        print(current_dir)

        # Directories
        for dirname in subdirs:
            print('\t' + dirname)

        # Files
        for filename in files:
            print('\t' + filename)

This produces the following output:
