Python file name matching

Contents

How To Use Python Fnmatch Module To Handle File Name Matching
1. Python fnmatch Module Supported Wildcards & Functions.
2. Python fnmatch Module Examples.
fnmatch — Unix filename pattern matching
Matching Filename Patterns

How To Use Python Fnmatch Module To Handle File Name Matching

The Python fnmatch module supports UNIX shell-style filename matching. It provides the following wildcards and functions.

1. Python fnmatch Module Supported Wildcards & Functions.

*: matches any sequence of characters, including an empty one.
?: matches any single character.
[character sequence]: matches any single character in the bracketed sequence. The sequence also supports ranges written with a hyphen; for example, [a-c] matches any one of a, b, or c.
[!character sequence]: matches any single character that is not in the bracketed sequence.
fnmatch.fnmatch(filename, pattern): determines whether the specified file name matches the specified pattern. Whether the comparison is case-sensitive depends on the operating system.
fnmatch.fnmatchcase(filename, pattern): similar to the previous function, except that the comparison is always case-sensitive.
fnmatch.filter(names, pattern): filters the names list and returns the subset of names that match the pattern.
fnmatch.translate(pattern): converts a UNIX shell-style pattern to a regular expression pattern.
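As a quick check of the wildcard rules listed above, the following sketch runs a few hypothetical file names through fnmatch.fnmatchcase(). The case-sensitive variant is used only so that the results do not depend on the operating system.

```python
import fnmatch

# (name, pattern) pairs; the trailing comment gives the expected result.
cases = [
    ("main.py", "*.py"),     # True: '*' matches any run of characters
    (".py", "*.py"),         # True: ... including an empty run
    ("a.py", "?.py"),        # True: '?' matches exactly one character
    ("ab.py", "?.py"),       # False: two characters where '?' allows one
    ("b.py", "[a-c].py"),    # True: '[a-c]' is the range a, b, c
    ("d.py", "[!a-c].py"),   # True: '[!a-c]' is any character except a, b, c
]

for name, pattern in cases:
    print(f"{name!r:10} {pattern!r:12} {fnmatch.fnmatchcase(name, pattern)}")
```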

2. Python fnmatch Module Examples.

The following Python examples demonstrate how these functions are used.

fnmatch.fnmatch(filename, pattern): this example prints the absolute path of every Python file in the current directory.

from pathlib import Path
import fnmatch
import os.path

# Traverse all the files and subdirectories under the current directory.
curr_dir = Path('.')
for f in curr_dir.iterdir():
    # Check whether the file name ends with .py.
    if fnmatch.fnmatch(f.name, '*.py'):
        # Print the file's absolute path.
        print(os.path.abspath(f))

========================================================
Output
/Users/Documents/WorkSpace/dev2qa.com-example-code/PythonExampleProject/com/dev2qa/example/file/FileOperateExample.py
/Users/Documents/WorkSpace/dev2qa.com-example-code/PythonExampleProject/com/dev2qa/example/file/CheckFileExistExample.py
/Users/Documents/WorkSpace/dev2qa.com-example-code/PythonExampleProject/com/dev2qa/example/file/OsWalk.py

fnmatch.filter(names, pattern)

import fnmatch

if __name__ == '__main__':
    # Define a list of 4 Python file names; fnmatch does not care whether the files exist.
    file_names = ['a.py', 'b.py', 'c.py', 'd.py']
    # Define a pattern string.
    name_pattern = '[bc].py'
    # Filter the list of Python file names with the pattern.
    sub_file_names = fnmatch.filter(file_names, name_pattern)
    # Print out the filter result.
    print(sub_file_names)

======================================================
Output
['b.py', 'c.py']

fnmatch.translate(pattern)

import fnmatch

unix_pattern_str = '?.py'
reg_pattern_str = fnmatch.translate(unix_pattern_str)
print(reg_pattern_str)

unix_pattern_str = '[a-z].py'
reg_pattern_str = fnmatch.translate(unix_pattern_str)
print(reg_pattern_str)

unix_pattern_str = '[xyz].py'
reg_pattern_str = fnmatch.translate(unix_pattern_str)
print(reg_pattern_str)

=======================================================
Output
(?s:.\.py)\Z
(?s:[a-z]\.py)\Z
(?s:[xyz]\.py)\Z
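Because translate() returns an ordinary regular expression string, one possible use (a sketch, not something the original article covers) is to join several translated patterns into a single compiled regex and test each name once. The patterns and names below are hypothetical.

```python
import fnmatch
import re

# Hypothetical set of accepted patterns; translate() turns each into a regex
# anchored at the end with \Z, so they can be combined as alternatives.
patterns = ["*.py", "*.txt"]
combined = re.compile("|".join(fnmatch.translate(p) for p in patterns))

names = ["main.py", "notes.txt", "image.png"]
# Like fnmatchcase(), the compiled regex is case-sensitive.
matches = [n for n in names if combined.match(n)]
print(matches)  # ['main.py', 'notes.txt']
```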

fnmatch — Unix filename pattern matching

This module provides support for Unix shell-style wildcards, which are not the same as regular expressions (which are documented in the re module). The special characters used in shell-style wildcards are:

Pattern   Meaning
*         matches everything
?         matches any single character
[seq]     matches any character in seq
[!seq]    matches any character not in seq

For a literal match, wrap the meta-characters in brackets. For example, ‘[?]’ matches the character ‘?’.

Note that the filename separator (‘/’ on Unix) is not special to this module. See module glob for pathname expansion (glob uses filter() to match pathname segments). Similarly, filenames starting with a period are not special for this module, and are matched by the * and ? patterns.
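The following sketch illustrates both notes above with hypothetical paths: '*' crosses '/' and also matches names that start with a period. match_segments() is a hypothetical helper, not part of fnmatch; it shows one way to respect separators by matching one path segment at a time, similar in spirit to how glob applies patterns to each segment.

```python
import fnmatch

path = "src/pkg/module.py"

# '/' is not special: '*' matches across directory separators, so a pattern
# intended for direct children of src/ also matches a nested file.
print(fnmatch.fnmatchcase(path, "src/*.py"))  # True

# A leading '.' is not special either: '*' matches "hidden" names.
print(fnmatch.fnmatchcase(".bashrc", "*"))    # True


def match_segments(path, pattern):
    """Hypothetical helper: match path and pattern segment by segment."""
    parts, pat_parts = path.split("/"), pattern.split("/")
    return len(parts) == len(pat_parts) and all(
        fnmatch.fnmatchcase(part, pat) for part, pat in zip(parts, pat_parts)
    )


print(match_segments(path, "src/*.py"))    # False: depth differs
print(match_segments(path, "src/*/*.py"))  # True
```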

Also note that functools.lru_cache() with a maxsize of 32768 is used to cache the compiled regex patterns in the following functions: fnmatch(), fnmatchcase(), filter().

fnmatch.fnmatch(filename, pattern)

Test whether the filename string matches the pattern string, returning True or False. Both parameters are case-normalized using os.path.normcase(). fnmatchcase() can be used to perform a case-sensitive comparison, regardless of whether that’s standard for the operating system.

This example will print all file names in the current directory with the extension .txt:

import fnmatch
import os

for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)

fnmatch.fnmatchcase(filename, pattern)

Test whether filename matches pattern, returning True or False; the comparison is case-sensitive and does not apply os.path.normcase().
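To see the difference between fnmatch() and fnmatchcase(), here is a small sketch with a hypothetical upper-case file name. The result printed for fnmatch() depends on what os.path.normcase() does on the current platform.

```python
import fnmatch
import os

name, pattern = "README.TXT", "*.txt"

# fnmatchcase() never normalizes case, so this prints False on every OS.
print(fnmatch.fnmatchcase(name, pattern))

# fnmatch() first passes both arguments through os.path.normcase():
# on Windows that lower-cases them (so the last line prints True), while on
# POSIX systems normcase() leaves them unchanged (so it prints False).
print(os.path.normcase(name), os.path.normcase(pattern))
print(fnmatch.fnmatch(name, pattern))
```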

fnmatch.filter(names, pattern)

Construct a list from those elements of the iterable names that match pattern. It is the same as [n for n in names if fnmatch(n, pattern)] , but implemented more efficiently.
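A short sketch of the equivalence stated above, using a hypothetical list of names. It also passes a plain iterator to show that names does not have to be a list.

```python
import fnmatch

names = ["setup.py", "README.md", "test_app.py", "app.py"]

# filter() and the list comprehension from the docs give the same result.
filtered = fnmatch.filter(names, "test_*.py")
manual = [n for n in names if fnmatch.fnmatch(n, "test_*.py")]
print(filtered)  # ['test_app.py']

# Any iterable works; the result is always a new list.
from_iter = fnmatch.filter(iter(names), "*.py")
print(from_iter)  # ['setup.py', 'test_app.py', 'app.py']
```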

fnmatch.translate(pattern)

Return the shell-style pattern converted to a regular expression for using with re.match() .

>>> import fnmatch, re
>>>
>>> regex = fnmatch.translate('*.txt')
>>> regex
'(?s:.*\\.txt)\\Z'
>>> reobj = re.compile(regex)
>>> reobj.match('foobar.txt')
<re.Match object; span=(0, 10), match='foobar.txt'>

See also: module glob, Unix shell-style path expansion.


Matching Filename Patterns

When using Windows, with the path to the target directory defined as, for example, base_dir = Path(r"D:\Python\Real Python. \Lesson 6"), base_dir is a WindowsPath. I found two options:

Using glob.glob(os.path.join(base_dir, "backup")) gives the full path for each file, e.g. "D:\Python\Real Python. \Lesson 6\data_01_backup.txt".
To avoid that and generate just the file names matching the pattern, consider using glob.glob("backup", root_dir=base_dir), which produces the desired list of file names, e.g. ['data_01_backup.txt', 'data_02_backup.txt', 'data_03_backup.txt'].

Is that an acceptable approach within the context of the specification for glob.glob()? Or is there a better way to get just the file names matching the pattern in the target directory?

One other observation. Using

glob.glob("**/*.py", root_dir=base_dir, recursive=True)

in the example given produces ['admin.py', 'tests.py', 'sub_dir\file1.py', 'sub_dir\file2.py']. Is there an easy way to tidy this list up so that the directory separators are either ‘\’ or ‘/’?

Regarding the comment above: it should have read that the example given produces

[‘admin.py’, ‘tests.py’, ‘sub_dir\\file1.py’, ‘sub_dir\\file2.py’]

Is there an easy way to tidy this list up so that the directory separators are either ‘\’ or ‘/’?
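One possible way to tidy the separators (a sketch, not the only option) is pathlib.PureWindowsPath, which parses backslash-separated strings on any platform; its as_posix() method returns the path with forward slashes.

```python
from pathlib import PureWindowsPath

# The list that glob.glob() returned on Windows (backslash separators).
paths = ["admin.py", "tests.py", "sub_dir\\file1.py", "sub_dir\\file2.py"]

# PureWindowsPath treats '\' as a separator on every OS, and as_posix()
# renders each path with '/' instead.
tidy = [PureWindowsPath(p).as_posix() for p in paths]
print(tidy)  # ['admin.py', 'tests.py', 'sub_dir/file1.py', 'sub_dir/file2.py']
```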

One final question regarding pathlib. Using the example I can get file names using

[file.name for file in base_dir.glob("**/*.py")]

['admin.py', 'tests.py', 'file1.py', 'file2.py']

What I can’t see is a way to get the equivalent of glob.glob(), which gives results relative to the reference directory, in this case ‘Lesson 6’. That would give the result

[‘admin.py’, ‘tests.py’, ‘sub_dir\file1.py’, ‘sub_dir\file2.py’]

Following on from the above, I found that the following works:

import os

for pyfile in base_dir.glob("**/*.py"):
    pyfile_rel = os.path.relpath(pyfile, base_dir)
    print(pyfile_rel)

Here base_dir = Path(r"D:\Python\Real Python…\Lesson 6"). The output is then as expected, although not in a list:

admin.py
tests.py
sub_dir\file1.py
sub_dir\file2.py

However, it’s not very elegant. Any ideas for improving it?

Martin Breuss RP Team on March 14, 2023

@tonypy hi, nice research! 😀

I’m not on a Windows machine to check how glob.glob() represents paths, but you see the double-backslash characters because Python escapes backslashes when it shows a string’s repr(), which is how strings inside a printed list appear. The strings themselves contain single backslashes.
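A small sketch of the escaping point made above: the string holds a single backslash, and the doubled form appears only in its repr(), which is what a printed list shows. The path string is a hypothetical example.

```python
path = "sub_dir\\file1.py"  # one real backslash, escaped in the source code

print(path)        # sub_dir\file1.py      (the actual characters)
print([path])      # ['sub_dir\\file1.py'] (a list shows each item's repr)
print(len(path))   # 16, so the string holds a single backslash
```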

pathlib solves this issue with a layer of abstraction around paths. When you work with pathlib, a path isn’t a Python string but a Path object. That gives you a lot of additional possibilities.

Two things that I wanted to pick up from your previous comments:

Recursive Search with .rglob()

You can make a recursive search even clearer when you work with Path objects by using .rglob("*"):

>>> [file.name for file in base_dir.rglob("*.py")]
['admin.py', 'tests.py', 'file1.py', 'file2.py']

If you use .rglob() instead of .glob(), then you can omit the **/ part of the pattern, because the method always performs a recursive search.
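To check the claim above, this sketch builds a hypothetical directory tree in a temporary folder and compares the two calls. Sorting is added because the order in which files are yielded is not guaranteed.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base_dir = Path(tmp)
    # Hypothetical layout mirroring the files discussed in this thread.
    for rel in ["admin.py", "tests.py", "sub_dir/file1.py", "sub_dir/file2.py"]:
        target = base_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    via_glob = sorted(base_dir.glob("**/*.py"))
    via_rglob = sorted(base_dir.rglob("*.py"))

print(via_glob == via_rglob)  # True
```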

Relative Paths with pathlib

You can get the same behavior you’re looking for from glob.glob() with pathlib too, by using .relative_to():

>>> [pyfile.relative_to(base_dir) for pyfile in base_dir.rglob("*.py")]
[PosixPath('admin.py'), PosixPath('tests.py'), PosixPath('sub_dir/file2.py'), PosixPath('sub_dir/file1.py')]

And if you wanted to show only the string representation of these Path objects, then you could wrap them into str() :

>>> [str(pyfile.relative_to(base_dir)) for pyfile in base_dir.rglob("*.py")]
['admin.py', 'tests.py', 'sub_dir/file2.py', 'sub_dir/file1.py']
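Combining the suggestions in this thread, a sketch like the following (with a hypothetical tree in a temporary folder) returns relative names with forward slashes on every platform, in a stable order. The sorted() and as_posix() calls are additions, not part of the reply above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base_dir = Path(tmp)
    for rel in ["admin.py", "tests.py", "sub_dir/file1.py", "sub_dir/file2.py"]:
        (base_dir / rel).parent.mkdir(parents=True, exist_ok=True)
        (base_dir / rel).touch()

    # rglob() order depends on the file system, so sort for stable output;
    # as_posix() renders the relative paths with '/' on Windows too.
    rel_names = sorted(
        pyfile.relative_to(base_dir).as_posix() for pyfile in base_dir.rglob("*.py")
    )

print(rel_names)  # ['admin.py', 'sub_dir/file1.py', 'sub_dir/file2.py', 'tests.py']
```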

Hope that helps! If you enjoy pathlib (I do!), then you can check out the following resources we have on the site:

Many thanks for your feedback and suggestions; they were very useful. I already had realpython.com/courses/pathlib-python/ bookmarked, so I will be taking that course soon.

anaghost on May 9, 2023

It would be nice to add how to deal with shutil being unable to delete directories when there are permission issues.