Python: files with the same extension

Contents

Find all files in a directory with extension .txt in Python
25 answers
glob.iglob()
glob.glob1()
fnmatch.filter()
Python v3.5+
Listing files with a specific extension
2 answers
How can I check the extension of a file?
14 answers

Find all files in a directory with extension .txt in Python

25 answers

    import glob, os
    os.chdir("/mydir")
    for file in glob.glob("*.txt"):
        print(file)

    import os
    for file in os.listdir("/mydir"):
        if file.endswith(".txt"):
            print(os.path.join("/mydir", file))

Or, if you want to traverse the whole directory tree, use os.walk:

    import os
    for root, dirs, files in os.walk("/mydir"):
        for file in files:
            if file.endswith(".txt"):
                print(os.path.join(root, file))

chdir is not just unnecessary but highly undesirable, since a function call is not supposed to irreversibly modify the current working directory. This can cause many downstream errors.
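A minimal sketch of the chdir-free approach, assuming you only need the top level of a single directory: put the directory into the pattern instead of changing into it, and pass it through glob.escape() in case the directory name itself contains wildcard characters such as [. The function name find_txt is hypothetical. On Python 3.10 and later, glob.glob() also accepts a root_dir argument for the same purpose.

```python
import glob
import os


def find_txt(directory):
    """Return the names of the *.txt files directly inside `directory`.

    Unlike os.chdir() followed by glob("*.txt"), this leaves the current
    working directory untouched.
    """
    # glob.escape() keeps literal [, *, ? in the directory name from acting as wildcards.
    pattern = os.path.join(glob.escape(directory), "*.txt")
    # Strip the directory prefix so the result looks like the chdir version.
    return sorted(os.path.basename(p) for p in glob.glob(pattern))
```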

    >>> import glob
    >>> glob.glob('./*.txt')
    ['./outline.txt', './pip-log.txt', './test.txt', './testingvim.txt']

Something like this should do the job:

    import os
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.txt'):
                print(file)

Note that this is case-sensitive (it won't match .TXT or .Txt), so you'll probably want if file.lower().endswith('.txt'): instead.
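Applying that advice to the os.walk version gives something like the following sketch (the function name find_files_ci is hypothetical). Lowercasing is used only for the comparison; the yielded paths keep the file names exactly as they are on disk.

```python
import os


def find_files_ci(top, ext=".txt"):
    """Yield paths under `top` whose file name ends with `ext`, ignoring case."""
    ext = ext.lower()
    for root, dirs, files in os.walk(top):
        for name in files:
            # Compare a lowercased copy; the real file name is left unchanged.
            if name.lower().endswith(ext):
                yield os.path.join(root, name)
```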

Can't this be abbreviated to something like the following? for file in os.walk(directory).files: if file.endswith ...

    import pathlib
    list(pathlib.Path('your_directory').glob('*.txt'))

    for txt_file in pathlib.Path('your_directory').glob('*.txt'):
        ...  # do something with "txt_file"

If you want it to be recursive, you can use .glob('**/*.txt').
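For the recursive case, pathlib also provides Path.rglob(pattern), which behaves like glob() with '**/' prepended to the pattern. A small sketch, with 'your_directory' turned into a parameter of a hypothetical helper:

```python
from pathlib import Path


def txt_files_recursive(directory):
    """Return all *.txt files under `directory`, including subfolders."""
    # rglob("*.txt") is equivalent to glob("**/*.txt").
    return sorted(Path(directory).rglob("*.txt"))
```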

The pathlib module was included in the standard library in Python 3.4, but you can install backports of it even on older Python versions (e.g. with conda or pip): pathlib and pathlib2.

@gota: to get everything except the .txt files, take all entries and subtract the .txt ones: set(pathlib.Path('.').iterdir()) - set(pathlib.Path().glob('*.txt'))
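Note that iterdir() also returns subdirectories, so that set difference includes folders as well. If you only want regular files that are not .txt, a comprehension over iterdir() with is_file() and suffix is one option (the function name non_txt_files is hypothetical; suffix comparison is case-sensitive here):

```python
from pathlib import Path


def non_txt_files(directory="."):
    """Return the regular files in `directory` whose suffix is not '.txt'."""
    return sorted(
        p for p in Path(directory).iterdir()
        # is_file() drops subdirectories; suffix is the last extension only.
        if p.is_file() and p.suffix != ".txt"
    )
```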

Something like this will work:

    >>> import os
    >>> path = '/usr/share/cups/charmaps'
    >>> text_files = [f for f in os.listdir(path) if f.endswith('.txt')]
    >>> text_files
    ['euc-cn.txt', 'euc-jp.txt', 'euc-kr.txt', 'euc-tw.txt', ... 'windows-950.txt']

You could apply os.path.join to each element of text_files, e.g. text_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith('.txt')].

    import os
    path = 'mypath/path'
    files = os.listdir(path)
    files_txt = [i for i in files if i.endswith('.txt')]

    import os
    for root, dirs, files in os.walk(dir):
        for f in files:
            if os.path.splitext(f)[1] == '.txt':
                fullpath = os.path.join(root, f)
                print(fullpath)

    import os
    fileiter = (os.path.join(root, f)
                for root, _, files in os.walk(dir)
                for f in files)
    txtfileiter = (f for f in fileiter if os.path.splitext(f)[1] == '.txt')
    for txt in txtfileiter:
        print(txt)

Here are a few more variations on the same idea that produce slightly different results:

glob.iglob()

    import glob
    # iglob() returns a generator; this pattern searches immediate subdirectories
    for f in glob.iglob("/mydir/*/*.txt"):
        print(f)

glob.glob1()

    import glob
    # glob1(literal_directory, basename_pattern); undocumented, deprecated since Python 3.13
    print(glob.glob1("/mydir", "*.tx?"))

fnmatch.filter()

    import fnmatch, os
    print(fnmatch.filter(os.listdir("/mydir"), "*.tx?"))  # includes dot-files
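The "includes dot-files" remark is the key difference between the last two variants: glob's * does not match names that start with a dot, while fnmatch.filter() treats them like any other string. A small check, assuming a scratch directory with one hidden and one regular file; the function name compare is hypothetical, and it uses glob.glob() with a joined pattern rather than the undocumented glob1():

```python
import fnmatch
import glob
import os


def compare(directory, pattern="*.tx?"):
    """Return (glob_names, fnmatch_names) matching `pattern` in `directory`."""
    via_glob = sorted(
        os.path.basename(p)
        for p in glob.glob(os.path.join(glob.escape(directory), pattern))
    )
    # fnmatch works on plain strings, so a leading dot is not special here.
    via_fnmatch = sorted(fnmatch.filter(os.listdir(directory), pattern))
    return via_glob, via_fnmatch
```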

Try this; it will find all your files recursively:

    import glob, os
    # Use whatever directory you want; in a normal string literal write a double \\, not a single \
    os.chdir("H:\\wallpaper")
    for file in glob.glob("**/*.txt", recursive=True):
        print(file)
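The same recursive search can be done without os.chdir() by putting the start directory into the pattern. With recursive=True, '**' matches zero or more directories, so files directly inside the start directory are found too. A sketch with a hypothetical function name:

```python
import glob
import os


def find_recursive(top, pattern="*.txt"):
    """Return the paths of files matching `pattern` anywhere under `top`."""
    # glob.escape() keeps wildcard characters in `top` literal.
    full_pattern = os.path.join(glob.escape(top), "**", pattern)
    return sorted(glob.glob(full_pattern, recursive=True))
```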

Python v3.5+

A fast method using os.scandir in a recursive function. It searches for all files with a specified extension in a folder and its subfolders, and stays fast even when finding tens of thousands of files.

I have also included a function to convert the output to a pandas DataFrame.

    import os
    import re
    import pandas as pd

    def findFilesInFolderYield(path, extension, containsTxt='', subFolders=True, excludeText=''):
        r"""Recursive function to find all files of an extension type in a folder
        (and optionally in all subfolders too).

        path:        Base directory to find files.
        extension:   File extension to find, e.g. 'txt'. Regular expression,
                     so 'ls\d' matches ls1, ls2, ls3, etc.
        containsTxt: List of strings; only finds a file if its path contains all of them.
                     Ignored if '' (or blank).
        subFolders:  Bool. If True, find files in all subfolders under path.
                     If False, only search files in the specified folder.
        excludeText: Text string. Ignored if ''. Excludes a file if this text is in its path.
        """
        if isinstance(containsTxt, str):  # a single string rather than a list
            containsTxt = [containsTxt]
        # Make sure the extension is at the end and is preceded by a '.'
        myregexobj = re.compile(r'\.' + extension + '$')
        try:
            for entry in os.scandir(path):
                if entry.is_file() and myregexobj.search(entry.path):
                    containsAll = all(txt in entry.path for txt in containsTxt)
                    excluded = excludeText != '' and excludeText in entry.path
                    if containsAll and not excluded:
                        st = entry.stat()
                        yield st.st_size, st.st_atime_ns, st.st_mtime_ns, st.st_ctime_ns, entry.path
                elif entry.is_dir() and subFolders:
                    # If it's a directory, repeat the process recursively
                    yield from findFilesInFolderYield(entry.path, extension, containsTxt,
                                                      subFolders, excludeText)
        # FileNotFoundError is a subclass of OSError, so it has to be caught first
        except FileNotFoundError as fnf:
            print(path + ' not found ', fnf)
        except OSError as ose:
            print('Cannot access ' + path + '. Probably a permissions error ', ose)

    def findFilesInFolderYieldandGetDf(path, extension, containsTxt, subFolders=True, excludeText=''):
        """Convert the data returned by findFilesInFolderYield into a pandas DataFrame.
        Takes the same arguments as findFilesInFolderYield.
        """
        fileSizes, accessTimes, modificationTimes, creationTimes, paths = zip(
            *findFilesInFolderYield(path, extension, containsTxt, subFolders, excludeText))
        # The stat() times are integer nanoseconds since the epoch.
        # Note: st_ctime is the creation time on Windows but the metadata-change time on Unix.
        df = pd.DataFrame({
            'FLS_File_Size': fileSizes,
            'FLS_File_Access_Date': pd.to_datetime(list(accessTimes), unit='ns'),
            'FLS_File_Modification_Date': pd.to_datetime(list(modificationTimes), unit='ns'),
            'FLS_File_Creation_Date': pd.to_datetime(list(creationTimes), unit='ns'),
            'FLS_File_PathName': paths,
        })
        return df

    ext = 'txt'  # regular expression
    containsTxt = []
    path = r'C:\myFolder'
    df = findFilesInFolderYieldandGetDf(path, ext, containsTxt, subFolders=True)
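If you don't need pandas or the extra filters, the core of the scandir approach fits in a few lines. This is a stripped-down sketch (the name scan_ext is hypothetical) that compares the extension case-insensitively and does not follow symlinked directories, which avoids endless loops on cyclic links; both choices are assumptions, not part of the answer above.

```python
import os


def scan_ext(path, ext):
    """Recursively yield paths of files under `path` whose name ends in '.' + ext."""
    suffix = "." + ext.lower()
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Descend into real subdirectories only.
                    yield from scan_ext(entry.path, ext)
                elif entry.is_file() and entry.name.lower().endswith(suffix):
                    yield entry.path
    except OSError as exc:
        # Skip folders we cannot read (permissions, vanished directories, ...).
        print("Cannot access", path, exc)
```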

Listing files with a specific extension

Could you please suggest how best to modify the function so that it outputs not only the file names but also their paths? That is, currently it gives:

 [[tem1.txt, tem2.txt, tem3.txt],[tem10.txt, tem11.txt]]

but I need:

 [[home/test/tem1.txt, home/test/tem2.txt, home/test/tem3.txt],[home/test10/tem10.txt, home/test10/tem11.txt]]

Prepend your dirs[x] as well, something like: return [list(map(lambda i: dirs[x] + '/' + i, filter(lambda i: '.hdf' in i, os.listdir(dirs[x])))) for x in range(len(dirs))]

2 answers

So, the function from the comment:

    @staticmethod
    def selectfiles(dirs):
        return [list(map(lambda i: dirs[x] + '/' + i,
                         filter(lambda i: '.hdf' in i, os.listdir(dirs[x]))))
                for x in range(len(dirs))]

The same algorithm, the Pythonic way:

    @staticmethod
    def select_files2(dirs):
        files = set()
        for dir in dirs:
            for file in os.listdir(dir):
                if file.endswith('.hdf'):
                    files.add(os.path.join(dir, file))
        return list(files)

Your two functions return different results: the first returns a nested list (which is what the OP wants), the second a flat collection of files.

Instead of a selectfiles(dirs) function that takes a list of directories, it is better to write a getfiles(dirname) function that takes a single directory, and use map() to apply it to the list dirs of all directories: list2D = list(map(getfiles, dirs)) (in Python 3, map() returns a lazy iterator, hence the list()).

To get all files with a given extension from a directory:

    import os
    from glob import glob

    def getfiles(dirname):
        return glob(os.path.join(dirname, '*.hdf'))

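If dirname contains no wildcard characters, this is essentially equivalent to the following (one difference: glob's * skips names that start with a dot, while os.listdir() does not):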

    def getfiles(dirname):
        return [os.path.join(dirname, filename)
                for filename in os.listdir(dirname)
                if filename.endswith('.hdf')]

In the more general case (an arbitrary file-name pattern), you can use the glob.glob1() or fnmatch.filter() functions.

With pathlib:

    from pathlib import Path
    list2D = [list(Path(dirname).glob('*.hdf')) for dirname in dirs]
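Putting the getfiles() suggestion together: because map() is lazy in Python 3, wrapping it in list() gives the nested list the question asks for, one inner list per directory. A sketch in which temporary directories stand in for the real dirs list; list_by_dir is a hypothetical name, and the inner lists are sorted here only to make the order predictable.

```python
import os
from glob import glob


def getfiles(dirname):
    """Return the full paths of the *.hdf files in one directory."""
    return sorted(glob(os.path.join(dirname, "*.hdf")))


def list_by_dir(dirs):
    """One inner list of paths per directory, in the order of `dirs`."""
    return list(map(getfiles, dirs))
```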

How can I check the extension of a file?

I’m working on a certain program where I need to do different things depending on the extension of the file. Could I just use this?

14 answers

Assuming m is a string, you can use endswith :

    if m.endswith('.mp3'):
        ...
    elif m.endswith('.flac'):
        ...

To be case-insensitive, and to eliminate a potentially large else-if chain:

m.lower().endswith(('.png', '.jpg', '.jpeg'))
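When each extension needs different handling, a dictionary that maps the lowercased extension to a handler is another way to avoid a long if/elif chain. A sketch; handle_mp3, handle_flac, HANDLERS and dispatch are hypothetical placeholders, not part of any library.

```python
import os


def handle_mp3(path):
    return "mp3: " + path


def handle_flac(path):
    return "flac: " + path


# Keys are lowercase extensions as returned by os.path.splitext().
HANDLERS = {".mp3": handle_mp3, ".flac": handle_flac}


def dispatch(path):
    """Call the handler registered for the file's extension, or return None."""
    ext = os.path.splitext(path)[1].lower()
    handler = HANDLERS.get(ext)
    return handler(path) if handler else None
```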

@Stevoisiak, I think you misplaced your comment as this solution works even in the case you point out

This doesn't account for folder names with periods: C:/folder.jpg is a valid path. You can check whether it is a file or a folder with os.path.isfile(m).

os.path provides many functions for manipulating paths and filenames.

os.path.splitext takes a path and splits the file extension from the end of it.

    import os

    filepaths = ["/folder/soundfile.mp3", "folder1/folder/soundfile.flac"]
    for fp in filepaths:
        # Split the extension from the path and normalise it to lowercase.
        ext = os.path.splitext(fp)[-1].lower()

        # Now we can simply use == to check for equality, no need for wildcards.
        if ext == ".mp3":
            print(fp, "is an mp3!")
        elif ext == ".flac":
            print(fp, "is a flac file!")
        else:
            print(fp, "is an unknown file format.")

    /folder/soundfile.mp3 is an mp3!
    folder1/folder/soundfile.flac is a flac file!

This method ignores leading periods, so /.mp3 is not considered an mp3 file. This is, however, how a leading period should be treated: e.g. .gitignore is not a file format.

This doesn't account for folder names with periods: a directory can itself be named something like C:/folder.mp3. You can exclude those with os.path.isfile().

From Python 3.4 onwards, use pathlib.

    from pathlib import Path
    Path('my_file.mp3').suffix == '.mp3'

If you are working with folders that contain periods, you can perform an extra check using

Path('your_folder.mp3').is_file() and Path('your_folder.mp3').suffix == '.mp3'

to ensure that a folder with a .mp3 suffix is not interpreted to be an mp3 file.
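Combining both checks into one helper and trying it on a directory whose name ends in .mp3 shows why the is_file() check matters. The name is_mp3_file is hypothetical; the suffix comparison stays case-sensitive, as in the answer.

```python
from pathlib import Path


def is_mp3_file(path):
    """True only for existing regular files whose suffix is '.mp3'."""
    p = Path(path)
    # Do the cheap string test first, then ask the filesystem.
    return p.suffix == ".mp3" and p.is_file()
```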

Look at the fnmatch module. It will do what you're trying to do.

    import fnmatch
    import os

    for file in os.listdir('.'):
        if fnmatch.fnmatch(file, '*.txt'):
            print(file)
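One caveat: fnmatch.fnmatch() normalises case with os.path.normcase(), so it is case-insensitive on Windows but case-sensitive on POSIX systems. fnmatch.fnmatchcase() always compares case-sensitively, so combining it with explicit lowercasing gives the same behaviour on every platform. A sketch with a hypothetical helper name:

```python
import fnmatch


def matches(name, pattern="*.txt", ignore_case=False):
    """Platform-independent fnmatch-style test for a single file name."""
    if ignore_case:
        # Lowercase both sides so '*.txt' also matches 'A.TXT'.
        return fnmatch.fnmatchcase(name.lower(), pattern.lower())
    return fnmatch.fnmatchcase(name, pattern)
```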

    from glob import glob
    ...
    for files in glob('path/*.mp3'):
        ...  # do something
    for files in glob('path/*.flac'):
        ...  # do something else

    import os
    if os.path.splitext(file)[1] == ".mp3":
        ...  # do something

os.path.splitext(file) returns a tuple with two values: the path without the extension, and the extension alone. The second element ([1]) therefore gives you just the extension. The nice thing is that this way you can also easily get the name without the extension, if needed.
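Unpacking the tuple makes both parts available at once. Note that the first element keeps the directory part, and that only the last extension is split off; pathlib's suffixes and stem expose the other pieces.

```python
import os
from pathlib import Path

# splitext() returns (root, ext); root keeps the directory part.
root, ext = os.path.splitext("/music/song.mp3")

# Only the final extension is separated from a double extension.
base, last = os.path.splitext("backup.tar.gz")

# pathlib lists every extension and the name without the last one.
p = Path("backup.tar.gz")
all_suffixes = p.suffixes
stem = p.stem
```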

An old thread, but may help future readers.

I would avoid using .lower() on filenames, if for no other reason than to make your code more platform-independent. (Linux is case-sensitive; .lower() on a filename will surely corrupt your logic eventually... or worse, an important file!)

Why not use re? (Although, to be even more robust, you should check the magic file header of each file; see the question "How to check type of files without extensions in Python?")

    import re

    def checkext(fname):
        if re.search(r'\.mp3$', fname, flags=re.IGNORECASE):
            return 'mp3'
        if re.search(r'\.flac$', fname, flags=re.IGNORECASE):
            return 'flac'
        return 'skip'

    flist = ['myfile.mp3', 'myfile.MP3', 'myfile.mP3', 'myfile.mp4', 'myfile.flack',
             'myfile.FLAC', 'myfile.Mov', 'myfile.fLaC']
    for f in flist:
        print("{} ==> {}".format(f, checkext(f)))

    myfile.mp3 ==> mp3
    myfile.MP3 ==> mp3
    myfile.mP3 ==> mp3
    myfile.mp4 ==> skip
    myfile.flack ==> skip
    myfile.FLAC ==> flac
    myfile.Mov ==> skip
    myfile.fLaC ==> flac