Python if file has extension

Python list Files in Directory with Extension txt

In this Python tutorial, we will see how to list all files of a directory having a specific extension.

Sometimes we need to list files having a specific extension before performing any operation on them. For example, you may want to copy only text files from one location to another. In this case, we need to make sure we are only looking for files having a .txt extension.

We will use the following three methods: the glob module, os.listdir(), and os.walk().

How to list files in directory with extension txt

A file extension, or filename extension, is a suffix at the end of a file name. It comes after the last period. The extension indicates the file type, such as text, CSV, PDF, or image. For example, a text file uses txt, and an image file uses jpg, jpeg, or bmp.

Here are the steps to get the list of files having the txt extension using the glob module.

  1. Import the glob module: The glob module, part of the Python Standard Library, finds files and folders whose names follow a specific pattern. The search rules are similar to the Unix shell path expansion rules.
  2. Construct a pattern to search for files having the specific extension: For example, directory_path/*.txt lists all text files present in the given directory path. Here the * means the file name can be anything, but it must have a txt extension.
  3. Use the glob() function: glob.glob(pathname) returns a list of paths that match the pattern given in the pathname argument. In this case, it returns all text files.

Example: list files in directory with extension txt

The following text files are present in the ‘account’ directory used in the examples.

sales.txt profit.txt sample.txt

Example 1: List all txt files present in the ‘account’ directory.

    import glob

    # absolute path pattern matching all text files inside a specific folder
    path = r'E:/account/*.txt'
    files = glob.glob(path)
    print(files)

Output:

    ['E:/account\\profit.txt', 'E:/account\\sales.txt', 'E:/account\\sample.txt']

If you want to list files from the current directory, use glob.glob('./*.txt').

Note: This solution is concise because glob handles the pattern matching for you. Internally, glob still reads the directory listing and filters the names (using os.scandir() and fnmatch), so any speed difference compared with a manual loop is usually small.
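The glob steps can be wrapped in a small helper so the directory and extension are not hard-coded. This is a minimal sketch, not part of the original tutorial: the function name list_files_with_extension, the temporary demo directory, and the file notes.csv are assumptions made for illustration.

```python
import glob
import os
import tempfile


def list_files_with_extension(directory, extension):
    """Return sorted paths in `directory` whose names end with `extension`."""
    # glob.escape() protects characters such as "[" in the directory name;
    # os.path.join() builds a pattern like "<directory>/*.txt".
    pattern = os.path.join(glob.escape(directory), "*" + extension)
    return sorted(glob.glob(pattern))


# Demo in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("sales.txt", "profit.txt", "notes.csv"):
        open(os.path.join(demo_dir, name), "w").close()

    txt_paths = list_files_with_extension(demo_dir, ".txt")
    txt_names = [os.path.basename(p) for p in txt_paths]
    print(txt_names)  # ['profit.txt', 'sales.txt']
```

The result is sorted because glob.glob() does not guarantee any particular order.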

os module to list files in directory with extension

The os module provides functions for interacting with the operating system, including listing the contents of a directory.

  • Use the os.listdir('path') function to get the contents of a directory. This function returns the names of both the files and the directories present in the directory.
  • Next, use a for loop to iterate over the returned names.
  • In each iteration, use an if condition to check whether the name ends with the .txt extension. If yes, add it to the final list.

    import os

    # folder path
    dir_path = r'E:\account'

    # list to store files
    res = []

    # iterate over the directory entries
    for file in os.listdir(dir_path):
        # keep only names ending with .txt
        if file.endswith('.txt'):
            res.append(file)
    print(res)

Output:

    ['profit.txt', 'sales.txt', 'sample.txt']

Note: This solution checks every entry of the directory in a Python loop, which adds some overhead if the directory contains many files. The glob module does similar work internally, so the main benefit of the first solution is shorter code rather than a large speed gain.
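Because os.listdir() returns directory names as well as file names, a folder whose name happens to end in .txt would pass the endswith() check. The following sketch adds an os.path.isfile() test to exclude such entries. The helper name and the demo folder archive.txt are hypothetical and only serve the illustration.

```python
import os
import tempfile


def list_regular_files_with_extension(directory, extension):
    """Return sorted names of regular files in `directory` ending with `extension`."""
    result = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # os.listdir() also returns sub-directory names, so check the entry type.
        if name.endswith(extension) and os.path.isfile(full_path):
            result.append(name)
    return sorted(result)


with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "sales.txt"), "w").close()
    os.mkdir(os.path.join(demo_dir, "archive.txt"))  # a directory with a misleading name

    names = list_regular_files_with_extension(demo_dir, ".txt")
    print(names)  # ['sales.txt']
```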

List files in directory and subdirectories with extension txt

We can use either of the following two approaches.

Glob module to list files from subdirectories with txt extension

Set the recursive argument of the glob() function to True to list text files from subdirectories.

Recursive search with the glob module requires Python 3.5 or later. If you are using an older version of Python, use the os.walk() function instead.

The glob module supports the ** pattern. When you call glob.glob('**/*.txt', recursive=True), the ** matches the starting directory and any number of nested subdirectories, so glob() looks for matching files at every level.

    import glob

    # absolute path pattern matching all text files inside a folder and its subfolders
    path = r'E:/account/**/*.txt'
    files = glob.glob(path, recursive=True)
    print(files)

Output:

    ['E:/account\\profit.txt', 'E:/account\\sales.txt', 'E:/account\\sample.txt', 'E:/account\\reports_2021\\december_2021.txt']
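To see the effect of ** with recursive=True, it can help to run both patterns against the same small tree. The sketch below builds a temporary directory that mirrors the layout of the example (the temporary location itself is an assumption) and compares a top-level search with a recursive one.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    # Layout: demo_dir/sales.txt and demo_dir/reports_2021/december_2021.txt
    os.mkdir(os.path.join(demo_dir, "reports_2021"))
    open(os.path.join(demo_dir, "sales.txt"), "w").close()
    open(os.path.join(demo_dir, "reports_2021", "december_2021.txt"), "w").close()

    base = glob.escape(demo_dir)

    # "*.txt" only looks at the top level of the directory.
    top_only = glob.glob(os.path.join(base, "*.txt"))

    # "**" with recursive=True also descends into sub-directories.
    recursive = glob.glob(os.path.join(base, "**", "*.txt"), recursive=True)

    top_names = sorted(os.path.basename(p) for p in top_only)
    all_names = sorted(os.path.basename(p) for p in recursive)
    print(top_names)  # ['sales.txt']
    print(all_names)  # ['december_2021.txt', 'sales.txt']
```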

os.walk() to list files in directory and subdirectories with extension txt

os.walk() is a generator that walks a directory tree. For each directory it visits, it yields a tuple of three values (current_path, directories in current_path, files in current_path), and it descends into each subdirectory in turn until no further subdirectories remain below the initial directory.

  • Call the os.walk('path') function. For each directory it visits, it yields the directory path and two lists. The first list contains the subdirectory names, and the second list contains the file names.
  • Next, iterate over the list of files using a for loop.
  • In each iteration, use an if condition to check whether the file name ends with the .txt extension. If yes, add it to the final list.

    import os

    # list to store txt files
    res = []

    # os.walk() yields (root, dirs, files) for the starting directory,
    # then for each subdirectory in turn until the whole tree is visited
    for root, dirs, files in os.walk(r"E:/account"):
        for file in files:
            if file.endswith(".txt"):
                res.append(os.path.join(root, file))
    print(res)

Output:

    ['E:/account\\profit.txt', 'E:/account\\sales.txt', 'E:/account\\sample.txt', 'E:/account\\reports_2021\\december_2021.txt']
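To make the shape of the values yielded by os.walk() concrete, the following sketch records each (root, dirs, files) tuple for a small temporary tree. The temporary directory and the variable name visited are assumptions for illustration. Paths are shown relative to the starting directory so the output is the same on every platform.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    # Layout: demo_dir/sales.txt and demo_dir/reports_2021/december_2021.txt
    os.mkdir(os.path.join(demo_dir, "reports_2021"))
    open(os.path.join(demo_dir, "sales.txt"), "w").close()
    open(os.path.join(demo_dir, "reports_2021", "december_2021.txt"), "w").close()

    visited = []
    for root, dirs, files in os.walk(demo_dir):
        # root: directory being visited; dirs: its sub-directories; files: its files
        visited.append((os.path.relpath(root, demo_dir), sorted(dirs), sorted(files)))

    for entry in visited:
        print(entry)
    # ('.', ['reports_2021'], ['sales.txt'])
    # ('reports_2021', [], ['december_2021.txt'])
```

The starting directory is reported first because os.walk() walks top-down by default.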

About Vishal

I’m Vishal Hule, Founder of PYnative.com. I am a Python developer, and I love to write articles to help students, developers, and learners.

Python – Determine File Type

In this tutorial, we’ll try to understand how to determine the file type using Python with the help of some examples.

There are multiple ways to detect the type of a file using Python. For example –

  1. Get the file type using the file name and its extension using the os.path.splitext() function. For example, the file “cat.png” appears to be an image file since it has “.png” as its extension.
  2. Alternatively, you can determine the file type based on the contents of the file by using the python-magic library.

Let’s look at both methods in detail.

Method 1 – Using the os.path.splitext() function

This is a built-in function in the os.path module which splits the pathname into a (root, ext) pair such that root + ext == path. The extension, ext, is either empty or begins with a period and contains at most one period.

Basic Syntax: os.path.splitext(path)

Parameters: The only parameter is path, which is the path of the file to split.

Let’s now look at the usage of this function with some worked-out examples.

Example 1 – Simple path

    import os

    print(os.path.splitext("file1.txt"))

Output:

    ('file1', '.txt')

Here, we get the filename and the extension of the file. From the extension, we can say that the given file is a text file.

Example 2 – Path with No extension

What happens if the filename does not contain any extension?

    import os

    print(os.path.splitext("file"))

Output:

    ('file', '')

As there is no extension in the path specified, we can only see the filename and can’t really determine the file type here.

Example 3 – Path containing an extension

Let’s look at another example of a filename with a confusing extension.

    import os

    print(os.path.splitext("foo.bar.exe"))

Output:

    ('foo.bar', '.exe')

We can say that the given file is an executable file (it has .exe as its extension). Because the path contains an extension, ext is set to that extension, including the leading period. Note that only the last period separates the extension; earlier periods stay in the root.

Example 4 – Path containing leading periods

    import os

    print(os.path.splitext(".sdfasg"))
    print(os.path.splitext("/faaaoo/gves/. png"))

Output:

    ('.sdfasg', '')
    ('/faaaoo/gves/. png', '')

Here we can see that the leading periods of the last component of the path are considered to be part of the root.
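The behaviour shown in these examples can be combined into a small check for whether a path has a given extension. This is a minimal sketch: the function name has_extension is hypothetical, and ignoring letter case is an assumption added here, not something os.path.splitext() does by itself.

```python
import os


def has_extension(path, extension):
    """Return True if `path` ends with `extension` (e.g. ".txt"), ignoring case."""
    _, ext = os.path.splitext(path)
    return ext.lower() == extension.lower()


print(has_extension("file1.txt", ".txt"))    # True
print(has_extension("REPORT.TXT", ".txt"))   # True
print(has_extension("foo.bar.exe", ".bar"))  # False: only the last suffix counts
print(has_extension(".sdfasg", ".sdfasg"))   # False: a leading period is part of the root
```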

A drawback of determining file type from its extension

In this method, we determine the file type just by looking at the path of the file. However, the extension in the path can be of one type while the content of the file is of another. For example, a file named “f1.jpg” can actually contain HTML. In such cases, we cannot determine the file type from its name or path alone.

The solution is to use the magic number associated with the file to determine its type. A magic number is a fixed sequence of bytes, usually at the very start of a file, that identifies the file format. This approach gives you more freedom when naming files and does not require an extension. Magic numbers are useful because files can occasionally have incorrect file extensions.

This can be done using the python-magic library.
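To illustrate the idea of a magic number without any third-party library, the sketch below reads the first bytes of a file and compares them with a few well-known signatures. The SIGNATURES table is a small illustrative subset, and the names sniff_file_type and notes.txt are hypothetical. A library such as python-magic covers far more formats than this.

```python
import os
import tempfile

# A few well-known file signatures (an illustrative subset, not a complete table).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"%PDF-": "PDF document",
}


def sniff_file_type(path):
    """Return a label if the file starts with a known signature, else None."""
    with open(path, "rb") as handle:
        header = handle.read(8)  # the longest signature above is 8 bytes
    for signature, label in SIGNATURES.items():
        if header.startswith(signature):
            return label
    return None


with tempfile.TemporaryDirectory() as demo_dir:
    # PNG signature bytes stored under a misleading ".txt" name.
    disguised = os.path.join(demo_dir, "notes.txt")
    with open(disguised, "wb") as handle:
        handle.write(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)

    detected = sniff_file_type(disguised)
    print(detected)  # PNG image
```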

Method 2 – Using the python-magic library

You can use the magic.from_file() method available in the python-magic library to determine the type of a file based on its contents. It uses the magic number associated with the file to determine its type.

The following is the syntax –

Basic Syntax: magic.from_file(path, mime=False)

Parameters: path is the path of the file to inspect. The optional mime parameter (True/False) makes the function return the MIME type of the file instead of a textual description when it is set to True.

For more information, refer to the python-magic documentation.

Let’s now look at the usage of the above method with an example.

Example – Determine file type

Let’s say we create a simple file with HTML markup inside but name the file with a “.jpg” extension.

[Image: file with HTML content]

The contents of this file are saved under the name “cats.jpg”. Now, let’s try to determine its type using the magic.from_file() method.

    import magic

    print(magic.from_file("cats.jpg"))

Output:

    HTML document, ASCII text, with CRLF line terminators

Here, we can see that even though the file has a .jpg extension, its content is HTML, so the program reports an HTML document.
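The optional mime argument can be sketched in the same way. The example below assumes the third-party python-magic package and its underlying libmagic library are installed; if they are not, the hypothetical helper detect_mime_type returns None instead of failing. The temporary file recreates the HTML-inside-cats.jpg scenario described in this example.

```python
import os
import tempfile

try:
    import magic  # third-party package, installed with: pip install python-magic
except ImportError:
    magic = None  # the package (or the underlying libmagic) is not available


def detect_mime_type(path):
    """Return the MIME type reported by python-magic, or None if it is unavailable."""
    if magic is None or not hasattr(magic, "from_file"):
        return None
    return magic.from_file(path, mime=True)


with tempfile.TemporaryDirectory() as demo_dir:
    # HTML markup saved under a ".jpg" name.
    path = os.path.join(demo_dir, "cats.jpg")
    with open(path, "w") as handle:
        handle.write("<!DOCTYPE html><html><body><h1>Cats</h1></body></html>")

    mime_type = detect_mime_type(path)
    print(mime_type)
```

With python-magic available, the printed value should be an HTML MIME type rather than an image type, which makes it easy to compare against what the extension suggests.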

Summary

In this tutorial, we saw how to determine the file type using os.path.splitext(). We then looked at the drawback of relying on the extension and at the python-magic library, which inspects the file contents and is therefore more reliable when a file may be misnamed.

Author

I’m an undergrad student at IIT Madras interested in exploring new technologies. I have worked on various projects related to Data science, Machine learning & Neural Networks, including image classification using Convolutional Neural Networks, Stock prediction using Recurrent Neural Networks, and many more machine learning model training. I write blog articles in which I try to provide a complete guide on a particular topic and cover as many different examples as possible, including the edge cases, to understand the topic better.
