Python project data files

Add data files to Python projects with setup.py


data_files = [('', ['negar/data/untouchable.dat'])], 

The first problem was that I hadn’t included my data file in the package with a MANIFEST.in file. I included it like this:

After that, my data file was installed along with my package. But because I was opening the data file incorrectly, Python couldn’t find it. The question "Python Access Data in Package Subdirectory" helped me find the right way, and now I use something like this:

import os

# Resolve the data file relative to the directory of this module
this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, "data", "data.txt")
with open(DATA_PATH) as f:
    print(f.read())


How to correctly install data files with setup.py?

I guess what you have to do is to create the following basic structure of the repo:

myrepo/
    setup.py
    MANIFEST.in
    mypackage/
        __init__.py
        mycode.py
        data/
            __init__.py
            mydata.txt

Just make sure to keep the following additional steps in mind:

  1. You need to put the data folder inside your package folder
  2. You need to add __init__.py inside your data folder.
  3. In setup.py, you have to use packages=find_packages() to find your packages.
  4. In setup.py, you have to set include_package_data=True.
  5. In setup.py, you have to specify the path to your data files:
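Putting the steps above together, the relevant setup() arguments might look roughly like the sketch below. The package name mypackage comes from the layout above; the version string and the "data/*.txt" pattern are assumptions made for illustration, since the concrete package_data value is not shown here. The arguments are built as a plain dict so they can be inspected without running a build.

```python
# Sketch of the setup() arguments described in the steps above.
# Assumptions (not from the original answer): the version string and
# the "data/*.txt" pattern are illustrative placeholders.

SETUP_KWARGS = {
    "name": "mypackage",
    "version": "0.1",  # hypothetical version
    # Step 3: a real setup.py would normally use find_packages();
    # the packages are listed explicitly so this sketch needs no imports.
    "packages": ["mypackage", "mypackage.data"],
    # Step 4: include data files inside packages that MANIFEST.in matches.
    "include_package_data": True,
    # Step 5: map each package to the data file patterns it ships.
    "package_data": {"mypackage": ["data/*.txt"]},  # hypothetical pattern
}

# In an actual setup.py one would then call:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Keeping the arguments in a dict is only a convenience for this sketch; passing them directly as keyword arguments to setup() is equivalent.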

If you are lucky, you can then read your data file like this:

from mypackage import data
import importlib.resources as import_resources

text = import_resources.read_text(data, "mydata.txt")

or use

with import_resources.path(data, "mydata.txt") as filename:
    myfilename = filename

to get the path to the data file.
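For a version that can be run without installing anything, here is a minimal sketch that builds a throwaway package on disk and reads its data file. It uses importlib.resources.files (Python 3.9+) rather than the read_text and path functions shown above, and the names demo_pkg, make_demo_package and read_packaged_text are hypothetical.

```python
import importlib
import sys
import tempfile
from importlib.resources import files  # available since Python 3.9
from pathlib import Path


def make_demo_package(root: Path) -> None:
    """Create demo_pkg/data/mydata.txt under root (hypothetical names)."""
    data_dir = root / "demo_pkg" / "data"
    data_dir.mkdir(parents=True)
    (root / "demo_pkg" / "__init__.py").write_text("")
    (data_dir / "__init__.py").write_text("")
    (data_dir / "mydata.txt").write_text("hello from the data file\n")


def read_packaged_text(package: str, filename: str) -> str:
    """Read a text resource that lives inside an importable package."""
    return (files(package) / filename).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    make_demo_package(Path(tmp))
    sys.path.insert(0, tmp)  # make the throwaway package importable
    importlib.invalidate_caches()
    try:
        text = read_packaged_text("demo_pkg.data", "mydata.txt")
    finally:
        sys.path.remove(tmp)

print(text)
```

The lookup goes through the import system, so the same call should keep working once the package is installed into site-packages.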

Not sure this is documented anywhere.


Including non-Python files with setup.py

Probably the best way to do this is to use the setuptools package_data directive. This does mean using setuptools (or distribute) instead of distutils, but this is a very seamless "upgrade".

Here’s a full (but untested) example:

from setuptools import setup, find_packages

setup(
    name='your_project_name',
    version='0.1',
    description='A description.',
    packages=find_packages(exclude=['ez_setup', 'tests', 'tests.*']),
    package_data={...},  # placeholder: dict of package names to lists of file patterns
    include_package_data=True,
    install_requires=[],
)

Note the specific lines that are critical here:

package_data={...},
include_package_data=True,

package_data is a dict of package names (empty = all packages) to a list of patterns (can include globs). For example, if you want to only specify files within your package, you can do that too:
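As a hypothetical illustration of such a mapping, the sketch below defines one and previews which files its patterns would select. The package name mypkg, the patterns and the helper preview_matches are all assumptions, and the helper only mimics glob matching with pathlib; it is not the code setuptools itself runs.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping: package name -> list of glob patterns.
# An empty-string key would apply the patterns to all packages.
PACKAGE_DATA = {"mypkg": ["data/*.dat", "*.txt"]}


def preview_matches(package_dir: Path, patterns: list) -> list:
    """Return the files under package_dir that the glob patterns select.

    Paths are returned relative to package_dir, with forward slashes.
    """
    matched = set()
    for pattern in patterns:
        matched.update(p for p in package_dir.glob(pattern) if p.is_file())
    return sorted(p.relative_to(package_dir).as_posix() for p in matched)


# Build a small throwaway package tree so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "mypkg"
    (pkg / "data").mkdir(parents=True)
    for name in ["__init__.py", "notes.txt", "data/tables.dat", "data/readme.md"]:
        (pkg / name).write_text("")
    selected = preview_matches(pkg, PACKAGE_DATA["mypkg"])

print(selected)
```

Patterns are interpreted relative to the package directory, which is why data/*.dat reaches into the data subfolder.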

The solution here is definitely not to rename your non-.py files with a .py extension.

See Ian Bicking’s presentation for more info.

UPDATE: Another [Better] Approach

Another approach that works well if you just want to control the contents of the source distribution (sdist) and have files outside of the package (e.g. in the top-level directory) is to add a MANIFEST.in file. See the Python documentation for the format of this file.

Since writing this response, I have found that using MANIFEST.in is typically a less frustrating way to make sure your source distribution (tar.gz) has the files you need.

For example, if you wanted to include requirements.txt from the top level and recursively include the top-level "data" directory:

include requirements.txt
recursive-include data *

Nevertheless, in order for these files to be copied at install time to the package’s folder inside site-packages, you’ll need to supply include_package_data=True to the setup() function. See Adding Non-Code Files for more information.

To accomplish what you’re describing will take two steps.

  • The file needs to be added to the source tarball
  • setup.py needs to be modified to install the data file to the source path

Step 1: To add the file to the source tarball, include it in the MANIFEST

Create a MANIFEST template in the folder that contains setup.py

The MANIFEST is basically a text file with a list of all the files that will be included in the source tarball.

Here’s what the MANIFEST for my project looks like:

  • CHANGELOG.txt
  • INSTALL.txt
  • LICENSE.txt
  • pypreprocessor.py
  • README.txt
  • setup.py
  • test.py
  • TODO.txt

Note: While sdist does add some files automatically, I prefer to explicitly specify them to be sure instead of predicting what it does and doesn’t.
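One way to confirm that a file really ended up in the source tarball is to list the archive's members. The sketch below is self-contained: it first builds a tiny stand-in tarball, so the archive name demo-0.1.tar.gz, its members and the helper sdist_contains are hypothetical. With a real project you would point the helper at the file that sdist wrote to dist/.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def sdist_contains(sdist_path: Path, wanted: str) -> bool:
    """Return True if a member of the tarball matches the wanted path."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        return any(
            name == wanted or name.endswith("/" + wanted)
            for name in tar.getnames()
        )


# Build a tiny stand-in tarball so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "demo-0.1.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for member in ["demo-0.1/setup.py", "demo-0.1/LICENSE.txt"]:
            payload = b"placeholder\n"
            info = tarfile.TarInfo(name=member)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

    has_license = sdist_contains(archive, "LICENSE.txt")
    has_readme = sdist_contains(archive, "README.txt")

print(has_license, has_readme)
```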

Step 2: To install the data file to the source folder, modify setup.py

Since you’re looking to add a data file (LICENSE.txt) to the source install folder you need to modify the data install path to match the source install path. This is necessary because, by default, data files are installed to a different location than source files.

To make the data install dir match the source install dir, do the following.

Pull the install dir info from distutils with:

from distutils.command.install import INSTALL_SCHEMES 

Modify the data install dir to match the source install dir:

for scheme in INSTALL_SCHEMES.values():
    scheme['data'] = scheme['purelib']

And, add the data file and location to setup():

Note: The steps above should accomplish exactly what you described in a standard manner without requiring any extension libraries.
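To see the two default locations this answer refers to, one can inspect the current interpreter's install paths. This sketch uses the stdlib sysconfig module rather than distutils' INSTALL_SCHEMES, so it only reports the paths and does not change them; the variable names are illustrative.

```python
import sysconfig

# Installation paths for the current interpreter's default scheme.
paths = sysconfig.get_paths()

purelib_dir = paths["purelib"]  # where pure-Python packages are installed
data_dir = paths["data"]        # base directory for data_files entries

print("purelib:", purelib_dir)
print("data:   ", data_dir)
print("same location?", purelib_dir == data_dir)
```

The exact directories depend on the platform and on whether a virtual environment is active.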

It is 2019, and here is what works. Despite advice here and there, what I found halfway documented on the internet is using setuptools_scm, passed as options to setuptools.setup. This includes any data files that are versioned in your VCS, be it git or any other, in the wheel package, and makes "pip install" from the git repository bring those files along.

So, I just added these two lines to the setup call in setup.py. No extra installs or imports required:

setup_requires=['setuptools_scm'],
include_package_data=True,

No need to list package_data manually or in a MANIFEST.in file: if a file is versioned, it is included in the package. The docs on setuptools_scm put the emphasis on creating a version number from the commit position and disregard the really important part of adding the data files. (I couldn’t care less whether my intermediate wheel file is named "*0.2.2.dev45+g3495a1f" or uses the hardcoded version number "0.3.0dev0" I’ve typed in, but leaving behind files that are crucial for the program to work is somewhat important.)


Find data files inside a Python package — setup.py’s package_data= maybe broken?

Solved it by changing the setup.py from:

packages=["mypkg"],
package_dir={...},
package_data={...},

to:

packages=["mypkg"],
include_package_data=True,
package_dir={...},

To me this seems obscure. It actually makes me wonder whether package_data={...} is maybe broken.


Adding data files to Python package with setup.py

pyproject.toml is the new Python project metadata specification standard since PEP 621. As per PEP 517, and as per one of the comments in this StackOverflow thread, in some rare cases we might have a chicken-and-egg problem when using setup.py if it needs to import something from the package it’s building. The only thing that pyproject.toml cannot achieve for the moment is installation in editable mode, where we must use setup.py. Another advantage of setup.py is that, as it’s a Python file, we can compute some variables dynamically at build time.

Nevertheless, setup.py is still a widely used, solid tool for building Python packages. This post discusses how to add data files (non-Python files) to a Python wheel package built by setup.py. Source distribution files (sdist .tar.gz files, .zip for Windows) are not covered by this post.

Adding data files

With parameter package_data for files inside a package

package_data accepts wildcards, but as the example below shows, the data files must live under a Python package folder (one that contains an __init__.py file). You cannot use package_data to include files from non-package folders, e.g. the folder conf, which has no __init__.py file inside.

setup.py
conf/
    conf.json
src/
    mypkg/
        __init__.py
        module.py
        data/
            tables.dat
            spoons.dat
            forks.dat

setup(
    ...,
    packages=['mypkg'],
    package_dir={'mypkg': 'src/mypkg'},
    package_data={'mypkg': ['data/*.dat']},
)

With parameter data_files for any files

distutils is deprecated and will be removed in Python 3.12 as per PEP 632; the migration path is simply to use setuptools.

setup(
    ...,
    data_files=[
        ('bitmaps', ['bm/b1.gif', 'bm/b2.gif']),
        ('config', ['cfg/data.cfg']),
        (<dest_folder_path_in_wheel>, [<source_file_path_relative_to_setup.py_script>]),
    ],
)

From the above example, we can see that:

  1. data_files accepts any files from any folder, in contrast to package_data which accepts files inside a package folder.
  2. data_files takes files one by one; we cannot use a wildcard like * to specify a set of source files.
  3. after the build, a .whl wheel file is generated. Inside it, each source file is added under the path <name>-<version>.data/data/<dest_folder_path_in_wheel>/, while the Python files are added under their package folder. If you want to put the data files at their original path, you need to replace <dest_folder_path_in_wheel> with a path starting with ../../ ; the two .. components just escape the two folder levels of <name>-<version>.data/data/.
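Given point 2 above, a common workaround is to expand the wildcards yourself before handing the result to data_files. The helper below is a minimal sketch of that idea; expand_data_files and its base_dir parameter are hypothetical names, and the demo files are created in a temporary directory only so the sketch runs on its own.

```python
import glob
import os
import tempfile


def expand_data_files(spec, base_dir="."):
    """Expand glob patterns into explicit (dest_dir, [files]) tuples.

    spec is a list of (dest_dir, pattern) pairs; patterns are resolved
    relative to base_dir, the folder that would contain setup.py.
    """
    expanded = []
    for dest_dir, pattern in spec:
        matches = glob.glob(os.path.join(base_dir, pattern))
        files = sorted(
            os.path.relpath(m, base_dir).replace(os.sep, "/") for m in matches
        )
        if files:  # skip patterns that match nothing
            expanded.append((dest_dir, files))
    return expanded


# Self-contained demo; file names are borrowed from the example above.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "bm"))
    for name in ["b1.gif", "b2.gif", "notes.txt"]:
        open(os.path.join(tmp, "bm", name), "w").close()
    data_files = expand_data_files([("bitmaps", "bm/*.gif")], base_dir=tmp)

print(data_files)
```

In a real setup.py the result would be passed as data_files=expand_data_files([...]) with base_dir left at its default.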

With file MANIFEST.in

From my understanding and tests, MANIFEST.in file is only for sdist, so out of the scope of this post which talks about bdist wheel package only.

Parameter zip_safe

If you’re using the old-fashioned egg format and need to reference data files inside the package, you should set zip_safe=False at build time. Otherwise, for modern Python packaging, this parameter is obsolete.

Loading data files

A very good sum-up can be found in this StackOverflow thread.

Loading data files packaged by package_data

# to read file from module_a/folder_b/file.json
import importlib.resources
import json

# open_text is deprecated since Python 3.11, as it only supports files inside Python packages;
# see the example further below for how to use `importlib.resources.files`
with importlib.resources.open_text("module_a.folder_b", "file.json") as f:
    data = json.load(f)

or, using pkg_resources:

# to read file from module_a/folder_b/file.json
import json
import pkg_resources

data = json.load(pkg_resources.resource_stream("module_a", "folder_b/file.json"))

Loading data files packaged by data_files

As data files packaged by the data_files parameter can be in any folder, not necessarily inside a Python package with an __init__.py file, importlib.resources.open_text cannot be used in that case; it is in fact marked as deprecated in Python 3.11.

    Use the stdlib importlib.resources.files to read a file from module_a/folder_b/file.json. This method can also be used to load data files packaged by package_data.

try:
    # in the stdlib since Python 3.9
    from importlib.resources import files
except ImportError:
    # third-party backport for Python < 3.9;
    # add importlib_resources to your requirements
    from importlib_resources import files
import json

# with `data_files` in `setup.py`, we can specify where to put the files
# in the wheel package, so inside module_a for example
with files("module_a").joinpath("folder_b").joinpath("file.json").open() as f:
    print(json.load(f))

or, using pkg_resources:

import json
import pkg_resources

# with `data_files` in `setup.py`, we can specify where to put the files,
# so inside module_a for example
json.load(pkg_resources.resource_stream("module_a", "folder_b/file.json"))

Updated: September 15, 2022
