Python pandas to pickle

How to Save a Pandas DataFrame for Later Use (With Example)

Often you may want to save a pandas DataFrame for later use without having to re-import the data from a CSV file.

The easiest way to do this is to use the to_pickle() function to save the DataFrame as a pickle file:
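
df.to_pickle("my_data.pkl")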

This will save the DataFrame in your current working environment.

You can then use read_pickle() to quickly read the DataFrame back from the pickle file:

df = pd.read_pickle("my_data.pkl")

The following example shows how to use these functions in practice.

Example: Save and Load a Pandas DataFrame

Suppose we create the following pandas DataFrame that contains information about various basketball teams:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12

We can use df.info() to view the data type of each variable in the DataFrame:

#view DataFrame info
print(df.info())

RangeIndex: 8 entries, 0 to 7
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   team      8 non-null      object
 1   points    8 non-null      int64
 2   assists   8 non-null      int64
 3   rebounds  8 non-null      int64
dtypes: int64(3), object(1)
memory usage: 292.0+ bytes
None

We can use the to_pickle() function to save this DataFrame to a pickle file with a .pkl extension:

#save DataFrame to pickle file
df.to_pickle("my_data.pkl")

Our DataFrame is now saved as a pickle file in our current working environment.

We can then use the read_pickle() function to quickly read the DataFrame back in:

#read DataFrame from pickle file
df = pd.read_pickle("my_data.pkl")

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12

We can use df.info() again to confirm that the data type of each column is the same as before:

#view DataFrame info
print(df.info())

RangeIndex: 8 entries, 0 to 7
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   team      8 non-null      object
 1   points    8 non-null      int64
 2   assists   8 non-null      int64
 3   rebounds  8 non-null      int64
dtypes: int64(3), object(1)
memory usage: 292.0+ bytes
None

The benefit of using pickle files is that the data type of each column is retained when we save and load the DataFrame.

This is an advantage over saving and loading CSV files, since a pickle file preserves the original state of the DataFrame and we don't have to perform any type conversions after reading it back in.
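
To illustrate that point (a minimal sketch that is not part of the original example, using a hypothetical date column), a round trip through CSV loses the datetime dtype while a round trip through a pickle file keeps it:

import pandas as pd

#hypothetical DataFrame with a datetime column
df = pd.DataFrame({'team': ['A', 'B'],
                   'game_date': pd.to_datetime(['2023-01-01', '2023-01-08'])})

#CSV round trip: the dates come back as plain strings (object dtype)
df.to_csv('teams.csv', index=False)
print(pd.read_csv('teams.csv').dtypes)

#pickle round trip: the datetime64[ns] dtype is preserved
df.to_pickle('teams.pkl')
print(pd.read_pickle('teams.pkl').dtypes)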

pandas.DataFrame.to_pickle

path str, path object, or file-like object

String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function. File path where the pickled object will be stored.
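
For instance (a quick sketch, assuming df is an existing DataFrame), the path argument can be a plain file name or an already-open binary file handle:

#write to a file path
df.to_pickle("my_data.pkl")

#or write to any object that exposes a binary write() method
with open("my_data.pkl", "wb") as f:
    df.to_pickle(f)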

compression str or dict, default ‘infer’

For on-the-fly compression of the output data. If 'infer' and 'path' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.
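
For instance (a sketch with a hypothetical file name, assuming df is an existing DataFrame), compression can be inferred from the file extension or set explicitly via a dict:

#compression inferred from the .gz extension
df.to_pickle("my_data.pkl.gz")

#explicit dict: gzip with a low compression level and a fixed mtime for a reproducible archive
df.to_pickle("my_data.pkl.gz", compression={"method": "gzip", "compresslevel": 1, "mtime": 1})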

New in version 1.5.0: Added support for .tar files.

protocol int

Int which indicates which protocol should be used by the pickler, default HIGHEST_PROTOCOL (see [1] paragraph 12.1.2). The possible values are 0, 1, 2, 3, 4, 5. A negative value for the protocol parameter is equivalent to setting its value to HIGHEST_PROTOCOL.
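
For example (a sketch with a hypothetical file name), an older protocol can be requested explicitly when the pickle must be readable by older Python versions:

#pickle protocol 4 is readable by Python 3.4 and later
df.to_pickle("my_data.pkl", protocol=4)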

storage_options dict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://" or "gcs://") the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and refer to the pandas documentation for more examples on storage options.
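
As an illustration (a sketch with a hypothetical bucket name; an fsspec-compatible backend such as s3fs must be installed), storage_options is how credentials and connection settings reach the underlying filesystem:

#write the pickle directly to S3; the dict is forwarded to fsspec/s3fs
df.to_pickle("s3://my-bucket/my_data.pkl", storage_options={"anon": False})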

See also

read_pickle : Load pickled pandas object (or any object) from file.

DataFrame.to_hdf : Write DataFrame to an HDF5 file.

DataFrame.to_sql : Write DataFrame to a SQL database.

DataFrame.to_parquet : Write a DataFrame to the binary parquet format.

Examples

>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> original_df.to_pickle("./dummy.pkl")
>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
