Reading a file in Python with NumPy

Contents

Input and output#
Text files#
Raw binary files#
String formatting#
Memory mapping files#
Text formatting options#
Base-n representations#
Data sources#
Reading and writing array data to files / np 9
Loading and saving data in binary files
Reading files with tabular data
Reading and writing files#
Reading text and CSV files#
With no missing values#
With missing values#
With non-whitespace delimiters#
Whitespace-delimited#
Read a file in .npy or .npz format#
Write to a file to be read back by NumPy#
Binary#
Human-readable#
Large arrays#
Read an arbitrarily formatted binary file (“binary blob”)#
Write or read large arrays#
Write files for reading by other (non-NumPy) tools#
Write or read a JSON file#
Save/restore using a pickle file#
Convert from a pandas DataFrame to a NumPy array#
Save/restore using tofile and fromfile#

Input and output#

load (file[, mmap_mode, allow_pickle, ...])

Load arrays or pickled objects from .npy, .npz or pickled files.

save (file, arr[, allow_pickle, fix_imports])

Save an array to a binary file in NumPy .npy format.

savez (file, *args, **kwds)

Save several arrays into a single file in uncompressed .npz format.

savez_compressed (file, *args, **kwds)

Save several arrays into a single file in compressed .npz format.

The format of these binary file types is documented in numpy.lib.format.

Text files#

loadtxt (fname[, dtype, comments, delimiter, ...])

Load data from a text file.

savetxt (fname, X[, fmt, delimiter, newline, ...])

Save an array to a text file.

genfromtxt (fname[, dtype, comments, ...])

Load data from a text file, with missing values handled as specified.

fromregex (file, regexp, dtype[, encoding])

Construct an array from a text file, using regular expression parsing.
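A minimal sketch of regular-expression parsing with np.fromregex. The sample text, the pattern and the field names "num" and "key" are illustrative assumptions, not part of the original page; each regex group is mapped to one field of a structured dtype.

```python
import io
import numpy as np

# Hypothetical sample text: each line is "<integer> <three-letter key>".
text = io.StringIO("1312 foo\n1534 bar\n444 qux\n")

# One regex group per field of the structured dtype below.
regexp = r"(\d+)\s+(\w+)"
records = np.fromregex(text, regexp, [("num", np.int64), ("key", "S3")])

nums = records["num"]   # integer column
keys = records["key"]   # byte-string column
print(nums, keys)
```

Fields of the result are then accessed by name, as with any structured array.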

fromstring (string[, dtype, count, like])

A new 1-D array initialized from text data in a string.

ndarray.tofile (fid[, sep, format])

Write array to a file as text or binary (default).

ndarray.tolist ()

Return the array as an a.ndim-levels deep nested list of Python scalars.

Raw binary files#

fromfile (file[, dtype, count, sep, offset, like])

Construct an array from data in a text or binary file.

ndarray.tofile (fid[, sep, format])

Write array to a file as text or binary (default).

String formatting#

array2string (a[, max_line_width, precision, ...])

Return a string representation of an array.

array_repr (arr[, max_line_width, precision, ...])

Return the string representation of an array.

array_str (a[, max_line_width, precision, ...])

Return a string representation of the data in an array.

format_float_positional (x[, precision, ...])

Format a floating-point scalar as a decimal string in positional notation.

format_float_scientific (x[, precision, ...])

Format a floating-point scalar as a decimal string in scientific notation.
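NumPy exposes the two scalar formatters described above as np.format_float_positional and np.format_float_scientific. The sketch below only illustrates the difference between the two notations; the sample value and the choice of precision=2 are assumptions made for the example.

```python
import numpy as np

x = np.float64(1234.5678)

# Positional notation: plain decimal digits, no exponent.
positional = np.format_float_positional(x, precision=2)

# Scientific notation: mantissa plus a power-of-ten exponent.
scientific = np.format_float_scientific(x, precision=2)

print(positional)
print(scientific)
```

Both functions return Python strings, so the results can be embedded directly in reports or log messages.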

Memory mapping files#

memmap (filename[, dtype, mode, offset, ...])

Create a memory-map to an array stored in a binary file on disk.

lib.format.open_memmap (filename[, mode, dtype, ...])

Open a .npy file as a memory-mapped array.

Text formatting options#

get_printoptions ()

Return the current print options.

set_string_function (f[, repr])

Set a Python function to be used when pretty printing arrays.

printoptions (*args, **kwargs)

Context manager for setting print options.
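A short sketch of how the print-option helpers fit together, assuming they are reached as np.printoptions and np.get_printoptions. The array values are made up for the example; the point is that the options set by the context manager apply only inside the with block.

```python
import numpy as np

values = np.array([1.23456789, 2.0, 1 / 3])
precision_before = np.get_printoptions()["precision"]

# Inside the block, arrays are rendered with 3 decimal places.
with np.printoptions(precision=3, suppress=True):
    short_text = str(values)

# On exit the previous options are restored.
precision_after = np.get_printoptions()["precision"]

print(short_text)
```

Using the context manager avoids leaving global print settings changed for the rest of the program.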

Base-n representations#

binary_repr (num[, width])

Return the binary representation of the input number as a string.

base_repr (number[, base, padding])

Return a string representation of a number in the given base system.
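For illustration, the two base-n helpers are available as np.binary_repr and np.base_repr. The inputs below are arbitrary sample numbers.

```python
import numpy as np

five_bits = np.binary_repr(5)                 # '101'
neg_three = np.binary_repr(-3, width=5)       # two's complement: '11101'
hex_text = np.base_repr(255, base=16)         # 'FF'
padded = np.base_repr(5, base=2, padding=3)   # '000101'

print(five_bits, neg_three, hex_text, padded)
```

With width given, binary_repr returns the two's-complement form of negative numbers; padding in base_repr prepends that many zeros.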

Data sources#

DataSource ([destpath])

A generic data source file (file, http, ftp, ...).


Reading and writing array data to files / np 9

An important aspect of NumPy that has not been covered so far is reading data from a file. This matters most when working with large amounts of array data. It is a basic operation in data analysis, because data sets are almost always large and handling them by hand is rarely advisable. NumPy offers a set of functions for saving the results of computations to a text or binary file. In the same way, text data can be read from a file and converted into an array.

Loading and saving data in binary files

NumPy offers a pair of functions, save() and load(), for saving data in binary format and reading it back later.

Given an array to save, for example one holding the results of a data analysis, it is enough to call save() with two arguments: the file name and the array. The file automatically gets the .npy extension.

>>> import numpy as np
>>> data = np.array([[0.86466285, 0.76943895, 0.22678279],
...                  [0.12452825, 0.54751384, 0.06499123],
...                  [0.06216566, 0.85045125, 0.92093862],
...                  [0.58401239, 0.93455057, 0.28972379]])
>>> np.save('saved_data', data)

To restore the data from a .npy file, use load(). It takes the file name as its argument, including the .npy extension.

>>> loaded_data = np.load('saved_data.npy')
>>> loaded_data
array([[ 0.86466285, 0.76943895, 0.22678279],
       [ 0.12452825, 0.54751384, 0.06499123],
       [ 0.06216566, 0.85045125, 0.92093862],
       [ 0.58401239, 0.93455057, 0.28972379]])
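The round trip can also be checked in a self-contained way. The sketch below is an illustration only: it writes into a temporary directory (an assumption made so the example leaves no files behind) and uses a smaller sample array.

```python
import os
import tempfile
import numpy as np

data = np.array([[0.86, 0.77, 0.23],
                 [0.12, 0.55, 0.06]])

with tempfile.TemporaryDirectory() as tmp:
    # np.save appends ".npy" when the given name lacks that extension.
    np.save(os.path.join(tmp, "saved_data"), data)
    created = os.listdir(tmp)
    loaded = np.load(os.path.join(tmp, "saved_data.npy"))

print(created)
print(np.array_equal(data, loaded))
```

The .npy format stores dtype and shape alongside the values, so the loaded array matches the original exactly.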

Reading files with tabular data

Data to be read or saved often comes in a text format (TXT or CSV). Saving in such a format instead of a binary one keeps the data accessible outside NumPy, from other applications. Take as an example a data set in CSV (Comma-Separated Values) format, saved as ch3_data.csv. The data is stored in tabular form, with values separated by commas.

id,value1,value2,value3
1,123,1.4,23
2,110,0.5,18
3,164,2.1,19

To read data from a text file into an array, NumPy offers the genfromtxt() function. It is typically called with three arguments: the file name, the delimiter character, and a flag saying whether the data contains column headers.

>>> data = np.genfromtxt('ch3_data.csv', delimiter=',', names=True)
>>> data
array([(1.0, 123.0, 1.4, 23.0), (2.0, 110.0, 0.5, 18.0),
       (3.0, 164.0, 2.1, 19.0)],
      dtype=[('id', '<f8'), ('value1', '<f8'), ('value2', '<f8'), ('value3', '<f8')])

As the result shows, this readily produces a structured array in which the column headers become the field names.

The function implicitly runs two loops: the first iterates over the lines one at a time, and the second splits and converts the values in each line, appending the resulting elements in order. A benefit is that the function copes with missing data by filling in the gaps.

Take the previous file with some elements removed and save it as ch3_data2.csv.

id,value1,value2,value3
1,123,1.4,23
2,110,,18
3,,2.1,19

Running the commands below shows that genfromtxt() replaces the empty fields with nan.

>>> data2 = np.genfromtxt('ch3_data2.csv', delimiter=',', names=True)
>>> data2
array([(1.0, 123.0, 1.4, 23.0), (2.0, 110.0, nan, 18.0),
       (3.0, nan, 2.1, 19.0)],
      dtype=[('id', '<f8'), ('value1', '<f8'), ('value2', '<f8'), ('value3', '<f8')])

The dtype at the bottom of the array output lists the column headers from the file. They can be used as index labels to retrieve data by column.

Numeric indices, in turn, retrieve specific rows.
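Both kinds of indexing can be sketched as follows. To keep the example self-contained, the CSV content with the missing fields is read from an in-memory io.StringIO object rather than from a file on disk; that substitution is an assumption of this sketch.

```python
import io
import numpy as np

csv_text = io.StringIO(
    "id,value1,value2,value3\n"
    "1,123,1.4,23\n"
    "2,110,,18\n"
    "3,,2.1,19\n"
)
data2 = np.genfromtxt(csv_text, delimiter=",", names=True)

ids = data2["id"]        # whole column, selected by field name
first_row = data2[0]     # single record, selected by integer index

print(ids)
print(first_row)
```

A field name returns a regular 1-D array for that column, while an integer index returns one record holding all fields of that row.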


Reading and writing files#

This page tackles common applications; for the full collection of I/O routines, see Input and output.

Reading text and CSV files#

With no missing values#
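When a file has no missing values, numpy.loadtxt is usually sufficient. A minimal sketch, assuming comma-separated sample data held in an in-memory io.StringIO object instead of a file on disk:

```python
import io
import numpy as np

# Hypothetical sample data with every field present.
csv_text = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")

arr = np.loadtxt(csv_text, delimiter=",")
print(arr)
```

The result is a float64 array by default; pass dtype to request another type.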

With missing values#

Use numpy.genfromtxt. It will either

return a masked array masking out missing values (if usemask=True), or
fill in the missing value with the value specified in filling_values (default is np.nan for float, -1 for int).

With non-whitespace delimiters#

>>> with open("csv.txt", "r") as f:
...     print(f.read())
1, 2, 3
4,, 6
7, 8, 9

Masked-array output#

>>> np.genfromtxt("csv.txt", delimiter=",", usemask=True)
masked_array(
  data=[[1.0, 2.0, 3.0],
        [4.0, --, 6.0],
        [7.0, 8.0, 9.0]],
  mask=[[False, False, False],
        [False,  True, False],
        [False, False, False]],
  fill_value=1e+20)

Array output#

>>> np.genfromtxt("csv.txt", delimiter=",")
array([[ 1.,  2.,  3.],
       [ 4., nan,  6.],
       [ 7.,  8.,  9.]])

Array output, specified fill-in value#

>>> np.genfromtxt("csv.txt", delimiter=",", dtype=np.int8, filling_values=99)
array([[ 1,  2,  3],
       [ 4, 99,  6],
       [ 7,  8,  9]], dtype=int8)

Whitespace-delimited#

numpy.genfromtxt can also parse whitespace-delimited data files that have missing values if

Each field has a fixed width: use the width as the delimiter argument.

# File with width=4. The data does not have to be justified (for example,
# the 2 in row 1), the last column can be less than width (for example, the 6
# in row 2), and no delimiting character is required (for instance 8888 and 9
# in row 3)

>>> with open("fixedwidth.txt", "r") as f:
...     data = (f.read())
>>> print(data)
1   2      3
44      6
7   88889

>>> np.genfromtxt("fixedwidth.txt", delimiter=4)
array([[1.000e+00, 2.000e+00, 3.000e+00],
       [4.400e+01,       nan, 6.000e+00],
       [7.000e+00, 8.888e+03, 9.000e+00]])

A special value (for example "x") indicates a missing field: use it as the missing_values argument.

>>> with open("nan.txt", "r") as f:
...     print(f.read())
1 2 3
44 x 6
7 8888 9

>>> np.genfromtxt("nan.txt", missing_values="x")
array([[1.000e+00, 2.000e+00, 3.000e+00],
       [4.400e+01,       nan, 6.000e+00],
       [7.000e+00, 8.888e+03, 9.000e+00]])

You want to skip the rows with missing values: set invalid_raise=False.

>>> with open("skip.txt", "r") as f:
...     print(f.read())
1 2   3
44    6
7 888 9

>>> np.genfromtxt("skip.txt", invalid_raise=False)
__main__:1: ConversionWarning: Some errors were detected !
    Line #2 (got 2 columns instead of 3)
array([[  1.,   2.,   3.],
       [  7., 888.,   9.]])

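The delimiter whitespace character differs from the whitespace that indicates missing data. For instance, if columns are delimited by \t, missing data is recognized when it consists of one or more spaces.

>>> with open("tabs.txt", "r") as f:
...     data = (f.read())
>>> print(data)
1       2       3
44              6
7       888     9

>>> np.genfromtxt("tabs.txt", delimiter="\t", missing_values=" +")
array([[  1.,   2.,   3.],
       [ 44.,  nan,   6.],
       [  7., 888.,   9.]])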

Read a file in .npy or .npz format#
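Files written by numpy.save, numpy.savez or numpy.savez_compressed can be read back with numpy.load. A minimal sketch for the .npz case; the array names "x" and "y" and the temporary directory are assumptions made for this example.

```python
import os
import tempfile
import numpy as np

x = np.arange(5)
y = np.linspace(0.0, 1.0, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")
    np.savez_compressed(path, x=x, y=y)

    # For .npz files np.load returns a lazy archive object;
    # the context manager closes the underlying file.
    with np.load(path) as archive:
        names = sorted(archive.files)
        x_back = archive["x"]
        y_back = archive["y"]

print(names)
```

Arrays in the archive are looked up by the keyword names used when saving.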

Write to a file to be read back by NumPy#

Binary#

For security and portability, set allow_pickle=False unless the dtype contains Python objects, which requires pickling.

Masked arrays can’t currently be saved, nor can other arbitrary array subclasses.

Human-readable#

numpy.save and numpy.savez create binary files. To write a human-readable file, use numpy.savetxt. The array can only be 1- or 2-dimensional, and there’s no savetxtz for multiple files.
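A small sketch of a human-readable round trip with numpy.savetxt and numpy.loadtxt. It writes into an in-memory io.StringIO buffer instead of a file, and the fmt, delimiter and header values are choices made for this example only.

```python
import io
import numpy as np

table = np.array([[1.0, 2.5],
                  [3.0, 4.25]])

buffer = io.StringIO()
# The header line is written with a leading "# " comment marker.
np.savetxt(buffer, table, fmt="%.2f", delimiter=",", header="a,b")
text = buffer.getvalue()

# loadtxt skips lines starting with "#" by default.
restored = np.loadtxt(io.StringIO(text), delimiter=",")

print(text)
```

Note that the chosen fmt controls how many digits are written, so precision beyond it is not recoverable from the text file.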

Large arrays#

Read an arbitrarily formatted binary file (“binary blob”)#

The .wav file header is a 44-byte block preceding data_size bytes of the actual sound data:

chunk_id          "RIFF"
chunk_size        4-byte unsigned little-endian integer
format            "WAVE"
fmt_id            "fmt "
fmt_size          4-byte unsigned little-endian integer
audio_fmt         2-byte unsigned little-endian integer
num_channels      2-byte unsigned little-endian integer
sample_rate       4-byte unsigned little-endian integer
byte_rate         4-byte unsigned little-endian integer
block_align       2-byte unsigned little-endian integer
bits_per_sample   2-byte unsigned little-endian integer
data_id           "data"
data_size         4-byte unsigned little-endian integer

The .wav file header as a NumPy structured dtype:

wav_header_dtype = np.dtype([
    ("chunk_id", (bytes, 4)),   # flexible-sized scalar type, item size 4
    ("chunk_size", "<u4"),      # little-endian unsigned 32-bit integer
    ("format", "S4"),           # 4-byte string, alternate spelling of (bytes, 4)
    ("fmt_id", "S4"),
    ("fmt_size", "<u4"),
    ("audio_fmt", "<u2"),       #
    ("num_channels", "<u2"),    # .. more of the same ...
    ("sample_rate", "<u4"),     #
    ("byte_rate", "<u4"),
    ("block_align", "<u2"),
    ("bits_per_sample", "<u2"),
    ("data_id", "S4"),
    ("data_size", "<u4"),
    #
    # the sound data itself cannot be represented here:
    # it does not have a fixed size
])

# f is the path to the .wav file or a file object opened in binary mode
header = np.fromfile(f, dtype=wav_header_dtype, count=1)[0]

This .wav example is for illustration; to read a .wav file in real life, use Python’s built-in module wave.

(Adapted from Pauli Virtanen, Advanced NumPy, licensed under CC BY 4.0.)
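To try the structured dtype without a real audio file, the sketch below builds a synthetic 44-byte header with the standard struct module and parses it with np.frombuffer. The header values (16-bit mono PCM at 44.1 kHz with no samples) are made up for illustration, and np.frombuffer stands in for np.fromfile because the bytes are in memory.

```python
import struct
import numpy as np

wav_header_dtype = np.dtype([
    ("chunk_id", "S4"), ("chunk_size", "<u4"), ("format", "S4"),
    ("fmt_id", "S4"), ("fmt_size", "<u4"), ("audio_fmt", "<u2"),
    ("num_channels", "<u2"), ("sample_rate", "<u4"), ("byte_rate", "<u4"),
    ("block_align", "<u2"), ("bits_per_sample", "<u2"),
    ("data_id", "S4"), ("data_size", "<u4"),
])

# Synthetic header bytes, packed little-endian without padding.
raw = struct.pack(
    "<4sI4s4sIHHIIHH4sI",
    b"RIFF", 36, b"WAVE", b"fmt ", 16, 1, 1,
    44100, 88200, 2, 16, b"data", 0,
)

# Interpret the 44 bytes as one record of the structured dtype.
header = np.frombuffer(raw, dtype=wav_header_dtype, count=1)[0]

print(header["sample_rate"], header["num_channels"])
```

Each header field is then available by name, which is the main convenience of describing a binary layout as a structured dtype.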

Write or read large arrays#

Arrays too large to fit in memory can be treated like ordinary in-memory arrays using memory mapping.

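Raw array data written with numpy.ndarray.tofile or numpy.ndarray.tobytes can be read with numpy.memmap:

array = np.memmap("mydata/myarray.arr", mode="r", dtype=np.int16, shape=(1024, 1024))

Files output by numpy.save (that is, using the numpy format) can be read using numpy.load with the mmap_mode keyword argument:

large_array[some_slice] = np.load("path/to/small_array", mmap_mode="r")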
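A self-contained sketch of writing and then reopening a memory-mapped file. The file name, the small 4x4 shape and the temporary directory are assumptions chosen so the example runs quickly; real use cases involve arrays far larger than memory.

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myarray.arr")

    # mode="w+" creates the file; assignments go to the mapped pages.
    writer = np.memmap(path, mode="w+", dtype=np.int16, shape=(4, 4))
    writer[:] = np.arange(16, dtype=np.int16).reshape(4, 4)
    writer.flush()   # push pending changes to disk
    del writer       # release the mapping

    # The raw file stores no metadata, so dtype and shape are given again.
    reader = np.memmap(path, mode="r", dtype=np.int16, shape=(4, 4))
    corner = int(reader[3, 3])
    row_sum = int(reader[1].sum())
    del reader

print(corner, row_sum)
```

Only the parts of the file that are actually indexed are read from disk, which is what makes this approach usable for very large arrays.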

Memory mapping lacks features like data chunking and compression; more full-featured formats and libraries usable with NumPy include HDF5 (via h5py or PyTables), Zarr, and NetCDF.

For tradeoffs among memmap, Zarr, and HDF5, see pythonspeed.com.

Write files for reading by other (non-NumPy) tools#

Formats for exchanging data with other tools include HDF5, Zarr, and NetCDF (see Write or read large arrays).

Write or read a JSON file#

NumPy arrays are not directly JSON serializable.
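One common workaround, sketched here as an illustration rather than as the only option, is to convert the array to nested Python lists with ndarray.tolist before handing it to the standard json module.

```python
import json
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# tolist() yields plain Python lists and ints, which json can encode.
encoded = json.dumps(arr.tolist())

# Decoding gives nested lists again; np.array rebuilds the array.
decoded = np.array(json.loads(encoded))

print(encoded)
```

JSON carries no dtype information, so the dtype of the rebuilt array is inferred from the decoded values and may need to be set explicitly.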

Save/restore using a pickle file#

Avoid when possible; pickles are not secure against erroneous or maliciously constructed data.

Use numpy.save and numpy.load. Set allow_pickle=False, unless the array dtype includes Python objects, in which case pickling is required.
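The effect of allow_pickle can be seen with an object-dtype array. This is a minimal sketch using an in-memory io.BytesIO buffer in place of a file; the array contents are arbitrary.

```python
import io
import numpy as np

# An object-dtype array holds arbitrary Python objects.
obj_arr = np.empty(2, dtype=object)
obj_arr[0] = {"a": 1}
obj_arr[1] = [1, 2, 3]

buffer = io.BytesIO()
np.save(buffer, obj_arr, allow_pickle=True)

# Loading with pickling disabled is refused for object arrays.
buffer.seek(0)
try:
    np.load(buffer, allow_pickle=False)
    refused = False
except ValueError:
    refused = True

# Loading succeeds only when pickling is explicitly allowed.
buffer.seek(0)
restored = np.load(buffer, allow_pickle=True)

print(refused, restored[0])
```

Keeping allow_pickle=False for untrusted files means such a file fails to load instead of executing pickled code.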

Convert from a pandas DataFrame to a NumPy array#

Save/restore using tofile and fromfile#

numpy.ndarray.tofile and numpy.fromfile lose information on endianness and precision and so are unsuitable for anything but scratch storage.
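The loss of metadata can be sketched as follows. The file name and the temporary directory are assumptions for the example; the raw file stores only the bytes, so dtype and shape have to be supplied by the reader.

```python
import os
import tempfile
import numpy as np

original = np.arange(6, dtype=np.int32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scratch.bin")
    original.tofile(path)

    # Without dtype, fromfile assumes float64 and misreads the 24 bytes.
    misread = np.fromfile(path)

    # With the right dtype the values return, but only as a flat array.
    flat = np.fromfile(path, dtype=np.int32)

# The shape has to be restored by hand.
restored = flat.reshape(2, 3)

print(misread.shape, flat.shape)
```

By contrast, numpy.save records dtype, shape and byte order in the file header, so numpy.load needs no such extra information.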
