Python: serialization to a file

Contents

Working with files. Serialization: json and pickle
The with context manager
Data Serialization
Flat vs. Nested data
Serializing Text
Simple file (flat data)
repr
ast.literal_eval
CSV file (flat data)
YAML (nested data)
JSON file (nested data)
XML (nested data)
Binary
NumPy Array (flat data)
Pickle (nested data)
Protobuf

Working with files. Serialization: json and pickle

Files are handled through the built-in open function, which creates a file object. Its only required argument is the path to the file (the parameter itself is named file). A very important optional parameter, mode, controls how the file is opened.

'r'   open for reading (default)

'w'   open for writing (the contents of an existing file are destroyed)

'x'   create the file and open it for writing (error if the file already exists)

'a'   open for appending (if the file exists, data is written to its end)

'b'   open as a binary file

't'   open as a text file (default)

'+'   open for both writing and reading
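As a rough illustration of how these flags behave, the sketch below exercises 'x', 'a', and 'b' on a throwaway file. The directory comes from tempfile and the name modes_demo.txt is made up for this example; neither appears in the original text.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'x' creates the file; it refuses to touch an existing one.
with open(demo_path, "x") as f:
    f.write("one\n")

try:
    open(demo_path, "x")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

# 'a' keeps the existing contents and writes at the end.
with open(demo_path, "a") as f:
    f.write("two\n")

# Text mode (the default) returns str ...
with open(demo_path) as f:
    text = f.read()

# ... while adding 'b' returns bytes.
with open(demo_path, "rb") as f:
    raw = f.read()

print(exclusive_failed)  # True
print(repr(text))        # 'one\ntwo\n'
print(type(raw).__name__)  # bytes
```

Flags combine: "rb" is read plus binary, and "r+" would open the same file for reading and writing without truncating it.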

By default, open opens a file as text for reading. As an example, let's create the file "first_file.txt" in the "tmp" folder. To do that, first make sure the folder exists and build the string path, which holds the path to the future file.

    import os

    folder = "tmp"
    filename = "first_file.txt"
    path = os.path.join(folder, filename)
    os.makedirs(folder, exist_ok=True)

Now create a file object and write some text to it. Open the file for writing in text mode (the flag "wt" or just "w", since text mode is the default), write the text with the write method, and be sure to close the file with the close method at the end.

    f = open(path, "w")
    f.write(
        """First line.
    Second line.
    Last line."""
    )
    f.close()

Now, to confirm that the file was created successfully, repeat the same steps for reading. As before, a file object is created with the built-in open function and closed with close at the end, but this time the file is opened for reading, so the mode can be "rt", "r", or omitted entirely, since reading in text mode is the default. There are several ways to read a file's contents; we start with the simplest, reading everything at once with the read method.

    f = open(path)
    content = f.read()
    print(content)
    f.close()

    First line.
    Second line.
    Last line.

All operations completed successfully.
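The text mentions that read is only one of several ways to get at a file's contents. A minimal sketch of the others (readline, readlines, and iterating over the file object) might look like this; it writes its own sample file to a temporary directory so that it does not depend on the steps above, and the file name sample.txt is an assumption made for the example.

```python
import os
import tempfile

# Throwaway sample file so the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
f = open(sample_path, "w", encoding="utf-8")
f.write("First line.\nSecond line.\nLast line.")
f.close()

f = open(sample_path, encoding="utf-8")
first = f.readline()   # one line, including its trailing newline
rest = f.readlines()   # the remaining lines as a list
f.close()

# Iterating over the file object yields one line at a time.
f = open(sample_path, encoding="utf-8")
stripped = [line.rstrip("\n") for line in f]
f.close()

print(repr(first))  # 'First line.\n'
print(rest)         # ['Second line.\n', 'Last line.']
print(stripped)     # ['First line.', 'Second line.', 'Last line.']
```

Iteration is usually the better fit for large files, because it avoids holding the whole contents in memory at once.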

The with context manager

When a file object is created explicitly, you have to make sure that close is called at the very end. This matters for several reasons. First, it is good practice to release resources (return them to the operating system) as soon as they are no longer needed. Second, if the program terminates abnormally (for example, because of a logic error in the program or a hardware failure), the risk of corrupting an unclosed file grows considerably.

Nevertheless, in good Python code you will very rarely see an explicit call to close. Good practice is to use the with context manager instead: it releases the file resource automatically when the block is left, and it does so even if an error occurs inside the block and the program crashes. Working with files usually follows roughly this pattern.

    with open(path) as name:
        first line of the with block
        ...
        last line of the with block
    first statement after the with block  # the file is guaranteed to be closed here

The header of the with block opens the file and binds the resulting file object to the name name. The file stays open until the program leaves the block of code that follows the header. As soon as the program leaves this block (normally or because an exception propagates), the file is closed automatically. The name name itself remains bound, but to a file object that is now closed.

Let's read the file created in the previous section using a with block.

    with open(path) as f:
        print(f.read())
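To make the guarantee concrete, here is a small sketch in which an exception is raised inside a with block on purpose. The temporary directory, the file name with_demo.txt, and the simulated ValueError are all assumptions made for the demonstration.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

try:
    with open(demo_path, "w") as f:
        f.write("partial data")
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

# The file was closed on the way out, even though the block raised.
print(f.closed)  # True

# Closing flushed the buffer, so the data written before the error is on disk.
with open(demo_path) as check:
    content = check.read()
print(content)   # partial data
```

Without with, the same guarantee would need an explicit try/finally around the write and the close call.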


Data Serialization

Data serialization is the process of converting structured data to a format that allows sharing or storage of the data in a form that allows recovery of its original structure. In some cases, the secondary intention of data serialization is to minimize the data’s size which then reduces disk space or bandwidth requirements.

Flat vs. Nested data

Before serializing data, decide how it should be structured: flat or nested. The examples below show the difference between the two styles.

Flat style:

    {"Type": "A", "field1": "value1", "field2": "value2", "field3": "value3"}

Nested style:

    {"A": {"field1": "value1", "field2": "value2", "field3": "value3"}}

For more on the two styles, see the discussions on the Python mailing list, on the IETF mailing list, and on Stack Exchange.
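The two styles carry the same information, so one can be converted into the other. The helper names to_nested and to_flat below are hypothetical, and the sketch assumes every flat record stores its type under a "Type" key as in the example above.

```python
flat = {"Type": "A", "field1": "value1", "field2": "value2", "field3": "value3"}


def to_nested(record, key="Type"):
    """Move the value stored under `key` to the outer level."""
    fields = {k: v for k, v in record.items() if k != key}
    return {record[key]: fields}


def to_flat(nested, key="Type"):
    """Inverse of to_nested for a nested dict with exactly one entry."""
    (type_name, fields), = nested.items()
    return {key: type_name, **fields}


nested = to_nested(flat)
print(nested)
print(to_flat(nested) == flat)  # True
```

The choice matters mostly for the target format: CSV expects flat rows, while JSON, YAML, and XML can represent the nested form directly.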

Serializing Text

Simple file (flat data)

For flat data kept in a plain text file, Python offers two complementary tools: repr to write the data out and ast.literal_eval to read it back.

repr

The built-in repr function takes a single object and returns a printable string representation of it:

    # input as flat data
    a = {"Type": "A", "field1": "value1", "field2": "value2", "field3": "value3"}

    # returns a printable representation of the input;
    # the output can be written to a file as well
    print(repr(a))

    # write content to a file using repr (the file must be opened for writing)
    with open('/tmp/file.py', 'w') as f:
        f.write(repr(a))

ast.literal_eval

The ast.literal_eval function parses a string containing a Python literal and returns the corresponding object without executing arbitrary code. Supported types are strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.

    import ast

    with open('/tmp/file.py', 'r') as f:
        inp = ast.literal_eval(f.read())
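Taken together, repr and ast.literal_eval form a simple round trip. The sketch below writes a record to a temporary file and reads it back; the record contents and the file name record.txt are assumptions chosen for the example.

```python
import ast
import os
import tempfile

record = {"Type": "A", "values": [1, 2.5, None], "flags": (True, False)}
record_path = os.path.join(tempfile.mkdtemp(), "record.txt")

# Serialize: the repr of built-in literals is valid Python source.
with open(record_path, "w") as f:
    f.write(repr(record))

# Deserialize: literal_eval rebuilds the object from that source.
with open(record_path) as f:
    restored = ast.literal_eval(f.read())

print(restored == record)  # True

# Anything that is not a plain literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

The round trip only works for objects whose repr is a literal; instances of custom classes generally do not qualify.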

CSV file (flat data)

The CSV module in Python implements classes to read and write tabular data in CSV format.

Simple example for reading:

    # Reading CSV content from a file
    import csv

    with open('/tmp/file.csv', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

Simple example for writing:

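    # Writing CSV content to a file
    import csv

    # example rows for illustration; any iterable of row sequences works
    rows = [["name", "grade"], ["Alice", 89], ["Bob", 72]]

    with open('/tmp/file.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(rows)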

The module’s contents, functions, and examples can be found in the Python documentation.
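For flat records keyed by field name, the csv module's DictWriter and DictReader classes may be more convenient than bare rows. The sketch below is a minimal round trip through a temporary file; the field names and the file name grades.csv are made up for the example.

```python
import csv
import os
import tempfile

rows = [
    {"name": "Alice", "grade": 89},
    {"name": "Bob", "grade": 72},
]
csv_path = os.path.join(tempfile.mkdtemp(), "grades.csv")

# Write a header line followed by one line per record.
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "grade"])
    writer.writeheader()
    writer.writerows(rows)

# Read the records back; the header supplies the keys.
with open(csv_path, newline="") as f:
    restored = list(csv.DictReader(f))

print(restored[0]["name"])         # Alice
print(repr(restored[0]["grade"]))  # '89'
```

Note that the grade comes back as the string '89': CSV stores no type information, so numeric fields have to be converted again after reading.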

YAML (nested data)

There are many third party modules to parse and read/write YAML file structures in Python. One such example is below.

    # Reading YAML content from a file using the safe_load method
    # (plain yaml.load needs an explicit Loader argument in PyYAML 6 and later)
    import yaml

    with open('/tmp/file.yaml', 'r', newline='') as f:
        try:
            print(yaml.safe_load(f))
        except yaml.YAMLError as ymlexcp:
            print(ymlexcp)

Documentation on the third party module can be found in the PyYAML Documentation.

JSON file (nested data)

Python’s JSON module can be used to read and write JSON files. Example code is below.

    # Reading JSON content from a file
    import json

    with open('/tmp/file.json', 'r') as f:
        data = json.load(f)

    # Writing JSON content to a file using the dump method
    import json

    with open('/tmp/file.json', 'w') as f:
        json.dump(data, f, sort_keys=True)
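The json module also has string-based counterparts, dumps and loads, which are handy for seeing what a round trip preserves. The nested dictionary below is an assumed example, not data from the original text.

```python
import json

data = {"A": {"field1": "value1", "scores": [1, 2, 3], "point": (4, 5)}}

# Serialize to a string; indent makes the nesting visible.
text = json.dumps(data, sort_keys=True, indent=2)
print(text)

# Deserialize back into Python objects.
restored = json.loads(text)

print(restored["A"]["scores"])  # [1, 2, 3]
print(restored["A"]["point"])   # [4, 5]
print(restored == data)         # False
```

The final comparison is False because JSON has no tuple type: the tuple is written as an array and comes back as a list. Nested dicts, lists, strings, and numbers survive unchanged.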

XML (nested data)

XML parsing in Python is possible using the xml package.

    # reading XML content from a file
    import xml.etree.ElementTree as ET

    tree = ET.parse('country_data.xml')
    root = tree.getroot()

More documentation on using the xml.dom and xml.sax packages can be found in the Python XML library documentation.
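Writing XML with ElementTree mirrors reading it. The following sketch builds a small tree, writes it to a temporary file, and parses it back. The element names, attribute names, and values are illustrative assumptions and are not taken from any real country_data.xml.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build a small nested document in memory.
root = ET.Element("data")
for name, rank in [("Liechtenstein", "1"), ("Singapore", "4")]:
    country = ET.SubElement(root, "country", name=name)
    ET.SubElement(country, "rank").text = rank

# Serialize the tree to a file.
xml_path = os.path.join(tempfile.mkdtemp(), "countries.xml")
ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)

# Parse it back and walk the nested structure.
parsed = ET.parse(xml_path).getroot()
ranks = {c.get("name"): c.find("rank").text for c in parsed.findall("country")}
print(ranks)  # {'Liechtenstein': '1', 'Singapore': '4'}
```

As with CSV, element text and attributes are always strings, so numeric values need converting after parsing.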

Binary

NumPy Array (flat data)

Python’s NumPy array can be used to serialize and deserialize data to and from byte representation.

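    import numpy as np

    original = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

    # Converting NumPy array to byte format (dtype and shape are not stored)
    byte_output = original.tobytes()

    # Converting byte format back to NumPy array:
    # pass the same dtype and restore the shape explicitly
    array_format = np.frombuffer(byte_output, dtype=original.dtype).reshape(original.shape)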

Pickle (nested data)

The native data serialization module for Python is called pickle.

    import pickle

    # Here's an example dict
    grades = {'Alice': 89, 'Bob': 72, 'Charles': 87}

    # Use dumps to convert the object to a serialized bytes object
    serial_grades = pickle.dumps(grades)

    # Use loads to de-serialize an object
    received_grades = pickle.loads(serial_grades)
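To store a pickled object in a file rather than in memory, the module provides dump and load, which take a file object opened in binary mode. This is a minimal sketch; the temporary directory and the file name grades.pickle are assumptions made for the example.

```python
import os
import pickle
import tempfile

grades = {"Alice": 89, "Bob": 72, "Charles": 87}
pickle_path = os.path.join(tempfile.mkdtemp(), "grades.pickle")

# Pickle data is bytes, so the file is opened with the 'b' flag.
with open(pickle_path, "wb") as f:
    pickle.dump(grades, f)

with open(pickle_path, "rb") as f:
    restored = pickle.load(f)

print(restored == grades)  # True
```

Unlike JSON or CSV, pickle keeps Python types intact, but the format is Python-specific, and loading a pickle can execute arbitrary code, so only files from a trusted source should be unpickled.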

Protobuf

If you’re looking for a serialization module that has support in multiple languages, Google’s Protobuf library is an option.
