pandas.DataFrame.to_csv#
DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', lineterminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', errors='strict', storage_options=None) [source] #
Write object to a comma-separated values (csv) file.
Parameters : path_or_buf str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string. If a non-binary file object is passed, it should be opened with newline='', disabling universal newlines. If a binary file object is passed, mode might need to contain a 'b'.
Changed in version 1.2.0: Support for binary file objects was introduced.
sep str, default ','
String of length 1. Field delimiter for the output file.
na_rep str, default ''
Missing data representation.
float_format str, Callable, default None
Format string for floating point numbers. If a Callable is given, it takes precedence over other numeric formatting parameters, like decimal.
columns sequence, optional
Columns to write.
header bool or list of str, default True
Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
index bool, default True
Write row names (index).
index_label str or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
mode str, default 'w'
Forwarded to either open(mode=) or fsspec.open(mode=) to control the file opening. Typical values include:
- 'w', truncate the file first.
- 'x', exclusive creation, failing if the file already exists.
- 'a', append to the end of file if it exists.
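A minimal sketch of the difference between 'w' and 'a' (the file name here is hypothetical): appending reuses the existing file, so the header is usually suppressed on the subsequent writes.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# mode='w' (the default) truncates any existing file; mode='a' appends.
# When appending, pass header=False so the column names are not repeated.
df.to_csv("rows.csv", index=False)                          # header + 2 rows
df.to_csv("rows.csv", index=False, mode="a", header=False)  # 2 more rows

print(open("rows.csv").read().splitlines())
# ['a', '1', '2', '1', '2']
```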
encoding str, optional
A string representing the encoding to use in the output file, defaults to 'utf-8'. encoding is not supported if path_or_buf is a non-binary file object.
compression str or dict, default ‘infer’
For on-the-fly compression of the output data. If 'infer' and 'path_or_buf' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'}, in which case the other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.
New in version 1.5.0: Added support for .tar files.
May be a dict with key ‘method’ as compression mode and other entries as additional compression options if compression mode is ‘zip’.
Passing compression options as keys in dict is supported for compression modes ‘gzip’, ‘bz2’, ‘zstd’, and ‘zip’.
Changed in version 1.2.0: Compression is supported for binary file objects.
Changed in version 1.2.0: Previous versions forwarded dict entries for ‘gzip’ to gzip.open instead of gzip.GzipFile which prevented setting mtime .
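Building on the notes above, a short sketch of a reproducible gzip archive (file names are illustrative): because the dict entries are forwarded to gzip.GzipFile, pinning mtime makes repeated runs byte-identical, and compresslevel trades ratio for speed.

```python
import gzip
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# compresslevel=1 favours speed; mtime=1 pins the timestamp stored in the
# gzip header so repeated runs produce identical archives.
opts = {"method": "gzip", "compresslevel": 1, "mtime": 1}
df.to_csv("out.csv.gz", index=False, compression=opts)

# Round trip: decompressing recovers the plain CSV text.
with gzip.open("out.csv.gz", "rt") as f:
    print(f.read().splitlines())
# ['a', '1', '2']
```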
quoting optional constant from csv module
Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
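As a small illustration (data is hypothetical): csv.QUOTE_NONNUMERIC quotes every string field, including the header row, while leaving numeric values bare.

```python
import csv
import pandas as pd

df = pd.DataFrame({"name": ["sai"], "count": [3]})

# QUOTE_NONNUMERIC wraps all non-numeric fields in quotechar;
# the integer 3 stays unquoted.
out = df.to_csv(index=False, quoting=csv.QUOTE_NONNUMERIC)
print(out)
```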
quotechar str, default '"'
String of length 1. Character used to quote fields.
lineterminator str, optional
The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS on which this method is called ('\n' for Linux, '\r\n' for Windows, respectively).
Changed in version 1.5.0: Previously was line_terminator, changed for consistency with read_csv and the standard library 'csv' module.
date_format str, default None
Format string for datetime objects.
doublequote bool, default True
Control quoting of quotechar inside a field.
escapechar str, default None
String of length 1. Character used to escape sep and quotechar when appropriate.
decimal str, default '.'
Character recognized as decimal separator. E.g. use ',' for European data.
errors str, default ‘strict’
Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
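A minimal sketch of errors='replace' (the file name is hypothetical): characters that cannot be represented in the target encoding are substituted instead of raising UnicodeEncodeError.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Zürich"]})

# 'ü' has no ASCII encoding; errors='replace' writes '?' in its place
# rather than raising UnicodeEncodeError.
df.to_csv("ascii.csv", index=False, encoding="ascii", errors="replace")

print(open("ascii.csv", encoding="ascii").read().splitlines())
# ['city', 'Z?rich']
```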
storage_options dict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open . Please see fsspec and urllib for more details, and for more examples on storage options refer here.
Returns : None or str
If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.
read_csv : Load a CSV file into a DataFrame.
to_excel : Write DataFrame to an Excel file.
Examples

>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
...                    'mask': ['red', 'purple'],
...                    'weapon': ['sai', 'bo staff']})
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
Create 'out.zip' containing 'out.csv':

>>> compression_opts = dict(method='zip',
...                         archive_name='out.csv')
>>> df.to_csv('out.zip', index=False,
...           compression=compression_opts)
To write a csv file to a new folder or nested folder you will first need to create it using either Pathlib or os:
>>> from pathlib import Path
>>> filepath = Path('folder/subfolder/out.csv')
>>> filepath.parent.mkdir(parents=True, exist_ok=True)
>>> df.to_csv(filepath)

>>> import os
>>> os.makedirs('folder/subfolder', exist_ok=True)
>>> df.to_csv('folder/subfolder/out.csv')
read_csv fails to read file if there are cyrillic symbols in filename #17773
Code Sample, a copy-pastable example if possible
import pandas

cyrillic_filename = "./файл_1.csv"

# 'c' engine fails:
df = pandas.read_csv(cyrillic_filename, engine="c", encoding="cp1251")

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-18-9cb08141730c> in <module>()
      2
      3 cyrillic_filename = "./файл_1.csv"
----> 4 df = pandas.read_csv(cyrillic_filename, engine="c", encoding="cp1251")

d:\0_dev\services\protocol_sort\venv\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, ...)
    653         skip_blank_lines=skip_blank_lines)
    654
--> 655     return _read(filepath_or_buffer, kwds)

d:\0_dev\services\protocol_sort\venv\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    403
    404     # Create the parser.
--> 405     parser = TextFileReader(filepath_or_buffer, **kwds)

d:\0_dev\services\protocol_sort\venv\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    762             self.options['has_index_names'] = kwds['has_index_names']
    763
--> 764         self._make_engine(self.engine)

d:\0_dev\services\protocol_sort\venv\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
    983     def _make_engine(self, engine='c'):
    984         if engine == 'c':
--> 985             self._engine = CParserWrapper(self.f, **self.options)

d:\0_dev\services\protocol_sort\venv\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
   1603         kwds['allow_leading_cols'] = self.index_col is not False
   1604
-> 1605         self._reader = parsers.TextReader(src, **kwds)

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__ (pandas\_libs\parsers.c:4209)()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source (pandas\_libs\parsers.c:8895)()

OSError: Initializing from file failed

# 'python' engine works:
df = pandas.read_csv(cyrillic_filename, engine="python", encoding="cp1251")
df.size
>> 172440

# 'c' engine works if filename can be encoded to utf-8
latin_filename = "./file_1.csv"
df = pandas.read_csv(latin_filename, engine="c", encoding="cp1251")
df.size
>> 172440
Problem description
The 'c' engine should be able to read files whose filenames contain non-UTF-8 (e.g. Cyrillic) characters.
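A common workaround (a general sketch, not taken from this thread) is to open the file in Python first and hand read_csv the file object, so the C engine never has to deal with the raw filename:

```python
import pandas as pd

# Create a small demo file with a Cyrillic name (illustrative data).
cyrillic_filename = "файл_1.csv"
with open(cyrillic_filename, "w", encoding="cp1251") as f:
    f.write("a,b\n1,2\n")

# Opening the file ourselves sidesteps the C engine's filename handling:
# the parser only ever receives an already-open file object.
with open(cyrillic_filename, encoding="cp1251") as f:
    df = pd.read_csv(f, engine="c")

print(df.shape)
# (1, 2)
```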
Expected Output
File content is read into a DataFrame.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.13.2
scipy: 0.19.1
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.4.8
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 4.0.0
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
None
Cannot read an .xls or .csv file containing Cyrillic
File "pandas\_libs\parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 642, in pandas._libs.parsers.TextReader._get_header
File "pandas\_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1917, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte
Added 37 seconds later
When errors are ignored, it prints the content without the Cyrillic characters.
Welemir1, <_io.TextIOWrapper name='list 28.06.2021.xls' mode='r' encoding='utf-8'>
I save it in utf-8.
Added 1 minute later
Welemir1, tried windows-1251. Looks like it worked: [
Added 11 minutes later
Welemir1, with Excel it still fails with both utf-8 and windows-1251. I save it as utf-8, and when I try to read it:
def read_excel(path):
    with open(path, encoding='utf-8') as f:
        print(f)
        excel_data = pandas.read_excel(f)
        # excel_data = pandas.read_excel(path)
    data = excel_data.to_json(orient="records")
    print(data)
File "C:\Users\╚┴\Desktop\DobavlyaemRoji\venv\lib\site-packages\pandas\io\excel\_base.py", line 1075, in inspect_excel_format
    buf = stream.read(PEEK_SIZE)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte
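The traceback above comes from opening the Excel file in text mode: .xls/.xlsx files are binary containers, so wrapping them in open(..., encoding='utf-8') forces pandas to peek at the raw bytes through a UTF-8 decoder, which fails on the first non-ASCII byte. A sketch of the fix (function and file names are illustrative): pass the path, or a handle opened in binary mode.

```python
import pandas as pd

def read_excel_as_json(path):
    # Excel files are binary (zip/OLE containers): never open them with a
    # text-mode encoding. Pass the path, or an 'rb' handle, and let pandas
    # sniff the format from the raw bytes.
    with open(path, "rb") as f:
        data = pd.read_excel(f)
    return data.to_json(orient="records")

# Round-trip demo (requires an Excel engine such as openpyxl):
pd.DataFrame({"a": [1]}).to_excel("demo.xlsx", index=False)
print(read_excel_as_json("demo.xlsx"))
```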