Python change encoding of string

Python Strings encode() method

Python String encode() converts a string into a bytes object, using an encoding scheme specified by the caller (UTF-8 by default).

Python String encode() Method Syntax:

Syntax: encode(encoding, errors)

  • encoding: The name of the encoding to use, such as 'utf-8' or 'ascii'. Defaults to 'utf-8'.
  • errors: Decides how characters that cannot be encoded are handled. Defaults to 'strict'. The six error handlers covered here are:
    • strict – default response, which raises a UnicodeEncodeError exception on failure
    • ignore – drops each unencodable character from the result
    • replace – replaces each unencodable character with a question mark (?)
    • xmlcharrefreplace – inserts an XML character reference (such as &#8364;) instead of the unencodable character
    • backslashreplace – inserts a backslash escape sequence (\xNN, \uNNNN or \UNNNNNNNN) instead of the unencodable character
    • namereplace – inserts a \N{...} escape sequence containing the character's Unicode name instead of the unencodable character
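The handlers listed above can be compared side by side. The following is a minimal sketch, assuming a sample string "aö€" (chosen here only for illustration) and the 'ascii' encoding, which cannot represent "ö" or "€":

```python
# Sample text is an assumption for illustration: "ö" and "€" are not ASCII.
text = "aö€"

# Non-strict handlers return bytes instead of raising.
handlers = ["ignore", "replace", "xmlcharrefreplace",
            "backslashreplace", "namereplace"]
results = {name: text.encode("ascii", errors=name) for name in handlers}

for name, encoded in results.items():
    print(f"{name:18} {encoded}")

# 'strict' (the default) raises UnicodeEncodeError instead.
try:
    text.encode("ascii", errors="strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True
print("strict raised UnicodeEncodeError:", strict_raised)
```

Only 'strict' stops the program on bad input; the other handlers trade that safety for output that is lossy ('ignore', 'replace') or escaped (the remaining three).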

    Python String encode() Method Example:


    Example 1: Code to print encoding schemes available

    The Python String encode() method supports many encoding schemes. The names it accepts can be listed from Python itself; one such listing is shown in the output below.
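The original snippet did not survive on this page. Judging by the `dict_keys` output that follows, it probably read the alias table from the standard `encodings.aliases` module; that is an assumption, and the sketch below is a reconstruction, not the author's exact code:

```python
# Assumed reconstruction: the alias table maps alias names to codec names.
from encodings.aliases import aliases

print("The available encodings are :", aliases.keys())

# The table can also be used to resolve an alias to its codec name.
latin1_codec = aliases["latin1"]
print("latin1 resolves to:", latin1_codec)
```

Note that this lists alias names, so several entries can point to the same codec, and the exact contents vary between Python versions.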


    The available encodings are : dict_keys(['ibm039', 'iso_ir_226', '1140', 'iso_ir_110', '1252', 'iso_8859_8', 'iso_8859_3', 'iso_ir_166', 'cp367', 'uu', 'quotedprintable', 'ibm775', 'iso_8859_16_2001', 'ebcdic_cp_ch', 'gb2312_1980', 'ibm852', 'uhc', 'macgreek', '850', 'iso2022jp_2', 'hz_gb_2312', 'elot_928', 'iso8859_1', 'eucjp', 'iso_ir_199', 'ibm865', 'cspc862latinhebrew', '863', 'iso_8859_5', 'latin4', 'windows_1253', 'csisolatingreek', 'latin5', '855', 'windows_1256', 'rot13', 'ms1361', 'windows_1254', 'ibm863', 'iso_8859_14_1998', 'utf8_ucs2', '500', 'iso8859', '775', 'l7', 'l2', 'gb18030_2000', 'l9', 'utf_32be', 'iso_ir_100', 'iso_8859_4', 'iso_ir_157', 'csibm857', 'shiftjis2004', 'iso2022jp_1', 'iso_8859_2_1987', 'cyrillic', 'ibm861', 'ms950', 'ibm437', '866', 'csibm863', '932', 'iso_8859_14', 'cskoi8r', 'csptcp154', '852', 'maclatin2', 'sjis', 'korean', '865', 'u32', 'csshiftjis', 'dbcs', 'csibm037', 'csibm1026', 'bz2', 'quopri', '860', '1255', '861', 'iso_ir_127', 'iso_celtic', 'chinese', 'l8', '1258', 'u_jis', 'cspc850multilingual', 'iso_2022_jp_2', 'greek8', 'csibm861', '646', 'unicode_1_1_utf_7', 'ibm862', 'latin2', 'ecma_118', 'csisolatinarabic', 'zlib', 'iso2022jp_3', 'ksx1001', '858', 'hkscs', 'shiftjisx0213', 'base64', 'ibm857', 'maccentraleurope', 'latin7', 'ruscii', 'cp_is', 'iso_ir_101', 'us_ascii', 'hebrew', 'ansi_x3.4_1986', 'csiso2022jp', 'iso_8859_15', 'ibm860', 'ebcdic_cp_us', 'x_mac_simp_chinese', 'csibm855', '1250', 'maciceland', 'iso_ir_148', 'iso2022jp', 'u16', 'u7', 's_jisx0213', 'iso_8859_6_1987', 'csisolatinhebrew', 'csibm424', 'quoted_printable', 'utf_16le', 'tis260', 'utf', 'x_mac_trad_chinese', '1256', 'cp866u', 'jisx0213', 'csiso58gb231280', 'windows_1250', 'cp1361', 'kz_1048', 'asmo_708', 'utf_16be', 'ecma_114', 'eucjis2004', 'x_mac_japanese', 'utf8', 'iso_ir_6', 'cp_gr', '037', 'big5_tw', 'eucgb2312_cn', 'iso_2022_jp_3', 'euc_cn', 'iso_8859_13', 'iso_8859_5_1988', 'maccyrillic', 'ks_c_5601_1987', 'greek', 'ibm869', 'roman8', 
'csibm500', 'ujis', 'arabic', 'strk1048_2002', '424', 'iso_8859_11_2001', 'l5', 'iso_646.irv_1991', '869', 'ibm855', 'eucjisx0213', 'latin1', 'csibm866', 'ibm864', 'big5_hkscs', 'sjis_2004', 'us', 'iso_8859_7', 'macturkish', 'iso_2022_jp_2004', '437', 'windows_1255', 's_jis_2004', 's_jis', '1257', 'ebcdic_cp_wt', 'iso2022jp_2004', 'ms949', 'utf32', 'shiftjis', 'latin', 'windows_1251', '1125', 'ks_x_1001', 'iso_8859_10_1992', 'mskanji', 'cyrillic_asian', 'ibm273', 'tis620', '1026', 'csiso2022kr', 'cspc775baltic', 'iso_ir_58', 'latin8', 'ibm424', 'iso_ir_126', 'ansi_x3.4_1968', 'windows_1257', 'windows_1252', '949', 'base_64', 'ms936', 'csisolatin2', 'utf7', 'iso646_us', 'macroman', '1253', '862', 'iso_8859_1_1987', 'csibm860', 'gb2312_80', 'latin10', 'ksc5601', 'iso_8859_10', 'utf8_ucs4', 'csisolatin4', 'ebcdic_cp_be', 'iso_8859_1', 'hzgb', 'ansi_x3_4_1968', 'ks_c_5601', 'l3', 'cspc8codepage437', 'iso_8859_7_1987', '8859', 'ibm500', 'ibm1026', 'iso_8859_6', 'csibm865', 'ibm866', 'windows_1258', 'iso_ir_138', 'l4', 'utf_32le', 'iso_8859_11', 'thai', '864', 'euc_jis2004', 'cp936', '1251', 'zip', 'unicodebigunmarked', 'csHPRoman8', 'csibm858', 'utf16', '936', 'ibm037', 'iso_8859_8_1988', '857', 'csibm869', 'ebcdic_cp_he', 'cp819', 'euccn', 'iso_8859_2', 'ms932', 'iso_2022_jp_1', 'iso_2022_kr', 'csisolatin6', 'iso_2022_jp', 'x_mac_korean', 'latin3', 'csbig5', 'hz_gb', 'csascii', 'u8', 'csisolatin5', 'csisolatincyrillic', 'ms_kanji', 'cspcp852', 'rk1048', 'iso2022jp_ext', 'csibm273', 'iso_2022_jp_ext', 'ibm858', 'ibm850', 'sjisx0213', 'tis_620_2529_1', 'l10', 'iso_ir_109', 'ibm1125', '1254', 'euckr', 'tis_620_0', 'l1', 'ibm819', 'iso2022kr', 'ibm367', '950', 'r8', 'hex', 'cp154', 'tis_620_2529_0', 'iso_8859_16', 'pt154', 'ebcdic_cp_ca', 'ibm1140', 'l6', 'csibm864', 'csisolatin1', 'csisolatin3', 'latin6', 'iso_8859_9_1989', 'iso_8859_3_1988', 'unicodelittleunmarked', 'macintosh', '273', 'latin9', 'iso_8859_4_1988', 'iso_8859_9', 'ebcdic_cp_nl', 'iso_ir_144'])

    Example 2: Code to encode the string
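The code for this example is missing from the page. A minimal sketch of encoding a string follows, assuming the sample character "¶" (the one that appears in the error message later in this article):

```python
# Sample string is an assumption; "¶" is U+00B6.
text = "¶"

# Encode to UTF-8 bytes, then decode to confirm the round trip.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")

print(encoded)           # b'\xc2\xb6'
print(decoded == text)   # True
```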


    Errors when using wrong encoding scheme

    Example 1: Python String encode() method raises UnicodeEncodeError when the chosen encoding scheme cannot represent a character in the string
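The code for this example is missing from the page. The sketch below reproduces the error shown in the output, assuming the input was the single character "¶" (consistent with '\xb6' at position 0 in the message):

```python
# Assumed input: "¶" (U+00B6) is outside the ASCII range.
text = "¶"

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # Keep the message so it can be inspected or logged.
    message = str(exc)
    print("UnicodeEncodeError:", message)
```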


    UnicodeEncodeError: 'ascii' codec can't encode character '\xb6' in position 0: ordinal not in range(128)

    Example 2: Using ‘errors’ parameter to ignore errors while encoding

    With the errors parameter set to 'ignore', the Python String encode() method silently drops any characters that the specified encoding scheme cannot represent.
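The code for this example is missing from the page. A minimal sketch, assuming a sample string "123-¶" in which only the last character is not ASCII:

```python
# Sample string is an assumption for illustration.
text = "123-¶"

# The unencodable "¶" is dropped; everything else is kept.
encoded = text.encode("ascii", errors="ignore")
print(encoded)  # b'123-'
```

Because the dropped characters leave no trace, 'ignore' is best reserved for cases where data loss is acceptable.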



    How to Convert a String to UTF-8 in Python?

    In this article, we will learn how to convert a string to UTF-8 in Python. We will use some built-in functions and some custom code as well. Let's first take a quick look at what a string is in Python.

    Python String

    A string is a built-in type in Python, just like integer, float and boolean. Text enclosed in single or double quotes is a string. A string is also known as a sequence of characters.

    string1 = "apple"
    string2 = "Preeti125"
    string3 = "12345"
    string4 = "pre@12"

    What is UTF-8 in Python?

    UTF stands for “Unicode Transformation Format”, and the ‘8’ means that the encoding uses 8-bit units. UTF-8 is a variable-width encoding: each Unicode code point is stored in one to four bytes, and ASCII characters take a single byte, which makes it one of the most efficient and convenient encodings. In Python 3, a string is a sequence of Unicode code points rather than bytes, and UTF-8 is the default encoding used to convert a string to bytes. A program usually sees string data instead of raw bytes because a framework or library has already decoded the incoming bytes implicitly, and problems arise when that decoding uses the wrong encoding.
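To make the variable-width behaviour concrete, here is a small sketch. The four sample characters are chosen for illustration and need one, two, three and four bytes in UTF-8 respectively:

```python
# Sample characters are assumptions chosen to cover each UTF-8 width.
samples = ["a", "ö", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded}")

# One code point per character, but a different byte count for each.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(byte_lengths)  # [1, 2, 3, 4]
```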

    For example, a server may receive UTF-8 characters, yet the value retrieved from the query string appears to be ASCII-encoded. To convert a plain string to UTF-8 in Python 3, use the encode() method.

    Use encode() to convert a String to UTF-8

    The encode() method returns the encoded version of the string as a bytes object. If a character cannot be encoded, a UnicodeEncodeError exception is raised.

    Syntax

    string.encode(encoding = 'UTF-8', errors = 'strict')

    Parameters

    encoding: the name of the encoding, such as 'UTF-8' or 'ASCII'.

    errors: the response when encoding fails.

    There are six types of error responses:

    • strict – default response, which raises a UnicodeEncodeError exception on failure
    • ignore – drops each unencodable character from the result
    • replace – replaces each unencodable character with a question mark (?)
    • xmlcharrefreplace – inserts an XML character reference instead of the unencodable character
    • backslashreplace – inserts a backslash escape sequence (\xNN, \uNNNN or \UNNNNNNNN) instead of the unencodable character
    • namereplace – inserts a \N{...} escape sequence instead of the unencodable character

    Both parameters are optional. Called with no arguments, the encode() method uses the UTF-8 encoding and the strict error response.

    Example

    # unicode string
    string = 'pythön!'
    # default encoding to utf-8
    string_utf = string.encode()
    print('The encoded version is:', string_utf)

    The encoded version is: b'pyth\xc3\xb6n!'

    Conclusion

    In this article, we learned how to convert a plain string to UTF-8 using the encode() method. You can also try different encodings and error responses.
