Python replace all punctuation with space

Remove punctuation from String in Python

In this article, we will discuss different ways to remove all punctuation characters from a string in Python.

The string module in Python provides a constant, string.punctuation, that contains all the ASCII punctuation characters, i.e. !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

Remove punctuation from String using Regex

In Python, the re module provides a function to replace the contents of a string based on a matching regex pattern. The signature of the function is:

sub(pattern, replacement_str, original_str)

It looks for matches of the given regex pattern in the string original_str and replaces all occurrences with the string replacement_str.

We can use the re.sub() function to remove all punctuation from a string. For that, we need to pass a regex pattern that matches every punctuation character, and replace the matched substrings with an empty string.

For example,

import re
import string

strValue = "This. is,. !! a ;; sample. ! string !!"

pattern = "[" + re.escape(string.punctuation) + "]"
strValue = re.sub(pattern, '', strValue)

print(strValue)

It deleted all the punctuation characters from a string in Python.
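If the goal is to replace each punctuation character with a space rather than delete it, the same pattern can be reused with a different replacement string. The following is a minimal sketch under that assumption; the helper name punctuation_to_space is hypothetical, and collapsing the resulting runs of whitespace is an optional extra step that the text above does not cover.

```python
import re
import string

# Character class matching any single ASCII punctuation character
PUNCT_PATTERN = re.compile("[" + re.escape(string.punctuation) + "]")


def punctuation_to_space(text):
    """Replace each punctuation character with a space, then collapse whitespace."""
    spaced = PUNCT_PATTERN.sub(" ", text)
    # split() with no arguments drops leading/trailing and repeated whitespace
    return " ".join(spaced.split())


print(punctuation_to_space("This. is,. !! a ;; sample. ! string !!"))
# This is a sample string
```

Dropping the final join/split step keeps one space per removed character, which may be preferable when character positions matter.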

Remove punctuation from String using replace()

In Python, the string class provides a member function to replace all occurrences of a substring in a string i.e.

str.replace(to_be_replaced, replacement, count )

It accepts three arguments, the last of which is optional, i.e.

  • to_be_replaced : The string to be replaced.
  • replacement : The replacement string.
  • count : Optional. The maximum number of occurrences to replace. If omitted, all occurrences are replaced.

It returns a copy of the calling string object but with the changed contents i.e. after replacing all the occurrences of sub-string in that string. The original string remains unchanged.
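As a small illustration of the optional count argument and of the fact that the original string is left untouched, consider the sketch below. The variable names are chosen only for this example.

```python
text = "a.b.c.d"

# count omitted: every occurrence is replaced
all_replaced = text.replace(".", "")

# count = 2: only the first two occurrences are replaced
first_two = text.replace(".", "", 2)

print(all_replaced)  # abcd
print(first_two)     # abc.d
print(text)          # a.b.c.d  (the original string is unchanged)
```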

As noted earlier, the string module provides the constant string.punctuation, which contains all the ASCII punctuation characters.

Iterate over all the characters in string.punctuation and replace all occurrences of each punctuation with an empty string.

For example,

import string

strValue = "This. is,. !! a ;; sample. ! string !!"

for ch in string.punctuation:
    strValue = strValue.replace(ch, '')

print(strValue)

It deleted all the punctuation characters from a string in Python.

Remove punctuation from String using translate()

In Python, the string class provides a member function translate(). It accepts a translation table as an argument and replaces the characters in a string based on the mapping in the translation table.

We can use the maketrans() function to create this mapping in the form of a translation table. If we want to just delete some characters instead of replacement, then we can provide those characters as a third argument.

To delete all the punctuation characters from a string, create a translation table in which every punctuation character is marked for deletion. Then pass that translation table to the translate() function.

For example,

import string

strValue = "This. is,. !! a ;; sample. ! string !!"

# Map characters to be deleted with an empty string
translation_table = str.maketrans('', '', string.punctuation)

# Remove specific characters from the string based
# on translation table / mapping table
strValue = strValue.translate(translation_table)

print(strValue)

It deleted all the punctuation characters from a string in Python.
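The same function can also map punctuation to spaces instead of deleting it. A hedged sketch follows: when str.maketrans() receives two strings of equal length, each character of the first is mapped to the character at the same position in the second. The names to_space and to_nothing are illustrative only.

```python
import string

# Two equal-length strings: each punctuation character maps to a space
to_space = str.maketrans(string.punctuation, " " * len(string.punctuation))

# Third argument only: each punctuation character maps to None (deleted)
to_nothing = str.maketrans("", "", string.punctuation)

sample = "Hello, world!"
spaced = sample.translate(to_space)
deleted = sample.translate(to_nothing)

print(repr(spaced))   # 'Hello  world '
print(repr(deleted))  # 'Hello world'
```

The table itself is a plain dict keyed by Unicode code points, so to_nothing[ord("!")] is None while to_space[ord("!")] is the code point of a space.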

Remove punctuation from String using filter() & join()

In Python, you can use the filter() function to filter all the occurrences of certain characters from a string. The steps are as follows,

  • Create a lambda function that accepts a character as an argument and returns True if the passed character matches the condition.
  • Pass this lambda function as the conditional argument to filter() function along with the string to be modified.
  • The filter() function iterates over all characters of string and yields only those characters for which lambda function returns True i.e. all characters except the character to be deleted.
  • Use join() function to combine all yielded characters returned by filter() function.

This way, we can filter out some characters from the string based on conditions.

Create a lambda function that accepts a character as an argument and checks whether that character is punctuation. Pass that lambda function to the filter() function along with the string. It will filter out all the punctuation from the string. Then join the remaining characters back together.

For example,

import string

strValue = "This. is,. !! a ;; sample. ! string !!"

# Filter out all punctuation characters; filter() yields the remaining chars
filtered_chars = filter(lambda item: item not in string.punctuation, strValue)

# Join the remaining characters back into a string
strValue = ''.join(filtered_chars)

print(strValue)

It deleted all the punctuation characters from a string in Python.
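The filter() and lambda combination can also be written as a generator expression, which some readers find easier to follow. This is only a sketch of an equivalent formulation; the helper name strip_punctuation and the use of a frozenset for the membership test are assumptions of this example, not part of the approach described above.

```python
import string

# A set gives constant-time membership tests for each character
PUNCTUATION = frozenset(string.punctuation)


def strip_punctuation(text):
    """Return text with every ASCII punctuation character removed."""
    return "".join(ch for ch in text if ch not in PUNCTUATION)


sample = "This. is,. !! a ;; sample. ! string !!"

# Same result as the filter()/lambda version
via_filter = "".join(filter(lambda item: item not in string.punctuation, sample))
via_genexp = strip_punctuation(sample)

print(via_filter == via_genexp)  # True
```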

We learned about four different ways to remove punctuations from a string in Python.

Copyright © 2023 thisPointer

Remove all special characters, punctuation and spaces from string

I need to remove all special characters, punctuation and spaces from a string so that I only have letters and numbers.

This can be done without regex:

>>> string = "Special $#! characters spaces 888323"
>>> ''.join(e for e in string if e.isalnum())
'Specialcharactersspaces888323'

S.isalnum() -> bool

Return True if all characters in S are alphanumeric
and there is at least one character in S, False otherwise.

If you insist on using regex, other solutions will do fine. However note that if it can be done without using a regular expression, that’s the best way to go about it.

@DiegoNavarro except that’s not true, I benchmarked both the isalnum() and regex versions, and the regex one is 50-75% faster

Tried this in Python3: it accepts unicode chars so it’s useless to me. Try string = "B223323\§§§$3\u445454" as an example. The result? 'B2233233䑔54'

Additionally: "For 8-bit strings, this method is locale-dependent."! Thus the regex alternative is strictly better!
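If the isalnum() approach is preferred but only ASCII letters and digits should survive, one possible sketch is to add an ASCII check to each character test. This assumes Python 3.7 or later, where str.isascii() is available; the helper name ascii_alnum_only is hypothetical.

```python
def ascii_alnum_only(text):
    """Keep only ASCII letters and digits (str.isascii() needs Python 3.7+)."""
    return "".join(ch for ch in text if ch.isascii() and ch.isalnum())


sample = "Café ñandú 42!"

# Plain isalnum() keeps accented letters because they are alphanumeric in Unicode
unicode_kept = "".join(ch for ch in sample if ch.isalnum())
ascii_kept = ascii_alnum_only(sample)

print(unicode_kept)  # Caféñandú42
print(ascii_kept)    # Cafand42
```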

Here is a regex to match a string of characters that are not letters or numbers:

[^A-Za-z0-9]+

Here is the Python command to do a regex substitution:

re.sub('[^A-Za-z0-9]+', '', string)

I guess this doesn’t work with modified characters in other languages, like á, ö, ñ, etc. Am I right? If so, what would the regex for it be?

Just add the special characters of that particular language. For example, for German text, the expression re.sub('[^A-Za-z0-9 ,._\'äöüÄÖÜß-]+', '', sample_text) can be used. (The hyphen goes last in the character class so that it is not read as a range.)
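An alternative to listing every accented letter by hand is to lean on the fact that, for str patterns in Python 3, \w and \W are Unicode-aware by default. The sketch below is one possible approach rather than the commenter's suggestion; it removes everything that is not a letter or digit in any script, and it adds the underscore to the class explicitly because \w would otherwise keep it.

```python
import re

sample_text = "Grüße, señor! Ça va? 100%"

# \W matches any non-word character; "_" is added because \w includes it
cleaned = re.sub(r"[\W_]+", "", sample_text)

print(cleaned)  # GrüßeseñorÇava100
```

To restrict matching to ASCII again, the re.ASCII flag can be passed to re.sub().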

import re
cleanString = re.sub('\W+', '', string)

If you want spaces between words and numbers, substitute '' with ' '.

Depends on the context: underscore is very useful for filenames and other identifiers, to the point that I don’t treat it as a special character but rather a sanitised space. I generally use this method myself.

r'\W+': slightly off topic (and very pedantic), but I suggest a habit that all regex patterns be raw strings.

TLDR

I timed the provided answers.

The fastest option, Example 3 below, is typically 3x faster than the next fastest provided top answer.

Caution should be taken when using this option. Some special characters (e.g. ø) may not be stripped using this method.

After seeing this, I was interested in expanding on the provided answers by finding out which executes in the least amount of time, so I went through and checked some of the proposed answers with timeit against two of the example strings:

  • string1 = 'Special $#! characters spaces 888323'
  • string2 = 'how much for the maple syrup? $20.99? That s ridiculous. '

Example 1

''.join(e for e in string if e.isalnum())

Example 2

import re
re.sub('[^A-Za-z0-9]+', '', string)

Example 3

The above results are a product of the lowest returned result from an average of: repeat(3, 2000000)

Example 3 can be 3x faster than Example 1.
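A comparison of this kind can be reproduced with the timeit module. The sketch below is not the original benchmark script: the function names, the inclusion of the \W+ pattern from an earlier answer as a third candidate, and the much smaller iteration count are all assumptions made so that it runs quickly. No timings are shown because they depend on the machine and the Python version.

```python
import re
import timeit

string1 = "Special $#! characters   spaces 888323"


def with_isalnum(s):
    return "".join(e for e in s if e.isalnum())


def with_char_class(s):
    return re.sub(r"[^A-Za-z0-9]+", "", s)


def with_non_word(s):
    # Note: \W keeps underscores and non-ASCII letters
    return re.sub(r"\W+", "", s)


def main():
    for func in (with_isalnum, with_char_class, with_non_word):
        # Take the lowest of 3 repeats, each running the call 20,000 times
        best = min(timeit.repeat(lambda: func(string1), repeat=3, number=20_000))
        print(f"{func.__name__}: {best:.4f} s")


if __name__ == "__main__":
    main()
```

All three functions return the same result for string1, but they differ on input containing underscores or non-ASCII letters, so speed should not be the only criterion.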

@kkurian If you read the beginning of my answer, this is merely a comparison of the previously proposed solutions above. You might want to comment on the originating answer. stackoverflow.com/a/25183802/2560922

Python 2.*

I think just filter(str.isalnum, string) works

In [20]: filter(str.isalnum, 'string with special chars like !,#$% etcs.')
Out[20]: 'stringwithspecialcharslikeetcs'

Python 3.*

In Python 3, the filter() function returns an iterator (instead of a string, as above). One has to join it back to get a string from the iterator:

''.join(filter(str.isalnum, string)) 

or, to pass a list to join (not sure, but this may be a bit faster), use:

''.join([*filter(str.isalnum, string)]) 

note: unpacking in [*args] valid from Python >= 3.5
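The difference between the two Python versions can be seen directly in Python 3. This is a small sketch; the variable names are chosen for illustration.

```python
text = "string with special chars like !,#$% etcs."

filtered = filter(str.isalnum, text)

# In Python 3 this is a lazy filter object, not a str
print(isinstance(filtered, str))  # False

# join() consumes the iterator and builds the final string
cleaned = "".join(filtered)
print(cleaned)  # stringwithspecialcharslikeetcs
```

Because the filter object is an iterator, it is exhausted after the first join(); joining it a second time yields an empty string.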

@Alexey correct, in Python 3 map and filter return iterators instead (and reduce moved to functools). Still, in Python 3+ I would prefer ''.join(filter(str.isalnum, string)) (or, to pass a list to join, ''.join([*filter(str.isalnum, string)])) over the accepted answer.

I’m not certain ''.join(filter(str.isalnum, string)) is an improvement on filter(str.isalnum, string), at least to read. Is this really the Pythreenic (yeah, you can use that) way to do this?

@TheProletariat The point is that filter(str.isalnum, string) alone does not return a string in Python 3, because filter() there returns an iterator rather than the argument's type, unlike in Python 2.

@GrijeshChauhan, I think you should update your answer to include both your Python2 and Python3 recommendations.

#!/usr/bin/python
import re

strs = "how much for the maple syrup? $20.99? That's ridiculous. "
print(strs)

# Remove a few specific punctuation characters
nstr = re.sub(r'[?$.!]', '', strs)
print(nstr)

# Remove everything that is not a letter, digit, or space
nestr = re.sub(r'[^a-zA-Z0-9 ]', '', nstr)
print(nestr)

You can add more special characters to the first pattern; they will be replaced by '' (nothing), i.e. they will be removed.

Unlike the other regex answers, I would exclude every character that is not what I want, instead of explicitly enumerating what I don’t want.

For example, if I want only the characters from 'a' to 'z' (upper and lower case) and numbers, I would exclude everything else:

import re
s = re.sub(r"[^a-zA-Z0-9]", "", s)

This means "substitute every character that is not a number, or a character in the range 'a to z' or 'A to Z', with an empty string".

In fact, if you insert the special character ^ as the first character inside the square brackets of a character class, you get its negation.
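The effect of the leading ^ can be seen by running the same character class with and without it. A minimal sketch with a made-up sample string:

```python
import re

s = "a-b_c 1!"

# Without ^: the listed characters (letters and digits) are removed
without_negation = re.sub(r"[a-zA-Z0-9]", "", s)

# With ^ as the first character: everything except letters and digits is removed
with_negation = re.sub(r"[^a-zA-Z0-9]", "", s)

print(repr(without_negation))  # '-_ !'
print(repr(with_negation))     # 'abc1'
```

Note that ^ has this meaning only as the first character inside the brackets; elsewhere in the class it is a literal caret.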

Extra tip: if you also need to lowercase the result, you can make the regex even faster and simpler by lowercasing the input first, since it will then contain no uppercase letters.

import re
s = re.sub(r"[^a-z0-9]", "", s.lower())
