- Removing Punctuation from a String in Python
- Using the translate Method and the maketrans Function
- Using Regular Expressions
- Python: Remove Punctuation from a String (3 Different Ways!)
- Use Python to Remove Punctuation from a String with Translate
- What is Python’s string.punctuation?
- Use Python to Strip Punctuation from a String with Regular Expressions (regex)
- Use Python to Remove Punctuation from a String with str.replace
- Use Python to Strip Punctuation from a String using a for-loop
- What is the fastest way to strip a Python String from Punctuation?
- Frequently Asked Questions
- Conclusion
Removing Punctuation from a String in Python
A common task when working with strings in Python is removing punctuation. For example, you may have text whose punctuation needs to be stripped before text analysis or data preprocessing.
Example string: «Привет, мир! Это — тестовая строка. Как дела?» ("Hello, world! This is a test string. How are you?")
There are several ways to solve this. One is to combine the str.translate method with the str.maketrans function and the string.punctuation constant from the string module.
Using the translate Method and the maketrans Function
The translate method replaces or deletes specific characters in a string. It takes a translation table, which you can create with str.maketrans.
    import string

    # original string
    s = "Привет, мир! Это - тестовая строка. Как дела?"

    # build the translation table
    table = str.maketrans("", "", string.punctuation)

    # apply the table to the string
    new_s = s.translate(table)
In this code, str.maketrans builds a translation table that maps every character in string.punctuation to None, and translate uses that table to delete those characters from the string.
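One caveat worth illustrating: string.punctuation contains only ASCII characters, so a character such as the em dash in the example sentence is left in place. The sketch below is one possible way to handle that, assuming "punctuation" also means any character whose Unicode category starts with "P". The function name remove_unicode_punctuation is hypothetical and not part of any library.

```python
import string
import unicodedata


def remove_unicode_punctuation(text):
    """Delete ASCII punctuation plus any character in a Unicode 'P*' category.

    Hypothetical helper for illustration; the category rule is an assumption.
    """
    table = {
        ord(ch): None
        for ch in set(text)
        if ch in string.punctuation or unicodedata.category(ch).startswith("P")
    }
    return text.translate(table)


s = "Привет, мир! Это — тестовая строка. Как дела?"

# ASCII-only table: the em dash is still there afterwards.
ascii_only = s.translate(str.maketrans("", "", string.punctuation))
print("—" in ascii_only)

# Unicode-aware table: the em dash is removed as well.
result = remove_unicode_punctuation(s)
print(result)
```

The table is rebuilt for each input here, which keeps the sketch short. For many strings it would likely be better to build one table up front.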
Using Regular Expressions
Another option is the re module, which provides regular expressions.
    import re

    # original string
    s = "Привет, мир! Это - тестовая строка. Как дела?"

    # remove punctuation
    new_s = re.sub(r'[^\w\s]', '', s)
In this code, re.sub replaces every character that is neither a word character (a letter, digit, or underscore) nor whitespace with an empty string, which removes the punctuation.
Both approaches work, and which one to choose depends on the situation. However, translate with maketrans may be faster, especially on large volumes of data.
Python: Remove Punctuation from a String (3 Different Ways!)
In this tutorial, you’ll learn how to use Python to remove punctuation from a string. You’ll learn how to strip punctuation from a Python string using the str.translate() method, the str.replace() method, the popular regular expression library re, and, finally, for-loops.
Being able to work with and manipulate strings is an essential skill for any Pythonista. Strings you find on the internet or in your files will often require quite a bit of work before you can analyze them. One task you’ll often encounter is removing punctuation from a string.
The Quick Answer: Use .translate() for the fastest performance
Use Python to Remove Punctuation from a String with Translate
One of the easiest ways to remove punctuation from a string in Python is to use the str.translate() method. The translate() method takes a translation table, which we’ll create using the str.maketrans() method.
Let’s take a look at how we can use the .translate() method to remove punctuation from a string in Python. In order to do this, we’ll import the built-in string library, which comes bundled with a punctuation attribute.
    import string

    a_string = '!hi. wh?at is the weat[h]er lik?e.'
    new_string = a_string.translate(str.maketrans('', '', string.punctuation))

    print(new_string)
    # Returns: hi what is the weather like
Here, str.maketrans() takes three arguments. The first two are empty strings because we aren’t mapping any characters to replacements. The third is a string of the characters we want to remove. The resulting table maps each punctuation character to None, which tells .translate() to delete it.
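To make the translation table less abstract, here is a small sketch that prints one. It uses a shortened deletion string ('!?.') rather than all of string.punctuation so that the output stays readable. The keys are Unicode code points, and None marks a character for deletion.

```python
# Build a table that deletes only three characters.
table = str.maketrans('', '', '!?.')

# Keys are code points (ord('!') == 33, ord('?') == 63, ord('.') == 46).
print(table)
# {33: None, 63: None, 46: None}

# Characters mapped to None are dropped by translate().
cleaned = '!hi. wh?at'.translate(table)
print(cleaned)
# hi what
```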
What is Python’s string.punctuation?
Python comes with a built-in library, string, which includes an attribute, string.punctuation, that contains the common ASCII punctuation characters. Because the library is built-in, you don’t need to install anything.
In case you’re curious about which characters string.punctuation includes, let’s have a quick look:
    import string

    print(string.punctuation)
    # Returns: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
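A few quick checks can make the contents of string.punctuation more concrete. The sketch below only inspects the constant. The sample characters being tested are arbitrary choices for illustration.

```python
import string

# string.punctuation is a plain str of 32 ASCII characters.
print(len(string.punctuation))       # 32
print(string.punctuation.isascii())  # True

# Membership tests work like on any other string.
print('?' in string.punctuation)     # True
print('_' in string.punctuation)     # True
print('—' in string.punctuation)     # False: the em dash is not ASCII
```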
Use Python to Strip Punctuation from a String with Regular Expressions (regex)
The Python regular expression library, re, feels like it can do just about anything, including stripping punctuation from a string!
Regular expressions are great because they come with a number of helpful character classes that let us select different types of characters. For example, \w matches word characters (letters, digits, and underscores) and \s matches whitespace. Placing ^ at the start of a character class negates it, so [^\w\s] matches anything that isn’t a word character or whitespace, which in our case is punctuation.
Let’s see how we can use regex to remove punctuation in Python:
    import re

    a_string = '!hi. wh?at is the weat[h]er lik?e.'
    new_string = re.sub(r'[^\w\s]', '', a_string)

    print(new_string)
    # Returns: hi what is the weather like
This approach looks for anything that isn’t a word character or whitespace and replaces it with an empty string, thereby removing it. Because \w includes the underscore, underscores are kept, unlike with string.punctuation.
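The two definitions of "punctuation" differ slightly: [^\w\s] keeps underscores, while string.punctuation includes them. If the regex should remove exactly the characters in string.punctuation, one option is to build the character class from that constant with re.escape. This is a sketch of that variation, not the approach shown above, and the sample string is made up for illustration.

```python
import re
import string

a_string = 'snake_case, done!'

# \w treats the underscore as a word character, so it survives.
by_negated_class = re.sub(r'[^\w\s]', '', a_string)
print(by_negated_class)
# snake_case done

# Escape every character so it is safe inside the [...] class.
punctuation_pattern = re.compile(f'[{re.escape(string.punctuation)}]')
by_explicit_class = punctuation_pattern.sub('', a_string)
print(by_explicit_class)
# snakecase done
```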
Use Python to Remove Punctuation from a String with str.replace
The str.replace() method makes easy work of replacing a single character. For example, if you only wanted to remove a single punctuation character, this would be a simple, straightforward solution.
Let’s say you only wanted to remove the ! character from our string. We could use the str.replace() method to accomplish this:
    a_string = '!hi. wh?at is the weat[h]er lik?e.'
    new_string = a_string.replace('!', '')

    print(new_string)
    # Returns: hi. wh?at is the weat[h]er lik?e.
What we’ve done here is call the .replace() method on our string. The first argument is the substring to replace, which in this case is our ! character. The second argument is what to replace it with, which in this case is an empty string.
In the next section, you’ll learn how to use a for-loop to remove all punctuation from a string.
Use Python to Strip Punctuation from a String using a for-loop
In the previous section of the tutorial, you learned how to use the str.replace() method to remove a single punctuation character. In this section, we’ll repeat this example, but use a for-loop to be able to remove every punctuation character.
Let’s see how we can do this in Python:
    import string

    a_string = '!hi. wh?at is the weat[h]er lik?e.'

    for character in string.punctuation:
        a_string = a_string.replace(character, '')

    print(a_string)
    # Returns: hi what is the weather like
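One thing to note is that we reassign a_string on every iteration, so each .replace() call builds on the result of the previous one. If we assigned each result to a new variable while always reading from the original string, only the last replacement would survive.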
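The reassignment detail is easy to get wrong, so here is a short sketch of the pitfall and of one way to keep the original string intact. The variable names wrong and cleaned are chosen for illustration only.

```python
import string

a_string = '!hi. wh?at is the weat[h]er lik?e.'

# Pitfall: every iteration starts again from the untouched original,
# so only the final replacement (for '~') is kept.
for character in string.punctuation:
    wrong = a_string.replace(character, '')
print(wrong)
# !hi. wh?at is the weat[h]er lik?e.

# Fix: copy the reference once, then keep reassigning the copy.
cleaned = a_string
for character in string.punctuation:
    cleaned = cleaned.replace(character, '')
print(cleaned)
# hi what is the weather like

# The original string is left unchanged.
print(a_string)
# !hi. wh?at is the weat[h]er lik?e.
```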
Now that you’ve learned a number of methods, let’s see which of these methods is the fastest.
What is the fastest way to strip a Python String from Punctuation?
In this tutorial, you’ve learned several ways to remove punctuation from a string in Python. Let’s see which of them is the fastest.
For this test, we created a string that’s over 1,000,000,000 characters long and removed all punctuation from it using each method.
Let’s take a look at the results:
The str.translate() method is the fastest way to remove punctuation from a string in Python – sometimes up to 40 times faster!
Of course, speed isn’t everything, but code that significantly slows down your program will often lead to a poorer user experience.
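If you want to check the relative speed on your own machine, a rough comparison can be sketched with the standard timeit module. The setup below is an assumption for illustration: it uses a much smaller test string than the one described above, the function names are made up, and the timings will vary by machine and Python version.

```python
import re
import string
import timeit

# A modest test string; increase the multiplier for a heavier test.
text = '!hi. wh?at is the weat[h]er lik?e.' * 10_000

# Build the table and the compiled pattern once, outside the timed code.
table = str.maketrans('', '', string.punctuation)
pattern = re.compile(r'[^\w\s]')


def with_translate(s):
    return s.translate(table)


def with_regex(s):
    return pattern.sub('', s)


def with_loop(s):
    for character in string.punctuation:
        s = s.replace(character, '')
    return s


for func in (with_translate, with_regex, with_loop):
    seconds = timeit.timeit(lambda: func(text), number=10)
    print(f'{func.__name__}: {seconds:.4f} s for 10 runs')
```

Building the table and compiling the pattern outside the timed functions keeps the comparison focused on the removal step itself.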
Frequently Asked Questions
Python comes with a built-in library, string, that includes all common ASCII punctuation characters in the string.punctuation attribute. Included are: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
The easiest way to replace punctuation with a space in Python is to use the .translate() method with the string.punctuation attribute. Simply write: a_string.translate(str.maketrans(dict.fromkeys(string.punctuation, ' '))) .
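As a short sketch of replacing punctuation with spaces: str.maketrans() also accepts a single dictionary, and dict.fromkeys() maps every punctuation character to a space. The sample string and the extra whitespace clean-up step are assumptions added for illustration.

```python
import string

a_string = 'hello,world!how are you?'

# Map every punctuation character to a single space.
table = str.maketrans(dict.fromkeys(string.punctuation, ' '))
spaced = a_string.translate(table)
print(repr(spaced))
# 'hello world how are you '

# Optional: collapse repeated spaces and trim the ends.
normalized = ' '.join(spaced.split())
print(repr(normalized))
# 'hello world how are you'
```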
Conclusion
In this post, you learned how to strip punctuation from a Python string. You learned how to do this using the str.translate() method, as well as regular expressions. You also learned how to do this with the .replace() method as well as with a for-loop. Finally, you learned which of these methods is the fastest.
To learn more about the str.translate() method, check out the official documentation here.