How to Replace String in pandas DataFrame
You can replace a string in a pandas DataFrame column by using DataFrame.replace(), Series.str.replace(), or DataFrame.apply() with a lambda function. In this article, I will explain how to replace strings in DataFrame columns with multiple examples.
- Replace a string with another string in pandas.
- Replace a pattern of string with another string using regular expression.
1. Quick Examples to Replace String in DataFrame
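If you are in a hurry, below are some quick examples of how to replace a string in a pandas DataFrame.
# Replace string using DataFrame.replace() method.
df2 = df.replace('Py', 'Python with ', regex=True)

# Replace different strings in different columns.
# (Dict arguments reconstructed from the description in section 3.)
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
                 {'Courses': 'Python with ', 'Duration': 'Days'}, regex=True)

# Replace pattern of string using regular expression.
df2 = df.replace(regex=['Language'], value='Lang')

# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language', 'Lang')

# Replace String using apply() function with lambda.
# (The mapping was lost from the source; this one is an assumed stand-in.)
df2 = df.apply(lambda x: x.replace({'Py': 'Python with ', 'Language': 'Lang'}, regex=True))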
Now, let's create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.
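The column values are not shown in this section, so the following is a minimal stand-in, assuming a small technologies dict with made-up course names, fees, and durations. Only the column names Courses, Fee, and Duration come from the article.

```python
import pandas as pd

# Assumed sample data: only the column names come from the article.
technologies = {
    "Courses": ["Spark", "PySpark", "Java Language", "PHP Language"],
    "Fee": [22000, 25000, 24000, 27000],
    "Duration": ["30days", "50days", "60days", "30days"],
}

df = pd.DataFrame(technologies)
print(df)
```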
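import pandas as pd

# technologies: a dict mapping the column names Courses, Fee, and Duration to lists of values.
df = pd.DataFrame(technologies)
print(df)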
2. pandas Replace String Example
You can replace a string in a pandas DataFrame column with another string by using the DataFrame.replace() method. This method replaces the specified value with another value and returns a new DataFrame. To update the existing DataFrame instead, pass inplace=True.
In this example, replacing Py with "Python with " turns the string PySpark into Python with Spark.
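A minimal sketch of this replacement, assuming a small made-up DataFrame. It contrasts the default whole-value matching with regex=True, and shows that inplace=True modifies the frame and returns None.

```python
import pandas as pd

# Assumed sample data for illustration.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Java Language"],
    "Fee": [22000, 25000, 24000],
})

# Default behavior: only cells whose whole value equals "Py" are replaced,
# so nothing changes here.
exact = df.replace("Py", "Python with ")

# regex=True: "Py" is treated as a pattern and replaced inside each string.
partial = df.replace("Py", "Python with ", regex=True)

# inplace=True modifies the frame it is called on and returns None.
df_copy = df.copy()
result = df_copy.replace("Py", "Python with ", regex=True, inplace=True)

print(partial)
```

Numeric columns such as Fee are left alone, because regular expressions only apply to string values.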
3. Replace Multiple Strings
Now let's see how to replace multiple strings across one or more columns. This example also shows how to replace part of a string by using the regex=True param. To update multiple string columns, pass dicts of column-name/value pairs. The example below replaces Py with Python with in the Courses column and days with Days in the Duration column.
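The per-column replacement described above can be sketched as follows. The sample rows are assumed; the two dicts follow the description (Py in Courses, days in Duration), with the first dict naming what to find in each column and the second naming what to put in its place.

```python
import pandas as pd

# Assumed sample data for illustration.
df = pd.DataFrame({
    "Courses": ["PySpark", "Spark"],
    "Duration": ["30days", "50days"],
})

# Keys are column names; values are the pattern (first dict)
# and its replacement (second dict) for that column.
df2 = df.replace(
    {"Courses": "Py", "Duration": "days"},
    {"Courses": "Python with ", "Duration": "Days"},
    regex=True,
)
print(df2)
```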
4. Replace Pattern of String Using Regular Expression
Using a regular expression, you can replace the matching string with another string in a pandas DataFrame. The example below finds the string Language and replaces it with Lang.
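A short sketch of the regex keyword, using assumed sample rows. Passing the pattern through regex= (instead of to_replace) implies regular-expression matching, so regex=True is not needed. The second call is an extra illustration with an anchored pattern.

```python
import pandas as pd

# Assumed sample data for illustration.
df = pd.DataFrame({"Courses": ["Spark", "Java Language", "PHP Language"]})

# A list of patterns passed via regex=, all replaced by the same value.
df2 = df.replace(regex=["Language"], value="Lang")

# An anchored pattern: strip a trailing " Language" (with its leading space).
df3 = df.replace(regex=r"\s+Language$", value="")

print(df2)
print(df3)
```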
5. Using str.replace() on DataFrame
Alternatively, use Series.str.replace() to replace a substring within each value of a column. By contrast, DataFrame.replace() looks for exact matches of the whole cell value unless you pass a regex pattern and the param regex=True.
This yields the same result as the previous section. Note that assigning back to df['Courses'] updates the Courses column of the existing DataFrame object.
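A minimal sketch of str.replace() on one column, with assumed sample values. The second part is an added illustration of why the regex flag matters: a dot is a literal character with regex=False but matches any character with regex=True.

```python
import pandas as pd

# Assumed sample data for illustration.
df = pd.DataFrame({"Courses": ["Spark", "Java Language", "PHP Language"]})

# regex=False: treat "Language" as a literal substring.
# Assigning back updates the existing DataFrame.
df["Courses"] = df["Courses"].str.replace("Language", "Lang", regex=False)

versions = pd.Series(["v1.0", "v1x0"])
# Literal match: only the exact text "1.0" is replaced.
literal = versions.str.replace("1.0", "2.0", regex=False)
# Regex match: "." matches any character, so "1x0" is replaced too.
pattern = versions.str.replace("1.0", "2.0", regex=True)

print(df)
```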
6. Replace String Using apply() function with lambda
In this section, you can find out how to replace strings using DataFrame.apply() with a lambda expression. The apply() method applies a function along one axis of the DataFrame. With the default, axis=0, the function receives each column in turn as a Series.
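A sketch of the apply() approach under stated assumptions: the sample rows and the replacement mapping are made up for illustration. The lambda receives each column as a Series and calls Series.replace() on it with a dict of pattern/replacement pairs.

```python
import pandas as pd

# Assumed sample data for illustration.
df = pd.DataFrame({
    "Courses": ["PySpark", "Java Language"],
    "Fee": [25000, 24000],
})

# Hypothetical mapping of patterns to replacements.
mapping = {"Py": "Python with ", "Language": "Lang"}

# With the default axis=0, `col` is one column (a Series) per call.
df2 = df.apply(lambda col: col.replace(mapping, regex=True))
print(df2)
```

Calling df.replace(mapping, regex=True) directly should give the same result here; apply() mainly helps when the per-column logic needs to differ.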
7. Complete Example of Replace String in DataFrame
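import pandas as pd

# Assumed sample data; the original values were lost from this listing.
technologies = {
    'Courses': ['Spark', 'PySpark', 'Java Language'],
    'Fee': [22000, 25000, 24000],
    'Duration': ['30days', '50days', '60days'],
}
df = pd.DataFrame(technologies)
print(df)

# Replace string using DataFrame.replace() method.
df2 = df.replace('Py', 'Python with ', regex=True)
print(df2)

# Replace strings per column (dicts reconstructed from section 3).
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
                 {'Courses': 'Python with ', 'Duration': 'Days'}, regex=True)
print(df2)

# Replace pattern of string using regular expression.
df2 = df.replace(regex=['Language'], value='Lang')
print(df2)

# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language', 'Lang')
print(df)

# Replace String using apply() function with lambda.
# (Assumed mapping; the original argument was lost.)
df2 = df.apply(lambda x: x.replace({'Py': 'Python with ', 'Language': 'Lang'}, regex=True))
print(df2)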
8. Conclusion
In this article, you have learned how to replace a string in a pandas DataFrame column by using DataFrame.replace(), Series.str.replace(), and DataFrame.apply() with a lambda function.
pandas.Series.str.replace
Replace each occurrence of pattern/regex in the Series/Index.
Equivalent to str.replace() or re.sub(), depending on the regex value.
Parameters
pat str or compiled regex
String can be a character sequence or regular expression.
repl str or callable
Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().
n int, default -1 (all)
Number of replacements to make from start.
case bool, default None
Determines if replace is case sensitive:
- If True, case sensitive (the default if pat is a string)
- Set to False for case insensitive
- Cannot be set if pat is a compiled regex.
flags int, default 0 (no flags)
Regex module flags, e.g. re.IGNORECASE. Cannot be set if pat is a compiled regex.
regex bool, default False
Determines if the passed-in pattern is a regular expression:
- If True, assumes the passed-in pattern is a regular expression.
- If False, treats the pattern as a literal string
- Cannot be set to False if pat is a compiled regex or repl is a callable.
Returns
Series or Index of object
A copy of the object with all matching occurrences of pat replaced by repl.
Raises
ValueError
- if regex is False and repl is a callable or pat is a compiled regex
- if pat is a compiled regex and case or flags is set
Notes
When pat is a compiled regex, all flags should be included in the compiled regex. Use of case, flags, or regex=False with a compiled regex will raise an error.
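To make the n and case parameters and the compiled-regex restriction concrete, here is a small sketch with an assumed two-element Series. It is an illustration of the rules listed above, not part of the reference text.

```python
import re

import pandas as pd

# Assumed sample data for illustration.
s = pd.Series(["foo foo", "FOO bar"])

# n=1 replaces only the first match in each string.
first_only = s.str.replace("foo", "baz", n=1, regex=False)

# case=False makes the match case-insensitive.
any_case = s.str.replace("foo", "baz", case=False, regex=True)

# Combining a compiled pattern with case raises ValueError; the flag
# belongs in re.compile(...) instead.
try:
    s.str.replace(re.compile("foo"), "baz", case=False, regex=True)
    raised = False
except ValueError:
    raised = True

compiled = s.str.replace(re.compile("foo", flags=re.IGNORECASE), "baz", regex=True)
```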
Examples
When pat is a string and regex is True, the given pat is compiled as a regex. When repl is a string, it replaces matching regex patterns as with re.sub(). NaN value(s) in the Series are left as is:
>>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f.', 'ba', regex=True)
0    bao
1    baz
2    NaN
dtype: object
When pat is a string and regex is False, every pat is replaced with repl as with str.replace():
>>> pd.Series(['f.o', 'fuz', np.nan]).str.replace('f.', 'ba', regex=False)
0    bao
1    fuz
2    NaN
dtype: object
When repl is a callable, it is called on every pat using re.sub(). The callable should expect one positional argument (a regex object) and return a string.
>>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f', repr, regex=True)
0    <re.Match object; span=(0, 1), match='f'>oo
1    <re.Match object; span=(0, 1), match='f'>uz
2                                            NaN
dtype: object
Reverse every lowercase alphabetic word:
>>> repl = lambda m: m.group(0)[::-1]
>>> ser = pd.Series(['foo 123', 'bar baz', np.nan])
>>> ser.str.replace(r'[a-z]+', repl, regex=True)
0    oof 123
1    rab zab
2        NaN
dtype: object
Using regex groups (extract second group and swap case):
>>> pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
>>> repl = lambda m: m.group('two').swapcase()
>>> ser = pd.Series(['One Two Three', 'Foo Bar Baz'])
>>> ser.str.replace(pat, repl, regex=True)
0    tWO
1    bAR
dtype: object
Using a compiled regex with flags:
>>> import re
>>> regex_pat = re.compile(r'FUZ', flags=re.IGNORECASE)
>>> pd.Series(['foo', 'fuz', np.nan]).str.replace(regex_pat, 'bar', regex=True)
0    foo
1    bar
2    NaN
dtype: object
pandas.DataFrame.replace
Values of the DataFrame are replaced with other values dynamically.
This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.
Parameters
to_replace str, regex, list, dict, Series, int, float, or None
How to find the values that will be replaced.
- numeric, str or regex:
  - numeric: numeric values equal to to_replace will be replaced with value
  - str: string exactly matching to_replace will be replaced with value
  - regex: regexes matching to_replace will be replaced with value
- list of str, regex, or numeric:
  - First, if to_replace and value are both lists, they must be the same length.
  - Second, if regex=True then all of the strings in both lists will be interpreted as regexes; otherwise they will match directly. This doesn't matter much for value since there are only a few possible substitution regexes you can use.
  - str, regex and numeric rules apply as above.
- dict:
  - Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and 'y' with 'z'. To use a dict in this way, the optional value parameter should not be given.
  - For a DataFrame a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and the value 'z' in column 'b' and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
  - For a DataFrame nested dictionaries, e.g., {'a': {'b': np.nan}}, are read as follows: look in column 'a' for the value 'b' and replace it with NaN. The optional value parameter should not be specified to use a nested dict in this way. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
- None:
  - This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.
See the examples section for examples of each of these.
value scalar, dict, list, str, regex, default None
Value to replace any values matching to_replace with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.
inplace bool, default False
Whether to modify the DataFrame rather than creating a new one.
limit int, default None
Maximum size gap to forward or backward fill.
regex bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which case to_replace must be None.
method {'pad', 'ffill', 'bfill'}
The method to use for replacement, when to_replace is a scalar, list or tuple and value is None.
Raises
AssertionError
- If regex is not a bool and to_replace is not None.
TypeError
- If to_replace is not a scalar, array-like, dict, or None
- If to_replace is a dict and value is not a list, dict, ndarray, or Series
- If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
- When replacing multiple bool or datetime64 objects and the arguments to to_replace do not match the type of the value being replaced
ValueError
- If a list or an ndarray is passed to to_replace and value but they are not the same length.
See also
DataFrame.where: Replace values based on boolean condition.
Series.str.replace: Simple string replacement.
Notes
- Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
- Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
- This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
- When dict is used as the to_replace value, it is like key(s) in the dict are the to_replace part and value(s) in the dict are the value parameter.
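The note about regular expressions only substituting on strings can be illustrated with a short sketch. The two-column frame is assumed: one column holds floats, the other holds the same numbers as strings.

```python
import pandas as pd

# Assumed sample data: the same numbers as floats and as strings.
df = pd.DataFrame({"num": [1.5, 2.5], "txt": ["1.5", "2.5"]})

# The pattern only touches the string column; the float column is untouched.
out = df.replace(r"^1\.5$", "x", regex=True)

# Converting the numeric column to strings first makes it matchable.
as_text = df["num"].astype(str).replace(r"^1\.5$", "x", regex=True)

print(out)
```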
Scalar `to_replace` and `value`
>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s.replace(1, 5)
0    5
1    2
2    3
3    4
4    5
dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e
List-like `to_replace`
>>> df.replace([0, 1, 2, 3], 4)
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e
>>> s.replace([1, 2], method='bfill')
0    3
1    3
2    3
3    4
4    5
dtype: int64
dict-like `to_replace`
>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e
Regular expression `to_replace`
>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz
Compare the behavior of s.replace({'a': None}) and s.replace('a', None) to understand the peculiarities of the to_replace parameter:
>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
When one uses a dict as the to_replace value, it is like the value(s) in the dict are equal to the value parameter. s.replace({'a': None}) is equivalent to s.replace(to_replace={'a': None}, value=None, method=None):
>>> s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object
When value is not explicitly passed and to_replace is a scalar, list or tuple, replace uses the method parameter (default 'pad') to do the replacement. So this is why the 'a' values are being replaced by 10 in rows 1 and 2 and 'b' in row 4 in this case.
>>> s.replace('a')
0    10
1    10
2    10
3     b
4     b
dtype: object
On the other hand, if None is explicitly passed for value, it will be respected:
>>> s.replace('a', None)
0      10
1    None
2    None
3       b
4    None
dtype: object
Changed in version 1.4.0: Previously the explicit None was silently ignored.