Python space in regexp

Regular Expression in Python with Examples | Set 1

A regular expression (regex) is a special sequence of characters that defines a search pattern for finding a string or set of strings. It can detect the presence or absence of text by matching it against a particular pattern, and it can also split a pattern into one or more sub-patterns. Python provides the re module, which supports regular expressions. One of its core functions is re.search, which takes a regular expression and a string and returns either the first match or None.
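A minimal sketch of such a search is given here. The sample string `s` is an assumption, chosen so that the word portal begins at offset 34; any other text would work the same way.

```python
import re

# Sample text (an assumption: chosen so that 'portal' starts at index 34).
s = 'GeeksforGeeks: A computer science portal for geeks'

# re.search returns a match object for the first match, or None if there is none.
match = re.search(r'portal', s)

if match is not None:
    print('Start Index:', match.start())
    print('End Index:', match.end())
```

Checking for None before calling start() or end() avoids an AttributeError when the pattern does not occur in the string.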


Start Index: 34 End Index: 40

The above code gives the starting index and the ending index of the string portal.

Note: Here the r prefix (r'portal') stands for raw, not regex. A raw string differs slightly from a regular string: Python does not interpret the \ character in it as an escape character. This matters because the regular expression engine uses the \ character for its own escaping.
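The difference shows up with a sequence such as \b, which is a backspace character in a regular Python string but a word boundary for the regex engine. The sample text below is an assumption made for illustration.

```python
import re

text = 'a portal, portals'  # illustrative sample text

# Raw string: the regex engine receives \b and treats it as a word boundary.
with_raw = re.findall(r'\bportal\b', text)

# Regular string: Python turns '\b' into a backspace character first,
# so the pattern looks for backspace + 'portal' + backspace.
without_raw = re.findall('\bportal\b', text)

print(with_raw)     # ['portal']
print(without_raw)  # []
```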

Before starting with the Python regex module let’s see how to actually write regex using metacharacters or special sequences.

MetaCharacters

Metacharacters are characters that have a special meaning inside a regular expression. They are used throughout the functions of the re module. Below is the list of metacharacters.

Metacharacter   Description
\               Drops the special meaning of the character following it
[]              Represents a character class
^               Matches the beginning of the string
$               Matches the end of the string
.               Matches any character except newline
|               Means OR (matches any of the patterns separated by it)
?               Matches zero or one occurrence
*               Matches any number of occurrences (including zero)
+               Matches one or more occurrences
{}              Indicates the number of occurrences of the preceding regex to match
()              Encloses a group of regex
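As a rough illustration of the list above, the following sketch runs one small pattern per metacharacter through re.findall. The patterns and sample strings are assumptions picked for brevity, not examples from the original article.

```python
import re

# One (pattern, text) pair per metacharacter; all samples are illustrative.
examples = [
    (r'a\.b', 'a.b axb'),          # \  : escaped dot matches only a literal '.'
    (r'[abc]', 'cab!'),            # [] : any single character from the class
    (r'^Hi', 'Hi, Hi'),            # ^  : only at the beginning of the string
    (r'end$', 'end to end'),       # $  : only at the end of the string
    (r'a.c', 'abc a-c ac'),        # .  : any one character except newline
    (r'cat|dog', 'cat dog cow'),   # |  : either alternative
    (r'colou?r', 'color colour'),  # ?  : zero or one 'u'
    (r'ab*', 'a ab abbb'),         # *  : zero or more 'b'
    (r'ab+', 'a ab abbb'),         # +  : one or more 'b'
    (r'a{2}', 'a aa aaa'),         # {} : exactly two 'a'
    (r'(\d+)-(\d+)', '10-20'),     # () : groups come back as tuples
]

results = {}
for pattern, text in examples:
    results[pattern] = re.findall(pattern, text)
    print(pattern, '->', results[pattern])
```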

Let’s discuss each of these metacharacters in detail

\ – Backslash

The backslash (\) makes sure that the character following it is not treated in a special way. This can be considered a way of escaping metacharacters. For example, if you search for a dot (.) in a string, the dot is treated as a special character because it is one of the metacharacters (as shown in the table above). In this case, put a backslash just before the dot (\.) so that it loses its special meaning.
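A short sketch of the difference, assuming a sample string that contains a single dot:

```python
import re

text = 'geeksforgeeks.org'  # illustrative sample text

# Unescaped dot: matches every character in the string.
unescaped = re.findall(r'.', text)

# Escaped dot: matches only the literal '.' character.
escaped = re.findall(r'\.', text)

print(len(unescaped))  # 17
print(escaped)         # ['.']

# re.escape adds the backslashes for you when the text comes from elsewhere.
print(re.escape('a.b*c'))
```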


Regex to Match no space or one space in Python

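Do you need a regex that matches "no character or one character", for example "zero or one space"? If so, you can put either of the following quantifiers after the character or character class:

  • ? : matches zero or one occurrence, for example [ ]?
  • {0,1} : the same quantifier written as an explicit range, for example [ ]{0,1}

Let's demonstrate their usage with examples.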

Example 1: Match space or no space in a string

If you have a list of usernames like:

  • @ user_1
  • @John Doe 1969@
  • @-Peter-Parker123@
  • @123any_other_user2@
  • more users33
  • more users2
  • @more@
  • @last standing@

and you would like to match @ followed by zero or one space, then you can use the regex (@[ ]?[A-Za-z0-9 ]+) as follows:

    import re

    texts = ['@ user_1 ', '@John Doe 1969@', '@-Peter-Parker123@',
             '@123any_other_user2@', 'more users33', 'more users2',
             '@more@', '@last standing@']

    # [ ]? makes the space after @ optional (zero or one occurrence).
    for text in texts:
        print(re.findall(r"(@[ ]?[A-Za-z0-9 ]+)", text))

Output:

    ['@ user']
    ['@John Doe 1969']
    []
    ['@123any']
    []
    []
    ['@more']
    ['@last standing']

  • () : a capture group; everything inside it is extracted
  • [ ]? : matches no space or one space
  • [A-Za-z0-9 ]+ : matches one or more characters from the following set:
    • A-Z (range): a character in the range "A" to "Z" (char code 65 to 90)
    • a-z (range): a character in the range "a" to "z" (char code 97 to 122)
    • 0-9 (range): a character in the range "0" to "9" (char code 48 to 57)
    • the space character
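To see what the optional quantifier contributes, the sketch below compares three variants of the pattern: [ ]?, the equivalent [ ]{0,1}, and a bare [ ] that requires exactly one space. The variable names are hypothetical and the list is a shortened version of the usernames above.

```python
import re

texts = ['@ user_1 ', '@John Doe 1969@', '@-Peter-Parker123@', '@more@']

short_form = re.compile(r"(@[ ]?[A-Za-z0-9 ]+)")     # zero or one space
long_form = re.compile(r"(@[ ]{0,1}[A-Za-z0-9 ]+)")  # same, explicit range
exact_one = re.compile(r"(@[ ][A-Za-z0-9 ]+)")       # exactly one space

for text in texts:
    print(short_form.findall(text),
          long_form.findall(text),
          exact_one.findall(text))
```

The first two columns are always identical, while the third drops every username that has no space directly after the @ sign.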

Example 2: Match one letter s or no s in a URL

Say that you have a list of URLs and you would like to extract only the URLs that start with http followed by one letter s or no s at all.

    import re

    texts = [
        'https://en.wikipedia.org/wiki/Main_Page/',
        'http://en.wikipedia.org/wiki/National_Park_Service/',
        'https://en.wikipedia.org/wiki/Hoover_Dam/',
        'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
        'https://en.wikipedia.org/wiki/Central_African_Republic/',
        'en.wikipedia.org/wiki/Africa/',
        'ftp://en.wikipedia.org/wiki/Central_African_Republic/',
    ]

    # [s]? makes the letter s optional, so both http and https match.
    for text in texts:
        print(re.findall(r"(http[s]?.*)", text))

Output:

    ['https://en.wikipedia.org/wiki/Main_Page/']
    ['http://en.wikipedia.org/wiki/National_Park_Service/']
    ['https://en.wikipedia.org/wiki/Hoover_Dam/']
    ['http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/']
    ['https://en.wikipedia.org/wiki/Central_African_Republic/']
    []
    []
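Note that re.findall looks for the pattern anywhere in the string. If the URL really has to start with http or https, one option is re.match, which only matches at the beginning. The sketch below is one possible way to do this; `URL_SCHEME` and `is_http_url` are hypothetical names.

```python
import re

# s? is the same as [s]?: an optional letter s.
URL_SCHEME = re.compile(r"https?://")

def is_http_url(text):
    """Return True if text begins with http:// or https://."""
    # Pattern.match anchors the search at the start of the string.
    return URL_SCHEME.match(text) is not None

samples = [
    'https://en.wikipedia.org/wiki/Main_Page/',
    'http://en.wikipedia.org/wiki/National_Park_Service/',
    'en.wikipedia.org/wiki/Africa/',
    'ftp://en.wikipedia.org/wiki/Central_African_Republic/',
    'see http://en.wikipedia.org/',
]

for sample in samples:
    print(is_http_url(sample), sample)
```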

Example 3: Match strings with no more than n occurrences

Finally, if you would like to find every occurrence of a space (or any other character) in a string and count how many there are, you can use the following regex: re.findall(r"([_])", text).

So let's count the number of _ characters in the following URLs:

    import re

    texts = [
        'https://en.wikipedia.org/wiki/Main_Page/',
        'http://en.wikipedia.org/wiki/National_Park_Service/',
        'https://en.wikipedia.org/wiki/Hoover_Dam/',
        'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
        'https://en.wikipedia.org/wiki/Central_African_Republic/',
        'en.wikipedia.org/wiki/Africa/',
        'ftp://en.wikipedia.org/wiki/Central_African_Republic/',
    ]

    # Print the number of matches, then the matches themselves.
    for text in texts:
        matches = re.findall(r"([_])", text)
        print(len(matches), end=' - ')
        print(matches)

Output:

    1 - ['_']
    2 - ['_', '_']
    1 - ['_']
    4 - ['_', '_', '_', '_']
    2 - ['_', '_']
    0 - []
    2 - ['_', '_']

Of course, Python offers a faster solution when you only need to count a single character: the built-in string method count, for example text.count('_').
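A minimal sketch of counting with str.count, using a shortened version of the URL list from the example above:

```python
texts = [
    'https://en.wikipedia.org/wiki/Main_Page/',
    'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
    'en.wikipedia.org/wiki/Africa/',
]

# str.count returns the number of non-overlapping occurrences of the substring.
counts = [text.count('_') for text in texts]

print(counts)  # [1, 4, 0]
```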

The advantage of regex is customization. You can count not only a single character but also a set of characters or a pattern.

So if you would like to count how many times _, / or a space occurs in the strings, then you can use:

    import re

    texts = [
        'https://en.wikipedia.org/wiki/Main_Page/',
        'http://en.wikipedia.org/wiki/National_Park_Service/',
        'https://en.wikipedia.org/wiki/Hoover_Dam/',
        'http://en.wikipedia.org/wiki/United_States_Bureau_of_Reclamation/',
        'https://en.wikipedia.org/wiki/Central_African_Republic/',
        'en.wikipedia.org/wiki/Africa/',
        'ftp://en.wikipedia.org/wiki/Central_African_Republic/',
    ]

    # Use the same character class for both the count and the list.
    pattern = r"([_/ ])"
    for text in texts:
        matches = re.findall(pattern, text)
        print(len(matches), end=' - ')
        print(matches)

Output:

    6 - ['/', '/', '/', '/', '_', '/']
    7 - ['/', '/', '/', '/', '_', '_', '/']
    6 - ['/', '/', '/', '/', '_', '/']
    9 - ['/', '/', '/', '/', '_', '_', '_', '_', '/']
    7 - ['/', '/', '/', '/', '_', '_', '/']
    3 - ['/', '/', '/']
    7 - ['/', '/', '/', '/', '_', '_', '/']
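If you want a separate total for each character rather than one combined count, one option (not part of the original example) is to pass the matches to collections.Counter. A sketch for a single URL:

```python
import re
from collections import Counter

text = 'https://en.wikipedia.org/wiki/Main_Page/'

# Tally each matched character separately.
tally = Counter(re.findall(r"[_/ ]", text))

print(tally['/'])  # 5
print(tally['_'])  # 1
print(tally[' '])  # 0, because a Counter returns 0 for missing keys
```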
