Python: checking a string for a match

Check if string matches pattern

From the docs on re.match: "If zero or more characters at the beginning of string match the regular expression pattern". I just spent about 30 minutes trying to understand why I couldn't match something at the end of a string. It seems that's not possible with match, is it? For that, re.search(pattern, my_string) works, though.

@conradk Yes, you're right, I think there's something like an implied ^ at the beginning when you use match. I think it's a bit more complicated than that very simple explanation, but I'm not clear on the details. You are correct that it does start from the beginning of the string, though.

Yes, but that's what the questioner wants. I'm not sure what you mean by "only makes sense with search()". It works perfectly fine with match.

To be clear: you probably want to check whether pattern.match(...) returns something. Luckily a match object is truthy and None is falsy, so you can just write "if pattern.match(my_string):".

One-liner: re.match(r"pattern", string)  # No need to compile

```python
>>> import re
>>> if re.match(r"hello[0-9]+", 'hello1'):
...     print('Yes')
...
Yes
```

You can evaluate it as a bool if needed:

```python
>>> bool(re.match(r"hello[0-9]+", 'hello1'))
True
```

That's weird. Why can you use re.match in the context of an if, but you have to use bool if you're using it elsewhere?
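To answer the question above: an if statement applies truth testing to its condition implicitly. re.match returns either a Match object, which is always truthy, or None, which is falsy. Outside an if you get that object itself, so bool() is only needed when you want an actual True/False value. A minimal sketch:

```python
import re

m = re.match(r"hello[0-9]+", "hello1")
miss = re.match(r"hello[0-9]+", "goodbye")

# Outside an `if`, you get the raw return values: a Match object or None
print(isinstance(m, re.Match))  # True
print(miss)                     # None

# `if` applies truth testing implicitly: a Match object is truthy, None is falsy
if m:
    print("matched")

# bool() performs the same conversion explicitly
print(bool(m), bool(miss))  # True False
```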

@SuhFangmbeng Compilation is useful for efficiency when the same regex is used in more than one place. In terms of errors, .match would throw the same error that .compile does. It's perfectly safe to use.

@nehem Actually, all of the regex functions in the re module compile and cache the patterns. Therefore there is absolutely no efficiency gain from calling compile and then match over calling re.match directly. All of these functions (including re.compile) call the internal function _compile, which caches the compiled patterns in a Python dictionary.

```python
import re

name = ["A1B1", "djdd", "B2C4", "C2H2", "jdoi", "1A4V"]

# Match names.
for element in name:
    m = re.match(r"(^[A-Z]\d[A-Z]\d)", element)
    if m:
        print(m.groups())
```

This is the only answer that keeps the match object, which is required for getting groups. Best answer in my opinion.
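The difference between m.group() and m.groups() is easier to see with a variant of the pattern above that captures each letter and digit separately. This is a sketch; the four-group pattern is an illustration, not part of the original answer.

```python
import re

names = ["A1B1", "djdd", "B2C4", "C2H2", "jdoi", "1A4V"]

# One capturing group per letter/digit instead of a single outer group
pattern = re.compile(r"([A-Z])(\d)([A-Z])(\d)")

# Keep only the successful Match objects (pattern.match returns None otherwise)
matches = [m for m in map(pattern.match, names) if m]

for m in matches:
    # group() is the whole matched text; groups() is a tuple of every captured group
    print(m.group(), m.groups())
```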

```python
import re
import sys

prog = re.compile(r'([A-Z]\d+)+')

while True:
    line = sys.stdin.readline()
    if not line:
        break
    if prog.match(line):
        print('matched')
    else:
        print('not matched')
```

As stated in the comments, all these answers using re.match implicitly match at the start of the string. re.search is needed if you want to find the pattern anywhere in the string.

```python
import re

pattern = re.compile(r"([A-Z][0-9]+)+")

# finds match anywhere in string
bool(re.search(pattern, 'aA1A1'))  # True

# matches on start of string, even though pattern does not have ^ constraint
bool(re.match(pattern, 'aA1A1'))  # False
```

If you need the full string to exactly match the regex, see @Ali Sajjad’s answer using re.fullmatch


Credit: @LondonRob and @conradkleinespel in the comments.

Careful! (Maybe you want to check if FULL string matches)

re.match(...) will not do what you want if you need to match the full string.

  • re.match("[a-z]+", "abcdef") ✅ will give a match
  • But! re.match("[a-z]+", "abcdef 12345") ✅ will also give a match, because the beginning of the string matches (maybe you don't want that when you're checking whether the entire string is valid or not)

Solution

Use re.fullmatch(...). This will only match if the whole string matches the pattern.

```python
if re.fullmatch("[a-z]+", my_string):
    print("Yes")
```

One-liner: bool(re.fullmatch("[a-z]+", my_string))
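The two bullet examples above can be checked side by side. A short sketch comparing match and fullmatch on the same pattern and strings:

```python
import re

samples = ["abcdef", "abcdef 12345"]

# For each string: (does match succeed?, does fullmatch succeed?)
# match only needs a prefix of the string to fit the pattern;
# fullmatch needs the entire string to fit it.
results = {
    s: (bool(re.match(r"[a-z]+", s)), bool(re.fullmatch(r"[a-z]+", s)))
    for s in samples
}

for s, (m, fm) in results.items():
    print(repr(s), "match:", m, "fullmatch:", fm)
```

For "abcdef 12345", match succeeds on the "abcdef" prefix while fullmatch fails because of the trailing " 12345".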

Regular expressions make this easy.

[A-Z] will match exactly one character between A and Z

\d+ will match one or more digits

( ) groups things (and also captures them, but for now just think of it as grouping)
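A short sketch of the three pieces described above in action, on a made-up sample string. Note that when a capturing group is repeated with +, it only keeps the text from its last repetition.

```python
import re

# [A-Z]\d+ : one uppercase letter followed by one or more digits
pieces = re.findall(r"[A-Z]\d+", "A1B22C333")
print(pieces)  # ['A1', 'B22', 'C333']

# Wrapping it in ( ) and adding + repeats the group; fullmatch requires the whole string
m = re.fullmatch(r"([A-Z]\d+)+", "A1B22C333")
print(m.group(0))  # A1B22C333 (the whole match)
print(m.group(1))  # C333 (a repeated group keeps only its last repetition)
```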

```python
import re

ab = re.compile(r"^([A-Z]\d+)+$")
ab.match(string)
```

I believe that should work for an uppercase, number pattern.

Ali Sajjad’s answer should be the default, i.e. fullmatch to avoid false positives.

However, it's also important to know that you're always checking for "not None" to mean "yes, it's a match":

The two possibilities are therefore:

if re.fullmatch("[a-z]+", my_string) is not None:

if bool(re.fullmatch("[a-z]+", my_string)):

To my way of thinking both of these are really quite horribly unreadable. So a simple utility function is needed for readability:

```python
import re

def is_match(pattern, string, flags=re.IGNORECASE | re.DOTALL):
    # or "is_full_match", as desired
    return re.fullmatch(pattern, string, flags) is not None
```

Those two flags are (usually) the most helpful default flag settings in my experience, rather than "0".

In practice, of course, you may need to examine the Match object returned by re.fullmatch. But for cases where you just need to know whether there's a match, a helper like this is enough.
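A sketch of how the is_match helper suggested above behaves with its default flags. The helper is the answer author's suggestion, not part of the re module, and the sample strings are made up.

```python
import re

def is_match(pattern, string, flags=re.IGNORECASE | re.DOTALL):
    # True only if the whole string matches (fullmatch), using the given flags
    return re.fullmatch(pattern, string, flags) is not None

print(is_match(r"[a-z]+", "ABC"))             # True: IGNORECASE lets [a-z] match uppercase
print(is_match(r"start.*end", "start\nend"))  # True: DOTALL lets . match the newline
print(is_match(r"[a-z]+", "ABC", flags=0))    # False: no flags, so lowercase only
```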

Just want to point out that, for strings without line breaks (\n), one could also use the anchors \A and \Z for the beginning and end of the string, respectively.

```python
import re

pat = re.compile(r'\A([A-Z][0-9]+)+\Z')
pat.match('A1B2')   # match
pat.match('A1B2a')  # no match
```

This makes a difference if the string contains multiple lines and you want to match the pattern on later lines of the string.
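A small sketch of that difference, using a made-up two-line string. With re.M, ^ and $ match at every line boundary, whereas \A and \Z always refer to the start and end of the whole string. Even without re.M, $ also matches just before a trailing newline, but \Z does not.

```python
import re

text = "a\nB10"

# With re.M, ^ and $ match at line boundaries, so the second line is found
found_multiline = bool(re.search(r"^[A-Z][0-9]+$", text, re.M))
# \A and \Z ignore re.M: they mean start and end of the whole string
found_absolute = bool(re.search(r"\A[A-Z][0-9]+\Z", text, re.M))
print(found_multiline, found_absolute)  # True False

# Without re.M, $ still matches just before a trailing newline; \Z does not
dollar_trailing = bool(re.match(r"^B10$", "B10\n"))
z_trailing = bool(re.match(r"\AB10\Z", "B10\n"))
print(dollar_trailing, z_trailing)  # True False
```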


match vs search vs fullmatch: which is appropriate?

re.search is the most general of the three. As others have said, re.match() checks for a match only at the beginning of the string; re.search() can mimic that by prepending \A to the pattern. re.fullmatch(), on the other hand, checks whether the entire string matches, which re.search() can also mimic by prepending \A and appending \Z to the pattern. The example below illustrates this.

```python
import re

# prepending \A to pattern makes `search` behave similar to `match`
s1 = '1B10L1'
pattern1 = r'([A-Z][0-9]+)+'
re.match(pattern1, s1)            # no match
re.search(pattern1, s1)           # match
re.search(fr"\A{pattern1}", s1)   # no match <--- with \A prepended (behaves same as re.match)

# `match` only checks at the beginning and doesn't care about the end of the string
s2 = 'B10L1a'
re.match(pattern1, s2)               # match
re.match(fr"{pattern1}\Z", s2)       # no match <--- with \Z appended (behaves same as re.fullmatch)
re.search(fr"\A{pattern1}\Z", s2)    # no match
```

If the string contains multiple lines and the MULTILINE flag (re.M) is set, this relationship breaks down: fullmatch still has to match the entire string, so it never matches across lines here, while match only tries at the start of the first line.

```python
import re

# by flagging multiline, `match` and `search` match the pattern in the first line
s3 = 'B10\nL1'
pattern2 = r'^([A-Z][0-9]+)+$'
re.match(pattern2, s3, re.M)      # match
re.search(pattern2, s3, re.M)     # match
re.fullmatch(pattern2, s3, re.M)  # no match

# the pattern is in the second line but this will not be matched by `match`
s4 = 'a\nB10'
re.match(pattern2, s4, re.M)                # no match
re.search(pattern2, s4, re.M)               # match
re.search(r'\A([A-Z][0-9]+)+', s4, re.M)    # no match
```

To compile or not to compile

If you need to search for a pattern in a single string, there's no need to compile it, since re.search, re.match, etc. all call the internal _compile function anyway. However, if you need to search for a pattern in many strings, compiling it first makes a noticeable difference performance-wise. For the example in the OP, pre-compiling the pattern and searching through the list is over 2 times faster than not compiling first.

```python
from timeit import timeit

setup2 = "import re; lst = ['A1B2', 'B10L1', 'C1N200J1K1', 'a1B2', 'A10B', 'AB400']"
setup1 = setup2 + "; pat = re.compile(r'^([A-Z][0-9]+)+$')"

for _ in range(3):
    t1 = timeit("list(map(pat.match, lst))", setup1)  # pre-compiled
    t2 = timeit("[re.match(r'^([A-Z][0-9]+)+$', x) for x in lst]", setup2)
    print(t2 / t1)

# output from the original run:
# 2.083788080189313
# 2.448126223007598
# 2.43617482049811
```


Checking whether a string contains a substring

In this article we cover four ways to find a substring in a string. Each has its pros and cons, which we also discuss briefly.

1) The in operator

The simplest way to check whether a string contains a substring is the in operator. It is used to check whether an element is present in a data structure, and it returns a boolean: True or False. Example:

```python
fullstring = "pythonist"
substring = "python"

if substring in fullstring:
    print("Substring found!")
else:
    print("Substring not found!")
```

This operator is shorthand for calling the __contains__ method. It also works for checking whether an element is in a list.
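A quick sketch of both points above: in on a string delegates to str.__contains__ and does substring search, while in on a list checks whether any element is equal to the value.

```python
fullstring = "pythonist"

# On strings, `in` is substring search; it delegates to str.__contains__
print("python" in fullstring)             # True
print(fullstring.__contains__("python"))  # True
print("java" in fullstring)               # False

# On a list, `in` checks element equality, not substrings
words = ["python", "pythonist"]
print("python" in words)  # True
print("pyth" in words)    # False: no element equals "pyth"
```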

2) The str.index() method

The string type has an index method. It returns the lowest index at which the substring starts in the string. If the substring is not found, it raises a ValueError, which you can handle with a try-except-else block.

```python
fullstring = "pythonist"
substring = "python"

try:
    fullstring.index(substring)
except ValueError:
    print("Substring not found!")
else:
    print("Substring found!")
```

This method is especially useful if you need the substring's index, not just whether it exists.
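For example, a short sketch that uses the return value of index to get the position, and catches the ValueError when the substring is missing:

```python
fullstring = "pythonist"

# index returns the lowest position where the substring starts
print(fullstring.index("python"))  # 0
print(fullstring.index("nist"))    # 5

try:
    fullstring.index("java")
except ValueError as exc:
    # a missing substring raises ValueError instead of returning a sentinel
    message = str(exc)
    print("not found:", message)
```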

3) The str.find() method

The string type has another method, find. It is more convenient than index because you don't need to think about error handling: if the method doesn't find the substring, it returns -1. On success, it returns the lowest index at which the substring is found.

```python
fullstring = "pythonist"
substring = "python"

if fullstring.find(substring) != -1:
    print("Substring found!")
else:
    print("Substring not found!")
```

If you don't want to deal with error handling, prefer this method over index.
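A sketch of find's return values, including its optional start argument:

```python
fullstring = "pythonist"

print(fullstring.find("t"))     # 2: first "t"
print(fullstring.find("t", 3))  # 8: the search starts at index 3
print(fullstring.find("java"))  # -1: not found, no exception

# Pitfall: a match at position 0 is falsy, so always compare against -1
print(bool(fullstring.find("python")))   # False, even though it was found
print(fullstring.find("python") != -1)   # True
```

This is why the example above compares the result with -1 instead of testing it directly in an if.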

4) Regular expressions (regex)

Regular expressions are a more flexible way to find substrings in strings. Python has a built-in module for working with regular expressions, re. The re module contains the search function, which lets you find a substring:

```python
from re import search

fullstring = "pythonist"
substring = "python"

if search(substring, fullstring):
    print("Substring found!")
else:
    print("Substring not found!")
```

If you need more complex matching, for example control over case sensitivity, this method suits you best. But it has drawbacks: it is more complex and slower. So for simple tasks it is better not to use it.
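A sketch of the case-sensitivity point above, using a made-up sample string. re.IGNORECASE makes the search case-insensitive. Because search treats the substring as a pattern, characters such as + are regex syntax; re.escape makes them match literally.

```python
import re

fullstring = "I write Python and C++"

# Case-sensitive by default; re.IGNORECASE relaxes that
print(bool(re.search("python", fullstring)))                 # False
print(bool(re.search("python", fullstring, re.IGNORECASE)))  # True

# re.escape turns "c++" into a literal pattern, so "+" is not a quantifier
print(bool(re.search(re.escape("c++"), fullstring, re.IGNORECASE)))  # True
```

For a plain case-insensitive check without regex, "python" in fullstring.lower() also works.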
