Match any string in PHP

preg_match_all

Searches subject for all matches to the regular expression given in pattern and puts them in matches in the order specified by flags.

After the first match is found, the subsequent searches are continued on from end of the last match.

Parameters

pattern

The pattern to search for, as a string.

subject

The input string.

matches

Array of all matches in multi-dimensional array ordered according to flags.

flags

Can be a combination of the following flags (note that it doesn't make sense to use PREG_PATTERN_ORDER together with PREG_SET_ORDER):

PREG_PATTERN_ORDER

Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.

<?php
preg_match_all("|<[^>]+>(.*)</[^>]+>|U",
    "<b>example: </b><div align=left>this is a test</div>",
    $out, PREG_PATTERN_ORDER);
echo $out[0][0] . ", " . $out[0][1] . "\n";
echo $out[1][0] . ", " . $out[1][1] . "\n";
?>

The above example will output:

<b>example: </b>, <div align=left>this is a test</div>
example: , this is a test

So, $out[0] contains array of strings that matched full pattern, and $out[1] contains array of strings enclosed by tags.

If the pattern contains named subpatterns, $matches additionally contains entries for keys with the subpattern name.

If the pattern contains duplicate named subpatterns, only the rightmost subpattern is stored in $matches[NAME].
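
The duplicate-name rule can be sketched as follows. This is a minimal illustration that assumes the `(?J)` inline option to permit duplicate names and uses a made-up subject string; the group name `match` is chosen only for the example.

```php
<?php
// (?J) allows the same group name to be used more than once.
preg_match_all(
    '/(?J)(?<match>foo)|(?<match>bar)/',
    'foo bar',
    $matches,
    PREG_PATTERN_ORDER
);

// Only the rightmost group named "match" is stored under the name,
// so the entry for "foo" (matched by the left group) is empty.
print_r($matches['match']);
```

Here the numeric keys still hold both groups separately, so they remain the reliable way to tell which alternative matched.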

PREG_SET_ORDER

Orders results so that $matches[0] is an array of first set of matches, $matches[1] is an array of second set of matches, and so on.

<?php
preg_match_all("|<[^>]+>(.*)</[^>]+>|U",
    "<b>example: </b><div align=left>this is a test</div>",
    $out, PREG_SET_ORDER);
echo $out[0][0] . ", " . $out[0][1] . "\n";
echo $out[1][0] . ", " . $out[1][1] . "\n";
?>

The above example will output:

<b>example: </b>, example: 
<div align=left>this is a test</div>, this is a test

PREG_OFFSET_CAPTURE

If this flag is passed, for every occurring match the appendant string offset (in bytes) will also be returned. Note that this changes the value of matches into an array of arrays where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1.

<?php
preg_match_all('/(foo)(bar)(baz)/', 'foobarbaz', $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>

The above example will output:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => foobarbaz
                    [1] => 0
                )

        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => foo
                    [1] => 0
                )

        )

    [2] => Array
        (
            [0] => Array
                (
                    [0] => bar
                    [1] => 3
                )

        )

    [3] => Array
        (
            [0] => Array
                (
                    [0] => baz
                    [1] => 6
                )

        )

)

PREG_UNMATCHED_AS_NULL

If this flag is passed, unmatched subpatterns are reported as null; otherwise they are reported as an empty string.

If no order flag is given, PREG_PATTERN_ORDER is assumed.
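
To illustrate PREG_UNMATCHED_AS_NULL, here is a small sketch using a made-up pattern with an optional group. It assumes PHP 7.2 or later, where the flag is available.

```php
<?php
// The optional group (b)? does not participate in the first match "a".
$pattern = '/(a)(b)?/';
$subject = 'a ab';

preg_match_all($pattern, $subject, $default);
preg_match_all($pattern, $subject, $asNull, PREG_UNMATCHED_AS_NULL);

var_dump($default[2][0]); // unmatched group reported as an empty string
var_dump($asNull[2][0]);  // unmatched group reported as null
```

The flag is mainly useful when an empty string is itself a legitimate capture and has to be told apart from "group did not participate".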

Normally, the search starts from the beginning of the subject string. The optional parameter offset can be used to specify the alternate place from which to start the search (in bytes).

Note:

Using offset is not equivalent to passing substr($subject, $offset) to preg_match_all() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x). See preg_match() for examples.
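
The difference described in the note can be sketched with a short, made-up subject string. The anchor and the lookbehind behave in opposite ways, because with offset the engine still sees the whole subject.

```php
<?php
$subject = 'abcdef';

// With an offset, ^ still anchors to the true start of the subject: no match.
$withOffset = preg_match_all('/^def/', $subject, $m1, PREG_PATTERN_ORDER, 3);

// With substr(), "def" becomes the start of a new string: one match.
$withSubstr = preg_match_all('/^def/', substr($subject, 3), $m2);

// A lookbehind can still see the "c" that precedes the offset...
$lookbehindOffset = preg_match_all('/(?<=c)def/', $subject, $m3, PREG_PATTERN_ORDER, 3);

// ...but not once substr() has cut it away.
$lookbehindSubstr = preg_match_all('/(?<=c)def/', substr($subject, 3), $m4);

var_dump($withOffset, $withSubstr, $lookbehindOffset, $lookbehindSubstr);
```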

Return Values

Returns the number of full pattern matches (which might be zero), or false on failure.

Errors/Exceptions

If the regex pattern passed does not compile to a valid regex, an E_WARNING is emitted.
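
Because zero matches and failure are both falsy, the return value should be compared strictly. The following is a sketch only; `count_matches` is a hypothetical helper, not part of PHP.

```php
<?php
// Hypothetical helper: distinguishes "no matches" (0) from failure (false).
function count_matches(string $pattern, string $subject): ?int
{
    // @ silences the E_WARNING that an invalid pattern would emit.
    $count = @preg_match_all($pattern, $subject, $matches);
    return $count === false ? null : $count;
}

var_dump(count_matches('/\d+/', 'no digits here')); // zero matches, not an error
var_dump(count_matches('/(/', 'anything'));         // invalid pattern: failure
```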

Changelog

Version Description
7.2.0 The PREG_UNMATCHED_AS_NULL is now supported for the $flags parameter.

Examples

Example #1 Getting all phone numbers out of some text.
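
A minimal sketch of such an extraction, assuming a simplified North American number format and a made-up sample string (this is not necessarily the pattern used in the original example):

```php
<?php
// Sample text (assumed for illustration).
$text = 'Call 555-1212 or 1-800-555-1212';

// Optional three-digit area code followed by "-" or whitespace,
// then a seven-digit number written as ddd-dddd.
preg_match_all('/(?:\d{3}[-\s])?\d{3}-\d{4}/', $text, $phones);

print_r($phones[0]);
```

With no capturing groups in the pattern, only `$phones[0]` (the full matches) is of interest.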

Example #2 Find matching HTML tags (greedy)

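<?php
// The \\2 is a backreference to the second set of parentheses, ([\w]+).
// The extra backslash is required because the string is in double quotes.
$html = "<b>bold text</b><a href=howdy.html>click me</a>";

preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    echo "matched: " . $val[0] . "\n";
    echo "part 1: " . $val[1] . "\n";
    echo "part 2: " . $val[2] . "\n";
    echo "part 3: " . $val[3] . "\n";
    echo "part 4: " . $val[4] . "\n\n";
}
?>

The above example will output:

matched: <b>bold text</b>
part 1: <b>
part 2: b
part 3: bold text
part 4: </b>

matched: <a href=howdy.html>click me</a>
part 1: <a href=howdy.html>
part 2: a
part 3: click me
part 4: </a>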

Example #3 Using named subpattern

<?php
$str = "a: 1\nb: 2\nc: 3";

preg_match_all('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);

print_r($matches);
?>

The above example will output:

Array
(
    [0] => Array
        (
            [0] => a: 1
            [1] => b: 2
            [2] => c: 3
        )

    [name] => Array
        (
            [0] => a
            [1] => b
            [2] => c
        )

    [1] => Array
        (
            [0] => a
            [1] => b
            [2] => c
        )

    [digit] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )

    [2] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )

)

See Also

  • PCRE Patterns
  • preg_quote() — Quote regular expression characters
  • preg_match() — Perform a regular expression match
  • preg_replace() — Perform a regular expression search and replace
  • preg_split() — Split string by a regular expression
  • preg_last_error() — Returns the error code of the last PCRE regex execution

User Contributed Notes 37 notes

The code that john at mccarthy dot net posted is not necessary. If you want your results grouped by individual match simply use:

<?php
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);
?>

<?php
preg_match_all('/([GH])([12])([!?])/', 'G1? H2!', $matches); // Default PREG_PATTERN_ORDER
// $matches = array(0 => array(0 => 'G1?', 1 => 'H2!'),
//                  1 => array(0 => 'G', 1 => 'H'),
//                  2 => array(0 => '1', 1 => '2'),
//                  3 => array(0 => '?', 1 => '!'))

preg_match_all('/([GH])([12])([!?])/', 'G1? H2!', $matches, PREG_SET_ORDER);
// $matches = array(0 => array(0 => 'G1?', 1 => 'G', 2 => '1', 3 => '?'),
//                  1 => array(0 => 'H2!', 1 => 'H', 2 => '2', 3 => '!'))
?>

PREG_OFFSET_CAPTURE always seems to provide byte offsets, rather than character position offsets, even when you are using the unicode /u modifier.

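If you want to extract all angle-bracketed tags from a string:

<?php
$pattern = "/<[^>]*>/";
// Sample subject for illustration; any angle-bracketed tokens work.
$subject = "<tag1> foo <tag2> bar";
preg_match_all($pattern, $subject, $matches);
print_r($matches);
?>

output:

Array ( [0] => Array ( [0] => <tag1> [1] => <tag2> ) )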

Here is an awesome online regex editor, https://regex101.com/,
which helps you test your regular expressions (PCRE, JS, Python) with real-time highlighting of regex matches on the input data.

Here's some fleecy code to 1. validate RFC 2822 conformity of address lists and 2. extract the address specification (the part commonly known as 'email'). I wouldn't suggest using it for checking email input in forms, but it might be just what you want for other email applications. I know it can be optimized further, but that part I'll leave up to you nutcrackers. The total length of the resulting regex is about 30000 bytes. That is because it accepts comments. You can remove that by setting $cfws to $fws, and it shrinks to about 6000 bytes. Conformity checking strictly follows RFC 2822. Have fun and email me if you have any enhancements!

<?php
function mime_extract_rfc2822_address($string)
{
    //rfc2822 token setup
    $crlf = "(?:\r\n)";
    $wsp = "[\t ]";
    $text = "[\\x01-\\x09\\x0B\\x0C\\x0E-\\x7F]";
    $quoted_pair = "(?:\\\\$text)";
    $fws = "(?:(?:$wsp*$crlf)?$wsp+)";
    $ctext = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F" .
        "!-'*-[\\]-\\x7F]";
    $comment = "(\\((?:$fws?(?:$ctext|$quoted_pair|(?1)))*" .
        "$fws?\\))";
    $cfws = "(?:(?:$fws?$comment)*(?:(?:$fws?$comment)|$fws))";
    //$cfws = $fws; //an alternative to comments
    $atext = "[!#-'*+\\-\\/0-9=?A-Z\\^-~]";
    $atom = "(?:$cfws?$atext+$cfws?)";
    $dot_atom_text = "(?:$atext+(?:\\.$atext+)*)";
    $dot_atom = "(?:$cfws?$dot_atom_text$cfws?)";
    $qtext = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!#-[\\]-\\x7F]";
    $qcontent = "(?:$qtext|$quoted_pair)";
    $quoted_string = "(?:$cfws?\"(?:$fws?$qcontent)*$fws?\"$cfws?)";
    $dtext = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!-Z\\^-\\x7F]";
    $dcontent = "(?:$dtext|$quoted_pair)";
    $domain_literal = "(?:$cfws?\\[(?:$fws?$dcontent)*$fws?]$cfws?)";
    $domain = "(?:$dot_atom|$domain_literal)";
    $local_part = "(?:$dot_atom|$quoted_string)";
    $addr_spec = "($local_part@$domain)";
    $display_name = "(?:(?:$atom|$quoted_string)+)";
    $angle_addr = "(?:$cfws?<$addr_spec>$cfws?)";
    $name_addr = "(?:$display_name?$angle_addr)";
    $mailbox = "(?:$name_addr|$addr_spec)";
    $mailbox_list = "(?:(?:(?:(?<=:)|,)$mailbox)+)";
    $group = "(?:$display_name:(?:$mailbox_list|$cfws)?;$cfws?)";
    $address = "(?:$mailbox|$group)";
    $address_list = "(?:(?:^|,)$address)+";

    //output length of string (just so you see how f**king long it is)
    echo(strlen($address_list) . " ");

    //apply expression
    preg_match_all("/^$address_list$/", $string, $array, PREG_SET_ORDER);

    //return the collected matches
    return $array;
}
?>

preg_match_all() and the other preg_*() functions don't work well with very long strings, at least those longer than 1 MB.
In that case the function returns FALSE and the value of $matches is unpredictable: it may contain some values or it may be empty.
A workaround is to pre-split the long string into parts, for instance by calling explode() on some delimiter, and then apply preg_match_all() to each part.
A typical scenario for this is log analysis with regular expressions.
Tested on PHP 7.2.0
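
The workaround can be sketched as below. `match_all_by_line` is a hypothetical helper and the log text is made up; the approach only fits patterns whose matches never span a line break.

```php
<?php
// Hypothetical helper: runs the pattern line by line instead of on one huge string.
function match_all_by_line(string $pattern, string $text): array
{
    $all = [];
    foreach (explode("\n", $text) as $lineNo => $line) {
        $count = preg_match_all($pattern, $line, $m);
        if ($count === false) {
            // preg_last_error() reports why PCRE gave up on this chunk.
            throw new RuntimeException(
                "PCRE error " . preg_last_error() . " on line $lineNo"
            );
        }
        // Collect the full matches found on this line.
        foreach ($m[0] as $hit) {
            $all[] = $hit;
        }
    }
    return $all;
}

// Made-up log excerpt: extract the 4xx/5xx status codes.
$log = "GET /a 200\nGET /b 404\nPOST /c 500";
print_r(match_all_by_line('/\b[45]\d{2}\b/', $log));
```

Checking for `false` on every chunk keeps a failure from being mistaken for "no matches".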

To count the length of a UTF-8 string I use

$count = preg_match_all("/[[:print:]\pL]/u", $str, $pockets);

where
[:print:] : printing characters, including space
\pL : any Unicode letter
/u : treat pattern and subject as UTF-8
Other Unicode character properties are listed at http://www.pcre.org/pcre.txt
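
As an alternative sketch (not the note author's pattern), every code point can be counted with `.` under the `/su` modifiers. The sample string is made up and written with `\u{...}` escapes, which assumes PHP 7.0 or later.

```php
<?php
$str = "h\u{e9}llo w\u{f6}rld"; // "héllo wörld": 11 characters, 13 bytes

// With /u, "." consumes one whole code point; /s lets it match newlines too.
$chars = preg_match_all('/./su', $str);

var_dump($chars, strlen($str));
```

Unlike the character-class version, this counts control characters and symbols as well, so the result matches the number of code points rather than the number of printable characters.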

Here is a function that replaces every occurrence of a number in a string by that number minus one (number--):

<?php
function decremente_chaine($chaine)
{
    // get all occurrences of numbers together with their offsets
    preg_match_all("/[0-9]+/", $chaine, $out, PREG_OFFSET_CAPTURE);
    // walk through the occurrences
    for ($i = 0; $i < sizeof($out[0]); $i++) {
        $longueurnombre = strlen((string)$out[0][$i][0]);
        $taillechaine = strlen($chaine);
        // split the string into 3 pieces
        $debut = substr($chaine, 0, $out[0][$i][1]);
        $milieu = ($out[0][$i][0]) - 1;
        $fin = substr($chaine, $out[0][$i][1] + $longueurnombre, $taillechaine);
        // for 10, 100, 1000, etc. shift the later offsets by 1, because the result has one digit fewer
        // (anchored with ^ so that numbers such as 210 do not trigger the shift)
        if (preg_match('#^10+$#', $out[0][$i][0])) {
            for ($j = $i + 1; $j < sizeof($out[0]); $j++) {
                $out[0][$j][1] = $out[0][$j][1] - 1;
            }
        }
        $chaine = $debut . $milieu . $fin;
    }
    return $chaine;
}
?>

This is a function to convert byte offsets into (UTF-8) character offsets (this is regardless of whether you use the /u modifier):

// convert the character offset passed by the caller into the byte offset preg_match_all() expects
$pn_offset = strlen(mb_substr($ps_subject, 0, $pn_offset, $ps_encoding));
$ret = preg_match_all($ps_pattern, $ps_subject, $pa_matches, $pn_flags, $pn_offset);
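
The reverse direction, turning a byte offset reported by PREG_OFFSET_CAPTURE into a character offset, might look like the sketch below. `byte_to_char_offset` is a hypothetical helper, the subject is made up, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper; requires the mbstring extension.
function byte_to_char_offset(string $subject, int $byteOffset, string $encoding = 'UTF-8'): int
{
    // Count the characters contained in the bytes that precede the match.
    return mb_strlen(substr($subject, 0, $byteOffset), $encoding);
}

$subject = "h\u{e9}llo w\u{f6}rld"; // "héllo wörld"
preg_match_all('/w\x{f6}rld/u', $subject, $m, PREG_OFFSET_CAPTURE);

$byteOffset = $m[0][0][1];                                // offset in bytes
$charOffset = byte_to_char_offset($subject, $byteOffset); // offset in characters
var_dump($byteOffset, $charOffset);
```

The two values differ by one here because "é" occupies two bytes in UTF-8.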
