Java regex any symbol

Regular Expressions: how to accept any symbol

I want to replace any content in my text file in between the symbols < and >. What’s the regular expression to accept any symbol? I currently have:

fields[i] = fields[i].replaceAll("\\<[a-z0-9_-]*\\>", "");

But it works only for letters and numbers; if there is a symbol in between < and >, the string is not replaced. Thanks.

As a suggestion, when you write a regex and don’t want to double the backslashes, do this: @"\<[a-z0-9_-]*\>". The @ prefix (a C# verbatim string) disables escape expansion.

4 Answers

To accept any symbol, .* should do the trick.

E.g.: fields[i] = fields[i].replaceAll("\\<.*?\\>", "");

Mine is a little better only because the terminator is one character long. If the terminator had been, for example, "/>", then his solution (using the non-greedy .*?) would have been better.

Try this [^\>]* (any character that isn’t > )
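As a minimal Java sketch of the negated-class suggestion above (the class and method names are made up for illustration), the pattern <[^>]* > without the space, i.e. "<[^>]*>", removes every bracketed span whatever symbols it contains:

```java
public class AngleBracketStripper {

    // Removes every "<...>" span. [^>]* accepts any character except the
    // closing '>', so symbols, digits and even line breaks are covered.
    static String stripBracketed(String input) {
        return input.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripBracketed("a<b-1>c<$%&>d")); // prints "acd"
    }
}
```

Neither < nor > is a regex metacharacter in Java, so escaping them is optional here.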

Any character in a regexp is "." and "*" is a quantifier that says how many. So if you want just one character, use "." (dot) and that’s it.

This is generic for the bigger-picture approach, say you wanted to clean out (or select) any symbols from a string.

A cleaner approach will be to select anything that is not alphanumeric, which by elimination must be a symbol, simply by using /\W/ , see [1]. The regex will be

let re = /\W/g
// for example, given a string and you would like to
// clean out any non-alphanumerics
// remember this will include the spaces
let s = "he$$llo# worl??d!"
s = s.replace(re, '') // "helloworld"

However, if you need to remove all non-alphanumerics except a few, say the space from our previous example, you can use the [^...] (hat, or negated character class) pattern.

let re = /[^ \w]/g // match everything else except space and \w (alphanumeric)
let s = "he$$llo# worl??d!"
s = s.replace(re, '') // "hello world"
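Since the question is about Java, here is a rough Java counterpart of the two JavaScript snippets above. The class and method names are illustrative assumptions; the only API used is String.replaceAll.

```java
public class SymbolCleaner {

    // Drops everything that is not a word character (letter, digit, underscore).
    static String removeNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    // Same idea, but spaces are kept because they are listed in the negated class.
    static String removeNonWordKeepSpaces(String s) {
        return s.replaceAll("[^ \\w]", "");
    }

    public static void main(String[] args) {
        String s = "he$$llo# worl??d!";
        System.out.println(removeNonWord(s));           // prints "helloworld"
        System.out.println(removeNonWordKeepSpaces(s)); // prints "hello world"
    }
}
```

Note that \w also covers the underscore, so "alphanumeric" is a slight simplification.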


Match Any Character Using Regex In Java

In this short tutorial, we are going to shed light on how to match any character using regex in Java.

First, we will explain how to use a regular expression to match any single character. Then, we are going to showcase how to find multiple matches.

Finally, we will illustrate how to exclude and escape specific characters.

Regex to Match Any Character

Typically, we can use the dot/period pattern “.” to match a single character once.

In Java, the matched character can be any char except line terminators. However, we can address this limitation using the Pattern.DOTALL flag.


The default behavior of the dot changes depending on whether we combine it with other patterns.

For example, we can combine the dot with the end-of-input anchor, as in ".$", to remove the last character in a string.
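A small sketch of that idea, assuming the intended pattern is the dot followed by the $ anchor (the class and method names are hypothetical):

```java
public class LastCharRemover {

    // ".$" matches one character that sits immediately before the end of the input.
    static String dropLast(String s) {
        return s.replaceAll(".$", "");
    }

    public static void main(String[] args) {
        System.out.println(dropLast("hello")); // prints "hell"
    }
}
```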

Pattern   Description
.         matches any single character except a line terminator
.?        matches any character except a line terminator, zero or one time
.+        matches any character except a line terminator, one or more times
.*        matches any character except a line terminator, zero or more times
\.        matches the dot character itself
A.B       matches a string starting with A, followed by any single character, and ending with B

Basically, Java provides the Pattern class to denote a compiled regular expression.

So, let’s see how we can use it to compile a regex that matches any single character:

@Test
public void matchAnyCharacterUsingRegex() {
    assertTrue(Pattern.matches(".", "A"));
    // any char except new line
    assertFalse(Pattern.matches(".", "\n"));
    // using Pattern.DOTALL to match new line
    assertTrue(Pattern.compile(".", Pattern.DOTALL)
        .matcher("\n")
        .matches());
    assertTrue(Pattern.matches(".?", "C"));
    assertFalse(Pattern.matches(".?", "CD"));
    assertTrue(Pattern.matches(".+", "ABC"));
    assertTrue(Pattern.matches(".*", "Z"));
    assertTrue(Pattern.matches("A.Z", "AYZ"));
    assertFalse(Pattern.matches("A.F", "AGH"));
}

Match Multiple Characters

The quantifier “*” (asterisk) matches the preceding element zero or more times. Combined with the dot, it provides the easiest way to match any number of characters that are not line terminators.

For instance, we can use it with the dot ”.”, or the class “[]” patterns:

Pattern     Description
B.*Y        finds a string that starts with B, followed by any number of chars, and ends with Y
[0-9]*      matches zero or more digits
[a-z]*      matches zero or more lowercase letters
[A-Z]*      matches zero or more uppercase letters
[a-zA-Z]*   matches any number of letters

Now, let’s create a test case to exemplify how to use the asterisk symbol to find any number of chars:

@Test
public void matchMultipleCharacterUsingRegex() {
    assertTrue(Pattern.matches("[0-9]*", "12345"));
    // letters are not digits, so the whole input does not match
    assertFalse(Pattern.matches("[0-9]*", "123ABC"));
    assertTrue(Pattern.matches("[a-z]*", "abcd"));
    assertTrue(Pattern.matches("[A-Z]*", "XYZ"));
    assertTrue(Pattern.matches("[a-zA-Z]*", "yzAB"));
}
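Pattern.matches only succeeds when the whole input matches, which is why the "123ABC" case above is rejected. When the goal is to pull matching runs out of a longer string, Matcher.find() can be used instead. The following is only a sketch with made-up names; it uses + rather than * so that empty matches are not collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitRunFinder {

    // One or more consecutive digits.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Collects every run of digits found anywhere in the input.
    static List<String> digitRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(input);
        while (matcher.find()) {
            runs.add(matcher.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(digitRuns("123ABC45")); // prints [123, 45]
    }
}
```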

Match Range of Characters

Furthermore, we can use the square brackets with a hyphen to match a range of characters.

The hyphen acts as a range delimiter as it separates the starting char and the ending char.

For instance, we can use a regex with the [0-9] pattern to match only numbers.

Pattern            Description
[0-4][6-8]         matches a number between 0 and 4, followed by a number ranging from 6 to 8
[a-z][1-6]         finds a lowercase character followed by a number between 1 and 6
[c-d][1-5][A-N]    matches a char ranging between c and d, a number between 1 and 5, and an uppercase letter ranging from A to N

Now, let’s demonstrate how to find a set of chars ranging between two given characters:

@Test
public void matchRangeOfCharacterUsingRegex() {
    assertTrue(Pattern.matches("[0-4][6-8]", "17"));
    // 9 is outside the 6-8 range
    assertFalse(Pattern.matches("[0-4][6-8]", "19"));
    assertTrue(Pattern.matches("[a-z]zhwani5", "azhwani5"));
    assertTrue(Pattern.matches("[a-z][A-Z]", "iN"));
}
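Several ranges can share one pair of brackets. As an illustration (the hexadecimal use case and the names below are assumptions, not part of the tutorial), this sketch checks that a string consists only of hex digits:

```java
public class HexChecker {

    // Three ranges in one class: digits, lowercase a-f, uppercase A-F.
    static boolean isHex(String s) {
        return s.matches("[0-9a-fA-F]+");
    }

    public static void main(String[] args) {
        System.out.println(isHex("1aF")); // prints true
        System.out.println(isHex("1g"));  // prints false
    }
}
```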

Excluding Specific Characters

We can put the excluded characters inside the brackets prefixed by a caret [^..]. However, specifying the caret outside the brackets will mean the start of a string.

For example, [^abc] will match all chars except a, b, and c.

Please notice that the caret must be inside the brackets. Otherwise, the pattern will have another meaning.

Pattern   Description
[^A]      matches any character except A
[^0-9]    matches a character that is not a digit
[^A-Z]    matches a character that is not an uppercase letter

Finally, we are going to see how to exclude characters using a regular expression in Java:

@Test
public void excludeCharactersUsingRegex() {
    assertTrue(Pattern.matches("[^a-z]", "A"));
    assertFalse(Pattern.matches("[^0-1]", "1"));
    assertTrue(Pattern.matches("[^A-Z]", "z"));
}
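To make the two meanings of the caret concrete, here is a hedged sketch (hypothetical class and method names) that contrasts "^a", where the caret anchors the match at the start of the input, with "[^a]", where it negates the class:

```java
import java.util.regex.Pattern;

public class CaretMeaning {

    // Caret outside brackets: the input must begin with 'a'.
    static boolean startsWithA(String s) {
        return Pattern.compile("^a").matcher(s).find();
    }

    // Caret inside brackets: some character other than 'a' must occur.
    static boolean containsNonA(String s) {
        return Pattern.compile("[^a]").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithA("abc"));  // prints true
        System.out.println(startsWithA("bca"));  // prints false
        System.out.println(containsNonA("aaa")); // prints false
        System.out.println(containsNonA("aab")); // prints true
    }
}
```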

Escaping Special Characters

Sometimes, we want to match a character that has a special meaning in regular expressions such as dot, backslash, or caret.


To achieve this, we need to prefix the matched char with a backslash. For instance, to match a dot, we need to use the pattern “\.”.
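In Java source code the backslash itself has to be doubled inside a string literal, so the regex \. is written "\\.". The sketch below (illustrative names only) shows that, plus Pattern.quote as an alternative when the text to match literally is not known in advance:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralDot {

    // "\\." in Java source is the regex \. which matches a literal dot.
    static String[] splitOnDot(String s) {
        return s.split("\\.");
    }

    // Pattern.quote turns an arbitrary string into a literal pattern.
    static String[] splitOnLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDot("a.b.c")));          // prints [a, b, c]
        System.out.println(Arrays.toString(splitOnLiteral("1+2+3", "+"))); // prints [1, 2, 3]
    }
}
```

Without the escape, split(".") treats every character as a delimiter and returns an empty array.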

Regex Special Characters List

Conclusion

To sum it up, in this tutorial we explained how to match any character using regex in Java.

Along the way, we have seen how to use regular expressions to match multiple chars.

Lastly, we showcased how to exclude and escape specific characters.



Regex – Match Any Character(s)

In regular expressions, we can match any character using the period "." character. To match multiple characters or a given set of characters, we should use character classes.

1. Matching a Single Character Using Regex

By default, the ‘.’ dot character in a regular expression matches a single character without regard to what character it is, with the exception of line terminators. The matched character can be a letter, a number, or any special character.

To create more meaningful patterns, we can combine the dot character with other regular expression constructs.

Pattern    Description
. (Dot)    Matches only a single character.
A.B        Matches only a single character at second place in a 3 character long string where the string starts with ‘A’ and ends with ‘B’.
[abc]      Matches only a single character from a set of given characters.
[aA]       Matches only a single character ‘a’, case-insensitive.
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern.compile(".").matcher("a").matches();        //true
        Pattern.compile(".").matcher("ab").matches();       //false
        Pattern.compile("A.B").matcher("AIB").matches();    //true
        Pattern.compile("A.B").matcher("ABI").matches();    //false
        Pattern.compile("A[abc]B").matcher("AaB").matches(); //true
        Pattern.compile("A[abc]B").matcher("AkB").matches(); //false
    }
}
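Listing both cases as in [aA] works for one letter but gets tedious for whole words. One alternative, sketched here with hypothetical names, is the Pattern.CASE_INSENSITIVE flag:

```java
import java.util.regex.Pattern;

public class CaseInsensitiveMatch {

    // Compiles the regex with CASE_INSENSITIVE, which by default applies to ASCII letters.
    static boolean matchesIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(input)
                .matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesIgnoringCase("a", "A"));         // prints true
        System.out.println(matchesIgnoringCase("hello", "HeLLo")); // prints true
        System.out.println(matchesIgnoringCase("hello", "help"));  // prints false
    }
}
```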

2. Matching Range of Characters

If we want to match a range of characters at any place, we need to use character classes with a hyphen between the range. e.g. ‘[a-f]’ will match a single character which can be either of ‘a’, ‘b’, ‘c’, ‘d’, ‘e’ or ‘f’.

Pattern     Description
[a-f]       Matches only a single character in the range from ‘a’ to ‘f’.
[a-z]       Matches only a single lowercase character in the range from ‘a’ to ‘z’.
[A-Z]       Matches only a single uppercase character in the range from ‘A’ to ‘Z’.
[a-zA-Z]    Matches only a single character in the range from ‘a’ to ‘z’, case-insensitive.
[0-9]       Matches only a single number in the range from ‘0’ to ‘9’.
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        System.out.println(Pattern.compile("[a-f]").matcher("b").matches());    //true
        System.out.println(Pattern.compile("[a-f]").matcher("g").matches());    //false
        System.out.println(Pattern.compile("[a-zA-Z]").matcher("a").matches()); //true
        System.out.println(Pattern.compile("[a-zA-Z]").matcher("B").matches()); //true
        System.out.println(Pattern.compile("[a-zA-Z]").matcher("4").matches()); //false
        System.out.println(Pattern.compile("[0-9]").matcher("9").matches());    //true
        System.out.println(Pattern.compile("[0-9]").matcher("91").matches());   //false
    }
}

3. Matching Multiple Characters

If we want to match a set of characters any number of times, we need to use the quantifier ‘*’ (asterisk), which matches 0 or more occurrences of the preceding element.

Pattern         Description
.*              Matches any number of characters including special characters.
[0-9]*          Matches any number of digits.
[a-zA-Z]*       Matches any number of letters.
[a-zA-Z0-9]*    Matches any number of alphanumeric characters.
Pattern.compile(".*").matcher("abcd").matches();              //true
Pattern.compile("[a-zA-Z]*").matcher("abcd").matches();       //true
Pattern.compile("[0-9]*").matcher("01234").matches();         //true
Pattern.compile("[a-zA-Z0-9]*").matcher("a1b2c3").matches();  //true
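Because * also accepts zero occurrences, a pattern built with it matches the empty string. When at least one character is required, + is the usual choice. A brief sketch with made-up method names:

```java
public class StarVersusPlus {

    // '*' allows zero repetitions, so the empty string is accepted.
    static boolean zeroOrMoreDigits(String s) {
        return s.matches("[0-9]*");
    }

    // '+' needs at least one repetition, so the empty string is rejected.
    static boolean oneOrMoreDigits(String s) {
        return s.matches("[0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(zeroOrMoreDigits(""));     // prints true
        System.out.println(oneOrMoreDigits(""));      // prints false
        System.out.println(oneOrMoreDigits("01234")); // prints true
    }
}
```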



Java regex for any symbol?

Is there a regex which accepts any symbol? EDIT: To clarify what I’m looking for: I want to build a regex which will accept ANY number of whitespace characters, and it must contain at least 1 symbol (e.g. , . " ' $ £ etc.) or (not exclusive or) at least 1 character.

@Ulkmum: See my answer: you are including things that Java has trouble with, because they’re in its native character set instead of the legacy character set. If you have to deal with any of these: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~¡¢£¤¥¦§¨©«¬®¯°±´¶·¸»¿×÷˂˃˄˅˘˙˚˜˝϶҂՚׀׃׆׳״‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‹›‼‽‾‿⁀ then you must use my fancier formulations.

@aioobe: In Java (but not in Perl) the pattern ^\s*\S+$ “succeeds” against "\t\n " . I find that counterintuitive to the point of being wrong: obviously it should fail, not succeed. Nothing but the casuistry of a language-lawyer paid off by the Evil Empire could make anyone believe otherwise. It is simply nuts!

2 Answers

Yes. The dot ( . ) will match any symbol, at least if you use it in conjunction with Pattern.DOTALL flag (otherwise it won’t match new-line characters). From the docs:

In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
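A minimal sketch of the difference the flag makes, using hypothetical names and a bracketed span that contains a line break:

```java
import java.util.regex.Pattern;

public class DotAllDemo {

    // Default mode: '.' stops at line terminators, so a span with a newline is left alone.
    static String stripDefault(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // DOTALL mode: '.' also matches '\n'. The inline form "(?s)<.*?>" is equivalent.
    static String stripDotAll(String s) {
        return Pattern.compile("<.*?>", Pattern.DOTALL).matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "a<b\nc>d";
        System.out.println(stripDefault(text).equals(text)); // prints true (nothing removed)
        System.out.println(stripDotAll(text));               // prints "ad"
    }
}
```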

I want to build a regex which will accept ANY number of whitespace characters, and it must contain at least 1 symbol (e.g. , . " ' $ £ etc.) or (not exclusive or) at least 1 character.

  • \s* any number of whitespace characters
  • \S+ one or more ("at least one") non-whitespace characters
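Putting the two pieces together as \s*\S+ gives something like the following sketch. The class and method names are illustrative, and only ASCII whitespace is exercised here, since Java's default \s does not cover every Unicode space.

```java
public class WhitespaceThenSymbols {

    // Optional leading whitespace followed by at least one non-whitespace character.
    // String.matches requires the whole input to match.
    static boolean accepts(String s) {
        return s.matches("\\s*\\S+");
    }

    public static void main(String[] args) {
        System.out.println(accepts("   $")); // prints true
        System.out.println(accepts("abc"));  // prints true
        System.out.println(accepts("   "));  // prints false
        System.out.println(accepts("a b"));  // prints false (whitespace after the first token)
    }
}
```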

Right, so a regex that would accept strings which contain any number of whitespaces and ATLEAST 1 word and any number of symbols would be. \\s*\\p[\\p\\s]* . where does the dot go?

Strictly speaking LF and CR are control codes not symbols but you’re still correct in that . won’t match every possible character value.

Aren’t we confusing "symbol" with "character"? I interpreted "symbol" in the question as "non-alphanumeric character".

Generally when you ask for help with regular expressions, it helps a lot if you provide a few examples of strings that should match, and a few examples of strings that should not match.

In Java, a symbol is \pS , which is not the same as punctuation characters, which are \pP .
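To illustrate the distinction, here is a small sketch (names are made up) that tests single characters against the Unicode general categories S (symbol) and P (punctuation). It uses the braced spelling \p{S} and \p{P}:

```java
public class SymbolVersusPunctuation {

    // True when the one-character string belongs to a Unicode symbol category.
    static boolean isSymbol(String ch) {
        return ch.matches("\\p{S}");
    }

    // True when the one-character string belongs to a Unicode punctuation category.
    static boolean isPunctuation(String ch) {
        return ch.matches("\\p{P}");
    }

    public static void main(String[] args) {
        System.out.println(isSymbol("$"));      // prints true  (currency symbol)
        System.out.println(isSymbol("+"));      // prints true  (math symbol)
        System.out.println(isSymbol("!"));      // prints false
        System.out.println(isPunctuation("!")); // prints true
        System.out.println(isPunctuation("$")); // prints false
    }
}
```

A class such as [\p{S}\p{P}] would cover both groups at once.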

I talk about this issue, plus enumerate the types for all the ASCII punctuation and symbols, here in this answer.

Patterns like [\p\s] only work on legacy datasets from the 1960s. To work on things in Java's native character set, you need something on the order of

identifier_charclass = "[\\pL\\pM\\p\\p\\p[\\p&&\\p]]";
whitespace_charclass = "[\\u000A\\u000B\\u000C\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008\\u2009\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]";
ident_or_white = "[" + identifier_charclass + whitespace_charclass + "]";

I’m sorry that Java makes it so difficult to work with modern dataset, but at least it is possible.

Just don’t ask about boundaries or grapheme clusters. For that, see my other postings.

