- Regular Expressions: how to accept any symbol
- 4 Answers
- Match Any Character Using Regex In Java
- Regex to Match Any Character
- Match Multiple Characters
- Match Range of Characters
- Excluding Specific Characters
- Escaping Special Characters
- Conclusion
- Regex – Match Any Character(s)
- Java regex for any symbol?
- 2 Answers
Regular Expressions: how to accept any symbol
I want to replace any content in my text file between the symbols < and >. What's the regular expression to accept any symbol? I currently have:
    fields[i] = fields[i].replaceAll("\\<[a-z0-9_-]*\\>", "");
But it works only for letters and numbers; if there is a symbol between < and >, the string is not replaced. Thanks.
As a suggestion, when you write a regex and don't want to escape the backslashes, do this: @"\<[a-z0-9_-]*\>". The @ disables escape expansion.
4 Answers
To accept any symbol, .* should do the trick.
E.g.: fields[i] = fields[i].replaceAll("\\<.*?\\>", "");
Mine is a little better only because the "terminator" is one character long. If the terminator had been, for example, "/>" (without quotes), then his solution (using the non-greedy .*?) would have been better.
Try this [^\>]* (any character that isn’t > )
In a regex, "." (dot) matches any character, and "*" is a quantifier that says how many times. So if you want just one character, use "." and that's it.
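The trade-off between the greedy, non-greedy, and negated-class patterns discussed in these answers can be sketched in Java as follows. This is only an illustration, not the posters' code: the class name `StripTags`, its helper methods, and the sample input are assumptions made up for the example.

```java
public class StripTags {
    // Greedy: .* runs to the LAST '>' in the input,
    // so the text between two tags is removed as well.
    static String stripGreedy(String s) {
        return s.replaceAll("<.*>", "");
    }

    // Non-greedy: .*? stops at the FIRST '>' after each '<'.
    static String stripLazy(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // Negated class: any run of characters that are not '>'.
    static String stripNegated(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String s = "a<b!>c<d?>e"; // hypothetical sample input
        System.out.println(stripGreedy(s));  // ae
        System.out.println(stripLazy(s));    // ace
        System.out.println(stripNegated(s)); // ace
    }
}
```

One difference worth keeping in mind: `[^>]` also matches line terminators, while `.` does not unless `Pattern.DOTALL` is enabled, so the two "ace" variants can behave differently on multi-line input.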
This is generic for the bigger-picture approach, say you wanted to clean out (or select) any symbols from a string.
A cleaner approach will be to select anything that is not alphanumeric, which by elimination must be a symbol, simply by using /\W/ , see [1]. The regex will be
    let re = /\W/g
    // for example, given a string and you would like to
    // clean out any non-alphanumerics
    // remember this will include the spaces
    let s = "he$$llo# worl??d!"
    s = s.replace(re, '') // "helloworld"
However, if you need to remove all non-alphanumerics except a few, say the space from our previous example, you can use the [^...] (hat) pattern.
    let re = /[^ \w]/g
    // match everything else except space and \w (alphanumeric)
    let s = "he$$llo# worl??d!"
    s = s.replace(re, '') // "hello world"
Match Any Character Using Regex In Java
In this short tutorial, we are going to shed light on how to match any character using regex in Java.
First, we will explain how to use a regular expression to match any single character. Then, we are going to showcase how to find multiple matches.
Finally, we will illustrate how to exclude and escape specific characters.
Regex to Match Any Character
Typically, we can use the dot/period pattern “.” to match a single character once.
In Java, the matched character can be any char except line terminators. However, we can address this limitation using the Pattern.DOTALL flag.
What the dot matches in practice depends on the patterns we combine it with.
For example, we can combine the dot with the end-of-input anchor, as in ".$", to match (and then remove) the last character of a string.
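As a small sketch of the dot combined with the end anchor, the snippet below drops the last character of a string. The class name `DotWithAnchor` and the helper `dropLastChar` are hypothetical names chosen for this illustration.

```java
public class DotWithAnchor {
    // ".$" matches one character that sits directly before the end of the input.
    static String dropLastChar(String s) {
        return s.replaceAll(".$", "");
    }

    public static void main(String[] args) {
        System.out.println(dropLastChar("hello")); // hell
        System.out.println(dropLastChar("").isEmpty()); // true, nothing to remove
    }
}
```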
Pattern Example | Description |
---|---|
. | matches a single character except a line terminator |
.? | matches any character except a line terminator, zero or one time |
.+ | matches any character except a line terminator, one or more times |
.* | matches any character except a line terminator, zero or more times |
\. | matches the dot character itself |
A.B | a string starting with A, followed by any single char, and ending with B |
Basically, Java provides the Pattern class to denote a compiled regular expression.
So, let’s see how we can use it to compile a regex that matches any single character:
    @Test
    public void matchAnyCharacterUsingRegex() {
        assertTrue(Pattern.matches(".", "A"));
        // any char except new line
        assertFalse(Pattern.matches(".", "\n"));
        // using Pattern.DOTALL to match new line
        assertTrue(Pattern.compile(".", Pattern.DOTALL)
            .matcher("\n")
            .matches());
        assertTrue(Pattern.matches(".?", "C"));
        assertFalse(Pattern.matches(".?", "CD"));
        assertTrue(Pattern.matches(".+", "ABC"));
        assertTrue(Pattern.matches(".*", "Z"));
        assertTrue(Pattern.matches("A.Z", "AYZ"));
        assertFalse(Pattern.matches("A.F", "AGH"));
    }
Match Multiple Characters
The asterisk "*" is a quantifier: it matches the preceding element zero or more times. Combined with the dot, it provides the easiest way to match any number of characters that are not line terminators.
For instance, we can use it with the dot "." or the class "[]" patterns:
Pattern Example | Description |
---|---|
B.*Y | finds a string that starts with B, followed by any number of chars, and ends with Y |
[0-9]* | zero or more digits only |
[a-z]* | matches zero or more lowercase letters |
[A-Z]* | matches zero or more uppercase letters |
[a-zA-Z]* | matches any number of letters |
Now, let’s create a test case to exemplify how to use the asterisk symbol to find any number of chars:
    @Test
    public void matchMultipleCharacterUsingRegex() {
        assertTrue(Pattern.matches("[0-9]*", "12345"));
        assertFalse(Pattern.matches("[0-9]*", "123ABC"));
        assertTrue(Pattern.matches("[a-z]*", "abcd"));
        assertTrue(Pattern.matches("[A-Z]*", "XYZ"));
        assertTrue(Pattern.matches("[a-zA-Z]*", "yzAB"));
    }
Match Range of Characters
Furthermore, we can use the square brackets with a hyphen to match a range of characters.
The hyphen acts as a range delimiter as it separates the starting char and the ending char.
For instance, we can use a regex with the [0-9] pattern to match only digits.
Pattern Example | Description |
---|---|
[0-4][6-8] | matches a digit between 0 and 4, followed by a digit ranging from 6 to 8 |
[a-z][1-6] | finds a lowercase character followed by a digit between 1 and 6 |
[c-d][1-5][A-N] | matches a char ranging between c and d, a digit between 1 and 5, and an uppercase letter ranging from A to N |
Now, let’s demonstrate how to find a set of chars ranging between two given characters:
    @Test
    public void matchRangeOfCharacterUsingRegex() {
        assertTrue(Pattern.matches("[0-4][6-8]", "17"));
        assertFalse(Pattern.matches("[0-4][6-8]", "19"));
        assertTrue(Pattern.matches("[a-z]zhwani5", "azhwani5"));
        assertTrue(Pattern.matches("[a-z][A-Z]", "iN"));
    }
Excluding Specific Characters
We can put the excluded characters inside brackets prefixed by a caret: [^...]. However, a caret placed outside the brackets means the start of a string.
For example, [^abc] will match all chars except a, b, and c.
Please notice that the caret must be inside the brackets. Otherwise, the pattern will have another meaning.
Pattern Example | Description |
---|---|
[^A] | matches any single character except A |
[^0-9] | matches a character that is not a digit |
[^A-Z] | matches a character that is not an uppercase letter |
Finally, we are going to see how to exclude characters using a regular expression in Java:
    @Test
    public void excludeCharactersUsingRegex() {
        assertTrue(Pattern.matches("[^a-z]", "A"));
        assertFalse(Pattern.matches("[^0-1]", "1"));
        assertTrue(Pattern.matches("[^A-Z]", "z"));
    }
Escaping Special Characters
Sometimes, we want to match a character that has a special meaning in regular expressions such as dot, backslash, or caret.
To achieve this, we need to prefix the matched char with a backslash. For instance, to match a dot, we need to use the pattern “\.”.
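A short sketch of escaping in Java follows. Note that inside a Java string literal the backslash itself must be doubled, so the regex `\.` is written `"\\."`. `Pattern.quote` is a standard `java.util.regex` method that escapes a whole literal at once; the class name `EscapeDemo` and its helper methods are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // "\\." in Java source is the regex \. and matches only a literal dot.
    static boolean isLiteralDot(String s) {
        return Pattern.matches("\\.", s);
    }

    // Pattern.quote turns arbitrary text into a pattern that matches it literally.
    static boolean equalsLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(isLiteralDot("."));              // true
        System.out.println(isLiteralDot("a"));              // false
        System.out.println(Pattern.matches("a.b", "axb"));  // true, unescaped dot is a wildcard
        System.out.println(equalsLiterally("a.b", "axb"));  // false
        System.out.println(equalsLiterally("a.b", "a.b"));  // true
    }
}
```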
Conclusion
To sum it up, in this tutorial we explained how to match any character using regex in Java.
Along the way, we have seen how to use regular expressions to match multiple chars.
Lastly, we showcased how to exclude and escape specific characters.
Regex – Match Any Character(s)
In regular expressions, we can match any character using the period "." character. To match multiple characters or a given set of characters, we should use character classes.
1. Matching a Single Character Using Regex
By default, the '.' dot character in a regular expression matches a single character without regard to what character it is (in Java, a line terminator is excluded unless Pattern.DOTALL is set). The matched character can be a letter, a digit, or any special character.
To create more meaningful patterns, we can combine the dot character with other regular expression constructs.
Pattern | Description |
---|---|
. (Dot) | Matches only a single character. |
A.B | Matches only a single character at second place in a 3 character long string where the string starts with ‘A’ and ends with ‘B’. |
[abc] | Matches only a single character from a set of given characters. |
[aA] | Matches only a single character ‘a’, case-insensitive. |
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            Pattern.compile(".").matcher("a").matches();          // true
            Pattern.compile(".").matcher("ab").matches();         // false
            Pattern.compile("A.B").matcher("AIB").matches();      // true
            Pattern.compile("A.B").matcher("ABI").matches();      // false
            Pattern.compile("A[abc]B").matcher("AaB").matches();  // true
            Pattern.compile("A[abc]B").matcher("AkB").matches();  // false
        }
    }
2. Matching Range of Characters
If we want to match a range of characters at any place, we need to use character classes with a hyphen between the range. e.g. ‘[a-f]’ will match a single character which can be either of ‘a’, ‘b’, ‘c’, ‘d’, ‘e’ or ‘f’.
Pattern | Description |
---|---|
[a-f] | Matches only a single character in the range from ‘a’ to ‘f’. |
[a-z] | Matches only a single lowercase character in the range from ‘a’ to ‘z’. |
[A-Z] | Matches only a single uppercase character in the range from ‘A’ to ‘Z’. |
[a-zA-Z] | Matches only a single character in the range from ‘a’ to ‘z’, case-insensitive. |
[0-9] | Matches only a single digit in the range from ‘0’ to ‘9’. |
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            System.out.println(Pattern.compile("[a-f]").matcher("b").matches());     // true
            System.out.println(Pattern.compile("[a-f]").matcher("g").matches());     // false
            System.out.println(Pattern.compile("[a-zA-Z]").matcher("a").matches());  // true
            System.out.println(Pattern.compile("[a-zA-Z]").matcher("B").matches());  // true
            System.out.println(Pattern.compile("[a-zA-Z]").matcher("4").matches());  // false
            System.out.println(Pattern.compile("[0-9]").matcher("9").matches());     // true
            System.out.println(Pattern.compile("[0-9]").matcher("91").matches());    // false
        }
    }
3. Matching Multiple Characters
If we want to match any number of characters, we need to use the quantifier ‘*’ (asterisk), which matches the preceding element 0 or more times.
Pattern | Description |
---|---|
.* | Matches any number of characters including special characters. |
[0-9]* | Matches any number of digits. |
[a-zA-Z]* | Matches any number of alphabets. |
[a-zA-Z0-9]* | Matches any number of alphanumeric characters. |
    Pattern.compile(".*").matcher("abcd").matches();                 // true
    Pattern.compile("[a-zA-Z]*").matcher("abcd").matches();          // true
    Pattern.compile("[0-9]*").matcher("01234").matches();            // true
    Pattern.compile("[a-zA-Z0-9]*").matcher("a1b2c3").matches();     // true
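Because the asterisk allows zero repetitions, a pattern built only from `*` also accepts the empty string. When at least one character is required, the `+` quantifier is the usual alternative. The sketch below contrasts the two; the class name `StarVsPlus` and its methods are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

public class StarVsPlus {
    // Zero or more digits: the empty string qualifies.
    static boolean zeroOrMoreDigits(String s) {
        return Pattern.matches("[0-9]*", s);
    }

    // One or more digits: the empty string is rejected.
    static boolean oneOrMoreDigits(String s) {
        return Pattern.matches("[0-9]+", s);
    }

    public static void main(String[] args) {
        System.out.println(zeroOrMoreDigits(""));      // true
        System.out.println(oneOrMoreDigits(""));       // false
        System.out.println(zeroOrMoreDigits("01234")); // true
        System.out.println(oneOrMoreDigits("12a"));    // false
    }
}
```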
Java regex for any symbol?
Is there a regex which accepts any symbol? EDIT: To clarify what I’m looking for: I want to build a regex which will accept any number of whitespace characters, and it must contain at least 1 symbol (e.g. , . " ' $ £ etc.) or (not exclusive or) at least 1 character.
@Ulkmum: See my answer: you are including things that Java has trouble with, because they’re in its native character set instead of the legacy character set. If you have to deal with any of these: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~¡¢£¤¥¦§¨©«¬®¯°±´¶·¸»¿×÷˂˃˄˅˘˙˚˜˝϶҂՚׀׃׆׳״‐‑‒–—―‖‗‘’‚‛“”„‟†‡•‹›‼‽‾‿⁀ then you must use my fancier formulations.
@aioobe: In Java, but not in Perl, the pattern ^\s*\S+$ “succeeds” against "\t\n ". I find that counterintuitive to the point of being wrong: obviously it should fail, not succeed. Nothing but the casuistry of a language-lawyer paid off by the Evil Empire could make anyone believe otherwise. It is simply nuts!
2 Answers
Yes. The dot ( . ) will match any symbol, at least if you use it in conjunction with Pattern.DOTALL flag (otherwise it won’t match new-line characters). From the docs:
In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
I want to build a regex which will accept any number of whitespace characters, and it must contain at least 1 symbol (e.g. , . " ' $ £ etc.) or (not exclusive or) at least 1 character.
- \s* any number of whitespace characters
- \S+ one or more ("at least one") non-whitespace characters.
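Put together, the two pieces above give the pattern `\s*\S+`. The following is a minimal sketch of how it behaves with `matches()`, which requires the whole input to match. The class name `LeadingWhitespace` and the method `accepts` are assumptions made for the example, and the sketch deliberately sticks to ASCII input, since the comments in this thread point out that `\s` and `\S` in Java can surprise on non-ASCII whitespace.

```java
import java.util.regex.Pattern;

public class LeadingWhitespace {
    // Optional leading whitespace, then at least one non-whitespace character.
    private static final Pattern P = Pattern.compile("\\s*\\S+");

    static boolean accepts(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("   $"));   // true
        System.out.println(accepts("\tabc"));  // true
        System.out.println(accepts("   "));    // false, no non-whitespace character
        System.out.println(accepts("a b"));    // false, whitespace after the first token
    }
}
```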
Right, so a regex that would accept strings which contain any number of whitespace characters, at least 1 word, and any number of symbols would be: \\s*\\p
Strictly speaking LF and CR are control codes not symbols but you’re still correct in that . won’t match every possible character value.
Aren’t we confusing «symbol» with «character»? I interpreted «symbol» in the question as «non-alphanumeric character».
Generally when you ask for help with regular expressions, it helps a lot if you provide a few examples of strings that should match, and a few examples of strings that should not match.
In Java, a symbol is \pS , which is not the same as punctuation characters, which are \pP .
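The distinction can be checked directly with the Unicode general-category classes `\p{S}` (symbols) and `\p{P}` (punctuation), which `java.util.regex` supports. The sketch below is only an illustration: the class name `SymbolVsPunctuation` and its methods are hypothetical, and the sample characters are chosen for the example.

```java
import java.util.regex.Pattern;

public class SymbolVsPunctuation {
    // \p{S}: Unicode general category "Symbol" (math, currency, modifier, other).
    static boolean isSymbol(String ch) {
        return Pattern.matches("\\p{S}", ch);
    }

    // \p{P}: Unicode general category "Punctuation".
    static boolean isPunctuation(String ch) {
        return Pattern.matches("\\p{P}", ch);
    }

    // Either category, for callers who treat both as "symbols".
    static boolean isSymbolOrPunctuation(String ch) {
        return Pattern.matches("[\\p{S}\\p{P}]", ch);
    }

    public static void main(String[] args) {
        System.out.println(isSymbol("$"));              // true, currency symbol
        System.out.println(isPunctuation("$"));         // false
        System.out.println(isSymbol("!"));              // false
        System.out.println(isPunctuation("!"));         // true
        System.out.println(isSymbolOrPunctuation(",")); // true
    }
}
```

This is why a character list such as `, . " ' $ £` needs both classes: the comma, period, and quotes are punctuation, while `$` and `£` are symbols.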
I talk about this issue, plus enumerate the types for all the ASCII punctuation and symbols, here in this answer.
Patterns like [\p\s] only work on legacy datasets from the 1960s. To work with Java's native character set, you need something on the order of
    identifier_charclass = "[\\pL\\pM\\p\\p\\p[\\p&&\\p]]";
    whitespace_charclass = "[\\u000A\\u000B\\u000C\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008\\u2009\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]";
    ident_or_white = "[" + identifier_charclass + whitespace_charclass + "]";
I’m sorry that Java makes it so difficult to work with modern datasets, but at least it is possible.
Just don’t ask about boundaries or grapheme clusters. For that, see my other postings.