Java Regular Expressions
Validating a QQ number
public static boolean checkQQregex(String qq) {
    // true only if the whole string consists of at least 4 digits
    return qq != null && qq.matches("\\d{4,}");
}
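A small driver is a quick way to check the method. It also shows that `String.matches` succeeds only when the *entire* string matches the pattern, so a single non-digit anywhere makes the check fail. This is a sketch: the class name `QQCheckDemo` and the sample inputs are illustrative.

```java
class QQCheckDemo {
    // Same check as above: true only if the whole string is 4 or more digits
    static boolean checkQQregex(String qq) {
        return qq != null && qq.matches("\\d{4,}");
    }

    public static void main(String[] args) {
        System.out.println(checkQQregex("1234567")); // true: all digits, length >= 4
        System.out.println(checkQQregex("123"));     // false: fewer than 4 digits
        System.out.println(checkQQregex("12a456"));  // false: matches() must cover the whole string
        System.out.println(checkQQregex(null));      // false: guarded by the null check
    }
}
```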
Validating a mobile phone number
public static boolean checkPhoneregex(String str) {
    // 11 digits: "1", then a digit from 3 to 9, then 9 more digits
    return str != null && str.matches("1[3-9]\\d{9}");
}
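`str.matches(regex)` gives the same result as `Pattern.matches(regex, str)`, and that method compiles the regex on every call. When the same check runs many times, you can compile the pattern once and reuse it instead. `Pattern` instances are immutable and safe to share across threads, although `Matcher` instances are not. The following is a sketch of that approach. The names `PhoneCheck`, `PHONE`, and `checkPhone` are illustrative.

```java
import java.util.regex.Pattern;

class PhoneCheck {
    // Compiled once: "1", then a digit 3-9, then 9 more digits (11 digits total)
    private static final Pattern PHONE = Pattern.compile("1[3-9]\\d{9}");

    static boolean checkPhone(String str) {
        // Matcher.matches() requires the entire input to match, like String.matches()
        return str != null && PHONE.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(checkPhone("13812345678"));  // true
        System.out.println(checkPhone("12812345678"));  // false: second digit is 2
        System.out.println(checkPhone("138123456789")); // false: 12 digits
    }
}
```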
String.split and String.replaceAll
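private void name() {
    String str = "张三fs333李四hg444王五";
    // split() takes a regular expression as its delimiter: every substring that
    // matches the regex is a split point, and the pieces are returned as a String array
    String[] names = str.split("\\w+");            // ["张三", "李四", "王五"]
    // replaceAll() replaces every substring that matches the regex with the replacement
    String replaced = str.replaceAll("\\w+", "/"); // "张三/李四/王五"
    System.out.println(java.util.Arrays.toString(names));
    System.out.println(replaced);
}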
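The sketch below prints what the two calls return. In Java, `\w` matches only `[a-zA-Z_0-9]` by default. The Chinese names are therefore left untouched, and the runs `fs333` and `hg444` act as the separators. The class and helper names are illustrative.

```java
import java.util.Arrays;

class SplitReplaceDemo {
    // Splits at every run of word characters ([a-zA-Z_0-9]+)
    static String[] splitNames(String str) {
        return str.split("\\w+");
    }

    // Replaces every run of word characters with "/"
    static String replaceWords(String str) {
        return str.replaceAll("\\w+", "/");
    }

    public static void main(String[] args) {
        String str = "张三fs333李四hg444王五";
        System.out.println(Arrays.toString(splitNames(str))); // [张三, 李四, 王五]
        System.out.println(replaceWords(str));                // 张三/李四/王五
    }
}
```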
Extracting information from text with a regular expression
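import java.util.regex.Matcher;
import java.util.regex.Pattern;

String rs = "email 4221321@qq.com phone 13812345678 email 4221321@qq.com phone 13812345678 email 4221321@qq.com phone 13812345678 ";
// 1. Define the extraction rule: an email address (including its domain suffix, e.g. ".com") or an 11-digit mobile number
String regex = "(\\w+@\\w{2,10}(\\.\\w{2,10}){1,2})|(1[3-9]\\d{9})";
// 2. Compile the regular expression into a Pattern, a reusable matching rule
Pattern pattern = Pattern.compile(regex);
// 3. Get a Matcher that applies the pattern to the text
Matcher matcher = pattern.matcher(rs);
// 4. find() moves to the next match; group() returns the matched text
while (matcher.find()) {
    System.out.println(matcher.group());
}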
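Capturing groups can tell the two kinds of information apart. This sketch uses named groups (`(?<email>...)`, supported since Java 7) and `Matcher.group(String)`. For each match, only the alternative that actually matched has a non-null group. The group names, the class name, the `extract` helper, and the email pattern are assumptions for illustration. The email pattern also captures the domain suffix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractDemo {
    // Illustrative pattern: an email address (with domain suffix) or an 11-digit mobile number
    private static final Pattern INFO = Pattern.compile(
            "(?<email>\\w+@\\w{2,10}(\\.\\w{2,10}){1,2})|(?<phone>1[3-9]\\d{9})");

    // Sorts each match into the email list or the phone list
    static void extract(String text, List<String> emails, List<String> phones) {
        Matcher m = INFO.matcher(text);
        while (m.find()) {
            // Only the alternative that actually matched has a non-null group
            if (m.group("email") != null) {
                emails.add(m.group("email"));
            } else {
                phones.add(m.group("phone"));
            }
        }
    }

    public static void main(String[] args) {
        List<String> emails = new ArrayList<>();
        List<String> phones = new ArrayList<>();
        extract("email 4221321@qq.com phone 13812345678", emails, phones);
        System.out.println("emails: " + emails); // emails: [4221321@qq.com]
        System.out.println("phones: " + phones); // phones: [13812345678]
    }
}
```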
Summary of regular-expression constructs
For the complete list, see the Java API documentation for java.util.regex.Pattern.
Character classes
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
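The union, intersection, and subtraction forms above can be checked with single-character inputs. The sketch below does this. It also shows that a character class matches exactly one character unless a quantifier follows it. The class name is illustrative.

```java
class CharClassDemo {
    public static void main(String[] args) {
        // Union: a-d or m-p
        System.out.println("n".matches("[a-d[m-p]]"));   // true
        System.out.println("e".matches("[a-d[m-p]]"));   // false
        // Intersection: a-z AND one of d, e, f
        System.out.println("e".matches("[a-z&&[def]]")); // true
        System.out.println("a".matches("[a-z&&[def]]")); // false
        // Subtraction: a-z except b and c
        System.out.println("a".matches("[a-z&&[^bc]]")); // true
        System.out.println("b".matches("[a-z&&[^bc]]")); // false
        // A class matches exactly one character; add a quantifier for more
        System.out.println("abc".matches("[abc]"));      // false
        System.out.println("abc".matches("[abc]+"));     // true
    }
}
```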
Predefined character classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\h A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
\H A non-horizontal whitespace character: [^\h]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\v A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
\V A non-vertical whitespace character: [^\v]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
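A few of the predefined classes are exercised below, along with two flags that change their meaning. By default `\w` is ASCII-only, so it does not match Chinese characters. `Pattern.UNICODE_CHARACTER_CLASS` switches it to the Unicode definition. By default `.` does not match a line terminator, and `Pattern.DOTALL` changes that. This is a sketch with an illustrative class name.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    public static void main(String[] args) {
        System.out.println("2024".matches("\\d+"));    // true
        System.out.println("a b".matches("a\\sb"));    // true: \s matches the space
        System.out.println("user_01".matches("\\w+")); // true: letters, digits, underscore
        // By default \w is [a-zA-Z_0-9], so Chinese characters do not match
        System.out.println("张三".matches("\\w+"));     // false
        // UNICODE_CHARACTER_CLASS makes \w (and \d, \s, ...) follow Unicode properties
        System.out.println(Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher("张三").matches());            // true
        // By default "." does not match a line terminator; DOTALL changes that
        System.out.println("a\nb".matches("a.b"));     // false
        System.out.println(Pattern.compile("a.b", Pattern.DOTALL)
                .matcher("a\nb").matches());           // true
    }
}
```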