JavaScript Regular Expressions for Regular People

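Regular expressions, also known as regex or regexp, are a difficult subject to tackle. Don't feel ashamed if you're not yet 100% comfortable writing your own regular expressions, because they do take some getting used to.

In JavaScript, regular expressions are a standard built-in object, RegExp. Because of this, there are a few ways to create and use one:

The literal way, 'string to test against'.match(/expression/)
The new keyword with a string argument, new RegExp('expression')
The new keyword with a literal, new RegExp(/expression/)

I'll use a combination of these methods to show that they do essentially the same job.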
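As a quick check of that claim, here is a minimal sketch that builds the same pattern all three ways and runs each one against the same input. The variable names are illustrative only.

```javascript
// Three ways to build the same case-insensitive pattern.
const fromLiteral = /aaron/i;
const fromString = new RegExp('aaron', 'i');
const fromWrappedLiteral = new RegExp(/aaron/, 'i');

const input = 'aaron.arney:alligator.io';

// Test every pattern against the same input string.
const results = [fromLiteral, fromString, fromWrappedLiteral].map((re) => re.test(input));

console.log(results);
// expected output: Array(3) [ true, true, true ]
```

One practical difference: when the pattern is passed as a string, every backslash has to be doubled, so the literal /\./ becomes new RegExp('\\.').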

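Goals for our regular expression

In my examples, I'll be using a string that contains my first name, my last name, and a domain name. In the real world, this example would need a lot more thought.

Let's say I'm building a dashboard and want to display the name of the logged-in user. I have no control over the data returned to me, so I have to work with what I've got.

I need to convert aaron.arney:alligator.io into Aaron Arney [Alligator].

Regular expressions pack a lot of logic into a single condensed object, which can and does cause confusion. A good practice is to break your expression down into a form of pseudocode first. That lets us see what needs to happen and when.

Extract the first name
Extract the last name
Extract the domain name
Format the string into the desired template, First Last [Domain]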

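Matching the first name

To match a string with a regular expression, all you have to do is pass the literal string as the pattern. The i at the end of the expression is a flag. The i flag stands for case insensitive, meaning the expression ignores casing when it matches.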

const unformattedName = 'aaron.arney:alligator.io';

const found = unformattedName.match(/aaron/i);

console.log(found);
// expected output: Array [ "aaron" ]
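To see what the i flag actually changes, the sketch below runs the same pattern with and without it against a mixed-case version of the input. The mixed-case string is made up for illustration.

```javascript
// A hypothetical input where the name is capitalized.
const mixedCaseName = 'Aaron.Arney:alligator.io';

// Without the i flag, the lowercase pattern cannot match the capital "A".
const caseSensitive = mixedCaseName.match(/aaron/);

// With the i flag, the same pattern matches regardless of casing.
const caseInsensitive = mixedCaseName.match(/aaron/i);

console.log(caseSensitive);
// expected output: null

console.log(caseInsensitive[0]);
// expected output: "Aaron"
```

Note that match returns null, not an empty array, when nothing matches.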

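This works, but it isn't a good approach in our case, because the user's name won't always be Aaron.

Let's focus on matching a first name for the moment. Break the word down into individual characters. What do you see?

The name Aaron consists of five alphabetic characters. Is every first name exactly five characters long? No, but it's reasonable to assume a first name is somewhere between 1 and 15 characters long. To match a single character in the range a to z, we use the character class [a-z].

Now, if we update our expression to use this character class...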

const unformattedName = 'aaron.arney:alligator.io';

const found = unformattedName.match(/[a-z]/i);

console.log(found);
// expected output: Array [ "a" ]

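Instead of extracting aaron, the expression returns only a. That's expected: a character class without a quantifier matches exactly one character. To repeat the match up to our limit of 15, we use curly brackets. {1,15} tells the expression to match the preceding token, our [a-z], between 1 and 15 times.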

const unformattedName = 'aaron.arney:alligator.io';
const unformattedNameTwo = 'montgomery.bickerdicke:alligator.io';
const unformattedNameThree = 'a.lila:alligator.io';

const exp = new RegExp(/[a-z]{1,15}/, 'i');

const found = unformattedName.match(exp);
const foundTwo = unformattedNameTwo.match(exp);
const foundThree = unformattedNameThree.match(exp);

console.log(found);
// expected output: Array [ "aaron" ]

console.log(foundTwo);
// expected output: Array [ "montgomery" ]

console.log(foundThree);
// expected output: Array [ "a" ]
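The upper bound matters as well. As a small sketch, here is what the same expression does with a first name longer than 15 characters. The long name is a made-up input, not part of the original data.

```javascript
const exp = /[a-z]{1,15}/i;

// A hypothetical first name with more than 15 letters.
const longName = 'wolfeschlegelsteinhausen.smith:alligator.io';

const found = longName.match(exp);

// The quantifier stops after 15 characters, so the name is cut short.
console.log(found[0]);
// expected output: "wolfeschlegelst"

console.log(found[0].length);
// expected output: 15
```

Whether silently truncating is acceptable depends on your data, which is one reason the limits in this article should be read as assumptions.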

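Matching the last name

Extracting the last name should be as easy as copying and pasting our first expression. If you do that, though, you'll notice the match still returns the same value rather than both the first and last names.

Breaking the string down character by character, there is a full stop separating the two names. To account for it, we add the full stop to our expression.

We have to be careful here, because a full stop can mean one of two things in an expression.

. - matches any character except a newline
\. - matches a literal .

In this case either version produces the same result, but that won't always be true. Tools such as ESLint will sometimes flag the escape as unnecessary, but I say better safe than sorry!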
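To make the difference concrete, the following sketch compares the two forms against an input that uses a full stop and a made-up input that uses a hyphen instead.

```javascript
const withFullStop = 'aaron.arney';
const withHyphen = 'aaron-arney'; // hypothetical input for comparison

// An unescaped . matches any character except a newline.
const unescaped = /[a-z]{1,15}.[a-z]{1,15}/i;

// An escaped \. matches only a literal full stop.
const escaped = /[a-z]{1,15}\.[a-z]{1,15}/i;

console.log(unescaped.test(withFullStop), escaped.test(withFullStop));
// expected output: true true

console.log(unescaped.test(withHyphen), escaped.test(withHyphen));
// expected output: true false
```

Only the escaped version rejects the hyphenated input, which is the behavior we want when the separator must be a full stop.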

const unformattedName = 'aaron.arney:alligator.io';

const exp = new RegExp(/[a-z]{1,15}\.[a-z]{1,15}/, 'i');

const found = unformattedName.match(exp);

console.log(found);
// expected output: Array [ "aaron.arney" ]

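Since we'd rather split the string into two items and exclude the full stop from what the expression returns, we can now use capturing groups. These are written with parentheses () and wrap the parts of the expression you want returned.

The syntax for a capturing group is simple: (expression). Since I only want my first name and last name returned, not the full stop, we wrap each of those two expressions in parentheses.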

const unformattedName = 'aaron.arney:alligator.io';

const exp = new RegExp(/([a-z]{1,15})\.([a-z]{1,15})/, 'i');

const found = unformattedName.match(exp);

console.log(found);
// expected output: Array [ "aaron.arney", "aaron", "arney" ]
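Because the captured groups come back as array entries after the full match, they can be pulled out with destructuring. The sketch below shows that, plus a variant with named groups. The group names first and last are my own choice for illustration.

```javascript
const unformattedName = 'aaron.arney:alligator.io';
const exp = /([a-z]{1,15})\.([a-z]{1,15})/i;

// Index 0 holds the full match; the captured groups follow in order.
const [fullMatch, firstName, lastName] = unformattedName.match(exp);

console.log(fullMatch); // expected output: "aaron.arney"
console.log(firstName); // expected output: "aaron"
console.log(lastName);  // expected output: "arney"

// The same pattern with named groups, read from the groups property.
const named = /(?<first>[a-z]{1,15})\.(?<last>[a-z]{1,15})/i;
const { first, last } = unformattedName.match(named).groups;

console.log(first, last);
// expected output: aaron arney
```

Named groups need an engine that supports ES2018 or later.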

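Matching the domain name

To extract alligator.io, we'll reuse the character classes we've used so far.

Validating domain names and TLDs is a difficult business. We'll pretend that the domain names we parse are always between 3 and 25 characters long and that the TLD is always between 2 and 10 characters long. If we plug those limits in, we get some new output: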

const unformattedName = 'aaron.arney:alligator.io';

const exp = new RegExp(/([a-z]{1,15})\.([a-z]{1,15}):([a-z]{3,25}\.[a-z]{2,10})/, 'i');

const found = unformattedName.match(exp);

console.log(found);
// expected output: Array [ "aaron.arney:alligator.io", "aaron", "arney", "alligator.io" ]
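Since we don't control the incoming data, it helps to plan for input that doesn't fit the pattern. Here is a minimal sketch of one way to do that. parseUser is a hypothetical helper and the second input is made up.

```javascript
const exp = /([a-z]{1,15})\.([a-z]{1,15}):([a-z]{3,25}\.[a-z]{2,10})/i;

// parseUser is a hypothetical helper, not part of the original example.
function parseUser(raw) {
  const found = raw.match(exp);

  // match returns null when the input does not fit the pattern.
  if (found === null) {
    return null;
  }

  // Skip index 0 (the full match) and keep the three groups.
  const [, firstName, lastName, domain] = found;
  return { firstName, lastName, domain };
}

console.log(parseUser('aaron.arney:alligator.io'));
// expected output: Object { firstName: "aaron", lastName: "arney", domain: "alligator.io" }

console.log(parseUser('aaron.arney:io'));
// expected output: null
```

The second call fails because the part after the colon has no domain name of at least three letters followed by a full stop.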

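The shortcut

I've shown you the long way of writing the expression. Now I'll show you a less verbose expression that captures the same text. With the + quantifier, we tell the expression to repeat the preceding token as many times as it can. The example below also uses the g flag, which stands for global: it tells the expression to find every match rather than stopping at the first one.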

// With the global flag
'aaron.arney:alligator.io'.match(/[a-z]+/ig);
// expected output: Array(4) [ "aaron", "arney", "alligator", "io" ]

// Without the global flag
'aaron.arney:alligator.io'.match(/[a-z]+/i);
// expected output: Array [ "aaron" ]
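One thing to keep in mind with the g flag: match then returns only the full matches and leaves out the captured groups. The sketch below shows that behavior and uses matchAll, which keeps the groups for each match, as one alternative.

```javascript
const input = 'aaron.arney:alligator.io';

// With g, match returns every full match but drops the groups.
const globalMatch = input.match(/([a-z]+)\.([a-z]+)/ig);

console.log(globalMatch);
// expected output: Array [ "aaron.arney", "alligator.io" ]

// matchAll yields one match object per hit, groups included.
const pairs = [...input.matchAll(/([a-z]+)\.([a-z]+)/ig)].map((m) => [m[1], m[2]]);

console.log(pairs);
// expected output: Array [ [ "aaron", "arney" ], [ "alligator", "io" ] ]
```

matchAll requires the g flag and throws a TypeError without it.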

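Formatting the output

To format the string, we'll use the replace method on the String object. It takes two arguments:

RegExp | String - a regular expression or a literal string to search for
String | Function - the replacement string, or a function that returns it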

const unformattedName = 'aaron.arney:alligator.io';

// The "long" way
const exp = new RegExp(/([a-z]{1,15})\.([a-z]{1,15}):([a-z]{3,25}\.[a-z]{2,10})/, 'i');

unformattedName.replace(exp, '$1 $2 [$3]');
// expected output: "aaron arney [alligator.io]"

// A slightly shorter way
unformattedName.replace(/([a-z]+)\.([a-z]+):([a-z]+\.[a-z]{2,10})/ig, '$1 $2 [$3]');
// expected output: "aaron arney [alligator.io]"

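In the snippet above, $1, $2, and $3 are special patterns that the replace method interprets.

$1 - the first captured result in the match array => a reference to the first parenthesized group
$2 - the second captured result in the match array => a reference to the second parenthesized group
$n - and so on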
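To show how these patterns line up with the match array, this sketch builds the same output twice: once by hand from the array returned by match, and once with replace. The variable names are illustrative.

```javascript
const unformattedName = 'aaron.arney:alligator.io';
const exp = /([a-z]+)\.([a-z]+):([a-z]+\.[a-z]{2,10})/i;

// Groups 1 to 3 sit at indexes 1 to 3 of the match array.
const [, first, last, domain] = unformattedName.match(exp);
const viaTemplate = `${first} ${last} [${domain}]`;

// $1, $2, and $3 refer to the same three groups.
const viaReplace = unformattedName.replace(exp, '$1 $2 [$3]');

console.log(viaTemplate);
// expected output: "aaron arney [alligator.io]"

console.log(viaTemplate === viaReplace);
// expected output: true
```

The two results are equal here only because the expression matches the whole input. replace leaves any unmatched text in place.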

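To capitalize the words, we can use another regex. Instead of formatting the output with a pattern string as we did above, we'll pass a function. The function capitalizes the argument it is given and returns it.

Here I'm introducing a few new parts: the ^ anchor, the \b word boundary, alternation, and the negated character class [^...].

[^abc] - not a, b, or c
\b - word boundary
ab|cd - logical OR, matches ab or cd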

// Capitalize the words
"aaron arney [alligator.io]".replace(/(^\b[a-z])|([^\.]\b[a-z])/g, (char) => char.toUpperCase());
// expected output: "Aaron Arney [Alligator.io]"

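Breaking this expression down into its two parts...

(^\b[a-z]) - captures the first character of the string. ^ says to match at the beginning of the string.
([^\.]\b[a-z]) - captures a character that is not a full stop, followed by a word boundary and a letter. This catches the first letter of each later word while skipping the io that follows the full stop.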
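It can help to look at exactly which substrings the expression hands to the callback. The sketch below lists them with match and then applies the same replacement.

```javascript
const exp = /(^\b[a-z])|([^\.]\b[a-z])/g;
const formatted = 'aaron arney [alligator.io]';

// Each entry is one substring that gets passed to the callback.
console.log(formatted.match(exp));
// expected output: Array(3) [ "a", " a", "[a" ]

// Uppercasing " a" or "[a" only changes the letter.
console.log(formatted.replace(exp, (chars) => chars.toUpperCase()));
// expected output: "Aaron Arney [Alligator.io]"
```

The second and third matches include the character before the letter, which is harmless here because toUpperCase leaves spaces and brackets unchanged.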

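Continuing your exploration

This is only a small taste of the power of regular expressions. The example I worked through can be improved, but how?

Is the expression too verbose? Is it too simple?
Does it cover the edge cases?
Could you replace it with some clever string manipulation using native methods?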
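As one possible starting point for the last question, here is a rough sketch that uses only split and basic string methods. formatUser and capitalize are hypothetical helpers, and the code assumes the input always has the shape first.last:domain.tld, with no validation at all.

```javascript
// formatUser is a hypothetical helper. It assumes the input always
// looks like first.last:domain.tld and does no validation.
function formatUser(raw) {
  const [fullName, fullDomain] = raw.split(':');
  const [first, last] = fullName.split('.');
  const [domain] = fullDomain.split('.'); // drop the TLD

  // Uppercase the first character and keep the rest as-is.
  const capitalize = (word) => word.charAt(0).toUpperCase() + word.slice(1);

  return `${capitalize(first)} ${capitalize(last)} [${capitalize(domain)}]`;
}

console.log(formatUser('aaron.arney:alligator.io'));
// expected output: "Aaron Arney [Alligator]"
```

This version is easier to read but enforces none of the length or character limits that the regular expression does, so it trades strictness for simplicity.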

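This is where you take what you've learned and try to answer those questions. Explore the following resources to help you on your journey, and experiment!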