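Introduction
The `grep` command is one of the most useful commands in a Linux terminal environment. The name `grep` stands for "global regular expression print". This means that you can use `grep` to check whether the input it receives matches a specified pattern.
In this tutorial, you will explore the `grep` command's options, and then you'll dive into using regular expressions to do more advanced searching.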
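Prerequisites
To follow along with this guide, you will need access to a computer running a Linux-based operating system. This can be either a virtual private server that you've connected to with SSH or your local machine. Note that this tutorial was validated on a Linux server running Ubuntu 20.04, but the examples should work on a computer running any Linux distribution.
If you plan to use a remote server to follow this guide, we encourage you to first complete our Initial Server Setup guide.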
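Basic Usage
In this tutorial, you'll use `grep` to search the GNU General Public License version 3 for various words and phrases.
If you're on an Ubuntu system, you can find the file in the `/usr/share/common-licenses` folder. Copy it to your current working directory:
cp /usr/share/common-licenses/GPL-3 .
If you're on another system, use the `curl` command to download a copy:
curl -o GPL-3 https://www.gnu.org/licenses/gpl-3.0.txt
You'll also use the BSD license file in this tutorial. On Ubuntu, you can copy it to your current working directory with the following command:
cp /usr/share/common-licenses/BSD .
If you're on another system, create the file with the following command: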
cat << 'EOF' > BSD
Copyright (c) The Regents of the University of California.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
 notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
 notice, this list of conditions and the following disclaimer in the
 documentation and/or other materials provided with the distribution.
3. Neither the name of the University nor the names of its contributors
 may be used to endorse or promote products derived from this software
 without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
EOF
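Now that you have the files, you can start working with `grep`.
In its most basic form, you use `grep` to match literal patterns within a text file. This means that if you pass `grep` a word to search for, it will print every line in the file that contains that word.
Execute the following command to use `grep` to search for every line that contains the word `GNU`:
grep "GNU" GPL-3
The first argument, `GNU`, is the pattern you're searching for, while the second argument, `GPL-3`, is the input file you wish to search.
The resulting output will be every line containing the pattern text:
Output
 GNU GENERAL PUBLIC LICENSE
 The GNU General Public License is a free, copyleft license for
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
 Developers that use the GNU GPL protect your rights with two steps:
 "This License" refers to version 3 of the GNU General Public License.
 13. Use with the GNU Affero General Public License.
under version 3 of the GNU Affero General Public License into a single
...
...
On some systems, the pattern you searched for will be highlighted in the output.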
Common Options
By default, `grep` searches for the exact specified pattern within the input file and returns the lines it finds. You can make this behavior more useful by adding some optional flags to `grep`.
If you want `grep` to ignore the case of your search pattern and match both upper- and lowercase variations, you can specify the `-i` or `--ignore-case` option.
Search for each instance of the word `license` (in upper, lower, or mixed case) in the same file as before with the following command:
grep -i "license" GPL-3
The results contain `LICENSE`, `license`, and `License`:
Output
 GNU GENERAL PUBLIC LICENSE
 of this license document, but changing it is not allowed.
 The GNU General Public License is a free, copyleft license for
 The licenses for most software and other practical works are designed
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
price. Our General Public Licenses are designed to make sure that you
(1) assert copyright on the software, and (2) offer you this License
 "This License" refers to version 3 of the GNU General Public License.
 "The Program" refers to any copyrightable work licensed under this
...
...
If there were an instance of `LiCeNsE`, it would be returned as well.
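To see the effect of `-i` in isolation, here is a minimal sketch that runs the same pattern with and without the flag. The file `sample.txt` and its contents are hypothetical and exist only for this illustration; they are not part of the license files used in this tutorial.

```shell
# sample.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  'GNU GENERAL PUBLIC LICENSE' \
  'of this license document' \
  'a LiCeNsE in mixed case' \
  'no match on this line' > "$workdir/sample.txt"

# Case-sensitive search: only the all-lowercase spelling matches.
sensitive=$(grep "license" "$workdir/sample.txt")

# Case-insensitive search: every spelling of the word matches.
insensitive=$(grep -i "license" "$workdir/sample.txt")

printf 'case-sensitive:\n%s\n\n' "$sensitive"
printf 'case-insensitive:\n%s\n' "$insensitive"

rm -rf "$workdir"
```

The case-sensitive search returns a single line, while the `-i` search returns the three lines that contain some spelling of the word.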
If you want to find all lines that do not contain a specified pattern, you can use the `-v` or `--invert-match` option.
Search for every line that does not contain the word `the` in the BSD license with the following command:
grep -v "the" BSD
You'll receive this output:
Output
All rights reserved.

Redistribution and use in source and binary forms, with or without
are met:
 may be used to endorse or promote products derived from this software
 without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...
Because you did not specify the ignore-case option, the last two lines shown were returned: they contain `THE` in uppercase but not the lowercase string `the`.
It is often useful to know the line number on which a match occurs. You can do this by using the `-n` or `--line-number` option. Re-run the previous example with this flag added:
grep -vn "the" BSD
This will return the following text:
Output
2:All rights reserved.
3:
4:Redistribution and use in source and binary forms, with or without
6:are met:
13: may be used to endorse or promote products derived from this software
14: without specific prior written permission.
15:
16:THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
17:ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...
Now you can reference the line number if you want to make changes to every line that does not contain `the`. This is especially handy when working with source code.
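Single-letter options can be combined, as the `-vn` example does. The following sketch, which assumes a hypothetical four-line `sample.txt` created just for this purpose, shows how adding `-i` to the inverted search changes which lines survive.

```shell
# sample.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  'The quick fox' \
  'jumps over the dog' \
  'and rests' \
  'THE END' > "$workdir/sample.txt"

# -v inverts the match; -n prefixes each printed line with its line number.
inverted=$(grep -vn "the" "$workdir/sample.txt")

# Adding -i also excludes the lines containing "The" and "THE".
inverted_nocase=$(grep -vin "the" "$workdir/sample.txt")

printf 'grep -vn:\n%s\n\n' "$inverted"
printf 'grep -vin:\n%s\n' "$inverted_nocase"

rm -rf "$workdir"
```

Without `-i`, lines 1, 3, and 4 are printed because none of them contains the lowercase string `the`. With `-i`, only line 3 remains.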
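Regular Expressions
In the introduction, you learned that `grep` stands for "global regular expression print". A regular expression is a text string that describes a particular search pattern.
Different applications and programming languages implement regular expressions slightly differently. In this tutorial, you will explore only a small subset of the way that `grep` describes its patterns.
Literal Matches
In the previous examples in this tutorial, when you searched for the words `GNU` and `the`, you were actually searching for basic regular expressions that matched the exact strings of characters `GNU` and `the`.
It is helpful to think of these as matching a string of characters rather than matching a word. This will become a more important distinction as you learn more complex patterns.
All alphabetical and numerical characters (as well as certain other characters) are matched literally unless modified by other expression mechanisms.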
Anchor Matches
Anchors are special characters that specify where in the line a match must occur to be valid.
For example, using anchors, you can specify that you only want to know about the lines that match `GNU` at the very beginning of the line. To do this, you place the `^` anchor before the literal string.
Run the following command to search the `GPL-3` file and find lines where `GNU` occurs at the very beginning of a line:
grep "^GNU" GPL-3
This command will return the following two lines:
Output
GNU General Public License for most of our software; it applies also to
GNU General Public License, you may choose any version ever published
Similarly, you use the `$` anchor at the end of a pattern to indicate that the match is only valid if it occurs at the very end of a line.
This command will match every line ending with the word `and` in the `GPL-3` file:
grep "and$" GPL-3
You'll receive this output:
Output
that there is no warranty for this free software. For both users' and
 The precise terms and conditions for copying, distribution and
 License. Each licensee is addressed as "you". "Licensees" and
receive it, in any medium, provided that you conspicuously and
 alternative is allowed only occasionally and noncommercially, and
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
provisionally, unless and until the copyright holder explicitly and
receives a license from the original licensors, to run, modify and
make, use, sell, offer for sale, import and otherwise run, modify and
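As a small self-contained check of both anchors, the sketch below builds a hypothetical `sample.txt` in which the same words appear at different positions in the line, so only one line satisfies each anchored pattern.

```shell
# sample.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  'GNU tools are free' \
  'the GNU project' \
  'read and' \
  'and write' > "$workdir/sample.txt"

# ^ requires the match to start at the beginning of the line.
starts=$(grep "^GNU" "$workdir/sample.txt")

# $ requires the match to finish at the end of the line.
ends=$(grep "and$" "$workdir/sample.txt")

printf 'starts with GNU: %s\n' "$starts"
printf 'ends with and:   %s\n' "$ends"

rm -rf "$workdir"
```

The line `the GNU project` is skipped by the first search and `and write` by the second, even though both contain the literal text.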
Matching Any Character
The period character (`.`) is used in regular expressions to mean that any single character can exist at the specified location.
For example, to match anything in the `GPL-3` file that has two characters followed by the string `cept`, you would use the following pattern:
grep "..cept" GPL-3
This command returns the following output:
Output
use, which is precisely where it is most unacceptable. Therefore, we
infringement under applicable copyright law, except executing it on a
tells the user that there is no warranty for the work (except to the
License by making exceptions from one or more of its conditions.
form of a separately written license, or stated as exceptions;
 You may not propagate or modify a covered work except as expressly
 9. Acceptance Not Required for Having Copies.
...
...
This output has instances of both `accept` and `except`, as well as variations of the two words.
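Each period stands for exactly one character, so `..cept` needs at least two characters in front of `cept`. The following sketch illustrates this with a hypothetical word list (the file name and words are made up for the example).

```shell
# words.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'accept' 'except' 'concept' 'cept' 'xcept' > "$workdir/words.txt"

# Two periods require two characters of any kind before "cept".
dots=$(grep "..cept" "$workdir/words.txt")

printf '%s\n' "$dots"

rm -rf "$workdir"
```

The words `cept` and `xcept` are not printed because they have fewer than two characters before `cept`, while `concept` matches through its inner `oncept`.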
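Bracket Expressions
By placing a group of characters within brackets (`[` and `]`), you can specify that the character at that position can be any one character found within the bracket group.
For example, to find the lines that contain `too` or `two`, you can specify those variations succinctly with the following pattern:
grep "t[wo]o" GPL-3
The output shows that both variations exist in the file. Some of the matches occur inside longer words, such as `network` and `tools`:
Output
your programs, too.
freedoms that you received. You must make sure that they, too, receive
 Developers that use the GNU GPL protect your rights with two steps:
a computer network, with no transfer of a copy, is not conveying.
System Libraries, or general-purpose tools or generally available free
 Corresponding Source from a network server at no charge.
...
...
You can have the pattern match anything except the characters within a bracket by beginning the list of characters within the brackets with a `^` character.
This example is like the pattern `.ode`, but will not match the pattern `code`:
grep "[^c]ode" GPL-3
Here's the output you'll receive:
Output
 1. Source Code.
 model, to give anyone who possesses the object code either (1) a
the only significant mode of use of the product.
notice like this when it starts in an interactive mode:
Notice that the second line returned does, in fact, contain the word `code`. This is not a failure of the regular expression or of `grep`. Rather, this line was returned because earlier in the line, the pattern `mode`, found within the word `model`, was matched.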
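One way to check which part of a line actually satisfied a pattern is to print only the matched text. The sketch below assumes a `grep` that supports the `-o` (only-matching) option, which GNU and BSD `grep` provide but which is not required by POSIX; the sample file is hypothetical.

```shell
# sample.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  '1. Source Code.' \
  'the object code in an interactive mode' \
  'source code only' > "$workdir/sample.txt"

# Whole lines that contain "ode" preceded by any character other than "c".
lines=$(grep "[^c]ode" "$workdir/sample.txt")

# Assumption: -o is available; it prints only the text that matched.
matched=$(grep -o "[^c]ode" "$workdir/sample.txt")

printf 'matching lines:\n%s\n\n' "$lines"
printf 'matched text:\n%s\n' "$matched"

rm -rf "$workdir"
```

The matched text is `Code` (an uppercase `C` is not the excluded lowercase `c`) and `mode`. The line `source code only` is not returned because its only `ode` is preceded by `c`.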
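Another useful feature of brackets is that you can specify a range of characters instead of individually typing every available character.
This means that if you want to find every line that begins with a capital letter, you can use the following pattern:
grep "^[A-Z]" GPL-3
Here's the output this expression returns:
Output
GNU General Public License for most of our software; it applies also to
States should not allow patents to restrict development and use of
License. Each licensee is addressed as "you". "Licensees" and
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
System Libraries, or general-purpose tools or generally available free
Source.
User Product is transferred to the recipient in perpetuity or for a
...
...
Because of legacy sorting issues (in some locales, a range such as `A-Z` can follow the locale's collation order rather than the ASCII order), it is often more accurate to use POSIX character classes instead of character ranges like the one you just used.
Discussing every POSIX character class is beyond the scope of this guide, but an example that accomplishes the same task as the previous one uses the `[:upper:]` character class within a bracket selector:
grep "^[[:upper:]]" GPL-3
The output will be the same as before.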
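The sketch below compares the two spellings on a hypothetical `sample.txt`. Setting `LC_ALL=C` for the range-based command is an assumption made here to pin the range to plain ASCII ordering; it is one common way to sidestep locale differences, not a step from this tutorial.

```shell
# sample.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  'License text' \
  'lowercase start' \
  ' Indented Capital' \
  '9 lives' > "$workdir/sample.txt"

# POSIX character class: matches an uppercase letter in the current locale.
class=$(grep "^[[:upper:]]" "$workdir/sample.txt")

# Range expression, with the C locale forced so A-Z means ASCII A through Z.
range=$(LC_ALL=C grep "^[A-Z]" "$workdir/sample.txt")

printf 'class: %s\n' "$class"
printf 'range: %s\n' "$range"

rm -rf "$workdir"
```

Both commands return only `License text`; the indented line is skipped because its first character is a space.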
Repeat Pattern Zero or More Times
Finally, one of the most commonly used meta-characters is the asterisk, or `*`, which means "repeat the previous character or expression zero or more times".
To find each line in the `GPL-3` file that contains an opening and closing parenthesis with only letters and single spaces in between, use the following expression:
grep "([A-Za-z ]*)" GPL-3
You'll get the following output:
Output
 Copyright (C) 2007 Free Software Foundation, Inc.
distribution (with or without modification), making available to the
than the work as a whole, that (a) is included in the normal form of
Component, and (b) serves only to enable use of the work with that
(if any) on which the executable work runs, or a compiler used to
 (including a physical distribution medium), accompanied by the
 (including a physical distribution medium), accompanied by a
 place (gratis or for a charge), and offer equivalent access to the
...
...
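Because `*` allows zero repetitions, the same pattern also matches an empty pair of parentheses. The following sketch shows this on a hypothetical `sample.txt` written for the example.

```shell
# sample.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  'Copyright (C) 2007' \
  'empty () parens' \
  'version (v3.0) here' \
  'no parens' > "$workdir/sample.txt"

# In basic regular expressions, unescaped parentheses are literal characters.
# The bracket expression followed by * accepts zero or more letters or spaces.
result=$(grep "([A-Za-z ]*)" "$workdir/sample.txt")

printf '%s\n' "$result"

rm -rf "$workdir"
```

The first two lines are printed. The line containing `(v3.0)` is not, because the digits and the period are outside the bracket expression.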
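So far you've used periods, asterisks, and other characters in your expressions, but sometimes you need to search for those characters specifically.
Escaping Meta-Characters
There are times when you'll need to search for a literal period or a literal opening bracket, especially when working with source code or configuration files. Because these characters have special meaning in regular expressions, you need to "escape" them to tell `grep` that you do not wish to use their special meaning in this case.
You escape a character by placing the backslash character (`\`) in front of the character that would normally have a special meaning.
For instance, to find any line that begins with a capital letter and ends with a period, use the following expression, which escapes the ending period so that it represents a literal period instead of the usual "any character" meaning:
grep "^[A-Z].*\.$" GPL-3
This is the output you'll see:
Output
Source.
License by making exceptions from one or more of its conditions.
License would be to refrain entirely from conveying the Program.
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
SUCH DAMAGES.
Also add information on how to contact you by electronic and paper mail.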
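The difference between an escaped and an unescaped period is easiest to see side by side. This sketch uses a hypothetical `versions.txt` whose lines differ only in the character between `3` and `0`.

```shell
# versions.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'version 3.0' 'version 3x0' 'version 30' > "$workdir/versions.txt"

# Unescaped: the period matches any single character between 3 and 0.
unescaped=$(grep "3.0" "$workdir/versions.txt")

# Escaped: the period must be a literal period.
escaped=$(grep "3\.0" "$workdir/versions.txt")

printf 'unescaped:\n%s\n\n' "$unescaped"
printf 'escaped:\n%s\n' "$escaped"

rm -rf "$workdir"
```

The unescaped pattern returns both `version 3.0` and `version 3x0`, while the escaped pattern returns only `version 3.0`. Neither matches `version 30`, which has no character between the two digits.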
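Now let's look at other regular expression options.
Extended Regular Expressions
The `grep` command supports a more extensive regular expression language when you use the `-E` flag or call the `egrep` command instead of `grep`.
These options open up the capabilities of "extended regular expressions". Extended regular expressions include all of the basic meta-characters, along with additional meta-characters to express more complex matches.
Grouping
One of the most useful abilities that extended regular expressions open up is the ability to group expressions together to manipulate or reference as one unit.
To group expressions together, wrap them in parentheses. If you would like to use parentheses for grouping without using extended regular expressions, you can escape them with the backslash to enable this functionality. This means that the following three expressions are functionally equivalent:
grep "\(grouping\)" file.txt
grep -E "(grouping)" file.txt
egrep "(grouping)" file.txt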
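The sketch below checks the equivalence of the first two forms on a hypothetical `sample.txt`, and also shows what unescaped parentheses mean in a basic regular expression. The `egrep` form is left out here because recent GNU `grep` releases print an obsolescence warning when it is used.

```shell
# sample.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  'a grouping example' \
  'a (grouping) in parentheses' \
  'nothing here' > "$workdir/sample.txt"

# Basic syntax: escaped parentheses form a group.
bre_group=$(grep "\(grouping\)" "$workdir/sample.txt")

# Extended syntax: unescaped parentheses form a group.
ere_group=$(grep -E "(grouping)" "$workdir/sample.txt")

# Basic syntax with unescaped parentheses: they are literal characters.
bre_literal=$(grep "(grouping)" "$workdir/sample.txt")

printf 'basic group:\n%s\n\n' "$bre_group"
printf 'extended group:\n%s\n\n' "$ere_group"
printf 'basic literal:\n%s\n' "$bre_literal"

rm -rf "$workdir"
```

The two grouped forms return the same two lines, while the last command returns only the line that contains actual parentheses around the word.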
Alternation
Similar to how bracket expressions can specify different possible choices for single-character matches, alternation allows you to specify alternative matches for strings or expression sets.
To indicate alternation, use the pipe character `|`. It is often used within parenthetical grouping to specify that one of two or more possibilities should be considered a match.
The following will find either `GPL` or `General Public License` in the text:
grep -E "(GPL|General Public License)" GPL-3
The output looks like this:
Output
 The GNU General Public License is a free, copyleft license for
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
price. Our General Public Licenses are designed to make sure that you
 Developers that use the GNU GPL protect your rights with two steps:
 For the developers' and authors' protection, the GPL clearly explains
authors' sake, the GPL requires that modified versions be marked as
have designed this version of the GPL to prohibit the practice for those
...
...
Alternation can select between more than two choices by adding additional choices within the selection group, separated by additional pipe (`|`) characters.
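A three-way alternation can be sketched as follows. The file `licenses.txt` and the license names in it are hypothetical sample data chosen for the illustration.

```shell
# licenses.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  'the GPL applies' \
  'the General Public License' \
  'the BSD license' \
  'the MIT license' > "$workdir/licenses.txt"

# Three alternatives inside one group, separated by pipe characters.
result=$(grep -E "(GPL|BSD|General Public License)" "$workdir/licenses.txt")

printf '%s\n' "$result"

rm -rf "$workdir"
```

The first three lines are printed; the `MIT` line matches none of the alternatives.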
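Quantifiers
Like the `*` meta-character, which matches the previous character or character set zero or more times, there are other meta-characters available in extended regular expressions that specify the number of occurrences.
To match a character zero or one times, you can use the `?` character. This makes the character or character set that came before it optional, in essence.
The following matches `copyright` and `right` by putting `copy` in an optional group:
grep -E "(copy)?right" GPL-3
You'll receive this output:
Output
 Copyright (C) 2007 Free Software Foundation, Inc.
 To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
know their rights.
 Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
 "Copyright" also means copyright-like laws that apply to other kinds of
...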
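To see what the optional group contributes, it helps to print only the matched text. This sketch again assumes a `grep` with the `-o` option (available in GNU and BSD `grep`, not mandated by POSIX) and uses a hypothetical `sample.txt`.

```shell
# sample.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  'assert copyright on the software' \
  'protect your rights' \
  'copyleft license' > "$workdir/sample.txt"

# Assumption: -o is available; it prints only the text that matched.
# The optional group is included in the match when "copy" is present.
matched=$(grep -Eo "(copy)?right" "$workdir/sample.txt")

printf '%s\n' "$matched"

rm -rf "$workdir"
```

The first line yields `copyright` and the second yields `right`. The third line does not match, since `copy` on its own is not enough without `right` after it.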
The `+` character matches an expression one or more times. This is almost like the `*` meta-character, but with the `+` character, the expression must match at least once.
The following expression matches the string `free` plus one or more characters that are not white space characters:
grep -E "free[^[:space:]]+" GPL-3
You'll see this output:
Output
 The GNU General Public License is a free, copyleft license for
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
 When we speak of free software, we are referring to freedom, not
have the freedom to distribute copies of free software (and charge for
you modify it: responsibilities to respect the freedom of others.
freedoms that you received. You must make sure that they, too, receive
protecting users' freedom to change the software. The systematic
of the GPL, as needed to protect the freedom of users.
patents cannot be used to render the program non-free.
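The contrast between `+` and `*` can be sketched on a few hypothetical lines. The file `sample.txt` is made up for this example; `-c` prints the number of matching lines instead of the lines themselves.

```shell
# sample.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' \
  'free software' \
  'freedom of others' \
  'non-free.' \
  'it is free' > "$workdir/sample.txt"

# +: at least one non-space character must directly follow "free".
plus=$(grep -E "free[^[:space:]]+" "$workdir/sample.txt")

# *: zero following characters is acceptable, so every line with "free" counts.
star_count=$(grep -Ec "free[^[:space:]]*" "$workdir/sample.txt")

printf 'lines matching with +:\n%s\n\n' "$plus"
printf 'number of lines matching with *: %s\n' "$star_count"

rm -rf "$workdir"
```

With `+`, only `freedom of others` and `non-free.` are returned, because in the other two lines `free` is followed by a space or by the end of the line. With `*`, all four lines match.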
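Specifying Match Repetition
To specify the number of times that a match is repeated, use the brace characters (`{` and `}`). These characters let you specify an exact number, a range, or an upper or lower bound for the number of times an expression can match.
Use the following expression to find all of the lines in the `GPL-3` file that contain triple-vowels:
grep -E "[AEIOUaeiou]{3}" GPL-3
Each line returned has a word with three vowels in a row:
Output
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
receive it, in any medium, provided that you conspicuously and
give under the previous paragraph, plus a right to possession of the
covered work so as to satisfy simultaneously your obligations under this
To match any words that have between 16 and 20 characters, use the following expression:
grep -E "[[:alpha:]]{16,20}" GPL-3
Here's this command's output:
Output
 certain responsibilities if you distribute copies of the software, or if
 you modify it: responsibilities to respect the freedom of others.
 c) Prohibiting misrepresentation of the origin of that material, or
Only lines containing words of at least that length are displayed. Because the pattern is not anchored to word boundaries, a word longer than 20 letters would also match, through its first 16 to 20 letters.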
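Both brace forms can be tried on a short hypothetical word list. The file `words.txt` and its four entries are assumptions made for this sketch; `responsibilities` has 16 letters and `confidentiality` has 15.

```shell
# words.txt is a hypothetical file created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'queue' 'a book' 'responsibilities' 'confidentiality' \
  > "$workdir/words.txt"

# {3}: three vowels in a row.
three=$(grep -E "[AEIOUaeiou]{3}" "$workdir/words.txt")

# {16,20}: a run of 16 to 20 consecutive letters.
long=$(grep -E "[[:alpha:]]{16,20}" "$workdir/words.txt")

printf 'three vowels in a row: %s\n' "$three"
printf '16 to 20 letters:      %s\n' "$long"

rm -rf "$workdir"
```

Only `queue` has three consecutive vowels, and only `responsibilities` reaches the 16-letter minimum.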
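Conclusion
`grep` is useful for finding patterns within files or within the file system hierarchy, so it's worth spending time getting comfortable with its options and syntax.
Regular expressions are even more versatile and can be used with many popular programs. For instance, many text editors implement regular expressions for searching and replacing text.
Furthermore, most modern programming languages use regular expressions to perform procedures on specific pieces of data. Once you understand regular expressions, you'll be able to transfer that knowledge to many common computer-related tasks, from performing advanced searches in your text editor to validating user input.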