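Reading JavaScript Source Code Using an AST

Introduction

Say you have a big JavaScript file inherited from the old days. It is 70,000 lines long, and you need to split it up using webpack or one of its consorts. To do that, you first need to know which functions and constants it exposes to the global scope.

Let the computer read your code and extract what you want from it.

This is a job for Abstract Syntax Trees (AST).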

[Figure: logo announcing AST in a design that resembles a comic book superhero]

Our mission, should you choose to accept it, is to extract the names of all the functions exposed in the global scope.

// test.js
// test the code
function decrementAndAdd(a, b) {
  function add(c, d) {
    return c + d;
  }
  a--;
  b = b - 1;
  return add(a, b);
}

// test the code
function incrementAndMultiply(a, b) {
  function multiply(c, d) {
    return c * d;
  }
  a++;
  b = b + 1;
  return multiply(a, b);
}

The result should be [decrementAndAdd, incrementAndMultiply].

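Parsing the Code

An AST is the result of parsing code. For JavaScript, an AST is a JavaScript object containing a tree representation of the source. Before we can use it, we have to create it. We pick the parser according to the code we are parsing.

Here, since the code is ES5-compatible, we can choose the acorn parser.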

Here are some of the best-known open source ECMAScript parsers:

Parser	Supported Languages	GitHub
acorn	esnext & JSX (using acorn-jsx)	https://github.com/acornjs/acorn
esprima	esnext & older	https://github.com/jquery/esprima
cherow	esnext & older	https://github.com/cherow/cherow
espree	esnext & older	https://github.com/eslint/espree
shift	esnext & older	https://github.com/shapesecurity/shift-parser-js
babel	esnext, JSX & typescript	https://github.com/babel/babel
TypeScript	esnext & typescript	https://github.com/Microsoft/TypeScript

All of these parsers work the same way: give one some code, get an AST back.

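const { readFileSync } = require('fs');
const { Parser } = require('acorn');

const fileName = 'test.js'; // path of the file to parse
// acorn 8 and later expect an explicit ecmaVersion
const ast = Parser.parse(readFileSync(fileName).toString(), { ecmaVersion: 5 });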

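The TypeScript parser's syntax is slightly different, but it is well documented.

This is the tree obtained by parsing the code below with @babel/parser:

[Figure: tree graph from @babel/parser]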

// test-2.js
// test the code
function decrementAndAdd(a, b) {
  return add(a, b)
}
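To make the tree concrete, here is a minimal sketch of the same function written out by hand as a plain object. It follows the ESTree node shapes and is deliberately simplified: real parser output also carries source positions and other fields, and @babel/parser wraps the program in an extra File node, so treat the exact layout as an assumption.

```javascript
// Hand-written, simplified ESTree-style object for:
//   function decrementAndAdd(a, b) { return add(a, b) }
// Real parser output carries more fields (positions, ranges, ...).
const ast = {
  type: 'Program',
  body: [
    {
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'decrementAndAdd' },
      params: [
        { type: 'Identifier', name: 'a' },
        { type: 'Identifier', name: 'b' },
      ],
      body: {
        type: 'BlockStatement',
        body: [
          {
            type: 'ReturnStatement',
            argument: {
              type: 'CallExpression',
              callee: { type: 'Identifier', name: 'add' },
              arguments: [
                { type: 'Identifier', name: 'a' },
                { type: 'Identifier', name: 'b' },
              ],
            },
          },
        ],
      },
    },
  ],
};

// Reading data out of the tree is plain property access.
const fn = ast.body[0];
const declaredName = fn.id.name;                         // 'decrementAndAdd'
const calledName = fn.body.body[0].argument.callee.name; // 'add'
console.log(declaredName, calledName);
```

Because the AST is an ordinary object, every tool discussed in this article is ultimately a more convenient way of doing this kind of property access.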

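Traversing

To find what we want to extract, it is usually better not to process the whole AST at once. Even for a small code snippet, it can be a large object with thousands of nodes.

The best approach is to narrow the search down to the kind of node we care about.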
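As a rough illustration of that narrowing, the sketch below filters the top-level statements of a program by node type without any library. The `ast` object is a hypothetical, trimmed-down stand-in for real parser output and only contains the fields the example reads.

```javascript
// Hypothetical, trimmed-down Program node: only the fields used below.
const ast = {
  type: 'Program',
  body: [
    { type: 'FunctionDeclaration', id: { type: 'Identifier', name: 'decrementAndAdd' } },
    { type: 'ExpressionStatement', expression: { type: 'Literal', value: 1 } },
    { type: 'FunctionDeclaration', id: { type: 'Identifier', name: 'incrementAndMultiply' } },
  ],
};

// Keep only the top-level statements that are function declarations,
// then read the name of each one.
const topLevelFunctions = ast.body
  .filter((node) => node.type === 'FunctionDeclaration')
  .map((node) => node.id.name);

console.log(topLevelFunctions); // [ 'decrementAndAdd', 'incrementAndMultiply' ]
```

This only looks at direct children of the program. A traversal library becomes useful as soon as the nodes of interest can sit at arbitrary depth.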

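Once again, many tools are available for this traversing part. For our example, we will use recast. It is very fast and has the advantage of keeping a version of the code untouched. This way, it can return the part of the code you want in its original formatting.

While traversing, we will look for all the function declarations. This is why we use the visitFunctionDeclaration method.

If we wanted to look at variable assignments instead, we would use visitAssignmentExpression.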

// recast-acorn-example.js
const { readFileSync } = require('fs');
const recast = require('recast');
const { Parser } = require('acorn');

const fileName = 'test.js'; // path of the file to parse
const ast = Parser.parse(readFileSync(fileName).toString(), { ecmaVersion: 5 });

recast.visit(
  ast,
  {
    visitFunctionDeclaration: (path) => {
      // the navigation code here...

      // return false to stop at this depth
      return false;
    }
  }
);

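Types of Nodes

Often the names of node types are not obvious. You can use ast-explorer to look up the type you are after: paste your code in the left panel, select the parser you are using, and voilà!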
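When a browser tool is not at hand, the type names can also be discovered programmatically. The sketch below is a small helper, `countNodeTypes` (a hypothetical name, not part of any library), that walks any ESTree-style object and counts how often each `type` appears. The sample tree is written by hand and simplified.

```javascript
// Collect how often each node type appears in an ESTree-style tree.
function countNodeTypes(node, counts = {}) {
  if (Array.isArray(node)) {
    node.forEach((child) => countNodeTypes(child, counts));
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') {
      counts[node.type] = (counts[node.type] || 0) + 1;
    }
    // Visit every property; primitives are ignored by the checks above.
    Object.values(node).forEach((child) => countNodeTypes(child, counts));
  }
  return counts;
}

// Hand-written tree for: function add(c, d) { return c + d; }
const tree = {
  type: 'Program',
  body: [{
    type: 'FunctionDeclaration',
    id: { type: 'Identifier', name: 'add' },
    params: [{ type: 'Identifier', name: 'c' }, { type: 'Identifier', name: 'd' }],
    body: {
      type: 'BlockStatement',
      body: [{
        type: 'ReturnStatement',
        argument: {
          type: 'BinaryExpression',
          operator: '+',
          left: { type: 'Identifier', name: 'c' },
          right: { type: 'Identifier', name: 'd' },
        },
      }],
    },
  }],
};

const typeCounts = countNodeTypes(tree);
console.log(typeCounts);
```

The keys of the resulting object are the node type names, which map directly onto visitor method names such as visitFunctionDeclaration.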

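Shallow or Deep

We don't always want to look at every level of the tree. Sometimes we want to do a deep search, and other times we only want to look at the top layer. Depending on the framework, the syntax differs.

With recast, if we want to stop searching at the current depth, we just return false once done. That is what we did previously. To keep going deeper instead, we call this.traverse(path), as in the example that follows.

With @babel/traverse, there is no need to tell Babel where to continue: it descends into child nodes by default. We only need to specify where to stop, which in Babel is done by calling path.skip() rather than by returning false.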

// recast-acorn-example.js
recast.visit(
  ast,
  {
    // a regular method, not an arrow function, so that `this` is the visitor
    visitFunctionDeclaration(path) {
      // deal with the path here...

      // run the visit on every child node
      this.traverse(path);
    }
  }
);
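The difference between a shallow and a deep search can be seen with a toy walker. This is a sketch written for illustration only, not recast's implementation: the `walk` function and its "return false to stop descending" convention are assumptions that merely mimic the behavior described above, and the tree is hand-written.

```javascript
// Toy walker (NOT recast's implementation): calls `onFunction` for every
// FunctionDeclaration; returning false from it skips that node's children.
function walk(node, onFunction) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, onFunction));
    return;
  }
  if (!node || typeof node !== 'object') return;
  if (node.type === 'FunctionDeclaration' && onFunction(node) === false) {
    return; // shallow: do not descend into this function
  }
  Object.values(node).forEach((child) => walk(child, onFunction));
}

// Hand-written tree for: function outer() { function inner() {} }
const tree = {
  type: 'Program',
  body: [{
    type: 'FunctionDeclaration',
    id: { type: 'Identifier', name: 'outer' },
    params: [],
    body: {
      type: 'BlockStatement',
      body: [{
        type: 'FunctionDeclaration',
        id: { type: 'Identifier', name: 'inner' },
        params: [],
        body: { type: 'BlockStatement', body: [] },
      }],
    },
  }],
};

// Shallow search: stop at the first function found on each branch.
const shallow = [];
walk(tree, (fn) => { shallow.push(fn.id.name); return false; });

// Deep search: never return false, so nested functions are visited too.
const deep = [];
walk(tree, (fn) => { deep.push(fn.id.name); });

console.log(shallow); // [ 'outer' ]
console.log(deep);    // [ 'outer', 'inner' ]
```

For the mission at hand, the shallow variant is the one we want: nested helpers such as inner are not exposed in the global scope.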

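We went from a very broad search to a smaller sample. We can now extract the data we need.

Navigating from Path to Node to Property

The path object passed to visitFunctionDeclaration is a NodePath. This object represents the connection between a parent AST node and a child AST node. On its own, this path is of no use to us, because it only represents the link between the function declaration and the body that contains it.

Using ast-explorer, we can find the content of the path we are looking for.

The classic way in is path.node. It gets the child node of the parent-child relationship. If you chose to search for function declarations, the node in path.node will be of type FunctionDeclaration: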

// recast-acorn-example.js
const functionNames = [];

recast.visit(
  ast,
  {
    visitFunctionDeclaration: (path) => {
      console.log(path.node.type); // will print "FunctionDeclaration"
      functionNames.push(path.node.id.name); // will add the name of the function to the array

      // return false to avoid looking inside of the function's body
      // we stop our search at this level
      return false;
    }
  }
);
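The path, node, and property relationship can be sketched without any library. The `makePath` helper below is a hypothetical stand-in, not recast's NodePath class: it only pairs a node with the path of its parent, which is enough to show the two directions of navigation.

```javascript
// Toy stand-in for a path (hypothetical shape, not recast's NodePath class):
// it pairs a node with the path of its parent.
function makePath(node, parentPath = null) {
  return { node, parentPath };
}

// Hand-written, simplified tree for: function decrementAndAdd() {}
const program = {
  type: 'Program',
  body: [{
    type: 'FunctionDeclaration',
    id: { type: 'Identifier', name: 'decrementAndAdd' },
    params: [],
    body: { type: 'BlockStatement', body: [] },
  }],
};

const programPath = makePath(program);
const functionPath = makePath(program.body[0], programPath);

// path -> node -> property
const nodeType = functionPath.node.type;        // 'FunctionDeclaration'
const functionName = functionPath.node.id.name; // 'decrementAndAdd'

// path -> parent path -> node
const parentType = functionPath.parentPath.node.type; // 'Program'

console.log(nodeType, functionName, parentType);
```

The node holds the data, while the path holds the position in the tree. That is why a visitor receives a path rather than a bare node.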

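Try nesting traversal functions inside one another to look at subtrees. The code below collects every function located exactly at the second level.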

// recast-acorn-example.js
const functionNames = [];

recast.visit(
  ast,
  {
    visitFunctionDeclaration: (path) => {
      let newPath = path.get('body');

      // subtraversing
      recast.visit(
        newPath,
        {
          visitFunctionDeclaration: (path) => {
            functionNames.push(path.node.id.name);
            return false;
          }
        }
      );

      // return false to not look at other functions contained in this function
      // leave this role to the sub-traversing
      return false;
    }
  }
);
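For comparison, here is a library-free sketch of the same second-level lookup. It is a simplification, not an equivalent of the traversal above: it only inspects statements sitting directly in a top-level function's body, and both `secondLevelFunctionNames` and `fnNode` are hypothetical helpers working on a hand-written tree shaped like test.js.

```javascript
// Names of functions declared directly inside a top-level function.
function secondLevelFunctionNames(program) {
  const isFunction = (node) => node.type === 'FunctionDeclaration';
  return program.body
    .filter(isFunction)
    .flatMap((outer) => outer.body.body.filter(isFunction))
    .map((inner) => inner.id.name);
}

// Helper that builds a trimmed-down FunctionDeclaration node by hand.
const fnNode = (name, innerStatements = []) => ({
  type: 'FunctionDeclaration',
  id: { type: 'Identifier', name },
  params: [],
  body: { type: 'BlockStatement', body: innerStatements },
});

// Hand-written tree mirroring the nesting of test.js.
const program = {
  type: 'Program',
  body: [
    fnNode('decrementAndAdd', [fnNode('add')]),
    fnNode('incrementAndMultiply', [fnNode('multiply')]),
  ],
};

const secondLevel = secondLevelFunctionNames(program);
console.log(secondLevel); // [ 'add', 'multiply' ]
```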

Mission accomplished!

We could just as easily find the names of parameters or of exposed variables.
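A minimal sketch of that idea, assuming ESTree node shapes and a hand-written tree (the `VERSION` variable is made up for the example): top-level variable names come from VariableDeclaration nodes, and parameter names come from the params array of each FunctionDeclaration.

```javascript
// Hand-written, simplified tree for:
//   var VERSION = 1;
//   function add(c, d) {}
const program = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'var',
      declarations: [{
        type: 'VariableDeclarator',
        id: { type: 'Identifier', name: 'VERSION' },
        init: { type: 'Literal', value: 1 },
      }],
    },
    {
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'add' },
      params: [{ type: 'Identifier', name: 'c' }, { type: 'Identifier', name: 'd' }],
      body: { type: 'BlockStatement', body: [] },
    },
  ],
};

// Names of variables declared at the top level (plain identifiers only).
const variableNames = program.body
  .filter((node) => node.type === 'VariableDeclaration')
  .flatMap((node) => node.declarations)
  .filter((declarator) => declarator.id.type === 'Identifier')
  .map((declarator) => declarator.id.name);

// Parameter names of each top-level function (plain identifiers only).
const parameterNames = program.body
  .filter((node) => node.type === 'FunctionDeclaration')
  .map((node) => ({
    name: node.id.name,
    params: node.params
      .filter((param) => param.type === 'Identifier')
      .map((param) => param.name),
  }));

console.log(variableNames);  // [ 'VERSION' ]
console.log(parameterNames); // [ { name: 'add', params: [ 'c', 'd' ] } ]
```

Destructuring patterns and default parameters use other node types and are skipped by the identifier filters here.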

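Glossary

AST node: a single object in the tree. Examples: a function declaration, a variable assignment, an object expression.

NodePath: the link between a parent node and a child node in the tree.

NodeProperty: a defining part of a node. Depending on the node, it may hold just a name or more information.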
