php-parser
php-parser is a PHP library that parses PHP code into an abstract syntax tree (AST).
Installation:
composer require nikic/php-parser
ini_set('xdebug.max_nesting_level', 3000);
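This raises Xdebug's limit on the depth of nested function calls, so that traversing a deeply nested node tree does not abort with an error. It only has an effect when the Xdebug extension is loaded.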
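Since the setting only matters when Xdebug is present, it can be applied conditionally. The following is a minimal sketch; the helper name raiseXdebugNestingLimit is made up for illustration and is not part of php-parser.

```php
<?php
// Hypothetical helper (not part of php-parser): raise Xdebug's nesting limit
// only when the Xdebug extension is actually loaded.
function raiseXdebugNestingLimit(int $limit): bool
{
    if (!extension_loaded('xdebug')) {
        return false; // Xdebug is not loaded, nothing to configure
    }
    ini_set('xdebug.max_nesting_level', (string) $limit);
    return true;
}

raiseXdebugNestingLimit(3000);
```

The return value only reports whether Xdebug was found, so a caller can log it if needed.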
Creating a parser:
use PhpParser\ParserFactory;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
ParserFactory::PREFER_PHP7: try to parse the code as PHP 7 first, and fall back to parsing it as PHP 5 if that fails.
An example from the official documentation:
<?php
require('./vendor/autoload.php');
use PhpParser\ParserFactory;
use PhpParser\Error;
$code = <<<'CODE'
<?php
function printLine($msg){
    echo $msg,"\n";
}
printLine("Hello World!");
CODE;
$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
try{
    $smts = $parser->parse($code);
}catch (Error $e){
    echo 'Parser Error: ',$e->getMessage();
}
If the code contains a syntax error, the parser throws a PhpParser\Error exception.
You can print the AST with NodeDumper (or simply with var_dump):
use PhpParser\NodeDumper;
$nodeDumper = new NodeDumper();
// dump() returns the tree as a string, so it has to be echoed
echo $nodeDumper->dump($smts);
Result:
array(
    0: Stmt_Function(
        attrGroups: array(
        )
        byRef: false
        name: Identifier(
            name: printLine
        )
        params: array(
            0: Param(
                attrGroups: array(
                )
                flags: 0
                type: null
                byRef: false
                variadic: false
                var: Expr_Variable(
                    name: msg
                )
                default: null
            )
        )
        returnType: null
        stmts: array(
            0: Stmt_Echo(
                exprs: array(
                    0: Expr_Variable(
                        name: msg
                    )
                    1: Scalar_String(
                        value:
                    )
                )
            )
        )
    )
    1: Stmt_Expression(
        expr: Expr_FuncCall(
            name: Name(
                parts: array(
                    0: printLine
                )
            )
            args: array(
                0: Arg(
                    name: null
                    value: Scalar_String(
                        value: Hello World!
                    )
                    byRef: false
                    unpack: false
                )
            )
        )
    )
)
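As you can see, the node tree is an array with two elements, Stmt_Function and Stmt_Expression. (The second argument of echo is the string "\n"; the dumper prints it literally, which is why that value looks empty.)
These correspond to the following classes:
- Stmt_Function -> PhpParser\Node\Stmt\Function_
- Stmt_Expression -> PhpParser\Node\Stmt\Expression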
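PHP-Parser groups language nodes (Node) as follows:
- PhpParser\Node\Stmt: statement nodes, i.e. language constructs that return no value and cannot appear inside an expression, such as a class definition
- PhpParser\Node\Expr: expression nodes, i.e. language constructs that return a value and can appear inside other expressions, such as $var (PhpParser\Node\Expr\Variable)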
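To check the mapping between dumped type names and classes yourself, something like the following sketch may help. It assumes nikic/php-parser 4.x installed via Composer (the same API used throughout these notes); the variable names are only for illustration.

```php
<?php
require './vendor/autoload.php';

use PhpParser\Node;
use PhpParser\ParserFactory;

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse(
    '<?php function printLine($msg) { echo $msg, "\n"; } printLine("Hello World!");'
);

$summary = [];
foreach ($stmts as $stmt) {
    // getType() returns the short name shown by NodeDumper, e.g. "Stmt_Function"
    $summary[$stmt->getType()] = get_class($stmt);
    echo $stmt->getType(), ' -> ', get_class($stmt), "\n";
}

// The call itself is an expression, wrapped in a Stmt\Expression statement
$call = $stmts[1]->expr;
echo $call->getType(), ' is an Expr: ', var_export($call instanceof Node\Expr, true), "\n";
```

The trailing underscore in Function_ is there because function is a reserved word in PHP and cannot be used as a class name.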
<?php
require('./vendor/autoload.php');
use PhpParser\ParserFactory;
use PhpParser\Error;
use PhpParser\NodeDumper;
$code = "<?php echo 'Hi ', hi\\getTarget();";
$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
try{
    $smts = $parser->parse($code);
}catch (Error $e){
    echo 'Parser Error: ',$e->getMessage();
}
$nodeDumper = new NodeDumper();
echo $nodeDumper->dump($smts);
Dump of $smts:
array(
    0: Stmt_Echo(
        exprs: array(
            0: Scalar_String(
                value: Hi 
            )
            1: Expr_FuncCall(
                name: Name(
                    parts: array(
                        0: hi
                        1: getTarget
                    )
                )
                args: array(
                )
            )
        )
    )
)
Changing the value of a child node, here replacing 'Hi ' with "Hello":
$smts[0]->exprs[0]->value = "Hello";
PrettyPrinter turns the AST back into PHP code:
use PhpParser\PrettyPrinter\Standard;
$prettyPrinter = new Standard();
echo $prettyPrinter->prettyPrint($smts);
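Putting the three snippets above together gives a complete parse, modify and print round trip. This is a sketch assuming nikic/php-parser 4.x; the $output variable is just for illustration.

```php
<?php
require './vendor/autoload.php';

use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter\Standard;

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse("<?php echo 'Hi ', hi\\getTarget();");

// $stmts[0] is the Stmt\Echo_ node; exprs[0] is the Scalar\String_ node for 'Hi '
$stmts[0]->exprs[0]->value = 'Hello';

// prettyPrint() returns the code without the opening <?php tag
$output = (new Standard())->prettyPrint($stmts);
echo $output, "\n";
```

The printed statement should now echo 'Hello' instead of 'Hi ', while the call to hi\getTarget() is left untouched.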
Node traversal
Use PhpParser\NodeTraverser to traverse the AST:
$traverse = new NodeTraverser();
$traverse->addVisitor(new Myvisitor());
Here Myvisitor is a class that extends the abstract class PhpParser\NodeVisitorAbstract:
class Myvisitor extends NodeVisitorAbstract{
    public function leaveNode(Node $node)
    {
        if($node instanceof Node\Scalar\String_){
            $node->value = "114514";
        }
    }
}
This visitor replaces the value of every string literal with "114514".
Complete code:
<?php
require('./vendor/autoload.php');
use PhpParser\ParserFactory;
use PhpParser\Error;
use PhpParser\NodeDumper;
use PhpParser\NodeTraverser;
use PhpParser\PrettyPrinter;
use PhpParser\NodeVisitorAbstract;
use PhpParser\Node;
class Myvisitor extends NodeVisitorAbstract{
    public function leaveNode(Node $node)
    {
        if($node instanceof Node\Scalar\String_){
            $node->value = "114514";
        }
    }
}
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$code = '<?php echo "HelloHello";?>';
$traverse = new NodeTraverser();
$traverse->addVisitor(new Myvisitor());
$prettyPrinter = new PrettyPrinter\Standard();
try{
    $stmts = $parser->parse($code);
    $stmts = $traverse->traverse($stmts);
    $code = $prettyPrinter->prettyPrint($stmts);
    echo $code;
}catch (Error $e){
    echo 'Parser Error: '.$e->getMessage();
}
Running it prints echo "114514";
NodeVisitorAbstract implements the PhpParser\NodeVisitor interface, which defines four methods:
public function beforeTraverse(array $nodes);
public function enterNode(\PhpParser\Node $node);
public function leaveNode(\PhpParser\Node $node);
public function afterTraverse(array $nodes);
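- beforeTraverse is called once before the traversal starts; it can be used to reset state or to prepare for traversing the tree
- afterTraverse is called once after the traversal has finished
- enterNode and leaveNode are called for every node: enterNode when the node is entered, before its children are traversed, and leaveNode when the node is left, after all of its children have been traversed
A traversal example: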
Expr_FuncCall(
    name: Name(
        parts: array(
            0: printLine
        )
    )
    args: array(
        0: Arg(
            value: Scalar_String(
                value: Hello World!!!
            )
            byRef: false
            unpack: false
        )
    )
)
When traversing the AST above, the enter/leave methods are called in the following order:
enterNode(Expr_FuncCall)
enterNode(Name)
leaveNode(Name)
enterNode(Arg)
enterNode(Scalar_String)
leaveNode(Scalar_String)
leaveNode(Arg)
leaveNode(Expr_FuncCall)
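The call order can be observed with a visitor that only records its callbacks. The sketch below assumes nikic/php-parser 4.x; OrderLogger is a hypothetical class name, not something the library provides.

```php
<?php
require './vendor/autoload.php';

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor that records each callback instead of changing the tree
class OrderLogger extends NodeVisitorAbstract
{
    public $log = [];

    public function enterNode(Node $node)
    {
        $this->log[] = 'enterNode(' . $node->getType() . ')';
    }

    public function leaveNode(Node $node)
    {
        $this->log[] = 'leaveNode(' . $node->getType() . ')';
    }
}

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse('<?php printLine("Hello World!!!");');

$logger = new OrderLogger();
$traverser = new NodeTraverser();
$traverser->addVisitor($logger);
// Traverse only the call expression, not the Stmt_Expression that wraps it
$traverser->traverse([$stmts[0]->expr]);

echo implode("\n", $logger->log), "\n";
```

Passing $stmts directly instead would add an enterNode(Stmt_Expression) / leaveNode(Stmt_Expression) pair around the whole sequence.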
Ways to modify the AST from inside a visitor:
1. Assign to the node's properties directly
public function leaveNode(Node $node) {
    if ($node instanceof Node\Scalar\LNumber) {
        // increment all integer literals
        $node->value++;
    }
}
2. Replace the current node by returning a new node
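public function leaveNode(Node $node) {
    if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
        // Convert all $a && $b expressions into !($a && $b)
        return new Node\Expr\BooleanNot($node);
    }
}
Note that if a node is replaced in enterNode, the traverser also descends into the children of the new node. The replacement above would then be applied again to the $a && $b inside !($a && $b), and so on, causing infinite recursion. That is why it is done in leaveNode here.
3. Special replacement types that are only supported in leaveNode
Removing a node: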
public function leaveNode(Node $node) {
    if ($node instanceof Node\Stmt\Return_) {
        // Remove all return statements
        return NodeTraverser::REMOVE_NODE;
    }
}
An example:
public function leaveNode(Node $node) {
    if ($node instanceof Node\Stmt\Expression
        && $node->expr instanceof Node\Expr\FuncCall
        && $node->expr->name instanceof Node\Name
        && $node->expr->name->toString() === 'var_dump'
    ) {
        return NodeTraverser::REMOVE_NODE;
    }
}
This removes every var_dump call that appears as a standalone expression statement, such as var_dump($a);. A call used inside another construct, such as if (var_dump($a)), is not removed.
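To see the difference between the two cases, the visitor can be run on a small input. This is a sketch assuming nikic/php-parser 4.x; VarDumpRemover is a hypothetical class name wrapping the leaveNode method shown above.

```php
<?php
require './vendor/autoload.php';

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter\Standard;

// Hypothetical visitor name; the body is the leaveNode method from above
class VarDumpRemover extends NodeVisitorAbstract
{
    public function leaveNode(Node $node)
    {
        if ($node instanceof Node\Stmt\Expression
            && $node->expr instanceof Node\Expr\FuncCall
            && $node->expr->name instanceof Node\Name
            && $node->expr->name->toString() === 'var_dump'
        ) {
            return NodeTraverser::REMOVE_NODE;
        }
    }
}

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse('<?php var_dump($a); if (var_dump($a)) { echo 1; }');

$traverser = new NodeTraverser();
$traverser->addVisitor(new VarDumpRemover());
$stmts = $traverser->traverse($stmts);

$output = (new Standard())->prettyPrint($stmts);
echo $output, "\n";
```

Only the if statement should remain: its var_dump call is the condition of the if, not a Stmt\Expression, so the check does not match it.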
Besides removing a node, you can also replace one node with several nodes:
public function leaveNode(Node $node) {
    if ($node instanceof Node\Stmt\Return_ && $node->expr !== null) {
        // Convert "return foo();" into "$retval = foo(); return $retval;"
        $var = new Node\Expr\Variable('retval');
        return [
            new Node\Stmt\Expression(new Node\Expr\Assign($var, $node->expr)),
            new Node\Stmt\Return_($var),
        ];
    }
}
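Returning NodeTraverser::DONT_TRAVERSE_CHILDREN skips the children of the current node (only available in enterNode).
An example:
private $classes = [];
public function enterNode(Node $node) {
    if ($node instanceof Node\Stmt\Class_) {
        $this->classes[] = $node;
        return NodeTraverser::DONT_TRAVERSE_CHILDREN;
    }
}
If you need to find all class declarations in a file, there is no need to inspect the children of a class once you have seen its declaration, because PHP does not allow nested classes. In that case you can tell the traverser not to recurse into the class node.
It feels a bit like pruning a search tree.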
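The fragment above is only the body of a visitor. A complete version might look like the sketch below, assuming nikic/php-parser 4.x; ClassCollector and getClassNames are hypothetical names added for illustration.

```php
<?php
require './vendor/autoload.php';

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor that collects class declarations without descending into them
class ClassCollector extends NodeVisitorAbstract
{
    private $classes = [];

    public function enterNode(Node $node)
    {
        if ($node instanceof Node\Stmt\Class_) {
            $this->classes[] = $node;
            // Methods and properties of the class are never visited
            return NodeTraverser::DONT_TRAVERSE_CHILDREN;
        }
    }

    public function getClassNames(): array
    {
        $names = [];
        foreach ($this->classes as $class) {
            // name is null for anonymous classes
            $names[] = $class->name !== null ? $class->name->toString() : '(anonymous)';
        }
        return $names;
    }
}

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse('<?php class A { public function f() {} } class B extends A {}');

$collector = new ClassCollector();
$traverser = new NodeTraverser();
$traverser->addVisitor($collector);
$traverser->traverse($stmts);

$names = $collector->getClassNames();
echo implode(', ', $names), "\n";
```

One caveat of this design: anonymous classes created inside a method body sit below a class node, so a visitor that prunes at class declarations will not see them.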
Besides skipping children, you can return NodeTraverser::STOP_TRAVERSAL to abort the traversal entirely (available in both enterNode and leaveNode).
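A typical use is stopping as soon as the first match has been found. The sketch below assumes nikic/php-parser 4.x; FirstStringFinder is a hypothetical class name.

```php
<?php
require './vendor/autoload.php';

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor that remembers the first string literal and then stops
class FirstStringFinder extends NodeVisitorAbstract
{
    public $found = null;

    public function enterNode(Node $node)
    {
        if ($node instanceof Node\Scalar\String_) {
            $this->found = $node->value;
            // No further nodes are visited after this return
            return NodeTraverser::STOP_TRAVERSAL;
        }
    }
}

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse('<?php echo "first"; echo "second";');

$finder = new FirstStringFinder();
$traverser = new NodeTraverser();
$traverser->addVisitor($finder);
$traverser->traverse($stmts);

echo $finder->found, "\n";
```

Without the STOP_TRAVERSAL return, the visitor would keep going and $found would end up holding the last string in the file rather than the first.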
If a traverser has several visitors registered, their calls are interleaved.
For example:
$traverser = new NodeTraverser();
$traverser->addVisitor($visitorA);
$traverser->addVisitor($visitorB);
$stmts = $traverser->traverse($stmts);
$stmts:
Stmt_Return(
    expr: Expr_Variable(
        name: foobar
    )
)
Traversal order:
$visitorA->enterNode(Stmt_Return)
$visitorB->enterNode(Stmt_Return)
$visitorA->enterNode(Expr_Variable)
$visitorB->enterNode(Expr_Variable)
$visitorA->leaveNode(Expr_Variable)
$visitorB->leaveNode(Expr_Variable)
$visitorA->leaveNode(Stmt_Return)
$visitorB->leaveNode(Stmt_Return)
NodeFinder makes it easier to find nodes without having to write a visitor:
use PhpParser\{Node, NodeFinder};
$nodeFinder = new NodeFinder;
// Find all class nodes.
$classes = $nodeFinder->findInstanceOf($stmts, Node\Stmt\Class_::class);
// Find all classes that extend another class
$extendingClasses = $nodeFinder->find($stmts, function(Node $node) {
    return $node instanceof Node\Stmt\Class_
        && $node->extends !== null;
});
// Find first class occurring in the AST. Returns null if no class exists.
$class = $nodeFinder->findFirstInstanceOf($stmts, Node\Stmt\Class_::class);
// Find first class that has name $name
$class = $nodeFinder->findFirst($stmts, function(Node $node) use ($name) {
    return $node instanceof Node\Stmt\Class_
        && $node->resolvedName->toString() === $name;
});
NodeFinder is itself implemented on top of the traverser; it only simplifies common use cases.
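The snippets above assume that $stmts already exists. A self-contained run might look like this sketch, assuming nikic/php-parser 4.x; the sample classes Base and Child are invented for the example.

```php
<?php
require './vendor/autoload.php';

use PhpParser\Node;
use PhpParser\NodeFinder;
use PhpParser\ParserFactory;

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$stmts = $parser->parse('<?php class Base {} class Child extends Base {} function f() {}');

$nodeFinder = new NodeFinder();

// All class declarations, in source order
$classes = $nodeFinder->findInstanceOf($stmts, Node\Stmt\Class_::class);

// Only the classes that have an extends clause
$extendingClasses = $nodeFinder->find($stmts, function (Node $node) {
    return $node instanceof Node\Stmt\Class_
        && $node->extends !== null;
});

// First class in the AST, or null if there is none
$first = $nodeFinder->findFirstInstanceOf($stmts, Node\Stmt\Class_::class);

echo count($classes), ' classes, first is ', $first->name->toString(), "\n";
echo 'extending: ', $extendingClasses[0]->name->toString(), "\n";
```

For one-off queries like these, NodeFinder avoids the boilerplate of a visitor class; a custom visitor is still the better fit when the tree has to be modified.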
A simple string obfuscator and deobfuscator:
The obfuscator replaces var_dump('Hello World'); with var_dump(str_rot13('Uryyb Jbeyq'));:
<?php
use PhpParser\Node;
use PhpParser\ParserFactory;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\PrettyPrinter\Standard;
require("./vendor/autoload.php");
class MyVisitor extends NodeVisitorAbstract{
    public function leaveNode(Node $node)
    {
        // Wrap every string literal in str_rot13() applied to its ROT13-encoded value
        if($node instanceof Node\Scalar\String_){
            return new Node\Expr\FuncCall(
                new Node\Name("str_rot13"),
                [new Node\Arg(new Node\Scalar\String_(str_rot13($node->value)))]
            );
        }
    }
}
$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$ast = $parser->parse(file_get_contents('a.php'));
$traverser = new NodeTraverser();
// MyVisitor has no constructor, so no argument is needed
$traverser->addVisitor(new MyVisitor());
$ast = $traverser->traverse($ast);
$prettyPrinter = new Standard();
$ret = $prettyPrinter->prettyPrint($ast);
echo $ret;
The matching deobfuscator:
<?php
use PhpParser\Node;
use PhpParser\ParserFactory;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\PrettyPrinter\Standard;
require("./vendor/autoload.php");
class MyVisitor extends NodeVisitorAbstract{
    public function leaveNode(Node $node)
    {
        // Fold str_rot13('literal') back into the decoded string literal
        if($node instanceof Node\Expr\FuncCall &&
            $node->name instanceof Node\Name &&
            $node->name->toLowerString() === "str_rot13" &&
            isset($node->args[0]) &&
            $node->args[0] instanceof Node\Arg &&
            $node->args[0]->value instanceof Node\Scalar\String_
        ){
            $value = $node->args[0]->value->value;
            return new Node\Scalar\String_(str_rot13($value));
        }
    }
}
$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$ast = $parser->parse(file_get_contents('a.php'));
$traverser = new NodeTraverser();
// MyVisitor has no constructor, so no argument is needed
$traverser->addVisitor(new MyVisitor());
$ast = $traverser->traverse($ast);
$prettyPrinter = new Standard();
$ret = $prettyPrinter->prettyPrint($ast);
echo $ret;
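To check that the two visitors really undo each other, both can be run in one script on an in-memory string instead of a.php. This is a sketch assuming nikic/php-parser 4.x; the class names Rot13Obfuscator and Rot13Deobfuscator and the helper transform are hypothetical names chosen for this example.

```php
<?php
require './vendor/autoload.php';

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter\Standard;

// Hypothetical name: wraps every string literal in str_rot13(...)
class Rot13Obfuscator extends NodeVisitorAbstract
{
    public function leaveNode(Node $node)
    {
        if ($node instanceof Node\Scalar\String_) {
            return new Node\Expr\FuncCall(
                new Node\Name('str_rot13'),
                [new Node\Arg(new Node\Scalar\String_(str_rot13($node->value)))]
            );
        }
    }
}

// Hypothetical name: folds str_rot13('literal') back into a plain string
class Rot13Deobfuscator extends NodeVisitorAbstract
{
    public function leaveNode(Node $node)
    {
        if ($node instanceof Node\Expr\FuncCall
            && $node->name instanceof Node\Name
            && $node->name->toLowerString() === 'str_rot13'
            && count($node->args) === 1
            && $node->args[0] instanceof Node\Arg
            && $node->args[0]->value instanceof Node\Scalar\String_
        ) {
            return new Node\Scalar\String_(str_rot13($node->args[0]->value->value));
        }
    }
}

// Parse $code, run one visitor over it and print it back as a complete PHP file
function transform(string $code, NodeVisitorAbstract $visitor): string
{
    $parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
    $traverser = new NodeTraverser();
    $traverser->addVisitor($visitor);
    $ast = $traverser->traverse($parser->parse($code));
    // prettyPrintFile() includes the opening tag, so the result can be parsed again
    return (new Standard())->prettyPrintFile($ast);
}

$obfuscated = transform("<?php var_dump('Hello World');", new Rot13Obfuscator());
$restored = transform($obfuscated, new Rot13Deobfuscator());

echo $obfuscated, "\n", $restored, "\n";
```

The round trip works because ROT13 is its own inverse. Note that the obfuscator rewrites every string literal it meets, including ones in positions where PHP requires a constant expression (constant values, default parameter values), so real code would need extra checks.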
References:
Official documentation
开发简单的PHP混淆器与解混淆器 (Developing a simple PHP obfuscator and deobfuscator)