php-parser

php-parser是一个php库,可以将PHP代码解析为抽象语法树

安装:

composer require nikic/php-parser

ini_set('xdebug.max_nesting_level', 3000);

这个用于设置运行嵌套执行的函数的最大数目,防止在遍历结点树的时候出错

创建一个解析器:

use PhpParser\ParserFactory;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);

ParserFactory::PREFER_PHP7:尝试将代码解析为PHP7,如果失败则尝试将其解析为PHP5
官方的一个例子:
<?php
require('./vendor/autoload.php');

use PhpParser\ParserFactory;
use PhpParser\Error;


$code = <<<'CODE'
<?php
function printLine($msg){
    echo $msg,"\n";
}
printLine("Hello World!");
CODE;

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
try{
    $smts = $parser->parse($code);
}catch (Error $e){
    echo 'Parser Error: ',$e->getMessage();
}

如果有语法错误则会抛出PhpParser\Error的异常


可以使用NodeDumper来打印抽象语法树 (也可以直接使用var_dump)

use PhpParser\NodeDumper;
$nodeDumper = new NodeDumper();
$nodeDumper->dump($smts);

结果:

array(
    0: Stmt_Function(
        attrGroups: array(
        )
        byRef: false
        name: Identifier(
            name: printLine
        )
        params: array(
            0: Param(
                attrGroups: array(
                )
                flags: 0
                type: null
                byRef: false
                variadic: false
                var: Expr_Variable(
                    name: msg
                )
                default: null
            )
        )
        returnType: null
        stmts: array(
            0: Stmt_Echo(
                exprs: array(
                    0: Expr_Variable(
                        name: msg
                    )
                    1: Scalar_String(
                        value: 
                    
                    )
                )
            )
        )
    )
    1: Stmt_Expression(
        expr: Expr_FuncCall(
            name: Name(
                parts: array(
                    0: printLine
                )
            )
            args: array(
                0: Arg(
                    name: null
                    value: Scalar_String(
                        value: Hello World!
                    )
                    byRef: false
                    unpack: false
                )
            )
        )
    )
)

可以看到这个节点树是一个包含两个元素的数组,两个元素分别为

Stmt_FunctionStmt_Expression

他们所对应的类名:

  • Stmt_Function -> PhpParser\Node\Stmt\Function_

  • Stmt_Expression -> PhpParser\Node\Stmt\Expression

    PHP Parser 对语言节点(Node)进行分组:

  • PhpParser\Node\Stmt 是语句(statement)结点,包括无返回值和不会出现在表达式的语言结构,如类的定义

  • PhpParser\Node\Expr是表达式(expression)结点,包括有返回值和能出现在表达式的语言结构,如$var(PhpParser\Node\Expr\Variable)


<?php
require('./vendor/autoload.php');

use PhpParser\ParserFactory;
use PhpParser\Error;
use PhpParser\NodeDumper;


$code = "<?php echo 'Hi ', hi\\getTarget();";

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
try{
    $smts = $parser->parse($code);
}catch (Error $e){
    echo 'Parser Error: ',$e->getMessage();
}

$nodeDumper = new NodeDumper();
echo $nodeDumper->dump($smts);

输出的$smts :

array(
    0: Stmt_Echo(
        exprs: array(
            0: Scalar_String(
                value: Hello
            )
            1: Expr_FuncCall(
                name: Name(
                    parts: array(
                        0: hi
                        1: getTarget
                    )
                )
                args: array(
                )
            )
        )
    )
)

改变子节点的值 将hi改成Hello:
$smts[0]->exprs[0]->value = "Hello"

可以使用PrettyPrinter将AST还原回PHP代码

use PhpParser\PrettyPrinter\Standard;
$prettyPrinter = new Standard();
echo $prettyPrinter->prettyPrint($smts);

结点遍历

可以使用PhpParser\NodeTraverser


$traverse = new NodeTraverser();
$traverse->addVisitor(new Myvisitor());

其中Myvisitor是一个继承了PhpParserNodeVisitorAbstract接口的类


class Myvisitor extends  NodeVisitorAbstract{
    public function leaveNode(Node $node)
    {
        if($node instanceof Node\Scalar\String_){
            $node->value = "114514";
        }
    }
}

这个功能是将所有字符串替换为"114514"

完整代码:

<?php
require('./vendor/autoload.php');

use PhpParser\ParserFactory;
use PhpParser\Error;
use PhpParser\NodeDumper;
use PhpParser\NodeTraverser;
use PhpParser\PrettyPrinter;
use PhpParser\NodeVisitorAbstract;
use PhpParser\Node;

class Myvisitor extends  NodeVisitorAbstract{
    public function leaveNode(Node $node)
    {
        if($node instanceof Node\Scalar\String_){
            $node->value = "114514";
        }
    }
}
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$code = '<?php echo "HelloHello";?>';
$traverse = new NodeTraverser();
$traverse->addVisitor(new Myvisitor());
$prettyPrinter = new PrettyPrinter\Standard();


try{
    $stmts = $parser->parse($code);
    $stmts = $traverse->traverse($stmts);
    $code = $prettyPrinter->prettyPrint($stmts);
    echo $code;
}catch (Error $e){
    echo 'Parser Error: '.$e->getMessage();
}

运行后会输出 echo "114514";

之前继承的PhpParser\NodeVisitor接口定义了四个方法:

public function beforeTraverse(array $nodes);
public function enterNode(\PhpParser\Node $node);
public function leaveNode(\PhpParser\Node $node);
public function afterTraverse(array $nodes);
  • beforeTraverse会在遍历开始之前调用一次,可用于在遍历之前重置值或准备遍历树

  • afterTraverse会在遍历结束后调用一次

  • enterNodeleaveNode会作用于每个结点,前者是在他的子节点被遍历之前,后者是在离开后

一个遍历的例子:

Expr_FuncCall(
    name: Name(
        parts: array(
            0: printLine
        )
    )
    args: array(
        0: Arg(
            value: Scalar_String(
                value: Hello World!!!
            )
            byRef: false
            unpack: false
        )
    )
)

遍历上面这个AST时会按下面的顺序调用enter/leave方法

enterNode(Expr_FuncCall)
enterNode(Name)
leaveNode(Name)
enterNode(Arg)
enterNode(Scalar_String)
leaveNode(Arg)
leaveNode(Expr_FuncCall)

从visitor内部修改AST的方法:

1.直接赋值

public function leaveNode(Node $node) {
    if ($node instanceof Node\Scalar\LNumber) {
        // increment all integer literals
        $node->value++;
    }
}

2.通过返回一个新结点来替换当前结点

public function leaveNode(Node $node) {
    if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
        // Convert all $a && $b expressions into !($a && $b)
        return new Node\Expr\BooleanNot($node);
    }
}

需要注意的是,如果结点发生了替换,那么递归遍历也将考虑新结点的子结点,如果不小心可能会造成无限递归

3.只有leaveNode支持的特殊的替换类型

删除结点:

public function leaveNode(Node $node) {
    if ($node instanceof Node\Stmt\Return_) {
        // Remove all return statements
        return NodeTraverser::REMOVE_NODE;
    }
}

一个例子:

public function leaveNode(Node $node) {
    if ($node instanceof Node\Stmt\Expression
        && $node->expr instanceof Node\Expr\FuncCall
        && $node->expr->name instanceof Node\Name
        && $node->expr->name->toString() === 'var_dump'
    ) {
        return NodeTraverser::REMOVE_NODE;
    }
}

这将会删除所有作为表达式出现的var_dump 比如var_dump($a),但是if(var_dump($a))就不会被删除

除了删除结点外,还可以将一个结点替换为多个结点

public function leaveNode(Node $node) {
    if ($node instanceof Node\Stmt\Return_ && $node->expr !== null) {
        // Convert "return foo();" into "$retval = foo(); return $retval;"
        $var = new Node\Expr\Variable('retval');
        return [
            new Node\Stmt\Expression(new Node\Expr\Assign($var, $node->expr)),
            new Node\Stmt\Return_($var),
        ];
    }
}

使用NodeTraverser::DONT_TRAVERSE_CHILDREN可终止遍历该结点的子结点(只在enterNode中可用)

一个例子:

private $classes = [];
public function enterNode(Node $node) {
    if ($node instanceof Node\Stmt\Class_) {
        $this->classes[] = $node;
        return NodeTraverser::DONT_TRAVERSE_CHILDREN;
    }
}

如果需要查找文件中的所有类声明,则一旦看到了一个类声明,就没有必要再检查它的所有子节点,因为 PHP 不允许嵌套类。 在这种情况下,可以指示visitor不要递归到类节点

感觉有点类似剪枝操作

除了终止遍历子结点,还可以通过NodeTraverser::STOP_TRAVERSAL来终止遍历(enterNode和leaveNode中均可用)

如果一个traverser注册了多个vistor,visitor的遍历会被交错

e.g:

$traverser = new NodeTraverser();
$traverser->addVisitor($visitorA);
$traverser->addVisitor($visitorB);
$stmts = $traverser->traverse($stmts);

$smts:

Stmt_Return(
    expr: Expr_Variable(
        name: foobar
    )
)

遍历过程:

$visitorA->enterNode(Stmt_Return)
$visitorB->enterNode(Stmt_Return)
$visitorA->enterNode(Expr_Variable)
$visitorB->enterNode(Expr_Variable)
$visitorA->leaveNode(Expr_Variable)
$visitorB->leaveNode(Expr_Variable)
$visitorA->leaveNode(Stmt_Return)
$visitorB->leaveNode(Stmt_Return)

通过NodeFinder可以无需创建visitor,更加方便地查找结点

use PhpParser\{Node, NodeFinder};

$nodeFinder = new NodeFinder;

// Find all class nodes.
$classes = $nodeFinder->findInstanceOf($stmts, Node\Stmt\Class_::class);

// Find all classes that extend another class
$extendingClasses = $nodeFinder->find($stmts, function(Node $node) {
    return $node instanceof Node\Stmt\Class_
        && $node->extends !== null;
});

// Find first class occurring in the AST. Returns null if no class exists.
$class = $nodeFinder->findFirstInstanceOf($stmts, Node\Stmt\Class_::class);

// Find first class that has name $name
$class = $nodeFinder->findFirst($stmts, function(Node $node) use ($name) {
    return $node instanceof Node\Stmt\Class_
        && $node->resolvedName->toString() === $name;
});

$nodeFinder也是靠traverser来实现的,只是简化了常见用例

一个简单的字符串混淆和解混淆:

var_dump('Hello World');替换为var_dump(str_rot13('Uryyb Jbeyq'));:

<?php

use PhpParser\Node;
use PhpParser\ParserFactory;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\PrettyPrinter\Standard;


require("./vendor/autoload.php");


class MyVisitor extends NodeVisitorAbstract{
    public function leaveNode(Node $node)
    {
        if($node instanceof Node\Scalar\String_){
            return new Node\Expr\FuncCall(
                new Node\Name("str_rot13"),
                [new Node\Arg(new Node\Scalar\String_(str_rot13($node->value)))]
            );
        }

    }
}

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$ast = $parser->parse(file_get_contents('a.php'));
$traverser = new NodeTraverser();
$traverser->addVisitor(new MyVisitor($parser));
$ast = $traverser->traverse($ast);
$prettyPrinter = new Standard();
$ret = $prettyPrinter->prettyPrint($ast);
echo $ret;

相应的解混淆:
<?php

use PhpParser\Node;
use PhpParser\ParserFactory;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\PrettyPrinter\Standard;


require("./vendor/autoload.php");


class MyVisitor extends NodeVisitorAbstract{
    public function leaveNode(Node $node)
    {
        if($node instanceof Node\Expr\FuncCall &&
        $node->name instanceof Node\Name &&
        $node->name->parts[0] == "str_rot13" &&
        $node->args[0]->value instanceof Node\Scalar\String_
        ){
            $value = $node->args[0]->value->value;
            return new Node\Scalar\String_(str_rot13($value));
        }

    }
}

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$ast = $parser->parse(file_get_contents('a.php'));
$traverser = new NodeTraverser();
$traverser->addVisitor(new MyVisitor($parser));
$ast = $traverser->traverse($ast);
$prettyPrinter = new Standard();
$ret = $prettyPrinter->prettyPrint($ast);
echo $ret;



参考链接:
官方文档
开发简单的PHP混淆器与解混淆器