Protecting a Python codebase - Part 2
This is the second part of the Protecting a Python codebase series. This time we will be playing with Python Abstract Syntax Trees in order to protect the original code of a Python based project.
Python Abstract Syntax Trees
As the Python Documentation says:
The ast module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like.
The built-in module provides the tools to navigate, create or modify a tree. A Python AST is tree-like representation of the python code that’s closer to the bytecode implementation than the source code itself.
Take for instance the code in a little python module named hello.py
:
1
2
def hello_world(name='World'):
print('Hello {0}'.format(name))
We can get the tree representation using the ast
module:
1
2
3
4
5
import ast
with open('hello.py') as f:
code = ''.join(f.readlines())
root = ast.parse(code)
We can also write a node visitor to inspect the tree structure:
1
2
3
4
5
6
7
8
9
10
11
import ast
class Visitor(ast.NodeVisitor):
def __init__(self):
self.stack = 0
def visit(self, node):
print " " * self.stack, node
self.stack += 1
super(Visitor, self).visit(node)
self.stack -= 1
Applying the visitor to the parsed code will result in something like:
Each element has its own set of attributes to fulfill the operation,
Module
objects have a body
attribute which is a list with the
different elements defined in it. Call
have the function name to
call (in the example above it refers to the format
method in
String
). There’s a _field
property with the list of attributes
supported by the different types.
Python Abstract Syntax Trees are a really powerful tool, we can build really neat tools with them, tools to gather code statistics, tools to translate Python to other languages. We will use it to write a code scrambling and unscrambling tool, that combined with an import hook will serve as a protection mechanism.
For that we need a NodeTransformer class:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
from ast import NodeTransformer
class Scrambler(NodeTransformer):
HEADER = '# scrambled'
def __init__(self, scramble=True):
self.do_scramble = scramble
def visit(self, node):
node_out = super(Scrambler, self).visit(node)
if hasattr(node_out, 'body') and isinstance(node_out.body, list):
if self.do_scramble:
node_out.body = self.scramble(node_out.body)
else:
node_out.body = self.unscramble(node_out.body)
return node_out
def scramble(self, items):
return self._step2(self._step1(items[:]))
def unscramble(self, items):
return self._step1(self._step2(items[:]))
def _step1(self, items):
i = 0
length = len(items)
while (i + 1) < length:
items[i], items[i + 1] = items[i + 1], items[i]
i += 2
return items
def _step2(self, items):
length = len(items)
if length % 2 == 0:
items[:length / 2], items[length / 2:] = \
items[length / 2:], items[:length / 2]
else:
items[:(length - 1) / 2], items[(length + 1) / 2:] = \
items[(length + 1) / 2:], items[:(length - 1) / 2]
return items
This code will take a Python AST, and
scramble it by swapping them items in those with a body
attribute
that’s of type list
. The transformation applied will be similar to:
Which is easy to transform back, it’s enough to apply the transformations in the reverse order to get the original object.
If we apply it to the following code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
import os
import pprint
def files_tree(path=None):
out = {}
path = path or os.getcwd()
files = os.listdir(path)
for file in files:
if file.startswith('.'):
continue
sub_path = os.path.join(path, file)
if os.path.isdir(sub_path):
value = files_tree(sub_path)
else:
value = sub_path
out[file] = value
return out
tree = files_tree()
print 'Items in the top level: {0}'.format(len(tree.keys()))
print 'Items in the second level: {0}'.format(
sum(len(val.keys()) for val in tree.values()
if isinstance(val, dict))
)
pprint.pprint(tree)
It will come out as:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# scrambled
print 'Items in the second level: {0}'.format(sum((len(val.keys()) for val in tree.values() if isinstance(val, dict))))
print 'Items in the top level: {0}'.format(len(tree.keys()))
pprint.pprint(tree)
def files_tree(path=None):
files = os.listdir(path)
return out
for file in files:
out[file] = value
if os.path.isdir(sub_path):
value = files_tree(sub_path)
else:
value = sub_path
sub_path = os.path.join(path, file)
if file.startswith('.'):
continue
path = (path or os.getcwd())
out = {}
import pprint
import os
tree = files_tree()
And this is just a simple transformation, more changes can be implemented in order to make things a bit more interesting, some options to take into account are:
- Change parameters order in functions
- Change decorators order in functions
- Change arguments order in function calls
- Change arguments order in class creations
- Rename variable names
- Encode strings with rot13 or similar
This scrambler with the corresponding import hook are published in the companion examples repository.
Conclusion
Python Abstract Syntax Trees is a very versatile tool that allows us to create a solution that better fits the project where it’s used.
But, as it’s mentioned in the previous article, the code can be inspected from a shell once it was imported.
If you plan digging more in Python AST, I recommend Green Tree Snakes - the missing Python AST docs docs.