Foreword
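Python is a scripting language, so anyone who ships Python code runs into the same problem sooner or later: when the source needs to be protected ("encrypted"), things get awkward.
There are workarounds, of course, but which one fits depends on the scenario in which the code has to be protected.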
pyc
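The most common approach is to ship .pyc files. A .pyc is only compiled bytecode, though. It hides the source from ordinary users, but to anyone determined it is like a diary that has been closed but never locked.
The site below decompiles a .pyc online, which could hardly be simpler: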
https://tool.lu/pyc/
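To see why bytecode gives so little protection, here is a minimal sketch (written for Python 3, not the Python 2 used elsewhere in this post). The module source, the function name check_license and the string "hunter2" are made up for illustration. It serializes a compiled code object with marshal, which is essentially what the body of a .pyc contains, then reads the names and constants straight back out.

```python
import dis
import io
import marshal

# Hypothetical module source standing in for a "protected" script.
SOURCE = '''
def check_license(key):
    secret = "hunter2"
    return key == secret
'''

# Compile and serialize the code object. A real .pyc stores the same
# marshalled data behind a small header.
code = compile(SOURCE, "licensed.py", "exec")
payload = marshal.dumps(code)

# "Attacker" side: load the bytes back and look inside.
recovered = marshal.loads(payload)
func_code = next(c for c in recovered.co_consts if hasattr(c, "co_name"))

# Disassemble into a string so the listing can be inspected or printed.
buf = io.StringIO()
dis.dis(func_code, file=buf)
listing = buf.getvalue()

print(func_code.co_name)      # function name survives
print(func_code.co_varnames)  # argument and local names survive
print(func_code.co_consts)    # string constants survive
print(listing)
```

Function names, variable names and string constants are all still there, which is what lets a decompiler rebuild readable source.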
pyinstaller
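Converting the Python program directly into an exe undoubtedly raises the bar for reverse engineering by a notch, but cracking it is still feasible.
Tools such as py2exe have the same effect. This approach has a limitation, though: if the host application only accepts source code as input, how do you protect the code then?
Obfuscation
In that case the more effective way to protect Python code is obfuscation, and there are several tools for it.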
pyminifier
pyminifier can compress and obfuscate Python code, but its obfuscated output is unreliable: the result frequently fails to run.
https://github.com/liftoff/pyminifier
Install with pip:
pip install pyminifier
Obfuscate:
pyminifier -O ./need_obfuscate.py > new.py
It is worth a try on fairly simple code. For anything complex, forget it: this is a demo-grade tool.
pyob.oxyry.com
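A developer in China wrote an online obfuscator that works quite well and produces good results:
http://pyob.oxyry.com/ (unreachable since 2017-12-12; the author has not responded to inquiries so far)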
import sys
# save the default console interface -- maya script editor
__console__=sys.stdout
reload(sys)
# set gbk encode for Chinese
sys.setdefaultencoding('gbk')
# recover the default console interface
# sys.stdout=__console__
import maya.cmds as cmds
import maya.mel as cmds_mel
import random
import json
import copy
import math
import os
import time
import urllib2
# Clears maya output window by Zohar
from ctypes import *
user32 = windll.user32
enum_windows_proc = WINFUNCTYPE(c_int, c_int, c_int)
# in maya there is a bug , u cant use {}*10 for init
Dmd_UAVC_group = [{},{},{},{},{},{},{},{},{},{}]
def Dmd_UAVC_get_windows_handle(title, parent = None):
    # Returns handles to windows with matching titles
    rHwnd = []
    def EnumCB(hwnd, lparam, match = title.lower(), rHwnd = rHwnd):
        # child
        if lparam == 1:
            rHwnd.append(hwnd)
            return False
        title = c_buffer(' ' * 256)
        user32.GetWindowTextA(hwnd, title, 255)
        if title.value.lower() == match:
            rHwnd.append(hwnd)
            #print "Matched", title.value
            return False
        return True
    if parent is not None:
        user32.EnumChildWindows(parent, enum_windows_proc(EnumCB), 1)
    else:
        user32.EnumWindows(enum_windows_proc(EnumCB), 0)
    return rHwnd
def Dmd_UAVC_exfunc_clear_output_windows():
    # print("Clearing Maya output window")
    output_window_handle = Dmd_UAVC_get_windows_handle("Output Window")
    if not output_window_handle:
        print("Output window wasn't found")
    else:
        ch = Dmd_UAVC_get_windows_handle("", output_window_handle[0])
        if ( ch[0] ):
            user32.SendMessageA(ch[0], 0x00B1, 0, -1)
            user32.SendMessageA(ch[0], 0x00C2, 1, "")
        else:
            print("Child window wasn't found")
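Above is the source; below is the obfuscated output. The imported modules are given obfuscated aliases, and nested function names, positional parameters and local variables are renamed, while module-level (global) names are left untouched.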
import sys as O0O0O0OO0O0O0OO0O #line:1
__console__ =O0O0O0OO0O0O0OO0O .stdout #line:3
reload (O0O0O0OO0O0O0OO0O )#line:4
O0O0O0OO0O0O0OO0O .setdefaultencoding ('gbk')#line:6
import maya .cmds as O0OO0O0OOOOO00OO0 #line:11
import maya .mel as O00O0OOOOOOO0OOOO #line:12
import random as O0OO0OOOO0O00OOO0 #line:13
import json as OOO0O00O00OOOO0OO #line:14
import copy as OOOO0000OOO0O0O00 #line:15
import math as OO0O0OO000O00O0OO #line:16
import os as OO0000OO0000O0OOO #line:17
import time as O0O0OO0O00OOO000O #line:18
import urllib2 as O0O00OOO0OOO0O0OO #line:19
from ctypes import *#line:22
user32 =windll .user32 #line:23
enum_windows_proc =WINFUNCTYPE (c_int ,c_int ,c_int )#line:24
Dmd_UAVC_group =[{},{},{},{},{},{},{},{},{},{}]#line:27
def Dmd_UAVC_get_windows_handle (O0OO000O00OOO00O0 ,parent =None ):#line:28
    O0O0OOOO000O0O0O0 =[]#line:30
    def O00O000O0OO0OOO00 (O00O0OO0O0O0OO000 ,O000OOOOO000O0OO0 ,match =O0OO000O00OOO00O0 .lower (),rHwnd =O0O0OOOO000O0O0O0 ):#line:31
        if O000OOOOO000O0OO0 ==1 :#line:33
            rHwnd .append (O00O0OO0O0O0OO000 )#line:34
            return False #line:35
        O000O0OO00O0OO000 =c_buffer (' '*256 )#line:37
        user32 .GetWindowTextA (O00O0OO0O0O0OO000 ,O000O0OO00O0OO000 ,255 )#line:38
        if O000O0OO00O0OO000 .value .lower ()==match :#line:39
            rHwnd .append (O00O0OO0O0O0OO000 )#line:40
            return False #line:42
        return True #line:43
    if parent is not None :#line:45
        user32 .EnumChildWindows (parent ,enum_windows_proc (O00O000O0OO0OOO00 ),1 )#line:46
    else :#line:47
        user32 .EnumWindows (enum_windows_proc (O00O000O0OO0OOO00 ),0 )#line:48
    return O0O0OOOO000O0O0O0 #line:49
def Dmd_UAVC_exfunc_clear_output_windows ():#line:51
    OO0O0000OOOO0OO0O =Dmd_UAVC_get_windows_handle ("Output Window")#line:53
    if not OO0O0000OOOO0OO0O :#line:54
        print ("Output window wasn't found")#line:55
    else :#line:56
        O0O0OO0O0OOO0OOO0 =Dmd_UAVC_get_windows_handle ("",OO0O0000OOOO0OO0O [0 ])#line:57
        if (O0O0OO0O0OOO0OOO0 [0 ]):#line:58
            user32 .SendMessageA (O0O0OO0O0OOO0OOO0 [0 ],0x00B1 ,0 ,-1 )#line:59
            user32 .SendMessageA (O0O0OO0O0OOO0OOO0 [0 ],0x00C2 ,1 ,"")#line:60
        else :#line:61
            print ("Child window wasn't found")
#e9015584e6a44b14988f13e2298bcbf9
#===============================================================#
# Obfuscated by Oxyry Python Obfuscator (http://pyob.oxyry.com) #
#===============================================================#
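The split visible in the output above, where names inside functions are renamed and module-level names are kept, can be approximated with the standard symtable module. The sketch below is written for Python 3 and is only an assumption about how such a decision could be made, not how Oxyry works internally. The sample source and the helper rename_candidates are hypothetical. It is also more conservative than the output above: it keeps every parameter, on the grounds that a caller might pass it by keyword.

```python
import symtable

# Hypothetical sample module used only for this illustration.
SOURCE = '''
import os as _os
counter = 0

def public_entry(path, verbose=False):
    parts = path.split("/")
    def helper(item):
        return item.upper()
    return helper(parts[0])
'''

def rename_candidates(source):
    """Map each function scope to the local names that look safe to rename."""
    top = symtable.symtable(source, "<sketch>", "exec")
    result = {}

    def visit(table):
        if isinstance(table, symtable.Function):
            names = [
                sym.get_name()
                for sym in table.get_symbols()
                # Locals only; parameters and imports are left alone.
                if sym.is_local()
                and not sym.is_parameter()
                and not sym.is_imported()
            ]
            result[table.get_name()] = sorted(names)
        for child in table.get_children():
            visit(child)

    visit(top)
    return result

print(rename_candidates(SOURCE))
```

Module-level names such as counter and public_entry never show up as candidates, because only function scopes are inspected.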
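What made me choose this obfuscator was not any of the points above but the fact that it leaves top-level function names alone: in the output above, every module-level function name is preserved. The reason this matters is simple: it keeps UI-to-function bindings working.
For example, the code below binds a button to a function. If the function name were obfuscated, the button would stop working, because the function it refers to could no longer be found.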
cmds.rowLayout(numberOfColumns = 1)
# the label reads "Clear output window"
cmds.button(label = "清空输出窗口",width = 350,command = "Dmd_UAVC_exfunc_clear_output_windows()")
cmds.setParent( '..' )
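The failure mode can be reproduced without Maya. In the minimal Python 3 sketch below, eval stands in for whatever the UI toolkit does when it runs a command string; that stand-in, the helper run_command and the function names are all assumptions made for illustration. The command string is data, so an obfuscator that renames the function does not rewrite it, and the lookup fails.

```python
def run_command(command, namespace):
    """Evaluate a UI-style command string against the given namespace."""
    try:
        return eval(command, namespace)
    except NameError as exc:
        return "broken binding: %s" % exc

# Hypothetical callback, standing in for Dmd_UAVC_exfunc_clear_output_windows.
ORIGINAL = "def clear_output():\n    return 'cleared'\n"
# The same function after an obfuscator that also renames module-level names.
RENAMED = "def O0O0OO0O0O():\n    return 'cleared'\n"

# The string stored in the button; renaming tools do not look inside it.
COMMAND = "clear_output()"

kept, mangled = {}, {}
exec(ORIGINAL, kept)
exec(RENAMED, mangled)

print(run_command(COMMAND, kept))     # the binding still resolves
print(run_command(COMMAND, mangled))  # the name no longer exists
```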
pyobfuscate
https://github.com/astrand/pyobfuscate
This code is about 12 years old but still works. It does obfuscate function names, and it also sprinkles meaningless "if xxx" checks through the code:
if 24 - 24: TOKENBLANKS % NameTranslator / O0
if 46 - 46: O0 * TOKENBLANKS / NameTranslator * NameTranslator * NameTranslator . NameTranslator
if 62 - 62: i11iIiiIii - TOKENBLANKS % NameTranslator - iIii1I11I1II1 . NameTranslator . TOKENBLANKS
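This trick is rather crude, of course. The filler would be more convincing if it were built from the obfuscated variable names the program actually uses.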
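The filler lines work because the condition "N - N" is always 0, so the expression after the colon is never evaluated and the names in it do not need to exist. Below is a minimal Python 3 sketch of a generator for such lines, loosely following the pattern of the samples above. The function signature and the name list are assumptions for illustration; passing in the names an obfuscator has already assigned to real variables would give the more convincing variant suggested above.

```python
import random

OPERATORS = (".", "/", "+", "-", "%", "*")

def gen_noop_line(names, rng):
    """Build an 'if N - N: <expr>' line whose body can never run."""
    n = rng.randint(1, 100)
    words = [rng.choice(names) for _ in range(rng.randint(1, 6))]
    expr = words[0]
    for word in words[1:]:
        expr += " %s %s" % (rng.choice(OPERATORS), word)
    return "if %d - %d: %s" % (n, n, expr)

# Hypothetical obfuscated names; in practice these would be names that
# the obfuscator already handed out to real variables.
names = ["OO0O0OO0", "iIi1Ii1I", "o0OoO0oO"]
rng = random.Random(0)

lines = [gen_noop_line(names, rng) for _ in range(3)]
for line in lines:
    print(line)
    # Runs without NameError even though none of the names are defined,
    # because the condition is always false.
    exec(line, {})
```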
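Later I found a half-finished project on CSDN that is essentially a modification of the code above, adding support for multithreading and for obfuscating several files at once: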
http://download.csdn.net/download/zhangyulin54321/9749787
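Its code has problems, though: used as is, the obfuscation is very weak and many things are not obfuscated at all.
So I modified pyobfuscate myself. The original writes its output straight to the console (sys.stdout); I changed it to write to a file: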
#!/usr/bin/env python
# -*-mode: python; coding: utf-8 -*-
#
# pyobfuscate - Python source code obfuscator
#
# Copyright 2004-2007 Peter Astrand <astrand@cendio.se> for Cendio AB
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; version 2 of the License.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
# 2017-11-24 15:58:34
# log:
# use file replace the 'sys.stdout'
# set the source path at the main first
# by elmagnifico
import sys
import types
import symbol
import token
import keyword
import tokenize
import compiler
import parser
import random
import symtable
import StringIO
import getopt
import re
import time
import shutil
import codecs
TOKENBLANKS=1
class NameTranslator:
def __init__(self):
self.realnames = {}
self.bogusnames = []
def get_name(self, name):
"""Get a translation for a real name"""
if not self.realnames.has_key(name):
self.realnames[name] = self.gen_unique_name()
return self.realnames[name]
def get_bogus_name(self,):
"""Get a random bogus name"""
if len(self.bogusnames) < 20:
newname = self.gen_unique_name()
self.bogusnames.append(newname)
return newname
else:
return random.choice(self.bogusnames)
def gen_unique_name(self):
"""Generate a name that hasn't been used before;
not as a real name, not as a bogus name"""
existing_names = self.realnames.values() + self.bogusnames
name = ""
while 1:
name += self.gen_name()
if name not in existing_names:
break
return name
def gen_name():
if random.choice((True, False)):
# Type ilII1ili1Ilil1il1Ilili1
chars = ("i", "I", "1")
else:
# Type oooOo0oOo00oOO0o0O0
chars = ("o", "O", "0")
# Name must'nt begin with a number
result = random.choice(chars[:2])
for x in range(random.randint(1, 12)):
result += random.choice(chars)
return result
gen_name = staticmethod(gen_name)
class LambdaSymTable:
def __init__(self, symtabs, argnames):
# Well, lambdas have no name, so they are safe to obfuscate...
self.symtabs = symtabs
self.mysymbs = {}
for argname in argnames:
self.mysymbs[argname] = symtable.Symbol(argname, symtable.DEF_PARAM)
def lookup(self, name):
lsymb = self.mysymbs.get(name)
if lsymb:
return lsymb
else:
# If the symbol is not found in the current sumboltable,
# then look in the toplevel symtable. Perhaps we should
# even look in all symtabs.
try:
return self.symtabs[-1].lookup(name)
except KeyError:
return self.symtabs[0].lookup(name)
def get_type(self):
return self.symtabs[-1].get_type()
def is_lambda_arg(self, id):
return self.mysymbs.has_key(id)
class CSTWalker:
def __init__(self, source_no_encoding, pubapi):
# Our public API (__all__)
self.pubapi = pubapi
# Names of imported modules
self.modnames = []
self.symtab = symtable.symtable(source_no_encoding, "-", "exec")
cst = parser.suite(source_no_encoding)
elements = parser.ast2tuple(cst, line_info=1)
self.names = {}
self.walk(elements, [self.symtab])
def getNames(self):
return self.names
def addToNames(self, line, name, doreplace):
namedict = self.names.get(line, {})
if not namedict:
self.names[line] = namedict
occurancelist = namedict.get(name, [])
if not occurancelist:
namedict[name] = occurancelist
occurancelist.append(doreplace)
def res_name(self, name):
if name.startswith("__") and name.endswith("__"):
return 1
if name in self.modnames:
return 1
if hasattr(__builtins__, name):
return 1
return 0
def walk(self, elements, symtabs):
# We are not interested in terminal tokens
if type(elements) != types.TupleType:
return
if token.ISTERMINAL(elements[0]):
return
production = elements[0]
if production == symbol.funcdef:
self.handle_funcdef(elements, symtabs)
elif production == symbol.varargslist:
self.handle_varargslist(elements, symtabs)
elif production == symbol.fpdef:
self.handle_fpdef(elements, symtabs)
elif production == symbol.import_as_name:
self.handle_import_as_name(elements, symtabs)
elif production == symbol.dotted_as_name:
self.handle_dotted_as_name(elements, symtabs)
elif production == symbol.dotted_name:
self.handle_dotted_name(elements, symtabs)
elif production == symbol.global_stmt:
self.handle_global_stmt(elements, symtabs)
elif production == symbol.atom:
self.handle_atom(elements, symtabs)
elif production == symbol.trailer:
self.handle_trailer(elements, symtabs)
elif production == symbol.classdef:
self.handle_classdef(elements, symtabs)
elif production == symbol.argument:
self.handle_argument(elements, symtabs)
elif production == symbol.lambdef:
self.handle_lambdef(elements, symtabs)
elif production == symbol.decorator:
self.handle_decorator(elements, symtabs)
else:
for node in elements:
self.walk(node, symtabs)
def mangle_name(self, symtabs, name):
if self.res_name(name):
return name
if not name.startswith("__"):
return name
for i in xrange(len(symtabs)):
tab = symtabs[-1 - i]
tabtype = tab.get_type()
if tabtype == "class":
classname = tab.get_name().lstrip("_")
return "_" + classname + name
return name
def should_obfuscate(self, id, symtabs):
# This is the primary location of the magic in pyobfuscate,
# where we try to figure out if a given symbol should be
# obfuscated or left alone.
tab = symtabs[-1]
# Don't touch reserved names
if self.res_name(id):
return False
# Need to get the internal symbol name before we can look it
# up (needed for private class/object members)
orig_id = id
id = self.mangle_name(symtabs, id)
try:
s = tab.lookup(id)
except Exception:
return False
# XXX: Debug code
# Add the symbols you want to examine to this list
debug_symbols = []
if id in debug_symbols:
print >>sys.stderr, "%s:" % id
print >>sys.stderr, " Imported:", s.is_imported()
print >>sys.stderr, " Parameter:", s.is_parameter()
print >>sys.stderr, " Global:", s.is_global()
print >>sys.stderr, " Local:", s.is_local()
# Explicit imports are a clear no
if s.is_imported():
return False
# Don't obfuscate arguments as the caller might be external
# and referencing them by name
if s.is_parameter():
# But we assume that lambda arguments are never referenced
# by name. FIXME?
if isinstance(tab, LambdaSymTable):
if tab.is_lambda_arg(id):
return True
return False
# Lambda scopes have some kind of pseudo-inheritance from
# the surounding scope. As lambdas can only declare arguments
# (which we just handled), we should start digging upwards for
# all other symbols.
if isinstance(tab, LambdaSymTable):
while True:
symtabs = symtabs[:-1]
if symtabs == []:
raise RuntimeError("Lambda symbol '%s' is not present on any scope" % id)
if id in symtabs[-1].get_identifiers():
return self.should_obfuscate(orig_id, symtabs)
# Global objects require special consideration. Need to figure
# out where the symbol originated...
if s.is_global():
topsymtab = symtabs[0]
# A global that's not in the global symbol table is a symbol
# that Python has no idea where it comes from (it is only
# "read" in every context in the module). That means either
# buggy code, or that it got dragged in via "import *". Assume
# the latter and don't obfuscate it.
if id not in topsymtab.get_identifiers():
return False
topsym = topsymtab.lookup(id)
# XXX: See above:
if id in debug_symbols:
print >>sys.stderr, " Imported (G):", topsym.is_imported()
print >>sys.stderr, " Parameter (G):", topsym.is_parameter()
print >>sys.stderr, " Global (G):", topsym.is_global()
print >>sys.stderr, " Local (G):", topsym.is_local()
# Explicit imports are a clear no
if topsym.is_imported():
return False
# "Local" really means "written to", or "declared". So a
# global that is not "local" in the global symbol table is
# something that was created in another scope. This can happen
# in two cases:
#
# a) Imported via *
#
# b) Created via "global foo" inside a function
#
# We want to obfuscate b), but not a). But we cannot tell which
# is which, so just leave both alone.
if not topsym.is_local():
return False
# This is something we declared, so obfuscate unless it is
# part of the module API.
return id not in self.pubapi
# If it's not global, nor local, then it must come from a
# containing scope (e.g. function inside another function).
if not s.is_local():
# Any more scopes to try?
if len(symtabs) <= 2:
raise RuntimeError("Symbol '%s' is not present on any scope" % id)
return self.should_obfuscate(orig_id, symtabs[:-1])
# Local symbols are handled differently depending on what
# our current scope is.
tabtype = tab.get_type()
if tabtype == "module":
# Toplevel. Check with pubapi.
return id not in self.pubapi
elif tabtype == "function":
# Function/method. Always OK.
return True
elif tabtype == "class":
# This is a class method/variable (or, perhaps, a class in a class)
# FIXME: We cannot obfuscate methods right now,
# because we cannot handle calls like obj.meth(),
# since we do not know the type of obj.
return False
else:
raise RuntimeError("Unknown scope '%s' for symbol '%s'" % (tabtype, id))
def handle_funcdef(self, elements, symtabs):
# funcdef: 'def' NAME parameters ':' suite
# elements is something like:
# (259, (1, 'def', 6), (1, 'f', 6), (260, ...
name = elements[2]
assert name[0] == token.NAME
id = name[1]
line = name[2]
obfuscate = self.should_obfuscate(id, symtabs)
self.addToNames(line, id, obfuscate)
tab = symtabs[-1]
orig_id = id
id = self.mangle_name(symtabs, id)
functabs = tab.lookup(id).get_namespaces()
# Mangled names mess up the association with the symbol table, so
# we need to find it manually
if len(functabs) == 0:
functabs = []
for child in tab.get_children():
if child.get_name() == orig_id:
functabs.append(child)
for node in elements:
self.walk(node, symtabs + functabs)
def handle_varargslist(self, elements, symtabs):
# varargslist: (fpdef ['=' test] ',')* ('*' NAME [',' '**' NAME] | '**' NAME) ...
# elements is something like:
# (261, (262, (1, 'XXX', 37)), (12, ',', 37), (262, (1, 'bar', 38)), (22, '=', 38), (292, (293, (294,
# The purpose of this method is to find vararg and kwarg names
# (which are not fpdefs).
tab = symtabs[-1]
for tok in elements:
if type(tok) != types.TupleType:
continue
toktype = tok[0]
if toktype == symbol.test:
# This is a "= test" expression
for node in tok:
# The [:-1] is because we actually are not in the
# functions scope yet.
self.walk(node, symtabs[:-1])
elif toktype == token.NAME:
# This is either an "*args" or an "**kwargs". We could
# in theory obfuscate these as they cannot be referenced
# directly by the caller. However, we currently have no
# idea of telling that these are special when we hit the
# references to them. So for now we treat them as we
# would any other argument.
id = tok[1]
line = tok[2]
obfuscate = self.should_obfuscate(id, symtabs)
self.addToNames(line, id, obfuscate)
elif toktype == symbol.fpdef:
self.handle_fpdef(tok, symtabs)
else:
assert(toktype in [token.STAR, token.DOUBLESTAR,
token.COMMA, token.EQUAL])
def handle_fpdef(self, elements, symtabs):
# fpdef: NAME | '(' fplist ')'
# elements is something like:
# (262, (1, 'self', 13))
name = elements[1]
assert name[0] == token.NAME
id = name[1]
line = name[2]
obfuscate = self.should_obfuscate(id, symtabs)
self.addToNames(line, id, obfuscate)
for node in elements:
self.walk(node, symtabs)
def handle_import_as_name(self, elements, symtabs):
# import_as_name: NAME [NAME NAME]
# elements is something like:
# (279, (1, 'format_tb', 11))
# or
# (279, (1, 'format_tb', 11), (1, 'as', 11), (1, 'ftb', 11))
name1 = elements[1]
assert name1[0] == token.NAME
id1 = name1[1]
line1 = name1[2]
self.addToNames(line1, id1, 0)
if len(elements) > 2:
assert len(elements) == 4
name2 = elements[2]
assert name2[0] == token.NAME
id2 = name2[1]
assert id2 == "as"
line2 = name2[2]
self.addToNames(line2, id2, 0)
name3 = elements[3]
assert name3[0] == token.NAME
id3 = name3[1]
line3 = name3[2]
# FIXME: Later, obfuscate if scope/pubabi etc OK
self.addToNames(line3, id3, 0)
self.modnames.append(id3)
for node in elements:
self.walk(node, symtabs)
def handle_dotted_as_name(self, elements, symtabs):
# dotted_as_name: dotted_name [NAME NAME]
# elements is something like:
# (280, (281, (1, 'os', 2)))
# or
# (280, (281, (1, 'traceback', 11)), (1, 'as', 11), (1, 'tb', 11))
# handle_dotted_name takes care of dotted_name
dotted_name = elements[1]
modname = dotted_name[1]
assert modname[0] == token.NAME
mod_id = modname[1]
mod_line = modname[2]
self.addToNames(mod_line, mod_id, 0)
self.modnames.append(mod_id)
if len(elements) > 2:
# import foo as bar ...
assert len(elements) == 4
asname = elements[2]
assert asname[0] == token.NAME
asid = asname[1]
assert asid == "as"
asline = asname[2]
self.addToNames(asline, asid, 0)
name = elements[3]
assert name[0] == token.NAME
id = name[1]
line = name[2]
# FIXME: Later, obfuscate if scope/pubabi etc OK
self.addToNames(line, id, 0)
self.modnames.append(id)
for node in elements:
self.walk(node, symtabs)
def handle_dotted_name(self, elements, symtabs):
# dotted_name: NAME ('.' NAME)*
# elements is something like:
# (281, (1, 'os', 2))
# or
# (281, (1, 'compiler', 11), (23, '.', 11), (1, 'ast', 11))
# or
# (281, (1, 'bike', 11), (23, '.', 11), (1, 'bikefacade', 11), (23, '.', 11), (1, 'visitor', 11))
name = elements[1]
assert name[0] == token.NAME
id = name[1]
line = name[2]
self.addToNames(line, id, 0)
# Sequence length should be even
assert (len(elements) % 2 == 0)
for x in range(2, len(elements), 2):
dot = elements[x]
name = elements[x+1]
assert dot[0] == token.DOT
assert name[0] == token.NAME
id = name[1]
line = name[2]
self.addToNames(line, id, 0)
for node in elements:
self.walk(node, symtabs)
def handle_global_stmt(self, elements, symtabs):
# global_stmt: 'global' NAME (',' NAME)*
# elements is something like:
# (282, (1, 'global', 41), (1, 'foo', 41))
# or
# (282, (1, 'global', 32), (1, 'aaaa', 32), (12, ',', 32), (1, 'bbbb', 32))
gname = elements[1]
assert gname[0] == token.NAME
gid = gname[1]
assert gid == "global"
name1 = elements[2]
assert name1[0] == token.NAME
id1 = name1[1]
line1 = name1[2]
obfuscate = self.should_obfuscate(id1, symtabs)
self.addToNames(line1, id1, obfuscate)
# Sequence length should be odd
assert (len(elements) % 2)
for x in range(3, len(elements), 2):
comma = elements[x]
name = elements[x+1]
assert comma[0] == token.COMMA
assert name[0] == token.NAME
id = name[1]
line = name[2]
obfuscate = id not in self.pubapi
self.addToNames(line, id, obfuscate)
for node in elements:
self.walk(node, symtabs)
def handle_atom(self, elements, symtabs):
# atom: ... | NAME | ...
# elements is something like:
# (305, (1, 'os', 15))
name = elements[1]
if name[0] == token.NAME:
id = name[1]
line = name[2]
obfuscate = self.should_obfuscate(id, symtabs)
self.addToNames(line, id, obfuscate)
for node in elements:
self.walk(node, symtabs)
def handle_trailer(self, elements, symtabs):
# trailer: ... | '.' NAME
# elements is something like:
# (308, (23, '.', 33), (1, 'poll', 33))
trailer = elements[1]
if trailer[0] == token.DOT:
name = elements[2]
assert name[0] == token.NAME
id = name[1]
line = name[2]
# Cannot obfuscate these as we have no idea what the base
# object is.
self.addToNames(line, id, 0)
for node in elements:
self.walk(node, symtabs)
def handle_classdef(self, elements, symtabs):
# classdef: 'class' NAME ['(' testlist ')'] ':' suite
# elements is something like:
# (316, (1, 'class', 48), (1, 'SuperMyClass', 48), (11, ':', 48),
name = elements[2]
assert name[0] == token.NAME
id = name[1]
line = name[2]
obfuscate = self.should_obfuscate(id, symtabs)
self.addToNames(line, id, obfuscate)
aftername = elements[3]
aftername2 = elements[4]
# Should be either a colon or left paren
assert aftername[0] in (token.COLON, token.LPAR)
if aftername[0] == token.LPAR and aftername2[0] != token.RPAR:
# This class is inherited
testlist = elements[4]
assert testlist[0] == symbol.testlist
# Parsing of testlist should be done in the original scope
for node in testlist:
self.walk(node, symtabs)
elements = elements[5:]
tab = symtabs[-1]
classtab = tab.lookup(id).get_namespace()
for node in elements:
self.walk(node, symtabs + [classtab])
def handle_argument(self, elements, symtabs):
# argument: [test '='] test # Really [keyword '='] test
# elements is like:
# (318, (292, (293, (294, (295, (297, (298, (299, (300, (301,
# (302, (303, (304, (305, (3, '"SC_OPEN_MAX"', 15
# Keyword argument?
if len(elements) >= 4:
# keyword=test
# FIXME: A bit ugly...
if sys.hexversion >= 0x2040000:
keyword = elements[1][1][1][1][1][1][1][1][1][1][1][1][1][1][1]
else:
keyword = elements[1][1][1][1][1][1][1][1][1][1][1][1][1][1]
assert keyword[0] == token.NAME
keyword_id = keyword[1]
keyword_line = keyword[2]
# Argument names have to be in the clear as we cannot track all
# callers. See should_obfuscate().
self.addToNames(keyword_line, keyword_id, False)
# Let the obfuscator continue handling the value
elements = elements[3]
for node in elements:
self.walk(node, symtabs)
def handle_lambdef(self, elements, symtabs):
# lambdef: 'lambda' [varargslist] ':' test
# elements is like:
# (307, (1, 'lambda', 588), (261, (262, (1, 'e', 588))), (11, ':', 588)
# or
# (307, (1, 'lambda', 40), (11, ':', 40), (292 ...
if elements[2][0] == token.COLON:
# There are no lambda arguments. Simple!
# We still need to create a LambdaSymTable though since we
# rely on some magic lookup that it does.
test = elements[3]
lambdatab = LambdaSymTable(symtabs, [])
for node in test:
self.walk(node, symtabs + [lambdatab])
else:
# The more common case: You have a varargslist.
varargslist = elements[2]
# Part 1: Deal with varargslist. Fetch the names of the
# arguments. Construct a LambdaSymTable.
arguments = self.get_varargs_names(varargslist)
for line, name in arguments:
self.addToNames(line, name, 1)
argnames = [e[1] for e in arguments]
lambdatab = LambdaSymTable(symtabs, argnames)
# Part 2: Parse the 'test' part, using the LambdaSymTable.
test = elements[4]
for node in test:
self.walk(node, symtabs + [lambdatab])
def handle_decorator(self, elements, symtabs):
# decorator: '@' NAME parameters
# elements is something like:
# (259, (50, '@', 39), (288, (1, 'f', 39)), (4, '', 39))
name = elements[2][1]
assert name[0] == token.NAME
id = name[1]
line = name[2]
obfuscate = self.should_obfuscate(id, symtabs)
self.addToNames(line, id, obfuscate)
for node in elements:
self.walk(node, symtabs)
def get_varargs_names(elements):
"""Extract all argument names and lines from varargslist"""
result = []
next_is_name = False
for tok in elements:
if type(tok) != types.TupleType:
continue
toktype = tok[0]
if next_is_name:
assert tok[0] == token.NAME
id = tok[1]
line = tok[2]
result.append((line, id))
next_is_name = False
elif toktype in [token.STAR, token.DOUBLESTAR]:
next_is_name = True
elif toktype == symbol.fpdef:
result.extend(CSTWalker.get_fpdef_names(tok))
return result
get_varargs_names = staticmethod(get_varargs_names)
def get_fpdef_names(elements):
"""Extract all argument names from fpdef"""
result = []
# We are not interested in terminal tokens
if type(elements) != types.TupleType:
return result
if token.ISTERMINAL(elements[0]):
return result
name = elements[1]
assert name[0] == token.NAME
id = name[1]
line = name[2]
result.append((line, id))
for node in elements:
result.extend(CSTWalker.get_fpdef_names(node))
return result
get_fpdef_names = staticmethod(get_fpdef_names)
class PubApiExtractor:
def __init__(self, source_no_encoding):
ast = compiler.parse(source_no_encoding)
self.pubapi = None
self.matches = 0
compiler.walk(ast, self)
if self.pubapi == None:
# Didn't find __all__.
if conf.allpublic:
symtab = symtable.symtable(source_no_encoding, "-", "exec")
self.pubapi = filter(lambda s: s[0] != "_",
symtab.get_identifiers())
else:
self.pubapi = []
if self.matches > 1:
print >>sys.stderr, "Warning: Found multiple __all__ definitions"
print >>sys.stderr, "Using last definition"
def visitAssign(self, node):
for assnode in node.nodes:
if not isinstance(assnode, compiler.ast.AssName):
continue
if assnode.name == "__all__" \
and assnode.flags == compiler.consts.OP_ASSIGN:
self.matches += 1
self.pubapi = []
# Verify that the expression is a list
constant = isinstance(node.expr, compiler.ast.List)
if constant:
# Verify that each element in list is a Const node.
for node in node.expr.getChildNodes():
if isinstance(node, compiler.ast.Const):
self.pubapi.append(node.value)
else:
constant = False
break
if not constant:
print >>sys.stderr, "Error: __all__ is not a list of constants."
sys.exit(1)
class ColumnExtractor:
def __init__(self, source, names):
self.indent = 0
self.first_on_line = 1
# How many times have we seen this symbol on this line before?
self.symboltimes = {}
self.names = names
# Dictionary indexed on (row, column), containing name
self.result = {}
# To detect line num changes; backslash constructs doesn't
# generate any token
self.this_lineno = 1
f = StringIO.StringIO(source)
self.parse(f)
def parse(self, f):
for tok in tokenize.generate_tokens(f.readline):
t_type, t_string, t_srow_scol, t_erow_ecol, t_line = tok
assert self.this_lineno <= t_srow_scol[0]
if self.this_lineno < t_srow_scol[0]:
# Gosh, line has skipped. This must be due to an
# ending backslash.
self.this_lineno = t_srow_scol[0]
self.symboltimes = {}
if t_type in [tokenize.NL, tokenize.NEWLINE]:
self.this_lineno += 1
self.symboltimes = {}
elif t_type == tokenize.NAME:
# Make life easier on us by ignoring keywords
if keyword.iskeyword(t_string):
continue
srow = t_srow_scol[0]
scol = t_srow_scol[1]
namedict = self.names.get(srow)
if not namedict:
raise RuntimeError("Overlooked symbol '%s' on line %d column %d" % (t_string, srow, scol))
occurancelist = namedict.get(t_string)
if not occurancelist:
raise RuntimeError("Overlooked symbol '%s' on line %d column %d" % (t_string, srow, scol))
seen_times = self.saw_symbol(t_string)
if seen_times > len(occurancelist):
raise RuntimeError("Overlooked symbol '%s' on line %d column %d" % (t_string, srow, scol))
if occurancelist[seen_times]:
# This occurance should be obfuscated.
assert self.result.get((srow, scol)) == None
self.result[(srow, scol)] = t_string
def saw_symbol(self, name):
"""Update self.symboltimes, when we have seen a symbol
Return the current seen_times for this symbol"""
seen_times = self.symboltimes.get(name, -1)
seen_times += 1
self.symboltimes[name] = seen_times
return seen_times
class TokenPrinter:
AFTERCOMMENT = 0
INSIDECOMMENT = 1
BEFORECOMMENT = 2
def __init__(self, source, names, filename=None):
self.indent = 0
self.first_on_line = 1
self.symboltimes = {}
self.names = names
self.nametranslator = NameTranslator()
# Pending, obfuscated noop lines. We cannot add the noop lines
# until we know what comes after.
self.pending = []
self.pending_indent = 0
# To detect line num changes; backslash constructs doesn't
# generate any token
self.this_lineno = 1
self.pending_newlines = 0
# Skip next token?
self.skip_token = 0
# Keep track of constructions that can span multiple lines
self.paren_count = 0
self.curly_count = 0
self.square_count = 0
# Comment state. One of AFTERCOMMENT, INSIDECOMMENT, BEFORECOMMENT
if conf.firstcomment:
self.commentstate = TokenPrinter.AFTERCOMMENT
else:
self.commentstate = TokenPrinter.BEFORECOMMENT
f = StringIO.StringIO(source)
self.play(f, filename)
def play(self, f, filename):
for tok in tokenize.generate_tokens(f.readline):
t_type, t_string, t_srow_scol, t_erow_ecol, t_line = tok
#print >>sys.stderr, "TTTT", tokenize.tok_name[t_type], repr(t_string), self.this_lineno, t_srow_scol[0]
if t_type == tokenize.OP:
if t_string == "(":
self.paren_count += 1
elif t_string == ")":
self.paren_count -= 1
elif t_string == "{":
self.curly_count += 1
elif t_string == "}":
self.curly_count -= 1
elif t_string == "[":
self.square_count += 1
elif t_string == "]":
self.square_count -= 1
assert self.paren_count >= 0
assert self.curly_count >= 0
assert self.square_count >= 0
if self.skip_token:
self.skip_token = 0
continue
# Make sure we keep line numbers
# line numbers may not decrease
assert self.this_lineno <= t_srow_scol[0]
if self.this_lineno < t_srow_scol[0]:
# Gosh, line has skipped. This must be due to an
# ending backslash.
self.pending_newlines += t_srow_scol[0] - self.this_lineno
self.this_lineno = t_srow_scol[0]
if t_type in [tokenize.NL, tokenize.NEWLINE]:
for x in range(self.pending_newlines):
if conf.blanks != conf.KEEP_BLANKS:
self.pending.append(self.gen_noop_line() + "\n")
self.pending_indent = self.indent
else:
#sys.stdout.write("\n")
filename.write("\n")
self.pending_newlines = 0
if t_type == tokenize.NL:
if self.first_on_line and conf.blanks != conf.KEEP_BLANKS:
self.pending.append(self.gen_noop_line() + "\n")
self.pending_indent = self.indent
else:
#sys.stdout.write("\n")
filename.write("\n")
self.this_lineno += 1
if self.commentstate == TokenPrinter.INSIDECOMMENT:
self.commentstate = TokenPrinter.AFTERCOMMENT
elif t_type == tokenize.NEWLINE:
self.first_on_line = 1
self.this_lineno += 1
#sys.stdout.write("\n")
filename.write("\n")
if self.commentstate == TokenPrinter.INSIDECOMMENT:
self.commentstate = TokenPrinter.AFTERCOMMENT
elif t_type == tokenize.INDENT:
self.indent += conf.indent
elif t_type == tokenize.DEDENT:
self.indent -= conf.indent
elif t_type == tokenize.COMMENT:
if self.commentstate == TokenPrinter.BEFORECOMMENT:
self.commentstate = TokenPrinter.INSIDECOMMENT
if self.first_on_line:
if self.commentstate in [TokenPrinter.BEFORECOMMENT, TokenPrinter.INSIDECOMMENT]:
# Output comment. Only old Python includes newline.
if sys.hexversion >= 0x2040000:
t_string += "\n"
self.line_append(t_string, filename)
elif conf.blanks != conf.KEEP_BLANKS:
self.pending.append(self.gen_noop_line() + "\n")
self.pending_indent = self.indent
else:
#sys.stdout.write("\n")
filename.write("\n")
self.this_lineno += 1
else:
if sys.hexversion >= 0x2040000:
#sys.stdout.write("\n")
filename.write("\n")
self.this_lineno += 1
# tokenizer does not generate a NEWLINE after comment
self.first_on_line = 1
if sys.hexversion >= 0x2040000:
# tokinizer generates NL after each COMMENT
self.skip_token = 1
elif t_type == tokenize.STRING:
if self.first_on_line:
# Skip over docstrings
# FIXME: This simple approach fails with:
# "foo"; print 3
if self.paren_count > 0 or \
self.curly_count > 0 or \
self.square_count > 0:
self.line_append(t_string,filename)
self.this_lineno += t_string.count("\n")
else:
self.skip_token = 1
else:
self.line_append(t_string,filename)
self.this_lineno += t_string.count("\n")
elif t_type == tokenize.NAME:
(srow, scol) = t_srow_scol
if self.names.get(t_srow_scol):
t_string = self.nametranslator.get_name(t_string)
self.line_append(t_string,filename)
else:
self.line_append(t_string,filename)
    def line_append(self, s, filename):
        if self.pending:
            indent = max(self.indent, self.pending_indent)
            self.pending = map(lambda row: " "*indent + row,
                               self.pending)
            if conf.blanks == conf.OBFUSCATE_BLANKS:
                #sys.stdout.write(''.join(self.pending))
                filename.write(''.join(self.pending))
            self.pending = []
        if self.first_on_line:
            #sys.stdout.write(" "*self.indent)
            filename.write(" "*self.indent)
        else:
            #sys.stdout.write(" "*TOKENBLANKS)
            filename.write(" "*TOKENBLANKS)
        #sys.stdout.write(s)
        filename.write(s)
        self.first_on_line = 0
    def gen_noop_line(self):
        if self.paren_count > 0 or \
           self.curly_count > 0 or \
           self.square_count > 0:
            result = "# "
        else:
            testint = random.randint(1, 100)
            result = "if %d - %d: " % (testint, testint)
        num_words = random.randint(1, 6)
        for x in range(num_words - 1):
            op = random.choice((".", "/", "+", "-", "%", "*"))
            result += self.nametranslator.get_bogus_name() + " %s " % op
        result += self.nametranslator.get_bogus_name()
        return result
def strip_encoding(source):
    f = StringIO.StringIO(source)
    lines = [f.readline(), f.readline()]
    buf = ""
    for line in lines:
        if re.search(r"coding[:=]\s*([-\w_.]+)", line):
            if line.strip().startswith("#"):
                # Add an empty line instead
                buf += "\n"
            else:
                # Gosh, not a comment.
                print >>sys.stderr, "ERROR: Python 2.3 with coding declaration in non-comment!"
                sys.exit(1)
        else:
            # Coding declaration not found on this line; add
            # unmodified
            buf += line
    return buf + f.read()
def usage():
    print >>sys.stderr, """
Usage:
pyobfuscate [options] <file>

Options:
-h, --help              Print this help.
-i, --indent <num>      Indentation to use. Default is 1.
-s, --seed <seed>       Seed to use for name randomization. Default is 42.
-r, --removeblanks      Remove blank lines, instead of obfuscate
-k, --keepblanks        Keep blank lines, instead of obfuscate
-f, --firstcomment      Remove first block of comments as well
-a, --allpublic         When __all__ is missing, assume everything is public.
                        The default is to assume nothing is public.
-v, --verbose           Verbose mode.
"""
class Configuration:
    KEEP_BLANKS = 0
    OBFUSCATE_BLANKS = 1
    REMOVE_BLANKS = 2

    def __init__(self):
        try:
            opts, args = getopt.getopt(sys.argv[1:], "hi:s:rkfav",
                                       ["help", "indent=", "seed=", "removeblanks",
                                        "keepblanks", "firstcomment", "allpublic",
                                        "verbose"])
            if len(args) != 1:
                raise getopt.GetoptError("A filename is required", "")
        except getopt.GetoptError, e:
            print >>sys.stderr, "Error:", e
            usage()
            sys.exit(2)
        self.file = args[0]
        self.indent = 1
        self.seed = 42
        self.blanks = self.OBFUSCATE_BLANKS
        self.firstcomment = False
        self.allpublic = False
        self.verbose = False
        for o, a in opts:
            if o in ("-h", "--help"):
                usage()
                sys.exit()
            if o in ("-i", "--indent"):
                self.indent = int(a)
            if o in ("-s", "--seed"):
                self.seed = a
            if o in ("-r", "--removeblanks"):
                self.blanks = self.REMOVE_BLANKS
            if o in ("-k", "--keepblanks"):
                self.blanks = self.KEEP_BLANKS
            if o in ("-f", "--firstcomment"):
                self.firstcomment = True
            if o in ("-a", "--allpublic"):
                self.allpublic = True
            # Membership test: comparing o to the tuple with == never matched
            if o in ("-v", "--verbose"):
                self.verbose = True
def main(file):
    global conf
    conf = Configuration()
    random.seed(conf.seed)
    source = open(conf.file, 'rU').read()
    if sys.version_info[:2] == (2, 3):
        # Enable Python 2.3 workaround for bug 898271.
        source_no_encoding = strip_encoding(source)
    else:
        source_no_encoding = source
    # Step 1: Extract __all__ from source.
    pae = PubApiExtractor(source_no_encoding)
    # Step 2: Walk the CST tree. The result of this step is a
    # dictionary indexed on line numbers, which contains dictionaries
    # indexed on symbols, which contains a list of the occurrences of
    # this symbol on this line. A 1 in this list means that the
    # occurrence should be obfuscated; 0 means not. Example: {64:
    # {'foo': [0, 1], 'run': [0]}}
    cw = CSTWalker(source_no_encoding, pae.pubapi)
    # Step 3: We need those column numbers! Use the tokenize module to
    # step through the source code to gather this information. The
    # result of this step is a dictionary indexed on tuples (row,
    # column), which contains the symbol names. Example: {(55, 6):
    # 'MyClass'} Only symbols that should be replaced are returned.
    # (This step could perhaps be merged with step 4, but there are
    # two reasons for not doing so: 1) Make each step less
    # complicated. 2) If we want to use BRM some day, then we'll need
    # the column numbers.)
    ce = ColumnExtractor(source, cw.names)
    # Step 4: Play the tokenizer game! Step through the source
    # code. Obfuscate those symbols gathered earlier. Change
    # indentation, blank lines etc.
    TokenPrinter(source, ce.result, (file if file else None))
    # Step 5: Output a marker that makes it possible to recognize
    # obfuscated files
    print "Obfuscate Complete"
if __name__ == "__main__":
    # set the file source
    source_file_path = unicode(r"F:\Editor.py", "utf-8")
    sys.argv.append(source_file_path)
    # new file with time stamp
    output_time = time.strftime('%Y-%m-%d %H%M%S',time.localtime(time.time()))
    output_file_path = unicode(r'F:\Editor_'+ output_time+'.py', "utf-8")
    # start obfuscate
    with open(output_file_path, "w+") as file:
        if file:
            main(file)
Set the input and output file paths, then run the script to get the obfuscated result.
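The core of steps 3 and 4 above (walk the token stream and swap selected names) can be sketched in a few lines. This is only an illustration written for Python 3, not the pyobfuscate code itself: `rename_identifiers` and the `mapping` argument are hypothetical names, and the sketch renames every listed identifier without the `__all__` and scope analysis that pyobfuscate performs.

```python
import io
import keyword
import tokenize


def rename_identifiers(source, mapping):
    """Replace NAME tokens found in `mapping`; leave every other token alone."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if (tok.type == tokenize.NAME
                and not keyword.iskeyword(tok.string)
                and tok.string in mapping):
            # Strings and comments have other token types, so their text is untouched.
            tok = tok._replace(string=mapping[tok.string])
        tokens.append(tok)
    return tokenize.untokenize(tokens)


original = (
    "def area(width, height):\n"
    "    result = width * height  # width times height\n"
    "    return result\n"
)
obfuscated = rename_identifiers(
    original, {"width": "O0", "height": "O1", "result": "O2"}
)
print(obfuscated)
```

Working on tokens rather than on raw text is what keeps the comment intact here, whereas a plain search-and-replace would rewrite it too.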
One-lined Python
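Besides the obfuscation above, which works by renaming variables, another approach is one-lined Python.
In short, it uses language features such as lambda to collapse an entire function into a single line of code.
Official site:
http://onelinepy.herokuapp.com/
https://github.com/csvoss/onelinerizer
You can see the effect below: a simple function suddenly becomes complicated.
Source: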
## YOUR CODE HERE
def f(x):
    return x * 4
y = f(5)
print y
The code after one-lining:
(lambda __print, __g: [[(__print(y), None)[1] for __g['y'] in [(f(5))]][0] for __g['f'], f.__name__ in [(lambda x: (lambda __l: [(__l['x'] * 4) for __l['x'] in [(x)]][0])({}), 'f')]][0])(__import__('__builtin__').__dict__['print'], globals())
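But this has plenty of problems too. One-lining a whole file directly rarely produces something that still runs. It is better to apply it to a single function or a handful of functions, or to combine it with the obfuscation above, which gives a result that is very hard to restore and read.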
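To see what the tool is doing, here is a hand-written Python 3 sketch of the same idea on the same tiny function. It is not the output of onelinerizer. It only shows the two tricks involved: binding a name by passing it as a lambda argument, and sequencing "statements" by building a tuple and picking one element. The names `y_one_line` and `y_via_dict` are made up for this example.

```python
# Ordinary form
def f(x):
    return x * 4

y = f(5)

# One expression: f is bound as a lambda parameter instead of with def
y_one_line = (lambda f: f(5))(lambda x: x * 4)

# Sequencing without statements: tuple items are evaluated left to right,
# so the dict is updated first and the stored value is read second.
y_via_dict = (lambda g: (g.update(y=(lambda x: x * 4)(5)), g["y"])[1])({})

print(y, y_one_line, y_via_dict)
```

All three values are equal. The one-line forms just hide the assignment inside expression evaluation, which is why the generated code grows so quickly for anything larger.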
base64
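There is also a way that only fools yourself: encode the whole source as a base64 string, then decode it back into a code string inside the program and hand it to compile for execution. base64 is an encoding, not encryption, so anyone can decode it.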
import base64
mycode = "print 'Hello World!'"
secret = base64.b64encode(mycode)
print(secret)
mydecode = base64.b64decode(secret)
eval(compile(mydecode,'<string>','exec'))
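The snippet above is Python 2. A rough Python 3 equivalent is sketched below, assuming the payload is UTF-8 text. The variable names are made up for this example. It also shows why the protection is so weak: a single `b64decode` call gives the source back.

```python
import base64

source = "result = 6 * 7"

# "Protect" the code: this is what would be shipped
encoded = base64.b64encode(source.encode("utf-8"))

# What the shipped loader does, and also all an attacker has to do
decoded = base64.b64decode(encoded).decode("utf-8")

namespace = {}
exec(compile(decoded, "<string>", "exec"), namespace)
print(encoded, namespace["result"])
```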
others
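GitHub has quite a few other obfuscation projects, but be careful: you will often hit errors with them, and most of those come from encoding and character set problems. For example, I use UTF-8 and GBK, and my code contains Chinese text (for the UI), which led to all kinds of errors.
The project below, for example, fails on Chinese text. After I modified its config file, Chinese finally stopped breaking, but then the obfuscation itself turned out to be flawed. Its algorithm is probably very simple: it recognizes keywords but cannot tell variables apart from imported function names, so every library function name I used was treated as a variable and obfuscated. You can add them to the config one by one, but in that case, isn't it just a batch variable-renaming program?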
https://github.com/QQuick/Opy
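One way around listing library names by hand is to collect them from the source itself. The sketch below is not part of Opy. It is a Python 3 illustration using the standard `ast` module, and `names_to_preserve` is a hypothetical helper. It gathers every name bound by an import plus every attribute name, which a renamer could then skip.

```python
import ast


def names_to_preserve(source):
    """Collect names bound by imports and all attribute names in `source`."""
    preserved = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports bind names we cannot see statically
                # `import os.path` binds `os`; `import json as js` binds `js`
                preserved.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Attribute):
            # `js.dumps` -> `dumps`; conservative: also keeps your own attributes
            preserved.add(node.attr)
    return preserved


sample = (
    "import os.path\n"
    "import json as js\n"
    "from math import sqrt\n"
    "count = 3\n"
    "print(os.path.join('a', js.dumps(sqrt(count))))\n"
)
print(sorted(names_to_preserve(sample)))
```

This is deliberately conservative: attribute names on your own classes are preserved as well, so the result is less obfuscated but less likely to break.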
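The next project fundamentally changes the file format and uses AES encryption to protect your code, but whoever runs the code must also install its decoding tool for it to work.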
https://github.com/Falldog/pyconcrete/tree/master/example/django
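There is also the extreme, and limited, approach of modifying the Python virtual machine itself to protect the code.
For just these few lines of code, most of my time went into getting this package installed.
Encryption is actually simple: modify the Python virtual machine's code, change the opcodes in the part that compiles the pyc, or swap a few of them, and nobody else will ever be able to decode it. This method comes from the book 《我的编程感悟》 by Yun Feng (云风). The drawback is just as obvious: the code can only run on your own modified Python virtual machine.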
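The opcode-swapping idea can be imitated in pure Python as a toy, without rebuilding the interpreter. The sketch assumes CPython 3.6 or later, where `co_code` is a sequence of two-byte units with the opcode in the first byte. `remap_opcodes` and `SECRET_TABLE` are made-up names, and the XOR table is only a stand-in for whatever permutation a modified virtual machine would use.

```python
def remap_opcodes(code_bytes, table):
    """Return a copy of the bytecode with every opcode byte sent through `table`."""
    out = bytearray(code_bytes)
    for i in range(0, len(out), 2):  # even offsets hold opcodes, odd offsets hold args
        out[i] = table[out[i]]
    return bytes(out)


# A permutation of 0..255 that is its own inverse, so the same table undoes the remap
SECRET_TABLE = [op ^ 0x5A for op in range(256)]


def sample(a, b):
    return a + b


original = sample.__code__.co_code
scrambled = remap_opcodes(original, SECRET_TABLE)  # what a patched compiler would write out
restored = remap_opcodes(scrambled, SECRET_TABLE)  # what the patched VM would effectively see

print(original.hex())
print(scrambled.hex())
```

A stock interpreter or decompiler reading `scrambled` sees meaningless instructions. Only a runtime that knows the table can execute it, which is exactly the limitation the text describes.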
Summary
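All in all, obfuscation is one strategy for protecting Python code, but a better option is not to use Python at all and to do the same job in another language such as C/C++ or Java. They have better ways to protect code, and you never end up handing your source to the other party.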
Quote
https://www.cnblogs.com/welhzh/p/5975676.html
https://www.v2ex.com/amp/t/348602
https://segmentfault.com/q/1010000000666948
https://www.zhihu.com/question/30296617
https://www.zhihu.com/question/42636207
https://weibo.com/ttarticle/p/show?id=2309404163783139439573
http://bbs.fishc.com/forum.php?mod=viewthread&tid=81593&extra=page%3D1&page=1