Reading, Modifying and Exporting XML with Python

Foreword

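SES's exclude-from-build setting is frustratingly limited: it is not inherited by other configurations. With many configurations, every change has to be repeated in each one, and doing that by hand makes it easy to miss one, so I wrote a script to handle it.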

XML

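In short, there are 4 main configurations, and each has 2 configurations derived from it, for 12 configurations in total (4 + 4 × 2). The goal is to maintain only the 4 main configurations and have their exclude-from-build settings synced to the derived configurations automatically.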

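I considered a batch file or a shell script, but that gets awkward: the file is XML, and just parsing it would be complicated to write. So I used Python instead and have CI run the script.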

XML DOM basics

The DOM builds and represents an XML document as a tree, so nodes are linked to their parent node and to their child nodes.

<collection shelf="New Arrivals">
   <movie title="Enemy Behind">
      <type>War, Thriller</type>
      <format>DVD</format>
      <year>2003</year>
      <rating>PG</rating>
      <stars>10</stars>
      <description>Talk about a US-Japan war</description>
   </movie>
</collection>

Here collection is an element (Element); in the DOM an Element is one kind of node (Node).

Name/value pairs written inside an element's start tag are called attributes; shelf here is an attribute (Attribute).

movie is a child node of collection, and likewise type, format, year, rating and so on are child nodes of movie.
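A minimal sketch of walking this structure with minidom, with the sample inlined as a string so it runs standalone. It reads the root Element, an attribute, the parent link and the child elements; note that childNodes also contains the whitespace between tags, so the snippet filters on nodeType.

```python
import xml.dom.minidom

# The sample document from above, inlined so the snippet runs on its own
SAMPLE = """<collection shelf="New Arrivals">
   <movie title="Enemy Behind">
      <type>War, Thriller</type>
      <format>DVD</format>
      <year>2003</year>
      <rating>PG</rating>
      <stars>10</stars>
      <description>Talk about a US-Japan war</description>
   </movie>
</collection>"""

dom = xml.dom.minidom.parseString(SAMPLE)
collection = dom.documentElement                 # root Element

print(collection.tagName)                        # collection
print(collection.getAttribute("shelf"))          # New Arrivals

# childNodes also holds whitespace Text nodes, so keep Elements only
movie = [n for n in collection.childNodes if n.nodeType == n.ELEMENT_NODE][0]
print(movie.getAttribute("title"))               # Enemy Behind
print(movie.parentNode is collection)            # True

child_tags = [c.tagName for c in movie.childNodes if c.nodeType == c.ELEMENT_NODE]
print(child_tags)  # ['type', 'format', 'year', 'rating', 'stars', 'description']
```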

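One thing to watch out for: in the Python API, getElementsByTagName returns every matching element underneath the node, not only its direct children. Grandchildren and deeper descendants are returned as well.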
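A small sketch of the difference, using a made-up nested folder structure (the Name values are illustrative only). getElementsByTagName searches the whole subtree, while filtering childNodes yields only direct children.

```python
import xml.dom.minidom

# Illustrative structure: a <folder> nested inside another <folder>
doc = xml.dom.minidom.parseString(
    "<project>"
    "<folder Name='outer'>"
    "<configuration Name='main'/>"
    "<folder Name='inner'><configuration Name='asdf'/></folder>"
    "</folder>"
    "</project>"
)
outer = doc.getElementsByTagName("folder")[0]  # document order: outer comes first

# Searches every descendant, so the nested configuration matches too
all_names = [c.getAttribute("Name") for c in outer.getElementsByTagName("configuration")]
print(all_names)  # ['main', 'asdf']

# Direct children only: filter childNodes by node type and tag name
direct_names = [
    c.getAttribute("Name")
    for c in outer.childNodes
    if c.nodeType == c.ELEMENT_NODE and c.tagName == "configuration"
]
print(direct_names)  # ['main']
```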

Python can handle XML with its built-in standard library, so no extra packages are needed. Here I use xml.dom.minidom.
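The basic read, modify and write cycle with minidom might look like the sketch below. The file names and the checked attribute are hypothetical placeholders; parse, setAttribute and writexml are standard minidom calls.

```python
import os
import tempfile
import xml.dom.minidom


def round_trip(src_path, dst_path):
    """Parse src_path, change one attribute, and write the result to dst_path."""
    dom = xml.dom.minidom.parse(src_path)
    # hypothetical edit: add an attribute to the root element
    dom.documentElement.setAttribute("checked", "Yes")
    with open(dst_path, "w", encoding="utf-8") as f:
        # encoding= only fills in the <?xml ...?> declaration; the file object encodes
        dom.writexml(f, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.xml")
    dst = os.path.join(tmp, "demo1.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write('<collection shelf="New Arrivals"/>')
    round_trip(src, dst)
    with open(dst, encoding="utf-8") as f:
        output = f.read()

print(output)
```

Writing through a text-mode file opened with an explicit encoding keeps the bytes on disk consistent with the declaration minidom writes.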

Extracting the exclude-from-build settings

The code below extracts every configuration that has exclude-from-build set, together with the files and folders it applies to.

#!/usr/bin/python
# -*- coding: UTF-8 -*-

import xml.dom.minidom

# open and parse the project file
DOMTree = xml.dom.minidom.parse("demo.emProject")
collection = DOMTree.documentElement

# quick check that attributes can be read
if collection.hasAttribute("Name"):
    print("find name")

project = collection.getElementsByTagName("project")[0]
folders = project.getElementsByTagName("folder")
print(len(folders))

# the first <folder> in document order is treated as the source root
root_dir = folders[0]
print("root child:" + str(len(root_dir.childNodes)))

# parent file/folder Element -> list of its excluding <configuration> Elements
exclude_nodes = {}

# getElementsByTagName also returns nested configurations, not just direct children
confs = root_dir.getElementsByTagName("configuration")
print(len(confs))
for conf in confs:
    if conf.hasAttribute("build_exclude_from_build"):
        # save the configuration under its parent file/folder node
        exclude_nodes.setdefault(conf.parentNode, []).append(conf)

        # show which file/folder is excluded from which configuration
        if conf.hasAttribute("Name"):
            # folders carry "Name", files carry "file_name"
            name = conf.parentNode.getAttribute("Name")
            if name == "":
                name = conf.parentNode.getAttribute("file_name")
            print(name + " exclude from " + conf.getAttribute("Name"))

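To try the extraction logic without a real project, here is a self-contained sketch run against a trimmed, hypothetical .emProject snippet. The element layout (solution, project, folder, file, configuration) is an assumption modeled on the code above, not a complete SES file. Unlike the script, it searches the whole document rather than only the first folder, and uses defaultdict for the grouping.

```python
from collections import defaultdict
import xml.dom.minidom

# Hypothetical, heavily trimmed .emProject content (assumption: a real SES file
# has many more attributes but uses a similar solution/project/folder/file nesting)
PROJECT = """<solution Name="demo">
  <project Name="demo">
    <folder Name="Source">
      <file file_name="a.c">
        <configuration Name="main" build_exclude_from_build="Yes" />
      </file>
      <file file_name="b.c" />
      <folder Name="drivers">
        <configuration Name="asdf" build_exclude_from_build="Yes" />
      </folder>
    </folder>
  </project>
</solution>"""


def collect_excludes(dom):
    """Map each file/folder Element to the <configuration> Elements that exclude it."""
    excludes = defaultdict(list)
    for conf in dom.getElementsByTagName("configuration"):
        if conf.hasAttribute("build_exclude_from_build"):
            excludes[conf.parentNode].append(conf)
    return excludes


def display_name(node):
    # folders carry "Name", files carry "file_name"
    return node.getAttribute("Name") or node.getAttribute("file_name")


dom = xml.dom.minidom.parseString(PROJECT)
report = [
    display_name(owner) + " excluded from " + conf.getAttribute("Name")
    for owner, confs in collect_excludes(dom).items()
    for conf in confs
]
print(report)  # ['a.c excluded from main', 'drivers excluded from asdf']
```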
Depth-first and breadth-first traversal

You may need a depth-first or breadth-first traversal, so I wrote quick versions of both.

def dfs(node):
    # hasChildNodes is a method and must be called
    if not node.hasChildNodes():
        return
    for n in node.childNodes:
        # skip Text nodes (whitespace between tags); only Elements have attributes
        if n.nodeType == n.ELEMENT_NODE:
            if n.getAttribute("Name") != "":
                print(n.nodeName + " " + n.getAttribute("Name"))
            elif n.getAttribute("file_name") != "":
                print(n.nodeName + " " + n.getAttribute("file_name"))
        dfs(n)


def bfs(nodes):
    # print one level, then recurse on the collected next level
    if len(nodes) == 0:
        return
    todo = []
    for n in nodes:
        if n.nodeType == n.ELEMENT_NODE:
            if n.getAttribute("Name") != "":
                print(n.nodeName + " " + n.getAttribute("Name"))
            elif n.getAttribute("file_name") != "":
                print(n.nodeName + " " + n.getAttribute("file_name"))
            todo.extend(n.childNodes)
    bfs(todo)


dfs(root_dir)
bfs(root_dir.childNodes)
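The recursive versions are fine for a project tree of modest depth. As an alternative, both traversals can be written iteratively with an explicit stack or queue, which sidesteps Python's recursion limit on very deep trees. This is a sketch that, like the code above, only considers Element nodes; the folder and file names are illustrative.

```python
from collections import deque
import xml.dom.minidom


def element_children(node):
    """Direct child Elements of node, skipping whitespace Text nodes."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def iter_dfs(root):
    """Yield Element descendants of root in pre-order, using a stack."""
    stack = list(reversed(element_children(root)))
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(element_children(node)))


def iter_bfs(root):
    """Yield Element descendants of root level by level, using a queue."""
    queue = deque(element_children(root))
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(element_children(node))


def label(node):
    # folders carry "Name", files carry "file_name"
    return node.getAttribute("Name") or node.getAttribute("file_name")


dom = xml.dom.minidom.parseString(
    "<folder Name='root'><folder Name='src'><file file_name='a.c'/></folder>"
    "<file file_name='b.c'/></folder>"
)
root = dom.documentElement
dfs_order = [label(n) for n in iter_dfs(root)]
bfs_order = [label(n) for n in iter_bfs(root)]
print(dfs_order)  # ['src', 'a.c', 'b.c']
print(bfs_order)  # ['src', 'b.c', 'a.c']
```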

Handling the derived branches

Here main and asdf are the main branches (configurations); the others are derived from them.
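The core decision, which derived configurations still need the exclusion, does not depend on the DOM and can be sketched as a pure function. The DERIVED table is a hypothetical restatement of the two branch maps in the script (the asdf_CI name is an assumption based on the main_CI pattern).

```python
# Hypothetical table: main configuration -> configurations derived from it
DERIVED = {
    "main": ["main_CI", "main_boot"],
    "asdf": ["asdf_CI", "asdf_boot"],
}


def missing_derived(excluded):
    """Return derived configurations that should also be excluded but are not yet."""
    present = set(excluded)
    missing = []
    for name in excluded:
        for derived in DERIVED.get(name, []):
            if derived not in present and derived not in missing:
                missing.append(derived)
    return missing


print(missing_derived(["main", "main_CI"]))  # ['main_boot']
print(missing_derived(["asdf"]))             # ['asdf_CI', 'asdf_boot']
print(missing_derived(["main_boot"]))        # []
```

Keeping this step separate from the DOM code makes it easy to unit-test the mapping before touching the project file.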

# main configurations that have a _CI / _boot derivative
need_CI = ["main", "asdf"]
need_Boot = ["main", "asdf"]

# main configuration -> derived configuration
ci_branch_map = {
    "main": "main_CI",
    "asdf": "asdf_CI",
}

boot_branch_map = {
    "main": "main_boot",
    "asdf": "asdf_boot",
}

doc = DOMTree

# process every file/folder that is excluded somewhere
for node in exclude_nodes:
    # folders carry "Name", files carry "file_name"
    name = node.getAttribute("Name") or node.getAttribute("file_name")

    # configurations this node is already excluded from
    branches = []
    last_conf = None
    for conf in exclude_nodes[node]:
        branch = conf.getAttribute("Name")
        # print(name + " exclude from " + branch)
        branches.append(branch)
        last_conf = conf

    # derived configurations that still lack the exclusion
    to_add = []
    for branch in branches:
        if branch in need_CI:
            ci_branch = ci_branch_map[branch]
            if ci_branch not in branches:
                to_add.append(ci_branch)
        if branch in need_Boot:
            boot_branch = boot_branch_map[branch]
            if boot_branch not in branches:
                to_add.append(boot_branch)

    # whitespace Text node (newline + indentation) in front of last_conf
    tab = last_conf.previousSibling
    for branch in to_add:
        n = doc.createElement("configuration")
        n.setAttribute("Name", branch)
        n.setAttribute("build_exclude_from_build", "Yes")
        node.insertBefore(n, last_conf)

        # copy the newline + indentation so last_conf starts on its own line again
        if tab is not None and tab.nodeType == tab.TEXT_NODE:
            node.insertBefore(tab.cloneNode(False), last_conf)

with open("demo1.emProject", "w", encoding="utf-8") as f:
    doc.writexml(f, encoding="utf-8")

The node insertion at the end of the script may look a little odd.

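XML is a text markup format, and the newlines, tabs and spaces between tags are part of the document too: in the DOM, each run of such whitespace is a node of its own (a Text node). I didn't understand this at first and spent quite a while wondering why the new nodes in the output were all run together, never breaking onto a new line and with no indentation in front of them.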

Attributes have a similar notion, which determines whether they are written with or without line breaks.

Once the Text nodes were inserted as well, the output came out correctly formatted.
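A minimal, self-contained demonstration of the whitespace behavior: the Text nodes show up in childNodes, and cloning the newline-plus-indent node in front of the existing element gives the inserted element its own line. The element names are illustrative.

```python
import xml.dom.minidom

dom = xml.dom.minidom.parseString("<file>\n    <configuration Name='main'/>\n</file>")
file_el = dom.documentElement

# The whitespace before and after <configuration> is stored as Text nodes
kinds = [n.nodeType == n.TEXT_NODE for n in file_el.childNodes]
print(kinds)  # [True, False, True]

last_conf = file_el.getElementsByTagName("configuration")[0]
indent = last_conf.previousSibling  # Text node holding "\n    "

new_conf = dom.createElement("configuration")
new_conf.setAttribute("Name", "main_CI")
file_el.insertBefore(new_conf, last_conf)
# Insert a copy of the newline + indentation so last_conf keeps its own line
file_el.insertBefore(indent.cloneNode(False), last_conf)

print(file_el.toxml())
# <file>
#     <configuration Name="main_CI"/>
#     <configuration Name="main"/>
# </file>
```

Without the cloned Text node, the two configuration elements would be written back to back on one line.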

Summary

Python has two other ways of working with XML that may be simpler than this one.

Quote

https://docs.python.org/zh-cn/3/library/xml.dom.html#document-objects

https://docs.python.org/zh-cn/3/library/xml.dom.minidom.html
