If you are using Python packages like xmltodict or yaml to write and read your own XML and YAML files, you probably don’t need to know this. But if you are reading someone else’s files, here is something to be aware of.
This week I had to process an XML file in Python. No problem, I thought, I’ll use a Python package like xmltodict to translate the XML into a dictionary variable. Then I could edit it and print out a new file with the changes. Sounds easy!
Well, first off, it wasn’t too easy: the nesting was horrendous. However, with some help from VS Code, I was able to power through and get the value I wanted.
Here’s where I got burned. I wanted to change the text in the XML file, so I had a statement like this to read it:
mytext = python_dict["graphml"]["graph"]["node"][nodecount]["graph"]["node"][i]["data"]["y:ShapeNode"]["y:NodeLabel"]["#text"]
and then a simple statement like this to change it to lowercase:
python_dict["graphml"]["graph"]["node"][nodecount]["graph"]["node"][i]["data"]["y:ShapeNode"]["y:NodeLabel"]["#text"] = mytext.lower()
Very basic.
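Typing that chain of keys twice is an easy way to introduce a typo, so here is a minimal sketch of one way to keep the path in a single place. The helper names get_path and set_path, the label_path list, and the cut-down sample dictionary are all made up for illustration; the real file was nested more deeply than this.

```python
from functools import reduce


def get_path(tree, path):
    """Walk nested dicts and lists by following each key or index in order."""
    return reduce(lambda node, key: node[key], path, tree)


def set_path(tree, path, value):
    """Replace the value at the end of path, leaving the rest of the tree alone."""
    parent = get_path(tree, path[:-1])
    parent[path[-1]] = value


# Hypothetical, much smaller stand-in for the dictionary parsed from the file.
python_dict = {
    "graphml": {"graph": {"node": [
        {"data": {"y:ShapeNode": {"y:NodeLabel": {"#text": "Hello World"}}}}
    ]}}
}

# The path is written once and reused for both the read and the write.
label_path = ["graphml", "graph", "node", 0,
              "data", "y:ShapeNode", "y:NodeLabel", "#text"]

mytext = get_path(python_dict, label_path)
set_path(python_dict, label_path, mytext.lower())
print(get_path(python_dict, label_path))
```

This only tidies up the read and the write. It does nothing about how the dictionary gets turned back into a file afterwards.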
Now this particular file is an XML file with a .graphml extension, which allows an editor like yEd to read it. yEd can read the original file, but it turns out xmltodict writes the file in such a way that yEd can no longer see the text. I don’t know why.
I spent hours working on it until I finally gave up. I wrote a much dumber program that read through the graphml file a line at a time and changed it the way I wanted. No fancy packages involved. Dumb, but it worked.
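For the curious, a line-at-a-time approach can look something like the sketch below. This is not my original program, just an illustration of the idea. It assumes each label's opening tag, text, and closing tag sit on a single line, and that the file is UTF-8. The names LABEL, lowercase_labels, and rewrite_file are invented for the example.

```python
import re

# Assumption: opening tag, label text, and closing tag are all on one line.
LABEL = re.compile(r"(<y:NodeLabel\b[^>]*>)([^<]*)(</y:NodeLabel>)")


def lowercase_labels(lines):
    """Yield every line untouched except for the text inside y:NodeLabel."""
    for line in lines:
        yield LABEL.sub(
            lambda m: m.group(1) + m.group(2).lower() + m.group(3), line
        )


def rewrite_file(src_path, dst_path):
    """Copy src_path to dst_path, lowercasing label text along the way."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(lowercase_labels(src))


# Small in-memory sample so the sketch runs without a real graphml file.
sample = [
    '<node id="n0">\n',
    '  <y:NodeLabel fontSize="12">Hello World</y:NodeLabel>\n',
    '</node>\n',
]
result = list(lowercase_labels(sample))
print("".join(result))
```

Because every other character is copied through as-is, whatever the editor relies on in the original file is still there in the output. The price is that a label split across several lines would be missed.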
This is the second time this year a package has given me problems. In late January I wrote some code to parse YAML files for a client, to extract information for them and to produce a report. Again, there is a package to do that: yaml. Which is… good… except when the YAML it is processing is poorly written. Which this YAML was.
Again, I spent hours linting the YAML, and in some cases I had to forgo certain files because they were poorly constructed. What should have been easy (read the YAML file, transform it, write a new YAML file) was instead very difficult.
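One thing that helps with a pile of files of unknown quality is to parse each one separately and keep a list of the ones that fail, rather than letting the first bad file stop the run. Here is a minimal sketch of that pattern. To keep it runnable with nothing installed, it uses the standard library's json module as a stand-in; with the yaml package you would presumably pass yaml.safe_load and yaml.YAMLError instead. The function name load_all and the sample documents are made up.

```python
import json


def load_all(documents, parse, errors):
    """Parse each named document; collect failures instead of stopping."""
    loaded, skipped = {}, {}
    for name, text in documents.items():
        try:
            loaded[name] = parse(text)
        except errors as exc:
            # Keep the parser's message so the file can be fixed or set aside.
            skipped[name] = str(exc)
    return loaded, skipped


# Hypothetical inputs: one well-formed document and one with a missing value.
documents = {
    "good.json": '{"service": "api", "replicas": 2}',
    "bad.json": '{"service": "api", "replicas": }',
}

loaded, skipped = load_all(documents, json.loads, json.JSONDecodeError)
print("parsed:", list(loaded))
print("skipped:", list(skipped))
```

It doesn't make a bad file any better, but at least you find out which ones you will have to lint or drop before you start on the real work.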
And that’s often the problem with YAML, XML, and JSON files: they are frequently handcrafted and inconsistent. They MAY be good enough for whatever tool is ingesting them, but not good enough for the packages you want to use to process them.
I think those packages are great if you are making the input files. But if you are processing someone else’s, caveat emptor (caveat programmer?).











