Identifying text nodes with MSXML

Our new-look site at work, BusinessMentor.com.au, has finally been launched. Although there are a few small "tie-off" tasks still to complete (including getting the menu to actually look right in IE for Mac - whole new story there though), the bulk of the site is working great. After almost 2000 tests of the system, we're confident that things are running as smoothly as you can expect for a new system.

Earlier in the week, however, I was not quite as happy as I am now.

Part of the back end of our system relies on a Visual Basic application to edit the content of some XML files. Although not a showstopper, it's a fairly important part of the system. Of course, with all of the more pressing things to be sorted, this was pushed into the back recesses of my thoughts. After all, it's VB, and I wanted to put it off for as long as possible (-1 Troll).

Besides, it's just XML - XSLT and XPath should make it trivial to work with.

So I began work on the editor. Its scope was fairly limited - it only had to edit the textual content of existing elements. Not wanting to build a full visualisation control myself, and wanting our custom schema to appear as "realistic" as possible, I decided to use XSLT to get the IE ActiveX control to automagically transform our XML file into some pretty XHTML. To make it easy to edit the different parts of the document, I wrapped each text() node in an A(nchor) element with a specially crafted location href. I could then pick up this href and parse it in VB when the user clicked the region.

Initially, I was constructing a location that was a simplified XPath to the node. Something like:

/document/flow[position()=2]/flow-content/p[position()=3]/text()[position()=3]

So, pretty chuffed at the half an hour or so I'd spent building the XSLT sheet to specify the location, I sat down to get VB to edit the node.

That's when I hit the first problem. It seems the particular MSXML version (I thought it was 3, although sources suggest it may not have been) didn't support either position() or count() (my second try, count()ing the preceding-sibling nodes in a predicate).

Striking those two off my Possible Solutions list, I decided that the easiest route would be to use VB to walk over the DOM.

It ended up with the XSLT sheet (which does support the functions) pumping out a link location that specifies a pathway of child numbers down to the node. Something like:

<foo>
	<bar/>
	<bletch><b>finding this</b></bletch>
</foo>

Location is: 1.0.0

Finding the node in question took a trivial recursive function in VB, and generating the "path" in XSLT (using preceding-sibling) was just as trivial. Once I could uniquely identify text() nodes, the rest of the editor was simple. After all, it's just XML.
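For the curious, here is a rough sketch of the child-number idea. It is in Python with the standard library's minidom rather than VB and MSXML, so treat it as an illustration of the approach and not the actual editor code. The function names resolve and locate are made up for this example, and the sample document is written without whitespace between elements so that the child numbers line up with the 1.0.0 example above.

```python
from xml.dom import minidom


def resolve(node, steps):
    """Recursively follow a list of zero-based child numbers down from node."""
    if not steps:
        return node
    return resolve(node.childNodes[steps[0]], steps[1:])


def locate(root, node):
    """Build the dotted path from root to node by counting preceding siblings."""
    steps = []
    while node is not root:
        index = 0
        sibling = node.previousSibling
        while sibling is not None:
            index += 1
            sibling = sibling.previousSibling
        steps.append(str(index))
        node = node.parentNode
    return ".".join(reversed(steps))


# No whitespace between elements, so no stray whitespace text nodes
# shift the child numbers.
doc = minidom.parseString("<foo><bar/><bletch><b>finding this</b></bletch></foo>")
root = doc.documentElement

# Parse the "1.0.0" style location, as the VB side would do with the href.
target = resolve(root, [int(step) for step in "1.0.0".split(".")])
path = locate(root, target)
```

Here resolve plays the part of the VB function and locate does what the XSLT sheet did with preceding-sibling. One thing to watch: the numbers only agree if both sides see the same child nodes, so whitespace-only text nodes have to be handled the same way by the stylesheet and the DOM walker.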

Looking back now, an easier solution may have been to check if I had the latest version of MSXML (mind you, I haven't confirmed the functions are actually available in the latest MSXML4...). Only 3 hours of sleep does tend to warp your perspective of reality though. Besides, the functions were available to XSLT in IE, which still confuses me greatly (as one would expect they both use MSXML).

Regardless, our new editor is a great success. Not only does it provide a rich visualisation of the XML (and looks very purty, even if I do say so myself), but it is also powerful enough to edit any of the content in the documents.

I've said it before, and I'll say it again - this XML stuff will never catch on.