Recommend an XML Reader

Sir Carcass · June 2013

So at work a new system is about to be implemented where large data files we routinely have to do things with is going from pipe-delimited files to XML. We use Excel a lot for stuff like sorting and filtering the data for our own needs, basically read-only stuff. So we'll need a way to continue to do this with the XML files. Are there any good readers you can recommend? Free is great, but pay is okay if it's pretty awesome and not ridiculously priced. If it is pay, it will need some sort of trial version so we can make sure it meets our needs.

Certain things that we're looking for:

Being able to filter out records based on certain criteria
Being able to easily see a listing of the records (some sort of indexing, I guess)
Being able to rearrange the way records are displayed (so like sorting by case number, for example)

Thanks for any recommendations you can provide!

EggyToast · June 2013

Are you using an Excel-compatible DTD for the XML files? If so, why not just continue to use Excel for the front-end work?

XML is really just a text file, so there really isn't a "reader" per se. It typically goes through some level of transformation (XSLT) in order to be usable by another program/tool.

Sir Carcass · June 2013

EggyToast wrote: »

Are you using an Excel-compatible DTD for the XML files? If so, why not just continue to use Excel for the front-end work?

XML is really just a text file, so there really isn't a "reader" per se. It typically goes through some level of transformation (XSLT) in order to be usable by another program/tool.

I don't know what a DTD is, but the XML files aren't actually being used yet. The new system is currently being developed.

But basically today we have a bunch (like probably 30 or 40 a day) of separate files for various things that are going to be combined into one giant file with the order being the time the individual record was created. So if I need to go in for example and find all of the records with A Type this and B Type that, it's kind of hard to do it with a raw XML file with 250,000+ records in what is basically a random order. So we need a program that can kind of do that for us. If Excel can, then we can continue with that. I'll look into that avenue.

Another thing is that these files contain HIPAA protected information, so the program needs to be absolutely trustworthy (why I'm hesitant to just start googling "xml viewer").

Pantshandshake · June 2013

I don't know if it will do the things you want, specifically, but Excel will absolutely interact appropriately with XML data...

Sir Carcass · June 2013

Pantshandshake wrote: »

I don't know if it will do the things you want, specifically, but Excel will absolutely interact appropriately with XML data...

Yeah, I just tried our (newly upgraded) Excel (2010) with one of the test data files and it looks like it'll do exactly what we need.

Thanks everyone.

Sir Carcass · June 2013

I spoke too soon

After validating the data, this is what Excel is doing by default: We have several fields per record that can repeat as many times as necessary, but the fields names don't change (example, memberName will appear as many times as there are people). What Excel is doing is creating separate rows for each time those fields show up, repeating the unique data. So if we have 1 record with 3 people, when doing the Pivot Table it looks like 3 separate records. I'm guessing I'll need to write a schema definition or something for this? Is this even possible? Does anyway have any good resources for doing that?

EggyToast · June 2013

It really depends on the XML. The DTD defines what all the tags actually mean so that they can be used by another program. Here's an example:

My DTD says "italicize anything within an <italic> tag," while yours says "italicize anything within an <ital> tag." My XML files would <italic>look</italic> like this, while yours <ital>look</ital> like this.

If I want to create a well-structured XML file (and I do, because it's XML), I can end up with something like this

<item-group>
<item>
Item 1
</item>
<item>
Item 2
</item>
</item-group>

Because of the structure, it can appear that I'm repeating data or that it's appearing multiple times due to how the DTD dictates how everything needs to be. Even with a strict DTD, there can be variations:

<name-group>
<name>
<first-name>Eggy</first-name><last-name>Toast</last-name>
</name>
<name>
<string-name>Sir Carcass</string-name>
</name>
</name-group>

So there's two hoops -- the first is what the DTD says things should be tagged as, and the second is how things are actually tagged in the file. If the tags are used haphazardly, the value of the XML file is diminished.

Anyway, before you can figure out any sort of reader or fix, you should really figure out what DTD is being used. A normal XML file will include a callout in the top of the document stating the DTD so that it can be rendered correctly, and you can usually look them up (especially public ones, which there are quite a few). Once you know the DTD, you can download it and load it into a program like Netbeans, which will let you validate and review the XML file easily. The errors you're bumping in to may be inherent in the XML file, and you can either have your XML vendor correct them or figure out what you'll need to do in Excel.

It can be fixed in the original file relatively easily, since XML is just a text file and you can do all the normal text transformations available in, well, any scripting language. Once it's in Excel, you're stuck with what Excel can do (and while it can do a lot, I find it more confusing for this kind of stuff compared to the XML directly).

How are you getting these XML files, and who's behind the change? There should be someone at your organization that knows what's up with these files.

Sir Carcass · June 2013

EggyToast wrote: »

How are you getting these XML files, and who's behind the change? There should be someone at your organization that knows what's up with these files.

Here's a little rundown of the situation. Let's see how much I can say without getting fired.

So all day we have a bunch of different departments (CCRs, data entry, etc) doing tasks. A lot of those tasks result in certain letters being needed to be sent to clients. Our current system will create a letter request. At night, all of those requests are processed and put into data files, a separate one for each letter, and those are sent to our print vendor who prints and mails them. My department is responsible for the facilitation of those files, among other things. We validate the files and create monthly reports of various data that we send to who we contract for. The system we currently use is very old and archaic at this point, so our project is creating a new one that mimics the way other projects of our company's work. Part of that is changing the data files from pipe-delimited csv style files to XML.

The new system is what will create these files. Our programmers and Systems people are currently writing that system. They basically have the format they're going to be using set, so us getting them to alter it isn't really going to happen at this point (the rollout is in a couple of months). My department is pretty low on the food chain as far as that goes (even though we manage the print vendor). There's a daily files summary, basically, that we currently get that might not be created in the future, so I'm personally looking at other ways for my department to still do what we need to do and get that info directly from the data file. Opening it in Excel and using a pivot table would pretty much accomplish that, save for the fact that it's basically duplicating these records. We'll have to use something because our average daily data file will probably have somewhere around 30,000 letters.

An example of the structure is something like:

<letter>

<letterHeader>
<letterDate></letterDate>
<letterType></letterType>
<recordID></recordID>
...
</letterHeader>

<memberCount>3</memberCount>

<letterMember>
<memberName></memberName>
<memberDOB></memberDOB>
<memberID></memberID>
</letterMember>

<letterMember>
<memberName></memberName>
<memberDOB></memberDOB>
<memberID></memberID>
</letterMember>

<letterMember>
<memberName></memberName>
<memberDOB></memberDOB>
<memberID></memberID>
</letterMember>

</letter>

So in this example, there are 3 members on the letter. Some letters might have 1. Some might have 0. Some might have 10. When I open the file with Excel, it creates 3 separate rows for that record. Everything up to the member info is repeated, and then they are unique. Basically I need to make it so Excel will instead create new columns in that 1 row as needed to put in that member info. One suggestion I have seen while googling is to write something in VB to do it, which I can do with some research, but if there's an easier way I'd prefer that.

There is what I'm assuming is a DTD url at the top of the file, but it's apparently not public as even trying from our network, I was unable to access it. I might be able to get a copy of it, but I also might not be able to. I do have basically the structure the XML file will be in.

EggyToast · June 2013

The people above you should be doing something with this XML file. Have they made mention of any XSLTs they're implementing to transform this into a more usable form for other purposes?

To put it in layman's terms, an XSLT is a bit of programming/scripting that will turn the XML file into, for example, a pipe-delimited file that looks exactly like what you had before.

Also, the DTD url is not an HTTP URL, just a declaration statement. For example, for journal publishing you typically see something like this at the top:

"-//NLM//DTD Archiving and Interchange DTD Suite Notation Declarations v1.0 20001011//EN
Delivered as file "notat.ent"

There are sometimes links to w3c but they're for reference rather than an easy link to follow in a browser (unfortunately).

One of the issues involved with adopting XML in general is making sure you can use it for what you need. I don't think you need to change the actual XML, and since you say it's being produced internally that's actually good! It means that the validity and consistency of the files will be high, and means that once you produce a solution that works for you, you'll be able to rely on it. However, problems like what you're encountering are pretty common.

Have you spoken to the IT team about an XSLT or other transformation script to help convert the XML they're using to a more usable format for the job you're tasked with? Since it's all internal, even though you're low on the totem pole, it may simply be that they're unaware of what you're using the files for.

Sir Carcass · June 2013

EggyToast wrote: »

Have you spoken to the IT team about an XSLT or other transformation script to help convert the XML they're using to a more usable format for the job you're tasked with? Since it's all internal, even though you're low on the totem pole, it may simply be that they're unaware of what you're using the files for.

No, this is the first I've heard of XSLT. The XML file being generated is the end of the process for our system as far as I'm aware, so I'm not sure if they're as worried about reading the file. The XML file is basically solely for the print vendor. I can check with them, though. I'll also check with our print vendor to see how they plan to read it. I know it's not with Excel (they use some sort of super expensive program for creating workflows that create the different letters), but they may have something similar I can use.

Penny Arcade

Quick Links

Recommend an XML Reader

Posts