How I learned to stop worrying and love XML

I want my research web site to have all of the following:

Individual publication abstract pages, like this

Resumé for more traditional job applications

All of these outputs draw on what you can think of as a single relational database. I wanted to keep these documents up to date without making more changes than necessary. Here are a few potential problems that I wanted to avoid, and that you probably want to avoid if you're in the same situation:

Producing the same document in different formats. I'm referring to creating several versions of what is conceptually the same document, with the same contents and layout. This one is relatively easy to solve, since a number of source formats have translators to every output format I wanted.

Presenting your publication information at different levels of detail. From the same publication database, I want to build all of the above documents. The level of detail ranges from a brief citation for each publication on my CV to an abstract page dedicated to each.

Handling situations like a coauthor's home page moving to a new URL. I'm a hyperlink addict, so I link everything I can from my CV and other pages, including each coauthor's name to his or her web site. What happens when one of those URLs changes? I want to change it in one place and have the change reflected everywhere.
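To make the last two points concrete, here is a minimal sketch of what a single-source data file could look like. Every element and attribute name in it (cv, person, pub, author with a ref attribute) and every URL is a hypothetical choice made for illustration, not necessarily what the real files use. The idea is only that each coauthor is recorded once and each publication refers to coauthors by ID.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical data file: every element and attribute name here is an
     illustrative assumption, not the schema of the real files. -->
<cv>
  <people>
    <!-- Each coauthor is recorded once; a URL change is a one-line edit. -->
    <person id="jdoe">
      <name>Jane Doe</name>
      <url>http://example.org/~jdoe/</url>
    </person>
    <person id="rroe">
      <name>Richard Roe</name>
      <url>http://example.org/~rroe/</url>
    </person>
  </people>
  <publications>
    <pub id="example2005">
      <title>An Example Paper</title>
      <venue>Example Workshop</venue>
      <year>2005</year>
      <!-- Authors are references to person records, never copies. -->
      <author ref="jdoe"/>
      <author ref="rroe"/>
      <!-- Needed only by the abstract page; a brief CV citation can skip it. -->
      <abstract>Placeholder abstract text.</abstract>
    </pub>
  </publications>
</cv>
```

With a layout along these lines, a brief CV citation and a full abstract page can both be generated from the same pub record, and a coauthor's new URL is a change to one url element.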

I'd been hearing about (and trash-talking!) XML for a while, so I thought this would be a good opportunity to get to know the enemy better. With some help from W3Schools, I learned about XSLT, an XML-based declarative language for transforming XML documents.

Using XSLT, I created a small system to do what I've described. It is written entirely in what I'd call domain-specific declarative languages, namely XSLT and a tiny Makefile. It's not much to look at, and I haven't put much effort into refactoring it to be general enough for other people, but I figure someone else might like to use it as a base. Here's the quick tour.

How It Works

All of the actual "data" lives in a single XML file. A number of XSLT stylesheets control building the different documents:

HTML CV

LaTeX CV

Publication abstract pages
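As a rough illustration of what one of these stylesheets might look like, here is a minimal XSLT 1.0 sketch that renders a brief, hyperlinked publication list for an HTML CV. It is not one of the actual stylesheets. It assumes a hypothetical input with cv/people/person records (an id attribute plus name and url children) and cv/publications/pub records (title, venue and year children, plus author elements whose ref attribute names a person). All of those names are invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of an HTML CV stylesheet. The input element names it matches
     (cv, people, person, publications, pub, author) are assumptions. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Index person records by id so author references can be resolved. -->
  <xsl:key name="person-by-id" match="person" use="@id"/>

  <xsl:template match="/cv">
    <html>
      <head><title>Curriculum Vitae</title></head>
      <body>
        <h1>Publications</h1>
        <ul>
          <xsl:apply-templates select="publications/pub"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- Brief citation: linked authors, title, venue, year. No abstract. -->
  <xsl:template match="pub">
    <li>
      <xsl:apply-templates select="author"/>
      <xsl:text>. </xsl:text>
      <em><xsl:value-of select="title"/></em>
      <xsl:text>. </xsl:text>
      <xsl:value-of select="venue"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="year"/>
      <xsl:text>.</xsl:text>
    </li>
  </xsl:template>

  <!-- Resolve an author reference to a hyperlinked name. -->
  <xsl:template match="author">
    <xsl:variable name="p" select="key('person-by-id', @ref)"/>
    <a href="{$p/url}"><xsl:value-of select="$p/name"/></a>
    <xsl:if test="position() != last()">
      <xsl:text>, </xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Any XSLT 1.0 processor should be able to apply a stylesheet like this; with xsltproc, for example, the invocation would be along the lines of `xsltproc cv-html.xsl cv.xml > cv.html`, where both file names are made up here. A stylesheet for the abstract pages could match the same pub elements but also emit the abstract, which is how one data file can serve several levels of detail.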

There is a LaTeX style file that the LaTeX stylesheets use. This is based on Andrew McNabb's guide.

Finally, a small Makefile controls the build process. You can find the complete set of files here, or download the whole set as a tarball.

My Prognosis

I still don't think XML is a particularly good common data format; I think values in an ML-style (as in Standard ML and OCaml) type system are superior. If someone in the know disagrees, I'd love to hear how I'm deceived.

However, there sure are a lot of useful tools available today for XML. I think it would be worthwhile to try to duplicate them in some sort of type-theoretic setting. (Maybe someone already has, but at least I now know more about the XML hubbub.)
