Saturday, July 19, 2014

Converting XML to CSV using XSLT 1.0

This post shows you how can convert a simple XML file to CSV using XSLT.

Consider the following sample XML:

<library>
  <book>
    <author>Dan Simmons</author>
    <title>Hyperion</title>
    <publishDate>1989</publishDate>
  </book>
  <book>
    <author>Douglas Adams</author>
    <title>The Hitchhiker's Guide to the Galaxy</title>
    <publishDate>1979</publishDate>
  </book>
</library>

This is the desired CSV output:

author,title,publishDate
Dan Simmons,Hyperion,1989
Douglas Adams,The Hitchhiker's Guide to the Galaxy,1979

The following XSL Style Sheet (compatible with XSLT 1.0) can be used to transform the XML into CSV. It is quite generic and can easily be configured to handle different xml elements by changing the list of fields defined ar the beginning.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:variable name="delimiter" select="','" />

  <!-- define an array containing the fields we are interested in -->
  <xsl:variable name="fieldArray">
    <field>author</field>
    <field>title</field>
    <field>publishDate</field>
  </xsl:variable>
  <xsl:param name="fields" select="document('')/*/xsl:variable[@name='fieldArray']/*" />

  <xsl:template match="/">

    <!-- output the header row -->
    <xsl:for-each select="$fields">
      <xsl:if test="position() != 1">
        <xsl:value-of select="$delimiter"/>
      </xsl:if>
      <xsl:value-of select="." />
    </xsl:for-each>

    <!-- output newline -->
    <xsl:text>&#xa;</xsl:text>

    <xsl:apply-templates select="library/book"/>
  </xsl:template>

  <xsl:template match="book">
    <xsl:variable name="currNode" select="." />

    <!-- output the data row -->
    <!-- loop over the field names and find the value of each one in the xml -->
    <xsl:for-each select="$fields">
      <xsl:if test="position() != 1">
        <xsl:value-of select="$delimiter"/>
      </xsl:if>
      <xsl:value-of select="$currNode/*[name() = current()]" />
    </xsl:for-each>

    <!-- output newline -->
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>
</xsl:stylesheet>

Let's try it out:

$ xsltproc xml2csv.xsl books.xml
author,title,publishDate
Dan Simmons,Hyperion,1989
Douglas Adams,The Hitchhiker's Guide to the Galaxy,1979