Just Write Click

Technical writing with Continuous Integration and docs-as-code

  • JustWriteClick
  • Contact
  • Books by Anne Gentle
  • Introducing Docs Like Code
You are here: Home / tools / Author-it and converting UTF-16 to UTF-8

December 4, 2007 by annegentle

Author-it and converting UTF-16 to UTF-8

In trying to modify multiple Author-it topics (okay, 5,119 topics) with variable assignments, I have had to work with the XML output that Author-it exports.

To export to XML, you select a topic or multi-select topics, then right-click on the selection and choose XML > Save to file.

Turns out, Author-it outputs its XML encoded with UTF-16, but apparently most Windows applications understand UTF-8. When I tried to open my freshly export XML file, XML Copy Editor gave me an error.

So I had to discover how to convert the XML from UTF-16 encoding to UTF-8 (and you can’t just open it in Notepad on Windows and change the 16 to 8, there are other embedded characters indicating the encoding, and, well, it’s encoded.)

First, I used the identity transform documented many places, my favorite place being the XSLT Cookbook, to convert the Author-it output to UTF-8. Here’s the XSLT code for that:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:output encoding="utf-8"/> <xsl:template match="@*|node()">

	<xsl:copy>	 	<xsl:apply-templates select="@*|node()"/>

	</xsl:copy> </xsl:template>

</xsl:stylesheet>

I just ran the above transform against the AuthorIT Objects.xml file I exported, using the Instant Saxon XSLT processor.

Then, I wanted to remove all <VariableAssigments> elements, effectively removing an entire node. Again, the identity transform (or copy transform) was effective. And, I learned that I had to identify the AuthorIT namespace thanks to this excellent helper article, Handling Default Namespaces on topxml.com.

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"

xmlns:ait ="http://www.authorit.com/xml/authorit">

<!--Match everything using identity copy-->

<xsl:template match="@*|node()">

	<xsl:copy>

	<xsl:apply-templates select="@*|node()"/>

	</xsl:copy>

</xsl:template>

<!--Remove all the VariableAssignment nodes-->

<xsl:template match="ait:VariableAssignment">

	<xsl:comment>Removed VariableAssignment element</xsl:comment>

</xsl:template>

</xsl:stylesheet>

This transform is pretty dangerous, though, because it’s taking away all the VariableAssignment elements, and you might have valuable metadata stored that you would quickly blow away. So use with care.

This workaround is likely also useful for DITA and XHTML files, because Author-it outputs its DITA and XHTML files as UTF-16. I haven’t investigated its usefulness for those areas, but a quick search on the Author-it-users group on Yahoo revealed that sometimes people want UTF-8 rather than UTF-16. So, I hope this helps.

Related

Filed Under: tools Tagged With: author-it, AuthorIT, DITA, identity template, software, techpubs, tools, UTF, UTF-16, UTF-8, writing, XML, XSLT

More reading

Bubble graph showing sources of developer support data

I’ve been thinking a lot about developer support at Cisco recently, especially for the way the world works today with multiple cloud providers. This post is a re-publish of my talk from over five years ago, but the techniques and tools for listening and helping others are still true today. At Rackspace, we watched several […]

Cisco DevNet is our developer program for outreach, education, and tools for developers at Cisco. From the beginning, the team has had a vision for how to run a developer program. Customers are first, and the team implements what Cisco customers need for automation, configuration, and deployment of our various offerings. Plus, the DevNet team […]

I had a great talk with Ellis Pratt of Cherryleaf Technical Writing consulting last week. Here are the show notes, full of links to all the topics we covered. Podcasts are great fun to listen to and participate in, if a bit nerve-wracking to think on your feet and make sure you answer questions succinctly […]

At the beginning of this year, I worked hard to summarize my thoughts on API documentation, continuous publishing, and technical accuracy for developer documentation. The result is an article on InfoQ.com, edited by Deepak Nadig, who also was forward-thinking in having me speak to a few teams at Intuit about API documentation coupled with code. Always […]

Recently on Just Write Click

  • A Flight of Static Site Generators: Sampling the Best for Documentation
  • Try a GPT about “Docs Like Code” to ask questions
  • Discipline and Diplomacy: Docs in the Open
  • Let’s Find Out: When Do Static Site Generators Do Rendering?
  • GitHub for Managing Tech Docs

Just Write Click in your Inbox

Enter your email address to subscribe to Just Write Click and receive notifications of new posts by email.

Read More

  • Privacy Policy
  • About Anne Gentle, developer experience expert
  • Books by Anne Gentle
    • Conversation and Community
    • Docs Like Code, a Book for Developers and Tech Writers
  • Woman in Tech Speaker Profile
  • Contact

Books

  • JustWriteClick
  • Contact
  • Books by Anne Gentle
  • Introducing Docs Like Code

Copyright © 2025 · WordPress · Log in