word theory

19.03.2009

Word to XML Conversions

I find it disturbing that converting content from Microsoft Word to XML is so difficult.

Currently my organization relies on a custom macro built in Teambase, a legacy file-sharing platform better suited to print publications than to web production, to do the majority of our content transformations. It can handle only the more basic Word features. How basic? Less than a year ago we were excited when an update allowed the system to capture and convert hyperlinks inserted in Word.

(Teambase is a file-sharing program, really an editing platform, that allows users to manage Microsoft Word and QuarkXPress files. An in-house team of developers modified it to do XML conversion for web production, with mixed results.)

As the amount of our organization’s web-only content grew, Google Docs caught our attention as a viable file-sharing platform. Imagine: a web-based file-sharing program paired with our proprietary CMS. It became entirely possible for a story to be conceived, authored, edited, and posted with only two browser tabs open. Editors and producers were now as mobile as their reporters, who could file stories from anywhere. By embracing Google Docs and our CMS, we’d subscribed to cloud computing.

I only wish Google Docs had an enterprise platform or an API.

Google Docs gets around our sloppy Word-to-XML conversion issues easily, simply by virtue of not being Microsoft Word. Docs allows rich-text WYSIWYG-to-WYSIWYG copying with only minor PRISM XML issues, and even has a feature that lets users export HTML.

What does this have to do with Word? Well, unless more savvy developers can tackle the jungle of code that is Microsoft Word and its suite of features, its future may be limited to print production, at least for our organization. Google Docs, despite some missing features I hope to highlight later, is just too attractive to pass up.