I spent the day getting some metadata into a .docx file by poking it into the underlying WordprocessingML files, principally document.xml. It's been pretty miserable, so I'm posting a few things in case you're having the same kind of day.
First, be aware that Word 2007 will throw non-descriptive errors if there are any extra tags in the document (it will tolerate extra attributes, but promptly throws them away upon save), and very specific .zip options have to be used to re-create the .docx file. I'm using a nice free tool called Package Explorer to view and edit the .docx files in development, and in production I'm inserting them into my company's product, MarkLogic Server (which unpacks and manages the .docx archive as well).
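If you'd rather script the unzip/edit/re-zip round trip than do it by hand, here is a minimal Python sketch. I haven't pinned down exactly which zip options Word insists on, so this simply assumes that plain deflate compression and keeping the original entry order are enough; the function name replace_part is made up for illustration.

```python
import io
import zipfile


def replace_part(docx_bytes: bytes, part_name: str, new_xml: bytes) -> bytes:
    """Return a copy of the .docx package with one part swapped out."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
         zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as dst:
        if part_name not in src.namelist():
            raise KeyError(part_name)
        # Assumption: preserving the original entry order keeps Word happy.
        for info in src.infolist():
            if info.filename == part_name:
                data = new_xml
            else:
                data = src.read(info.filename)
            # Writing by name gives each entry a fresh, deflated header.
            dst.writestr(info.filename, data)
    return out.getvalue()
```

Typical use would be reading the .docx into bytes, calling replace_part(data, "word/document.xml", edited_xml), and writing the result back out.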
Secondly, there are three kinds of tags for custom metadata extensions: smartTag, customXml and sdt. My brief description of all three:
<sdt> a reference to some data in another xml file inside the .docx zip archive. I didn't do much with this format.
<customXml> This is a (possibly validated?) representation of XML where you can specify a schema of your choice for validating your metadata. This does NOT let you put the XML you want directly inside the OOXML; it lets you encode your own XML using OOXML tags. E.g. instead of <myUri:myElem>myData</myUri:myElem>
you have to do something like:
<w:customXml w:uri="myUri" w:element="myElem">
<w:r><w:t>myData which I'm annoyed is showing up in the word doc</w:t></w:r>
</w:customXml>
A couple of issues with customXml: you need to put an entry into schema.xml for all the URIs that you use. E.g. even to skip the uri="" attribute, you must add <w:attachedSchema w:val=""> to schema.xml. Doug Mahugh says that is a bug in Word 2007, btw.
<smartTag>
smartTag seems to be what I was looking for, but is the least blogged about or otherwise documented outside the spec. It's in part 3 of the spec on page 19. One trick here is that MS Word 2007 (all hail) seems to discard smartTag elements around paragraphs upon save. Instead, I used this tag inside a paragraph (w:p) tag as a sibling to the run (w:r) tags and it worked.
With smartTag you can specify some meaningless URI as a namespace, an arbitrary element name, and then put whatever data you want in, and it won't show up in your word doc. OTOH, you still have to use wordprocessingML/OOXML to awkwardly encode your xml as attributes:
<w:smartTag w:uri="http://schemas.openxmlformats.org/2006/smarttags"
w:element="stockticker">
<w:smartTagPr>
<w:attr w:name="fullCompanyName" w:val="Google"/>
</w:smartTagPr>
</w:smartTag>
In the above example, you really mean to say: <stockticker fullCompanyName="Google"> but you have to meta-encode it into the other XML format instead. Fortunately, you can use pretty trivial XQuery (or XSLT if you prefer) to convert it back.
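To show how small that conversion is, here is a rough sketch in Python with ElementTree standing in for the XQuery/XSLT. The helper name smart_tag_to_element is hypothetical, and for simplicity it ignores the w:uri value instead of turning it into a namespace on the output element.

```python
import xml.etree.ElementTree as ET

# The WordprocessingML main namespace (the "w:" prefix).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def smart_tag_to_element(smart_tag: ET.Element) -> ET.Element:
    """Turn one w:smartTag into <element attr="..."/> from its w:attr entries."""
    elem = ET.Element(smart_tag.get(f"{{{W}}}element"))
    for attr in smart_tag.findall(f"{{{W}}}smartTagPr/{{{W}}}attr"):
        elem.set(attr.get(f"{{{W}}}name"), attr.get(f"{{{W}}}val"))
    return elem


sample = f"""
<w:smartTag xmlns:w="{W}"
    w:uri="http://schemas.openxmlformats.org/2006/smarttags"
    w:element="stockticker">
  <w:smartTagPr>
    <w:attr w:name="fullCompanyName" w:val="Google"/>
  </w:smartTagPr>
</w:smartTag>
"""
plain = smart_tag_to_element(ET.fromstring(sample))
print(ET.tostring(plain, encoding="unicode"))
```

The result is a plain stockticker element carrying fullCompanyName="Google" as an ordinary attribute.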
For completeness I should mention that you can also squirrel additional data into the .docx zip archive if the data is at the document level rather than paragraph or block level.
BTW, the overall point of this is that I can now search the XML that is implicitly authored with MS Word for specific paragraphs based on my custom tags. I'm going to use XQuery (including XPath) to do this against an XML database that holds both the binary zipped form of the .docx files and the unzipped XML content from Word.
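For a feel of what that search looks like outside of an XML database, here is a small sketch in Python with ElementTree standing in for XQuery. It assumes the smartTag sits directly inside the paragraph as a sibling of the runs, as described above; the function name paragraphs_tagged, the example.com URI and the sample text are all made up for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraphs_tagged(root: ET.Element, element_name: str) -> list[str]:
    """Return the text of each w:p that directly contains a matching w:smartTag."""
    hits = []
    for para in root.iter(f"{{{W}}}p"):
        tags = para.findall("w:smartTag", NS)
        if any(t.get(f"{{{W}}}element") == element_name for t in tags):
            # Concatenate the text of every w:t inside this paragraph.
            hits.append("".join(t.text or "" for t in para.iter(f"{{{W}}}t")))
    return hits


doc = ET.fromstring(f"""
<w:document xmlns:w="{W}"><w:body>
  <w:p>
    <w:smartTag w:uri="http://example.com/tags" w:element="stockticker">
      <w:smartTagPr><w:attr w:name="fullCompanyName" w:val="Google"/></w:smartTagPr>
    </w:smartTag>
    <w:r><w:t>Shares rose today.</w:t></w:r>
  </w:p>
  <w:p><w:r><w:t>Unrelated paragraph.</w:t></w:r></w:p>
</w:body></w:document>
""")
print(paragraphs_tagged(doc, "stockticker"))  # ['Shares rose today.']
```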
Wednesday, November 5, 2008
Adding metadata inside an OOXML document