1. Introduction

The biggest difference from the old RTF importer is that the new importer is a UNO component: instead of accessing Writer’s internals directly, it uses the domain mapper (dmapper) interface, which was created for the DOCX import and in turn uses Writer’s UNO API. The idea is that dmapper already implements the mapping of items from the Word domain to the Writer domain, so a writerfilter-based RTF importer reduces the amount of duplicated code.

1.1. Terminology

RTFTokenizer refers to the new UNO-based importer; RtfReader refers to the old built-in one.

2. General

An RTF document consists of different types of control words (flags, toggles, values, etc.) and text. RtfReader used to hardcode the type of each control word at the place where it was handled. RTFTokenizer keeps this information in a central table (originally generated from the specification), so the parameter of a control word can no longer be interpreted with the wrong type (e.g. the parameter of a value being handled as if it belonged to a toggle).
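The idea of a central control word table can be illustrated with a minimal Python sketch. This is not the actual RTFTokenizer code (which is C++); the names ControlType, CONTROL_WORDS and interpret are hypothetical, and the table contains only a handful of control words whose types are taken from the RTF specification.

```python
from enum import Enum


class ControlType(Enum):
    """Control word types as classified by the RTF specification."""
    FLAG = "flag"
    DESTINATION = "destination"
    SYMBOL = "symbol"
    TOGGLE = "toggle"
    VALUE = "value"


# Hypothetical excerpt of a central table: every control word is
# classified exactly once, instead of at each place it is handled.
CONTROL_WORDS = {
    "b": ControlType.TOGGLE,             # bold on/off
    "i": ControlType.TOGGLE,             # italic on/off
    "fs": ControlType.VALUE,             # font size in half-points
    "pard": ControlType.FLAG,            # reset paragraph properties
    "par": ControlType.SYMBOL,           # end of paragraph
    "fonttbl": ControlType.DESTINATION,  # font table group
}


def interpret(keyword, param=None):
    """Interpret the parameter according to the type found in the table."""
    kind = CONTROL_WORDS.get(keyword)
    if kind is None:
        return ("unknown", keyword, param)
    if kind is ControlType.TOGGLE:
        # A toggle is switched on unless its parameter is exactly 0.
        return ("toggle", keyword, param != 0)
    if kind is ControlType.VALUE:
        # A value carries its numeric parameter unchanged.
        return ("value", keyword, param)
    # Flags, symbols and destinations take no parameter; any stray one is dropped.
    return (kind.value, keyword, None)
```

Because the dispatch is driven by the table, a caller cannot accidentally treat the parameter of \fs as an on/off switch: interpret("fs", 24) always yields a value, while interpret("b", 0) always yields a toggle that is switched off.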

Also, RTFTokenizer handles the separation of control words and text in one place. For example, the meaning of the special { character is defined in a single place, while it was handled in 19 (!) different places in RtfReader.
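A minimal sketch of such a separation, again in Python rather than the C++ of the real importer: the function name tokenize and the token tuples are hypothetical, and the sketch ignores details such as \bin data, \'hh escapes and Unicode handling. The point is only that { and } are recognised at exactly one place.

```python
from string import ascii_letters

DIGITS = "0123456789"


def tokenize(rtf):
    """Split an RTF string into group, control word, control symbol and text tokens."""
    tokens = []
    text = []
    i, n = 0, len(rtf)

    def flush():
        # Emit the text collected so far as a single token.
        if text:
            tokens.append(("text", "".join(text)))
            text.clear()

    while i < n:
        ch = rtf[i]
        if ch == "{":
            # The only place where the group start character is interpreted.
            flush()
            tokens.append(("group_start",))
            i += 1
        elif ch == "}":
            flush()
            tokens.append(("group_end",))
            i += 1
        elif ch == "\\":
            flush()
            i += 1
            if i >= n:
                break  # lone backslash at the end of input
            if rtf[i] in ascii_letters:
                start = i
                while i < n and rtf[i] in ascii_letters:
                    i += 1
                keyword = rtf[start:i]
                # Optional numeric parameter, possibly negative.
                start = i
                if i < n and rtf[i] == "-":
                    i += 1
                while i < n and rtf[i] in DIGITS:
                    i += 1
                digits = rtf[start:i]
                if digits in ("", "-"):
                    i = start  # no parameter; a bare '-' is ordinary text
                    param = None
                else:
                    param = int(digits)
                # A single space after a control word is a delimiter, not text.
                if i < n and rtf[i] == " ":
                    i += 1
                tokens.append(("control_word", keyword, param))
            else:
                # Control symbol such as \{ or \\ (one non-letter character).
                tokens.append(("control_symbol", rtf[i]))
                i += 1
        elif ch in "\r\n":
            i += 1  # bare line breaks are not part of the text
        else:
            text.append(ch)
            i += 1
    flush()
    return tokens
```

For example, tokenize(r"{\b Hi}") yields a group start, the control word b without a parameter, the text "Hi" and a group end. The control word tokens could then be passed to a table-driven lookup to decide how their parameters are to be interpreted.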

3. List of fixed bugs

4. List of new features

4.1. Character properties

  • blinking

  • relative font size in superscript characters

4.2. Tables

  • vertically merged cells

  • nested tables

4.3. Footnotes / endnotes

  • all characters of the foot/endnote mark are in the field

  • the field is properly superscript

4.4. Sections

  • line numbering

4.5. Fields

  • Postit comments are supported by RTFTokenizer.

4.6. Drawings

Drawing objects for Word 97 through Word 2007 (shapes) are now handled by RTFTokenizer:

  • basic shapes (rectangle, ellipse, etc.)

  • lines, including free-form ones

  • texts, including vertical ones and their (paragraph and character) formatting

4.7. Form fields

All types supported by the RTF format are handled by RTFTokenizer, namely:

  • text boxes

  • check boxes

  • list boxes

4.8. OLE objects

The result of an OLE object is imported as a picture; RtfReader did not import anything.

When native data is available, it is handled as well, but no automatic conversion is done yet (for DOC files there is an automatic conversion from MathType to Writer formula).

4.9. Text frames

  • anchor type is now parsed by RTFTokenizer (frames are no longer always assumed to be anchored to a paragraph; as-character anchoring is handled as well)

  • handling of invalid nested frames now matches the behaviour of Word

5. DOCX changes

Given that I sometimes had to improve dmapper for RTF, a few features now work better for DOCX as well:

  • double strikethrough character property no longer stays in effect till the end of the document (!)

  • text-to-text alignment is now imported

  • restart of footnote numbers

  • extra paragraph at the end of footnotes is no longer inserted

6. Changes in the source code outside RTF importer