Status of Writer filters.
Intro
- How to measure -- import
- How to measure -- export (checklist, unhandled props / pool items)
- This is still suboptimal -- but better than nothing
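Measuring, as used in these notes, comes down to a ratio: items the filter handles versus items the spec defines. The following is a minimal Python sketch that turns the counts from the Data part of these notes into percentages. `coverage`, `report` and the row labels are made up for illustration. A high ratio only says something is handled, not that it is handled correctly.

```python
# Counts copied from the "Data" part of these notes: (handled, defined).
COUNTS = {
    "RTF import (control words)": (575, 1821),
    "RTF export (control words)": (368, 1821),
    "DOC (SPRMs)": (318, 325),
    "DOCX import (tokenized wml elements)": (461, 546),
}


def coverage(handled: int, defined: int) -> float:
    """Return handled/defined as a percentage."""
    if defined <= 0:
        raise ValueError("defined must be positive")
    return 100.0 * handled / defined


def report(counts: dict) -> list:
    """One line per filter, e.g. 'DOC (SPRMs): 318/325 = 97.8%'."""
    return [
        "%s: %d/%d = %.1f%%" % (name, handled, defined, coverage(handled, defined))
        for name, (handled, defined) in counts.items()
    ]


if __name__ == "__main__":
    for line in report(COUNTS):
        print(line)
```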
ODF
- import: XML, has schema
- export: iteration over paras, portions, properties
- LO-specific extensions -- we hope these will be integrated into the ODF standard
- OOo/LO-specific extensions -- application-specific by definition
- summary: ideally export is lossless, and what we export is imported losslessly as well
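The ODF export walks paragraphs, portions and their properties. The goal is that nothing gets lost on the way out and back in. Below is a minimal Python sketch of how such a round-trip check could look. It assumes a hypothetical toy document model made of plain lists and dicts, not Writer's real core model or the UNO API. `iter_properties` and `lost_properties` are made-up names.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical toy model, NOT Writer's real document model:
# a document is a list of paragraphs, a paragraph is a list of portions,
# and a portion is (text, {property name: value}).
Portion = Tuple[str, Dict[str, object]]
Document = List[List[Portion]]
Key = Tuple[int, int, str]  # (paragraph index, portion index, property name)


def iter_properties(doc: Document) -> Iterator[Tuple[Key, object]]:
    """Walk paragraphs -> portions -> properties, like an export loop would."""
    for p, para in enumerate(doc):
        for q, (_text, props) in enumerate(para):
            for name, value in props.items():
                yield (p, q, name), value


def lost_properties(before: Document, after: Document) -> Set[Key]:
    """Properties that are missing or changed after export + re-import."""
    after_props = dict(iter_properties(after))
    return {
        key
        for key, value in iter_properties(before)
        if key not in after_props or after_props[key] != value
    }
```

For example, if bold on the first portion does not survive the round trip, `lost_properties([[("Hello ", {"bold": True})]], [[("Hello ", {})]])` returns `{(0, 0, 'bold')}`. An empty set is the lossless case.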
DOCX
- intro: can mean lots of things; here: WordprocessingML
- import: tokenizer defines all handled tokens
- export: shared Word exporter -- ideally no methods are left unimplemented, so unimplemented methods show the gaps
- Word and Writer sometimes have different concepts; there, conversion is lossy by definition
DOC
- intro: probably the oldest filter; what's possible here is mostly already done
- import: can measure SPRMs
- export: shared exporter
RTF
- import: control words
- export: shared exporter, control words, checklist
- math: import/export -- OOXML status
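RTF import is driven by control words. A cheap way to see which ones a given document actually exercises is to count them. The following is a minimal Python sketch based on the control word / control symbol syntax of the RTF spec. It is a rough tool, not a parser: it ignores binary data (`\bin`) and does not track groups or destinations. `control_words` is a made-up name.

```python
import re
from collections import Counter

# A control word is a backslash, a run of lowercase ASCII letters, an optional
# (possibly negative) numeric parameter and an optional space delimiter.
# Anything else after a backslash (\\, \{, \}, \', \* ...) is a control symbol,
# which is consumed here so that e.g. "\\par" is not counted as \par.
_TOKEN = re.compile(r"\\(?:([a-z]+)(-?[0-9]+)? ?|(.))", re.DOTALL)


def control_words(rtf: str) -> Counter:
    """Count control word names (parameters dropped) in an RTF string."""
    counts = Counter()
    for m in _TOKEN.finditer(rtf):
        if m.group(1):  # control word; group 3 would be a control symbol
            counts[m.group(1)] += 1
    return counts
```

For `{\rtf1\ansi{\b bold\b0} \\par\par}` this counts `b` twice and `rtf`, `ansi`, `par` once each. The escaped `\\` before the first `par` makes that one plain text.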
Questions?

Data -- RTF:
import: see writerfilter/source/rtftok/rtfcontrolwords.cxx
# of control words from the spec: 1821
# of LO-specific RTF control words: 4 (obsolete, only there for compat reasons)
# of handled control words on import:
- git grep 'case RTF_.*:\|OPEN_M_TOKEN(' writerfilter/source/rtftok/|grep -v define|grep -v '##' |wc -l -> 575
-> uncovered areas: e.g. shape props
export:
- git grep STRING_SVTOOLS_RTF_ sw/source/filter/ww8/|sed 's/.*_STRING//;s/[^A-Z_].*//'|sort -u|wc -l -> 368
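The same export count can be scripted in Python, which also gives the names, not only the number. That makes it easier to compare against the import side. The following is a minimal sketch. It assumes it is run from the root of a LibreOffice checkout. `rtf_constants` and `exported_control_words` are made-up names.

```python
import re
import subprocess

# Mirrors the git grep | sed | sort -u | wc -l one-liner above: collect the
# distinct STRING_SVTOOLS_RTF_* constant suffixes used by the shared exporter.
_CONSTANT = re.compile(r"STRING_SVTOOLS_RTF_([A-Z_]+)")


def rtf_constants(text: str) -> set:
    """Return the distinct constant suffixes (e.g. 'PAR') found in text."""
    return set(_CONSTANT.findall(text))


def exported_control_words(path: str = "sw/source/filter/ww8/") -> set:
    """git grep the exporter sources (run from a LibreOffice checkout)."""
    result = subprocess.run(
        ["git", "grep", "-h", "STRING_SVTOOLS_RTF_", "--", path],
        capture_output=True, text=True, check=False,
    )
    return rtf_constants(result.stdout)
```

`sorted(exported_control_words())` lists the names. The sed expression keeps only the last constant on each grep output line, while the regex picks up every occurrence, so `len(...)` may come out slightly higher than 368.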

Data -- DOC / WW8:
- binary format, loads of structures, SPRMs can be measured
- spec has character (85), paragraph (93), table (80), section (59) and picture (8) SPRMs -> 325

----
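# Count which WW8 SPRMs are mentioned in the LibreOffice source tree.
# docsprm is assumed to be the SPRM table module of the DOC dumper (mso-dumper);
# run this from the root of a LibreOffice checkout.
import subprocess

import docsprm

found = 0
for m in [docsprm.picMap, docsprm.secMap, docsprm.tblMap, docsprm.parMap, docsprm.chrMap]:
    for i in m.keys():
        # git grep -q exits with 0 if there is at least one match
        if subprocess.call(["git", "grep", "-q", str(i)]) == 0:
            print("%s found" % i)
            found += 1
        else:
            print("%s not found" % i)
print("# of handled: %d" % found)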
----
# of handled SPRMs: 318 -> some may be buggy, but coverage is pretty complete

Data -- DOCX / OOXML:
# of elements in wml spec (strict) / OfficeOpenXML-XMLSchema-Strict.zip:
grep xsd:element.*name wml.xsd |sed 's/.*\(name="[^"]\+"\).*/\1/'|sort -u|wc -l -> 546
# of elements that are at least tokenized:
for i in $(cat /home/vmiklos/git/libreoffice/spec/ecma/rng/wml-elements); do if grep -q $i.*tokenid model.xml; then echo found; else echo not found; fi; done|grep -c ^found -> 461
# of unhandled elements:
for i in $(cat /home/vmiklos/git/libreoffice/spec/ecma/rng/wml-elements); do if grep -q $i.*tokenid model.xml; then echo found; else echo not found; fi; done|grep -c ^not -> 85
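The grep and the two shell loops above can be folded into one Python pass. This pass also yields the names on each side, not just the counts. The following is a minimal sketch. It reads the declarations with ElementTree instead of grep, so it may not reproduce the 546 / 461 / 85 figures exactly. `schema_elements` and `tokenized` are made-up helper names.

```python
import re
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"


def schema_elements(xsd_text) -> set:
    """Distinct name= values of xsd:element declarations (str or bytes input)."""
    root = ET.fromstring(xsd_text)
    return {e.get("name") for e in root.iter(XSD + "element") if e.get("name")}


def tokenized(names, model_xml: str) -> set:
    """Names appearing as name="..." on a model.xml line that has a tokenid."""
    lines = model_xml.splitlines()
    found = set()
    for name in names:
        pattern = re.compile(r'name="%s".*tokenid' % re.escape(name))
        if any(pattern.search(line) for line in lines):
            found.add(name)
    return found
```

Usage: read `wml.xsd` in binary mode and pass it to `schema_elements`, then pass the resulting set and the text of `model.xml` (assumed here to be `writerfilter/source/ooxml/model.xml`) to `tokenized`. The unhandled elements are then simply `names - hits`.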