Status of Writer filters.

Intro
- How to measure -- import
- How to measure -- export (checklist, unhandled props / pool items)
- This is still suboptimal -- but better than nothing

ODF
- import: XML, has schema
- export: iteration over paras, portions, properties
- LO-specific extensions -- we hope these will be integrated
- OOo/LO-specific extensions -- application-specific by definition
- sum: ideally export is lossless, and what we export is imported losslessly as well

DOCX
- intro: can mean lots of things, here: wordprocessingml
- import: tokenizer defines all handled tokens
- export: shared word export: ideally -- unimplemented methods
- Word and Writer sometimes have different concepts; conversion is lossy there by definition

DOC
- intro: probably the oldest filter, what's possible here is mostly already done
- import: can measure SPRMs
- export: shared exporter

RTF
- import: control words
- export: shared exporter, control words, checklist
- math: import/export -- OOXML status

Questions?

Data -- RTF:

import: see writerfilter/source/rtftok/rtfcontrolwords.cxx

# of control words from the spec: 1821
# of LO-specific RTF control words: 4 (obsolete, only there for compat reasons)
# of handled control words on import:
- git grep 'case RTF_.*:\|OPEN_M_TOKEN(' writerfilter/source/rtftok/|grep -v define|grep -v '##' |wc -l
  -> 575
  -> uncovered areas: e.g. shape props

export:
- git grep STRING_SVTOOLS_RTF_ sw/source/filter/ww8/|sed 's/.*_STRING//;s/[^A-Z_].*//'|sort -u|wc -l
  -> 368

Data -- DOC / WW8:

- binary format, loads of structures, SPRMs can be measured
- spec has character (85), paragraph (93), table (80), section (59) and picture (8) SPRMs -> 325

----
import os

import docsprm  # expected to provide one map per SPRM category

for m in [docsprm.picMap, docsprm.secMap, docsprm.tblMap, docsprm.parMap, docsprm.chrMap]:
    for i in m.keys():
        # git grep -q exits with 0 when the name occurs somewhere in the tree
        if os.system("git grep -q %s" % i) == 0:
            print("%s found" % i)
        else:
            print("%s not found" % i)
----

# of handled: 318
-> some may be buggy, but pretty complete

Data -- DOCX / OOXML:

# of elements in wml spec (strict) / OfficeOpenXML-XMLSchema-Strict.zip:
- grep xsd:element.*name wml.xsd |sed 's/.*\(name="[^"]\+"\).*/\1/'|sort -u|wc -l
  -> 546

# of at least tokenized elements:
- 'for i in $(cat /home/vmiklos/git/libreoffice/spec/ecma/rng/wml-elements); do if grep -q $i.*tokenid model.xml; then echo found; else echo not found; fi; done|grep -c ^found'
  -> 461

# of unhandled elements:
- 'for i in $(cat /home/vmiklos/git/libreoffice/spec/ecma/rng/wml-elements); do if grep -q $i.*tokenid model.xml; then echo found; else echo not found; fi; done|grep -c ^not'
  -> 85
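
Data -- summary (rough):

The counts above can be turned into rough coverage ratios. A minimal sketch, not part of the original measurements: the labels, the COUNTS table and the coverage()/report() helpers are made up for illustration. It assumes the 1821 spec control words are also the right denominator for RTF export, and that a grep hit means "handled", which overstates things where the handling is buggy.

```python
# Counts copied from the data sections above, as (handled, total).
# Assumption: 1821 is used as the denominator for RTF export as well.
COUNTS = {
    "RTF import (control words)": (575, 1821),
    "RTF export (control words)": (368, 1821),
    "DOC (SPRMs)": (318, 325),
    "DOCX (tokenized elements)": (461, 546),
}


def coverage(handled, total):
    """Return handled/total as a percentage, rounded to one decimal."""
    return round(100.0 * handled / total, 1)


def report(counts=COUNTS):
    """Return one formatted line per filter."""
    return ["%-28s %4d / %4d = %5.1f%%" % (name, h, t, coverage(h, t))
            for name, (h, t) in counts.items()]


if __name__ == "__main__":
    print("\n".join(report()))
```

This gives about 31.6% for RTF import, 20.2% for RTF export, 97.8% for DOC and 84.4% for DOCX. The numbers only compare name counts, so they say nothing about how well each handled item works.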