Preserving semantic styles from DOCX/ODT in Go
The article argues that converting DOCX and ODT manuscripts in Go should preserve semantic styles. It introduces Tessera, a tool designed to keep the meaning of manuscript styles rather than only their visual appearance. The author's position is that, in manuscript workflows, named styles are the source of truth and the conversion process should respect them.
- Tessera is designed to preserve the semantic meaning of styles in manuscript conversion.
- Unlike generic conversion tools, Tessera maps named styles to known roles for accurate representation.
- The conversion pipeline involves extracting text and styles, creating a semantic intermediate representation, and rendering outputs like EPUB and LaTeX.
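The three stages above can be sketched roughly as follows. This is a minimal illustration, not Tessera's actual API: every name here (`Role`, `Paragraph`, `Block`, `styleRoles`, `RoleForStyle`, `ToIR`, `RenderXHTML`) is hypothetical, the extraction stage is assumed to have already produced paragraphs tagged with their style names, and the renderer emits a simple XHTML fragment of the kind an EPUB chapter might contain.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// Role is a hypothetical semantic role that a named style maps to.
type Role string

const (
	RoleBody    Role = "body"
	RolePoem    Role = "poem"
	RoleHeading Role = "heading"
)

// Paragraph is what the extraction stage is assumed to produce:
// the paragraph text plus the style name taken from the DOCX/ODT file.
type Paragraph struct {
	StyleName string
	Text      string
}

// Block is one node of the semantic intermediate representation.
type Block struct {
	Role Role
	Text string
}

// styleRoles is an illustrative mapping from normalized style names to roles.
var styleRoles = map[string]Role{
	"poem":      RolePoem,
	"heading 1": RoleHeading,
}

// RoleForStyle normalizes a style name and looks up its role.
// Unknown styles fall back to plain body text.
func RoleForStyle(name string) Role {
	key := strings.ToLower(strings.TrimSpace(name))
	if r, ok := styleRoles[key]; ok {
		return r
	}
	return RoleBody
}

// ToIR converts extracted paragraphs into semantic blocks.
func ToIR(paras []Paragraph) []Block {
	blocks := make([]Block, 0, len(paras))
	for _, p := range paras {
		blocks = append(blocks, Block{Role: RoleForStyle(p.StyleName), Text: p.Text})
	}
	return blocks
}

// RenderXHTML renders the blocks as an XHTML fragment, choosing markup by role.
func RenderXHTML(blocks []Block) string {
	var sb strings.Builder
	for _, b := range blocks {
		text := html.EscapeString(b.Text)
		switch b.Role {
		case RoleHeading:
			fmt.Fprintf(&sb, "<h1>%s</h1>\n", text)
		case RolePoem:
			fmt.Fprintf(&sb, "<p class=\"poem\">%s</p>\n", text)
		default:
			fmt.Fprintf(&sb, "<p>%s</p>\n", text)
		}
	}
	return sb.String()
}

func main() {
	paras := []Paragraph{
		{StyleName: "Heading 1", Text: "Chapter One"},
		{StyleName: "Poem", Text: "Roses are red"},
		{StyleName: "Normal", Text: "Plain body text."},
	}
	fmt.Print(RenderXHTML(ToIR(paras)))
}
```

Because the renderer only sees roles, a LaTeX backend could walk the same `[]Block` and pick a different representation for `RolePoem` without touching the extraction or mapping code.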
Opening excerpt (first ~120 words)
jk-kaluga, posted on May 29. Tags: #go #epub #publishing #opensource

Most manuscript conversion tools are very good at moving text from one format to another. That sounds like enough until you work with a manuscript where styles are not just decoration. In many Word or LibreOffice files, a paragraph style called Poem does not mean “make this text indented”. It means “this is a poem”.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).