Apache OpenOffice (AOO) Bugzilla – Issue 106833
Some letters are omitted when printing
Last modified: 2017-05-20 10:28:56 UTC
When printing documents, some letters in certain fonts are omitted. The issue occurs when printing from both Writer and Calc. The Russian capital letter “Short I” (U+0419) is omitted in the following fonts: Arial, Courier New, and Times New Roman. The Russian small letter “Be” (U+0431) is omitted in DejaVu Sans. The issue leads to corrupted paper printouts. The result depends on the CUPS version in use: the omitted letter may appear as a space, or it may be dropped without a space while extra space is inserted further along the text, shifting a character so that it partially overlaps the next one.
The issue remains when printing to PDF via cups-pdf; a PDF document generated this way looks the same as the paper printout described above. The issue does not appear when the document is first exported to PDF and the PDF is then printed. This is the only workaround found so far.
The attached zip contains: mistyped-fonts.odt, the original document with samples of the letters and fonts in question (all paragraphs contain the same text, with U+0431 at the 4th position, counting from 1, and U+0419 at the 20th position; there are 4 paragraphs per font, in regular, italic, bold, and bold italic; the last 4 paragraphs use Microsoft Verdana, which shows no sign of the issue, and are included for comparison); mistyped-fonts.pdf, the PDF “printout” generated by cups-pdf; mistyped-fonts.ps, the “Print into file” output; and mistyped-fonts-exported.pdf, the PDF export output, which is free of the issue as noted above.
I tried to analyze the PostScript file that is saved when “Print into file” is checked. As far as I can see, no glyphs are defined in the file for the letters that are omitted when printing. For example, in ArialMTFID33HGSet2 there is no glyph set for “Encoding 14” (which corresponds to U+0419 and appears as 0x0E later in the file). VerdanaFID59HGSet2 does have a glyph defined for “Encoding 14”, and Verdana is printed correctly.
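The manual check described above can be automated. The C++ sketch below assumes that each embedded font binds codes with lines of the form "Encoding 14 /glyph14 put" (the form suggested by the “Encoding 14” entries quoted in this report; the exact syntax the printer driver emits is an assumption) and that the caller has already cut out the text of a single font definition such as ArialMTFID33HGSet2. The helper names ParseEncodingEntries and MissingCodes are hypothetical.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Collect "Encoding <code> /<glyphname> put" entries from the text of ONE
// embedded font definition. The line format is an assumption based on the
// entries quoted in this report, not a documented guarantee.
std::map<int, std::string> ParseEncodingEntries(const std::string& fontText) {
    static const std::regex entry(R"(Encoding\s+(\d+)\s+/(\S+)\s+put)");
    std::map<int, std::string> codeToGlyph;
    for (std::sregex_iterator it(fontText.begin(), fontText.end(), entry), end;
         it != end; ++it) {
        codeToGlyph[std::stoi((*it)[1].str())] = (*it)[2].str();
    }
    return codeToGlyph;
}

// Return the codes used by the printed text that the font's Encoding never binds.
std::vector<int> MissingCodes(const std::map<int, std::string>& codeToGlyph,
                              const std::vector<int>& usedCodes) {
    std::vector<int> missing;
    for (int code : usedCodes) {
        if (codeToGlyph.find(code) == codeToGlyph.end()) {
            missing.push_back(code);
        }
    }
    return missing;
}
```

Under these assumptions, running MissingCodes on the Arial subset with the used code 14 would be expected to report 14, while the same check on the Verdana subset would report nothing.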
Environment: OS: Kubuntu 9.04; OOo: OOO310m19, Build 9420. DejaVu Sans: the font from the OS distribution, and also the latest version from dejavu-. The Microsoft TTF fonts concerned were installed by the “msttcorefonts” package, which downloads them from sourceforge.net.
The issue is very serious for Russian-speaking users, especially for less IT-experienced office workers in Belarus and Russia, where most state institutions and many private companies use Microsoft document formats and fonts as standards: they issue such documents to outside parties and expect, or even require, them in return. A user who receives such a document may print it and use the printout officially without even knowing that letters are missing from it.
Created attachment 66069: Files noted in description
Both printing on an HP LaserJet and exporting to PDF with an English OOo on German WinXP produce _no_ errors.
The problem is probably related to issue 104050 and issue 105631#desc14
Created attachment 66073: Another, simpler sample with only one letter involved
The sample zip posted above includes an odt file containing only the Cyrillic “Short I” (U+0419) in Arial, plus the “Print into file” PostScript output. The PostScript file clearly shows that the only letter to be printed has no encoding defined in the embedded font subset. Strangely, “Encoding 0” is bound only to glyph3 in the file, and glyph3 looks like an empty rectangle. Glyph1 and glyph2, which were pulled into the embedded font subset, are parts of the Cyrillic “Short I” (U+0419), the letter that should be printed. “Short I” (http://en.wikipedia.org/wiki/Й) consists of two graphical parts: the base, which looks the same as the Russian letter “I”, and the diacritical mark, a breve. Glyph1 is the base and glyph2 is the breve.
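The observation that glyph1 and glyph2 are the base and the breve matches the way a TrueType font can store a letter like U+0419 as a composite glyph that references other glyphs, so a subsetter has to pull in the components together with the requested glyph. The C++ sketch below shows that closure step. The ComponentMap type, the function name, and the glyph ids in the usage note are hypothetical; real code would read the component references from the font's glyf table.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical model of TrueType composite glyphs: composite glyph id -> ids
// of the component glyphs it references. Simple glyphs have no entry.
using ComponentMap = std::map<uint16_t, std::vector<uint16_t>>;

// Return every glyph a subset must contain so that the requested glyphs can
// be drawn: glyph 0 (.notdef), the requested glyphs, and, transitively, all
// of their components.
std::set<uint16_t> CollectSubsetGlyphs(const std::vector<uint16_t>& requested,
                                       const ComponentMap& components) {
    std::set<uint16_t> result{0};  // always keep .notdef at glyph id 0
    std::vector<uint16_t> pending(requested.begin(), requested.end());
    while (!pending.empty()) {
        uint16_t glyph = pending.back();
        pending.pop_back();
        if (!result.insert(glyph).second) {
            continue;  // already collected
        }
        auto it = components.find(glyph);
        if (it == components.end()) {
            continue;  // simple glyph, nothing more to pull in
        }
        for (uint16_t part : it->second) {
            pending.push_back(part);  // components may themselves be composite
        }
    }
    return result;
}
```

For example, with the hypothetical ids 523 for “Short I”, 510 for its base, and 600 for the breve, requesting {523} yields {0, 510, 523, 600}.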
HIGH IMPORTANCE FOR RUSSIAN USERS! This is very important for Russian users, because they cannot print documents in MS Winword format from OpenOffice, and that format is the de facto standard for government agencies and corporations in Russia.
I can confirm that the second sample document produces only one glyph, encoded as "0", in the generated font. However, Ghostscript seems to render this PostScript file just fine (including the sample PostScript output attached). A PDF file produced with cups-pdf displays fine in acroread; only Ghostscript shows a problem with that PDF file. Anyway, let's try to avoid glyph 0 and use 1 instead.
OK, what is wrong is the Encoding vector (it is plain wrong; the actual encoded value for the glyph is already '1', as it should be). However, the Encoding vector does not seem to be the first place most programs look for the glyph; that seems to be the CharStrings dictionary.
The entries of the Encoding vector are originally created in vcl/unx/source/printergfx/glyphset.cxx, in GlyphSet::PSUploadFont. The encoded entries come from a hash_map, which is not sorted; that should not matter anyway, since the description comes as a glyph array plus an encoding array. This then goes into FontSubsetInfo, which uses CreateT42FromTTFont (and friends) to create the subsetted font file. These functions expect the notdef glyph to be encoded '0' (which is reasonable), but do not allow the encoding to be unsorted.
So there are three places where the encoding-to-glyph mapping could be repaired by sorting: PSUploadFont (which has the original unsorted data), FontSubsetInfo (which uses CreateT???FromTTFont wrongly), or the CreateT???FromTTFont functions themselves, which perhaps should be able to handle this. @hdu: do you have an opinion on where best to fix this? I chose CreatePSUploadableFont in glyphset.cxx, since it is central and lets the GlyphSet class keep using the unsorted hash_map (which is probably a performance gain). Anyway, I'd like you to review the change. Fixed in CWS vcl108.
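The repair chosen here amounts to handing the subsetter its glyph and encoding arrays in encoding order, with the notdef glyph at code 0, instead of in hash_map iteration order. The C++ sketch below illustrates that idea under stated assumptions: std::unordered_map stands in for the hash_map, and SubsetRequest and BuildSortedRequest are hypothetical names, not the actual CreatePSUploadableFont code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical parallel arrays handed to a font subsetter: entry i means
// "put source glyph glyphIds[i] into the subset under encoding byte codes[i]".
struct SubsetRequest {
    std::vector<uint16_t> glyphIds;
    std::vector<uint8_t> codes;
};

// glyphToCode stands in for the unsorted hash map (source glyph id -> encoding
// byte). Code 0 is assumed to be reserved for .notdef (glyph id 0).
SubsetRequest BuildSortedRequest(const std::unordered_map<uint16_t, uint8_t>& glyphToCode) {
    std::vector<std::pair<uint8_t, uint16_t>> entries;  // (code, glyph): sorting orders by code
    entries.reserve(glyphToCode.size() + 1);
    bool hasNotdef = false;
    for (const auto& kv : glyphToCode) {
        if (kv.first == 0) {
            hasNotdef = true;
            assert(kv.second == 0 && ".notdef must use code 0");
        } else {
            assert(kv.second != 0 && "code 0 is reserved for .notdef");
        }
        entries.emplace_back(kv.second, kv.first);
    }
    if (!hasNotdef) {
        entries.emplace_back(uint8_t{0}, uint16_t{0});  // make sure .notdef is present
    }
    std::sort(entries.begin(), entries.end());

    SubsetRequest req;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        assert((i == 0 || entries[i - 1].first != entries[i].first) && "duplicate code");
        req.codes.push_back(entries[i].first);
        req.glyphIds.push_back(entries[i].second);
    }
    return req;
}
```

For example, an input of {523 -> 1, 48 -> 3, 77 -> 2} produces codes {0, 1, 2, 3} and glyph ids {0, 523, 77, 48}, whatever order the map iterates in. Sorting (code, glyph) pairs rather than the two arrays separately keeps each glyph aligned with its code.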
*** Issue 104050 has been marked as a duplicate of this issue. ***
*** Issue 105631 has been marked as a duplicate of this issue. ***
Doing it in CreatePSUploadableFont() is the solution that will solve this problem reliably, without the risk of impacting other parts of the code. Putting the notdef glyph at glyph id zero is required and coded all over the place, though, so in the medium term I'd love to have all of this consolidated into one place, such as FontSubsetInfo::CreateFontSubset(). I'm not sure, though, what all the callers expect, since they provide a const parameter requesting the id of each subsetted glyph. Are these just friendly suggestions that can be completely ignored, or can they be reshuffled at will by the callee? As of now, the subsetter treats them as gospel and does exactly as requested.
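Because the subsetter currently takes the requested ids as given, a caller-side check can make the assumed contract explicit. The C++ sketch below encodes the invariants discussed in this thread (notdef as the first entry, with glyph id 0 and code 0, and strictly increasing codes); the function name and the exact set of checks are assumptions, not the real FontSubsetInfo interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the parallel arrays already satisfy the ordering assumed in
// this thread: equal length, .notdef (glyph id 0) first under code 0, and codes
// strictly increasing so that nothing depends on hash-map iteration order.
bool IsCanonicalSubsetRequest(const std::vector<uint16_t>& glyphIds,
                              const std::vector<uint8_t>& codes) {
    if (glyphIds.empty() || glyphIds.size() != codes.size()) {
        return false;
    }
    if (glyphIds[0] != 0 || codes[0] != 0) {
        return false;  // .notdef must come first under code 0
    }
    for (std::size_t i = 1; i < codes.size(); ++i) {
        if (codes[i] <= codes[i - 1]) {
            return false;  // unsorted or duplicate code
        }
        if (glyphIds[i] == 0) {
            return false;  // .notdef listed more than once
        }
    }
    return true;
}
```

A check like this could run in debug builds before the subsetter is called, so that an unsorted request fails loudly instead of producing a font with missing letters.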
Will also commit to CWS ooo32gsl09 for 3.2, due to the overall severity for printing on Linux.
Please verify in CWS ooo32gsl09.
The bug was not reproducible on my SUSE Linux. Fix verified on PL's machine. @alexps: Please close this issue if it is still OK in OOo 3.2 final.