Add support for soft hyphens (SHY, Unicode 173) to Word and PDF emitter #1179

Closed
hvbtup opened this issue Jan 11, 2023 · 3 comments · Fixed by #1180

@hvbtup
Contributor

hvbtup commented Jan 11, 2023

Tested with BIRT 4.3.0, but I think the behavior is still the same with current BIRT source:

Unicode character 173 (U+00AD) is the soft hyphen, known as SHY.
The abbreviation SHY fits perfectly, because this character usually hides itself and should only become visible (as a hyphen) when a line is broken at that position.

Some languages, German in particular, tend to use very long words, for example in chemistry.
In some cases, the text containing these words is predefined somewhere, e.g. in a database.
So it makes sense to store good hyphenation points by placing the SHY character at these positions inside the word,
e.g. for a word like "kapillar­gaschromato­graphisch" this could be stored as "kapillar-gas-chromato-graphisch".
Notes:

  1. There are more possible hyphenation points inside this word, e.g. "ka-pil-lar-gas-chro-ma-to-graph-isch", but not all of them are good for readability.
  2. I used the minus sign instead of the SHY character, because you wouldn't see the SHY: the browser understands it and hides it.
  3. This does not mean we need automatic hyphenation. But if the text is already prepared for hyphenation, let's just use that info.
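To make the storage idea concrete, here is a minimal Java sketch (Java because BIRT is written in Java) of what a pre-hyphenated string carries. It only assumes that SHY is the character U+00AD; the class and method names `ShyDemo`, `visibleForm` and `segments` are hypothetical and not part of BIRT. When the word is not broken, every SHY is dropped; the pieces between SHYs mark the only positions where a break would be allowed.

```java
import java.util.ArrayList;
import java.util.List;

public class ShyDemo {
    // U+00AD SOFT HYPHEN (decimal 173)
    static final char SHY = '\u00AD';

    /** Text as rendered when the word is NOT broken: all soft hyphens stay invisible. */
    static String visibleForm(String word) {
        return word.replace(String.valueOf(SHY), "");
    }

    /** Pieces between soft hyphens; a line break is allowed only between two pieces. */
    static List<String> segments(String word) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < word.length(); i++) {
            if (word.charAt(i) == SHY) {
                parts.add(word.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(word.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        // The example word, with SHY at the preferred hyphenation points
        String stored = "kapillar\u00ADgas\u00ADchromato\u00ADgraphisch";
        System.out.println(visibleForm(stored));
        System.out.println(segments(stored));
    }
}
```

Running `main` should print `kapillargaschromatographisch` followed by `[kapillar, gas, chromato, graphisch]`.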

As far as I can tell, the PDF emitter does all the line-breaking logic inside BIRT.
So it should be possible to handle the SHY character specifically inside pre-hyphenated words and to take it into account in the simple word-breaking algorithm.
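A hedged sketch of how a simple greedy word-breaking algorithm could take SHY into account. This is not BIRT's PDF emitter code: `ShyLineBreaker` and `breakLines` are hypothetical, and widths are counted in characters instead of measured glyph widths. A word that fits is placed with its soft hyphens hidden; otherwise the algorithm looks for the last SHY whose prefix plus a visible "-" still fits on the current line.

```java
import java.util.ArrayList;
import java.util.List;

public class ShyLineBreaker {
    static final String SHY = "\u00AD";

    /**
     * Greedy line breaking where a word may additionally be split at a soft hyphen.
     * Assumption: width = number of characters (a real emitter measures font widths).
     */
    static List<String> breakLines(String text, int maxWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.split(" +")) {
            if (word.isEmpty()) {
                continue;
            }
            String rest = word;
            while (rest != null) {
                String sep = line.length() == 0 ? "" : " ";
                int room = maxWidth - line.length() - sep.length();
                String visible = rest.replace(SHY, "");
                if (visible.length() <= room) {
                    // The (remaining) word fits: its soft hyphens stay invisible.
                    line.append(sep).append(visible);
                    rest = null;
                    continue;
                }
                // Find the last soft hyphen whose prefix plus a visible "-" still fits.
                int cut = -1;
                for (int i = rest.indexOf(SHY); i >= 0; i = rest.indexOf(SHY, i + 1)) {
                    if (rest.substring(0, i).replace(SHY, "").length() + 1 <= room) {
                        cut = i;
                    }
                }
                if (cut >= 0) {
                    // Break here: this soft hyphen is rendered as a real hyphen.
                    line.append(sep).append(rest.substring(0, cut).replace(SHY, "")).append('-');
                    lines.add(line.toString());
                    line.setLength(0);
                    rest = rest.substring(cut + 1);
                } else if (line.length() > 0) {
                    // Nothing fits on this line: retry the word on a fresh line.
                    lines.add(line.toString());
                    line.setLength(0);
                } else {
                    // Not even the first piece fits on an empty line: let it overflow.
                    line.append(visible);
                    rest = null;
                }
            }
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "Die kapillar\u00ADgas\u00ADchromato\u00ADgraphische Analyse";
        for (String l : breakLines(text, 20)) {
            System.out.println(l);
        }
    }
}
```

With `maxWidth = 20`, the sample text should be broken into `Die kapillargas-`, `chromatographische` and `Analyse`. Picking the *last* fitting SHY keeps as much of the word as possible on the current line, which is the usual greedy choice.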
It should also be possible to make this work in some way for the Word emitter. At a minimum, each SHY character would have to be replaced with <w:softHyphen/> in the XML output.
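For the Word emitter, the replacement could look roughly like the following Java sketch. In WordprocessingML, `<w:softHyphen/>` is a run-level element that sits between `<w:t>` elements inside a `<w:r>`, so the text is split at each SHY. `DocxShy` and `runContent` are hypothetical names; BIRT's actual XML writer will differ.

```java
public class DocxShy {
    static final char SHY = '\u00AD';

    /** Minimal XML escaping for text content inside <w:t>. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Builds the content of one <w:r> run: text pieces go into <w:t> elements,
     * and each soft hyphen becomes a <w:softHyphen/> element between them.
     */
    static String runContent(String text) {
        StringBuilder xml = new StringBuilder();
        int start = 0;
        for (int i = 0; i <= text.length(); i++) {
            boolean atEnd = i == text.length();
            if (atEnd || text.charAt(i) == SHY) {
                if (i > start) {
                    xml.append("<w:t xml:space=\"preserve\">")
                       .append(escape(text.substring(start, i)))
                       .append("</w:t>");
                }
                if (!atEnd) {
                    xml.append("<w:softHyphen/>");
                }
                start = i + 1;
            }
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.println("<w:r>" + runContent("kapillar\u00ADgas") + "</w:r>");
    }
}
```

For `kapillar\u00ADgas` this prints `<w:r><w:t xml:space="preserve">kapillar</w:t><w:softHyphen/><w:t xml:space="preserve">gas</w:t></w:r>`, leaving the hyphenation decision to Word.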

I'll ask my boss if and when I can get the time to tackle this.

@hvbtup hvbtup added the Feature label Jan 11, 2023
@hvbtup
Contributor Author

hvbtup commented Jan 11, 2023

shy.zip

Demo report containing the same text as an HTML text item and as a plain text item.
With BIRT 4.9, the HTML text item works correctly in the DOCX output, but the plain text item does not (it shows minus signs instead). For PDF output, neither text item works correctly.

@hvbtup
Contributor Author

hvbtup commented Jan 12, 2023

Created a PR #1180 for this.

@wimjongman wimjongman added this to the 4.13 milestone Jan 27, 2023
@hvbtup
Contributor Author

hvbtup commented Jan 27, 2023

Fixed with #1180
