Speed up hangul composition #37
I commented earlier in #33 about the tests not working. It seems the tests are there, but they are based on only the first few hangul characters, and for those cases the incorrect code worked fine.
Added a new commit for a better speedup. Results (% diff) before and after the new commit:
To compare just hangul:
We are now better than text-icu in all cases except mixed chars and Japanese.
Could you possibly rebase this atop #39? That would let us estimate the effect of lookahead on its own, once divisions are as fast as possible.
I rebased this branch locally atop master. It still provides a decent improvement.
Earlier we determined the type of a hangul syllable as soon as we saw it. Determining whether it is a hangul LV or not requires a division operation, which is expensive. Instead, we now delay this decision until we see the next character. In most cases we do not need to make it at all: it matters only when an LV syllable is followed by a T. With this change we need very few divisions, which speeds up hangul composition significantly.
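The deferred check described above can be sketched roughly as follows. This is an illustration in Python, not the library's Haskell code: the function name `compose_hangul` and the `mods` counter are hypothetical, the constants are the standard ones from the Unicode hangul composition algorithm, and the sketch ignores everything else a real normalizer handles (combining classes, non-hangul composition).

```python
# Standard constants from the Unicode hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11172 precomposed syllables


def compose_hangul(text):
    """Compose conjoining jamo in `text`.

    Returns (composed string, number of modulo operations performed).
    The counter is hypothetical and exists only to show how rarely the
    LV check is needed.
    """
    out = []
    prev = None  # code point still waiting for a possible composition
    mods = 0
    for ch in text:
        c = ord(ch)
        if prev is not None:
            # L followed by V: compose to LV. No division is involved.
            if (L_BASE <= prev < L_BASE + L_COUNT
                    and V_BASE <= c < V_BASE + V_COUNT):
                prev = S_BASE + ((prev - L_BASE) * V_COUNT + (c - V_BASE)) * T_COUNT
                continue
            # Only when a T follows a syllable do we ask whether it is LV.
            if (T_BASE < c < T_BASE + T_COUNT
                    and S_BASE <= prev < S_BASE + S_COUNT):
                mods += 1
                if (prev - S_BASE) % T_COUNT == 0:
                    prev += c - T_BASE  # LV followed by T gives LVT
                    continue
            out.append(chr(prev))
        prev = c
    if prev is not None:
        out.append(chr(prev))
    return "".join(out), mods
```

For text made of precomposed syllables, no trailing jamo ever follows a syllable, so the modulo branch is never reached; the check runs only when a T actually appears after a syllable.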
Force-pushed from 7c4c3ea to 2f70dc9.
Rebased, this is what I see:
Comparison with text-icu:
AllChars and Japanese are the only ones where we are significantly worse than text-icu:
Looks good to me!
Use faster divisibility-by-28 check.
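The thread does not say which trick this commit uses. One well-known way to test divisibility without a division instruction, shown here purely as an assumption, splits 28 into 4 × 7: test the low two bits, then multiply by the inverse of 7 modulo 2^32 and compare against a threshold. The name `is_multiple_of_28` is hypothetical, and in Python this only illustrates the arithmetic; the speed benefit applies to fixed-width machine words.

```python
MASK32 = (1 << 32) - 1
INV7 = pow(7, -1, 1 << 32)   # multiplicative inverse of 7 modulo 2**32
LIMIT7 = MASK32 // 7         # largest quotient q such that 7 * q fits in 32 bits


def is_multiple_of_28(n):
    """Return True if n (0 <= n < 2**32) is divisible by 28, without using / or %."""
    if n & 3:                # not divisible by 4, so not divisible by 28
        return False
    # For odd d, (m * d^-1) mod 2**32 <= (2**32 - 1) // d exactly when d divides m.
    return (((n >> 2) * INV7) & MASK32) <= LIMIT7
```

In the hangul case the argument would be the syllable index, that is, the code point minus 0xAC00, which lies in the range 0 to 11171.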
Perf comparison: