Refactor and optimize composeChar #33

harendra-kumar · 2020-05-05T14:55:39Z

See the commit messages for more details; it may be a good idea to review the commits individually. Here is the perf summary, before (0) and after (1):

Benchmark      unicode-transforms(0)(ms) unicode-transforms(1)(ms)
-------------- ------------------------- -------------------------
NFC/AllChars                       19.36                     10.21
NFC/Deutsch                        13.54                      3.04
NFC/Devanagari                     17.42                      7.97
NFC/English                        12.76                      2.24
NFC/Japanese                       20.81                     11.02
NFC/Korean                         21.38                     12.96
NFC/Vietnamese                     18.62                     11.36

harendra-kumar · 2020-05-05T15:00:17Z

Tests are failing; I need to do a bit more work before this is ready.

Bodigrim

This is a stunning speed up! Great!

Bodigrim · 2020-05-05T21:02:18Z

Data/Unicode/Internal/NormalizeStream.hs

 -- Hold an L to wait for V, hold an LV to wait for T.
 data JamoBuf
-    = JamoEmpty
-    | JamoLIndex {-# UNPACK #-} !Int
+    = JamoLIndex {-# UNPACK #-} !Int


I believe the {-# UNPACK #-} pragmas can be safely omitted, since {-# OPTIONS_GHC -funbox-strict-fields #-} is in effect anyway.

Yes, they can be removed.
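
To illustrate the point about the pragmas, here is a minimal sketch of the buffer type written without UNPACK. It assumes the module is compiled with -funbox-strict-fields as the reviewer says; the helper heldCode is hypothetical and exists only to make the example loadable.

```haskell
{-# OPTIONS_GHC -funbox-strict-fields #-}

import Data.Char (ord)

-- With -funbox-strict-fields, the strict Int and Char fields below are
-- unpacked into their constructors without a per-field UNPACK pragma.
data JamoBuf
    = JamoLIndex !Int
    | JamoLV !Char

-- Hypothetical helper (not from the PR): numeric payload of the buffer.
heldCode :: JamoBuf -> Int
heldCode (JamoLIndex i) = i
heldCode (JamoLV c) = ord c
```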

Bodigrim · 2020-05-05T21:05:16Z

Data/Unicode/Internal/NormalizeStream.hs

    | JamoLV     {-# UNPACK #-} !Char

+data RegBuf
+    = RegOne !Char
+    | RegMany !Char !Char [Char]


Are there any invariants about which of these Chars are combining and which are not? If so, it is worth reflecting them in comments. (Previously ReBuf was guaranteed to consist of combining characters only, and the Char in Starter Char ReBuf was guaranteed to be non-combining.)

You can probably put a bang before [Char] as well.

The first one in the buffer may be a starter. The others are guaranteed to be non-starters (i.e. combining chars). I will add a comment, and a bang as well.
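
A sketch of what the documented type could look like after this exchange. The comment wording and the helper regBufLength are assumptions for illustration; only the constructors and the invariant itself come from the thread.

```haskell
-- Invariant (as stated in the reply above): the first Char in the buffer
-- may be a starter; every other Char is a non-starter (combining char).
data RegBuf
    = RegOne !Char
    | RegMany !Char !Char ![Char]

-- Hypothetical helper: number of characters currently buffered.
regBufLength :: RegBuf -> Int
regBufLength (RegOne _) = 1
regBufLength (RegMany _ _ rest) = 2 + length rest
```

Note that the bang on [Char] only forces the list to weak head normal form (its first cons cell or []), not its elements.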

harendra-kumar · 2020-05-06T00:15:00Z

Tests are fixed now. There were two mistakes during refactoring:

* I assumed that a non-hangul character can never decompose to a hangul character. But that's not true: there are hangul characters outside the hangul range which have compatibility decompositions into the hangul range.
* When combining starter pairs we need to combine only two contiguous starters; I combined even non-contiguous ones.

Perf comparison:

Benchmark       unicode-transforms(0)(ms) unicode-transforms(1)(ms)
--------------- ------------------------- -------------------------
NFC/AllChars                        19.92                     11.00
NFC/Deutsch                         13.25                      3.29
NFC/Devanagari                      17.47                      7.88
NFC/English                         12.58                      2.55
NFC/Japanese                        21.42                     12.34
NFC/Korean                          21.55                     13.03
NFC/Vietnamese                      18.52                     11.73

NFKC/AllChars                       23.99                     15.85
NFKC/Deutsch                        13.03                      3.56
NFKC/Devanagari                     17.09                      7.58
NFKC/English                        12.37                      2.59
NFKC/Japanese                       22.40                     13.60
NFKC/Korean                         21.31                     13.10
NFKC/Vietnamese                     18.23                     11.65

harendra-kumar · 2020-05-06T00:47:01Z

RegBuf is just ReBuf without the Empty state. I guess we can remove the duplication by using Maybe RegBuf (i.e. Just RegBuf for the non-empty case) wherever we are using ReBuf.

Bodigrim · 2020-05-06T00:51:59Z

Just may introduce an additional indirection and (worse) an additional level of laziness.

harendra-kumar · 2020-05-06T01:04:28Z

I hope GHC would eliminate the indirection during simplification.
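
The laziness concern can be seen in a small sketch. The ReBuf constructor names and the StateA/StateB wrappers are hypothetical; the point is only that a strict field of type Maybe RegBuf forces the Just/Nothing constructor and nothing beneath it.

```haskell
data RegBuf
    = RegOne !Char
    | RegMany !Char !Char ![Char]

-- Option A: reuse RegBuf through Maybe. The strict field is evaluated
-- only as far as Nothing/Just; the RegBuf inside Just may stay a thunk.
data StateA = StateA !(Maybe RegBuf)

-- Option B: a dedicated type with its own empty constructor
-- (constructor names are assumptions, not the PR's definitions).
data ReBuf
    = Empty
    | One !Char
    | Many !Char !Char ![Char]

data StateB = StateB !ReBuf

-- Forcing StateA does not force the payload of Just, so this is True.
-- The analogous StateB (One undefined) would diverge when forced.
lazyPayloadSurvives :: Bool
lazyPayloadSurvives = StateA (Just undefined) `seq` True
```

Whether GHC removes the extra box in a given loop depends on the optimizer; the sketch shows only the semantic difference.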

harendra-kumar · 2020-05-06T01:17:08Z

Comparison with text-icu (the last column is the % diff from the lower value; negative means we are better). We are better in all decompositions and in some compositions. The only cases where we are significantly worse are the Japanese and Korean compositions. And we have to remember that we are doing a full decomposition and then composition.

Korean can perhaps be improved relatively easily.

Benchmark       ICU(ms)(base) unicode-transforms(%)(-base)
--------------- ------------- ----------------------------
NFC/AllChars             3.65                      +176.18
NFC/Deutsch              2.03                       +55.51
NFC/Devanagari           8.33                        -8.10
NFC/English              2.17                        +4.85
NFC/Japanese             2.99                      +309.14
NFC/Korean               4.51                      +184.43
NFC/Vietnamese          17.28                       -37.16
NFD/AllChars             7.57                       -12.89
NFD/Deutsch              3.55                       -35.48
NFD/Devanagari           9.04                       -27.18
NFD/English              2.82                       -38.42
NFD/Japanese             9.40                        -9.08
NFD/Korean              37.46                       -52.72
NFD/Vietnamese           7.03                       -15.52
NFKC/AllChars           10.03                       +50.94
NFKC/Deutsch             3.33                        +5.77
NFKC/Devanagari          8.34                        -5.40
NFKC/English             2.10                       +19.39
NFKC/Japanese            5.32                      +137.63
NFKC/Korean              7.65                       +69.89
NFKC/Vietnamese         17.37                       -37.23
NFKD/AllChars           11.15                       -13.69
NFKD/Deutsch             3.58                       -39.06
NFKD/Devanagari          9.16                       -27.47
NFKD/English             2.92                       -38.14
NFKD/Japanese           10.38                       -15.28
NFKD/Korean             37.46                       -52.94
NFKD/Vietnamese         12.98                       -53.81

harendra-kumar · 2020-05-06T11:00:45Z

Added one more commit to speed up the isHangulLV check. Now the Korean benchmark is pretty close to text-icu in NFC and significantly better than ICU in NFKC.

Benchmark       unicode-transforms(0)(ms) unicode-transforms(1)(ms)
--------------- ------------------------- -------------------------
NFC/Korean                          13.21                      5.13
NFKC/Korean                         13.36                      5.49

Bodigrim · 2020-05-06T21:56:35Z

Astonishing!

harendra-kumar · 2020-05-07T01:34:07Z

I did not properly review the code before committing. Even though the code was wrong, the tests passed, so I need to check whether the tests are working properly. The Unicode test suite does not have a hangul (LV, T) composition test; I had added one in an extra test suite, but it does not seem to be working.

The correct code should have been:

    assert (jamoTCount `mod` 4 == 0)
           (idiv4 == 0 && ti == 0)
    where
        i = (ord c) - hangulFirst
        idiv4 = i .&. 3
        (_, ti) = i `quotRem` jamoTCount
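
The idea behind the corrected check is that 28 is a multiple of 4, so any index divisible by 28 must have its low two bits clear; the mask can reject three quarters of the candidates before the division runs. A self-contained sketch follows. The range check and the names isHangulLVRef, isHangulLVFast and hangulLast are assumptions added for illustration; the constants are the standard Hangul syllable values.

```haskell
import Data.Bits ((.&.))
import Data.Char (ord)

hangulFirst, hangulLast, jamoTCount :: Int
hangulFirst = 0xAC00 -- first precomposed Hangul syllable
hangulLast = 0xD7A3 -- last precomposed Hangul syllable
jamoTCount = 28 -- trailing-consonant slots per LV block

-- Reference: an LV syllable is one whose index is a multiple of 28.
isHangulLVRef :: Char -> Bool
isHangulLVRef c =
    i >= 0 && i <= hangulLast - hangulFirst && i `rem` jamoTCount == 0
  where
    i = ord c - hangulFirst

-- Same predicate with a cheap pre-check: multiples of 28 are multiples
-- of 4, so the mask test fails fast for most non-LV syllables.
isHangulLVFast :: Char -> Bool
isHangulLVFast c =
    inHangul && (i .&. 3) == 0 && i `rem` jamoTCount == 0
  where
    i = ord c - hangulFirst
    inHangul = i >= 0 && i <= hangulLast - hangulFirst
```

For an index that really is a multiple of 28, the mask test passes and the division still runs, which is the case the next comment says dominates.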

But this does not help much, perhaps because most syllables are hangul LV anyway, so we still have to do the expensive quotRem. I wrote this one earlier:

-- Cheaper division by 28
{-# INLINE isDivBy28 #-}
isDivBy28 :: Int -> Bool
isDivBy28 n =
    let (q, r) = divBy32 n
    in if (q == 0)
       then r == 28 || r == 0
       else isDivBy28 (q `unsafeShiftL` 2 + r)

    where

    divBy32 x =
        let q = x `unsafeShiftR` 5
            r = x .&. 31
        in (q, r)

isHangulLV c = isDivBy28 ((ord c) - hangulFirst)
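
The reduction works because 32 = 28 + 4: writing n = 32q + r gives n ≡ 4q + r (mod 28), and 4q + r is strictly smaller than n whenever q > 0, so the recursion terminates with a value below 32, where only 0 and 28 are multiples of 28. The sketch below restates the function with explicit parentheses and checks it against rem over the Hangul syllable index range 0..11171; the name agreesWithRem is hypothetical, and only non-negative inputs are covered.

```haskell
import Data.Bits (unsafeShiftL, unsafeShiftR, (.&.))

-- Divisibility by 28 using shifts and masks only.
isDivBy28 :: Int -> Bool
isDivBy28 n
    | q == 0 = r == 0 || r == 28
    | otherwise = isDivBy28 ((q `unsafeShiftL` 2) + r) -- 32q + r ≡ 4q + r
  where
    q = n `unsafeShiftR` 5 -- n `quot` 32 for non-negative n
    r = n .&. 31 -- n `rem` 32 for non-negative n

-- Compare with the straightforward definition on every syllable index.
agreesWithRem :: Bool
agreesWithRem = all (\n -> isDivBy28 n == (n `rem` 28 == 0)) [0 .. 11171]
```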

This gives us the following results:

Benchmark       unicode-transforms(0)(ms) unicode-transforms(1)(ms)
--------------- ------------------------- -------------------------
NFC/Korean                          13.21                     10.65
NFKC/Korean                         13.36                     10.65

A good improvement, but not as dramatic. I am looking at another strategy which could be cheaper, i.e. test for a jamo T first and only then check whether the previous character was an LV. In precomposed text we will mostly see LV and LVT syllables; only in the rare case where we see a jamo T will we need the expensive division.
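
A minimal sketch of that strategy, assuming the standard Hangul composition constants; the function names are hypothetical and this is not the code that was eventually merged. The range test on the incoming character is a pair of comparisons, and the division inside isHangulLV is reached only when that test succeeds.

```haskell
import Data.Char (chr, ord)

hangulFirst, hangulLast, jamoTBase, jamoTLast, jamoTCount :: Int
hangulFirst = 0xAC00
hangulLast = 0xD7A3
jamoTBase = 0x11A7 -- one below the first trailing consonant
jamoTLast = 0x11C2 -- last trailing consonant that composes
jamoTCount = 28

-- Cheap: is this a trailing consonant (jamo T)?
isJamoT :: Char -> Bool
isJamoT c = ord c > jamoTBase && ord c <= jamoTLast

-- Expensive: needs a remainder by 28.
isHangulLV :: Char -> Bool
isHangulLV c =
    i >= 0 && i <= hangulLast - hangulFirst && i `rem` jamoTCount == 0
  where
    i = ord c - hangulFirst

-- Compose a held character with the next one, testing for T first.
-- (&&) short-circuits, so isHangulLV runs only when c is a jamo T.
composeLVT :: Char -> Char -> Maybe Char
composeLVT prev c
    | isJamoT c && isHangulLV prev =
        Just (chr (ord prev + (ord c - jamoTBase)))
    | otherwise = Nothing
```

For example, composeLVT '\xAC00' '\x11A8' yields Just '\xAC01', while an LVT syllable such as '\xAC01' followed by a jamo T is left alone.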

In composeChar:
* First check the state and then make decisions based on the char type, instead of doing the opposite. This simplifies the code; the core size reduces by a third (it is still huge).
* With this change the code is more amenable to the spec-constr optimization: with -fspec-constr-count=8 GHC is able to produce a tighter loop, improving performance by several times (the best improvement is in the English benchmark).
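
The shape described here (dispatch on the state first, then on the character, with a SPEC argument so that SpecConstr specialises the loop per state constructor) can be shown on a toy example. Everything below is hypothetical and unrelated to the real composeChar: the loop merely collapses runs of equal characters, holding one character in its state.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 -fspec-constr-count=8 #-}

import GHC.Exts (SPEC (..))

-- Toy state: nothing held, or one character held back.
data Buf = Empty | Held !Char

-- Collapse runs of equal characters, e.g. "aabbbc" becomes "abc".
squeeze :: String -> String
squeeze = go SPEC Empty
  where
    go :: SPEC -> Buf -> String -> String
    go !_ st [] = flush st
    go !_ st (c : cs) =
        -- State first, character second: each state constructor gets
        -- its own branch, which SpecConstr can turn into its own loop.
        case st of
            Empty -> go SPEC (Held c) cs
            Held h
                | h == c -> go SPEC st cs
                | otherwise -> h : go SPEC (Held c) cs

    flush Empty = []
    flush (Held h) = [h]
```

The SPEC argument carries no data; it is a marker that asks GHC to specialise the function aggressively on the constructors of its other arguments.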

harendra-kumar · 2020-05-07T04:10:11Z

I dropped the last commit, will continue that in a separate PR and merge this.

harendra-kumar · 2020-05-07T04:29:05Z

merged in master.

harendra-kumar requested review from Bodigrim and adithyaov May 5, 2020 14:55

Bodigrim reviewed May 5, 2020

harendra-kumar force-pushed the compose branch from fd7e8dd to 038f4fd Compare May 6, 2020 00:05

Bodigrim approved these changes May 6, 2020

harendra-kumar force-pushed the decompose branch from d6327a1 to 1615b9a Compare May 7, 2020 02:26

harendra-kumar added 4 commits May 7, 2020 09:38

optimize composeChar

2e9f0b8

* unfold the non-decomposable case outside the recursive loop
* use SPEC constr on encode

refactor to reduce the number of cases in composeState

ade41dc

There was a lot of code bloat earlier, the core size reduced from 50K lines to 20K lines.

Add bang on constr, remove UNPACKs, add a comment

500a144

UNPACK pragmas are redundant as we use a blanket -funbox-strict-fields

harendra-kumar force-pushed the compose branch from 678eea0 to 500a144 Compare May 7, 2020 04:09

harendra-kumar closed this May 7, 2020

harendra-kumar mentioned this pull request May 7, 2020

Speed up hangul composition #37

Merged

harendra-kumar deleted the compose branch July 18, 2020 19:10
