Add AltiVec RAID-Z #9539

Merged
merged 3 commits into from
Jan 23, 2020

@rdolbeau rdolbeau commented Oct 31, 2019

Implements the RAID-Z function using AltiVec SIMD.
This is basically the NEON code translated to AltiVec.

Note that the 'fletcher' algorithm requires 64-bit
operations, and the initial implementations of AltiVec
(PPC74xx a.k.a. G4, PPC970 a.k.a. G5) only have up to
32-bit operations, so no 'fletcher'.

Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
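
To illustrate why 32-bit-only vector lanes rule out 'fletcher', here is a minimal scalar sketch of Fletcher-4 as it is commonly described: 32-bit input words accumulated into four 64-bit running sums. The function name `fletcher4` and the `byteorder` parameter are hypothetical, chosen for this example; this is not the code from the PR.

```python
MASK64 = (1 << 64) - 1  # the four accumulators wrap at 64 bits


def fletcher4(data: bytes, byteorder: str = "little"):
    """Scalar Fletcher-4 sketch: 32-bit words in, four 64-bit sums out."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for i in range(0, len(data), 4):
        w = int.from_bytes(data[i:i + 4], byteorder)  # one 32-bit word
        a = (a + w) & MASK64  # first-order sum
        b = (b + a) & MASK64  # second-order sum
        c = (c + b) & MASK64  # third-order sum
        d = (d + c) & MASK64  # fourth-order sum
    return a, b, c, d


if __name__ == "__main__":
    print(fletcher4(b"\x01\x00\x00\x00\x02\x00\x00\x00"))
    # Two all-ones words already push the first sum past 32 bits.
    print(hex(fletcher4(b"\xff" * 8)[0]))
```

The second call shows the first running sum exceeding 32 bits after only two input words, which is why a SIMD unit limited to 32-bit lanes cannot hold the accumulators directly.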

Motivation and Context

Performance only, on a limited amount of hardware...

Description

This adds AltiVec (PowerPC SIMD) support for the RAID-Z SIMD code.

However :

  1. This has only received limited testing on my G4, G5 & QEMU
  2. This has only been tested on big-endian 64-bit PPC (ppc64) & big-endian 32-bit PPC (ppc)
  3. Originally: this should be tested on little-endian 64-bit PPC (ppc64el). Update: I've also tested it on Debian ppc64el in QEMU; it passes ztest & raidz_test
  4. The testing code for userland is the old 'try AltiVec and catch SIGILL' method; I don't know of anything better
  5. Originally: the testing code for the kernel is disabled, as it seems to be GPL-only on my kernel and I don't know how to properly detect AltiVec in-kernel (catching SIGILL is probably not an option there...). Update: this checks that the MSR 'Vec' bit is set when AltiVec is enabled
  6. Originally: adding -maltivec seems to be crashing every non-ppc arch; I'm not sure how to properly add the option to the file that needs it (the compiler doesn't seem to want to deal with AltiVec asm without -maltivec). Update: fixed, but the Makefile implementation might not be very clean. The Makefile was updated to use the symbols introduced with "Unify target_cpu handling" #9848
  7. No 32-bit ztest: userland for BE is 32 bits and ztest crashes at start-up:
dolbeau@powermacg5:~/zfs$ ./cmd/ztest/ztest
loading concrete vdev 0, metaslab 14 of 15 ...
error: Pool 'ztest' has encountered an uncorrectable I/O failure and the failure mode property for this pool is set to panic.
/home/dolbeau/zfs/cmd/ztest/.libs/ztest(+0x9058)[0x789058]
linux-vdso32.so.1(__kernel_sigtramp32+0x0)[0x100424]
/lib/powerpc-linux-gnu/libc.so.6(gsignal+0xdc)[0xf7a6e4bc]
/lib/powerpc-linux-gnu/libc.so.6(abort+0x144)[0xf7a55978]
/home/dolbeau/zfs/lib/libzpool/.libs/libzpool.so.2(+0x40314)[0x370314]
/home/dolbeau/zfs/lib/libzpool/.libs/libzpool.so.2(vcmn_err+0x0)[0x3703b0]
/home/dolbeau/zfs/lib/libzpool/.libs/libzpool.so.2(+0x17a42c)[0x4aa42c]
/home/dolbeau/zfs/lib/libzpool/.libs/libzpool.so.2(+0x17c07c)[0x4ac07c]
/home/dolbeau/zfs/lib/libzpool/.libs/libzpool.so.2(zio_execute+0xd8)[0x4a7f78]
/home/dolbeau/zfs/lib/libzpool/.libs/libzpool.so.2(+0x41758)[0x371758]
/lib/powerpc-linux-gnu/libpthread.so.0(+0x891c)[0x1d891c]
/lib/powerpc-linux-gnu/libc.so.6(clone+0x60)[0xf7b3d438]
child died with signal 6

However, ztest seems OK in QEMU on ppc64el with a 64-bit userland.

  8. Originally: disable_kernel_altivec() was apparently introduced around kernel 4.5, so this fails to compile on e.g. 3.16. Update: disable_kernel_altivec() is only used for kernel >= 4.5

Performance (on G5):

(dolbeau)powermacg5:~/zfs> cat /proc/spl/kstat/zfs/vdev_raidz_bench 
18 0 0x01 -1 0 1022640300308 1320961408741                                                                                                                                                                                                                                     
implementation   gen_p           gen_pq          gen_pqr         rec_p           rec_q           rec_r           rec_pq          rec_pr          rec_qr          rec_pqr                                                                                                       
original         160563916       43900725        27402800        482443964       67385690        11490536        22298016        3420657         3158063         2105862                                                                                                       
scalar           694230532       199962624       86366216        607205930       174317751       116252651       62091132        53510959        42755587        30793098                                                                                                      
powerpc_altivec  1027742844      488162276       222745665       890691014       493951582       329358008       213486266       185015426       156249270       107822494                                                                                                     
fastest          powerpc_altivec powerpc_altivec powerpc_altivec powerpc_altivec powerpc_altivec powerpc_altivec powerpc_altivec powerpc_altivec powerpc_altivec powerpc_altivec       
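
As a reading aid for the table above, the following sketch computes the ratio of the powerpc_altivec row to the scalar row for each column. The values are copied from the kstat output; the names `KSTAT` and `speedups` are hypothetical helpers for this example, not part of ZFS.

```python
# Rows copied from the vdev_raidz_bench kstat output above (G5).
KSTAT = """\
implementation   gen_p gen_pq gen_pqr rec_p rec_q rec_r rec_pq rec_pr rec_qr rec_pqr
original         160563916 43900725 27402800 482443964 67385690 11490536 22298016 3420657 3158063 2105862
scalar           694230532 199962624 86366216 607205930 174317751 116252651 62091132 53510959 42755587 30793098
powerpc_altivec  1027742844 488162276 222745665 890691014 493951582 329358008 213486266 185015426 156249270 107822494
"""


def speedups(text, fast="powerpc_altivec", base="scalar"):
    """Return {column: fast / base} for every benchmark column."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header = lines[0][1:]  # column names, without the 'implementation' label
    rows = {ln[0]: [int(v) for v in ln[1:]] for ln in lines[1:]}
    return {col: rows[fast][i] / rows[base][i] for i, col in enumerate(header)}


if __name__ == "__main__":
    for col, ratio in speedups(KSTAT).items():
        print(f"{col:8s} {ratio:.2f}x")
```

On these numbers the AltiVec implementation is faster than scalar in every column, from about 1.5x (gen_p, rec_p) to about 3.7x (rec_qr).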

How Has This Been Tested?

Only tested on a BE PPC64 970MP "G5" running 4.19, both raidz_test & trying a pool.
Also on a BE PPC 7455 "G4" running 5.3 (and to a limited extent 3.16).
Also on a LE POWER9 in QEMU, running 4.19, ztest & raidz_test.

ztest was not run on BE systems, see above.

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [x] Performance enhancement (non-breaking change which improves efficiency)
  • [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Documentation (a change to man pages or other documentation)

Checklist:

  • [x] My code follows the ZFS on Linux code style requirements.
  • [x] I have updated the documentation accordingly.
  • [x] I have read the contributing document.
  • [ ] I have added tests to cover my changes.
  • [x] All new and existing tests passed.
  • [x] All commit messages are properly formatted and contain Signed-off-by.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 31, 2019
@rdolbeau rdolbeau force-pushed the simd-altivec branch 3 times, most recently from dc35d89 to 8b35a67 Compare November 6, 2019 19:02
@behlendorf
Contributor

@rdolbeau I should have access to a couple ppc64el systems I can give this a spin on. Though I might not be able to get to it for a little bit.

@behlendorf behlendorf added Type: Architecture Indicates an issue is specific to a single processor architecture Type: Performance Performance improvement or performance problem labels Nov 6, 2019
@rdolbeau
Contributor Author

rdolbeau commented Nov 6, 2019

@behlendorf Thanks, and there's no hurry. Even if it works on all architectures, there's still the issue of detecting AltiVec in-kernel; I'm still not sure how to do that. And there are even some current PPC64 cores without AltiVec - the e5500 core in the new AmigaOne X5000 is one of them - so enabling AltiVec all the time isn't a good option.

Though the real fun will start with the variable-length SIMD ISA like Arm's SVE and RISC-V's V :-)

Question - do the ppc & ppc64 buildbot do any test that could validate the code, or do they really just build?

@codecov

codecov bot commented Nov 7, 2019

Codecov Report

Merging #9539 into master will decrease coverage by <1%.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           master    #9539    +/-   ##
========================================
- Coverage      79%      79%   -<1%     
========================================
  Files         385      385            
  Lines      121644   121644            
========================================
- Hits        96606    96586    -20     
- Misses      25038    25058    +20
Flag Coverage Δ
#kernel 80% <ø> (ø) ⬇️
#user 67% <ø> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09436c5...0828e5d. Read the comment docs.

@behlendorf
Contributor

> do the ppc & ppc64 buildbot do any test that could validate the code

They really only verify the compilation and don't perform any testing.

@PrivatePuffin
Contributor

I'm going to be a bit harsh here, but is there any use for this?
Spending time getting it to work, without a test suite... On an architecture that is by all means deprecated and still requires significant work to get working in any decent fashion?

@rdolbeau
Contributor Author

@Ornias1993 It's a legitimate question to ask :-) I'll try to answer...

a) Big-endian PPC (32, 64) is obsolete for desktop/laptop/servers, but there's still some Linux-supported hardware out there, and it's still in use in the embedded market (and don't tell the Amiga crowd their current processor of choice is obsolete ;-) ). Also, the older hardware needs the performance boost the most;

b) Little-endian PPC (64) is very much alive at IBM, with POWER8 and POWER9 out there and POWER10 announced - and as far as I understand, standard AltiVec code should work on those VSX-enabled systems (they might use a bit more parallelism than what is in this patch; if someone wants to donate me a Blackbird mainboard with CPU and cooler, I'll make sure to check & tune for POWER9 ;-) ).

Also, this doesn't require "significant work to get working", as ZFS is already working on those architectures, and the SIMD infrastructure is common to all of them. It's only a bit of assembly, and detecting the availability of the ISA.

And also, why the port to this hardware? "Because it's there" :-)

Cordially,

@PrivatePuffin
Contributor

a) It's obsolete. Period. Sue me.
b) Okay, that's a good argument you got there, I missed the POWER10 announcement, thanks! :)
c) (-ish) I was mistaken in my understanding that it was to get ZFS working, thanks for clearing that up :)

@rdolbeau rdolbeau force-pushed the simd-altivec branch 2 times, most recently from e1f068b to 2b6caae Compare November 16, 2019 14:33
@rdolbeau
Contributor Author

@behlendorf This should work on ppc64el now (QEMU only for me), and I've added some in-kernel detection (and updated my description accordingly).

rdolbeau added a commit to rdolbeau/zfs that referenced this pull request Dec 15, 2019
The NEON code replicates too closely the SSE code, including
a masked 16-bits shift. But NEON, like AltiVec (openzfs#9539), has
unsigned 8-bits shift, so use that instead and drop the masking.

Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
behlendorf pushed a commit that referenced this pull request Dec 18, 2019
The NEON code replicates too closely the SSE code, including
a masked 16-bits shift. But NEON, like AltiVec (#9539), has
unsigned 8-bits shift, so use that instead and drop the masking.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9725
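
The difference the commit message describes can be sketched as follows: when only 16-bit lane shifts are available, shifting packed bytes lets bits leak across the byte boundary, so a mask is needed afterwards; with a per-byte unsigned shift, no mask is needed. The shift amount (4) and direction (right) are assumptions chosen for illustration, since the commit message does not state them, and the function names are hypothetical.

```python
def shr4_masked_16bit(data: bytes) -> bytes:
    """Shift every byte right by 4 using 16-bit lanes plus a mask."""
    if len(data) % 2:
        raise ValueError("need an even number of bytes")
    out = bytearray()
    for i in range(0, len(data), 2):
        lane = int.from_bytes(data[i:i + 2], "little")  # two packed bytes
        lane >>= 4       # the high byte's low nibble leaks into the low byte
        lane &= 0x0F0F   # the mask removes the leaked bits
        out += lane.to_bytes(2, "little")
    return bytes(out)


def shr4_8bit(data: bytes) -> bytes:
    """Shift every byte right by 4 with a per-byte unsigned shift."""
    return bytes(b >> 4 for b in data)


if __name__ == "__main__":
    sample = bytes(range(256))
    print(shr4_masked_16bit(sample) == shr4_8bit(sample))
```

Both functions produce the same bytes; the per-byte version simply does not need the masking step.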
@rdolbeau
Contributor Author

rebased & rechecked.

Contributor

@behlendorf behlendorf left a comment

@rdolbeau I was able to test your latest version of this PR on a little endian POWER9 system running a 4.14.0 kernel with ztest and raidz_test. Functionally everything worked great. The accelerated AltiVec code definitely provides a nice performance bump. Below are the raidz_test benchmark results which I thought you'd like to see. Nice work!

raidz_test -Bv
Benchmarking parity generation...
impl, math, dcols, iosize, disk_bw, total_bw, iter
  original,    gen_p, 8,       4096, 3656.968864, 32912.719774, 1048576
  original,    gen_p, 8,       8192, 391.565382, 3524.088434, 524288
  original,    gen_p, 8,      16384, 278.245982, 2504.213842, 262144
  original,    gen_p, 8,      32768, 243.224165, 2189.017486, 131072
  original,    gen_p, 8,      65536, 246.290623, 2216.615610, 65536
  original,    gen_p, 8,     131072, 241.208152, 2170.873364, 32768
  original,    gen_p, 8,     262144, 245.671028, 2211.039251, 16384
  original,    gen_p, 8,     524288, 246.788158, 2221.093423, 8192
  original,    gen_p, 8,    1048576, 247.661341, 2228.952066, 4096
  original,    gen_p, 8,    2097152, 247.455910, 2227.103189, 2048
  original,    gen_p, 8,    4194304, 247.287560, 2225.588040, 1024
  original,    gen_p, 8,    8388608, 246.660975, 2219.948771, 512
  original,    gen_p, 8,   16777216, 244.810247, 2203.292220, 256
  original,   gen_pq, 8,       4096, 2053.563877, 20535.638767, 1048576
  original,   gen_pq, 8,       8192, 295.594390, 2955.943900, 524288
  original,   gen_pq, 8,      16384, 207.364322, 2073.643217, 262144
  original,   gen_pq, 8,      32768, 180.671352, 1806.713523, 131072
  original,   gen_pq, 8,      65536, 181.452697, 1814.526970, 65536
  original,   gen_pq, 8,     131072, 181.149970, 1811.499696, 32768
  original,   gen_pq, 8,     262144, 181.579363, 1815.793634, 16384
  original,   gen_pq, 8,     524288, 181.978324, 1819.783238, 8192
  original,   gen_pq, 8,    1048576, 182.004321, 1820.043209, 4096
  original,   gen_pq, 8,    2097152, 133.262406, 1332.624063, 2048
  original,   gen_pq, 8,    4194304, 182.149313, 1821.493132, 1024
  original,   gen_pq, 8,    8388608, 150.016988, 1500.169876, 512
  original,   gen_pq, 8,   16777216, 170.881586, 1708.815860, 256
  original,  gen_pqr, 8,       4096, 1491.024417, 16401.268592, 1048576
  original,  gen_pqr, 8,       8192, 189.621040, 2085.831436, 524288
  original,  gen_pqr, 8,      16384, 132.406009, 1456.466094, 262144
  original,  gen_pqr, 8,      32768, 115.047409, 1265.521498, 131072
  original,  gen_pqr, 8,      65536, 115.176816, 1266.944978, 65536
  original,  gen_pqr, 8,     131072, 114.208494, 1256.293431, 32768
  original,  gen_pqr, 8,     262144, 115.369561, 1269.065176, 16384
  original,  gen_pqr, 8,     524288, 115.817483, 1273.992312, 8192
  original,  gen_pqr, 8,    1048576, 115.838635, 1274.224989, 4096
  original,  gen_pqr, 8,    2097152, 115.986786, 1275.854644, 2048
  original,  gen_pqr, 8,    4194304, 115.924897, 1275.173863, 1024
  original,  gen_pqr, 8,    8388608, 111.482268, 1226.304945, 512
  original,  gen_pqr, 8,   16777216, 106.950381, 1176.454191, 256
    scalar,    gen_p, 8,       4096, 1841.727771, 16575.549943, 1048576
    scalar,    gen_p, 8,       8192, 1853.161948, 16678.457530, 524288
    scalar,    gen_p, 8,      16384, 1822.644436, 16403.799926, 262144
    scalar,    gen_p, 8,      32768, 1807.432256, 16266.890304, 131072
    scalar,    gen_p, 8,      65536, 1991.621234, 17924.591105, 65536
    scalar,    gen_p, 8,     131072, 1545.706177, 13911.355592, 32768
    scalar,    gen_p, 8,     262144, 1811.350960, 16302.158642, 16384
    scalar,    gen_p, 8,     524288, 2062.910320, 18566.192877, 8192
    scalar,    gen_p, 8,    1048576, 2132.803900, 19195.235103, 4096
    scalar,    gen_p, 8,    2097152, 2129.542420, 19165.881776, 2048
    scalar,    gen_p, 8,    4194304, 2127.168241, 19144.514172, 1024
    scalar,    gen_p, 8,    8388608, 2055.131842, 18496.186576, 512
    scalar,    gen_p, 8,   16777216, 1939.809176, 17458.282585, 256
    scalar,   gen_pq, 8,       4096, 942.578063, 9425.780634, 1048576
    scalar,   gen_pq, 8,       8192, 728.793025, 7287.930247, 524288
    scalar,   gen_pq, 8,      16384, 650.667441, 6506.674414, 262144
    scalar,   gen_pq, 8,      32768, 616.020909, 6160.209087, 131072
    scalar,   gen_pq, 8,      65536, 641.924889, 6419.248888, 65536
    scalar,   gen_pq, 8,     131072, 612.713743, 6127.137432, 32768
    scalar,   gen_pq, 8,     262144, 676.097445, 6760.974452, 16384
    scalar,   gen_pq, 8,     524288, 683.510598, 6835.105981, 8192
    scalar,   gen_pq, 8,    1048576, 691.094490, 6910.944903, 4096
    scalar,   gen_pq, 8,    2097152, 693.805901, 6938.059008, 2048
    scalar,   gen_pq, 8,    4194304, 692.127050, 6921.270495, 1024
    scalar,   gen_pq, 8,    8388608, 677.172715, 6771.727148, 512
    scalar,   gen_pq, 8,   16777216, 686.917284, 6869.172843, 256
    scalar,  gen_pqr, 8,       4096, 633.388832, 6967.277155, 1048576
    scalar,  gen_pqr, 8,       8192, 368.661178, 4055.272961, 524288
    scalar,  gen_pqr, 8,      16384, 302.184634, 3324.030973, 262144
    scalar,  gen_pqr, 8,      32768, 277.590806, 3053.498870, 131072
    scalar,  gen_pqr, 8,      65536, 284.902513, 3133.927646, 65536
    scalar,  gen_pqr, 8,     131072, 289.570407, 3185.274474, 32768
    scalar,  gen_pqr, 8,     262144, 296.502373, 3261.526107, 16384
    scalar,  gen_pqr, 8,     524288, 294.142975, 3235.572725, 8192
    scalar,  gen_pqr, 8,    1048576, 306.761183, 3374.373011, 4096
    scalar,  gen_pqr, 8,    2097152, 307.112779, 3378.240568, 2048
    scalar,  gen_pqr, 8,    4194304, 307.381541, 3381.196954, 1024
    scalar,  gen_pqr, 8,    8388608, 307.121678, 3378.338460, 512
    scalar,  gen_pqr, 8,   16777216, 306.083266, 3366.915930, 256
powerpc_altivec,    gen_p, 8,       4096, 2960.059604, 26640.536432, 1048576
powerpc_altivec,    gen_p, 8,       8192, 2526.591637, 22739.324731, 524288
powerpc_altivec,    gen_p, 8,      16384, 2300.113210, 20701.018890, 262144
powerpc_altivec,    gen_p, 8,      32768, 2216.791852, 19951.126667, 131072
powerpc_altivec,    gen_p, 8,      65536, 2495.108856, 22455.979701, 65536
powerpc_altivec,    gen_p, 8,     131072, 1794.564766, 16151.082893, 32768
powerpc_altivec,    gen_p, 8,     262144, 2218.366059, 19965.294530, 16384
powerpc_altivec,    gen_p, 8,     524288, 2600.452232, 23404.070087, 8192
powerpc_altivec,    gen_p, 8,    1048576, 2714.474529, 24430.270764, 4096
powerpc_altivec,    gen_p, 8,    2097152, 2698.543308, 24286.889771, 2048
powerpc_altivec,    gen_p, 8,    4194304, 2688.745976, 24198.713788, 1024
powerpc_altivec,    gen_p, 8,    8388608, 2543.496952, 22891.472572, 512
powerpc_altivec,    gen_p, 8,   16777216, 2150.532052, 19354.788469, 256
powerpc_altivec,   gen_pq, 8,       4096, 1540.390169, 15403.901690, 1048576
powerpc_altivec,   gen_pq, 8,       8192, 1115.555622, 11155.556222, 524288
powerpc_altivec,   gen_pq, 8,      16384, 977.258056, 9772.580560, 262144
powerpc_altivec,   gen_pq, 8,      32768, 923.560405, 9235.604047, 131072
powerpc_altivec,   gen_pq, 8,      65536, 969.052376, 9690.523755, 65536
powerpc_altivec,   gen_pq, 8,     131072, 861.794187, 8617.941874, 32768
powerpc_altivec,   gen_pq, 8,     262144, 989.944038, 9899.440383, 16384
powerpc_altivec,   gen_pq, 8,     524288, 1015.455283, 10154.552830, 8192
powerpc_altivec,   gen_pq, 8,    1048576, 1020.936189, 10209.361889, 4096
powerpc_altivec,   gen_pq, 8,    2097152, 1026.762787, 10267.627866, 2048
powerpc_altivec,   gen_pq, 8,    4194304, 1029.455805, 10294.558049, 1024
powerpc_altivec,   gen_pq, 8,    8388608, 1022.938885, 10229.388850, 512
powerpc_altivec,   gen_pq, 8,   16777216, 998.514707, 9985.147074, 256
powerpc_altivec,  gen_pqr, 8,       4096, 1041.330279, 11454.633066, 1048576
powerpc_altivec,  gen_pqr, 8,       8192, 648.855154, 7137.406696, 524288
powerpc_altivec,  gen_pqr, 8,      16384, 546.148364, 6007.632003, 262144
powerpc_altivec,  gen_pqr, 8,      32768, 508.354000, 5591.894002, 131072
powerpc_altivec,  gen_pqr, 8,      65536, 500.657823, 5507.236051, 65536
powerpc_altivec,  gen_pqr, 8,     131072, 484.756543, 5332.321970, 32768
powerpc_altivec,  gen_pqr, 8,     262144, 510.101846, 5611.120303, 16384
powerpc_altivec,  gen_pqr, 8,     524288, 541.016596, 5951.182553, 8192
powerpc_altivec,  gen_pqr, 8,    1048576, 539.579715, 5935.376864, 4096
powerpc_altivec,  gen_pqr, 8,    2097152, 538.651620, 5925.167824, 2048
powerpc_altivec,  gen_pqr, 8,    4194304, 539.098816, 5930.086976, 1024
powerpc_altivec,  gen_pqr, 8,    8388608, 536.446906, 5900.915970, 512
powerpc_altivec,  gen_pqr, 8,   16777216, 528.037246, 5808.409709, 256
Benchmarking data reconstruction...
impl, math, dcols, iosize, disk_bw, total_bw, iter
  original,    rec_p, 8,      32768, 1365.404873, 15019.453606, 16384
  original,    rec_p, 8,      65536, 1500.172203, 16501.894237, 8192
  original,    rec_p, 8,     131072, 1540.449718, 16944.946903, 4096
  original,    rec_p, 8,     262144, 1607.881029, 17686.691317, 2048
  original,    rec_p, 8,     524288, 1570.430251, 17274.732761, 1024
  original,    rec_p, 8,    1048576, 1631.146830, 17942.615133, 512
  original,    rec_p, 8,    2097152, 1645.682839, 18102.511230, 256
  original,    rec_p, 8,    4194304, 1672.468958, 18397.158538, 128
  original,    rec_p, 8,    8388608, 1664.757571, 18312.333284, 64
  original,    rec_p, 8,   16777216, 1620.383041, 17824.213455, 32
  original,    rec_q, 8,      32768, 258.946541, 2848.411954, 16384
  original,    rec_q, 8,      65536, 263.869676, 2902.566432, 8192
  original,    rec_q, 8,     131072, 269.861720, 2968.478924, 4096
  original,    rec_q, 8,     262144, 270.006075, 2970.066823, 2048
  original,    rec_q, 8,     524288, 273.609740, 3009.707135, 1024
  original,    rec_q, 8,    1048576, 273.103945, 3004.143397, 512
  original,    rec_q, 8,    2097152, 273.461091, 3008.072003, 256
  original,    rec_q, 8,    4194304, 273.374356, 3007.117921, 128
  original,    rec_q, 8,    8388608, 273.133757, 3004.471328, 64
  original,    rec_q, 8,   16777216, 272.439348, 2996.832824, 32
  original,    rec_r, 8,      32768, 28.272344, 310.995788, 16384
  original,    rec_r, 8,      65536, 29.191250, 321.103745, 8192
  original,    rec_r, 8,     131072, 29.460318, 324.063496, 4096
  original,    rec_r, 8,     262144, 29.357856, 322.936418, 2048
  original,    rec_r, 8,     524288, 29.614490, 325.759394, 1024
  original,    rec_r, 8,    1048576, 29.617688, 325.794568, 512
  original,    rec_r, 8,    2097152, 29.655820, 326.214018, 256
  original,    rec_r, 8,    4194304, 28.939436, 318.333798, 128
  original,    rec_r, 8,    8388608, 29.799910, 327.799012, 64
  original,    rec_r, 8,   16777216, 29.441554, 323.857096, 32
  original,   rec_pq, 8,      32768, 81.007427, 891.081699, 16384
  original,   rec_pq, 8,      65536, 82.856417, 911.420582, 8192
  original,   rec_pq, 8,     131072, 83.886532, 922.751849, 4096
  original,   rec_pq, 8,     262144, 84.022392, 924.246313, 2048
  original,   rec_pq, 8,     524288, 88.307637, 971.384005, 1024
  original,   rec_pq, 8,    1048576, 86.591222, 952.503444, 512
  original,   rec_pq, 8,    2097152, 88.104370, 969.148067, 256
  original,   rec_pq, 8,    4194304, 89.467332, 984.140656, 128
  original,   rec_pq, 8,    8388608, 85.940431, 945.344746, 64
  original,   rec_pq, 8,   16777216, 85.884375, 944.728125, 32
  original,   rec_pr, 8,      32768, 11.623874, 127.862612, 16384
  original,   rec_pr, 8,      65536, 11.567150, 127.238647, 8192
  original,   rec_pr, 8,     131072, 11.529299, 126.822290, 4096
  original,   rec_pr, 8,     262144, 11.415832, 125.574150, 2048
  original,   rec_pr, 8,     524288, 11.260947, 123.870421, 1024
  original,   rec_pr, 8,    1048576, 11.149352, 122.642868, 512
  original,   rec_pr, 8,    2097152, 11.092749, 122.020238, 256
  original,   rec_pr, 8,    4194304, 11.008933, 121.098259, 128
  original,   rec_pr, 8,    8388608, 11.342347, 124.765822, 64
  original,   rec_pr, 8,   16777216, 11.293360, 124.226965, 32
  original,   rec_qr, 8,      32768, 11.608232, 127.690553, 16384
  original,   rec_qr, 8,      65536, 11.756600, 129.322601, 8192
  original,   rec_qr, 8,     131072, 11.596978, 127.566758, 4096
  original,   rec_qr, 8,     262144, 11.383399, 125.217394, 2048
  original,   rec_qr, 8,     524288, 11.400347, 125.403813, 1024
  original,   rec_qr, 8,    1048576, 11.418515, 125.603668, 512
  original,   rec_qr, 8,    2097152, 11.404965, 125.454615, 256
  original,   rec_qr, 8,    4194304, 11.340958, 124.750533, 128
  original,   rec_qr, 8,    8388608, 11.655530, 128.210834, 64
  original,   rec_qr, 8,   16777216, 11.920010, 131.120113, 32
  original,  rec_pqr, 8,      32768, 9.719898, 106.918877, 16384
  original,  rec_pqr, 8,      65536, 9.590323, 105.493550, 8192
  original,  rec_pqr, 8,     131072, 9.420586, 103.626441, 4096
  original,  rec_pqr, 8,     262144, 9.143235, 100.575581, 2048
  original,  rec_pqr, 8,     524288, 8.926874, 98.195610, 1024
  original,  rec_pqr, 8,    1048576, 8.769308, 96.462390, 512
  original,  rec_pqr, 8,    2097152, 8.707765, 95.785419, 256
  original,  rec_pqr, 8,    4194304, 8.619654, 94.816197, 128
  original,  rec_pqr, 8,    8388608, 9.047230, 99.519531, 64
  original,  rec_pqr, 8,   16777216, 9.209364, 101.303004, 32
    scalar,    rec_p, 8,      32768, 1722.008893, 18942.097828, 16384
    scalar,    rec_p, 8,      65536, 1922.851589, 21151.367478, 8192
    scalar,    rec_p, 8,     131072, 2000.711816, 22007.829973, 4096
    scalar,    rec_p, 8,     262144, 2092.398965, 23016.388617, 2048
    scalar,    rec_p, 8,     524288, 2029.589641, 22325.486052, 1024
    scalar,    rec_p, 8,    1048576, 2113.481612, 23248.297731, 512
    scalar,    rec_p, 8,    2097152, 2139.860693, 23538.467624, 256
    scalar,    rec_p, 8,    4194304, 2128.252556, 23410.778116, 128
    scalar,    rec_p, 8,    8388608, 2064.779955, 22712.579501, 64
    scalar,    rec_p, 8,   16777216, 1949.351760, 21442.869358, 32
    scalar,    rec_q, 8,      32768, 536.894032, 5905.834357, 16384
    scalar,    rec_q, 8,      65536, 562.671286, 6189.384148, 8192
    scalar,    rec_q, 8,     131072, 578.279406, 6361.073469, 4096
    scalar,    rec_q, 8,     262144, 552.370216, 6076.072374, 2048
    scalar,    rec_q, 8,     524288, 591.942946, 6511.372408, 1024
    scalar,    rec_q, 8,    1048576, 603.384965, 6637.234617, 512
    scalar,    rec_q, 8,    2097152, 606.392302, 6670.315317, 256
    scalar,    rec_q, 8,    4194304, 605.955614, 6665.511753, 128
    scalar,    rec_q, 8,    8388608, 605.920120, 6665.121318, 64
    scalar,    rec_q, 8,   16777216, 605.101051, 6656.111561, 32
    scalar,    rec_r, 8,      32768, 357.845576, 3936.301340, 16384
    scalar,    rec_r, 8,      65536, 370.608057, 4076.688628, 8192
    scalar,    rec_r, 8,     131072, 369.235271, 4061.587982, 4096
    scalar,    rec_r, 8,     262144, 373.443551, 4107.879066, 2048
    scalar,    rec_r, 8,     524288, 387.798563, 4265.784192, 1024
    scalar,    rec_r, 8,    1048576, 398.905324, 4387.958564, 512
    scalar,    rec_r, 8,    2097152, 401.993218, 4421.925399, 256
    scalar,    rec_r, 8,    4194304, 403.176288, 4434.939171, 128
    scalar,    rec_r, 8,    8388608, 402.539666, 4427.936321, 64
    scalar,    rec_r, 8,   16777216, 402.087308, 4422.960389, 32
    scalar,   rec_pq, 8,      32768, 348.133726, 3829.470981, 16384
    scalar,   rec_pq, 8,      65536, 358.920096, 3948.121061, 8192
    scalar,   rec_pq, 8,     131072, 365.928799, 4025.216794, 4096
    scalar,   rec_pq, 8,     262144, 367.182858, 4039.011435, 2048
    scalar,   rec_pq, 8,     524288, 374.582448, 4120.406929, 1024
    scalar,   rec_pq, 8,    1048576, 377.202619, 4149.228806, 512
    scalar,   rec_pq, 8,    2097152, 375.485363, 4130.338996, 256
    scalar,   rec_pq, 8,    4194304, 376.623923, 4142.863152, 128
    scalar,   rec_pq, 8,    8388608, 376.274054, 4139.014590, 64
    scalar,   rec_pq, 8,   16777216, 375.262198, 4127.884174, 32
    scalar,   rec_pr, 8,      32768, 261.087141, 2871.958550, 16384
    scalar,   rec_pr, 8,      65536, 268.793841, 2956.732247, 8192
    scalar,   rec_pr, 8,     131072, 274.633992, 3020.973914, 4096
    scalar,   rec_pr, 8,     262144, 271.735599, 2989.091586, 2048
    scalar,   rec_pr, 8,     524288, 279.579945, 3075.379392, 1024
    scalar,   rec_pr, 8,    1048576, 283.390444, 3117.294880, 512
    scalar,   rec_pr, 8,    2097152, 284.499873, 3129.498599, 256
    scalar,   rec_pr, 8,    4194304, 285.468453, 3140.152980, 128
    scalar,   rec_pr, 8,    8388608, 285.574598, 3141.320574, 64
    scalar,   rec_pr, 8,   16777216, 285.222815, 3137.450965, 32
    scalar,   rec_qr, 8,      32768, 160.174862, 1761.923483, 16384
    scalar,   rec_qr, 8,      65536, 163.101908, 1794.120983, 8192
    scalar,   rec_qr, 8,     131072, 164.839079, 1813.229874, 4096
    scalar,   rec_qr, 8,     262144, 163.858116, 1802.439271, 2048
    scalar,   rec_qr, 8,     524288, 166.517959, 1831.697551, 1024
    scalar,   rec_qr, 8,    1048576, 168.883924, 1857.723168, 512
    scalar,   rec_qr, 8,    2097152, 169.959898, 1869.558880, 256
    scalar,   rec_qr, 8,    4194304, 170.192651, 1872.119162, 128
    scalar,   rec_qr, 8,    8388608, 170.206913, 1872.276041, 64
    scalar,   rec_qr, 8,   16777216, 169.988370, 1869.872065, 32
    scalar,  rec_pqr, 8,      32768, 125.973107, 1385.704180, 16384
    scalar,  rec_pqr, 8,      65536, 127.771243, 1405.483678, 8192
    scalar,  rec_pqr, 8,     131072, 129.033768, 1419.371453, 4096
    scalar,  rec_pqr, 8,     262144, 129.283033, 1422.113360, 2048
    scalar,  rec_pqr, 8,     524288, 131.051098, 1441.562079, 1024
    scalar,  rec_pqr, 8,    1048576, 132.074573, 1452.820299, 512
    scalar,  rec_pqr, 8,    2097152, 132.262644, 1454.889085, 256
    scalar,  rec_pqr, 8,    4194304, 132.363997, 1456.003964, 128
    scalar,  rec_pqr, 8,    8388608, 132.138988, 1453.528870, 64
    scalar,  rec_pqr, 8,   16777216, 132.308035, 1455.388383, 32
powerpc_altivec,    rec_p, 8,      32768, 2062.916238, 22692.078614, 16384
powerpc_altivec,    rec_p, 8,      65536, 2381.982269, 26201.804962, 8192
powerpc_altivec,    rec_p, 8,     131072, 2451.254506, 26963.799567, 4096
powerpc_altivec,    rec_p, 8,     262144, 2645.385969, 29099.245664, 2048
powerpc_altivec,    rec_p, 8,     524288, 2545.830215, 28004.132360, 1024
powerpc_altivec,    rec_p, 8,    1048576, 2694.220913, 29636.430043, 512
powerpc_altivec,    rec_p, 8,    2097152, 2664.085723, 29304.942949, 256
powerpc_altivec,    rec_p, 8,    4194304, 2632.513425, 28957.647672, 128
powerpc_altivec,    rec_p, 8,    8388608, 2469.897259, 27168.869847, 64
powerpc_altivec,    rec_p, 8,   16777216, 2102.279442, 23125.073867, 32
powerpc_altivec,    rec_q, 8,      32768, 985.001187, 10835.013055, 16384
powerpc_altivec,    rec_q, 8,      65536, 1059.659176, 11656.250934, 8192
powerpc_altivec,    rec_q, 8,     131072, 1080.222705, 11882.449760, 4096
powerpc_altivec,    rec_q, 8,     262144, 1022.957892, 11252.536816, 2048
powerpc_altivec,    rec_q, 8,     524288, 1127.513933, 12402.653266, 1024
powerpc_altivec,    rec_q, 8,    1048576, 1133.197023, 12465.167250, 512
powerpc_altivec,    rec_q, 8,    2097152, 1133.449954, 12467.949489, 256
powerpc_altivec,    rec_q, 8,    4194304, 1136.923599, 12506.159594, 128
powerpc_altivec,    rec_q, 8,    8388608, 1134.996401, 12484.960411, 64
powerpc_altivec,    rec_q, 8,   16777216, 1127.215915, 12399.375064, 32
powerpc_altivec,    rec_r, 8,      32768, 746.311646, 8209.428108, 16384
powerpc_altivec,    rec_r, 8,      65536, 789.128794, 8680.416734, 8192
powerpc_altivec,    rec_r, 8,     131072, 740.453027, 8144.983294, 4096
powerpc_altivec,    rec_r, 8,     262144, 806.123008, 8867.353089, 2048
powerpc_altivec,    rec_r, 8,     524288, 825.341087, 9078.751957, 1024
powerpc_altivec,    rec_r, 8,    1048576, 826.715775, 9093.873525, 512
powerpc_altivec,    rec_r, 8,    2097152, 827.377957, 9101.157530, 256
powerpc_altivec,    rec_r, 8,    4194304, 836.060895, 9196.669844, 128
powerpc_altivec,    rec_r, 8,    8388608, 835.287547, 9188.163020, 64
powerpc_altivec,    rec_r, 8,   16777216, 832.549718, 9158.046894, 32
powerpc_altivec,   rec_pq, 8,      32768, 627.698909, 6904.688001, 16384
powerpc_altivec,   rec_pq, 8,      65536, 655.366771, 7209.034485, 8192
powerpc_altivec,   rec_pq, 8,     131072, 668.994076, 7358.934838, 4096
powerpc_altivec,   rec_pq, 8,     262144, 678.056253, 7458.618783, 2048
powerpc_altivec,   rec_pq, 8,     524288, 687.138476, 7558.523235, 1024
powerpc_altivec,   rec_pq, 8,    1048576, 691.050006, 7601.550068, 512
powerpc_altivec,   rec_pq, 8,    2097152, 693.391627, 7627.307894, 256
powerpc_altivec,   rec_pq, 8,    4194304, 694.614706, 7640.761771, 128
powerpc_altivec,   rec_pq, 8,    8388608, 693.442189, 7627.864076, 64
powerpc_altivec,   rec_pq, 8,   16777216, 691.485597, 7606.341571, 32
powerpc_altivec,   rec_pr, 8,      32768, 526.813381, 5794.947193, 16384
powerpc_altivec,   rec_pr, 8,      65536, 546.085555, 6006.941106, 8192
powerpc_altivec,   rec_pr, 8,     131072, 544.750774, 5992.258513, 4096
powerpc_altivec,   rec_pr, 8,     262144, 553.919737, 6093.117110, 2048
powerpc_altivec,   rec_pr, 8,     524288, 567.634231, 6243.976536, 1024
powerpc_altivec,   rec_pr, 8,    1048576, 568.227901, 6250.506910, 512
powerpc_altivec,   rec_pr, 8,    2097152, 567.402074, 6241.422809, 256
powerpc_altivec,   rec_pr, 8,    4194304, 568.030638, 6248.337013, 128
powerpc_altivec,   rec_pr, 8,    8388608, 568.315850, 6251.474347, 64
powerpc_altivec,   rec_pr, 8,   16777216, 567.375011, 6241.125125, 32
powerpc_altivec,   rec_qr, 8,      32768, 378.569690, 4164.266594, 16384
powerpc_altivec,   rec_qr, 8,      65536, 388.179104, 4269.970148, 8192
powerpc_altivec,   rec_qr, 8,     131072, 386.562253, 4252.184782, 4096
powerpc_altivec,   rec_qr, 8,     262144, 392.206001, 4314.266016, 2048
powerpc_altivec,   rec_qr, 8,     524288, 400.264785, 4402.912637, 1024
powerpc_altivec,   rec_qr, 8,    1048576, 400.918811, 4410.106918, 512
powerpc_altivec,   rec_qr, 8,    2097152, 398.745683, 4386.202515, 256
powerpc_altivec,   rec_qr, 8,    4194304, 398.989236, 4388.881591, 128
powerpc_altivec,   rec_qr, 8,    8388608, 398.931527, 4388.246794, 64
powerpc_altivec,   rec_qr, 8,   16777216, 398.311458, 4381.426040, 32
powerpc_altivec,  rec_pqr, 8,      32768, 294.490208, 3239.392291, 16384
powerpc_altivec,  rec_pqr, 8,      65536, 296.667955, 3263.347508, 8192
powerpc_altivec,  rec_pqr, 8,     131072, 304.419184, 3348.611022, 4096
powerpc_altivec,  rec_pqr, 8,     262144, 308.369860, 3392.068460, 2048
powerpc_altivec,  rec_pqr, 8,     524288, 310.252375, 3412.776120, 1024
powerpc_altivec,  rec_pqr, 8,    1048576, 310.320527, 3413.525798, 512
powerpc_altivec,  rec_pqr, 8,    2097152, 309.558867, 3405.147540, 256
powerpc_altivec,  rec_pqr, 8,    4194304, 309.448900, 3403.937904, 128
powerpc_altivec,  rec_pqr, 8,    8388608, 309.409284, 3403.502121, 64
powerpc_altivec,  rec_pqr, 8,   16777216, 308.730451, 3396.034960, 32

Unrelated to this PR, I did run into one strange build issue that only seems to occur on this platform; I'll open a separate PR for it.

Is there any additional testing, or pending work, you'd like to see done before we merge this? If not, then from my perspective this should be ready to go.

@behlendorf behlendorf self-requested a review January 16, 2020 01:35
@rdolbeau
Contributor Author

@behlendorf Now that #9848 is merged, I'll try to rebase/upgrade and use the new CPU handling.

@rdolbeau
Contributor Author

Rebased, updated to use #9848, rechecked with raidz_test on 32BE/64BE/64LE.

Contributor

@behlendorf behlendorf left a comment

I've retested with raidz_test and ztest using the hardware I have available, 64LE, and everything looks good.

However, loading the kmods on an AltiVec-enabled kernel produced the following warning, which maps to the kernel WARN_ON shown after it.

WARNING: CPU: 92 PID: 123163 at arch/powerpc/kernel/process.c:285 enable_kernel_altivec+0x110/0x170
void enable_kernel_altivec(void)
{
           ....
>>>     WARN_ON(preemptible());

Looking at the other enable_kernel_altivec() callers, it appears that the caller is responsible for disabling preemption, unlike on arm and x86. Adding the missing preempt_disable() and preempt_enable() resolved the issue. Like this:

diff --git a/include/os/linux/kernel/linux/simd_powerpc.h b/include/os/linux/kernel/linux/simd_powerpc.h
index ebb88f9..194eeaa 100644
--- a/include/os/linux/kernel/linux/simd_powerpc.h
+++ b/include/os/linux/kernel/linux/simd_powerpc.h
@@ -57,16 +57,27 @@
 #include <sys/types.h>
 #include <linux/version.h>
 
-#define        kfpu_allowed()          1
-#define        kfpu_begin()            enable_kernel_altivec()
+#define        kfpu_allowed()                  1
+#define        kfpu_begin()                    \
+{                                      \
+       preempt_disable();              \
+       enable_kernel_altivec();        \
+}
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 5, 0)
-#define        kfpu_end()              disable_kernel_altivec()
+#define        kfpu_end()                      \
+{                                      \
+       disable_kernel_altivec();       \
+       preempt_enable();               \
+}
 #else
 /* seems that before 4.5 no-one bothered disabling ... */
-#define        kfpu_end()              ((void) 0)
+#define        kfpu_end()                      \
+{                                      \
+       preempt_enable();               \
+}
 #endif
-#define        kfpu_init()             0
-#define        kfpu_fini()             ((void) 0)
+#define        kfpu_init()                     0
+#define        kfpu_fini()                     ((void) 0)
 
 /*
  * Check if AltiVec instruction set is available

Everything worked well after resolving this and the small issue (commented inline) which prevented me from forcing AltiVec to be used.
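
The pairing described above (disable preemption, then enable AltiVec, and undo both in reverse order) can be sketched in userland as follows. This is only an illustration: the four functions below are hypothetical stand-ins for the kernel primitives of the same names, the `preempt_depth` and `altivec_enabled` counters and the `simd_section_ok()` helper are invented for the example, and the `do { } while (0)` wrappers are a stylistic choice of this sketch rather than what the diff above uses.

```c
#include <assert.h>

/* Hypothetical userland stand-ins for kernel state (assumptions). */
static int preempt_depth;	/* > 0 means preemption is disabled */
static int altivec_enabled;	/* 1 while vector code may run */

static void preempt_disable(void) { preempt_depth++; }

static void preempt_enable(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

static void enable_kernel_altivec(void)
{
	/* Stand-in for the kernel's WARN_ON(preemptible()) check. */
	assert(preempt_depth > 0);
	altivec_enabled = 1;
}

static void disable_kernel_altivec(void) { altivec_enabled = 0; }

/* Wrapped in do/while so each macro behaves as a single statement. */
#define	kfpu_begin()	do { preempt_disable(); enable_kernel_altivec(); } while (0)
#define	kfpu_end()	do { disable_kernel_altivec(); preempt_enable(); } while (0)

/* Returns 1 if the section ran with the expected state before and after. */
static int simd_section_ok(void)
{
	int inside_ok;

	kfpu_begin();
	inside_ok = (altivec_enabled == 1 && preempt_depth == 1);
	/* ... vector code would run here ... */
	kfpu_end();

	return (inside_ok && altivec_enabled == 0 && preempt_depth == 0);
}
```

Calling `enable_kernel_altivec()` without the preceding `preempt_disable()` trips the assertion in this model, which is the userland analogue of the warning reported above.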

> 7. no 32 bits ztest, userland for BE is 32 bits and ztest crashes at start-up:

I wasn't able to do any 32-bit testing, but based on your last comment it sounds like you were able to test the 32BE implementation. Should this comment from the top post be updated? Are there any other specific tests you'd like to run?

.gen = RAIDZ_GEN_METHODS(powerpc_altivec),
.rec = RAIDZ_REC_METHODS(powerpc_altivec),
.is_supported = &raidz_will_powerpc_altivec_work,
.name = "powerpc_altivec"
Contributor

"powerpc_altivec" is right at the 16 character limit, which causes EINVAL to be returned when trying to set it with `echo powerpc_altivec >/sys/module/zfs/parameters/zfs_vdev_raidz_impl`. Increasing `RAIDZ_IMPL_NAME_MAX` from 16 to 20 resolves the issue. Alternately we could shorten the name to "altivec".

Contributor Author

a) Will fix preemption ASAP

b) For BE, I did test with raidz_test, loading modules and some zpool/zfs operations, but ztest itself always crashes on my (32 bits userland) BE systems

c) I count only 15 for powerpc_altivec... I'll push the limit to 20, as I think we should keep the $arch_$simd nomenclature for clarity.
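
One possible way to reconcile the two counts, offered as an assumption rather than a reading of the actual setter: the name itself is 15 characters, but `echo` appends a newline, so the string written to the parameter is 16 bytes long. If the setter measures the raw input with a bounded length and rejects anything that reaches the limit, a 16-byte write fails with a limit of 16 and passes with 20. The helper names `bounded_len()` and `impl_name_accepted()` are hypothetical.

```c
#include <stddef.h>

/* Length of s, but never more than max (same idea as strnlen). */
static size_t bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return (n);
}

/*
 * Hypothetical acceptance check (an assumption, not the ZFS source):
 * reject empty input and input whose bounded length reaches name_max,
 * since that leaves no room for the terminating NUL.
 * Returns 1 if accepted, 0 if rejected.
 */
static int impl_name_accepted(const char *val, size_t name_max)
{
	size_t len = bounded_len(val, name_max);

	return (len != 0 && len != name_max);
}
```

Under these assumptions, `impl_name_accepted("powerpc_altivec\n", 16)` is rejected while the same input with a limit of 20 is accepted, which would match the behaviour reported in the review.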

@behlendorf behlendorf self-requested a review January 21, 2020 19:37
Contributor

@behlendorf behlendorf left a comment

I've put the updated PR through some additional manual testing, including moving an existing pool between architectures, and didn't encounter any problems. From my perspective this PR is ready to be merged.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jan 23, 2020
@rdolbeau
Contributor Author

@behlendorf I still need to merge the commits, I'll do that ASAP

@behlendorf
Contributor

@rdolbeau sounds good. Alternately, I can squash them when merging if you prefer.

@rdolbeau
Contributor Author

@behlendorf if you can squash while merging it's OK for me.

BTW - oddly, I couldn't reproduce the preemption warning in syslog before the patch. I might not have run operations long enough for the issue to show up, though :-(

@behlendorf behlendorf merged commit 35b0749 into openzfs:master Jan 23, 2020
@behlendorf behlendorf changed the title First go at AltiVec RAID-Z Add AltiVec RAID-Z Feb 10, 2020
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
Implements the RAID-Z function using AltiVec SIMD.
This is basically the NEON code translated to AltiVec.

Note that the 'fletcher' algorithm requires 64-bits
operations, and the initial implementations of AltiVec
(PPC74xx a.k.a. G4, PPC970 a.k.a. G5) only has up to
32-bits operations, so no 'fletcher'.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes openzfs#9539
Labels
Status: Accepted Ready to integrate (reviewed, tested) Type: Architecture Indicates an issue is specific to a single processor architecture Type: Performance Performance improvement or performance problem