Porting & Optimization (4) and conclusion for fossil
In my last post, I tested the C code and the assembly code on x86_64. Now I will test them on an aarch64 machine.
I will use the same test program as in the last post, but change the assembly code so that it runs on aarch64.
//Rotation using C
//#define SHA_ROT(x,l,r) ((x) << (l) | (x) >> (r))
//#define rol(x,k) SHA_ROT(x,k,32-(k))
//#define ror(x,k) SHA_ROT(x,32-(k),k)
//Rotation using assembly under x86_64
//#define SHA_ROT(op, x, k) \
//({ unsigned int y; asm(op " %1,%0" : "=r" (y) : "I" (k), "0" (x)); y; })
//#define rol(x,k) SHA_ROT("roll", x, k)
//#define ror(x,k) SHA_ROT("rorl", x, k)
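//Rotation using assembly under aarch64
//(%w selects the 32-bit W registers, so the rotate wraps at 32 bits;
// aarch64 has no rotate-left instruction, so rol is a ror by 32-k)
#define SHA_ROT(op, x, k) \
({ unsigned int y; asm(op " %w0,%w2,%w1" : "=&r" (y) : "r" (k), "r" (x)); y; })
#define rol(x,k) SHA_ROT("ror", x, 32-(k))
#define ror(x,k) SHA_ROT("ror", x, k)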
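Before timing anything, it is worth checking that the assembly rotation returns the same values as the C rotation. The following is only a sketch under a few assumptions: the function names (rol_c, ror_c, ror_asm, rol_asm, rotation_mismatches) are made up for illustration and are not part of fossil, the compiler is assumed to be GCC or Clang, and the inline assembly uses the %w operand modifier to work on 32-bit registers. On any machine other than aarch64 the assembly path falls back to the C version so the file still compiles.

```c
#include <stdint.h>

/* Plain C rotations; k must be in 1..31 so neither shift reaches 32. */
static uint32_t rol_c(uint32_t x, unsigned k) { return (x << k) | (x >> (32u - k)); }
static uint32_t ror_c(uint32_t x, unsigned k) { return (x >> k) | (x << (32u - k)); }

/* Rotate right with inline assembly on aarch64.
   Assumption: GCC or Clang; on other targets fall back to the C version. */
static uint32_t ror_asm(uint32_t x, uint32_t k)
{
#if defined(__aarch64__)
    uint32_t y;
    /* %w prints the 32-bit W name of each register */
    __asm__("ror %w0, %w1, %w2" : "=r"(y) : "r"(x), "r"(k));
    return y;
#else
    return ror_c(x, k);
#endif
}

/* aarch64 has no rotate-left instruction, so rotate right by 32-k. */
static uint32_t rol_asm(uint32_t x, uint32_t k) { return ror_asm(x, 32u - k); }

/* Count disagreements between the two versions for every k in 1..31. */
int rotation_mismatches(uint32_t x)
{
    int bad = 0;
    for (unsigned k = 1; k < 32; k++) {
        if (rol_asm(x, k) != rol_c(x, k)) bad++;
        if (ror_asm(x, k) != ror_c(x, k)) bad++;
    }
    return bad;
}
```

Calling rotation_mismatches() with a few sample words should return 0 if both versions agree. A nonzero result would mean the assembly macro is not a drop-in replacement for the C one.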
Testing method:
Run 7 tests. Each test runs the program 2000 times, and I record the time for each test.
Discard the first test (it only warms the cache), the longest test, and the shortest test, then calculate the average of the remaining 4 tests.
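The averaging rule above can be sketched as a small C helper. The name trimmed_average is hypothetical and the function is not part of the test scripts. It only restates the rule: skip the first sample, drop one maximum and one minimum from the rest, and average what is left.

```c
#include <stddef.h>

/* Hypothetical helper: average of the timings in t[0..n-1] after
   discarding t[0] (the warm-up run) and the single largest and single
   smallest of the remaining samples. Returns -1.0 if n < 4, because
   then nothing would be left to average. */
double trimmed_average(const double *t, size_t n)
{
    if (n < 4)
        return -1.0;

    double sum = 0.0;
    double lo = t[1];
    double hi = t[1];

    for (size_t i = 1; i < n; i++) {   /* start at 1: skip the warm-up run */
        sum += t[i];
        if (t[i] < lo) lo = t[i];
        if (t[i] > hi) hi = t[i];
    }

    /* n - 1 samples after the warm-up, minus the two extremes */
    return (sum - lo - hi) / (double)(n - 3);
}
```

With 7 samples this averages 4 values, as described. For example, the seven user times of the C run (1.07, 0.96, 0.94, 0.98, 1.08, 1.11, 1.03) give about 1.0125 s.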
Test results:
C:
[root@localhost test]# time ./test_c.sh
real 0m16.349s
user 0m1.070s
sys 0m2.660s
[root@localhost test]# vi ./test_c.sh
[root@localhost test]# time ./test_c.sh
real 0m16.379s
user 0m0.960s
sys 0m2.760s
[root@localhost test]# time ./test_c.sh
real 0m16.479s
user 0m0.940s
sys 0m2.850s
[root@localhost test]# time ./test_c.sh
real 0m16.408s
user 0m0.980s
sys 0m2.760s
[root@localhost test]# time ./test_c.sh
real 0m16.506s
user 0m1.080s
sys 0m2.670s
[root@localhost test]# time ./test_c.sh
real 0m16.414s
user 0m1.110s
sys 0m2.620s
[root@localhost test]# time ./test_c.sh
real 0m16.410s
user 0m1.030s
sys 0m2.720s
aarch64 assembly:
[root@localhost test]# time ./test_arm.sh
real 0m16.440s
user 0m1.180s
sys 0m2.570s
[root@localhost test]# time ./test_arm.sh
real 0m16.438s
user 0m1.030s
sys 0m2.720s
[root@localhost test]# time ./test_arm.sh
real 0m16.451s
user 0m1.010s
sys 0m2.750s
[root@localhost test]# time ./test_arm.sh
real 0m16.473s
user 0m1.190s
sys 0m2.580s
[root@localhost test]# time ./test_arm.sh
real 0m16.519s
user 0m1.030s
sys 0m2.800s
[root@localhost test]# time ./test_arm.sh
real 0m16.432s
user 0m1.000s
sys 0m2.780s
[root@localhost test]# time ./test_arm.sh
real 0m16.441s
user 0m1.010s
sys 0m2.760s
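Average user time, C:
(0.98 + 0.96 + 1.08 + 1.03) / 4 = 1.0125 s
Average user time, assembly:
(1.03 + 1.01 + 1.03 + 1.01) / 4 = 1.02 s
The two kinds of rotation perform almost identically (about 1.01 s versus 1.02 s of user time), so the C code does not need to be converted to assembly. A likely reason is that the compiler already recognizes the shift-and-OR idiom and emits a rotate instruction for it.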
Conclusion:
Build:
To build fossil on aarch64, you need to replace its config.guess file with the latest version, which is available at
http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD
Optimization:
The test results show no significant performance improvement after changing the rotation code to assembly, so the modification is not necessary.