Tuesday, 6 May 2014

Porting & Optimization (4) and conclusion for fossil

In my last post, I tested the C and assembly rotation code on x86_64. Now I will test it on an aarch64 machine.

I will use the same test program as in the last post, but with the assembly code changed to run on aarch64.

//Rotation using C
//#define SHA_ROT(x,l,r) ((x) << (l) | (x) >> (r))
//#define rol(x,k) SHA_ROT(x,k,32-(k))
//#define ror(x,k) SHA_ROT(x,32-(k),k)

//Rotation using assembly under x86_64
//#define SHA_ROT(op, x, k) \
//        ({ unsigned int y; asm(op " %1,%0" : "=r" (y) : "I" (k), "0" (x)); y; })
//#define rol(x,k) SHA_ROT("roll", x, k)
//#define ror(x,k) SHA_ROT("rorl", x, k)

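//Rotation using assembly under aarch64
//The %w modifier selects the 32-bit W view of each register, so ror rotates within 32 bits.
//aarch64 has no rotate-left instruction, so rol(x,k) is a rotate right by 32-(k).
#define SHA_ROT(op, x, k) \
        ({ unsigned int y; asm(op " %w0,%w2,%w1" : "=&r" (y) : "r" (k), "r" (x)); y; })
#define rol(x,k) SHA_ROT("ror", x, 32-(k))
#define ror(x,k) SHA_ROT("ror", x, k)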
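
Before timing anything, it is worth checking that an assembly rotation returns the same values as the C one. The sketch below is one way to do that, assuming a GCC-compatible compiler. The names rol_c, rol_asm and count_rol_mismatches are made up for this example and are not part of fossil. On machines other than aarch64 the assembly path falls back to the C version, so the file still compiles there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable 32-bit rotate left. k must be 1..31, because shifting a
   32-bit value by 32 is undefined behaviour in C. */
static inline uint32_t rol_c(uint32_t x, unsigned k)
{
    assert(k >= 1 && k <= 31);
    return (x << k) | (x >> (32 - k));
}

/* Rotate left through the aarch64 ROR instruction (rotate right by 32-k).
   Falls back to rol_c on other architectures. */
static inline uint32_t rol_asm(uint32_t x, unsigned k)
{
#if defined(__aarch64__)
    uint32_t y;
    /* %w picks the 32-bit W register names instead of the 64-bit X names. */
    __asm__("ror %w0, %w1, %w2" : "=r"(y) : "r"(x), "r"(32 - k));
    return y;
#else
    return rol_c(x, k);
#endif
}

/* Compare both versions over a few sample words and every shift 1..31.
   Returns the number of mismatches (0 means they agree). */
static int count_rol_mismatches(void)
{
    static const uint32_t samples[] = {
        0x00000001u, 0x80000000u, 0x67452301u, 0xEFCDAB89u, 0xFFFFFFFFu
    };
    int bad = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (unsigned k = 1; k < 32; k++)
            if (rol_c(samples[i], k) != rol_asm(samples[i], k))
                bad++;
    return bad;
}
```

Calling count_rol_mismatches() from main and checking that it returns 0 is enough to catch a rotation that silently works on 64-bit registers instead of 32-bit ones.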


Testing standard:

Run 7 tests. Each test runs the program 2000 times, and the time for the whole test is recorded.

Discard the first test (it only preloads the cache), the test with the longest time, and the test with the shortest time. Then calculate the average of the remaining 4 tests.

Test results:

C:

[root@localhost test]# time ./test_c.sh

real    0m16.349s
user    0m1.070s
sys    0m2.660s
[root@localhost test]# vi ./test_c.sh
[root@localhost test]# time ./test_c.sh

real    0m16.379s
user    0m0.960s
sys    0m2.760s
[root@localhost test]# time ./test_c.sh

real    0m16.479s
user    0m0.940s
sys    0m2.850s
[root@localhost test]# time ./test_c.sh

real    0m16.408s
user    0m0.980s
sys    0m2.760s
[root@localhost test]# time ./test_c.sh

real    0m16.506s
user    0m1.080s
sys    0m2.670s
[root@localhost test]# time ./test_c.sh

real    0m16.414s
user    0m1.110s
sys    0m2.620s
[root@localhost test]# time ./test_c.sh

real    0m16.410s
user    0m1.030s
sys    0m2.720s
[root@localhost test]#

aarch64 assembly:

[root@localhost test]# time ./test_arm.sh

real    0m16.440s
user    0m1.180s
sys    0m2.570s

[root@localhost test]# time ./test_arm.sh

real    0m16.438s
user    0m1.030s
sys    0m2.720s
[root@localhost test]# time ./test_arm.sh

real    0m16.451s
user    0m1.010s
sys    0m2.750s
[root@localhost test]# time ./test_arm.sh

real    0m16.473s
user    0m1.190s
sys    0m2.580s
[root@localhost test]# time ./test_arm.sh

real    0m16.519s
user    0m1.030s
sys    0m2.800s
[root@localhost test]# time ./test_arm.sh

real    0m16.432s
user    0m1.000s
sys    0m2.780s
[root@localhost test]# time ./test_arm.sh1

real    0m16.441s
user    0m1.010s
sys    0m2.760s

Average user time, C:
(0.98+0.96+1.08+1.03)/4 = 1.0125 s

Average user time, assembly:
(1.03+1.01+1.03+1.01)/4 = 1.02 s
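
The averaging rule can also be written as a small helper, which avoids picking the four values by hand. This is only a sketch: trimmed_mean is a name invented here, and the two arrays simply hold the user times copied from the runs above, in order.

```c
#include <assert.h>
#include <stddef.h>

/* User times (seconds) from the seven runs of each script, in run order. */
static const double c_user[7]   = { 1.07, 0.96, 0.94, 0.98, 1.08, 1.11, 1.03 };
static const double asm_user[7] = { 1.18, 1.03, 1.01, 1.19, 1.03, 1.00, 1.01 };

/* Ignore times[0] (the warm-up run), then drop one maximum and one
   minimum from the rest and average what is left. Needs n >= 4. */
static double trimmed_mean(const double *times, size_t n)
{
    assert(n >= 4);
    double sum = 0.0, lo = times[1], hi = times[1];
    for (size_t i = 1; i < n; i++) {
        sum += times[i];
        if (times[i] < lo) lo = times[i];
        if (times[i] > hi) hi = times[i];
    }
    /* n-1 runs after the warm-up, minus the two discarded extremes. */
    return (sum - lo - hi) / (double)(n - 3);
}
```

trimmed_mean(c_user, 7) and trimmed_mean(asm_user, 7) should reproduce the two averages above, up to floating-point rounding.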


We can see that the two types of rotation perform almost the same: the average user times differ by less than 1%. The C code does not have to be converted to assembly.

Conclusion:
Build:
To build fossil on aarch64, you need to replace its config.guess file with the latest version.
The link is

http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD

Optimization:
According to the test results, changing the rotation code to assembly gives no significant performance improvement on aarch64, so the modification is not necessary.
