Friday 4 April 2014

     Learning Porting to Aarch64: Fossil(3)

I search for a potential optimization: adding an assembly rotate function for the ARM machine, like the existing x86 one:

#define SHA_ROT(op, x, k) \
        ({ unsigned int y; asm(op " %1,%0" : "=r" (y) : "I" (k), "0" (x)); y; })
#define rol(x,k) SHA_ROT("roll", x, k)
#define ror(x,k) SHA_ROT("rorl", x, k)

In the aarch64 instruction set, the rotate instruction is:

ROR Wd, Wm, #uimm
Rotate Right (immediate): alias for EXTR Wd,Wm,Wm,#uimm.
ROR Xd, Xm, #uimm
Rotate Right (extended immediate): alias for EXTR Xd,Xm,Xm,#uimm.
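
To see why ROR can be an alias for EXTR, here is a small portable C model of the 32-bit form. It is only a sketch of the semantics, not the real instruction; the function names extr32 and ror32_model are made up for illustration.

```c
#include <stdint.h>

/* Portable model of 32-bit EXTR: treat hi:lo as one 64-bit value and
 * take the 32 bits that start at bit position lsb (0..31). */
static uint32_t extr32(uint32_t hi, uint32_t lo, unsigned lsb)
{
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(pair >> (lsb & 31));
}

/* ROR Wd, Wm, #k is EXTR Wd, Wm, Wm, #k: both halves are the same register,
 * so the bits shifted out at the bottom come back in at the top. */
static uint32_t ror32_model(uint32_t x, unsigned k)
{
    return extr32(x, x, k);
}
```

For example, ror32_model(0x12345678, 8) moves the low byte 0x78 to the top of the word.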
 
I try a test on x86_64 first.


#include<stdio.h>
#include<stdlib.h>
#include <sys/types.h>


#define INT_BITS 32
#define TESTNUM 16

//under aarch64 original code,using C
//#define SHA_ROT(x,l,r) ((x) << (l) | (x) >> (r))
//#define rol(x,k) SHA_ROT(x,k,32-(k))
//#define ror(x,k) SHA_ROT(x,32-(k),k)


//under X86_64 original code
#define SHA_ROT(op, x, k) \
        ({ unsigned int y; asm(op " %1,%0" : "=r" (y) : "I" (k), "0" (x)); y; })
#define rol(x,k) SHA_ROT("roll", x, k)
#define ror(x,k) SHA_ROT("rorl", x, k)

char * bit_representation(unsigned int num) {
  char * bit_string = (char *)malloc(sizeof(char)*sizeof(unsigned int)*8+1);
  unsigned int i=1, j;
  for(i=i<<(sizeof(unsigned int)*8-1), j=0; i>0; i=i>>1, j++) {
    if(num&i) {
      *(bit_string+j)='1';
    } else {
      *(bit_string+j)='0';
    }
  }
  *(bit_string+j)='\0';
  return bit_string;
}

/* Driver program to test above functions */
int main()
{
  unsigned int display;  /* holds the result of each rotation */
 
  display = rol(TESTNUM, 2);
  display = rol(TESTNUM, 2);
  display = rol(TESTNUM, 2);
  display = rol(TESTNUM, 2);
  display = rol(TESTNUM, 2);
  display = rol(TESTNUM, 2);
  display = rol(TESTNUM, 2);
  display = rol(TESTNUM, 2);
  display = rol(TESTNUM, 2);
  display = rol(TESTNUM, 2);
  display = rol(TESTNUM, 2);

  display = ror(TESTNUM, 2);
  display = ror(TESTNUM, 2);
  display = ror(TESTNUM, 2);
  display = ror(TESTNUM, 2);
  display = ror(TESTNUM, 2);
  display = ror(TESTNUM, 2);
  display = ror(TESTNUM, 2);
  display = ror(TESTNUM, 2);
  display = ror(TESTNUM, 2);
  display = ror(TESTNUM, 2);
  display = ror(TESTNUM, 2);

}

I run this test program 20000 times and record the time.
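
Since each run of the program performs only a handful of rotations, launching it 20000 times probably measures mostly process start-up. As an alternative (not what was done for the numbers in this post), the rotations can be timed inside one process. A minimal sketch, assuming the standard clock() function is precise enough; rotate_many and time_rotations are hypothetical helper names:

```c
#include <stdint.h>
#include <time.h>

/* Plain C rotate left, masked so that k == 0 is not a shift by 32. */
static uint32_t rol32(uint32_t x, unsigned k)
{
    k &= 31;
    return (x << k) | (x >> ((32 - k) & 31));
}

/* Rotate a running value `iterations` times. Mixing in the loop counter
 * and returning the result makes it hard for the compiler to drop the loop. */
static uint32_t rotate_many(uint32_t seed, unsigned long iterations)
{
    uint32_t v = seed;
    for (unsigned long i = 0; i < iterations; i++)
        v = rol32(v, 2) ^ (uint32_t)i;
    return v;
}

/* CPU seconds spent in rotate_many(); the final value is stored in *result. */
static double time_rotations(uint32_t seed, unsigned long iterations,
                             uint32_t *result)
{
    clock_t start = clock();
    *result = rotate_many(seed, iterations);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_rotations(16, 100000000UL, &r) from main and printing both the seconds and r would give one measurement per rotate implementation, with the start-up cost paid only once.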

I repeat the measurement 5 times, take the 3 runs in the middle (runs 2 to 4), and calculate the average.
Under x86_64 (assembly macros):

real    0m7.057s
user    0m0.546s
sys    0m0.918s

real    0m7.065s
user    0m0.499s
sys    0m0.959s

real    0m7.049s
user    0m0.534s
sys    0m0.906s

real    0m7.101s
user    0m0.486s
sys    0m0.986s

real    0m7.069s
user    0m0.486s
sys    0m0.983s

Under C code:

real    0m7.073s
user    0m0.569s
sys    0m0.857s

real    0m7.008s
user    0m0.549s
sys    0m0.856s

real    0m7.065s
user    0m0.528s
sys    0m0.897s

real    0m7.001s
user    0m0.568s
sys    0m0.833s

real    0m7.044s
user    0m0.549s
sys    0m0.862s

Result:
x86_64 asm:
user: (0.499+0.534+0.486)/3 = 0.50633 s

C:
user: (0.549+0.528+0.568)/3 = 0.54833 s

Under x86_64, the assembly rotation is about 7.66% faster than the C rotation in user time.
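
The arithmetic above can be reproduced with a few lines of C. This is just a sketch of the calculation; the arrays hold the user times listed in this post, and the helper names mean_of_middle and percent_faster are made up.

```c
#include <stddef.h>

/* User times in seconds from the five runs of each variant. */
static const double asm_user[5] = { 0.546, 0.499, 0.534, 0.486, 0.486 };
static const double c_user[5]   = { 0.569, 0.549, 0.528, 0.568, 0.549 };

/* Mean of the runs between the first and the last one (needs n >= 3). */
static double mean_of_middle(const double *runs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 1; i + 1 < n; i++)
        sum += runs[i];
    return sum / (double)(n - 2);
}

/* How much smaller `fast` is than `slow`, as a percentage of `slow`. */
static double percent_faster(double fast, double slow)
{
    return (slow - fast) / slow * 100.0;
}
```

With these values, percent_faster(mean_of_middle(asm_user, 5), mean_of_middle(c_user, 5)) comes out at roughly 7.66.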

I will try to create the assembly code for rotating under aarch64.
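
A first attempt could look like the sketch below. It assumes GCC-style inline assembly and uses the register form of ROR (the shift count is in a register, not an immediate), with a plain C fallback so the same file still builds on other machines. The function name ror32 is my own choice and not something from Fossil, and I have not benchmarked it.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__GNUC__)
/* AArch64: ROR on 32-bit W registers, rotate count taken from a register. */
static inline uint32_t ror32(uint32_t x, uint32_t k)
{
    uint32_t y;
    __asm__("ror %w0, %w1, %w2" : "=r"(y) : "r"(x), "r"(k));
    return y;
}
#else
/* Anywhere else: plain C, masked so that k == 0 is not a shift by 32. */
static inline uint32_t ror32(uint32_t x, uint32_t k)
{
    k &= 31;
    return (x >> k) | (x << ((32 - k) & 31));
}
#endif
```

It is worth comparing the generated code for both branches, since the compiler may already turn the C shift-and-or pattern into a single rotate instruction.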

Learning Porting to Aarch64: cxxtools(2)


Today I take a look at cxxtools, a set of libraries that covers a lot of functionality.

First of all I download and prep the package to get everything into the rpmbuild directory.

Looking at the source code, we can find a separate file that holds the assembly code for ARM.

It looks like everything is already there?

When I build the project, it throws a lot of warnings:

/bin/sh ../libtool --tag=CXX   --mode=compile g++ -DHAVE_CONFIG_H -I.  -I../src -I../include -I../include -Wno-long-long -Wall -pedantic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches  -fno-stack-protector  -c -o csvformatter.lo csvformatter.cpp
libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I../src -I../include -I../include -Wno-long-long -Wall -pedantic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches -fno-stack-protector -c csvformatter.cpp  -fPIC -DPIC -o .libs/csvformatter.o
In file included from ../include/cxxtools/string.h:34:0,
                 from ../include/cxxtools/formatter.h:32,
                 from ../include/cxxtools/csvformatter.h:32,
                 from csvformatter.cpp:29:
../include/cxxtools/char.h: In function 'bool cxxtools::operator==(const cxxtools::Char&, wchar_t)':
../include/cxxtools/char.h:143:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             { return a.value() == b; }
                                   ^
../include/cxxtools/char.h: In function 'bool cxxtools::operator==(wchar_t, const cxxtools::Char&)':
../include/cxxtools/char.h:145:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             { return a == b.value(); }
                                   ^
../include/cxxtools/char.h: In function 'bool cxxtools::operator!=(const cxxtools::Char&, wchar_t)':
../include/cxxtools/char.h:156:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             { return a.value() != b; }
                                   ^
../include/cxxtools/char.h: In function 'bool cxxtools::operator!=(wchar_t, const cxxtools::Char&)':
../include/cxxtools/char.h:158:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             { return a != b.value(); }
                                   ^
../include/cxxtools/char.h: In function 'bool cxxtools::operator<(const cxxtools::Char&, wchar_t)':
../include/cxxtools/char.h:169:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             { return a.value() < b; }
                                  ^
../include/cxxtools/char.h: In function 'bool cxxtools::operator<(wchar_t, const cxxtools::Char&)':
../include/cxxtools/char.h:171:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
             { return a < b.value(); }
                                  ^
......
 
And then the build gets stuck.

I was thinking about how to fix this until I took a look at their latest patch:
....
 -            else if (ch == L'\n' || ch == L'\r')
+            else if ( (int) ch == (int) L'\n' || (int) ch == (int) L'\r')
             {
                 log_debug("title=\"" << _titles.back() << '"');
                 _noColumns = 1;
-                _state = (ch == L'\r' ? state_cr : state_rowstart);
+                _state = ( (int) ch == (int) L'\r' ? state_cr : state_rowstart);
             }
-            else if (ch == L'\'' || ch == L'"')
+            else if ( (int) ch == (int) L'\'' || (int) ch == (int) L'"')
.....

They fixed a similar issue by explicitly casting both operands to int, which makes the comparison safe.

I will do the same thing in char.h and try building again.
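
The idea of the cast can be shown in plain C. This is only an analogue of the C++ operators in char.h: the Char struct below is a hypothetical stand-in for cxxtools::Char, and the function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical stand-in for cxxtools::Char: a wrapper around a 32-bit value. */
typedef struct {
    int32_t value;
} Char;

/* Cast both sides to int before comparing, as the patch does, so the
 * comparison no longer mixes signed and unsigned operands. */
static bool char_equals(Char a, wchar_t b)
{
    return (int)a.value == (int)b;
}

static bool char_less(Char a, wchar_t b)
{
    return (int)a.value < (int)b;
}
```

The cast is safe here as long as the character values fit in an int, which I assume is the case for the values the library compares.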
 

Thursday 3 April 2014

Learning Porting to Aarch64: Fossil(2)

Learning Porting to Aarch64: fossil

These days I am learning about porting software to Aarch64, and Fossil is one of the packages that cannot be built in an aarch64 environment.

Fossil is a distributed version control system like Git and Mercurial. Fossil also supports distributed bug tracking, distributed wiki, and a distributed blog mechanism all in a single integrated package.

I use the Foundation Model as the virtual aarch64 environment and the rpmbuild tools to build the software. The OS is Fedora 19.

1. Install all the needed tools for rpmbuild,
  • "Fedora Packager"
  • rpmdevtools
  • rpmlint
  • yum-utils
2. Download source

    fedpkg clone -a fossil
    cd fossil
    fedpkg srpm

3. Check dependencies

    yum-builddep *.rpm (under the fossil directory)

4. Prepare for rpmbuild

    rpm -i *.rpm (same directory as above)

5. Build it!

    cd ~/rpmbuild/SPECS/
    rpmbuild -ba fossil.spec

Issue: the build fails because autosetup's config.guess file cannot recognize the aarch64 machine.

I check config.guess and find that the file was last modified on 2010-09-24. I search the internet and find that the latest version was made on 2014-03-23. I check that script and find that it supports aarch64.
This is the link:
http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD
Then I replace the config.guess file and build again.

The build succeeds:

Wrote: /root/rpmbuild/SRPMS/fossil-1.28-1.20140127173344.fc19.src.rpm
Wrote: /root/rpmbuild/RPMS/aarch64/fossil-1.28-1.20140127173344.fc19.aarch64.rpm
Wrote: /root/rpmbuild/RPMS/aarch64/fossil-doc-1.28-1.20140127173344.fc19.aarch64.rpm
Wrote: /root/rpmbuild/RPMS/aarch64/fossil-debuginfo-1.28-1.20140127173344.fc19.aarch64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.NTvRE9
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd fossil-src-20140127173344
+ /usr/bin/rm -fr /root/rpmbuild/BUILDROOT/fossil-1.28-1.20140127173344.fc19.aarch64
+ exit 0

Then it is time to take a look at the assembly code in the file to see if I can do something.

Only one place uses assembly:

#define SHA_ROT(op, x, k) \
        ({ unsigned int y; asm(op " %1,%0" : "=r" (y) : "I" (k), "0" (x)); y; })
#define rol(x,k) SHA_ROT("roll", x, k)
#define ror(x,k) SHA_ROT("rorl", x, k)

#else
/* Generic C equivalent */
#define SHA_ROT(x,l,r) ((x) << (l) | (x) >> (r))
#define rol(x,k) SHA_ROT(x,k,32-(k))
#define ror(x,k) SHA_ROT(x,32-(k),k)
#endif


#define blk0le(i) (block[i] = (ror(block[i],8)&0xFF00FF00) \
    |(rol(block[i],8)&0x00FF00FF))
#define blk0be(i) block[i]
#define blk(i) (block[i&15] = rol(block[(i+13)&15]^block[(i+8)&15] \
    ^block[(i+2)&15]^block[i&15],1))
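
To check what the two rotates are used for, the blk0le expression can be tried on its own with the generic C macros. The sketch below copies the macros from above and wraps the expression in a function (swap_bytes is my name for it, and it leaves out the store into block[i]); it shows that the combination reverses the byte order of a 32-bit word.

```c
#include <stdint.h>

/* Generic C rotates, copied from the #else branch above. */
#define SHA_ROT(x,l,r) ((x) << (l) | (x) >> (r))
#define rol(x,k) SHA_ROT(x,k,32-(k))
#define ror(x,k) SHA_ROT(x,32-(k),k)

/* Same expression as blk0le, without the assignment to block[i]:
 * ror by 8 puts bytes 0 and 2 in place, rol by 8 puts bytes 1 and 3 in place. */
static uint32_t swap_bytes(uint32_t w)
{
    return (ror(w, 8) & 0xFF00FF00u) | (rol(w, 8) & 0x00FF00FFu);
}
```

For example, swap_bytes(0x11223344) gives 0x44332211.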

Obviously, it is not optimized for aarch64, so on aarch64 the generic C code "((x) << (l) | (x) >> (r))" is used for this part.

Aarch64 only supports ror (rotate right), not rol, but a left rotation by k is the same as a right rotation by 32-k. We can try to write rotate asm code for aarch64 and compare it with the C code to see which is faster.
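
Having only a rotate right is not a problem for the 32-bit words used here. A small C check of the rol/ror identity, written with shift-and-or macros in the style of the generic ones above; the names ROL_C, ROR_C, rol_from_ror and rol_identity_holds are made up for this sketch.

```c
#include <stdint.h>

/* Shift-and-or rotates for 32-bit values (valid for k = 1..31). */
#define ROL_C(x, k) (((x) << (k)) | ((x) >> (32 - (k))))
#define ROR_C(x, k) (((x) >> (k)) | ((x) << (32 - (k))))

/* Rotate left built only from rotate right. */
static uint32_t rol_from_ror(uint32_t x, unsigned k)
{
    return ROR_C(x, 32 - k);
}

/* Returns 1 if both forms agree for every rotate count from 1 to 31. */
static int rol_identity_holds(uint32_t x)
{
    for (unsigned k = 1; k < 32; k++)
        if (rol_from_ror(x, k) != ROL_C(x, k))
            return 0;
    return 1;
}
```

So an aarch64 version of the macros would only need one asm instruction, with rol(x,k) defined as ror(x,32-(k)).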

Hua