NanoPi NEO Benchmark

NanoPi NEO
(The image is reused from the previous post.)

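It occurred to me to wonder how well the NanoPi NEO actually performs, so I ran a benchmark.
The tool I used is UnixBench.
Armbian ships with all the basic tools, so nothing other than UnixBench itself needs to be set up beforehand.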

$ wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/byte-unixbench/UnixBench5.1.3.tgz
$ tar -zxvf UnixBench5.1.3.tgz
$ cd ./UnixBench
$ ./Run -c 1 -c 4

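Plain ./Run would also work, but since this is a 4-core CPU I added -c 1 for a single-copy run and -c 4 for a run with four parallel copies.
Note that UnixBench takes a long time to measure. The two passes logged below took about 83 minutes in total on the NanoPi NEO (10:27:44 to 11:51:07), which is easily enough time to go eat at Ootoya, brush your teeth, and visit the toilet.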

make[1]: Entering directory '/home/foobar/UnixBench'
Checking distribution of files
./pgms  exists
./src  exists
./testdir  exists
./tmp  exists
./results  exists
make[1]: Leaving directory '/home/foobar/UnixBench'
sh: 1: 3dinfo: not found

   #    #  #    #  #  #    #          #####   ######  #    #   ####   #    #
   #    #  ##   #  #   #  #           #    #  #       ##   #  #    #  #    #
   #    #  # #  #  #    ##            #####   #####   # #  #  #       ######
   #    #  #  # #  #    ##            #    #  #       #  # #  #       #    #
   #    #  #   ##  #   #  #           #    #  #       #   ##  #    #  #    #
    ####   #    #  #  #    #          #####   ######  #    #   ####   #    #

   Version 5.1.3                      Based on the Byte Magazine Unix Benchmark

   Multi-CPU version                  Version 5 revisions by Ian Smith,
                                      Sunnyvale, CA, USA
   January 13, 2011                   johantheghost at yahoo period com


1 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

1 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

1 x Execl Throughput  1 2 3

1 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

1 x File Copy 256 bufsize 500 maxblocks  1 2 3

1 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

1 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

1 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

1 x Process Creation  1 2 3

1 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

1 x Shell Scripts (1 concurrent)  1 2 3

1 x Shell Scripts (8 concurrent)  1 2 3

4 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

4 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

4 x Execl Throughput  1 2 3

4 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

4 x File Copy 256 bufsize 500 maxblocks  1 2 3

4 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

4 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

4 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

4 x Process Creation  1 2 3

4 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

4 x Shell Scripts (1 concurrent)  1 2 3

4 x Shell Scripts (8 concurrent)  1 2 3

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: nanopineo: GNU/Linux
   OS: GNU/Linux -- 3.4.113-sun8i -- #28 SMP PREEMPT Thu Feb 2 02:01:28 CET 2017
   Machine: armv7l (unknown)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   10:27:43 up 4 min,  1 user,  load average: 0.20, 0.51, 0.28; runlevel 5

------------------------------------------------------------------------
Benchmark Run: Fri Feb 24 2017 10:27:44 - 11:08:57
0 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        2944908.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                      423.9 MWIPS (10.0 s, 7 samples)
Execl Throughput                                399.6 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks         72919.0 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           22225.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        173730.8 KBps  (30.0 s, 2 samples)
Pipe Throughput                              186981.1 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  22891.7 lps   (10.0 s, 7 samples)
Process Creation                                896.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   1126.3 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                    313.2 lpm   (60.2 s, 2 samples)
System Call Overhead                         501993.8 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    2944908.5    252.3
Double-Precision Whetstone                       55.0        423.9     77.1
Execl Throughput                                 43.0        399.6     92.9
File Copy 1024 bufsize 2000 maxblocks          3960.0      72919.0    184.1
File Copy 256 bufsize 500 maxblocks            1655.0      22225.5    134.3
File Copy 4096 bufsize 8000 maxblocks          5800.0     173730.8    299.5
Pipe Throughput                               12440.0     186981.1    150.3
Pipe-based Context Switching                   4000.0      22891.7     57.2
Process Creation                                126.0        896.5     71.2
Shell Scripts (1 concurrent)                     42.4       1126.3    265.6
Shell Scripts (8 concurrent)                      6.0        313.2    522.0
System Call Overhead                          15000.0     501993.8    334.7
                                                                   ========
System Benchmarks Index Score                                         164.1

------------------------------------------------------------------------
Benchmark Run: Fri Feb 24 2017 11:08:57 - 11:51:07
0 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables       11743019.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1696.8 MWIPS (10.0 s, 7 samples)
Execl Throughput                               1350.9 lps   (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks         96558.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           37054.1 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        274432.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                              742384.9 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  92727.9 lps   (10.0 s, 7 samples)
Process Creation                               3428.3 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2500.0 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                    329.9 lpm   (60.5 s, 2 samples)
System Call Overhead                        1895488.7 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   11743019.5   1006.3
Double-Precision Whetstone                       55.0       1696.8    308.5
Execl Throughput                                 43.0       1350.9    314.2
File Copy 1024 bufsize 2000 maxblocks          3960.0      96558.8    243.8
File Copy 256 bufsize 500 maxblocks            1655.0      37054.1    223.9
File Copy 4096 bufsize 8000 maxblocks          5800.0     274432.0    473.2
Pipe Throughput                               12440.0     742384.9    596.8
Pipe-based Context Switching                   4000.0      92727.9    231.8
Process Creation                                126.0       3428.3    272.1
Shell Scripts (1 concurrent)                     42.4       2500.0    589.6
Shell Scripts (8 concurrent)                      6.0        329.9    549.8
System Call Overhead                          15000.0    1895488.7   1263.7
                                                                   ========
System Benchmarks Index Score                                         429.0

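A single copy (that is, the same as ./Run with no options) gives an Index Score of 164.1, and four parallel copies give 429.0. Going by the UnixBench results other people have published, that is nominally a little short of the Raspberry Pi 2 Model B, but in practice it is fair to call it nearly equivalent. I honestly did not expect this kind of result from the NanoPi NEO, the lowest-grade board at only $7.99. Correction: the 512MB memory model is $9.99.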
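As a cross-check on where these scores come from, here is a small Python sketch. It assumes, judging from the numbers in the tables above rather than from the UnixBench source, that each INDEX is RESULT / BASELINE x 10 and that the overall Index Score is the geometric mean of the per-test indexes. The function and variable names are my own and are not part of UnixBench.

```python
import math

# (baseline, result) pairs copied from the single-copy run above.
SINGLE_RUN = {
    "Dhrystone 2 using register variables": (116700.0, 2944908.5),
    "Double-Precision Whetstone": (55.0, 423.9),
    "Execl Throughput": (43.0, 399.6),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 72919.0),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 22225.5),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 173730.8),
    "Pipe Throughput": (12440.0, 186981.1),
    "Pipe-based Context Switching": (4000.0, 22891.7),
    "Process Creation": (126.0, 896.5),
    "Shell Scripts (1 concurrent)": (42.4, 1126.3),
    "Shell Scripts (8 concurrent)": (6.0, 313.2),
    "System Call Overhead": (15000.0, 501993.8),
}


def index_value(baseline, result):
    """Per-test index, assumed to be result / baseline * 10."""
    return result / baseline * 10.0


def index_score(rows):
    """Overall score, assumed to be the geometric mean of the indexes."""
    logs = [math.log(index_value(b, r)) for b, r in rows.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    for name, (baseline, result) in SINGLE_RUN.items():
        print(f"{name:42s} {index_value(baseline, result):8.1f}")
    print(f"Index Score: {index_score(SINGLE_RUN):.1f}")
```

Under those assumptions the per-test values match the INDEX column, and the overall figure lands very close to the reported 164.1. One consequence of a geometric mean is that a single weak test (Pipe-based Context Switching at 57.2 here) drags the score down more than it would in a plain average.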

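On this page, the NanoPi NEO UnixBench score for four parallel copies is given as 342.6. Why would my result be about 25% higher? The individual test results look quite different too...
Maybe it is because that board is a model with a different amount of memory?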
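For reference, the "about 25%" figure and the gain from one to four copies can be worked out as follows. This is just arithmetic on the scores quoted in this post; the helper names are made up for illustration.

```python
def speedup(parallel_score, single_score):
    """How many times higher the parallel score is than the single one."""
    return parallel_score / single_score


def percent_higher(mine, other):
    """How much higher `mine` is than `other`, in percent."""
    return (mine / other - 1.0) * 100.0


SINGLE = 164.1      # 1 parallel copy, this post
QUAD = 429.0        # 4 parallel copies, this post
OTHER_QUAD = 342.6  # 4 parallel copies, as quoted from the other page

if __name__ == "__main__":
    print(f"4 copies vs 1 copy: {speedup(QUAD, SINGLE):.2f}x")
    print(f"this run vs other page: {percent_higher(QUAD, OTHER_QUAD):.1f}% higher")
```

So four copies score roughly 2.6 times the single copy rather than 4 times, mostly because the File Copy and Shell Scripts (8 concurrent) tests barely improve with more copies.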

Incidentally, for some reason UnixBench does not detect the CPUs correctly and prints "0 CPUs in system;".

# cat /proc/cpuinfo
Processor       : ARMv7 Processor rev 5 (v7l)
processor       : 0
BogoMIPS        : 2400.00

processor       : 1
BogoMIPS        : 2400.00

processor       : 2
BogoMIPS        : 2400.00

processor       : 3
BogoMIPS        : 2400.00

Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

Hardware        : sun8i
Revision        : 0000
Serial          : 2400****************

The * part of Serial is masked.
The output does suggest there are four cores, but a lot of information seems to be missing. Maybe that is why the CPU information in UnixBench is off too.
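To make "information seems to be missing" a bit more concrete, here is a rough Python sketch that parses /proc/cpuinfo-style text, counts the lowercase "processor" entries, and reports which per-CPU fields are absent. The list of expected fields ("model name", "bogomips", "flags") is my assumption about what a tool written with x86 output in mind might look for. I have not confirmed that this is what the UnixBench Run script actually does.

```python
# Abbreviated copy of the /proc/cpuinfo output shown above.
SAMPLE = """\
Processor       : ARMv7 Processor rev 5 (v7l)
processor       : 0
BogoMIPS        : 2400.00

processor       : 1
BogoMIPS        : 2400.00

processor       : 2
BogoMIPS        : 2400.00

processor       : 3
BogoMIPS        : 2400.00

Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt
CPU implementer : 0x41
Hardware        : sun8i
"""

# Assumption: field names an x86-oriented parser might expect (case-sensitive).
ASSUMED_FIELDS = ("model name", "bogomips", "flags")


def parse_fields(text):
    """Split 'key : value' lines into (key, value) pairs."""
    pairs = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        pairs.append((key.strip(), value.strip()))
    return pairs


def count_processors(text):
    """Count entries whose key is exactly 'processor' (lowercase)."""
    return sum(1 for key, _ in parse_fields(text) if key == "processor")


def missing_fields(text, expected=ASSUMED_FIELDS):
    """Return the expected field names that never appear in the text."""
    present = {key for key, _ in parse_fields(text)}
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    print("processor entries:", count_processors(SAMPLE))
    print("missing fields:", missing_fields(SAMPLE))
```

On the sample above this finds four "processor" entries but none of the three assumed fields, since this kernel writes "BogoMIPS" and "Features" where the assumed names are "bogomips" and "flags". If UnixBench keys its CPU list on fields like these, that would be consistent with the "0 CPUs" message, but that part is a guess.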