Redis Benchmarks on the Raspberry Pi, on Raspbian and Arch Linux

I recently bought a Raspberry Pi (the 512 MB RAM model), so I ran the Redis benchmark on it right away. The operating systems are Raspbian and Arch Linux.

Raspbian is overclocked, using this setting:
High 950MHz ARM, 250MHz core, 450MHz SDRAM, 6 overvolt

$ uname -a
Linux raspberrypi 3.6.11+ #474 PREEMPT Thu Jun 13 17:14:42 BST 2013 armv6l GNU/Linux

$ dd if=/dev/zero of=/dev/null bs=100K count=100000
100000+0 records in
100000+0 records out
10240000000 bytes (10 GB) copied, 7.60731 s, 1.3 GB/s

$ dd if=/dev/zero of=deleteme bs=32M count=100
100+0 records in
100+0 records out
3355443200 bytes (3.4 GB) copied, 271.098 s, 12.4 MB/s
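
As a sanity check on the dd figures: the rate dd prints is just the byte count divided by the elapsed seconds, in decimal megabytes (10^6 bytes). Below is a minimal Python sketch of that arithmetic. The function name is my own, made up for illustration, and the inputs are the numbers copied from the output above.

```python
def dd_rate_mb_per_s(total_bytes: int, seconds: float) -> float:
    """Throughput in decimal MB/s (1 MB = 10**6 bytes), the unit dd prints."""
    return total_bytes / seconds / 1e6

# Figures copied from the SD card write test above.
sd_write = dd_rate_mb_per_s(3355443200, 271.098)
print(f"SD card write: {sd_write:.1f} MB/s")

# Figures copied from the /dev/zero to /dev/null test above.
mem_copy = dd_rate_mb_per_s(10240000000, 7.60731)
print(f"zero to null:  {mem_copy / 1000:.1f} GB/s")
```

The first test never touches the card, so it says more about CPU and memory than about storage. The second one is the number that reflects the SD card.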

$ my_repos/redis-2.6.14/src/redis-benchmark -q -n 10000
PING_INLINE: 2975.30 requests per second
PING_BULK: 3210.27 requests per second
SET: 2769.32 requests per second
GET: 2901.92 requests per second
INCR: 2787.84 requests per second
LPUSH: 2749.52 requests per second
LPOP: 2769.32 requests per second
SADD: 2790.18 requests per second
SPOP: 2937.72 requests per second
LPUSH (needed to benchmark LRANGE): 2720.35 requests per second
LRANGE_100 (first 100 elements): 1472.32 requests per second
LRANGE_300 (first 300 elements): 638.04 requests per second
LRANGE_500 (first 450 elements): 446.91 requests per second
LRANGE_600 (first 600 elements): 336.34 requests per second
MSET (10 keys): 1813.24 requests per second
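
To compare runs without retyping the numbers, the `-q` output can be parsed mechanically. This is a rough sketch that assumes every line has the shape "NAME: NUMBER requests per second", as in the output above. The function name and regular expression are my own, not something shipped with Redis.

```python
import re

# Matches lines such as "SET: 2769.32 requests per second".
LINE_RE = re.compile(r"^(?P<name>.+): (?P<rps>\d+(?:\.\d+)?) requests per second$")

def parse_redis_benchmark(text: str) -> dict:
    """Map each test name to its requests-per-second figure."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the pattern are skipped
            results[match.group("name")] = float(match.group("rps"))
    return results

sample = """PING_INLINE: 2975.30 requests per second
SET: 2769.32 requests per second
LRANGE_100 (first 100 elements): 1472.32 requests per second
"""
parsed = parse_redis_benchmark(sample)
print(parsed)
```

A line that was cut off before "requests per second" is ignored rather than reported as a result.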

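Summary: building Redis with make took quite a while, so I expected it to be slow, and sure enough it came in at around 3K req/sec. I then switched the overclock setting to Turbo and ran the Redis benchmark again: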

$ my_repos/redis-2.6.14/src/redis-benchmark -q -n 10000
PING_INLINE: 4255.32 requests per second
PING_BULK: 4810.00 requests per second
SET: 4001.60 requests per second
GET: 4228.33 requests per second
INCR: 3971.41 requests per second
LPUSH: 3947.89 requests per second
LPOP: 4079.97 requests per second
SADD: 4029.01 requests per second
SPOP: 4299.23 requests per second
LPUSH (needed to benchmark LRANGE): 3940.11 requests per second
LRANGE_100 (first 100 elements): 2073.83 requests per second
LRANGE_300 (first 300 elements): 866.75
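↑ It is fast, but it froze partway through and became unresponsive ^^; It did not even answer ping. After a reboot it hung during startup as well, so I had no choice but to boot while holding down Shift and lower the frequency in the config, after which it started normally. It is probably better to keep the overclock at about the High level.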
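
For reference, here is how much Turbo gained over High in the tests that completed before the hang. This is only a sketch of the arithmetic in Python. The numbers are copied from the two runs above, and the last, truncated Turbo line is left out.

```python
# Requests per second with the High overclock setting.
high = {
    "PING_INLINE": 2975.30, "PING_BULK": 3210.27, "SET": 2769.32,
    "GET": 2901.92, "INCR": 2787.84, "LPUSH": 2749.52, "LPOP": 2769.32,
    "SADD": 2790.18, "SPOP": 2937.72, "LRANGE_100": 1472.32,
}
# Requests per second with the Turbo setting (completed tests only).
turbo = {
    "PING_INLINE": 4255.32, "PING_BULK": 4810.00, "SET": 4001.60,
    "GET": 4228.33, "INCR": 3971.41, "LPUSH": 3947.89, "LPOP": 4079.97,
    "SADD": 4029.01, "SPOP": 4299.23, "LRANGE_100": 2073.83,
}

# Turbo throughput relative to High, per test.
speedup = {name: turbo[name] / high[name] for name in turbo}
for name, ratio in speedup.items():
    print(f"{name:12s} {ratio:.2f}x")
```

Every completed test lands between roughly 1.4x and 1.5x, so the gain is fairly uniform across commands.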

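Here are the Arch Linux results ↓. No overclocking. The speed is in the same ballpark (about 2K req/sec against about 3K on the overclocked Raspbian). That is only natural, since the Linux kernel is the same.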
$ uname -a
Linux alarmpi 3.6.11-12-ARCH+ #1 PREEMPT Tue Jun 11 16:09:48 CDT 2013 armv6l GNU/Linux

$ dd if=/dev/zero of=/dev/null bs=100K count=100000
100000+0 records in
100000+0 records out
10240000000 bytes (10 GB) copied, 9.33748 s, 1.1 GB/s

$ dd if=/dev/zero of=deleteme bs=32M count=10
10+0 records in
10+0 records out
335544320 bytes (336 MB) copied, 30.4456 s, 11.0 MB/s

$ my_repos/redis-2.6.14/src/redis-benchmark -q -n 10000
PING_INLINE: 2242.66 requests per second
PING_BULK: 2329.92 requests per second
SET: 1970.83 requests per second
GET: 2078.57 requests per second
INCR: 1887.15 requests per second
LPUSH: 1934.98 requests per second
LPOP: 1761.80 requests per second
SADD: 2023.88 requests per second
SPOP: 2153.78 requests per second
LPUSH (needed to benchmark LRANGE): 1902.23 requests per second
LRANGE_100 (first 100 elements): 1115.70 requests per second
LRANGE_300 (first 300 elements): 483.89 requests per second
LRANGE_500 (first 450 elements): 349.55 requests per second
LRANGE_600 (first 600 elements): 272.49 requests per second
MSET (10 keys): 1166.18 requests per second

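I also tried UnixBench on Arch Linux. The overall score was 72.9. For comparison, my Opteron 3280 at home scores 2331.1, the i7-3820 in the lab scores 7719.6, and my "hanage-saba" (鼻毛鯖) server scores 2655.8. Hmm, a sad result ^^;

That said, the day may come when ARM becomes mainstream. People have been saying for a long time that the MacBook might move to ARM, and there was in fact a story about an Apple intern porting Mac OS X to run on ARM. Even so, I would like to see an ARM chip that can compete with Intel's latest CPUs soon. ARM is said to be power efficient, but by some accounts it is no different from x86 CPUs in terms of power efficiency.
Some people argue that ARM servers will drive out x86, but I wonder whether the day will come when ARM beats the low-power x86 CPUs made for laptops.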

$ wget http://byte-unixbench.googlecode.com/files/UnixBench5.1.3.tgz
$ tar zxvf UnixBench5.1.3.tgz 
$ cd UnixBench
$ ./Run 
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: alarmpi: GNU/Linux
   OS: GNU/Linux -- 3.6.11-12-ARCH+ -- #1 PREEMPT Tue Jun 11 16:09:48 CDT 2013
   Machine: armv6l (unknown)
   Language: en_US.utf8 (charmap="UTF-8", collate="ANSI_X3.4-1968")
   16:48:50 up  3:32,  4 users,  load average: 0.49, 0.14, 0.08; runlevel unknown

------------------------------------------------------------------------
Benchmark Run: Sat Jun 29 2013 16:48:50 - 17:17:12
0 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        1671263.0 lps   (10.1 s, 7 samples)
Double-Precision Whetstone                      237.3 MWIPS (10.0 s, 7 samples)
Execl Throughput                                205.8 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks         32465.3 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks            9951.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks         75809.3 KBps  (30.0 s, 2 samples)
Pipe Throughput                              132478.4 lps   (10.1 s, 7 samples)
Pipe-based Context Switching                  19193.6 lps   (10.1 s, 7 samples)
Process Creation                                677.7 lps   (30.1 s, 2 samples)
Shell Scripts (1 concurrent)                    180.9 lpm   (60.4 s, 2 samples)
Shell Scripts (8 concurrent)                     25.4 lpm   (61.4 s, 2 samples)
System Call Overhead                         359444.5 lps   (10.1 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    1671263.0    143.2
Double-Precision Whetstone                       55.0        237.3     43.1
Execl Throughput                                 43.0        205.8     47.9
File Copy 1024 bufsize 2000 maxblocks          3960.0      32465.3     82.0
File Copy 256 bufsize 500 maxblocks            1655.0       9951.0     60.1
File Copy 4096 bufsize 8000 maxblocks          5800.0      75809.3    130.7
Pipe Throughput                               12440.0     132478.4    106.5
Pipe-based Context Switching                   4000.0      19193.6     48.0
Process Creation                                126.0        677.7     53.8
Shell Scripts (1 concurrent)                     42.4        180.9     42.7
Shell Scripts (8 concurrent)                      6.0         25.4     42.3
System Call Overhead                          15000.0     359444.5    239.6
                                                                   ========
System Benchmarks Index Score                                          72.9
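
As far as I understand, the overall UnixBench score is the geometric mean of the per-test index values, which is why a few weak results pull it down so much. Here is a minimal Python sketch that recomputes the score from the INDEX column above. The function name is my own.

```python
import math

def unixbench_index_score(indices):
    """Geometric mean of the per-test index values."""
    return math.exp(sum(math.log(x) for x in indices) / len(indices))

# INDEX column from the Arch Linux run above, top to bottom.
arch_indices = [143.2, 43.1, 47.9, 82.0, 60.1, 130.7,
                106.5, 48.0, 53.8, 42.7, 42.3, 239.6]

score = unixbench_index_score(arch_indices)
print(f"{score:.1f}")
```

The result agrees with the 72.9 reported by the Run script, up to rounding of the per-test values.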

Addendum, Monday, July 1, 2013

An I-O DATA class 10 SDHC memory card (16 GB) was on sale for 980 yen, so I bought one and ran the benchmarks on Raspbian. No overclocking.

$ uname -a
Linux raspberrypi 3.6.11+ #474 PREEMPT Thu Jun 13 17:14:42 BST 2013 armv6l GNU/Linux

$ dd if=/dev/zero of=/dev/null bs=100K count=100000
100000+0 records in
100000+0 records out
10240000000 bytes (10 GB) copied, 8.60358 s, 1.2 GB/s

$ dd if=/dev/zero of=deleteme bs=10M count=100
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB) copied, 61.8227 s, 17.0 MB/s

$ my_repos/redis-2.6.14/src/redis-benchmark -q
PING_INLINE: 2495.01 requests per second
PING_BULK: 2570.03 requests per second
SET: 2278.94 requests per second
GET: 2406.16 requests per second
INCR: 2231.64 requests per second
LPUSH: 2267.06 requests per second
LPOP: 2310.54 requests per second
SADD: 2280.50 requests per second
SPOP: 2442.60 requests per second
LPUSH (needed to benchmark LRANGE): 2244.67 requests per second
LRANGE_100 (first 100 elements): 1129.31 requests per second
LRANGE_300 (first 300 elements): 479.46 requests per second
LRANGE_500 (first 450 elements): 342.98 requests per second
LRANGE_600 (first 600 elements): 254.19 requests per second
MSET (10 keys): 1311.48 requests per second

# UnixBench 5.1.3
$ ./Run
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: raspberrypi: GNU/Linux
   OS: GNU/Linux -- 3.6.11+ -- #474 PREEMPT Thu Jun 13 17:14:42 BST 2013
   Machine: armv6l (unknown)
   Language: en_US.utf8 (charmap="ANSI_X3.4-1968", collate="ANSI_X3.4-1968")
   23:57:13 up  1:09,  3 users,  load average: 0.40, 0.32, 0.53; runlevel 2

------------------------------------------------------------------------
Benchmark Run: Sun Jun 30 2013 23:57:13 - 00:25:22
0 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        1676880.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                      268.1 MWIPS (10.0 s, 7 samples)
Execl Throughput                                246.4 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks         41452.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           13437.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks         94322.2 KBps  (30.0 s, 2 samples)
Pipe Throughput                              179376.1 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  24071.3 lps   (10.0 s, 7 samples)
Process Creation                                779.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                    436.6 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                     55.6 lpm   (60.4 s, 2 samples)
System Call Overhead                         382046.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    1676880.9    143.7
Double-Precision Whetstone                       55.0        268.1     48.7
Execl Throughput                                 43.0        246.4     57.3
File Copy 1024 bufsize 2000 maxblocks          3960.0      41452.8    104.7
File Copy 256 bufsize 500 maxblocks            1655.0      13437.5     81.2
File Copy 4096 bufsize 8000 maxblocks          5800.0      94322.2    162.6
Pipe Throughput                               12440.0     179376.1    144.2
Pipe-based Context Switching                   4000.0      24071.3     60.2
Process Creation                                126.0        779.9     61.9
Shell Scripts (1 concurrent)                     42.4        436.6    103.0
Shell Scripts (8 concurrent)                      6.0         55.6     92.7
System Call Overhead                          15000.0     382046.5    254.7
                                                                   ========
System Benchmarks Index Score                                          97.3

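The score went up by more than 20 points (72.9 to 97.3). The difference in storage performance seems to have a strong effect, but the Double-Precision Whetstone computation is also faster. The difference between Raspbian and Arch Linux could be a factor as well.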
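
To see where the gain comes from, the per-test index values of the two runs can be compared directly. This is a quick Python sketch using the INDEX columns copied from the Arch Linux run and the Raspbian run with the new card. Note that the two runs differ in both OS and SD card, so the ratios cannot separate the two factors.

```python
# INDEX column from the Arch Linux run.
arch = {
    "Dhrystone 2 using register variables": 143.2,
    "Double-Precision Whetstone": 43.1,
    "Execl Throughput": 47.9,
    "File Copy 1024 bufsize 2000 maxblocks": 82.0,
    "File Copy 256 bufsize 500 maxblocks": 60.1,
    "File Copy 4096 bufsize 8000 maxblocks": 130.7,
    "Pipe Throughput": 106.5,
    "Pipe-based Context Switching": 48.0,
    "Process Creation": 53.8,
    "Shell Scripts (1 concurrent)": 42.7,
    "Shell Scripts (8 concurrent)": 42.3,
    "System Call Overhead": 239.6,
}
# INDEX column from the Raspbian run with the class 10 card, same order.
raspbian = dict(zip(arch, [143.7, 48.7, 57.3, 104.7, 81.2, 162.6,
                           144.2, 60.2, 61.9, 103.0, 92.7, 254.7]))

# Largest relative gain first.
ratios = sorted(((raspbian[k] / arch[k], k) for k in arch), reverse=True)
for ratio, name in ratios:
    print(f"{ratio:.2f}x  {name}")
```

The two Shell Scripts tests more than double, while Dhrystone barely moves, so the gain is concentrated outside raw integer performance.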

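Addendum, July 2, 2013

I put heat sinks on the network chip and the CPU. This is on Raspbian with no overclocking. The temperature dropped by about 5 degrees on average. While running UnixBench and the like, the maximum is about this ↓.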
$ cat /sys/class/thermal/thermal_zone0/temp 
48692
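
The value in that sysfs file is in thousandths of a degree Celsius, so 48692 means about 48.7 degrees. Here is a small Python sketch for reading it. The function names are my own, and the path is the one used in the command above.

```python
from pathlib import Path

def millideg_to_celsius(raw: str) -> float:
    """Convert the raw sysfs string (millidegrees Celsius) to degrees."""
    return int(raw.strip()) / 1000.0

def read_cpu_temp(path: str = "/sys/class/thermal/thermal_zone0/temp") -> float:
    """Read the current temperature; only works where this file exists."""
    return millideg_to_celsius(Path(path).read_text())

# The reading shown above, converted without touching the hardware.
print(millideg_to_celsius("48692\n"))
```

Calling `read_cpu_temp()` in a loop during a benchmark would give the peak value instead of a single spot reading.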

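Incidentally, when I ran UnixBench again the score was a little higher (104.7 against 97.3). I wonder why. The environment is exactly the same as ↑. Maybe the load went down because the USB keyboard and HDMI cable were not plugged in?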
========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: raspberrypi: GNU/Linux
   OS: GNU/Linux -- 3.6.11+ -- #474 PREEMPT Thu Jun 13 17:14:42 BST 2013
   Machine: armv6l (unknown)
   Language: en_US.utf8 (charmap="ANSI_X3.4-1968", collate="ANSI_X3.4-1968")
   01:02:56 up 1 min,  2 users,  load average: 1.03, 0.46, 0.17; runlevel 2

------------------------------------------------------------------------
Benchmark Run: Tue Jul 02 2013 01:02:56 - 01:31:05
0 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        1691199.1 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                      270.9 MWIPS (10.0 s, 7 samples)
Execl Throughput                                268.9 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks         46980.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           15009.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        106418.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                              189392.4 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  26221.7 lps   (10.0 s, 7 samples)
Process Creation                                868.2 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                    473.8 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                     60.3 lpm   (60.7 s, 2 samples)
System Call Overhead                         381172.2 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    1691199.1    144.9
Double-Precision Whetstone                       55.0        270.9     49.3
Execl Throughput                                 43.0        268.9     62.5
File Copy 1024 bufsize 2000 maxblocks          3960.0      46980.8    118.6
File Copy 256 bufsize 500 maxblocks            1655.0      15009.0     90.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     106418.0    183.5
Pipe Throughput                               12440.0     189392.4    152.2
Pipe-based Context Switching                   4000.0      26221.7     65.6
Process Creation                                126.0        868.2     68.9
Shell Scripts (1 concurrent)                     42.4        473.8    111.7
Shell Scripts (8 concurrent)                      6.0         60.3    100.5
System Call Overhead                          15000.0     381172.2    254.1
                                                                   ========
System Benchmarks Index Score                                         104.7
