Uwubernetes - logo for Kubernetes v1.30
Kubernetes v1.30 was released yesterday, and although I have explored some interesting features in the upcoming Cilium v1.16 release, I have not yet written the steps down. I intend to rewrite the illumos part a bit: adapt it to include automatic node joining and try to simplify the steps, based on my experience with getting a control plane up and running on FreeBSD.
About two weeks ago I finally received a Turing RK1 module (Rockchip RK3588) that I ordered during my vacation in July, and I’m one step closer to proceeding with a plan I had hoped to start on during the winter — to have my worker nodes running on metal instead of running as bhyve guests.
Not that there’s anything wrong with bhyve, not at all! Especially not considering that the nvme backend performs rather well, as seen here — a quick sample from the first reasonable web search hit I found (https://medium.com/@krisiasty/nvme-storage-verification-and-benchmarking-49b026786297), so take it for what it is, but this is on a virtual guest with 4G of memory and 4 virtual cores. Way better than my experience with the virtio backend:
cat << EOF > nvme-seq-read.fio
[global]
name=nvme-seq-read
time_based
ramp_time=5
runtime=30
readwrite=read
bs=256k
ioengine=libaio
direct=1
numjobs=1
iodepth=32
group_reporting=1
[nvme0]
filename=/dev/nvme0n1
EOF
fio nvme-seq-read.fio
nvme0: (g=0): rw=read, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=32
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=400MiB/s][r=1598 IOPS][eta 00m:00s]
nvme0: (groupid=0, jobs=1): err= 0: pid=3674: Thu Apr 18 19:57:37 2024
read: IOPS=3015, BW=754MiB/s (791MB/s)(22.1GiB/30006msec)
slat (usec): min=20, max=98618, avg=324.23, stdev=518.38
clat (msec): min=2, max=123, avg=10.28, stdev= 8.47
lat (msec): min=2, max=124, avg=10.61, stdev= 8.72
clat percentiles (msec):
| 1.00th=[ 3], 5.00th=[ 4], 10.00th=[ 4], 20.00th=[ 4],
| 30.00th=[ 4], 40.00th=[ 5], 50.00th=[ 8], 60.00th=[ 11],
| 70.00th=[ 14], 80.00th=[ 18], 90.00th=[ 22], 95.00th=[ 26],
| 99.00th=[ 36], 99.50th=[ 40], 99.90th=[ 49], 99.95th=[ 61],
| 99.99th=[ 124]
bw ( KiB/s): min=293450, max=1670656, per=100.00%, avg=779886.51, stdev=430755.05, samples=59
iops : min= 1146, max= 6526, avg=3046.14, stdev=1682.62, samples=59
lat (msec) : 4=38.08%, 10=21.05%, 20=27.28%, 50=13.55%, 100=0.04%
lat (msec) : 250=0.03%
cpu : usr=2.40%, sys=97.55%, ctx=191, majf=0, minf=58
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=90489,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: bw=754MiB/s (791MB/s), 754MiB/s-754MiB/s (791MB/s-791MB/s), io=22.1GiB (23.7GB), run=30006-30006msec
Disk stats (read/write):
nvme0n1: ios=101251/713, merge=0/304, ticks=295519/1964, in_queue=297518, util=99.78%
And this one, a random-read run with 4k blocks and 16 jobs:
cat << EOF > nvme-rand-read.fio
[global]
name=nvme-rand-read
time_based
ramp_time=5
runtime=30
readwrite=randread
random_generator=lfsr
bs=4k
ioengine=libaio
direct=1
numjobs=16
iodepth=16
group_reporting=1
[nvme0]
new_group
filename=/dev/nvme0n1
EOF
fio nvme-rand-read.fio
nvme0: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.28
Starting 16 processes
Jobs: 16 (f=6): [f(2),r(2),f(4),r(1),f(4),r(3)][3.7%][r=349MiB/s][r=89.4k IOPS][eta 15m:43s]
nvme0: (groupid=0, jobs=16): err= 0: pid=4845: Thu Apr 18 20:02:18 2024
read: IOPS=87.6k, BW=342MiB/s (359MB/s)(10.0GiB/30021msec)
slat (usec): min=15, max=181136, avg=162.29, stdev=1669.78
clat (usec): min=2, max=181964, avg=2752.86, stdev=6578.19
lat (usec): min=53, max=182049, avg=2916.97, stdev=6752.52
clat percentiles (usec):
| 1.00th=[ 461], 5.00th=[ 482], 10.00th=[ 498], 20.00th=[ 519],
| 30.00th=[ 553], 40.00th=[ 660], 50.00th=[ 693], 60.00th=[ 717],
| 70.00th=[ 742], 80.00th=[ 865], 90.00th=[ 3163], 95.00th=[24511],
| 99.00th=[25035], 99.50th=[28443], 99.90th=[32900], 99.95th=[36963],
| 99.99th=[69731]
bw ( KiB/s): min=298836, max=383848, per=100.00%, avg=350382.76, stdev=994.23, samples=944
iops : min=74709, max=95962, avg=87595.24, stdev=248.57, samples=944
lat (usec) : 4=0.01%, 10=0.01%, 50=0.01%, 100=0.01%, 250=0.01%
lat (usec) : 500=11.78%, 750=59.74%, 1000=13.71%
lat (msec) : 2=4.74%, 4=0.07%, 10=0.63%, 20=1.99%, 50=7.34%
lat (msec) : 100=0.02%, 250=0.01%
cpu : usr=2.99%, sys=21.95%, ctx=18056, majf=0, minf=932
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=2628389,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=342MiB/s (359MB/s), 342MiB/s-342MiB/s (359MB/s-359MB/s), io=10.0GiB (10.8GB), run=30021-30021msec
Disk stats (read/write):
nvme0n1: ios=3062520/15, merge=0/3, ticks=169941/3, in_queue=169944, util=99.61%
But bare metal on a couple of RK1 modules is still interesting to me, just for the potential of having the Turing Pi (through its BMC API) boot and halt the tiny nodes on request via some custom node autoscaler; energy, noise and heat would really be in my favour. Well, that is one goal.
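As a very rough sketch of that idea, and nothing more: the node number, hostname and the exact BMC endpoint and query parameters below are assumptions on my part, so check the BMC API documentation for your firmware version before relying on any of it.

# Hypothetical helper: set power for a Turing Pi node slot (1 = on, 0 = off)
# through the BMC HTTP API. Endpoint and parameters are assumptions, verify
# them against the BMC API docs for your firmware.
tpi_node_power() {
  local node="$1" state="$2"
  curl -s "http://turingpi.local/api/bmc?opt=set&type=power&node${node}=${state}"
}

# A custom node autoscaler could call this to wake a worker when the cluster
# needs capacity, and power it off again once the node has been drained.
tpi_node_power 2 1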
Another goal of mine is to explore Talos, Pulumi, Dagger and some other projects a bit further. Time is a constraint I guess we all have to deal with.
As I mentioned initially, I’ve been looking into the upcoming v1.16 release of Cilium, and one of the many goodies is the ability to announce the ClusterIP CIDR to the network. This should probably excite a couple of you as much as it excites me! It means that the worker nodes can register themselves dynamically without any special route trickery. I haven’t gone into depth with this yet, but here’s a teaser (with a sketch of the BGP configuration after the route listing):
vtysh -c 'show ip route' |grep '10.22.14'
K>* 0.0.0.0/0 [0/0] via 10.22.14.62, enp0s6f1, 00:58:52
B>* 10.0.0.0/24 [20/0] via 10.22.14.12, enp0s6f1, weight 1, 00:54:18
B>* 10.0.1.0/24 [20/0] via 10.22.14.11, enp0s6f1, weight 1, 00:54:33
B>* 10.0.2.0/24 [20/0] via 10.22.14.15, enp0s6f1, weight 1, 00:54:38
C>* 10.22.14.0/26 is directly connected, enp0s6f1, 00:58:52
B>* 10.96.0.1/32 [20/0] via 10.22.14.11, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.12, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.15, enp0s6f1, weight 1, 00:54:18
B>* 10.96.0.10/32 [20/0] via 10.22.14.11, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.12, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.15, enp0s6f1, weight 1, 00:54:18
B>* 10.97.113.249/32 [20/0] via 10.22.14.11, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.12, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.15, enp0s6f1, weight 1, 00:54:18
B>* 10.99.55.64/32 [20/0] via 10.22.14.11, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.12, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.15, enp0s6f1, weight 1, 00:54:18
B>* 10.99.233.82/32 [20/0] via 10.22.14.11, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.12, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.15, enp0s6f1, weight 1, 00:54:18
B>* 10.105.105.220/32 [20/0] via 10.22.14.11, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.12, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.15, enp0s6f1, weight 1, 00:54:18
B>* 10.107.197.40/32 [20/0] via 10.22.14.11, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.12, enp0s6f1, weight 1, 00:54:18
* via 10.22.14.15, enp0s6f1, weight 1, 00:54:18
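For reference, here is a minimal sketch of what this could look like with the BGP resources in Cilium v1.16. The ASNs, selectors and the peer address (borrowed from the router in the teaser above) are illustrative values only, not a configuration I have finished testing:

cat << EOF | kubectl apply -f -
# Advertise Service ClusterIPs; the selector below matches all Services.
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: clusterip-services
  labels:
    advertise: clusterip
spec:
  advertisements:
    - advertisementType: Service
      service:
        addresses:
          - ClusterIP
      selector:
        matchExpressions:
          - { key: somekey, operator: NotIn, values: [ never-used-value ] }
---
# Peer settings; pick up the advertisement above by label.
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeerConfig
metadata:
  name: tor-peer
spec:
  families:
    - afi: ipv4
      safi: unicast
      advertisements:
        matchLabels:
          advertise: clusterip
---
# One BGP instance per node, peering with the upstream router.
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPClusterConfig
metadata:
  name: bgp-cluster
spec:
  bgpInstances:
    - name: instance-64512
      localASN: 64512
      peers:
        - name: tor
          peerASN: 64512
          peerAddress: 10.22.14.62
          peerConfigRef:
            name: tor-peer
EOF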
Yet another goal of mine is consolidating my two ZFS arrays, with 40k+ Power_On_Hours on the disks, into a new ZFS array with two attached mirrors. Fourteen disks will become four when I’m done with this. I have bought two disks from two vendors, with a couple of months between the purchases, the disks have been powered on for a varying number of hours, and so on (as I usually do with my disk purchases; in fact two of the disks in my zones array were bought on eBay just for the sake of having a different history).
Hopefully the risk of simultaneous failure will be mitigated, but this isn’t anything I want to rush. In the old arrays much of the data is redundant and fragmented, since I, in the early days, backed up whole (sometimes failing) disks as images when I felt uncertain whether the data was consistent. There is also ancient stuff, such as my old LXC environments with things like Debian sarge or early Fedora, that I will most certainly never need again (hoarding habits), rsync copies from mobiles and computers (not just home directories), and photos that were saved onto the first available USB disk. A data nightmare created several years ago is finally being taken care of; I have sketched roughly how the new pool and the migration might look after the listings below.
NAME                       STATE     READ WRITE CKSUM
zones                      ONLINE       0     0     0
  mirror-0                 ONLINE       0     0     0
    c0t5000C500AFDAEAF5d0  ONLINE       0     0     0
    c0t5000C500ED99A7B1d0  ONLINE       0     0     0
  mirror-1                 ONLINE       0     0     0
    c0t5000C500B216AB2Ed0  ONLINE       0     0     0
    c0t5000C500A5A08FDEd0  ONLINE       0     0     0
  mirror-2                 ONLINE       0     0     0
    c0t5000C500B27CE01Ad0  ONLINE       0     0     0
    c0t5000C500B1B036F6d0  ONLINE       0     0     0
  mirror-4                 ONLINE       0     0     0
    c0t50014EE25F0AC6DFd0  ONLINE       0     0     0
    c0t50014EE209BD7762d0  ONLINE       0     0     0

NAME                       STATE     READ WRITE CKSUM
domniosce                  ONLINE       0     0     0
  raidz2-0                 ONLINE       0     0     0
    c0t50014EE20E090574d0  ONLINE       0     0     0
    c0t50014EE2BAF0E656d0  ONLINE       0     0     0
    c0t50014EE65AAF5CCCd0  ONLINE       0     0     0
    c0t50014EE00376F6D6d0  ONLINE       0     0     0
    c0t50014EE605959C0Dd0  ONLINE       0     0     0
    c0t50014EE003772D83d0  ONLINE       0     0     0
for i in /dev/rdsk/c0t500*d0; do smartctl -a "${i}" | grep Power_On; done
9 Power_On_Hours 0x0032 041 041 000 Old_age Always - 52068 (173 109 0)
9 Power_On_Hours 0x0032 046 046 000 Old_age Always - 47657 (250 35 0)
9 Power_On_Hours 0x0032 047 047 000 Old_age Always - 47064 (239 141 0)
9 Power_On_Hours 0x0032 046 046 000 Old_age Always - 47657 (72 75 0)
9 Power_On_Hours 0x0032 047 047 000 Old_age Always - 47066 (72 228 0)
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 12564
9 Power_On_Hours 0x0032 044 044 000 Old_age Always - 41106
9 Power_On_Hours 0x0032 044 044 000 Old_age Always - 41073
9 Power_On_Hours 0x0032 008 008 000 Old_age Always - 67227
9 Power_On_Hours 0x0032 020 020 000 Old_age Always - 58829
9 Power_On_Hours 0x0032 008 008 000 Old_age Always - 67248
9 Power_On_Hours 0x0032 040 040 000 Old_age Always - 44322
9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 75732
9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 73562
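As for the consolidation itself, this is roughly the direction I’m leaning towards. It is just a sketch: the pool name and device names are placeholders, and each dataset will get its own send/receive pass rather than one big copy:

# Create the new pool as two mirrored vdev pairs (four disks in total).
# Pool and device names below are placeholders.
zpool create newtank \
  mirror c0tNEWDISK1d0 c0tNEWDISK2d0 \
  mirror c0tNEWDISK3d0 c0tNEWDISK4d0

# Migrate one dataset at a time via a recursive snapshot, replicating
# properties and child datasets; -u leaves the copy unmounted for now.
zfs snapshot -r zones/somedataset@migrate
zfs send -R zones/somedataset@migrate | zfs receive -u newtank/somedataset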
But over to the news of the day — Uwubernetes was released yesterday!
I’m happy to announce that I’ve successfully ported the current release (v1.30) of Kubernetes to illumos and OpenBSD (and also compiled the binaries for FreeBSD).
_output/bin/kubectl version -o yaml
clientVersion:
buildDate: "2024-04-18T16:51:06Z"
compiler: gc
gitCommit: 31799cad5ddf385f14b01fc81df99a662a54c9d2
gitTreeState: clean
gitVersion: v1.30.0-2+31799cad5ddf38
goVersion: go1.22.2
major: "1"
minor: 30+
platform: illumos/amd64
kustomizeVersion: v5.0.4-0.20230601165947-6ce0bf390ce3
Fetch the source/binaries at my GH repo https://github.com/tnorlin/kubernetes/releases
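If you would rather build from source than grab the binaries, something along these lines should work from a checkout of the fork. A sketch only, and as far as I recall cross-compiled binaries end up under _output/local/bin/<os>/<arch>/ rather than _output/bin/:

# Clone the fork and cross-build kubectl for illumos/amd64.
# Swap the platform for openbsd/amd64 or freebsd/amd64 as needed.
git clone https://github.com/tnorlin/kubernetes.git
cd kubernetes
KUBE_BUILD_PLATFORMS=illumos/amd64 make WHAT=cmd/kubectl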
So this ended up being a post about procrastination with few results, but at least I posted a teaser.