EC2 Graviton Windows 11 ARM inbox StorNVMe Driver IO Performance
A DD-installed Windows 11 ARM system on Graviton uses Microsoft's inbox StorNVMe.sys driver rather than the official AWS driver, AWSNVMe.sys. Does the inbox driver become an IO bottleneck? This article runs a full diskspd test suite covering IOPS and throughput, with the test volume provisioned above the instance's EBS limits so that the instance, not the volume, is the expected bottleneck.
Test Environment
| Item | Configuration |
|---|---|
| Instance type | t4g.large (2 vCPU / 8 GB, Graviton2) |
| OS | Windows 11 Pro 25H2 ARM64 (Build 26200) |
| NVMe driver | Microsoft stornvme.inf (inbox, 10.0.26100.8521) |
| System disk | 50 GB gp3 (3,000 IOPS / 125 MB/s) |
| Test disk | 100 GB gp3 (16,000 IOPS / 1,000 MB/s) |
| Test tool | diskspd 2.1 (ARM64 native build) |
t4g.large instance-level EBS burst limits: 2,780 Mbps of bandwidth (~347.5 MB/s) and 11,800 IOPS. The test volume is provisioned above both figures (its 1,000 MB/s is nearly three times the instance bandwidth limit), so on paper the instance, not the EBS volume, is the binding constraint.
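A quick sketch of the unit arithmetic behind these limits. AWS quotes EBS bandwidth in decimal megabits, while diskspd reports throughput in binary MiB/s, so the two have to be converted before comparing. The constants are copied from the text above; the "on paper" comparison simply picks the lower of the instance and volume figures.

```python
# Limits quoted in the article (copied from the text, not queried from AWS).
INSTANCE_BW_MBITS = 2_780   # t4g.large max EBS bandwidth, megabits/s (decimal)
INSTANCE_IOPS = 11_800      # t4g.large max EBS IOPS (nominal)
VOLUME_THROUGHPUT = 1_000   # test gp3 volume throughput, as listed in the table
VOLUME_IOPS = 16_000        # test gp3 volume provisioned IOPS

BYTES_PER_MB = 1_000_000    # decimal megabyte, as used by AWS
BYTES_PER_MIB = 1_048_576   # binary mebibyte, as reported by diskspd


def mbits_to_mb(mbits: float) -> float:
    """Convert megabits/s to decimal megabytes/s."""
    return mbits / 8


def mb_to_mib(mb: float) -> float:
    """Convert decimal MB/s to binary MiB/s."""
    return mb * BYTES_PER_MB / BYTES_PER_MIB


instance_mb_s = mbits_to_mb(INSTANCE_BW_MBITS)
instance_mib_s = mb_to_mib(instance_mb_s)

if __name__ == "__main__":
    print(f"instance bandwidth cap: {instance_mb_s:.1f} MB/s = {instance_mib_s:.1f} MiB/s")
    # 347.5 is far below 1,000 whether the volume figure is MB/s or MiB/s.
    print("bandwidth limited by:", "instance" if instance_mb_s < VOLUME_THROUGHPUT else "volume")
    print("IOPS limited by:", "instance" if INSTANCE_IOPS < VOLUME_IOPS else "volume")
```

The instance cap works out to about 331.4 MiB/s, which is the unit diskspd uses when reporting throughput.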
Driver Confirmation
```
DeviceName : Standard NVM Express Controller
DriverVersion : 10.0.26100.8521
InfName : stornvme.inf
```

Both NVMe controllers (system disk and test disk) are driven by the inbox StorNVMe driver; the official AWS driver is not involved.
Test Results
IOPS
| Scenario | Total IOPS | Avg Latency | P99 Latency | CPU% |
|---|---|---|---|---|
| Random 4K read-only | 15,696 | 8.15 ms | 12.86 ms | 51% |
| Random 4K write-only | 15,699 | 8.15 ms | 9.03 ms | 48% |
| Random 4K mixed (70R/30W) | 15,697 (read 11,000 + write 4,697) | 8.15 ms | 10.10 ms | 46% |
Throughput
| Scenario | Throughput (MiB/s) | Avg Latency | P99 Latency | CPU% |
|---|---|---|---|---|
| Sequential 128K read (QD=4 per thread) | 331.19 | 6.04 ms | 7.67 ms | 12% |
| Sequential 128K write (QD=4 per thread) | 154.85 | 12.91 ms | 20.46 ms | 16% |
| Sequential 1M read (QD=4 per thread) | 331.38 | 48.28 ms | 70.14 ms | 6% |
| Sequential 128K write (QD=16 per thread) | 331.37 | 24.14 ms | 33.20 ms | 14% |
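During testing, CloudWatch showed EBSIOBalance% bottoming out at 94% and EBSByteBalance% at 93%. Burst credits were never close to exhaustion, so the results were not distorted by credit depletion mid-run. Note that these are burst-mode figures: once the balance is exhausted, the instance falls back to its lower baseline EBS performance.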
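As a consistency check on both tables, Little's Law says the average number of IOs in flight equals completion rate times average latency. diskspd's -o sets outstanding IOs per thread per target, so with -t4 the total in flight should be four times the per-thread value. The sketch below assumes the throughput figures are MiB/s (diskspd's unit) and converts them to IOPS by dividing by the block size in MiB.

```python
THREADS = 4  # every command in this article uses -t4

# (scenario, IOPS, average latency in ms), copied from the two tables.
# Throughput rows: IOPS = MiB/s divided by block size in MiB.
ROWS = [
    ("Random 4K read",       15_696,         8.15),
    ("Random 4K write",      15_699,         8.15),
    ("Random 4K mixed",      15_697,         8.15),
    ("Seq 128K read -o4",    331.19 / 0.125,  6.04),
    ("Seq 128K write -o4",   154.85 / 0.125, 12.91),
    ("Seq 1M read -o4",      331.38 / 1.0,   48.28),
    ("Seq 128K write -o16",  331.37 / 0.125, 24.14),
]


def implied_outstanding(iops: float, avg_latency_ms: float) -> float:
    """Little's Law: average IOs in flight = completion rate x average latency."""
    return iops * avg_latency_ms / 1000


if __name__ == "__main__":
    for name, iops, lat in ROWS:
        total = implied_outstanding(iops, lat)
        print(f"{name:20s} {total:6.1f} in flight ({total / THREADS:5.1f} per thread)")
```

Every row comes out at 128, 16, or 64 IOs in flight, matching -o32, -o4, and -o16 multiplied by four threads. So the "QD" labels in the tables are per-thread values.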
Analysis
IOPS exceeds the documented 11,800?
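Measured IOPS reached ~15,700 instead of being capped at 11,800. AWS specifies t4g.large EBS performance as "up to 11,800 IOPS" (the burst limit), and with ample burst credits the instance delivered more than that nominal figure during the 60-second runs. Note also that ~15,700 is within 2% of the test volume's provisioned 16,000 IOPS, so for the 4K tests the volume, rather than the instance, may have been the effective cap. Either way, StorNVMe added no constraint of its own: a driver with performance problems would fall short of these numbers regardless of any EBS throttling.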
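The two ratios behind that paragraph, computed from the random 4K read result and the limits quoted in the test environment (a trivial sketch; the constant names are illustrative):

```python
MEASURED_IOPS = 15_696            # random 4K read-only, from the IOPS table
INSTANCE_NOMINAL_IOPS = 11_800    # t4g.large "up to" figure
VOLUME_PROVISIONED_IOPS = 16_000  # test gp3 volume

over_instance = MEASURED_IOPS / INSTANCE_NOMINAL_IOPS
of_volume = MEASURED_IOPS / VOLUME_PROVISIONED_IOPS

print(f"{over_instance:.2f}x the instance's nominal IOPS")  # 1.33x
print(f"{of_volume:.1%} of the volume's provisioned IOPS")   # 98.1%
```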
Write throughput: 155 MiB/s at QD=4, 331 MiB/s at QD=16
At low queue depth, sequential writes reach only 155 MiB/s, while reads hit 331 MiB/s at the same QD=4. EBS write latency (~13 ms) is roughly double read latency (~6 ms), and with a fixed number of IOs in flight, throughput is inversely proportional to per-IO latency: the same 16 outstanding IOs (4 per thread × 4 threads) complete about half as often. Raising the per-thread queue depth to 16 lifts write throughput to 331 MiB/s, matching reads and saturating the instance bandwidth limit; the extra in-flight IOs then wait behind the cap, which is why average latency rises to ~24 ms. This is a characteristic of the EBS write path, unrelated to the driver.
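A minimal model of this behaviour, assuming throughput is Little's Law clipped at a hard instance cap. The 12.91 ms write latency measured at -o4 is treated as the unloaded per-IO service time, and the 331.4 MiB/s cap is the 2,780 Mbps limit converted to MiB/s; both are simplifying assumptions, not measured internals of EBS.

```python
BLOCK_MIB = 128 / 1024            # 128 KiB block, in MiB
CAP_MIBPS = 331.4                 # 2,780 Mbps instance limit expressed in MiB/s
CAP_IOPS = CAP_MIBPS / BLOCK_MIB  # 128K IOs per second at the bandwidth cap


def predicted_mibps(outstanding: int, service_ms: float) -> float:
    """Little's Law throughput, clipped at the instance bandwidth cap."""
    uncapped = outstanding / (service_ms / 1000) * BLOCK_MIB
    return min(uncapped, CAP_MIBPS)


def predicted_latency_ms(outstanding: int, service_ms: float) -> float:
    """Below the cap, latency is the service time; above it, IOs queue
    and latency becomes outstanding / (IOPS allowed by the cap)."""
    return max(service_ms, outstanding / CAP_IOPS * 1000)


if __name__ == "__main__":
    WRITE_MS = 12.91  # assumed unloaded write service time (from the -o4 run)
    for total in (16, 64):  # -o4 and -o16, each times 4 threads
        print(f"{total:3d} in flight: {predicted_mibps(total, WRITE_MS):6.1f} MiB/s, "
              f"{predicted_latency_ms(total, WRITE_MS):5.2f} ms")
```

With 16 writes in flight the model gives about 155 MiB/s; with 64 the uncapped figure would be roughly 620 MiB/s, so the cap binds at 331.4 MiB/s and the predicted latency (about 24.14 ms) lines up with the measured 24.14 ms.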
CPU overhead
CPU usage during the IOPS tests is 46-51% of the 2 vCPUs, most of it kernel-mode IO processing (kernel time 40-50%). The throughput tests use only 6-16%. StorNVMe's CPU consumption at high IOPS looks reasonable, with no sign of abnormal software-emulation overhead.
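To compare CPU cost across tests with very different IO rates, it can help to normalize to CPU time per completed IO. This sketch assumes the CPU% column is the average across both vCPUs; since it covers the whole system, including diskspd itself and its -L latency bookkeeping, the result is an upper bound on what the driver alone costs.

```python
VCPUS = 2  # t4g.large


def cpu_us_per_io(cpu_pct: float, iops: float, vcpus: int = VCPUS) -> float:
    """Average CPU time per completed IO, in microseconds."""
    busy_cores = cpu_pct / 100 * vcpus  # assumes CPU% is averaged over all vCPUs
    return busy_cores / iops * 1e6


SAMPLES = {
    "random 4K read":    (51, 15_696),
    "random 4K write":   (48, 15_699),
    "seq 128K read -o4": (12, 331.19 / 0.125),  # MiB/s converted to IOPS
}

if __name__ == "__main__":
    for name, (pct, iops) in SAMPLES.items():
        print(f"{name:18s} {cpu_us_per_io(pct, iops):6.1f} us CPU per IO")
```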
Test Commands
```powershell
# Random 4K read-only - measure IOPS ceiling
diskspd.exe -b4K -r -o32 -t4 -w0 -d60 -Sh -L -c10G D:\testfile.dat
# Random 4K write-only - measure IOPS ceiling
diskspd.exe -b4K -r -o32 -t4 -w100 -d60 -Sh -L -c10G D:\testfile.dat
# Sequential 128K read - measure throughput ceiling
diskspd.exe -b128K -si -o4 -t4 -w0 -d60 -Sh -L -c10G D:\testfile.dat
# Sequential 128K write - high queue depth
diskspd.exe -b128K -si -o16 -t4 -w100 -d60 -Sh -L -c10G D:\testfile.dat
```

Parameters: -b block size; -r random or -si thread-interlocked sequential (all threads share one sequential offset); -o outstanding IOs per thread (per-thread queue depth, so with -t4 the total in flight is four times this value); -t4 4 threads; -w write percentage; -d60 run for 60 seconds; -Sh disable software caching and hardware write caching; -L collect latency statistics; -c10G create a 10 GB test file.
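Only four of the seven tested scenarios have a command listed. The sketch below generates all seven in the same flag order; the three rows marked "inferred" are assumptions (70/30 mix as -w30 at -o32, 128K write at -o4, 1M read as -b1M at -o4), chosen because they reproduce the in-flight counts implied by the tables. It prints by default; with dry_run=False it assumes diskspd.exe is on PATH and writes each report to a hypothetical <label>.txt file.

```python
import subprocess

TARGET = r"D:\testfile.dat"              # test file on the 100 GB gp3 volume
COMMON = ["-d60", "-Sh", "-L", "-c10G"]  # duration, no caching, latency stats, file size

# (label, block size, access pattern, per-thread queue depth, threads, write %)
# Rows marked "inferred" have no command in the article; their flags are assumptions.
SCENARIOS = [
    ("rand4k_read",        "4K",   "-r",  32, 4, 0),
    ("rand4k_write",       "4K",   "-r",  32, 4, 100),
    ("rand4k_mixed_70_30", "4K",   "-r",  32, 4, 30),   # inferred
    ("seq128k_read_o4",    "128K", "-si", 4,  4, 0),
    ("seq128k_write_o4",   "128K", "-si", 4,  4, 100),  # inferred
    ("seq1m_read_o4",      "1M",   "-si", 4,  4, 0),    # inferred
    ("seq128k_write_o16",  "128K", "-si", 16, 4, 100),
]


def build_args(block, access, qd, threads, write_pct, target=TARGET):
    """Assemble one diskspd command line in the same flag order as above."""
    return ["diskspd.exe", f"-b{block}", access, f"-o{qd}", f"-t{threads}",
            f"-w{write_pct}", *COMMON, target]


def run_all(dry_run=True):
    """Print every command, or run each one and save its text report."""
    for label, *params in SCENARIOS:
        args = build_args(*params)
        if dry_run:
            print(f"{label}: {' '.join(args)}")
            continue
        # Assumes diskspd.exe is on PATH and D: is the test volume.
        result = subprocess.run(args, capture_output=True, text=True, check=True)
        with open(f"{label}.txt", "w", encoding="utf-8") as report:
            report.write(result.stdout)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Keeping the scenario matrix as data makes it easy to rerun the whole suite after a driver change and diff the reports.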
Conclusion
The inbox StorNVMe driver is not an IO bottleneck on EC2 Graviton2 + Windows 11 ARM:
- IOPS reached ~15,700, exceeding the documented instance burst limit
- Throughput reached 331 MiB/s as reported by diskspd (about 347.5 MB/s in decimal units), effectively 100% of the 347.5 MB/s instance limit
- Both read and write can saturate instance bandwidth (write needs higher queue depth to compensate for latency)
- CPU overhead is normal, with no extra cost attributable to the driver layer
- There is no need to install the official AWS NVMe driver (in testing, installing it caused a boot failure)
