Nitro Card: Why AWS Is the Best!


One of my areas of interest is virtualization. It is a very complicated area spanning computer science and hardware. Why is AWS so special compared with Google Cloud and Microsoft Azure? Some people say it is the “first mover advantage.”

Generally speaking, yes. But from a computer scientist’s point of view, I think the AWS Nitro System is the key technology. Google Cloud and Microsoft Azure have nothing comparable, and the details are not well documented. In this post I want to summarize the Nitro Card, based on my understanding from AWS re:Invent.

In the early 2010s, the performance of AWS EC2 instances was not much different from other providers’, since everyone used commodity hardware and the open-source Xen hypervisor. To overcome the performance bottleneck of virtualization, I/O virtualization was the key technical area, and at that time everyone talked about SR-IOV for the network interface.

In 2015, a partnership between AWS and Annapurna Labs produced the EC2 C4 instance family, offloading network virtualization to custom hardware and an ASIC optimized for storage services.

In 2016, Amazon announced that it was acquiring the Israeli startup Annapurna Labs.


With Annapurna Labs, AWS introduced Storage Virtualization.


The EC2 C4 instance type uses the traditional Xen hypervisor together with Intel SR-IOV networking.
On C4, block storage is exposed as a Xen paravirtual block device (xvda), not NVMe.

# Hypervisor Info
[    0.000000] DMI: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
[    0.000000] Hypervisor detected: Xen HVM
[    0.000000] Xen version 4.11.

# Block Device
[    0.858936] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled; bounce buffer: disabled;
[    0.859416] scsi host1: ata_piix
[    0.863963] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14 lpm-pol 0
[    0.866375] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15 lpm-pol 0
[    0.867267]  xvda: xvda1 xvda14 xvda15 xvda16

# SR-IOV Intel NIC
[    4.511853] ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver
[    4.511857] ixgbevf: Copyright (c) 2009 - 2018 Intel Corporation.
[    4.535946] ixgbevf 0000:00:03.0: 02:64:96:26:40:13
[    4.535949] ixgbevf 0000:00:03.0: MAC: 1
[    4.535951] ixgbevf 0000:00:03.0: Intel(R) 82599 Virtual Function

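The kernel log lines above can be gathered and classified on any instance. Here is a minimal sketch; the `classify_platform` helper and the C4/C5 mapping in the comments are my own, based on the dmesg output shown above:

```shell
#!/bin/sh
# classify_platform: guess the EC2 virtualization platform from a saved
# kernel log, using the "Hypervisor detected" line the kernel prints at boot.
classify_platform() {
    if grep -q 'Hypervisor detected: Xen' "$1"; then
        echo "xen"       # pre-Nitro, e.g. C4: Xen HVM, xvd* block devices
    elif grep -q 'Hypervisor detected: KVM' "$1"; then
        echo "kvm"       # Nitro, e.g. C5 and later: NVMe + ENA devices
    else
        echo "unknown"
    fi
}

# Usage on a live instance:
#   dmesg > /tmp/kernlog.txt && classify_platform /tmp/kernlog.txt
```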


PCI Devices

root@ip-172-31-30-241:~# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
00:1f.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)

In the C4 instance type, there is no sign of the Nitro System: the PCI devices are emulated Intel chipset parts, the Xen platform device, and the Intel 82599 SR-IOV virtual function.



Intel 82599 10GbE

Image description

Key                     Description
Name                    Intel 82599 10 Gigabit Ethernet Controller
Launch Date             Q2’09
Lithography             65 nm
System Interface Type   PCIe v2.0 (5.0 GT/s)

The AWS Nitro System was adopted starting with the EC2 C5 instance family.
With the Nitro System, AWS changed the hypervisor from Xen to a KVM-based one.

# KVM Hypervisor
[    0.000000] efi: EFI v2.7 by EDK II
[    0.000000] efi: SMBIOS=0xbbe6a000 ACPI=0xbbf5d000 ACPI 2.0=0xbbf5d014 MEMATTR=0xba3f0518 MOKvar=0xbbe58000 INITRD=0xb9ecdf18
[    0.000000] secureboot: Secure boot disabled
[    0.000000] SMBIOS 2.7 present.
[    0.000000] DMI: Amazon EC2 c5.large/, BIOS 1.0 10/16/2017
[    0.000000] Hypervisor detected: KVM

[    0.014734] Booting paravirtualized kernel on KVM
[    0.202690] smpboot: CPU0: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz (family: 0x6, model: 0x55, stepping: 0x7)

# NVMe
[    0.455535] nvme nvme0: pci function 0000:00:04.0
[    0.465281] nvme nvme0: 2/0/0 default/read/poll queues
[    0.469270]  nvme0n1: p1 p14 p15 p16
[    0.992904] EXT4-fs (nvme0n1p1): mounted filesystem 6387caa1-122f-48bc-b447-9f1386d06e06 ro with ordered data mode. Quota mode: none.
[    2.899354] EXT4-fs (nvme0n1p1): re-mounted 6387caa1-122f-48bc-b447-9f1386d06e06 r/w. Quota mode: none.
[    3.729719] EXT4-fs (nvme0n1p16): mounted filesystem 9d39f3ff-b465-4c20-9ec1-06182226356c r/w with ordered data mode. Quota mode: none.

# Elastic Network Adapter (ENA)
[    3.409564] ena 0000:00:05.0: ENA controller version: 0.0.1 implementation version 1
[    3.430162] ena 0000:00:05.0: Elastic Network Adapter (ENA) found at mem c0510000, mac addr 06:4e:11:f5:80:6b
[    3.466230] ena 0000:00:05.0 ens5: renamed from eth0
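On a Nitro instance, the EBS volume behind an NVMe device can be identified with standard NVMe tooling. A minimal sketch, assuming the nvme-cli package is installed; the device name is illustrative, and the `ebs_volume_id` helper is my own, not AWS tooling:

```shell
#!/bin/sh
# On a live Nitro instance (run as root):
#   nvme id-ctrl /dev/nvme0 | grep -E '^(mn|sn)'
# The model name (mn) identifies the backing service ("Amazon Elastic Block
# Store"), and the serial number (sn) carries the EBS volume ID.
# ebs_volume_id extracts that ID from the id-ctrl output on stdin.
ebs_volume_id() {
    grep '^sn' | sed 's/^sn *: *//'
}
```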



PCI Devices

root@ip-172-31-44-203:~# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller
00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)



NVMe

00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller (prog-if 02 [NVM Express])
Subsystem: Amazon.com, Inc. NVMe EBS Controller
Physical Slot: 4
Latency: 0

...

Interrupt: pin A routed to IRQ 11
Region 0: Memory at c0514000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [70] Express (v2) Endpoint, MSI 00
PBA: BAR=0 offset=00003000
Kernel driver in use: nvme



AWS ENA Network Interface Card

ENA does not expose a link speed in its PCI information. In the Linux kernel, a single ENA driver can provide 10 GbE, 25 GbE, or 40 GbE.

00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
Physical Slot: 5

...

Capabilities: [b0] MSI-X: Enable+ Count=9 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Kernel driver in use: ena
Kernel modules: ena
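Since the PCI information carries no link speed, the negotiated rate has to come from the driver at runtime. A sketch; the interface name `ens5`, the sample speed, and the `link_speed_mbps` helper are assumptions of mine:

```shell
#!/bin/sh
# On a live instance, ethtool reports the speed the ENA driver advertises:
#   ethtool ens5 | grep -i 'Speed:'      # e.g. "Speed: 25000Mb/s"
# link_speed_mbps extracts the numeric rate (in Mb/s) from that output.
link_speed_mbps() {
    grep -i 'Speed:' | grep -oE '[0-9]+'
}
```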

At AWS re:Invent 2023, AWS described the Nitro server in “A deep dive on AWS infrastructure powering the generative AI boom”: https://d1.awsstatic.com/events/Summits/reinvent2023/CMP201_A-deep-dive-on-AWS-infrastructure-powering-the-generative-AI-boom.pdf

There are two Nitro Cards: one primary and one for networking.

From the image and form factor, these Nitro Cards appear to be Nitro v3, although the two cards differ slightly in size and color.

  • Front-panel cabling: most commodity servers use back-panel cabling, but AWS uses front-panel cabling.
  • No power supply unit: there is no power supply on the back side, which means the AWS Nitro server runs on DC power instead of AC.
  • No SSD: from the image, I cannot find any SSD such as an NVMe drive. This means the root volume of an EC2 instance is not located on a local SSD; the Nitro Card provides the instance store remotely.


After looking at the two different Nitro Cards, my question was: what is the functionality of each card?

A hint can be found in AWS Outposts.

AWS Outposts uses the same hardware as AWS data centers.


  • IPMI UTP cable: the green UTP cable on the left is the IPMI port.
  • Orange UTP on the left Nitro Card: Nitro controller APIs, i.e. the EC2 control-plane network.


From the features above, an EC2 VM has a single ENA, EBS, and local storage. The local storage and EBS are probably handled by the right Nitro Card, and the ENA by the left Nitro Card.



No 802.3ad bonding interface

One of the hardware challenges is that there is no redundancy against a Nitro Card failure. Commodity servers usually create a bonding interface, coupling two physical interfaces into a single logical interface, but the AWS Nitro server does not.
This is still an understandable architecture, since the network interface and storage devices are presented as PCIe devices with SR-IOV.
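For contrast, a commodity server would typically aggregate two NICs with 802.3ad (LACP) bonding, roughly like this (a configuration sketch; interface names are illustrative, and the commands must run as root on a kernel with the bonding module):

```shell
# Create an 802.3ad (LACP) bond from two physical NICs -- the kind of
# network redundancy the Nitro server deliberately omits.
ip link add bond0 type bond mode 802.3ad
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0
ip link set bond0 up && ip link set eth0 up && ip link set eth1 up
```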

Remote NVMe via the Nitro Card is one of the key technologies in AWS.

With commodity hardware and open standards, we would need a PCIe switch and separate PCI Express networking equipment with MR-IOV (Multi-Root I/O Virtualization) support. The Nitro Card instead delivers all of these features over ordinary Ethernet networking.

We call this kind of Nitro Card a SmartNIC or DPU (Data Processing Unit), a new class of programmable processor.



Nvidia BlueField-3

Nvidia acquired Mellanox in 2020.




AMD Pensando DSC

AMD acquired Pensando for its DPU future in 2022.





By stp2y
