This article is translated to English with the help of ChatGPT. The original Chinese version is at 基于 Rust 的 K230 裸机嵌入式编程
Difficulty: Intermediate. Readers should have foundational knowledge of embedded systems and Rust embedded development.
This article documents the process of conducting bare-metal development using Rust on the K230 chip. From analyzing the startup process, parsing the firmware format, writing bare-metal Rust programs, perfecting initialization code, to actual peripheral control and function implementation, and exploring optimization schemes in subsequent development—all have been thoroughly investigated.
Relevant code repository for this article: k230-bare-metal. It is recommended to refer to earlier commits such as e15968040 for better understanding.
Project Background
Previously, I received a review opportunity for the Luchan Pi K230-CanMV development board from LCSC (the same group as JLC). In addition, I also own a CanMV-K230 development board.
The K230 chip is an AIoT SoC launched by Canaan Technology, adopting a heterogeneous unit acceleration computing architecture. It integrates 2 RISC-V computing cores and an AI subsystem KPU (Knowledge Process Unit). In terms of the timeline, it should be one of the earliest chips on the market to support the RVV 1.0 vector extension. Main features include:
- Dual-core RISC-V processor:
  - Core 0: 64-bit RISC-V (RV64GCB), 800MHz
  - Core 1: 64-bit RISC-V, 1.6GHz, supports RVV 1.0 vector extension
- Dedicated acceleration units:
  - KPU: AI inference accelerator, supports INT8/INT16
  - DPU: 3D structured light depth calculation unit
  - VPU: Video codec, supports 4K resolution
- Rich peripheral interfaces:
  - Communication interfaces: UART×5, I2C×5, SPI×3
  - Storage interfaces: USB 2.0×2, SD/eMMC
  - Others: GPIO×72, PWM×6, WDT, RTC, Timer
Under normal usage, the development board runs the CanMV firmware, which is compatible with OpenMV, providing a very convenient development environment for developers.
The firmware is based on RT-Thread Smart (RT-Smart), which is a version of RT-Thread that supports user-space applications, suitable for SoCs with MMU, such as the K230. CanMV is implemented as an application (a fork of MicroPython) on RT-Thread.
Additionally, early versions of the CanMV firmware used Linux + RT-Thread + MicroPython. The official sources also provide a pure Linux version of the firmware.
This project aims to explore:
- The differences in startup methods and usage modes between MPUs and MCUs
- How to use Rust for bare-metal development on MPU chips
- The underlying startup mechanism and hardware features of the K230
For MPUs and most MCUs, there is an on-chip Boot ROM used to start the system. Typically, the Boot ROM initializes some hardware (e.g., SPI Flash, TF Card), loads the firmware into memory, and then executes the first instruction of the system firmware (such as U-Boot). Subsequently, the system firmware provided by the user further initializes more hardware and loads the actual operating system.
Bare-metal development refers to running programs directly on the hardware without using an operating system, similar to how an MCU runs directly after the system’s Boot ROM.
Boot Code Analysis
First, we need to read the code from the official CanMV repository to determine if there are any non-open-source parts, especially core components like U-Boot and RT-Thread/Linux drivers.
For U-Boot, we also need to confirm whether the Secondary Program Loader (SPL) is open-source. SPL is often used to initialize peripherals like DDR and to load U-Boot. Many manufacturers do not open-source it and only provide binary files.
Note: SPL literally means Secondary Program Loader. Boot ROM is generally considered the first-stage loader.
The good news is that the relevant code is all in the CanMV repository and open-source. However, the code structure is relatively complex, requiring some time to read and analyze the specific startup processes and logic.
With the advent of ChatGPT, we can complete code analysis more quickly. I once joked that if ChatGPT had appeared a few years earlier, many toolchains would not need to exist.
Here, we only consider the TF card startup scenario, where the system firmware is on the TF card, and the on-chip Boot ROM loads the firmware into memory. That is, our program needs to perform the same tasks as U-Boot, including the functions of SPL.
Note: TF card, SD card, and eMMC are essentially the same at the protocol level. This article does not make strict distinctions among them.
From Power-On Reset to Loading and Executing User Firmware
First, the Boot ROM loads the firmware into memory. This logic is fixed in the chip's Boot ROM: its code is integrated inside the chip and cannot be modified or interfered with by the user. The Boot ROM determines the boot method by reading the BOOT0 and BOOT1 pins; the voltage levels of these two pins decide from which medium the chip loads the boot program during startup.
According to the chip manual, the Boot ROM is memory-mapped at 0x9120_0000 ~ 0x9121_0000 and uses the first half of the SRAM, 0x8020_0000 ~ 0x8030_0000, as its working memory. This can be confirmed from a bare-metal program by reading registers such as sp and ra at entry. For example, the Boot ROM sets the stack pointer sp to the highest address of its available memory, and because it typically transfers control to the user firmware with a call instruction, ra holds the address following that call inside the Boot ROM.
The Boot ROM loads the firmware (usually U-Boot) from the TF card according to a predetermined fixed format. Specifically, the Boot ROM accesses the TF card, reads the firmware area, decodes it, and copies it to the specified memory location 0x8030_0000.
After the firmware is loaded, the Boot ROM transfers control to the firmware just loaded into memory, i.e., it jumps to execute U-Boot. This marks the transition of the startup process from the Boot ROM stage to the firmware (U-Boot) stage. U-Boot, as a more powerful bootloader, can further initialize system hardware, load the operating system kernel such as RT-Thread or Linux Kernel, and execute other user-defined startup tasks.
U-Boot has a two-stage startup process: SPL and U-Boot. SPL is used to initialize peripherals like DDR and to load U-Boot. We will not consider the logic after U-Boot (e.g., OpenSBI, RT-Thread Smart) in this article. In the firmware image, these later stages exist as firmware partitions: U-Boot SPL loads U-Boot, and U-Boot then loads the RT-Thread or Linux kernel.
The K230 is equipped with two CPUs, referred to as CPU0 (small core) and CPU1 (big core). The two cores operate at different frequencies, and CPU1 supports the RVV 1.0 vector extension, constituting a heterogeneous multi-core architecture.
During the startup process, when the chip’s reset signal is released, the Boot ROM starts execution on the small core. This means that CPU0 is the first activated core, responsible for executing the initial boot program and performing basic system initialization. Meanwhile, the de-reset process of the big core is controlled by the small core. In other words, while the small core completes its own initialization, it also needs to send instructions to release the reset state of the big core, allowing it to start running from a specific location. This architectural design ensures that the small core not only shoulders the responsibility of booting the system but also controls the startup process of the big core, laying the foundation for the entire SoC to begin functioning.
Firmware Format
To ensure our firmware is recognized by the Boot ROM, it needs to conform to a specific firmware format. Different SoC manufacturers have different solutions; some use fixed filenames on FAT32, some use fixed formats at specific offsets, and some use configuration files. The K230 uses a fixed offset firmware format.
The Boot ROM of the K230 identifies data characteristics at a fixed offset on the TF card, and firmware that meets the format will be loaded into memory. The Boot ROM has initialized UART0 and will output simple error messages, such as “boot failed with exit code 19” indicating that the TF card was not found, or “boot failed with exit code 13” indicating a firmware format error.
After analyzing the relevant compilation process, we deduced the firmware format of the K230 as follows:
00000000 +-------------+-------------+-------------+-------------+
| ........... | ........... | ........... | ........... | <- Partition table / any other data
| ........... | ........... | ........... | ........... |
+-------------+-------------+-------------+-------------+
00100000 | 4B 32 33 30 | 8C FC 02 00 | 00 00 00 00 | BF 8D 0F 38 | <- Firmware header: "K230...........8"
| MAGIC: K230 | Length | Encryption | SHA256 hash | <- Encryption 0: none, 1: SM4, 2: AES+RSA
+-------------+-------------+-------------+-------------+
00100010 | 03 F3 87 07 | FA 1B D8 1D | 4F A0 CD A0 | 7B 54 35 BD | <- SHA256 hash continuation
+-------------+-------------+-------------+-------------+
00100020 | 35 82 85 89 | 66 4D AC 27 | CA F8 56 49 | 00 00 00 00 | <- SHA256 hash continuation + Padding
+-------------+-------------+-------------+-------------+
00100030 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | 00 00 00 00 | <- Padding zeros
+-------------+-------------+-------------+-------------+
| ........... | ........... | ........... | ........... | <- Padding zeros
+-------------+-------------+-------------+-------------+
00100210 | 00 00 00 00 | 73 25 40 F1 | 2A 82 AE 84 | 93 01 00 00 | <- Firmware data, length zero position
| Version | OpCodes | Data | Padding | <- Version: 0
| ........... | ........... | ........... | ........... | <- Firmware data, raw opcodes
+-------------+-------------+-------------+-------------+
Relevant C structure definitions are located in CanMV at src/uboot/uboot/board/kendryte/common/board_common.h.
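To make the layout above concrete, here is a minimal host-side sketch in Rust that reads the header fields back out of an image. The names FirmwareHeader and parse_header are hypothetical and not taken from the CanMV sources, and the sketch only checks structure: it does not verify the SHA-256 digest.

```rust
/// Byte offset of the firmware header inside the TF card image.
const HEADER_OFFSET: usize = 0x10_0000;
/// 4 (magic) + 4 (length) + 4 (encryption) + 516 (hash field plus padding) = 0x210.
const HEADER_SIZE: usize = 4 + 4 + 4 + 516;

#[derive(Debug, PartialEq)]
struct FirmwareHeader {
    /// Length of the version word plus the raw firmware, in bytes.
    length: u32,
    /// 0: none, 1: SM4, 2: AES+RSA.
    encryption: u32,
    /// SHA-256 digest stored in the header (not verified here).
    sha256: [u8; 32],
    /// Version word stored at the start of the firmware data.
    version: u32,
}

/// Read a little-endian u32 from the first four bytes of a slice.
fn le_u32(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

fn parse_header(img: &[u8]) -> Result<FirmwareHeader, &'static str> {
    let data_start = HEADER_OFFSET + HEADER_SIZE;
    if img.len() < data_start + 4 {
        return Err("image too short");
    }
    let hdr = &img[HEADER_OFFSET..data_start];
    if &hdr[0..4] != b"K230" {
        return Err("bad magic");
    }
    let length = le_u32(&hdr[4..8]);
    if length < 4 || img.len() < data_start + length as usize {
        return Err("bad length");
    }
    let mut sha256 = [0u8; 32];
    sha256.copy_from_slice(&hdr[12..44]);
    Ok(FirmwareHeader {
        length,
        encryption: le_u32(&hdr[8..12]),
        sha256,
        version: le_u32(&img[data_start..data_start + 4]),
    })
}
```

Running such a check on a generated image before flashing is a cheap way to catch layout mistakes that would otherwise only show up as a Boot ROM exit code on the serial port.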
Here, we simplify the processing by not encrypting the firmware and using version number 0. We write a Python script to create the .img firmware file for the TF card image:
#!/usr/bin/env python3
# genimage.py
import hashlib

MAGIC = b"K230"
VERSION = b"\x00\x00\x00\x00"


def sha256(message):
    return hashlib.sha256(message).digest()


with open("./firmware.bin", "rb") as f:
    data = f.read()

# The length and the hash both cover the version word plus the raw firmware
input_data = VERSION + data
raw_data_len = len(input_data).to_bytes(4, byteorder="little")
encryption_type = (0).to_bytes(4, byteorder="little")  # 0: no encryption
hash_data = sha256(input_data)

firmware = MAGIC + raw_data_len + encryption_type + hash_data
firmware += bytes(516 - 32)  # padding
firmware += input_data

img = bytes(0x100000) + firmware  # image offset 0x100000

# Ensure the image size is a multiple of 512 bytes
if len(img) % 512 != 0:
    img += bytes(512 - len(img) % 512)

with open("./firmware.img", "wb") as f:
    f.write(img)

print("len", len(img))
Here, firmware.bin is generated via objcopy -O binary:
cargo objcopy --release -- -O binary firmware.bin && python3 genimage.py
Note that disk images are generally aligned to 512 bytes, so the script pads the image to a multiple of 512 bytes.
Flashing the firmware can be done using any programming tool, including the dd command.
Start Writing Some Bare-Metal Code
With the firmware loading settled, the SoC control flow can be handed over to our program. Here, we use the Rust language to write a bare-metal program.
Essential elements for Rust bare-metal embedded development include:
- Toolchain target: install using rustup: rustup target add riscv64gc-unknown-none-elf
- Linker script link.x: used to define the memory layout (it can also directly define the firmware layout)
- Startup code: used to initialize the stack and jump to Rust code, similar to start.S in C embedded development
From the relevant code reading, we know that the code in the TF card is loaded to 0x8030_0000 ~ 0x8040_0000. To avoid additional uncertainties, we can directly use the linker script from U-Boot to ensure the symbols defined in Rust code are properly loaded.
MEMORY { .spl_mem : ORIGIN = 0x80300000, LENGTH = 0x80000 }
MEMORY { .bss_mem : ORIGIN = 0x80380000, LENGTH = 0x20000 }
OUTPUT_ARCH("riscv")
ENTRY(_start)
PROVIDE(__stack_start__ = ORIGIN(.bss_mem) + LENGTH(.bss_mem));
/* Omitted specific section definitions */
Due to the lack of first-hand chip development materials, we do not know exactly what the initialized state is after the Boot ROM; at this time, we can only rely on speculation and experimentation.
Verifying Bare-Metal Execution – UART
For bare-metal programming, we need to set up the processor's initial state ourselves, including the stack pointer sp, the execution mode, the interrupt table, interrupt enables, etc. These tasks are usually completed by start.S or crt0.c. Minimal initialization code often only needs to set the stack pointer sp so that functions can be called. If sp is invalid, any use of the stack (e.g., a function call) leads to memory access violations or illegal instruction exceptions, i.e., the program "runs wild."
Without a JTAG debugging environment (the chip supports it, but I didn’t use CK-LINK), how do we determine whether our code is being executed and whether it is executing correctly? Here, we can use UART0 to output debugging information. Since the Boot ROM has already initialized UART0, we can use it directly.
From the Device Tree .dtsi files in the U-Boot source code, we can see that the K230 uses a lot of DesignWare IP peripherals, such as UART0, SPI, I2C, etc. The specific register manuals for these peripherals can be obtained online. The UART peripheral is compatible with the 16550, which is the serial port chip we're familiar with on PCs. The register address for UART0 is 0x9140_0000.
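As a small illustration of what "16550 compatible" means in practice, the sketch below computes register addresses and checks the transmitter-ready flag. It assumes the usual DesignWare layout in which the 16550 registers sit on a 4-byte stride (reg-shift of 2), so LSR is at offset 0x14; the names Reg, reg_addr, and tx_ready are made up for this example, and the stride should be verified against the .dtsi before relying on it.

```rust
/// UART0 base address on the K230 (from the article).
const UART0_BASE: usize = 0x9140_0000;
/// Assumption: 16550 registers are spaced 4 bytes apart (reg-shift = 2).
const REG_STRIDE: usize = 4;

/// Classic 16550 register indices.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Reg {
    Thr = 0, // transmit holding register (write) / receive buffer (read)
    Ier = 1, // interrupt enable
    Fcr = 2, // FIFO control (write) / interrupt identification (read)
    Lcr = 3, // line control
    Mcr = 4, // modem control
    Lsr = 5, // line status
}

/// Address of a register for a given UART base.
fn reg_addr(base: usize, reg: Reg) -> usize {
    base + (reg as usize) * REG_STRIDE
}

/// LSR bit 5: transmit holding register empty.
const LSR_THRE: u32 = 1 << 5;

/// True when a new byte may be written to THR.
fn tx_ready(lsr: u32) -> bool {
    lsr & LSR_THRE != 0
}
```

With this layout, writing a byte to reg_addr(UART0_BASE, Reg::Thr) once tx_ready reports true for the value read from the LSR address is all that polled output needs.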
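We can use global_asm! to set up the stack and then output characters from Rust to verify whether the firmware code is being executed. For example:
#![no_std]
#![no_main]
use core::arch::{asm, global_asm};
use core::panic::PanicInfo;
global_asm!(r#"
    .section .text.start
    .global _start
_start:
    la sp, __stack_start__
    call _start_rust
"#);
#[no_mangle]
pub extern "C" fn _start_rust() -> ! {
    loop {
        // UART0.THR = 'A'
        unsafe { core::ptr::write_volatile(0x9140_0000 as *mut u32, 0x41) };
        // Crude busy-wait between characters
        for _ in 0..100_000_000 {
            unsafe { asm!("nop") }
        }
    }
}
// A no_std binary must provide its own panic handler
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}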
After compiling and flashing the above code, you should see a series of A characters in the serial terminal. This indicates that our code has been successfully executed.
Accessing Peripheral Registers – PAC
In Rust embedded development, accessing peripheral registers is often done through PAC (Peripheral Access Crate), such as the stm32xxxx-pac crate. However, since the K230 is a relatively new chip, there is no relevant PAC crate available. The official sources are also unlikely to provide an SVD file for reference. Therefore, I chose to use the chiptool method and employed the yaml2pac tool to generate the PAC crate by manually maintaining YAML definitions of the peripheral registers. Regarding PAC access, please refer to my article Peripheral Register Access in Rust Embedded Development: From svd2rust to chiptool and metapac – Using hpm-data as an Example.
The relevant YAML files can be conveniently created with the assistance of an LLM (Large Language Model), starting from text extracted by OCR from the PDF manuals.
Using the yaml2pac tool, we can easily generate our own PAC library:
yaml2pac -i registers/uart_dw.yaml -o pac/src/uart_dw.rs
Then, add specific peripheral address definitions in lib.rs:
#[path = "uart_dw.rs"]
pub mod uart;
pub const UART0: uart::Uart = unsafe { uart::Uart::from_ptr(0x9140_0000 as *mut ()) };
pub const UART1: uart::Uart = unsafe { uart::Uart::from_ptr(0x9140_1000 as *mut ()) };
pub const UART2: uart::Uart = unsafe { uart::Uart::from_ptr(0x9140_2000 as *mut ()) };
pub const UART3: uart::Uart = unsafe { uart::Uart::from_ptr(0x9140_3000 as *mut ()) };
With simple encapsulation, we can conveniently access peripherals. Creating and maintaining a PAC in the absence of documentation is relatively difficult, but once completed, it can greatly improve development efficiency.
Facilitating Debugging – println! Macro
With the peripheral register definitions, we can now write a complete UART HAL driver or implement a println! macro through simple register access.
use core::arch::asm;
#[derive(Debug)]
pub struct Console;
impl core::fmt::Write for Console {
    fn write_str(&mut self, s: &str) -> core::fmt::Result {
        use pac::UART0;
        for c in s.as_bytes() {
            unsafe {
                // Wait until the transmit holding register is empty
                while !UART0.lsr().read().thre() {
                    asm!("nop");
                }
                UART0.thr().write(|w| w.set_thr(*c));
            }
        }
        Ok(())
    }
}
#[macro_export]
macro_rules! println {
    () => {
        {
            use core::fmt::Write;
            writeln!(&mut $crate::Console, "").unwrap();
        }
    };
    ($($arg:tt)*) => {
        {
            use core::fmt::Write;
            writeln!(&mut $crate::Console, $($arg)*).unwrap();
        }
    };
}
With the println! macro, we can conveniently output debugging information, significantly improving development efficiency.
Complete Initialization Code
So far, we've only initialized the stack; other essentials such as interrupt handling and even the .bss section have not been initialized. A complete embedded program needs them.
Unlike on an MCU, where code typically executes in place from flash, on an MPU the Boot ROM loads the whole image into RAM, so the .data section copy commonly seen in an MCU's start.S is not needed. Whether the .bss section must be cleared depends on the situation; since clearing it is relatively simple, we skip memory initialization in this section.
Interrupt Handler
For RISC-V, the interrupt handler is a special function. Rust provides the "riscv-interrupt-m" ABI specifically for the special logic of interrupt handlers. Specifically, it adds stack frame preservation and restoration for interrupt handlers and uses the mret instruction instead of ret to return.
#[link_section = ".trap"]
#[no_mangle]
unsafe extern "riscv-interrupt-m" fn _start_trap_rust() {
    println!("trap!");
    let mcause = riscv::register::mcause::read();
    println!("mstatus: {:016x}", riscv::register::mstatus::read().bits());
    println!("mcause: {:016x}", mcause.bits());
    println!("mtval: {:016x}", riscv::register::mtval::read());
    println!("mepc: {:016x}", riscv::register::mepc::read());
    loop {}
}
Here, we print some important interrupt information to help determine whether the interrupt function is being called correctly.
#[no_mangle] exposes the symbol so that we can set the interrupt handler entry address in assembly code.
#[link_section = ".trap"] places this function in the .trap section so that the linker script can control its placement, especially its alignment (ALIGN(8)). Getting this wrong is a common error when writing bare-metal code: the address written to the mtvec register must be aligned because the lower 2 bits are occupied by the vector mode bits; otherwise, the trap entry will be wrong and typically ends in an exception.
For now, we don't need to handle interrupts; we just need to observe whether interrupts are triggered and whether the interrupt handler is executed. So the handler ends in loop {}.
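The alignment requirement can be illustrated with a small host-side sketch of how mtvec packs the handler address and the mode into one value. The layout (MODE in bits 1:0, BASE above) follows the RISC-V privileged specification; the function names are hypothetical, and an implementation may demand stricter alignment in vectored mode than the 4 bytes checked here.

```rust
/// mtvec MODE field values from the RISC-V privileged specification.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TrapMode {
    Direct = 0,
    Vectored = 1,
}

/// Build the value to write to mtvec, or None if the handler address
/// would collide with the two MODE bits.
fn encode_mtvec(handler: u64, mode: TrapMode) -> Option<u64> {
    if handler & 0b11 != 0 {
        return None;
    }
    Some(handler | mode as u64)
}

/// Split an mtvec value back into (base address, mode bits).
fn decode_mtvec(value: u64) -> (u64, u64) {
    (value & !0b11, value & 0b11)
}
```

A handler linked at an address ending in 0x2, for instance, cannot be represented: its low bits would be read back as a mode rather than as part of the address, which is why the linker script forces the alignment.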
Interrupt Initialization
For RISC-V, initializing interrupts generally involves the following steps:
- Set mtvec: the interrupt handler entry address
- Set the MIE bit in mstatus: globally enable machine-mode interrupts
- Set the MEIE, MTIE, and MSIE bits in mie: enable external, timer, and software interrupts respectively
The K230 uses a Xuantie C908 core, supporting CLINT and PLIC interrupt controllers. Relevant information can be obtained from the C908 manual.
global_asm!("
    .section .text.start
    .global _start
_start:
    .option push
    .option norelax
    la gp, __global_pointer$
    .option pop
    la t1, __stack_start__
    addi sp, t1, -16
    // Initialize interrupts
    la t0, _start_trap_rust
    csrw mtvec, t0
    call _early_init
    // Continue to call _start_rust
    call _start_rust
");
#[no_mangle]
unsafe extern "C" fn _early_init() {
    use riscv::register::*;
    mstatus::set_mie(); // Enable global interrupts
    mstatus::set_sie(); // Enable supervisor interrupts
    mie::set_mext(); // Enable external interrupts
    mie::set_msoft(); // Enable software interrupts
    mie::set_mtimer(); // Enable timer interrupts
}
The MIE bit in the mstatus register is the global interrupt enable, and the MEIE bit in the mie register (set_mext in the riscv crate) enables external interrupts, i.e., those delivered through the PLIC for peripherals.
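For reference, the bits touched by _early_init can be written out as plain masks. The positions below come from the RISC-V privileged specification; the helper early_init_values is a hypothetical host-side model of the register updates, not code from the repository.

```rust
// mstatus bits
const MSTATUS_SIE: u64 = 1 << 1; // supervisor interrupt enable
const MSTATUS_MIE: u64 = 1 << 3; // machine interrupt enable

// mie bits
const MIE_MSIE: u64 = 1 << 3; // machine software interrupt
const MIE_MTIE: u64 = 1 << 7; // machine timer interrupt
const MIE_MEIE: u64 = 1 << 11; // machine external interrupt (PLIC)

/// Model the effect of the set_* calls on the two CSR values.
fn early_init_values(mstatus: u64, mie: u64) -> (u64, u64) {
    (
        mstatus | MSTATUS_MIE | MSTATUS_SIE,
        mie | MIE_MEIE | MIE_MSIE | MIE_MTIE,
    )
}
```

Comparing these masks against a CSR dump printed over UART is a quick way to confirm that the enables actually took effect.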
Here, we also initialize gp, which is a global pointer register used for accessing global variables in Rust (defined at a special location in the linker script). Of course, when using small and concentrated memory regions, you may not see instructions using the gp register.
Other CSR Initialization
Depending on the platform, other hardware may need initialization, such as disabling PMP, initializing the FPU, enabling mcycle and mtime counters, etc.
Initializing the FPU is necessary; otherwise, any floating-point instruction will cause an exception. Rust's "riscv-interrupt-m" implementation isn't intelligent enough to determine FPU usage, so when the target includes +f/+d, the ABI will default to using FPU push/pop instructions.
// Omitted platform-specific register initialization
// Including disabling PMP
// mstatus = 0x1800 sets the MPP field to machine mode
asm!(
    "
    li t0, 0x00001800
    csrw mstatus, t0
    "
);
mcounteren::set_cy(); // Enable cycle counter
mcounteren::set_tm(); // Enable time counter
// FPU initialization
mstatus::set_fs(mstatus::FS::Clean);
mstatus::set_fs(mstatus::FS::Initial);
asm!("csrwi fcsr, 0");
In addition to interrupt enabling, mstatus is also responsible for the current CPU operating mode, such as M/S/U mode.
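The constant 0x1800 written above is easier to read once the relevant mstatus fields are decoded. The following sketch uses the field positions from the RISC-V privileged specification (MPP in bits 12:11, FS in bits 14:13); the type and function names are invented for the example.

```rust
/// Privilege level encoded in mstatus.MPP.
#[derive(Debug, PartialEq)]
enum Privilege {
    User,
    Supervisor,
    Reserved,
    Machine,
}

/// FPU state encoded in mstatus.FS.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FpuState {
    Off = 0,
    Initial = 1,
    Clean = 2,
    Dirty = 3,
}

/// Decode the MPP field (bits 12:11).
fn mpp(mstatus: u64) -> Privilege {
    match (mstatus >> 11) & 0b11 {
        0 => Privilege::User,
        1 => Privilege::Supervisor,
        3 => Privilege::Machine,
        _ => Privilege::Reserved,
    }
}

/// Decode the FS field (bits 14:13).
fn fs(mstatus: u64) -> FpuState {
    match (mstatus >> 13) & 0b11 {
        0 => FpuState::Off,
        1 => FpuState::Initial,
        2 => FpuState::Clean,
        _ => FpuState::Dirty,
    }
}

/// Return mstatus with the FS field replaced.
fn with_fs(mstatus: u64, state: FpuState) -> u64 {
    (mstatus & !(0b11 << 13)) | ((state as u64) << 13)
}
```

Read this way, 0x1800 means "previous privilege = machine mode, FPU off", and the later set_fs call is what moves FS away from Off so that floating-point instructions stop trapping.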
With the system's mcycle CSR, we can conveniently use the Delay trait in the embedded-hal ecosystem to achieve more precise delays, moving away from using nop.
const CPU0_CORE_CLK: u32 = 800_000_000;
let mut delay = riscv::delay::McycleDelay::new(CPU0_CORE_CLK);
delay.delay_ms(1000);
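Conceptually, a cycle-counter delay only needs two pieces of arithmetic: converting a duration into a cycle count and measuring elapsed cycles safely across counter wraparound. The sketch below shows both on the host; it illustrates the idea and is not the implementation used by the riscv crate.

```rust
/// Number of core cycles that make up `ms` milliseconds at `core_clk_hz`.
fn cycles_for_ms(core_clk_hz: u32, ms: u32) -> u64 {
    u64::from(core_clk_hz) * u64::from(ms) / 1000
}

/// Cycles elapsed between two counter samples, tolerant of wraparound.
fn elapsed(start: u64, now: u64) -> u64 {
    now.wrapping_sub(start)
}

/// Busy-wait until `cycles` have passed according to `read_cycle`.
/// On the target, `read_cycle` would read the mcycle CSR.
fn delay_cycles<F: FnMut() -> u64>(mut read_cycle: F, cycles: u64) {
    let start = read_cycle();
    while elapsed(start, read_cycle()) < cycles {}
}
```

At 800 MHz a one-second delay is 800,000,000 cycles, which is why the core clock constant has to match the frequency the core is really running at.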
Verifying Interrupt Handling
We can verify whether the interrupt handler is being executed by directly triggering a software interrupt. The CLINT interrupt controller of the K230 can trigger a software interrupt through the msip register.
pac::CLINT.msip(0).write(|w| w.set_msip(true)); // Trigger software interrupt
Modify the interrupt handler _start_trap_rust to add a return before the final loop {}:
if mcause.is_interrupt() && mcause.code() == riscv::interrupt::Interrupt::MachineSoft as _ {
    println!("Machine Software Interrupt");
    pac::CLINT.msip(0).write(|w| w.set_msip(false)); // Clear software interrupt
    return;
}
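The check above relies on how mcause is encoded: the top bit distinguishes interrupts from exceptions, and the remaining bits hold the cause code. A minimal host-side decoder for RV64, with cause numbers taken from the RISC-V privileged specification and a hypothetical Trap enum, could look like this:

```rust
#[derive(Debug, PartialEq)]
enum Trap {
    MachineSoft,
    MachineTimer,
    MachineExternal,
    OtherInterrupt(u64),
    Exception(u64),
}

/// Decode a raw 64-bit mcause value.
fn decode_mcause(mcause: u64) -> Trap {
    let code = mcause & !(1u64 << 63);
    if mcause >> 63 == 1 {
        match code {
            3 => Trap::MachineSoft,
            7 => Trap::MachineTimer,
            11 => Trap::MachineExternal,
            other => Trap::OtherInterrupt(other),
        }
    } else {
        // For example, 2 is "illegal instruction"
        Trap::Exception(code)
    }
}
```

This makes the hex values printed by the trap handler readable: 0x8000000000000003 is the software interrupt triggered through msip, while a bare 2 would point at an illegal instruction.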
The mtime and mtimecmp registers of the CLINT can likewise be used to verify timer interrupts. However, I found a pitfall: reading the mtime of the K230's CLINT with a single 64-bit load instruction yields random content without raising any exception. This means that the 64-bit mtime must be read as two 32-bit halves and then combined into 64 bits. Only the rdtime instruction can read the 64-bit mtime at once.
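When a 64-bit counter has to be read as two 32-bit halves, the low word can roll over between the two reads. A common remedy, shown here as a host-testable sketch, is the high-low-high loop that the RISC-V specification describes for RV32. The closures stand in for the two volatile 32-bit loads; this is not necessarily how the repository does it.

```rust
/// Read a 64-bit counter through two 32-bit accessors without tearing.
/// `read_lo` and `read_hi` stand in for volatile loads of the low and
/// high words of mtime.
fn read_mtime<L, H>(mut read_lo: L, mut read_hi: H) -> u64
where
    L: FnMut() -> u32,
    H: FnMut() -> u32,
{
    loop {
        let hi = read_hi();
        let lo = read_lo();
        // If the high word changed, the low word wrapped in between: retry.
        if read_hi() == hi {
            return (u64::from(hi) << 32) | u64::from(lo);
        }
    }
}
```

Without the retry, a read that straddles the carry could return a value that is off by almost 2^32 ticks, which would be very confusing when programming mtimecmp.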
DDR Initialization
DDR initialization (SDRAM initialization) is a relatively complex process, generally requiring clock initialization, reset controller configuration, PHY training, chip initialization, timing configuration, self-check, etc. These parts are usually provided directly by the manufacturer, and the sequence of register writes in the corresponding DDR initialization code reads like an undocumented script.
Therefore, the DDR initialization code is translated directly from C with the help of an LLM (Large Language Model), without additional explanation. The DDR initialization code varies among different DDR chips.
After DDR initialization, we can use the DDR memory region. A pitfall here is that the starting address of DDR memory is 0x0000_0000. Rust treats address zero as the null pointer and places many restrictions on accessing it, and most functions will directly panic. Programs should avoid using the 0x0000_0000 address.
Start Real Bare-Metal Programming
With the above initialization foundation, we can finally start actual bare-metal programming—for example, initializing other peripherals, reading and writing peripheral registers, and even implementing some simple functions.
Here, we demonstrate with two peripherals as examples. The relevant peripheral register definitions are already written in the k230-bare-metal repository.
Blinking an LED Using GPIO
For both MCUs and MPUs, the steps for blinking an LED using GPIO are similar:
- Enable (or reset) the GPIO peripheral clock and power
- Set the pin function multiplexing and pin mode
- Perform GPIO write operations
In the K230, the peripheral clock and power signals are enabled by default (this can be confirmed by checking the relevant registers). Therefore, we only need to set the multiplexing function through IOMUX and set the pin mode through the GPIO peripheral.
The functionality can be referenced from official documentation, and the pin multiplexing documentation is located in K230_PINOUT_V*.xlsx.
The IOMUX peripheral is a PAD-like structure where each pin is set through a 32-bit register to configure multiplexing functions, pull-up/pull-down settings, input/output enable, etc. I obtained these definitions through .dtsi files and C header files, also using an LLM to translate them into YAML definitions. Calling IOMUX.pad(n).modify(|w| w.set_sel(0)) sets the pin's function to the corresponding GPIO.
The GPIO peripheral comes from DW_apb_gpio. For those familiar with Verilog or other HDL languages, this is a configurable GPIO IP core with up to 4 ports. There are several configuration registers that can obtain the initial parameters of the peripheral:
GPIO0 config_reg1: num_ports=1
GPIO0 config_reg2: len(PA)=32
GPIO1 config_reg1: num_ports=2
GPIO1 config_reg2: len(PA)=32 len(PB)=8
A total of 32 + 32 + 8 = 72 pins are divided into two GPIO controllers, where the GPIO1 controller has two ports. This can perfectly fit the cluster/array definition method in chiptool.
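The split into controllers and ports implies a simple mapping from a GPIO number to a (controller, port, bit) triple. The sketch below encodes it; the mapping for pins 0 to 63 matches the register accesses used in this article, while placing pins 64 to 71 on port B of GPIO1 is an assumption inferred from the port widths. GpioLocation and locate are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
struct GpioLocation {
    /// 0 for GPIO0, 1 for GPIO1.
    controller: u8,
    /// 0 for port A, 1 for port B.
    port: u8,
    /// Bit index within the port.
    bit: u8,
}

/// Map a K230 GPIO number (0..=71) to its controller, port, and bit.
fn locate(pin: u8) -> Option<GpioLocation> {
    match pin {
        0..=31 => Some(GpioLocation { controller: 0, port: 0, bit: pin }),
        32..=63 => Some(GpioLocation { controller: 1, port: 0, bit: pin - 32 }),
        // Assumption: the 8-bit port B of GPIO1 carries the last 8 pins.
        64..=71 => Some(GpioLocation { controller: 1, port: 1, bit: pin - 64 }),
        _ => None,
    }
}
```

For the RGB LED pins, this gives GPIO20 on GPIO0 bit 20, and GPIO62 and GPIO63 on GPIO1 port A bits 30 and 31.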
fn blinky() {
    // RGB LED of Luchan Pi K230
    // - R: GPIO62
    // - G: GPIO20
    // - B: GPIO63
    use pac::{GPIO0, GPIO1, IOMUX};
    IOMUX.pad(20).modify(|w| w.set_sel(0)); // function = GPIOx
    IOMUX.pad(62).modify(|w| w.set_sel(0));
    IOMUX.pad(63).modify(|w| w.set_sel(0));
    GPIO0.swport(0).ddr().modify(|w| *w |= 1 << 20); // output mode
    GPIO1.swport(0).ddr().modify(|w| *w |= 1 << 30);
    GPIO1.swport(0).ddr().modify(|w| *w |= 1 << 31);
    loop {
        GPIO0.swport(0).dr().modify(|w| *w ^= 1 << 20); // toggle data
        // GPIO1.swport(0).dr().modify(|w| *w ^= 1 << 30);
        GPIO1.swport(0).dr().modify(|w| *w ^= 1 << 31);
        riscv::delay::McycleDelay::new(CPU0_CORE_CLK).delay_ms(1000);
    }
}
PWM Buzzer
The K230 has 6 PWM outputs, divided into two PWM controllers. Each controller internally has 3 PWM output channels. An additional channel 0 is responsible for configuring the reload.
The buzzer on the Luchan Pi K230 development board is controlled via PWM1 GPIO43. The input clock of the PWM peripheral is 100MHz, and the division factor is set via PWMCFG.SCALE as (2^n).
To make the buzzer reach a frequency audible to the human ear, the PWM frequency is generally set to around 1 kHz. The PWM frequency and duty cycle are set through PWMCFG.SCALE and PWMx.CMP. The relevant code is as follows; refer to the comments for register value calculations.
fn buzzer() {
    // GPIO43 - PWM1
    use pac::{IOMUX, PWM0};
    // PCLK, PWM uses the APB clock to program registers and generate waveforms. The default frequency is 100MHz.
    IOMUX.pad(43).modify(|w| {
        w.set_sel(2); // PWM function = 2
        w.set_oe(true);
        w.set_ds(7);
    });
    // Calculations:
    // scale = 2
    // period = 0x5000
    // freq = 100,000,000 / (1 << 2) / 0x5000 = 1,220.7 Hz
    // duty = period / 2 = 0x2800
    PWM0.pwmcfg().modify(|w| {
        w.set_zerocomp(true);
        w.set_scale(2);
    });
    PWM0.pwmcmp(0).write(|w| w.0 = 0x5000); // PWMCMP0: RELOAD
    let duty = 0x2800;
    PWM0.pwmcmp(2).modify(|w| w.0 = duty); // PWMCMP2: PWM1
    // Enable PWM
    PWM0.pwmcfg().modify(|w| w.set_enalways(true));
    riscv::delay::McycleDelay::new(CPU0_CORE_CLK).delay_ms(100);
    // Disable PWM
    PWM0.pwmcfg().modify(|w| w.set_enalways(false));
    riscv::delay::McycleDelay::new(CPU0_CORE_CLK).delay_ms(100);
}
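The arithmetic in the comments of buzzer() can be checked, or inverted to find a reload value for a different pitch, with a few lines of host-side Rust. This sketch assumes the 100MHz input clock and the 2^scale divider described above; the function names are hypothetical, and any width limit of the compare registers is not modeled.

```rust
/// PWM input clock in Hz (from the article).
const PWM_CLK_HZ: u64 = 100_000_000;

/// Output frequency for a given PWMCFG.SCALE and reload (period) value.
fn pwm_freq_hz(scale: u32, period: u32) -> f64 {
    PWM_CLK_HZ as f64 / (1u64 << scale) as f64 / f64::from(period)
}

/// Reload value that approximates `target_hz` at the given scale.
/// `target_hz` must be non-zero.
fn period_for(scale: u32, target_hz: u32) -> u32 {
    (PWM_CLK_HZ / (1u64 << scale) / u64::from(target_hz)) as u32
}

/// Compare value for a 50% duty cycle.
fn half_duty(period: u32) -> u32 {
    period / 2
}
```

With scale = 2, a reload of 0x5000 gives about 1220.7 Hz as in the comments, and period_for(2, 1000) suggests 25000 for an exact 1 kHz tone.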
Some Extended Thoughts
Why Bare-Metal?
Bare-metal programming is the foundation of embedded development and is also the lowest level of development. Through bare-metal programming, we can better understand the working principles of hardware and the underlying aspects of operating systems.
Using all the libraries and SDKs out there is not as good as writing one ourselves; once you understand one, you understand many.
Shell?
In a bare-metal environment, since there is no operating system, no standard input/output, and no file system, a full-fledged Shell is impossible. However, we can implement simple command-line interaction via the serial port. All we need are two serial port functions, putchar and getchar, and a simple parser.
noline is a small no-std line-editing crate that can be used to implement simple command-line interactions. Moreover, it's based on the embedded-hal ecosystem, making it easy to port. It supports line history and common shortcuts. Of course, writing a readline from scratch is also a good exercise.
By implementing several shell commands, we can achieve simple interactions such as reading and writing peripheral registers, reading and writing memory, printing system information, etc.
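The "simple parser" can be very small. The following sketch turns one input line into a command value using only string methods that are also available in core, so the same approach works in a no_std build. The Command enum and parse_line function are illustrative and do not mirror the repository's actual implementation.

```rust
#[derive(Debug, PartialEq)]
enum Command<'a> {
    Help,
    Echo(&'a str),
    MemRead { addr: usize, len: usize },
    MemWrite { addr: usize, value: u32 },
    Jump(usize),
    Unknown(&'a str),
}

/// Parse a decimal or 0x-prefixed hexadecimal number.
fn parse_num(s: &str) -> Option<u64> {
    match s.strip_prefix("0x") {
        Some(hex) => u64::from_str_radix(hex, 16).ok(),
        None => s.parse().ok(),
    }
}

/// Parse one line; returns None for empty input or malformed arguments.
fn parse_line<'a>(line: &'a str) -> Option<Command<'a>> {
    let line = line.trim();
    let (name, rest) = match line.split_once(' ') {
        Some((name, rest)) => (name, rest.trim()),
        None => (line, ""),
    };
    let mut args = rest.split_whitespace();
    let cmd = match name {
        "" => return None,
        "help" => Command::Help,
        "echo" => Command::Echo(rest),
        "mem_read" => Command::MemRead {
            addr: parse_num(args.next()?)? as usize,
            len: parse_num(args.next()?)? as usize,
        },
        "mem_write" => {
            let addr = parse_num(args.next()?)? as usize;
            let value = parse_num(args.next()?)?;
            if value > u64::from(u32::MAX) {
                return None;
            }
            Command::MemWrite { addr, value: value as u32 }
        }
        "jump" => Command::Jump(parse_num(args.next()?)? as usize),
        other => Command::Unknown(other),
    };
    Some(cmd)
}
```

Keeping parsing separate from execution like this also means the parser can be unit-tested on the host, while only the command handlers need the real hardware.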
The relevant implementation can be found in the k230-bare-metal repository. The final effect is as follows:
K230> help
Available commands:
help - print this help
echo <text> - print <text>
reboot - reboot the system
mem_read <address> <length> - read memory
mem_write <address> <u32> - write memory
tsensor - read temperature sensor
cpuid - print CPUID
serialboot - enter serial boot mode
jump <address> - jump to address
jumpbig <address> - jump to big core and run
Download?
The K230, in essence, is more like an SBC (Single Board Computer). Flashing firmware often involves using a TF card, which is extremely inconvenient in bare-metal development. Continuous plugging and unplugging of the TF card can cause poor contact or even damage.
LiteX provides a very convenient kernel/firmware loading method for the FPGA soft core environment: downloading firmware via serial port to a specific memory location (DDR), or even downloading firmware via network. Following that approach, I attempted to port the UART download logic of litex_term. It comes with a serial port download protocol and serial command line. After detecting a special string, it automatically switches to download mode, downloads the firmware to a specified memory location via the serial port, and jumps to execute it.
The final effect is:
> litex_term /dev/tty.usbmodem56C40035621 --kernel-adr 0x01000000 --kernel ../firmware.img
......
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
[LITEX-TERM] Received firmware download request from the device.
[LITEX-TERM] Uploading ../firmware.img to 0x01000000 (17400 bytes)...
[LITEX-TERM] Upload calibration... failed, switching to --safe mode.
[LITEX-TERM] Upload complete (8.7KB/s).
[LITEX-TERM] Booting the device.
[LITEX-TERM] Done.
Jumping to 0x01000000...
It’s very convenient; I might introduce it separately in the future. Note that when writing firmware to the memory area, you need to handle the states of the I-Cache and D-Cache. When writing this article, I chose to completely disable the I-Cache and D-Cache.
Jumping to the Big Core
As mentioned earlier, the startup of CPU1 (big core) is controlled by CPU0 (small core). The specific startup logic is straightforward: set the reset vector and reset CPU1:
unsafe {
    // Set the CPU1 reset vector
    ptr::write_volatile(0x91102104 as *mut u32, jump_addr as u32);
    // Reset CPU1
    ptr::write_volatile(0x9110100c as *mut u32, 0x10001000);
    ptr::write_volatile(0x9110100c as *mut u32, 0x10001);
    ptr::write_volatile(0x9110100c as *mut u32, 0x10000);
}
To facilitate development and testing, I also made jumping to the big core a Shell command. By inputting jumpbig 0x01000000 via UART0, you can make the big core execute code in the memory region. Attempting to dump the big core's register information, we can see the startup information:
Rust 2nd stage on CPU1
mstatus: 0000000a00001900
mie: 0000000000000000
mip: 0000000000000000
misa: 8000000000b4112f
RV64ABCDFIMSUVX
mvendorid: 5b7
marchid: 8000000009140d00
mhartid: 0
cpuid: 09140b0d 10050000 260c0001
Here, the V in RV64ABCDFIMSUVX indicates support for the RVV vector instruction set. The K230 is a heterogeneous dual-core; the small core does not support RVV. This proves that our code has successfully jumped to the big core.
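The extension string is derived directly from the misa value in the dump: the top two bits encode the register width, and bits 0 to 25 correspond to the letters A to Z. A short host-side decoder (it uses String for convenience, so it is a std sketch rather than target code; decode_misa is a made-up name) reproduces the line shown above:

```rust
/// Turn a raw 64-bit misa value into a string such as "RV64ABC...".
fn decode_misa(misa: u64) -> String {
    // MXL field in the top two bits: 1 = 32, 2 = 64, 3 = 128
    let xlen = match misa >> 62 {
        1 => 32,
        2 => 64,
        3 => 128,
        _ => 0,
    };
    let mut out = format!("RV{}", xlen);
    // Bits 0..=25 map to extension letters 'A'..='Z'
    for bit in 0..26u8 {
        if misa & (1u64 << bit) != 0 {
            out.push((b'A' + bit) as char);
        }
    }
    out
}
```

Feeding it 0x8000000000b4112f yields RV64ABCDFIMSUVX, with bit 21 supplying the V.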
An interesting point is that mhartid is 0, indicating that the K230 does not comply with the RISC-V specification of assigning different IDs to different harts. This needs attention in actual development. You can only distinguish different harts through miscellaneous CSRs; this is a small pitfall of the K230.
Next, we can perform more complex operations on the big core, such as applying RVV vector instructions.
Conclusion
Through this experiment of bare-metal embedded development using Rust on the K230 chip, we deeply explored the differences in startup methods and usage modes between MPUs and MCUs, mastering the key steps of using Rust for bare-metal development on MPU chips, including startup process, firmware format parsing, interrupts, and peripheral initialization.
In practice, we successfully achieved UART debug output, GPIO LED blinking, PWM buzzer control, and other functions, deepening our understanding of the K230’s underlying startup mechanism and hardware features. These achievements lay a solid foundation for future, more complex embedded development on the K230 and other RISC-V chips.
Looking ahead, we can further improve peripheral drivers, explore multi-core collaboration, apply RVV vector instructions, and leverage the Rust ecosystem to build efficient and secure embedded systems, contributing more to the RISC-V open-source community.
Tips
- The Boot ROM provides exception error messages for illegal execution; you can use this behavior to inversely verify whether the code is being executed. For example, insert illegal instructions to check the pc of the error location.
- It's best to avoid using the full target features in bare-metal code to prevent the compiler from generating instructions for features that have not been enabled, such as the V extension.
- In Rust bare-metal development there is no operating system, so you cannot use the standard library, and nothing is provided behind the panic! macro; therefore, you need to implement a panic handler yourself.
- The states of D-Cache and I-Cache need to be managed; generally, disable them before jumping to new code to avoid cache inconsistencies.
- The println! macro can conveniently output debug information, but note that printing is blocking and may affect time-sensitive operations.
- Learn to use LLMs to assist your exploration process: for example, export YAML definitions from PDF manuals via OCR, translate DDR initialization code, and get explanations for specific registers.
- The Boot ROM initializes some peripherals like UART0, but the specific states still need to be verified again, such as FIFO mode, baud rate, etc.
- For possible hardware implementation bugs or peculiarities, you can try using equivalent alternative methods.