I found an interesting challenge for myself.
Topic: tailoring the RT-Thread open-source real-time operating system.
Goal: use an IPC mechanism of RT-Thread (v3.1.2 or later) to flash an LED at 1 Hz, and trim the system down to the smallest possible size.
Hardware platform: STM32H750.
Let's start!
Tip 1 (Make the smallest bare metal lighting system)
What is the minimum needed to light an LED on a bare-metal STM32?
There are three ways to go about it:
- Generate the project with the LL library in STM32CubeMX
- Generate the project with the HAL library in STM32CubeMX
- Write main.c by hand and light the LED directly
The HAL library pulls in much more code than the LL library, and a simple LED program may not need most of those library files anyway, so I chose to write main.c by hand.
First, create an empty project that can run main.c. Put a while loop in main and step through it in the debugger to confirm that the project works.
Second, light the LED. Work out how many registers have to be configured to light an LED on the STM32. The required steps are:
- Enable the GPIO port clock in RCC
- Configure the pin as an output
- Drive the pin high or low
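The last two steps are plain bit arithmetic on the GPIO registers. As a minimal host-side sketch (the helper names moder_set_output, bsrr_set and bsrr_reset are hypothetical, and 0xFFFFFFFF as the starting MODER value assumes the port comes out of reset with every pin in analog mode), the values for pin 8 can be derived like this:

```c
#include <stdint.h>

/* Each pin owns two MODER bits; 0b01 selects general-purpose output. */
uint32_t moder_set_output(uint32_t moder, unsigned pin)
{
    uint32_t shift = pin * 2u;
    moder &= ~(3u << shift);   /* clear the two mode bits of this pin */
    moder |=  (1u << shift);   /* 0b01 = output */
    return moder;
}

/* BSRR: the low 16 bits set pins, the high 16 bits reset them. */
uint32_t bsrr_set(unsigned pin)   { return 1u << pin; }
uint32_t bsrr_reset(unsigned pin) { return 1u << (pin + 16u); }
```

Under that assumption, moder_set_output(0xFFFFFFFF, 8) yields 0xFFFDFFFF, and bsrr_reset(8) is the same value as GPIO_PIN_8 << 16.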
The main function I ended up with after debugging is shown below:
#include "stm32h7xx.h"
#define GPIO_PIN_8 ((uint16_t)0x0100) /* Pin 8 selected */

int main(void)
{
    volatile uint32_t i = 0; /* volatile so the delay counter is not optimized away */
    uint8_t abc = 0;

    RCC->AHB4ENR |= RCC_AHB4ENR_GPIOIEN; /* 1) Turn on the GPIOI clock */
    (void)RCC->AHB4ENR;                  /* read back so the clock is running before GPIO access */
    GPIOI->MODER = 0xFFFDFFFF;           /* 2) Configure PI8 as output */
    for (;;)
    {
        if (i == 1000000)
        {
            if (abc == 1)
            {
                GPIOI->BSRR = GPIO_PIN_8; /* 3) Drive PI8 high */
                abc = 0;
            }
            else
            {
                abc = 1;
                GPIOI->BSRR = GPIO_PIN_8 << 16; /* 3) Drive PI8 low */
            }
            i = 0;
        }
        i++;
    }
}
I wrote this code after studying what the HAL and LL libraries do. Flash it to the board and the LED blinks.
The project is very small.
Besides the startup file, there is only main.c and system_stm32h7xx.c.
The LED blinks with just these three files. From here we can trim system_stm32h7xx.c. Because different IDEs use different assembly syntax, the *.s startup file may have to change later, but you can try trimming it too; the procedure is the same.
You can also check out the 01_led_mini_system project under the current folder.
Here we use Keil to get familiar with how to reach the minimum system in this bare-metal case.
startup_stm32h750xx.s
In startup_stm32h750xx.s, the peripheral interrupt vector table takes up a lot of code size and can be trimmed:
- Peripheral interrupt entries
- Internal interrupt entries, which can also be deleted if they are not used
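To get a feel for how much the vector table costs, here is a rough sketch. It assumes the usual Cortex-M layout of 16 core entries (initial stack pointer plus 15 system exceptions) followed by one 32-bit entry per peripheral interrupt; the figure of 150 peripheral entries for this chip is an assumption and should be checked against the actual startup file. The helper name vector_table_bytes is hypothetical.

```c
#include <stddef.h>

#define CORE_VECTORS 16u /* initial SP + 15 Cortex-M system exceptions */

/* Size of a vector table with the given number of peripheral entries. */
size_t vector_table_bytes(unsigned peripheral_irqs)
{
    /* one 32-bit handler address per entry */
    return (size_t)(CORE_VECTORS + peripheral_irqs) * 4u;
}
```

With 150 peripheral entries this gives 664 bytes, against 64 bytes for the core entries alone. Removing an entry is only safe as long as the corresponding interrupt is never enabled.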
SystemInit
In this function, some unnecessary code can be removed. The trimmed version is shown below; it saves almost 0.04KB of code:
void SystemInit (void)
{

  /* FPU settings ------------------------------------------------------------*/
  #if (__FPU_PRESENT == 1) && (__FPU_USED == 1)
    SCB->CPACR |= ((3UL << (10*2))|(3UL << (11*2))); /* set CP10 and CP11 Full Access */
  #endif
  /* Reset the RCC clock configuration to the default reset state ------------*/
  /* Set HSION bit */
  RCC->CR |= RCC_CR_HSION;

  /* Reset CFGR register */
  RCC->CFGR = 0;

  /* Reset HSEON, CSSON, CSION, RC48ON, CSIKERON, PLL1ON, PLL2ON and PLL3ON bits */
  RCC->CR &= 0xEAF6ED7FU;

  /* Reset HSEBYP bit */
  RCC->CR &= 0xFFFBFFFFU;

  /* Disable all interrupts */
  RCC->CIER = 0;

  SCB->VTOR = FLASH_BANK1_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal FLASH */
}
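The two magic masks written to RCC->CR can be cross-checked against the comments above them. The sketch below rebuilds them from bit positions; the positions are my reading of the RCC_CR layout and are an assumption to verify against the reference manual, and the names in the enum are hypothetical rather than the CMSIS ones.

```c
#include <stdint.h>

/* Assumed RCC_CR bit positions (verify against the reference manual). */
enum {
    CR_CSION = 7, CR_CSIKERON = 9, CR_HSI48ON = 12, CR_HSEON = 16,
    CR_HSEBYP = 18, CR_HSECSSON = 19,
    CR_PLL1ON = 24, CR_PLL2ON = 26, CR_PLL3ON = 28
};

/* AND-mask that clears every clock enable bit named in the comment. */
uint32_t rcc_cr_clock_off_mask(void)
{
    uint32_t bits = (1u << CR_CSION)  | (1u << CR_CSIKERON) | (1u << CR_HSI48ON) |
                    (1u << CR_HSEON)  | (1u << CR_HSECSSON) |
                    (1u << CR_PLL1ON) | (1u << CR_PLL2ON)   | (1u << CR_PLL3ON);
    return ~bits;
}

/* AND-mask that clears only HSEBYP. */
uint32_t rcc_cr_hsebyp_mask(void)
{
    return ~(1u << CR_HSEBYP);
}
```

Under these assumptions the two functions return 0xEAF6ED7F and 0xFFFBFFFF, the same constants used in SystemInit.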
Tip 2 (IPC minimum resource consumption)
First, choose the kernel version. RT-Thread comes in a Standard version and a Nano version; Nano is the better fit here, so let's start with RT-Thread Nano v3.1.3. RT-Thread Nano provides semaphore, mutex, event, mailbox and message queue. Their sizes are similar, so any of them will do, since we are going to trim the kernel anyway.
Next, consider the number of threads.
In RT-Thread the main function runs as a thread, and idle is a thread too. These two threads should be enough, because we also have an interrupt: SysTick. I use SysTick to wake up the main thread, which then drives the LED. The question is how to light the LED with the fewest resources. My plan is to get it working on the master branch first and confirm that the LED lights up.
Add an RTOS test in Keil
#include "stm32h7xx.h"
#include "rtthread.h"
#define GPIO_PIN_8 ((uint16_t)0x0100) /* Pin 8 selected */
struct rt_semaphore dynamic_sem;

int main(void)
{
    int i = 0; /* toggle counter; must start from a known value */
    static rt_err_t result;

    rt_sem_init(&dynamic_sem, "dsem", 0, RT_IPC_FLAG_FIFO);
    RCC->AHB4ENR |= RCC_AHB4ENR_GPIOIEN; /* enable the GPIOI clock */
    GPIOI->MODER = 0xFFFDFFFF;           /* PI8 as output */
    while (1)
    {
        /* block until SysTick_Handler releases the semaphore */
        result = rt_sem_take(&dynamic_sem, RT_WAITING_FOREVER);
        if (i % 2 == 0)
        {
            GPIOI->BSRR = GPIO_PIN_8;       /* PI8 high */
        }
        else
        {
            GPIOI->BSRR = GPIO_PIN_8 << 16; /* PI8 low */
        }
        i++;
    }
}

uint32_t count = 0;
extern struct rt_semaphore dynamic_sem;
void SysTick_Handler(void)
{
    /* enter interrupt */
    rt_interrupt_enter();
    count++;
    if (count >= RT_TICK_PER_SECOND)
    {
        /* one second has passed: wake the main thread */
        count = 0;
        rt_sem_release(&dynamic_sem);
    }
    rt_tick_increase();

    /* leave interrupt */
    rt_interrupt_leave();
}
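The tick-counting logic in SysTick_Handler can be checked on a host machine without the board. The sketch below reproduces only the counter, with a hypothetical function name releases_after; it is not part of RT-Thread.

```c
#include <stdint.h>

/* Count how many times the semaphore would be released after `ticks`
   SysTick interrupts, releasing once every `period` ticks. */
uint32_t releases_after(uint32_t ticks, uint32_t period)
{
    uint32_t count = 0;
    uint32_t releases = 0;

    for (uint32_t t = 0; t < ticks; t++) {
        count++;
        if (count >= period) {
            count = 0;
            releases++;
        }
    }
    return releases;
}
```

With a period of RT_TICK_PER_SECOND the LED changes state once per second, so a full on/off cycle takes two seconds. If one full cycle per second is wanted, one option would be to release the semaphore every RT_TICK_PER_SECOND / 2 ticks instead.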
Here I simply added the RTOS pack in Keil and generated the project directly.
Even at this point the project is very small. It is in 02_led_rtthread_mini_system_keil:
Program Size: Code=10394 RO-data=1450 RW-data=72 ZI-data=1944
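The four numbers in Keil's summary can be folded into a flash and a RAM figure. The sketch below uses the usual reading of that summary (flash holds code, read-only data and the initial values of read-write data; RAM holds read-write and zero-initialized data). The struct and function names are hypothetical, and the flash figure is approximate if the toolchain compresses RW data.

```c
#include <stdint.h>

/* The four fields of Keil's "Program Size" line, in bytes. */
struct keil_size {
    uint32_t code;
    uint32_t ro_data;
    uint32_t rw_data;
    uint32_t zi_data;
};

/* Bytes stored in flash: code, constants and RW initial values. */
uint32_t flash_bytes(struct keil_size s)
{
    return s.code + s.ro_data + s.rw_data;
}

/* Bytes occupied in RAM at run time. */
uint32_t ram_bytes(struct keil_size s)
{
    return s.rw_data + s.zi_data;
}
```

For the line above this works out to 11916 bytes of flash and 2016 bytes of RAM.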
Tip 3 (Integrate Project)
- Start by building a new Nano project with RT-Thread Studio
- Then replace its code with the code from the Keil project
system_init
Let's optimize the code outside of the RTOS, as described earlier for SystemInit.
The files trimmed here are startup_stm32h750xx.S and system_stm32h7xx.c.
After integration, we can see our code size:
This brings us to 03_rtstudio_mini_rtthread.
Tip 4 (Modify the compiler options)
The relevant compiler options are:
- Optimize for size: -Os
- Place each function and each data item in its own section, so that unused ones can be dropped at link time: -ffunction-sections -fdata-sections
- Do not link the standard libraries: -nostdlib
- Use software floating point instead of the FPU (this project does not need the FPU for now, so this saves code)
Let's apply them one at a time to see how much code size each one saves. Note that the whole project is built through a Makefile and keeps some cached results, so clean the project and build it again each time, or do a full rebuild.
Optimize -Os level
In the optimization interface, select -Os.
The code is almost halved, to 5.8KB.
Optimize -ffunction-sections
With -ffunction-sections, the compiler assigns a separate section to each function when it compiles a source file. This option is commonly used.
With -fdata-sections, the compiler assigns a separate section to each data item when it compiles a source file.
Select -ffunction-sections in the optimization interface: almost 2.0KB.
Because we don't have much data, -fdata-sections does not help much. I tried it, and there was basically no code reduction.
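A small illustration of what per-function sections buy us. The function names are hypothetical, and the build flags in the comment are assumptions about a typical GCC setup: the sections only get discarded if the linker is also asked to garbage-collect them.

```c
/* Assumed build flags for this sketch:
 *   compile: -Os -ffunction-sections -fdata-sections
 *   link:    -Wl,--gc-sections
 */

/* Emitted into its own section, .text.used_fn, and kept because it is called. */
int used_fn(int x)
{
    return x * 2;
}

/* Emitted into .text.unused_fn. Nothing references it, so the linker
 * may drop the whole section from the final image. */
int unused_fn(int x)
{
    return x * 3;
}
```

Without -ffunction-sections both functions would share one .text section per object file, and the unused one would stay in the image as long as anything else in that file is used.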
Optimize the standard library
-nostdlib: do not use the standard libraries
This brings the size down to about 1.3KB:
Optimize FPU
Some code can be saved by using the software FPU.
After optimization, the code is:
Most of the work is done at this point. Now let's take a look at the official RT-Thread Nano footprint data: optimization level 3
There is still room to improve before we catch up with the official RT-Thread figures.
Be sure to flash the board after each step to check that the LED still lights up.
Tip 5 (Configure rtconfig.h for RT-Thread)
Some of the RT-Thread default configuration is not required.
Most of the options in rtconfig.h can be removed. Try removing them, then compile and check that the LED still flashes.
Here is the configuration I kept.
#define RT_USING_SEMAPHORE // Needed for IPC communication

#define RT_THREAD_PRIORITY_MAX 3 // The maximum priority number; this can be reduced to 3

#define RT_USING_USER_MAIN // This involves stdlib, so keep it to save space

#define RT_MAIN_THREAD_STACK_SIZE 128 // Value found by experiment

#define RT_USING_CPU_FFS // Lets __lowest_bit_bitmap be replaced by CPU instructions

#define RT_USING_COMPONENTS_INIT

#define RT_TICK_PER_SECOND 1000
Let's check how much each option saves:
RT_USING_CPU_FFS
This option uses the CPU's find-first-set support, replacing a fairly large lookup array with CPU instructions.
It saves almost 0.3KB.
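To show where the saving comes from, here is a host-side sketch comparing a table-driven find-first-set with a compiler builtin. It is not the RT-Thread source: the table is built at run time here for brevity (in firmware it would be a constant array in flash), all names are hypothetical, and __builtin_ffs assumes GCC or Clang.

```c
#include <stdint.h>

/* 256-entry table: index of the lowest set bit of each byte value. */
uint8_t lowest_bit_table[256];

void build_lowest_bit_table(void)
{
    lowest_bit_table[0] = 0;
    for (unsigned v = 1; v < 256; v++) {
        unsigned bit = 0;
        while (((v >> bit) & 1u) == 0) {
            bit++;
        }
        lowest_bit_table[v] = (uint8_t)bit;
    }
}

/* Table-driven version: returns 1-based position of the lowest set bit, 0 if none. */
int ffs_table(uint32_t value)
{
    if (value == 0) return 0;
    if (value & 0x000000FFu) return lowest_bit_table[value & 0xFFu] + 1;
    if (value & 0x0000FF00u) return lowest_bit_table[(value >> 8) & 0xFFu] + 9;
    if (value & 0x00FF0000u) return lowest_bit_table[(value >> 16) & 0xFFu] + 17;
    return lowest_bit_table[(value >> 24) & 0xFFu] + 25;
}

/* Builtin version: the compiler maps this to a few CPU instructions. */
int ffs_cpu(uint32_t value)
{
    return __builtin_ffs((int)value);
}
```

Both functions return the same results, but the second needs no 256-byte table, which is consistent with a saving in the region of 0.3KB.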
RT_USING_CONSOLE
The console takes almost 2KB.
RT_MAIN_THREAD_STACK_SIZE
RT_MAIN_THREAD_STACK_SIZE takes very little RAM.
The other options offer limited savings, so I'll skip them. Now we can put the configuration below in rtconfig.h and see how much the code has shrunk:
#define RT_THREAD_PRIORITY_MAX 3

#define RT_TICK_PER_SECOND 1000

#define RT_ALIGN_SIZE 4

#define RT_NAME_MAX 4

#define RT_USING_COMPONENTS_INIT

#define RT_USING_USER_MAIN

#define RT_MAIN_THREAD_STACK_SIZE 128

#define RT_USING_CPU_FFS

#define RT_USING_SEMAPHORE
Finally, the code stays at this size:
This brings us to 04_nostdlib_mini_rtthread.
Tip 6 (Tailor code according to map file)
This part is tedious, but it is important to know.
We will mainly look at the map file.
The .map file is easy to find in the debug output, so I won't cover that. Instead, I will show how to read the size of a function from it. Here is a fragment taken from the map file.
 .text.rt_tick_increase
                0x080000f0       0x28 ./rt-thread/src/clock.o
                0x080000f0                rt_tick_increase
 .text.rti_end  0x08000118        0x4 ./rt-thread/src/components.o
 .text.main_thread_entry
                0x0800011c        0x4 ./rt-thread/src/components.o
                0x0800011c                main_thread_entry
 .text.rti_board_end
                0x08000120        0x4 ./rt-thread/src/components.o
 .text.rti_start
                0x08000124        0x4 ./rt-thread/src/components.o
 .text.rti_board_start
                0x08000128        0x4 ./rt-thread/src/components.o
 .text.rt_application_init
                0x0800012c       0x3c ./rt-thread/src/components.o
                0x0800012c                rt_application_init
- The STM32 code starts at address 0x08000000 in ROM, so start from this address in the map file.
Reading the fragment above:
rt_tick_increase starts at address 0x080000f0 and is 0x28 bytes in size
rt_application_init starts at address 0x0800012c and is 0x3c bytes in size
In this way we can trim the code little by little.
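Reading sizes by eye gets tiring, so a small host-side helper can pull the address and size out of a map line. This is only a sketch with a hypothetical function name, and it handles only lines of the form "address size object-file" as in the fragment above; lines that start with a section name or that carry a symbol name are rejected.

```c
#include <stdio.h>

/* Parse "<0xaddress> <0xsize> <object file>" from a linker map line.
 * Returns 1 on success, 0 if the line has a different shape. */
int parse_map_entry(const char *line, unsigned long *addr, unsigned long *size)
{
    /* The literal "0x" prefixes keep symbol names from being read as hex. */
    return sscanf(line, " 0x%lx 0x%lx", addr, size) == 2;
}
```

For the rt_tick_increase line this yields address 0x080000f0 and size 0x28; adding the two gives 0x08000118, which is exactly where the next section, .text.rti_end, begins.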
Tailor timer
Since the timer is not used, timer.c is basically unused. We can comment out the whole file together with the related calls. The entire timer accounts for about 0.6KB.
Optimize rt_memset and rt_memcpy
In kservice.c, the whole file and the calls into it can be commented out; together they occupy almost 0.08KB.
rt_thread_exit
This function exits a thread. In thread.c, it accounts for about 0.08KB of code that we can trim.
idle task
Some of the actions in the idle task can be commented out in idle.c.
Keep only a while loop; the LED still flashes, and we save 0.15KB.
hardfault handler
The hard fault handler takes up some code size. It is implemented in context_gcc.S. GCC does not optimize .S files (that is, unused assembly is not removed automatically, so we have to trim it by hand, bit by bit). This saves almost 0.08KB.
- The hard fault handler should also be removed from startup_stm32h750xx.
rt_critical
Because there is no contention between threads, rt_enter_critical and rt_exit_critical in scheduler.c
can be removed, which saves 0.09KB.
rt_components_init
Some of the component initialization can also be trimmed, saving about 0.04KB.
rt_hw_interrupt_enable
We have few threads here, so the interrupt disable/enable calls in thread.c can be removed, saving about 0.2KB.
rt_interrupt_enter
There are few interrupts, so the rt_interrupt_enter and rt_interrupt_leave calls around interrupt handling can be trimmed, saving almost 0.4KB.
Also trim rt_hw_interrupt_thread_switch.
rt_components_board_init
Remove this function to save 0.03KB.
rt_ipc_list_suspend
The case RT_IPC_FLAG_PRIO branch is not needed; commenting it out saves 0.05KB.
Optimize inline function
An inline function has its whole body inserted at every call site, which can produce an excessive amount of code.
rtservice.h
I tried changing the inline functions into ordinary functions to reduce the code, but when I rebuilt, the code size actually increased by 0.07KB.
This shows that a well-written inline function can actually reduce code size.
rt_thread
struct rt_thread contains a member that can be removed to reduce code size:
struct rt_timer thread_timer;
The code related to this member can be deleted.
That's almost everything.
The final result is project 05_final_cut_mini_system.
Result