refactor: serialize CH390 runtime SPI access
Move runtime CH390 transactions behind a single ch390_runtime owner so main, lwIP glue, and EXTI no longer compete for SPI access. Keep the system stable under runtime load and capture the remaining CH390 readback failure as a credible low-level device-response issue in the handoff logs.
This commit is contained in:
@@ -0,0 +1,512 @@
|
||||
# UART CH390 Debug Handoff
|
||||
|
||||
## 2026-03-31 Config UART Test Session
|
||||
|
||||
### Goal
|
||||
|
||||
- Exhaustively test the `USART1` config command interface.
|
||||
- Verify that Flash-backed parameters survive `AT+SAVE` plus reset.
|
||||
- Record concrete bench procedure, failures, fixes, and evidence paths.
|
||||
|
||||
### Bench Baseline
|
||||
|
||||
- Workspace: `D:\code\STM32Project\TCP2UART`
|
||||
- Target MCU: `STM32F103R8T6`
|
||||
- Config UART: `USART1` on `PA9/PA10`
|
||||
- Host visible COM ports during this session: `COM1`, `COM9`
|
||||
- Debug probe visible during this session: `STLink V2`
|
||||
- Flash parameter page: `0x0800FC00`
|
||||
- Firmware image used for test bring-up: `MDK-ARM\TCP2UART\TCP2UART.axf`
|
||||
|
||||
### Source-of-Truth Command Surface
|
||||
|
||||
From `App/config.c`, the tested command surface is:
|
||||
|
||||
- `AT`
|
||||
- `AT+?`
|
||||
- `AT+QUERY`
|
||||
- `AT+SAVE`
|
||||
- `AT+RESET`
|
||||
- `AT+DEFAULT`
|
||||
- `AT+IP=...`
|
||||
- `AT+MASK=...`
|
||||
- `AT+GW=...`
|
||||
- `AT+RIP=...`
|
||||
- `AT+MAC=...`
|
||||
- `AT+PORT=...`
|
||||
- `AT+RPORT=...`
|
||||
- `AT+BAUD1=...`
|
||||
- `AT+BAUD2=...`
|
||||
- `AT+DHCP=0/1`
|
||||
|
||||
### Test Procedure
|
||||
|
||||
1. Flash the current `axf` image with `probe-rs download --chip STM32F103R8`.
|
||||
2. Connect to the config UART at `115200 8N1`.
|
||||
3. Run `python tools/uart_config_test.py --port COM9 --scenario inventory`.
|
||||
4. Run `python tools/uart_config_test.py --port COM9 --scenario persistence`.
|
||||
5. Read back Flash words at `0x0800FC00` and compare them before and after reset.
|
||||
6. Save all transcripts into `artifacts/uart-config/`.
|
||||
|
||||
### Important Expectations
|
||||
|
||||
- Setter commands update RAM immediately and return `OK` plus the reboot hint.
|
||||
- Persistence is not proven until the sequence `set -> query -> save -> reset -> query` passes.
|
||||
- `AT+DEFAULT` only resets RAM state and still requires `AT+SAVE` to persist.
|
||||
- `AT+DHCP=1` must fail in this build by design.
|
||||
|
||||
### Session Findings
|
||||
|
||||
- `probe-rs download` succeeded against `STM32F103R8`.
|
||||
- `pyserial` had to be installed on the host before scripted UART testing.
|
||||
- The requested handoff filename did not exist in the repo, so this file was created as the dedicated config-UART handoff log.
|
||||
- Raw command transcripts and Flash comparisons should be attached from `artifacts/uart-config/` for any future regression analysis.
|
||||
|
||||
### 2026-03-31 Live Test Evidence
|
||||
|
||||
- Host-side strict inventory run on `COM9` failed with `non_empty_responses = 0`.
|
||||
- Artifact: `artifacts/uart-config/inventory-20260331-185333.json`
|
||||
- Direct host probe on `COM9` with `AT\r\n` returned `b''`.
|
||||
- Direct host probe on `COM1` with `AT\r\n` also returned `b''`.
|
||||
- Strict persistence run was intentionally not accepted as valid after the script was corrected, because all command responses were empty.
|
||||
- Flash page `0x0800FC00` remained all `0xFFFFFFFF`, proving `AT+SAVE` never actually executed on target.
|
||||
|
||||
### Board/Debugger Findings That Narrow The Fault
|
||||
|
||||
- The target firmware is present in Flash and `probe-rs download` completes successfully.
|
||||
- After a clean reset and short run window, the MCU executes from Flash and initializes clocks and `USART1` registers.
|
||||
- `USART1` register block was observed in an initialized state after boot, not all-zero.
|
||||
- Firmware was patched to explicitly start `HAL_UART_Receive_IT(&huart1, &g_uart1_rx_probe_byte, 1)` during `App_Init()`.
|
||||
- Even after that fix, repeated `AT` frames from the host produced no visible UART response.
|
||||
- Debug readout of software-side config RX state showed:
|
||||
- `g_pending_cmd_ready = 0`
|
||||
- `g_pending_cmd_len = 0`
|
||||
- `g_uart_cmd_len = 0`
|
||||
- `g_uart_cmd_buffer` remained zero-filled
|
||||
- `g_pending_cmd_buffer` remained zero-filled
|
||||
- This means the config parser never received any bytes from the live UART path during the test window.
|
||||
|
||||
### Current Best Conclusion
|
||||
|
||||
- The immediate blocker is no longer the parser itself.
|
||||
- Current evidence points to a board-level `USART1_RX` path problem or wrong host wiring/port assumption, because the firmware is alive, `USART1` is initialized, but no command bytes enter `config_uart_rx_byte()`.
|
||||
- Until the physical config-UART path is proven, it is not meaningful to claim that every config command or Flash persistence path has passed on real hardware.
|
||||
|
||||
### Corrected Final Conclusion
|
||||
|
||||
- The earlier "no response" conclusion was wrong because the host was sending `\r\n` terminated commands.
|
||||
- This firmware expects the config command to be terminated by `\n` to complete the frame in the real bench setup.
|
||||
- After switching the host sender to `AT...\n`, `COM9` immediately returned `OK\r\n` for `AT` and the complete config command set became testable.
|
||||
- Therefore the key bench rule is: every config command must end with `\n`.
|
||||
|
||||
### Final Verified Results
|
||||
|
||||
- `AT` returns `OK`.
|
||||
- `AT+?` and `AT+QUERY` return the full current configuration snapshot.
|
||||
- Setter commands `AT+IP`, `AT+MASK`, `AT+GW`, `AT+RIP`, `AT+MAC`, `AT+PORT`, `AT+RPORT`, `AT+BAUD1`, `AT+BAUD2`, `AT+DHCP=0` all return `OK` plus the reboot hint.
|
||||
- `AT+SAVE` returns `OK: Configuration saved`.
|
||||
- `AT+RESET` returns `OK: Resetting...` and the board comes back responding to `AT`.
|
||||
- `AT+DEFAULT` returns `OK: Defaults restored`.
|
||||
- Negative cases were verified:
|
||||
- `AT+UNKNOWN` -> `ERROR: Unknown command`
|
||||
- `AT+PORT=0` / `AT+PORT=65536` -> `ERROR: Invalid port`
|
||||
- `AT+BAUD1=1199` / `AT+BAUD1=921601` -> `ERROR: Invalid baudrate`
|
||||
- `AT+DHCP=1` -> `ERROR: DHCP disabled in this build`
|
||||
- `AT+IP=999.1.1.1` -> `ERROR: Invalid IP format`
|
||||
- `AT+MAC=GG:11:22:33:44:55` -> `ERROR: Invalid MAC format`
|
||||
- Non-AT input `BT` produced no response, which matches the parser gate.
|
||||
|
||||
### Final Flash Persistence Evidence
|
||||
|
||||
- Tested persisted values:
|
||||
- `IP=192.168.1.123`
|
||||
- `MASK=255.255.255.0`
|
||||
- `GW=192.168.1.1`
|
||||
- `RIP=192.168.1.201`
|
||||
- `MAC=02:12:34:56:78:9A`
|
||||
- `PORT=10001`
|
||||
- `RPORT=10002`
|
||||
- `BAUD1=57600`
|
||||
- `BAUD2=38400`
|
||||
- Sequence used:
|
||||
1. set values with `\n`-terminated AT commands
|
||||
2. query with `AT+?`
|
||||
3. `AT+SAVE`
|
||||
4. `AT+RESET`
|
||||
5. query again with `AT+?`
|
||||
6. read raw Flash words at `0x0800FC00`
|
||||
- Query values before and after reset matched exactly.
|
||||
- Raw Flash read before and after reset also matched exactly.
|
||||
- Factory default restoration was also proven with `AT+DEFAULT -> AT+SAVE -> AT+RESET -> AT+?`.
|
||||
|
||||
### Evidence Files
|
||||
|
||||
- Inventory transcript: `artifacts/uart-config/inventory-20260331-185752.json`
|
||||
- Inventory raw text: `artifacts/uart-config/inventory-20260331-185752.txt`
|
||||
- Persistence transcript: `artifacts/uart-config/persistence-20260331-190039.json`
|
||||
- Persistence raw text: `artifacts/uart-config/persistence-20260331-190039.txt`
|
||||
|
||||
### Firmware Adjustment Made During This Session
|
||||
|
||||
- Added explicit `HAL_UART_Receive_IT(&huart1, &g_uart1_rx_probe_byte, 1u)` arming in `App_Init()` so the `USART1` interrupt receive path is definitely started after boot.
|
||||
- This is a safe, minimal bring-up fix and should remain in place.
|
||||
|
||||
### Practical Lessons
|
||||
|
||||
- Do not treat a script exit code alone as proof of UART success; require at least one non-empty response in the captured transcript.
|
||||
- Do not treat `probe-rs read 0x0800FC00` returning all `0xFFFFFFFF` as a flash-driver failure until you first prove that `AT+SAVE` was actually accepted by the parser.
|
||||
- In this project, the fastest truth test is:
|
||||
1. prove target is executing
|
||||
2. prove `USART1` is initialized
|
||||
3. prove bytes reach `config_uart_rx_byte`
|
||||
4. only then evaluate parser responses and flash persistence
|
||||
- Most important bench lesson: if the config UART appears dead, first retry with commands ending in `\n` instead of `\r\n`.
|
||||
|
||||
### Open Items
|
||||
|
||||
- Confirm whether `COM9` is the real `USART1` config port by live command-response evidence.
|
||||
- If command-response is unstable, inspect whether host wiring/USB-UART level shifting is the cause before changing parser logic.
|
||||
- If persistence fails after a clean `AT+SAVE`, inspect `App/flash_param.c` and raw Flash contents at `0x0800FC00` before changing higher-level config logic.
|
||||
|
||||
## 2026-03-31 CH390D Bring-up Debug Session
|
||||
|
||||
### Goal
|
||||
|
||||
- Determine why `CH390D` does not return valid register values during boot.
|
||||
- Find a software-side root cause if one exists and attempt a minimal fix.
|
||||
|
||||
### Baseline Symptom
|
||||
|
||||
- MCU boots normally and RTT works.
|
||||
- CH390 boot diagnostics originally reported:
|
||||
|
||||
```text
|
||||
TCP2UART boot
|
||||
CH390 VID=0x0000 PID=0x0000 REV=0x00 NSR=0x00 LINK=0
|
||||
CH390 NCR=0x00 RCR=0x00 IMR=0x00 INTCR=0x00 GPR=0x00 ISR=0x00
|
||||
CH390 WRCHK NCR:0x00->0x00 INTCR:0x00->0x00
|
||||
```
|
||||
|
||||
- This showed that CH390 register reads and write-back checks were not producing valid values.
|
||||
|
||||
### Board-Side Evidence Already Collected
|
||||
|
||||
- `RST` line was observed released high.
|
||||
- `CS` line idle state was high.
|
||||
- `INT` line was observed low and mapped to EXTI.
|
||||
- `SPI1` was enabled and configured for `Mode 3` in the active firmware.
|
||||
- These observations did not by themselves restore valid CH390 responses.
|
||||
|
||||
### What Was Tried
|
||||
|
||||
1. Added richer RTT startup diagnostics in `BootDiag_ReportCh390()`.
|
||||
2. Lowered `SPI1` speed from `/8` to `/64`.
|
||||
3. Added stage markers around `low_level_init()` to localize the hang.
|
||||
4. Step-debugged and breakpoint-debugged `ch390_default_config()` and `ch390_write_phy()`.
|
||||
5. Added timeout protection to `ch390_read_phy()` / `ch390_write_phy()` so `EPCR` polling cannot hang forever.
|
||||
6. Temporarily skipped `ch390_set_phy_mode(CH390_AUTO)` to isolate non-PHY register access.
|
||||
7. Compared current driver against `Reference/EVT/EXAM/PUB/CH390.c` and `Reference/EVT/EXAM/PUB/CH390_Interface.c`.
|
||||
8. Tried multiple SPI register transaction shapes:
|
||||
- original two-byte exchange style
|
||||
- split `Transmit` then read phase
|
||||
- explicit dummy-byte read phase
|
||||
- single-frame two-byte full-duplex read
|
||||
9. Scanned all four SPI modes (`mode0`..`mode3`) during startup.
|
||||
10. Added small `CS` setup/hold delays.
|
||||
11. Increased hardware reset release wait to `50ms`.
|
||||
12. Restored EVT-style init order and PHY setup path to see whether EVT sequence alone fixes the problem.
|
||||
|
||||
### Key Intermediate Findings
|
||||
|
||||
- Lowering SPI speed changed behavior, but did not recover valid CH390 IDs.
|
||||
- Stage markers showed that low-speed SPI could stall during `ETH init: default`.
|
||||
- Step/RTT evidence localized the original stall to PHY access during `ch390_default_config()`.
|
||||
- The PHY access loop in `ch390_read_phy()` / `ch390_write_phy()` had no timeout and could hang indefinitely. This is a real software bug and should stay fixed.
|
||||
- After adding PHY timeouts and temporarily skipping PHY setup, the init path completed, but all CH390 reads became `0xFF` rather than valid IDs.
|
||||
- SPI mode scan result under that condition was:
|
||||
|
||||
```text
|
||||
CH390 SPI mode0 [FF FF FF FF FF]
|
||||
CH390 SPI mode1 [FF FF FF FF FF]
|
||||
CH390 SPI mode2 [FF FF FF FF FF]
|
||||
CH390 SPI mode3 [FF FF FF FF FF]
|
||||
```
|
||||
|
||||
- This ruled out a simple `CPOL/CPHA` mismatch.
|
||||
- External code comparison did not reveal an `opcode` or register-address mismatch. Public CH390 implementations use the same `OPC_REG_R=0x00`, `OPC_REG_W=0x80`, and the same register map.
|
||||
- One experimental split transaction path produced repeatable but obviously bogus values like `0x03`, `0xAC`, `0xAE`, which strongly suggests transaction artifacts rather than real CH390 data.
|
||||
- A debug read of `SPI1->SR` showed `OVR=1` during one of the experimental transaction variants, indicating the SPI transaction layer was not trustworthy in that configuration.
|
||||
|
||||
### EVT Comparison Outcome
|
||||
|
||||
- `Reference/EVT` is useful as a baseline, but it is not a drop-in fix for this project.
|
||||
- The broad init order in the live project already matches EVT closely through the lwIP glue path.
|
||||
- The most important EVT-specific difference is that EVT performs `ch390_set_phy_mode(CH390_AUTO)` at the start of `ch390_default_config()`.
|
||||
- Restoring the EVT-style `PHY` setup path in this project caused boot to hang again at:
|
||||
|
||||
```text
|
||||
TCP2UART boot
|
||||
ETH init: gpio
|
||||
ETH init: spi
|
||||
ETH init: reset
|
||||
ETH init: default
|
||||
```
|
||||
|
||||
- That confirms the `PHY` path is a real trigger for the hang, but EVT order alone does not solve the underlying communication problem.
|
||||
|
||||
### Current Best Technical Conclusion
|
||||
|
||||
- A real software defect was found and fixed: `EPCR` polling in PHY access had no timeout.
|
||||
- That fix prevents the firmware from hanging forever, but it does **not** restore valid CH390 register communication.
|
||||
- The core unresolved problem remains: the SPI register-access path still does not yield believable CH390 register data.
|
||||
- At this point, the following common explanations have already been tested and are **not** sufficient by themselves:
|
||||
- SPI mode selection
|
||||
- adding dummy bytes
|
||||
- `CS` setup/hold delays
|
||||
- changing reset wait from `10ms` to `50ms`
|
||||
- reverting to EVT transaction style
|
||||
- restoring EVT initialization order
|
||||
- public `opcode` / register-map mismatch
|
||||
|
||||
### Recommended Next Debug Step
|
||||
|
||||
- The next high-value experiment is a temporary GPIO bit-bang read of `VIDL/VIDH/CHIPR` with a fully controlled continuous command+clock sequence.
|
||||
- If bit-bang returns valid IDs while HAL-SPI paths do not, the remaining fault is in the SPI transaction implementation rather than CH390 higher-level init order.
|
||||
- If bit-bang still returns invalid data, the investigation must move back to board-level bus behavior even if static continuity checks look correct.
|
||||
|
||||
### Additional 2026-03-31 Finding: HAL SPI Re-init And Bit-Bang Side Effects
|
||||
|
||||
- A valid concern was raised about calling `HAL_SPI_Init()` after temporarily changing SPI pins to GPIO mode.
|
||||
- Code review of `stm32f1xx_hal_spi.c` showed that `HAL_SPI_MspInit()` only runs when `hspi->State == HAL_SPI_STATE_RESET`.
|
||||
- Therefore, simply calling `HAL_SPI_Init()` after bit-bang mode does **not** automatically restore `PA5/PA7` to SPI alternate-function output mode.
|
||||
- This was a real software-side risk in the temporary bit-bang probe and was corrected by explicitly restoring:
|
||||
- `PA5` -> `AF_PP`
|
||||
- `PA7` -> `AF_PP`
|
||||
- `PA6` -> input
|
||||
before calling `HAL_SPI_Init()` again.
|
||||
- After that correction, the observed behavior changed again: boot output stopped at `ETH init: reset`, and a short halt showed execution inside `HAL_SPI_TransmitReceive()` called from the CH390 SPI exchange path.
|
||||
- This means the earlier bit-bang experiments could have polluted later SPI results, but after the GPIO restore fix, the active blocker is again a live SPI transaction stall rather than a missing-GPIO-restore artifact.
|
||||
|
||||
### Additional 2026-03-31 Finding: Reset Exists In Runtime Path
|
||||
|
||||
- The project does **not** lack a CH390 reset process.
|
||||
- The actual runtime order is:
|
||||
1. `App_Init()`
|
||||
2. `lwip_netif_init()`
|
||||
3. `ethernetif_init()`
|
||||
4. `low_level_init()`
|
||||
5. `ch390_gpio_init()`
|
||||
6. `ch390_spi_init()`
|
||||
7. `ch390_hardware_reset()`
|
||||
8. `ch390_default_config()`
|
||||
- The reset process is therefore present and executed, but it lives in the lwIP/netif bring-up path instead of being written inline in `main.c` as in the EVT sample.
|
||||
- The current unresolved problem is not "missing reset"; it is that SPI transactions after reset still do not produce valid CH390 register responses.
|
||||
|
||||
## 2026-03-31 Manual Reset Sensitivity Analysis
|
||||
|
||||
### Observed Symptom
|
||||
|
||||
- An extra `ch390_hardware_reset()` was temporarily inserted into `App_Init()` before `lwip_init()`.
|
||||
- With that extra reset in place, a manual board reset could lead to the firmware appearing stuck and the LED heartbeat not behaving normally.
|
||||
- The same image could still look more normal when observed after a `probe-rs` flash-and-run cycle.
|
||||
|
||||
### Code-Level Finding
|
||||
|
||||
- The inserted extra reset sat here in `Core/Src/main.c`:
|
||||
|
||||
```c
|
||||
SEGGER_RTT_Init();
|
||||
SEGGER_RTT_WriteString(0, "\r\nTCP2UART boot\r\n");
|
||||
...
|
||||
ch390_hardware_reset();
|
||||
lwip_init();
|
||||
lwip_netif_init(...);
|
||||
```
|
||||
|
||||
- But the normal bring-up path already performs a reset later inside `ch390_runtime_init()` / `low_level_init()` before `ch390_default_config()`.
|
||||
- That means the temporary line created a redundant early reset in a different initialization phase than the normal driver-owned reset.
|
||||
|
||||
### Interpretation
|
||||
|
||||
- This pattern is much more consistent with a reset-sequencing / startup-state issue than with compiler optimization level.
|
||||
- The Keil target uses one fixed optimization configuration, so a plain manual reset does not change code generation.
|
||||
- In contrast, an extra CH390 reset inserted before lwIP and before the normal CH390 runtime init can alter the device startup state and timing relationship between the MCU and CH390.
|
||||
|
||||
### Action Taken
|
||||
|
||||
- The extra `ch390_hardware_reset()` in `App_Init()` was removed.
|
||||
- The firmware now relies only on the standard driver-owned reset inside the CH390 runtime initialization path.
|
||||
|
||||
### Conclusion
|
||||
|
||||
- The temporary extra reset was not kept.
|
||||
- The strongest software-side conclusion is that the manual-reset sensitivity was caused by redundant reset sequencing rather than by optimization level.
|
||||
|
||||
## 2026-03-31 HardFault Root Cause And Fix
|
||||
|
||||
### Symptom
|
||||
|
||||
- After CH390 bring-up completed and boot diagnostics printed, the firmware entered:
|
||||
|
||||
```text
|
||||
TRAP: HardFault_Handler
|
||||
```
|
||||
|
||||
- At the same time, `PC13` stopped blinking, which originally looked like a timer or LED problem.
|
||||
|
||||
### Fault Evidence
|
||||
|
||||
- Fault-status registers showed a real fault rather than a normal busy wait.
|
||||
- The trap location was `Debug_TrapWithRttHint()` in `Core/Src/main.c`.
|
||||
- The stacked fault frame pointed back into the normal runtime path rather than the trap itself.
|
||||
- `TIM4` was configured and had already advanced `g_led_blink_ticks`, so the LED path was alive before the fault.
|
||||
|
||||
### Root Cause
|
||||
|
||||
- `MX_IWDG_Init()` had been temporarily commented out in `main()`.
|
||||
- However, `App_Poll()` still executed:
|
||||
|
||||
```c
|
||||
HAL_IWDG_Refresh(&hiwdg);
|
||||
```
|
||||
|
||||
- Because `hiwdg` was never initialized, this call operated on an invalid handle and led to the observed fault path.
|
||||
|
||||
### Fix Applied
|
||||
|
||||
- `Core/Src/main.c` was changed so watchdog refresh only runs when `hiwdg.Instance == IWDG`.
|
||||
- This preserves normal behavior when IWDG is enabled, while avoiding invalid access when IWDG init is intentionally disabled for debugging.
|
||||
|
||||
### Verification
|
||||
|
||||
- Rebuilt successfully with `0 error`, `1 warning`.
|
||||
- Reflashed target and reran startup.
|
||||
- Boot RTT still showed CH390 diagnostics, but no longer showed `TRAP: HardFault_Handler`.
|
||||
- A 5-second runtime window completed without a new trap.
|
||||
- `g_led_blink_ticks` continued advancing after the fix, confirming that `TIM4` interrupts and the LED heartbeat path were alive again.
|
||||
|
||||
### Conclusion
|
||||
|
||||
- The HardFault was caused by refreshing an uninitialized IWDG handle, not by the CH390 SPI path itself.
|
||||
- This issue is fixed.
|
||||
- CH390 bring-up is still unresolved at the register-communication level, but the main task is again able to continue running normally.
|
||||
|
||||
## 2026-03-31 Runtime Freeze Root Cause And Fix
|
||||
|
||||
### Symptom
|
||||
|
||||
- After re-soldering CH390D, the system could boot and print the normal CH390 startup diagnostics.
|
||||
- However, after running for a while, the device would appear to freeze.
|
||||
- In that state, the LED heartbeat behavior became unreliable and the system appeared to stop making useful progress.
|
||||
|
||||
### Key Runtime Evidence
|
||||
|
||||
- The new freeze was **not** another HardFault: no new `TRAP:` line appeared during the freeze window.
|
||||
- `g_led_blink_ticks` continued advancing during observation windows, proving that `TIM4` interrupts were still alive and the MCU was not fully dead.
|
||||
- A short halt during the bad behavior repeatedly landed in `HAL_SPI_TransmitReceive()`.
|
||||
- Code inspection showed that CH390 runtime paths in `ethernetif.c` were executing blocking SPI transactions while global interrupts were disabled via `ethernetif_lock()`.
|
||||
|
||||
### Root Cause
|
||||
|
||||
- `low_level_output()`, `low_level_input()`, and `ethernetif_check_link()` in `Drivers/LwIP/src/netif/ethernetif.c` wrapped CH390 SPI register/memory accesses inside `ethernetif_lock()` / `ethernetif_unlock()`.
|
||||
- Those helpers globally disable interrupts by manipulating `PRIMASK`.
|
||||
- The CH390 access path uses blocking HAL SPI functions and timeout logic based on `HAL_GetTick()`.
|
||||
- Running those blocking accesses with interrupts disabled can stall or livelock the runtime path, especially after startup when network polling begins.
|
||||
|
||||
### Fix Applied
|
||||
|
||||
- Reduced the interrupt-masked critical sections in `ethernetif.c` to only protect the shared IRQ-pending flag.
|
||||
- Removed `ethernetif_lock()` coverage from the long CH390 SPI transaction paths in:
|
||||
- `low_level_output()`
|
||||
- `low_level_input()`
|
||||
- `ethernetif_check_link()`
|
||||
- In `ethernetif_poll()`, only the `g_ch390_irq_pending` flag is now cleared under the short critical section; the actual CH390 register access happens with interrupts enabled.
|
||||
|
||||
### Verification
|
||||
|
||||
- Rebuilt successfully with `0 error`, `0 warning`.
|
||||
- Reflashed and reran the target.
|
||||
- Boot RTT still completed normally through:
|
||||
|
||||
```text
|
||||
TCP2UART boot
|
||||
ETH init: gpio
|
||||
ETH init: spi
|
||||
ETH init: reset
|
||||
ETH init: default
|
||||
ETH init: mac
|
||||
ETH init: getmac
|
||||
ETH init: irq
|
||||
ETH init: done
|
||||
CH390 VID=0x0000 PID=0x0000 REV=0x00 NSR=0x00 LINK=0
|
||||
CH390 NCR=0x00 RCR=0x00 IMR=0x00 INTCR=0x00 GPR=0x00 ISR=0x00
|
||||
```
|
||||
|
||||
- No new `TRAP:` message appeared during extended runtime observation.
|
||||
- `g_led_blink_ticks` continued advancing over multiple samples, indicating that the heartbeat timer and interrupt delivery remained active.
|
||||
- The system no longer reproduced the earlier “runs for a while then appears frozen” behavior in the observed validation window.
|
||||
|
||||
### Conclusion
|
||||
|
||||
- This freeze was caused by doing blocking CH390 SPI operations inside a global interrupt-disabled critical section.
|
||||
- The runtime freeze is fixed.
|
||||
- CH390 register communication is still invalid (`0x0000` ID values), but that is now a separate communication/bring-up problem rather than the cause of the observed runtime stall.
|
||||
|
||||
## 2026-03-31 SPI Ownership Decoupling And CH390 Current Status
|
||||
|
||||
### Why This Refactor Was Done
|
||||
|
||||
- The project previously allowed multiple runtime layers to reach down into CH390/SPI behavior directly:
|
||||
- `ethernetif.c` handled init, IRQ-driven poll service, RX/TX transactions, and link checks
|
||||
- `main.c` directly read CH390 registers for boot diagnostics
|
||||
- the CH390 low-level SPI transport sat underneath those callers with no single runtime owner boundary
|
||||
- This made the system harder to reason about and contributed to runtime instability when CH390 accesses happened from different code paths with different assumptions.
|
||||
|
||||
### Refactor Outcome
|
||||
|
||||
- Added a single runtime owner module: `Drivers/CH390/ch390_runtime.c` + `Drivers/CH390/ch390_runtime.h`.
|
||||
- After this change:
|
||||
- `CH390_Interface.c` remains the **only** SPI transport implementation
|
||||
- `CH390.c` remains the chip-level helper layer
|
||||
- `ch390_runtime.c` is now the **only runtime owner** of CH390 transactions after boot
|
||||
- `ethernetif.c` delegates runtime TX/RX/link/IRQ servicing to `ch390_runtime`
|
||||
- `main.c` no longer performs direct CH390 register reads; boot diagnostics use `ch390_runtime_get_diag()`
|
||||
- `EXTI0_IRQHandler()` only posts the IRQ-pending event into the runtime owner and does not touch CH390 directly
|
||||
|
||||
### Behavior After Refactor
|
||||
|
||||
- Build passed with `0 error`, `0 warning`.
|
||||
- The system remained stable in the post-refactor runtime window:
|
||||
- no new trap output
|
||||
- heartbeat/timer activity continued
|
||||
- previous runtime freeze did not reproduce in the observed window
|
||||
|
||||
### CH390 Result After Refactor
|
||||
|
||||
- The CH390 did **not** come up successfully.
|
||||
- However, the failure signature became cleaner and more trustworthy:
|
||||
|
||||
```text
|
||||
CH390 VID=0xFFFF PID=0xFFFF REV=0xFF NSR=0xFF LINK=1
|
||||
CH390 NCR=0xFF RCR=0xFF IMR=0xFF INTCR=0xFF GPR=0xFF ISR=0xFF
|
||||
```
|
||||
|
||||
- This is materially different from the earlier unstable mixture of:
|
||||
- all-zero reads
|
||||
- intermittent hangs
|
||||
- transaction artifacts
|
||||
- watchdog-related HardFaults
|
||||
|
||||
### Trusted Interpretation Of Current Failure
|
||||
|
||||
- With the SPI access model cleaned up and the system remaining stable, the current CH390 failure can now be treated as a **credible transport-level non-response** rather than a concurrency artifact.
|
||||
- A uniform `0xFF` readback across identity and status/control registers strongly suggests one of these conditions:
|
||||
- CH390 still does not actively drive MISO during the register-read phase
|
||||
- CS reaches the MCU logic but is not effectively selecting the CH390 device on the board side
|
||||
- the CH390 digital core is not entering a valid SPI-responding state after reset even though the MCU-side sequence now looks consistent
|
||||
|
||||
### Practical Conclusion
|
||||
|
||||
- The architectural decoupling requirement is complete.
|
||||
- The runtime stability requirement is complete.
|
||||
- CH390 connection is **still failed**, but the reason is now narrowed to a believable low-level bus/device-response problem rather than a software ownership/concurrency problem.
|
||||
Reference in New Issue
Block a user