I built a reasonably basic client-server IoT setup in C++ with a Raspberry Pi 5 server and a few Raspberry Pi Pico W clients. It's nothing special, just keeps track of some temperature and voltage readings for an off grid solar and hydropower system that powers a couple of houses.
I am struggling with the Raspberry Pi Pico W clients randomly crashing outputting *** PANIC *** now and then. If I could work out how catch and handle whatever the error is I could consider the system functional. Even a hard reboot and just start again would be better than as is having to pull the plug to get it working again.
This is the last few lines of serial output before the Raspberry Pi Pico W calls exit(1), I included one complete send/receive followed by a send/failedReceive, but there ate many identical successful send/receives
StartTasks...
task1: Connecting to access point JDnD
Connected to wifi with an IP address
MAC ADDRESS: D83ADD3E85F9
IP ADDRESS: 192.168.1.66
task2: Connecting to 192.168.1.1 port 8081
task3: waiting for connection from server.
Connected to 192.168.1.1 port 8081
task3:
Send... 466
Type:TS;Module:D83ADD3E85F9;Device:2846BEA00600004C;Value:22.3C;IP:192.168.1.66;
Type:TS;Module:D83ADD3E85F9;Device:2835BAA006000092;Value:22.7C;IP:192.168.1.66;
Type:TS;Module:D83ADD3E85F9;Device:2873B6A006000074;Value:22.2C;IP:192.168.1.66;
Type:RL;Module:D83ADD3E85F9;Device:Relay1;Value:ON;IP:192.168.1.66;Enable:FALSE;Action:;
Type:RL;Module:D83ADD3E85F9;Device:Relay2;Value:ON;IP:192.168.1.66;Enable:;Action:;
Type:TM;Module:D83ADD3E85F9;Device:Timer1;Value:;
task4: waiting for response from server......
tcp_client_sent: len=466
tcp_client_receive: len=357 err=0
Responce...
Type;TS;Module;;Device;2846BEA00600004C;Value;22.3C;1;
Type;TS;Module;;Device;2835BAA006000092;Value;22.3C;1;
Type;TS;Module;;Device;2873B6A006000074;Value;22.3C;1;
Type;RL;Module;;Device;Relay1;Value;ON;2;Group;Huntsman;Name;Fire1R;Enable;FALSE;2;Action;2873B6A006000074>25;
Value:ON:2;
Enable:FALSE:2;
Type;RL;Module;;Device;Relay2;Value;OFF;1;Group;Huntsman;Name;Heater;Enable;FALSE;Action;;
Type;TM;Module;;Device;Timer1;Value;;
task4: Close Client
StartTasks...
task1: Connecting to access point JDnD
Connected to wifi with an IP address
MAC ADDRESS: D83ADD3E85F9
IP ADDRESS: 192.168.1.66
task2: Connecting to 192.168.1.1 port 8081
task3: waiting for connection from server.
Connected to 192.168.1.1 port 8081
task3:
Send... 466
Type:TS;Module:D83ADD3E85F9;Device:2846BEA00600004C;Value:22.3C;IP:192.168.1.66;
Type:TS;Module:D83ADD3E85F9;Device:2835BAA006000092;Value:22.7C;IP:192.168.1.66;
Type:TS;Module:D83ADD3E85F9;Device:2873B6A006000074;Value:22.2C;IP:192.168.1.66;
Type:RL;Module:D83ADD3E85F9;Device:Relay1;Value:ON;IP:192.168.1.66;Enable:FALSE;Action:;
Type:RL;Module:D83ADD3E85F9;Device:Relay2;Value:ON;IP:192.168.1.66;Enable:;Action:;
Type:TM;Module:D83ADD3E85F9;Device:Timer1;Value:;
task4: waiting for responce from server.
*** PANIC ***
tcp_slowtmr: TIME-WAIT pcb->state == TIME-W
I grep searched the Raspberry Pi Pico W SDK for "PANIC" and found this function in 'rp2_common/pico_platform_panic/panic.c' and 'pico_platform/platform_base.c' that takes a variable number of arguments and outputs them to standard output:
void panic(const char *fmt, ...)
{
va_list args;
puts("*** PANIC ***\n");
if (fmt)
{
va_start(args, fmt);
vprintf(fmt, args);
va_end(args);
}
puts("\n");
__breakpoint();
}
After '*** PANIC ***', I have a blank line, followed by 'tcp_slowtmr: TIME-WAIT pcb->state == TIME-W'.
I grep search the SDK for calls to the panic function without any real success. I notice a few calls, 'panic("");' and 'panic(NULL);', that could be relevant since there is a blank line output, but nothing stands out, pointing me in any real direction.
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/rp2040_usb.c:108: panic("ep %02X was already available", ep->ep_addr);
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/rp2040_usb.c:307: panic("Can't continue xfer on inactive ep %02X", ep->ep_addr);
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/hcd_rp2040.c:163: panic("Unhandled buffer %d\n", remaining_buffers);
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/hcd_rp2040.c:246: panic("Data Seq Error \n");
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/hcd_rp2040.c:251: panic("Unhandled IRQ 0x%x\n", (uint) (status ^ handled));
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/hcd_rp2040.c:456: panic("Invalid speed\n");
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/hcd_rp2040.c:641: panic("hcd_clear_stall");
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/dcd_rp2040.c:352: panic("Unhandled IRQ 0x%x\n", (uint) (status ^ handled));
pico-sdk/src/rp2040/pico_platform/include/pico/platform.h:25:#include "pico/platform/panic.h"
pico-sdk/src/host/hardware_sync/sync_core0_only.c:98: panic("Can't wait on irq for host core0 only implementation");
pico-sdk/src/host/pico_runtime/runtime.c:10: panic("Hard assert");
pico-sdk/src/host/pico_platform/platform_base.c:20:void panic(const char *fmt, ...)
pico-sdk/src/host/pico_platform/include/pico/platform.h:123:void __noreturn panic(const char *fmt, ...);
pico-sdk/src/rp2_common/pico_platform_panic/panic.c:22: panic("not supported");
pico-sdk/src/rp2_common/pico_platform_panic/panic.c:34:void __attribute__((naked, noreturn)) __printflike(1, 0) panic(__unused const char *fmt, ...)
pico-sdk/src/rp2_common/pico_platform_panic/panic.c:65:void __attribute__((noreturn)) __printflike(1, 0) panic(const char* fmt, ...)
pico-sdk/src/rp2_common/pico_platform_panic/include/pico/platform/panic.h:31:void __attribute__((noreturn)) panic(const char *fmt, ...);
pico-sdk/src/rp2_common/pico_runtime/runtime.c:12: panic("Hard assert");
pico-sdk/src/rp2_common/pico_lwip/include/arch/cc.h:89:#define LWIP_PLATFORM_ASSERT(x) panic(x)
pico-sdk/src/rp2_common/pico_cyw43_driver/cybt_shared_bus/cybt_shared_bus.c:275: panic("cyw43 buffer overflow");
pico-sdk/src/rp2_common/pico_cyw43_driver/cybt_shared_bus/cybt_shared_bus_driver.c:414: panic("cyw43 btsdio register corruption");
pico-sdk/src/rp2_common/pico_cyw43_driver/cyw43_driver.c:185: panic("cyw43 has no ethernet interface");
pico-sdk/src/rp2_common/pico_malloc/malloc.c:30: panic("Out of memory");
pico-sdk/src/rp2_common/pico_double/double_init_rom_rp2040.c:17: panic("missing double function");
pico-sdk/src/rp2_common/pico_double/double_init_rom_rp2040.c:61: panic(NULL);
pico-sdk/src/rp2_common/pico_double/double_none.S:77: j panic
pico-sdk/src/rp2_common/pico_double/double_none.S:81: bl panic
pico-sdk/src/rp2_common/pico_float/float_none.S:75: j panic
pico-sdk/src/rp2_common/pico_float/float_none.S:79: bl panic
pico-sdk/src/rp2_common/pico_float/float_init_rom_rp2040.c:18: panic("");
pico-sdk/src/rp2_common/pico_float/float_init_rom_rp2040.c:38: panic("");
pico-sdk/src/rp2_common/pico_printf/printf_none.S:22: bl panic
pico-sdk/src/rp2_common/pico_printf/printf_none.S:25: call panic
pico-sdk/src/rp2_common/pico_async_context/async_context_poll.c:50: panic("async_context_poll context check failed (IRQ or wrong core)");
pico-sdk/src/rp2_common/pico_multicore/multicore.c:335: panic( "Multicoore doorbell %d already claimed on core mask 0x%x; requested core mask 0x%x\n",
pico-sdk/src/rp2_common/pico_multicore/multicore.c:366: panic("No free doorbells");
pico-sdk/src/rp2_common/pico_stdio_uart/stdio_uart.c:62: panic("UART baud rate undefined");
pico-sdk/src/rp2_common/pico_stdio_uart/stdio_uart.c:73: panic("UART baud rate undefined");
pico-sdk/src/rp2_common/pico_stdio_uart/stdio_uart.c:85: panic("UART baud rate undefined");
pico-sdk/src/rp2_common/hardware_clocks/clocks.c:202: panic("Unexpected clocks irq\n");
pico-sdk/src/rp2_common/hardware_clocks/include/hardware/clocks.h:442: panic("System clock of %u Hz cannot be exactly achieved", freq_hz);
pico-sdk/src/rp2_common/hardware_clocks/include/hardware/clocks.h:464: panic("System clock of %u kHz cannot be exactly achieved", freq_khz);
pico-sdk/src/rp2350/pico_platform/include/pico/platform.h:25:#include "pico/platform/panic.h"
pico-sdk/src/common/pico_time/time.c:387: panic("Attempted to sleep inside of an exception handler; use busy_wait if you must");
pico-sdk/src/common/hardware_claim/claim.c:24: panic(message, bit_index);
pico-sdk/src/common/hardware_claim/claim.c:44: panic(message);
Noticing the panic is followed by 'tcp_slowtmr: TIME-WAIT pcb->state == TIME-W', I have found these lines that look like they could be relevant:
pico-sdk/lib/lwip/src/core/tcp.c:1445: LWIP_ASSERT("tcp_slowtmr: TIME-WAIT pcb->state == TIME-WAIT", pcb->state == TIME_WAIT);
pico-sdk/lib/btstack/3rd-party/lwip/core/src/core/tcp.c:1445: LWIP_ASSERT("tcp_slowtmr: TIME-WAIT pcb->state == TIME-WAIT", pcb->state == TIME_WAIT);
/**
* Called every 500 ms and implements the retransmission timer and the timer that
* removes PCBs that have been in TIME-WAIT for enough time. It also increments
* various timers such as the inactivity timer in each PCB.
*
* Automatically called from tcp_tmr().
*/
void
tcp_slowtmr(void);
/* Steps through all of the TIME-WAIT PCBs. */
prev = NULL;
pcb = tcp_tw_pcbs;
while (pcb != NULL) {
LWIP_ASSERT("tcp_slowtmr: TIME-WAIT pcb->state == TIME-WAIT", pcb->state == TIME_WAIT);
pcb_remove = 0;
/* Check if this PCB has stayed long enough in TIME-WAIT */
if ((u32_t)(tcp_ticks - pcb->tmr) > 2 * TCP_MSL / TCP_SLOW_INTERVAL) {
++pcb_remove;
}
/* If the PCB should be removed, do it. */
if (pcb_remove) {
struct tcp_pcb *pcb2;
tcp_pcb_purge(pcb);
/* Remove PCB from tcp_tw_pcbs list. */
if (prev != NULL) {
LWIP_ASSERT("tcp_slowtmr: middle tcp != tcp_tw_pcbs", pcb != tcp_tw_pcbs);
prev->next = pcb->next;
} else {
/* This PCB was the first. */
LWIP_ASSERT("tcp_slowtmr: first pcb == tcp_tw_pcbs", tcp_tw_pcbs == pcb);
tcp_tw_pcbs = pcb->next;
}
pcb2 = pcb;
pcb = pcb->next;
tcp_free(pcb2);
} else {
prev = pcb;
pcb = pcb->next;
}
}
I cannot work out how to tie this background process timing out to were panic is called, or how I am supposed to handle the timeout error.
I expect the answer is simple but my question is, how am I supposed to catch this call to panic and handle it rather than letting the background task crash the module and exit?
My client main loop is single threaded pico_cyw43_arch_poll
int loop()
{
cyw43_arch_poll();
cyw43_arch_wait_for_work_until(make_timeout_time_ms(1));
int err = task.Next();
if (err) {
JDND::Terminal::Error("ERROR: task.Next err=" + JDND::itoa(err));
}
// Run timer tasks
if (startTasks)
{
JDND::Terminal::Debug("StartTasks...");
startTasks = false;
timer_start(10);
if (!task.IsActive())
{
int err = task.Start(task1);
if (err) {
JDND::Terminal::Error("ERROR: task.Start " + JDND::itoa(err));
}
}
}
}
My server is also singlethreaded and only handles one client at a time, but there are only a few modules and even with only one module running, the problem still exists.
I'm thinking of making a backup copy of the SDK, so I can put it back as is, then adding an extra line just before each call to panic() outputting a number or something, so I can see which function is actually firing it. Do you think this is viable as a test?