I built a reasonably basic client-server IoT setup in C++ with a Raspberry Pi 5 server and a few Raspberry Pi Pico W clients. It's nothing special, just keeps track of some temperature and voltage readings for an off grid solar and hydropower system that powers a couple of houses.

I am struggling with the Raspberry Pi Pico W clients randomly crashing and printing *** PANIC *** now and then. If I could work out how to catch and handle whatever the error is, I could consider the system functional. Even a hard reboot and a fresh start would be better than the current situation, where I have to pull the plug to get a module working again.

These are the last few lines of serial output before the Raspberry Pi Pico W calls exit(1). I included one complete send/receive followed by a send/failed receive, but there were many identical successful send/receives before them:

StartTasks...
task1: Connecting to access point JDnD
Connected to wifi with an IP address
MAC ADDRESS: D83ADD3E85F9
IP ADDRESS:  192.168.1.66
task2: Connecting to 192.168.1.1 port 8081
task3: waiting for connection from server.
Connected to 192.168.1.1 port 8081
task3:
Send... 466
Type:TS;Module:D83ADD3E85F9;Device:2846BEA00600004C;Value:22.3C;IP:192.168.1.66;
Type:TS;Module:D83ADD3E85F9;Device:2835BAA006000092;Value:22.7C;IP:192.168.1.66;
Type:TS;Module:D83ADD3E85F9;Device:2873B6A006000074;Value:22.2C;IP:192.168.1.66;
Type:RL;Module:D83ADD3E85F9;Device:Relay1;Value:ON;IP:192.168.1.66;Enable:FALSE;Action:;
Type:RL;Module:D83ADD3E85F9;Device:Relay2;Value:ON;IP:192.168.1.66;Enable:;Action:;
Type:TM;Module:D83ADD3E85F9;Device:Timer1;Value:;
task4: waiting for response from server......
tcp_client_sent: len=466
tcp_client_receive: len=357 err=0
Responce...
  Type;TS;Module;;Device;2846BEA00600004C;Value;22.3C;1;
  Type;TS;Module;;Device;2835BAA006000092;Value;22.3C;1;
  Type;TS;Module;;Device;2873B6A006000074;Value;22.3C;1;
  Type;RL;Module;;Device;Relay1;Value;ON;2;Group;Huntsman;Name;Fire1R;Enable;FALSE;2;Action;2873B6A006000074>25;
    Value:ON:2;
    Enable:FALSE:2;
  Type;RL;Module;;Device;Relay2;Value;OFF;1;Group;Huntsman;Name;Heater;Enable;FALSE;Action;;
  Type;TM;Module;;Device;Timer1;Value;;
task4: Close Client
StartTasks...
task1: Connecting to access point JDnD
Connected to wifi with an IP address
MAC ADDRESS: D83ADD3E85F9
IP ADDRESS:  192.168.1.66
task2: Connecting to 192.168.1.1 port 8081
task3: waiting for connection from server.
Connected to 192.168.1.1 port 8081
task3:
Send... 466
Type:TS;Module:D83ADD3E85F9;Device:2846BEA00600004C;Value:22.3C;IP:192.168.1.66;
Type:TS;Module:D83ADD3E85F9;Device:2835BAA006000092;Value:22.7C;IP:192.168.1.66;
Type:TS;Module:D83ADD3E85F9;Device:2873B6A006000074;Value:22.2C;IP:192.168.1.66;
Type:RL;Module:D83ADD3E85F9;Device:Relay1;Value:ON;IP:192.168.1.66;Enable:FALSE;Action:;
Type:RL;Module:D83ADD3E85F9;Device:Relay2;Value:ON;IP:192.168.1.66;Enable:;Action:;
Type:TM;Module:D83ADD3E85F9;Device:Timer1;Value:;
task4: waiting for responce from server.
*** PANIC ***

tcp_slowtmr: TIME-WAIT pcb->state == TIME-W

I grep searched the Raspberry Pi Pico W SDK for "PANIC" and found this function in 'rp2_common/pico_platform_panic/panic.c' and 'pico_platform/platform_base.c' that takes a variable number of arguments and outputs them to standard output:

void panic(const char *fmt, ...)
{
    va_list args;
    puts("*** PANIC ***\n");
    if (fmt)
    {
        va_start(args, fmt);
        vprintf(fmt, args);
        va_end(args);
    }
    puts("\n");
    __breakpoint();
}

After '*** PANIC ***', I have a blank line, followed by 'tcp_slowtmr: TIME-WAIT pcb->state == TIME-W'.

I grep-searched the SDK for calls to the panic function without any real success. I noticed a few calls, 'panic("");' and 'panic(NULL);', that could be relevant since a blank line is output, but nothing stands out or points me in any real direction.

pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/rp2040_usb.c:108:    panic("ep %02X was already available", ep->ep_addr);
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/rp2040_usb.c:307:    panic("Can't continue xfer on inactive ep %02X", ep->ep_addr);
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/hcd_rp2040.c:163:    panic("Unhandled buffer %d\n", remaining_buffers);
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/hcd_rp2040.c:246:    panic("Data Seq Error \n");
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/hcd_rp2040.c:251:    panic("Unhandled IRQ 0x%x\n", (uint) (status ^ handled));
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/hcd_rp2040.c:456:    panic("Invalid speed\n");
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/hcd_rp2040.c:641:    panic("hcd_clear_stall");
pico-sdk/lib/tinyusb/src/portable/raspberrypi/rp2040/dcd_rp2040.c:352:    panic("Unhandled IRQ 0x%x\n", (uint) (status ^ handled));
pico-sdk/src/rp2040/pico_platform/include/pico/platform.h:25:#include "pico/platform/panic.h"
pico-sdk/src/host/hardware_sync/sync_core0_only.c:98:                     panic("Can't wait on irq for host core0 only implementation");
pico-sdk/src/host/pico_runtime/runtime.c:10:                              panic("Hard assert");
pico-sdk/src/host/pico_platform/platform_base.c:20:void panic(const char *fmt, ...)
pico-sdk/src/host/pico_platform/include/pico/platform.h:123:void __noreturn panic(const char *fmt, ...);
pico-sdk/src/rp2_common/pico_platform_panic/panic.c:22:                   panic("not supported");
pico-sdk/src/rp2_common/pico_platform_panic/panic.c:34:void __attribute__((naked, noreturn)) __printflike(1, 0) panic(__unused const char *fmt, ...)
pico-sdk/src/rp2_common/pico_platform_panic/panic.c:65:void __attribute__((noreturn)) __printflike(1, 0) panic(const char* fmt, ...)
pico-sdk/src/rp2_common/pico_platform_panic/include/pico/platform/panic.h:31:void __attribute__((noreturn)) panic(const char *fmt, ...);
pico-sdk/src/rp2_common/pico_runtime/runtime.c:12:                        panic("Hard assert");
pico-sdk/src/rp2_common/pico_lwip/include/arch/cc.h:89:#define LWIP_PLATFORM_ASSERT(x) panic(x)
pico-sdk/src/rp2_common/pico_cyw43_driver/cybt_shared_bus/cybt_shared_bus.c:275:        panic("cyw43 buffer overflow");
pico-sdk/src/rp2_common/pico_cyw43_driver/cybt_shared_bus/cybt_shared_bus_driver.c:414: panic("cyw43 btsdio register corruption");
pico-sdk/src/rp2_common/pico_cyw43_driver/cyw43_driver.c:185:             panic("cyw43 has no ethernet interface");
pico-sdk/src/rp2_common/pico_malloc/malloc.c:30:                          panic("Out of memory");
pico-sdk/src/rp2_common/pico_double/double_init_rom_rp2040.c:17:          panic("missing double function");
pico-sdk/src/rp2_common/pico_double/double_init_rom_rp2040.c:61:          panic(NULL);
pico-sdk/src/rp2_common/pico_double/double_none.S:77:    j  panic
pico-sdk/src/rp2_common/pico_double/double_none.S:81:    bl panic
pico-sdk/src/rp2_common/pico_float/float_none.S:75:    j  panic
pico-sdk/src/rp2_common/pico_float/float_none.S:79:    bl panic
pico-sdk/src/rp2_common/pico_float/float_init_rom_rp2040.c:18:           panic("");
pico-sdk/src/rp2_common/pico_float/float_init_rom_rp2040.c:38:           panic("");
pico-sdk/src/rp2_common/pico_printf/printf_none.S:22:    bl panic
pico-sdk/src/rp2_common/pico_printf/printf_none.S:25:    call panic
pico-sdk/src/rp2_common/pico_async_context/async_context_poll.c:50:      panic("async_context_poll context check failed (IRQ or wrong core)");
pico-sdk/src/rp2_common/pico_multicore/multicore.c:335:                  panic( "Multicoore doorbell %d already claimed on core mask 0x%x; requested core mask 0x%x\n",
pico-sdk/src/rp2_common/pico_multicore/multicore.c:366:                  panic("No free doorbells");
pico-sdk/src/rp2_common/pico_stdio_uart/stdio_uart.c:62:                 panic("UART baud rate undefined");
pico-sdk/src/rp2_common/pico_stdio_uart/stdio_uart.c:73:                 panic("UART baud rate undefined");
pico-sdk/src/rp2_common/pico_stdio_uart/stdio_uart.c:85:                 panic("UART baud rate undefined");
pico-sdk/src/rp2_common/hardware_clocks/clocks.c:202:                    panic("Unexpected clocks irq\n");
pico-sdk/src/rp2_common/hardware_clocks/include/hardware/clocks.h:442:   panic("System clock of %u Hz cannot be exactly achieved", freq_hz);
pico-sdk/src/rp2_common/hardware_clocks/include/hardware/clocks.h:464:   panic("System clock of %u kHz cannot be exactly achieved", freq_khz);
pico-sdk/src/rp2350/pico_platform/include/pico/platform.h:25:#include "pico/platform/panic.h"
pico-sdk/src/common/pico_time/time.c:387:                                panic("Attempted to sleep inside of an exception handler; use busy_wait if you must");
pico-sdk/src/common/hardware_claim/claim.c:24:                           panic(message, bit_index);
pico-sdk/src/common/hardware_claim/claim.c:44:                           panic(message);

Noticing the panic is followed by 'tcp_slowtmr: TIME-WAIT pcb->state == TIME-W', I have found these lines that look like they could be relevant:

pico-sdk/lib/lwip/src/core/tcp.c:1445:                           LWIP_ASSERT("tcp_slowtmr: TIME-WAIT pcb->state == TIME-WAIT", pcb->state == TIME_WAIT);
pico-sdk/lib/btstack/3rd-party/lwip/core/src/core/tcp.c:1445:    LWIP_ASSERT("tcp_slowtmr: TIME-WAIT pcb->state == TIME-WAIT", pcb->state == TIME_WAIT);
/**
 * Called every 500 ms and implements the retransmission timer and the timer that
 * removes PCBs that have been in TIME-WAIT for enough time. It also increments
 * various timers such as the inactivity timer in each PCB.
 *
 * Automatically called from tcp_tmr().
 */
void
tcp_slowtmr(void);
/* Steps through all of the TIME-WAIT PCBs. */
  prev = NULL;
  pcb = tcp_tw_pcbs;
  while (pcb != NULL) {
    LWIP_ASSERT("tcp_slowtmr: TIME-WAIT pcb->state == TIME-WAIT", pcb->state == TIME_WAIT);
    pcb_remove = 0;

    /* Check if this PCB has stayed long enough in TIME-WAIT */
    if ((u32_t)(tcp_ticks - pcb->tmr) > 2 * TCP_MSL / TCP_SLOW_INTERVAL) {
      ++pcb_remove;
    }

    /* If the PCB should be removed, do it. */
    if (pcb_remove) {
      struct tcp_pcb *pcb2;
      tcp_pcb_purge(pcb);
      /* Remove PCB from tcp_tw_pcbs list. */
      if (prev != NULL) {
        LWIP_ASSERT("tcp_slowtmr: middle tcp != tcp_tw_pcbs", pcb != tcp_tw_pcbs);
        prev->next = pcb->next;
      } else {
        /* This PCB was the first. */
        LWIP_ASSERT("tcp_slowtmr: first pcb == tcp_tw_pcbs", tcp_tw_pcbs == pcb);
        tcp_tw_pcbs = pcb->next;
      }
      pcb2 = pcb;
      pcb = pcb->next;
      tcp_free(pcb2);
    } else {
      prev = pcb;
      pcb = pcb->next;
    }
  }
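
One possible link between the assertion and the panic output: the grep hit in pico_lwip/include/arch/cc.h listed above ('#define LWIP_PLATFORM_ASSERT(x) panic(x)') suggests that a failed LWIP_ASSERT is forwarded to panic() with the assertion text as its message. The blank line after '*** PANIC ***' is also consistent with the quoted panic() source, since puts() appends its own newline after the "\n" already in the string. The following is a minimal desktop sketch of that chain, not the real lwIP or SDK code: demo_panic, DEMO_PLATFORM_ASSERT, DEMO_ASSERT, DemoState, and check_time_wait_entry are all hypothetical names, and demo_panic records the message instead of halting.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the SDK's panic(): it records the message instead of
// halting, so the sketch can run on a desktop machine.
static std::string last_panic;

void demo_panic(const char* msg) {
    last_panic = msg ? msg : "";
    std::printf("*** PANIC ***\n\n%s\n", last_panic.c_str());
}

// Simplified models of the two macros (assumed shape): the platform hook forwards
// to panic, and the assert invokes the hook only when the condition is false.
#define DEMO_PLATFORM_ASSERT(x) demo_panic(x)
#define DEMO_ASSERT(message, assertion) \
    do { if (!(assertion)) { DEMO_PLATFORM_ASSERT(message); } } while (0)

enum DemoState { DEMO_ESTABLISHED, DEMO_TIME_WAIT };

// Mirrors the shape of the check at the top of the TIME-WAIT loop: any entry on the
// TIME-WAIT list whose state is not TIME_WAIT triggers the panic path.
void check_time_wait_entry(DemoState state) {
    DEMO_ASSERT("tcp_slowtmr: TIME-WAIT pcb->state == TIME-WAIT", state == DEMO_TIME_WAIT);
}
```

Read this way, the message is not a timeout being reported. It is a sanity check failing because an entry on the TIME-WAIT list no longer looks like a valid TIME-WAIT PCB, which is the kind of symptom memory corruption elsewhere in the program could produce.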

I cannot work out how to tie this background process timing out to where panic is called, or how I am supposed to handle the timeout error.

I expect the answer is simple but my question is, how am I supposed to catch this call to panic and handle it rather than letting the background task crash the module and exit?

My client main loop is single-threaded and uses pico_cyw43_arch_poll:

int loop()
{
    cyw43_arch_poll();
    cyw43_arch_wait_for_work_until(make_timeout_time_ms(1));
    int err = task.Next();
    if (err) {
        JDND::Terminal::Error("ERROR: task.Next err=" + JDND::itoa(err));
    }
    // Run timer tasks
    if (startTasks)
    {
        JDND::Terminal::Debug("StartTasks...");
        startTasks = false;
        timer_start(10);
        if (!task.IsActive())
        {
            int err = task.Start(task1);
            if (err) {
                JDND::Terminal::Error("ERROR: task.Start " + JDND::itoa(err));
            }
        }
    }
    return 0; // loop() is declared int, so return a value on every path
}

My server is also single-threaded and only handles one client at a time, but there are only a few modules, and the problem still exists even with only one module running.

I'm thinking of making a backup copy of the SDK, so I can put it back as is, then adding an extra line just before each call to panic() outputting a number or something, so I can see which function is actually firing it. Do you think this is viable as a test?

4
  • 1
    From my experience with ESP32s, this almost looks as if there is a task watchdog that gets triggered because the polling doesn't allow the idle task to run. I'm not sure if the RPi Pico runs any RTOS as is, and I'm not sure about that networking driver, but if neither the error branch nor startTasks kicks in, then it seems that all your loop is doing is running really fast and polling a lot (I'm not sure about the second line yet either). I'd say try adding a small delay and see if that helps or reduces the crashes. Commented Oct 29, 2024 at 4:07
  • I'm thinking of making a backup copy of the SDK so I can put it back as is, then adding an extra line just before each call to panic() outputting a number or something, so I can see which function is actually firing it. Do you think this is viable as a test? Commented Oct 29, 2024 at 11:53
  • 1
    You can and I say do it, but I'm also still thinking either you're starving the idle task or there is a memory leak by you opening too many TCP connections or not freeing something, that may be dependent on the lwIP settings/configuration. Could you share enough server & client code that anyone here could replicate the issue? Commented Oct 30, 2024 at 1:04
  • I guess I likely have a memory leak to find. What do you mean by 'you're starving the idle task'? Commented Oct 30, 2024 at 4:31

1 Answer

I finally fixed it.

The response from the server back to the Raspberry Pi Pico W was missing a null terminator at the end. This was the original server code:

std::string result = JDND::Strings::Join(lines, '\n');
status = write(sock_client, result.c_str(), result.length());

And this is the fixed version. Notice the plus 1, which sends the terminating '\0' along with the text:

std::string result = JDND::Strings::Join(lines, '\n');
status = write(sock_client, result.c_str(), result.length()+1);

The Raspberry Pi Pico W sent a packet 'DATA\nDATA\nDATA\0' and received back 'DATA\nDATA\nDATA\nUNDEFINED\0'. It iterated over the response, checking each DATA line, and then dropped out, so the trailing UNDEFINED line went unnoticed.

Most of the time this did not cause any bother, but I guess whether the result fit in the buffer allocated on the Raspberry Pi Pico W depended on how far away the next undefined \0 happened to be.
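
The effect described above can be reproduced on a desktop machine. This is a minimal sketch under stated assumptions, not the client code: DemoBuffer, receive_into, read_until_terminator, and read_with_length are hypothetical names, and the buffer is assumed to still contain a '\0' somewhere after the payload, so the scan stays inside it. On the real device nothing guarantees that.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical fixed-size receive buffer that may still hold bytes from an earlier,
// longer message.
struct DemoBuffer {
    char bytes[32];
};

// Copies an unterminated payload over the start of the buffer, as a raw receive would.
// Assumes len is smaller than sizeof(buf.bytes).
void receive_into(DemoBuffer& buf, const char* payload, std::size_t len) {
    std::memcpy(buf.bytes, payload, len);
}

// Scans for the next '\0', so it also picks up whatever follows the payload.
std::string read_until_terminator(const DemoBuffer& buf) {
    return std::string(buf.bytes);
}

// Uses the length reported by the receive call and ignores anything after it.
std::string read_with_length(const DemoBuffer& buf, std::size_t len) {
    return std::string(buf.bytes, len);
}
```

If the buffer previously held "OLD1\nOLD2\nOLD3\nSTALE" and a 14-byte "DATA\nDATA\nDATA" arrives without a terminator, read_until_terminator returns the payload followed by "\nSTALE", while read_with_length returns only the 14 bytes that were received.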

I have since realized that while sending the null terminator in the packet fixed the problem, it was still stupid, as that makes the clients rely on the server to do so. Adding the null terminator to the end of the received packet on the Raspberry Pi Pico W client upon receiving data is the real fix:

static void client_receive(void* arg, struct udp_pcb* pcb, struct pbuf* p, const ip_addr_t* addr, u16_t port)
{
    if (p == nullptr) {
        return;
    }
    cyw43_arch_lwip_check();

    char* data = (char*)p->payload; // ADDED THIS LINE
    data[p->tot_len] = '\0'; // ADDED THIS LINE

    client_received = std::string((const char*)p->payload);
    pbuf_free(p);
}
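
One caveat with the callback above, offered as a hedged note rather than a correction of the fix: in lwIP, p->payload points at the first segment of a possibly chained pbuf, p->len is the length of that segment, and p->tot_len is the length of the whole chain, so data[p->tot_len] is a write one byte past the received data. An alternative that avoids writing into the receive buffer is to build the string from explicit lengths. The sketch below uses a hypothetical DemoSegment type in place of the real pbuf so that it runs on a desktop machine; segments_to_string is likewise an illustrative name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a chained receive buffer, loosely modelled on lwIP's pbuf
// (next / payload / len). It is not the real lwIP type.
struct DemoSegment {
    const DemoSegment* next;
    const char* payload;
    std::size_t len;
};

// Builds the string from explicit lengths, so no '\0' has to be written into
// (or found in) the receive buffer.
std::string segments_to_string(const DemoSegment* head) {
    std::string out;
    for (const DemoSegment* s = head; s != nullptr; s = s->next) {
        out.append(s->payload, s->len);
    }
    return out;
}
```

In the real callback, the equivalent would be to size a std::string to p->tot_len and fill it with lwIP's pbuf_copy_partial(p, dest, p->tot_len, 0), which walks the chain in the same way.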

3 Comments

Great! I am still curious why that would cause a panic though. In my comments above, I indicated that the idle task may be starved. That's because with wireless communication the system is de facto multi-threaded (or even multi-core): one of the threads services the wireless stack. Based on my poking around, it seems that the driver does use FreeRTOS. If you have a WDT set for the idle task, then it's possible that a blocking loop on the wireless (or any) thread, which doesn't allow the RTOS to run the idle task, will make it panic (crash/restart).
OK, thanks, I was just trying to grasp what you meant. Yes, the Pico is multi-core, but I'm only using one core, with a single thread polling the Wi-Fi in the main loop. Speed and packet size are quite irrelevant for my tiny little packets, so I did not see the point in threading it. I have since realized that while sending the null terminator in the packet fixed the problem, it was still stupid, as that makes the clients rely on the server to do so. Adding the null terminator to the end of the received packet on the Pico client upon receiving data is the real fix.
I guess anything that generates undefined behavior is capable of setting off the PANIC warning after accidentally dialing 999 from the US military base (can you actually call 999 via UDP?).
