aboutsummaryrefslogtreecommitdiffstats
path: root/net/dsa
AgeCommit message (Collapse)AuthorFilesLines
11 daysnet: dsa: add simple HSR offload helpersVladimir Oltean1-0/+65
It turns out that HSR offloads are so fine-grained that many DSA switches can do a small part even though they weren't specifically designed for the protocols supported by that driver (HSR and PRP). Specifically NETIF_F_HW_HSR_DUP - it is simple packet duplication on transmit, towards all (aka 2) ports members of the HSR device. For many DSA switches, we know how to duplicate a packet, even though we never typically use that feature. The transmit port mask from the tagging protocol can have multiple bits set, and the switch should send the packet once to every port with a bit set from that mask. Nonetheless, not all tagging protocols are like this, and sometimes the port is a single numeric value rather than a bit mask. For that reason, and also because switches can sometimes change tagging protocols for different ones, we need to make HSR offload helpers opt-in. For devices that can do nothing else HSR-specific, we introduce dsa_port_simple_hsr_join() and dsa_port_simple_hsr_leave(). These functions monitor when two user ports of the same switch are part of the same HSR device, and when that condition is true, they toggle the NETIF_F_HW_HSR_DUP feature flag of both net devices. Normally only dsa_port_simple_hsr_join() and dsa_port_simple_hsr_leave() are needed. The dsa_port_simple_hsr_validate() helper is just to see what kind of configuration could be offloadable using the generic helpers. This is used by switch drivers which are not currently using the right tagging protocol to offload this HSR ring, but could in principle offload it after changing the tagger. Suggested-by: David Yang <mmyangfl@gmail.com> Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk> Cc: Chester A. Unal" <chester.a.unal@arinc9.com> Cc: "Clément Léger" <clement.leger@bootlin.com> Cc: Daniel Golle <daniel@makrotopia.org> Cc: DENG Qingfang <dqfext@gmail.com> Cc: Florian Fainelli <florian.fainelli@broadcom.com> Cc: George McCollister <george.mccollister@gmail.com> Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: Jonas Gorski <jonas.gorski@gmail.com> Cc: Kurt Kanzenbach <kurt@linutronix.de> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Sean Wang <sean.wang@mediatek.com> Cc: UNGLinuxDriver@microchip.com Cc: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251130131657.65080-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 daysnet: dsa: avoid calling ds->ops->port_hsr_leave() when unoffloadedVladimir Oltean1-0/+3
This mirrors what we do in dsa_port_lag_leave() and dsa_port_bridge_leave(): when ds->ops->port_hsr_join() returns -EOPNOTSUPP, we fall back to a software implementation where dp->hsr_dev is NULL, and the unoffloaded port is no longer bothered with calls from the HSR layer. This helps, for example, with interlink ports which current DSA drivers don't know how to offload. We have to check only in port_hsr_join() for the port type, then in port_hsr_leave() we are sure we're dealing only with known port types. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251130131657.65080-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_yt921x: use the dsa_xmit_port_mask() helperVladimir Oltean1-5/+3
The "yt921x" tagging protocol populates a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. Cc: David Yang <mmyangfl@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-16-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_xrs700x: use the dsa_xmit_port_mask() helperVladimir Oltean1-7/+1
The "xrs700x" is the original DSA tagging protocol with HSR TX replication support, we now essentially move that logic to the dsa_xmit_port_mask() helper. The end result is something akin to hellcreek_xmit() (but reminds me I should also take care of skb_checksum_help() for tail taggers in the core). The implementation differences to dsa_xmit_port_mask() are immaterial. Cc: George McCollister <george.mccollister@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-15-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_trailer: use the dsa_xmit_port_mask() helperVladimir Oltean1-2/+1
The "trailer" tagging protocol populates a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-14-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_rzn1_a5psw: use the dsa_xmit_port_mask() helperVladimir Oltean1-2/+1
The "a5psw" tagging protocol populates a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. Cc: "Clément Léger" <clement.leger@bootlin.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-13-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_rtl8_4: use the dsa_xmit_port_mask() helperVladimir Oltean1-2/+1
The "rtl8_4" and "rtl8_4t" tagging protocols populate a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. Cc: Linus Walleij <linus.walleij@linaro.org> Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-12-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_rtl4_a: use the dsa_xmit_port_mask() helperVladimir Oltean1-1/+1
The "rtl4a" tagging protocol populates a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. Cc: Linus Walleij <linus.walleij@linaro.org> Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-11-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_qca: use the dsa_xmit_port_mask() helperVladimir Oltean1-2/+1
The "qca" tagging protocol populates a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-10-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_ocelot: use the dsa_xmit_port_mask() helperVladimir Oltean1-4/+2
The "ocelot" and "seville" tagging protocols populate a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. This protocol used BIT_ULL() rather than simple BIT() to silence Smatch, as explained in commit 1f778d500df3 ("net: mscc: ocelot: avoid type promotion when calling ocelot_ifh_set_dest"). I would expect that this tool no longer complains now, when the BIT(dp->index) is hidden inside the dsa_xmit_port_mask() function, the return value of which is promoted to u64. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-9-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_mxl_gsw1xx: use the dsa_xmit_port_mask() helperVladimir Oltean1-3/+4
The "gsw1xx" tagging protocol populates a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-8-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_mtk: use the dsa_xmit_port_mask() helperVladimir Oltean1-1/+2
The "mtk" tagging protocol populates a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. Cc: Chester A. Unal" <chester.a.unal@arinc9.com> Cc: Daniel Golle <daniel@makrotopia.org> Cc: DENG Qingfang <dqfext@gmail.com> Cc: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-7-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_ksz: use the dsa_xmit_port_mask() helperVladimir Oltean1-16/+4
The "ksz8795", "ksz9893", "ksz9477" and "lan937x" tagging protocols populate a bit mask for the TX ports. Unlike the others, "ksz9477" also accelerates HSR packet duplication. Make the HSR duplication logic available generically to all 4 taggers by using the dsa_xmit_port_mask() function to set the TX port mask. Cc: Woojung Huh <woojung.huh@microchip.com> Cc: UNGLinuxDriver@microchip.com Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_hellcreek: use the dsa_xmit_port_mask() helperVladimir Oltean1-2/+1
The "hellcreek" tagging protocol populates a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. Cc: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_gswip: use the dsa_xmit_port_mask() helperVladimir Oltean1-4/+2
The "gswip" tagging protocol populates a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: tag_brcm: use the dsa_xmit_port_mask() helperVladimir Oltean1-4/+4
The "brcm" and "brcm-prepend" tagging protocols populate a bit mask for the TX ports, so we can use dsa_xmit_port_mask() to centralize the decision of how to set that field. The port mask is written u8 by u8, first the high octet and then the low octet. Cc: Florian Fainelli <florian.fainelli@broadcom.com> Cc: Jonas Gorski <jonas.gorski@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 daysnet: dsa: introduce the dsa_xmit_port_mask() tagging protocol helperVladimir Oltean1-0/+18
Many tagging protocols deal with the transmit port mask being a bit mask, and set it to BIT(dp->index). Not a big deal. Also, some tagging protocols are written for switches which support HSR offload (including packet duplication offload), there we see a walk using dsa_hsr_foreach_port() to find the other port in the same switch that's member of the HSR, and set that bit in the port mask too. That isn't sufficiently interesting either, until you come to realize that there isn't anything special in the second case that switches just in the first one can't do too. It just becomes a matter of "is it wise to do it? are sufficient people using HSR/PRP with generic off-the-shelf switches to justify add an extra test in the data path?" - the answer to which is probably "it depends". It isn't _much_ worse to not have HSR offload at all, so as to make it impractical, esp. with a rich OS like Linux. But the HSR users are rather specialized in industrial networking. Anyway, the change acts on the premise that we're going to have support for this, it should be uniformly implemented for everyone, and that if we find some sort of balance, we can keep everyone relatively happy. So I've disabled that logic if CONFIG_HSR isn't enabled, and I've tilted the branch predictor to say it's unlikely we're transmitting through a port with this capability currently active. On branch miss, we're still going to save the transmission of one packet, so there's some remaining benefit there too. I don't _think_ we need to jump to static keys yet. The helper returns a 32-bit zero-based unsigned number, that callers have to transpose using FIELD_PREP(). It is not the first time we assume DSA switches won't be larger than 32 ports - dsa_user_ports() has that assumption baked into it too. One last development note about why pass the "skb" argument when this isn't used. Looking at the compiled code on arm64, which is identical both with and without it, the answer is "why not?" - who knows what other features dependent on the skb may be handled in the future. Link: https://lore.kernel.org/netdev/20251126093240.2853294-4-mmyangfl@gmail.com/ Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk> Cc: Chester A. Unal" <chester.a.unal@arinc9.com> Cc: "Clément Léger" <clement.leger@bootlin.com> Cc: Daniel Golle <daniel@makrotopia.org> Cc: David Yang <mmyangfl@gmail.com> Cc: DENG Qingfang <dqfext@gmail.com> Cc: Florian Fainelli <florian.fainelli@broadcom.com> Cc: George McCollister <george.mccollister@gmail.com> Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: Jonas Gorski <jonas.gorski@gmail.com> Cc: Kurt Kanzenbach <kurt@linutronix.de> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Sean Wang <sean.wang@mediatek.com> Cc: UNGLinuxDriver@microchip.com Cc: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251127120902.292555-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: dsa: append ethtool counters of all hidden ports to conduitVladimir Oltean1-33/+93
Currently there is no way to see packet counters on cascade ports, and no clarity on how the API for that would look like. Because it's something that is currently needed, just extend the hack where ethtool -S on the conduit interface dumps CPU port counters, and also use it to dump counters of cascade ports. Note that the "pXX_" naming convention changes to "sXX_pYY", to distinguish between ports having the same index but belonging to different switches. This has a slight chance of causing regressions to existing tooling: - grepping for "p04_counter_name" still works, but might return more than one string now - grepping for " p04_counter_name" no longer works Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: dsa: use kernel data types for ethtool ops on conduitVladimir Oltean1-6/+5
Suppress some checkpatch 'CHECK' messages about u8 being preferable over uint8_t, etc. No functional change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: dsa: cpu_dp->orig_ethtool_ops might be NULLVladimir Oltean1-7/+7
In theory this would have been seen by now, but it seems that all drivers used as DSA conduit interfaces thus far have had ethtool_ops set, and it's hard to even find modern Ethernet drivers (and not VF ones) which don't use ethtool. Here is the unfiltered list of drivers which register any sort of net_device but don't set its ethtool_ops pointer. I don't think any of them 'risks' being used as a DSA conduit, maybe except for moxart, rnpbge and icssm, I'm not sure. - drivers/net/can/dev/dev.c - drivers/net/wwan/qcom_bam_dmux.c - drivers/net/wwan/t7xx/t7xx_netdev.c - drivers/net/arcnet/arcnet.c - drivers/net/hamradio/ - drivers/net/slip/slip.c - drivers/net/ethernet/ezchip/nps_enet.c - drivers/net/ethernet/moxa/moxart_ether.c - drivers/net/ethernet/wangxun/txgbevf/txgbevf_main.c - drivers/net/ethernet/wangxun/ngbevf/ngbevf_main.c - drivers/net/ethernet/huawei/hinic3/hinic3_main.c - drivers/net/ethernet/i825xx/ - drivers/net/ethernet/ti/icssm/icssm_prueth.c - drivers/net/ethernet/seeq/ - drivers/net/ethernet/litex/litex_liteeth.c - drivers/net/ethernet/sunplus/spl2sw_driver.c - drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.c - drivers/net/ipa/ - drivers/net/wireless/microchip/wilc1000/ - drivers/net/wireless/mediatek/mt76/dma.c - drivers/net/wireless/ath/ath12k/ - drivers/net/wireless/ath/ath11k/ - drivers/net/wireless/ath/ath6kl/ - drivers/net/wireless/ath/ath10k/ - drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c - drivers/net/wireless/virtual/mac80211_hwsim.c - drivers/net/wireless/quantenna/qtnfmac/pcie/pcie.c - drivers/net/wireless/realtek/rtw89/core.c - drivers/net/wireless/realtek/rtw88/pci.c - drivers/net/caif/ - drivers/net/plip/ - drivers/net/wan/ - drivers/net/mctp/ - drivers/net/ppp/ - drivers/net/thunderbolt/ Nonetheless, it's good for the framework not to make such assumptions, and not panic when coming across such kind of host device in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20devlink: pass extack through to devlink_param::get()Daniel Zahka1-1/+2
Allow devlink_param::get() handlers to report error messages via extack. This function is called in a few different contexts, but not all of them will have an valid extack to use. When devlink_param::get() is called from param_get_doit or param_get_dumpit contexts, pass the extack through so that drivers can report errors when retrieving param values. devlink_param::get() is called from the context of devlink_param_notify(), pass NULL in for the extack. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251119025038.651131-2-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+4
Cross-merge networking fixes after downstream PR (net-6.18-rc6). No conflicts, adjacent changes in: drivers/net/phy/micrel.c 96a9178a29a6 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface") 61b7ade9ba8c ("net: phy: micrel: Add support for non PTP SKUs for lan8814") and a trivial one in tools/testing/selftests/drivers/net/Makefile. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-10net: dsa: tag_brcm: do not mark link local traffic as offloadedJonas Gorski1-2/+4
Broadcom switches locally terminate link local traffic and do not forward it, so we should not mark it as offloaded. In some situations we still want/need to flood this traffic, e.g. if STP is disabled, or it is explicitly enabled via the group_fwd_mask. But if the skb is marked as offloaded, the kernel will assume this was already done in hardware, and the packets never reach other bridge ports. So ensure that link local traffic is never marked as offloaded, so that the kernel can forward/flood these packets in software if needed. Since the local termination in not configurable, check the destination MAC, and never mark packets as offloaded if it is a link local ether address. While modern switches set the tag reason code to BRCM_EG_RC_PROT_TERM for trapped link local traffic, they also set it for link local traffic that is flooded (01:80:c2:00:00:10 to 01:80:c2:00:00:2f), so we cannot use it and need to look at the destination address for them as well. Fixes: 964dbf186eaa ("net: dsa: tag_brcm: add support for legacy tags") Fixes: 0e62f543bed0 ("net: dsa: Fix duplicate frames flooded by learning") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20251109134635.243951-1-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06net: dsa: add tagging driver for MaxLinear GSW1xx switch familyDaniel Golle3-0/+125
Add support for a new DSA tagging protocol driver for the MaxLinear GSW1xx switch family. The GSW1xx switches use a proprietary 8-byte special tag inserted between the source MAC address and the EtherType field to indicate the source and destination ports for frames traversing the CPU port. Implement the tag handling logic to insert the special tag on transmit and parse it on receive. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Link: https://patch.msgid.link/0e973ebfd9433c30c96f50670da9e9449a0d98f2.1762170107.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+8
Cross-merge networking fixes after downstream PR (net-6.18-rc5). Conflicts: drivers/net/wireless/ath/ath12k/mac.c 9222582ec524 ("Revert "wifi: ath12k: Fix missing station power save configuration"") 6917e268c433 ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon") https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net Adjacent changes: drivers/net/ethernet/intel/Kconfig b1d16f7c0063 ("libie: depend on DEBUG_FS when building LIBIE_FWLOG") 93f53db9f9dc ("ice: switch to Page Pool") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31net: dsa: tag_brcm: legacy: fix untagged rx on unbridged ports for bcm63xxJonas Gorski1-2/+8
The internal switch on BCM63XX SoCs will unconditionally add 802.1Q VLAN tags on egress to CPU when 802.1Q mode is enabled. We do this unconditionally since commit ed409f3bbaa5 ("net: dsa: b53: Configure VLANs while not filtering"). This is fine for VLAN aware bridges, but for standalone ports and vlan unaware bridges this means all packets are tagged with the default VID, which is 0. While the kernel will treat that like untagged, this can break userspace applications processing raw packets, expecting untagged traffic, like STP daemons. This also breaks several bridge tests, where the tcpdump output then does not match the expected output anymore. Since 0 isn't a valid VID, just strip out the VLAN tag if we encounter it, unless the priority field is set, since that would be a valid tag again. Fixes: 964dbf186eaa ("net: dsa: tag_brcm: add support for legacy tags") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20251027194621.133301-1-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-21net: dsa: tag_yt921x: add support for Motorcomm YT921x tagsDavid Yang3-0/+148
Add support for Motorcomm YT921x tags, which includes a proper configurable ethertype field (default to 0x9988). Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251017060859.326450-3-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18net: s/dev_close_many/netif_close_many/Stanislav Fomichev2-2/+3
Commit cc34acd577f1 ("docs: net: document new locking reality") introduced netif_ vs dev_ function semantics: the former expects locked netdev, the latter takes care of the locking. We don't strictly follow this semantics on either side, but there are more dev_xxx handlers now that don't fit. Rename them to netif_xxx where appropriate. netif_close_many is used only by vlan/dsa and one mtk driver, so move it into NETDEV_INTERNAL namespace. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250717172333.1288349-8-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17net: dsa: tag_brcm: add support for legacy FCS tagsÁlvaro Fernández Rojas2-3/+86
Add support for legacy Broadcom FCS tags, which are similar to DSA_TAG_PROTO_BRCM_LEGACY. BCM5325 and BCM5365 switches require including the original FCS value and length, as opposed to BCM63xx switches. Adding the original FCS value and length to DSA_TAG_PROTO_BRCM_LEGACY would impact performance of BCM63xx switches, so it's better to create a new tag. Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250614080000.1884236-3-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17net: dsa: tag_brcm: legacy: reorganize functionsÁlvaro Fernández Rojas1-32/+32
Move brcm_leg_tag_rcv() definition to top. This function is going to be shared between two different tags. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Link: https://patch.msgid.link/20250614080000.1884236-2-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-30net: dsa: tag_brcm: legacy: fix pskb_may_pull lengthÁlvaro Fernández Rojas1-1/+1
BRCM_LEG_PORT_ID was incorrectly used for pskb_may_pull length. The correct check is BRCM_LEG_TAG_LEN + VLAN_HLEN, or 10 bytes. Fixes: 964dbf186eaa ("net: dsa: tag_brcm: add support for legacy tags") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250529124406.2513779-1-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-4/+15
Cross-merge networking fixes after downstream PR (net-6.15-rc8). Conflicts: 80f2ab46c2ee ("irdma: free iwdev->rf after removing MSI-X") 4bcc063939a5 ("ice, irdma: fix an off by one in error handling code") c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers") https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au No extra adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: dsa: microchip: linearize skb for tail-tagging switchesJakob Unterwurzacher1-4/+15
The pointer arithmentic for accessing the tail tag only works for linear skbs. For nonlinear skbs, it reads uninitialized memory inside the skb headroom, essentially randomizing the tag. I have observed it gets set to 6 most of the time. Example where ksz9477_rcv thinks that the packet from port 1 comes from port 6 (which does not exist for the ksz9896 that's in use), dropping the packet. Debug prints added by me (not included in this patch): [ 256.645337] ksz9477_rcv:323 tag0=6 [ 256.645349] skb len=47 headroom=78 headlen=0 tailroom=0 mac=(64,14) mac_len=14 net=(78,0) trans=78 shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0)) csum(0x0 start=0 offset=0 ip_summed=0 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x00f8 pkttype=1 iif=3 priority=0x0 mark=0x0 alloc_cpu=0 vlan_all=0x0 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0) [ 256.645377] dev name=end1 feat=0x0002e10200114bb3 [ 256.645386] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 256.645395] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 256.645403] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 256.645411] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 256.645420] skb headroom: 00000040: ff ff ff ff ff ff 00 1c 19 f2 e2 db 08 06 [ 256.645428] skb frag: 00000000: 00 01 08 00 06 04 00 01 00 1c 19 f2 e2 db 0a 02 [ 256.645436] skb frag: 00000010: 00 83 00 00 00 00 00 00 0a 02 a0 2f 00 00 00 00 [ 256.645444] skb frag: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 [ 256.645452] ksz_common_rcv:92 dsa_conduit_find_user returned NULL Call skb_linearize before trying to access the tag. This patch fixes ksz9477_rcv which is used by the ksz9896 I have at hand, and also applies the same fix to ksz8795_rcv which seems to have the same problem. Signed-off-by: Jakob Unterwurzacher <jakob.unterwurzacher@cherry.de> CC: stable@vger.kernel.org Fixes: 016e43a26bab ("net: dsa: ksz: Add KSZ8795 tag code") Fixes: 8b8010fb7876 ("dsa: add support for Microchip KSZ tail tagging") Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20250515072920.2313014-1-jakob.unterwurzacher@cherry.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09net: dsa: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()Vladimir Oltean2-21/+30
New timestamping API was introduced in commit 66f7223039c0 ("net: add NDOs for configuring hardware timestamping") from kernel v6.6. It is time to convert DSA to the new API, so that the ndo_eth_ioctl() path can be removed completely. Move the ds->ops->port_hwtstamp_get() and ds->ops->port_hwtstamp_set() calls from dsa_user_ioctl() to dsa_user_hwtstamp_get() and dsa_user_hwtstamp_set(). Due to the fact that the underlying ifreq type changes to kernel_hwtstamp_config, the drivers and the Ocelot switchdev front-end, all hooked up directly or indirectly, must also be converted all at once. The conversion also updates the comment from dsa_port_supports_hwtstamp(), which is no longer true because kernel_hwtstamp_config is kernel memory and does not need copy_to_user(). I've deliberated whether it is necessary to also update "err != -EOPNOTSUPP" to a more general "!err", but all drivers now either return 0 or -EOPNOTSUPP. The existing logic from the ocelot_ioctl() function, to avoid configuring timestamping if the PHY supports the operation, is obsoleted by more advanced core logic in dev_set_hwtstamp_phylib(). This is only a partial preparation for proper PHY timestamping support. None of these switch driver currently sets up PTP traps for PHY timestamping, so setting dev->see_all_hwtstamp_requests is not yet necessary and the conversion is relatively trivial. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # felix, sja1105, mv88e6xxx Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250508095236.887789-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: avoid refcount warnings when ds->ops->tag_8021q_vlan_del() failsVladimir Oltean1-1/+1
This is very similar to the problem and solution from commit 232deb3f9567 ("net: dsa: avoid refcount warnings when ->port_{fdb,mdb}_del returns error"), except for the dsa_port_do_tag_8021q_vlan_del() operation. Fixes: c64b9c05045a ("net: dsa: tag_8021q: add proper cross-chip notifier support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414213020.2959021-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: free routing table on probe failureVladimir Oltean1-7/+14
If complete = true in dsa_tree_setup(), it means that we are the last switch of the tree which is successfully probing, and we should be setting up all switches from our probe path. After "complete" becomes true, dsa_tree_setup_cpu_ports() or any subsequent function may fail. If that happens, the entire tree setup is in limbo: the first N-1 switches have successfully finished probing (doing nothing but having allocated persistent memory in the tree's dst->ports, and maybe dst->rtable), and switch N failed to probe, ending the tree setup process before anything is tangible from the user's PoV. If switch N fails to probe, its memory (ports) will be freed and removed from dst->ports. However, the dst->rtable elements pointing to its ports, as created by dsa_link_touch(), will remain there, and will lead to use-after-free if dereferenced. If dsa_tree_setup_switches() returns -EPROBE_DEFER, which is entirely possible because that is where ds->ops->setup() is, we get a kasan report like this: ================================================================== BUG: KASAN: slab-use-after-free in mv88e6xxx_setup_upstream_port+0x240/0x568 Read of size 8 at addr ffff000004f56020 by task kworker/u8:3/42 Call trace: __asan_report_load8_noabort+0x20/0x30 mv88e6xxx_setup_upstream_port+0x240/0x568 mv88e6xxx_setup+0xebc/0x1eb0 dsa_register_switch+0x1af4/0x2ae0 mv88e6xxx_register_switch+0x1b8/0x2a8 mv88e6xxx_probe+0xc4c/0xf60 mdio_probe+0x78/0xb8 really_probe+0x2b8/0x5a8 __driver_probe_device+0x164/0x298 driver_probe_device+0x78/0x258 __device_attach_driver+0x274/0x350 Allocated by task 42: __kasan_kmalloc+0x84/0xa0 __kmalloc_cache_noprof+0x298/0x490 dsa_switch_touch_ports+0x174/0x3d8 dsa_register_switch+0x800/0x2ae0 mv88e6xxx_register_switch+0x1b8/0x2a8 mv88e6xxx_probe+0xc4c/0xf60 mdio_probe+0x78/0xb8 really_probe+0x2b8/0x5a8 __driver_probe_device+0x164/0x298 driver_probe_device+0x78/0x258 __device_attach_driver+0x274/0x350 Freed by task 42: __kasan_slab_free+0x48/0x68 kfree+0x138/0x418 dsa_register_switch+0x2694/0x2ae0 mv88e6xxx_register_switch+0x1b8/0x2a8 mv88e6xxx_probe+0xc4c/0xf60 mdio_probe+0x78/0xb8 really_probe+0x2b8/0x5a8 __driver_probe_device+0x164/0x298 driver_probe_device+0x78/0x258 __device_attach_driver+0x274/0x350 The simplest way to fix the bug is to delete the routing table in its entirety. dsa_tree_setup_routing_table() has no problem in regenerating it even if we deleted links between ports other than those of switch N, because dsa_link_touch() first checks whether the port pair already exists in dst->rtable, allocating if not. The deletion of the routing table in its entirety already exists in dsa_tree_teardown(), so refactor that into a function that can also be called from the tree setup error path. In my analysis of the commit to blame, it is the one which added dsa_link elements to dst->rtable. Prior to that, each switch had its own ds->rtable which is freed when the switch fails to probe. But the tree is potentially persistent memory. Fixes: c5f51765a1f6 ("net: dsa: list DSA links in the fabric") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414213001.2957964-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: clean up FDB, MDB, VLAN entries on unbindVladimir Oltean1-3/+35
As explained in many places such as commit b117e1e8a86d ("net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del"), DSA is written given the assumption that higher layers have balanced additions/deletions. As such, it only makes sense to be extremely vocal when those assumptions are violated and the driver unbinds with entries still present. But Ido Schimmel points out a very simple situation where that is wrong: https://lore.kernel.org/netdev/ZDazSM5UsPPjQuKr@shredder/ (also briefly discussed by me in the aforementioned commit). Basically, while the bridge bypass operations are not something that DSA explicitly documents, and for the majority of DSA drivers this API simply causes them to go to promiscuous mode, that isn't the case for all drivers. Some have the necessary requirements for bridge bypass operations to do something useful - see dsa_switch_supports_uc_filtering(). Although in tools/testing/selftests/net/forwarding/local_termination.sh, we made an effort to popularize better mechanisms to manage address filters on DSA interfaces from user space - namely macvlan for unicast, and setsockopt(IP_ADD_MEMBERSHIP) - through mtools - for multicast, the fact is that 'bridge fdb add ... self static local' also exists as kernel UAPI, and might be useful to someone, even if only for a quick hack. It seems counter-productive to block that path by implementing shim .ndo_fdb_add and .ndo_fdb_del operations which just return -EOPNOTSUPP in order to prevent the ndo_dflt_fdb_add() and ndo_dflt_fdb_del() from running, although we could do that. Accepting that cleanup is necessary seems to be the only option. Especially since we appear to be coming back at this from a different angle as well. Russell King is noticing that the WARN_ON() triggers even for VLANs: https://lore.kernel.org/netdev/Z_li8Bj8bD4-BYKQ@shell.armlinux.org.uk/ What happens in the bug report above is that dsa_port_do_vlan_del() fails, then the VLAN entry lingers on, and then we warn on unbind and leak it. This is not a straight revert of the blamed commit, but we now add an informational print to the kernel log (to still have a way to see that bugs exist), and some extra comments gathered from past years' experience, to justify the logic. Fixes: 0832cd9f1f02 ("net: dsa: warn if port lists aren't empty in dsa_port_teardown") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414212930.2956310-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-08net: move misc netdev_lock flavors to a separate headerJakub Kicinski1-0/+1
Move the more esoteric helpers for netdev instance lock to a dedicated header. This avoids growing netdevice.h to infinity and makes rebuilding the kernel much faster (after touching the header with the helpers). The main netdev_lock() / netdev_unlock() functions are used in static inlines in netdevice.h and will probably be used most commonly, so keep them in netdevice.h. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250307183006.2312761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: ethtool: try to protect all callback with netdev instance lockJakub Kicinski1-1/+15
Protect all ethtool callbacks and PHY related state with the netdev instance lock, for drivers which want / need to have their ops instance-locked. Basically take the lock everywhere we take rtnl_lock. It was tempting to take the lock in ethnl_ops_begin(), but turns out we actually nest those calls (when generating notifications). Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-11-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callbackJason Xing1-1/+1
Support hw SCM_TSTAMP_SND case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_HW_CB. This callback will occur at the same timestamping point as the user space's hardware SCM_TSTAMP_SND. The BPF program can use it to get the same SCM_TSTAMP_SND timestamp without modifying the user-space application. To avoid increasing the code complexity, replace SKBTX_HW_TSTAMP with SKBTX_HW_TSTAMP_NOBPF instead of changing numerous callers from driver side using SKBTX_HW_TSTAMP. The new definition of SKBTX_HW_TSTAMP means the combination tests of socket timestamping and bpf timestamping. After this patch, drivers can work under the bpf timestamping. Considering some drivers don't assign the skb with hardware timestamp, this patch does the assignment and then BPF program can acquire the hwstamp from skb directly. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250220072940.99994-9-kerneljasonxing@gmail.com
2025-02-12net: dsa: allow use of phylink managed EEE supportRussell King (Oracle)1-8/+17
In order to allow DSA drivers to use phylink managed EEE, we need to change the behaviour of the DSA's .set_eee() ethtool method. Implementation of the DSA .set_mac_eee() method becomes optional with phylink managed EEE as it is only used to validate the EEE parameters supplied from userspace. The rest of the EEE state management should be left to phylink. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1thR9l-003vXC-9F@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-28Merge tag 'driver-core-6.14-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and debugfs updates from Greg KH: "Here is the big set of driver core and debugfs updates for 6.14-rc1. Included in here is a bunch of driver core, PCI, OF, and platform rust bindings (all acked by the different subsystem maintainers), hence the merge conflict with the rust tree, and some driver core api updates to mark things as const, which will also require some fixups due to new stuff coming in through other trees in this merge window. There are also a bunch of debugfs updates from Al, and there is at least one user that does have a regression with these, but Al is working on tracking down the fix for it. In my use (and everyone else's linux-next use), it does not seem like a big issue at the moment. Here's a short list of the things in here: - driver core rust bindings for PCI, platform, OF, and some i/o functions. We are almost at the "write a real driver in rust" stage now, depending on what you want to do. - misc device rust bindings and a sample driver to show how to use them - debugfs cleanups in the fs as well as the users of the fs api for places where drivers got it wrong or were unnecessarily doing things in complex ways. - driver core const work, making more of the api take const * for different parameters to make the rust bindings easier overall. - other small fixes and updates All of these have been in linux-next with all of the aforementioned merge conflicts, and the one debugfs issue, which looks to be resolved "soon"" * tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits) rust: device: Use as_char_ptr() to avoid explicit cast rust: device: Replace CString with CStr in property_present() devcoredump: Constify 'struct bin_attribute' devcoredump: Define 'struct bin_attribute' through macro rust: device: Add property_present() saner replacement for debugfs_rename() orangefs-debugfs: don't mess with ->d_name octeontx2: don't mess with ->d_parent or ->d_parent->d_name arm_scmi: don't mess with ->d_parent->d_name slub: don't mess with ->d_name sof-client-ipc-flood-test: don't mess with ->d_name qat: don't mess with ->d_name xhci: don't mess with ->d_iname mtu3: don't mess wiht ->d_iname greybus/camera - stop messing with ->d_iname mediatek: stop messing with ->d_iname netdevsim: don't embed file_operations into your structs b43legacy: make use of debugfs_get_aux() b43: stop embedding struct file_operations into their objects carl9170: stop embedding file_operations into their objects ...
2025-01-22Merge tag 'net-next-6.14' of ↵Linus Torvalds2-11/+39
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "This is slightly smaller than usual, with the most interesting work being still around RTNL scope reduction. Core: - More core refactoring to reduce the RTNL lock contention, including preparatory work for the per-network namespace RTNL lock, replacing RTNL lock with a per device-one to protect NAPI-related net device data and moving synchronize_net() calls outside such lock. - Extend drop reasons usage, adding net scheduler, AF_UNIX, bridge and more specific TCP coverage. - Reduce network namespace tear-down time by removing per-subsystems synchronize_net() in tipc and sched. - Add flow label selector support for fib rules, allowing traffic redirection based on such header field. Netfilter: - Do not remove netdev basechain when last device is gone, allowing netdev basechains without devices. - Revisit the flowtable teardown strategy, dealing better with fin, reset and re-open events. - Scale-up IP-vs connection dumping by avoiding linear search on each restart. Protocols: - A significant XDP socket refactor, consolidating and optimizing several helpers into the core - Better scaling of ICMP rate-limiting, by removing false-sharing in inet peers handling. - Introduces netlink notifications for multicast IPv4 and IPv6 address changes. - Add ipsec support for IP-TFS/AggFrag encapsulation, allowing aggregation and fragmentation of the inner IP. - Add sysctl to configure TIME-WAIT reuse delay for TCP sockets, to avoid local port exhaustion issues when the average connection lifetime is very short. - Support updating keys (re-keying) for connections using kernel TLS (for TLS 1.3 only). - Support ipv4-mapped ipv6 address clients in smc-r v2. - Add support for jumbo data packet transmission in RxRPC sockets, gluing multiple data packets in a single UDP packet. - Support RxRPC RACK-TLP to manage packet loss and retransmission in conjunction with the congestion control algorithm. Driver API: - Introduce a unified and structured interface for reporting PHY statistics, exposing consistent data across different H/W via ethtool. - Make timestamping selectable, allow the user to select the desired hwtstamp provider (PHY or MAC) administratively. - Add support for configuring a header-data-split threshold (HDS) value via ethtool, to deal with partial or buggy H/W implementation. - Consolidate DSA drivers Energy Efficiency Ethernet support. - Add EEE management to phylink, making use of the phylib implementation. - Add phylib support for in-band capabilities negotiation. - Simplify how phylib-enabled mac drivers expose the supported interfaces. Tests and tooling: - Make the YNL tool package-friendly to make it easier to deploy it separately from the kernel. - Increase TCP selftest coverage importing several packetdrill test-cases. - Regenerate the ethtool uapi header from the YNL spec, to ease maintenance and future development. - Add YNL support for decoding the link types used in net self-tests, allowing a single build to run both net and drivers/net. Drivers: - Ethernet high-speed NICs: - nVidia/Mellanox (mlx5): - add cross E-Switch QoS support - add SW Steering support for ConnectX-8 - implement support for HW-Managed Flow Steering, improving the rule deletion/insertion rate - support for multi-host LAG - Intel (ixgbe, ice, igb): - ice: add support for devlink health events - ixgbe: add initial support for E610 chipset variant - igb: add support for AF_XDP zero-copy - Meta: - add support for basic RSS config - allow changing the number of channels - add hardware monitoring support - Broadcom (bnxt): - implement TCP data split and HDS threshold ethtool support, enabling Device Memory TCP. - Marvell Octeon: - implement egress ipsec offload support for the cn10k family - Hisilicon (HIBMC): - implement unicast MAC filtering - Ethernet NICs embedded and virtual: - Convert UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS, avoiding contented atomic operations for drop counters - Freescale: - quicc: phylink conversion - enetc: support Tx and Rx checksum offload and improve TSO performances - MediaTek: - airoha: introduce support for ETS and HTB Qdisc offload - Microchip: - lan78XX USB: preparation work for phylink conversion - Synopsys (stmmac): - support DWMAC IP on NXP Automotive SoCs S32G2xx/S32G3xx/S32R45 - refactor EEE support to leverage the new driver API - optimize DMA and cache access to increase raw RX performances by 40% - TI: - icssg-prueth: add multicast filtering support for VLAN interface - netkit: - add ability to configure head/tailroom - VXLAN: - accepts packets with user-defined reserved bit - Ethernet switches: - Microchip: - lan969x: add RGMII support - lan969x: improve TX and RX performance using the FDMA engine - nVidia/Mellanox: - move Tx header handling to PCI driver, to ease XDP support - Ethernet PHYs: - Texas Instruments DP83822: - add support for GPIO2 clock output - Realtek: - 8169: add support for RTL8125D rev.b - rtl822x: add hwmon support for the temperature sensor - Microchip: - add support for RDS PTP hardware - consolidate periodic output signal generation - CAN: - several DT-bindings to DT schema conversions - tcan4x5x: - add HW standby support - support nWKRQ voltage selection - kvaser: - allowing Bus Error Reporting runtime configuration - WiFi: - the on-going Multi-Link Operation (MLO) effort continues, affecting both the stack and in drivers - mac80211/cfg80211: - Emergency Preparedness Communication Services (EPCS) station mode support - support for adding and removing station links for MLO - add support for WiFi 7/EHT mesh over 320 MHz channels - report Tx power info for each link - RealTek (rtw88): - enable USB Rx aggregation and USB 3 to improve performance - LED support - RealTek (rtw89): - refactor power save to support Multi-Link Operations - add support for RTL8922AE-VS variant - MediaTek (mt76): - single wiphy multiband support (preparation for MLO) - p2p device support - add TP-Link TXE50UH USB adapter support - Qualcomm (ath10k): - support for the QCA6698AQ IP core - Qualcomm (ath12k): - enable MLO for QCN9274 - Bluetooth: - Allow sysfs to trigger hdev reset, to allow recovering devices not responsive from user-space - MediaTek: add support for MT7922, MT7925, MT7921e devices - Realtek: add support for RTL8851BE devices - Qualcomm: add support for WCN785x devices - ISO: allow BIG re-sync" * tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1386 commits) net/rose: prevent integer overflows in rose_setsockopt() net: phylink: fix regression when binding a PHY net: ethernet: ti: am65-cpsw: streamline TX queue creation and cleanup net: ethernet: ti: am65-cpsw: streamline RX queue creation and cleanup net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path ipv6: Convert inet6_rtm_deladdr() to per-netns RTNL. ipv6: Convert inet6_rtm_newaddr() to per-netns RTNL. ipv6: Move lifetime validation to inet6_rtm_newaddr(). ipv6: Set cfg.ifa_flags before device lookup in inet6_rtm_newaddr(). ipv6: Pass dev to inet6_addr_add(). ipv6: Convert inet6_ioctl() to per-netns RTNL. ipv6: Hold rtnl_net_lock() in addrconf_init() and addrconf_cleanup(). ipv6: Hold rtnl_net_lock() in addrconf_dad_work(). ipv6: Hold rtnl_net_lock() in addrconf_verify_work(). ipv6: Convert net.ipv6.conf.${DEV}.XXX sysctl to per-netns RTNL. ipv6: Add __in6_dev_get_rtnl_net(). net: stmmac: Drop redundant skb_mark_for_recycle() for SKB frags net: mii: Fix the Speed display when the network cable is not connected sysctl net: Remove macro checks for CONFIG_SYSCTL eth: bnxt: update header sizing defaults ...
2025-01-21Merge tag 'kthread-for-6.14-rc1' of ↵Linus Torvalds3-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks Pull kthread updates from Frederic Weisbecker: "Kthreads affinity follow either of 4 existing different patterns: 1) Per-CPU kthreads must stay affine to a single CPU and never execute relevant code on any other CPU. This is currently handled by smpboot code which takes care of CPU-hotplug operations. Affinity here is a correctness constraint. 2) Some kthreads _have_ to be affine to a specific set of CPUs and can't run anywhere else. The affinity is set through kthread_bind_mask() and the subsystem takes care by itself to handle CPU-hotplug operations. Affinity here is assumed to be a correctness constraint. 3) Per-node kthreads _prefer_ to be affine to a specific NUMA node. This is not a correctness constraint but merely a preference in terms of memory locality. kswapd and kcompactd both fall into this category. The affinity is set manually like for any other task and CPU-hotplug is supposed to be handled by the relevant subsystem so that the task is properly reaffined whenever a given CPU from the node comes up. Also care should be taken so that the node affinity doesn't cross isolated (nohz_full) cpumask boundaries. 4) Similar to the previous point except kthreads have a _preferred_ affinity different than a node. Both RCU boost kthreads and RCU exp kworkers fall into this category as they refer to "RCU nodes" from a distinctly distributed tree. Currently the preferred affinity patterns (3 and 4) have at least 4 identified users, with more or less success when it comes to handle CPU-hotplug operations and CPU isolation. Each of which do it in its own ad-hoc way. This is an infrastructure proposal to handle this with the following API changes: - kthread_create_on_node() automatically affines the created kthread to its target node unless it has been set as per-cpu or bound with kthread_bind[_mask]() before the first wake-up. - kthread_affine_preferred() is a new function that can be called right after kthread_create_on_node() to specify a preferred affinity different than the specified node. When the preferred affinity can't be applied because the possible targets are offline or isolated (nohz_full), the kthread is affine to the housekeeping CPUs (which means to all online CPUs most of the time or only the non-nohz_full CPUs when nohz_full= is set). kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been converted, along with a few old drivers. Summary of the changes: - Consolidate a bunch of ad-hoc implementations of kthread_run_on_cpu() - Introduce task_cpu_fallback_mask() that defines the default last resort affinity of a task to become nohz_full aware - Add some correctness check to ensure kthread_bind() is always called before the first kthread wake up. - Default affine kthread to its preferred node. - Convert kswapd / kcompactd and remove their halfway working ad-hoc affinity implementation - Implement kthreads preferred affinity - Unify kthread worker and kthread API's style - Convert RCU kthreads to the new API and remove the ad-hoc affinity implementation" * tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: kthread: modify kernel-doc function name to match code rcu: Use kthread preferred affinity for RCU exp kworkers treewide: Introduce kthread_run_worker[_on_cpu]() kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format rcu: Use kthread preferred affinity for RCU boost kthread: Implement preferred affinity mm: Create/affine kswapd to its preferred node mm: Create/affine kcompactd to its preferred node kthread: Default affine kthread to its preferred NUMA node kthread: Make sure kthread hasn't started while binding it sched,arm64: Handle CPU isolation on last resort fallback rq selection arm64: Exclude nohz_full CPUs from 32bits el0 support lib: test_objpool: Use kthread_run_on_cpu() kallsyms: Use kthread_run_on_cpu() soc/qman: test: Use kthread_run_on_cpu() arm/bL_switcher: Use kthread_run_on_cpu()
2025-01-17net: dsa: implement get_ts_stats ethtool operation for user portsVladimir Oltean1-0/+11
Integrate with the standard infrastructure for reporting hardware packet timestamping statistics. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250116104628.123555-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13Merge 6.13-rc7 into driver-core-nextGreg Kroah-Hartman1-5/+11
We need the debugfs / driver-core fixes in here as well for testing and to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08treewide: Introduce kthread_run_worker[_on_cpu]()Frederic Weisbecker3-3/+3
kthread_create() creates a kthread without running it yet. kthread_run() creates a kthread and runs it. On the other hand, kthread_create_worker() creates a kthread worker and runs it. This difference in behaviours is confusing. Also there is no way to create a kthread worker and affine it using kthread_bind_mask() or kthread_affine_preferred() before starting it. Consolidate the behaviours and introduce kthread_run_worker[_on_cpu]() that behaves just like kthread_run(). kthread_create_worker[_on_cpu]() will now only create a kthread worker without starting it. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
2025-01-07net: dsa: no longer call ds->ops->get_mac_eee()Russell King (Oracle)1-8/+0
All implementations of get_mac_eee() now just return zero without doing anything useful. Remove the call to this method in preparation to removing the method from each DSA driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tUlkz-007Uyl-UA@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-03driver core: Constify API device_find_child() and adapt for various usagesZijun Hu1-1/+1
Constify the following API: struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); To : struct device *device_find_child(struct device *dev, const void *data, device_match_t match); typedef int (*device_match_t)(struct device *dev, const void *data); with the following reasons: - Protect caller's match data @*data which is for comparison and lookup and the API does not actually need to modify @*data. - Make the API's parameters (@match)() and @data have the same type as all of other device finding APIs (bus|class|driver)_find_device(). - All kinds of existing device match functions can be directly taken as the API's argument, they were exported by driver core. Constify the API and adapt for various existing usages. BTW, various subsystem changes are squashed into this commit to meet 'git bisect' requirement, and this commit has the minimal and simplest changes to complement squashing shortcoming, and that may bring extra code improvement. Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Acked-by: Uwe Kleine-König <ukleinek@kernel.org> # for drivers/pwm Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20241224-const_dfc_done-v5-4-6623037414d4@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-5/+11
Cross-merge networking fixes after downstream PR (net-6.13-rc4). No conflicts. Adjacent changes: drivers/net/ethernet/renesas/rswitch.h 32fd46f5b69e ("net: renesas: rswitch: remove speed from gwca structure") 922b4b955a03 ("net: renesas: rswitch: rework ts tags management") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-18net: dsa: restore dsa_software_vlan_untag() ability to operate on ↵Vladimir Oltean1-5/+11
VLAN-untagged traffic Robert Hodaszi reports that locally terminated traffic towards VLAN-unaware bridge ports is broken with ocelot-8021q. He is describing the same symptoms as for commit 1f9fc48fd302 ("net: dsa: sja1105: fix reception from VLAN-unaware bridges"). For context, the set merged as "VLAN fixes for Ocelot driver": https://lore.kernel.org/netdev/20240815000707.2006121-1-vladimir.oltean@nxp.com/ was developed in a slightly different form earlier this year, in January. Initially, the switch was unconditionally configured to set OCELOT_ES0_TAG when using ocelot-8021q, regardless of port operating mode. This led to the situation where VLAN-unaware bridge ports would always push their PVID - see ocelot_vlan_unaware_pvid() - a negligible value anyway - into RX packets. To strip this in software, we would have needed DSA to know what private VID the switch chose for VLAN-unaware bridge ports, and pushed into the packets. This was implemented downstream, and a remnant of it remains in the form of a comment mentioning ds->ops->get_private_vid(), as something which would maybe need to be considered in the future. However, for upstream, it was deemed inappropriate, because it would mean introducing yet another behavior for stripping VLAN tags from VLAN-unaware bridge ports, when one already existed (ds->untag_bridge_pvid). The latter has been marked as obsolete along with an explanation why it is logically broken, but still, it would have been confusing. So, for upstream, felix_update_tag_8021q_rx_rule() was developed, which essentially changed the state of affairs from "Felix with ocelot-8021q delivers all packets as VLAN-tagged towards the CPU" into "Felix with ocelot-8021q delivers all packets from VLAN-aware bridge ports towards the CPU". This was done on the premise that in VLAN-unaware mode, there's nothing useful in the VLAN tags, and we can avoid introducing ds->ops->get_private_vid() in the DSA receive path if we configure the switch to not push those VLAN tags into packets in the first place. Unfortunately, and this is when the trainwreck started, the selftests developed initially and posted with the series were not re-ran. dsa_software_vlan_untag() was initially written given the assumption that users of this feature would send _all_ traffic as VLAN-tagged. It was only partially adapted to the new scheme, by removing ds->ops->get_private_vid(), which also used to be necessary in standalone ports mode. Where the trainwreck became even worse is that I had a second opportunity to think about this, when the dsa_software_vlan_untag() logic change initially broke sja1105, in commit 1f9fc48fd302 ("net: dsa: sja1105: fix reception from VLAN-unaware bridges"). I did not connect the dots that it also breaks ocelot-8021q, for pretty much the same reason that not all received packets will be VLAN-tagged. To be compatible with the optimized Felix control path which runs felix_update_tag_8021q_rx_rule() to only push VLAN tags when useful (in VLAN-aware mode), we need to restore the old dsa_software_vlan_untag() logic. The blamed commit introduced the assumption that dsa_software_vlan_untag() will see only VLAN-tagged packets, assumption which is false. What corrupts RX traffic is the fact that we call skb_vlan_untag() on packets which are not VLAN-tagged in the first place. Fixes: 93e4649efa96 ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges") Reported-by: Robert Hodaszi <robert.hodaszi@digi.com> Closes: https://lore.kernel.org/netdev/20241215163334.615427-1-robert.hodaszi@digi.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241216135059.1258266-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Cross-merge networking fixes after downstream PR (net-6.13-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12net: dsa: tag_ocelot_8021q: fix broken receptionRobert Hodaszi1-1/+1
The blamed commit changed the dsa_8021q_rcv() calling convention to accept pre-populated source_port and switch_id arguments. If those are not available, as in the case of tag_ocelot_8021q, the arguments must be pre-initialized with -1. Due to the bug of passing uninitialized arguments in tag_ocelot_8021q, dsa_8021q_rcv() does not detect that it needs to populate the source_port and switch_id, and this makes dsa_conduit_find_user() fail, which leads to packet loss on reception. Fixes: dcfe7673787b ("net: dsa: tag_sja1105: absorb logic for not overwriting precise info into dsa_8021q_rcv()") Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241211144741.1415758-1-robert.hodaszi@digi.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-11net: dsa: require .support_eee() method to be implementedRussell King (Oracle)1-2/+2
Now that we have updated all drivers, switch DSA to require an implementation of the .support_eee() method for EEE to be usable, rather than defaulting to being permissive when not implemented. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/E1tL14e-006cZy-AT@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-11net: dsa: provide implementation of .support_eee()Russell King (Oracle)1-0/+16
Provide a trivial implementation for the .support_eee() method which switch drivers can use to simply indicate that they support EEE on all their user ports. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/E1tL149-006cZJ-JJ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-11net: dsa: add hook to determine whether EEE is supportedRussell King (Oracle)1-0/+8
Add a hook to determine whether the switch supports EEE. This will return false if the switch does not, or true if it does. If the method is not implemented, we assume (currently) that the switch supports EEE. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/E1tL144-006cZD-El@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-11net: dsa: remove check for dp->pl in EEE methodsRussell King (Oracle)1-2/+2
When user ports are initialised, a phylink instance is always created, and so dp->pl will always be non-NULL. The EEE methods are only used for user ports, so checking for dp->pl to be NULL makes no sense. No other phylink-calling method implements similar checks in DSA. Remove this unnecessary check. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/E1tL13z-006cZ7-BZ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10rtnetlink: add ndo_fdb_dump_contextEric Dumazet1-1/+2
rtnl_fdb_dump() and various ndo_fdb_dump() helpers share a hidden layout of cb->ctx. Before switching rtnl_fdb_dump() to for_each_netdev_dump() in the following patch, make this more explicit. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241209100747.2269613-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: dsa: use ethtool string helpersRosen Penev1-8/+5
These are the preferred way to copy ethtool strings. Avoids incrementing pointers all over the place. Signed-off-by: Rosen Penev <rosenp@gmail.com> (for hellcreek driver) Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://patch.msgid.link/20241028044828.1639668-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30net: dsa: allow matchall mirroring rules towards the CPUVladimir Oltean1-6/+30
If the CPU bandwidth capacity permits, it may be useful to mirror the entire ingress of a user port to software. This is in fact possible to express even if there is no net_device representation for the CPU port. In fact, that approach was already exhausted and that representation wouldn't have even helped [1]. The idea behind implementing this is that currently, we refuse to offload any mirroring towards a non-DSA target net_device. But if we acknowledge the fact that to reach any foreign net_device, the switch must send the packet to the CPU anyway, then we can simply offload just that part, and let the software do the rest. There is only one condition we need to uphold: the filter needs to be present in the software data path as well (no skip_sw). There are 2 actions to consider: FLOW_ACTION_MIRRED (redirect to egress of target interface) and FLOW_ACTION_MIRRED_INGRESS (redirect to ingress of target interface). We don't have the ability/API to offload FLOW_ACTION_MIRRED_INGRESS when the target port is also a DSA user port, but we could also permit that through mirred to the CPU + software. Example: $ ip link add dummy0 type dummy; ip link set dummy0 up $ tc qdisc add dev swp0 clsact $ tc filter add dev swp0 ingress matchall action mirred ingress mirror dev dummy0 Any DSA driver with a ds->ops->port_mirror_add() implementation can now make use of this with no additional change. [1] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241023135251.1752488-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30net: dsa: add more extack messages in dsa_user_add_cls_matchall_mirred()Vladimir Oltean1-3/+14
Do not leave -EOPNOTSUPP errors without an explanation. It is confusing for the user to figure out what is wrong otherwise. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241023135251.1752488-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30net: dsa: use "extack" as argument to flow_action_basic_hw_stats_check()Vladimir Oltean1-4/+2
We already have an "extack" stack variable in dsa_user_add_cls_matchall_police() and dsa_user_add_cls_matchall_mirred(), there is no need to retrieve it again from cls->common.extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241023135251.1752488-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30net: dsa: clean up dsa_user_add_cls_matchall()Vladimir Oltean1-9/+16
The body is a bit hard to read, hard to extend, and has duplicated conditions. Clean up the "if (many conditions) else if (many conditions, some of them repeated)" pattern by: - Moving the repeated conditions out - Replacing the repeated tests for the same variable with a switch/case - Moving the protocol check inside the dsa_user_add_cls_matchall_mirred() function call. This is pure refactoring, no logic has been changed, though some tests were reordered. The order does not matter - they are independent things to be tested for. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241023135251.1752488-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29net: dsa: replace devlink resource registration calls by devl_ variantsPrzemek Kitszel1-7/+16
Replace devlink_resource_register(), devlink_resource_occ_get_register(), and devlink_resource_occ_get_unregister() calls by respective devl_* variants. Mentioned functions have no direct users in any drivers, and are going to be removed in subsequent patches. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20241023131248.27192-6-przemyslaw.kitszel@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-23netpoll: remove ndo_netpoll_setup() second argumentEric Dumazet1-2/+1
npinfo is not used in any of the ndo_netpoll_setup() methods. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241018052108.2610827-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-17net: dsa: remove dsa_port_phylink_mac_select_pcs()Russell King (Oracle)1-8/+0
There is no longer any reason to implement the mac_select_pcs() callback in DSA. Returning ERR_PTR(-EOPNOTSUPP) is functionally equivalent to not providing the function. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
2024-10-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-3/+8
Cross-merge networking fixes after downstream PR (net-6.12-rc3). No conflicts and no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09net: dsa: refuse cross-chip mirroring operationsVladimir Oltean1-3/+8
In case of a tc mirred action from one switch to another, the behavior is not correct. We simply tell the source switch driver to program a mirroring entry towards mirror->to_local_port = to_dp->index, but it is not even guaranteed that the to_dp belongs to the same switch as dp. For proper cross-chip support, we would need to go through the cross-chip notifier layer in switch.c, program the entry on cascade ports, and introduce new, explicit API for cross-chip mirroring, given that intermediary switches should have introspection into the DSA tags passed through the cascade port (and not just program a port mirror on the entire cascade port). None of that exists today. Reject what is not implemented so that user space is not misled into thinking it works. Fixes: f50f212749e8 ("net: dsa: Add plumbing for port mirroring") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241008094320.3340980-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-07net: dsa: remove obsolete phylink dsa_switch operationsRussell King (Oracle)2-41/+1
No driver now uses the DSA switch phylink members, so we can now remove the method pointers, but we need to leave empty shim functions to allow those drivers that do not provide phylink MAC operations structure to continue functioning. Signed-off-by: Russell King (oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # sja1105, felix, dsa_loop Link: https://patch.msgid.link/E1swKNV-0060oN-1b@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-01net: dsa: improve shutdown sequenceVladimir Oltean1-0/+7
Alexander Sverdlin presents 2 problems during shutdown with the lan9303 driver. One is specific to lan9303 and the other just happens to reproduce there. The first problem is that lan9303 is unique among DSA drivers in that it calls dev_get_drvdata() at "arbitrary runtime" (not probe, not shutdown, not remove): phy_state_machine() -> ... -> dsa_user_phy_read() -> ds->ops->phy_read() -> lan9303_phy_read() -> chip->ops->phy_read() -> lan9303_mdio_phy_read() -> dev_get_drvdata() But we never stop the phy_state_machine(), so it may continue to run after dsa_switch_shutdown(). Our common pattern in all DSA drivers is to set drvdata to NULL to suppress the remove() method that may come afterwards. But in this case it will result in an NPD. The second problem is that the way in which we set dp->conduit->dsa_ptr = NULL; is concurrent with receive packet processing. dsa_switch_rcv() checks once whether dev->dsa_ptr is NULL, but afterwards, rather than continuing to use that non-NULL value, dev->dsa_ptr is dereferenced again and again without NULL checks: dsa_conduit_find_user() and many other places. In between dereferences, there is no locking to ensure that what was valid once continues to be valid. Both problems have the common aspect that closing the conduit interface solves them. In the first case, dev_close(conduit) triggers the NETDEV_GOING_DOWN event in dsa_user_netdevice_event() which closes user ports as well. dsa_port_disable_rt() calls phylink_stop(), which synchronously stops the phylink state machine, and ds->ops->phy_read() will thus no longer call into the driver after this point. In the second case, dev_close(conduit) should do this, as per Documentation/networking/driver.rst: | Quiescence | ---------- | | After the ndo_stop routine has been called, the hardware must | not receive or transmit any data. All in flight packets must | be aborted. If necessary, poll or wait for completion of | any reset commands. So it should be sufficient to ensure that later, when we zeroize conduit->dsa_ptr, there will be no concurrent dsa_switch_rcv() call on this conduit. The addition of the netif_device_detach() function is to ensure that ioctls, rtnetlinks and ethtool requests on the user ports no longer propagate down to the driver - we're no longer prepared to handle them. The race condition actually did not exist when commit 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") first introduced dsa_switch_shutdown(). It was created later, when we stopped unregistering the user interfaces from a bad spot, and we just replaced that sequence with a racy zeroization of conduit->dsa_ptr (one which doesn't ensure that the interfaces aren't up). Reported-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Closes: https://lore.kernel.org/netdev/2d2e3bba17203c14a5ffdabc174e3b6bbb9ad438.camel@siemens.com/ Closes: https://lore.kernel.org/netdev/c1bf4de54e829111e0e4a70e7bd1cf523c9550ff.camel@siemens.com/ Fixes: ee534378f005 ("net: dsa: fix panic when DSA master device unbinds on shutdown") Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20240913203549.3081071-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-10net: dsa: microchip: update tag_ksz masks for KSZ9477 familyPieter Van Trappen1-2/+3
Remove magic number 7 by introducing a GENMASK macro instead. Remove magic number 0x80 by using the BIT macro instead. Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20240909134301.75448-1-vtpieter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03netdev_features: convert NETIF_F_LLTX to dev->lltxAlexander Lobakin1-1/+2
NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-67/+110
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.h c948c0973df5 ("bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops") f2878cdeb754 ("bnxt_en: Add support to call FW to update a VNIC") Link: https://patch.msgid.link/20240822210125.1542769-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-16net: dsa: microchip: fix tag_ksz egress mask for KSZ8795 familyPieter Van Trappen1-2/+4
Fix the tag_ksz egress mask for DSA_TAG_PROTO_KSZ8795, the port is encoded in the two and not three LSB. This fix is for completeness, for example the bug doesn't manifest itself on the KSZ8794 because bit 2 seems to be always zero. Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Link: https://patch.msgid.link/20240813142750.772781-7-vtpieter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-16net: dsa: provide a software untagging function on RX for VLAN-aware bridgesVladimir Oltean2-32/+108
Through code analysis, I realized that the ds->untag_bridge_pvid logic is contradictory - see the newly added FIXME above the kernel-doc for dsa_software_untag_vlan_unaware_bridge(). Moreover, for the Felix driver, I need something very similar, but which is actually _not_ contradictory: untag the bridge PVID on RX, but for VLAN-aware bridges. The existing logic does it for VLAN-unaware bridges. Since I don't want to change the functionality of drivers which were supposedly properly tested with the ds->untag_bridge_pvid flag, I have introduced a new one: ds->untag_vlan_aware_bridge_pvid, and I have refactored the DSA reception code into a common path for both flags. TODO: both flags should be unified under a single ds->software_vlan_untag, which users of both current flags should set. This is not something that can be carried out right away. It needs very careful examination of all drivers which make use of this functionality, since some of them actually get this wrong in the first place. For example, commit 9130c2d30c17 ("net: dsa: microchip: ksz8795: Use software untagging on CPU port") uses this in a driver which has ds->configure_vlan_while_not_filtering = true. The latter mechanism has been known for many years to be broken by design: https://lore.kernel.org/netdev/CABumfLzJmXDN_W-8Z=p9KyKUVi_HhS7o_poBkeKHS2BkAiyYpw@mail.gmail.com/ and we have the situation of 2 bugs canceling each other. There is no private VLAN, and the port follows the PVID of the VLAN-unaware bridge. So, it's kinda ok for that driver to use the ds->untag_bridge_pvid mechanism, in a broken way. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-16net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and ↵Vladimir Oltean1-35/+2
register injection Problem description ------------------- On an NXP LS1028A (felix DSA driver) with the following configuration: - ocelot-8021q tagging protocol - VLAN-aware bridge (with STP) spanning at least swp0 and swp1 - 8021q VLAN upper interfaces on swp0 and swp1: swp0.700, swp1.700 - ptp4l on swp0.700 and swp1.700 we see that the ptp4l instances do not see each other's traffic, and they all go to the grand master state due to the ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES condition. Jumping to the conclusion for the impatient ------------------------------------------- There is a zero-day bug in the ocelot switchdev driver in the way it handles VLAN-tagged packet injection. The correct logic already exists in the source code, in function ocelot_xmit_get_vlan_info() added by commit 5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit"). But it is used only for normal NPI-based injection with the DSA "ocelot" tagging protocol. The other injection code paths (register-based and FDMA-based) roll their own wrong logic. This affects and was noticed on the DSA "ocelot-8021q" protocol because it uses register-based injection. By moving ocelot_xmit_get_vlan_info() to a place that's common for both the DSA tagger and the ocelot switch library, it can also be called from ocelot_port_inject_frame() in ocelot.c. We need to touch the lines with ocelot_ifh_port_set()'s prototype anyway, so let's rename it to something clearer regarding what it does, and add a kernel-doc. ocelot_ifh_set_basic() should do. Investigation notes ------------------- Debugging reveals that PTP event (aka those carrying timestamps, like Sync) frames injected into swp0.700 (but also swp1.700) hit the wire with two VLAN tags: 00000000: 01 1b 19 00 00 00 00 01 02 03 04 05 81 00 02 bc ~~~~~~~~~~~ 00000010: 81 00 02 bc 88 f7 00 12 00 2c 00 00 02 00 00 00 ~~~~~~~~~~~ 00000020: 00 00 00 00 00 00 00 00 00 00 00 01 02 ff fe 03 00000030: 04 05 00 01 00 04 00 00 00 00 00 00 00 00 00 00 00000040: 00 00 The second (unexpected) VLAN tag makes felix_check_xtr_pkt() -> ptp_classify_raw() fail to see these as PTP packets at the link partner's receiving end, and return PTP_CLASS_NONE (because the BPF classifier is not written to expect 2 VLAN tags). The reason why packets have 2 VLAN tags is because the transmission code treats VLAN incorrectly. Neither ocelot switchdev, nor felix DSA, declare the NETIF_F_HW_VLAN_CTAG_TX feature. Therefore, at xmit time, all VLANs should be in the skb head, and none should be in the hwaccel area. This is done by: static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb, netdev_features_t features) { if (skb_vlan_tag_present(skb) && !vlan_hw_offload_capable(features, skb->vlan_proto)) skb = __vlan_hwaccel_push_inside(skb); return skb; } But ocelot_port_inject_frame() handles things incorrectly: ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb)); void ocelot_ifh_port_set(struct sk_buff *skb, void *ifh, int port, u32 rew_op) { (...) if (vlan_tag) ocelot_ifh_set_vlan_tci(ifh, vlan_tag); (...) } The way __vlan_hwaccel_push_inside() pushes the tag inside the skb head is by calling: static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb) { skb->vlan_present = 0; } which does _not_ zero out skb->vlan_tci as seen by skb_vlan_tag_get(). This means that ocelot, when it calls skb_vlan_tag_get(), sees (and uses) a residual skb->vlan_tci, while the same VLAN tag is _already_ in the skb head. The trivial fix for double VLAN headers is to replace the content of ocelot_ifh_port_set() with: if (skb_vlan_tag_present(skb)) ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb)); but this would not be correct either, because, as mentioned, vlan_hw_offload_capable() is false for us, so we'd be inserting dead code and we'd always transmit packets with VID=0 in the injection frame header. I can't actually test the ocelot switchdev driver and rely exclusively on code inspection, but I don't think traffic from 8021q uppers has ever been injected properly, and not double-tagged. Thus I'm blaming the introduction of VLAN fields in the injection header - early driver code. As hinted at in the early conclusion, what we _want_ to happen for VLAN transmission was already described once in commit 5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit"). ocelot_xmit_get_vlan_info() intends to ensure that if the port through which we're transmitting is under a VLAN-aware bridge, the outer VLAN tag from the skb head is stripped from there and inserted into the injection frame header (so that the packet is processed in hardware through that actual VLAN). And in all other cases, the packet is sent with VID=0 in the injection frame header, since the port is VLAN-unaware and has logic to strip this VID on egress (making it invisible to the wire). Fixes: 08d02364b12f ("net: mscc: fix the injection header") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-15Merge branch 'net-make-timestamping-selectable'Jakub Kicinski1-1/+1
First part of "net: Make timestamping selectable" from Kory Maincent. Change the driver-facing type already to lower rebasing pain. Link: https://lore.kernel.org/20240709-feature_ptp_netnext-v17-0-b5317f50df2a@bootlin.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: Add struct kernel_ethtool_ts_infoKory Maincent1-1/+1
In prevision to add new UAPI for hwtstamp we will be limited to the struct ethtool_ts_info that is currently passed in fixed binary format through the ETHTOOL_GET_TS_INFO ethtool ioctl. It would be good if new kernel code already started operating on an extensible kernel variant of that structure, similar in concept to struct kernel_hwtstamp_config vs struct hwtstamp_config. Since struct ethtool_ts_info is in include/uapi/linux/ethtool.h, here we introduce the kernel-only structure in include/linux/ethtool.h. The manual copy is then made in the function called by ETHTOOL_GET_TS_INFO. Acked-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20240709-feature_ptp_netnext-v17-6-b5317f50df2a@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: prepare 'dsa_tag_8021q_bridge_join' for standalone usePawel Dembicki1-1/+4
The 'dsa_tag_8021q_bridge_join' could be used as a generic implementation of the 'ds->ops->port_bridge_join()' function. However, it is necessary to synchronize their arguments. This patch also moves the 'tx_fwd_offload' flag configuration line into 'dsa_tag_8021q_bridge_join' body. Currently, every (sja1105) driver sets it, and the future vsc73xx implementation will also need it for simplification. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20240713211620.1125910-11-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: vsc73xx: introduce tag 8021q for vsc73xxPawel Dembicki3-0/+75
This commit introduces a new tagger based on 802.1q tagging. It's designed for the vsc73xx driver. The VSC73xx family doesn't have any tag support for the RGMII port, but it could be based on VLANs. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patch.msgid.link/20240713211620.1125910-8-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: tag_sja1105: refactor skb->dev assignment to dsa_tag_8021q_find_user()Vladimir Oltean3-17/+24
A new tagging protocol implementation based on tag_8021q is on the horizon, and it appears that it also has to open-code the complicated logic of finding a source port based on a VLAN header. Create a single dsa_tag_8021q_find_user() and make sja1105 call it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20240713211620.1125910-7-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: tag_sja1105: prefer precise source port info on SJA1110 tooVladimir Oltean1-4/+4
Now that dsa_8021q_rcv() handles better the case where we don't overwrite the precise source information if it comes from an external (non-tag_8021q) source, we can now unify the call sequence between sja1105_rcv() and sja1110_rcv(). This is a preparatory change for creating a higher-level wrapper for the entire sequence which will live in tag_8021q. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20240713211620.1125910-6-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: tag_sja1105: absorb entire sja1105_vlan_rcv() into dsa_8021q_rcv()Vladimir Oltean4-36/+34
tag_sja1105 has a wrapper over dsa_8021q_rcv(): sja1105_vlan_rcv(), which determines whether the packet came from a bridge with vlan_filtering=1 (the case resolved via dsa_find_designated_bridge_port_by_vid()), or if it contains a tag_8021q header. Looking at a new tagger implementation for vsc73xx, based also on tag_8021q, it is becoming clear that the logic is needed there as well. So instead of forcing each tagger to wrap around dsa_8021q_rcv(), let's merge the logic into the core. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Link: https://patch.msgid.link/20240713211620.1125910-5-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: dsa: tag_sja1105: absorb logic for not overwriting precise info into ↵Vladimir Oltean2-23/+32
dsa_8021q_rcv() In both sja1105_rcv() and sja1110_rcv(), we may have precise source port information coming from parallel hardware mechanisms, in addition to the tag_8021q header. Only sja1105_rcv() has extra logic to not overwrite that precise info with what's present in the VLAN tag. This is because sja1110_rcv() gets by, by having a reversed set of checks when assigning skb->dev. When the source port is imprecise (vbid >=1), source_port and switch_id will be set to zeroes by dsa_8021q_rcv(), which might be problematic. But by checking for vbid >= 1 first, sja1110_rcv() fends that off. We would like to make more code common between sja1105_rcv() and sja1110_rcv(), and for that, we need to make sure that sja1110_rcv() also goes through the precise source port preservation logic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20240713211620.1125910-4-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-11net: dsa: Fix typo in NET_DSA_TAG_RTL4_A KconfigChris Packham1-1/+1
Fix a minor typo in the help text for the NET_DSA_TAG_RTL4_A config option. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Link: https://lore.kernel.org/r/20240607020843.1380735-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10net: dsa: update the unicast MAC address when changing conduitMarek Behún3-8/+44
When changing DSA user interface conduit while the user interface is up, DSA exhibits different behavior in comparison to when the interface is down. This different behavior concerns the primary unicast MAC address stored in the port standalone FDB and in the conduit device UC database. If we put a switch port down while changing the conduit with ip link set sw0p0 down ip link set sw0p0 type dsa conduit conduit1 ip link set sw0p0 up we delete the address in dsa_user_close() and install the (possibly different) address in dsa_user_open(). But when changing the conduit on the fly, the old address is not deleted and the new one is not installed. Since we explicitly want to support live-changing the conduit, uninstall the old address before calling dsa_port_assign_conduit() and install the (possibly different) new address after the call. Because conduit change might also trigger address change (the user interface is supposed to inherit the conduit interface MAC address if no address is defined in hardware (dp->mac is a zero address)), move the eth_hw_addr_inherit() call from dsa_user_change_conduit() to dsa_port_change_conduit(), just before installing the new address. Although this is in theory a flaw in DSA core, it needs not be backported, since there is currently no DSA driver that can be affected by this. The only DSA driver that supports changing conduit is felix, and, as explained by Vladimir Oltean [1]: There are 2 reasons why with felix the bug does not manifest itself. First is because both the 'ocelot' and the alternate 'ocelot-8021q' tagging protocols have the 'promisc_on_conduit = true' flag. So the unicast address doesn't have to be in the conduit's RX filter - neither the old or the new conduit. Second, dsa_user_host_uc_install() theoretically leaves behind host FDB entries installed towards the wrong (old) CPU port. But in felix_fdb_add(), we treat any FDB entry requested towards any CPU port as if it was a multicast FDB entry programmed towards _all_ CPU ports. For that reason, it is installed towards the port mask of the PGID_CPU port group ID: if (dsa_port_is_cpu(dp)) port = PGID_CPU; Therefore no Fixes tag for this change. [1] https://lore.kernel.org/netdev/20240507201827.47suw4fwcjrbungy@skbuf/ Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10net: dsa: deduplicate code adding / deleting the port address to fdbMarek Behún1-44/+47
The sequence if (dsa_switch_supports_uc_filtering(ds)) dsa_port_standalone_host_fdb_add(dp, addr, 0); if (!ether_addr_equal(addr, conduit->dev_addr)) dev_uc_add(conduit, addr); is executed both in dsa_user_open() and dsa_user_set_mac_addr(). Its reverse is executed both in dsa_user_close() and dsa_user_set_mac_addr(). Refactor these sequences into new functions dsa_user_host_uc_install() and dsa_user_host_uc_uninstall(). Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-29net: dsa: remove mac_prepare()/mac_finish() shimsRussell King (Oracle)2-34/+0
No DSA driver makes use of the mac_prepare()/mac_finish() shimmed operations anymore, so we can remove these. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/E1sByNx-00ELW1-Vp@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-22tracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)1-17/+17
With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-05-08net: dsa: add support switches global DSCP priority mappingOleksij Rempel1-0/+75
Some switches like Microchip KSZ variants do not support per port DSCP priority configuration. Instead there is a global DSCP mapping table. To handle it, we will accept set/del request to any of user ports to make global configuration and update dcb app entries for all other ports. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: dsa: add support for DCB get/set apptrust configurationOleksij Rempel1-0/+28
Add DCB support to get/set trust configuration for different packet priority information sources. Some switch allow to chose different source of packet priority classification. For example on KSZ switches it is possible to configure VLAN PCP and/or DSCP sources. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-07net: annotate writes on dev->mtu from ndo_change_mtu()Eric Dumazet1-1/+1
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01net: dsa: Remove adjust_link pathsFlorian Fainelli2-130/+12
Now that we no longer any drivers using PHYLIB's adjust_link callback, remove all paths that made use of adjust_link as well as the associated functions. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240430164816.2400606-3-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22devlink: extend devlink_param *set pointerMateusz Polchlopek1-1/+2
Extend devlink_param *set function pointer to take extack as a param. Sometimes it is needed to pass information to the end user from set function. It is more proper to use for that netlink instead of passing message to dmesg. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-15net: dsa: convert dsa_user_phylink_fixed_state() to use dsa_phylink_to_port()Russell King (Oracle)1-1/+1
Convert dsa_user_phylink_fixed_state() to use the newly introduced dsa_phylink_to_port() helper. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-11net: dsa: allow DSA switch drivers to provide their own phylink mac opsRussell King (Oracle)2-6/+31
Rather than having a shim for each and every phylink MAC operation, allow DSA switch drivers to provide their own ops structure. When a DSA driver provides the phylink MAC operations, the shimmed ops must not be provided, so fail an attempt to register a switch with both the phylink_mac_ops in struct dsa_switch and the phylink_mac_* operations populated in dsa_switch_ops populated. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/E1rudqF-006K9H-Cc@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net: dsa: introduce dsa_phylink_to_port()Russell King (Oracle)1-6/+6
We convert from a phylink_config struct to a dsa_port struct in many places, let's provide a helper for this. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/E1rudqA-006K9B-85@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07net: dsa: Leverage core stats allocatorFlorian Fainelli1-7/+1
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of in this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the DSA user network device code and leverage the network core allocation instead. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240306200416.2973179-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26rtnetlink: prepare nla_put_iflink() to run under RCUEric Dumazet1-1/+1
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. This patch prepares dev_get_iflink() and nla_put_iflink() to run either with RTNL or RCU held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-21net: dsa: constify the struct device_type usageRicardo B. Marliere1-1/+1
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver core can properly handle constant struct device_type. Move the dsa_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-15net: dsa: remove OF-based MDIO bus registration from DSA coreArınç ÜNAL1-6/+1
The code block under the "!ds->user_mii_bus && ds->ops->phy_read" check under dsa_switch_setup() populates ds->user_mii_bus. The use of ds->user_mii_bus is inappropriate when the MDIO bus of the switch is described on the device tree [1]. For this reason, use this code block only for switches [with MDIO bus] probed on platform_data, and OF which the switch MDIO bus isn't described on the device tree. Therefore, remove OF-based MDIO bus registration as it's useless for these cases. These subdrivers which control switches [with MDIO bus] probed on OF, will lose the ability to register the MDIO bus OF-based: drivers/net/dsa/b53/b53_common.c drivers/net/dsa/lan9303-core.c drivers/net/dsa/vitesse-vsc73xx-core.c These subdrivers let the DSA core driver register the bus: - ds->ops->phy_read() and ds->ops->phy_write() are present. - ds->user_mii_bus is not populated. The commit fe7324b93222 ("net: dsa: OF-ware slave_mii_bus") which brought OF-based MDIO bus registration on the DSA core driver is reasonably recent and, in this time frame, there have been no device trees in the Linux repository that started describing the MDIO bus, or dt-bindings defining the MDIO bus for the switches these subdrivers control. So I don't expect any devices to be affected. The logic we encourage is that all subdrivers should register the switch MDIO bus on their own [2]. And, for subdrivers which control switches [with MDIO bus] probed on OF, this logic must be followed to support all cases properly: No switch MDIO bus defined: Populate ds->user_mii_bus, register the MDIO bus, set the interrupts for PHYs if "interrupt-controller" is defined at the switch node. This case should only be covered for the switches which their dt-bindings documentation didn't document the MDIO bus from the start. This is to keep supporting the device trees that do not describe the MDIO bus on the device tree but the MDIO bus is being used nonetheless. Switch MDIO bus defined: Don't populate ds->user_mii_bus, register the MDIO bus, set the interrupts for PHYs if ["interrupt-controller" is defined at the switch node and "interrupts" is defined at the PHY nodes under the switch MDIO bus node]. Switch MDIO bus defined but explicitly disabled: If the device tree says status = "disabled" for the MDIO bus, we shouldn't need an MDIO bus at all. Instead, just exit as early as possible and do not call any MDIO API. After all subdrivers that control switches with MDIO buses are made to register the MDIO buses on their own, we will be able to get rid of dsa_switch_ops :: phy_read() and :: phy_write(), and the code block for registering the MDIO bus on the DSA core driver. Link: https://lore.kernel.org/netdev/20231213120656.x46fyad6ls7sqyzv@skbuf/ [1] Link: https://lore.kernel.org/netdev/20240103184459.dcbh57wdnlox6w7d@skbuf/ [2] Suggested-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20240213-for-netnext-dsa-mdio-bus-v2-1-0ff6f4823a9e@arinc9.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08net: dsa: tag_sja1105: remove "inline" keywordVladimir Oltean1-2/+2
The convention is to not use the "inline" keyword for functions in C files, but to let the compiler choose. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240206112927.4134375-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08net: dsa: remove "inline" from dsa_user_netpoll_send_skb()Vladimir Oltean1-2/+2
The convention is to not use "inline" functions in C files, and let the compiler decide whether to inline or not. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240206112927.4134375-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-05net: dsa: reindent arguments of dsa_user_vlan_for_each()Vladimir Oltean1-4/+4
These got misaligned after commit 6ca80638b90c ("net: dsa: Use conduit and user terms"). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-31ethtool: replace struct ethtool_eee with a new struct ethtool_keee on kernel ↵Heiner Kallweit1-2/+2
side In order to pass EEE link modes beyond bit 32 to userspace we have to complement the 32 bit bitmaps in struct ethtool_eee with linkmode bitmaps. Therefore, similar to ethtool_link_settings and ethtool_link_ksettings, add a struct ethtool_keee. In a first step it's an identical copy of ethtool_eee. This patch simply does a s/ethtool_eee/ethtool_keee/g for all users. No functional change intended. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-11net: dsa: fix netdev_priv() dereference before check on non-DSA netdevice eventsVladimir Oltean1-2/+5
After the blamed commit, we started doing this dereference for every NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER event in the system. static inline struct dsa_port *dsa_user_to_port(const struct net_device *dev) { struct dsa_user_priv *p = netdev_priv(dev); return p->dp; } Which is obviously bogus, because not all net_devices have a netdev_priv() of type struct dsa_user_priv. But struct dsa_user_priv is fairly small, and p->dp means dereferencing 8 bytes starting with offset 16. Most drivers allocate that much private memory anyway, making our access not fault, and we discard the bogus data quickly afterwards, so this wasn't caught. But the dummy interface is somewhat special in that it calls alloc_netdev() with a priv size of 0. So every netdev_priv() dereference is invalid, and we get this when we emit a NETDEV_PRECHANGEUPPER event with a VLAN as its new upper: $ ip link add dummy1 type dummy $ ip link add link dummy1 name dummy1.100 type vlan id 100 [ 43.309174] ================================================================== [ 43.316456] BUG: KASAN: slab-out-of-bounds in dsa_user_prechangeupper+0x30/0xe8 [ 43.323835] Read of size 8 at addr ffff3f86481d2990 by task ip/374 [ 43.330058] [ 43.342436] Call trace: [ 43.366542] dsa_user_prechangeupper+0x30/0xe8 [ 43.371024] dsa_user_netdevice_event+0xb38/0xee8 [ 43.375768] notifier_call_chain+0xa4/0x210 [ 43.379985] raw_notifier_call_chain+0x24/0x38 [ 43.384464] __netdev_upper_dev_link+0x3ec/0x5d8 [ 43.389120] netdev_upper_dev_link+0x70/0xa8 [ 43.393424] register_vlan_dev+0x1bc/0x310 [ 43.397554] vlan_newlink+0x210/0x248 [ 43.401247] rtnl_newlink+0x9fc/0xe30 [ 43.404942] rtnetlink_rcv_msg+0x378/0x580 Avoid the kernel oops by dereferencing after the type check, as customary. Fixes: 4c3f80d22b2e ("net: dsa: walk through all changeupper notifier functions") Reported-and-tested-by: syzbot+d81bcd883824180500c8@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/0000000000001d4255060e87545c@google.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240110003354.2796778-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net: fill in MODULE_DESCRIPTION()s for DSA tagsJakub Kicinski18-0/+18
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to all the DSA tag modules. The descriptions are copy/pasted Kconfig names, with s/^Tag/DSA tag/. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20240104143759.1318137-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-27net: rename dsa_realloc_skb to skb_ensure_writable_head_tailRadu Pirea (NXP OSS)1-26/+3
Rename dsa_realloc_skb to skb_ensure_writable_head_tail and move it to skbuff.c to use it as helper. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-14net: dsa: tag_rtl4_a: Use existing ETH_P_REALTEK constantFlorian Fainelli1-3/+2
No functional change, uses the existing ETH_P_REALTEK constant already defined in if_ether.h. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20231113165030.2440083-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: dsa: Rename IFLA_DSA_MASTER to IFLA_DSA_CONDUITFlorian Fainelli1-5/+5
This preserves the existing IFLA_DSA_MASTER which is part of the uAPI and creates an alias named IFLA_DSA_CONDUIT. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231023181729.1191071-3-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24net: dsa: Use conduit and user termsFlorian Fainelli36-1194/+1194
Use more inclusive terms throughout the DSA subsystem by moving away from "master" which is replaced by "conduit" and "slave" which is replaced by "user". No functional changes. Acked-by: Rob Herring <robh@kernel.org> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231023181729.1191071-2-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-11net: dsa: remove dsa_port_phylink_validate()Russell King (Oracle)1-15/+0
As all drivers now provide phylink capabilities (including MAC), the if() condition in dsa_port_phylink_validate() will always be true. We will always use the generic validator, which phylink will call itself if the .validate method isn't populated. Thus, there is now no need to implement the .validate method, so this implementation can be removed. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-03net: dsa: tag_ksz: Extend ksz9477_xmit() for HSR frame duplicationLukasz Majewski1-0/+8
The KSZ9477 has support for HSR (High-Availability Seamless Redundancy). One of its offloading (i.e. performed in the switch IC hardware) features is to duplicate received frame to both HSR aware switch ports. To achieve this goal - the tail TAG needs to be modified. To be more specific, both ports must be marked as destination (egress) ones. The NETIF_F_HW_HSR_DUP flag indicates that the device supports HSR and assures (in HSR core code) that frame is sent only once from HOST to switch with tail tag indicating both ports. Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-03net: dsa: notify drivers of MAC address changes on user portsVladimir Oltean1-0/+7
In some cases, drivers may need to veto the changing of a MAC address on a user port. Such is the case with KSZ9477 when it offloads a HSR device, because it programs the MAC address of multiple ports to a shared hardware register. Those ports need to have equal MAC addresses for the lifetime of the HSR offload. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-03net: dsa: propagate extack to ds->ops->port_hsr_join()Vladimir Oltean3-4/+6
Drivers can provide meaningful error messages which state a reason why they can't perform an offload, and dsa_slave_changeupper() already has the infrastructure to propagate these over netlink rather than printing to the kernel log. So pass the extack argument and modify the xrs700x driver's port_hsr_join() prototype. Also take the opportunity and use the extack for the 2 -EOPNOTSUPP cases from xrs700x_hsr_join(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-09net: dsa: mark parsed interface mode for legacy switch driversRussell King (Oracle)1-4/+8
If we successfully parsed an interface mode with a legacy switch driver, populate that mode into phylink's supported interfaces rather than defaulting to the internal and gmii interfaces. This hasn't caused an issue so far, because when the interface doesn't match a supported one, phylink_validate() doesn't clear the supported mask, but instead returns -EINVAL. phylink_parse_fixedlink() doesn't check this return value, and merely relies on the supported ethtool link modes mask being cleared. Therefore, the fixed link settings end up being allowed despite validation failing. Before this causes a problem, arrange for DSA to more accurately populate phylink's supported interfaces mask so validation can correctly succeed. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/E1qTKdM-003Cpx-Eh@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+8
Cross-merge networking fixes after downstream PR. Conflicts: net/dsa/port.c 9945c1fb03a3 ("net: dsa: fix older DSA drivers using phylink") a88dd7538461 ("net: dsa: remove legacy_pre_march2020 detection") https://lore.kernel.org/all/20230731102254.2c9868ca@canb.auug.org.au/ net/xdp/xsk.c 3c5b4d69c358 ("net: annotate data-races around sk->sk_mark") b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path") https://lore.kernel.org/all/20230731102631.39988412@canb.auug.org.au/ drivers/net/ethernet/broadcom/bnxt/bnxt.c 37b61cda9c16 ("bnxt: don't handle XDP in netpoll") 2b56b3d99241 ("eth: bnxt: handle invalid Tx completions more gracefully") https://lore.kernel.org/all/20230801101708.1dc7faac@canb.auug.org.au/ Adjacent changes: drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c 62da08331f1a ("net/mlx5e: Set proper IPsec source port in L4 selector") fbd517549c32 ("net/mlx5e: Add function to get IPsec offload namespace") drivers/net/ethernet/sfc/selftest.c 55c1528f9b97 ("sfc: fix field-spanning memcpy in selftest") ae9d445cd41f ("sfc: Miscellaneous comment removals") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-01net: dsa: tag_qca: return early if dev is not foundChristian Marangi1-4/+4
Currently checksum is recalculated and dsa tag stripped even if we later don't find the dev. To improve code, exit early if we don't find the dev and skip additional operation on the skb since it will be freed anyway. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20230730074113.21889-1-ansuelsmth@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27net: dsa: fix older DSA drivers using phylinkRussell King (Oracle)1-1/+8
Older DSA drivers that do not provide an dsa_ops adjust_link method end up using phylink. Unfortunately, a recent phylink change that requires its supported_interfaces bitmap to be filled breaks these drivers because the bitmap remains empty. Rather than fixing each driver individually, fix it in the core code so we have a sensible set of defaults. Reported-by: Sergei Antonov <saproj@gmail.com> Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> # dsa_loop Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/E1qOflM-001AEz-D3@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-23net: dsa: remove deprecated strncpyjustinstitt@google.com1-4/+5
`strncpy` is deprecated for use on NUL-terminated destination strings [1]. Even call sites utilizing length-bounded destination buffers should switch over to using `strtomem` or `strtomem_pad`. In this case, however, the compiler is unable to determine the size of the `data` buffer which renders `strtomem` unusable. Due to this, `strscpy` should be used. It should be noted that most call sites already zero-initialize the destination buffer. However, I've opted to use `strscpy_pad` to maintain the same exact behavior that `strncpy` produced (zero-padded tail up to `len`). Also see [3]. [1]: www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [2]: elixir.bootlin.com/linux/v6.3/source/net/ethtool/ioctl.c#L1944 [3]: manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html Link: https://github.com/KSPP/linux/issues/90 Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-18net: dsa: remove legacy_pre_march2020 detectionRussell King (Oracle)1-41/+0
All drivers are now updated for the March 2020 changes, and no longer make use of the mac_pcs_get_state() or mac_an_restart() operations, which are now NULL across all DSA drivers. All DSA drivers don't look at speed, duplex, pause or advertisement in their phylink_mac_config() method either. Remove support for these operations from DSA, and stop marking DSA as a legacy driver by default. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-04net: dsa: sja1105: always enable the send_meta optionsVladimir Oltean1-45/+0
incl_srcpt has the limitation, mentioned in commit b4638af8885a ("net: dsa: sja1105: always enable the INCL_SRCPT option"), that frames with a MAC DA of 01:80:c2:xx:yy:zz will be received as 01:80:c2:00:00:zz unless PTP RX timestamping is enabled. The incl_srcpt option was initially unconditionally enabled, then that changed with commit 42824463d38d ("net: dsa: sja1105: Limit use of incl_srcpt to bridge+vlan mode"), then again with b4638af8885a ("net: dsa: sja1105: always enable the INCL_SRCPT option"). Bottom line is that it now needs to be always enabled, otherwise the driver does not have a reliable source of information regarding source_port and switch_id for link-local traffic (tag_8021q VLANs may be imprecise since now they identify an entire bridging domain when ports are not standalone). If we accept that PTP RX timestamping (and therefore, meta frame generation) is always enabled in hardware, then that limitation could be avoided and packets with any MAC DA can be properly received, because meta frames do contain the original bytes from the MAC DA of their associated link-local packet. This change enables meta frame generation unconditionally, which also has the nice side effects of simplifying the switch control path (a switch reset is no longer required on hwtstamping settings change) and the tagger data path (it no longer needs to be informed whether to expect meta frames or not - it always does). Fixes: 227d07a07ef1 ("net: dsa: sja1105: Add support for traffic through standalone ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-04net: dsa: tag_sja1105: fix MAC DA patching from meta framesVladimir Oltean1-2/+2
The SJA1105 manual says that at offset 4 into the meta frame payload we have "MAC destination byte 2" and at offset 5 we have "MAC destination byte 1". These are counted from the LSB, so byte 1 is h_dest[ETH_HLEN-2] aka h_dest[4] and byte 2 is h_dest[ETH_HLEN-3] aka h_dest[3]. The sja1105_meta_unpack() function decodes these the other way around, so a frame with MAC DA 01:80:c2:11:22:33 is received by the network stack as having 01:80:c2:22:11:33. Fixes: e53e18a6fe4d ("net: dsa: sja1105: Receive and decode meta frames") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-03net: dsa: tag_sja1105: fix source port decoding in vlan_filtering=0 bridge modeVladimir Oltean1-3/+6
There was a regression introduced by the blamed commit, where pinging to a VLAN-unaware bridge would fail with the repeated message "Couldn't decode source port" coming from the tagging protocol driver. When receiving packets with a bridge_vid as determined by dsa_tag_8021q_bridge_join(), dsa_8021q_rcv() will decode: - source_port = 0 (which isn't really valid, more like "don't know") - switch_id = 0 (which isn't really valid, more like "don't know") - vbid = value in range 1-7 Since the blamed patch has reversed the order of the checks, we are now going to believe that source_port != -1 and switch_id != -1, so they're valid, but they aren't. The minimal solution to the problem is to only populate source_port and switch_id with what dsa_8021q_rcv() came up with, if the vbid is zero, i.e. the source port information is trustworthy. Fixes: c1ae02d87689 ("net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-29net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPTVladimir Oltean1-9/+29
Currently the sja1105 tagging protocol prefers using the source port information from the VLAN header if that is available, falling back to the INCL_SRCPT option if it isn't. The VLAN header is available for all frames except for META frames initiated by the switch (containing RX timestamps), and thus, the "if (is_link_local)" branch is practically dead. The tag_8021q source port identification has become more loose ("imprecise") and will report a plausible rather than exact bridge port, when under a bridge (be it VLAN-aware or VLAN-unaware). But link-local traffic always needs to know the precise source port. With incorrect source port reporting, for example PTP traffic over 2 bridged ports will all be seen on sockets opened on the first such port, which is incorrect. Now that the tagging protocol has been changed to make link-local frames always contain source port information, we can reverse the order of the checks so that we always give precedence to that information (which is always precise) in lieu of the tag_8021q VID which is only precise for a standalone port. Fixes: d7f9787a763f ("net: dsa: tag_8021q: add support for imprecise RX based on the VBID") Fixes: 91495f21fcec ("net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-29/+64
Merge in late fixes to prepare for the 6.5 net-next PR. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-27net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addressesVladimir Oltean4-29/+64
When using the felix driver (the only one which supports UC filtering and MC filtering) as a DSA master for a random other DSA switch, one can see the following stack trace when the downstream switch ports join a VLAN-aware bridge: ============================= WARNING: suspicious RCU usage ----------------------------- net/8021q/vlan_core.c:238 suspicious rcu_dereference_protected() usage! stack backtrace: Workqueue: dsa_ordered dsa_slave_switchdev_event_work Call trace: lockdep_rcu_suspicious+0x170/0x210 vlan_for_each+0x8c/0x188 dsa_slave_sync_uc+0x128/0x178 __hw_addr_sync_dev+0x138/0x158 dsa_slave_set_rx_mode+0x58/0x70 __dev_set_rx_mode+0x88/0xa8 dev_uc_add+0x74/0xa0 dsa_port_bridge_host_fdb_add+0xec/0x180 dsa_slave_switchdev_event_work+0x7c/0x1c8 process_one_work+0x290/0x568 What it's saying is that vlan_for_each() expects rtnl_lock() context and it's not getting it, when it's called from the DSA master's ndo_set_rx_mode(). The caller of that - dsa_slave_set_rx_mode() - is the slave DSA interface's dsa_port_bridge_host_fdb_add() which comes from the deferred dsa_slave_switchdev_event_work(). We went to great lengths to avoid the rtnl_lock() context in that call path in commit 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work"), and calling rtnl_lock() is simply not an option due to the possibility of deadlocking when calling dsa_flush_workqueue() from the call paths that do hold rtnl_lock() - basically all of them. So, when the DSA master calls vlan_for_each() from its ndo_set_rx_mode(), the state of the 8021q driver on this device is really not protected from concurrent access by anything. Looking at net/8021q/, I don't think that vlan_info->vid_list was particularly designed with RCU traversal in mind, so introducing an RCU read-side form of vlan_for_each() - vlan_for_each_rcu() - won't be so easy, and it also wouldn't be exactly what we need anyway. In general I believe that the solution isn't in net/8021q/ anyway; vlan_for_each() is not cut out for this task. DSA doesn't need rtnl_lock() to be held per se - since it's not a netdev state change that we're blocking, but rather, just concurrent additions/removals to a VLAN list. We don't even need sleepable context - the callback of vlan_for_each() just schedules deferred work. The proposed escape is to remove the dependency on vlan_for_each() and to open-code a non-sleepable, rtnl-free alternative to that, based on copies of the VLAN list modified from .ndo_vlan_rx_add_vid() and .ndo_vlan_rx_kill_vid(). Fixes: 64fdc5f341db ("net: dsa: sync unicast and multicast addresses for VLAN filters too") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230626154402.3154454-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+23
Cross-merge networking fixes after downstream PR. Conflicts: tools/testing/selftests/net/fcnal-test.sh d7a2fc1437f7 ("selftests: net: fcnal-test: check if FIPS mode is enabled") dd017c72dde6 ("selftests: fcnal: Test SO_DONTROUTE on TCP sockets.") https://lore.kernel.org/all/5007b52c-dd16-dbf6-8d64-b9701bfa498b@tessares.net/ https://lore.kernel.org/all/20230619105427.4a0df9b3@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20net: dsa: introduce preferred_default_local_cpu_port and use on MT7530Vladimir Oltean1-1/+23
Since the introduction of the OF bindings, DSA has always had a policy that in case multiple CPU ports are present in the device tree, the numerically smallest one is always chosen. The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because it has higher bandwidth. The MT7530 driver developers had 3 options: - to modify DSA when the MT7531 switch support was introduced, such as to prefer the better port - to declare both CPU ports in device trees as CPU ports, and live with the sub-optimal performance resulting from not preferring the better port - to declare just port 6 in the device tree as a CPU port Of course they chose the path of least resistance (3rd option), kicking the can down the road. The hardware description in the device tree is supposed to be stable - developers are not supposed to adopt the strategy of piecemeal hardware description, where the device tree is updated in lockstep with the features that the kernel currently supports. Now, as a result of the fact that they did that, any attempts to modify the device tree and describe both CPU ports as CPU ports would make DSA change its default selection from port 6 to 5, effectively resulting in a performance degradation visible to users with the MT7531BE switch as can be seen below. Without preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 374 MBytes 157 Mbits/sec 734 sender [ 5][TX-C] 0.00-20.00 sec 373 MBytes 156 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 778 Mbits/sec 0 sender [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 777 Mbits/sec receiver With preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 856 Mbits/sec 273 sender [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 855 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.72 GBytes 737 Mbits/sec 15 sender [ 7][RX-C] 0.00-20.00 sec 1.71 GBytes 736 Mbits/sec receiver Using one port for WAN and the other ports for LAN is a very popular use case which is what this test emulates. As such, this change proposes that we retroactively modify stable kernels (which don't support the modification of the CPU port assignments, so as to let user space fix the problem and restore the throughput) to keep the mt7530 driver preferring port 6 even with device trees where the hardware is more fully described. Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-26net: dsa: add support for mac_prepare() and mac_finish() callsRussell King (Oracle)1-0/+32
Add DSA support for the phylink mac_prepare() and mac_finish() calls. These were introduced as part of the PCS support to allow MACs to perform preparatory steps prior to configuration, and finalisation steps after the MAC and PCS has been configured. Introducing phylink_pcs support to the mv88e6xxx DSA driver needs some code moved out of its mac_config() stage into the mac_prepare() and mac_finish() stages, and this commit facilitates such code in DSA drivers. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: tag_ocelot: call only the relevant portion of __skb_vlan_pop() on TXVladimir Oltean1-1/+1
ocelot_xmit_get_vlan_info() calls __skb_vlan_pop() as the most appropriate helper I could find which strips away a VLAN header. That's all I need it to do, but __skb_vlan_pop() has more logic, which will become incompatible with the future revert of commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). Namely, it performs a sanity check on skb_mac_header(), which will stop being set after the above revert, so it will return an error instead of removing the VLAN tag. ocelot_xmit_get_vlan_info() gets called in 2 circumstances: (1) the port is under a VLAN-aware bridge and the bridge sends VLAN-tagged packets (2) the port is under a VLAN-aware bridge and somebody else (an 8021q upper) sends VLAN-tagged packets (using a VID that isn't in the bridge vlan tables) In case (1), there is actually no bug to defend against, because br_dev_xmit() calls skb_reset_mac_header() and things continue to work. However, in case (2), illustrated using the commands below, it can be seen that our intervention is needed, since __skb_vlan_pop() complains: $ ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up $ ip link set $eth master br0 && ip link set $eth up $ ip link add link $eth name $eth.100 type vlan id 100 && ip link set $eth.100 up $ ip addr add 192.168.100.1/24 dev $eth.100 I could fend off the checks in __skb_vlan_pop() with some skb_mac_header_was_set() calls, but seeing how few callers of __skb_vlan_pop() there are from TX paths, that seems rather unproductive. As an alternative solution, extract the bare minimum logic to strip a VLAN header, and move it to a new helper named vlan_remove_tag(), close to the definition of vlan_insert_tag(). Document it appropriately and make ocelot_xmit_get_vlan_info() call this smaller helper instead. Seeing that it doesn't appear illegal to test skb->protocol in the TX path, I guess it would be a good for vlan_remove_tag() to also absorb the vlan_set_encap_proto() function call. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: update TX path comments to not mention skb_mac_header()Vladimir Oltean2-3/+3
Once commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") will be reverted, it will no longer be true that skb->data points at skb_mac_header(skb) - since the skb->mac_header will not be set - so stop saying that, and just say that it points to the MAC header. I've reviewed vlan_insert_tag() and it does not *actually* depend on skb_mac_header(), so reword that to avoid the confusion. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: tag_sja1105: replace skb_mac_header() with vlan_eth_hdr()Vladimir Oltean1-1/+1
This is a cosmetic patch which consolidates the code to use the helper function offered by if_vlan.h. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: tag_sja1105: don't rely on skb_mac_header() in TX pathsVladimir Oltean1-1/+1
skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: tag_ksz: do not rely on skb_mac_header() in TX pathsVladimir Oltean1-9/+9
skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_eth_hdr() to get to the Ethernet header's MAC DA instead, helper which assumes this header is located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: dsa: tag_ocelot: do not rely on skb_mac_header() for VLAN xmitVladimir Oltean1-1/+1
skb_mac_header() will no longer be available in the TX path when reverting commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()"). As preparation for that, let's use skb_vlan_eth_hdr() to get to the VLAN header instead, which assumes it's located at skb->data (assumption which holds true here). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12net: dsa: add trace points for VLAN operationsVladimir Oltean2-5/+137
These are not as critical as the FDB/MDB trace points (I'm not aware of outstanding VLAN related bugs), but maybe they are useful to somebody, either debugging something or simply trying to learn more. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12net: dsa: add trace points for FDB/MDB operationsVladimir Oltean4-12/+423
DSA performs non-trivial housekeeping of unicast and multicast addresses on shared (CPU and DSA) ports, and puts a bit of pressure on higher layers, requiring them to behave correctly (remove these addresses exactly as many times as they were added). Otherwise, either addresses linger around forever, or DSA returns -ENOENT complaining that entries that were already deleted must be deleted again. To aid debugging, introduce some trace points specifically for FDB and MDB - that's where some of the bugs still are right now. Some bugs I have seen were also due to race conditions, see: - 630fd4822af2 ("net: dsa: flush switchdev workqueue on bridge join error path") - a2614140dc0f ("net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN") so it would be good to not disturb the timing too much, hence the choice to use trace points vs regular dev_dbg(). I've had these for some time on my computer in a less polished form, and they've proven useful. What I found most useful was to enable CONFIG_BOOTTIME_TRACING, add "trace_event=dsa" to the kernel cmdline, and run "cat /sys/kernel/debug/tracing/trace". This is to debug more complex environments with network managers started by the init system, things like that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-09net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stubVladimir Oltean6-13/+37
There was a sort of rush surrounding commit 88c0a6b503b7 ("net: create a netdev notifier for DSA to reject PTP on DSA master"), due to a desire to convert DSA's attempt to deny TX timestamping on a DSA master to something that doesn't block the kernel-wide API conversion from ndo_eth_ioctl() to ndo_hwtstamp_set(). What was required was a mechanism that did not depend on ndo_eth_ioctl(), and what was provided was a mechanism that did not depend on ndo_eth_ioctl(), while at the same time introducing something that wasn't absolutely necessary - a new netdev notifier. There have been objections from Jakub Kicinski that using notifiers in general when they are not absolutely necessary creates complications to the control flow and difficulties to maintainers who look at the code. So there is a desire to not use notifiers. In addition to that, the notifier chain gets called even if there is no DSA in the system and no one is interested in applying any restriction. Take the model of udp_tunnel_nic_ops and introduce a stub mechanism, through which net/core/dev_ioctl.c can call into DSA even when CONFIG_NET_DSA=m. Compared to the code that existed prior to the notifier conversion, aka what was added in commits: - 4cfab3566710 ("net: dsa: Add wrappers for overloaded ndo_ops") - 3369afba1e46 ("net: Call into DSA netdevice_ops wrappers") this is different because we are not overloading any struct net_device_ops of the DSA master anymore, but rather, we are exposing a rather specific functionality which is orthogonal to which API is used to enable it - ndo_eth_ioctl() or ndo_hwtstamp_set(). Also, what is similar is that both approaches use function pointers to get from built-in code to DSA. There is no point in replicating the function pointers towards __dsa_master_hwtstamp_validate() once for every CPU port (dev->dsa_ptr). Instead, it is sufficient to introduce a singleton struct dsa_stubs, built into the kernel, which contains a single function pointer to __dsa_master_hwtstamp_validate(). I find this approach preferable to what we had originally, because dev->dsa_ptr->netdev_ops->ndo_do_ioctl() used to require going through struct dsa_port (dev->dsa_ptr), and so, this was incompatible with any attempts to add any data encapsulation and hide DSA data structures from the outside world. Link: https://lore.kernel.org/netdev/20230403083019.120b72fd@kernel.org/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-03net: create a netdev notifier for DSA to reject PTP on DSA masterVladimir Oltean3-35/+29
The fact that PTP 2-step TX timestamping is broken on DSA switches if the master also timestamps the same packets is documented by commit f685e609a301 ("net: dsa: Deny PTP on master if switch supports it"). We attempt to help the users avoid shooting themselves in the foot by making DSA reject the timestamping ioctls on an interface that is a DSA master, and the switch tree beneath it contains switches which are aware of PTP. The only problem is that there isn't an established way of intercepting ndo_eth_ioctl calls, so DSA creates avoidable burden upon the network stack by creating a struct dsa_netdevice_ops with overlaid function pointers that are manually checked from the relevant call sites. There used to be 2 such dsa_netdevice_ops, but now, ndo_eth_ioctl is the only one left. There is an ongoing effort to migrate driver-visible hardware timestamping control from the ndo_eth_ioctl() based API to a new ndo_hwtstamp_set() model, but DSA actively prevents that migration, since dsa_master_ioctl() is currently coded to manually call the master's legacy ndo_eth_ioctl(), and so, whenever a network device driver would be converted to the new API, DSA's restrictions would be circumvented, because any device could be used as a DSA master. The established way for unrelated modules to react on a net device event is via netdevice notifiers. So we create a new notifier which gets called whenever there is an attempt to change hardware timestamping settings on a device. Finally, there is another reason why a netdev notifier will be a good idea, besides strictly DSA, and this has to do with PHY timestamping. With ndo_eth_ioctl(), all MAC drivers must manually call phy_has_hwtstamp() before deciding whether to act upon SIOCSHWTSTAMP, otherwise they must pass this ioctl to the PHY driver via phy_mii_ioctl(). With the new ndo_hwtstamp_set() API, it will be desirable to simply not make any calls into the MAC device driver when timestamping should be performed at the PHY level. But there exist drivers, such as the lan966x switch, which need to install packet traps for PTP regardless of whether they are the layer that provides the hardware timestamps, or the PHY is. That would be impossible to support with the new API. The proposal there, too, is to introduce a netdev notifier which acts as a better cue for switching drivers to add or remove PTP packet traps, than ndo_hwtstamp_set(). The one introduced here "almost" works there as well, except for the fact that packet traps should only be installed if the PHY driver succeeded to enable hardware timestamping, whereas here, we need to deny hardware timestamping on the DSA master before it actually gets enabled. This is why this notifier is called "PRE_", and the notifier that would get used for PHY timestamping and packet traps would be called NETDEV_CHANGE_HWTSTAMP. This isn't a new concept, for example NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER do the same thing. In expectation of future netlink UAPI, we also pass a non-NULL extack pointer to the netdev notifier, and we make DSA populate it with an informative reason for the rejection. To avoid making it go to waste, we make the ioctl-based dev_set_hwtstamp() create a fake extack and print the message to the kernel log. Link: https://lore.kernel.org/netdev/20230401191215.tvveoi3lkawgg6g4@skbuf/ Link: https://lore.kernel.org/netdev/20230310164451.ls7bbs6pdzs4m6pw@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-03net: dsa: make dsa_port_supports_hwtstamp() construct a fake ifreqVladimir Oltean3-6/+8
dsa_master_ioctl() is in the process of getting converted to a different API, where we won't have access to a struct ifreq * anymore, but rather, to a struct kernel_hwtstamp_config. Since ds->ops->port_hwtstamp_get() still uses struct ifreq *, this creates a difficult situation where we have to make up such a dummy pointer. The conversion is a bit messy, because it forces a "good" implementation of ds->ops->port_hwtstamp_get() to return -EFAULT in copy_to_user() because of the NULL ifr->ifr_data pointer. However, it works, and it is only a transient step until ds->ops->port_hwtstamp_get() gets converted to the new API which passes struct kernel_hwtstamp_config and does not call copy_to_user(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-30net: dsa: fix db type confusion in host fdb/mdb add/delVladimir Oltean1-12/+12
We have the following code paths: Host FDB (unicast RX filtering): dsa_port_standalone_host_fdb_add() dsa_port_bridge_host_fdb_add() | | +--------------+ +------------+ | | v v dsa_port_host_fdb_add() dsa_port_standalone_host_fdb_del() dsa_port_bridge_host_fdb_del() | | +--------------+ +------------+ | | v v dsa_port_host_fdb_del() Host MDB (multicast RX filtering): dsa_port_standalone_host_mdb_add() dsa_port_bridge_host_mdb_add() | | +--------------+ +------------+ | | v v dsa_port_host_mdb_add() dsa_port_standalone_host_mdb_del() dsa_port_bridge_host_mdb_del() | | +--------------+ +------------+ | | v v dsa_port_host_mdb_del() The logic added by commit 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB") zeroes out db.bridge.num if the switch doesn't support ds->fdb_isolation (the majority doesn't). This is done for a reason explained in commit c26933639b54 ("net: dsa: request drivers to perform FDB isolation"). Taking a single code path as example - dsa_port_host_fdb_add() - the others are similar - the problem is that this function handles: - DSA_DB_PORT databases, when called from dsa_port_standalone_host_fdb_add() - DSA_DB_BRIDGE databases, when called from dsa_port_bridge_host_fdb_add() So, if dsa_port_host_fdb_add() were to make any change on the "bridge.num" attribute of the database, this would only be correct for a DSA_DB_BRIDGE, and a type confusion for a DSA_DB_PORT bridge. However, this bug is without consequences, for 2 reasons: - dsa_port_standalone_host_fdb_add() is only called from code which is (in)directly guarded by dsa_switch_supports_uc_filtering(ds), and that function only returns true if ds->fdb_isolation is set. So, the code only executed for DSA_DB_BRIDGE databases. - Even if the code was not dead for DSA_DB_PORT, we have the following memory layout: struct dsa_bridge { struct net_device *dev; unsigned int num; bool tx_fwd_offload; refcount_t refcount; }; struct dsa_db { enum dsa_db_type type; union { const struct dsa_port *dp; // DSA_DB_PORT struct dsa_lag lag; struct dsa_bridge bridge; // DSA_DB_BRIDGE }; }; So, the zeroization of dsa_db :: bridge :: num on a dsa_db structure of type DSA_DB_PORT would access memory which is unused, because we only use dsa_db :: dp for DSA_DB_PORT, and this is mapped at the same address with dsa_db :: dev for DSA_DB_BRIDGE, thanks to the union definition. It is correct to fix up dsa_db :: bridge :: num only from code paths that come from the bridge / switchdev, so move these there. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230329133819.697642-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-30net: dsa: sync unicast and multicast addresses for VLAN filters tooVladimir Oltean1-5/+116
If certain conditions are met, DSA can install all necessary MAC addresses on the CPU ports as FDB entries and disable flooding towards the CPU (we call this RX filtering). There is one corner case where this does not work. ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up ip link set swp0 master br0 && ip link set swp0 up ip link add link swp0 name swp0.100 type vlan id 100 ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100 Traffic through swp0.100 is broken, because the bridge turns on VLAN filtering in the swp0 port (causing RX packets to be classified to the FDB database corresponding to the VID from their 802.1Q header), and although the 8021q module does call dev_uc_add() towards the real device, that API is VLAN-unaware, so it only contains the MAC address, not the VID; and DSA's current implementation of ndo_set_rx_mode() is only for VID 0 (corresponding to FDB entries which are installed in an FDB database which is only hit when the port is VLAN-unaware). It's interesting to understand why the bridge does not turn on IFF_PROMISC for its swp0 bridge port, and it may appear at first glance that this is a regression caused by the logic in commit 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode."). After all, a bridge port needs to have IFF_PROMISC by its very nature - it needs to receive and forward frames with a MAC DA different from the bridge ports' MAC addresses. While that may be true, when the bridge is VLAN-aware *and* it has a single port, there is no real reason to enable promiscuity even if that is an automatic port, with flooding and learning (there is nowhere for packets to go except to the BR_FDB_LOCAL entries), and this is how the corner case appears. Adding a second automatic interface to the bridge would make swp0 promisc as well, and would mask the corner case. Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't pass a VLAN ID), the only way to address that problem is to install host FDB entries for the cartesian product of RX filtering MAC addresses and VLAN RX filters. Fixes: 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net: dsa: tag_brcm: legacy: fix daisy-chained switchesÁlvaro Fernández Rojas1-2/+8
When BCM63xx internal switches are connected to switches with a 4-byte Broadcom tag, it does not identify the packet as VLAN tagged, so it adds one based on its PVID (which is likely 0). Right now, the packet is received by the BCM63xx internal switch and the 6-byte tag is properly processed. The next step would to decode the corresponding 4-byte tag. However, the internal switch adds an invalid VLAN tag after the 6-byte tag and the 4-byte tag handling fails. In order to fix this we need to remove the invalid VLAN tag after the 6-byte tag before passing it to the 4-byte tag decoding. Fixes: 964dbf186eaa ("net: dsa: tag_brcm: add support for legacy tags") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230319095540.239064-1-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20net: dsa: report rx_bytes unadjusted for ETH_HLENVladimir Oltean1-1/+1
We collect the software statistics counters for RX bytes (reported to /proc/net/dev and to ethtool -S $dev | grep 'rx_bytes: ") at a time when skb->len has already been adjusted by the eth_type_trans() -> skb_pull_inline(skb, ETH_HLEN) call to exclude the L2 header. This means that when connecting 2 DSA interfaces back to back and sending 1 packet with length 100, the sending interface will report tx_bytes as incrementing by 100, and the receiving interface will report rx_bytes as incrementing by 86. Since accounting for that in scripts is quirky and is something that would be DSA-specific behavior (requiring users to know that they are running on a DSA interface in the first place), the proposal is that we treat it as a bug and fix it. This design bug has always existed in DSA, according to my analysis: commit 91da11f870f0 ("net: Distributed Switch Architecture protocol support") also updates skb->dev->stats.rx_bytes += skb->len after the eth_type_trans() call. Technically, prior to Florian's commit a86d8becc3f0 ("net: dsa: Factor bottom tag receive functions"), each and every vendor-specific tagging protocol driver open-coded the same bug, until the buggy code was consolidated into something resembling what can be seen now. So each and every driver should have its own Fixes: tag, because of their different histories until the convergence point. I'm not going to do that, for the sake of simplicity, but just blame the oldest appearance of buggy code. There are 2 ways to fix the problem. One is the obvious way, and the other is how I ended up doing it. Obvious would have been to move dev_sw_netstats_rx_add() one line above eth_type_trans(), and below skb_push(skb, ETH_HLEN). But DSA processing is not as simple as that. We count the bytes after removing everything DSA-related from the packet, to emulate what the packet's length was, on the wire, when the user port received it. When eth_type_trans() executes, dsa_untag_bridge_pvid() has not run yet, so in case the switch driver requests this behavior - commit 412a1526d067 ("net: dsa: untag the bridge pvid from rx skbs") has the details - the obvious variant of the fix wouldn't have worked, because the positioning there would have also counted the not-yet-stripped VLAN header length, something which is absent from the packet as seen on the wire (there it may be untagged, whereas software will see it as PVID-tagged). Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-16net: dsa: don't error out when drivers return ETH_DATA_LEN in .port_max_mtu()Vladimir Oltean1-4/+5
Currently, when dsa_slave_change_mtu() is called on a user port where dev->max_mtu is 1500 (as returned by ds->ops->port_max_mtu()), the code will stumble upon this check: if (new_master_mtu > mtu_limit) return -ERANGE; because new_master_mtu is adjusted for the tagger overhead but mtu_limit is not. But it would be good if the logic went through, for example if the DSA master really depends on an MTU adjustment to accept DSA-tagged frames. To make the code pass through the check, we need to adjust mtu_limit for the overhead as well, if the minimum restriction was caused by the DSA user port's MTU (dev->max_mtu). A DSA user port MTU and a DSA master MTU are always offset by the protocol overhead. Currently no drivers return 1500 .port_max_mtu(), but this is only temporary and a bug in itself - mv88e6xxx should have done that, but since commit b9c587fed61c ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports") it no longer does. This is a preparation for fixing that. Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-03net: dsa: use NL_SET_ERR_MSG_WEAK_MOD() more consistentlyVladimir Oltean1-5/+6
Now that commit 028fb19c6ba7 ("netlink: provide an ability to set default extack message") provides a weak function that doesn't override an existing extack message provided by the driver, it makes sense to use it also for LAG and HSR offloading, not just for bridge offloading. Also consistently put the message string on a separate line, to reduce line length from 92 to 84 characters. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230202140354.3158129-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02net: dsa: Use sysfs_emit() to instead of sprintf()Bo Liu1-1/+1
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20230201081438.3151-1-liubo03@inspur.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-01netlink: provide an ability to set default extack messageLeon Romanovsky2-6/+2
In netdev common pattern, extack pointer is forwarded to the drivers to be filled with error message. However, the caller can easily overwrite the filled message. Instead of adding multiple "if (!extack->_msg)" checks before any NL_SET_ERR_MSG() call, which appears after call to the driver, let's add new macro to common code. [1] https://lore.kernel.org/all/Y9Irgrgf3uxOjwUm@unreal Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/6993fac557a40a1973dfa0095107c3d03d40bec1.1675171790.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23net: dsa: microchip: enable port queues for tc mqprioArun Ramadoss1-0/+15
LAN937x family of switches has 8 queues per port where the KSZ switches has 4 queues per port. By default, only one queue per port is enabled. The queues are configurable in 2, 4 or 8. This patch add 8 number of queues for LAN937x and 4 for other switches. In the tag_ksz.c file, prioirty of the packet is queried using the skb buffer and the corresponding value is updated in the tag. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23net: dsa: add plumbing for changing and getting MAC merge layer stateVladimir Oltean1-0/+37
The DSA core is in charge of the ethtool_ops of the net devices associated with switch ports, so in case a hardware driver supports the MAC merge layer, DSA must pass the callbacks through to the driver. Add support for precisely that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: move pdelay_rsp correction field to tail tagChristian Eggers1-1/+28
For PDelay_Resp messages we will likely have a negative value in the correction field. The switch hardware cannot correctly update such values (produces an off by one error in the UDP checksum), so it must be moved to the time stamp field in the tail tag. Format of the correction field is 48 bit ns + 16 bit fractional ns. After updating the correction field, clone is no longer required hence it is freed. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: add packet transmission timestampingChristian Eggers1-3/+51
This patch adds the routines for transmission of ptp packets. When the ptp pdelay_req packet to be transmitted, it uses the deferred xmit worker to schedule the packets. During irq_setup, interrupt for Sync, Pdelay_req and Pdelay_rsp are enabled. So interrupt is triggered for all three packets. But for p2p1step, we require only time stamp of Pdelay_req packet. Hence to avoid posting of the completion from ISR routine for Sync and Pdelay_resp packets, ts_en flag is introduced. This controls which packets need to processed for timestamp. After the packet is transmitted, ISR is triggered. The time at which packet transmitted is recorded to separate register. This value is reconstructed to absolute time and posted to the user application through socket error queue. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: add packet reception timestampingChristian Eggers1-6/+19
Rx Timestamping is done through 4 additional bytes in tail tag. Whenever the ptp packet is received, the 4 byte hardware time stamped value is added before 1 byte tail tag. Also, bit 7 in tail tag indicates it as PTP frame. This 4 byte value is extracted from the tail tag and reconstructed to absolute time and assigned to skb hwtstamp. If the packet received in PDelay_Resp, then partial ingress timestamp is subtracted from the correction field. Since user space tools expects to be done in hardware. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13net: dsa: microchip: ptp: add 4 bytes in tail tag when ptp enabledArun Ramadoss1-7/+88
When the PTP is enabled in hardware bit 6 of PTP_MSG_CONF1 register, the transmit frame needs additional 4 bytes before the tail tag. It is needed for all the transmission packets irrespective of PTP packets or not. The 4-byte timestamp field is 0 for frames other than Pdelay_Resp. For the one-step Pdelay_Resp, the switch needs the receive timestamp of the Pdelay_Req message so that it can put the turnaround time in the correction field. Since PTP has to be enabled for both Transmission and reception timestamping, driver needs to track of the tx and rx setting of the all the user ports in the switch. Two flags hw_tx_en and hw_rx_en are added in ksz_port to track the timestampping setting of each port. When any one of ports has tx or rx timestampping enabled, bit 6 of PTP_MSG_CONF1 is set and it is indicated to tag_ksz.c through tagger bytes. This flag adds 4 additional bytes to the tail tag. When tx and rx timestamping of all the ports are disabled, then 4 bytes are not added. Tested using hwstamp -i <interface> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> # mostly api Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni1-1/+10
Merge in the left-over fixes before the net-next pull-request. net/mptcp/subflow.c d3295fee3c75 ("mptcp: use proper req destructor for IPv6") 36b122baf6a8 ("mptcp: add subflow_v(4,6)_send_synack()") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-12net: dsa: tag_8021q: avoid leaking ctx on dsa_tag_8021q_register() error pathVladimir Oltean1-1/+10
If dsa_tag_8021q_setup() fails, for example due to the inability of the device to install a VLAN, the tag_8021q context of the switch will leak. Make sure it is freed on the error path. Fixes: 328621f6131f ("net: dsa: tag_8021q: absorb dsa_8021q_setup into dsa_tag_8021q_{,un}register") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20221209235242.480344-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12net: dsa: don't call ptp_classify_raw() if switch doesn't provide RX ↵Vladimir Oltean1-4/+4
timestamping ptp_classify_raw() is not exactly cheap, since it invokes a BPF program for every skb in the receive path. For switches which do not provide ds->ops->port_rxtstamp(), running ptp_classify_raw() provides precisely nothing, so check for the presence of the function pointer first, since that is much cheaper. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20221209175840.390707-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-3/+6
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-02net: dsa: sja1105: Check return valueArtem Chernyshev1-1/+2
Return NULL if we got unexpected value from skb_trim_rcsum() in sja1110_rcv_inband_control_extension() Fixes: 4913b8ebf8a9 ("net: dsa: add support for the SJA1110 native tagging protocol") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221201140032.26746-3-artem.chernyshev@red-soft.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-02net: dsa: hellcreek: Check return valueArtem Chernyshev1-1/+2
Return NULL if we got unexpected value from skb_trim_rcsum() in hellcreek_rcv() Fixes: 01ef09caad66 ("net: dsa: Add tag handling for Hirschmann Hellcreek switches") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20221201140032.26746-2-artem.chernyshev@red-soft.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-02net: dsa: ksz: Check return valueArtem Chernyshev1-1/+2
Return NULL if we got unexpected value from skb_trim_rcsum() in ksz_common_rcv() Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: bafe9ba7d908 ("net: dsa: ksz: Factor out common tag code") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221201140032.26746-1-artem.chernyshev@red-soft.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: kill off dsa_priv.hVladimir Oltean8-25/+14
The last remnants in dsa_priv.h are a netlink-related definition for which we create a new header, and DSA_MAX_NUM_OFFLOADING_BRIDGES which is only used from dsa.c, so move it there. Some inclusions need to be adjusted now that we no longer have headers included transitively from dsa_priv.h. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: move tag_8021q headers to their proper placeVladimir Oltean7-8/+46
tag_8021q definitions are all over the place. Some are exported to linux/dsa/8021q.h (visible by DSA core, taggers, switch drivers and everyone else), and some are in dsa_priv.h. Move the structures that don't need external visibility into tag_8021q.c, and the ones which don't need the world or switch drivers to see them into tag_8021q.h. We also have the tag_8021q.h inclusion from switch.c, which is basically the entire reason why tag_8021q.c was built into DSA in commit 8b6e638b4be2 ("net: dsa: build tag_8021q.c as part of DSA core"). I still don't know how to better deal with that, so leave it alone. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: move definitions from dsa_priv.h to slave.cVladimir Oltean2-42/+42
There are some definitions in dsa_priv.h which are only used from slave.c. So move them to slave.c. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: rename dsa2.c back into dsa.c and create its headerVladimir Oltean8-31/+46
The previous change moved the code into the larger file (dsa2.c) to minimize the delta. Rename that now to dsa.c, and create dsa.h, where all related definitions from dsa_priv.h go. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: merge dsa.c into dsa2.cVladimir Oltean4-238/+221
There is no longer a meaningful distinction between what goes into dsa2.c and what goes into dsa.c. Merge the 2 into a single file. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: move notifier definitions to switch.hVladimir Oltean2-105/+108
Reduce bloat in dsa_priv.h by moving the cross-chip notifier data structures to switch.h. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: move dsa_tree_notify() and dsa_broadcast() to switch.cVladimir Oltean4-48/+50
There isn't an intuitive place for these 2 cross-chip notifier functions according to the function-to-file classification based on names (dsa_switch_*() goes to switch.c), but I consider these to be part of the cross-chip notifier handling, therefore part of switch.c. Move them there to reduce bloat in dsa2.c (the place where all code with no better place to go goes). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: move headers exported by switch.c to switch.hVladimir Oltean6-4/+15
Reduce code bloat in dsa_priv.h by moving the prototypes exported by switch.h into their own header file. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: move tagging protocol code to tag.{c,h}Vladimir Oltean27-541/+581
It would be nice if tagging protocol drivers could include just the header they need, since they are (mostly) data path and isolated from most of the other DSA core code does. Create a tag.c and a tag.h file which are meant to support tagging protocol drivers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: move headers exported by slave.c to slave.hVladimir Oltean8-57/+77
Minimize the use of the bloated dsa_priv.h by moving the prototypes exported by slave.c to their own header file. This is just approximate to get the code structure right. There are some interdependencies with static inline code left in dsa_priv.h, so leave slave.h included from there for now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: move headers exported by master.c to master.hVladimir Oltean5-9/+27
Minimize the use of the bloated dsa_priv.h by moving the prototypes exported by master.c to their own header file. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: move headers exported by port.c to port.hVladimir Oltean8-97/+120
Minimize the use of the bloated dsa_priv.h by moving the prototypes exported by port.c to their own header file. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: move rest of devlink setup/teardown to devlink.cVladimir Oltean3-20/+49
The code that needed further refactoring into dedicated functions in dsa2.c was left aside. Move it now to devlink.c, and make dsa2.c stop including net/devlink.h. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: if ds->setup is true, ds->devlink is always non-NULLVladimir Oltean1-7/+5
Simplify dsa_switch_teardown() to remove the NULL checking for ds->devlink. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: move bulk of devlink code to devlink.{c,h}Vladimir Oltean5-346/+370
dsa.c and dsa2.c are bloated with too much off-topic code. Identify all code related to devlink and move it to a new devlink.c file. Steer clear of the dsa_priv.h dumping ground antipattern and create a dedicated devlink.h for it, which will be included only by the C files which need it. Usage of dsa_priv.h will be minimized in later patches. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: modularize DSA_TAG_PROTO_NONEVladimir Oltean5-22/+37
There is no reason that I can see why the no-op tagging protocol should be registered manually, so make it a module and make all drivers which have any sort of reference to DSA_TAG_PROTO_NONE select it. Note that I don't know if ksz_get_tag_protocol() really needs this, or if it's just the logic which is poorly written. All switches seem to have their own tagging protocol, and DSA_TAG_PROTO_NONE is just a fallback that never gets used. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22net: dsa: unexport dsa_dev_to_net_device()Vladimir Oltean2-1/+2
dsa.o and dsa2.o are linked into the same dsa_core.o, there is no reason to export this symbol when its only caller is local. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18net: dsa: tag_mtk: assign per-port queuesFelix Fietkau1-0/+2
Keeps traffic sent to the switch within link speed limits Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221116080734.44013-6-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: autoload tag driver module on tagging protocol changeVladimir Oltean4-5/+7
Issue a request_module() call when an attempt to change the tagging protocol is made, either by sysfs or by device tree. In the case of ocelot (the only driver for which the default and the alternative tagging protocol are compiled as different modules), the user is now no longer required to insert tag_ocelot_8021q.ko manually. In the particular case of ocelot, this solves a problem where tag_ocelot_8021q.ko is built as module, and this is present in the device tree: &mscc_felix_port4 { dsa-tag-protocol = "ocelot-8021q"; }; &mscc_felix_port5 { dsa-tag-protocol = "ocelot-8021q"; }; Because no one attempts to load the module into the kernel at boot time, the switch driver will fail to probe (actually forever defer) until someone manually inserts tag_ocelot_8021q.ko. This is now no longer necessary and happens automatically. Rename dsa_find_tagger_by_name() to denote the change in functionality: there is now feature parity with dsa_tag_driver_get_by_id(), i.o.w. we also load the module if it's missing. Link: https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/ Suggested-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: rename dsa_tag_driver_get() to dsa_tag_driver_get_by_id()Vladimir Oltean3-3/+3
A future patch will introduce one more way of getting a reference on a tagging protocl driver (by name). Rename the current method to "by_id". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: strip sysfs "tagging" string of trailing newlineVladimir Oltean3-4/+15
Currently, dsa_find_tagger_by_name() uses sysfs_streq() which works both with strings that contain \n at the end (echo ocelot > .../dsa/tagging) and with strings that don't (printf ocelot > .../dsa/tagging). There will be a problem once we'll want to construct the modalias string based on which we auto-load the protocol kernel module. If the sysfs buffer ends in a newline, we need to strip it first. This is a preparatory patch specifically for that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: provide a second modalias to tag proto drivers based on their nameVladimir Oltean18-51/+98
Currently, tagging protocol drivers have a modalias of "dsa_tag:id-<number>", where the number is one of DSA_TAG_PROTO_*_VALUE. This modalias makes it possible for the request_module() call in dsa_tag_driver_get() to work, given the input it has - an integer returned by ds->ops->get_tag_protocol(). It is also possible to change tagging protocols at (pseudo-)runtime, via sysfs or via device tree, and this works via the name string of the tagging protocol rather than via its id (DSA_TAG_PROTO_*_VALUE). In the latter case, there is no request_module() call, because there is no association that the DSA core has between the string name and the ID, to construct the modalias. The module is simply assumed to have been inserted. This is actually slightly problematic when the tagging protocol change should take place at probe time, since it's expected that the dependency module should get autoloaded. For this purpose, let's introduce a second modalias, so that the DSA core can call request_module() by name. There is no reason to make the modalias by name optional, so just modify the MODULE_ALIAS_DSA_TAG_DRIVER() macro to take both the ID and the name as arguments, and generate two modaliases behind the scenes. Suggested-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: rename tagging protocol driver modaliasVladimir Oltean2-4/+5
It's autumn cleanup time, and today's target are modaliases. Michael says that for users of modinfo, "dsa_tag-20" is not the most suggestive name, and recommends a change to "dsa_tag-id-20". Andrew points out that other modaliases have a prefix delimited by colons, so he recommends "dsa_tag:20" instead of "dsa_tag-20". To satisfy both proposals, Florian recommends "dsa_tag:id-20". The modaliases are not stable ABI, and the essential information (protocol ID) is still conveyed in the new string, which request_module() must be adapted to form. Link: 20221027210830.3577793-1-vladimir.oltean@nxp.com Suggested-by: Andrew Lunn <andrew@lunn.ch> Suggested-by: Michael Walle <michael@walle.cc> Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: stop exposing tag proto module helpers to the worldVladimir Oltean1-0/+70
The DSA tagging protocol driver macros are in the public include/net/dsa.h probably because that's also where the DSA_TAG_PROTO_*_VALUE macros are (MODULE_ALIAS_DSA_TAG_DRIVER hinges on those macro definitions). But there is no reason to expose these helpers to <net/dsa.h>. That header is shared between switch drivers (drivers/net/dsa/), tagging protocol drivers (net/dsa/tag_*.c), the DSA core (net/dsa/ sans tag_*.c), and the rest of the world (DSA master drivers, network stack, etc). Too much exposure. On the other hand, net/dsa/dsa_priv.h is included only by the DSA core and by DSA tagging protocol drivers (or IOW, "friend" modules). Also a bit too much exposure - I've contemplated creating a new header which is only included by tagging protocol drivers, but completely separating a new dsa_tag_proto.h from dsa_priv.h is not immediately trivial - for example dsa_slave_to_port() is used both from the fast path and from the control path. So for now, move these definitions to dsa_priv.h which at least hides them from the world. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: set name_assign_type to NET_NAME_ENUM for enumerated user portsRasmus Villemoes1-1/+1
When a user port does not have a label in device tree, and we thus fall back to the eth%d scheme, the proper constant to use is NET_NAME_ENUM. See also commit e9f656b7a214 ("net: ethernet: set default assignment identifier to NET_NAME_ENUM"), which in turn quoted commit 685343fc3ba6 ("net: add name_assign_type netdev attribute"): ... when the kernel has given the interface a name using global device enumeration based on order of discovery (ethX, wlanY, etc) ... are labelled NET_NAME_ENUM. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.faineli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: use NET_NAME_PREDICTABLE for user ports with name given in DTRasmus Villemoes1-1/+1
When a user port has a label in device tree, the corresponding netdevice is, to quote include/uapi/linux/netdevice.h, "predictably named by the kernel". This is also explicitly one of the intended use cases for NET_NAME_PREDICTABLE, quoting 685343fc3ba6 ("net: add name_assign_type netdev attribute"): NET_NAME_PREDICTABLE: The ifname has been assigned by the kernel in a predictable way [...] Examples include [...] and names deduced from hardware properties (including being given explicitly by the firmware). Expose that information properly for the benefit of userspace tools that make decisions based on the name_assign_type attribute, e.g. a systemd-udev rule with "kernel" in NamePolicy. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.faineli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: refactor name assignment for user portsRasmus Villemoes2-5/+11
The following two patches each have a (small) chance of causing regressions for userspace and will in that case of course need to be reverted. In order to prepare for that and make those two patches independent and individually revertable, refactor the code which sets the names for user ports by moving the "fall back to eth%d if no label is given in device tree" to dsa_slave_create(). No functional change (at least none intended). Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.faineli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-2/+28
include/linux/bpf.h 1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value") aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record") f71b2f64177a ("bpf: Refactor map->off_arr handling") https://lore.kernel.org/all/20221114095000.67a73239@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15net: dsa: don't leak tagger-owned storage on switch driver unbindVladimir Oltean1-0/+10
In the initial commit dc452a471dba ("net: dsa: introduce tagger-owned storage for private and shared data"), we had a call to tag_ops->disconnect(dst) issued from dsa_tree_free(), which is called at tree teardown time. There were problems with connecting to a switch tree as a whole, so this got reworked to connecting to individual switches within the tree. In this process, tag_ops->disconnect(ds) was made to be called only from switch.c (cross-chip notifiers emitted as a result of dynamic tag proto changes), but the normal driver teardown code path wasn't replaced with anything. Solve this problem by adding a function that does the opposite of dsa_switch_setup_tag_protocol(), which is called from the equivalent spot in dsa_switch_teardown(). The positioning here also ensures that we won't have any use-after-free in tagging protocol (*rcv) ops, since the teardown sequence is as follows: dsa_tree_teardown -> dsa_tree_teardown_master -> dsa_master_teardown -> unsets master->dsa_ptr, making no further packets match the ETH_P_XDSA packet type handler -> dsa_tree_teardown_ports -> dsa_port_teardown -> dsa_slave_destroy -> unregisters DSA net devices, there is even a synchronize_net() in unregister_netdevice_many() -> dsa_tree_teardown_switches -> dsa_switch_teardown -> dsa_switch_teardown_tag_protocol -> finally frees the tagger-owned storage Fixes: 7f2973149c22 ("net: dsa: make tagging protocols connect to individual switches from a tree") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221114143551.1906361-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15net: dsa: remove phylink_validate() methodVladimir Oltean1-10/+8
As of now, no DSA driver uses a custom link mode validation procedure anymore. So remove this DSA operation and let phylink determine what is supported based on config->mac_capabilities (if provided by the driver). Leave a comment why we left the code that we did, and that there is more work to do. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-15net: dsa: add support for DSA rx offloading via metadata dstFelix Fietkau1-1/+18
If a metadata dst is present with the type METADATA_HW_PORT_MUX on a dsa cpu port netdev, assume that it carries the port number and that there is no DSA tag present in the skb data. Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-14net: dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shimsVladimir Oltean3-2/+18
There are multi-generational drivers like mv88e6xxx which have code like this: int mv88e6xxx_port_hwtstamp_get(struct dsa_switch *ds, int port, struct ifreq *ifr) { if (!chip->info->ptp_support) return -EOPNOTSUPP; ... } DSA wants to deny PTP timestamping on the master if the switch supports timestamping too. However it currently relies on the presence of the port_hwtstamp_get() callback to determine PTP capability, and this clearly does not work in that case (method is present but returns -EOPNOTSUPP). We should not deny PTP on the DSA master for those switches which truly do not support hardware timestamping. Create a dsa_port_supports_hwtstamp() method which actually probes for support by calling port_hwtstamp_get() and seeing whether that returned -EOPNOTSUPP or not. Fixes: f685e609a301 ("net: dsa: Deny PTP on master if switch supports it") Link: https://patchwork.kernel.org/project/netdevbpf/patch/20221110124345.3901389-1-festevam@gmail.com/ Reported-by: Fabio Estevam <festevam@gmail.com> Reported-by: Steffen Bätz <steffen@innosonix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Fabio Estevam <festevam@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-03net: remove unused ndo_get_devlink_portJiri Pirko1-8/+0
Remove ndo_get_devlink_port which is no longer used alongside with the implementations in drivers. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03net: make drivers to use SET_NETDEV_DEVLINK_PORT to set devlink_portJiri Pirko2-9/+1
Benefit from the previously implemented tracking of netdev events in devlink code and instead of calling devlink_port_type_eth_set() and devlink_port_type_clear() to set devlink port type and link to related netdev, use SET_NETDEV_DEVLINK_PORT() macro to assign devlink_port pointer to netdevice which is about to be registered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-3/+10
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: dsa: fall back to default tagger if we can't load the one from DTVladimir Oltean1-3/+10
DSA tagging protocol drivers can be changed at runtime through sysfs and at probe time through the device tree (support for the latter was added later). When changing through sysfs, it is assumed that the module for the new tagging protocol was already loaded into the kernel (in fact this is only a concern for Ocelot/Felix switches, where we have tag_ocelot.ko and tag_ocelot_8021q.ko; for every other switch, the default and alternative protocols are compiled within the same .ko, so there is nothing for the user to load). The kernel cannot currently call request_module(), because it has no way of constructing the modalias name of the tagging protocol driver ("dsa_tag-%d", where the number is one of DSA_TAG_PROTO_*_VALUE). The device tree only contains the string name of the tagging protocol ("ocelot-8021q"), and the only mapping between the string and the DSA_TAG_PROTO_OCELOT_8021Q_VALUE is present in tag_ocelot_8021q.ko. So this is a chicken-and-egg situation and dsa_core.ko has nothing based on which it can automatically request the insertion of the module. As a consequence, if CONFIG_NET_DSA_TAG_OCELOT_8021Q is built as module, the switch will forever defer probing. The long-term solution is to make DSA call request_module() somehow, but that probably needs some refactoring. What we can do to keep operating with existing device tree blobs is to cancel the attempt to change the tagging protocol with the one specified there, and to remain operating with the default one. Depending on the situation, the default protocol might still allow some functionality (in the case of ocelot, it does), and it's better to have that than to fail to probe. Fixes: deff710703d8 ("net: dsa: Allow default tag protocol to be overridden from DT") Link: https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/ Reported-by: Heiko Thiery <heiko.thiery@gmail.com> Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221027145439.3086017-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: Remove the obsolte u64_stats_fetch_*_irq() users (net).Thomas Gleixner1-2/+2
Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-15net: dsa: uninitialized variable in dsa_slave_netdevice_event()Dan Carpenter1-1/+1
Return zero if both dsa_slave_dev_check() and netdev_uses_dsa() are false. Fixes: acc43b7bf52a ("net: dsa: allow masters to join a LAG") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>