Packet delay during PROFINET realtime communication

asked 2022-03-20 14:46:34 +0000


updated 2022-03-20 14:50:34 +0000

Hey,

We're having a recurring issue with PROFINET communication between a main CPU and several clients. The main CPU receives data from each client every 2 ms and sends back an acknowledge for each received packet. If a client does not get the acknowledge within a defined timeout (in this case 12 ms), it terminates the communication by sending an alert.
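To make the timing concrete, the client-side timeout can be modelled as a simple watchdog. This is a minimal sketch under the assumption that every received acknowledge re-arms a 12 ms timer; `first_alert_time` is a hypothetical helper, not the actual PROFINET device logic.

```python
from typing import Optional, Sequence

CYCLE_MS = 2.0     # client sends data every 2 ms (from the post)
TIMEOUT_MS = 12.0  # ack timeout before the client sends its alert (from the post)


def first_alert_time(ack_times_ms: Sequence[float],
                     timeout_ms: float = TIMEOUT_MS) -> Optional[float]:
    """Return the time (ms) at which the client would send its alert, or None.

    Simplified model: every received ack re-arms the watchdog, and the alert
    fires when the gap since the previous ack exceeds `timeout_ms`.
    """
    if not ack_times_ms:
        return None
    last = ack_times_ms[0]
    for t in ack_times_ms[1:]:
        if t - last > timeout_ms:
            return last + timeout_ms  # watchdog expired before this ack arrived
        last = t
    return None


# Acks every 2 ms, then a stall; the queued acks arrive 6 ms after the alert.
acks = [0.0, 2.0, 4.0, 6.0, 8.0, 26.0, 26.1, 26.2]
print(first_alert_time(acks))   # 20.0
print(TIMEOUT_MS / CYCLE_MS)    # 6.0 data cycles without an ack
```

In other words, a 12 ms timeout tolerates about six consecutive 2 ms cycles without an acknowledge.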

The captures were made by a Siemens technician. He configured a SPAN port on the PROFINET switch to which the CPU is connected. For the client capture, he installed another PROFINET switch between the client and our Cisco access switch. Both captures show the traffic of the uplink port to our Cisco environment.

The capture from the CPU side looks good. The CPU receives data packets from different clients and acknowledges them immediately.

The capture from the client side shows that, at a certain point, the acknowledges from the CPU stop arriving. After 12 ms the client sends the alert, and 6 ms later all the missing acknowledges from the CPU come in, including the response to the client's alert.
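One way to quantify the delay is to match the same acknowledge in both captures and compare timestamps. The sketch below assumes each ack can be identified by some key (hypothetically, a cycle counter read from the payload) and that the two capture PCs are not clock-synchronised, so it only measures delay relative to the smallest observed difference.

```python
from typing import Dict, List, Tuple


def added_delay_ms(cpu_side: Dict[int, float],
                   client_side: Dict[int, float]) -> List[Tuple[int, float]]:
    """Extra delay of each ack between the CPU-side and client-side capture points.

    Both dicts map a frame key to its capture timestamp in ms. The key is a
    hypothetical identifier (e.g. a cycle counter read from the ack payload).
    The two capture PCs are not clock-synchronised, so the smallest observed
    difference is treated as the baseline and subtracted from every sample.
    """
    common = sorted(cpu_side.keys() & client_side.keys())
    raw = [(k, client_side[k] - cpu_side[k]) for k in common]
    if not raw:
        return []
    baseline = min(d for _, d in raw)
    return [(k, round(d - baseline, 3)) for k, d in raw]


# Hypothetical timestamps: the client capture clock runs ~100 ms ahead.
cpu = {1: 0.0, 2: 2.0, 3: 4.0, 4: 6.0}
client = {1: 100.5, 2: 102.5, 3: 116.5, 4: 116.6}
print(added_delay_ms(cpu, client))  # [(1, 0.0), (2, 0.0), (3, 12.0), (4, 10.1)]
```

Acks that left the CPU on time but show a large added delay on the client side point to queuing somewhere in between rather than at the CPU.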

It's always this behaviour, but not for all clients at the same time; it happens randomly for each of the clients. We've already raised the communication timeout, but this just leads to more missing acknowledges before the client sends its alert.

The CPU and the clients are in the same VLAN. The CPU is connected to a PROFINET switch, which has a 1 Gbit/s copper uplink to one of our Cisco access switches. The clients are connected directly to different Cisco access switches. All Cisco access switches have a 2 x 1 Gbit/s LACP uplink to the core switches in the data center. The RTT between client and CPU is 1 ms.

It happens only during working hours, so it looks like a bandwidth issue on one of the uplinks (probably on the CPU side), with the switch queuing the small packets until the uplink is free again. But then I would expect the communication with the other PROFINET clients to terminate as well. And why do all the missing packets come in as soon as the alert is sent? There are also no other known delay issues on the employees' clients, which are connected to the access switches.
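A quick back-of-the-envelope check of the queuing hypothesis: how much data would have to sit ahead of an acknowledge on a 1 Gbit/s link to delay it by the observed amount? This sketch assumes a single FIFO queue and uses roughly 18 ms (12 ms timeout plus 6 ms until the acks arrive) as the delay; it only gives an order of magnitude. A LACP bundle normally hashes each flow onto one member link, so 1 Gbit/s is the relevant rate even on the 2 x 1 Gbit/s uplinks.

```python
LINK_BPS = 1_000_000_000  # 1 Gbit/s uplink (from the post)


def queued_bytes_for_delay(delay_ms: float, link_bps: int = LINK_BPS) -> int:
    """Bytes that must be queued ahead of a frame in one FIFO queue to delay it by delay_ms."""
    return int(link_bps * delay_ms / 1000 / 8)


def serialization_us(frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Time to transmit one frame, in microseconds (preamble and inter-frame gap ignored)."""
    return frame_bytes * 8 * 1_000_000 / link_bps


# About 18 ms between the first missing ack and the burst of late acks
# (12 ms timeout + 6 ms), used here as a rough queuing delay.
print(queued_bytes_for_delay(18.0))  # 2250000 bytes (about 2.25 MB)
print(serialization_us(64))          # 0.512 us per 64-byte frame
```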

FYI: Because the PROFILINK frames are sometimes too small for the Ethernet network, the access switch ports for the devices are configured as trunk ports (with native and allowed VLAN = PROFINET VLAN) so that the VLAN header is added to all frames. According to Cisco, this is a known workaround for PROFINET communication over non-PROFINET switches.

CPU: ac:64:17:cf:44:cf
Client: 10:df:fc:e3:20:84
Other clients: 10:df:fc:6c:5e:18, 10:df:fc:70:31:56, 10:df:fc:e0:62:4e, ac:64:17:84:2c:eb

Captures: CPU capture, client capture

Any ideas?

Thank ...

answered 2022-03-20 20:51:26 +0000

Christian_R

http://crnetpackets.com

Yes, you are right: the CPU acknowledges are queued in the network (my best guess is somewhere on the LACP link). You can see in the CPU trace that the packets are sent out immediately, but in the client trace they arrive after a delay. It is a coincidence that they arrived together with the alarm message.

So I would take a deeper look at the switch statistics, where you might find an indication of queuing. Maybe you can tune your QoS buffers.

The root cause could be microbursts or an issue in the LACP handling.
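Microbursts are usually invisible in interface counters that average over seconds or minutes. As a sketch of how an uplink capture could be checked instead, the following buckets frames into 1 ms windows and reports windows close to line rate. It assumes timestamps exported as integer microseconds; the 90% threshold and the example burst are arbitrary assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

LINK_BPS = 1_000_000_000  # one 1 Gbit/s LACP member link (from the post)


def busy_windows(packets: Iterable[Tuple[int, int]],
                 window_us: int = 1000,
                 threshold: float = 0.9,
                 link_bps: int = LINK_BPS) -> Dict[int, float]:
    """Return {window index: utilisation} for every window above `threshold`.

    `packets` are (timestamp in integer microseconds, frame length in bytes)
    pairs, e.g. exported from a capture of an uplink.
    """
    capacity_bytes = link_bps / 8 * window_us / 1_000_000
    per_window: Dict[int, int] = defaultdict(int)
    for ts_us, length in packets:
        per_window[ts_us // window_us] += length
    result: Dict[int, float] = {}
    for window, total in sorted(per_window.items()):
        utilisation = total / capacity_bytes
        if utilisation > threshold:
            result[window] = utilisation  # near line rate: small frames may queue
    return result


# 64-byte acks every 2 ms, plus a hypothetical burst of 80 x 1500-byte frames in 0.4 ms.
acks = [(t, 64) for t in range(0, 10_000, 2_000)]
burst = [(5_000 + i * 5, 1500) for i in range(80)]
print(busy_windows(acks + burst))  # {5: 0.96}
```

Averaged over a whole second, the same traffic would look like an almost idle link.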


Comments

Coat-tailing on Christian_R's answer:

Are the access and core switches also used for non-PROFINET traffic?

What level of utilization are you seeing on the uplinks during working hours?

Has any engineering been put in place to prioritize the PROFINET traffic over any other traffic that might need to concurrently traverse the LACP uplinks between the access and the core switches?

My understanding is that PROFINET traffic generally needs to be marked at L2 (or the VLAN explicitly configured in some way) to ensure the traffic gets higher precedence in priority-queue processing.

The two captures do not contain any 802.1Q VLAN headers and therefore don't reveal which 802.1p priorities, if any, these frames may have been sent with.

If there were 802.1Q headers, were they stripped as a side effect of the SPAN configuration used for the packet capture, or perhaps stripped ...

Jim Young (2022-03-21 06:12:59 +0000)
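Whether a captured frame carries an 802.1Q tag, and with which 802.1p priority, can be read from the four bytes after the source MAC (in Wireshark, the `vlan` display filter does the same). A minimal sketch of that parsing in Python, using only `struct`; the example frames are built by hand with the MAC addresses from the post, and PCP 6 is just an example value.

```python
import struct
from typing import Optional, Tuple

TPID_8021Q = 0x8100  # 802.1Q tag protocol identifier


def vlan_tag(frame: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (PCP, VLAN ID, inner EtherType) for an 802.1Q-tagged frame, else None."""
    if len(frame) < 18:  # 2 x 6-byte MAC + 4-byte tag + 2-byte EtherType
        return None
    (tpid,) = struct.unpack_from("!H", frame, 12)  # field right after dst/src MAC
    if tpid != TPID_8021Q:
        return None
    tci, inner_type = struct.unpack_from("!HH", frame, 14)
    pcp = tci >> 13     # 3-bit 802.1p priority code point
    vid = tci & 0x0FFF  # 12-bit VLAN ID (0 means priority tag only)
    return pcp, vid, inner_type


dst = bytes.fromhex("10dffce32084")  # client MAC from the post
src = bytes.fromhex("ac6417cf44cf")  # CPU MAC from the post
tagged = dst + src + struct.pack("!HHH", 0x8100, 6 << 13, 0x8892) + bytes(46)
untagged = dst + src + struct.pack("!H", 0x8892) + bytes(46)
print(vlan_tag(tagged))    # (6, 0, 34962); 34962 == 0x8892, the PROFINET EtherType
print(vlan_tag(untagged))  # None
```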

Thank you guys for your responses.

802.1Q is only enabled for our voice VLANs. We can enable it for the PROFINET traffic as well, but I would like to know which kind of traffic is fully utilizing the uplink. I will capture the traffic on the uplinks to check for microbursts or other issues. I'll let you know if I find something.

JasMan (2022-03-21 06:56:43 +0000)

> FYI: Due to the fact that the PROFILINK frames are sometimes too small for the Ethernet network, the access switch ports for the devices are configured as trunk port (with native and allowed VLAN = PROFINET VLAN) to add the VLAN header to all the frames. According to Cisco this is a known workaround for PROFINET communication over non-PROFINET switches.

Hard to believe, and I've never heard of it before. In the trace, all frames were 64 bytes long (excluding the FCS, which was not captured).

Christian_R (2022-03-21 19:49:54 +0000)

@Christian_R I can't find any official documentation about this workaround, but here's a thread about it: https://community.cisco.com/t5/switch...

I hope we've finally found the reason for the issue. On some access ports for the PROFINET devices, the "allowed VLAN xxx" command was missing. Therefore all broadcasts and multicasts of all VLANs were sent to the PROFINET devices.

When our print spooler sends a print job to a network printer that hasn't sent or received packets in the last 5 minutes, the printer's MAC address table entry has timed out, so the traffic is flooded out of all switch ports. That was the point when the PROFINET alert happened. It seems the devices don't like too much incoming traffic.

We added the missing "allowed VLAN xxx" command to each port yesterday, and we haven't had any outages since.

A really ...

JasMan (2022-03-29 12:27:54 +0000)
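The flooding mechanism described above can be illustrated with a toy learning switch. This is a simplified sketch, not Cisco's implementation: the port names are hypothetical, and 300 s is Cisco's default MAC aging time, which matches the "5 minutes" mentioned. A trunk port without an allowed-VLAN restriction is a member of every VLAN, so it ends up in the flood list of every VLAN.

```python
from typing import Dict, List, Tuple

AGING_S = 300  # 300 s: Cisco's default MAC aging time, i.e. the "5 minutes" in the post


class MacTable:
    """Toy model of a learning switch's MAC address table for a single VLAN."""

    def __init__(self, ports: List[str], aging_s: float = AGING_S) -> None:
        self.ports = ports
        self.aging_s = aging_s
        self.entries: Dict[str, Tuple[str, float]] = {}  # MAC -> (port, last seen)

    def learn(self, src_mac: str, port: str, now: float) -> None:
        """Record the port on which a source MAC was last seen."""
        self.entries[src_mac] = (port, now)

    def egress_ports(self, dst_mac: str, in_port: str, now: float) -> List[str]:
        """Ports a unicast frame to dst_mac is forwarded to."""
        entry = self.entries.get(dst_mac)
        if entry is not None and now - entry[1] <= self.aging_s:
            return [entry[0]]  # known destination: forward to one port only
        self.entries.pop(dst_mac, None)  # aged out (or never learned)
        # Unknown unicast is flooded to every other port in the VLAN, including
        # trunk ports that have not been restricted with an allowed-VLAN list.
        return [p for p in self.ports if p != in_port]


# Hypothetical port names.
sw = MacTable(["spooler", "printer", "profinet-1", "profinet-2"])
sw.learn("printer-mac", "printer", now=0)
print(sw.egress_ports("printer-mac", "spooler", now=60))   # ['printer']
print(sw.egress_ports("printer-mac", "spooler", now=400))  # ['printer', 'profinet-1', 'profinet-2']
```

A large print job sent after the entry has aged out is therefore copied to the PROFINET ports until the printer answers and its MAC address is learned again.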

The article states that PROFINET frames come with a pre-tagged VLAN 0 header, and this could cause issues depending on how sensitive your switch firmware is (e.g. if tagged frames are not allowed on an access port). But they do it to advertise the correct priority, not to make the frame longer.

Christian_R (2022-03-29 19:55:43 +0000)
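For illustration, a priority-tagged header like the one described carries VLAN ID 0, so the 4-byte 802.1Q tag only conveys the 802.1p priority. A minimal sketch of building such a header with `struct`; the helper name is hypothetical, the MACs are taken from the question, and PCP 6 is only an example value.

```python
import struct

TPID_8021Q = 0x8100          # 802.1Q tag protocol identifier
ETHERTYPE_PROFINET = 0x8892  # PROFINET real-time EtherType


def priority_tagged_header(dst: bytes, src: bytes, pcp: int,
                           ethertype: int = ETHERTYPE_PROFINET) -> bytes:
    """Build an Ethernet header with an 802.1Q priority tag (VLAN ID 0)."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    if not 0 <= pcp <= 7:
        raise ValueError("PCP is a 3-bit field (0-7)")
    tci = pcp << 13  # DEI = 0 and VID = 0: the tag only carries the priority
    return dst + src + struct.pack("!HHH", TPID_8021Q, tci, ethertype)


hdr = priority_tagged_header(bytes.fromhex("10dffce32084"),
                             bytes.fromhex("ac6417cf44cf"), pcp=6)
print(len(hdr), hdr[12:].hex())  # 18 8100c0008892
```

A switch port that only expects untagged frames may treat such a frame differently from one configured as a trunk, which is consistent with the trunk-port workaround discussed in the thread.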
