Remove the limit of MAX_PORTS ports (default 8) and keep the ports in
a linked list. This allows ptp4l to be used on large machines and in the
future, it will allow dynamic adding and removing of ports while ptp4l is
running.
For this to work, pollfd needs to be dynamically allocated. Changed pollfd
handling from clock_install_fda/clock_remove_fda to notification
(clock_fda_changed), where the clock will rebuild pollfd by querying all its
ports.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
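The pollfd rebuild described in the message above can be sketched roughly
as follows. This is a minimal illustration, not the actual ptp4l code: the
demo_* names, the two descriptors per port, and the fda_changed flag are
all assumptions made for the example.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>

#define DEMO_FDS_PER_PORT 2 /* hypothetical number of descriptors per port */

struct demo_port {
	int fd[DEMO_FDS_PER_PORT];
	struct demo_port *next;      /* ports are kept in a linked list */
};

struct demo_clock {
	struct demo_port *ports;
	struct pollfd *pollfd;       /* dynamically allocated array */
	int nports;
	int fda_changed;             /* set by the notification, cleared here */
};

/* Rebuild the pollfd array by querying every port in the list.
 * Returns 0 on success, -1 if the allocation fails. */
int demo_clock_rebuild_pollfd(struct demo_clock *c)
{
	struct demo_port *p;
	struct pollfd *dest;
	int i, n = 0;

	assert(c != NULL);
	for (p = c->ports; p; p = p->next)
		n++;
	/* Always request a non-zero size so the result is never ambiguous. */
	dest = realloc(c->pollfd,
		       (size_t)(n ? n : 1) * DEMO_FDS_PER_PORT * sizeof(*dest));
	if (!dest)
		return -1;
	c->pollfd = dest;
	c->nports = n;
	for (p = c->ports; p; p = p->next) {
		for (i = 0; i < DEMO_FDS_PER_PORT; i++) {
			dest->fd = p->fd[i];
			dest->events = POLLIN | POLLPRI;
			dest->revents = 0;
			dest++;
		}
	}
	c->fda_changed = 0;
	return 0;
}
```

In such a scheme the poll loop would call the rebuild function whenever
the flag is set, so adding or removing a port only needs to raise the flag.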
The fault timer file descriptor is a per-port item. Put it inside struct
port, where the other per-port file descriptors are kept.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Split management message creation to more fine-grained functions to allow
notification messages to be created.
The new clock_management_fill_response is called from
clock_management_get_response (so the function behaves exactly the same as
before this patch) and from a new clock_notify_event function. The
difference is clock_management_get_response uses the request message to
construct the reply message, while clock_notify_event constructs the reply
message based on the notification id.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Split management message creation to more fine-grained functions to allow
notification messages to be created.
The new port_management_fill_response is called from
port_management_get_response (so the function behaves exactly the same
as before this patch) and from a new port_notify_event function. The
difference is port_management_get_response uses the request message to
construct the reply message, while port_notify_event constructs the
reply message based on the notification id.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
The callers of those functions are all using ptp_message. As we're going to
return more information (the address), let those functions just fill in the
ptp_message fields directly.
Some minor reshuffling was needed to prevent circular header dependencies.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
The task of preparing the message for transmission and sending it appears
in many places. Unify them into a new function.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
With the new linreg servo the frequency offset and time offset are
controlled separately. The ratio between master's frequency and the
current frequency of the local clock is known and can be used when
calculating delay or peer delay to improve their accuracy.
This greatly improves the stability of the delay when the servo is
correcting a large offset.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
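To illustrate why the ratio helps, here is a small sketch of a delay
calculation in which the interval measured by the local clock is scaled to
the master's time base before it is subtracted. The function name and
argument layout are assumptions for the example and are not taken from the
ptp4l sources.

```c
#include <assert.h>
#include <stdint.h>

/*
 * t1: master sends sync            (master time, ns)
 * t2: slave receives sync          (local time, ns)
 * t3: slave sends delay request    (local time, ns)
 * t4: master receives the request  (master time, ns)
 * rate_ratio: master frequency divided by local clock frequency.
 */
double demo_path_delay(int64_t t1, int64_t t2, int64_t t3, int64_t t4,
		       double rate_ratio)
{
	double master_interval = (double)(t4 - t1);
	double local_interval = (double)(t3 - t2);

	assert(rate_ratio > 0.0);
	/* Convert the local turnaround time to master time, then halve
	 * what remains of the round trip. */
	return (master_interval - local_interval * rate_ratio) / 2.0;
}
```

For example, with a true one-way delay of 1000 ns, a turnaround of one
second, and a local clock running 1% fast, the uncorrected formula (ratio
1.0) yields about -5 ms, while passing the ratio 1/1.01 recovers roughly
1000 ns.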
When peer delay is < min_neighbor_prop_delay the port is flagged
as non 802.1AS capable. min_neighbor_prop_delay defaults to -20ms.
Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
Previously, the peer delay calculation did not take into account the
frequency offset between the local clock and the peer's clock.
Reset neighborRateRatio to 1.0 in port_nrate_initialize().
Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
Commit e425da2f inadvertently enabled the announce timer on the UDS port,
causing it to continually reopen the socket when in slave mode. This patch
fixes the issue by passing zero in the 'span' field of the new function,
set_tmo_random, which disables the timer again.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reported-by: Rohrer Hansjoerg <hj.rohrer@mobatime.com>
According to 802.1AS, ports are always expected to transmit announce
messages, even if they never want to become the grand master. Instead
of using a slave only BMC state machine as in 1588, 802.1AS offers a
"grand master capable" flag which allows clocks to not send sync
messages.
This patch keeps a port from transmitting sync (but not announce)
messages when there is no other master.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
[ RC: the point is that a port may not be considered capable until
enough messages to compute the ratio have been received. ]
Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Sync rx timeout should be set only after receiving the first sync, see
section 10.2.7, figure 10-4 PortSyncSyncReceive state machine in 802.1AS.
Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
Add new options delay_filter and delay_filter_length to select the
filter and its length. They set both the clock delay filter and the port
peer delay filter. The default is now moving median with 10 samples.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Similarly to the servo interface, allow multiple filters to be
used for delay filtering. Convert mave to the new interface.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Instead of maintaining a table of precalculated values, use the
newly added set_tmo_random() function to set the delay request timeout.
It saves some memory and improves the timeout granularity, but has a
higher computational cost. It follows the requirements from section
9.5.11.2 of the spec.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
According to 9.2.6.11 of the spec the ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
timeout in addition to announceReceiptTimeoutInterval includes a random
number up to one announceInterval.
Add a new function for setting random timeout and use it in
port_set_announce_tmo().
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
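The timeout computation described above can be sketched as a pure
function. This is only an illustration under stated assumptions: the
demo_* names, the 32768-step resolution of the random part, and passing
the random number in as an argument are choices made for the example, not
the signature of the real set_tmo_random().

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NS_PER_SEC 1000000000LL
#define DEMO_RND_STEPS  32768 /* hypothetical resolution of the random part */

/* Length of 'count' intervals of 2^log_seconds seconds, in nanoseconds. */
int64_t demo_intervals_to_ns(int count, int log_seconds)
{
	int64_t ns = (int64_t)count * DEMO_NS_PER_SEC;

	/* Ranges are limited so that no step below can overflow. */
	assert(count >= 0 && count <= 255);
	assert(log_seconds >= -10 && log_seconds <= 10);
	return log_seconds >= 0 ? ns << log_seconds : ns >> -log_seconds;
}

/* Fixed part of 'min' intervals plus a random part of up to 'span'
 * intervals. 'rnd' is any random number supplied by the caller. */
int64_t demo_random_tmo_ns(int min, int span, int log_seconds, uint32_t rnd)
{
	int64_t min_ns = demo_intervals_to_ns(min, log_seconds);
	int64_t span_ns = demo_intervals_to_ns(span, log_seconds);
	int64_t step = (int64_t)(rnd % DEMO_RND_STEPS) + 1;

	return min_ns + span_ns * step / DEMO_RND_STEPS;
}
```

With min = 3, span = 1 and a two second interval (log_seconds = 1), the
result lies between just over 6 s and exactly 8 s. With a span of zero the
random part vanishes and only the fixed part remains.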
Check the sanity of the synchronized clock by comparing its uncorrected
frequency with the system monotonic clock. When the measured frequency
offset is larger than the value of the sanity_freq_limit option (20% by
default), a warning message will be printed and the servo will be reset.
Setting the option to zero disables the check.
This is useful to detect when the clock is broken or adjusted by another
program.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
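A rough sketch of such a check follows. It is an assumption-laden
illustration rather than the actual implementation: the function name, the
use of parts per billion for both the applied adjustment and the limit (so
that 20% corresponds to 200000000), and the sign convention of the applied
adjustment are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ts0/ts1:     two readings of the synchronized clock (ns)
 * mono0/mono1: monotonic clock readings taken at the same instants (ns)
 * applied_ppb: frequency adjustment currently applied by the servo
 * limit_ppb:   allowed uncorrected offset; zero disables the check
 *
 * Returns 1 if the clock looks insane, 0 otherwise.
 */
int demo_freq_check(int64_t ts0, int64_t mono0, int64_t ts1, int64_t mono1,
		    double applied_ppb, double limit_ppb)
{
	double interval, measured_ppb, uncorrected_ppb;

	assert(mono1 > mono0);
	if (limit_ppb == 0.0)
		return 0;
	interval = (double)(mono1 - mono0);
	/* Offset of the synchronized clock relative to the monotonic one. */
	measured_ppb = ((double)(ts1 - ts0) - interval) / interval * 1e9;
	/* Remove what the servo asked for to get the uncorrected part. */
	uncorrected_ppb = measured_ppb - applied_ppb;
	return uncorrected_ppb > limit_ppb || uncorrected_ppb < -limit_ppb;
}
```

A caller would print the warning and reset the servo whenever the function
returns 1.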
When a new master appears, it will start to respond to our delay_req
messages. Make sure we process only responses from our current master
before switching to the new master.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
When ptp4l was configured to use the auto delay mechanism and the first
pdelay request was not received in the slave or uncalibrated state, it
would not make any pdelay requests itself, because there was no delay
timer running.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Because of packet reordering that can occur in the network, in the
hardware, or in the networking stack, a follow up message can appear
to arrive in the application before the matching sync message. As this
is a normal occurrence, and the sequenceID message field ensures
proper matching, the ptp4l program accepts out of order packets.
This patch adds an additional check using the software time stamps
from the networking stack to verify that the sync message did arrive
first. This check is only useful if the sequence IDs generated by
the master might possibly be incorrect.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
This patch lets a port send the first announce message one millisecond
after the port state transition, rather than waiting one announce interval.
This change is needed because it is desirable to reconfigure the time
network without delay, especially in P2P mode.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
This patch adds a new timer for use in 802.1AS-2011 applications. When
running as a slave in gPTP mode, the program must monitor both announce
and sync messages from the master. If either one goes missing, then we
trigger a BMC election. The sync timeout is actually reset by a valid
sync/follow up pair of messages.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
This patch renames the per-port timer in order to make room in the
namespace for a timer that detects a sync message input timeout.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
The closing and reopening of the transport when in slave only mode is not
necessary if the port is using the peer delay mechanism. In that case, the
port will discover the network error by transmitting a peer delay request.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Delio Brignoli <dbrignoli@audioscience.com>
Ken Ichikawa has identified a situation in which a sync message can be
wrongly associated with a follow up after the sequence counter wraps
around.
Port is LISTENING
Sync (seqId 0) : ignored
Fup (seqId 0) : ignored
Sync (seqId 1) : ignored
Port becomes UNCALIBRATED here
Fup (seqId 1) : cached!
Sync (seqId 2) : cached
Fup (seqId 2) : match
Sync (seqId 3) : cached
Fup (seqId 3) : match
...
Sync (seqId 65535) : cached
Fup (seqId 65535) : match
Sync (seqId 0) : cached
Fup (seqId 0) : match
Sync (seqId 1) : match with old Fup!!
Fup (seqId 1) : cached!
Sync (seqId 2) : cached
Fup (seqId 2) : match
Actually, I experienced 65500 secs offset every about 65500 secs.
I'm thinking this is the cause.
This patch fixes the issue by changing the port code to remember one
sync or one follow up, never both. The previous ad hoc logic has been
replaced with a small state machine that handles the messages in the
proper order.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reported-by: Ken ICHIKAWA <ichikawa.ken@jp.fujitsu.com>
This patch fixes a bug with time mysteriously jumping back and forth:
ptp4l[930.687]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[931.687]: master offset 17 s2 freq +33014 path delay 2728
ptp4l[932.687]: master offset -74 s2 freq +32928 path delay 2734
ptp4l[933.687]: master offset 2 s2 freq +32982 path delay 2734
ptp4l[934.687]: master offset -3 s2 freq +32977 path delay 2728
ptp4l[935.687]: master offset 17 s2 freq +32996 path delay 2729
ptp4l[936.687]: master offset -10 s2 freq +32974 path delay 2729
ptp4l[937.687]: master offset 35 s2 freq +33016 path delay 2727
ptp4l[938.686]: master offset 60001851388 s2 freq +62499999 path delay 2728
ptp4l[939.687]: master offset -62464938 s2 freq -62431946 path delay 2728
The last follow up message arriving out of order is cached. Before the state
machine changes to UNCALIBRATED, all sync and follow up messages are discarded.
If we get into that state between a sync and follow up message, the latter is
cached. When there's no real reordering happening, it's kept cached forever.
When we restart the master, it starts numbering the messages from zero again.
The initial synchronization doesn't always take the same amount of time, so it
can happen that we get into UNCALIBRATED a little bit faster than before,
managing to get the sync message with the sequenceId that we missed last time.
As it has the same sequenceId as the cached (old) follow up message, it's
incorrectly assumed those two belong together.
Flush the cache when changing to UNCALIBRATED. Also, do a similar thing for
other cached packets.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Management messages can cause a change in the clock quality. If this
happens, then it is time to run the Best Master Clock algorithm again.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Whenever a port enters the passive state, it should act like a slaved
port in one respect. Incoming announce messages from the grand master
are supposed to reset the announce timer. This patch fixes the port
logic to properly maintain the passive state.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reported-by: Rohrer Hansjoerg <hj.rohrer@mobatime.com>
When there is a peer speaking PTPv1 in the network we want to silently ignore
the packets instead of flooding system log with error messages. At the same
time we still want to report malformed packets. For that we reuse standard
error numbers and do more fine-grained error reporting in packet processing
routines.
Signed-off-by: Libor Pechacek <lpechacek@suse.cz>
Handle reception of >=3 sequential multiple pdelay responses from
distinct peers as a fault of type FT_BAD_PEER_NETWORK.
Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
This patch also changes port_capable() to reset the port's nrate every time asCapable changes
from true to false.
Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
If messages are not freed, it is possible (with purposely crafted traffic) to trigger
a peer delay calculation which will use message's data from the previous round.
Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
When a port makes a transition from one state to another, it resets all of
the message timers. While this is the correct behavior for E2E mode, the
P2P mode requires sending peer delay requests most of the time.
Even though all the other timer logic is identical, still making an
exception for P2P mode would make the code even harder to follow. So this
patch introduces two nearly identical helper functions to handle timer
reprogramming during a state transition.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Now that there are clock/port_management_set functions, the IDs that
GETs are handled for, like DEFAULT_DATA_SET, still need to be in the
case for sending NOT_SUPPORTED errors.
Signed-off-by: Geoff Salmon <gsalmon@se-instruments.com>
Adds port_management_send_error and clock_management_send_error to
avoid repeatedly checking the result of port_management_send_error and
calling pr_err if it failed. Future patches send more mgmt errors so
this will avoid repeated code.
Signed-off-by: Geoff Salmon <gsalmon@se-instruments.com>
There really is no such state, but there probably should have been one.
In any case, we do have one just to make the code simpler, but this should
not appear in the management responses. This patch fixes the issue by
covering over our tracks before sending a response.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
If the port resets itself after detecting a fault, then the polling events
for that port are no longer valid. This patch fixes a latent bug that
would appear if a fault and another event were to happen simultaneously.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
A timeout of 15 seconds is not always acceptable, so make it configurable.
By popular consensus, instead of using a linear number of seconds, use
the 2^N format for the time interval, just like the other intervals in
the PTP data sets. In addition to numeric values, let the configuration
file support 'ASAP' to have the fault reset immediately.
[RC - moved the handling of special case tmo=0 and added a break out
of the fd event loop in case the fds have been closed.
- changed the linear seconds option to log second instead.
- changed the commit message to reflect the final version. ]
Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
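As an illustration of the 2^N format with the special 'ASAP' value, the
option handling could look roughly like the sketch below. The demo_* names,
the sentinel value, and the accepted range are assumptions for the example
and are not taken from the actual configuration parser.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_FRI_ASAP (-128) /* hypothetical sentinel: reset immediately */

/* Parse "ASAP" or a base-2 logarithm of seconds.
 * Returns 0 on success, -1 on a malformed or out-of-range value. */
int demo_parse_fault_interval(const char *s, int *log_seconds)
{
	char *end;
	long v;

	assert(s != NULL && log_seconds != NULL);
	if (!strcmp(s, "ASAP")) {
		*log_seconds = DEMO_FRI_ASAP;
		return 0;
	}
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || end == s || *end != '\0' || v < -30 || v > 30)
		return -1;
	*log_seconds = (int)v;
	return 0;
}

/* Convert the parsed value to nanoseconds; ASAP maps to zero. */
int64_t demo_fault_interval_ns(int log_seconds)
{
	int64_t ns = 1000000000LL;

	if (log_seconds == DEMO_FRI_ASAP)
		return 0;
	assert(log_seconds >= -30 && log_seconds <= 30);
	return log_seconds >= 0 ? ns << log_seconds : ns >> -log_seconds;
}
```

With this encoding a value of 4 means 16 seconds, the power of two nearest
to the old fixed timeout, and -1 means half a second.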
This patch implements the capable flag as follows.
1. After calculating the neighbor rate, we are capable.
2. If we miss too many responses, we are incapable.
3. If we get multiple responses, we throw a fault,
and so we are also incapable.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
This commit only provides helper functions that will implement the effect
of a port being not capable. We let the port be always 'capable' for now,
until we actually have added the details of that flag.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
We use the follow_up_info to control behavior that is specific to the
802.1AS standard. In several instances, that standard goes against the
1588 standard or requires new run time logic that exceeds what can be
reasonably described as a 1588 profile.
Since we will need a few more run time exceptions in order to support
802.1AS, we introduce a helper function to identify this case, rather
than hard coding a test for follow_up_info, in order to be more clear
about it.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Because of an oversight in the event code, a port will not send peer delay
request messages while in the initial listening state. This patch fixes
the issue by expanding this special, initial case.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
The code previously treated all supported requests as 'get' actions and
ignored the actual action field in the message. This commit makes the
code look at the action field when processing the requests.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reforming the data structure in this way will greatly simplify the
implementation of the management message for this data set.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reforming the data structure in this way will greatly simplify the
implementation of the management message for this data set.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
The Linux kernel supports a hardware time stamping mode that allows
sending a one step sync message. This commit adds support for this mode
by expanding the time stamp type enumeration. In order to enable this
mode, the configuration must specify both hardware time stamping and set
the twoStepFlag to false.
We still do not support the one step peer delay request mechanism since
there is neither kernel nor hardware support for it at this time.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
This patch changes sk_interface_phc to sk_get_ts_info, by allowing the function
to store all the data returned by Ethtool's get_ts_info IOCTL in a struct. A new
struct "sk_ts_info" contains the same data as well as a field for specifying the
structure as valid (in order to support old kernels without the IOCTL). The
valid field should be set only when the IOCTL successfully populates the fields.
A follow-on patch will add new functionality possible because of these
changes. This patch only updates the programs which use the call to perform the
minimum they already do, using the new interface.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
If a buggy driver or hardware delivers bogus time stamps, then we might
crash with a divide by zero exception.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
We have one timer used for both delay request mechanisms, and we ought
to set the message interval accordingly.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
This fixes the following issue reported by valgrind, which occurs
after a port disable/initialize subsequent to having entered slave
mode.
==10651== Invalid read of size 4
==10651== at 0x804E6E2: fc_clear (port.c:175)
==10651== by 0x805132F: port_event (port.c:1352)
==10651== by 0x804B383: clock_poll (clock.c:597)
==10651== by 0x80498AE: main (ptp4l.c:278)
==10651== Address 0x41cba60 is 16 bytes inside a block of size 60 free'd
==10651== at 0x4023B6A: free (vg_replace_malloc.c:366)
==10651== by 0x804EB09: free_foreign_masters (port.c:287)
==10651== by 0x804FB14: port_disable (port.c:722)
==10651== by 0x8051228: port_dispatch (port.c:1298)
==10651== by 0x804B3C6: clock_poll (clock.c:602)
==10651== by 0x80498AE: main (ptp4l.c:278)
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
This patch allows each port to maintain its own pod structure since it is only
used in ports. This will allow the user to configure any special settings per
port. It takes a copy of the default pod, and a future patch will allow the
configuration file to set per-port specific changes.
-v2
* Minor change to fix merge with previous patch
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
The port_open function takes a large number of command options, a few of which
are actually all values of struct interface. This patch modifies the port_open
call to take a struct interface value instead of all the other values. This
simplifies the overall work necessary and allows for adding new port
configuration values by appending them to the struct interface.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
This commit only adds support for forwarding the management messages.
The actual local effects of the management commands still need to be
implemented.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Add a transportSpecific parameter to the config file parser.
Set the transportSpecific field in message headers from the configuration
(defaults to 0).
[ RC - reduced this patch to just the addition of the field ]
Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
An oversize incoming packet might overwrite the reference counter in a
message. Prevent this by providing a buffer large enough for the largest
possible packet.
This will also be needed to support TLV suffixes.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
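The layout idea can be sketched as follows: the receive buffer is a union
member sized for the largest packet, so the reference counter placed after
it cannot be reached by incoming data. The structure and names below are
hypothetical, and 1500 bytes is assumed here as the largest packet size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PACKET 1500 /* assumed largest packet, a typical MTU */

struct demo_header {
	uint8_t type;
	uint8_t version;
	uint16_t length;
};

struct demo_message {
	union {
		struct demo_header header;       /* parsed view */
		uint8_t buffer[DEMO_MAX_PACKET]; /* raw receive area */
	} data;
	int refcnt; /* lies beyond the full receive area */
};

/* Largest number of bytes a receive call may write into the message. */
size_t demo_rx_capacity(const struct demo_message *m)
{
	assert(m != NULL);
	return sizeof(m->data);
}
```

Passing the returned capacity as the length argument of the receive call
keeps even an oversize packet away from the counter, and the spare room is
what a TLV suffix would later occupy.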
When computing a port's best foreign master, we make use of a message
reference that possibly might have been dropped by calling msg_put in
the fc_prune subroutine. This commit fixes the issue by copying the
needed data from the message before pruning.
[ Actually, since msg_put only places the message into a list without
altering its contents, there was no ill effect. But using a message
after having released it is just plain wrong. ]
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Under Linux, when the link goes down our multicast socket becomes stale.
We always poll(2) for events, but the link down does not trigger any event
to let us know that something is wrong. Once the port enters master mode
and starts announcing itself, the socket throws an error. This in turn
causes a fault, and we reopen the socket when clearing the fault.
However, in the case of slave only mode, if the port is listening then
it will never send, discover the link error, or repair the socket. This
patch fixes the issue by simply reopening the socket after an announce
timeout.
[ Another way would be to use a netlink socket, but that would add too
much complexity as it poorly matches our port/interface model. ]
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
The message code is horribly broken in three ways.
1. Clearing the message also sets the reference count to zero.
2. The recycling code in msg_put does not test the reference count.
3. The allocation code does not remove the message from the pool,
although this code was never reached because of point 2.
This patch fixes the issues and also adds some debugging code to trace
the message pool statistics.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
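The three points above can be illustrated with a minimal reference-counted
pool. This is a sketch with hypothetical demo_* names, not the message code
itself: clearing leaves the count alone, put recycles only when the count
reaches zero, and allocation removes the message from the pool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_msg {
	int refcnt;
	struct demo_msg *next; /* link used only while in the pool */
	int payload;
};

static struct demo_msg *demo_pool;
static int demo_pool_count;    /* simple pool statistic for debugging */

/* Take a message from the pool, or from the heap if the pool is empty. */
struct demo_msg *demo_msg_allocate(void)
{
	struct demo_msg *m = demo_pool;

	if (m) {
		demo_pool = m->next; /* remove it from the pool */
		demo_pool_count--;
	} else {
		m = malloc(sizeof(*m));
		if (!m)
			return NULL;
	}
	memset(m, 0, sizeof(*m));
	m->refcnt = 1;
	return m;
}

/* Clear the contents without touching the reference count. */
void demo_msg_clear(struct demo_msg *m)
{
	assert(m != NULL);
	m->payload = 0;
}

/* Take an additional reference. */
void demo_msg_get(struct demo_msg *m)
{
	assert(m != NULL && m->refcnt > 0);
	m->refcnt++;
}

/* Drop a reference; recycle only when the last one is gone. */
void demo_msg_put(struct demo_msg *m)
{
	assert(m != NULL && m->refcnt > 0);
	if (--m->refcnt)
		return;
	m->next = demo_pool;
	demo_pool = m;
	demo_pool_count++;
}
```

A message held by two users therefore survives the first put and only
returns to the pool on the second.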
If the new ethtool operation is supported, then use it to verify that the PHC
selected by the user is correct. If the user doesn't specify a PHC and ethtool
is supported then automatically select the PHC device.
If the user specifies a PHC device, and the ethtool operation is supported,
automatically confirm that the PHC device requested is correct. This check is
performed for all ports, in order to verify that a boundary clock setup is
valid.
The check for PHC device validity is not done in the transport because the
only thing necessary for performing the check is the port name. Handled this
in the port_open code instead.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
This patch makes sure every function is checked for a negative return value
and ensures that a fault is detected when these fail.
-v2-
* Fixed to only check the ones with a return value
-v3-
* Modified the delay_req functions to return 0 on nonfault cases
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
This commit makes each pair of port functions, open/close and
initialize/disable, balance each other in how they allocate or free
resources. This change lays some ground work to allow proper fault
handling and disable/enable logic later on.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
It was a cute idea to have the raw Ethernet layer use just one socket,
but it ended up not working on some specific PTP time stamping hardware.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
In the course of development we added more and more allocations into the
port code without freeing them on close. We do not yet call the close
function, so there was never an issue. Once we start to reset the ports,
to clear faults for example, then we will need this.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Although the UDP/IPv4 layer does not need any state per instance (other
than the two file descriptors), the raw Ethernet layer will need this.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
We always wait for the transmit time stamp after sending an event message.
Thus a missing time stamp is clearly a fault, even if the hardware can
only handle one time stamp at a time.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
The state machine needs to know whether a new master has just been
selected in order to choose between the slave and uncalibrated states.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
The timerfd calls are missing from Sourcery CodeBench Lite 2011.09-23.
We can remove this code once these calls are properly integrated into a
current tool chain.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Since the master implementation is still lacking, we will just keep
the slave-only flag hard coded for now.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>