The data points provided in Table 7.7 are visually represented in Figure 7.41. Notice that event messages that arrive within a 5‐second window get the same watermark. For example, the event messages with sequence numbers between 0 and 2 all arrived between 10:00:00 and 10:00:05 and received the same watermark. Also notice that the event message with sequence number 6 has an arrival time that is 6 seconds after its event time.
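The grouping of arrivals into 5‐second windows can be sketched in a few lines of Python. This is only an illustration of the bucketing idea, not how Azure Stream Analytics computes watermarks internally. The arrival times, the window_start and group_by_window names, and the choice to treat each window as starting on a 5‐second boundary and excluding its end are all assumptions made for the example; they are not the values from Table 7.7.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=5)  # window length taken from the text


def window_start(arrival, window=WINDOW):
    """Return the start of the fixed window that contains `arrival`.

    Assumption: windows are aligned to midnight and are half-open,
    so 10:00:05 belongs to the window that starts at 10:00:05.
    """
    midnight = arrival.replace(hour=0, minute=0, second=0, microsecond=0)
    whole_windows = (arrival - midnight) // window  # integer count
    return midnight + whole_windows * window


def group_by_window(arrivals):
    """Map each window start to the sequence numbers that arrived in it."""
    groups = defaultdict(list)
    for sequence, arrival in arrivals.items():
        groups[window_start(arrival)].append(sequence)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical arrival times, for illustration only.
    arrivals = {
        0: datetime(2024, 1, 1, 10, 0, 1),
        1: datetime(2024, 1, 1, 10, 0, 3),
        2: datetime(2024, 1, 1, 10, 0, 4),
        3: datetime(2024, 1, 1, 10, 0, 6),
    }
    for start, sequences in group_by_window(arrivals).items():
        print(start.time(), sequences)
```

With these made-up arrival times, sequence numbers 0 through 2 land in the window that starts at 10:00:00, and sequence number 3 lands in the next one, which mirrors the pattern described for Figure 7.41.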
FIGURE 7.41 Watermark progression example
To understand what happens when event messages arrive outside a given time window and to learn about some other data streaming time progression concepts, it is important to understand time management.
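In numerous places throughout this book, you have seen the data structure of a brain wave reading. This brain wave reading contains a column named ReadingDate, which can be considered the event time. The timestamp that Azure Stream Analytics associates with each event is accessible using the System.Timestamp() function, as shown in the following example query snippet. That timestamp reflects ReadingDate only when the query designates the column with a TIMESTAMP BY clause; otherwise, it reflects the arrival time. The event time is also available through the ReadingDate column itself: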
System.Timestamp() AS IngestionTime
The ingestion point of the brain wave readings in all exercises has been an event hub. Therefore, while working through this chapter’s exercises, you may have noticed two columns, EventEnqueuedUtcTime and EventProcessedUtcTime, on the Input Preview tab on the Query blade, as shown in Figure 7.42.
The EventEnqueuedUtcTime column is the timestamp at which the event message was received by the event hub for the Azure Stream Analytics job; in other words, the arrival time. The other column, EventProcessedUtcTime, is the date and time the Azure Stream Analytics job processed the event message. It makes sense then that the enqueued timestamp is earlier than the processed timestamp. Notice that the difference between the two timestamps is about 14 seconds. To understand if 14 seconds is fast, slow, or expected, you first need to understand how Azure Stream Analytics manages time.
The variation in timestamps, like the ones shown in Table 7.7 and the one here, can have many causes, such as clock skew between the data producer and the stream processor, ingestion point availability, network bandwidth pressure, the unavailability of streaming data components, or latency on any of those components. Any kind of disruption can result in event messages arriving late, early, or out of order. To calculate whether an event message has arrived late, the platform compares the event time with the arrival time. If the arrival time is later than the event time by more than the configured time window, then the event message is considered late. Configuring the platform to handle such a scenario is done on the Event Ordering blade for the Azure Stream Analytics job, as shown in Figure 7.43.
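The late-arrival comparison described above can be written out as a small Python sketch. It only mirrors the rule as stated in the text (arrival time minus event time, compared with a time window); it is not the Azure Stream Analytics implementation. The function names lateness and is_late, the tolerance parameter, and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def lateness(event_time, arrival_time):
    """How long after its event time the message arrived.

    A negative result means the message arrived with an event time
    that lies ahead of the arrival time (an early arrival).
    """
    return arrival_time - event_time


def is_late(event_time, arrival_time, tolerance):
    """True when the delay exceeds the allowed window (`tolerance`)."""
    return lateness(event_time, arrival_time) > tolerance


if __name__ == "__main__":
    # Hypothetical message: generated at 10:00:06, received at 10:00:12.
    event_time = datetime(2024, 1, 1, 10, 0, 6)
    arrival_time = datetime(2024, 1, 1, 10, 0, 12)

    delay = lateness(event_time, arrival_time)
    print("delay in seconds:", delay.total_seconds())
    print("late with a 5-second window:",
          is_late(event_time, arrival_time, timedelta(seconds=5)))
    print("late with a 15-second window:",
          is_late(event_time, arrival_time, timedelta(seconds=15)))
```

In this sketch a message that arrives exactly at the edge of the window is not treated as late. That boundary choice is an assumption of the example.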
FIGURE 7.42 The EventEnqueuedUtcTime and EventProcessedUtcTime columns on the Query blade
FIGURE 7.43 Event ordering for a late‐arriving streamed event message
In this case, the configuration allows event messages to arrive up to 15 seconds after their event time. Any event messages that fall outside of that range will be dropped. If the configured value had been less than 14 seconds, the readings shown in Figure 7.42 would have been dropped and not included in the Azure Stream Analytics output. Because the readings were within the range threshold, they were not dropped and received a timestamp, stored in the EventProcessedUtcTime column, that identifies when they were processed. If they had fallen outside of that range, you might have wondered why the event messages had not been processed. To gain some clues concerning why event messages are not processed, you can review the available Azure Stream Analytics job metrics. Some default metrics are viewable on the Overview blade for the Azure Stream Analytics job after you select the Monitoring tab (see Figure 7.44).
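To make the effect of the threshold concrete, the following Python sketch simulates a drop policy on a single reading that is delayed by 14 seconds, the delay discussed for Figure 7.42. It is a simplified model of the behavior described in this section, not the service's actual logic. The Reading type, the apply_drop_policy function, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Reading(NamedTuple):
    """Hypothetical stand-in for one streamed brain wave reading."""
    sequence: int
    event_time: datetime
    arrival_time: datetime


def apply_drop_policy(readings, tolerance):
    """Split readings into (kept, dropped) using a late-arrival tolerance.

    A reading is dropped when it arrives more than `tolerance`
    after its event time; otherwise it is kept for processing.
    """
    kept, dropped = [], []
    for reading in readings:
        delay = reading.arrival_time - reading.event_time
        (dropped if delay > tolerance else kept).append(reading)
    return kept, dropped


if __name__ == "__main__":
    # One made-up reading that arrives 14 seconds after its event time.
    readings = [
        Reading(0,
                datetime(2024, 1, 1, 10, 0, 0),
                datetime(2024, 1, 1, 10, 0, 14)),
    ]
    for seconds in (15, 10):
        kept, dropped = apply_drop_policy(readings, timedelta(seconds=seconds))
        print(f"{seconds}-second tolerance: "
              f"kept={len(kept)} dropped={len(dropped)}")
```

With the 15-second tolerance the reading is kept, and with a 10-second tolerance it is dropped, which is the situation in which you would turn to the job metrics to find out why nothing reached the output.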
FIGURE 7.44 Azure Stream Analytics monitoring metrics