

Colm Naish, 26th of November, 2010
http://support.citrix.com
In complex software systems, we tend to find embedded tracing throughout the various components and modules that make up the system. As systems become more complex, mechanisms are employed to modularize the tracing in an attempt to allow only relevant trace statements to be captured, and to improve efficiency. Microsoft’s Event Tracing for Windows (ETF) and Citrix’s Citrix Diagnostic Facility (CDF) are two examples of this. They allow for high speed, efficient tracing on a modular basis. Tracing can be enabled dynamically, which means that we can selectively choose which components or modules to trace on.
The ability to dynamically select tracing modules on a modular basis helps ensure that the trace statements captured are relevant to the scenario that is being investigated. However, there is a trade off within software tracing. For a trace to be useful, we want it to be as rich in information as possible. This means more trace statements in the code. However, more trace statements in the code means more trace statements in the captured trace. This can leave the engineer with an enormous trace to analyze. The trace will be full of rich information, but the law of diminishing marginal returns applies. It’s much easier for an engineer to analyze a 500 line trace than a 50,000 line trace, even if the 50,000 line trace has a lot of rich information included in it.
One way to alleviate this situation is to employ some method to reactively set ‘sign posts’ in the trace. These ‘sign posts’ are merely easily identifiable trace messages, which can be searched for by the engineer. The trick is to have them written at specific points in time during the gathering of the trace, for example at the exact point in time that a process spikes the CPU, or when a specific error pops up in front of the end user whose session is being traced. By using a reactive ‘sign post’ method like this, engineers can take their 50,000 line trace and search for the ‘sign post’ message. This could appear at line 35,000 for example. Straight away, they will have saved themselves the need to parse through 35,000 lines of trace statements, looking for the area of interest relevant to their issue! Combine this with a mechanism for writing a second ‘sign post’ to the trace to mark the end of the interesting event, say when the process stops spiking the CPU (around line 37,000 in the trace for example), and the engineer has effectively book-ended the area of interest in the trace containing the interesting messages. The engineer now knows with confidence that it's only necessary to analyze the area of the trace between the two ‘sign post’ messages. In our example, they have cut the 50,000 line trace down to a very manageable 2,000 line trace, with very little effort.
All the engineer needs to achieve this is a tool or mechanism to send these ‘sign post’ messages to the running trace, on demand. Once such tool from Citrix is CDFMarker On Demand – For XenApp and XenDesktop. CDFMarker is a small utility, which allows sending a customizable CDF trace statement into a running CDF trace, on demand, on either XenApp Servers or XenDesktop Virtual Desktop Agents. It allows the engineer (or end user) to send a customizable CDF message to the running CDF trace by hitting a hot key (Ctrl-M). So, for example, whenever end users see an intermittent issue occurring, they immediately hit the hot key and the ‘sign post’ message is sent to the trace to mark the exact point in the trace where the issue occurred.
Possibly more useful, though, is the ability for CDFMarker to take command line input and send it as the ‘sign post’ message. This allows the tool to be called from inside scripts and batch files, or as part of an event monitoring system such as Windows Performance Monitor.
Windows Performance Monitor allows the administrator to configure alerts on a host of embedded performance counters. These alerts can be configured to fire once a configured threshold has been breached. When the alert fires in response to the occurrence of the configured event, it can call an external program, i.e. CDFMarker, passing it text and system variables via the command line, which CDFMarker then sends into the running CDF trace as a ‘sign post’ message. There are a great many built-in performance counters in the various Windows operating systems that can be leveraged in this manner. But now consider that many software applications will add their own relevant performance counters, which can also be harnessed, and the scope for setting ‘sign post’ messages in a running trace greatly increases.
Some examples of how CDFMarker can be used to send customizable CDF ‘sign post’ messages in response to various events are listed below. The associated link will take you to an article in the Citrix Knowledgebase detailing the exact step by step instructions to implement each scenario:
You can also find some videos on CitrixTV and YouTube describing some of these use cases.
As we said at the outset, there is a trade off in software tracing between the amount of data and trace statements captured and the ease and efficiency of analyzing the trace. Sooner or later, a point is reached where the usefulness of the trace starts to diminish. This can be directly related to the size of the trace, and the ability of the engineer analyzing it to distinguish what is relevant to the interesting event being traced, and, more importantly, what is not. By utilizing a tool such as CDFMarker to write ‘sign post’ messages into the running trace, the area of interest in the trace can be identified quickly and easily, allowing the engineer to disregard irrelevant trace messages and focus on analyzing the most relevant trace messages. This should lead to a quicker diagnostic and resolution, and an easier life for everyone. Happy Holidays!