How to use PAL to properly collect and analyze Performance Data
Perhaps you’ve heard of PAL (Performance Analysis of Logs) by Clint Huffman. Or maybe not? Or maybe you have used parts of it but are unsure of how to fully utilize this awesome tool? Look no further, the Dude is here to set things right…
First, download and install PAL (http://pal.codeplex.com/). Once that is sorted, proceed…
Now we’re into the fun stuff. Let’s say, for example, you have a server that gets cranky the longer it’s online. One time you had to skip a Patch Tuesday by a week, and that ended in a SEV-A for your company because the server crashed. To properly collect data and find root cause, you would do the following (we’ll use the example of a SQL 2005 server here for fun):
Open PAL and go to the “Threshold File” tab. Chances are you run SCOM 2007 (or newer) or even a third-party monitoring tool, but are you collecting what you really need for root cause analysis here? Do you know? And if the data is sitting in that monitoring tool’s database, do you know how to pull it out in a format that is productive for finding root cause? Maybe not. PAL to the rescue.
We’re going to change the “Threshold file title” drop-down to SQL 2005:
[Screenshot: the “Threshold file title” drop-down list on the Threshold File tab]
So instead, it looks like this:
[Screenshot: the drop-down with the SQL Server 2005 threshold file selected]
And then we’re going to Export this, using the button directly below the drop-down. Depending on the operating system we’re on, we need the export in either HTM or XML format. I’ll pick XML here since most of us are on Server 2008 or higher, but the same concepts apply to 2003 SP2 and XP.
[Screenshot: the Export button below the threshold file drop-down]
Then we move this XML file to our server and import it as a Data Collector Set template.
[Screenshot: creating a new Data Collector Set from a template in Performance Monitor]
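If you’d rather script the import (handy if you’re pushing the same template to a whole farm), logman can do it from an elevated prompt. A minimal sketch; the set name PAL_SQL2005 and the template path are placeholders I made up, not something PAL dictates:

```shell
# Import the PAL-exported template as a user-defined Data Collector Set
# ("PAL_SQL2005" and the path are example values -- adjust to taste)
logman import PAL_SQL2005 -xml "C:\PerfLogs\SQL2005_Template.xml"
```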
From here it is a simple process: click Browse on the next screen, pick your template XML file, click Next a few times, and set your timings. I’d generally recommend 5-second intervals, but 1 second is more appropriate for disk latency issues, and maybe 15 seconds if you’re only chasing a memory leak.
[Screenshot: browsing to the template XML file in the wizard]
Now we have a Data Collector Set in the User Defined area, but it isn’t running yet:
[Screenshot: the new Data Collector Set listed under User Defined]
Double-click the Data Collector Set and change the Sample Interval to what you like. Do not be led into temptation and change the log format off BLG. CSV is smaller, yes, but it does not record new PIDs once collection starts, so if a process stops and restarts, for example, you won’t see the new PID.
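If you built the collector with logman instead of the GUI, the same two settings can be applied on the command line; a sketch, assuming a set named PAL_SQL2005 (a placeholder name):

```shell
# -si sets the sample interval ([[hh:]mm:]ss); -f bin keeps the BLG format
logman update counter PAL_SQL2005 -si 00:00:05 -f bin
```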
[Screenshot: the Data Collector properties showing Sample Interval and Log Format]
Then right-click the Data Collector of Doom in the left pane, select Properties, and do yourself a favor: click Stop Condition and set a maximum file size of 200 MB. Then check the box “Restart the data collector set at limits.” so it creates rolling 200 MB files. Perfmon isn’t made to open large files (over 250 MB or so), so this aids manual analysis. It also helps pinpoint outages to specific time frames in the BLG files.
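The command-line equivalent, hedged the same way (set name is a placeholder, and I believe `-cnf` with no time value rolls only on size, but check `logman update /?` on your box):

```shell
# Cap each log at 200 MB and roll to a new file when the cap is reached
logman update counter PAL_SQL2005 -max 200 -cnf
```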
[Screenshot: the Stop Condition tab with the maximum size and restart options set]
Then we click OK to close the window, right-click our Data Collector Set, and select Start. Now we reproduce the problem, and we have data to analyze. I’ll cover that in the next blog post, including some ideas on how to use PAL to populate web farms or SQL backends with appropriate data sets.
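For the script-inclined, starting and stopping the set (and checking its status in between) looks like this, again assuming the placeholder name PAL_SQL2005:

```shell
# Start collection, reproduce the problem, then stop; query shows status
logman start PAL_SQL2005
logman query PAL_SQL2005
logman stop PAL_SQL2005
```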