Wednesday, December 16, 2009

Translating Data With PowerShell

I recently was confronted with a situation where I needed to transform a list of developer-friendly filenames contained in a text file into a user-friendly list of report names. In the past I probably would have written a Console app to do this work but PowerShell is a much more lightweight solution to this problem.

The filenames were the result of a query against TFS version control and were in the format of $/[TFS Project Name]/Reports/[Codeline]/[Visual Studio Project Name]/*.rdl. A given *.rdl file, for example "UpcomingBdayRpt.rdl", would be represented in the application as "Upcoming Birthdays Report". I needed to provide the names which the tester was familiar with in the application. Providing a list of cryptic names like "UpcomingBdayRpt.rdl" would be as useful as providing a list in cuneiform (assuming that the tester doesn't read cuneiform).

The first step was to get the list into PowerShell. (For the purposes of this demonstration assume that I've already navigated in PowerShell to the directory that contains the source files.)
$allReportFiles = Get-Content .\MyInputFile.txt

Because I had a predictable path pattern I could use Split on the forward slash to get a list of only the filename. I removed the extension on the filename with -replace.
$rdlNames = $allReportFiles | ForEach-Object{ $_.split("/")[5] -replace ".rdl", ""}

The database that supports our application has a cross-reference table that relates the *.rdl name to the user-friendly name. I exported the two columns that I needed into a comma-delimited file named "reports.txt". The format of the text file looked like this:
DocumentLetter,Document Letter
InterviewReport,Interview Report
OutstandingItemsLetter,Outstanding Items Letter

I got lazy during the next step. Pipelining the contents of $rdlNames to ForEach-Object, I used GetChildItem to get a reference to my data file, searched that file for a line that contained the filename contained in "$_", and, when the line was found, got the FileName, Pattern, and Line properties. Note that I don't need the FileName property, but included it so that I could confirm that my data was coming from where I expected it. I'm paranoid about things like that.
$data = $rdlNames | ForEach-Object { Get-ChildItem * -include reports* | Select-String -pattern $_.Trim() -SimpleMatch} | Select-Object -property FileName, Pattern, Line -unique
So why do I consider this lazy? Well, there are almost certainly more elegant solutions to this problem. But then again, this is just a trivial script meant to solve a unique problem. The solution doesn't have to stand the test of time nor be incredibly efficient- it just needs to get the job done.

The final step is just to extract the user-friendly names from the objects in $data using $_.Line.split(",")[1]. In case I need to discuss one of the reports with the tester, I provide the developer-friendly filename in brackets so that we can translate between the two names (developer and tester).
$data | ForEach-Object{$_.Line.split(",")[1] + " [" + $_.Pattern + "]" } | Sort-Object

So that's it. I copied and pasted the output from the final line into an email and I was done. The tester had the information that she needed, all was right in the world, and I could leave on time for once.