There are a number of diff tools available. The reason that I choose to use DiffMerge is that I find it has the nicest interface I've found for visually representing differences between files in a clean and simple manner. It is this visual element that is most important for my use case. It is not immediately obvious why someone working primarily with big data, manipulating huge log files and SQL via command line interfaces, would worry too much about visual elegance. As a data warehouse tester friend of mine Ranjit wrote in his blog - being able to visualise key characteristics of information gives is a critical skill in testing in our domain, and here is a great technique to demonstrate this.
An inaccessible problem
Both through my testing work and my role running the technical support I am often faced with investigating why certain operations that work fine in one context will fail to do so in another. The classic It works on my machine scenario is a common example. If the installation appears to be sound and the cause of the issue is not obvious, then it can be difficult to know where to look when trying to recreate and diagnose problems. This is the situation that I found myself in a while ago when a someone found an issue trying to use our software via an Oracle Gateway interface and had encountered an error returning LONG data types. Having set up a similar environment myself a few weeks before I was well placed to investigate. Unfortunately after checking out all of the configuration files and parameters that I knew could cause issues if not set correctly, I'd not got any nearer to the cause of the problem. I had requested and received full debug tracing from the Oracle gateway application, however I was not overly familiar with the structure of the logs, which contained a number of bespoke parameters and settings.
Staring at the log file was getting me nowhere. I knew that my own test environment could perform the equivalent operation without issue - it worked on my machine. Figuring that I could at least take the logs from my healthy system to refer to , I performed the same operation, got the trace files then compared the two equivalent logs in DiffMerge
I was unsure about how much useful information this would provide, or even whether the logs would be comparable at all, however the initial few rows looked promising. A quick examination showed that
- The logs were from equivalent systems
- The logs were comparable
The first few pages revealed nothing interesting, then my eye was drawn to some differences that did not conform to any of the mental patterns I'd already identified.
It was clear that some of the settings being applied in the gateway connection were different to my installation. Although not familiar with the keys or values, I could make a reasonable inference that these settings related to the character sets in the connection. Definitely something worth following up on - I noted it and pressed on.
I rattled through the next couple of hundred lines, my brain now easily identifying and dismissing repeated date and connection id difference patterns. I could, with a little effort, have written a script to abstract these out. I decided that this wasn't necessary given how easy it was to scan through the diffs, and I also was still in a position to spot a change in use or pattern of these values.
The next set of differences was interesting again - it showed some settings being detected or applied for the data columns in my query. Again the differences that caught my eye appeared to relate to the character set (CSET) of the columns.
With a scan to the end of the file revealing nothing further that appeared interesting, I investigated the character set changes further. Having an area of interest to focus on I was able to very quickly recreate the issue by altering the character set settings of the Oracle database, which I discovered were then passed by default into the gateway and resulted in the incorrect language settings being used on the ODBC connection. I verified my new settings exhibited the same behaviour by re-diffing the log files from my newly failing system against the ones I'd been given. A bit of internet research revealed how to explicitly configure the Gateway connection language settings to force the correct language, and the problem was resolved.
An addition to my testing toolkit
I've used this technique many times since, often to great effect. It is particularly useful in testing to investigate changes in behaviour across different versions of software , or when faced with a failing application or test that has worked successfully elsewhere. It is similarly effective for examining why an application starts failing when a certain setting is enabled - only today I was examining a difference in behavior when a specific flag was enabled on the ODBC connection using this technique.
Give it a go - you might be surprised at how good your brain is at processing huge volumes of seemingly inaccessible data once it is visualized in the correct way.
Nice article, Adam!
I like the way you take the reader step-by-step through your process.
And while I use WinMerge for similar reasons (http://www.allthingsquality.com/2010/06/blink-tests-in-blink.html), I agree with you that nice clean visualization of differences is an important attribute for this sort of tool.
Thanks Joe for taking the time to read and comment. I really like your post - I'd not come across Baretail before - I work a lot in Linux and use "tail -f" in a similar way. I will certainly be adding BareTail to my toolkit, and it is great to see that you use WinDiff in much the same way that I describe here.
Thanks,
Adam
Post a Comment