Friday, February 18, 2011

A FileNotFoundException, Wrapped In A CommunicationException

For some time I've been involved in an effort to run an existing intranet Windows Forms application in an extranet configuration. It's been quite a struggle because the original app, which I'll refer to as CRAM, was never designed to run outside our corporate intranet. The process has required changes to firewall rules and to server and database configuration, new deployment scripts, and many more things than I care to remember.

A couple of days ago I thought everything was finally finished. One last error to resolve:
System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:04:59.8798782'.
Surely this was just another WCF configuration issue, right? I fired up Wireshark to diagnose the problem. Hmmm, Wireshark showed that a call to a SQL Server 2008 R2 Reporting Services (SSRS) web service was being rejected with 401 Unauthorized. Thinking that authentication was failing, I embarked on a journey of discovery where I learned about SSRS web service changes between the 2005 and 2008 versions. At one point I changed the account under which the SQL Server services run, thinking that the current accounts didn't have the network access required to authenticate web service callers. But after this frenzy of learning, tweaking, and hacking, CRAM was still throwing the same exception.

Lucky for me, the original developer had the foresight to log the WCF calls. Using Service Trace Viewer I was able to get a lot more detail about the communication error, but not the root cause. Buried in all of this information was another error that I hadn't seen on the client or in the Event Log:
System.IO.FileNotFoundException: Could not load file or assembly 'Oracle.DataAccess, Version=2.111.6.20, Culture=neutral, PublicKeyToken=89b483f429c47342' or one of its dependencies. The system cannot find the file specified.
Must be a deployment issue, I thought. It didn't seem to be directly related to the communication issue, and it was appearing after the communication errors. I decided to fix the FileNotFoundException to reduce the amount of event tracing data I had to process, as well as to reassure myself that I wasn't a total failure.
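
For anyone who hasn't set up the kind of WCF logging mentioned above, here is a minimal sketch of a service-side configuration that writes a trace file Service Trace Viewer can open. This is not CRAM's actual configuration: the listener name and the log path are placeholders chosen for illustration, and the trace level is just one reasonable choice.

```xml
<!-- Sketch only: goes in the service host's app.config or web.config. -->
<!-- The listener name "traceListener" and the path below are assumed placeholders. -->
<configuration>
  <system.diagnostics>
    <sources>
      <!-- System.ServiceModel is the trace source WCF itself writes to. -->
      <source name="System.ServiceModel"
              switchValue="Information, ActivityTracing"
              propagateActivity="true">
        <listeners>
          <!-- XmlWriterTraceListener produces the .svclog format that Service Trace Viewer reads. -->
          <add name="traceListener"
               type="System.Diagnostics.XmlWriterTraceListener"
               initializeData="c:\logs\Traces.svclog" />
        </listeners>
      </source>
    </sources>
  </system.diagnostics>
</configuration>
```

With something like this on the server, exceptions thrown while a call is being processed show up in the trace even when the client only sees a generic CommunicationException.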

The problem with the Oracle.DataAccess reference was that the required version wasn't present on the application server. I changed the reference's Specific Version property to False, compiled, deployed, and still got the same error. I looked at the installed Oracle client versions on the intranet and extranet application servers and noticed that they were different. I then copied Oracle.DataAccess.dll (version 2.111.6.20) to CRAM's extranet application server so that it would function as a private assembly. Same result upon running CRAM: FileNotFoundException. Recalling a previous project where I needed to deploy Oracle.DataAccess as a private assembly, I copied the same set of DLLs to CRAM's executable directory.

And that was the answer. CRAM required a version of Oracle.DataAccess that was unavailable on the application server. Apparently this caused an exception that set off a chain of events resulting in the System.ServiceModel.CommunicationException that vexed me for so long. What I've learned from this (and I probably should have known it already) is that when troubleshooting WCF errors you can't look only at the error on the client: the server must be taken into account as well. The client and the service in WCF are disconnected enough that you might not get the complete set of information if you examine only one side of the client + server equation.

Further reading:
Copying Oracle.DataAccess as a private assembly
SSRS configuration changes