Expose data through an ADO.NET data provider

A frequently asked question since SQL Server 2005 shipped is "how do I expose my custom data source to SSIS/SQLRS/[your app here]?" If you have a data server that you must connect to, and it accepts commands and returns resultsets, it's a natural fit for an ADO.NET data provider-based mechanism. Having worked with OLE DB provider writers for a few years, I'm always pleasantly surprised at how straightforward it is to write an ADO.NET data provider. As an example, the ATL OLE DB Provider Wizard uses a skeleton starter provider implementation in atldb.h that weighs in at over 12,000 lines of code. A similar skeleton for an ADO.NET data provider is a hundred or so lines.

I originally built an ADO.NET data provider for version 1.0 that simply accepted a command consisting of a file path and returned file information as a rowset. I'd upgraded the provider to ADO.NET 2.0 without adding much, just restructuring it to use the base classes instead of the interfaces. Today I added enough of the "additional classes" to make it usable in SSIS; I haven't tried much else yet. I had to add two more skeleton classes: a ProviderFactory and a ConnectionStringBuilder.
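To give a feel for how small the 2.0 skeleton really is, here's a hedged sketch of the connection class built on the DbConnection base class. The class and member names are illustrative, not necessarily the ones in the attached sample; the DbCommand and DbDataReader subclasses follow the same override-the-abstracts pattern.

```csharp
using System;
using System.Data;
using System.Data.Common;

// Hypothetical minimal DbConnection subclass; names are illustrative,
// not necessarily those used in the attached sample.
public sealed class MDirConnection : DbConnection
{
    private ConnectionState _state = ConnectionState.Closed;
    private string _connectionString = String.Empty;

    public override string ConnectionString
    {
        get { return _connectionString; }
        set { _connectionString = value; }
    }

    public override string Database { get { return String.Empty; } }
    public override string DataSource { get { return String.Empty; } }
    public override string ServerVersion { get { return "2.0"; } }
    public override ConnectionState State { get { return _state; } }

    public override void Open() { _state = ConnectionState.Open; }
    public override void Close() { _state = ConnectionState.Closed; }

    // A file-system "server" has no databases to change between.
    public override void ChangeDatabase(string databaseName) { }

    protected override DbTransaction BeginDbTransaction(IsolationLevel il)
    { throw new NotSupportedException(); }

    // Return your DbCommand subclass here; stubbed to keep the sketch short.
    protected override DbCommand CreateDbCommand()
    { throw new NotImplementedException(); }
}
```

Everything else (command, reader, and so on) is the same exercise: subclass the base class and fill in the abstract members you care about, stubbing the rest.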

The provider factory is the glue between the machine.config information that serves as a "provider enumerator" and the provider itself. You need to:
1. Either register the provider in the GAC or make it available to each client.
2. Add a "provider configuration" entry for it, either in machine.config or in each client's application configuration file. The console app has a specific entry in app.config; for SSIS it's probably easier to put the entry in machine.config (with the assembly in the GAC).
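The "provider configuration" entry in step 2 is a DbProviderFactories element under system.data. A hedged sketch of what it can look like; the name, invariant, assembly name, and PublicKeyToken below are placeholders, not the actual values from the attached sample:

```xml
<configuration>
  <system.data>
    <DbProviderFactories>
      <!-- name/invariant/type/PublicKeyToken are placeholder values -->
      <add name="MDir Data Provider"
           invariant="MDirProv"
           description="Exposes directory listings as resultsets"
           type="MDirProv.MDirProviderFactory, MDirProv2, Version=2.0.0.0, Culture=neutral, PublicKeyToken=1234567890abcdef" />
    </DbProviderFactories>
  </system.data>
</configuration>
```

Once the entry is in place, clients (and SSIS) can enumerate the provider and create connections through DbProviderFactories.GetFactory, passing the invariant name.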

So that SSIS would accept the connection string (which the provider doesn't really use), I added a dummy property, DataSource; SSIS doesn't like empty connection strings. The provider, a test console app, and a test SSIS project (data connection only) are attached.
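A hedged sketch of what that dummy keyword can look like on the connection string builder side; the class name is mine, and the attached sample may wire it up differently:

```csharp
using System;
using System.Data.Common;

// Hypothetical ConnectionStringBuilder exposing a dummy DataSource keyword,
// so hosts like SSIS that reject empty connection strings have something to parse.
public sealed class MDirConnectionStringBuilder : DbConnectionStringBuilder
{
    public string DataSource
    {
        get
        {
            object value;
            return TryGetValue("Data Source", out value)
                ? (string)value : String.Empty;
        }
        set { this["Data Source"] = value; }
    }
}
```

The base class handles the keyword/value parsing and the ConnectionString round-tripping; the property just surfaces the keyword in a strongly typed way.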

Although the provider doesn't do much (many of the methods are simply stubs), it may be a useful "starter" that provides the scaffolding. Let me know if this is useful and I might do some work to expand it further. (Apr 17: Updated the provider based on feedback in this blog thread.) Enjoy!

MDirProv2_For_NET2.zip (123.85 KB)

8 thoughts on “Expose data through an ADO.NET data provider”

  1. Bob,

    So in your implementation, it sounds like you leverage a traditional OLE DB provider and then wrap your implementation of an ADO.NET 2.0 provider around it?

    Is this the only means by which you can interoperate with a custom data source from SSIS? I’d be interested in your opinions on this.

  2. Thanks for this example! It is exactly what I have been fumbling around trying to do. However, I’m running into a problem actually using it as a source; see below. It’s probably because I haven’t done your #2 above. Can you please elaborate on what that is about? I think I’ve got #1 covered… I copied the DbProviderFactories entry from the client app into my machine.config.

    From a simple SSIS project, I’m getting as far as calling into MDirDataReader’s GetDirectory() and then GetSchemaTable(). It never gets to the Read(). I get a 0x80004002 (interface not supported?) error. See below.

    Debug ERRORS:

    SSIS package "Package.dtsx" starting.
    Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning.
    Warning: 0x80047076 at Data Flow Task, DTS.Pipeline: The output column "Size" (108) on output "DataReader Output" (6) and component "DataReader Source" (1) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.
    Warning: 0x80047076 at Data Flow Task, DTS.Pipeline: The output column "Type" (111) on output "DataReader Output" (6) and component "DataReader Source" (1) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.
    Warning: 0x80047076 at Data Flow Task, DTS.Pipeline: The output column "CreationDate" (114) on output "DataReader Output" (6) and component "DataReader Source" (1) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.
    Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning.
    Warning: 0x80047076 at Data Flow Task, DTS.Pipeline: The output column "Size" (108) on output "DataReader Output" (6) and component "DataReader Source" (1) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.
    Warning: 0x80047076 at Data Flow Task, DTS.Pipeline: The output column "Type" (111) on output "DataReader Output" (6) and component "DataReader Source" (1) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.
    Warning: 0x80047076 at Data Flow Task, DTS.Pipeline: The output column "CreationDate" (114) on output "DataReader Output" (6) and component "DataReader Source" (1) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.
    Information: 0x40043006 at Data Flow Task, DTS.Pipeline: Prepare for Execute phase is beginning.
    Information: 0x40043007 at Data Flow Task, DTS.Pipeline: Pre-Execute phase is beginning.
    Information: 0x402090DC at Data Flow Task, Flat File Destination [52]: The processing of file "\dave.flat" has started.
    Information: 0x4004300C at Data Flow Task, DTS.Pipeline: Execute phase is beginning.
    Error: 0xC0209029 at Data Flow Task, DataReader Source [1]: The "component "DataReader Source" (1)" failed because error code 0x80004002 occurred, and the error row disposition on "output column "Size" (108)" specifies failure on error. An error occurred on the specified object of the specified component.
    Error: 0xC02090F5 at Data Flow Task, DataReader Source [1]: The component "DataReader Source" (1) was unable to process the data.
    Error: 0xC0047038 at Data Flow Task, DTS.Pipeline: The PrimeOutput method on component "DataReader Source" (1) returned error code 0xC02090F5. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.
    Error: 0xC0047021 at Data Flow Task, DTS.Pipeline: Thread "SourceThread0" has exited with error code 0xC0047038.
    Error: 0xC0047039 at Data Flow Task, DTS.Pipeline: Thread "WorkThread0" received a shutdown signal and is terminating. The user requested a shutdown, or an error in another thread is causing the pipeline to shutdown.
    Error: 0xC0047021 at Data Flow Task, DTS.Pipeline: Thread "WorkThread0" has exited with error code 0xC0047039.
    Information: 0x40043008 at Data Flow Task, DTS.Pipeline: Post Execute phase is beginning.
    Information: 0x402090DD at Data Flow Task, Flat File Destination [52]: The processing of file "\dave.flat" has ended.
    Information: 0x40043009 at Data Flow Task, DTS.Pipeline: Cleanup phase is beginning.
    Information: 0x4004300B at Data Flow Task, DTS.Pipeline: "component "Flat File Destination" (52)" wrote 0 rows.
    Task failed: Data Flow Task
    Warning: 0x80019002 at Package: The Execution method succeeded, but the number of errors raised (6) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
    SSIS package "Package.dtsx" finished: Failure.

  3. More info:
    I had also already added a post build command to put the dll in the GAC:
    "C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin\gacutil" /if "$(TargetPath)"

    So maybe I do have 1 and 2 covered. I’m not sure what you mean by "provider configuration". Sorry for my confusion.

    Also, the test client app works fine.

    Any idea what SSIS is barking about?

  4. I think I found the problem:

    In the DataReader, the column names and types are defined like this:

    internal String[] _names = { "Name", "Size", "Type", "CreationDate" };
    internal Type[] _types = { typeof(String), typeof(long),
                               typeof(String), typeof(DateTime) };

    But the values in the Read() method are set like this:
    FileInfo f = (FileInfo)_fsi[_CurrentRow];
    _cols[0] = f.Name;
    _cols[1] = f.Length.ToString();
    _cols[2] = "File";
    _cols[3] = f.CreationTime.ToString();

    The "Size" is set as a string but was defined as long.
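    In other words, the values need to go into the row boxed as their declared CLR types rather than as strings. A sketch of the fix; FillRow and its parameter are my stand-ins for the internals of the sample’s Read() method:

```csharp
using System;
using System.IO;

// Sketch of the corrected assignments: box each value as the CLR type the
// schema declared, instead of calling ToString() on everything.
// FillRow is a stand-in for the body of the sample's Read() method.
class ReadFix
{
    internal static object[] FillRow(FileInfo f)
    {
        object[] cols = new object[4];
        cols[0] = f.Name;          // String
        cols[1] = f.Length;        // long     (was f.Length.ToString())
        cols[2] = "File";          // String
        cols[3] = f.CreationTime;  // DateTime (was f.CreationTime.ToString())
        return cols;
    }
}
```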

    Thanks again Bob for this much needed example!

  5. "CreationDate" is also type mismatched.

    Question: my breakpoints hit all over the DataReader object, with the exception of the Read() method. I know it’s executing, because I can change it, recompile, and see the effect of the change. It’s as if the package has its own copy of this code. Any idea why I can’t break on Read()?

  6. Hi Jonathan,

    This provider does not wrap any OLE DB code at all. What it does (read file information from a directory) is based on an old OLE DB provider sample, but that’s its only connection with OLE DB. The provider implementation is the .NET equivalent of a "type 4" provider; it’s completely written in .NET code.

    Cheers,
    Bob

  7. Hi Dave,

    It’s interesting that SSIS complained about the type mismatch; that was the one piece of the original provider (which I wrote in .NET 1.0) that I didn’t change. Thanks for finding this.

    Bob

  8. Hi Bob,

    Your blog is very useful and informative. Could you please tell me the nature and scope of MaximumErrorCount: what types of errors does it take into account?

    One of the Data Flow Tasks in my package gave this error, and I had to increase MaximumErrorCount to a higher value, though I don’t think anything was actually wrong with the task.

    Thanks and regards,
    Apurva

