The Import Data component in Machine Learning Studio supports on-premises gateways, so I got the idea that I should be able to use it to import data from the on-prem Dynamics database. Even more, I was actually hoping to use the gateway I had just installed with Machine Learning, but, as it turned out, it was not the right gateway to begin with.
When setting up the “Import Data” component for an experiment in Machine Learning Studio, there is an option to use a gateway.
First of all, it turned out that this option won’t work on the free tier, so I had to re-create the workspace under my Visual Studio Enterprise subscription first.
Now, when you add a new gateway, Machine Learning Studio offers a gateway download and gives you a registration key to register it with:
What I found out is that this gateway is different from the gateway I used before, so, yes, I had to uninstall that first gateway and install a new one instead, which is actually called “Microsoft Integration Runtime”:
This version of the “gateway” did ask me for the authentication key, so I used the one provided by the Machine Learning Studio:
Finally, I got it registered:
At which point I started the configuration manager and tried testing the connection:
This did not work, but, after a bit of digging, it turned out the firewall on my SQL VM was the cause. Once I disabled the firewall, I could test the connection just fine. I ended up re-enabling it and adding a rule for the default SQL Server port (1433):
With that rule in place, I was finally able to test the connection.
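A quick way to tell a firewall problem apart from a credentials problem is to check whether the SQL port is reachable at all from the gateway machine. The snippet below is a minimal sketch of such a check, not something the Integration Runtime itself provides: the function name `can_reach` is made up for illustration, and it only confirms that a TCP connection to the port can be opened, not that SQL Server accepts the login.

```python
import socket


def can_reach(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: it only checks that the port is reachable,
    not that SQL Server accepts the credentials.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_reach("my-sql-vm")` from the machine running the gateway (with `my-sql-vm` replaced by the real server name, which is a placeholder here) should return `False` while the firewall blocks port 1433 and `True` once the rule is in place.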
Back to the Machine Learning Studio!
Actually, it did recognize the gateway automatically – once I switched back to the browser, I saw this screen:
So, I just had to specify the server name, configure the credentials, and, it seems, my data import component is ready:
Now, how do I test it?
There is a “Run Selected” option in the popup menu on the Import Data component:
My first attempt to run it was a complete failure:
I can’t use decimals... nice. Well, let’s change the query. At this point, I’d be happy if it worked at all, even with only two fields: Name and StatusCode.
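An untested alternative to dropping the decimal columns would be to cast them to a type the import step might accept. The sketch below builds such a query string; it is only an illustration under stated assumptions. The helper `build_import_query`, the table name `Account`, and the column `Revenue` are hypothetical, and whether `FLOAT` is accepted by the Import Data component is an assumption, not something confirmed above.

```python
def build_import_query(table, columns, decimal_columns=()):
    """Build a SELECT statement, casting the listed decimal columns to FLOAT.

    Hypothetical helper: names are wrapped in square brackets but not
    otherwise escaped, so only pass trusted table and column names.
    """
    decimals = set(decimal_columns)
    parts = []
    for col in columns:
        if col in decimals:
            # Assumption: FLOAT is a type the import step can handle.
            parts.append(f"CAST([{col}] AS FLOAT) AS [{col}]")
        else:
            parts.append(f"[{col}]")
    return f"SELECT {', '.join(parts)} FROM [{table}]"


# The two-field query, against a hypothetical "Account" table:
simple_query = build_import_query("Account", ["Name", "StatusCode"])

# The same idea with a hypothetical decimal column cast to FLOAT:
cast_query = build_import_query("Account", ["Name", "Revenue"], ["Revenue"])
```

Casting changes precision, so this would only make sense for columns where approximate values are acceptable.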
This time “Run Selected” succeeded:
Once that worked, the “Visualize” and “Save Dataset” options became available under “Results Dataset”:
Selecting “Visualize”... and, finally, I have my on-prem Dynamics data in Machine Learning Studio:
What I am going to do with this next is a different question, but, at least, it’s a small step in the right direction.