Power Platform Dataflows – have I missed the “Incremental refresh” option somehow?

December 12, 2020

This month seems to be all about data integration/data migration for me, so, even though I’m still using good old SSIS in many cases, I wanted to take a closer look at Dataflows.

After all, they don’t require any additional deployment/licensing – we just need to go to the maker portal and start building them.


But, it seems, there is something missing. I’ll get to it closer to the end of this post – for now, let’s try something first.

We will need XrmToolBox and a Record Counter tool (there is more than one in the tool store – I used the one rated 5 stars).


There is a special “ETL Test Entity” I just created – it has a name field and a “Test Id” text field (which is also an alternate key). Right now there are 0 records in this entity.


Then, there is a SQL table which is pre-populated with 10000 records:

DROP TABLE ETLTest
GO
CREATE TABLE ETLTest (
    Id uniqueidentifier,
    FullName nvarchar(250),
    ModifiedOn DATETIME2(7)
)
GO
-- GO 10000 tells SSMS to run the insert batch 10000 times
INSERT INTO ETLTest(Id, FullName, ModifiedOn) VALUES(newid(), 'Test', getutcdate())
GO 10000
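
Just to double-check the source side before running anything, a quick count (plain T-SQL, nothing dataflow-specific):

-- Confirm the source table is populated
SELECT COUNT(*) AS SourceRows FROM ETLTest
-- should return 10000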


There is a dataflow that loads data from the SQL table above into the ETL Test Entity in Dataverse.


And now, let’s give it a try.


10000 records moved in, in just under 2 minutes – that was pretty fast.


This is a good result so far – at that speed, dataflows could be suitable for all sorts of data integrations. On a related note, it seems dataflows can handle API throttling limits intelligently, though I did not take it far enough to test that this time:

https://powerusers.microsoft.com/t5/Power-Apps-Ideas/Remove-Service-Protection-API-throttling-limits-for-Dataflows/idi-p/484654

However, what if we do another refresh?


This is where I was hoping that, somewhat miraculously, the dataflow would know that none of the records had changed, so none of them would have to be updated in Dataverse.

Instead, I can see all those records coming in again.


3 minutes later, they have all been updated.


There are still only 10000 records, so none were added (pretty sure that’s because my entity has an alternate key).

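That alternate key is what turns every incoming row into an “upsert”. As a rough T-SQL analogy only – Dataverse does not actually run anything like this, and the target table/column names below are made up for illustration – the dataflow effectively does the following on every refresh:

-- Rough analogy only: the alternate key decides between update and insert.
-- "ETLTestEntity", "TestId" and "Name" are hypothetical names.
MERGE INTO ETLTestEntity AS target
USING ETLTest AS source
    ON target.TestId = source.Id            -- the alternate key match
WHEN MATCHED THEN
    UPDATE SET target.Name = source.FullName
WHEN NOT MATCHED THEN
    INSERT (TestId, Name) VALUES (source.Id, source.FullName);

Which would also explain why every source row costs an API call even when nothing has changed – the match still produces an update.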

Still, I was hoping none of them would even be updated in Dataverse, since I had not changed any of them in the source, right?

And this is where I find this particular scenario (using Standard dataflows to load data into Dataverse) somewhat unclear. It seems there is supposed to be an incremental refresh option, according to the documentation page below:

https://docs.microsoft.com/en-us/power-query/dataflows/incremental-refresh

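In other words, instead of a full re-load, an incremental refresh would only pull rows changed since the previous successful run. Here is a minimal sketch of that filter against the test table (@LastRefreshTime is a hypothetical watermark, hardcoded just for illustration – the real feature is configured on a DateTime column such as ModifiedOn):

-- Only rows touched after the last successful refresh would be reprocessed.
-- @LastRefreshTime is a made-up watermark parameter for this sketch.
DECLARE @LastRefreshTime DATETIME2(7) = '2020-12-11T00:00:00'
SELECT Id, FullName, ModifiedOn
FROM ETLTest
WHERE ModifiedOn > @LastRefreshTime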

I can see this functionality for analytical dataflows (those that dump data to a data lake, for instance).


But I don’t see it for the Standard v1/v2 dataflows.


Which means using dataflows to load large tables into Dataverse might be a waste of API calls against the entitlement limits – every full refresh consumes calls for every row, changed or not.

Did I miss that feature somehow? Let me know if I did.

Otherwise, it might actually be more practical to use Azure Data Factory (or, possibly, something else such as SSIS) in the scenario above, which is a little unfortunate since, compared to those other options, setting up a dataflow takes almost no effort at all.


PS. And here is a confirmation, thanks to fellow MVP Carina M Claesson:

“Considerations and limitations: Incremental refresh in Power Platform dataflows is only supported in dataflows with an Azure Data Lake Storage account, not in dataflows with Dataverse as the destination.” It’s right there on the same docs page – all the way down at the bottom.

4 thoughts on “Power Platform Dataflows – have I missed the “Incremental refresh” option somehow?”

  1. Hosk

    Considerations and limitations
    Incremental refresh in Power Platform dataflows is only supported in dataflows with an Azure Data Lake Storage account, not in dataflows with Dataverse as the destination.

  2. Andrew Wolfe

    Incremental Refresh is available today in a Standard V2 (not Analytical) dataflow loading data from SQL (via data gateway) into Dataverse. I haven’t seen that option before.

    Of course it still doesn’t have the option that we really want – “Include only data modified since the last successful refresh”. Typical of the half-solution mentality in most things we’re given; I’ll stop now before I rant about overriddencreatedon.

