1 year ago
#293407

Paweł Bulwan
Azure Data Lake Gen1: after `concurrentappend`, subsequent reads of file return different content
We have an instance of Azure Data Lake Storage Gen1 and a system that appends incoming data to files there.
We want to guarantee to another team that, at midnight, the files for the day that just ended become immutable. However, the other team has complained that the files appear to change after midnight.
I enabled "Request logs" in Diagnostic settings and confirmed that the content of the files appears to change between reads, even though there is no write operation in between.
It appears that the first read of the file after a `concurrentappend` operation returns data that does not include the last appended content.
This seems to be true even when a long time passes between the write and the first read (e.g. 1 hour).
The second and subsequent reads return all the data - correctly.
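For anyone who wants to check for the same pattern, here is a minimal sketch of how the comparison could be scripted. It only hashes the bytes returned by successive reads and checks whether the first read differs from the later ones. The names `read_digests`, `first_read_is_stale` and `FakeStore` are made up for illustration, and `FakeStore` is just an in-memory stand-in that mimics the behavior described above; it is not an ADLS client. To test against the real store, you would pass your own function that downloads the file and returns its bytes.

```python
import hashlib
from typing import Callable, List


def read_digests(read_file: Callable[[], bytes], attempts: int = 3) -> List[str]:
    """Call read_file() `attempts` times and return the SHA-256 hex digest of each result."""
    return [hashlib.sha256(read_file()).hexdigest() for _ in range(attempts)]


def first_read_is_stale(digests: List[str]) -> bool:
    """True if the first read differs from the later reads, which all agree with each other."""
    return (
        len(digests) >= 2
        and digests[0] != digests[1]
        and len(set(digests[1:])) == 1
    )


class FakeStore:
    """Hypothetical in-memory stand-in (NOT an ADLS API).

    The first read omits the last appended chunk; later reads include it.
    """

    def __init__(self, committed: bytes, last_append: bytes) -> None:
        self._committed = committed
        self._last_append = last_append
        self._reads = 0

    def read(self) -> bytes:
        self._reads += 1
        if self._reads == 1:
            return self._committed  # first read misses the last append
        return self._committed + self._last_append


if __name__ == "__main__":
    store = FakeStore(b"row1\n", b"row2\n")
    digests = read_digests(store.read, attempts=3)
    for i, digest in enumerate(digests, start=1):
        print(f"read {i}: {digest}")
    print("first read stale:", first_read_is_stale(digests))
```

Comparing digests rather than file sizes also catches the case where the content changes but the length does not.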
My question is: is this a bug in ADLS Gen1 or intended behavior? This answer says ADLS has read-after-write consistency, but what I'm observing seems to contradict it.
azure
azure-data-lake
consistency
0 Answers