1 year ago

#293407

Paweł Bulwan

Azure Data Lake Gen1: after `concurrentappend`, subsequent reads of file return different content

We have an instance of Azure Data Lake Storage Gen1 and a system that appends incoming data to files there.

We want to guarantee to another team that at midnight the files for the day that has just ended become immutable. However, the other team complained that the files appear to change after midnight.

I enabled "Request logs" in Diagnostic settings and confirmed that the content of the files appears to change between reads, even though there is no write operation in between:

[Screenshot: request logs collected for a single file in ADLS Gen1]

It appears that the first read of the file after a `concurrentappend` operation returns data that does not include the last appended content. This seems to be true even when a lot of time passes between the write and the first read (e.g. 1 hour). The second and subsequent reads correctly return all the data.
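For anyone who wants to check the same pattern from the client side, here is a minimal sketch that hashes several consecutive reads of one file and flags the case where only the first read differs. It does not call any ADLS API: `read_file` is a hypothetical callable you would supply yourself (for example, a wrapper around whatever client you use to open the file), and `make_fake_reader` is only a stand-in that imitates the behavior described above.

```python
import hashlib
from typing import Callable, List


def read_digests(read_file: Callable[[], bytes], attempts: int = 3) -> List[str]:
    """Read the file `attempts` times and return the SHA-256 digest of each read."""
    return [hashlib.sha256(read_file()).hexdigest() for _ in range(attempts)]


def first_read_is_stale(digests: List[str]) -> bool:
    """True when the first read differs from the later reads, which all agree."""
    return (
        len(digests) >= 2
        and digests[0] != digests[1]
        and len(set(digests[1:])) == 1
    )


def make_fake_reader(stale: bytes, fresh: bytes) -> Callable[[], bytes]:
    """Hypothetical stand-in: returns `stale` on the first call, `fresh` afterwards."""
    calls = {"n": 0}

    def read() -> bytes:
        calls["n"] += 1
        return stale if calls["n"] == 1 else fresh

    return read


if __name__ == "__main__":
    # Replace the fake reader with a function that reads the real file.
    reader = make_fake_reader(b"row1\n", b"row1\nrow2\n")
    print(first_read_is_stale(read_digests(reader)))
```

Running this right after the last append, with no writer active, would show whether the first read is the only one that disagrees with the rest.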

My question is: is this a bug in ADLS Gen1 or intended behavior? This answer says ADLS has read-after-write consistency, but what I'm observing seems to contradict it...

azure

azure-data-lake

consistency

0 Answers
