Understanding Delta Lake Time Travel in 2 minutes
Delta Lake versions a table's data on every write operation, such as merge, update and delete. This makes the life cycle of data inside a Delta table transparent.
Each operation increments the table version, so a table that has gone through multiple operations has multiple versions. Delta Lake offers a mechanism for navigating across these versions called Time Travel. It gives you read access to data from the past without changing the table's current version.
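To make the idea of "one version per operation" concrete, here is a toy, in-memory model in plain Java. It is a simplified sketch under loud assumptions, not how Delta Lake is implemented: the class `ToyDeltaLog`, its `Kind` enum, its methods and the sample rows are all hypothetical. What it shares with Delta is the core idea: every commit gets the next version number, and an older version is reconstructed by replaying the commits up to it. (Real Delta commits live in the table's `_delta_log` directory and record add/remove file actions rather than individual rows.)

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT Delta Lake's implementation: each commit is appended to a log
// and its position in the log is its version number (0, 1, 2, ...).
// Real Delta commits record add/remove *file* actions, not individual rows.
class ToyDeltaLog {
    // Hypothetical operation kinds for this sketch.
    enum Kind { APPEND, DELETE }

    record Commit(Kind kind, List<String> rows) {}

    private final List<Commit> log = new ArrayList<>();

    // Records an operation and returns the version it created.
    int commit(Kind kind, List<String> rows) {
        log.add(new Commit(kind, List.copyOf(rows)));
        return log.size() - 1;
    }

    // Most recent version, or -1 if nothing has been committed yet.
    int latestVersion() {
        return log.size() - 1;
    }

    // Rebuilds the table as of `version` by replaying commits 0..version.
    List<String> readAsOf(int version) {
        if (version < 0 || version > latestVersion()) {
            throw new IllegalArgumentException("No such version: " + version);
        }
        List<String> rows = new ArrayList<>();
        for (int v = 0; v <= version; v++) {
            Commit c = log.get(v);
            if (c.kind() == Kind.APPEND) {
                rows.addAll(c.rows());
            } else {
                rows.removeAll(c.rows());
            }
        }
        return rows;
    }

    // A plain read (no version requested) sees the latest version.
    List<String> read() {
        return readAsOf(latestVersion());
    }

    public static void main(String[] args) {
        ToyDeltaLog people = new ToyDeltaLog();
        people.commit(Kind.APPEND, List.of("1,Alice")); // version 0
        people.commit(Kind.APPEND, List.of("2,Bob"));   // version 1
        people.commit(Kind.DELETE, List.of("1,Alice")); // version 2
        System.out.println(people.read());      // [2,Bob]
        System.out.println(people.readAsOf(1)); // [1,Alice, 2,Bob]
    }
}
```

Note how the delete in version 2 does not erase history: asking for version 1 still returns the deleted row, which is exactly what Time Travel exposes.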
In this post we'll use this feature to look at different versions of a table. Below is a Delta table called people; every one of its versions was generated by a write operation in append mode.
When we perform a simple read on a table, we always get the current version, which is the most recent one. In this scenario the current version is 2 (two): versions are numbered from 0, so the three append writes produced versions 0, 1 and 2. Note that we don't need to specify a version because we're not using Time Travel yet.
Nothing has changed so far; let's keep this table for the next steps.
Working with Time Travel
In the next steps we'll read the people table at different versions to understand how Time Travel works.
Reading Delta table - Version 0 (zero)
Starting with version 0 (zero), let's read the table again, this time adding a new option. Take a look at the code below.
// session is an existing SparkSession; read the people table as it was at version 0
session.read()
    .format("delta")
    .option("versionAsOf", 0)
    .load("table/people")
    .orderBy("id")
    .show();
Notice the new option, versionAsOf: it specifies which version of the table to read. It doesn't restore or modify the table; it only affects this read. Here we set it to version zero (0), the first version Delta Lake created, by the first write operation.
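The choice between "no option means the latest version" and "versionAsOf picks an earlier one" can be sketched as a small resolver. This is an illustrative assumption, not Delta Lake's source code: `VersionResolver` and `resolve` are hypothetical names. The one detail taken from Spark is that reader options are carried as strings (the `option(String, long)` overload converts the number to a string), which is why the value is parsed here.

```java
import java.util.Map;

// Hypothetical sketch of how a reader could pick the version to load.
// The option name mirrors the post's "versionAsOf"; the logic is an
// assumption for illustration, not Delta Lake's implementation.
final class VersionResolver {
    private VersionResolver() {}

    // Reader options arrive as strings, so "versionAsOf" is parsed here.
    static long resolve(Map<String, String> options, long latestVersion) {
        String requested = options.get("versionAsOf");
        if (requested == null) {
            // No Time Travel option: a plain read sees the current (latest) version.
            return latestVersion;
        }
        long version = Long.parseLong(requested.trim());
        if (version < 0 || version > latestVersion) {
            // Asking for a version that does not exist is an error, not a silent fallback.
            throw new IllegalArgumentException(
                "Cannot read version " + version + "; available: 0.." + latestVersion);
        }
        return version;
    }

    public static void main(String[] args) {
        long latest = 2; // the people table in this post has versions 0, 1 and 2
        System.out.println(resolve(Map.of(), latest));                   // 2
        System.out.println(resolve(Map.of("versionAsOf", "0"), latest)); // 0
    }
}
```

Rejecting an out-of-range version, rather than quietly returning the latest data, keeps a Time Travel query from appearing to succeed while reading the wrong snapshot.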
Reading Delta table - Version 1 (one)
For this last step we read version one (1). Note that the data from version 0 is still there, because the write that created version 1 used append mode.
// session is an existing SparkSession; read the people table as it was at version 1
session.read()
    .format("delta")
    .option("versionAsOf", 1)
    .load("table/people")
    .orderBy("id")
    .show();
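Why does version 1 still contain version 0's rows? The toy function below (plain Java rather than Spark; `SaveModeDemo`, `Mode`, `nextVersion` and the sample rows are hypothetical) contrasts the two basic save modes. With append, the new version is the old rows plus the new batch; with overwrite, it would contain only the new batch. In Delta Lake the earlier version stays readable through versionAsOf in both cases; only the contents of the new version differ.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Delta Lake's API: what the next table version contains
// under the "append" and "overwrite" save modes.
final class SaveModeDemo {
    enum Mode { APPEND, OVERWRITE }

    private SaveModeDemo() {}

    // Returns the rows of the new version produced by writing `incoming`
    // on top of `current` with the given mode.
    static List<String> nextVersion(List<String> current, List<String> incoming, Mode mode) {
        List<String> next = new ArrayList<>();
        if (mode == Mode.APPEND) {
            next.addAll(current); // append keeps everything already in the table
        }
        next.addAll(incoming);    // both modes add the newly written rows
        return next;
    }

    public static void main(String[] args) {
        List<String> version0 = List.of("1,Alice"); // hypothetical rows
        List<String> batch = List.of("2,Bob");
        System.out.println(nextVersion(version0, batch, Mode.APPEND));    // [1,Alice, 2,Bob]
        System.out.println(nextVersion(version0, batch, Mode.OVERWRITE)); // [2,Bob]
    }
}
```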
Delta Lake has many benefits, and Time Travel adds flexibility to a Big Data architecture. For more details, I recommend the Delta Lake docs.
Well, that's it. I hope you enjoyed it.