Converting Parquet table to Delta Table

In this post we're going to walk through how to convert a Parquet table into a Delta table. First, we'll create a Parquet table from scratch through a Spark Dataframe, and then we'll convert it to a Delta table.


Using a Delta table has some benefits compared to a Parquet table. Delta lets you restore previous versions of your table through the time travel feature, provides ACID transaction support, and more.
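
As a quick preview of time travel, the snippet below is a minimal sketch of reading an older version of a Delta table. It assumes the table/product table has already been converted to Delta, which is exactly what we'll do later in this post.

// Minimal sketch: read version 0 of a Delta table via time travel.
// Assumes table/product has already been converted to Delta (see later in this post).
Dataset<Row> firstVersion = session.read()
        .format("delta")
        .option("versionAsOf", 0)
        .load("table/product");

firstVersion.show();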


Creating a Parquet table


First of all, let's create a Parquet table to be converted to a Delta table later. I prefer creating the Parquet table from scratch to give a better understanding of the whole process.


The following code will be executed only once, just to create the Parquet table. We're going to use a Spark Dataframe loaded from a JSON file containing semi-structured records.


import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public static void main(String[] args){

    // Basic Spark configuration running locally with a single core
    SparkConf conf = new SparkConf();
    conf.setAppName("spark-delta-table");
    conf.setMaster("local[1]");

    SparkSession session = SparkSession.builder()
            .config(conf)
            .getOrCreate();

    // Load the semi-structured records from product.json into a Dataframe
    Dataset<Row> dataFrame = session.read().json("product.json");

    // Write the Dataframe as a Parquet table in the table/product directory
    dataFrame.write().format("parquet").save("table/product");

}

In the example above, we start by creating a SparkSession object, which is used to create and manage the Spark Dataframe loaded from the product.json file content.


After the load, the Dataframe is written as a table in Parquet format to the table/product directory.


JSON content


The product.json file contains the following semi-structured records.

{"id":1, "name":"rice", "price":12.0, "qty": 2}
{"id":2, "name":"beans", "price":7.50, "qty": 5}
{"id":3, "name":"coke", "price":5.50, "qty": 2}
{"id":4, "name":"juice", "price":3.80, "qty": 1}
{"id":5, "name":"meat", "price":1.50, "qty": 1}
{"id":6, "name":"ice-cream", "price":6.0, "qty": 2}
{"id":7, "name":"potato", "price":3.70, "qty": 10}
{"id":8, "name":"apple", "price":5.60, "qty": 5}

After running the code above, Parquet files will be generated in the table/product directory.
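
If you want to double-check what was written, a minimal sketch like the one below reads the Parquet table back and prints its schema and rows. It assumes the same SparkSession from the previous example.

// Minimal sketch: read the Parquet table back to verify the write
Dataset<Row> check = session.read().parquet("table/product");

check.printSchema();
check.show();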




Converting Parquet table to Delta Table


Now that we have a Parquet table created, we can easily convert it to a Delta table. Let's do this.


import io.delta.tables.DeltaTable;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public static void main(String[] args){

    SparkConf conf = new SparkConf();
    conf.setAppName("spark-delta-table");
    conf.setMaster("local[1]");

    SparkSession session = SparkSession.builder()
            .config(conf)
            .getOrCreate();

    // Convert the existing Parquet table at table/product into a Delta table in place
    DeltaTable.convertToDelta(session, "parquet.`table/product`");

}

The DeltaTable.convertToDelta method is responsible for converting the Parquet table into a Delta table. Note that we have to pass the SparkSession as a parameter and also specify the path of the Parquet table using the format "parquet.`<path>`".
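
For reference, convertToDelta also has an overload that accepts the partition schema when the Parquet table is partitioned. The sketch below is a minimal example of that variant; the table/sales path and the year/month partition columns are hypothetical names, just for illustration.

// Minimal sketch: converting a partitioned Parquet table (hypothetical path and columns)
// The third argument describes the partition columns as a DDL-formatted string.
DeltaTable.convertToDelta(session, "parquet.`table/sales`", "year INT, month INT");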


After the conversion runs, Delta creates the well-known _delta_log directory inside table/product, containing the commit info and checkpoint files.
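
To inspect those commits without opening the _delta_log files by hand, a minimal sketch like the one below loads the converted table and prints its history. It assumes the same SparkSession as before.

// Minimal sketch: inspect the commit history of the converted Delta table
DeltaTable deltaTable = DeltaTable.forPath(session, "table/product");

deltaTable.history().show(false);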

Well that's it, I hope you enjoyed it!








