How to read a CSV file with Apache Spark

Apache Spark is well suited to reading files for data extraction. In this post we'll build an example that reads a CSV file using Spark, Java and Maven.

Maven

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.1.0</version>
    </dependency>
</dependencies>

CSV Content

Let's suppose the file below is named movies.csv.

title;year;rating
The Shawshank Redemption;1994;9.3
The Godfather;1972;9.2
The Dark Knight;2008;9.0
The Lord of the Rings: The Return of the King;2003;8.9
Pulp Fiction;1994;8.9
Fight Club;1999;8.8
Star Wars: Episode V - The Empire Strikes Back;1980;8.7
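Star Wars;1977;8.6

Creating a SparkSession

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

// Run locally, using all available cores
SparkConf sparkConf = new SparkConf();
sparkConf.setMaster("local[*]");
sparkConf.setAppName("app");

// Pass the configuration to the builder; otherwise the master and app name are never applied
SparkSession sparkSession = SparkSession.builder()
        .config(sparkConf)
        .getOrCreate();

Running the Read

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> ds = sparkSession.read()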
        .format("csv")
        .option("sep", ";")
        .option("inferSchema", "true")
        .option("header", "true")
        .load("movies.csv");

// Select the three columns and print the rows to the console
ds.select("title", "year", "rating").show();

Result

show() prints the selected title, year and rating columns to the console as a text table.

Understanding some parameters

.option("sep", ";"): Sets the field delimiter used when reading the file. Spark's default is a comma; in this case the delimiter is a semicolon (;).

.option("inferSchema", "true"): Makes Spark scan the file(s) to infer (guess) the data type of each field. Without it, every column is read as a string.

.option("header", "true"): Treats the first line of the file as a header and uses its values as the column names.

.load("movies.csv"): movies.csv is the path of the file to be read.
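
To make the three options more concrete, here is a small plain-Java sketch of what they do conceptually. It is only an illustration and not Spark's implementation: the class CsvOptionsSketch and its methods are hypothetical names made up for this post, quoting and escaping are ignored, and the type guessing is far simpler than Spark's real schema inference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java illustration of the three options; NOT Spark's implementation.
final class CsvOptionsSketch {

    // Like option("sep", ";"): split a line on a literal separator (no quote handling).
    static List<String> splitLine(String line, String sep) {
        // Pattern.quote treats the separator literally; -1 keeps trailing empty fields.
        return Arrays.asList(line.split(Pattern.quote(sep), -1));
    }

    // Like option("header", ...): take the names from the first line, or fall back
    // to positional names in the style Spark uses (_c0, _c1, ...).
    static List<String> columnNames(String firstLine, String sep, boolean header) {
        List<String> fields = splitLine(firstLine, sep);
        if (header) {
            return fields;
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            names.add("_c" + i);
        }
        return names;
    }

    // Like option("inferSchema", "true"), heavily simplified: guess a type for one value.
    static String inferType(String value) {
        String v = value.trim();
        try {
            Integer.parseInt(v);
            return "integer";
        } catch (NumberFormatException ignored) {
            // Not an integer, try the next candidate.
        }
        try {
            Double.parseDouble(v);
            return "double";
        } catch (NumberFormatException ignored) {
            return "string";
        }
    }

    public static void main(String[] args) {
        String header = "title;year;rating";
        String row = "The Godfather;1972;9.2";
        System.out.println(columnNames(header, ";", true));
        for (String field : splitLine(row, ";")) {
            System.out.println(field + " -> " + inferType(field));
        }
    }
}
```

Calling columnNames with header set to false on the same first line returns _c0, _c1 and _c2, which is why the select on title, year and rating in this post depends on the header option being enabled.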

