Published On: January 24th, 2023
Categories: AI News

AWS Glue is a fully managed, serverless ETL (extract, transform, load) service. It makes it easy to discover, transform and load data for consumption by downstream processes and applications. If you want to learn more about AWS Glue, please refer to the AWS Glue Overview video.

Objective (CSV to Parquet)

In this article, we will walk through a basic end-to-end CSV to Parquet transformation using AWS Glue. The solution uses several AWS services: IAM, S3 and AWS Glue. Within AWS Glue, we will use crawlers, the Data Catalog (databases and tables) and ETL jobs.
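
Both the crawler and the ETL job run under an IAM role that AWS Glue is allowed to assume. The sketch below is an illustration under stated assumptions, not the article's own setup: it builds the trust policy for such a role plus an inline S3 policy for a single hypothetical bucket (`example-bucket`). It pairs these with the AWS managed policy `AWSGlueServiceRole`. As far as I know, that managed policy's S3 permissions only cover buckets whose names start with `aws-glue-`, which is why a separate bucket-scoped policy is a common addition.

```python
import json

# AWS managed policy with baseline permissions for Glue service roles.
GLUE_SERVICE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"

# Trust policy: lets the AWS Glue service assume the role at run time.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def s3_access_policy(bucket: str) -> dict:
    """Inline permissions policy for one bucket.

    Grants object read/write (CSV input, Parquet output) and bucket listing,
    which crawlers and jobs use to enumerate files. `bucket` is a
    hypothetical placeholder name.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }


def as_json(policy: dict) -> str:
    """IAM API calls take policy documents as JSON strings."""
    return json.dumps(policy)
```

With boto3 you would then call `iam.create_role(RoleName="ExampleGlueServiceRole", AssumeRolePolicyDocument=as_json(GLUE_TRUST_POLICY))` and `iam.attach_role_policy(RoleName="ExampleGlueServiceRole", PolicyArn=GLUE_SERVICE_POLICY_ARN)`. Finally, call `iam.put_role_policy(RoleName="ExampleGlueServiceRole", PolicyName="ExampleS3Access", PolicyDocument=as_json(s3_access_policy("example-bucket")))`, where `iam = boto3.client("iam")`. The role and policy names are placeholders.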



Architecture

Workflow

Let’s walk through the workflow step by step.

  1. Create a crawler that connects to the S3 data store.
  2. After a successful connection, the crawler infers the structure (schema) of the CSV file using a built-in classifier.
  3. The crawler writes the metadata as a table in the AWS Glue Data Catalog.
  4. Once the Data Catalog is populated, create the ETL job to transform the CSV data into Parquet.
  5. The data source for the…
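
The workflow above maps onto a few AWS Glue API calls. The sketch below is an illustration under stated assumptions, not the article's own code. It builds keyword arguments for boto3's `glue.create_crawler` (steps 1-3) and `glue.create_job` (step 4), and it renders a minimal PySpark Glue script that reads the crawled table from the Data Catalog and writes it back to S3 as Parquet. Every bucket, path, role, database and table name here is a hypothetical placeholder. The choices of Glue version 4.0 and two G.1X workers are arbitrary. The table name passed to the script assumes the crawler named the table after the source folder, which is its usual behavior.

```python
import textwrap

# Hypothetical names used for illustration only; replace with your own.
SOURCE_PATH = "s3://example-bucket/raw/"        # folder holding the CSV files
OUTPUT_PATH = "s3://example-bucket/parquet/"    # destination for Parquet output
SCRIPT_PATH = "s3://example-bucket/scripts/csv_to_parquet.py"
ROLE_NAME = "ExampleGlueServiceRole"
DATABASE = "example_db"


def crawler_params(name: str) -> dict:
    """Keyword arguments for boto3 glue.create_crawler (steps 1-3).

    The crawler scans SOURCE_PATH, lets the built-in classifiers infer the
    CSV schema, and writes a table into DATABASE in the Data Catalog.
    """
    return {
        "Name": name,
        "Role": ROLE_NAME,
        "DatabaseName": DATABASE,
        "Targets": {"S3Targets": [{"Path": SOURCE_PATH}]},
    }


def job_params(name: str) -> dict:
    """Keyword arguments for boto3 glue.create_job (step 4)."""
    return {
        "Name": name,
        "Role": ROLE_NAME,
        "Command": {
            "Name": "glueetl",  # Spark-based ETL job type
            "ScriptLocation": SCRIPT_PATH,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


def render_etl_script(table: str) -> str:
    """Source of a PySpark Glue job that reads the catalog table and writes Parquet."""
    return textwrap.dedent(f'''\
        import sys
        from awsglue.context import GlueContext
        from awsglue.job import Job
        from awsglue.utils import getResolvedOptions
        from pyspark.context import SparkContext

        args = getResolvedOptions(sys.argv, ["JOB_NAME"])
        glue_context = GlueContext(SparkContext.getOrCreate())
        job = Job(glue_context)
        job.init(args["JOB_NAME"], args)

        # Read the table the crawler created in the Data Catalog.
        source = glue_context.create_dynamic_frame.from_catalog(
            database="{DATABASE}", table_name="{table}"
        )

        # Write the same records back to S3 in Parquet format.
        glue_context.write_dynamic_frame.from_options(
            frame=source,
            connection_type="s3",
            connection_options={{"path": "{OUTPUT_PATH}"}},
            format="parquet",
        )
        job.commit()
        ''')
```

To use it, you would first upload the rendered script to `SCRIPT_PATH`, for example with `boto3.client("s3").put_object(Bucket="example-bucket", Key="scripts/csv_to_parquet.py", Body=render_etl_script("raw"))`. On a `glue = boto3.client("glue")` client, call `glue.create_crawler(**crawler_params("csv-crawler"))` and then `glue.start_crawler(Name="csv-crawler")`. Wait for the crawl to finish, then call `glue.create_job(**job_params("csv-to-parquet"))` followed by `glue.start_job_run(JobName="csv-to-parquet")`.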
