Project

Analyze and transform a structured dataset using Apache Spark and Python. Engineer a data pipeline that extracts the raw data from its source and creates and maintains the source data in a data lake using IBM Cloud Storage Instances.

Objective

Analyze and transform a structured dataset according to user requirements using Apache Spark and Python. Engineer a data pipeline that extracts the raw data from its source and creates and maintains the source data in a data lake using IBM Cloud Storage Instances. Apply the relevant transformations to the data and save the results in DB2 on IBM Cloud. Ensure that the data pipeline is automated and that the data is governed and audited at all stages. Make the system self-reliant by notifying the appropriate systems on successful completion of the data transformations and by logging the records that were not successfully processed.
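The last requirement (notify on completion, log the records that failed) can be illustrated with a minimal sketch. It uses only the Python standard library as a stand-in; in the project itself the transformation step would run in Apache Spark and the results would be written to DB2. The record fields (id, name, amount) and the names transform, run_pipeline, RunResult and notify are hypothetical and do not come from the project.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable

logger = logging.getLogger("pipeline")


@dataclass
class RunResult:
    """Outcome of one pipeline run."""
    loaded: list = field(default_factory=list)    # successfully transformed records
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def transform(record: dict) -> dict:
    """Hypothetical transformation: cast the id and amount, tidy the name."""
    return {
        "id": int(record["id"]),
        "name": record["name"].strip().title(),
        "amount": float(record["amount"]),
    }


def run_pipeline(records: Iterable[dict], notify: Callable[[str], None]) -> RunResult:
    """Transform each record, log the ones that fail, then send a notification."""
    result = RunResult()
    for record in records:
        try:
            result.loaded.append(transform(record))
        except (KeyError, ValueError, TypeError, AttributeError) as exc:
            # Keep the bad record and the reason so the run can be audited later.
            logger.warning("Rejected record %r: %r", record, exc)
            result.rejected.append((record, repr(exc)))
    # Assumption: downstream systems are notified through a simple callback.
    notify(f"Run finished: {len(result.loaded)} loaded, {len(result.rejected)} rejected")
    return result
```

A run over a small in-memory batch could pass any callable as notify, for example the append method of a list, and then inspect result.rejected to see which records need attention. Keeping rejected records alongside the reason for rejection is one simple way to support the auditing goal stated above.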

Outcome

Develop an automated data pipeline in which the data is governed and audited at all stages.

Apply By Date	28 Feb 2025
Students	1 / 2
Duration	3 months
Mentor	Venkata H Sonti

Tools-Technologies

DB2, Spark

College

1. Symbiosis Institute of Technology, Pune

Documents

1. IBM GRM PROJECT REPORT_Symbiosis Institute of Technology