Project
Analyze and transform a structured dataset using Apache Spark and Python. Engineer a data pipeline that extracts the raw data from its source, then create and maintain the source data in a Data Lake using IBM Cloud Storage Instances.
Objective
Analyze and transform a structured dataset according to user requirements using Apache Spark and Python. Engineer a data pipeline that extracts the raw data from its source, then create and maintain the source data in a Data Lake using IBM Cloud Storage Instances. Apply the relevant transformations to the data and save the results in DB2 on IBM Cloud. Ensure the data pipeline is automated and that the data is governed and audited at all stages. Make the system self-reliant: it should notify the appropriate systems when the data transformations complete successfully and log any records that were not processed successfully.
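The control flow described above (transform, set aside failed records, audit each stage, notify on completion) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: it uses plain Python lists in place of Spark DataFrames so that it runs without a cluster, the DB2 load is a placeholder, and the names AuditLog, transform, run_pipeline and the sample fields name and amount are hypothetical, not part of the project.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AuditLog:
    """Hypothetical audit trail: one (stage, rows_in, rows_out) entry per stage."""
    entries: List[Tuple[str, int, int]] = field(default_factory=list)

    def record(self, stage: str, rows_in: int, rows_out: int) -> None:
        self.entries.append((stage, rows_in, rows_out))


def transform(record: Dict[str, str]) -> Dict[str, object]:
    """Hypothetical transformation: trim the name and cast amount to float.

    Raises KeyError for a missing field and ValueError for a bad number.
    """
    return {"name": record["name"].strip(), "amount": float(record["amount"])}


def run_pipeline(
    raw: List[Dict[str, str]],
    notify: Callable[[str], None],
    audit: AuditLog,
) -> Tuple[List[Dict[str, object]], List[Dict[str, object]]]:
    """Transform every record, keep the failures, audit each stage, then notify."""
    good: List[Dict[str, object]] = []
    rejected: List[Dict[str, object]] = []
    for rec in raw:
        try:
            good.append(transform(rec))
        except (KeyError, ValueError) as exc:
            # Records that fail are logged with the reason instead of being dropped.
            rejected.append({"record": rec, "error": repr(exc)})
    audit.record("transform", len(raw), len(good))

    # Placeholder for the load step; a real job would write `good` to DB2 here.
    audit.record("load", len(good), len(good))

    notify(f"pipeline finished: {len(good)} loaded, {len(rejected)} rejected")
    return good, rejected


# Usage with made-up sample rows: one valid, one bad number, one missing field.
raw_rows = [
    {"name": " Asha ", "amount": "10.5"},
    {"name": "Ravi", "amount": "not-a-number"},
    {"name": "Meera"},
]
messages: List[str] = []
audit = AuditLog()
loaded, rejected = run_pipeline(raw_rows, messages.append, audit)
```

In a Spark job, the same split would typically be expressed as DataFrame filters, with the rejected rows written to a separate location rather than collected in memory; the notification callback stands in for whatever downstream system needs to be informed.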
Outcome
Develop an automated data pipeline in which the data is governed and audited at all stages.
Apply By Date 28 Feb 2025
Students 1 / 2
Duration 3 months
Mentor Venkata H Sonti
Tools-Technologies
DB2, Spark
College
1. Symbiosis Institute of Technology, Pune
Documents
1. IBM GRM PROJECT REPORT_Symbiosis Institute of Technology