Project
Analyze and transform a structured dataset using Apache Spark and Python. Engineer a data pipeline that extracts the raw data from its source, then creates and maintains the source data in a Data Lake using IBM Cloud Storage Instances.
Objective
Analyze and transform a structured dataset according to user requirements using Apache Spark and Python. Engineer a data pipeline that extracts the raw data from its source, then creates and maintains the source data in a Data Lake using IBM Cloud Storage Instances. Apply the relevant transformations to the data and save the results in DB2 on IBM Cloud. Ensure that the data pipeline is automated and that the data is governed and audited at all stages. Make the system self-reliant: notify the appropriate systems when the data transformations complete successfully, and log the records that were not processed successfully.
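The last two requirements (logging records that fail and notifying other systems on completion) can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's implementation: in the actual pipeline the transformation step would run on Spark and the results would be written to DB2. The names `RunReport`, `transform`, `run_pipeline`, and `notify`, as well as the sample records and their fields, are all hypothetical.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


@dataclass
class RunReport:
    """Audit summary for one pipeline run (hypothetical structure)."""
    processed: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def transform(record: dict) -> dict:
    """Hypothetical transformation: normalise a name and cast an amount."""
    return {
        "name": record["name"].strip().title(),
        "amount": float(record["amount"]),
    }


def run_pipeline(records: Iterable[dict],
                 notify: Callable[[str], None]) -> RunReport:
    """Transform each record, keep rejects for auditing, then notify."""
    report = RunReport()
    for record in records:
        try:
            report.processed.append(transform(record))
        except (KeyError, ValueError, TypeError, AttributeError) as exc:
            # Keep the failing record and the reason so it can be audited.
            log.warning("rejected %r: %s", record, exc)
            report.rejected.append((record, repr(exc)))
    # The notification target is an assumption: any callable taking a string.
    notify(f"run complete: {len(report.processed)} processed, "
           f"{len(report.rejected)} rejected")
    return report


# Sample input: one valid record, one bad amount, one missing field.
raw = [
    {"name": " alice ", "amount": "10.5"},
    {"name": "bob", "amount": "not-a-number"},
    {"amount": "3"},
]
messages = []
report = run_pipeline(raw, messages.append)
```

Passing `notify` as a callable keeps the sketch independent of any particular messaging system; here it simply appends to a list, and `report.rejected` holds the two records that could not be transformed together with the reason for each.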
Outcome
Develop an automated data pipeline in which the data is governed and audited at all stages.
Apply By Date | 28 Feb 2025 |
Students | 1 / 2 |
Duration | 3 months |
Mentor | Venkata H Sonti |
Tools-Technologies | DB2, Spark |
College | 1. Symbiosis Institute of Technology, Pune |
Documents | 1. IBM GRM PROJECT REPORT_Symbiosis Institute of Technology |