Create and maintain the source data in a data lake using IBM Cloud Storage instances.
Create and manage source data in a data lake using IBM Cloud Storage instances, and build a data pipeline to extract raw data from the source.
Objective: Design a data pipeline to ingest data from REST APIs. Use a REST client and a mechanism to schedule batch data processing with cron expressions, as in the sketch below.
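A minimal sketch of cron-scheduled REST ingestion, assuming the requests library as the REST client and APScheduler for cron-expression scheduling; the endpoint URL, schedule, and staging file name are illustrative placeholders, not part of the project spec:

```python
# Minimal batch-ingestion sketch: pull JSON from a REST API on a cron schedule.
# The endpoint URL, cron expression, and staging file name are placeholders.
import json
import requests
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger

API_URL = "https://api.example.com/v1/records"  # hypothetical source endpoint

def fetch_batch() -> None:
    """Pull one batch of raw records and persist it to a local staging file."""
    resp = requests.get(API_URL, timeout=30)
    resp.raise_for_status()                      # surface HTTP errors early
    with open("staging_batch.json", "w") as f:
        json.dump(resp.json(), f)

scheduler = BlockingScheduler()
# Cron expression "0 * * * *": run at minute 0 of every hour.
scheduler.add_job(fetch_batch, CronTrigger.from_crontab("0 * * * *"))
scheduler.start()
```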
The data pipeline should include:
- Extraction of data from APIs/Db2 tables.
- Loading data into IBM COS (Cloud Object Storage) as a landing area (see the sketch after this list).
- A batch data load monitoring mechanism (validation checks).
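One way to implement the extract-and-land steps, sketched with the ibm_db Db2 driver and the ibm-cos-sdk (ibm_boto3); the DSN, credentials, bucket, table, and object key below are hypothetical placeholders:

```python
# Sketch: extract rows from a Db2 table and land the raw batch in IBM COS.
# Connection details, bucket, table, and key names are placeholders.
import csv
import io
import ibm_db
import ibm_boto3
from ibm_botocore.client import Config

DSN = ("DATABASE=BLUDB;HOSTNAME=db2.example.com;PORT=50000;"
       "PROTOCOL=TCPIP;UID=user;PWD=secret;")   # hypothetical Db2 DSN

def extract_table(sql: str) -> str:
    """Run a query against Db2 and serialize the result set as CSV text."""
    conn = ibm_db.connect(DSN, "", "")
    stmt = ibm_db.exec_immediate(conn, sql)
    buf = io.StringIO()
    writer = None
    row = ibm_db.fetch_assoc(stmt)
    while row:
        if writer is None:                       # write header on first row
            writer = csv.DictWriter(buf, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)
        row = ibm_db.fetch_assoc(stmt)
    ibm_db.close(conn)
    return buf.getvalue()

# IBM COS client via the ibm-cos-sdk; credentials and endpoint are placeholders.
cos = ibm_boto3.client(
    "s3",
    ibm_api_key_id="API_KEY",
    ibm_service_instance_id="SERVICE_INSTANCE_CRN",
    config=Config(signature_version="oauth"),
    endpoint_url="https://s3.us-south.cloud-object-storage.appdomain.cloud",
)

payload = extract_table("SELECT * FROM SALES")
cos.put_object(Bucket="landing-zone", Key="raw/sales_batch.csv", Body=payload)
```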
Design a mechanism to handle and capture errors. Ensure that no data loss occurs by using retry mechanisms, a notification system, and auditing of the ETL processes (a sketch of this error-handling layer follows the stack list below).

Technology Stack:
- API: REST API, Box API (SDK)
- Cloud: IBM Cloud (SDK)
- Programming: Python, Db2 SQL
- Storage: IBM Cloud Object Storage, IBM Db2
- Version Control: GitHub
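A sketch of the retry-with-backoff, notification, and audit pieces; run_with_retries and notify are hypothetical helpers, with notify left as a stub for whatever alerting channel the team chooses:

```python
# Sketch of the error-handling layer: retries with exponential backoff,
# a notification hook on final failure, and an audit log entry for each attempt.
import logging
import time

logging.basicConfig(filename="etl_audit.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")

def notify(message: str) -> None:
    """Placeholder notification hook (email, Slack, pager, etc.)."""
    log.error("NOTIFY: %s", message)

def run_with_retries(step, name: str, attempts: int = 3, base_delay: float = 2.0):
    """Run a pipeline step, retrying transient failures before alerting."""
    for attempt in range(1, attempts + 1):
        try:
            result = step()
            log.info("step=%s attempt=%d status=success", name, attempt)
            return result
        except Exception as exc:
            log.warning("step=%s attempt=%d error=%s", name, attempt, exc)
            if attempt == attempts:
                notify(f"step {name} failed after {attempts} attempts: {exc}")
                raise                    # re-raise so the batch is not silently lost
            time.sleep(base_delay * 2 ** (attempt - 1))   # exponential backoff
```

A pipeline step such as the COS upload above could then be wrapped as run_with_retries(lambda: cos.put_object(...), "load_to_cos"), so a transient failure is retried and logged before any alert fires.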
Tools-Technologies | DB2, Jupyter Python Notebooks, Personality Insights API, Spark, Watson APIs, watsonx.data
Platform | watsonx
College | Symbiosis Institute of Technology, Pune
Documents | IBM GRM PROJECT REPORT_Symbiosis Institute of Technology