Create and maintain the source data in a data lake using IBM Cloud Storage instances.
Create and manage source data in a data lake using IBM Cloud Storage instances, and build a data pipeline to extract raw data from the source.
Objective: Design a data pipeline to ingest data from REST APIs. Use a REST client and a mechanism to schedule batch data processing with cron expressions, as in the sketch below.
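A minimal sketch of cron-scheduled REST ingestion, assuming the requests library as the REST client and APScheduler for cron-expression scheduling; the endpoint URL, schedule, and staging file name are illustrative placeholders, not part of the project spec:

```python
# Minimal batch-ingestion sketch: pull JSON from a REST API on a cron schedule.
# The endpoint URL, cron expression, and staging file name are placeholders.
import json
import requests
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger

API_URL = "https://api.example.com/v1/records"  # hypothetical source endpoint

def fetch_batch() -> None:
    """Pull one batch of raw records and persist it to a local staging file."""
    resp = requests.get(API_URL, timeout=30)
    resp.raise_for_status()                      # surface HTTP errors early
    with open("staging_batch.json", "w") as f:
        json.dump(resp.json(), f)

scheduler = BlockingScheduler()
# Cron expression "0 * * * *": run at minute 0 of every hour.
scheduler.add_job(fetch_batch, CronTrigger.from_crontab("0 * * * *"))
scheduler.start()
```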
The data pipeline should include:
- Extraction of data from APIs/Db2 tables.
- Loading data into IBM COS (Cloud Object Storage) as a landing area (see the sketch after this list).
- A batch data load monitoring mechanism (validation checks).
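One way to implement the extract-and-land steps, sketched with the ibm_db Db2 driver and the ibm-cos-sdk (ibm_boto3); the DSN, credentials, bucket, table, and object key below are hypothetical placeholders:

```python
# Sketch: extract rows from a Db2 table and land the raw batch in IBM COS.
# Connection details, bucket, table, and key names are placeholders.
import csv
import io
import ibm_db
import ibm_boto3
from ibm_botocore.client import Config

DSN = ("DATABASE=BLUDB;HOSTNAME=db2.example.com;PORT=50000;"
       "PROTOCOL=TCPIP;UID=user;PWD=secret;")   # hypothetical Db2 DSN

def extract_table(sql: str) -> str:
    """Run a query against Db2 and serialize the result set as CSV text."""
    conn = ibm_db.connect(DSN, "", "")
    stmt = ibm_db.exec_immediate(conn, sql)
    buf = io.StringIO()
    writer = None
    row = ibm_db.fetch_assoc(stmt)
    while row:
        if writer is None:                       # write header on first row
            writer = csv.DictWriter(buf, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)
        row = ibm_db.fetch_assoc(stmt)
    ibm_db.close(conn)
    return buf.getvalue()

# IBM COS client via the ibm-cos-sdk; credentials and endpoint are placeholders.
cos = ibm_boto3.client(
    "s3",
    ibm_api_key_id="API_KEY",
    ibm_service_instance_id="SERVICE_INSTANCE_CRN",
    config=Config(signature_version="oauth"),
    endpoint_url="https://s3.us-south.cloud-object-storage.appdomain.cloud",
)

payload = extract_table("SELECT * FROM SALES")
cos.put_object(Bucket="landing-zone", Key="raw/sales_batch.csv", Body=payload)
```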
Design a mechanism to handle and capture errors. Ensure that no data loss occurs by using retry mechanisms, a notification system, and auditing of the ETL processes (a sketch of this error-handling layer follows the stack list below).

Technology Stack:
- API: REST API, Box API (SDK)
- Cloud: IBM Cloud (SDK)
- Programming: Python, Db2 SQL
- Storage: IBM Cloud Object Storage, IBM Db2
- Version Control: GitHub
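A sketch of the retry-with-backoff, notification, and audit pieces; run_with_retries and notify are hypothetical helpers, with notify left as a stub for whatever alerting channel the team chooses:

```python
# Sketch of the error-handling layer: retries with exponential backoff,
# a notification hook on final failure, and an audit log entry for each attempt.
import logging
import time

logging.basicConfig(filename="etl_audit.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")

def notify(message: str) -> None:
    """Placeholder notification hook (email, Slack, pager, etc.)."""
    log.error("NOTIFY: %s", message)

def run_with_retries(step, name: str, attempts: int = 3, base_delay: float = 2.0):
    """Run a pipeline step, retrying transient failures before alerting."""
    for attempt in range(1, attempts + 1):
        try:
            result = step()
            log.info("step=%s attempt=%d status=success", name, attempt)
            return result
        except Exception as exc:
            log.warning("step=%s attempt=%d error=%s", name, attempt, exc)
            if attempt == attempts:
                notify(f"step {name} failed after {attempts} attempts: {exc}")
                raise                    # re-raise so the batch is not silently lost
            time.sleep(base_delay * 2 ** (attempt - 1))   # exponential backoff
```

A pipeline step such as the COS upload above could then be wrapped as run_with_retries(lambda: cos.put_object(...), "load_to_cos"), so a transient failure is retried and logged before any alert fires.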
Tools-Technologies | DB2, Jupyter Python Notebooks, Personality Insights API, Spark, Watson APIs, watsonx.data
Platform | watsonx
College | Symbiosis Institute of Technology, Pune
Documents | IBM GRM PROJECT REPORT_Symbiosis Institute of Technology