Project
Code data preprocessing module to remove license headers and commented-out code
Objective
This module is designed to automatically remove license headers and commented-out code from source code files. It will support multiple programming languages and will be easy to extend to new ones. The module will be integrated into the data preprocessing pipeline to generate clean code datasets for training code LLMs.
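A minimal sketch of the idea, in plain Python, is given here for illustration only. It does not use the data prep kit's transform API. The `LINE_COMMENT` registry, the `LICENSE_HINT` pattern, and all function names are hypothetical. The check for commented-out code is a simple heuristic that covers only Python line comments.

```python
import ast
import re

# Hypothetical per-language registry: supporting a new language means adding an entry.
LINE_COMMENT = {"python": "#", "java": "//", "c": "//"}

# Assumed keywords that mark a comment block as a license header.
LICENSE_HINT = re.compile(r"licen[sc]e|copyright|spdx", re.IGNORECASE)


def strip_license_header(source: str, lang: str) -> str:
    """Drop the leading block of line comments if it mentions a license."""
    prefix = LINE_COMMENT[lang]
    lines = source.splitlines()
    end = 0
    while end < len(lines) and lines[end].strip().startswith(prefix):
        end += 1
    if end and LICENSE_HINT.search("\n".join(lines[:end])):
        lines = lines[end:]
        # Remove blank lines left behind by the header.
        while lines and not lines[0].strip():
            lines.pop(0)
    return "\n".join(lines)


def looks_like_python_code(text: str) -> bool:
    """Heuristic: the text parses as Python and is more than a bare name or literal."""
    try:
        tree = ast.parse(text)
    except SyntaxError:
        return False
    return any(
        not (isinstance(node, ast.Expr)
             and isinstance(node.value, (ast.Name, ast.Constant)))
        for node in tree.body
    )


def strip_commented_python(source: str) -> str:
    """Remove full-line comments whose content looks like Python code."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") and looks_like_python_code(
            stripped.lstrip("#").strip()
        ):
            continue
        kept.append(line)
    return "\n".join(kept)


sample = (
    "# Copyright 2024 Example Corp\n"
    "# Licensed under the Apache License, Version 2.0\n"
    "\n"
    "import os\n"
    "# total = compute(x)\n"
    "# Sum the values below\n"
    "total = 1\n"
)
cleaned = strip_commented_python(strip_license_header(sample, "python"))
print(cleaned)
```

On this sample the license block and the `# total = compute(x)` line are dropped, while the explanatory comment is kept. The parse-based heuristic can misfire on prose that happens to be valid syntax, so a real module would likely need per-language parsers and support for block comments.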
Outcomes
Deliver a new code data preprocessing module based on the data prep kit.
Apply by Date
15/05/2024
Applied Teams
1 / 1
Duration
8 weeks
College
1. Sarvajanik College of Engineering and Technology
Tools-Technologies
Jupyter Python Notebooks, Python
Mentor
Parameswaran Selvam
Platform
1. Watson Data Platform