Project
Code data preprocessing module to remove license headers and commented-out code
Objective
This module is designed to automatically remove license headers and commented-out code from source code files. It will support multiple programming languages and be easy to extend to new ones. The module will be integrated into the data preprocessing pipeline to generate clean code datasets for training code LLMs.
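As a rough illustration of the header-removal part of this objective, the sketch below strips a leading comment block when it looks like a license notice. It is not the project's implementation: the `CommentSyntax` registry, the `strip_license_header` function, and the keyword heuristic are all hypothetical names and assumptions chosen for this example. Detecting commented-out code is not covered here.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CommentSyntax:
    """Comment markers for one language (hypothetical helper type)."""
    line: str                                # line-comment prefix, e.g. "#" or "//"
    block: Optional[Tuple[str, str]] = None  # block-comment open/close, if any


# Hypothetical registry: supporting a new language means adding one entry.
SYNTAX = {
    "python": CommentSyntax(line="#"),
    "java": CommentSyntax(line="//", block=("/*", "*/")),
    "c": CommentSyntax(line="//", block=("/*", "*/")),
}

# Assumed heuristic: a leading comment block is a license header
# if it mentions one of these keywords.
LICENSE_HINT = re.compile(r"copyright|license|spdx-license-identifier", re.IGNORECASE)


def strip_license_header(source: str, language: str) -> str:
    """Remove the leading comment block if it looks like a license notice."""
    syntax = SYNTAX[language]
    lines = source.splitlines(keepends=True)

    # Keep a shebang line in place; the header scan starts after it.
    start = 1 if lines and lines[0].startswith("#!") else 0

    i = start
    in_block = False
    while i < len(lines):
        stripped = lines[i].strip()
        if in_block:
            # Inside a block comment: consume lines until the closing marker.
            if syntax.block[1] in stripped:
                in_block = False
        elif stripped == "" or stripped.startswith(syntax.line):
            pass  # blank line or line comment: still part of the header
        elif syntax.block and stripped.startswith(syntax.block[0]):
            # Block comment opens; it may also close on the same line.
            in_block = syntax.block[1] not in stripped[len(syntax.block[0]):]
        else:
            break  # first line of real code ends the header
        i += 1

    header = "".join(lines[start:i])
    if LICENSE_HINT.search(header):
        return "".join(lines[:start] + lines[i:])
    return source  # leading comments without license keywords are kept


java_src = (
    "/*\n"
    " * Copyright 2024 Example\n"
    " * Licensed under the Apache License\n"
    " */\n"
    "\n"
    "package demo;\n"
)
cleaned_java = strip_license_header(java_src, "java")
```

Keeping the per-language markers in a small registry is one way to meet the extensibility goal. A keyword check this simple can misclassify an ordinary leading comment that happens to mention "license", so a real module would likely need a stricter test.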
Outcome
Deliver a new code data preprocessing module based on the Data Prep Kit.
Apply By Date 15 May 2024
Students 1 / 1
Duration 8 weeks
Mentor Parameswaran Selvam
Tools-Technologies
Jupyter Python Notebooks
Platform
1. WatsonX
College
1. Sarvajanik College of Engineering and Technology