Project | Data Augmentation for Personal Knowledge Graph Population |
Objective | To identify bias in datasets (in the AI Fairness sense: bias against a minority population), we first need to identify and extract protected variables (personal data entities) from unstructured text. Our team at IBM Research has done prior work in this area. In this student project, we want to explore how to determine whether the distribution of a protected variable is skewed towards a subset of its possible values, and hence whether the dataset is biased on that variable. We propose a solution that combines generative and discriminative models to determine whether a dataset is biased on any of its protected variables. |
Outcomes | 1. Extract protected variables from unstructured text using data-discovery-api.
2. Identify the distribution of protected variables in a dataset.
3. Create models to predict whether a dataset is biased on a protected variable. |
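Outcome 2 above amounts to tabulating the values a protected variable takes and measuring how skewed that distribution is. A minimal sketch in Python, assuming the extraction step has already produced one value per record (the `genders` list and the entropy-based skew score are illustrative choices, not part of the project specification or the data-discovery-api):

```python
import math
from collections import Counter

def distribution(values):
    """Relative frequency of each category of a protected variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def normalized_entropy(dist):
    """Shannon entropy scaled to [0, 1]; 1.0 means perfectly uniform,
    values near 0 mean the distribution is concentrated on few categories."""
    k = len(dist)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(k)

# Hypothetical extraction output: one protected-variable value per record.
genders = ["male"] * 9 + ["female"]
dist = distribution(genders)
score = normalized_entropy(dist)
print(dist)             # {'male': 0.9, 'female': 0.1}
print(round(score, 3))  # low score -> skewed distribution
```

A low normalized-entropy score flags a candidate bias on that protected variable; the project's discriminative models would then decide whether the skew reflects genuine bias rather than, say, a legitimately imbalanced domain.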
Apply by Date | 31/12/2020 |
Applied Teams | 10 / 10 |
Duration | 2-3 months |
College | All Colleges |
Tools-Technologies | Jupyter Python Notebooks, Python |
Mentor | Balaji Ganesan |
Documents | 1) Bias Detection in Unstructured Datasets |