Project - Data Augmentation for Personal Knowledge Graph Population
Objective
To identify bias in datasets (in the AI Fairness sense of bias against a minority population), we first need to be able to identify and extract protected variables (personal data entities) from unstructured text. Our team at IBM Research has done prior work in this area. In this student project, we want to explore the problem of determining whether the distribution of a protected variable is skewed towards a subset of its possible values, which would indicate that the dataset is biased on that variable. We propose a solution that uses a combination of generative and discriminative models to determine whether a dataset is biased on any of its protected variables.
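As a rough illustration (not part of the project materials), the sketch below tests whether a single protected variable's value distribution is skewed towards a subset of values, using a chi-square goodness-of-fit test. The variable name and the uniform reference distribution are assumptions made for the example; in practice the expected frequencies would come from population statistics rather than uniformity.

```python
# Minimal sketch: flag a protected variable whose observed values are
# concentrated on a subset of categories. Uses a chi-square goodness-of-fit
# test against a uniform baseline (an assumption for this example).
from collections import Counter

from scipy.stats import chisquare


def is_skewed(values, alpha=0.05):
    """Return (flag, p_value) for a chi-square test against uniformity.

    Rejecting the null hypothesis (p < alpha) suggests the values cluster
    on a subset of categories, i.e. the dataset may be biased on this
    protected variable.
    """
    counts = list(Counter(values).values())
    stat, p_value = chisquare(counts)  # uniform expected frequencies by default
    return p_value < alpha, p_value


# Hypothetical example: a 'gender' variable extracted from a corpus.
biased, p = is_skewed(["male"] * 80 + ["female"] * 20)
print(f"skewed={biased}, p={p:.4g}")
```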
Outcomes
1. Extract protected variables from unstructured text using data-discovery-api.
2. Identify the distribution of protected variables in a dataset.
3. Create models to predict whether the dataset is biased on a protected variable.
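To show how these three outcomes might fit together, here is a hedged end-to-end sketch. The extraction step is stubbed out: `extract_protected_variables` is a hypothetical placeholder, since the actual data-discovery-api interface is not documented here. The bias flag uses normalized entropy of the value distribution, one simple choice among many possible models.

```python
# Hypothetical pipeline sketch: extract -> distribution -> bias flag.
import math
from collections import Counter


def extract_protected_variables(documents):
    # Placeholder for the data-discovery-api extraction step: should return
    # a mapping from protected variable name to the list of values found.
    raise NotImplementedError("wire up data-discovery-api here")


def normalized_entropy(values):
    """Shannon entropy of the value distribution, scaled to [0, 1]."""
    counts = Counter(values)
    total = sum(counts.values())
    if len(counts) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))


def flag_biased_variables(distributions, threshold=0.7):
    """Flag variables whose values are concentrated on a small subset."""
    return {var: normalized_entropy(vals) < threshold
            for var, vals in distributions.items()}


# Toy extracted values stand in for a real extraction run.
dists = {"gender": ["male"] * 90 + ["female"] * 10,
         "nationality": ["US", "IN", "DE", "FR"] * 25}
print(flag_biased_variables(dists))  # {'gender': True, 'nationality': False}
```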
Apply by Date
31/12/2020
Applied Teams
10 / 10
Duration
2-3 months
College
All College
Tools-Technologies
Jupyter Python Notebooks, Python
Mentor
Balaji Ganesan
Documents
1) Bias Detection in Unstructured Datasets