Arjun Neervannan (M&T ’24) on Combating Cyberbullying with AI and Winning One of the Regeneron STS Scholarship Prizes

Arjun Neervannan is a first-year student in the M&T Program from Irvine, CA, studying Computer and Information Science in the School of Engineering with an undeclared concentration in the Wharton School. Last May, he won one of the Regeneron Science Talent Search Scholarships. Each year, nearly 1,900 students submit original research in critically important scientific fields. Arjun’s project, “Combating Cyberbullying and Toxicity by Teaching AI to Use Linguistic Insights from Human Interactions in Social Media,” addresses the critical need to reduce toxic behavior online and the potential for AI technology to aid in these efforts.

We connected with Arjun to learn more about his prize-winning research.

M&T: How did you conceive of this project?

AN: After noticing toxic comments on social media, I researched the AI-based monitoring tools that classified such comments as toxic and found that they were biased in identifying toxicity because of their inability to pinpoint the identity terms (e.g., “gay”) that influenced the classification. The root of the issue was that the algorithms didn’t understand context, so they unnecessarily flagged comments containing these terms as toxic without providing reasoning for the classification.

I independently researched ethics and bias in AI and came up with this project idea to improve on current work. My personal experiences from elementary school, working on group projects in Google Docs, also gave me a sense of the impact this would have on kids and played a role in my choosing this topic. Students would post toxic comments in the shared document while working in a group and quickly delete them before teachers could see them. I wanted to come up with a transparent mechanism that would highlight the toxic language.

The goal of my project was to develop a scalable, interpretable model that would surface the identity terms it was actually biased against and then de-bias them.

M&T: Is this your first project of this scale? What coding have you done in the past?

AN: This is not my first project involving AI and machine learning, but it was the first of this scale. In the past, I developed an AI algorithm that used Reinforcement Learning to make a simulated robot learn to walk. The idea for this came about after seeing that Google’s AlphaGo had beaten the best Go player.

How did it learn complex strategies? Upon reading about AlphaGo’s Deep Reinforcement Learning (RL) algorithms, Trust Region Policy Optimization and Proximal Policy Optimization, I found myself tinkering with these algorithms to make a simulated robot learn to walk, and I even published a paper on selecting hyperparameters in an international journal. It was fascinating to see the potential of an algorithm to learn complex strategies through mathematics and iteration. This project was my initial foray into the world of AI research and machine learning. While I was really interested in the AI’s ability to learn to walk on its own, I also wanted to explore different aspects of AI and build some kind of AI-driven solution to help combat a social crisis.

M&T: Can you explain the limitations of other AI debiasing models and how your model addresses them?

AN: Current approaches to bias-free AI models are limited in their debiasing scope, are not scalable due to manual feature selection, and are often black boxes, i.e., not interpretable. With regard to debiasing scope, some approaches rely on human-selected identity terms rather than using the model’s own predictions to figure out which terms it is biased against. The method I used attacked this problem from a different perspective, using a more transparent model, called a Hierarchical Attention Network, to determine where the model “pays attention” when classifying a sentence. From this, I was able to determine the words the model was biased against. Deriving these words from the model’s own predictions also makes the approach more scalable across different languages and contexts, since no human intervention is needed.

In addition, the use of the Hierarchical Attention Network keeps the model from being a black box, as its results are more interpretable. Making models more interpretable and understandable is an ongoing area of study, so many debiasing approaches do not yet incorporate such techniques.
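
To make this concrete, here is a minimal sketch of how per-word attention weights can expose candidate bias terms. The stubbed-in model and example comments below are illustrative assumptions, not the project’s actual code; any trained attention classifier that exposes word-level attention would slot into the same loop.

```python
from collections import Counter

# Hypothetical interface standing in for a trained Hierarchical Attention
# Network: returns a toxicity probability and one attention weight per token.
def predict_with_attention(tokens):
    # Stub: up-weights a fixed identity term so the sketch runs end to end.
    raw = [0.9 if t == "gay" else 0.1 for t in tokens]
    total = sum(raw)
    attention = [w / total for w in raw]
    toxicity = 0.8 if "gay" in tokens else 0.2
    return toxicity, attention

# Comments a human labeled non-toxic; when the model flags one as toxic,
# the token it attends to most is tallied as a candidate bias term.
nontoxic_comments = [
    "i am a gay man and proud of it".split(),
    "the weather is lovely today".split(),
]

candidates = Counter()
for tokens in nontoxic_comments:
    toxicity, attention = predict_with_attention(tokens)
    if toxicity > 0.5:  # false positive: model disagrees with the human label
        top_token = max(zip(tokens, attention), key=lambda p: p[1])[0]
        candidates[top_token] += 1

print(candidates.most_common())  # terms the model over-attends to
```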

M&T: In your abstract you describe that one advantage of your model is that it does not have the comment-length limitations other debiasing models do. Why is this a limitation for other algorithms, and how were you able to get around it?

AN: Some toxic comment classification models use a type of network called a “Convolutional Neural Network,” which is more sensitive to comment length due to the structure of the network and the fact that it lacks sequence-learning capabilities. The model I used, the Hierarchical Attention Network, is a sequence-learning model, so it is better able to understand the relationships between words in a long sentence. For this reason, it was less sensitive to comment length and can handle comments of varying lengths well.
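
The key mechanism here is attention pooling: a weighted average over word representations that produces a fixed-size output no matter how many words go in. The PyTorch snippet below is a minimal illustration of that general idea, not the network from the project.

```python
import torch

torch.manual_seed(0)
embed_dim = 16

# Learned context vector: each word is scored against it, and the scores
# become softmax weights, so the pooled output has a fixed size regardless
# of how long the comment is.
u = torch.randn(embed_dim, requires_grad=True)

def attention_pool(word_embeddings):
    # word_embeddings: (seq_len, embed_dim); seq_len can be anything
    scores = word_embeddings @ u              # (seq_len,)
    alphas = torch.softmax(scores, dim=0)     # attention weights, sum to 1
    return alphas @ word_embeddings           # fixed-size (embed_dim,) vector

short_comment = torch.randn(4, embed_dim)     # a 4-word comment
long_comment = torch.randn(300, embed_dim)    # a 300-word comment
print(attention_pool(short_comment).shape)    # torch.Size([16])
print(attention_pool(long_comment).shape)     # torch.Size([16])
```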

M&T: Can you describe how you came to adopt the linguistically driven Noun-Adjective criterion to generate a set of words to “debias”?

AN: After reading many of the comments in the dataset, I noticed that a common feature of the identity terms (“gay,” “muslim,” “white,” etc.) was that they were used as nouns in some contexts and as adjectives in others. I realized that such a criterion would help me narrow the selection of words to “debias,” so that I would only be fixing the model’s predictions on a particular set of identity terms (since the scope of the project was to fix only the bias on identity terms).
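
One plausible way to implement such a criterion is with an off-the-shelf part-of-speech tagger. The sketch below uses spaCy and a made-up mini-corpus; it is an assumption about how this step could be coded, not the project’s implementation.

```python
import spacy
from collections import defaultdict

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Hypothetical mini-corpus; in practice this would be the comment dataset.
comments = [
    "gay marriage should be legal",
    "he is gay and proud of it",
    "the white car sped past the house",
]

pos_seen = defaultdict(set)
for comment in comments:
    for token in nlp(comment):
        if token.pos_ in {"NOUN", "ADJ"}:
            pos_seen[token.lower_].add(token.pos_)

# Keep words observed as BOTH a noun and an adjective across the corpus.
debias_candidates = {w for w, tags in pos_seen.items() if tags == {"NOUN", "ADJ"}}
print(debias_candidates)
```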

M&T: What was the most surprising fact you learned in your research on cyberbullying?

AN: One thing that was shocking to me about cyberbullying was the sheer scale at which it occurred (both in the US and worldwide) and the number of different forms it took. Previously, I understood that it was a big issue, but I didn’t comprehend the scale at which it was happening worldwide: one study revealed that as many as seven in ten teenagers were victims of cyberbullying at some point in their lives. I also became more aware of the different forms cyberbullying takes, from actions as hurtful as blackmail videos to toxic comments on social media.

Through this, I began to realize the power of AI and how useful it could be in scanning through these massive volumes of data to discover instances of cyberbullying. I also realized that such AI algorithms could be used to correct behavior by showing the words that made a comment toxic.

M&T: Is it possible to expand this model to other languages? What would that entail?

AN: To expand this model to other languages, I would first have to source a labeled toxic comment dataset in another language and train my baseline model on it. Then, to debias it, I would apply a similar linguistic criterion (e.g., the noun-adjective criterion discussed previously) to create a set of words to “debias.” Next, I would source an external corpus in that language (for the English model, I used a corpus of text from Wikipedia) to augment the original dataset. Finally, I would retrain the model on this augmented dataset to rectify any biases.
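
As a rough illustration of that pipeline, here is a high-level Python skeleton. Every function name is a hypothetical placeholder with a toy body so the sketch runs; none of it is code from the actual project.

```python
# Hypothetical pipeline for extending the debiasing approach to a new
# language (French here, purely as an example). All functions are stubs.

def load_labeled_comments(language):
    # Step 1: a labeled toxic-comment dataset in the target language.
    return [("ceci est un commentaire", 0), ("commentaire toxique", 1)]

def find_noun_adjective_terms(comments):
    # Step 2: apply the noun-adjective criterion (see the earlier sketch).
    return {"exemple"}

def sample_neutral_sentences(corpus_name, terms):
    # Step 3: pull non-toxic sentences containing those terms from an
    # external corpus (Wikipedia played this role for the English model).
    return [f"phrase neutre avec {t}" for t in terms]

def train_classifier(dataset):
    # Stand-in for training the attention model on (text, label) pairs.
    return object()

comments = load_labeled_comments("fr")
baseline = train_classifier(comments)
terms = find_noun_adjective_terms(comments)               # words to "debias"
neutral = sample_neutral_sentences("frwiki", terms)
augmented = comments + [(text, 0) for text in neutral]    # label 0 = non-toxic
debiased = train_classifier(augmented)                    # Step 4: retrain
```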

M&T: You recently won the Regeneron prize, congratulations! How did that award come about, and how do you plan to make use of your prize money?

AN: After submitting my project to the Regeneron STS competition and being selected as one of 40 finalists, I participated in the Regeneron Finalist Week Competition, which was held entirely online. This involved 8 rounds of judging: 5 tested my general science knowledge (anything from computer science and math to biology) and problem-solving ability, and 3 were about my project itself.

Initially, the Regeneron STS Finalist Week Competition was scheduled to be in person, but due to the pandemic, the event, including the entire judging process and awards ceremony, was held online this year. Although it wasn’t what I had originally hoped for, I’m still very glad that I got to participate in the Finalist Week competition and meet so many other amazing finalists. I was ecstatic to hear that I’d won the 10th-place award, which included $40,000 in scholarships. I plan to use this prize money for my education here at Penn.

M&T: What is your end goal for this project?

AN: My end goal for this project is not only to improve the accuracy of the debiasing process and address some of the shortcomings of the existing model, but also to use the model in a social media or chat setting, where it could detect toxic comments and give live feedback. My initial attempt at this was detoxifAI, an AI-powered anti-cyberbullying Chrome extension that detects toxic comments in Google Docs and gives live feedback. I found that in elementary and middle school, students often posted toxic comments in Google Docs but quickly erased them, so teachers would often miss these comments. I designed detoxifAI as a more interactive, feedback-oriented solution that would encourage better behavior at a young age. In the longer term, I hope to implement such explainable AI across social media platforms to help improve behavior online.