Using Natural Language Processing (NLP) to Implement Sentiment Analysis and Keyword Extraction on Yale Course Evaluations
In this project, we collect Yale course evaluations from the student-run website CourseTable. We then apply two natural language processing (NLP) techniques to these text reviews: sentiment analysis and keyword extraction. The overall goal is to identify NLP tools that can quickly and effectively summarize key course information for the Yale community. In principle, sentiment analysis can be used to estimate the percentage of students who recommend a course, while keyword extraction can be used to identify the key skills, strengths, weaknesses, and areas for improvement of Yale courses. For sentiment analysis, we evaluate several pretrained models and train new models on a manually labeled dataset. The pretrained models prove extremely ineffective, whereas the newly trained machine learning models (random forest, logistic regression, support vector machine, and neural network) perform quite well. Support vector machine (SVM) proves to be the most robust model, achieving F1 scores of 85.9% and 77.6% on the three-class and five-class datasets, respectively. To better understand SVM's effectiveness, we also discuss some of the mathematical theory behind this machine learning algorithm. We then test several pretrained keyword extraction models, all of which produce unsatisfactory results. As an alternative, we build a keyword extractor on top of the ChatGPT API and apply it to the Yale course reviews. This approach performs extremely well and is fairly cost-effective. Finally, we combine these sentiment analysis and keyword extraction methods to produce a proof-of-concept dashboard, which serves as an example of what a real application of these NLP techniques would look like for the Yale community.
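For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a multi-class F1 score can be computed. It assumes macro averaging (the unweighted mean of per-class F1 scores); the abstract does not state which averaging scheme the project uses. The function name `macro_f1` and the example labels are hypothetical and made up purely for illustration; they are not project data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores.

    Assumption: macro averaging. The project may use a different scheme
    (for example, weighted or micro averaging).
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class was wrong
            fn[truth] += 1      # true class was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical three-class labels, invented for illustration only.
y_true = ["pos", "pos", "neu", "neg", "neg", "pos"]
y_pred = ["pos", "neu", "neu", "neg", "pos", "pos"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Macro averaging treats every sentiment class equally regardless of how many reviews fall into it, which matters if the classes are imbalanced; a weighted average would instead favor the most common class.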