LLM Based Linguistic Heuristics for Digital Forensics and Incident Response

Author: 
Cody Wyatt Neiman
Adviser(s): 
Timothy Barron
Abstract: 

The integration of Large Language Models (LLMs) into digital forensics and incident response (DFIR) marks a shift away from traditional methodologies, which depend largely on manual analysis and basic automation. This project presents TensorGuard, a digital forensics tool that uses the linguistic capabilities of LLMs to improve the processing, analysis, and reporting of digital evidence in cybersecurity incidents.

TensorGuard consists of four main components: the Collector, Extractor, Integrator, and Reporter. The Collector gathers digital artifacts from live systems or forensic images. The Extractor processes these artifacts by decrypting, decompressing, and parsing them into structured data. The Integrator then applies contextual windowing to divide that data into meaningful segments, so that the LLM can base its analysis on the contextual relationships among data points. Finally, the Reporter presents the findings in an intuitive, interactive format that helps analysts understand and communicate the implications of the incident.
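The contextual windowing step performed by the Integrator can be illustrated with a minimal sketch. This is not TensorGuard's actual implementation: the function names `context_windows` and `build_prompt`, the window size, the overlap, and the prompt wording are all assumptions chosen for illustration. The idea shown is only that parsed records are grouped into overlapping segments so that each segment handed to an LLM carries some of its neighbors' context.

```python
from typing import Iterator, List


def context_windows(records: List[str], size: int = 4, overlap: int = 1) -> Iterator[List[str]]:
    """Yield overlapping slices of `records` so adjacent windows share context.

    Hypothetical helper: `size` and `overlap` are illustrative defaults,
    not values taken from TensorGuard.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    for start in range(0, len(records), step):
        yield records[start:start + size]
        # Stop once the current window has reached the end of the data.
        if start + size >= len(records):
            break


def build_prompt(window: List[str]) -> str:
    """Format one window as text for an LLM (prompt wording is an assumption)."""
    body = "\n".join(f"{i + 1}. {line}" for i, line in enumerate(window))
    return "Review these related events for suspicious activity:\n" + body


if __name__ == "__main__":
    # Made-up example records standing in for Extractor output.
    events = [f"event {n}" for n in range(6)]
    for window in context_windows(events):
        print(build_prompt(window))
        print()
```

The overlap means an event near a window boundary appears alongside both its predecessors and its successors, at the cost of analyzing some records twice.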

Using LLMs allows TensorGuard to apply advanced linguistic heuristics directly to the forensic data, bypassing traditional numerical conversions and improving both the speed and the accuracy of the analysis. Initial trials demonstrated that TensorGuard can identify and report suspicious activities accurately, although they also highlighted areas for improvement, such as reducing false positives and refining the model’s sensitivity through parameter adjustment and fine-tuning.

Future work for TensorGuard includes expanding its applicability to other operating systems, enhancing user input features for more tailored analyses, and exploring potential live response capabilities. Additionally, integrating reinforcement learning and continuous feedback mechanisms could further refine the tool’s accuracy and user experience. By integrating LLMs into cybersecurity workflows, TensorGuard challenges current DFIR practices and promises to significantly improve the efficiency and effectiveness of incident response operations.

Term: 
Spring 2024