QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture

arXiv:2501.01892v1 Announce Type: cross
Abstract: We introduce QuArch, a dataset of 1500 human-validated question-answer pairs designed to evaluate and enhance language models’ understanding of computer architecture. The dataset covers areas including processor design, memory systems, and performance optimization. Our analysis highlights a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. We observe notable struggles in memory systems, interconnection networks, and benchmarking. Fine-tuning with QuArch improves small model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and leaderboard are at https://harvard-edge.github.io/QuArch/.

By stp2y
