24 May
Some friends and I started a weekly paper club to read and discuss fundamental papers in language modeling. By pooling together our shared knowledge, experience, and questions, we learned more as a group than we could have individually. To encourage others to do the same, here’s the list of papers we covered, with a one-sentence summary for each. I’ll update this list with new papers as we discuss them. (Also, why and how to read papers.)

- **Attention Is All You Need:** Query, Key, and Value are all you need* (*also position embeddings, multiple heads, feed-forward layers, skip-connections, etc.)
- **GPT:**…
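As a rough illustration of the first paper's core idea, here is a minimal NumPy sketch of scaled dot-product attention; the function name and toy shapes are my own choices for illustration, not from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight the values V by the similarity between queries Q and keys K."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_q, n_k) similarity scores
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # (n_q, d_v) attention output

# Toy example: 3 tokens, model dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The full Transformer wraps this in multiple heads, adds position embeddings, and interleaves it with feed-forward layers and skip-connections, but this weighted sum is the heart of it.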