Porting microgpt to Futhark, Part I
The author explores porting Andrej Karpathy's microgpt, a minimal GPT-2-like implementation in Python, to the data-parallel language Futhark to improve scalability. This first part focuses on translating the forward pass while maintaining structural similarity to the original code. While the Futhark version scales better, it sacrifices some conciseness compared to the Python version.
Opening excerpt (first ~120 words)
I have been wanting to find a project to try out the data-parallel language Futhark. Its developers have a very good blog that I've been following for years, but I've never actually written anything in the language. Andrej Karpathy's microgpt, a self-contained implementation of a GPT-2-like neural network in 200 lines of Python, finally provided the excuse. I like microgpt, but it does not scale at all. Obviously the point of this implementation is not efficiency, but it's not just that it's slow: you also can't scale up to even slightly larger networks, because you quickly hit Python recursion depth errors. So, I was curious whether I could port it as 1-to-1 as possible and get much better scaling without losing too much concision.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Kmjn.
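The excerpt's mention of Python recursion depth errors is worth illustrating. A minimal sketch (not microgpt's actual code; the `Value` class and its methods here are hypothetical stand-ins for a micrograd-style scalar autograd engine) shows how a recursively defined backward pass over a deep computation graph exceeds Python's default recursion limit once the network, and hence the graph, grows:

```python
import sys

# Hypothetical scalar autograd node, in the style of micrograd/microgpt.
# backward() recurses into parent nodes, so a chain of N operations
# needs roughly N stack frames.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents

    def __add__(self, other):
        return Value(self.data + other.data, parents=(self, other))

    def backward(self, grad=1.0):
        self.grad += grad
        for p in self.parents:  # recursive graph traversal
            p.backward(grad)

# Build an operation chain deeper than the default recursion limit (~1000).
out = Value(1.0)
for _ in range(sys.getrecursionlimit() + 100):
    out = out + Value(0.0)

try:
    out.backward()
except RecursionError:
    print("RecursionError: graph too deep for recursive backward")
```

An iterative topological sort (or a data-parallel formulation, as in the Futhark port) avoids the problem, since graph depth no longer translates into call-stack depth.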