Compiling From C
In which our intrepid protagonist attempts to take words and turn them into a functioning chunk of ones and zeros that does something on his computer
I was scrolling through Twitter, which remains a source of fascinating learning for me, especially about AI stuff, and came across this tweet. The author is Victor Taelin, who is working on a project called HVM (GitHub repo here), described as “a compiler and evaluator for high-level languages that automatically achieves near-ideal speedup, up to 1000+ threads.”
Pretty advanced stuff and I really don’t understand much of it, but from what I gather he’s using a model of concurrent computation called interaction combinators to let programs run in parallel. This is important because instead of things running serially, working down a bunch of code on a CPU, a program can run simultaneously on the thousands of nodes in a GPU. I tried to read the paper it’s based on, Yves Lafont’s 1997 “Interaction Combinators,” but wow did my comprehension come nowhere close to grasping it. Theoretical computer science is … challenging.
Anyway, Victor had this bit of code in a gist that he was asking folks on the internet to run for him, to see if the new M4 chips from Apple were worth investigating. Since his code is pure C, it should run just fine, and it even includes a little thing at the end to spit out the results. It takes a fixed number of interactions and runs them over a fixed number of nodes, and the faster it finishes, the better: more interactions per second is the score.
All of this is an excuse to exercise my new Mac mini with an M4 Pro and 64 gigabytes of unified RAM, shared with the twenty GPU cores. The memory bandwidth is substantially higher, at 273 GB/s, and I wanted to see how it compared to some of the other systems running Nvidia GPUs, plus another person’s M2 Ultra Mac Studio.
I used Claude to help me with some of the background and it was surprisingly easy. I’ve seen coders mention “compiling from source” and how Real Nerds never just download a binary; they get the source code from GitHub (or SourceForge, back in the day) and compile it themselves. And from what I gather, a considerable portion of a developer’s day back in the olden times (like 2004) was waiting for the code to recompile after making an edit or two to the Java code.
It was really quite simple. I copy-pasted the gist into a plain text document on my computer and saved it with a .c extension, which made the text all colorful because I have syntax highlighting turned on in TextMate. Then I went to the command line, verified that I had Clang installed (not sure why, but I did, and it was recent), and typed “clang hvm3.c -o hvm”. I had no idea what the -o flag was doing, but that’s what Claude said so I did it. [turns out -o just names the output file; optimization is a separate, capital -O flag with levels like -O1, -O2, and -O3 that make various tradeoffs. I stuck with the plain build, which leaves optimization off.]
A second or two later, possibly less, it was done. I did a quick ls -lah and saw that, yep, there’s now a binary called “hvm” sitting right there in my home directory.
I had no idea how to run the thing, but another trip to Claude revealed the oh-so-difficult command of “./hvm”. And do you know what, dear reader? IT WORKED! First time! Amazing.
Here’s the fun part: I ran it ten times in a row, pressing the up arrow key to recall the last command and hitting enter, so I got ten runs of the program, each with slightly different times. I then took a screenshot, dumped it into Claude, and asked it to pull out the interactions-per-second figures, do a min-max-mean on them, and report the results.
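The min/max/mean step is simple enough to do right in the shell, too. Here’s a sketch with made-up numbers standing in for the ten interactions-per-second readings (the real values came out of the program’s output, not these):

```shell
# Sketch: summarize ten benchmark readings with awk.
# The numbers are invented stand-ins for the real interactions/sec figures.
runs="401 395 410 388 402 399 405 397 400 403"
stats=$(printf '%s\n' $runs | awk '
  NR == 1 { min = $1; max = $1 }
  { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
  END { printf "min=%d max=%d mean=%.1f", min, max, sum/NR }')
echo "$stats"
```

Of course, asking Claude to read the screenshot is less typing. But it’s nice to know the fallback is one awk one-liner away.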
I love using these multi-billion dollar LLMs for mundane tasks, like copying and pasting and performing rudimentary math. In my day job, I had Claude pull the email address for the first lawyer for each defendant out of a 20+ page PDF and then format it so I could copy-paste into Outlook. Took maybe forty seconds. Me highlighting and copying it over would’ve been easily fifteen minutes of tedium, and I might have gotten something wrong. I spent a minute double-checking, of course, but that was no problem at all, and it got everything right the first time.
I also ran the same program on my MacBook Air with an M2 chip and on my Ubuntu desktop with an RTX 3080 card.
(oh and one more thing: the cute developer bear cartoon is courtesy of xAI in the Twitter app, it took me four or five iterations to get it right but I think it’s a nice little addition to the essay.)