Engineer 'builds a GPU from scratch' in two weeks

https://www.tomshardware.com/pc-components/gpus/engineer-builds-a-gpu-from-scratch-in-two-weeks-process-much-harder-than-he-expected

An engineer has shared his journey in “building a GPU from scratch with no prior experience.” As with his prior project of designing a CPU from scratch, Adam Majmudar took just two weeks to complete this cerebral feat. In a Twitter/X thread, Majmudar takes us through the process step by step and admits that designing a GPU was a much harder task than expected. To be clear, the project currently concludes with a chip design written in Verilog, which was then passed through the OpenLane EDA software to verify it. However, the GPU is going to be submitted for tapeout via Tiny Tapeout 7, so it is destined to become a physical chip in the coming months.

Majmudar shared a diagram of the flow of tasks he worked through to design his GPU. Yet, as a ‘from scratch’ project, a lot of study and thought was required even before the first step was tentatively taken. Last time, we highlighted the engineer’s concern that GPUs would be a relatively difficult field of study due to the dominance of proprietary tech, and that prediction came true.

Through several iterations of his architecture, Majmudar decided to focus on general-purpose parallel computing (GPGPU) capabilities. He adjusted his Instruction Set Architecture (ISA), which features just 11 instructions, to achieve this goal. Next, the engineer wrote two matrix math kernels to run on his GPU. These matrix addition and multiplication kernels would demonstrate the key functionality of the GPU and provide evidence of its useful application in graphics and machine learning tasks.
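To give a flavor of what such kernels do, here is a minimal sketch in Python of the "one thread per output element" pattern that GPGPU matrix kernels typically follow. It is not Majmudar's code and does not use his 11-instruction ISA; the names `launch`, `matadd_kernel`, and `matmul_kernel` are hypothetical, and the threads run one after another here rather than in parallel.

```python
# Illustrative sketch only: models the "one thread per output element"
# pattern in plain Python. None of these names come from the tiny-gpu project.

def launch(kernel, num_blocks, threads_per_block, *args):
    """Run `kernel` once per (block, thread) pair. Real hardware would run
    these in parallel; this loop runs them sequentially."""
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, threads_per_block, thread_idx, *args)

def matadd_kernel(block_idx, block_dim, thread_idx, a, b, out):
    # Each thread derives a unique global index and handles one element.
    i = block_idx * block_dim + thread_idx
    if i < len(out):  # guard threads that fall past the end of the data
        out[i] = a[i] + b[i]

def matmul_kernel(block_idx, block_dim, thread_idx, a, b, out, n):
    # Each thread computes one element of the n x n product (row-major).
    i = block_idx * block_dim + thread_idx
    if i < n * n:
        row, col = divmod(i, n)
        out[i] = sum(a[row * n + k] * b[k * n + col] for k in range(n))

def matadd(a, b, threads_per_block=4):
    out = [0] * len(a)
    num_blocks = (len(a) + threads_per_block - 1) // threads_per_block
    launch(matadd_kernel, num_blocks, threads_per_block, a, b, out)
    return out

def matmul(a, b, n, threads_per_block=4):
    out = [0] * (n * n)
    num_blocks = (n * n + threads_per_block - 1) // threads_per_block
    launch(matmul_kernel, num_blocks, threads_per_block, a, b, out, n)
    return out
```

Matrices are stored as flat row-major lists, so `matmul([1, 2, 3, 4], [5, 6, 7, 8], 2)` multiplies two 2x2 matrices. Because every thread writes a different output element, the threads never need to coordinate with each other, which is what makes these kernels a natural first demonstration for a parallel design.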

It had been relatively easy for the engineer so far, but building his GPU in Verilog presented “many issues.” Advice from the (in)famous George Hotz helped Majmudar move past his first (and second) hurdles, which concerned memory and a warp-scheduler implementation. A third rewrite of his code did the trick, though, fixing compute core execution scheduling.

After some further, unspecified redesigns came the proof of the pudding: a video showing the matrix addition kernel running and validating, shared in the Twitter/X thread.

Lastly, the completed Verilog design was passed through OpenLane EDA, targeting the SkyWater 130nm process node (for Tiny Tapeout). Again, some issues needed to be ironed out. In particular, Majmudar explains that some Design Rule Checks (DRCs) failed and necessitated rework.

After the two-week effort, the engineer enjoyed playing with a cool 3D visualization of his GPU design. That will have to suffice until TT7 returns silicon to participants. Of course, the work isn't going to rank among the best graphics cards. If you want to read more about this homemade GPU, check out the entertaining social media thread and/or investigate the dedicated Tiny-GPU GitHub page.


{
  "by": "blcArmadillo",
  "descendants": 25,
  "id": 40219510,
  "kids": [
    40219702,
    40219892,
    40220314,
    40220326,
    40239871,
    40221721
  ],
  "score": 71,
  "time": 1714537745,
  "title": "Engineer 'builds a GPU from scratch' in two weeks",
  "type": "story",
  "url": "https://www.tomshardware.com/pc-components/gpus/engineer-builds-a-gpu-from-scratch-in-two-weeks-process-much-harder-than-he-expected"
}

{
  "author": "Mark Tyson",
  "date": "2024-04-29T13:10:52.000Z",
  "description": "The ‘CPU from scratch’ guy has done it again.",
  "image": "https://cdn.mos.cms.futurecdn.net/X67X6qB6M6RSurcZV9eTgT-1280-80.jpg",
  "logo": null,
  "publisher": "Tom's Hardware",
  "title": "Engineer ‘builds a GPU from scratch’ in two weeks — process much harder than he expected",
  "url": "https://www.tomshardware.com/pc-components/gpus/engineer-builds-a-gpu-from-scratch-in-two-weeks-process-much-harder-than-he-expected"
}
