A Journey into Local AI Exploration
Part 1

The tech world has seemingly embraced the current iteration of AI/Large Language Models (LLMs). Numerous closed and open-source models are everywhere, as are the methods to run them. Whether you want to run models on GPUs or CPUs, NVIDIA or AMD, Windows or Linux, Cloud or local, the options abound. The choices are plentiful, but they are not equal. An oversimplified general comparison follows.
I’m a homelabber and a self-hoster; I like being able to run things on my own. The Cloud is more powerful and has more hardware than I could ever hope to own, but I truly don’t find it cost effective for me. There is a balance for everything and everyone. I have both the ability and the desire to take care of IT at home, fix problems, and source hardware, so I choose to spend my time in lieu of spending money on the Cloud. When the interest arose to start playing with LLMs, Stable Diffusion, and other AI models, I knew the hardware I had at home wasn’t optimized for AI, so any learning would be slow, if models even ran at all. What kind of Cloud options are available? In short, tons… Many advertisements lead you to find that you can rent VMs, Cloud GPUs, or complete solutions for anywhere from roughly $0.50 to $3.00+ per hour. Ok, not bad, but let’s break it down into some more tangible examples.
Say you want to run a business workday’s worth of services 5 days a week, all month. That’s 40 hours a week at 4 weeks a month on average, or 160 hours for a given month. Based on the pricing estimates above, that’s $80-$480+ per month. That gets expensive quickly. Now factor in all the other micro-transactions involved in the Cloud (service fees, data egress, storage charges, perhaps backups, maybe some support costs, etc.) and that might easily tack on another $20, $60, or even $100 per month. If this is for a business and/or you’re generating money and these costs are justified, that doesn’t feel bad. But if this is just a hobby or a learn-and-burn environment, maybe not so much. When I dive into something to learn, grow, and play, I don’t want to feel restricted by the hour. Imagine running models overnight daily and your hours (and cost) suddenly double!

What about data security, ownership, and availability? These aspects are outside of your control if you use a cloud-hosted private AI service, and thrown out the window entirely if you use public AI tools. Maybe I’m a bit over-protective of my data and information, but I’d rather keep it contained when possible. Thoughts like these led me to want to run some AI models at home. My reasons may differ from the next person’s, but all options are valid. I’m not knocking Cloud options; if anything, I’m grateful that they exist. Ease of access greatly increases the odds of widespread adoption: if only 1% can afford something, it doesn’t really take off. Yachts are crazy expensive and a low percentage of people have them; we know they exist, but many of us will never even see one. Cars are (relatively) more affordable and accessible, and they’re all over the place.
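Before moving on, here is that monthly cloud estimate as a tiny Python sketch, in case you want to plug in your own rates. The hourly range and fee guesses are just the rough figures from above, not quotes from any specific provider:

```python
# Back-of-the-envelope monthly cloud cost using the rough rates above.
hours_per_month = 40 * 4           # 40 hrs/week x 4 weeks
rate_low, rate_high = 0.50, 3.00   # $/hr, rough cloud GPU rental range
fees_low, fees_high = 20, 100      # $/month misc fees (egress, storage, etc.)

print(f"Low end:  ${hours_per_month * rate_low + fees_low:.0f}/month")
print(f"High end: ${hours_per_month * rate_high + fees_high:.0f}/month")
# -> Low end:  $100/month
# -> High end: $580/month
```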
So back to my aspiration of running AI locally… I decided I want to build something new that gives me some flexibility without spending $10K, but is still effective and gets me something real. I chose to document the whole process, list my build, and show every major step along the way, to possibly help others and share what I learn. First step, the build!
I am typically one to buy used equipment where it makes sense, focusing on enterprise gear. But given how radically (and quickly) the playing field has evolved recently, used gear felt like it wouldn’t get me much or save much money, so aside from the case (and related case accessories) this is a 100% new build.
In my current homelab, nearly my entire server rack is occupied and the power available to that room is consumed near its safe limit. I don’t want to add another rackmount server; I want to build something in a desktop form factor and put it in a different room, with the intention of connecting to it remotely. I will likely add a JetKVM to the mix at some point, but I will start out with a LapDock (exact model here) as my initial video and peripheral interface. I want a powerful CPU that can handle the workloads but doesn’t require extreme cooling or insane amounts of power. I want 128GB to 256GB of DDR5 RAM to ensure this is not a bottleneck, and I would like to run multiple AI-grade GPUs.
The overall thought here is that I plan to run a bare-metal hypervisor on the machine, likely Proxmox, and run a few VMs at a time, passing a GPU through directly to each VM in order to run, test, and learn multiple workloads and workflows at the same time. The VM route will also let me switch between systems and OSes more easily without rebuilding. Say I wanted to test 2 different Windows builds and 3 Linux builds overall: I could spin up 2 VMs at a time, assigning each a GPU directly, and when I want to switch to another build, I shut down the VM, de-allocate the GPU, reallocate it to another VM, and power that one on. I would like this build to support at least 2 powerful AI-centric VMs running simultaneously. I have other servers to handle anything outside of this, so I’m not really interested in anything lightweight or container-based unless it directly applies to my AI goals. My target is a desktop build with manageable power and cooling requirements (say 800W or less) and remote connectivity, so it can sit in another room and be what I need it to be.
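To make that GPU shuffle concrete, here is a minimal Python sketch of the kind of swap script I have in mind, wrapping Proxmox’s stock `qm` CLI. The VM IDs and PCI address are hypothetical placeholders (check yours with `lspci` and `qm config <vmid>`), and this assumes IOMMU/passthrough is already configured on the host:

```python
#!/usr/bin/env python3
"""Minimal sketch: move a passed-through GPU between two Proxmox VMs.

The VM IDs and PCI address below are hypothetical placeholders; find
yours with `lspci` and `qm config <vmid>`. Run on the Proxmox host.
"""
import subprocess

GPU_PCI = "0000:01:00"      # placeholder PCI address for the GPU
OLD_VM, NEW_VM = 101, 102   # placeholder VM IDs

def qm(*args):
    """Run a Proxmox qm command and raise if it fails."""
    subprocess.run(["qm", *map(str, args)], check=True)

# Shut down the VM currently holding the GPU and wait until it stops,
# de-allocate the card, attach it to the other VM, then boot that one.
qm("shutdown", OLD_VM)
qm("wait", OLD_VM)
qm("set", OLD_VM, "--delete", "hostpci0")
qm("set", NEW_VM, "--hostpci0", f"{GPU_PCI},pcie=1")
qm("start", NEW_VM)
```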
Starting at the base, I wanted a motherboard made with generative AI in mind: PCI-E 5.0 or 4.0, DDR5 support, good onboard networking (wired is fine, Wi-Fi is a plus), and enough connectivity that I need no add-in cards other than the GPUs. I wanted enough M.2 NVMe slots and SATA ports to accommodate any storage I might want to add, and a minimum of 2 (preferably 3) PCI-E x16 slots. Since this isn’t a high-end AMD EPYC or similar server build with some 128 PCI-E lanes, I am well aware that at best I’d get 8 lanes per slot when using 2 or 3 at once.
I’m normally an AMD fan, but I didn’t want to limit myself to just one team, so I looked at both sides. I ended up choosing the ASUS ProArt Z890-Creator WIFI motherboard (vendor site)(Amazon). This board ticks every box I had. I would encourage you to read up on the details of the board yourself if interested, but here are the highlights:

Aside from the GPUs, this was the most expensive piece, costing me $490 (pre-tax) from Amazon. On paper, it meets every requirement I have and more. The reviews were largely positive, so my confidence in selecting this board was high.
I started with the motherboard and selected a CPU based off of that. Given the board is LGA 1851, I could go with Meteor Lake or Arrow Lake CPUs, also known as Core Ultra Series 1 and Core Ultra Series 2 respectively. I wanted the newest, so I searched Arrow Lake processors.
Following a handy chart from Wikipedia (link) as seen below, I compared my options.
Since I want to run a minimum of 2 VMs, in my mind I’m basically taking the core counts here and halving them. I also want good clock speeds and onboard video, so anything ending in F or T is out. That basically left me with the Core Ultra 5 245K, the Core Ultra 7 265K, or the Core Ultra 9 285K. I really wanted more cores, which ruled out the Ultra 5, so the battle was between the Ultra 7 and the Ultra 9. The Ultra 9 has a bit more cache, but it mostly boils down to 16 E-cores vs 12 E-cores… for double the price. As soon as I looked at the listings for both, I easily went with the Core Ultra 7 265K (Intel ARK)(Amazon) for $259 at time of purchase from Amazon.
This gives me 20 cores/20 threads to work with: 8 Performance and 12 Efficiency. The E-cores aren’t even all that much slower than the P-cores, and Proxmox will handle core assignments to the VMs with no issue, so I don’t see any negatives here.
By all accounts I can find online, I should be able to air-cool this CPU. I have no issue with watercoolers/AIOs, but they do eventually need to be replaced, and sometimes there are issues with the pumps, etc. Air cooling is less overall maintenance if it can adequately cool the CPU. For this I went with the be quiet! Pure Rock Pro 3 Black CPU air cooler: a dual-tower, dual-fan cooler with 6 heatpipes. Overall the reviews are very positive, and according to the vendor website it should fit the motherboard I chose, so I went with it. At the time of purchase, this was $60 from Amazon (vendor site)(Amazon).

The motherboard maxes out at 256GB of RAM with 4x 64GB DIMMs, and I would like to eventually reach that maximum. Planning towards that, I wanted to start with 128GB now and expand later, so I opted for 2x 64GB DIMMs over 4x 32GB DIMMs; this way I can add later instead of replacing.
I went with a Crucial Pro 128GB kit, DDR5 5600 MT/s (the spec formerly quoted in MHz), from Amazon (Amazon) for $299 at time of purchase. As testing and growth go on, I plan on purchasing a second kit if/when the need arises, but for now 128GB shall be my starting point.
For the graphics cards, I wanted something capable of running AI workloads without an intense power draw. I understand that higher-end cards perform better, but they cost more in terms of both price and power. I decided on a couple of cards from the RTX Ada generation, specifically the RTX 4000 Ada and the RTX 2000 Ada.
The RTX 4000 Ada is a single-slot PCI-E 4.0 x16 card with a power draw of only 130W. According to reports online, it is roughly equivalent to the RTX 4060 Ti in terms of performance, but with better power efficiency and significantly more VRAM (20GB vs 8GB). This one was the most expensive component, clocking in at $1425 at time of purchase on Amazon (Amazon).
The RTX 2000 Ada is a dual-slot PCI-E 4.0 x16 card with no external power required; it runs on the 75W it can draw from the PCI-E slot alone. According to reports online, it is roughly equivalent to the RTX 3060 Ti in terms of performance, but with much better power efficiency, more VRAM (16GB vs 8GB), and the benefits of the Ada-generation GPUs. This was the second most expensive component in the build, coming in at $699 at time of purchase (Amazon).
Now, I could have gotten a single RTX 5000 Ada instead, but it costs almost triple the RTX 4000 Ada, coming in at $4100; it has 32GB of VRAM, but also a 250W power draw (more than my two cards combined). It only offers a 40-65% improvement over the 4000, and being a single card, I’d be limited to just one workflow at a time. I’m happy with my dual-GPU purchase at half the cost of that next step up.
Also, it seems I forgot to take pics of the video cards prior to installation, so nothing for this section…
Now, I plan on running a hypervisor and a few local VMs, so I wanted some fast local storage. To start, I’m going with a Crucial P3 Plus 500GB PCI-E Gen 4 NVMe SSD ($43 at time of purchase)(Amazon) for my main OS and local datastore, and a Crucial P3 Plus 1TB PCI-E Gen 4 NVMe SSD ($57 at time of purchase)(Amazon) for the VMs. I figure this will be sufficient storage and speed for the initial few runs.
I do have a 4-bay hot-swappable 2.5” enclosure in this case for some SATA SSDs. The motherboard has enough ports that I plan on wiring this up at some point down the road, potentially buying up to four 1TB or 2TB SSDs for a ZFS pool for additional local storage. If I ever find I need more than that, I’ll just use network storage I have available at home. If you don’t have that available, one or two large HDDs would be your best bet.
I wanted a brand new, fully modular PSU to go with the new build. Given the components listed above, I went with an 850W 80 Plus Gold power supply from MSI that includes one of the new 12VHPWR GPU connectors: the MSI MPG A850G, which clocked in at $165 on Amazon at time of purchase (vendor site)(Amazon). 850W should be more than sufficient to power the motherboard, SSDs, CPU, and GPUs. Many of the PSU calculators on the web suggest 565W as the minimum, so this should give me plenty of overhead.
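To sanity-check that 565W figure, here is a rough component-level power budget as a Python sketch; the CPU and GPU numbers are TDP-class figures from the spec sheets above, while the motherboard/drive/fan numbers are my own guesses:

```python
# Rough worst-case power budget in watts; CPU/GPU numbers are
# spec-sheet TDP-class figures, the rest are guesses.
budget = {
    "Core Ultra 7 265K (max turbo)": 250,
    "RTX 4000 Ada":                  130,
    "RTX 2000 Ada":                   75,  # slot-powered
    "Motherboard + RAM (guess)":      60,
    "SSDs + fans (guess)":            25,
}
total = sum(budget.values())
print(f"Estimated peak draw: ~{total}W of 850W ({total / 850:.0%} load)")
# -> Estimated peak draw: ~540W of 850W (64% load)
```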
Fully modular is my PSU design of choice: you include the cables you need and leave out the ones you don’t; nice and neat.
So while all the main components are new, a few pieces are reused from previous builds, which helps reduce my cost but might add to yours if you’re trying something similar.
The case I am using is my old personal desktop case: a RAIDMAX Cobra Z that I purchased in 2015. It’s an ATX mid-tower with decent cooling/fan options. I can’t find an exact link, but this old NewEgg product page is the closest I could find (NewEgg item). It has adequate room for the motherboard, CPU cooler, GPUs, and any drives I might want to add (a DVD drive to feel nostalgic, perhaps?).
Along with the case, I have reused some case fans. There are two or three standard 120mm fans in normal mounting points, and two 140mm fans attached with twist ties for more directed airflow towards the GPUs.
All the new components listed above came out to $3500 before tax, plus $250 in tax (free shipping on all of it), for a total of roughly $3750. To be fair, that’s less than the cost of the next tier of GPU in the workstation RTX Ada line (the RTX 5000 Ada) by itself.
Now how does this compare to running in the cloud? At a cloud GPU rate of roughly $1/hr (the cheaper end of the range described earlier), $3750 buys the equivalent of 156.25 days of non-stop usage, a little over 5 months of continuous runtime. So you can see how, with the right usage, this build could be more cost effective in the long run (as in 6 months to a year, or longer). If we estimate a 50% PSU load (about 420W) 24 hours a day as a long-term average (hopefully less), that works out to roughly 10kWh per day. At an estimated $0.13 per kWh, that’s roughly $1.30 per day, about $40 per month, or $475 per year in electricity. Now take that $1/hr estimate at 10 hours a day, 5 days a week (200 hours per month), add $40 in miscellaneous Cloud fees, and you’d run up $240 per month.
With the local build costing you $40 per month, the difference is $200 per month. So how long would it take to break even? The $3750 investment divided by the $200-per-month difference comes to 18.75 months, so just over a year and a half to break even and then pull ahead. Let’s tweak the parameters a bit. Most cloud providers estimate monthly costs at 720 hours in a given month, so say you run 50% of a month (360 hours) on a slightly beefier VM/GPU at $2 per hour. Keeping the same $40 in misc cloud fees, you’re now looking at (360 x $2) + $40, or $760 per month. With a difference of $720 per month ($760 – $40), it takes only 5 months to rack up the same cost as the build. It’s all about what you use, how you balance it, and what’s important to you. This build is a tangible thing that will retain at least some value, can be resold, and can be reused for other purposes… at least that’s how I sold it to myself 😅
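If you’d like to re-run that break-even math with your own numbers, here it is as a small Python function (all the dollar figures are the estimates from this post, so swap in your own):

```python
def breakeven_months(build_cost, cloud_monthly, local_monthly):
    """Months until the up-front build cost beats the ongoing cloud bill."""
    return build_cost / (cloud_monthly - local_monthly)

BUILD = 3750   # total build cost from above
LOCAL = 40     # estimated monthly electricity at ~420W average draw

# Scenario 1: $1/hr for 200 hrs/month, plus $40 misc cloud fees
print(breakeven_months(BUILD, 200 * 1 + 40, LOCAL))   # -> 18.75

# Scenario 2: $2/hr for 360 hrs/month (half of 720), plus $40 fees
print(breakeven_months(BUILD, 360 * 2 + 40, LOCAL))   # -> ~5.2
```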
Anyway, this is what I put together and my rough overall idea. Hopefully it all works out how I’m anticipating! The next entry will be the actual build/assembly and getting the project moving.