S Tier Models: GLM and Hermes 👀👀🔥🥰
Three months ago, I realized I was hopelessly dependent on corporations that only care about power, money, and control. By that point Cursor, Claude, and OpenAI had all rug-pulled their unlimited plans. I wanted a Mac M3 Ultra with 512GB RAM. Ahmad and Pewdiepie convinced me otherwise. Here's what I learned building my own AI rig.

-----------------------------

The Build ($3K-$10K)

This is the top performance you can get below $10K:

• 4x RTX 3090s with 2x NVLink
• EPYC CPU with 128 PCIe lanes
• 256-512GB DDR4 RAM
• ASRock Rack ROMED8-2T motherboard
• Custom rack + fan cooling
• AX1600i PSU + quality risers

Cost: ~$5K in the US, ~$8K in the EU (thanks, VAT).

Performance Reality Check

More 3090s = larger models, but diminishing returns kick in fast. The next step up is 8-12 GPUs for AWQ 4-bit or BF16 GLM-4.5/4.6, but at that point you've hit the limits of consumer hardware.

----------------------------------------

Models That Work

S-Tier (The Gold Standard)
• GLM-4.5-Air: Matches Sonnet 4.0, codes flawlessly. Got it up to a steady 50 tok/s decode and ~4K tok/s prefill with vLLM.
• Hermes-70B: Tells you anything without jailbreaking.

A-Tier Workhorses
• Qwen line
• Mistral line
• GPT-OSS

B-Tier Options
• Gemma line
• Llama line

------------------------------------

The Software Stack That Actually Works

For coding/agents:
• Claude Code + Router (GLM-4.5-Air runs perfectly)
• Roo Code Orchestrator: define modes (coding, security, reviewer, researcher)

The orchestrator manages scope, spins up local LLMs with fragmented context, then synthesizes the results. You can use GPT-5 or Opus/GLM-4.6 as the orchestrator, and local models for everything else!

Scaffolding Options (Ranked)
1. vLLM: Peak performance + usability, blazing fast if the model fits
2. exllamav3: Much faster, supports all quant sizes, but poor scaffolding
3. llama.cpp: Easy to start with, good initial speeds, but degrades over long context

UI Recommendations
• LM Studio: Locked to llama.cpp, but great UX
• 3 Sparks: Apple app for local LLMs
• Jan AI: Fine, but feature-limited

-------------------------------

Bottom Line

A Mac M3 Ultra gets you 60-80% of the performance, with MLX access as a bonus. But if you want the absolute best, you need NVIDIA.

This journey taught me: real independence comes from understanding and building your own tools. If you're interested in benchmarks, I've posted a lot on my profile.
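P.S. If you're wondering why 4x 3090s caps out around ~100B-class models at 4-bit, here's a back-of-the-envelope sketch. The function name, the ~20% overhead factor for KV cache/activations, and the 106B figure for GLM-4.5-Air's total parameter count are my assumptions, not exact numbers; real fits depend on quant format, context length, and engine.

```python
# Rough VRAM-fit check for a quantized model on a multi-GPU rig.
# Rule of thumb: weight memory ~= parameter count * bytes per parameter,
# plus headroom for KV cache and activations (assumed ~20% here).

def fits_in_vram(params_b: float, bits_per_weight: float,
                 num_gpus: int, vram_per_gpu_gb: float,
                 overhead: float = 0.20) -> bool:
    """True if the model's weights (plus overhead) fit in total VRAM."""
    weight_gb = params_b * bits_per_weight / 8  # 1B params at 8-bit ~= 1 GB
    needed_gb = weight_gb * (1 + overhead)
    return needed_gb <= num_gpus * vram_per_gpu_gb

# ~106B-param model at 4-bit on 4x RTX 3090 (24 GB each, 96 GB total):
# ~53 GB of weights + overhead -> fits.
print(fits_in_vram(106, 4, 4, 24))    # True
# Same model at BF16 (16-bit) needs ~212 GB of weights -> doesn't fit.
print(fits_in_vram(106, 16, 4, 24))   # False
```

Swap in your own GPU count and quant width to see why the jump to BF16 pushes you past consumer hardware.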