Everyone in AI has been obsessed with one thing for the last two years: compute power. More GPUs. Faster training. Bigger clusters. But a South Korean chip startup called XCENA just raised $135 million betting that the whole industry has been looking at the wrong bottleneck. Memory, not compute, is what’s actually holding AI back – and they’ve built a chip to prove it.
On May 29, 2026, XCENA closed a $135 million Series B at a $570 million valuation, bringing total funding to $185 million since the company was founded in 2022. The round was led by Seoul-based VCs Atinum and IMM Investment, with participation from Corstone Asia and existing backers SBI Investment and Mirae Asset Capital. The pitch that convinced them? Inference isn’t a compute problem anymore. It’s a memory scaling problem, and one that has gone unsolved for decades.
The Memory Wall That Nobody Talks About
Here’s what’s actually happening when you chat with an LLM. Every time the model generates a token, it has to fetch attention keys and values from memory, run matrix multiplications on the GPU, then write the results back. The math part is fast. The data movement part is not. On modern inference servers, memory bandwidth saturation – not how many FLOPs you can throw at the problem – is what determines your throughput per dollar.
This is known as the “memory wall,” and it gets worse the bigger your models get. A 70-billion-parameter model serving 1,000 simultaneous requests at 4K context length needs tens of gigabytes of attention cache per layer stack. Whatever part of that cache doesn’t fit in GPU memory has to move between storage, CPU, and GPU on every single forward pass.
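To make the sizing concrete, here is a rough back-of-the-envelope sketch. The layer count, KV head count, head dimension, and FP16 storage are assumptions loosely modeled on a 70B-class model with grouped-query attention; they are not figures from XCENA or from any specific deployment.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   context_len, num_requests, bytes_per_value=2):
    """Total KV cache size: keys and values for every layer, head, and token."""
    # The leading 2 accounts for storing both a key and a value vector.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_len * num_requests


# Assumed model shape (hypothetical, 70B-class with grouped-query attention).
NUM_LAYERS = 80
total = kv_cache_bytes(num_layers=NUM_LAYERS, num_kv_heads=8, head_dim=128,
                       context_len=4096, num_requests=1000)
per_layer = total / NUM_LAYERS

print(f"total: {total / 1e12:.2f} TB, per layer: {per_layer / 1e9:.1f} GB")
```

Under these assumptions the cache comes to roughly 17 GB per layer and about 1.3 TB across all layers, far more than a single server’s GPU memory. A model without grouped-query attention would need several times more.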
GPU VRAM is fast but limited – a pair of H100s gives you 160GB of HBM3 in total, at around 3.35 TB/s of bandwidth per GPU. Sounds good until you’re doing continuous batching across hundreds of long-context requests and KV cache eviction starts killing your latency before raw compute even breaks a sweat.
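A simple bandwidth-bound estimate shows why capacity and bandwidth bite before compute does. This is a sketch under stated assumptions: FP16 weights, about 1.34 GB of KV cache per 4K-token request, weights sharded across two GPUs with ideal bandwidth scaling, and every decode step streaming all weights plus all resident KV cache once. Real systems will differ.

```python
def max_resident_requests(vram_bytes, weight_bytes, kv_bytes_per_request):
    """How many requests' KV caches fit in VRAM next to the model weights."""
    return int((vram_bytes - weight_bytes) // kv_bytes_per_request)


def max_decode_steps_per_sec(bytes_read_per_step, bandwidth_bytes_per_sec):
    """Upper bound on decode steps when each step streams this many bytes."""
    return bandwidth_bytes_per_sec / bytes_read_per_step


VRAM = 160e9                 # two 80 GB GPUs
WEIGHTS = 70e9 * 2           # 70B parameters in FP16 (assumed precision)
KV_PER_REQUEST = 1.34e9      # assumed KV cache per 4K-token request
BANDWIDTH = 2 * 3.35e12      # two GPUs, ideal scaling (assumption)

batch = max_resident_requests(VRAM, WEIGHTS, KV_PER_REQUEST)
steps = max_decode_steps_per_sec(WEIGHTS + batch * KV_PER_REQUEST, BANDWIDTH)
print(f"resident requests: {batch}")
print(f"upper bound: {steps:.1f} steps/s, {steps * batch:.0f} tokens/s total")
```

With these numbers only about 14 requests fit alongside the weights, so serving hundreds of long-context requests means evicting or offloading KV cache somewhere slower.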
XCENA’s CEO Jin Kim puts it bluntly: “CPUs and GPUs have both gotten smarter over the decades. Memory never did.” The memory controller logic sitting between DRAM and the rest of the system has been basically static since the DDR3 era while everything else in the compute stack got completely redesigned.
Meet the MX1: Compute Inside Your Memory
XCENA’s answer is a chip called the MX1, and it’s not like anything you’ve seen before. Instead of building another GPU or accelerator that plugs into a server, XCENA put the compute inside the memory itself. The MX1 is a CXL 3.2 device that attaches to the host CPU over a PCIe 6.0 link.
From the server’s perspective, it looks like a remote memory region on a separate NUMA node. But inside, it’s running thousands of custom RISC-V cores at 1.4 GHz with vector engines that handle FP32 and FP16 operations.
Here’s what makes it interesting: the MX1 packs up to 1TB of DDR5-8400 DRAM in a quad-channel configuration. The onboard RISC-V cores handle KV cache orchestration, vector database queries, prefetch scheduling, compression, and access pattern tracking – all without any data ever crossing the PCIe link back to the host. For retrieval-augmented generation workloads, that’s a massive efficiency gain because you’re not shuttling data back and forth across the slowest link in the system.
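The efficiency argument comes down to how many bytes cross the link. The sketch below compares a brute-force vector scan done on the host with one done next to the memory. The index size, the result record layout, and the roughly 128 GB/s raw one-direction figure for a PCIe 6.0 x16 link are all illustrative assumptions, not XCENA measurements.

```python
def host_scan_bytes(num_vectors, dim, bytes_per_float=4):
    """Bytes moved when every vector is shipped to the host for scoring."""
    return num_vectors * dim * bytes_per_float


def near_memory_result_bytes(top_k, id_bytes=8, score_bytes=4):
    """Bytes moved when only the top-k (id, score) pairs are returned."""
    return top_k * (id_bytes + score_bytes)


def transfer_seconds(num_bytes, link_bytes_per_sec):
    """Time to move num_bytes over a link, ignoring latency and protocol overhead."""
    return num_bytes / link_bytes_per_sec


LINK = 128e9  # assumed raw PCIe 6.0 x16 bandwidth, one direction

host = host_scan_bytes(num_vectors=10_000_000, dim=1024)   # hypothetical index
near = near_memory_result_bytes(top_k=10)

print(f"host-side scan: {host / 1e9:.2f} GB, {transfer_seconds(host, LINK):.3f} s on the link")
print(f"near-memory scan: {near} bytes returned")
```

In this toy case the host-side scan moves about 41 GB per query while the near-memory version returns 120 bytes, which is the gap the MX1’s design is aiming at.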
No Custom Drivers Required
One of the smartest decisions XCENA made was building the MX1 around CXL 3.2 (Compute Express Link). This open interconnect standard adds cache-coherent memory semantics over PCIe. On a Linux host running kernel 6.2 or later, the MX1 shows up as a standard attached memory expander. You can manage it with regular tooling, pin KV cache allocations to its NUMA node with a single command, and your inference framework doesn’t need custom drivers or kernel modules.
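On Linux, binding a process’s memory to a particular NUMA node is typically done with `numactl`. The sketch below only builds the command line. The node numbers and the `serve.py` entry point are hypothetical; on a real host, `numactl --hardware` lists the nodes that actually exist.

```python
import shlex


def numactl_command(memory_node, cpu_node, workload):
    """Build a numactl invocation that binds allocations to one NUMA node
    and threads to the CPUs of another."""
    return ["numactl",
            f"--membind={memory_node}",
            f"--cpunodebind={cpu_node}",
            *workload]


# Hypothetical layout: node 0 has the CPUs, node 2 is the CXL memory expander.
cmd = numactl_command(memory_node=2, cpu_node=0, workload=["python", "serve.py"])
print(shlex.join(cmd))
```

Note that `--membind` applies to every allocation the process makes, not just the KV cache. Placing only the cache on the expander would need finer-grained control inside the inference framework.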
That’s a huge deal. Previous attempts at processing-in-memory by Samsung (their AXDIMM) and SK Hynix (their AiM chips) never reached mass-market production for inference workloads, largely because the integration was too painful. XCENA’s founders – all veterans of Samsung and SK Hynix – believe CXL 3.2’s cache-coherent semantics finally solve that problem.
InfiniteMemory: SSD-Backed Petabyte-Scale Capacity
The MX1 doesn’t stop at DRAM. It also connects to NVMe SSDs over PCIe 6.0, creating what XCENA calls InfiniteMemory: a tiered address space where hot data lives in DDR5 and cold data spills to flash. The RISC-V management cores handle tier migration autonomously, so the host sees a single flat address space that can extend to petabyte scale. Access latency varies by tier, but the system handles placement decisions without any host involvement.
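The hot/cold idea can be illustrated with a toy two-tier store. This is a minimal sketch using least-recently-used demotion; XCENA has not published its migration policy, so the class name, the LRU choice, and promote-on-access are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a bounded hot tier with LRU demotion to a cold tier."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()   # stands in for DRAM
        self.cold = {}             # stands in for flash

    def put(self, key, value):
        self.cold.pop(key, None)            # a key lives in exactly one tier
        self.hot[key] = value
        self.hot.move_to_end(key)           # mark as most recently used
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)  # least recently used
            self.cold[old_key] = old_value

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.cold[key]              # raises KeyError if never stored
        self.put(key, value)                # promote on access
        return value


store = TieredStore(hot_capacity=2)
for k in ("a", "b", "c"):
    store.put(k, k.upper())
print(sorted(store.hot), sorted(store.cold))   # ['b', 'c'] ['a']
store.get("a")
print(sorted(store.hot), sorted(store.cold))   # ['a', 'c'] ['b']
```

The caller only ever uses `put` and `get` and never sees which tier served the request, which mirrors the single flat address space described above.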
XCENA plans to offer the MX1 in two variants:
- MX1P – Single PCIe Gen6 x16 interface, targeting production in late 2026 with revenue expected in 2027
- MX1S – Dual Gen6 x8 interface, same DDR5-8400 quad-channel memory, same CXL 3.2 support
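Both variants are designed for host platforms with PCIe Gen6 and CXL 3.x capability. Intel Xeon 6 and AMD EPYC Genoa are cited as compatible hosts, but both are PCIe Gen5 platforms (with CXL 2.0 and CXL 1.1 support respectively), so on them the device would presumably run at a reduced link speed and feature level. Older Xeon Scalable generations without CXL support won’t work.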
Does It Actually Work?
Let’s be honest about where things stand. XCENA hasn’t published independent benchmark results. The claim that one MX1-equipped server can replace the workload of ten standard machines comes from the company’s own analysis and hasn’t been verified externally. Working samples started shipping to select partners in late 2025, and the MX1 won “Most Innovative Memory Technology” at FMS 2025, but production silicon is still on the way.
CXL 3.2 adoption in production infrastructure is also genuinely early. The hyperscalers most likely to buy XCENA’s hardware at scale are still rolling out first-generation CXL 1.1 deployments. XCENA needs CXL 3.2 to reach broad hardware availability at roughly the same time their chips are ready to ship in volume – a coordination challenge that’s outside their control.
The software story needs work too. XCENA ships an LLVM-based toolchain and SDK, but getting inference workloads to efficiently offload KV cache management to the device requires application-level integration that most ML infrastructure teams haven’t had to write before. The Linux NUMA path lowers the bar significantly, but optimized usage still demands real engineering time from customers.
What This Means for the AI Industry
XCENA’s thesis – that memory bandwidth, not compute, is the binding constraint on inference economics – is sound. We’re already seeing it play out across the industry. Groq built an entire architecture around deterministic memory access. The memory chip shortage pushed Micron, SK Hynix, and Samsung past $1 trillion each in market value. And NVIDIA’s next-gen Blackwell and Rubin platforms are increasingly memory-bandwidth-limited rather than compute-limited.
If XCENA delivers on the MX1’s promises, it changes the math on inference deployment. Instead of buying more GPU servers to handle longer context windows and more concurrent users, you could slot in a memory co-processor that does the heavy lifting at a fraction of the cost. For anyone building AI products at scale – and for anyone paying the inference bills – that’s a future worth watching.
Final Verdict
XCENA raised $135 million because a room full of smart investors believes that the AI industry’s next bottleneck isn’t about making chips that compute faster. It’s about making memory that computes at all. The team has the pedigree, the technology has a clear use case, and the timing lines up with a broader market realization that throwing more GPUs at inference problems has diminishing returns.
But there’s real execution risk. Production delays, slow CXL ecosystem adoption, and the software integration burden could all push revenue well past the 2027 target. The $185 million in total funding needs to cover at least 18 more months of operations before the company brings in revenue. That’s achievable but leaves a thin margin for error.
Still, in an AI hardware market dominated by NVIDIA’s GPU hegemony, XCENA represents something genuinely different. Not another accelerator. Not another cloud service. A fundamental rethinking of where computation should happen in the first place. Sometimes the smartest innovation is asking whether the data even needs to move at all.
How I reviewed this
AI Tool Gate evaluates AI tools and AI industry updates from a developer/operator perspective. I look at practical use cases, product positioning, pricing signals, reliability concerns, and whether the tool is actually useful for real workflows.
- Use-case fit: who this is for and who should skip it.
- Practical value: what changes for developers, creators, teams, or businesses.
- Trust check: claims are compared against public product pages, announcements, docs, and observable market context when available.
Written by
Gallih Armadaw
Senior backend developer with 8+ years of experience building production systems across PHP/Laravel, Node.js, cloud infrastructure, Web3, and AI-assisted workflows. I review AI tools from a practical developer/operator perspective.