A global memory shortage is making the most AI-capable Macs nearly impossible to obtain. Memory — not just GPUs — is the defining constraint of the AI era, and the edge AI future will depend on it entirely.
In April 2026, you cannot easily buy the most AI-capable Apple computers. The bottleneck is not Apple manufacturing capacity. It is a global memory shortage driven by the AI infrastructure build-out.
Mac mini with 64GB RAM: 16–18 weeks. Mac Studio with 256GB RAM: 4–5 months. Base $599 Mac mini: sold out entirely. Not because of a lack of Apple hardware — because there is simply not enough memory in the world.
Everyone talks about Nvidia's GPUs and Google's TPUs. Om Malik argues that memory — the capacity to hold model weights, KV cache, and context — is equally fundamental. Without it, no amount of compute delivers usable AI performance.
Unified Memory Architecture places CPU, GPU, and Neural Engine on a single die sharing one high-bandwidth memory pool. This makes memory capacity — not clock speed, not core count — the primary determinant of on-device AI capability.
All compute units access the full memory pool simultaneously at maximum bandwidth: no discrete CPU and GPU memory pools, and no copying model weights across a bus before the GPU or Neural Engine can use them.
A Mac Studio with 256GB can run 70B+ parameter LLMs entirely on-device — a capability unavailable on most other hardware at any price point. Memory capacity directly translates to model size capability.
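The capacity-to-model-size link is easy to make concrete with back-of-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter, which depends on quantization. A minimal sketch (the figures are illustrative, and it ignores KV cache and runtime overhead, which add more):

```python
def model_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Rough weight-only footprint of an LLM in GB.

    Ignores KV cache, activations, and framework overhead, which
    add to the real requirement.
    """
    return params_billions * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB

# Illustrative sizes: a 70B-parameter model at common precisions.
for label, params, bpp in [
    ("70B @ FP16 ", 70, 2.0),
    ("70B @ 8-bit", 70, 1.0),
    ("70B @ 4-bit", 70, 0.5),
]:
    print(f"{label}: ~{model_memory_gb(params, bpp):.0f} GB of weights")
```

Even at 4-bit quantization, a 70B model needs on the order of 35 GB for weights alone, which is why 64GB-and-up configurations are the ones in demand, and why 256GB comfortably holds 70B+ models at higher precision.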
On-device inference means queries never leave the device. Memory-rich Apple devices are today the only platform combining that privacy with this level of capability and availability. And they are sold out.
Om Malik identifies the devices that will define the next wave of edge AI. In each case, available memory will be the binding constraint on AI capability — more so than connectivity, battery, or processor speed.
Real-time perception, decision-making, and physical interaction require large models running locally. Memory determines model complexity — and therefore what a robot can actually do without cloud dependence.
Form factor imposes severe memory constraints. The AI capability of smart glasses — what they can understand, identify, and respond to — will be defined entirely by grams of silicon allocated to memory.
Ultra-constrained AI wearables represent the extreme end of the edge AI spectrum. Memory — measured in megabytes — will be the specification that matters most for every function these devices can perform.
A forthcoming integration will combine Apple Intelligence (on-device AI via Apple Silicon) with Google's Gemini LLM in the cloud. Memory is the variable that determines where the boundary sits.
Apple Silicon + Unified Memory. Fast, private, offline-capable. Limited by available device RAM — 8GB to 512GB. The memory ceiling is the capability ceiling.
The global AI boom has consumed memory chip supply across data centres, training clusters, and AI infrastructure. Apple Silicon Macs require the same class of high-bandwidth memory that AI infrastructure competes for. It is not an Apple production problem — it is a global memory supply problem.
Running large language models requires both massive memory capacity, to hold the model weights, and high memory bandwidth, to stream those weights through the compute units at usable speed. Without sufficient memory, even the most powerful GPU is bottlenecked waiting for data. The memory shortage of 2026 is quieter than the GPU shortage of 2023–24, but equally fundamental.
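The bandwidth point has a simple consequence for single-user (batch-1) generation: each new token requires streaming essentially every weight through the memory system once, so memory bandwidth divided by weight size gives a hard ceiling on decode speed. A rough sketch, with an assumed bandwidth figure chosen only for illustration:

```python
def decode_tokens_per_sec_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on batch-1 decode speed.

    Each generated token must read all model weights once, so generation
    is memory-bandwidth-bound: tokens/s <= bandwidth / weight size.
    Real throughput is lower due to compute, cache, and scheduling overhead.
    """
    return bandwidth_gb_s / weights_gb

# Assumed numbers: ~800 GB/s unified memory, 70B model at 4-bit (~35 GB of weights).
ceiling = decode_tokens_per_sec_ceiling(800, 35)
print(f"~{ceiling:.0f} tokens/s theoretical ceiling")
```

This is why a GPU with enormous FLOPS but starved memory still generates text slowly: the arithmetic units idle while weights stream in.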
Unified Memory Architecture places CPU, GPU, and Neural Engine on a single die sharing one high-speed memory pool. All compute units access the full memory at maximum bandwidth simultaneously. This enables on-device inference of much larger models than possible on conventional architectures with discrete memory pools.
Om Malik identifies robots, smart glasses, and neck pendants as the next wave. Each has increasingly severe form-factor constraints. For wearables, the grams of silicon available for memory will be the primary factor determining AI capability — not software, connectivity, or algorithms.
A forthcoming integration combines Apple Intelligence on-device with Gemini in the cloud. Simple, privacy-sensitive tasks run locally on Apple Silicon; complex queries route to the cloud. The device's memory capacity determines how much stays local — i.e., how fast, private, and offline-capable the experience is.
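The local-versus-cloud boundary described above can be sketched as a simple routing policy. This is a hypothetical illustration of the idea, not Apple's or Google's actual logic; the names and thresholds are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Device:
    free_memory_gb: float  # memory available for on-device model weights

def route(query_is_private: bool, model_needs_gb: float, device: Device) -> str:
    """Hypothetical hybrid router: prefer on-device when the required model
    fits in memory; otherwise send non-sensitive queries to the cloud.

    A bigger memory pool moves more queries into the 'on-device' branch,
    which is the sense in which memory capacity sets the boundary.
    """
    if model_needs_gb <= device.free_memory_gb:
        return "on-device"       # fast, private, offline-capable
    if query_is_private:
        return "degrade-locally" # too big to run well, but must not leave the device
    return "cloud"               # route to the larger hosted model

print(route(False, 35, Device(free_memory_gb=64)))   # fits: on-device
print(route(False, 200, Device(free_memory_gb=64)))  # too big, not private: cloud
```

The design point is that the routing decision is dominated by one variable, free device memory, which is exactly the resource the shortage constrains.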