This leap is made possible by near-lossless accuracy under 4-bit weight and KV cache quantization, allowing developers to process massive datasets without server-grade infrastructure.
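To make concrete what 4-bit weight and KV cache quantization does, here is a minimal sketch of per-group symmetric int4 quantize/dequantize in Python. The group size, function names, and error check are illustrative assumptions, not the specific scheme the article describes.

```python
# Minimal sketch of per-group symmetric 4-bit quantization, as commonly
# applied to LLM weights and KV cache entries. Group size and helper
# names are illustrative assumptions, not a specific library's API.
import numpy as np

def quantize_int4(x: np.ndarray, group_size: int = 64):
    """Quantize a 1-D float array to signed 4-bit codes, one scale per group."""
    x = x.reshape(-1, group_size)
    # Symmetric scale: map the largest magnitude in each group to +/-7.
    scales = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)        # avoid divide-by-zero
    # Codes fit in 4 bits; a real kernel packs two per byte for the saving.
    q = np.clip(np.round(x / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float values from 4-bit codes and group scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_int4(w)
err = np.abs(dequantize_int4(q, s) - w).mean()
print(f"mean abs error: {err:.4f}")   # small relative to the weight scale
```

The "near-lossless" claim corresponds to this reconstruction error staying small enough that downstream accuracy is essentially unchanged.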
Researchers at Nvidia have developed a technique that can reduce the memory costs of large language model reasoning by up to eight times. Their technique, called dynamic memory sparsification (DMS), ...
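The snippet does not describe how DMS works internally, so the following is only a generic sketch of score-based KV cache eviction, the family of technique the article points at. The scoring rule (summed attention weights) and the 1/8 keep ratio are assumptions chosen to mirror the claimed eight-fold reduction; Nvidia's DMS almost certainly differs in its specifics.

```python
# Hedged sketch of score-based KV cache eviction. Not Nvidia's DMS:
# the importance score and keep ratio here are illustrative assumptions.
import numpy as np

def evict_kv(keys: np.ndarray, values: np.ndarray,
             attn_scores: np.ndarray, keep_ratio: float = 0.125):
    """Keep only the most-attended cache entries (keep_ratio=1/8 ~ '8x')."""
    importance = attn_scores.sum(axis=0)         # total attention per cached token
    k = max(1, int(len(importance) * keep_ratio))
    keep = np.sort(np.argsort(importance)[-k:])  # top-k, original order preserved
    return keys[keep], values[keep]

seq_len, d = 1024, 128
keys = np.random.randn(seq_len, d)
values = np.random.randn(seq_len, d)
attn = np.random.rand(32, seq_len)               # 32 recent queries x cached tokens
k2, v2 = evict_kv(keys, values, attn)
print(keys.shape, "->", k2.shape)                # (1024, 128) -> (128, 128)
```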
CHATSWORTH, Calif., July 18, 2025 — DDN today unveiled performance benchmarks that the company said demonstrate how its AI-optimized DDN Infinia platform eliminates GPU waste and delivers the fastest ...
A new technical paper titled “Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System” was published by researchers at Rensselaer Polytechnic Institute and IBM. “Large ...
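To make "dynamic KV cache placement in a heterogeneous memory system" concrete, here is a hedged sketch of a two-tier cache that keeps recently used entries in fast (GPU) memory and spills the rest to a slower (host) tier. The recency-based policy is an assumption for illustration only; the paper's actual placement algorithm is not described in the snippet.

```python
# Illustrative two-tier KV cache: hot entries in fast (GPU) memory,
# overflow spilled to a slow (host) tier. LRU policy is an assumption.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, fast_capacity: int):
        self.fast = OrderedDict()   # token_id -> (key, value), LRU order
        self.slow = {}              # overflow tier (e.g., host DRAM)
        self.fast_capacity = fast_capacity

    def put(self, token_id, kv):
        self.fast[token_id] = kv
        self.fast.move_to_end(token_id)
        if len(self.fast) > self.fast_capacity:
            victim, victim_kv = self.fast.popitem(last=False)  # evict LRU
            self.slow[victim] = victim_kv

    def get(self, token_id):
        if token_id in self.fast:
            self.fast.move_to_end(token_id)      # refresh recency
            return self.fast[token_id]
        kv = self.slow.pop(token_id)             # promote on access
        self.put(token_id, kv)
        return kv

cache = TieredKVCache(fast_capacity=2)
for t in range(4):
    cache.put(t, (f"k{t}", f"v{t}"))
print(sorted(cache.fast), sorted(cache.slow))    # [2, 3] [0, 1]
```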
GPU memory (VRAM), not raw GPU performance, is the critical limiting factor that determines which AI models you can run. Total VRAM requirements are typically 1.2-1.5x the model size due to weights, KV ...
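A back-of-the-envelope calculation makes the 1.2-1.5x rule of thumb concrete; the model sizes and fp16 (2 bytes per parameter) assumption below are illustrative.

```python
# VRAM estimate using the article's 1.2-1.5x rule of thumb.
def vram_estimate_gb(params_billion: float, bytes_per_param: int = 2,
                     overhead: float = 1.2):
    """Model weights plus a 1.2-1.5x multiplier for KV cache and activations."""
    weights_gb = params_billion * bytes_per_param   # 1e9 params * bytes ~ GB
    return weights_gb * overhead

for n in (7, 13, 70):
    lo = vram_estimate_gb(n, overhead=1.2)
    hi = vram_estimate_gb(n, overhead=1.5)
    print(f"{n}B fp16: {lo:.0f}-{hi:.0f} GB VRAM")
# 7B fp16: 17-21 GB, 13B: 31-39 GB, 70B: 168-210 GB
```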
- Interactive LLMs (chat, copilots, agents) with strict latency targets
- Long-context reasoning (codebases, research, video) with massive KV (key-value) cache footprints (see the size sketch below)
- Ranking and recommendation models ...
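For the KV cache footprints mentioned in the list above, the standard size formula gives a rough estimate; the Llama-2-7B-like configuration below is an assumption for illustration.

```python
# Rough KV cache footprint per sequence, per the standard formula:
#   bytes = 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes/elem
def kv_cache_gb(layers=32, kv_heads=32, head_dim=128,
                seq_len=32_000, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

print(f"{kv_cache_gb():.1f} GB per 32k-token sequence")   # ~16.8 GB at fp16
```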
Generative AI is arguably the most complex application that humankind has ever created, and the math behind it is deeply intricate even if the results are simple enough to understand. ...