Llama 4 -
: Unlike previous versions that relied on "bolted-on" vision components, Llama 4 was trained natively from the start on text, images, and video frames.
: Designed for efficiency, the Scout model has 17 billion active parameters, fits on a single H100 GPU, and is optimized for high-speed performance (460+ tokens per second) and long-document reasoning.
: The models use a "mixture of experts" (MoE) architecture, where only a subset of the total parameters (e.g., 17 billion active parameters in the Scout model) is activated for any given task. This significantly reduces computational cost and latency while maintaining high performance.
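The routing idea behind mixture-of-experts can be sketched as follows. This is a minimal toy illustration, not Llama 4's actual implementation: the expert count, dimensions, and function names (`moe_forward`, `gate_w`) are all assumptions for the example. A learned gate scores each expert, and only the top-k experts actually run, so compute per token scales with k rather than with the total number of experts.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_forward(x, experts, gate_w, top_k=1):
    """Route input x to the top_k experts chosen by a learned gate.

    Only the selected experts execute, so compute scales with top_k,
    not with the total number of experts."""
    scores = softmax(gate_w @ x)              # one gating probability per expert
    chosen = np.argsort(scores)[-top_k:]      # indices of the top_k experts
    weights = scores[chosen] / scores[chosen].sum()
    # Weighted sum over only the activated experts
    return sum(w * experts[i](x) for i, w in zip(chosen, weights))

# Toy setup: 4 experts, each a small linear layer; only 1 runs per token
rng = np.random.default_rng(0)
d = 8
experts = [(lambda W: (lambda x: W @ x))(rng.standard_normal((d, d)))
           for _ in range(4)]
gate_w = rng.standard_normal((4, d))
x = rng.standard_normal(d)
y = moe_forward(x, experts, gate_w, top_k=1)
print(y.shape)  # (8,)
```

With 4 experts and `top_k=1`, each token pays the cost of a single expert's forward pass while the model's total parameter count is four times larger, which is the source of the cost and latency savings described above.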