We present YvyrAI, a Spanish-first decoder-only language model architecture centered on a recurrent module called the DeliberationBlock. Rather than treating scale only as parameter count, YvyrAI is designed to expose a second axis of scaling: internal test-time computation. A shared Transformer block is iterated within a single forward pass while a learned controller maintains a gated planning state, reads and writes a compressed latent scratchpad, estimates an internal verification signal, applies a conditional repair update proportional to the estimated unreliability, and decides how many iterations to spend before emitting the final hidden representation.
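To make the control flow of the DeliberationBlock concrete, the following is a minimal illustrative sketch and not the YvyrAI implementation. It operates on a single hidden vector in pure Python, and every name in it (`deliberate`, `make_params`, `W_block`, `W_gate`, `W_repair`, `w_verify`, `n_slots`, `halt_threshold`) is a hypothetical placeholder. The shared Transformer block is replaced by a residual tanh layer, the scratchpad write is a simple round-robin blend, unreliability is assumed to be `1 - v` for a verification score `v` in (0, 1), and halting is assumed to be a fixed threshold on `v`.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_params(d, seed=0, scale=0.3):
    """Random stand-in weights (hypothetical); a trained model would learn these."""
    rng = random.Random(seed)

    def mat():
        return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(d)]

    return {
        "W_block": mat(),    # shared block weights, reused at every iteration
        "W_gate": mat(),     # gate for the planning state
        "W_repair": mat(),   # repair update direction
        "w_verify": [rng.gauss(0.0, scale) for _ in range(d)],
    }


def deliberate(h, params, n_slots=4, max_iters=8, halt_threshold=0.9):
    """Iterate one shared block over h; return (final_h, iterations_used)."""
    d = len(h)
    plan = [0.0] * d                           # gated planning state
    pad = [[0.0] * d for _ in range(n_slots)]  # compressed latent scratchpad
    steps = 0
    for steps in range(1, max_iters + 1):
        # Read: softmax attention of h over the scratchpad slots.
        scores = [dot(slot, h) / math.sqrt(d) for slot in pad]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        read = [sum(w / total * slot[i] for w, slot in zip(weights, pad))
                for i in range(d)]

        # Shared block: same weights every iteration (stand-in for a Transformer block).
        block_in = [a + b + c for a, b, c in zip(h, plan, read)]
        h = [a + math.tanh(u) for a, u in zip(h, matvec(params["W_block"], block_in))]

        # Gated planning state: keep the old plan where the gate is high.
        gate = [sigmoid(g) for g in matvec(params["W_gate"], h)]
        plan = [g * p + (1.0 - g) * x for g, p, x in zip(gate, plan, h)]

        # Write: blend h into one slot, chosen round-robin (an assumption).
        k = (steps - 1) % n_slots
        pad[k] = [0.5 * s + 0.5 * x for s, x in zip(pad[k], h)]

        # Verification signal in (0, 1); 1 - v is the assumed unreliability.
        v = sigmoid(dot(params["w_verify"], h))

        # Conditional repair, scaled by the estimated unreliability.
        repair = [math.tanh(r) for r in matvec(params["W_repair"], h)]
        h = [x + (1.0 - v) * r for x, r in zip(h, repair)]

        # Halting decision: stop once the verifier is confident enough.
        if v >= halt_threshold:
            break
    return h, steps
```

Calling `deliberate([0.1] * 8, make_params(8))` returns the refined vector together with the number of iterations spent, which is the quantity that exposes test-time computation as a scaling axis. In a real model each step would act on a batch of token sequences, and the halting rule would be learned rather than a fixed threshold.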
Around this core, the system combines a novel recurrent-deliberation framework with several established language-model techniques: Grouped-Query Attention (GQA), rotary position embeddings (RoPE), SwiGLU activations, RMSNorm, optional Mixture-of-Depths routing, sliding-window attention, low-precision training support, and inference-time KV-cache compression. It also includes a Spanish-first data pipeline, a multi-term training objective, and an implemented Spanish evaluation harness.
The contribution is architectural and methodological. YvyrAI is not reported here as a trained high-capability model, and this paper does not claim superiority over existing systems. Large-scale training and benchmarking remain future work. Beyond the technical contribution, YvyrAI is, to our knowledge, the first language model architecture developed from first principles by a Latin American research team in the Spanish-speaking world, and it is positioned as the research and intelligence complement to Paraguay's emerging AI infrastructure strategy.