dotfiles/.agents/docs/ai_architectures.md

> What are the different AI model architectures? ... What are the experimental
> ones? ... Any more? ... I want you to combine all of those into a small
> bulleted list

---

## Sequential & Vision Foundations

- RNN, LSTM, GRU: Sequential networks processing text and audio through internal
  memory loops.
- CNN: Spatial processing architectures using mathematical filters to parse
  visual grids.
- Vision Transformers (ViT): Vision models using self-attention directly on
  split image patches.

## Modern Generative Engines

- Transformers: High-capacity architectures utilizing global self-attention to
  process text sequences.
- GANs: Paired networks staging a generator against a discriminator to create
  imagery.
- Diffusion Models: Generative systems producing content by systematically
  reversing data noise.
- VAEs: Systems compressing data into lower dimensions to decode structural
  variations.

## Efficiency & Structure Scaling

- Mixture of Experts (MoE): Sparse networks activating targeted sub-expert nodes
  to optimize computation.
- Graph Neural Networks (GNN): Frameworks natively executing message-passing
  over complex non-grid relationships.
- Large Action Models (LAM): Automation architectures translating semantic
  intents into executable digital workflows.
- Attention-SSM Hybrids: Layered blocks combining global attention processing
  with fast linear scaling.
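
To make the Mixture of Experts bullet above concrete, here is a minimal sketch of top-k routing in plain Python. The function names (`top_k_route`, `moe_forward`), the toy experts, and the gate scores are all hypothetical and chosen only for illustration. Real MoE layers learn the gate and run on tensors, so treat this as an outline of the idea rather than any library's implementation.

```python
import math


def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(gate_logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and blend their outputs; the rest stay idle."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_logits, k))


# Toy experts (assumption: simple scalar functions stand in for sub-networks).
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x * x, lambda x: -x]
gate_logits = [0.1, 2.0, 2.0, -1.0]  # hypothetical gate scores for one token

routes = top_k_route(gate_logits)
output = moe_forward(3, experts, gate_logits)
print(routes, output)
```

Only two of the four experts execute for this input, which is where the compute saving comes from: total parameters can grow while per-token work stays roughly constant.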

## Next-Gen Context & Linear Processing

- Mamba & SSMs: Hardware-aware state space models whose compute scales linearly
  with sequence length, making very long contexts practical.
- Liquid Neural Networks (LNN): Dynamic architectures utilizing fluid
  differential equations for real-time adaptation.
- Diffusion Language Models (DLM): Non-sequential text systems iteratively
  refining whole content blocks.
- Joint Embedding Predictive (JEPA): Meaning-driven architectures predicting
  high-level conceptual outcomes over raw pixels.

## Dynamic Reasoning & Alternative Physics

- Test-Time Training (TTT) / Mixture-of-Recursions (MoR): Flexible loops that
  adapt or re-apply layers at inference, scaling compute depth per query.
- Distributed Neural Graphs: Modular webs routing separate context tokens
  through specialized pathways.
- Neuro-Symbolic Hybrids: Integrated systems merging raw neural intuition with
  precise sandbox compilation.
- Spiking Neural Networks (SNN): Bio-mimetic hardware structures firing nodes
  strictly via threshold spikes.
- Thermodynamic Models: Analog physical systems computing answers by settling
  into low-energy equilibriums.

---

> Which of these require the least power consumption, least training data, least
> compute, and least VRAM

---

Across all four constraints, the Alternative Physics & Bio-mimetic category
comes out ahead overall, followed closely by classical spatial networks. The
winner for each specific bottleneck is broken down below:

## 1. Least Power Consumption: Spiking Neural Networks (SNNs)

- The Winner: Spiking Neural Networks (SNNs).
- Why: Traditional AI models are mathematically "dense," meaning billions of
  multiply-accumulate operations run, and draw power, for every single token
  processed. SNNs mimic biological neurons and communicate through sparse
  electrical spikes.
- The Difference: If there is no new incoming data, energy consumption drops to
  near zero. Because nodes only fire when an electrical threshold is crossed,
  SNNs on neuromorphic hardware are reported to use roughly 100× to 1000× less
  energy than traditional deep learning networks, making them a strong fit for
  low-power edge hardware. [1]
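
The threshold-and-fire behavior described above can be sketched with a single leaky integrate-and-fire neuron. This is a minimal illustration under assumed values: the function name `simulate_lif` and the `threshold`, `leak`, and `reset` parameters are hypothetical, and a software simulation like this says nothing about real energy use on neuromorphic hardware.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, reset=0.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps."""
    potential = reset
    spikes = []
    for current in input_current:
        # Membrane potential decays each step, then integrates the new input.
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)   # threshold crossed: emit a spike
            potential = reset  # and return to the resting potential
        else:
            spikes.append(0)   # below threshold: the neuron stays silent
    return spikes


quiet = simulate_lif([0.0] * 10)  # no incoming data
busy = simulate_lif([0.6] * 10)   # steady input drive
print(sum(quiet), sum(busy))
```

With no input the neuron never fires, which is the property the sparse-activity argument relies on: downstream work happens only when a spike is emitted.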

## 2. Least Training Data: Neuro-Symbolic Hybrids & JEPAs

- The Winner: Neuro-Symbolic Hybrids (closely followed by Joint Embedding
  Predictive Architectures / JEPA).
- Why: Transformers learn basic mathematical rules (like addition) only
  implicitly, from patterns spread across hundreds of billions of text tokens.
  Neuro-Symbolic systems build a classical, rigid rules engine directly into
  the AI's core.
- The Difference: Instead of needing to see 10,000 examples of a math problem to
  recognize a pattern, a Neuro-Symbolic model can answer from a single prompt
  because it routes the logic out to a pre-programmed symbolic solver.
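
As a rough sketch of that routing idea, the snippet below sends arithmetic to an exact rules engine built on Python's standard `ast` module and falls back to a placeholder "neural" path for everything else. The names `symbolic_eval` and `answer` are hypothetical, and the neural branch is a stub; a real hybrid would call a trained model there and would decide the routing in a more sophisticated way.

```python
import ast
import operator

# The only operations the toy rules engine understands.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def symbolic_eval(expression):
    """Evaluate +, -, *, / arithmetic exactly by walking the parsed syntax tree."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def answer(query):
    """Route arithmetic to the rules engine; otherwise use a stand-in neural path."""
    try:
        return ("symbolic", symbolic_eval(query))
    except (ValueError, SyntaxError):
        return ("neural", None)  # placeholder: a real system would query a model here


math_result = answer("12345 * 6789")
text_result = answer("what is a cat?")
print(math_result, text_result)
```

The arithmetic answer is exact on the first try with no training examples, because the rule is coded rather than learned.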

## 3. Least Compute (FLOPs): Convolutional Neural Networks (CNNs)

- The Winner: Convolutional Neural Networks (CNNs).
- Why: Modern Vision Transformers (ViTs) compare every image patch against
  every other patch globally, so attention cost grows quadratically with the
  number of patches. CNNs look only at localized, overlapping grids using tiny
  weight matrices (kernels), so their cost grows linearly with image area.
- The Difference: For simple classification and computer vision tasks (like
  recognizing a stop sign), a highly optimized CNN can need only a fraction of
  the floating-point operations (FLOPs) required by an attention-based Vision
  Transformer.
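
The scaling difference can be checked with some back-of-the-envelope arithmetic. The sketch below counts multiply-accumulates for one convolution layer and for the attention-score step of one self-attention layer, then doubles the image side. The layer shapes (64 channels, 16-pixel patches, embedding width 768) are illustrative assumptions, and the comparison shows growth rates only, not the absolute cost of any complete network.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulates for one stride-1, same-padded convolution layer."""
    return height * width * c_in * c_out * kernel * kernel


def attention_score_macs(tokens, dim):
    """Multiply-accumulates for QK^T plus the attention-weighted sum over V."""
    return 2 * tokens * tokens * dim


def num_patches(image_side, patch_side=16):
    """Number of non-overlapping square patches in a square image."""
    return (image_side // patch_side) ** 2


for side in (224, 448):
    tokens = num_patches(side)
    print(side, conv_macs(side, side, 64, 64), attention_score_macs(tokens, 768))

# Growth when the image side doubles (4x the pixels, 4x the patches).
conv_growth = conv_macs(448, 448, 64, 64) / conv_macs(224, 224, 64, 64)
attn_growth = (
    attention_score_macs(num_patches(448), 768)
    / attention_score_macs(num_patches(224), 768)
)
print(conv_growth, attn_growth)
```

Doubling the image side multiplies the convolution cost by 4 and the attention-score cost by 16. The sketch counts only the attention-score term; a ViT's linear projections and MLP layers also scale linearly with patch count.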

## 4. Least VRAM: Mamba / State Space Models (SSMs)

- The Winner: Mamba & State Space Models (SSMs).
- Why: In traditional Transformers, VRAM usage keeps growing with context
  length because the model must store a continuously growing history of the
  conversation (known as the KV-Cache) directly inside the GPU memory. The
  cache grows linearly with the number of tokens, while attention compute grows
  quadratically.
- The Difference: Mamba compresses conversational history into a fixed-size
  mathematical "state." Whether you have been chatting with the model for 5
  sentences or 50,000 sentences, the memory used for that state stays flat,
  drastically reducing hardware memory requirements during deployment. [2, 3]
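
A rough sizing sketch makes the memory contrast concrete. The formulas below are simplified, and every shape (32 layers, 32 key/value heads, head size 128, SSM inner width 8192, state size 16, 2 bytes per value) is a hypothetical configuration chosen for illustration, not the specification of any released model. Model weights and activations are ignored.

```python
def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Memory for the keys and values of every past token, in every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


def ssm_state_bytes(n_layers, d_inner, d_state, bytes_per_value=2):
    """Memory for one fixed-size recurrent state per layer."""
    return n_layers * d_inner * d_state * bytes_per_value


MIB = 2**20

for context_len in (1_000, 10_000, 100_000):
    kv = kv_cache_bytes(context_len, n_layers=32, n_kv_heads=32, head_dim=128)
    ssm = ssm_state_bytes(n_layers=32, d_inner=8192, d_state=16)
    print(f"{context_len:>7} tokens: KV cache {kv / MIB:,.0f} MiB, SSM state {ssm / MIB:,.0f} MiB")
```

Under these assumptions the cache grows from 500 MiB to 50,000 MiB as the context goes from 1,000 to 100,000 tokens, while the recurrent state stays at 8 MiB throughout.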

---

## Direct Comparison Overview

| Architecture [2, 4, 5] | Power Draw       | Data Needed     | Compute (FLOPs) Needed | VRAM Footprint | Best Used For                      |
| ---------------------- | ---------------- | --------------- | ---------------------- | -------------- | ---------------------------------- |
| SNNs                   | 🏆 Lowest        | Moderate        | Low                    | Low            | Battery-powered edge devices       |
| Neuro-Symbolic         | Moderate         | 🏆 Lowest       | Low                    | Low            | Exact math, logic, and coding      |
| CNNs                   | Low              | High            | 🏆 Lowest              | Low            | Embedded image/object detection    |
| Mamba / SSMs           | Moderate         | High            | Low                    | 🏆 Lowest      | Long-context chatbots on local PCs |
| Transformers           | Very High        | Very High       | Very High              | Very High      | Frontier LLMs and heavy cloud AI   |

---

If you want to move forward with a lightweight project, let me know:

- Do you want to see a Python sample of a lightweight CNN or Mamba block?
- Are you trying to deploy an AI model onto a specific microdevice (like a
  Raspberry Pi or Arduino)?

- [1] [https://arxiv.org](https://arxiv.org/html/2406.08413v1)
- [2] [https://www.reddit.com](https://www.reddit.com/r/LocalLLaMA/comments/1qk1pzy/is_the_next_leap_in_ai_architectural_comparing/)
- [3] [https://virtualizationreview.com](https://virtualizationreview.com/articles/2026/01/27/what-gpu-do-you-really-need.aspx)
- [4] [https://medium.com](https://medium.com/@TDKVentures/breaking-the-compute-bottleneck-power-memory-and-the-new-ai-architecture-paradigms-eee264cec99a)
- [5] [https://www.socomec.us](https://www.socomec.us/en-us/solutions/business/data-centers/understanding-power-consumption-data-centers)

---

Further exploration of SNNs suggests they can be the most efficient in all
categories, but they require specialized hardware, like the Akida™ PCIe Board
AKD1000 (or you can use the vendor's cloud service pretty cheaply during a
trial).

If your ultimate goal is to build or run something that achieves frontier-class
reasoning while staying highly hardware-efficient, look toward a Hybrid SSM
(Mamba) + Transformer + MoE architecture. This gives you most of the flat VRAM
footprint and linear scaling of an alternative model, backed by the proven
capability of standard attention layers. The attention layers that remain still
keep a KV cache, so memory is greatly reduced rather than fully constant.