- AGENTS.md: design principles, enforcement hierarchy, deferred loading
- agents/: brainstorm, build, orchestrator, research (auto-discovered by MCP server)
- skills/: research methodology (auto-discovered by MCP server)
- hooks/: pre-tool-use, post-tool-use (BFF block removed), session-start, stop, pre-compact, user-prompt-submit
- frameworks/: opencode/plugin.ts (resolves hooks via import.meta.url; works as project-local or global plugin), github/hooks.json
- mcp/index.ts: auto-discovers agents/*.md and skills/*.md from frontmatter (replaces hand-maintained registry); server renamed all-agents
- docs/: agent-infrastructure.md (generalized), research docs (7 files), ai_architectures.md, llama-server-cuda-wsl2.md
- install.sh: idempotent setup (Copilot global hooks, OpenCode global plugin + AGENTS.md + MCP entry, VS Code global MCP config)
What are the different AI model architectures? ... What are the experimental ones? ... Any more? ... I want you to combine all of those into a small bulleted list
Sequential & Vision Foundations
- RNN, LSTM, GRU: Sequential networks processing text and audio through internal memory loops.
- CNN: Spatial processing architectures using mathematical filters to parse visual grids.
- Vision Transformers (ViT): Vision models using self-attention directly on split image patches.
Modern Generative Engines
- Transformers: High-capacity architectures utilizing global self-attention to process text sequences.
- GANs: Paired networks staging a generator against a discriminator to create imagery.
- Diffusion Models: Generative systems producing content by systematically reversing data noise.
- VAEs: Encoder-decoder systems compressing data into a low-dimensional latent space and decoding samples from it to generate variations.
Efficiency & Structure Scaling
- Mixture of Experts (MoE): Sparse networks activating targeted sub-expert nodes to optimize computation.
- Graph Neural Networks (GNN): Frameworks natively executing message-passing over complex non-grid relationships.
- Large Action Models (LAM): Automation architectures translating semantic intents into executable digital workflows.
- Attention-SSM Hybrids: Layered blocks combining global attention processing with fast linear scaling.
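The sparse activation that Mixture of Experts relies on can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names `top_k_route` and `moe_forward`, the toy "experts" (simple functions), and the gate scores are all hypothetical, and real MoE layers learn the gate and run experts as neural sub-networks.

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    # Indices of experts sorted by gate score, highest first (ties keep index order).
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax restricted to the selected experts; subtract the max for stability.
    exps = [math.exp(gate_logits[i] - gate_logits[top[0]]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits, k=2):
    """Run only the routed experts and mix their outputs by gate weight."""
    routes = top_k_route(gate_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    used = [i for i, _ in routes]
    return output, used

# Hypothetical toy experts: each just scales its input.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
gate_logits = [0.1, 2.0, -1.0, 2.0]  # assumed gate scores for one token

output, used = moe_forward(1.0, experts, gate_logits, k=2)
print("experts used:", used, "of", len(experts))
print("output:", output)
```

Only two of the four experts run for this token, which is where the compute saving comes from: total parameter count can grow while per-token work stays roughly constant.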
Next-Gen Context & Linear Processing
- Mamba & SSMs: Hardware-aware state space models that handle very long contexts with a fixed-size state and linear scaling in sequence length.
- Liquid Neural Networks (LNN): Dynamic architectures utilizing fluid differential equations for real-time adaptation.
- Diffusion Language Models (DLM): Non-sequential text systems iteratively refining whole content blocks.
- Joint Embedding Predictive (JEPA): Meaning-driven architectures predicting high-level conceptual outcomes over raw pixels.
Dynamic Reasoning & Alternative Physics
- Test-Time Training (TTT) and Mixture-of-Recursions (MoR): Adaptive approaches that update internal state at inference time (TTT) or vary recursion depth per token (MoR).
- Distributed Neural Graphs: Modular webs routing separate context tokens through specialized pathways.
- Neuro-Symbolic Hybrids: Integrated systems merging neural pattern recognition with exact symbolic reasoning or sandboxed code execution.
- Spiking Neural Networks (SNN): Bio-mimetic hardware structures firing nodes strictly via threshold spikes.
- Thermodynamic Models: Analog physical systems computing answers by settling into low-energy equilibriums.
Which of these require the least power consumption, least training data, least compute, and least VRAM
No single architecture is minimal on all four constraints. As a category, Alternative Physics & Bio-mimetic Architectures come closest, followed by classical spatial networks. The winner for each specific bottleneck is broken down below:
1. Least Power Consumption: Spiking Neural Networks (SNNs)
- The Winner: Spiking Neural Networks (SNNs).
- Why: Traditional deep learning models are "dense": nearly every weight takes part in a multiply-accumulate for every token or input processed, and each of those operations costs energy. SNNs mimic biological neurons and communicate through sparse, event-driven spikes.
- The Difference: When no new input arrives, activity and energy use drop to near zero, because a neuron only fires when its internal potential crosses a threshold. On neuromorphic hardware, SNNs are reported to use 100× to 1000× less energy than conventional deep networks on suitable workloads, which makes them a leading choice for low-power edge hardware. [1]
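The threshold-and-spike behaviour described above can be illustrated with a leaky integrate-and-fire (LIF) neuron, a common simplified spiking model. This is a minimal sketch, not a neuromorphic implementation: the function name `simulate_lif`, the threshold, the leak factor, and the input values are all assumptions chosen for illustration.

```python
def simulate_lif(inputs, threshold=1.0, leak=0.9):
    """Simulate one leaky integrate-and-fire neuron over a list of input currents.

    Returns (spikes, active_steps): the 0/1 spike train, and the number of
    time steps that carried non-zero input (a rough proxy for event-driven work).
    """
    potential = 0.0
    spikes = []
    active_steps = 0
    for current in inputs:
        if current != 0.0:
            active_steps += 1
        # The membrane potential decays each step and integrates new input.
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes, active_steps

# Assumed input: mostly silent, with a few bursts of activity.
inputs = [0.0, 0.6, 0.6, 0.0, 0.0, 0.0, 1.2, 0.0]
spikes, active_steps = simulate_lif(inputs)
print("spike train:", spikes)
print("steps with input:", active_steps, "of", len(inputs))
```

The neuron stays silent on steps with no input and fires only when accumulated input crosses the threshold. On event-driven hardware, those silent steps are where the energy saving would come from; ordinary CPUs and GPUs running this loop do not get that benefit.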
2. Least Training Data: Neuro-Symbolic Hybrids & JEPAs
- The Winner: Neuro-Symbolic Hybrids (closely followed by Joint Embedding Predictive Architectures / JEPA).
- Why: Transformers need to see hundreds of billions of text tokens before they learn basic mathematical rules (like addition) implicitly, and even then only approximately. Neuro-Symbolic systems build a classical, rule-based engine directly into the system.
- The Difference: Instead of needing to see 10,000 examples of a math problem to recognize a pattern, a Neuro-Symbolic model can answer from a single prompt, because it routes the logic to a pre-programmed symbolic solver or compiler.
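The routing idea can be sketched as a tiny dispatcher: try an exact symbolic evaluator first and fall back to a learned model only when the query is not a formal expression. Everything here is a hypothetical illustration: `symbolic_eval`, `answer`, and the `neural_fallback` stub are invented names, and a real system would use a far richer solver and an actual model.

```python
import ast
import operator

# Arithmetic operators the toy symbolic engine understands.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def symbolic_eval(expr):
    """Evaluate +, -, *, / over numeric literals exactly, with no training data."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def answer(query, neural_fallback):
    """Route formal arithmetic to the rules engine, everything else to the model."""
    try:
        return symbolic_eval(query)
    except (ValueError, SyntaxError):
        return neural_fallback(query)

# Stand-in for a learned model; purely a placeholder.
def neural_fallback(query):
    return "(model-generated answer)"

print(answer("12 * (3 + 4)", neural_fallback))
print(answer("what is a cat?", neural_fallback))
```

The arithmetic path is exact on its first use because the rule is programmed rather than learned, which is the data-efficiency argument in miniature.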
3. Least Compute (FLOPs): Convolutional Neural Networks (CNNs)
- The Winner: Convolutional Neural Networks (CNNs).
- Why: Vision Transformers (ViTs) compare every image patch against every other patch globally, so attention cost grows quadratically with the number of patches. CNNs only look at small, overlapping local windows using small weight matrices (kernels).
- The Difference: For simple classification and computer vision tasks (like recognizing a stop sign), a well-optimized CNN needs only a fraction of the floating-point operations (FLOPs) of an attention-based Vision Transformer.
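The scaling difference can be checked with back-of-the-envelope operation counts. The sketch below uses standard textbook formulas for multiply-accumulates (MACs) in one convolution layer and one self-attention layer; the function names and the layer sizes are assumptions for illustration, and real networks add many other costs (MLP blocks, normalization, multiple layers).

```python
def conv_macs(height, width, c_in, c_out, kernel):
    """MACs for one stride-1, same-padded 2D convolution layer."""
    return height * width * c_in * c_out * kernel * kernel

def pairwise_macs(num_tokens, dim):
    """MACs for the two N x N products in self-attention (scores, then value mixing)."""
    return 2 * num_tokens * num_tokens * dim

def attention_macs(num_tokens, dim):
    """MACs for one self-attention layer: Q/K/V/output projections plus pairwise terms."""
    projections = 4 * num_tokens * dim * dim
    return projections + pairwise_macs(num_tokens, dim)

# Assumed sizes: doubling the image side quadruples both pixels and patches.
small_conv = conv_macs(32, 32, 16, 16, 3)
large_conv = conv_macs(64, 64, 16, 16, 3)
small_pairs = pairwise_macs(196, 64)
large_pairs = pairwise_macs(4 * 196, 64)

print("conv cost grows by a factor of", large_conv // small_conv)
print("attention pairwise cost grows by a factor of", large_pairs // small_pairs)
```

With four times as many positions, the convolution cost grows 4× while the pairwise attention term grows 16×. This shows the trend only; which architecture is cheaper for a given task depends on the actual layer sizes.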
4. Least VRAM: Mamba / State Space Models (SSMs)
- The Winner: Mamba & State Space Models (SSMs).
- Why: In traditional Transformers, the model must store a continuously growing history of the conversation (the KV cache) in GPU memory. The cache grows linearly with context length, while attention compute grows quadratically, so long contexts quickly exhaust VRAM.
- The Difference: Mamba compresses conversational history into a fixed-size mathematical "state." Whether you have been chatting with the model for 5 sentences or 50,000 sentences, the memory used for that state stays constant, which substantially reduces memory requirements during deployment. [2, 3]
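A rough memory estimate makes the contrast concrete. The sketch below assumes a common accounting: a KV cache holds one key and one value vector per layer, per KV head, per token, while an SSM keeps one fixed-size state per layer. The function names and every model dimension are hypothetical, and weights and activations are ignored.

```python
def kv_cache_bytes(seq_len, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size: keys and values for every token in context."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

def ssm_state_bytes(layers, d_inner, d_state, bytes_per_value=2):
    """Approximate recurrent state size of an SSM: independent of context length."""
    return layers * d_inner * d_state * bytes_per_value

# Hypothetical model dimensions, 16-bit values.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
D_INNER, D_STATE = 8192, 16

state_mib = ssm_state_bytes(LAYERS, D_INNER, D_STATE) / 2**20
for seq_len in (1_000, 10_000, 100_000):
    cache_mib = kv_cache_bytes(seq_len, LAYERS, KV_HEADS, HEAD_DIM) / 2**20
    print(f"{seq_len:>7} tokens: KV cache {cache_mib:9.1f} MiB, SSM state {state_mib:.1f} MiB")
```

The KV cache figure grows in proportion to the number of tokens, while the SSM state figure is the same on every row.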
Direct Comparison Overview
| Architecture [2, 4, 5] | Power Consumption | Training Data Needed | Compute (FLOPs) Needed | VRAM Footprint | Best Used For |
|---|---|---|---|---|---|
| SNNs | 🏆 Lowest | Moderate | Low | Low | Battery-powered edge devices |
| Neuro-Symbolic | Moderate | 🏆 Lowest | Low | Low | Exact math, logic, and coding |
| CNNs | Low | High | 🏆 Lowest | Low | Embedded image/object detection |
| Mamba / SSMs | Moderate | High | Low | 🏆 Lowest | Long-context chatbots on local PCs |
| Transformers | Extremely high | Extremely high | Extremely high | Extremely high | Frontier LLMs and heavy cloud AI |
If you want to move forward with a lightweight project, let me know:
- Do you want to see a Python sample of a lightweight CNN or Mamba block?
- Are you trying to deploy an AI model onto a specific microdevice (like a Raspberry Pi or Arduino)?
[1] https://arxiv.org
[2] https://www.reddit.com
[3] https://virtualizationreview.com
[4] https://medium.com
[5] https://www.socomec.us
Further exploration of SNNs suggests they are the most efficient in all categories, but they require specialized hardware, such as the Akida™ PCIe Board AKD1000 (or you can use the vendor's cloud service fairly cheaply during a trial).
If your ultimate goal is to build or run something that achieves frontier-class reasoning while staying hardware-efficient, look toward a hybrid SSM (Mamba) + Transformer + MoE architecture. The SSM layers keep most of the memory footprint fixed and scale linearly with context, the attention layers supply the proven quality of standard Transformers, and MoE routing keeps per-token compute low.