> What are the different AI model architectures? ... What are the experimental
> ones? ... Any more? ... I want you to combine all of those into a small
> bulleted list

---

## Sequential & Vision Foundations

- RNN, LSTM, GRU: Sequential networks processing text and audio through internal
  memory loops.
- CNN: Spatial processing architectures using learned convolutional filters to
  parse visual grids.
- Vision Transformers (ViT): Vision models using self-attention directly on
  split image patches.

## Modern Generative Engines

- Transformers: High-capacity architectures utilizing global self-attention to
  process text sequences.
- GANs: Paired networks pitting a generator against a discriminator to create
  imagery.
- Diffusion Models: Generative systems producing content by iteratively
  reversing a noising process.
- VAEs: Systems compressing data into a lower-dimensional latent space and
  decoding samples from it to generate variations.

## Efficiency & Structure Scaling

- Mixture of Experts (MoE): Sparse networks activating only a few expert
  sub-networks per token to reduce computation.
- Graph Neural Networks (GNN): Frameworks natively executing message-passing
  over complex non-grid relationships.
- Large Action Models (LAM): Automation architectures translating semantic
  intents into executable digital workflows.
- Attention-SSM Hybrids: Layered blocks interleaving global attention with
  linear-scaling SSM layers.

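The sparse activation idea behind MoE can be sketched in a few lines. This is a minimal illustration assuming a top-k softmax router; the names `top_k_route` and `moe_forward`, the toy experts, and the hand-written gate logits are all hypothetical. In a trained model, the logits would come from a learned router applied to the input.

```python
import math


def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_logits, k))


# Four toy "experts" (hypothetical); only two of them run for this input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 2.0]  # assumed router scores, one per expert
print(moe_forward(3.0, experts, gate_logits))
```

With `k=2`, half of the experts are never evaluated, which is where the compute saving comes from; the parameter count (and the memory needed to hold all experts) is unchanged.
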
## Next-Gen Context & Linear Processing

- Mamba & SSMs: Hardware-aware state models enabling very long context lengths
  with linear scaling.
- Liquid Neural Networks (LNN): Dynamic architectures using continuous-time
  differential equations for real-time adaptation.
- Diffusion Language Models (DLM): Non-sequential text systems iteratively
  refining whole content blocks.
- Joint Embedding Predictive Architecture (JEPA): Meaning-driven architectures
  predicting high-level representations rather than raw pixels.

## Dynamic Reasoning & Alternative Physics

- Test-Time Training (TTT / MoR): Flexible loops scaling internal computing
  depth dynamically per query.
- Distributed Neural Graphs: Modular webs routing separate context tokens
  through specialized pathways.
- Neuro-Symbolic Hybrids: Integrated systems merging neural pattern recognition
  with precise symbolic rule engines.
- Spiking Neural Networks (SNN): Bio-mimetic networks whose neurons fire
  strictly via threshold spikes.
- Thermodynamic Models: Analog physical systems computing answers by settling
  into low-energy equilibria.

---

> Which of these require the least power consumption, least training data, least
> compute, and least VRAM

---

Across all four constraints, the category of Alternative Physics & Bio-mimetic
Architectures comes out ahead overall, followed closely by classical spatial
networks. The winner for each specific bottleneck is broken down below:

## 1. Least Power Consumption: Spiking Neural Networks (SNNs)

- The Winner: Spiking Neural Networks (SNNs).
- Why: Traditional AI models are mathematically "dense," meaning billions of
  multiply-accumulate operations run, and consume electricity, for every
  single token processed. SNNs mimic biological neurons and operate using
  sparse electrical spikes.
- The Difference: If there is no new incoming data, energy consumption drops to
  near zero. Because neurons only fire when a membrane-potential threshold is
  crossed, SNNs are reported to use up to 100× to 1000× less energy than
  traditional deep learning networks on suitable neuromorphic hardware, which
  makes them a strong fit for low-power edge devices. [1]

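The threshold-and-spike behavior described above can be illustrated with a single leaky integrate-and-fire neuron. This is a simplified sketch, not a model of any particular neuromorphic chip; the function name `simulate_lif` and the `threshold` and `leak` values are assumptions chosen for illustration.

```python
def simulate_lif(inputs, threshold=1.0, leak=0.9):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    Returns a list of 0/1 spikes, one per input sample.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        # Leak part of the stored potential, then integrate the new input.
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes


# A mostly silent input stream with two short bursts of activity.
inputs = [0.0] * 10 + [0.6, 0.6, 0.6] + [0.0] * 10 + [1.2] + [0.0] * 5
spikes = simulate_lif(inputs)
print(sum(spikes), "spikes over", len(spikes), "time steps")
```

The neuron stays silent during the idle stretches and emits a spike only when accumulated input crosses the threshold. On event-driven hardware, downstream work is triggered only by those spikes, which is the source of the energy savings.
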
## 2. Least Training Data: Neuro-Symbolic Hybrids & JEPAs

- The Winner: Neuro-Symbolic Hybrids (closely followed by Joint Embedding
  Predictive Architectures / JEPA).
- Why: Transformers typically train on hundreds of billions of text tokens
  before they pick up basic mathematical rules (like addition), and even then
  only approximately. Neuro-Symbolic systems build a classical, rigid rules
  engine directly into the AI's core.
- The Difference: Instead of needing to see 10,000 examples of a math problem to
  recognize a pattern, a Neuro-Symbolic model can answer from a single prompt,
  because it routes the logic out to a pre-programmed symbolic solver.

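The routing idea can be sketched as follows. This is a toy illustration under stated assumptions: the rules engine here is just an exact arithmetic evaluator built on Python's standard `ast` module, the `answer` router is hypothetical, and the neural side is a placeholder string rather than a real model.

```python
import ast
import operator

# Arithmetic operators the symbolic engine is allowed to apply.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_exact(expression):
    """Evaluate a plain arithmetic expression by walking its syntax tree."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def answer(prompt):
    """Hypothetical router: arithmetic goes to the rules engine, the rest to a model."""
    try:
        return evaluate_exact(prompt)
    except (ValueError, SyntaxError):
        return "<defer to neural model>"  # placeholder for the learned component


print(answer("12345 * 6789 + 1"))
print(answer("Summarize this article"))
```

The arithmetic result is exact on the first attempt because no pattern has to be learned; only the decision of when to call the engine remains a learned (or, here, hard-coded) step.
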
## 3. Least Compute (FLOPs): Convolutional Neural Networks (CNNs)

- The Winner: Convolutional Neural Networks (CNNs).
- Why: Modern Vision Transformers (ViTs) compare every image patch against
  every other patch globally, so attention cost grows quadratically with the
  number of patches. CNNs strictly look at localized, overlapping grids using
  tiny weight matrices (kernels).
- The Difference: For simple classification and computer vision tasks (like
  recognizing a stop sign), a highly optimized CNN typically needs only a
  fraction of the floating-point operations (FLOPs) required by an
  attention-based Vision Transformer.

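The difference in scaling can be checked with back-of-the-envelope arithmetic. The sketch below counts multiply-accumulates for one convolution layer and for the pairwise part of one self-attention layer; the layer shapes (64 channels, 3×3 kernel, 16-pixel patches, 768-dimensional tokens) are assumptions for illustration, and the figures say nothing about the absolute cost of any real model.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulates for one stride-1, same-padded convolution layer."""
    return height * width * c_in * c_out * kernel * kernel


def attention_macs(height, width, dim, patch=16):
    """Multiply-accumulates for the pairwise steps of one self-attention layer."""
    tokens = (height // patch) * (width // patch)
    # The QK^T scores and the weighted sum over V each cost tokens * tokens * dim.
    return 2 * tokens * tokens * dim


base = 224
for side in (224, 448, 896):
    conv_growth = conv_macs(side, side, 64, 64) // conv_macs(base, base, 64, 64)
    attn_growth = attention_macs(side, side, 768) // attention_macs(base, base, 768)
    print(f"{side}px: conv cost x{conv_growth}, attention cost x{attn_growth}")
```

Doubling the image side multiplies the convolution cost by 4 but the pairwise attention cost by 16, so the gap widens quickly at higher resolutions.
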
## 4. Least VRAM: Mamba / State Space Models (SSMs)

- The Winner: Mamba & State Space Models (SSMs).
- Why: In traditional Transformers, attention compute scales quadratically
  with context length, and VRAM usage keeps growing with it because the model
  must store a continuously growing history of the conversation (known as the
  KV cache) directly in GPU memory.
- The Difference: Mamba compresses conversational history into a fixed-size
  mathematical "state." Whether you have been chatting with the model for 5
  sentences or 50,000 sentences, the state's VRAM footprint stays flat,
  drastically reducing hardware memory requirements during deployment. [2, 3]

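The cache-versus-state contrast can be put into rough numbers. The sketch below is an estimate under assumed shapes, not a measurement: the layer counts, head sizes, and state dimensions are hypothetical, values are assumed to be 16-bit, and model weights and activations are ignored.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """KV-cache size: one key and one value vector per token, per layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


def ssm_state_bytes(layers, d_inner, d_state, bytes_per_value=2):
    """Recurrent SSM state size: independent of how many tokens were processed."""
    return layers * d_inner * d_state * bytes_per_value


MIB = 1024 * 1024
for tokens in (4_096, 32_768, 131_072):
    # Hypothetical transformer shape: 32 layers, 32 KV heads of size 128.
    cache = kv_cache_bytes(tokens, layers=32, kv_heads=32, head_dim=128)
    # Hypothetical SSM shape: 64 layers, inner width 5120, state size 16.
    state = ssm_state_bytes(layers=64, d_inner=5120, d_state=16)
    print(f"{tokens} tokens: KV cache {cache // MIB} MiB, SSM state {state // MIB} MiB")
```

Under these assumptions the KV cache grows in direct proportion to the token count, while the SSM state stays the same size regardless of conversation length.
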
---

## Direct Comparison Overview

| Architecture [2, 4, 5] | Power Consumption | Training Data Needed | Compute (FLOPs) Needed | VRAM Footprint | Best Used For                      |
| ---------------------- | ----------------- | -------------------- | ---------------------- | -------------- | ---------------------------------- |
| SNNs                   | 🏆 Lowest         | Moderate             | Low                    | Low            | Battery-powered edge devices       |
| Neuro-Symbolic         | Moderate          | 🏆 Lowest            | Low                    | Low            | Exact math, logic, and coding      |
| CNNs                   | Low               | High                 | 🏆 Lowest              | Low            | Embedded image/object detection    |
| Mamba / SSMs           | Moderate          | High                 | Low                    | 🏆 Lowest      | Long-context chatbots on local PCs |
| Transformers           | Very High         | Very High            | Very High              | Very High      | Frontier LLMs and heavy cloud AI   |

---

If you want to move forward with a lightweight project, let me know:

- Do you want to see a Python sample of a lightweight CNN or Mamba block?
- Are you trying to deploy an AI model onto a specific microdevice (like a
  Raspberry Pi or Arduino)?

- [1] [https://arxiv.org](https://arxiv.org/html/2406.08413v1)
- [2] [https://www.reddit.com](https://www.reddit.com/r/LocalLLaMA/comments/1qk1pzy/is_the_next_leap_in_ai_architectural_comparing/)
- [3] [https://virtualizationreview.com](https://virtualizationreview.com/articles/2026/01/27/what-gpu-do-you-really-need.aspx)
- [4] [https://medium.com](https://medium.com/@TDKVentures/breaking-the-compute-bottleneck-power-memory-and-the-new-ai-architecture-paradigms-eee264cec99a)
- [5] [https://www.socomec.us](https://www.socomec.us/en-us/solutions/business/data-centers/understanding-power-consumption-data-centers)

---

Further exploration of SNNs suggests they are the most efficient in all
categories, but they require specialized hardware, like the Akida™ PCIe Board
AKD1000 (or you can use the vendor's cloud service fairly cheaply during a
trial).

If your ultimate goal is to build or run something that achieves frontier-class
reasoning while staying highly hardware-efficient, look toward a Hybrid
SSM (Mamba) + Transformer + MoE architecture. This gives you much of the small
VRAM footprint and linear scaling of an SSM, backed by the proven capability of
standard attention layers.