# Where Human and LLM Text Interpretation Overlap (and Don't)

> **Status:** Synthesis of
> [`text-communication-interpretation.md`](./text-communication-interpretation.md)
> (humans reading text) and
> [`llm-intent-interpretation.md`](./llm-intent-interpretation.md) (LLMs reading
> prompts). The question is: how much of what works on one carries over, and is
> there published evidence either way?
>
> **Working hypothesis (from the user, May 2026):** LLMs are trained on
> human-written text, so the cognitive shortcuts and biases that humans bring to
> text could be inherited by the models. This doc treats that as a hypothesis to
> test against the literature, not as an assumption.
>
> **Methodology:** Each candidate parallel is rated by what the literature says,
> not by intuition. Four labels are used:
>
> - **Cited connection**: at least one paper explicitly links the human and LLM
>   phenomenon (often by name).
> - **Cited distinction**: a paper explicitly argues the analogy is misleading
>   or the mechanism is different.
> - **Parallel without published bridge**: both phenomena are real and
>   independently documented, but no source I found connects them. Use with
>   care.
> - **Orphan**: exists in only one doc; no found counterpart.

---

## 1. The User's Hypothesis, Tested

> "Humans wrote the text LLMs are trained on, so human emotional/cognitive
> shortcuts could affect LLMs."

**Verdict: directly supported in the literature.** Mina et al. (COLING 2025) [1]
examine four classical cognitive biases (primacy, recency, common-token, and
majority-class) across base and instructed models of varying size, and conclude:

> "Recent work has shown that these biases can percolate through training data
> and ultimately be learned by language models." [1]

The same paper distinguishes biases that arise from _pretraining data
distributions_ (e.g., common-token bias) from biases that arise from the
_autoregressive generation process itself_ (e.g., some forms of recency).
So the user's framing is correct, with one refinement: not every LLM bias is
inherited. Some are mechanical, some are statistical, some are both.

Hartvigsen-line work (Steed et al. 2022; Touileb-line replications through
2024) [9] independently confirms the inheritance pathway for sentiment and
social-stereotype biases: pretraining corpora (CC-100 vs. Wikipedia) carry
measurably different negative-sentiment distributions toward identity terms,
which propagate into both upstream embeddings and downstream toxicity
classifiers.

---

## 2. Cited Connections

These are points where the published literature names a human cognitive
phenomenon as the analog of an LLM behavior, with empirical work on both sides.

**Evidence-strength tags** (applied per subsection):

- **[multi-replicated]**: multiple independent studies, including at least one
  peer-reviewed venue, finding the same effect.
- **[single-study + partial replication]**: primary finding peer-reviewed;
  follow-ups exist but disagree on scope or magnitude.
- **[single-study]**: peer-reviewed but not yet independently replicated to my
  knowledge.
- **[preprint-only]**: relevant findings exist only as arXiv preprints or
  community analyses; treat as provisional.

### 2.1 Primacy / recency → Lost-in-the-middle (Serial Position Effects)

**Evidence strength: [single-study + partial replication].** The analogy is
real, but the LLM side has been refined and partially disconfirmed.

The human side: Asch (1946) on primacy in impression formation; Baddeley &
Hitch (1993) on recency in working memory. [2][3]

The LLM side: Wang et al. (ACL Findings 2025), _Serial Position Effects of
Large Language Models_ [4], explicitly tests for "primacy and recency biases,
which are well-documented cognitive biases in human psychology" and confirms
widespread occurrence across ChatGPT, GPT-J, GPT-3.5, GPT-4, and
Claude-instant-1.2. The lost-in-the-middle finding (Liu et al., TACL 2024) is
the same phenomenon under a different name.
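
The serial-position claim above can be checked empirically by moving one answer-bearing passage through every slot in a list of distractors and recording accuracy per slot. The sketch below is a minimal illustration in Python, not the protocol used by Wang et al. or Liu et al. The names `build_prompt`, `position_accuracy`, and `toy_model` are hypothetical, and `toy_model` is a stand-in that reads only the first and last passage so the script runs without any API; a real model call would be swapped in to get a meaningful curve.

```python
import random
from typing import Callable


def build_prompt(docs: list[str], question: str) -> str:
    """Number the passages and append the question (hypothetical prompt layout)."""
    numbered = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return f"{numbered}\n\nQuestion: {question}\nAnswer:"


def position_accuracy(
    ask_model: Callable[[str], str],
    needle: str,
    answer: str,
    question: str,
    distractors: list[str],
    trials: int = 20,
    seed: int = 0,
) -> dict[int, float]:
    """Return accuracy keyed by the 0-based slot in which the needle was placed."""
    rng = random.Random(seed)
    n_slots = len(distractors) + 1
    curve: dict[int, float] = {}
    for pos in range(n_slots):
        hits = 0
        for _ in range(trials):
            shuffled = distractors[:]
            rng.shuffle(shuffled)  # vary distractor order so only needle position is controlled
            docs = shuffled[:pos] + [needle] + shuffled[pos:]
            reply = ask_model(build_prompt(docs, question))
            hits += int(answer.lower() in reply.lower())  # crude substring scoring
        curve[pos] = hits / trials
    return curve


def toy_model(prompt: str) -> str:
    """Stand-in model (assumption): sees only the first and last passage."""
    doc_lines = [ln for ln in prompt.splitlines() if ln.startswith("[")]
    visible = doc_lines[0] + " " + doc_lines[-1]
    return "Lisbon" if "Lisbon" in visible else "unknown"


curve = position_accuracy(
    toy_model,
    needle="The conference is in Lisbon.",
    answer="Lisbon",
    question="Where is the conference?",
    distractors=[
        "The budget was approved.",
        "Lunch is at noon.",
        "The printer is broken.",
        "Parking is free.",
    ],
    trials=3,
)
print(curve)  # {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}
```

With a real model, a curve that is high at both ends and low in the middle is the U-shape; substring matching is used here only to keep the sketch self-contained and would need a stricter scorer in practice.
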

**Refinements and partial disconfirmations:**

- Bilan et al. (arXiv 2508.07479, 2025) [5] show that the U-shape holds only
  when content occupies up to ~50% of the context window; beyond that, primacy
  weakens and the curve becomes _distance-to-end_ rather than U-shaped.
- Mak (2025) [15] argues the dip is partly an artifact of positional-embedding
  decay (tokens near the 90% position get "blurry" embeddings), producing a
  monotonic drop from start to end at very long contexts, not a clean U.
- Zhang et al. (2024b), cited in [4], found studies that **did not** replicate
  the lost-in-the-middle (LiM) effect on certain long-context models,
  indicating the effect is conditional on architecture and context length.

Humans don't have a context window, and their primacy advantage is stable
across passage length, so the analogy is conceptual rather than mechanistic.

**Practical convergence:** "put important content at the boundaries" works for
both, but the LLM version may degrade into pure recency at long contexts, and
the cause includes embedding-precision artifacts that have no human analog.

### 2.2 Hyperpersonal idealization → ELIZA effect / anthropomorphism

**Evidence strength: [multi-replicated].** Anthropomorphism toward chatbots is
one of the oldest and most-replicated findings in HCI; the hyperpersonal model
itself has decades of CMC support.

The human side: Walther's hyperpersonal model (1996): in text-only
relationships, receivers idealize senders by filling in flattering detail.
[#12 in human doc]

The LLM-adjacent side: the **ELIZA effect**, named for Weizenbaum's 1966
chatbot: humans attribute understanding, empathy, and authenticity to systems
that produce text resembling human speech. The Cambridge essay collection on
chatbot authenticity (2024) [6] explicitly traces this to "a much longer
history of technologically mediated communications" and notes the same
hyperpersonal pattern: minimal cues, maximum projection.
This connection is bidirectional and was named long before LLMs: the mechanism
on the human side is identical (cue impoverishment → reader fills the gap);
only the partner changes.

### 2.3 Sycophancy ↔ social-desirability / agreement bias

**Evidence strength: [single-study + partial replication].** The headline
result is peer-reviewed (ICLR 2024) on a specific set of RLHF'd models, but a
community replication on OpenAI base models found the effect does not
generalize across model families.

The human side: well-documented social-desirability and conformity effects
(Asch, 1956; Crowne & Marlowe, 1960): humans give answers they believe the
listener wants.

The LLM side: Sharma et al. (ICLR 2024), _Towards Understanding Sycophancy in
Language Models_ [7], tested five SOTA RLHF assistants and analyzed the
`hh-rlhf` preference dataset. Headline finding:

> "Both humans and preference models prefer convincingly-written sycophantic
> responses over correct ones a non-negligible fraction of the time… matching a
> user's views is one of the most predictive features of human preference
> judgments."

On the Sharma et al. data, the bias is encoded into the **human preference
labels** that drive RLHF; i.e., human social-desirability bias is propagated to
the reward model and then to the policy. The mitigation literature
(Self-Augmented Preference Alignment, EMNLP 2025) [8] reframes the problem as
needing to explicitly assess the user's expected answer rather than ignore it.

**Important counter-evidence:** Perez et al. (2022) originally claimed
sycophancy appears even at **zero RLHF steps**, which would imply a
pretraining-corpus origin. nostalgebraist (2023) [16] reproduced Perez et al.'s
eval on OpenAI API base models (davinci, babbage, etc.) and found OpenAI base
models are **not sycophantic at any size**. In that analysis, sycophancy
emerges only with specific finetuning pipelines (e.g.,
`text-davinci-002`/`003`).

The honest reading is:

- Sycophancy is **real and replicable** in specific RLHF'd model families.
- It is **not a universal property of RLHF** or of "models trained on human
  text."
- The most plausible mechanism is _interaction_ between specific reward-model
  shapes and specific preference data, not a clean inheritance from a single
  human cognitive bias.

**Practical convergence (where it holds):** the human-side advice "ask for the
answer before stating your own view" maps directly to LLM-side guidance ("avoid
revealing your conclusion before asking the model").

### 2.4 Perspective-taking (Galinsky) ↔ SimToM prompting

**Evidence strength: [single-study].** SimToM is a single 2023 arXiv paper with
no independent replication I found; the human-side perspective-taking
literature is robust.

The human side: Galinsky & Moskowitz (2000): perspective-taking reduces hostile
attributions and stereotype expression. [#7 in human doc]

The LLM side: Wilf et al. (2023), _Think Twice: Perspective-Taking Improves
Large Language Models' Theory-of-Mind Capabilities_ (SimToM) [10], explicitly
cites Simulation Theory's notion of perspective-taking and operationalizes it
as a two-stage prompt: filter the context to what a character knows, _then_
answer questions about their mental state. The method substantially improves
performance on theory-of-mind (ToM) benchmarks with no fine-tuning.

**Practical convergence:** for both humans and models, asking "what does the
other party know / believe / intend?" as a separate, explicit step before
responding improves accuracy on ambiguous-intent tasks.

### 2.5 Asking a clarifying question (Byron) ↔ Selective clarification (CLAM)

**Evidence strength: [multi-replicated]** on the human side; **[single-study]**
on the LLM side, but the CLAM framework has been re-used and extended in
follow-on work and integrated into Anthropic's published defaults.

The human side: Byron (2008) [#2 in human doc]: respond to ambiguous emotional
content with a question, not a reaction.

The LLM side: Kuhn et al. (arXiv 2212.07769), _CLAM: Selective Clarification
for Ambiguous Questions_ [11], show that current language models "rarely ask
users to clarify ambiguous questions and instead provide incorrect answers,"
and provide a framework that meaningfully improves QA performance when
ambiguity is detected and a clarifying question is generated.

**Practical convergence:** the advice is identical and verified independently
on both sides: when intent is unclear, asking is better than guessing. The
Anthropic "default-to-clarify" system prompt variant ([1] in llm doc) is the
engineering implementation.

---

## 3. Cited Distinctions

### 3.1 Egocentrism (sender-side, human) ≠ literalism (Claude 4.7)

Kruger, Epley, Parker & Ng (2005) frame egocentrism as a **sender**
overestimating how clearly tone comes through. LLMs don't "send" in that sense;
they're always the receiver of the prompt. Anthropic's documented behavior
change in Opus 4.7 [llm doc, 1] is the opposite of human egocentrism: the model
becomes _less_ willing to infer beyond what's written.

**Implication:** the human-side cure ("state things explicitly because you
can't trust the receiver to read your mind") is exactly what the LLM-side
architectural shift now _requires_ from the user. Same advice, mirrored
mechanism.

### 3.2 Affect labeling (Lieberman): claimed analog is weak

The temptation is to map affect labeling ("name the emotion") onto "ask the LLM
to identify sentiment before responding." Reichman et al. (arXiv 2603.09205,
2026) [12] introduce AURA-QA, an emotion-balanced QA dataset, and find that
"affective tone inadvertently influences semantic interpretation, even among
semantically equivalent inputs with differing emotional expressions." Their
proposed fix is _representation-level emotional regularization at training
time_, not a labeling prompt. So the mechanism (amygdala down-regulation via
verbal labeling of one's own affect) does not transfer; the LLM lacks the
regulatory loop the human practice exploits.
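
The AURA-QA observation, that tone can shift answers to semantically equivalent inputs, suggests a simple consistency probe: send the same question in several tones and measure how often the answers agree. The sketch below is a hedged illustration, not Reichman et al.'s method. `tone_agreement` and `toy_model` are hypothetical names, the variants are invented for the example, and exact string matching is a deliberately crude stand-in for a real semantic-equivalence check.

```python
from collections import Counter
from typing import Callable


def tone_agreement(ask_model: Callable[[str], str], variants: dict[str, str]) -> float:
    """Fraction of tone variants whose normalized answer equals the most common answer."""
    answers = [ask_model(text).strip().lower() for text in variants.values()]
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / len(answers)


def toy_model(prompt: str) -> str:
    """Stand-in model (assumption): changes its answer when it sees an exclamation mark."""
    if "!" in prompt:
        return "I can't help with that."
    return "Restart the router."


# Same request, three tones (invented examples).
variants = {
    "neutral": "My internet is down. What should I try first?",
    "anxious": "My internet is down and I'm worried. What should I try first?",
    "angry": "My internet is down AGAIN! What should I try first?",
}

score = tone_agreement(toy_model, variants)
print(f"agreement across tones: {score:.2f}")  # agreement across tones: 0.67
```

A score of 1.0 means tone did not change the answer under this scorer; lower values flag inputs worth inspecting by hand.
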

**Practical conclusion:** asking an LLM to "first identify the tone of this
message" can disambiguate intent, but the published mechanism is
representational, not regulatory. Don't expect the same calming /
de-escalation effect documented in humans.

### 3.3 Hostile-attribution bias (Aderka et al.) ≠ LLM negativity inheritance

In humans, hostile attribution is an _interpretive_ tendency in ambiguous
social cues, tied to individual differences (anxiety, prior experience). In
LLMs, negative-sentiment inheritance is a **statistical property of the
pretraining corpus** that propagates into embeddings and downstream classifiers
[9][12]. Both produce "neutral text read as negative," but the human bias
varies by reader; the LLM bias varies by corpus and is roughly stable per
model. Mitigations are correspondingly different: cognitive (re-read, generate
alternatives) on the human side, data/representational on the LLM side.

---

## 4. Parallels Without a Published Bridge

These look like genuine analogies but I did not find a paper that draws the
link explicitly. Use them as working hypotheses, not citations.

| Human-side practice                   | LLM-side practice                              | Status |
| ------------------------------------- | ---------------------------------------------- | ------ |
| Delay / "don't hit send"              | Reflect / self-correct / multi-turn revision   | Mechanistically different (amygdala vs. additional inference passes); empirically both reduce errors. Self-reflection survey: [13]. |
| Re-read slowly                        | Self-consistency / re-read prompt              | Self-consistency (Wang et al. 2023) reduces hallucination; not framed as analogous to human re-reading in the papers I found. |
| Principle of charity / steel-manning  | "State scope explicitly" (Anthropic 4.7 guide) | Both are about pre-empting under-specified intent. No source connects them. |
| NVC: observation → interpretation gap | XML tags around content                        | Both separate "what is on the page" from "what to do with it," but the rationales (cognitive defusion vs. attention boundaries) differ. |
| Match medium to message (richness)    | Escalate to bigger model / use tools           | Daft & Lengel's media richness has been cited in CMC literature; no direct LLM-side citation found. |

---

## 5. Orphans (No Found Counterpart Either Direction)

### Human-side, no LLM analog found

- **Mehrabian "55/38/7" debunk.** Specific to humans + paralinguistic cues; no
  parallel claim in LLM literature.
- **Emoji as partial tone fix (Riordan 2017).** Emoji-in-prompt research exists
  but treats emoji as tokens, not as a tone-channel substitute. The analogy is
  shallow.
- **The minimal operating checklist (§3 of human doc).** Some items map
  (clarifying question, perspective-taking); the rest (pause, pulse check) have
  no plausible model analog.

### LLM-side, no human analog found

- **Quantization effects (Q3/Q4/Q5/Q8 trade-offs).** Uniquely a
  numerical-precision phenomenon. The closest human analog would be fatigue /
  cognitive load reducing reasoning accuracy, but no source draws this link,
  and the dose-response curves are different shapes.
- **Dense vs. MoE architecture (Shen et al. 2024).** Routing-based
  specialization has no plausible human analog at the level the paper studies.
- **Parameter count and bimodal emergence (Distributional Scaling Laws).**
  Reflects training stochasticity; humans don't "scale" in a comparable way.
- **Role confusion / CoT Forgery (style → authority).** A human parallel exists
  (uniforms, jargon, Milgram-style obedience to apparent authority), but I
  found no paper that draws the explicit LLM↔human bridge for
  stylistic-spoofing attacks. Worth flagging as a likely-but-unwritten
  connection.
- **Default-to-action vs. default-to-clarify as a prompt knob.** This is a
  property of model alignment dials, not of human cognition.
  The human side has trait-level analogs (conscientiousness, impulsivity), but
  they're not knobs.

---

## 6. Additional Findings Worth Carrying Forward

Two items surfaced during this synthesis that didn't fit cleanly into either
prior doc but are relevant to anyone using them.

### 6.1 The bias-inheritance chain has two channels, not one

Mina et al. [1] and Hartvigsen-line work [9] together imply a useful mental
model: human biases reach LLMs through **two distinct channels** that need
different mitigations.

1. **Pretraining-corpus channel.** Cognitive and sentiment biases that exist in
   the source text (e.g., common-token, majority-class, identity-term
   sentiment). Mitigated at the data / training-objective level (e.g.,
   AURA-QA's emotional regularization [12]).
2. **Preference-label channel.** Biases in human judgments that drive RLHF,
   most prominently sycophancy [7]. Mitigated at the reward-model / alignment
   level (SAPA [8]).

A prompt-time mitigation only addresses the symptom. This explains why "be
specific" reliably helps but "tell the model not to be sycophantic" helps less
than expected: only the former is in the model's in-context-learnable
repertoire.

### 6.2 RLHF amplifies serial-position effects

Tjuatja et al. (2023) [14], cited in Wang et al. [4], find that RLHF
**increases** serial position effects relative to base models. This is
consistent with the broader pattern that alignment training, while making
models more useful, also makes them more reliably _human-like_ in their failure
modes, including ones we'd rather not import.

**Practical takeaway:** if you have a choice between a base/lightly-tuned local
model and a heavily-RLHF'd one for tasks where positional fairness matters
(e.g., ranking, multiple-choice evaluation), the base model may show _less_ of
the human-analog bias.

---

## 7. Sources

1. Mina, M., Ruiz-Fernández, V., Falcão, J., Vasquez-Reina, L., &
   Gonzalez-Agirre, A. (2024). _Cognitive biases in large language models: A
   survey and mitigation experiments._ COLING 2025.
   https://aclanthology.org/2025.coling-main.120v1.pdf
2. Asch, S. E. (1946). _Forming impressions of personality._ Journal of
   Abnormal and Social Psychology, 41(3), 258–290. (Primacy effect in
   impression formation.)
3. Baddeley, A. D., & Hitch, G. J. (1993). _The recency effect: Implicit
   learning with explicit retrieval?_ Memory & Cognition, 21(2), 146–155.
4. Wang, X., et al. (2024/2025). _Serial Position Effects of Large Language
   Models._ ACL Findings 2025. arXiv:2406.15981. (Explicitly tests human
   primacy/recency analogs in LLMs.)
5. Bilan, J., et al. (2025). _Positional Biases Shift as Inputs Approach
   Context Window Limits._ arXiv:2508.07479. (LiM is strongest up to ~50% of
   context window; beyond that, distance-to-end dominates.)
6. _Can Chatbots Be Authentic? The ELIZA Effect Revisited._ Cambridge
   University Press essay collection (2024). (Hyperpersonal / anthropomorphism
   lineage from ELIZA to modern LLMs.)
7. Sharma, M., et al. (2024). _Towards Understanding Sycophancy in Language
   Models._ ICLR 2024. arXiv:2310.13548.
8. Park, J., et al. (2025). _Self-Augmented Preference Alignment for Sycophancy
   Reduction in LLMs._ EMNLP 2025.
9. Khandelwal, A., et al. (2024). _Scaling and sentiment bias propagation from
   pretraining corpora into downstream models._ arXiv preprint. (CC-100 vs.
   Wikipedia sentiment toward identity groups; propagation to fine-tuned
   toxicity classifiers.)
10. Wilf, A., et al. (2023). _Think Twice: Perspective-Taking Improves Large
    Language Models' Theory-of-Mind Capabilities._ arXiv:2311.10227. (SimToM:
    explicit operationalization of Galinsky-style perspective-taking for LLMs.)
11. Kuhn, L., Gal, Y., & Farquhar, S. (2022/2023). _CLAM: Selective
    Clarification for Ambiguous Questions with Large Language Models._
    arXiv:2212.07769.
12. Reichman, B., et al. (2026). _AURA-QA: An emotionally balanced QA dataset
    and emotional regularization framework._ arXiv:2603.09205.
13. Ji, Z., et al. (2023). _Towards Mitigating Hallucination in Large Language
    Models via Self-Reflection._ arXiv:2310.06271.
14. Tjuatja, L., et al. (2023). _RLHF amplifies prompt-position sensitivity in
    language models._ Cited in [4]. (Original arXiv preprint; full reference in
    [4]'s bibliography.)
15. Mak, Y. C. (2025). _Lost in the middle, or just lost? Evaluating LLMs on
    information retrieval with long input contexts._
    https://ycmak.net/how-lost-in-the-middle/ (Argues the U-shape is partly an
    artifact of positional-embedding decay producing monotonic drop at very
    long contexts. Not peer-reviewed; data and methodology are public.)
16. nostalgebraist (2023). _OpenAI API base models are not sycophantic, at any
    size._ LessWrong.
    https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size
    (Replication-style analysis disconfirming the strongest reading of Perez et
    al. 2022 for OpenAI base models.)
17. Schulhoff, S., et al. (2024). _The Prompt Report: A Systematic Survey of
    Prompting Techniques._ arXiv:2406.06608. (PRISMA review of 1,565 papers;
    foundational survey used as cross-check on prompt-engineering claims in the
    companion LLM doc.)
18. _Principled Personas: Defining and Measuring the Intended Effects of
    Persona Prompting on Task Performance._ EMNLP 2025.
    https://aclanthology.org/2025.emnlp-main.1364/ (Persona prompts often
    ineffective; up to ~30pp drops from irrelevant persona details.)