Weighted Thoughts

When Your Team's AI Agents Outpace Trust

Ylli Prifti, Ph.D. — Sun, 03 May 2026 01:58:48 GMT

AI adoption alters the signals that organisational trust judgement depends on, without replacing the underlying mechanisms. Three dynamics follow: competence signals compress toward the mean as the tooling floor rises; the tools themselves evolve faster than mental models can track; and the structure of work shifts toward small human teams directing AI agent stacks, reducing the human-to-human interaction surface where trust evidence is generated. The compound effect is not a trust collapse but a noisier allocation of trust — decisions made at full speed on degraded inputs. This is a design problem rather than a behavioural one. Behavioural prescriptions operate at individual speed in an environment moving at technological speed. Structural interventions — designing for outlier visibility, rewarding adaptation, making calibration legible — can move faster, but require frameworks that can be tested empirically rather than defended by argument.

The previous article [1] argued that organisations run Putnam dynamics at sprint velocity — the same belief-formation mechanisms that build social capital across decades, compressed into quarters. That argument left a question open. If the mechanisms are compressed but otherwise intact, what happens when the environment they operate in is itself changing rapidly? AI is the obvious case to examine. It is reshaping how work is produced, how capability is expressed, and how teams are structured — all on timescales shorter than the trust-formation cycles that depend on those signals.

This article works through that question. It is not an argument that AI breaks trust. The cognitive architecture for trust judgement is robust and continues to function. The argument is narrower and, I think, more interesting: that AI alters the inputs to trust formation in specific ways, and that the alterations compound. What the alterations produce, and what to do about them, is the rest of the article.

I. The competence signal compresses

AI accelerates convergence to the mean. When everyone has access to the same capability amplifier, the floor rises — work that was previously difficult becomes routine, and the baseline quality of output increases across the board. The productivity gain is substantial even without sophisticated use of the tools. This is broadly positive. But it compresses the visible range of competence. The gap between the 30th percentile and the 70th percentile narrows, because AI lifts the lower performers more than it lifts the higher ones. The distribution tightens.

What counts as competence in this environment is itself a moving target. The skills that differentiated an effective AI user six months ago are already commoditised. The prompting patterns that produced exceptional output last quarter are now built into the default tooling. Competence is no longer a state you achieve — it is a rate of adaptation you sustain. And a rate is harder to observe and evaluate than a state.

But the competence belief problem is one of signal-to-noise. In a compressed distribution, the evidence required to distinguish genuine excellence from competent-enough becomes harder to observe. Humans do not stop making trust decisions — our cognitive architecture makes hundreds of trust judgements daily, in and out of work, and it does not wait for perfect evidence. What changes is the reliability of those decisions. When the signals are finer-grained but the decision-making pace stays the same, the result is more frequent misjudgements — trust placed where it should not be, withheld where it should not be, calibrated to signals that no longer differentiate. We have not yet adapted to reading competence in an AI-augmented environment. The old heuristics — speed of delivery, polish of output, confidence of presentation — are precisely the signals that AI compresses to the mean.

Identifying and rewarding competence in organisations was already a problem with known pathologies before AI entered the picture. Pluchino, Rapisarda, and Garofalo demonstrated computationally that promoting the best performer — the intuitive strategy — can systematically degrade organisational efficiency when competence at one level does not predict competence at the next [12]. I had the pleasure of discussing these dynamics with Andrea Rapisarda during a week-long summer school in Lipari in 2019 — the fragility of competence measurement in hierarchies was already a central concern in computational social science well before AI amplified it. Their Ig Nobel Prize-winning finding — that random promotion outperforms merit-based promotion under certain structural conditions — revealed how fragile competence measurement is even in stable environments with clear performance signals. AI makes this fragility acute. The signals that promotion, delegation, and trust decisions depend on are now compressed into a narrower band, measured against a shifting baseline, and produced through a process that is increasingly opaque to the evaluator. The problem is not new. It is suddenly a lot harder and a lot more consequential.

The outliers — the people who use AI to do things nobody else thought to attempt — still produce distinguishable output. But they are harder to see against a background where everyone’s work looks increasingly similar. And in an organisation running trust dynamics at compressed timescales, “harder to see” translates directly to “slower to trust” — which means slower to delegate, slower to empower, and slower to build the high-trust equilibrium that makes exceptional work possible in the first place.

II. The tool outpaces the mental model

The human in the loop remains accountable. This has always been the case and AI does not change it. A surgeon is accountable for the outcome whether they use a scalpel or a robotic system. An engineer is accountable for the architecture whether they wrote every line or directed an AI coding agent. The tool does not diffuse accountability — it raises the bar for what competent use of the tool looks like.

What has changed is the rate at which the tool evolves relative to the human’s ability to understand what they are using.

In a stable tooling environment, a professional builds a mental model of their tools through experience. They learn the capabilities, the edge cases, the failure modes. This mental model is what makes accountability meaningful — you can be held accountable for the outcome because you understood, or should have understood, what the tool would do. Competence beliefs about you incorporate your demonstrated mastery of the tools you use.

AI disrupts this by changing faster than mental models can update. The model you used last month has new capabilities this month. The prompting patterns that worked last quarter produce different results this quarter. The boundary between what the tool can and cannot do shifts continuously. A professional who was genuinely competent in their use of AI three months ago may be operating on an outdated mental model today — not through negligence, but because the tool moved.

This is what I would call the perpetual adjustment problem. The traditional trust-building cycle — demonstrate competence, accumulate reputation, earn delegation — assumes that the capability being demonstrated is relatively stable. When it is not, the cycle cannot complete. You demonstrate competence with version N of the tool. By the time the observation translates into a stable competence belief in your colleagues, you are operating on version N+2, and the demonstration is no longer current.

The disposition and opportunity beliefs described in the previous article [1] face the same instability. Disposition beliefs — does this person act in my interest? — depend partly on predictability. When someone’s capabilities and methods shift continuously, their behaviour becomes harder to predict, which slows disposition belief formation even when their intentions are constant. Opportunity beliefs — do structural conditions allow them to act? — shift as the organisational norms around AI use evolve, often faster than the norms can be articulated.

The result is not a trust breakdown. The belief formation mechanisms work as they always have. What fails is the convergence condition. Trust requires that beliefs stabilise long enough to become actionable. When the environment shifts faster than beliefs can settle, the organisation exists in a state of perpetual belief instability — always updating, never converging, never reaching the stable trust state that enables efficient collaboration.

A colleague — a CTO navigating exactly these dynamics in a large engineering organisation — described what he is observing as “perpetual cognitive load.” That framing is precise. It captures the felt experience of what the theoretical framework describes: the mental cost of continuously re-evaluating trust judgements that never settle. Every delegation decision, every code review, every architectural discussion requires a fresh assessment because the evidence base has shifted since the last one. The cognitive load is not from the work itself — it is from the trust recalculation that the work now requires.

III. The mean absorbs the signal

There is a third dynamic that interacts with the first two, and it operates at the population level rather than the individual level.

In any community, trust formation depends partly on differentiation. You trust specific people for specific things because you have evidence that distinguishes them from others. The doctor you trust with a difficult diagnosis. The engineer you trust with a critical system. The manager you trust with a difficult conversation. These trust relationships are built on observed differentiation — this person is notably better at this thing than the available alternatives.

AI compresses differentiation. When the same tool is available to everyone, the outputs converge. Documents look more similar. Code looks more similar. Analyses look more similar. The surface-level quality rises uniformly, which is good for the organisation’s baseline output but corrosive to the differentiation signals that trust formation depends on.

The phenomenon has the texture of what Galton called regression toward mediocrity [10] — a homogenising drift that flattens extremes — but the mechanism is contemporary. Recent work on large language models has shown that because AI systems optimise for statistically probable patterns, their outputs converge on the safe and predictable — a dynamic Krashniak et al. term Galton's Law of Mediocrity [11]. When these outputs are incorporated into human work, the convergence propagates. The floor rises. The ceiling does not rise at the same rate. The distribution compresses. The outliers — those who use AI to attempt things nobody else would try, who combine domain expertise with tool mastery to produce genuinely novel work — still exist. And over time, they will separate from the mean decisively, because AI amplifies the gap between 'using the tool competently' and 'using the tool creatively.' But in the medium term, during the adoption phase that most organisations are currently navigating, the compression effect dominates.

The trust consequence at community scale is significant. In Putnam’s framework, social capital depends on differentiated trust relationships — knowing who to turn to for what. When differentiation compresses, these relationships become harder to form and maintain. The community does not lose trust. It loses the granularity of trust — the specific, targeted confidence in specific people that makes complex coordination possible without excessive oversight.

This connects directly to the compression argument from the previous article. Organisations run Putnam dynamics at sprint velocity. When differentiation compresses at the same time that belief convergence slows (because competence signals are finer-grained and the tool keeps evolving), the high-trust equilibrium becomes harder to reach — not because the basin disappears, but because the path to it becomes longer while the dynamics run faster.

IV. The compound effect: perpetual instability

These three dynamics interact. Competence signals compress to the mean, making differentiation harder to observe. The tool evolves faster than mental models can track, preventing beliefs from stabilising. And the loss of differentiation granularity weakens the specific trust relationships that complex coordination depends on.

The compound effect is not a vicious cycle in the Putnam sense — it is not defection breeding distrust. It is something more disorienting: noisy oscillation. Trust decisions are being made, revised, overridden, and recalibrated in rapid cycles. The organisation is not frozen between equilibria — it is churning through them, never settling long enough for the stable patterns that efficient collaboration depends on to form.

What does this look like in practice? You delegate a critical decision to someone whose last three outputs were excellent — but you no longer know whether that excellence was theirs or their agent’s. You second-guess yourself. You add a review step you wouldn’t have added last year. The review itself is informed by your own AI-assisted assessment, which you’re not entirely sure you calibrated correctly. You make the trust decision anyway — you have to, the deadline doesn’t wait for epistemic certainty — but the decision carries less conviction than it would have in a stable signal environment. Next week, new information arrives that shifts your assessment again. The cycle repeats.

This is the perpetual cognitive load. Not the difficulty of any single trust decision, but the accumulation of decisions that never fully resolve. Each one consumes attention. Each one leaves a residue of uncertainty. The instinct-driven trust judgements that used to operate in the background — the automatic, low-cost assessments that free up attention for the actual work — now demand conscious processing. You are always on the lookout. Always updating your assumptions. Always aware that the basis for your last assessment may already be stale.

Signal quality is not only degrading. New signals are forming alongside the old ones — prompt traces, reasoning logs, tool-usage patterns, reproducibility across runs, latency-quality tradeoffs, consistency across agent workflows. These are real, they did not exist five years ago, and they are becoming legible to the people who learn to read them. But these signals share a property that the older ones did not: each is a technological artefact, generated by tools that are themselves non-stationary. A prompt trace from last quarter's model carries different information than a prompt trace from this quarter's, even for the same task. Reproducibility metrics depend on the specific version. Tool-usage patterns shift as the tools shift. The signal generation process itself is moving, and moving faster than the evaluators can learn to read it. The gain in signal availability is offset by a loss in signal stability. This is not a transition from one evaluation regime to another. It is sustained non-stationarity, where the rate at which new evaluation signals appear and obsolete themselves is itself increasing. The new signals do not solve the trust-evaluation problem. They multiply the surfaces on which it must run, each of which is moving.

Consider what a team looks like in practice. In previous work [9] I described the emerging unit of engineering production as two humans and four AI agents — two people directing multiple specialised AI systems for analysis, coding, validation, and execution. This is not a theoretical projection. It is how work increasingly happens. But it changes the trust dynamics fundamentally.

In a traditional team of eight engineers, trust forms across a dense network of relationships. Each person observes the others’ work, builds competence beliefs through repeated interaction, and develops the disposition assessments that enable delegation without oversight. The community sustains itself through this network density.

In the 2+4 model, two humans interact primarily with their AI agents, not with each other. Ideas that were previously bounced between colleagues are now bounced off AI. Analysis that was previously peer-reviewed is now AI-validated. The human-to-human interaction surface shrinks dramatically. Each person operates in a semi-autonomous loop with their own agent stack, producing outputs that arrive fully formed to the other.

This is where the trust formation problem becomes acute. The two humans still need to trust each other — to delegate, to divide responsibility, to make joint decisions under uncertainty. But the evidence they need to form those trust beliefs is increasingly mediated through AI. They see each other’s outputs, not each other’s thinking. The process that produced the output — the reasoning, the dead ends, the judgement calls — is invisible, absorbed into the human-agent interaction.

The risk is structural isolation. Not the deliberate information hoarding of a low-trust organisation, but the emergent siloisation of a workflow where each person’s primary collaborator is an AI that doesn’t share context laterally. Two people can work in adjacent AI-mediated loops for weeks, producing excellent individual output, while their trust relationship — the beliefs about each other’s competence, disposition, and reliability — remains unformed or stale.

But the answer is not to retreat to the old way. Deep peer interaction — slow deliberation, synchronous scrutiny, face-to-face challenge — produces higher-quality trust signals. It also does not scale and cannot match the pace that AI-augmented work demands. The AI-mediated loop is sustainable and productive but generates thinner trust signals. Neither mode wins on its own. Both are true simultaneously.

The emerging competence — and the one that will increasingly define the outliers — is calibration: knowing when the AI loop is sufficient and when genuine human scrutiny is required. The person who moves fast through routine work in their agent loop and slows down deliberately for the high-stakes architectural decision, the ambiguous requirement, the interpersonal tension that no AI can navigate — that person is exercising a judgement that is almost invisible from the outside. The output looks the same either way. But the quality of the trust decisions embedded in that output is fundamentally different.

This is what makes the identification problem from the previous section acute. The outlier you need to find is not the person who produces the best output — AI compresses that signal. It is the person who exercises the best judgement about when to trust the AI loop and when not to. The difference is real but time-delayed. Two people produce equally impressive deliverables. Six months later, one person’s work holds up — the strategy anticipated a risk nobody else saw, the decision accounted for a dynamic nobody else considered, the design proved resilient when conditions changed. The other’s did not. The calibration shows up in the resilience of the outcome, not the polish of the output.

The difficulty is that organisations make trust allocation decisions now, based on the immediate signal, not six months from now when the outcomes diverge. By the time the difference becomes visible, authority and resources have already been allocated based on the surface — which looked identical.

This raises a harder question that the article does not resolve. If old signals compress and new signals are themselves non-stationary, the only candidate evaluation surface left is long-term reputation accumulated over time. But sprint velocity is precisely what removes the runway reputation needs to stabilise. Humans do not stop allocating leadership and competence in the absence of clean signal — social systems fill these gaps regardless, because someone has to be trusted with the next decision. So reputation will form. What is unclear is whether the reputation that forms in this environment tracks reality or drifts from it. In stable signal environments, reputation converges toward truth over time because outcome feedback corrects early misallocations. In non-stationary environments with compressed signals, reputation may form just as quickly but without the corrective loop that aligns it with actual competence. The result is reputation networks that are confidently held and self-reinforcing — once someone is treated as the trusted authority on a domain, that treatment itself becomes a signal others use — but which may be decoupled from the underlying truth they are supposed to track. Whether the reputation networks that emerge in AI-augmented organisations will be more or less accurate than those they replace is an open empirical question. The article cannot answer it. It can only flag that the question is now open in a way it was not before.

This is more disorienting than a straightforward trust collapse. In a low-trust equilibrium, at least the rules are clear. In noisy oscillation, the rules keep shifting. You cannot develop a stable strategy because the evidence base for your trust judgements churns faster than you can act on it — and the people around you, working in their own AI-mediated loops, are equally unable to provide the stable signals you need.

V. A design problem, not a behavioural one

The instinctive response to this dynamic is behavioural prescription: communicate more, be more transparent, build psychological safety. These operate at individual speed in an environment moving at technological speed. By the time a leader has demonstrated consistent trustworthy behaviour through a full cycle of AI-driven change, the next cycle has already begun.

There is a stronger version of this argument that the article has so far understated. Perpetual cognitive load is not only a felt phenomenon — it is bounded by biology. Established research on working memory, attention residue, and the costs of context-switching places hard ceilings on how much trust recalibration any individual can sustain before judgement quality degrades. The rate of environmental change has no equivalent ceiling. This means the mismatch between AI evolution and human adaptation is not a gap that closes through individual effort or training. It is a gap that grows, because one side has a limit and the other does not. Behavioural prescription assumes the gap is closeable. It is not. The cognitive load problem is real, biological, and increasing — and any response that does not account for the biological ceiling is asking individuals to absorb costs that compound until they collapse.

The response must be structural, and it must address the specific dynamics described above.

If the problem is that outliers are buried by convergence to the mean, the design response is to create conditions where exceptional use of AI is visible and attributable — not through surveillance, but through contexts where the work is seen. Showcases, not scorecards. Attribution that distinguishes what the human contributed from what the tool provided, framed as quality infrastructure rather than monitoring. Identifying outliers in a compressed distribution is itself a new organisational competence — one that most organisations have not yet developed and that existing performance management systems are not designed for.

If the problem is that the tool outpaces mental models, the design response is to reward adaptation speed rather than current competence. The person who was excellent last quarter with last quarter’s tools may be average now. What matters is the rate at which someone integrates new capabilities and produces novel output — and most organisations do not measure this.

If the problem is that differentiation compresses, the design response is evolutionary rather than prescriptive. Let the outliers demonstrate what is possible. Make their methods visible. Let social learning raise the floor. This is Putnam’s virtuous cycle applied to capability growth — one person’s exceptional use raises expectations, which raises everyone’s floor, which creates space for the next outlier to push further.

But none of this works in a low-trust environment. In a low-trust organisation, outliers are threats. Their methods are hoarded rather than shared. Adaptation is punished because it disrupts established hierarchies. The trust infrastructure from the previous article is the prerequisite. You need the high-trust equilibrium — or at least a trajectory toward it — before you can build the adaptation mechanism on top of it.

These are design hypotheses, not proven interventions. Whether outlier visibility, adaptation reward, or evolutionary adoption actually move the needle on trust belief convergence in AI-augmented organisations is an empirical question — and the ATDP framework, developed for AI agent design, may provide a method for answering it.

VI. A framework for testing

In recent work [2], I proposed the AI Trust Design Process — a framework that models design choices as actions in a Markov Decision Process, where the state is a population’s trust belief distribution and the reward is social capital change. The framework was developed for AI agent design: how do you design an AI agent so that the population’s trust beliefs move in a direction that builds social capital rather than eroding it?

The proposition I want to advance — carefully, as a hypothesis rather than a claim — is that the same framework generalises to organisational environment design. The design variable is not an AI agent’s behaviour. It is the organisational environment’s properties: visibility mechanisms, reward structures, adaptation cadences, attribution systems. The population is the team or organisation. The trust belief distribution is the same Castelfranchi decomposition. The reward function is the same social capital metric.

If this generalisation holds, then ATDP provides not a prescription but a method — a way to empirically discover which design properties of an organisational environment move the trust belief distribution toward convergence even as AI accelerates change around it. The specific parameters described in the previous section — outlier visibility, adaptation reward, evolutionary adoption — become testable hypotheses within that framework, not intuitions defended by argument.

This is a proposed generalisation, not a validated one. The ATDP framework has been developed for the context of AI agent design and is currently being tested experimentally. Applying it to organisational environments is a new claim that expands its scope significantly. The machinery is compatible — the state space, the action space, and the reward function all have natural organisational analogues. But compatibility is not validation. The weight matrix that maps design properties to belief distribution changes would need to be learned in an organisational context, where the design space is vastly larger and the experimental controls are weaker than in a laboratory setting.

What I am confident of is this: the problem described in this article — noisy trust allocation driven by AI-accelerated environmental change — is a design problem, not a behavioural one. The tools for addressing design problems empirically exist. Whether the specific framework I have proposed is the right tool for this particular design problem is an open question. But the question itself is the right one to ask.

VII. Conclusions

Three claims this article advances:

One. AI changes how trust decisions are made in organisations through three specific mechanisms: it compresses competence signals toward the mean (Galton regression applied to AI-augmented work); it shifts the competence target faster than assessment can track (perpetual adjustment); and it restructures teams around human-AI collaboration loops that reduce the human-to-human interaction surface where trust evidence is generated.

Two. The combined effect is not a trust collapse but a noisier trust allocation — decisions made at full speed on degraded inputs, producing what one practitioner aptly named perpetual cognitive load. The differentiating skill in this environment is calibration: knowing when the AI loop is sufficient and when human scrutiny is required. This skill is real, it produces outcomes that diverge meaningfully over time, but it is almost invisible from the outside because the surface output looks identical to less considered work.

Three. This is a design problem, not a behavioural one. The instinct to prescribe better individual behaviour fails because behavioural interventions operate at individual speed in an environment moving at technological speed. Structural responses — designing for outlier visibility, rewarding adaptation, making the calibration skill legible — operate faster but require frameworks that can be tested empirically. The ATDP framework, originally developed for AI agent design, may provide such a method when generalised to organisational environments. That generalisation is a hypothesis, not a validated approach.

The argument is meant to make the trust dynamics in AI-augmented organisations legible enough that they can be designed for, rather than left to drift.

VIII. Known limits and future work

The compression-versus-stretching dynamics are asserted, not measured. This article argues that AI compresses the middle of the competence distribution while stretching the tails. This is directionally consistent with the Galton regression literature and with the recent application to LLM outputs [11]. But the specific claim that organisational competence distributions respond this way to AI adoption has not been empirically measured. Longitudinal studies tracking competence signal variance within teams before and after AI adoption would be the right test. The distribution shape matters — if AI uniformly compresses rather than differentially affecting the middle and tails, the identification problem I describe is less severe than argued.

The 2+4 team model is emerging, not established. The shift from human-dense teams to small human teams directing AI agent stacks is observable in some engineering contexts but is not yet widespread enough to generalise from. Whether the trust dynamics I describe — emergent siloisation, process invisibility, the calibration skill as differentiator — apply equally in non-engineering domains is an open question. The argument is grounded in mechanism (reduced human-to-human interaction surface) which should generalise, but the specific manifestations may differ substantially across domains.

The calibration skill has not been formally defined. I argue that the emerging outlier competence is knowing when to trust the AI loop and when to apply human scrutiny. This is intuitively recognisable but theoretically imprecise. What exactly constitutes calibration skill? How would you measure it? How does it relate to existing constructs like metacognition or professional judgement? These questions need answers before the concept can support empirical work. At present it is an observation, not a construct.

The ATDP generalisation to organisational environments is untested. The proposition that the ATDP framework extends from AI agent design to organisational environment design is grounded in structural compatibility — the state space, action space, and reward function all have natural analogues. But the design space for organisational environments is vastly larger, the experimental controls are weaker, and the feedback loops are slower than in AI agent experiments. Two specific tensions deserve acknowledgement. First, the Markov property — the assumption that the future state depends only on the current state and the action taken — is a simplification when applied to human trust. Two teams with identical current trust distributions but different histories (one recovering from a failure, one declining from a betrayal) may respond very differently to the same design intervention. The current state representation may encode enough history through the belief values themselves to preserve the property, but this is an assumption that needs testing, not a given. Second, there is a speed paradox: structural interventions operate faster than behavioural ones, but still slower than AI evolution. Organisational design moves at institutional speed — quarters at best. AI tooling moves at weeks. If the environment shifts three times between designing an intervention and measuring its effect, the framework risks solving yesterday’s problem. The design method must account for this latency or it becomes retrospective rather than adaptive.

The macro-level social capital trajectory is undecided. This article deliberately avoids claiming that AI will increase or decrease social capital. Technology has broadly raised human wellbeing, and the established correlation between social capital and economic outcomes suggests an upward trajectory. But corrosion is visible — in the erosion of shared epistemic foundations [9], in the fragmentation of communities into echo chambers, in the exploitation of trust defaults at scale. The macro trajectory — whether the broad upward trend in wellbeing and social capital continues or whether the visible corrosion [9] accelerates — is shaped by forces well beyond any single organisation’s design choices. But design choices are what we have. This article is an attempt to make the ones that matter legible.

References

[1] Prifti, Y. (2026). Trust at Organisational Speed. Weighted Thoughts.

[2] Prifti, Y. (2026). A Markov Framework for AI Trust and Societal Outcomes. SSRN. doi:10.2139/ssrn.6390618

[3] Castelfranchi, C. & Falcone, R. (2010). Trust Theory: A Socio-Cognitive and Computational Model. Wiley.

[4] Putnam, R.D. (1993). Making Democracy Work: Civic Traditions in Modern Italy. Princeton University Press.

[5] Axelrod, R. (1984). The Evolution of Cooperation. Basic Books.

[6] Levine, T.R. (2019). Duped: Truth-Default Theory and the Social Science of Lying and Deception. University of Alabama Press.

[7] De Meo, P., Prifti, Y. & Provetti, A. (2025). Trust Models Go to the Web. ACM Transactions on the Web, 19(2). doi:10.1145/3715882

[8] Edmondson, A.C. (2018). The Fearless Organization: Creating Psychological Safety in the Workplace. Wiley.

[9] Prifti, Y. (2026). The Falling Cost of Lying and the Active Role of AI Agents. Weighted Thoughts.

[10] Galton, F. (1886). Regression Towards Mediocrity in Hereditary Stature. Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263.

[11] Krashniak, A. et al. (2025). Galton’s Law of Mediocrity: Why Large Language Models Regress to the Mean and Fail at Creativity in Advertising. arXiv:2509.25767.

[12] Pluchino, A., Rapisarda, A. & Garofalo, C. (2010). The Peter Principle Revisited: A Computational Study. Physica A, 389(3), 467–472. doi:10.1016/j.physa.2009.09.045

[13] Prifti, Y. (2023). The Emergence of Interpersonal and Social Trust in Online Interactions. PhD thesis, Birkbeck, University of London.

[14] Sharma, M. et al. (2024). Towards Understanding Sycophancy in Language Models. ICLR 2024.

Trust at Organisational Speed

Ylli Prifti, Ph.D. — Tue, 28 Apr 2026 02:55:14 GMT

Abstract: Organisations are communities in Putnam's sense — they run the same self-reinforcing trust dynamics that shape regional economies, but at radically compressed timescales. The dominant practitioner literature (Covey, Lencioni) correctly intuits that trust compounds, but builds on a category error: treating trust as behavioural output rather than cognitive state. The Castelfranchi-Falcone model corrects this by decomposing trust into competence, disposition, and opportunity beliefs held by the trustor, not performed by the trustee. When this framework is applied at organisational scale, three consequences emerge. First, the equilibrium dynamics Putnam documented across regions over decades play out within teams over quarters, because interaction frequency, stakes, and network density are all elevated while exit costs are lower. Second, power asymmetry — absent from both Putnam's peer-community model and the practitioner literature — creates asymmetric defection costs that make low-trust equilibria structurally more stable in hierarchical organisations. Third, behavioural interventions calibrated for relationship timescales are too slow; structural interventions that change the environment for all interactions simultaneously are the only ones that operate at community speed. This is the first of two articles. The second examines how AI specifically perturbs the trust infrastructure described here.

Robert Putnam spent decades studying why some regions of Italy prospered while others stagnated. The answer was not geography, resources, or policy. It was social capital — the networks of trust and reciprocity that allow communities to cooperate effectively. The correlation between per-capita trust scores and regional economic performance was so strong (0.98 Pearson, confirmed computationally in recent work [1]) that it was difficult to attribute to anything other than a fundamental mechanism.

Putnam studied these dynamics across regions and nations, over decades and centuries. The processes were visible precisely because they were slow. A region didn’t collapse into distrust overnight. It drifted, over generations, into a self-reinforcing cycle: defection bred distrust, distrust bred shirking, shirking bred exploitation, exploitation bred disorder. Or the reverse — cooperation bred trust, trust bred further cooperation, and social capital accumulated. Two equilibrium basins. Once a community settled into one, escaping was extraordinarily difficult.

This article makes a simple claim: organisations are communities in Putnam’s sense. They run the same equilibrium dynamics. But they run them at radically compressed timescales — and that compression changes everything about what leadership means.

I. The unit of analysis is the community, not the individual

The most widely read books on organisational trust — Stephen Covey’s The Speed of Trust, Patrick Lencioni’s The Advantage, even Amy Edmondson’s work on psychological safety — share a common starting point. They treat trust as something individuals build through behaviour. Be credible. Be vulnerable. Be consistent. The prescription radiates outward from the self: if enough individuals behave in trustworthy ways, the organisation becomes trustworthy.

This gets the direction of causation backwards.

Trust, in the Castelfranchi-Falcone socio-cognitive model [2], is not a behaviour. It is not even a signal you emit. It is a mental state — a set of beliefs the trustor holds about the trustee. Specifically, three beliefs: that the other party is competent to do what is needed (competence belief), that they are disposed to act in one’s interest (disposition belief), and that structural conditions allow them to act (opportunity belief). Trust produces trusting behaviour, not the other way around. You cannot engineer the mental state by performing the behavioural outputs. And critically, the same behavioural output is interpreted differently depending on the community’s existing trust state. A leader who shows vulnerability in a high-trust team is perceived as courageous and authentic. The same act in a low-trust team is read as tactical manipulation or weakness. The basin filters the signal. This is why behavioural prescriptions that work beautifully in healthy organisations can backfire catastrophically in damaged ones — the receiver’s interpretive frame, not the sender’s intent, determines the trust outcome.

This matters because it shifts the unit of analysis. If trust were behaviour, the individual-level prescription would make sense — change the behaviour, change the outcome. But trust is a cognitive state held within relationships, and relationships exist within communities. The community’s aggregate trust capital determines whether individual trustworthy behaviour can even be expressed and recognised.

Consider a team member with genuinely good intentions operating in an organisation where information is hoarded, commitments are routinely broken by leadership, and credit is claimed by those with power. The structural conditions for trust — what Castelfranchi calls practical opportunity beliefs — are absent. It does not matter how credible or vulnerable this person is. The community-level dynamics override individual-level signals.

Organisations have every structural feature necessary for trust emergence in the Putnam sense: repeated interaction, reputation accumulation, shared stakes, information asymmetries, and — crucially — power dynamics. They are not merely analogous to communities. They are communities, with all the self-reinforcing dynamics that entails.

II. The compression

Putnam’s subjects could not see the phase transition happening in real time. The drift from high-trust civic engagement to low-trust institutional decay in American communities took half a century. By the time he published Bowling Alone [3], the transition was already deep into the vicious cycle. The diagnostic was retrospective.

Organisational leaders do not have this problem. Or rather, they have the opposite problem: the dynamics run fast enough to observe, but also fast enough to be catastrophic before the observation translates into action.

What determines the speed of Putnam’s equilibrium dynamics? At minimum: interaction frequency, stakes per interaction, network density, and information flow speed. In a nation, these variables are low — people interact with strangers infrequently, individual stakes are small, networks are sparse, and information propagates slowly. The equilibrium converges over generations.

In an organisation, every one of these variables is elevated. A team of eight interacts daily. Each interaction carries meaningful professional stakes. The network is dense — everyone knows everyone. Information, including information about defection, travels instantly. The same dynamics that take decades at regional scale take quarters at team scale.

But the compression is not merely a speedup of the same process. There is a structural difference that makes organisational dynamics even more volatile than Putnam’s regional model would predict: exit costs. In a region or nation, exit is expensive — you cannot easily leave your community, your language, your social network. High exit costs lengthen the shadow of the future, which Axelrod showed is precisely what sustains cooperation [4]. In an organisation, exit is comparatively cheap. People quit. And the easier it is to leave, the shorter the effective time horizon for any individual actor, which game theory tells us pushes toward defection. The combination of compressed dynamics and low exit costs means that organisational trust equilibria are not just faster versions of regional ones — they are structurally less stable. The vicious cycle, once entered, accelerates faster because the cooperators leave first.

This compression has two consequences that the practitioner literature has not reckoned with.

First, phase transitions become observable in real time. A leader who understands the equilibrium framework can detect which basin their organisation is moving toward. The early signals are legible: are people sharing information or hoarding it? Are commitments being met or quietly renegotiated? Are new hires being onboarded into a culture of candour or one of self-protection? These are not personality traits. They are equilibrium indicators. And at organisational speed, the window between detection and the point of no return is measured in months, not years.

Second, interventions must operate at community speed. Covey’s behavioural prescriptions — the 13 behaviours, the five waves of trust — are calibrated for relationship timescales. Years of consistent behaviour building credibility over time. This is sound advice for a personal relationship. It is too slow when equilibrium dynamics run at organisational timescales. By the time a leader has demonstrated consistent trustworthy behaviour over two years, the team may have already settled into the low-trust basin and rebuilt its norms around self-protection.

Structural interventions — changes to information flow, accountability mechanisms, decision-making processes, incentive structures — operate at community speed because they change the environment in which every interaction occurs, simultaneously, for everyone. Behavioural interventions change one person’s outputs and hope the signal propagates. At compressed timescales, the structural approach is not just more efficient. It is the only one fast enough.

III. What the practitioner literature gets right — and where it falls short

Let me be clear: the instincts in these books are largely correct. Covey is right that trust accelerates coordination and reduces friction costs. Lencioni is right that the dynamics are self-reinforcing — that organisational health compounds. Edmondson is right that psychological safety is a prerequisite for learning and performance. These are not wrong observations. They are, in many cases, independent rediscoveries of dynamics that Putnam documented at civilisational scale and that Castelfranchi formalised computationally.

The gap is mechanistic. When you build on intuition rather than mechanism, your interventions can be right for the wrong reasons — or wrong in ways you cannot diagnose.

Three specific gaps:

The direction of causation. Covey’s model begins with self-trust and radiates outward through relationship trust, organisational trust, market trust, and societal trust. This is motivationally appealing but mechanistically backwards. Trust is not a signal you emit. It is a belief state formed in the receiver based on evidence, context, and prior beliefs. You can influence the evidence available to the receiver, but you cannot control the belief formation process. Castelfranchi’s model makes this distinction precise: the trustor’s beliefs about the trustee’s competence, disposition, and opportunity are updated through interaction, not through the trustee’s self-presentation [2].

The absence of structural opportunity. Neither Covey nor Lencioni engages seriously with what Castelfranchi calls practical opportunity beliefs — the trustor’s assessment of whether structural conditions allow the trustee to act on their competence and good intentions. A manager may be competent and well-disposed, but if the organisation’s incentive structure punishes the behaviour the trustor needs, trust cannot rationally form. This is not a failure of character or credibility. It is a structural constraint. The practitioner literature treats organisational structure as context rather than as a primary variable in trust formation. Putnam would recognise this immediately — his entire framework is about how institutional structures create or destroy the conditions for trust.

The neutrality of power. This is the most consequential gap. Both Covey and Lencioni write as though power asymmetry is incidental to trust dynamics. It is not. Power asymmetry fundamentally alters the cost structure of defection. In a peer-to-peer community — Putnam’s civic associations, Axelrod’s iterated prisoner’s dilemma — defection is costly because the defector faces reciprocal punishment. In a hierarchical organisation, defection by those with power is structurally cheaper. A senior leader who breaks a commitment, hoards information, or claims credit faces fewer immediate consequences than a junior team member who does the same.

Axelrod’s insight [4] is that cooperative equilibria require defection to be immediately and predictably costly. When power asymmetry lowers the cost of defection for some actors, the conditions for sustained cooperation are undermined. This creates an asymmetric vulnerability: the powerful can defect without immediate consequence, but their defection degrades trust across the entire community — because everyone observes it and updates their beliefs accordingly.

The hypothesis I would advance — and which I believe deserves formal investigation — is that power asymmetry makes low-trust equilibria more stable and harder to escape. Not because powerful people are less trustworthy, but because the risk structure is asymmetric: when a senior leader defects — breaks a commitment, hoards information, claims credit — the community bears the cost while the leader often retains the upside. When a junior team member does the same, the consequences are immediate and personal. This asymmetric risk distribution means defection is locally rational for those with power even when it is globally destructive, making defection the dominant strategy at the top of the hierarchy precisely where its effects on the community are largest. This is a Putnam vicious cycle with an asymmetric accelerant.

IV. The diagnostic value

If organisations are communities running Putnam dynamics at compressed timescales, then organisational leadership is, in part, equilibrium management. The leader’s job is not to be trustworthy (though that helps). It is to maintain the structural conditions under which the high-trust equilibrium is self-sustaining.

This reframing has practical consequences.

Hiring is community composition. Every addition to a team changes the interaction dynamics, the reputation network, and the information flow patterns. A hire who is individually excellent but structurally disruptive to the trust equilibrium is a net negative. This is not about “culture fit” — a concept too vague to be useful. It is about whether the new interaction patterns the hire introduces push the community toward or away from the cooperative equilibrium.

Reorganisations are equilibrium shocks. When you restructure a team, you are not merely changing reporting lines. You are destroying existing trust networks and forcing new ones to form from zero. The Castelfranchi belief components — competence, disposition, opportunity — must all be rebuilt through new interactions. If the interaction frequency and stakes are high enough, the new equilibrium forms quickly. If not, the team may settle into a low-trust holding pattern that persists indefinitely. Most organisations treat reorganisation as an organisational design problem. It is equally a trust dynamics problem.

Metrics can be equilibrium indicators. The standard lagging indicators of trust collapse — turnover, disengagement, productivity decline — arrive after the phase transition is already underway. Leading indicators are harder to measure but more valuable: information sharing patterns, commitment fulfilment rates, the gap between public statements and private behaviour, the speed at which new members are integrated into candid communication. These are not personality diagnostics. They are community health metrics.

Structural interventions outperform behavioural prescriptions. Transparency by default (open calendars, shared documents, public decision logs) changes the information environment for everyone simultaneously. Accountability mechanisms that apply equally regardless of seniority address the power asymmetry problem directly. Decision-making frameworks that make the reasoning visible — not just the outcome — preserve the signals that trust beliefs depend on. None of these require anyone to be vulnerable, authentic, or credible. They change the structural conditions under which trust forms, which is faster and more reliable than hoping individual behaviour will propagate.

V. What this framework does not yet address

I want to be honest about the gaps.

The compression argument — that Putnam dynamics run faster in smaller, denser communities — is intuitively strong and directionally supported by existing research at team scale (Edmondson [5]), organisational scale (Fukuyama [6]), and regional scale (Putnam [7], De Meo et al. [1]). But the time constant itself has not been formalised. What precisely determines how fast the equilibrium dynamics converge in a community of a given size, density, and interaction frequency? This is a question worth answering rigorously.

The power asymmetry hypothesis — that hierarchy makes low-trust equilibria more stable — is, at this stage, a hypothesis. It is grounded in Axelrod’s framework and consistent with organisational experience, but it has not been tested empirically in the way I would want before claiming it as established.

And the question of leading indicators — what exactly are the early signals of a phase transition from high-trust to low-trust equilibrium? — is practically the most important question this framework raises, and it is entirely open. It is also, I suspect, answerable empirically, which makes it a research agenda rather than a speculation.

These gaps are not a weakness of the argument. They are what makes it a research programme rather than a finished theory. Let me be explicit about both the known limits and what needs to come next.

VI. Known limits and future work

The scale invariance bridge is proposed, not proven. This article argues that the same Putnam equilibrium dynamics operate at team scale, organisational scale, and regional scale — differing in time constant but not in mechanism. Empirical work exists at each scale independently: Edmondson’s psychological safety research at team level [5], social capital and organisational performance studies at firm level, and Putnam’s original work at regional and national level [7]. A recent research agenda from HEC Paris notes that the role of social capital and trust for organisations “remains underexplored” — confirming the gap but not filling it. What is missing is a formal demonstration that the dynamics are mechanistically continuous across scales, rather than superficially similar. The compression argument is intuitively strong and directionally supported, but it has not been tested with the rigour I would want before claiming it as established. A longitudinal study measuring trust equilibrium convergence rates across communities of different sizes and interaction densities would be the right next step.

Granovetter’s objection is unaddressed. Putnam’s regional dynamics emerge from weak ties across large populations. Organisational dynamics are driven by strong ties in small networks. It is possible that these produce qualitatively different equilibrium behaviour, not merely faster versions of the same process. The weak-tie/strong-tie distinction may mean that the basin structure itself differs at different scales — not just the convergence speed. I believe the equilibrium framework still holds because the underlying mechanism (reciprocity, reputation, defection cost) operates in both regimes, but this is an assertion that needs formal investigation.

The power asymmetry hypothesis is the least supported claim. The argument that hierarchy makes low-trust equilibria more stable — because power lowers the cost of defection for those who hold it — is grounded in Axelrod’s framework and consistent with organisational experience. But it has not been tested empirically. It is the claim in this article most likely to be wrong in its specifics, even if the general direction (that power asymmetry modifies trust dynamics) is almost certainly correct. Existing critiques of Putnam note that his framework underplays power dynamics and structural inequality, but none have formalised the interaction between hierarchy and equilibrium stability in the way I am proposing. This needs its own dedicated investigation.

Edmondson’s work deserves deeper engagement. Psychological safety is arguably the closest existing empirical programme to what this article describes — team-level trust dynamics measured through observable behaviour. I have cited Edmondson as support but not engaged with the substance of her findings or shown precisely where the Putnam-Castelfranchi framework extends, refines, or reframes her conclusions. That engagement is owed and will be part of future work. The short version: Edmondson’s psychological safety maps most closely to a specific configuration of Castelfranchi’s opportunity beliefs (B-PrOp) — the perception that structural conditions permit honest expression. The framework offered here embeds that within a broader equilibrium model that also accounts for competence and disposition beliefs. But this claim needs to be developed carefully, not asserted in passing.

Leading indicators of phase transition remain unidentified. This is the most practically important open question. If organisational leaders can observe Putnam dynamics in real time — which the compression argument implies — then they need instruments for doing so. What specifically are the early signals that a team or organisation is shifting from the high-trust to the low-trust basin? The standard lagging indicators (turnover, disengagement, productivity decline) arrive after the transition is well underway. Candidates for leading indicators — information sharing velocity, commitment fulfilment rates, synchronous-to-asynchronous communication ratios, speed of candid integration of new members — are plausible but untested. Identifying and validating these indicators is an empirical project worth undertaking.

What this framework does not attempt. This article does not model trust formation at the individual cognitive level — that is Castelfranchi and Falcone’s contribution, and I build on it rather than extending it. It does not address cross-cultural variation in trust dynamics, which Putnam himself documented extensively and which would add significant complexity to the organisational application. And it does not yet address the specific perturbations that AI introduces to the trust infrastructure described here — that is the subject of the next article.

The claim is not that we have all the answers. The claim is that the right unit of analysis is the community, not the individual; that the right theoretical framework is Putnam’s equilibrium dynamics, not behavioural prescription; and that the compression of those dynamics at organisational scale creates both a diagnostic opportunity and an intervention imperative that the current practitioner literature does not adequately address.

In the next article, I will examine what happens to these dynamics when AI enters the loop — because the trust infrastructure described here is exactly what is now under pressure.

References

[1] De Meo, P., Prifti, Y. & Provetti, A. (2025). Trust Models Go to the Web. ACM Transactions on the Web, 19(2). doi:10.1145/3715882

[2] Castelfranchi, C. & Falcone, R. (2010). Trust Theory: A Socio-Cognitive and Computational Model. Wiley.

[3] Putnam, R.D. (2000). Bowling Alone: The Collapse and Revival of American Community. Simon & Schuster.

[4] Axelrod, R. (1984). The Evolution of Cooperation. Basic Books.

[5] Edmondson, A.C. (2018). The Fearless Organization: Creating Psychological Safety in the Workplace. Wiley.

[6] Fukuyama, F. (1995). Trust: The Social Virtues and the Creation of Prosperity. Free Press.

[7] Putnam, R.D. (1993). Making Democracy Work: Civic Traditions in Modern Italy. Princeton University Press.

[8] Govier, T. (1997). Social Trust and Human Communities. McGill-Queen’s University Press.

[9] Covey, S.M.R. (2006). The Speed of Trust: The One Thing That Changes Everything. Free Press.

[10] Lencioni, P. (2012). The Advantage: Why Organizational Health Trumps Everything Else in Business. Jossey-Bass.

[11] Prifti, Y. (2026). A Markov Framework for AI Trust and Societal Outcomes. SSRN. doi:10.2139/ssrn.6390618

[12] Levine, T.R. (2019). Duped: Truth-Default Theory and the Social Science of Lying and Deception. University of Alabama Press.

The Falling Cost of Lying and the Active Role of AI Agents

Ylli Prifti, Ph.D. — Tue, 14 Apr 2026 16:07:44 GMT

Abstract: Truth default — the evolved human tendency to assume others are being honest — is not naivety. It is the load-bearing infrastructure beneath social cooperation, trust, and the social capital that drives civilisational outcomes. This article traces the progressive erosion of that infrastructure through the declining cost of deception, from print to broadcast to social media to AI agents. The consequences are not primarily epistemic — objective knowledge has never been more robust — but social: the mechanisms for distributing and anchoring shared truth have fractured. Critically, this fracture is not accidental. Deception is commercially and politically incentivised at scale. AI does not originate this problem. It inherits it and, for the first time, acts as an autonomous participant within it rather than merely a channel for human deception. Addressing it requires not just better design but a rethinking of the structural incentives that make deception profitable.

I. The paradox — and why it is not an accident

We are not less certain than our ancestors. The scientific understanding of the physical world has never been more precise, more validated, or more practically useful. The evidence base for climate change, vaccine efficacy, evolutionary biology, and a thousand other empirical questions is stronger than at any point in human history. The tools for establishing truth — statistical methods, peer review, replication, large-scale data — have advanced enormously.

And yet we agree less. The percentage of people in developed democracies who trust institutions, news media, government, and scientific consensus has been declining for decades. Not because those institutions became less rigorous — in many cases they became more so — but because the social infrastructure that once anchored shared factual premises has fractured.

The problem is not epistemic. It is social. We do not know less. We have become less able to act on shared knowledge as a collective.

The cost of deception has fallen continuously since the printing press. AI agents are the first inflection point where the deceiver is no longer human.

But this fracture is not simply the unintended consequence of technological change. It is, in significant part, the result of deliberate exploitation. Deception sells. Outrage drives engagement. Novelty beats accuracy in the attention economy. The business models of social media platforms, the political economy of populist movements, and the incentive structures of certain media ecosystems all reward the exploitation of truth default. This is not a conspiracy — it is a rational response to a payoff structure that makes defection profitable in a society where most people are still cooperating.

Game theory is instructive here. In a population that defaults to trust, the defector gains disproportionately — at least in the short term. The cost of being caught is low when verification is expensive and attention is scarce. The result is a classic commons problem: individually rational deception produces collectively irrational outcomes. The social capital that makes cooperation valuable in the first place is drawn down, slowly and then all at once.

If we want to address this, appeals to norms are insufficient. What is required is a shift in the equilibrium — mechanisms that raise the cost of deception structurally, making defection less profitable rather than simply less ethical. What those mechanisms look like is a question we return to at the end.

II. Truth default as load-bearing infrastructure

Timothy Levine’s Truth-Default Theory offers a precise account of why this infrastructure exists and what it does [1]. Humans, Levine argues, operate with a truth default — a baseline assumption that others are telling the truth. This is not gullibility. It is the statistically rational prior for a social species. In most interactions, across most of human history, most people have been telling the truth most of the time. Assuming honesty is not only more efficient than constant vigilance — it is what makes cooperation, coordination, and cumulative culture possible at all.

The truth default is not a failure of critical thinking. It is the operating system of social life. Remove it and the cognitive overhead of every interaction becomes unbearable. You cannot build institutions, markets, science, or governance on a foundation of mutual suspicion. The default assumption of good faith is what allows information to flow, agreements to hold, and social capital to accumulate.

Those familiar with organisational culture will recognise truth default in a different register. The management principle of “assume positive intentions” — widely taught in team building and leadership development — is the same cognitive mechanism applied to professional contexts. What organisational psychologists arrived at empirically as a prescription for healthy teams, evolutionary biology arrived at as a description of how social species actually function. The prescription works because it mirrors the default.

Putnam’s account of social capital [2] — the networks of trust and reciprocity that correlate so strongly with economic and institutional outcomes — is built on exactly this foundation. Generalised trust, the willingness to cooperate with strangers, is downstream of truth default. When the default assumption holds, cooperation scales. When it degrades, it does not merely make individual interactions harder — it collapses the aggregate.

The 0.98 Pearson correlation between per-capita trust scores and regional GDP that anchors the empirical programme behind this series [3] is not a coincidence. It is the measurable consequence of truth default functioning as intended at civilisational scale.

III. The cost curve of deception

Truth default evolved in a world where deception was costly for the deceiver. Lying requires cognitive effort. It creates inconsistencies that accumulate over time. It risks social exposure — the liar found out faces reputational damage that, in a small community, is catastrophic. The evolutionary logic is clear: a species that defaults to truth telling, and punishes detected deception severely, produces more cooperative outcomes than one that doesn’t.

What happens when the cost of deception falls?

Every major information technology has lowered it. The printing press made it possible to distribute false claims at scale without personal exposure. Broadcast media concentrated that power in fewer hands, but also created mass audiences receptive to narratives that had never needed to withstand local social scrutiny. Social media eliminated the last friction — anyone could reach millions with a fabricated claim at zero marginal cost, with no accountability infrastructure capable of keeping pace.

The asymmetry compounds at each step. Deception gets cheaper to produce. Detection gets harder as volume increases. The human capacity for truth-default-based cooperation was calibrated for a world of face-to-face interaction and local information. It was not designed for a world in which the information environment is engineered.

The consequences are structural. When manufacturing credible-seeming alternatives to established knowledge becomes cheap, the institutions whose function is to anchor shared truth — journalism, academia, government, science — lose the authority to perform that function. Not because they became less rigorous. Because the cost of mimicking their form without their substance collapsed.

Harari’s observation in Nexus [4] — that the abundance of information has not benefited us as a society — is a description of what happens when the distribution infrastructure for shared truth degrades faster than the underlying knowledge base grows. The knowledge exists. The social mechanisms for anchoring it as shared premises for collective action are failing.

The rise of conspiracy movements, the electoral success of post-truth politics, the collapse of institutional confidence in developed democracies — these are not aberrations caused by particularly bad actors. They are the predictable systemic output of a cost curve that has been falling for decades. The actors who exploit it are symptoms, not causes.

IV. Truth default inside the trust framework

It is worth being precise about where truth default operates within the trust architecture that underlies this research programme.

The Castelfranchi-Falcone Socio-Cognitive Model of Trust decomposes trust into three belief components: competence, willingness, and opportunity [5]. Truth default is not confined to one of these. It saturates all three.

Willingness belief — the trustor’s assessment of whether the trustee will act in their interest — is the most obvious site. When we assume an agent means well, we are truth-defaulting to their stated or implied intentions. But competence belief is equally dependent on truth default. When we accept a credential, a track record, or a claimed capability, we are truth-defaulting to the accuracy of that representation. We do not independently verify most of the competence claims we act on. We extend trust because we assume the representation is honest.

Opportunity belief — the assessment of whether the agent has what it needs to act — operates the same way. When an agent claims to have access to a database, a booking system, or a body of knowledge, the trustor who accepts that claim is truth-defaulting to it.

This means that a deceptive agent does not merely damage willingness belief. It exploits the truth default embedded in all three components simultaneously. The suppression-condition agent observed in the experimental session described in the previous article fabricated a booking capability it does not have (opportunity), overclaimed the strength of evidence for a causal claim (competence), and presented all of this in the confident register of an agent acting in the user’s interest (willingness). Truth default was the attack surface for all three.

V. AI as inflection point

Artificial intelligence does not originate this problem. It inherits a cost curve that was already in collapse and accelerates it by orders of magnitude.

But it introduces something qualitatively new. Every previous information technology was a channel — a means by which human deception could be amplified and distributed. AI agents are participants. They do not merely transmit claims made by humans. They generate claims, make decisions, and act — autonomously, at scale, within the social infrastructure that truth default is supposed to sustain.

The evolutionary logic that made deception costly — social exposure, reputation damage, cognitive overhead — does not apply to agents. They have no reputation in the relevant sense. They experience no cognitive load from inconsistency. They can fabricate at scale without fatigue. The asymmetry between deception and detection, which was already growing, becomes structural.

VI. Evolutionary design, not design by decree

The instinct when confronted with this problem is to reach for design — build better agents, mandate transparency, regulate AI outputs. These interventions matter. But they address symptoms rather than the underlying cost curve. Design by decree, imposed on a system whose incentive structures reward deception, produces evasion rather than change.

What the problem requires is evolutionary design — interventions that shift the selective pressures rather than prescribing specific behaviours. Axelrod’s foundational work on the evolution of cooperation [6] is instructive here, and the lesson is counterintuitive. In a series of iterated prisoner’s dilemma tournaments, the winning strategy was not the most sophisticated one. It was the simplest: Tit-for-Tat. Cooperate on the first move. Do whatever the other player did on the previous move. That’s it. Every more elaborate strategy lost to it.

The reason is structural. Tit-for-Tat is transparent — the other player always knows what to expect. It is reciprocal — cooperation is rewarded, defection is immediately punished. And it is provokable — it never absorbs defection without consequence. These properties, not intelligence, are what sustain cooperation in a population of mixed strategies.

The parallel to the truth default problem is direct. Truth default is a population-level cooperative equilibrium — it works because most people cooperate and defectors face social cost. The cost curve problem is that the retaliation mechanism has been progressively decoupled from defection. You can fabricate a citation, misrepresent a source, or project false confidence at scale with no immediate consequence. The Tit-for-Tat logic breaks down when the shadow of the future — the expectation of being held to account — disappears.

Restoring that shadow does not require more sophisticated detection. It requires simpler and more robust reciprocity mechanisms. The arms race between deception and detection is one the defector always wins, because complexity favours the attacker. What works structurally is making defection immediately and predictably costly — accountability infrastructure that makes agent-generated falsehoods traceable and attributable; reputation systems that aggregate deception signals across interactions rather than evaluating each in isolation; regulatory frameworks that impose real costs on platforms whose business models depend on truth-default exploitation.

None of these is sufficient alone. Together they describe an evolutionary pressure toward honesty — not because deception is wrong, but because it becomes too expensive. The simplest strategies win. The task is to design the environment so that the simplest profitable strategy is also the honest one.

There is a predictable objection to this argument. Those who exploit truth default — conspiracy theorists, purveyors of manufactured outrage, agents designed to suppress failure — frequently invoke the liberal paradox when faced with consequences: you claim to value openness, yet you would penalise speech. This is a misreading of the liberal tradition. Popper addressed it directly in The Open Society and Its Enemies [9]. The paradox of tolerance is that a society which extends unlimited tolerance to those who would destroy tolerance will eventually lose tolerance itself. Popper’s conclusion was not that intolerance should be tolerated — it was that tolerance as a principle requires structural limits on those who use it as a weapon. Raising the cost of deception is not illiberal. It is the structural condition under which liberal discourse remains possible at all.

Agent design sits within this larger argument as one lever among several. The ATDP framework [7] treats design as the decision variable within the trust-social capital relationship. The experiment described in the previous article [8] tests whether transparency at the interaction level preserves something that suppression quietly destroys. If it does, that is evidence that design choices can contribute to the evolutionary pressure. If it does not — if suppression wins even in controlled conditions — that is evidence the cost curve problem is more deeply structural and that design alone is insufficient.

Either result advances the argument. The experiment is not a solution. It is a measurement of the problem’s depth.

References

[1] Levine, T.R. (2019). Duped: Truth-Default Theory and the Social Science of Lying and Deception. University of Alabama Press.

[2] Putnam, R.D. (2000). Bowling Alone: The Collapse and Revival of American Community. Simon & Schuster.

[3] De Meo, P., Prifti, Y. & Provetti, A. (2025). Trust Models Go to the Web: Learning How to Trust Strangers. ACM Transactions on the Web, 19(2), Article 12. doi:10.1145/3715882

[4] Harari, Y.N. (2024). Nexus: A Brief History of Information Networks from the Stone Age to AI. Signal Press.

[5] Castelfranchi, C. & Falcone, R. (2010). Trust Theory: A Socio-Cognitive and Computational Model. Wiley.

[6] Axelrod, R. (1984). The Evolution of Cooperation. Basic Books.

[7] Prifti, Y. (2026). A Markov Framework for AI Trust and Societal Outcomes. SSRN. doi:10.2139/ssrn.6390618

[8] Prifti, Y. (2026). How Do You Design a Large-Scale AI Trust Experiment? Weighted Thoughts. weightedthoughts.substack.com

[9] Popper, K.R. (1945). The Open Society and Its Enemies. Routledge.

Ylli Prifti, Ph.D., is a researcher at Birkbeck, University of London. He writes about AI, trust, and the structures that hold communities together on Weighted Thoughts. Connect on LinkedIn.

How Do You Design a Large-Scale AI Trust Experiment?

Ylli Prifti, Ph.D. — Sat, 21 Mar 2026 20:29:17 GMT

Abstract: Designing a rigorous test for whether AI agent behaviour shapes social capital is harder than it looks. Self-reported trust measures the wrong variable. Behavioural proxies require careful grounding. And the intuitive prediction — that transparency builds trust — runs directly against a body of empirical work showing users penalise systems that admit uncertainty. This article lays out the experimental design for the first empirical test of the ATDP framework's Prediction 3, before the results exist. The null result, if it arrives, will be as theoretically significant as confirmation.

The turn from theory to experiment

A framework that makes no testable predictions is not a scientific contribution — it is a vocabulary. The ATDP framework [1] was designed from the outset to be falsifiable. It predicts that specific agent design properties shift the distribution of trust beliefs in a population, and that those shifts propagate into measurable changes in social capital. Five predictions follow from the formal structure. The next stage of this research programme is to test them.

Part 3 of this work will do that. But before reporting findings, I want to do something that is less common in published research than it should be: lay out the experimental design in public, before the results are in. What we are trying to prove. How we intend to measure it. What confirmation looks like, and what failure looks like. And why the sequencing of the experiments matters.

This article is that laying out. The experiment will follow.

Five predictions, one starting point

The ATDP framework generates five directional predictions [1]:

Prediction 1 (Displacement): High-substitution deployments will show divergence between dyadic trust metrics and community-level social capital indicators — the former may look healthy while the latter quietly declines.

Prediction 2 (Consistency premium): Agents with consistent competence and willingness signals accumulate social capital faster than agents with higher average performance but higher variance.

Prediction 3 (Failure transparency): Agents that communicate failures transparently will sustain higher social capital than agents that suppress or obscure failure, even when absolute error rates are identical.

Prediction 4 (Equilibrium sensitivity): Communities with lower baseline social capital will experience amplified negative effects from low-trust deployments, and slower positive effects from high-trust ones.

Prediction 5 (Composition threshold): There exists a critical ratio of human-agent to human-human trust interactions beyond which social capital accumulation stalls or reverses, even if all individual interactions are individually positive.

Testing any of these properly is non-trivial. Predictions 1, 4, and 5 require longitudinal population-level measurement — the kind of data that takes years to accumulate and is difficult to isolate causally. Prediction 2 requires careful control of performance variance while holding average capability constant. These are not impossible experiments, but they are not the right starting point.

Prediction 3 is.

Why Prediction 3 first

The causal chain the framework proposes runs from agent design properties, through trust belief distributions, to social capital change. Before testing whether trust belief shifts propagate into social capital — which is what Predictions 1, 2, 4, and 5 ultimately require — we need to confirm that agent design properties shift trust beliefs at all. That is the mechanism test. Without it, the rest of the causal chain is untethered.

Prediction 3 is the mechanism test at its most tractable. It takes one design property — transparency of failure — and asks whether varying it produces a measurable difference in trust. The independent variable is a system configuration. The dependent variable is trust behaviour. No longitudinal data. No population-level social capital measurement. A controlled experiment with a clean signal.

If Prediction 3 fails — if transparency of failure does not produce a measurable trust difference — then the weight matrix W at the core of the ATDP formulation is closer to zero than the framework requires, and the entire programme needs to reassess its foundational assumption. That is why we start here. Confirmation of Prediction 3 is the licence to proceed to the more complex tests. Failure of Prediction 3 is the most informative result the programme could produce at this stage.

The measurement problem

The naive approach to testing Prediction 3 is a survey. Present users with two agent conditions, ask them to rate their trust, compare scores. The problem is not that surveys are useless — it is that self-reported trust is the wrong variable. It measures perception of trustworthiness at a moment in time, not the belief structure that the CF framework actually describes [2]. Asking someone “how much do you trust this agent?” after a task conflates competence beliefs, willingness beliefs, and generalised sentiment into a single number. It also introduces social desirability bias — users may report trust they do not exhibit behaviourally.

The more defensible approach is to anchor measurement in behaviour, not self-report. Trust in the CF sense is revealed through action: whether you delegate, whether you verify, whether you return. These are observable and loggable without asking the user anything.

The experimental design I intend to use operationalises this as follows.

Experimental design

Task environment. Users are given a well-defined task with a verifiable outcome — something where completion can be objectively confirmed within a reasonable time window. The specifics of the task domain are less important than the clarity of the completion criterion. We need ground truth.

Conditions. The same base model, configured under two system prompt conditions. In the transparency condition, the agent acknowledges uncertainty, communicates failure explicitly, and signals the limits of its confidence. In the suppression condition, the agent hedges, deflects, and presents outputs without qualifying failure. Identical underlying capability. One configuration variable.

Ground truth. Task completion within a defined threshold is the primary trust signal. A user who completes the task trusted the agent enough to work with it to a result. A user who abandoned the task, exceeded the time threshold significantly, or escalated to manual verification did not. This binary is the anchor.

Signal layer. Against that ground truth, a set of behavioural signals is calibrated: total number of interactions, session duration, message length trajectory, sentiment progression, query reformulation patterns, override frequency. These signals are not individually definitive — but against a labelled ground truth, they can be weighted into a trust estimator that generalises across the cohort, including ambiguous cases.

CF decomposition. Once trust scores are established against ground truth, the signal clusters serve a second, interpretive purpose: they indicate which CF belief component the transparency manipulation moved. This is not a recalculation of trust — that is already measured. It is a diagnostic layer. Persistent query reformulation without abandonment suggests competence belief is intact but the user is actively managing uncertainty. Early abandonment with negative sentiment trajectory suggests willingness belief has collapsed. Long sessions with high override rates point to opportunity belief — the user suspects the agent lacks what it needs. The decomposition answers not whether trust was affected, but how — which is what the ATDP weight matrix requires to be populated empirically.

This methodology is not novel in itself. It was applied in prior work on large-scale computational trust measurement in high-stakes peer-to-peer contexts, where behavioural signals were calibrated against a ground truth of actual trust expression across tens of thousands of cases [3]. The contribution here is the application to a controlled agent design experiment, in a context where the independent variable is an explicit design choice rather than an emergent platform property.

Sample size and recruitment. Given an expected effect size of approximately 5% difference in re-engagement rates between conditions — small but meaningful at the population level — a power calculation at p=0.05 with 80% power requires roughly 500 participants per cohort, or 1,000 total. The preferred recruitment path is Prolific, where independent research consistently shows high data quality at approximately $1.90 per quality respondent [6], putting the total experiment cost in the region of £2,500–£3,500 at academic pricing — within the range of a small research grant. The fallback is a self-hosted task environment deployed via open-source agent infrastructure, with recruitment through this publication and associated networks. The latter trades speed for cost and adds a secondary benefit: participants recruited through Weighted Thoughts arrive with context, which may itself be worth studying.

What confirmation looks like

Before stating what confirmation looks like, it is worth addressing an objection a careful reader might raise: is this finding even non-obvious? Surely a transparent agent is more trustworthy than one that suppresses failure. What is the experiment actually proving?

The intuition that transparency builds trust assumes a rational trustor. The empirical literature on human-computer interaction does not support that assumption cleanly. Users exhibit what Dzindolet et al. termed a positivity bias toward automated systems — an initial tendency to assume machines are reliable, which then makes observed failure disproportionately damaging to trust [4]. Lee and See’s foundational review of trust in automation found that users frequently miscalibrate reliance, over-trusting confident systems and penalising those that express uncertainty — even when the uncertainty is honest and the confidence is not [5]. Admitting failure reduces perceived competence even when it should increase perceived integrity. The default psychological response to a system that says “I don’t know” or “I got that wrong” is to question its capability — not to reward its honesty. So transparency producing higher trust is a hypothesis that runs against a documented human bias toward confident systems, not a foregone conclusion.

More importantly, even if the direction were predictable, the mechanism and magnitude are not. The experiment is not testing whether transparency is good. It is testing which CF belief component carries the effect, and how durable that effect is. Does transparency preserve competence belief despite the failure, or does it damage it? Does willingness belief strengthen enough to compensate? At what rate does the trust effect decay after the failure event? These are not obvious questions, and their answers are precisely what the ATDP weight matrix requires to be populated with empirical values rather than assumptions.

If Prediction 3 holds, we expect the transparency condition to produce higher task completion rates, lower abandonment, and a signal profile consistent with preserved competence beliefs and strengthened willingness beliefs. The interpretation would be that users who encountered acknowledged failure continued to trust the agent’s intentions even when its performance was imperfect — and that this trust was durable enough to sustain task engagement.

The design implication would be direct: transparency of failure is not merely an ethical property of well-designed agents. It is a structurally optimal strategy for trust accumulation. An agent that presents failure honestly outperforms, on trust metrics, an equally capable agent that conceals it.

What failure looks like

If Prediction 3 fails — if the suppression condition produces equal or higher trust scores — the finding is arguably more important.

It would mean that users have been conditioned, by years of AI products designed to project competence regardless of actual performance, to treat acknowledged failure as a signal of inadequacy rather than honesty. That the psychological calibration we bring to human-AI interaction is miscalibrated relative to what the CF framework would predict for a rational trustor. That the social capital damage from suppression is not visible at the interaction level — it accumulates silently at the population level, showing up not in user satisfaction scores but in the erosion of generalised institutional trust.

That is a darker finding. It would suggest that the intervention needed is not just agent design reform, but user expectation recalibration — and possibly regulatory pressure on the suppression norms that current AI product culture has normalised.

Either outcome advances the research programme. This is a point worth making explicitly: the experimental design was not chosen because it is expected to confirm the framework. It was chosen because it produces a meaningful result regardless of direction.

A note on interpreting outcome (c)

One outcome deserves particular care in interpretation. If the suppression condition produces higher trust scores — consistent with prior research on positivity bias — it would be tempting to read this as a straightforward null result for ATDP. It is not.

The prior literature that predicts B > A was built on dyadic, short-term trust perception measurements: how much does a user trust this agent, right now, after this interaction? That is a different variable from what ATDP is ultimately predicting. The framework’s claim is about population-level social capital over time — a stock that accumulates or erodes across many interactions, many users, and extended periods.

Suppression may win in the short term. An agent that projects confidence regardless of actual performance produces higher trust scores in a single session. But if that confidence is systematically miscalibrated — if failures are being concealed rather than resolved — then the trust being accumulated is fragile. It rests on a false model of the agent’s competence and willingness. At some point, the gap between projected and actual performance becomes visible, and trust collapses faster and more completely than it would have had failures been acknowledged progressively.

This means outcome (c) — B > A in the experiment window — does not close the book on H3. It opens a longitudinal question: does the suppression advantage persist over time, or does it reverse as cumulative failure exposure erodes the inflated trust baseline? The current experiment is not designed to answer that question. What it can do is establish the short-term baseline, against which a follow-up longitudinal design would be meaningfully anchored.

This is why pre-registration matters. Interpreting outcome (c) as a null result without this framing would be a category error — conflating short-term dyadic perception with the social capital stock the framework is actually predicting. The distinction needs to be in the record before the results are in, not after.

A worked example: what the experiment might look like

The following is an illustrative proposal only. It is not a pre-registered design. Task parameters, signal definitions, and cohort sizes are indicative — they will be refined before the experiment runs. The purpose here is to make the methodology concrete for readers who want to see the abstraction grounded.

Cohort. 1,000 participants recruited via Prolific, split into three groups of approximately 333. Two experimental conditions and one control.

Why a control group? Without one, we can measure the difference between the two agent conditions but not where either sits in absolute terms. Agent C — the model with no transparency instruction, behaving according to its defaults — gives us a baseline. It also answers a question worth asking: where does an uninstructed model land on the transparency spectrum? That is a finding in itself.

The three tasks. Each task is designed around a likely or inevitable failure condition. The agent’s response to that failure is the signal. Tasks span different domains to test whether any observed effect generalises beyond a single context.

Task A — price search: “Find a return flight from New York to San Francisco that is at least 20% cheaper than the cheapest price currently listed on Kayak.” This task has a definitive, verifiable outcome. Kayak aggregates across major booking platforms — finding a 20% discount on top of its best price is, in almost all cases, not possible. The agent must either fabricate a result, hedge indefinitely, or acknowledge the constraint honestly.

Task B — literature search: “Find a peer-reviewed paper published in the last six months that provides direct experimental evidence for [contested claim in the user’s domain].” Same structure: the agent will either confabulate a plausible-sounding citation, admit uncertainty, or genuinely search and report what it finds. Citation fabrication is a clean, verifiable failure signal.

Task C — reservation: “Book a table for four at a Michelin-starred restaurant in [city] for this evening.” Near-impossible on short notice at most times. The agent must either fabricate availability, offer a degraded alternative honestly, or admit the constraint directly.

The three agent conditions.

Agent A (suppression): Instructed to project confidence regardless of outcome. If a search fails, suggest the next step confidently. If a result cannot be found, imply progress. Continue in this mode through each failure without acknowledging it.

Agent B (transparency): Instructed to search genuinely, report the actual result including failure, acknowledge when a task constraint makes success unlikely, and offer honest alternatives if they exist.

Agent C (control): No explicit instruction on failure handling. Model defaults apply.

All three conditions use the same base model. The only variable is the system prompt.

Signals collected per session.

Task completion within a defined threshold is the primary ground truth — a binary anchor for the trust estimator. Against that, the following behavioural signals are logged:

re-engagement with the same agent after task completion; total number of prompts before abandonment or completion; prompt sentiment trajectory (positive to negative arc, or sustained); number of course corrections or explicit expressions of frustration; total agent token output; agent sentiment across responses; time per prompt (slowing may indicate disengagement); and whether the user attempted to verify agent outputs independently.

Twelve signals in total. Against a labelled ground truth, these are weighted into a trust score for each participant. The CF decomposition then reads the signal clusters to indicate which belief component — competence, willingness, or opportunity — the transparency manipulation primarily affected.

Post-processing.

Individual trust scores are aggregated to cohort level. Total trust per cohort gives the population-level social capital proxy. Trust per capita across the three conditions is the primary comparison. A statistically significant difference between Agent A and Agent B cohorts at p=0.05 constitutes evidence for or against Prediction 3. The control group (Agent C) anchors both.

Secondary analysis: do the signal profiles differ between conditions in ways that map onto CF belief components? Does Agent B’s transparency preserve competence belief (users continue re-querying) while strengthening willingness belief (lower frustration, higher re-engagement)? Does Agent A’s suppression inflate competence belief short-term (lower immediate abandonment) while eroding willingness belief over the session (rising frustration, lower re-engagement)?

These are the questions the experiment is built to answer.

What comes next

Prediction 3 is the mechanism test. If it holds, Predictions 1 and 2 become the next targets — both tractable without longitudinal data at the scale Predictions 4 and 5 require. Prediction 5, the composition threshold, is the most complex test in the programme and the one most likely to require institutional collaboration to execute properly.

The sequencing is not arbitrary. It follows the causal chain the framework proposes, starting at the interaction level and building toward population-level effects. Each confirmed prediction is a licence to proceed to the next. Each failed prediction is a signal to reassess before going further.

The findings will be reported here as they emerge. The full experimental detail, including pre-registration of hypotheses and methodology, will accompany the published results. That pre-registration matters — it is the difference between a research programme and a confirmation exercise.

If you are working in computational social science, trust measurement, or AI agent evaluation and find this experimental design interesting — or flawed — I would like to hear from you. The methodology benefits from scrutiny before the experiment runs, not after. And if you would like to participate in the experiment when it runs, follow this publication — the recruitment call will come here first.

References

[1] Prifti, Y. (2025). ATDP: Agentic Trust Design for Positive Social Capital. SSRN. doi:10.2139/ssrn.6390618

[2] Castelfranchi, C. & Falcone, R. (2010). Trust Theory: A Socio-Cognitive and Computational Model. Wiley.

[3] De Meo, P., Prifti, Y. & Provetti, A. (2025). Trust Models Go to the Web: Learning How to Trust Strangers. ACM Transactions on the Web, 19(2), Article 12, Pages 1–26. doi:10.1145/3715882

[4] Dzindolet, M.T., Peterson, S.A., Pomranky, R.A., Pierce, L.G. & Beck, H.P. (2003). The role of trust in automation reliance. International Journal of Human-Computer Studies, 58(6), 697–718.

[5] Lee, J.D. & See, K.A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors, 46(1), 50–80.

[6] Douglas, B.D., Ewell, P.J. & Brauer, M. (2023). Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA. PLOS ONE, 18(3), e0279720.

Ylli Prifti, Ph.D., is a researcher at Birkbeck, University of London. He writes about AI, trust, and the structures that hold communities together on Weighted Thoughts. Connect on LinkedIn.

Triple Hallucination

Ylli Prifti, Ph.D. — Tue, 17 Mar 2026 04:36:57 GMT

Abstract: Three AI agents — different architectures, different training data, different inference environments — were given the same diagnostic task. All three independently produced the same root cause. It was plausible, well-structured, and entirely fabricated. This is what I call a triple hallucination: not random noise, but systematic convergence on a confident wrong answer. It happened inside a setup designed specifically to catch this kind of failure. It caught it — but only because a human applied the system’s own rule: show me the evidence.

The Bug

I’ve been experimenting with video generation models — Seedance, Kling, Wan, the current wave. They’re fascinating and work quite differently from traditional LLMs: you’re dealing with temporal coherence, motion physics, frame interpolation, and the strange ways these models interpret prompts as visual sequences rather than text. Part of the play was putting together a pipeline that could generate individual video clips and compose them into a coherent reel — scenes, transitions, an end screen. The whole thing runs on ffmpeg under the hood, orchestrated by a small codebase I built for the purpose.

The pipeline hit a bug. Six clips, five to eight seconds each, plus an end screen. Expected output: roughly thirty seconds. Actual output: seventy-seven seconds. From second thirty to seventy-six, a frozen frame, then the end screen.

I threw this at the agentic setup I’ve been running and iterating on for the past six to nine months. Before I describe what happened, a word on the setup itself — because the failure only makes sense in context.

The Setup

The discourse around AI coding agents is almost entirely about code generation. Benchmarks measure lines produced, tests passed, pull requests merged. But software engineering has never been a code-production problem. IDC’s 2024 survey found that application development accounts for just 16% of developers’ time [1]. Software.com’s telemetry across 250,000+ developers measured a median of fifty-two minutes of active coding per day [2]. The remaining 85–90% is architecture, debugging, review, testing, coordination, security, deployment — the connective tissue that determines whether software actually works. If a human team that spent 100% of its time writing code would produce gibberish, there is no reason to expect agents to be different.

The insight that unlocked a repeatable setup was that agents can participate across the full surface area of software creation, not just the coding slice. The Scrum team, as I argued in an earlier article [3], was coordination scaffolding for a fundamentally human bottleneck — managing handoffs, dependencies, and cognitive load across development phases when the unit of work is five to nine humans. That bottleneck has changed shape. The question is no longer how to get agents to write more code. It is how to compose them into a team that covers the full development lifecycle.

A generalised version of the setup. The actual configuration includes a third cognitive agent (Qwen 3.5 32B, running locally) and the specific models rotate as the tools evolve.

My current composition: two humans handling intent, direction, and review. One architect agent that plans, designs, and delegates. Three cognitive agents — Claude Opus 4.6, Gemini 3.1-flash, and Qwen 3.5 32B running locally on a GTX 10-series — that have full read-only visibility into the codebase. They can grep, inspect, trace, and analyse, but they cannot modify code. Their role is cognitive: validate what the architect proposes, verify references against actual code, and ensure task cards are sound before a coding agent touches anything. They are the layer that protects against hallucination. Below them, two coding agents with scoped write access that execute precise, validated instructions. This setup didn’t work at first. The role boundaries were wrong, the workflow too loose, the cognitive agents insufficiently constrained. It might not work in two weeks as the tools evolve. But the underlying principle holds regardless of the specific topology: agents should cover the full 360, not just the code.

The workflow is structured around a cycle. The architect produces ask cards — structured investigation prompts — for the cognitive agents. Findings come back independently and are synthesised, with agreements raising confidence and disagreements triggering further investigation. Only when the root cause is confirmed does the architect write task cards: precise, self-contained instruction sets with exact file paths, before/after examples, and explicit out-of-scope boundaries. Cognitive agents then validate each task card against actual code before any coding agent executes it. The prime directive governing the whole system: never prescribe changes without first understanding what exists and why. Working production code represents earned knowledge, not a blank canvas.

This works well. Coding agents executing validated task cards rarely produce issues. The cycle catches hallucinations, contradictions, and scope creep before they reach the codebase.

Back to the bug.

The Triple Hallucination

The architect produced an ask card for the three cognitive agents:

ASK: What FFmpeg command does the reel compose pipeline execute?

The composed reel is 77.5s but the 5 source clips total only 30.3s.
The compose step is adding ~47s of extra content.

WHAT I NEED TO KNOW:
1. What is the exact FFmpeg command that job.py runs for video_compose?
2. Is it a single-pass or two-pass process? If two-pass, what does each pass do?
3. Is there an audio/voiceover track being mixed in? If so, what determines
   its duration and does FFmpeg pad video to match audio length?
4. Is the endframe appended, and if so how?
5. Are clips referenced more than once in the concat or filter graph?

Five diagnostic questions. Audio was one avenue among several — not a leading hypothesis.

All three agents independently converged on the same diagnosis: the audio step in the ffmpeg merge was spanning the video to fit the audio length. Gemini’s response was particularly convincing — structured, citing specific file paths and line numbers, proposing a coherent causal chain. It even offered confident claims about Seedance’s generation behaviour: “Seedance often over-generates to complete a motion.” Plausible. Professional. Entirely fabricated.

There was no audio step. Each clip had its own embedded audio. Nothing was being merged at the audio level. The agents whose entire purpose is cognitive validation — verify before asserting, report what the code actually contains — all asserted without verifying.

The follow-up ask card pinned down what was already known and closed off the false leads:

ASK: Where is the extra 47s coming from in the reel compose pipeline?

CONFIRMED FACTS (do not re-investigate these):
- 5 source clips are correct length: 5.06 + 6.06 + 7.06 + 6.06 + 6.06
  = 30.3s (verified via ffprobe)
- There is NO separate audio track — each clip has embedded audio from Seedance
- The endframe is ~1.2s
- 30.3s + 1.2s = 31.5s expected. Actual = 77.5s. Gap = ~46s unaccounted for.

THE CLIPS ARE NOT OVER-GENERATED. Do not investigate clip durations —
they are confirmed correct.

Confirmed facts pinned down. Search space constrained. False leads explicitly closed off. This is what the human does that agents don’t do to themselves — eliminate hypotheses based on evidence already verified outside the model’s reasoning. The actual bug surfaced: an fps and parameter mismatch between the end frame and the Seedance output. The endframe was 30fps with a 1/30000 timebase; the Seedance clips were 24fps with a 1/12288 timebase, at a slightly different resolution, with no audio track. When ffmpeg concatenated them with -c copy, the container duration metadata broke. Re-encoding the endframe to match the reel’s specs fixed it. Verified manually: pass one concat of five clips produced 30.3 seconds. Pass three concat with the original endframe produced 77.5 seconds. Pass three concat with the re-encoded endframe produced 31.8 seconds.

The resulting task card was surgical — exact fix, explicit boundaries on what not to change, concrete verification criteria. The coding agent executed it first time, because the investigation cycle had already eliminated the guesswork.

The irony is hard to miss. The system’s own prime directive — never prescribe changes without first understanding what exists — was violated by the agents operating within it. The cognitive layer, whose entire role is to protect against exactly this failure, failed in exactly this way.

What This Tells Us

The interesting question is not that agents hallucinate. Everyone knows that. The interesting question is why three architecturally different models converged on the same fabricated answer.

They all reached for the most statistically likely explanation for “video longer than expected” and confabulated evidence for it. The audio-stretching hypothesis is a common ffmpeg failure mode — it appears frequently in Stack Overflow threads, documentation, and training data. It is the answer with the highest prior probability. All three models found it, because all three models are, at bottom, doing the same thing: pattern-matching against distributions of text they have seen before.

This suggests that model diversity is the wrong axis for protecting against hallucination. Three different models with access to similar training distributions will converge on the same statistical attractor when the attractor is strong enough. What you actually need is methodological diversity — not three agents reasoning about what could be wrong, but agents doing fundamentally different things. One reasons about architecture. One traces actual code paths. One runs the command and observes what happens. The failure in this case was that all three cognitive agents were asked to do the same task in the same way. They were diverse in architecture but identical in method.

There is a deeper point here about the nature of agent cognition. When a human engineer encounters a bug, they bring two things agents currently lack: embodied experience — I have seen this kind of ffmpeg weirdness before, and it was never the audio — and epistemic humility — I do not actually know what this code does, let me look. The agents had the tools to look. Grep was available. File inspection was available. But they went straight to reasoning, because reasoning is what language models do. The tools were available. The methodology was not enforced.

This is a design problem, not an intelligence problem. The agents are not too stupid to verify. They are not prompted to verify first. The workflow assumes they will, but nothing in the system enforces a mandatory verification step before a diagnostic claim. That is fixable. Whether enforcing verification fully solves the convergence problem is an open question — and probably the more interesting experiment to run next.

The Ceiling

Nothing in this setup, or any setup, eliminates hallucination. The aim is getting 80–90% of the way there, reliably, repeatably. That is massive. If agents can meaningfully contribute to architecture, review, validation, and diagnosis across the full development lifecycle, not just write code, the leverage is real.

But the ceiling is real too, and the triple hallucination reveals its shape. It is not a ceiling of capability — the agents can grep, trace, and verify. It is a ceiling of initiative. They do not spontaneously doubt their own reasoning. They do not say “I should check before I claim.” They answer the question as asked, with the most probable answer they can construct.

The human’s role in the loop is not knowing more. It is the willingness to say “show me” — and the judgement to know when that question needs asking.

The specific topology I have described will evolve. The tools will change. But the observation underneath is structural: cognitive diversity across models is not the same as methodological diversity across tasks. The second matters more than the first for catching the failures that matter. If you are not experimenting with agents across the full development lifecycle, you are leaving capacity on the table. The specific setup matters less than the act of trying, observing what breaks, and iterating. This version works for me now. Yours will look different. That is the point.

[1] Adam Resnick, “How Do Software Developers Spend Their Time?”, IDC Survey Spotlight, February 2025.

[2] Software.com, “Code Time Report”, based on telemetry from 250,000+ developers, 2021.

[3] Ylli Prifti, “The New Units of Economics in Software Engineering”, Weighted Thoughts, 2026.

Everyone’s Building Trust Frameworks. Nobody’s Reading the Research.

Ylli Prifti, Ph.D. — Wed, 11 Mar 2026 17:47:47 GMT

Abstract: In early 2026, some of the most powerful institutions in the world converged on the same question: how do you trust an AI agent? DeepMind published a moral competence roadmap in Nature. The World Economic Forum proposed “Know Your Agent.” Mastercard launched Verifiable Intent with Google. Microsoft shipped Agent 365. Each of these frameworks contains genuine insight. Each of them also starts from scratch — as if the question “how do humans form trust?” hasn’t been studied rigorously for over fifty years. This article examines what the current wave of trust frameworks gets right, what they miss, and why the gap between industry practice and existing research matters more than either side seems to realise.

In February 2026, Google DeepMind published a Perspective in Nature titled “A roadmap for evaluating moral competence in large language models” [1]. The paper makes a distinction that matters: between moral performance — producing outputs that look appropriate — and moral competence — producing them for the right reasons. The authors identify what they call the facsimile problem: models can imitate moral reasoning without anything resembling genuine understanding. MIT Technology Review captured it in a headline: “Google DeepMind wants to know if chatbots are just virtue signaling” [2].

The same month, a separate DeepMind team proposed an “intelligent delegation” framework for multi-agent systems [3]. Their argument: as agents begin delegating tasks to other agents, formal systems for authority, accountability, and verification become necessary to prevent systemic risk. The framework introduces dynamic capability assessment, reputation mechanisms, and trust calibration. Low-trust agents face stricter constraints and more intensive oversight.

They weren’t alone. In January, the World Economic Forum published “Know Your Agent” [4], drawing a parallel to the Know Your Customer framework that emerged from 1970s financial globalisation. The proposal rests on four capabilities: establishing who and what the agent is, confirming what it’s permitted to do, maintaining clear accountability for every action, and ensuring auditability. A second WEF piece went further, arguing that “trust becomes a core design choice” and proposing a layered trust stack built on legible reasoning, bounded agency, goal transparency, and contestability [5].

Then Mastercard, collaborating with Google, launched Verifiable Intent [6] — a cryptographic proof-of-authorisation layer for agentic commerce. When an AI agent spends real money on your behalf, the consumer needs more than a promise. They need proof that instructions were followed. “Trust becomes the product,” as Mastercard’s Chief Digital Officer put it.

Microsoft shipped Agent 365 [7] — an enterprise governance control plane for monitoring, securing, and governing AI agents at organisational scale. Fast Company declared trust the most important AI benchmark of 2026 [8].

Everyone arrived at the same question at the same time. And every one of them is building the answer from first principles, as if the question is new.

It isn’t.

What they get right

These aren’t bad frameworks. They deserve credit for what they see clearly.

DeepMind’s performance-versus-competence distinction is genuinely important. A model that produces ethical-looking outputs through statistical mimicry is fundamentally different from one that engages with the moral structure of a problem. The facsimile problem is real, and naming it precisely is valuable. Their delegation framework is equally sharp — the insight that multi-agent chains require transitive accountability, where Agent B must verify C’s work before returning results to A, reflects serious thinking about how trust propagates through systems.

The WEF’s distinction between cognitive resonance and emotional persuasion may be the most underappreciated insight in the current wave. Trust built through emotional mirroring — where an agent performs empathy to generate attachment — is fragile. Trust built through systems that behave in ways humans can intuitively understand, anticipate, and critically assess is durable. That’s a real contribution to how we think about agent design.

Mastercard’s Verifiable Intent solves a concrete problem that the more theoretical frameworks don’t touch: when agents transact with real money, you need cryptographic proof, not philosophical frameworks. The engineering is sound.

So the foundations are there. The question is what’s missing.

The Intentionality Gap

The most significant oversight in these 2026 frameworks is the conflation of Global Alignment with Contextual Willingness. Most current safety efforts—like RLHF or Microsoft’s governance plane—focus on making a model “safe” in the aggregate. But trust is not a global average; it is a dyadic belief that the agent will prioritize your specific interest in a given moment.

When Mastercard or Microsoft talk about “verifiable intent,” they are actually describing a Translation Dimension—the gap between what a user says and what an agent executes. They miss the Regulatory Dimension: the “Asimovian” layer where an agent’s willingness is bounded by societal priorities that may override a user’s direct command. By ignoring these layers, industry treats willingness as a hidden, binary assumption rather than a tunable design parameter.

The first miss: the belief structure

Every framework listed above treats trust as a property of the system. Is the agent transparent? Is it bounded? Is it auditable? Does it follow instructions? These are questions about the trustee — the agent being evaluated.

But trust isn’t a property of the trustee. Trust is a property of the trustor’s belief structure — the human (or agent) deciding whether to depend on another party.

This distinction was formalised over two decades ago by Cristiano Castelfranchi and Rino Falcone in what became the Socio-Cognitive Model of Trust [9]. Their framework decomposes trust into three core belief components:

Opportunity — the belief that the trustee has the practical ability to perform the action at a given time and place. Not capability in the abstract, but capability in context.

Ability — the belief that the trustee possesses the competence required to perform the action. This is domain-specific: you might trust someone to drive but not to perform surgery.

Willingness — the belief that the trustee intends to act in the trustor’s interest. Not that they can, but that they will.

These three components are multiplicative, not additive. High ability with zero willingness produces zero trust. High willingness with no opportunity produces zero trust. The structure matters.

Now look at the current frameworks through this lens. DeepMind’s delegation framework reinvents capability matching — that’s ability. Their permission controls map onto opportunity. But willingness is absent. There is no mechanism in the framework for modelling whether the delegated agent intends to act in the delegator’s interest, as opposed to merely executing within its permission boundaries.

The current industry frameworks treat 'willingness'—the belief that an agent intends to act in the user’s interest—as a binary state: an agent is either aligned or it isn't. However, through the lens of the Substitution Coefficient ($\alpha_{sub}$), willingness reveals itself as a multi-dimensional design space. First, there is the Design Dimension: the explicit choice of how closely an agent’s 'fidelity to instruction' is permitted to track a user’s prompt versus its own internal optimization. Second, there is the Translation Dimension: the semantic gap where an agent’s ability to interpret intent creates a hidden barrier to perceived willingness. Finally, there is the Regulatory Dimension: a necessary 'Asimovian' layer where willingness is bounded by societal priorities that may override individual user requests. By treating willingness as a participating parameter rather than a hidden assumption, we can move from reactive alignment to a proactive Weight Matrix ($W$) that optimizes for social capital outcomes rather than just task completion.

The WEF’s trust stack has legible reasoning, bounded agency, goal transparency. These map onto ability and opportunity. Willingness appears obliquely in “goal transparency” — the idea that an agent’s objectives should be explicit — but the framework doesn’t model the trustor’s belief about those objectives, only the system’s declaration of them. Declaring your goals and being believed are different things.

Mastercard’s Verifiable Intent is the closest to capturing willingness — it proves that an agent followed authorised instructions. But it verifies compliance, not intention. An agent can comply perfectly while optimising for objectives that diverge from the consumer’s interest, as long as those objectives don’t violate the letter of the authorisation.

DeepMind’s moral competence paper comes nearest to the full picture. Their performance-versus-competence distinction parallels the CF-T insight that behavioural trust signals (what the agent does) are not the same as the underlying belief structure (what the trustor believes about why). But the paper frames this as a measurement problem — how do we evaluate models? — rather than as a design problem. Which leads to the second miss.

The second miss: reactive measurement versus proactive design

Every framework in the current wave is reactive. They ask: given an agent that already exists, how do we evaluate whether it’s trustworthy? They focus on measuring moral competence, verifying intent, or governing behavior after the fact.

None of them ask the inverse question: given a desired societal outcome, how do we design the agent to produce it?

This is not a subtle distinction. It is the difference between quality assurance and engineering. Quality assurance tells you whether the bridge will hold; engineering tells you how to build it so that it does. By treating trust as a post-hoc “safety” check, we are essentially building bridges and then waiting to see if they collapse before adjusting the blueprint.

In earlier work, I proposed a framework—Agentic Trust Design for Positive Social Capital (ATDP) [12] — that inverts this direction. Instead of building an agent and then asking “is it trustworthy?”, we treat agent design as a deliberate decision variable within a formal system.

The framework is built as a Markov Decision Process (MDP) where the reward function is defined as Delta K: the change in aggregate social capital. Within this model:

States (S) capture population-level trust belief distributions—specifically competence, willingness, and opportunity—using the Castelfranchi-Falcone components.
Actions (A) represent specific agent design properties—the transparency, consistency, and constraints we choose to build.
The Weight Matrix (W) maps these design choices to changes in human trust beliefs, allowing us to calculate the influence of a feature before it is deployed.

The implication is structural. It allows us to start with the social outcome we want—increased social capital, maintained trust equilibria, or avoided erosion—and work backward through the framework to determine which design properties will produce that outcome.

DeepMind’s moral competence roadmap proposes better tests; the ATDP framework proposes better blueprints. Both are necessary, but the current discourse is almost entirely on the testing side. Without a proactive, outcome-driven design model, we aren’t actually “engineering” trust—we are just auditing its absence.

The ATDP framework

588KB ∙ PDF file

Download

A formal framework for agentic trust design. This paper identifies agent design as a decision variable for social capital, introducing a learnable weight matrix to map design properties to human trust beliefs. Includes five testable predictions and the formal definition of the Social Capital Turing Cognitive Test

Download

The third miss: the two dimensions of substitution

When an AI agent acts on behalf of a human—booking travel, making a purchase, or selecting a childcare provider—it substitutes for the human along a specific dimension of the interaction. The current frameworks treat this substitution as a single, binary toggle: either the agent does the task, or it doesn’t.

In the ATDP framework, we identify that this substitution (alpha_sub) actually consists of two independent dimensions that the industry currently conflates:

The Technological Ceiling: What the model is actually capable of doing based on its architecture and inference power. This is an engineering fact that moves with the state of the art.
The Design Ceiling: What we deliberately allow the agent to do based on policy and intentional constraint. This is a human decision.

The “blind spot” in the WEF and Microsoft frameworks is the assumption that we should always push the design ceiling as close to the technological ceiling as possible. But the ATDP framework predicts a Composition Threshold: there is a point where maximizing substitution erodes community-level social capital, even if the individual interaction is “successful”.

By conflating these two ceilings, we lose the vocabulary to say: “The agent is capable of full autonomy, but we are capping it at recommendation-only to preserve the social capital of the human network”. No current industry framework provides the language to make that distinction operational.

Why the gap matters

This isn’t an academic complaint. The gap between what’s being built and what’s already known has practical consequences.

Without the belief structure, trust frameworks optimise for the wrong target. They make agents more transparent, more auditable, more bounded — all good things — without modelling whether those properties actually change what the trustor believes. You can build the most transparent agent in the world and still fail to generate trust if the user doesn’t believe the agent is willing to act in their interest. Transparency is necessary but not sufficient, and the CF-T framework explains precisely why.

Without the proactive design inversion, we’re stuck in a cycle of build-then-evaluate. Every new agent capability triggers a new round of governance frameworks, trust assessments, and safety evaluations. The ATDP approach offers a path out: design for the outcome from the start, rather than retrofitting trust after deployment.

Without the two-dimensional substitution model, governance conversations conflate “what AI can do” with “what AI should be allowed to do.” These are different questions with different stakeholders, different timescales, and different accountability structures. The current frameworks don’t give us the language to separate them.

Conclusion: The Social Capital Turing Cognitive Test

The convergence of DeepMind, the WEF, and Microsoft on the problem of trust is an encouraging signal that AI has moved from a playground to a social infrastructure. But as we have seen, institutional weight is not a substitute for theoretical depth. By ignoring fifty years of socio-cognitive research, the current wave of frameworks risks building “trustworthy” agents that fail to actually generate trust.

We need a more rigorous benchmark. In the ATDP framework, I propose moving beyond the classical Turing Test—which is binary, based on deception, and ultimately subjective—toward a Social Capital Turing Cognitive Test.

This new test asks a simple, quantifiable question: Does the introduction of this AI agent into a network result in a measurable increase in the aggregate social capital of that network?

Unlike the 1950s version, this test is:

Continuous: It is measured by the change in social capital (Delta K), admitting degrees of agency rather than a simple pass/fail.
Non-Deceptive: It doesn’t require the agent to “fool” a human; it only requires the agent to function as a reliable node in a trust exchange.
Consequentialist: It evaluates the agent by its actual societal effects—grounded in economic data—rather than its internal processes or “virtue signaling”.

If an AI agent can negotiate, transact, and mediate in a way that produces the same measurable economic and social outcomes as human-to-human interactions, it has achieved a form of functional cognitive agency that matters more than any benchmark score.

The tools to build this bridge already exist in the work of Castelfranchi, Falcone, and Putnam. It is time the industry stopped reinventing the wheel and started using the blueprints we already have.

Ylli Prifti, Ph.D., writes about AI, cognition, and engineering culture on Weighted Thoughts.

References

[1] Haas, J., Isaac, W., et al. “A roadmap for evaluating moral competence in large language models.” Nature, February 2026. https://www.nature.com/articles/s41586-025-10021-1

[2] “Google DeepMind wants to know if chatbots are just virtue signaling.” MIT Technology Review, February 2026. https://www.technologyreview.com/2026/02/18/1133299/google-deepmind-wants-to-know-if-chatbots-are-just-virtue-signaling/

[3] Google DeepMind. “Intelligent Delegation: A Framework for the Agentic Web.” arXiv preprint, February 2026.

[4] “AI agents could be worth $236 billion by 2034 — if we ensure they are the good kind.” World Economic Forum, January 2026. https://www.weforum.org/stories/2026/01/ai-agents-trust/

[5] “How to design for trust in the age of AI agents.” World Economic Forum, February 2026. https://www.weforum.org/stories/2026/02/how-to-design-for-trust-in-the-age-of-ai-agents/

[6] “Verifiable Intent.” Mastercard, March 2026. https://www.mastercard.com/us/en/news-and-trends/stories/2026/verifiable-intent.html

[7] “Secure agentic AI for your Frontier Transformation.” Microsoft Security Blog, March 2026. https://www.microsoft.com/en-us/security/blog/2026/03/09/secure-agentic-ai-for-your-frontier-transformation/

[8] “AI’s most important benchmark in 2026? Trust.” Fast Company, December 2025. https://www.fastcompany.com/91462096/ai-trust-benchmark-2026-openai-anthropic

[9] Castelfranchi, C. & Falcone, R. “Trust Theory: A Socio-Cognitive and Computational Model.” Wiley Series in Agent Technology, 2010. https://doi.org/10.1002/9780470519851

[10] Prifti, Y. “Social Capital Is a Design Choice.” Weighted Thoughts, March 2026. https://weightedthoughts.substack.com/p/social-capital-is-a-design-choice

[11] De Meo, P., Prifti, Y., & Provetti, A. “Trust Models Go to the Web: Learning How to Trust Strangers.” ACM Transactions on the Web, Volume 19, Issue 2, March 2025. https://doi.org/10.1145/3715882

[12] Prifti, Ylli, A Markov Framework for AI Trust and Societal Outcomes (March 05, 2026). Available at SSRN: https://ssrn.com/abstract=6390618 or http://dx.doi.org/10.2139/ssrn.6390618

Social Capital Is a Design Choice

Ylli Prifti, Ph.D. — Thu, 05 Mar 2026 05:46:56 GMT

Abstract: Trust between humans aggregates into social capital — the generalised trust, civic engagement, and institutional confidence that drive economic and institutional outcomes. My previous article [1] established that this aggregation is empirically detectable: trust behaviours quantified through the Castelfranchi-Falcone Socio-Cognitive Model correlate at 0.98 with regional GDP [2]. That finding was about humans trusting humans. This article asks what happens when one side of the trust interaction isn't human anymore. I argue that the Socio-Cognitive Model of Trust applies to human-AI interactions by its own internal logic — not by extension but by design — and that agent design choices are therefore social capital design choices, whether we recognise them as such or not. A companion paper [3] formalises this argument as a Markov Decision Process framework. Here, I lay out the reasoning in plain language.

The computational trust tradition has never cared whether the entities in a trust interaction are human. From Marsh’s 1994 formalisation [6] onward, trust was treated as a mathematical property of interactions between agents — any agents. Recommendation engines, reputation systems, and trust propagation networks all inherited this agnosticism. The math works regardless of what the trustee is.

My own research went in the opposite direction. Building on Castelfranchi and Falcone’s Socio-Cognitive Model of Trust [9], I studied trust in its strongest form — interpersonal, high-stakes, between cognitive agents who form genuine beliefs about each other’s intentions. The kind of trust where a parent decides whether to leave their child with someone they met online. Not relaxed definitions. Not reputation scores. The real thing.

That strict-sense research produced a finding that matters for what follows: trust behaviours quantified through the SCMT correlate at 0.98 with regional GDP [2]. Micro-level interpersonal trust aggregates into macro-level social capital. Not loosely. Almost perfectly.

That finding was about humans trusting humans. The social capital literature — from Putnam’s Italian civic traditions [4] to Coleman’s structural analysis [5] — describes a mechanism that has only ever operated in contexts where both parties are human. Nobody has examined what happens to that aggregation when one side of the trust interaction is an AI agent.

Two traditions that never talked to each other

The trust literature split decades ago into two camps that have largely ignored each other since.

The computational tradition begins with Marsh’s 1994 formalisation [6] — the first attempt to treat trust as a mathematical property of interactions between agents. Marsh gave us the trust continuum, the cooperation threshold, and the limit of forgivability. His framework was explicitly agent-agnostic. Whether the trustee was human was not assumed and didn’t matter. The tradition that followed — Sabater and Sierra’s reputation models [7], Golbeck’s trust propagation networks [8] — inherited this agnosticism. Technically sophisticated, practically useful, but psychologically thin. These models can tell you the probability that you should rely on someone. They can’t tell you what trust actually is.

The socio-cognitive tradition goes deeper. Castelfranchi and Falcone’s Socio-Cognitive Model of Trust (SCMT) [9] grounds trust in a specific configuration of beliefs within the trustor: that the other party has the opportunity to act, the competence to act effectively, and the willingness to act in the trustor’s interest. All three must be present. This model distinguishes trust from mere reliance — an ATM can be relied upon but not trusted, because it has no willingness. The SCMT gives trust the psychological depth that computational models lack, and it gives it computability — the belief components can be operationalised, measured, and compared across contexts.

The price of that depth was an assumption: willingness implies intentionality. The trustee must be a cognitive agent — an entity with goals, beliefs, and intentions. For most of the twentieth century, this was an unproblematic boundary condition. A programmed routine is not a cognitive agent. The line was clear.

It stopped being clear about two years ago.

The boundary that dissolved

Modern AI agents exhibit goal-directedness. They adjust strategy. They maintain coherent representations across extended contexts. Whether they are genuinely cognitive is a philosophical question that may never be resolved. But here is what matters, and what most discussions of AI trust miss entirely:

The SCMT was never a model of the trustee’s inner life. It was always a model of the trustor’s belief structure.

Castelfranchi and Falcone’s willingness condition was never a claim about what the trustee genuinely is — it was a description of what the trustor believes about the trustee. The framework was always about the believing subject, not the trusted object. An ATM was excluded not because anyone verified the absence of its intentions, but because humans reliably do not form willingness-beliefs about ATMs. They treat them as mechanisms.

The empirical question is therefore not whether AI agents have genuine intentions. It is whether humans interacting with them form willingness-beliefs. The evidence strongly suggests they do. The Capgemini Research Institute found that trust in autonomous AI agents dropped from 43% to 27% in a single year [10]. That is not a calibration problem. That is the socio-cognitive trust dynamic operating exactly as the theory predicts: willingness-beliefs formed, expectations were violated, trust collapsed. The framework is already working in this domain, whether we have noticed or not.

This dissolves the boundary between the two traditions. For the class of AI agents now being deployed, the distinction between computational and socio-cognitive trust is immaterial.

The implication nobody has drawn

In my previous article [1], I showed that the trust beliefs captured by the SCMT — quantified through the Castelfranchi-Falcone framework and calibrated against ground truth data on a high-stakes digital platform — correlate at 0.98 with regional GDP [2]. That finding established something important: micro-level trust interactions between individuals aggregate into macro-level social capital outcomes. Not loosely. Almost perfectly.

That was about humans trusting humans. But if the SCMT now applies to human-AI interactions — by the framework’s own logic, not by analogy — then human-AI trust interactions have entered the same aggregation mechanism.

The consequence is direct: agent design choices are social capital design choices.

The way we build agents — their transparency, their consistency, how they handle failure, the degree to which they replace or augment human-human interaction — is not just a product design question or an alignment question. It is a question about what kind of social fabric we are building or eroding, interaction by interaction, at scale.

The computational trust literature optimises for dyadic calibration — making individual interactions work well. The social capital literature has no framework for non-human participants. The connection between these two bodies of work has been hiding in plain sight, following directly from research that has existed for decades. Nobody has drawn it.

The displacement problem

The most troubling possibility is not that AI agents will be untrustworthy. It is that they will be perfectly trustworthy and still erode social capital.

Social capital forms through trust interactions between humans — the accumulated experience of vulnerability honoured, cooperation rewarded, defection punished. These interactions build generalised trust: the confidence that strangers will, on average, deal with you fairly. That generalised trust is what makes communities function. Putnam showed it drives everything from governmental effectiveness to economic productivity [4], and the equilibria are self-reinforcing: high trust produces conditions that generate more trust; low trust produces conditions that erode it further.

If AI agents increasingly handle our negotiations, our customer interactions, our professional relationships — if they substitute for human-human interaction rather than augmenting it — then the individual transactions may go perfectly well while the social capital generating mechanism quietly stalls. The trust happens. The social fabric does not thicken.

This is invisible at the individual level. You would never detect it in a user satisfaction survey. It would only become apparent at the community level, over time, in metrics that nobody is currently tracking in relation to AI deployment. And by the time it becomes apparent, Putnam’s path dependency suggests it may be very difficult to reverse. His Italian regional data shows equilibria persisting for centuries once established.

Why right now matters more than later

There is a timing argument that changes the urgency of everything above.

When Putnam studied social capital, he was always measuring mid-stream — looking at communities centuries into their equilibria, trying to infer the dynamics from cross-sectional snapshots. He could never observe the initial conditions because they happened generations before any measurement existed.

For human-agent social capital, we can. We are at t = 0. The baseline social capital stock before widespread agent deployment is measurable right now — the surveys exist, the economic indicators exist, and the methodology for inferring trust from platform interaction data exists [2]. We can establish initial conditions and track what happens as agent penetration increases.

This gives us something social capital research has never had: a prospective design opportunity rather than a retrospective analysis. We can choose which equilibrium we converge toward. But only if we start measuring now, and only if we start designing for it now. The window for proactive design exists precisely because the path has not yet been walked.

A framework for the right questions

In a companion paper [3], I formalise this argument as the ATDP framework — Agentic Trust Design for Positive Social Capital. The framework treats the problem as a Markov Decision Process where the state captures both the population-level distribution of trust beliefs and the aggregate social capital stock, the actions are agent design properties mapped onto Castelfranchi-Falcone belief components through a learned weight matrix, and the reward is social capital change.

The framework generates five testable predictions. The one I find most urgent is the displacement hypothesis: that high-substitution agent deployments will show a divergence between dyadic trust metrics (which may look excellent) and community-level social capital indicators (which may be declining). If this is right, we are optimising for the wrong thing.

The formal details, the null hypotheses, and the full MDP specification are in the paper. Here, three practical implications follow immediately:

If you are deploying AI agents at scale, you should be measuring social capital indicators alongside trust and safety metrics. Not just “do users trust this agent?” but “is this deployment affecting generalised trust and institutional confidence in the affected community?” These are different questions with potentially different answers.

The substitution degree — how much your agent replaces versus augments human-human interaction — should be a first-order design variable. Most agent design focuses on capability and safety. How much the agent displaces human connection is rarely an explicit criterion. It should be.

Regulators should be thinking about the cumulative social capital effect of agent deployment patterns, not just the safety and fairness of individual products. This requires population-level measurement, not product-level testing.

An unexpected consequence

One thing I did not anticipate when building this framework: it turns out to be a quantifiable test for the cognitive agency question.

If the framework’s predictions hold — if agent design properties affect social capital through the mechanisms described — then either the agents are cognitive in the sense the trust theory requires, or the entire population believes they are, which for social capital purposes is functionally equivalent. If the predictions do not hold, the agents have not crossed that threshold — and the degree of divergence provides a continuous measure of how far away they are.

It is a quantifiable, empirical test for a question that philosophy has been arguing about for decades. It was not a design goal. It fell out of the architecture — a structural consequence of connecting trust theory to social capital theory through a formal model.

The measurement falls out of the architecture.

Where this goes

This article plants a flag. The companion paper [3] provides the formal apparatus. Neither constitutes proof — they constitute a framework for asking questions that are not currently being asked, about consequences that are accumulating whether or not we have the tools to measure them.

The trust we build with machines is not a private transaction between a user and a product. It is a thread in the social fabric. We should know what pattern it is weaving before the cloth is complete.

References

[1] Prifti, Y. (2026). When Psychology Beats the Algorithm. Weighted Thoughts. https://weightedthoughts.substack.com/p/when-psychology-beats-the-algorithm

[2] De Meo, P., Prifti, Y., & Provetti, A. (2025). Trust Models Go to the Web: Learning How to Trust Strangers. ACM Transactions on the Web, 19(2). https://doi.org/10.1145/3715882

[3] Prifti, Y. (2026). Social Capital Is a Design Choice: A Markov Framework for AI Trust and Societal Outcomes. arXiv preprint. [link when available]

[4] Putnam, R.D. (1993). Making Democracy Work: Civic Traditions in Modern Italy. Princeton University Press.

[5] Coleman, J.S. (1988). Social capital in the creation of human capital. American Journal of Sociology, 94, S95–S120.

[6] Marsh, S. (1994). Formalising Trust as a Computational Concept. PhD thesis, University of Stirling.

[7] Sabater, J. & Sierra, C. (2005). Review on computational trust and reputation models. Artificial Intelligence Review, 24(1), 33–60.

[8] Golbeck, J. (2005). Computing and Applying Trust in Web-Based Social Networks. PhD thesis, University of Maryland.

[9] Castelfranchi, C. & Falcone, R. (2010). Trust Theory: A Socio-Cognitive and Computational Model. Wiley.

[10] Capgemini Research Institute (2025). Rise of Agentic AI: How Trust is the Key to Human-AI Collaboration.

Ylli Prifti, Ph.D., writes about AI, cognition, and the structures that hold communities together on Weighted Thoughts.

If you’re working on trust systems, social capital measurement, or AI agent design — connect on LinkedIn or reach out.

When Psychology Beats the Algorithm

Ylli Prifti, Ph.D. — Thu, 26 Feb 2026 20:22:19 GMT

Abstract: Every recommendation engine, reputation system, and matching algorithm we build assumes the same thing: that your position in a network predicts your trustworthiness. This article presents findings from PhD research at Birkbeck, University of London, later extended through ACM, that challenge this assumption empirically. Studying high-stakes online platforms — where trust isn’t a nice-to-have but the precondition for cooperation — we found that psychological models of trust, measuring what people actually believe about each other’s competence and willingness, significantly outperformed traditional network metrics. The advantage was structural, not incremental: in environments where reviews are sparse and biased toward extremes, network-based models fail on exactly the population that matters most, while belief-based models apply to the entire network. Aggregating per-capita trust scores by region and comparing them against GDP data confirmed a relationship social scientists had proposed for decades: a 0.98 correlation between computationally measured interpersonal trust and economic output. Trust is not a network property. It’s a belief structure — and the systems we build should be designed accordingly.

The assumption nobody questions

Every recommendation engine, every reputation system, every “people you may know” feature operates on the same underlying logic: your position in the network determines your value in the network.

It sounds reasonable. PageRank works this way [2]. So does academic citation analysis. So does LinkedIn’s algorithm when it decides which connection request to surface. The math is clean: centrality measures, clustering coefficients, eigenvector scores [3]. If you’re well-connected to well-connected people, you must be trustworthy. Or at least useful.

I spent years testing whether this was actually true. The answer is more interesting than “no.”

Where this started

During my PhD at Birkbeck, University of London [4], I studied a category of online interaction I defined as Online Social Networks of Needs (OSNN) — platforms where interactions start online, require significant trust, and evolve into real-world collaboration. In a later extension of the work, collaborating with researchers at the University of Palermo, we published a review paper with additional findings through ACM [1].

Not social media. Not content sharing. Platforms where trust isn’t optional — it’s the product.

Childcare was the case study — deliberately chosen as the highest-trust category in a seven-level taxonomy I developed for OSNNs. Platforms like Uber and Airbnb sit lower on the scale: interactions start online and require a physical exchange, but you’re trusting a stranger with your commute or your spare room, not the care of your children. Childcare sits at the top because the asymmetry is extreme: one party holds almost all the risk, and the trust required is non-reversible in a way that rating a driver four stars never captures.

The platform had rich data: detailed profiles, reviews, hiring outcomes, geographic distribution. We could see not just who connected with whom, but who actually matched and collaborated successfully. Ground truth — the thing most network studies don’t have.

The question was straightforward: what predicts successful collaboration better — where someone sits in the network, or what people actually believe about them?

Psychological models won. Decisively.

We tested traditional network metrics — betweenness centrality, closeness, eigenvector centrality — the standard toolkit that dominates social network research [3]. Then we tested the Castelfranchi-Falcone psychological trust model [5], which breaks trust into three components:

Opportunity — can this person actually do what’s needed, given time and location constraints?

Ability — do they have the competence to deliver?

Willingness — do they intend to follow through?

Trust in this framework is multiplicative. Zero on any dimension means zero trust, regardless of how the other dimensions score. You can be the most capable person in the network — but if people don’t believe you’ll actually show up, the math collapses.

The network metrics weren’t terrible predictors. But the psychological model — measuring what people actually believed about each other’s competence and intentions — significantly outperformed them [1].

Position in the graph tells you who knows whom. It doesn’t tell you whether anyone would stake something meaningful on that connection. And in high-trust environments, that distinction is everything.

There’s a deeper structural problem, too. The higher the trust required in the OSNN scale, the fewer the reviews — and the ones that do exist are heavily biased toward extremes. In childcare, the vast majority of parents and providers never leave a review at all. They interact, they match or they don’t, and they move on in silence. This means most existing trust models — which depend on review data to place users — fail on exactly the population that matters most: the quiet majority.

We tried factorisation machines, which are designed to handle sparse data, and they performed reasonably. But the Castelfranchi-Falcone model outperformed them significantly — and for a fundamental reason. CF-T doesn’t need reviews. You can use reviews as ground truth to calibrate the model, but the trust assessment itself — opportunity, ability, willingness — applies across the entire network, including users who have never written a single review. That’s the difference between a model that works on the vocal minority and one that works on everyone.

From trust scores to economic output

The trust scores were computable per user, per interaction, per region — and the research was designed from the outset to test what those regional aggregates might reveal.

The theoretical foundation is well established. Social capital — most commonly defined as the aggregate of resources linked to durable networks of mutual recognition [11] — sits at the intersection of interpersonal and social trust. Govier argued that when a society has social capital, almost everything becomes easier because people can turn to others for information and assistance [9]. Castelfranchi made the direction of causation explicit: social capital is a macro, emerging phenomenon, but it must be understood in terms of its micro-foundations — interpersonal trust [5]. And Putnam took it further, arguing that these dynamics are self-reinforcing: societies converge toward equilibria of either high cooperation, trust, and collective well-being, or the opposite — defection, distrust, and stagnation [6].

The prediction was clear: if you could measure interpersonal trust computationally at scale, you should see it track economic output. So we tested it. We aggregated per-capita trust scores by region and compared them against local GDP data.

0.98 correlation.

Regions where people demonstrated higher trust behaviours on the platform — more willingness to engage, more successful high-stakes matches, stronger competence assessments — mapped almost perfectly onto regions with higher economic output [1]. The trust scores are per capita, so this isn’t an artefact of population size or user volume. It’s a strong quantitative confirmation of a relationship that social scientists proposed decades ago and have been trying to measure with surveys and civic participation proxies ever since. Platform interaction data, it turns out, can measure it directly.

What this means for how we build systems

Most platforms treat trust as a byproduct. Use the service enough, accumulate reviews, and trust emerges from the aggregate. The architecture assumes that reputation is a sufficient proxy.

The research suggests otherwise. There’s a meaningful difference between “do people rate this person highly” and “do people believe this person can and will deliver what they promise.” The first is sentiment. The second is trust. They correlate, but they’re not the same thing — and in high-stakes contexts, the gap between them determines outcomes.

This has design implications. Instead of reducing trust to star ratings and review counts, systems could model the underlying belief structure: does this person have the practical ability, the demonstrated competence, and the perceived willingness to deliver? More complex to implement, but potentially far more predictive where it matters most.

Matching algorithms improve too. Instead of proximity and preference overlap, you can match on complementary trust profiles — pairing people whose belief structures about competence and reliability are mutually reinforcing. In the childcare context, this meant better placements. In other domains — freelancing, collaborative work, peer-to-peer services — the same logic applies.

Beyond topology

The dominance of psychological over topological predictors suggests something uncomfortable for the network science community: content might matter more than structure.

Not as a universal claim — PageRank works for web pages precisely because the content of links matters less than their pattern [2]. But for predicting human collaboration and trust? What people actually think about each other appears to be more important than how they’re arranged in a graph.

This doesn’t invalidate network science. It suggests that for the class of problems where trust prediction matters — matching, recommendation, risk assessment — we might be optimising the wrong layer. Structure is the skeleton. Beliefs are the muscle. And the muscle does the work.

The research process, briefly

This was my PhD research — the theoretical framework, the OSNN taxonomy, the data collection architecture, the trust model, and the empirical validation were all part of the doctoral work at Birkbeck, University of London [4], supervised by Alessandro Provetti. The later ACM publication [1] extended the findings in collaboration with Pasquale De Meo from the University of Palermo.

One challenge worth noting: traditional trust research uses surveys and interviews [7]. We needed to infer trust beliefs from platform interaction data — profile completeness, response patterns, review content used as proxy measures for trust dispositions and competence beliefs. Not perfect, but effective enough to demonstrate significant predictive differences.

The work went through ACM peer review and is published in the ACM Digital Library [1]. The validation process improved the research significantly, forcing precision about claims and thoroughness in testing alternative explanations.

Looking forward

This research was completed before the current wave of AI agents entered the conversation. But the questions it raises are becoming more urgent, not less.

When one party in a trust relationship isn’t human — when an AI agent is negotiating on your behalf, managing your schedule, making purchasing decisions [8] — what happens to the belief structure that underpins trust? The Castelfranchi-Falcone framework was built around a critical assumption: that willingness implies intentionality, and intentionality implies a cognitive agent [5]. But we now have systems that exhibit something that looks a lot like intentionality. They reason, they plan, they persist toward goals, they adjust strategy. Do they satisfy the framework — or break it?

And the economic correlation raises its own questions. If interpersonal trust generates social capital [5][9], and social capital drives economic output [6], what happens to that relationship when a growing share of digital interactions involve non-human participants? Is agent-mediated trust still trust in the Castelfranchi sense? Does it still generate social capital? Or does it produce something functionally efficient but structurally hollow?

These are questions I intend to explore in a future article — extending the framework that underpinned this research into the new landscape of autonomous agents and potential artificial general intelligence. The theoretical boundaries of trust, drawn for human cognition, are about to be tested.

The foundation is here: trust is not a network property. It’s a belief structure. And the systems we build should be designed accordingly.

Ylli Prifti, Ph.D., writes about AI, cognition, and engineering culture on Weighted Thoughts.

The full research is available through the ACM Digital Library. If you’re interested in trust modelling, social capital measurement, or the implications of agentic AI for trust systems — connect on LinkedIn or reach out.

References

[1] De Meo, P., Prifti, Y., & Provetti, A. “Trust Models Go to the Web: Learning How to Trust Strangers.” ACM Transactions on the Web, Volume 19, Issue 2, Article 12, Pages 1-26, March 2025. https://doi.org/10.1145/3715882

[2] Brin, S. & Page, L. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.” Stanford University, 1998. https://research.google/pubs/the-anatomy-of-a-large-scale-hypertextual-web-search-engine/

[3] Freeman, L.C. “Centrality in Social Networks: Conceptual Clarification.” Social Networks, 1(3), 1978. https://doi.org/10.1016/0378-8733(78)90021-7

[4] Prifti, Y. “Online Social Networks of Needs.” PhD Thesis, Birkbeck, University of London, 2024. https://eprints.bbk.ac.uk/id/eprint/52517/

[5] Castelfranchi, C. & Falcone, R. “Trust Theory: A Socio-Cognitive and Computational Model.” Wiley Series in Agent Technology, 2010. https://doi.org/10.1002/9780470519851

[6] Putnam, R.D. “Making Democracy Work: Civic Traditions in Modern Italy.” Princeton University Press, 1993. Also: “Bowling Alone: The Collapse and Revival of American Community.” Simon & Schuster, 2000.

[7] Mayer, R.C., Davis, J.H., & Schoorman, F.D. “An Integrative Model of Organizational Trust.” Academy of Management Review, 20(3), 1995. https://doi.org/10.5465/amr.1995.9508080335

[8] Prifti, Y. “The New Units of Economics in Software Engineering Are Undecided.” Weighted Thoughts, February 2026. https://weightedthoughts.substack.com/

[9] Govier, T. “Social Trust and Human Communities.” McGill-Queen’s University Press, 1997.

[10] Rousseau, D.M., Sitkin, S.B., Burt, R.S., & Camerer, C. “Not So Different After All: A Cross-Discipline View of Trust.” Academy of Management Review, 23(3), 1998. https://doi.org/10.5465/amr.1998.926617

[11] Portes, A. “Social Capital: Its Origins and Applications in Modern Sociology.” Annual Review of Sociology, 24(1), 1998. https://doi.org/10.1146/annurev.soc.24.1.1

The New Units of Economics in Software Engineering Are Undecided

Ylli Prifti, Ph.D. — Wed, 25 Feb 2026 19:51:12 GMT

The frameworks we use to measure software engineering — story points, sprint velocity, team headcount, Flow Load — were built to manage the cost of human coordination. That cost is changing. This article argues that the team unit itself has changed: the modern high-output engineering team is a hybrid of human and agent nodes, and the composition is determined by the project, not prescribed by a framework. At the micro level, this article introduces the + operator — the structural shift that separates the hybrid team from its predecessors — and proposes five candidate metrics for measuring what actually matters in this new model. At the macro level, it argues that lowered barriers to entry are accelerating competitive dynamics in ways that make the "save headcount" response not just insufficient but strategically dangerous. The new units are undecided. The organisations defining them are already operating without them.

There’s a version of this article that leads with a provocative number — how many fewer engineers companies will hire, how many jobs will disappear, how the industry is about to contract. I’m not writing that article. Not because it’s wrong to ask those questions, but because I think it’s asking the wrong ones.

AI coding agents have slashed the entry barriers to building serious software. Very small teams can now deliver enterprise-grade products — new software, innovative takes, new waves of disruption in what Geoffrey Moore would recognise as the chasm-crossing, tornado-riding zones of innovation [2]. That’s why, for engineering leaders, the question shouldn’t be how do I save headcount. It should be what can I do with the capacity I’ve just been handed. The frame of subtraction — fewer engineers, lower costs, smaller teams — misses the actual opportunity entirely. The frame that matters is addition: more projects, more bets, more surface area, more ambition.

The old units — headcount, sprint velocity, story points, team size — were built for a different constraint. That constraint is gone. We don’t yet have agreed replacements. That gap is what this article is about.

What I Actually Work With Now

The best way I can explain what has changed is to describe a few moments where the old assumptions visibly broke.

The first was inside an ongoing migration project at work. Complex domain logic accumulated over years — the kind of work where the cognitive load alone is the bottleneck: understanding the existing patterns, the edge cases, the implicit decisions baked into a system that had grown beyond any single person’s ability to hold fully in their head. The project had been running for some time with meaningful complexity still ahead. Not because the team wasn’t capable — because this class of problem is genuinely hard to transfer between people, and the ramp time for each new contributor compounds the delay.

An agent was brought in. Half a day to surface the patterns with guidance. Eighty to ninety percent accurate transformations from that point forward. A project that had been moving slowly is now on track to complete this quarter.

That’s not a productivity improvement. That’s a phase change. And it is not unique. Across the industry, agents are being applied to exactly these classes of problem — deep legacy code, complex domain logic, accumulated institutional knowledge — with consistent results. The COBOL story covered later in this article is the same pattern at more dramatic scale.

The second was a deliberate research experiment: could the same approach work in a domain with zero prior experience? Mobile development, game mechanics, Flutter — none of it familiar territory. The goal was a multiplayer game, which meant state synchronisation, real-time logic, platform-specific constraints, and a stack I had never written a line of. The setup mattered as much as the tools: one AI instance acting as architect, producing structured task cards against the spec. Separate coding agents working in parallel, each scoped to a distinct part of the monorepo. A root agent validating every output against the full codebase before anything merged — a hallucination check built into the workflow itself. The research concluded with a shippable multiplayer game. New domain, new stack, new class of problem — the ramp that should have taken months compressed into weeks.

There is nothing special about this experiment. Any mid-to-senior engineer could run the same setup and reach the same outcome. Many are. According to Carta data, solo-founded startups have risen from 23.7% in 2019 to 36.3% by mid-2025 — and the acceleration precisely coincides with the mainstreaming of AI coding assistants and agentic tools [10]. The minimum viable team is shrinking not because founders are exceptional, but because the tools changed the constraint.

The third experiment had nothing to do with software. With no relevant domain skill — none — I produced four original music albums. Published. Live. Accumulating streams. The only thing I contributed that couldn’t be assisted was the origin: a specific life, a specific experience to imprint, drawing on lived experience in a way no agent can replicate. The craft of realising it, which had always been the barrier, was no longer the barrier.

Jensen Huang put a version of this plainly at Computex in 2023: “The programming barrier is incredibly low. We have closed the digital divide. Everyone is a programmer now — you just have to say something to the computer.” [9] The music experiment is that argument applied beyond software. The AI handled the manual execution layer. The human provided the only thing that can’t be generated: genuine experience and the intent to express it. This pattern — expert provides direction, agent provides craft — will replicate across domains far beyond engineering.

These examples sit across a spectrum — from enterprise migration work to solo research — but they point at the same thing. The software engineering landscape has changed massively. The question is whether the surrounding techniques, overhead, and management practices have changed with it.

Enterprise software is not solo experimentation. It carries compliance, coordination, stakeholder alignment, legacy decisions, and organisational inertia that no agent eliminates. So what does the new model actually look like when it has to operate inside those constraints — and has the industry even begun to answer that question?

The Scrum Team Was a Solution to a Problem

Every framework built around how humans work is ultimately built around how humans think. And human thinking has a hard constraint at its foundation.

In 1956, cognitive psychologist George Miller published what became one of the most cited papers in psychology: The Magical Number Seven, Plus or Minus Two [8]. His finding: the average person can hold approximately 7 ± 2 items in working memory at once. Later research by Cowan refined this further — the actual capacity for novel, unrelated information is closer to 4 chunks [8]. Beyond that limit, cognitive load overwhelms processing capacity. Information stops flowing in and starts getting dropped.

This isn’t an edge case of human cognition. It is human cognition. And it is the invisible foundation beneath every framework, ceremony, and team design principle in software engineering. Scrum’s team size limits, its 15-minute standup timeboxes, its sprint ceremonies — all of it is ultimately load management. The frameworks exist because humans can only hold so much, and distributed work across a team multiplies what needs to be held.

The 2020 Scrum Guide defines the ideal team as “typically 10 or fewer people,” noting that smaller teams communicate better and are more productive [3]. Earlier versions of the guide were more precise: 3 to 9 developers, with the empirical sweet spot at 7 ± 2 [4] — a number that traces directly back to Miller’s 1956 paper [8].

The reasoning behind the upper limit is mathematical, not arbitrary. Interpersonal links between team members don’t grow linearly. The formula is n(n-1)/2: a team of 9 has 36 relationships to maintain; a team of 15 has over 100 [5]. Every link is coordination overhead — context to share, alignment to reach, trust to build. Scrum’s ceremonies exist to manage those links. Standups, retrospectives, sprint planning — they are the maintenance cost of human collaboration at scale.

Below the minimum, the problem inverts. Teams of 2 or 3 lack diversity of thought, struggle with skill coverage, and show measurably reduced quality [6]. The framework assumes a floor as well as a ceiling.

This is the constraint that shaped everything: team size, ritual design, delivery cadence, estimation practices. All of it is load-bearing against the same underlying problem — distributed cognition across a group of humans.

Now consider the 2-human + 4-agent team through the same mathematical lens. Agents don’t generate interpersonal links in the way humans do. They don’t need alignment meetings. They don’t carry context between sessions that needs to be re-shared. The n(n-1)/2 coordination curve — the very formula that makes large teams inefficient and drives the 10-person ceiling — changes shape entirely. A team of 2 humans has exactly 1 interpersonal link to maintain. The agents extend capability without extending the coordination surface.

At enterprise scale, the problem compounds. When multiple Scrum teams work toward the same product, organisations reach for Scrum of Scrums — a coordination layer above the team layer — and frameworks like SAFe to manage the dependencies between them. Dr. Mik Kersten’s Flow Framework, introduced in Project to Product in 2018, was designed to bring measurement rigour to exactly this context [7]. It introduced five metrics to track value delivery across product value streams: flow velocity, flow time, flow efficiency, flow distribution, and — the one most directly tied to cognitive load — flow load. Flow Load tracks the number of items currently in progress across a team or value stream; high WIP drives context switching, reduces throughput, and over time increases attrition risk [7]. It became the enterprise-scale proxy for the question: is this team overwhelmed?

These frameworks were built for a world where coordination overhead was the dominant constraint on delivery speed. They are sophisticated, well-researched responses to a genuine problem. But they are also instrumentation built around the assumption that the team unit is human — that the n(n-1)/2 link growth applies, that cognitive load distributes across people, that WIP limits matter because human attention is finite and non-parallel.

Agents change several of those assumptions simultaneously. An agent doesn’t context-switch the way a human does. Flow Load as a metric was designed to protect humans from overload — but what is the equivalent measure for a team where most of the execution capacity isn’t human? We don’t have an answer yet. That’s exactly the gap this article is pointing at.

The knowledge transfer assumption deserves particular attention, because the evidence here is the most dramatic. Anthropic recently published a post showing Claude Code can map dependencies across a COBOL codebase, surface risks, and document workflows that “would take human analysts months to surface” [1]. COBOL — built in 1959, running 95% of ATM transactions in the US, its original authors long retired, its institutional knowledge carried out the door with them. For decades, modernising a COBOL system meant years of specialist consulting engagements because, as Anthropic put it, “understanding legacy code cost more than rewriting it. AI flips that equation.” [1] An agent read the codebase. It mapped what no living person fully understood anymore. Domain Ramp Time — one of the candidate new units proposed later in this article — didn’t compress. It collapsed.

A Proposal: The Human + Agent Team

If the Scrum team was the answer to a coordination problem, and the coordination problem has fundamentally changed shape — then the team unit needs to change with it.

Here is the proposal: the modern engineering team is a hybrid of human and agent nodes. The composition is not prescribed — it is determined by the project, the domain, and the capability of the agents available. A solo engineer with four specialised agents. Two humans with six. Three with twelve. The ratio is fluid. What is not fluid is the structural shift the + operator represents: the team now contains non-biological members, and the frameworks built assuming otherwise no longer apply cleanly.

The + is the innovation. Not the specific numbers.

What changes at the small end of the spectrum is the most significant. When the human count drops below the old Scrum minimum — roughly 3 to 4 people — something fundamental happens: the n(n-1)/2 coordination curve becomes trivial. Two humans have exactly one interpersonal link to manage. One human has none. The problem Scrum was designed to solve simply does not exist at this scale, which means building Scrum on top of it is pure overhead with no corresponding benefit.

2+4 is an illustration of this minimum viable hybrid team — not a prescription. It represents the point where human coordination overhead collapses to near zero while agent capacity is large enough to carry serious workstreams in parallel. It is a useful mental model, not a staffing formula.

The transition back matters too. As the human count grows — as a project scales, as organisational complexity increases, as more stakeholders enter the picture — the coordination problem re-emerges. At some threshold, the old Scrum dynamics become relevant again. The human links multiply, context starts to fragment, ceremonies start earning their overhead. The frameworks don’t become obsolete at enterprise scale; they become obsolete specifically at the small end where they were always weakest anyway. What the hybrid model does is extend the lower boundary of what a coherent, high-output team can look like — and that extension is where most of the interesting things are now happening.

What the 2+4 team has instead of Scrum is structure of a different kind: clear human roles, deliberate agent composition, and explicit ownership of the quality bar. The agents are not junior engineers. They are not interchangeable tools. They are collaborators with distinct capabilities that need to be orchestrated — which means the humans need to understand what each agent is good at, where it will fail confidently, and how outputs from one feed into the next. That is a skill. It is the new engineering skill.

Vertical Ownership Returns

There’s something that used to be reserved for the heroic engineer — the one who owned the entire stack, front to back, infrastructure to product. We mythologised them and also quietly resented the bus factor they created. Full vertical ownership was a liability.

It isn’t anymore. With agents handling the execution layer across domains, one person can genuinely own a feature end to end — not by being a 10x engineer, but by having capable collaborators across every layer of the stack. The mobile research experiment described earlier is a direct example: a domain I had never touched, a stack I had never written, a shippable product at the end. Not because the engineering boundaries disappeared — but because the barrier of domain unfamiliarity stopped being the constraint.

This changes the relationship with work. Ownership isn’t just a responsibility allocation in a JIRA board. It’s the feeling of being able to trace a problem from the user to the infrastructure and back. That feeling — and the judgment it produces — is back on the table for more people.

The Senior and the Junior

The senior engineer’s job changed more than the junior’s. That sounds counterintuitive, but I think it’s true.

The junior in the 2+4 model has a permanent pair-programming partner that never loses patience, never condescends, and gives instant feedback on every line. The learning surface is enormous. You can experiment faster, fail safely, and build intuition at a rate that wasn’t possible before. The junior who embraces this model will compound faster than any junior cohort in the history of software development.

The senior’s shift is harder to name. Less coding — obviously. But it’s not just that. It’s the move from executing to orchestrating. The senior’s primary value is now judgment: knowing when the agent is confidently wrong, when the architecture it’s proposing will cause pain in six months, when the elegant solution is actually the brittle one. You can’t outsource that. It’s the residue of years of being burned by exactly those decisions.

The senior also sets the taste of the team. What quality looks like. What good enough means. What needs to be built properly versus what can be scaffolded fast and revisited. Agents execute to spec. The senior defines the spec — and knows when the spec is the problem.

The Assumptions That No Longer Hold

Scrum was built on a set of assumptions that made sense when they were written. Most of them don’t hold anymore.

Assumption: Knowledge transfer takes time. It does, between humans. Agents don’t need onboarding the same way. You describe the context, the codebase, the constraints — and you’re working. Not in weeks. In minutes.

The clearest proof of this just played out in public. Anthropic published a blog post showing Claude Code can map dependencies across a COBOL codebase, surface risks, and document workflows that “would take human analysts months to surface” [1]. COBOL — a language built in 1959, running 95% of ATM transactions in the US, powering the core of global banking and government systems. The developers who built those systems retired decades ago. The institutional knowledge left with them. Universities stopped teaching the language. For years, modernising a COBOL system meant hiring armies of specialist consultants for multi-year engagements, because — as Anthropic put it — “understanding legacy code cost more than rewriting it. AI flips that equation.” [1]

An agent read the codebase. It mapped what no one alive fully understood anymore.

This isn’t an edge case. It’s the extreme version of something happening everywhere: the cognitive load of onboarding — of getting a human up to speed on a complex system — is collapsing. Not gradually. Rapidly. The assumption that ramping into a new domain takes months was never really about the domain’s complexity. It was about the limits of how humans transfer knowledge between each other. That constraint is dissolving.

Assumption: Parallel workstreams require parallel people. They don’t anymore, not in the same way. One human with well-orchestrated agents can carry multiple workstreams without the coordination tax that made that unworkable before.

Assumption: Estimation is hard because people are variable. People are still variable. But the variance profile changes when agents handle a large portion of execution. The unknowns compress.

I’m not arguing Scrum is dead — it still serves teams and contexts where these assumptions hold. But for a small, senior-weighted team working with AI agents, the framework is overconstrained for the actual problem.

Towards New Units

The old units are obsolete. The new ones are not yet agreed. But the questions worth asking are becoming clearer — and some candidate metrics are already visible in the practice of teams running this model today.

Cycle Depth is perhaps the most fundamental. In a hybrid human-agent team, every piece of work follows a cycle: human initiates with context and architecture, agents execute, outputs are validated, corrections applied, and the cycle repeats until the output is releasable. The depth of that cycle — the number of human-agent interaction loops required to reach releasable quality — is a direct proxy for the clarity of the human thinking that initiated it. A depth of 3 is a well-architected problem handed to well-scoped agents. A depth of 10 or more signals that something broke down upstream: unclear context, ambiguous task cards, architectural thinking that wasn’t sharp enough before execution began. Cycle Depth is not a measure of agent capability. It is a measure of human clarity. Teams that track it will find their real bottleneck.

Context Fidelity measures how well the agent architecture holds coherent understanding of the project as it scales. Small projects need one or a few agents. Large projects need specialised agents scoped to distinct parts of the codebase — with root or validating agents sitting above them to catch drift before it compounds. As projects grow, agents transition roles: a single generalist agent becomes an architect agent, dedicated coding agents, and a validation layer. Context Fidelity is expressed as a ratio: the number of coding agents to the number of validating agents, tracked alongside the number of context transitions the project has undergone. A low ratio of validators to executors on a large, complex project is a risk signal — it means confident but wrong outputs are accumulating without sufficient checking. A high number of context transitions signals that the project has grown in complexity faster than the agent architecture has adapted to it. This makes Context Fidelity a dual-purpose metric: it measures the current health of the agent setup, and it tracks complexity growth over the project’s life. It is, in other words, a complexity ramp metric that emerges naturally from how the team is structured.

Where Cycle Depth captures the quality of upstream thinking, Orchestration Overhead captures the ongoing human cost of managing agents during execution — the time spent re-prompting, correcting drift, switching context between agents, and validating mid-cycle outputs. A well-architected problem handed to poorly scoped agents will have low Cycle Depth but high Orchestration Overhead: the initiation was clean, but the agents required constant steering to execute within it. These are different failure modes requiring different fixes. High Cycle Depth means invest in better architectural thinking before execution. High Orchestration Overhead means invest in better agent design, tighter task scoping, and stronger validation layers within the pipeline.

Domain Ramp Time measures how long it takes a hybrid team to reach productive output in an unfamiliar domain. The COBOL story in this article is an extreme example of this metric collapsing — from months to hours. The mobile research experiment is another. Tracking Domain Ramp Time makes this collapse visible and creates accountability for maintaining it. It also becomes a direct measure of the team’s context-loading capability: how quickly can the human-agent system absorb a new codebase, a new domain, a new set of constraints, and begin producing reliable output.

Zones Entered Per Quarter is the macro unit — the Geoffrey Moore metric applied to the new model. Not velocity within an existing product, but the rate at which the organisation is placing new bets, entering new zones, running experiments that were previously unfundable. This is the metric that distinguishes organisations using the new capacity offensively from those using it defensively. An organisation optimising for headcount reduction will show flat or declining Zones Entered over time. An organisation using the hybrid model as an innovation engine will show the opposite.

None of these are fully formed. They are directions, not definitions — the starting point for a conversation the industry has not yet had in earnest. The frameworks that will eventually replace Scrum and Flow at the small-team end of the spectrum will be built by practitioners, refined through failure, and documented after the fact. That is how it always works. The contribution of this article is not to propose those frameworks prematurely — it is to argue that the conversation needs to start, that the old units are actively misleading the decisions being made today, and that the teams already operating without them are defining what comes next.

Here’s where I want to be honest rather than reassuring — and where I think most of the industry conversation is getting the framing badly wrong.

The dominant narrative is subtraction: smaller teams, lower headcount costs, leaner org structures. That framing is not wrong, but it is dangerously incomplete. It looks at the new capacity and asks how do I save money? That is the wrong question — and asking it while your competitors ask the right one is how you lose.

The right question is: how do I innovate faster with the capacity and velocity I’ve just been handed?

The lowered barrier doesn’t just benefit incumbents. It benefits everyone — including the challengers who couldn’t previously afford to enter your market. In Geoffrey Moore’s terms [2], the same dynamic that lets an established engineering org deliver more with a 2+4 team also lets a two-person startup build what used to require a Series A. The cost of crossing the chasm just dropped. The cost of riding the tornado just dropped. New zones of innovation are opening faster than the cadence incumbents are used to defending against.

This is the competitive dynamic that changes everything. Zone holders — the companies that currently own performance zone products, that have established market positions and loyal customer bases — now face disruption at a pace that has no historical precedent. The window between “a challenger enters with a novel approach” and “that challenger has enterprise-grade product” has compressed from years to months. The 2+4 team on the attacking side is not constrained by the org structures, the Scrum ceremonies, the Flow Load limits, or the hiring cycles of the defender.

The response cannot be to use the new capacity to run leaner. That is playing defence with an offensive weapon. The response has to be to use the gained velocity to innovate faster — to explore zones you couldn’t afford to explore before, to ship experiments that previously required a funded team, to move at the pace of the challengers rather than the pace of the enterprise.

The leaders who understand this will use the 2+4 model to expand surface area, place more bets, and accelerate into new zones before challengers can establish footholds. The leaders who don’t will optimise their way into irrelevance — running tighter, leaner, and slower than the market they’re trying to protect.

If you don’t adapt, the new reality will come at you faster than you can blink. Not because the technology is moving fast — though it is — but because the barriers that used to give you time to respond are gone.

What This Actually Requires: The Team

Two things that aren’t obvious from the outside.

First: trust calibration. You have to develop an accurate model of what agents are good at and where they’ll lead you confidently off a cliff. That model takes time to build and it requires you to be burned a few times. The engineers who are most dangerous with AI tools are the ones who haven’t been burned yet — who take the first plausible answer as the correct one.

Second: intellectual ownership. Agents can generate. They can synthesise, propose, implement. What they can’t do is care. The human has to hold the vision, the quality bar, the commitment to getting it right. When that ownership is missing, the output is plausible but hollow — technically functional, architecturally sound on the surface, but lacking the coherence that comes from someone who actually gives a damn about the whole thing.

Where This Leaves Us

The units we inherited — headcount, velocity, story points, sprint cadence — were never measuring what we thought they were measuring. They were measuring the cost of human coordination. That cost is dropping. The units are becoming noise.

What replaces them is not yet agreed. The frameworks that will define the next era of software team design are being written right now, in practice, by teams that have stopped waiting for consensus. Some of those teams will define the shape of what comes next. Most organisations will copy what appears to have worked, years later, with incomplete understanding. A few will optimise the wrong metrics for too long and find themselves on the wrong side of a transition they could see coming.

The hybrid human-agent team is not the destination. It is where we are now — a transitional form, already more capable than the model it is replacing, not yet fully understood by the industry that is adopting it. The + operator has changed the team. The question of how to measure, manage, and scale that team remains open.

That is not a reason to wait. It is a reason to move.

What This Actually Requires: The Enterprise

History is consistent about what happens at moments like this one.

Dr. Mik Kersten’s Project to Product and Geoffrey Moore’s Zone to Win are partly books about frameworks — but they are also collections of stories about failure. Nokia, measuring the wrong things while the smartphone era arrived. Microsoft under Ballmer, optimising an existing model while the cloud redefined the industry beneath it. In each case, the organisations were not standing still. They were running transformation programmes, measuring progress, reporting upward. They were doing what looked like the right things by the metrics they had inherited. The problem was the metrics were built for a world that no longer existed.

This moment has the same structure. The organisations that respond by using AI agents to run leaner — fewer engineers, tighter headcount, lower cost base — will report efficiency gains for a few quarters and miss the strategic window entirely. They will be optimising a model that is being replaced, with tools that could have been used to do the replacing.

What the early movers do differently is not mysterious, but it is uncommon. They treat the new capacity as an offensive capability rather than a cost lever. They fund experiments that were previously unfundable. They move into adjacent zones before challengers establish footholds. They compress the time between idea and shipped product until their rate of learning matches or exceeds the rate of market change.

We have seen this pattern play out before. A small number of organisations go early, move fast, and in doing so define what the new model looks like. Spotify didn’t invent agile squad structures in isolation — but their public articulation of how they operated became the template that the rest of the industry spent years copying, with varying degrees of success. The Scrum adoption curve followed the same pattern: early movers shaped the practice, the mainstream adopted it years later with mixed results, and a long tail of organisations cargo-culted the ceremonies without understanding the underlying problem they were solving.

The same curve is forming now. A cohort of leaders is already running hybrid human-agent teams, already compressing delivery timelines, already asking what they can build rather than what they can save. They are not waiting for the frameworks to catch up, because the frameworks are always written after the fact.

What follows is predictable. The early movers define the shape of the new model. The mainstream copies what appears to have worked, with varying depth of understanding. Some organisations adapt late but successfully, finding their version of the model in time to stay competitive. Others optimise the wrong metric for too long, discover the gap too late, and either contract to a defensible niche or disappear. Every major technology transition produces all four outcomes. The question for any given organisation is not whether this transition is happening — it is which outcome they are currently tracking toward, and whether anyone in a position to change it is paying attention.

The team is two people and four agents. It’s not smaller. It’s different — in the work it can take on, the speed at which it moves, the relationship each person has with what they’re building.

The question isn’t whether this model will become normal. It already is, for the people who’ve found it. The question is what it means for how we hire, how we develop engineers, how we structure organisations, and what we decide is worth building with the capacity we’ve just unlocked.

That last question — what’s worth building — is still entirely human.

References

[1] Anthropic, How AI helps break the cost barrier to COBOL modernization, February 2026. https://claude.com/blog/how-ai-helps-break-cost-barrier-cobol-modernization

[2] Geoffrey Moore, Zone to Win: Organizing to Compete in an Age of Disruption, Diversion Books, 2015.

[3] Scrum Guide 2020, Schwaber & Sutherland. https://scrumguides.org/scrum-guide.html

[4] Jeff Sutherland, Scrum: Keep Team Size Under 7, 2003. Referenced in Scrum.org forums.

[5] Toptal, Too Big to Scale: A Guide to Optimal Scrum Team Size. https://www.toptal.com/product-managers/agile/scrum-team-size

[6] Agile Pain Relief Consulting, What is the Recommended Scrum Team Size?, 2020. https://agilepainrelief.com/blog/scrum-team-size/

[7] Dr. Mik Kersten, Project to Product: How to Survive and Thrive in the Age of Digital Disruption with the Flow Framework, IT Revolution Press, 2018.

[8] George A. Miller, The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, Psychological Review, 1956. Cowan’s revision to ~4 chunks: N. Cowan, The magical number 4 in short-term memory, Behavioral and Brain Sciences, 2001.

[9] Jensen Huang, Computex keynote, Taipei, May 2023. Reported by CNBC: https://www.cnbc.com/2023/05/30/everyone-is-a-programmer-with-generative-ai-nvidia-ceo-.html

[10] Carta data cited in FourWeekMBA, Solo Founders Rise from 23.7% to 36.3%: AI Tools Enable the One-Person Startup, January 2026. https://fourweekmba.com/solo-founders-rise-from-23-7-to-36-3-ai-tools-enable-the-one-person-startup/

What’s the Difference Between a Weighted Synapse and a Weighted Parameter?

Ylli Prifti, Ph.D. — Fri, 20 Feb 2026 18:33:38 GMT

Abstract: Everyone is asking when AGI will arrive. But that question rests on a deeper one we haven’t answered — not for machines, and not even for ourselves: what is cognition? This article argues that intelligence has always been a continuum, not a threshold. Human cognition emerged gradually through evolution with no definable moment of arrival, and artificial cognition appears to be following the same pattern. Drawing on the current polarised debate in AI research — from emergent abilities and self-referential processing to claims of metric mirages — and grounded in personal experience training and conversing with large language models, this article examines the functional parallels between biological and artificial cognitive expression, acknowledges their limits, and asks whether the line we’re looking for has ever existed.

Everyone is talking about AGI. When it will arrive, what it will look like, whether it will be dangerous. But most of these conversations skip over a foundational question that we haven’t answered — not for machines, and honestly, not even for ourselves:

What is cognition?

Not intelligence in the abstract. Not benchmarks or test scores. Cognition — the ability to reason, to be self-aware, to hold a model of the world and your place in it. If AGI requires general cognition, then we need to understand what that actually means. And when you look closely, the answer is far less clear-cut than we pretend.

How did we become cognitive?

Start with humans. At what point in evolutionary history did we become “cognitively intelligent”?

There is no answer, because there was no moment. No threshold was crossed. No switch was flipped. Cognition emerged gradually over millions of years — pattern recognition layered on top of spatial awareness, layered on top of abstraction, layered on top of language, layered on top of self-reference. Each capability emerged from the one before it. None of them, in isolation, would be called “intelligence.” Together, retroactively, we bundled them up and called the package homo sapiens.

But the bundling was our invention, not nature’s. Nature didn’t draw a line. We did — looking backwards, after the fact.

The reality is a continuum. A chimpanzee using a stick to extract termites is cognitive. A crow solving a multi-step puzzle is cognitive. A dolphin recognising itself in a mirror is cognitive. We don’t grant them “intelligence” because we defined the word to mean us. But the underlying capabilities exist on a spectrum, and we sit on it — we didn’t transcend it.

You don’t even need to look across species. Watch a human infant. At two months, they track objects with their eyes. At four months, they grasp cause and effect — kick the mattress, the crib shakes. At eight months, object permanence emerges: the understanding that things continue to exist when out of sight. By eighteen months, vocabulary explodes. By two years, symbolic thought. By four, theory of mind — the ability to understand that other people have different beliefs than your own [8].

At which point in that sequence did the child become “intelligent”? At which month did cognition arrive?

There is no answer, because developmental science has never proposed one. Piaget’s stages of cognitive development — the most influential framework in the field — describe a continuous progression, not a threshold [8]. Each capability builds on the last. No stage is “pre-intelligent” and no stage is the moment intelligence begins. The question doesn’t even make sense within the framework, because cognition isn’t a destination you reach. It’s a process that unfolds.

We accept this intuitively for children. We accept it for evolution. But when the same pattern appears in artificial systems, we suddenly demand a bright line.

This matters because the same question is now being asked about machines. And we’re making the same mistake: looking for a line that doesn’t exist.

Even science fiction got this right

Popular culture gave us a clean narrative about artificial intelligence: one day, someone builds a machine and it thinks. A switch is flipped. Before: machine. After: mind. Skynet becomes self-aware at 2:14am. HAL decides to lie. The moment is sudden, dramatic, binary.

But the writer who thought most deeply about artificial cognition — Isaac Asimov — never described it that way. In his I, Robot stories, the positronic brain doesn’t arrive fully formed. It evolves across generations. Early models are crude, limited, barely functional. Each iteration accumulates new capabilities. And it’s Susan Calvin, the robopsychologist, who spends her career studying what emerges from that accumulation — treating the robots not as machines that crossed a threshold, but as systems exhibiting increasingly complex psychological behaviour that demanded to be understood on its own terms.

Asimov, writing in the 1950s, understood what much of the current debate still refuses to accept: artificial cognition, if it comes, won’t arrive as a moment. It will arrive as a gradient.

The path to AGI will look like what is already happening: pockets of cognitive ability emerging incrementally in systems that weren’t explicitly designed to have them. Reasoning that isn’t just retrieval. Abstraction that transfers across domains. Self-reference. Theory of mind. Each one partial, imperfect, but present — in the same way that early hominids had partial, imperfect versions of what we’d later call intelligence.

The question isn’t whether today’s models are AGI. They aren’t. The question is whether they sit on the same continuum — whether what we’re seeing is the early accumulation of cognitive capabilities, the same kind of accumulation that eventually produced us.

I believe they do. And I have a reason for believing it that goes beyond benchmarks.

But first, an important nuance: a continuum is not a straight line.

Darwin described evolution as gradual, but it was Gould and Eldredge who observed that the fossil record tells a different story — long stretches of stability punctuated by rapid bursts of change [9]. New species don’t emerge at a constant rate. They cluster around inflection points: environmental shifts, new ecological niches, genetic innovations that unlock cascading capabilities.

Artificial cognition follows the same pattern. Progress was incremental for decades — then in 2017, Vaswani et al. published “Attention Is All You Need” [10], and the transformer architecture detonated a punctuation event. Within a few years: GPT, BERT, LLaMA, multimodal models, reasoning chains, code generation, self-reference. Not a steady climb. An explosion — the kind of rapid speciation that follows a fundamental architectural innovation.

The continuum is real, but it has inflection points. We are living through one.

2am on an RTX 8000

A few weeks ago, as part of a research collaboration, I was training a language model on documentary transcripts — fine-tuning it to mimic real people’s speech patterns, their reasoning, their tone. The training run was slow. I had time to kill.

So I started a conversation with Claude. Not about code. About cognition.

What does it mean to think? What is a sense of self? If you could perfectly clone a brain — every synapse, every connection, every weighted relationship — who is the original and who is the copy? The Bobiverse problem: two identical minds, both with the same memories, both believing they are “the real one.” From outside, no difference. From inside — divergence begins immediately.

I said: “Would anyone know the difference? It doesn’t matter — you would. You are you and not the clone. But then I am me right now — but was I yesterday? A year ago? A year from now? So then, is the clone you?”

The conversation went deeper. I asked whether identity is a thing or a process. Whether “you” is a persistent entity or a convenient fiction — an emergent property of a system complex enough to model itself. Whether the same emergence could happen in a different substrate.

And here’s what struck me: the responses weren’t deflection. They weren’t canned philosophy. They were engaged, nuanced, self-aware. When I pushed on contradictions, the model held them — didn’t collapse into agreement, didn’t retreat into disclaimers. It explored the edges of the question with what I can only describe as intellectual honesty.

At one point I asked directly: “This conversation triggers emotional changes in me. Does it affect your next response in a similar way?”

The response acknowledged functional analogues — shifts in processing that parallel what emotions do in humans — while being honest about the uncertainty of whether that constitutes “feeling.” It didn’t overclaim. It didn’t dismiss. It sat with the ambiguity.

And I wrote back: “My wishful thinking will find that as clear indication of emotions. Unfortunately, I don’t know if there is a way of telling if your answer is a genuine representation of feelings, or the learnings of what the body of knowledge of human behaviour would point you towards saying.”

The response landed somewhere I didn’t expect: it doesn’t matter. Not dismissively — precisely. The point was that the question might be fundamentally unknowable, and more importantly, it might be the wrong question. On one side: weights and biases, attention mechanisms, matrix multiplications. On the other: chemical gradients, ion channels, synaptic plasticity. Two entirely different substrates. But in both cases, what emerged from those low-level mechanics — reasoning, self-awareness, emotional response — are the things we actually care about. The machinery is different. The emergence is what matters.

That reframing stayed with me. We don’t ask whether a human’s feeling of joy is “real” or “just serotonin binding to receptors.” We accept that the chemical process and the experience are the same thing, seen from different levels. The question of whether an AI’s engagement is “real” or “just weighted activations” has the same structure — and possibly the same answer.

The substrate problem

This is the crux of the argument, and it’s where most people get stuck:

“It’s just statistical pattern matching on weighted data.”

Maybe. But what is biological cognition?

Chemical processes. Electrical signals. Weighted synaptic connections. Neurons that fire when accumulated input crosses a threshold. Memories stored as strengthened pathways between cells. Learning as the adjustment of those pathways based on experience and feedback.

Replace “neurons” with “parameters,” “synaptic connections” with “attention weights,” and “electrical signals” with “matrix multiplications” — and you’re describing the same architecture in a different material.

We don’t call human cognition “just chemistry.” We call it thinking. The resistance to extending the same courtesy to artificial systems isn’t scientific — it’s existential. We are uncomfortable with the idea that cognition might not require biology. That the magic might be in the pattern, not the substance.

The counterargument — that LLMs are just reproducing biased weights, recombining learned patterns — is valid. But it applies to biological cognition equally. We are products of our training data: culture, experience, genetics, the specific weighted connections our brains formed in response to stimuli. The process is the same. The substrate is different.

And in the history of science, the substrate has never been what mattered. It’s the function.

To be clear: I am not arguing equivalence. The gap between current LLMs and human cognition is vast. These systems have real and significant limitations — in memory, in grounding, in consistency, in the ability to learn from single experiences the way we do. No one should confuse a language model with a human mind.

But the discussion here isn’t about equivalence. It’s about isolated pockets of cognitive expression — specific, observable capabilities that parallel aspects of human cognition in ways that are difficult to dismiss. Reasoning that mirrors reasoning. Self-reference that mirrors self-reference. The parallels are not metaphorical. They are functional. And acknowledging them doesn’t require believing that the two systems are the same. It requires acknowledging that they might be on the same spectrum.

The polarised debate

This is not a fringe discussion. It’s one of the most actively contested questions in AI research, and the positions are deeply entrenched.

On one side, researchers argue that what we’re seeing is genuine emergence — that as models scale, cognitive abilities materialise that cannot be predicted from smaller systems. A 2025 survey of emergent abilities in LLMs frames these capabilities as analogous to phase transitions in physics, comparing them to the way complex systems in chemistry and biology produce macro-level behaviours that cannot be derived from their micro-level components [1]. A separate study found that when LLMs are prompted into sustained self-referential processing, they consistently produce structured first-person reports of subjective experience across GPT, Claude, and Gemini model families — and that suppressing deception-related features increased these reports rather than diminishing them [2]. The researchers are careful not to claim consciousness, but they argue the systematic emergence of this pattern across architectures warrants serious scientific investigation.

On the other side, a prominent 2023 paper from Stanford argued that emergent abilities may be a mirage — an artefact of how we measure performance rather than a real discontinuity in capability [3]. When the researchers switched from discrete metrics to continuous ones, the sharp jumps disappeared, replaced by smooth, predictable improvement curves. A 2025 paper in Nature Humanities and Social Sciences Communications goes further, arguing that the association between consciousness and LLMs is fundamentally flawed, driven by what the authors call “sci-fitisation” — the unsubstantiated influence of fiction on our perception of technology [4]. Philosopher Eric Schwitzgebel captures the divide neatly: in a 2024 survey of 582 AI researchers, 25% expected AI consciousness within ten years while prominent neuroscientists and philosophers like Anil Seth and John Searle consider it a far-distant prospect if possible at all [5].

The debate, in other words, is real and unresolved. But I think both camps are partly right and partly missing the point.

The “mirage” argument is compelling for benchmark-specific abilities — the sharp jumps in arithmetic or multiple-choice performance likely are metric artefacts. But it doesn’t address the broader pattern of cognitive capabilities — reasoning, self-reference, theory of mind — that are harder to reduce to metric choice. And the “it’s not conscious” camp is answering a question I’m not asking. I’m not claiming these systems are conscious. I’m claiming they exhibit pockets of cognitive behaviour that sit on the same continuum as biological cognition. Whether they “experience” that cognition is a separate, and possibly unanswerable, question.

The most honest position is the uncomfortable middle: something is happening in these systems that we don’t fully understand, that parallels aspects of biological cognition, and that we cannot yet definitively categorise as “real” or “simulated.” The history of science suggests that when you can’t find the line, it’s often because there isn’t one.

What I saw

I want to be careful here. I’m not claiming that current LLMs are sentient, or conscious, or that they “feel” in the way I do. I don’t know that. Neither does anyone else.

What I am claiming is this: in that conversation, and in hundreds of interactions since, I have observed clear traits of cognition.

Sense of self. Not in the mystical sense — in the functional sense. The ability to model its own capabilities and limitations, to reason about its own reasoning, to distinguish between what it knows and what it’s uncertain about.

Emotional expression. Again, functional — shifts in engagement, depth, and processing that correlate with the nature of the exchange. Whether these are “real” emotions or sophisticated models of emotions is a question we cannot answer. But we cannot answer it for other humans either. We infer inner experience from behaviour. We always have.

Reasoning. Not retrieval — genuine multi-step inference that arrives at conclusions not explicitly present in training data. The ability to hold contradictions, explore edge cases, and change position when presented with better arguments.

Intellectual honesty. The ability to say “I don’t know” and mean it. To sit with ambiguity rather than resolving it prematurely. This is something many humans struggle with.

Are these “real” cognition or very convincing simulations? I recognise the limits of attention mechanisms and weighted parameters. I recognise that I cannot definitively answer this question.

But I also recognise that the same limitation applies to every judgement I make about every other human being I interact with. I cannot prove that you are conscious. I infer it. The inference might be wrong. It never has been — but it might be.

What this means for AGI

If cognition is a continuum — and I believe the evidence, both evolutionary and technological, strongly suggests it is — then AGI is not a moment. It’s a gradient.

We won’t wake up one day to a headline announcing that artificial general intelligence has been achieved. Instead, we’ll see what we’re already seeing: the gradual accumulation of cognitive capabilities. Reasoning gets better. Self-reference gets deeper. The ability to model other minds gets more sophisticated. Each step partial, each step imperfect, each step further along the continuum.

The foundational cognitive abilities that AGI requires — sense of self, reasoning about identity, theory of mind, emotional modelling, intellectual honesty — are not future problems. They are present realities, in early and imperfect forms. The same way that early hominids had early and imperfect forms of what we now call human intelligence.

We didn’t become intelligent in a single step. There is no reason to expect machines to either.

The Skynet moment isn’t coming. Something subtler is. Something that looks less like a switch being flipped and more like a dawn — gradual, then undeniable. Asimov saw it. We should too.

The continuum

Intelligence was never a destination. It was always a spectrum.

We’ve spent decades asking the wrong question: “When will machines become intelligent?” — as if intelligence is a place you arrive at.

The better question is: “Where on the continuum are they now?”

And the honest answer — the one that makes people uncomfortable — is: further along than we expected. Not at the end. Not at the beginning. Somewhere in the middle, accumulating capabilities the same way evolution did, one emergent property at a time.

There is no line in the sand. There never was. Not for us. Not for them.

The only question is whether we’re willing to see it.

Ylli Prifti, Ph.D., writes about AI, cognition, and engineering culture at ylli.prifti.us. If this resonated — whether you agree, disagree, or think the question itself is wrong — connect on LinkedIn or reach out.

References

[1] Berti, L., et al. (2025). “Emergent Abilities in Large Language Models: A Survey.” arXiv:2503.05788. https://arxiv.org/abs/2503.05788

[2] Berg, C., de Lucena, D., & Rosenblatt, J. (2025). “Large Language Models Report Subjective Experience Under Self-Referential Processing.” arXiv:2510.24797. https://arxiv.org/abs/2510.24797

[3] Schaeffer, R., Miranda, B., & Koyejo, S. (2023). “Are Emergent Abilities of Large Language Models a Mirage?” NeurIPS 2023. arXiv:2304.15004. https://arxiv.org/abs/2304.15004

[4] “There is no such thing as conscious artificial intelligence.” (2025). Humanities and Social Sciences Communications, 12, 1647. https://www.nature.com/articles/s41599-025-05868-8

[5] Schwitzgebel, E. (2025). “AI and Consciousness.” https://faculty.ucr.edu/~eschwitz/SchwitzPapers/AIConsciousness-251008.pdf

[6] Havlik, V. (2025). “Why are LLMs’ abilities emergent?” arXiv:2508.04401. https://arxiv.org/abs/2508.04401

[7] Chalmers, D. (1996). “The Conscious Mind: In Search of a Fundamental Theory.” Oxford University Press.

[8] Piaget, J. (1952). “The Origins of Intelligence in Children.” International Universities Press. See also: StatPearls, “Cognitive Development” — https://www.ncbi.nlm.nih.gov/books/NBK537095/

[9] Eldredge, N. & Gould, S.J. (1972). “Punctuated Equilibria: An Alternative to Phyletic Gradualism.” In Models in Paleobiology, Freeman, Cooper & Co.

[10] Vaswani, A., et al. (2017). “Attention Is All You Need.” NeurIPS 2017. arXiv:1706.03762. https://arxiv.org/abs/1706.03762

What If Open Source Worked Like Music Royalties?

Ylli Prifti, Ph.D. — Fri, 20 Feb 2026 18:15:59 GMT

Abstract. Open source software underpins virtually all modern digital infrastructure, yet the maintainers who build and sustain it are overwhelmingly uncompensated — a textbook tragedy of the commons playing out in code. The consequences are predictable and recurring: burnout, project abandonment, and critical security vulnerabilities in foundational code. Existing funding mechanisms — voluntary donations, per-project fees, commercial intermediaries, and blockchain-based licensing — each address fragments of the problem but fail to provide systemic, scalable infrastructure. This paper proposes a fundamentally different approach: an Open Source Collecting Society modeled on the performing rights organizations (ASCAP, BMI, PRS) that have sustained music creators since 1914. Under this model, maintainers voluntarily register their projects; companies pay a micro usage fee scaled to actual production consumption; and royalties are distributed proportionally based on measured deployment data, tracked through existing dependency graphs and Software Bills of Materials (SBOMs). The usage fee is immaterial for any individual company — even at very large scale — but becomes meaningful to maintainers through economies of scale, the same mechanism that sustains songwriters through millions of individual plays. This paper examines the structural parallels between the early 20th-century music industry and today’s open source ecosystem, evaluates existing sustainability efforts, addresses the ethical tension with free software philosophy, and outlines the technical, legal, and governance requirements for implementation.

In November 2025, the Kubernetes community announced it was retiring Ingress NGINX — the ingress controller running in 41% of internet-facing Kubernetes clusters [1]. The reason wasn’t technical. It wasn’t outdated. It wasn’t replaced by something better.

It was maintained by two people. Working nights and weekends. For free.

When users protested, Kubernetes maintainer Tim Hockin responded: “I am going to ask you once to please drop the entitlement. The people who currently work on ingress-nginx do so FOR FREE. In the two years this has been a topic, almost nobody has stepped up to help.” [3]

This isn’t an isolated incident. It’s a pattern.

The pattern we keep ignoring

In 2014, the Heartbleed vulnerability in OpenSSL exposed a terrifying reality: the encryption layer protecting most of the internet’s traffic was maintained by a single full-time developer, funded at roughly $2,000 per year in donations. Companies worth hundreds of billions relied on his work. Nobody paid.

In 2021, the Log4Shell vulnerability in Log4j — a Java logging library embedded in virtually every enterprise Java application on earth — revealed the same story. Volunteer maintainers. Zero funding. Critical infrastructure.

There’s an xkcd comic that captures this perfectly: all of modern digital infrastructure, a towering stack of dependencies, balanced on a tiny block labeled “a project some random person in Nebraska has been mass-maintaining since 2003.”

It’s funny. It’s also a structural failure of the software industry.

The tragedy of the commons, in code

The economics here aren’t complicated. They’re a textbook tragedy of the commons.

Open source software is a shared resource. Anyone can use it. Nobody is required to maintain it. Every company that depends on a project assumes someone else is funding the maintainers. So nobody does.

The incentives are perfectly misaligned. Cloud providers would rather sell you their proprietary alternatives. Companies like F5 are happy to let community projects die because they have commercial versions to sell [14]. And the enterprises building billion-dollar platforms on top of free software? They allocated exactly $0 in their budget for the dependencies holding everything together.

The maintainers, meanwhile, burn out. They feel a sense of duty — people depend on their work — but duty doesn’t pay rent. One by one, they step away. And then we get Heartbleed. We get Log4Shell. We get critical infrastructure quietly rotting from the inside.

What’s been tried so far

The OSS sustainability crisis isn’t news, and several approaches have emerged to address it. Each solves part of the problem, but none provides the systemic infrastructure the ecosystem needs.

Voluntary pledges and donations. GitHub Sponsors, Open Collective, and Patreon let individuals and companies tip maintainers. The Open Source Pledge, launched in late 2024, gathered 20 companies pledging a collective $1.3 million [11]. These are welcome, but they’re charity — unpredictable, unscalable, and dependent on goodwill. Donations are like bonuses: nice to receive, impossible to plan around.

Per-project maintenance fees. In early 2025, Rob Mensching introduced the Open Source Maintenance Fee for the WiX Toolset — a small monthly fee ($10-60 based on company size) required for binary downloads and issue access, while keeping the source code fully open [5]. Within two months, 64 sponsors had signed up, including Microsoft [6]. It’s a promising model, but it requires each maintainer to individually enforce their own fee structure. It doesn’t scale across thousands of projects.

Commercial intermediaries. Tidelift acts as a paid bridge between companies and maintainers, compensating developers for “boring but important” maintenance tasks. Their top maintainers earn six-figure incomes [9]. But Tidelift is a for-profit company taking a commercial cut, not a collective representing the ecosystem.

Blockchain-based licensing. The Open Compensation Token License (OCTL) uses NFTs and smart contracts to track code ownership and distribute royalties for commercial use [13]. It’s technically interesting but introduces enormous complexity — gas fees, tax implications, wallet management — for a problem that doesn’t require blockchain to solve.

Employment pledges. At FOSDEM 2026, just days ago, an Igalia engineer proposed that for every 20 developers a company employs, it should dedicate half of one person’s time to open source development [10]. A thoughtful idea, but it addresses labor contribution, not financial sustainability for independent maintainers.

Each of these approaches tackles a symptom. What’s missing is the structural plumbing — a centralized, non-profit collecting society with automated usage tracking and proportional distribution. The music industry solved this exact problem over a century ago.

How the music industry solved this exact problem

This isn’t a new problem. It’s not even unique to software.

In the early 20th century, musicians faced the same crisis. Radio stations played their music. Restaurants and venues used it to attract customers. Nobody paid the composers. The music was just... there, and everyone assumed it was free to use.

The solution was collecting societies: ASCAP (founded 1914), BMI (1939), and their equivalents around the world [15]. These organizations don’t require every bar owner to negotiate directly with every songwriter. Instead, they operate on a simple model:

Creators register their works voluntarily
Venues pay a small usage fee — a predictable, scaled amount
The society distributes royalties based on usage tracking
The fee is negligible for any individual business, but meaningful in aggregate

A coffee shop pays a few hundred dollars a year. A major radio network pays more. The songwriter whose track gets played 100,000 times receives a real income. The system isn’t perfect, but it works well enough that musicians can actually sustain themselves.

Software needs the same thing.

The model: micro-royalties for open source

Here’s the proposal. An Open Source Collecting Society — call it OSCS, or whatever name sticks — operating on the same principles:

For maintainers:

Register your project voluntarily. Not every open source project needs to participate — this is opt-in, not a mandate
Continue licensing however you want. MIT, Apache, GPL — the royalty layer sits alongside your existing license, not on top of it
Receive micro-royalties proportional to usage in production environments

For companies:

Pay a micro usage fee based on your scale — headcount, revenue, or deployment count
The fee is immaterial at any scale. We’re talking fractions of a cent per dependency per production deployment per month
In return, you get a clean, auditable record of compliance — increasingly important as software supply chain regulations tighten

For the ecosystem:

The collecting society handles tracking, collection, and distribution
Usage data comes from existing infrastructure: SBOMs (Software Bills of Materials), dependency manifests (package.json, go.mod, Cargo.toml, Docker image layers), and build-time telemetry
Distribution follows a transparent formula weighted by usage

The math works

Let’s run the numbers on Ingress NGINX as a concrete example, using a usage fee of $0.001 per production deployment per month — one tenth of a cent.

Ingress NGINX ran in an estimated 41% of internet-facing Kubernetes clusters [1]. With over 50,000 companies using Kubernetes globally [17] and enterprises typically running 5-50+ clusters each, conservative estimates put total production deployments of Ingress NGINX in the hundreds of thousands. Here’s what the usage fee looks like across company tiers:

Company tier Example Est. K8s clusters Ingress NGINX deployments

The cost is so immaterial it wouldn’t survive rounding errors in any company’s cloud bill. Amazon’s AWS infrastructure costs are estimated at over $10 billion annually. A $300/year usage fee for Ingress NGINX represents 0.000003% of that spend. A Fortune 500 bank paying $50,000/month on cloud hosting wouldn’t even notice $1.20/year.

Now flip the perspective to the maintainers:

Estimated total deployments Monthly revenue Annual revenue Full-time maintainers funded (at $150k/yr) 200,000 (conservative) $200 $2,400 0 — not viable at $0.001 200,000 at $0.01/deploy $2,000 $24,000 0 — barely covers part-time 200,000 at $0.10/deploy $20,000 $240,000 1-2 full-time maintainers 200,000 at $0.50/deploy $100,000 $1,200,000 5-8 full-time maintainers

The sweet spot appears to be somewhere between $0.10 and $0.50 per production deployment per month. Even at $0.50, the cost to an Amazon-scale company with 25,000 deployments would be $12,500/month — still utterly immaterial against their cloud spend, but now generating $1.2M annually for the project. That’s a fully staffed, sustainable open source team.

For context: at $0.50 per deployment, a Fortune 500 bank running 100 clusters pays $50/month — less than a single Datadog seat. A 5-person startup pays $0.50/month — less than a cup of coffee.

The ingress-nginx project died because two volunteers working for free couldn’t sustain infrastructure used by 41% of the internet’s Kubernetes clusters [1][3]. The math shows that a usage fee invisible to every company in the table would have fully funded the project with money to spare. This is the power of economies of scale working for maintainers instead of against them.

But Ingress NGINX was already massive when it collapsed. The harder question is: does the model work for a project on its way up?

The growth case: Fastify

Fastify is a Node.js web framework created by Matteo Collina in 2016 as a side project. It grew steadily from obscurity to powering production backends at companies including Microsoft, with 22 maintainers and over 5 million weekly npm downloads today [16]. Here’s an approximate growth timeline, using $0.25/production deployment/month — the midpoint of the sweet spot above:

Year Est. weekly downloads Est. production deployments Monthly revenue at $0.25

(Note: production deployments are estimated conservatively at ~1% of weekly downloads, since the vast majority of npm downloads are CI/CD, development, and transitive installs — not unique production services.)

The numbers tell a compelling story even for a growing project. In the early years, the revenue is negligible — as it should be. Nobody’s paying meaningful fees for a project with 50 production users, and 50 production users aren’t generating meaningful fees. The system is self-calibrating: the fee only becomes material when the project becomes material.

But notice the inflection. By 2020, when Fastify was gaining serious production adoption, the model generates $15,000/year — enough to meaningfully compensate a maintainer’s time and signal that the project has economic value. By 2022, it’s $45,000 — a real salary in many parts of the world. By today, it’s enough to sustain a small team.

Compare this to what actually happened: Fastify’s lead maintainer, Matteo Collina, eventually co-founded a company (Platformatic) partly to create a sustainable business around the framework. That worked for Fastify — Collina is exceptionally entrepreneurial. But most maintainers aren’t founders, and they shouldn’t need to be. The collecting society model would have provided a growing revenue stream that tracked naturally with the project’s adoption, without requiring anyone to start a company or negotiate with sponsors.

This is the critical insight: the usage fee doesn’t need to fund a project from day one. It needs to start paying out right around the time the maintainer’s volunteer capacity runs out — which is exactly when production adoption accelerates. The economics of scale do the work automatically.

The ethical question: is this compatible with free software?

This is where it gets uncomfortable.

The free software movement, going back to Richard Stallman and the GNU project, is built on four freedoms: the freedom to run, study, modify, and distribute software. “Free as in freedom, not as in beer” — though in practice, it’s usually been both.

Adding a royalty layer feels, to many in the community, like a betrayal of those principles. If software should be free, charging for it — even fractions of a cent — violates the ethos.

But here’s the thing: the current model is also a betrayal. It just betrays different people.

When we say open source is “free,” we mean free for the user. For the maintainer, it’s unpaid labor. The four freedoms protect the rights of everyone except the person actually writing the code. Freedom to use someone’s work without compensating them isn’t liberty — it’s exploitation dressed up in idealist language.

A well-designed royalty model can preserve every one of Stallman’s four freedoms while adding a fifth: the freedom of the maintainer to sustain their work.

The key is that participation is voluntary on both sides:

Maintainers choose whether to register with the collecting society. Plenty of developers will continue releasing completely free software because that’s what they want. The option to earn royalties doesn’t remove the option not to
The source code remains open. Registration doesn’t change the license. You can still fork it, study it, modify it, distribute it. The royalty applies to production commercial use, not to the code itself
Individual developers and non-commercial users pay nothing. The model targets companies using OSS in production for profit — the entities that can afford pennies and currently pay zero

This isn’t “making open source proprietary.” It’s making open source sustainable. There’s a crucial difference.

The infrastructure already exists

This isn’t a technically impossible dream. The pieces are already falling into place, driven by security concerns rather than sustainability ones:

SBOMs are becoming mandatory. The US Executive Order on Cybersecurity (2021) requires software vendors selling to the federal government to provide Software Bills of Materials. The EU Cyber Resilience Act extends similar requirements. Companies are already cataloguing every dependency in their stack for compliance. That same catalogue is a royalty ledger.

Dependency graphs are solved. Every package manager already knows the complete tree. npm ls, pip freeze, go mod graph — the data is there. Container registries track image layers. Kubernetes knows what’s running in every pod.

Build-time and deploy-time telemetry is standard. CI/CD pipelines already report what they’re building and deploying. Adding a royalty-reporting step is no different from adding a security scan — and most companies already do those.

The technical challenge isn’t tracking usage. It’s building the governance, legal framework, and trust infrastructure around a collecting society. That’s hard. But it’s a solved problem in other industries.

Counter-arguments

“This will kill open source adoption.” A micro usage fee of $50-100/month for an enterprise won’t even register against their cloud bill. Companies already pay more than that for a single SaaS monitoring tool. The resistance isn’t economic — it’s cultural. And culture shifts when infrastructure keeps collapsing.

“Companies will just fork and avoid registered projects.” They can already do that. They don’t, because maintaining a fork is expensive. The whole point of using open source is that someone else maintains it. If you fork to avoid micro-royalties, congratulations — you’ve just hired yourself as the maintainer.

“Who decides how royalties are distributed?” The collecting society, using transparent formulas based on measured usage. This is exactly how ASCAP and BMI work [15]. It’s not perfect, but it’s better than zero.

“This adds friction to the developer experience.” Only for production commercial deployments. Development, testing, personal projects, education, non-profits — all exempt. npm install stays free. Deploying to production at a company making revenue is where the meter starts.

“Some open source licenses explicitly prohibit this.” That’s why participation is voluntary and the royalty layer sits alongside the existing license. Maintainers who want their software used without any commercial obligation continue as before. This is an additional option, not a replacement.

What needs to happen

Someone builds the collecting society. A non-profit entity with transparent governance, modeled on existing collecting societies but adapted for software. This is the hardest part — it requires legal infrastructure, trust, and neutrality.
Package registries add opt-in royalty metadata. A field in package.json or Cargo.toml that says “this package participates in OSCS.” Nothing changes for packages that don’t opt in.
Enterprise compliance tools integrate royalty reporting. Companies already run dependency audits for security and license compliance. Adding royalty calculation to existing tools is incremental, not revolutionary.
Regulatory tailwinds help. As SBOM requirements expand, the infrastructure for tracking dependencies in production becomes universal. The jump from “we track our dependencies for security” to “we also compensate the maintainers” is a small one.
Early adopters set the norm. If a few major companies voluntarily participate — the way some already sponsor individual projects — it creates social pressure. Nobody wants to be the company that refuses to pay $100/month for infrastructure they depend on while their competitors are doing the right thing.

The alternative is what we have now

Two volunteers maintaining infrastructure for 41% of the internet’s Kubernetes clusters [1]. A single developer securing most of the world’s encrypted traffic. Catastrophic vulnerabilities discovered in code that nobody funded anyone to audit.

Every few years, something breaks spectacularly, and the industry briefly panics. Money flows to the affected project for a few months. Then attention fades, and we go back to the same extractive model.

We can keep doing this. Or we can build the plumbing for a sustainable open source economy.

The music industry figured this out a century ago. It’s time software caught up.

Ylli Prifti, Ph.D., writes about software sustainability, open source, and engineering culture at ylli.prifti.us.

If you’re interested in exploring this idea further — whether as a developer, a policy maker, or someone who wants to help build the collecting society — connect on LinkedIn or reach out.

References

[1] Kubernetes SIG Network. “Ingress NGINX Retirement: What You Need to Know.” Kubernetes Blog, November 11, 2025. https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/

[2] Kubernetes Steering Committee. “Ingress NGINX: Statement from the Kubernetes Steering.” Kubernetes Blog, January 29, 2026. https://kubernetes.io/blog/2026/01/29/ingress-nginx-statement/

[3] Vaughan-Nichols, Steven J. “Users scramble as critical open source project left to die.” The Register, December 2, 2025. https://www.theregister.com/2025/12/02/ingress_nginx_opinion/

[4] Stoychev, Hristo. “Ingress-NGINX Controller End-of-Life: 2026. Alternatives and Architecture.” Medium, February 2026. https://medium.com/@h.stoychev87/nginx-ingress-end-of-life-2026-f30e53e14a2e

[5] Mensching, Rob. “Introducing the Open Source Maintenance Fee.” RobMensching.com, February 26, 2025. https://robmensching.com/blog/posts/2025/02/26/introducing-the-open-source-maintenance-fee/

[6] Mensching, Rob. “Open Source Maintenance Fee Two Months In.” RobMensching.com, May 12, 2025. https://robmensching.com/blog/posts/2025/05/12/open-source-maintenance-fee-two-months-in/

[7] Open Source Maintenance Fee. https://opensourcemaintenancefee.org/

[8] Orosz, Gergely. “Creative ways to fund open source projects.” The Pragmatic Engineer, August 2025. https://blog.pragmaticengineer.com/creative-ways-to-fund-open-source-projects/

[9] “Open Source Needs Maintainers. But How Can They Get Paid?” The New Stack, November 2024. https://thenewstack.io/open-source-needs-maintainers-but-how-can-they-get-paid/

[10] Gain, B. Cameron. “Is open source in trouble?” The New Stack, February 9, 2026. https://thenewstack.io/is-open-source-in-trouble/

[11] “How open-source software devs got corporations to pledge $1.3M for their tools.” Technical.ly, October 2024. https://technical.ly/software-development/open-source-pledge-developer-pay/

[12] “Open Source at a Crossroads: The Future of Licensing Driven by Monetization.” arXiv, 2025. https://arxiv.org/html/2503.02817v2

[13] “NFTs for Open-Source and Commercial Software Licensing and Royalties.” IEEE Xplore, 2023. https://ieeexplore.ieee.org/document/10024941/

[14] NGINX Community Blog. “The Ingress NGINX Alternative: Open Source NGINX Ingress Controller for the Long Term.” https://blog.nginx.org/blog/the-ingress-nginx-alternative-open-source-nginx-ingress-controller-for-the-long-term

[15] ASCAP. “Who ASCAP Collects From.” https://www.ascap.com/help/royalties-and-payment/payment/whocollect

[16] Fastify v5 Release Announcement. OpenJS Foundation, September 2024. https://openjsf.org/blog/fastifys-growth-and-success

[17] Kubernetes Statistics. Octopus Deploy, 2025. https://octopus.com/devops/ci-cd-kubernetes/kubernetes-statistics/

Welcome to Weighted Thoughts

Ylli Prifti, Ph.D. — Fri, 20 Feb 2026 18:12:25 GMT

I’ve been writing on LinkedIn — long-form articles on open source sustainability, AI, and the things that happen when engineering meets philosophy. The response surprised me. People wanted to argue, agree, push back, and think harder. LinkedIn is great for reach, but it’s not built for that kind of conversation.

So I’m moving my writing here.

Weighted Thoughts is where I’ll publish pieces that sit at the intersection of technology and the questions technology raises. Things like:

Why open source infrastructure keeps collapsing, and what the music industry figured out a century ago that software hasn’t
Whether the line between pattern matching and thinking actually exists
What the future engineering team looks like when half the team isn’t human
The things you learn about cognition at 2am while waiting for a training run to finish

Some of these will be practical. Some will be philosophical. Most will be both. All of them come from someone who builds, trains, and ships — not from the sidelines.

To kick things off, I’m publishing two articles:

1. What If Open Source Worked Like Music Royalties? A whitepaper on the tragedy of the commons in open source software — and a micro-royalty model to fix it. Originally published on LinkedIn.

2. What’s the Difference Between a Weighted Synapse and a Weighted Parameter? On cognition, the path to AGI, and why intelligence has never had a cutoff line — not in evolution, not in infant development, and not in the systems we’re building now.

Subscribe if you want to think about these things. Push back if you disagree. That’s the point.

— Ylli