The problem
Organizations ranging from Effective Altruism-aligned funders like Founders Pledge, GiveWell, and Open Philanthropy to government agencies and development NGOs compare interventions across very different domains—physical health, mental health, poverty alleviation—to decide where resources can do the most good. To make these comparisons, they need a common unit of measurement.
Two measures feature prominently in these analyses. The DALY (disability-adjusted life year) comes from health economics and captures years of healthy life lost to disease or disability. The WELLBY (wellbeing-adjusted life year) is based on self-reported life satisfaction, typically measured on a 0–10 scale. Each has strengths and limitations—and how they relate to each other, and whether either reliably captures what matters for human welfare, directly affects which interventions get prioritized.
This is part of The Unjournal's Pivotal Questions initiative: working with impact-focused organizations to identify their highest-value research questions, connect them to evidence, and commission expert evaluations that can inform real decisions.
What sparked this workshop
We recently commissioned an evaluation of Benjamin, Heffetz, Kimball & Szembrot's paper "Adjusting for Scale-Use Heterogeneity in Self-Reported Well-Being." The paper addresses a key question: do people use wellbeing scales in comparable ways? If differences in reported life satisfaction (not just absolute levels) aren't comparable across individuals, that poses a challenge for the WELLBY as a tool for comparing interventions.
The paper develops methods using calibration questions (survey items with objectively correct answers) and vignette exercises (rating hypothetical scenarios) to detect and adjust for scale-use heterogeneity. The evaluators' verdict was encouraging but nuanced: scale-use differences may not be as severe as some feared, but more work is needed. This raises immediate questions for anyone using WELLBYs to compare interventions—and prompted Founders Pledge to ask us to convene this discussion.
What we want to achieve
This workshop brings together the paper's authors, the evaluators who assessed it, funders who use these measures in their work, and researchers with relevant expertise. We're organizing the discussion around four key questions:
1. Is the linear WELLBY reliable enough?
Can we treat a 1-point improvement in life satisfaction as meaning the same thing for different people and starting points? What about cardinality—does a move from 3→4 mean the same as 7→8? Where is the "neutral point" on the scale?
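To see why cardinality matters, here is a minimal illustrative sketch (not from the paper, and with an assumed functional form): if reported scores are a nonlinear function of underlying welfare, two 1-point moves at different parts of the 0–10 scale imply different underlying welfare gains. The reporting function and its curvature parameter are hypothetical choices made purely for illustration.

```python
def latent_from_report(score, curvature=0.7):
    """Invert an assumed concave reporting function f(w) = 10 * w**curvature,
    where latent welfare w is normalized to [0, 1]. Both the functional form
    and curvature=0.7 are illustrative assumptions, not estimates."""
    return (score / 10) ** (1 / curvature)

# Compare the implied latent-welfare gain from two 1-point moves:
gain_low = latent_from_report(4) - latent_from_report(3)   # a 3 -> 4 move
gain_high = latent_from_report(8) - latent_from_report(7)  # a 7 -> 8 move
print(f"3->4 implies a latent gain of {gain_low:.3f}")
print(f"7->8 implies a latent gain of {gain_high:.3f}")
```

Under this (concave) assumption the 7→8 move corresponds to a larger underlying welfare gain than the 3→4 move; a convex reporting function would reverse the ordering. Treating the scale as linear amounts to assuming the curvature away.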
2. How should we convert between DALYs/QALYs and WELLBYs?
Current approaches are rough. A 1 SD change in WELLBY is often treated as equivalent to ~1 SD in DALYs, but is this defensible? How sensitive are funding decisions to the conversion factor used? (Note: QALYs may be more directly comparable than DALYs for this purpose.)
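One way to probe that sensitivity is a simple check of how a cost-effectiveness ranking moves as the assumed conversion factor varies. The sketch below uses entirely made-up intervention names, costs, and effect sizes, and a hypothetical range of WELLBYs-per-DALY values; it illustrates the exercise, not any organization's actual numbers.

```python
# Made-up inputs: cost per person ($), DALYs averted, WELLBYs gained directly.
interventions = {
    "bednets": (10.0, 0.02, 0.0),   # health-focused: effect measured in DALYs
    "therapy": (150.0, 0.0, 1.2),   # wellbeing-focused: effect in WELLBYs
}

def wellbys_per_dollar(cost, dalys, wellbys, conversion):
    """Convert DALYs averted into WELLBYs via the assumed conversion factor,
    then return total WELLBYs per dollar spent."""
    return (dalys * conversion + wellbys) / cost

# Sweep a hypothetical plausible range of WELLBYs per DALY.
for conversion in (2.0, 5.0, 8.0):
    scores = {name: wellbys_per_dollar(*vals, conversion)
              for name, vals in interventions.items()}
    best = max(scores, key=scores.get)
    print(f"conversion={conversion}: best = {best}")
```

With these illustrative numbers the ranking flips as the conversion factor crosses a threshold: the wellbeing-focused intervention wins at the low end of the range and the health-focused one at the high end. Funding decisions that are robust across the whole plausible range are less hostage to the conversion question than ones that flip inside it.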
3. Could methodological adjustments improve things?
Benjamin et al. provide evidence suggesting that calibration questions and vignette exercises may reduce bias from scale-use differences. Should funders encourage these methods in future RCTs? Adding such instruments comes at a cost—increased survey length, respondent burden, and comprehension challenges—so the benefits must be weighed. Are there other refinements—such as multi-item scales—that could help?
4. What should funders do now?
Given the current evidence, how should organizations navigate the DALY–WELLBY question in their cost-effectiveness analyses today? What uncertainty ranges should they use?
How the workshop is structured
Note: This agenda is preliminary and subject to change based on participant availability and feedback.
The workshop is fully online, with approximately 3.5 hours of live sessions scheduled in segments so you can join only the parts you're interested in. We also support asynchronous participation—you can submit beliefs and comments before or after the live event, and we'll integrate these into the discussion. We anticipate beginning with a stakeholder problem statement from Founders Pledge, followed by the paper presentation from the Benjamin et al. team and responses from the evaluators. The core of the workshop would consist of two focused discussion sessions—one on WELLBY reliability and one on DALY–WELLBY conversion—plus a beliefs elicitation exercise and a practitioner panel.
We plan to record the workshop and make it publicly available by default, with an AI-queryable transcript so researchers and funders can easily search the discussion. Participants can opt out of recording for specific segments if needed.
Pivotal Questions & Beliefs
As part of this project, we've developed specific, operationalized questions (codes WELL_01–07 on WELLBY reliability, DALY_01–05 on interconvertibility) designed so that experts can state their beliefs quantitatively—and so that answers can directly inform funding decisions. We want to elicit beliefs before, during, and after reviewing the evidence, to see how expert views evolve. See the canonical formulations on Coda.
Three of these questions will also be posted on our Metaculus forecasting page. See key questions and share your beliefs →