The Unjournal · Pivotal Questions Initiative

Wellbeing Pivotal Questions

State your beliefs on specific, operationalized questions about WELLBY reliability and DALY–WELLBY interconvertibility.

These are some of the key operationalized questions from our Wellbeing Pivotal Questions project. We want to elicit expert and stakeholder beliefs—before, during, and after reviewing the evidence—to see how views evolve and where consensus exists or doesn't. (All questions are optional.)

📋 Full question specifications: For more detail, context, and the complete set of operationalized questions, see the canonical Wellbeing PQ formulations on Coda →

You don't need to be a specialist to contribute. We want your honest assessment and reasoning, whether you feel highly confident or very uncertain. Your input helps us understand the range of views in the field.

Three of these questions will also appear on our Metaculus forecasting page (coming soon). If you forecast on Metaculus, please share your username below so we can link your contributions.

How to respond

Shared Definitions

Suppose Founders Pledge is considering whether to donate $100,000, either:
  • to StrongMinds (to treat depression in women in low-income settings through group interpersonal psychotherapy)
  • or to extend a seasonal malaria chemoprevention campaign.
Suppose they have substantial evidence on the impact of each intervention, drawn from RCTs that combine typical self-reported wellbeing surveys with objective income and health outcomes. They also have the opportunity to fund the collection of more data in future studies.

They want to allocate the funds to the intervention that leads to greater "social wellbeing or welfare" in expectation.

For the current context, we define a WELLBY (Wellbeing-Year) as one point of self-reported life satisfaction measured on a 0-to-10 Likert scale for one individual for one year (following Frijters et al., 2020; Frijters and Krekel, 2021).

We follow the definition from Frijters et al., 2024, based on a life satisfaction scale (acknowledging that WELLBY has been defined differently in other contexts).
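The definition above is just a product of three quantities, and can be made concrete with a minimal sketch (the function and all numbers are ours, purely illustrative):

```python
def wellbys(ls_change: float, n_people: float, years: float) -> float:
    """WELLBYs = change on the 0-10 life satisfaction scale
    x number of people affected x years the effect is sustained."""
    return ls_change * n_people * years

# Hypothetical: an intervention raising life satisfaction by 0.5 points
# for 200 people, with the effect lasting 2 years:
print(wellbys(0.5, 200, 2.0))  # 200.0
```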

"Best" = leads to the decisions that yield the highest "true welfare" on average, in the particular relevant domain (e.g., in comparing mental health interventions in Africa), perhaps taking into account the cost of doing the measurements.

More precisely: the "best" measures and aggregations would be those that, if we collected and made decisions based on them, would yield policy and funding choices with the highest overall wellbeing or welfare in expectation. Consider reliability, practicality, cost, comparability, and other real-world considerations.

The "best" mappings would be those that, if used to make conversions between WELLBYs, DALYs, etc., would be likely to lead to the better/best decisions in most relevant situations.

PQ1a · WELLBY Usefulness · WELL_01/07

How reliable is the WELLBY measure [...] relative to other available measures in the 'wellbeing space'? How much insight is lost by using WELLBY and when will it steer us wrong?

Adapted from WELL_07: "How reliable is the WELLBY measure of well-being/mental health (as defined above) relative to other available measures in the 'wellbeing space' (including other transformations of the 0-10 life satisfaction scale)?"

The WELLBY is used by several major funders (Happier Lives Institute, Founders Pledge) to compare interventions across domains. The reliability of this approach matters for resource allocation decisions.

  • For present purposes, a WELLBY is defined as an increase of 1 point for one person for one year on a 0–10 life satisfaction scale (e.g. the Cantril ladder). (Note: definitions vary across contexts.)
  • Benjamin et al. (2023) found substantial scale-use heterogeneity. They developed methods using calibration questions (new survey items with objectively correct answers) and vignette exercises (rating hypothetical scenarios) that can reduce the estimated bias from scale-use heterogeneity—by roughly 30–50% in their sample, though generalizability remains uncertain.
  • Related canonical question (WELL_01): "What combination of (a) subjective wellbeing survey data [...], (b) income and health-outcome data, (c) metrics based on this data (e.g., linear or logarithmic WELLBYs, standard deviations, scale-use adjustments), and (d) possible conversions between different measures would be 'best' for making funding choices between interventions which may impact mental health, physical health, and/or consumption[?]"

PQ1b · Best Measure · WELL_02/03

Given the available collected data [...], how should [funders] measure the impact on wellbeing? [...] What measures of well-being should charities, NGOs, and RCTs collect for impact analysis?

Even if the WELLBY is "good enough," there might be better options—multi-item scales, log-transformed life satisfaction, or standardized composites. Switching measures has costs, so the improvement needs to be meaningful.

Adapted from WELL_02: "Given the available collected data from surveys and intervention trials, how should Founders' Pledge measure the impact on wellbeing in the context of mental health interventions? [...] Consider reliability, insight, and practicability."

And WELL_03: "What measures of well-being [...] should charities, NGOs, and RCTs collect for impact analysis, particularly in contexts that may involve less tangible well-being outcomes (such as mental health interventions)? This could also include stated-preference and calibration surveys."

  • Candidates include: multi-item life satisfaction scales (e.g. SWLS), experience sampling, the WB-Pro, WEMWBS, log-transformed 0-10 LS, or domain-specific instruments.
  • Diener et al. (2018) found that single-item life satisfaction measures have moderately high reliability (correlations of roughly 0.70), with little loss of validity compared to multi-item scales.
  • WELL_03 also asks: "How should these [measures] be used?"—considering not just what to collect but how to combine and interpret the data.

WELL_01a · Cost Ratio Extension

If you propose a measure other than linear WELLBY in your answer above, how much more would it cost to achieve the same welfare improvement using linear WELLBY instead?

Consider the welfare-improvement from allocating $100,000 among a large set of charities/interventions given the information provided by the "best measure" you propose. How much more would it cost to achieve the same outcome using the linear WELLBY? (E.g., 1.1 = 10% more, 1.5 = 50% more, 3 = 3x as much.) If you think WELLBY is optimal, skip this question. This is inherently speculative—rough estimates based on your intuition are welcome.
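The ratio asked for here acts as a simple multiplier on spend; a small sketch under hypothetical numbers (the function name and figures are ours, not part of the question):

```python
def extra_cost(budget: float, cost_ratio: float) -> float:
    """Additional spend needed for the linear WELLBY to match the welfare
    improvement achieved with the proposed 'best' measure."""
    return budget * (cost_ratio - 1)

# Hypothetical: $100,000 allocated using the best measure; a ratio of 1.5
# means linear WELLBY would need 50% more spend for the same outcome.
print(extra_cost(100_000, 1.5))  # 50000.0
```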

WELL_04 · Single vs Combined Measures

In contexts where interventions impact mental health, physical health, AND consumption: is it better to use a single WELLBY measure, or measure each dimension separately and then convert/combine?

WELL_07 · What Is Lost?

How much insight is lost by using WELLBY relative to other available measures in the "wellbeing space"? When will it steer us wrong?

WELL_08 · Life Satisfaction vs Experience

Would it be better to base the metric on life satisfaction or instantaneous experience measures (e.g., happiness, affect balance)?

WELL_09 · Cantril Ladder Conversion

If we must rely on the Cantril ladder measure, how would we best convert it into a welfare metric for comparing interventions?

PQ2a · DALY/QALY–WELLBY Conversion · DALY_01

If the impact of one program is measured in WELLBYs [...] and another program impact is measured in DALYs [or QALYs], [...] what is the best numerical conversion or mapping between them?

From DALY_01: "If the impact of one program is measured in WELLBYs (as defined above) and another program impact is measured in DALYs, and we have a reported effect size and standard deviation for each, what is the best numerical conversion or mapping between them?" (Note: QALYs may be more relevant than DALYs for this conversion—see context.)

"Best" here means: the mapping that, if used for funding decisions, would lead to the highest expected welfare. Getting this conversion wrong means systematically over- or under-investing in mental health versus physical health interventions.

  • DALY vs QALY: DALYs measure health burden (years lost to disease/disability); QALYs measure health gained. For conversion purposes, QALYs are often more directly comparable—the canonical questions note "replace DALY with QALY" may be appropriate.
  • Some organizations (including HLI and Founders Pledge) currently treat SDs on different mental health instruments as interconvertible with WELLBY SDs on a roughly 1:1 basis.
  • The conversion between DALYs/QALYs and WELLBYs depends on the "neutral point" on the LS scale—the point below which life has negative value. This is currently unknown; one small study (Peasgood et al. 2018) suggested LS ≈ 2, but this is tentative.
  • The relationship may also be non-linear—e.g., a WELLBY gained at very low wellbeing could be worth more than one gained at high wellbeing.
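The neutral-point dependence noted above can be made concrete with a minimal sketch (the average life-satisfaction value is illustrative, not an estimate):

```python
def wellbys_per_life_year(avg_ls: float, neutral_point: float) -> float:
    """WELLBYs from one additional life-year lived at a given average
    life satisfaction, relative to an assumed neutral point."""
    return avg_ls - neutral_point

# The same saved life-year converts to very different WELLBY totals
# depending on the neutral point (average LS of 4.5 is hypothetical):
print(wellbys_per_life_year(4.5, 0.0))  # 4.5 WELLBYs per life-year
print(wellbys_per_life_year(4.5, 2.0))  # 2.5 WELLBYs per life-year
```

Because the neutral point enters the conversion directly, interventions that extend life look better or worse relative to interventions that raise life satisfaction depending on this single unknown parameter.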

PQ2b · Best Conversion Method · DALY_03/05

If the effectiveness of some programs has already been measured in terms of WELLBYs, while that of others is measured in terms of DALYs, what method or "mapping structure or approach" should we use to compare and convert between them?

From DALY_03: "[...] E.g., direct units vs standard deviations, linear vs something else, etc.? [...] What numerical conversion factor(s) should we use between the two (possibly transformed) effectiveness metrics?" See also DALY_05: "What is the loss from the 'one SD change in WELLBY is equivalent to one SD change in DALY' approach [...] relative to the best feasible approach? Where will [this] approach be particularly incorrect?"

Funders need a usable conversion now, even if imperfect. The question is whether the current approach (SD equivalence) is defensible, or whether a better practical method exists.

  • Options include: SD-equivalence (current practice), regression-based approaches (linking LS data to DALY weights in the same populations), time-tradeoff surveys, or simply maintaining separate analyses and comparing rankings.
  • DALY_03 also asks: "If the optimal factor varies greatly from one domain to another (e.g. mental health, physical health, income/consumption; or rural Africa vs urban India), what are the domains where it varies the most?"
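The SD-equivalence option listed above (current practice for some funders) amounts to a simple multiplication; a hedged sketch, in which the instrument, the life-satisfaction SD of 2.0 points, and the duration are all hypothetical:

```python
def sd_equivalence_wellbys(effect_sd: float, ls_sd: float, years: float) -> float:
    """Convert an effect size in SD units on some wellbeing instrument into
    WELLBYs, assuming 1 SD on that instrument equals 1 SD of life satisfaction."""
    return effect_sd * ls_sd * years

# Hypothetical: a 0.4 SD improvement on a depression scale, a population
# life-satisfaction SD of 2.0 points, and an effect lasting 2 years:
print(sd_equivalence_wellbys(0.4, 2.0, 2.0))  # 1.6
```

The 1:1 SD assumption is exactly what DALY_05 asks us to stress-test: if a 1 SD change on a depression scale corresponds to more or less than 1 SD of life satisfaction, every downstream WELLBY figure scales accordingly.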

DALY_02 · Founders Pledge Specific

Which mapping between WELLBYs and DALYs should Founders Pledge specifically use for comparisons like the focal example (StrongMinds vs malaria)?

This asks about the best mapping for their particular use case, rather than a general-purpose conversion.

DALY_05 · Loss from SD-SD Approach

What is the loss from the "1 SD change in WELLBY ≈ 1 SD change in DALY" approach currently used by some funders, relative to the best feasible approach?

Where will this approach be particularly incorrect? Consider different intervention types, populations, or contexts.

PQ3a · Metaculus-style · Research Uptake

By 2030, will more than 50% of GiveWell's top charities include a WELLBY-based cost-effectiveness analysis alongside or instead of DALY-based analysis?

This illustrative forecasting question gauges whether the WELLBY will gain institutional traction. (Note: This is a discussion question for the workshop, not from the canonical PQ table.)

PQ3b · Metaculus-style · Expert Consensus

If The Unjournal were to survey development economists and research-informed practitioners (before end of 2027), what share would agree that "the linear WELLBY (as defined above) is a reasonably useful measure in this context, and switching to a different measure is unlikely to add much value"?

(Note: This is a hypothetical scenario for discussion. We are not currently planning to conduct such a survey, though we would like to if feasible.)

PQ3c · Metaculus-style · Calibration Impact

If calibration questions and/or vignettes (as in Benjamin et al.) were added to the major wellbeing surveys used in global health RCTs, would the resulting adjustments meaningfully change the cost-effectiveness ranking of the top 5 interventions recommended by Founders Pledge?

  • "Meaningful change" = at least one intervention currently in the top 5 moves out of the top 5, OR the #1 ranked intervention changes.
  • This assumes future RCTs incorporate these methods and Founders Pledge updates their CEA accordingly.
  • Note: This question is somewhat speculative—it asks about counterfactual methodology adoption and its downstream effects.
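The resolution criterion in the first bullet can be stated precisely; a small sketch (the ranking lists are hypothetical):

```python
def meaningful_change(before: list[str], after: list[str]) -> bool:
    """True if any intervention leaves the current top 5, or the #1 slot changes."""
    return set(before[:5]) != set(after[:5]) or before[0] != after[0]

old = ["A", "B", "C", "D", "E"]
print(meaningful_change(old, ["B", "A", "C", "D", "E"]))  # True: #1 changed
print(meaningful_change(old, ["A", "C", "B", "D", "E"]))  # False: same top 5, same #1
```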

About You

Your responses are stored securely and will be used to inform the synthesis report.

Questions adapted from the canonical Wellbeing PQ formulations (codes: WELL_01–09, DALY_01–05). Last updated: February 2026.