Scaling Synthesis

The goal of this research project is to find data structures and interfaces that support synthesis and innovation in a decentralized discourse graph... Welcome to our living hypertext notebook. This notebook will continue to evolve and is by no means a finished product; as such, pages exist at varying stages of completion. https://scalingsynthesis.com/

Authored By:: P- Rob Haisfield, P- Joel Chan, and P- Brendan Langen

see (2021-08-23) Chan Sustainable Authorship Models For A Discourse-based Scholarly Communication Infrastructure

Project Mission and Impact

Through our research, we plan to generate insights and questions that enable the creation of decentralized tools for networked thought that improve the thinking of all participants.

Every practitioner is learning a lot through their work, connecting the dots. Many of them use a personal knowledge management system in some form or another. Practitioners don’t have a proper feedback loop to update the academic knowledge base, and barely even talk to each other. What would happen if we unleash the knowledge of individuals and groups into a decentralized knowledge graph built to facilitate synthesis? How can that technology and behavior be actualized? (Webs Of Thinkers And Thoughts)

If we are to create a decentralized knowledge graph, we have to figure out a structure that doesn’t break if individual people don’t manually tag information consistently and honestly.

Pages have prefixes in their titles that indicate what type of node each page is:
“C- ” Claim
“Q- ” Question
“R- ” Resource
“I- ” Idea
(Why didn't they use IBIS classes? Also, see Designing Good Page Names)
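As a rough illustration of the prefix convention above, the following Python sketch splits a page title into a node type and a bare title. The names `NodeType` and `parse_title` are hypothetical and not taken from the project; only the four prefixes listed above are modeled, so other prefixes seen on this page (such as "P- ") are treated as untyped.

```python
from enum import Enum
from typing import Optional, Tuple


class NodeType(Enum):
    """The four node types listed above, keyed by their title prefix letter."""
    CLAIM = "C"
    QUESTION = "Q"
    RESOURCE = "R"
    IDEA = "I"


def parse_title(title: str) -> Tuple[Optional[NodeType], str]:
    """Split a page title into (node type, bare title).

    Returns (None, title) when the title carries no known prefix.
    """
    prefix, sep, rest = title.partition("- ")
    if sep:
        try:
            return NodeType(prefix), rest.strip()
        except ValueError:
            pass  # unknown prefix, e.g. "P- " pages, which this sketch does not model
    return None, title


print(parse_title("C- Compression facilitates synthesis"))
print(parse_title("Designing Good Page Names"))
```

Keeping the type inside the title means a plain-text link already tells the reader what kind of node it points to, at the cost of needing a parser like this whenever the type is required as data.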

C- Synthesis as a process is usefully modeled as a specialized form of sensemaking

While the most common manifestation of synthesis is an academic literature review, the underlying cognitive processes are not so different from what people do all the time: sensemaking.

Sensemaking as a model of a core cognitive process has a long history in cognitive and information science. We draw specifically on a family of models that describe sensemaking as an iterative search for representations of input data that can make some task easier.

There are likely to be key differences between synthesis as sensemaking and other everyday sensemaking tasks, due to the nature of the inputs (complex research papers, theories, findings) and the nature of the output (a boundary-pushing novel conceptual whole with no obvious precedent). These differences are likely to be consequential for understanding precisely what data structures and user interactions can optimize sensemaking for synthesis.

Note: Benjamin Bloom's taxonomy is a set of three hierarchical models used for classification of educational learning objectives into levels of complexity and specificity. Synthesis involves building a structure or pattern from diverse elements; it also refers to the act of putting parts together to form a whole or bringing pieces of information together to form a new meaning. Its characteristics include: Production of a unique communication; Production of a plan, or proposed set of operations; Derivation of a set of abstract relations.

C- Effective synthesis is necessary for innovation and scientific progress

The advanced understanding from an effective synthesis can be a powerful force multiplier for choosing effective studies and operationalizations, and may be especially necessary for problems where it is difficult or impossible to construct decisive experimental tests.

To illustrate the power of synthesis for accelerating scientific progress, consider the example of Esther Duflo, who attributed her Nobel-Prize-winning work to the detailed synthesis of problems in development economics she obtained from a handbook chapter in R- How to Find the Right Questions. Indeed, scientific progress may not even be tractable without adequate synthesis.

C- Synthesis is hard to do with people who don’t share context with you

Synthesis involves individuals putting together many pieces in their minds, and it is often argued that synthesis occurs in a single mind. In our research, surprisingly few participants had distributed synthesis experiences to share. Why?

In the early stages of synthesis, when people have half-baked ideas that are not yet ready to share, they prefer to work on their own or with a close companion.

the primary concern is being able to work at the speed of thought.

when he reads a paper, he only takes notes on what is relevant for his current projects. While he does have rigorous processes for taking notes on a particularly juicy paper, he finds that often C- It is difficult to predict whether structure now will be worthwhile later

Thus, we see that shared context is critical to distributed synthesis

But there is a very significant barrier to team-based convergence because of this bottleneck of not being able to have a shared “dataset” of ideas that satisfy the properties (context)

two categories of solutions to enable synthesis with others:

Connect people who share context.
Design mechanisms to help people gain context quickly.

A promising direction would be through enhanced composability, see C- Hypertext enables communication with high information density for more detail.

C- Discourse graphs could significantly accelerate human synthesis work

An exciting hypothesis that motivates this work is that making discourse graphs widely available could accelerate human synthesis work, and thereby accelerate innovation and scientific discovery.

consider a researcher who wants to understand what interventions might be most promising for mitigating online harassment.

which theories have the most empirical support in this particular setting? Are there conflicting theoretical predictions that might signal fruitful areas of inquiry?

The answers to these questions cannot be found simply in the titles of research papers, in groupings of papers by area, or even in citation or authorship networks. The answers lie at lower levels of granularity: the level of theoretical and empirical claims or statements made within publications

claims that interrelate in complex ways, supporting other claims/theories that are in tension with each other.

our researcher will also need to work through a range of contextual details. For example, to judge which studies, findings, or theories are most applicable to her setting, she needs to know key methodological details

she would need to know, for example, which findings came from which measures (e.g., self-report, behavioral measures), and the extent to which findings have been replicated across authors

A discourse graph as a data structure has key affordances that are hypothesized to enable just these sorts of synthesis operations.
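A minimal Python sketch of such a data structure is shown here, assuming a deliberately simplified model: claims, evidence items that carry contextual details (source and measure), and "supports"/"opposes" edges between them. All class, field, and method names are hypothetical, and the sample entries are placeholders rather than real findings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Evidence:
    """An empirical finding plus the contextual details needed to judge it."""
    id: str
    finding: str
    source: str   # which publication reported it
    measure: str  # e.g. "self-report" or "behavioral"


@dataclass
class DiscourseGraph:
    claims: Dict[str, str] = field(default_factory=dict)
    evidence: Dict[str, Evidence] = field(default_factory=dict)
    # Each edge is (evidence_id, relation, claim_id).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_claim(self, claim_id: str, text: str) -> None:
        self.claims[claim_id] = text

    def link(self, ev: Evidence, relation: str, claim_id: str) -> None:
        """Attach a piece of evidence to a claim as support or opposition."""
        if relation not in ("supports", "opposes"):
            raise ValueError(f"unknown relation: {relation}")
        if claim_id not in self.claims:
            raise KeyError(claim_id)
        self.evidence[ev.id] = ev
        self.edges.append((ev.id, relation, claim_id))

    def evidence_for(self, claim_id: str, relation: str,
                     measure: Optional[str] = None) -> List[Evidence]:
        """Evidence linked to a claim, optionally filtered by measure type."""
        return [
            self.evidence[e]
            for e, r, c in self.edges
            if c == claim_id and r == relation
            and (measure is None or self.evidence[e].measure == measure)
        ]

    def contested_claims(self) -> List[str]:
        """Claims with both supporting and opposing evidence."""
        return [
            c for c in self.claims
            if self.evidence_for(c, "supports") and self.evidence_for(c, "opposes")
        ]


# Placeholder data, for illustration only.
graph = DiscourseGraph()
graph.add_claim("claim-1", "Placeholder claim about an intervention")
graph.link(Evidence("ev-1", "Placeholder finding", "Paper A", "self-report"),
           "supports", "claim-1")
graph.link(Evidence("ev-2", "Placeholder finding", "Paper B", "behavioral"),
           "opposes", "claim-1")

print(graph.contested_claims())  # ['claim-1']
print([e.source for e in graph.evidence_for("claim-1", "supports", measure="behavioral")])  # []
```

Because evidence is stored at the level of individual findings rather than whole documents, questions such as "which claims are contested?" or "which support comes from behavioral measures?" become simple filters over edges.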

Note that discourse graphs need not be represented or manipulated in this visual format; the underlying graph model can be instantiated in a variety of media, such as hypertext notebooks, and also implicitly in various analog implementations that allow for cross-referencing

Beyond the theoretical match with the kinds of queries scientists need to run over their evidence collection for synthesis, a discourse-centric representation that encodes granular claims instead of document “buckets” could facilitate exploration and conceptual combination.

At the same time, constructive and creative engagement with contextual details is thought to be necessary for developing novel conceptual wholes from “data”, such as in sensemaking, systematic reviews, or formal theory development

Discourse graphs (or parts thereof) could also significantly reduce the overhead to synthesis through reuse and repurposing over time, across projects, and potentially even across people

keyword search is only really useful when there is a stable, shared understanding of ontology: this condition is almost certainly not present when crossing knowledge boundaries

In these settings, judging that two things are “the same” is a problematic and difficult task; doing so without engagement with context can sometimes introduce more destructive ambiguity, not less, a hard-won lesson from the history of Semantic Web, ontology and classification efforts.

Q- What is a decentralized discourse graph

Our research draws from a long line of information models,

These models share a common underlying model for representing scientific discourse: they distill traditional forms of publication down into more granular, formalized knowledge claims, linked to supporting evidence and context through a network or graph model.

We use the term discourse graph to refer to this information model, to evoke the core concepts of representing and relating knowledge claims (rather than concepts) as the central unit, and emphasizing linking and relating these claims (rather than categorizing or filing them).

So what does it mean for a discourse graph to be decentralized?

Synthesis has many component parts. Given that, C- The responsibilities required to produce synthesis can be split up among many people. refers to this concept as human computation

Some will simply read everyone else’s outputs and annotate them, all the while meaningfully connecting the conversation between the various fields they explore

When we look at prior attempts at building a semantic web, we find that the primary reasons they fell short have to do with human behavior and dishonesty.

C- It will be important to capture the potential energy of information consumption

if we can really leverage tacit annotation by enriching the choice architecture with options that add value, then we can perhaps fill in metadata and generate discourse. This could be interpreted as a response to the question: Q- How do we increase the frequency of social tagging behaviors. These people can verify information, agree with it, disagree, and connect it to other information.

there is exhaust to capture from consumption

C- Most people will primarily consume information

Statistics over the years cite a 90-9-1 rule for contributors in online communities, and although that does not hold up across all communities, the majority of people fall into the consumption-only group (90%).

How can you lower the barriers for someone to meaningfully contribute?

Synthesis is supported by active reading, and a number of tools assist with this.

C- Incrementally processing notes is a key user behavior to promote synthesis

the word “processing” refers to the act of going through old information and sorting it, either into new groups, by importance, or by stage in some pipeline (digital gardening)

C- Synthesis is supported by Active Reading, and one method of active reading is the act of progressive summarization (as coined by P- Tiago Forte)

Tools like Readwise, LiquidText, and Hypothesis promote this act via annotations, highlighting, and focused views to gradually process notes.

C- Compression facilitates synthesis

Synthesis is creating a new whole out of component parts. But what should the “parts” look like? What kinds of building blocks would facilitate synthesis?

blocks that are too “large” or complex would make reassembly into a new whole impractical (composability)

Decomposing ideas into smaller pieces (atomic note) also enables us to connect ideas in richer and more meaningful ways

It is important to distinguish atomicity from compression - in creative thought, it is less about decomposition in the atomic and disconnected sense, and more about flexibly using compression to move between different levels and states of “granularity”.

C- Hypertext enables communication with high information density

Incrementally processing notes is a key user behavior to promote synthesis. In doing

C- Compression facilitates synthesis and hypertext facilitates compression

When Evergreen notes are factored and titled well, those titles become an abstraction for the note itself.
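One way to picture a title acting as an abstraction is a renderer that can show a note at different levels of detail. The sketch below assumes notes are stored as a title-to-body mapping and that links use a `[[Title]]` syntax; both are assumptions for illustration, as are the function name `expand` and the sample notes.

```python
import re
from typing import Dict

# Assumed wiki-link syntax: [[Note title]]
LINK = re.compile(r"\[\[(.+?)\]\]")


def expand(title: str, notes: Dict[str, str], depth: int) -> str:
    """Render a note at a chosen level of detail.

    depth 0 returns just the title (the most compressed form); each extra
    level replaces links with the linked note rendered one level shallower.
    Unknown titles are returned unchanged, and the depth bound keeps
    circular links from recursing forever.
    """
    if depth <= 0 or title not in notes:
        return title
    return LINK.sub(lambda m: expand(m.group(1), notes, depth - 1), notes[title])


# Placeholder notes, for illustration only.
notes = {
    "Compression facilitates synthesis":
        "good titles let us move between levels of granularity, because "
        "[[Hypertext facilitates compression]]",
    "Hypertext facilitates compression":
        "a link can stand in for the note it points to",
}

for level in range(3):
    print(level, expand("Compression facilitates synthesis", notes, level))
```

At depth 1 the reader sees the body with linked notes compressed to their titles; at depth 2 those titles are unpacked in place, which is the kind of movement between levels of granularity that the compression claim describes.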

Hypertext structured in this manner is excellent for exploratory search.

When you have read much of a person’s work in a well structured hypertext notebook, you can build a rich mental model of their thinking.

C- People process complex information in multiple levels and stages of processing

in the moment a note is created, it is very difficult to know where it belongs. We may not yet have a name for what we’re seeing or know how it relates to another insight

The primary models that represent this process are the Learning Loop Complex and the Notional Model of Sensemaking, which loops between and within foraging and sensemaking loops (iterative), progressively increasing in structure and effort

This all speaks to the need to treat notes as a work in progress.

In the process of synthesis, we often write a number of notes about an idea - some useful, some not. We know that C- Multiplicity facilitates synthesis, and this writing process helps, but it creates a huge amount of noise for us to later search through.

Forgetting is powerful.

While this multiplicity is generally useful during the process of thinking through an issue, later down the line reviewing it contributes to the feeling of wasted repeated effort.

Additionally, over the course of a decade, I’ll certainly update my beliefs and mental models. If that happens, I’ll have to ask myself, “Is it important to keep a record of my beliefs at that point in time, or is leaving it in my database weighing it down?”

Q- How do we solve the problem of different people referring to the same concept with different language?

One time I was in a conversation with P- Andy Matuschak on Twitter, where we ended up having a conversation through our notes. He linked to a note about what we can learn from game design, and then I linked to a note about the learning curve that’s commonly seen in puzzle games, and then he linked to a note about cognitive scaffolding. It was interesting because I had actually encountered his thought about cognitive scaffolding before, but I hadn’t drawn the connection that we were talking about the same thing with different language!

could be worse, you could be using the same term for 2 different meanings!

Jump solves for this by explicitly encoding a thesaurus into its logical relational structure. While this may not account for all areas of innovation, it’s certainly a head start. This seems like false progress, as it pretends there's more "agreement" than is real (is-ism).
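To make the thesaurus idea concrete, here is a small Python sketch. It is not Jump's actual implementation, which the text does not describe; the `Thesaurus` class, its methods, and the concept identifiers are all hypothetical. Each term maps to a set of concepts, so the sketch surfaces both cases discussed above: different terms for one concept, and one term with several meanings.

```python
from collections import defaultdict
from typing import Dict, List, Set


class Thesaurus:
    """Maps terms to the concept(s) they may refer to."""

    def __init__(self) -> None:
        self._concepts_by_term: Dict[str, Set[str]] = defaultdict(set)

    def add(self, concept: str, *terms: str) -> None:
        """Register one or more terms as names for a concept."""
        for term in terms:
            self._concepts_by_term[term.lower()].add(concept)

    def concepts(self, term: str) -> Set[str]:
        """All concepts a term may refer to (empty set if unknown)."""
        return set(self._concepts_by_term.get(term.lower(), set()))

    def same_concept(self, a: str, b: str) -> bool:
        """True if two terms share at least one concept."""
        return bool(self.concepts(a) & self.concepts(b))

    def ambiguous_terms(self) -> List[str]:
        """Terms that point at more than one concept."""
        return sorted(t for t, cs in self._concepts_by_term.items() if len(cs) > 1)


# Illustrative entries; the concept ids are invented for this sketch.
thesaurus = Thesaurus()
thesaurus.add("concept:gradual-support", "learning curve", "cognitive scaffolding")
thesaurus.add("concept:gradual-support", "scaffolding")
thesaurus.add("concept:construction-frame", "scaffolding")

print(thesaurus.same_concept("learning curve", "cognitive scaffolding"))  # True
print(thesaurus.ambiguous_terms())  # ['scaffolding']
```

Note that every `add` call is itself a judgment that two terms mean "the same" thing, so a structure like this records agreement rather than producing it, which is the concern raised in the comment above.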

Edited: 2026-04-16 20:43:21.385149

Bill Seitz