Building a Continuously Updating News Intelligence Pipeline
We built a pipeline that converts continuous news ingestion into persistent dossiers, relationship tracking, and usable working memory for AI workflows.
News delivery is often treated as the end state of a news system. Articles are collected, ranked, and returned, and everything after that is left to the user or to a separate application. This is sufficient for simple reading. However, it is less useful for AI workflows that need to retain context, compare developments over time, and reason repeatedly about the same entities or events.
Language models can already summarize documents and extract entities or relationships from text. What remains poorly specified is how a raw news stream becomes durable working memory. A feed can indicate what is new, but it usually cannot indicate what changed around an entity, which relationships remain active, or what context should persist.
Here, we built a continuously updating news intelligence pipeline on top of Currents to test whether ongoing ingestion, structured extraction, and persistent maintenance could produce a more useful state representation than a stream of articles alone. At the time of writing, the system is tracking 547 entity dossiers.
The problem
Raw articles are necessary, but they are not sufficient for persistent reasoning.
The same event often appears across multiple outlets with different framing. Entity naming varies across sources. Important developments arrive incrementally. In addition, the relationships that matter most are usually embedded in text rather than exposed as clean metadata.
As a result, a retrieval-first workflow tends to answer one question well:
What was published today?
Long-running AI systems usually need a different question:
What changed, for whom, and in relation to what?
That difference is operationally important. Without maintained state, the same context must be reconstructed repeatedly from raw articles.
Pipeline design
We organized the workflow into six stages:
Ingest → Extract → Compile → Relate → Index → Maintain
Ingest
We pull fresh articles from Currents on a recurring schedule. This provides timely external input in a stable format.
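A minimal sketch of one scheduled ingest run, under stated assumptions: `fetch_latest` is a hypothetical callable standing in for the real API request, and the `id`, `title`, and `published` fields (and the sample articles) are illustrative rather than the actual Currents response contract. The point it shows is that each run should hand only unseen articles to the next stage.

```python
from typing import Callable, Iterable

Article = dict  # assumed shape: {"id": str, "title": str, "published": str}


def ingest_once(fetch_latest: Callable[[], Iterable[Article]],
                seen_ids: set) -> list:
    """Return only articles not seen in earlier runs and record their ids."""
    fresh = []
    for article in fetch_latest():
        article_id = article.get("id")
        if not article_id or article_id in seen_ids:
            continue  # skip malformed or already-ingested items
        seen_ids.add(article_id)
        fresh.append(article)
    return fresh


# Stand-in for a real API call so the sketch runs offline (made-up data).
def fake_fetch() -> list:
    return [
        {"id": "a1", "title": "Fed holds rates", "published": "2024-01-31"},
        {"id": "a2", "title": "Gold rallies", "published": "2024-01-31"},
    ]


seen = set()
first_run = ingest_once(fake_fetch, seen)   # both articles are new
second_run = ingest_once(fake_fetch, seen)  # nothing new on the repeat run
```

In a real deployment the set of seen ids would need to be persisted between runs; here it lives in memory only.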
Extract
Each article is passed through a language model that identifies entities, events, relationships, and context. We do not force a rigid schema too early. Instead, the extraction layer is used to surface recurring structure before full normalization.
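One way to keep the schema loose can be sketched as follows. This is an assumption-laden illustration, not the pipeline's actual code: `call_model` is a hypothetical function that sends a prompt to some language model and returns its text, and the key names are chosen for the example. Expected keys get defaults, and anything else the model returns is kept under `extra` so recurring structure can be spotted before it is normalized.

```python
import json
from typing import Callable

LIST_KEYS = ("entities", "events", "relationships")


def extract(article_text: str, call_model: Callable[[str], str]) -> dict:
    """Ask the model for JSON and keep the result loosely structured."""
    prompt = (
        "Return JSON with keys entities, events, relationships, context "
        "for this article:\n" + article_text
    )
    raw = call_model(prompt)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    # List-valued keys default to empty lists; context defaults to "".
    result = {key: parsed.get(key, []) for key in LIST_KEYS}
    result["context"] = parsed.get("context", "")
    # Unexpected keys are preserved rather than discarded.
    result["extra"] = {
        k: v for k, v in parsed.items()
        if k not in LIST_KEYS and k != "context"
    }
    return result


# Hypothetical model stub so the sketch runs without any API access.
def fake_model(prompt: str) -> str:
    return json.dumps({
        "entities": ["Federal Reserve", "Jerome Powell"],
        "relationships": [["Jerome Powell", "chairs", "Federal Reserve"]],
        "sentiment": "neutral",
    })


record = extract("The Federal Reserve held rates steady.", fake_model)
```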
Compile
Each entity is assigned a persistent dossier. This is the key design choice. Rather than treating each mention as disposable output, the system merges new evidence into an existing record whenever possible.
This changes the unit of memory from article to entity state.
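The merge step might look roughly like the sketch below. The `Dossier` fields, the `merge_evidence` helper, and the sample dates are all hypothetical; the idea being illustrated is only that new evidence is folded into an existing record, and that re-processing the same article is a no-op.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dossier:
    name: str
    attributes: dict = field(default_factory=dict)
    activity: list = field(default_factory=list)    # (date, text) pairs
    source_ids: set = field(default_factory=set)    # articles already merged


def merge_evidence(dossiers: dict, name: str, article_id: str, date: str,
                   facts: list, attributes: Optional[dict] = None) -> Dossier:
    """Fold one article's evidence into the entity's persistent dossier."""
    dossier = dossiers.setdefault(name, Dossier(name=name))
    if article_id in dossier.source_ids:
        return dossier  # this article was merged before
    dossier.source_ids.add(article_id)
    dossier.attributes.update(attributes or {})
    known = {text for _, text in dossier.activity}
    for text in facts:
        if text not in known:  # exact-match check; real systems need fuzzier logic
            dossier.activity.append((date, text))
            known.add(text)
    return dossier


dossiers = {}
merge_evidence(dossiers, "Federal Reserve", "a1", "2024-01-31",
               ["held rates steady"], {"type": "Central Bank"})
merge_evidence(dossiers, "Federal Reserve", "a2", "2024-02-01",
               ["held rates steady", "continued balance sheet reduction"])
merge_evidence(dossiers, "Federal Reserve", "a2", "2024-02-01",
               ["continued balance sheet reduction"])  # repeat article
fed = dossiers["Federal Reserve"]
```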
Relate
The system tracks relationships between entities as well as the entities themselves. This lets it describe how entities interact over time, not just how often each one is mentioned.
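As a rough illustration, relationships can be stored as directed edges that accumulate a count and a first-seen and last-seen date. The class name, the edge fields, and the sample relations and dates are assumptions made for this sketch. Dates are ISO strings, which compare correctly as plain text.

```python
class RelationshipGraph:
    """Directed edges keyed by (source, relation, target)."""

    def __init__(self):
        self.edges = {}

    def observe(self, source, relation, target, date):
        """Record one sighting of a relationship in an article."""
        key = (source, relation, target)
        edge = self.edges.setdefault(
            key, {"count": 0, "first_seen": date, "last_seen": date}
        )
        edge["count"] += 1
        edge["first_seen"] = min(edge["first_seen"], date)
        edge["last_seen"] = max(edge["last_seen"], date)

    def neighbors(self, entity):
        """Entities connected to `entity` in either direction."""
        found = set()
        for source, _, target in self.edges:
            if source == entity:
                found.add(target)
            elif target == entity:
                found.add(source)
        return found


graph = RelationshipGraph()
graph.observe("Jerome Powell", "chairs", "Federal Reserve", "2024-01-31")
graph.observe("Jerome Powell", "chairs", "Federal Reserve", "2024-03-20")
graph.observe("Federal Reserve", "connected_to", "US Treasury", "2024-02-10")
chairs = graph.edges[("Jerome Powell", "chairs", "Federal Reserve")]
```

Keeping the count and the date range on each edge is what later allows a maintenance pass to tell an active relationship from a one-off mention.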
Index
Compiled dossiers and relationships are written into a queryable index. This makes persistent context available to downstream workflows.
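The source does not say which index is used, so the following sketch simply assumes SQLite through Python's standard `sqlite3` module. Table names, column names, and the `build_index` and `lookup` helpers are hypothetical. It shows dossiers and relationships being upserted and then read back by entity name.

```python
import json
import sqlite3


def build_index(conn, dossiers, relationships):
    """Upsert dossiers (name -> dict) and relationship triples."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dossiers (name TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS relationships ("
        "source TEXT, relation TEXT, target TEXT, "
        "PRIMARY KEY (source, relation, target))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO dossiers VALUES (?, ?)",
        [(name, json.dumps(body)) for name, body in dossiers.items()],
    )
    conn.executemany(
        "INSERT OR REPLACE INTO relationships VALUES (?, ?, ?)", relationships
    )
    conn.commit()


def lookup(conn, name):
    """Return the dossier body and every relationship touching `name`."""
    row = conn.execute(
        "SELECT body FROM dossiers WHERE name = ?", (name,)
    ).fetchone()
    body = json.loads(row[0]) if row else None
    related = conn.execute(
        "SELECT source, relation, target FROM relationships "
        "WHERE source = ? OR target = ? ORDER BY source, relation, target",
        (name, name),
    ).fetchall()
    return body, related


conn = sqlite3.connect(":memory:")
build_index(
    conn,
    {"Federal Reserve": {"type": "Central Bank"}},
    [("Jerome Powell", "key_figure_of", "Federal Reserve")],
)
body, related = lookup(conn, "Federal Reserve")
```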
Maintain
A recurring maintenance pass handles pruning, deduplication, reconciliation, and health checks. This is less visible than extraction, but likely just as important. Without maintenance, duplicate entities accumulate, weak relationships persist, and stale records reduce selectivity.
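A pruning rule for relationships could be sketched like this. The thresholds (`max_age_days=90`, `min_count=2`), the edge fields, and the sample data are assumptions for illustration, not values from the pipeline. An edge is dropped only when it is both stale and weakly supported, so that well-attested relationships survive quiet periods.

```python
from datetime import date, timedelta


def prune_relationships(edges, today, max_age_days=90, min_count=2):
    """Split edges into kept and dropped; drop only stale AND weak ones."""
    cutoff = today - timedelta(days=max_age_days)
    kept, dropped = {}, []
    for key, edge in edges.items():
        stale = edge["last_seen"] < cutoff
        weak = edge["count"] < min_count
        if stale and weak:
            dropped.append(key)
        else:
            kept[key] = edge
    return kept, dropped


# Made-up edges: one stale and weak, one stale but strong, one recent.
edges = {
    ("A", "mentions", "B"): {"count": 1, "last_seen": date(2024, 1, 1)},
    ("A", "partners_with", "C"): {"count": 5, "last_seen": date(2024, 1, 1)},
    ("B", "mentions", "C"): {"count": 1, "last_seen": date(2024, 5, 20)},
}
kept, dropped = prune_relationships(edges, today=date(2024, 6, 1))
```

Returning the dropped keys, rather than deleting silently, makes it possible to log or review what each maintenance pass removed.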
Why dossiers matter
The main conceptual shift was moving from snapshots to memory.
A traditional article-driven workflow can identify what was said recently about a topic. A dossier-based workflow can support a more useful class of questions:
- What changed around this entity over the last 30 days?
- Which relationships are now more prominent?
- Which themes are stable, intensifying, or deteriorating?
- What context should an agent retain when this topic reappears?
These questions are more consistent with how long-running AI systems operate. Such systems do not simply need recent input. They need updateable state.
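The first question in the list above becomes a simple filter once activity is stored with dates. This sketch assumes a dossier's activity is a list of `(date, text)` pairs; the `changes_since` helper and the dates attached to the sample entries are invented for the example.

```python
from datetime import date, timedelta


def changes_since(activity, today, days=30):
    """Return activity texts from the last `days` days, oldest first."""
    cutoff = today - timedelta(days=days)
    return [text for when, text in sorted(activity) if when >= cutoff]


# Illustrative activity log; the dates are made up.
activity = [
    (date(2024, 5, 28), "signaled that future cuts remain data-dependent"),
    (date(2024, 3, 1), "continued balance sheet reduction"),
    (date(2024, 5, 10), "held rates steady in the latest meeting"),
]
recent = changes_since(activity, today=date(2024, 6, 1))
```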
A simplified example
A simplified dossier might contain:
Federal Reserve
- Type: Central Bank
- Jurisdiction: United States
- Key figure: Jerome Powell
Recent activity
- held rates steady in the latest meeting
- continued balance sheet reduction
- signaled that future cuts remain data-dependent
Connected entities
- Jerome Powell
- US Treasury
- major equity indices
- gold markets
- other central banks
The exact representation can vary. However, the functional role is the same: the dossier is a cumulative record rather than a one-time summary.
What changed after reading Karpathy
A later improvement followed from Andrej Karpathy’s note on using language models to build and maintain knowledge bases.
The useful insight was architectural. Raw material can remain in one layer, while the model incrementally compiles that material into a more structured knowledge layer.
We applied that logic in Hermes by formalizing a personal-wiki skill with three components:
- links/ for raw source material
- notes/ for observations and working notes
- wiki/ for compiled, model-maintained knowledge
This separation reduced the tendency to treat every article or research session as disposable context. Useful outputs could instead be folded back into a longer-lived structure. We also maintained a master INDEX.md and periodic health checks for stale pages, contradictions, missing compilations, and broken cross-links.
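Two of those health checks, broken cross-links and pages missing from the index, can be sketched as below. The sketch assumes wiki pages link to each other with `[[Page Name]]` syntax and that INDEX.md lists pages the same way; the source does not specify the link format, so treat that and the `check_wiki` helper as hypothetical. Pages are passed in as a dict so the example runs without touching the filesystem.

```python
import re

# Captures the target of [[Target]] or [[Target|label]] (assumed link syntax).
LINK = re.compile(r"\[\[([^\]|#]+)")


def check_wiki(pages, index_text):
    """Return (broken links, pages not listed in the index)."""
    broken = []
    for name, text in pages.items():
        for target in LINK.findall(text):
            target = target.strip()
            if target not in pages:
                broken.append((name, target))
    indexed = {target.strip() for target in LINK.findall(index_text)}
    unindexed = sorted(set(pages) - indexed)
    return sorted(broken), unindexed


pages = {
    "Federal Reserve": "Key figure: [[Jerome Powell]]. See [[US Treasury]].",
    "Jerome Powell": "Key figure at the [[Federal Reserve]].",
}
index_text = "- [[Federal Reserve]]"
broken, unindexed = check_wiki(pages, index_text)
```

Checks for stale pages and contradictions need more than string matching and are not covered by this sketch.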
Main observations
Several points became clear during implementation.
First, deduplication often matters more than model cleverness. If the entity layer is noisy, the intelligence layer is likely to become noisy as well.
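A minimal sketch of name-level deduplication, assuming a hand-maintained alias table. The `normalize` rules, the stopword list, and the alias entries are illustrative choices rather than the pipeline's actual logic. Mentions are reduced to a normalized key, mapped through the aliases, and grouped.

```python
import re
import unicodedata

DROP_TOKENS = {"the", "inc", "corp", "ltd"}  # assumed noise words


def normalize(name):
    """Lowercase, strip accents and punctuation, and drop noise words."""
    text = unicodedata.normalize("NFKD", name)
    text = text.encode("ascii", "ignore").decode().lower()
    text = re.sub(r"[.']", "", text)          # "U.S." -> "us"
    text = re.sub(r"[^a-z0-9 ]", " ", text)   # other punctuation -> space
    return " ".join(t for t in text.split() if t not in DROP_TOKENS)


def canonicalize(mentions, aliases):
    """Group raw mentions under one canonical key per entity."""
    groups = {}
    for mention in mentions:
        key = normalize(mention)
        key = aliases.get(key, key)  # map known short forms to the full name
        groups.setdefault(key, []).append(mention)
    return groups


mentions = ["Federal Reserve", "the Federal Reserve", "The Fed",
            "U.S. Treasury", "US Treasury"]
groups = canonicalize(mentions, aliases={"fed": "federal reserve"})
```

String normalization only catches surface variation. Entities that share a name, or a single entity described in unrelated ways, still need a model-assisted or manual reconciliation step.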
Second, incremental updates appear more effective than repeated full rebuilds. Once state begins to accumulate, merging only what is new is cheaper and more stable.
Third, relationship extraction likely carries a large fraction of the long-term value, but it is also where drift appears fastest. It therefore needs explicit cleanup logic.
Finally, maintenance is not optional. We expect the approach to succeed only if pruning, merging, validation, and coverage checks are treated as first-class operations.
Where Currents fits
Currents provides the ingestion layer in this architecture.
That matters because it allows the rest of the system to focus on extraction, organization, memory maintenance, and queryability rather than on source collection and normalization.
Closing
The original question was simple: if an LLM is continuously exposed to fresh news, can it build a more useful representation of the world than a list of headlines?
Our results suggest that the answer is yes, provided that ingestion is coupled with extraction, persistent compilation, relationship tracking, and ongoing maintenance.
The implication is practical. News delivery alone is not sufficient for working memory. However, continuous ingestion combined with persistent dossiers appears capable of supporting a more stable and updateable knowledge layer for AI workflows.