From Collection to Cognition: How the OSINT Community Is Navigating the Age of AI and Data Overload

Written by Fed Gov Today | Jun 17, 2026 11:40:40 PM

Presented by Carahsoft

The open source intelligence community finds itself at a defining inflection point. Across conversations held in conjunction with the OSINT Tech Expo, a clear consensus emerged from practitioners and industry leaders alike: the field's greatest challenge has fundamentally shifted. For decades, the problem was finding enough information. Today, the problem is making sense of an overwhelming torrent of it. Veteran practitioners described a landscape where social media posts, geospatial signals, dark web data, commercial telemetry, and technical intelligence are all available simultaneously, and where the sheer volume threatens to paralyze rather than empower the analysts charged with acting on it. The phrase "data toxicity" surfaced more than once as a way of describing how too much unfiltered information can be as dangerous as too little.

Artificial intelligence — and increasingly, agentic AI — emerged as both the primary hope and the most complex variable in addressing this challenge. Interviewees were largely optimistic about AI's potential to accelerate analysis and cut through noise, but uniformly cautious about treating large language models as a complete solution. The recurring emphasis on data quality, source credibility, and human judgment underscored a mature, nuanced community grappling with how to integrate powerful new tools without losing the analytical rigor on which mission outcomes depend. Across every conversation, one through-line was constant: the technology is only as good as the data fed into it, and the data is only as good as the trust placed in its sources.

When the Data Itself Becomes the Threat

The battle against cybercrime has always been a race against time, but Phil Fuster, Vice President of Government Sales at SpyCloud, argues that the calculus is finally shifting in favor of the defenders. SpyCloud's approach centers on socially engineering data directly out of the dark web — rather than scraping or purchasing it, which Fuster notes would amount to funding the very criminal ecosystem they're working to disrupt. The result is a collection of roughly one trillion digital assets, including credentials, session cookies, and crypto wallet data, that the company converts and delivers to customers like Apple within 15 minutes of discovery.

What makes this possible at scale, Fuster explained, is the application of AI and data science to what had previously been an unmanageable volume of raw, messy information. SpyCloud is now able to render more than 94% of its collected passwords in plain text, dramatically improving their utility for credential protection. The broader implication, he suggested, is a meaningful shift in the tempo of the defender-versus-attacker dynamic: by intercepting and converting stolen data before it can be weaponized, organizations are beginning to close the gap that has historically favored bad actors, whether criminal groups or nation-state operators.

Drowning in Signals: The Analyst's Dilemma

David Wallach, Senior Director of Defense Programs at Penlink, brings a career spanning several decades to his assessment of the OSINT landscape, and his diagnosis is pointed. Where the challenge was once a scarcity of data, it is now an excess of it, a condition he calls "data toxicity." A single analyst tracking a transnational network might ingest a million social posts, device pings, and forum mentions in a 24-hour window, only to find that just a handful of data points actually carry operational weight. The system, Wallach argues, must aggressively prioritize, or the analyst will drown in noise before surfacing anything that matters.

The antidote, in his view, lies in the ability to stitch together disparate data types (social media, dark web records, mobile ad identifiers, breach data) that don't naturally align with one another. Each signal in isolation is weak; resolved with confidence into a unified identity, the signals together become intelligence. Wallach also pointed to the cognitive warfare dimension of OSINT, citing the now-documented case of a Russian troll's single social media post triggering a real-world emergency response in Louisiana as evidence that the same tools used for analysis are being weaponized offensively. The answer, he said, lies in identity resolution at scale, aided by large language models and agentic AI, tempered by the experience and judgment of seasoned analysts.
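
For readers who want a concrete picture of what "stitching" disparate records into one identity can mean, the following is a minimal sketch, not a description of Penlink's product. It assumes each record carries a dictionary of identifiers, and it links any two records that share an exact identifier value using a union-find structure. The record layout, field names, and sample values are all hypothetical, and exact matching stands in for the confidence scoring a real system would need.

```python
from collections import defaultdict

def resolve_identities(records):
    """Group record indices that share at least one identifier value."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lowest index as the root so output order is stable.
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (identifier type, normalized value) -> first record index
    for idx, rec in enumerate(records):
        for key, value in rec["identifiers"].items():
            token = (key, value.strip().lower())
            if token in first_seen:
                union(idx, first_seen[token])
            else:
                first_seen[token] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return list(clusters.values())

# Hypothetical records from four different feeds.
records = [
    {"source": "social", "identifiers": {"handle": "night_owl", "email": "A@example.com"}},
    {"source": "breach", "identifiers": {"email": "a@example.com ", "phone": "555-0100"}},
    {"source": "ad_id", "identifiers": {"phone": "555-0100", "ad_id": "abc-123"}},
    {"source": "forum", "identifiers": {"handle": "other_user"}},
]
clusters = resolve_identities(records)
print(clusters)
```

In this toy data the social and ad-identifier records share no field directly, yet they land in the same cluster because the breach record bridges them. That transitive linking is the point of the sketch. It is also why production systems weigh each link: one bad match can merge two unrelated people.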

Building the Professional Home for a Maturing Discipline

Eliot Jardines, Director of Operations for the OSINT Foundation, is working to ensure the discipline's rapid growth doesn't outpace the community's ability to sustain itself. The Foundation operates as a 501(c)(3) professional association with an explicit ambition: to become for OSINT practitioners what the American Medical Association is for physicians or the American Bar Association is for lawyers. At the largest OSINT exhibition in history — featuring 43 exhibitors — Jardines emphasized that the organization's focus remains deliberately ground-level, prioritizing peer-to-peer connections among frontline practitioners over senior-level keynotes, and using events to give emerging voices their first public platform.

Jardines traces the discipline's trajectory from a narrow specialty to something that now underpins every other intelligence discipline. OSINT, he noted, is celebrating its 85th year of supporting national security — and the field's current toolkit would have been unimaginable to him during his earlier career managing IC OSINT efforts. Looking ahead, he is optimistic about AI's capacity to free practitioners from the most time-consuming, low-value tasks in the analytical workflow. The workforce of the near future, he predicted, will be one fluent in scripting, prompt engineering, and data science concepts — a shift that represents both a challenge and a significant opportunity for a discipline that has never been more central to the intelligence enterprise.

Scaling Intelligence to Petabytes and Beyond

Andrew Borene, Executive Director at Ocient, operates at the outermost edge of data scale: what he calls "extreme scale" datasets, measured not in terabytes but in petabytes and zettabytes and representing trillions to quadrillions of individual records. His argument is foundational: no AI agent is better than the scale and fidelity of the data it can access in real time. For the OSINT community, where internet telemetry, geospatial signals, and AIS maritime data are now openly available, the ability to process that data at speed is becoming a prerequisite for mission effectiveness.

Borene pointed to two structural developments he sees reshaping how intelligence agencies acquire and use these capabilities. The first is an emerging ODNI acquisition vehicle — structured as an Other Transaction Authority — that would, for the first time, allow all 18 intelligence community departments and agencies to purchase OSINT tools directly through a common framework. The second is the growing imperative for allied and partner sharing: because OSINT products derived entirely from open sources don't require the declassification and downgrading process that burdens traditional intelligence sharing, they represent a far more fluid mechanism for working with foreign defense and law enforcement partners. The goal, as Borene framed it, is straightforward — US government analysts should have access to at least the same open source tools that a Fortune 500 geopolitical analyst can access by logging into a commercial platform.

Trust Before Technology: The Case for Relationship-Based Collaboration

Randy Nixon, Interim US President of Janes, came to the role via an unusual path — he previously served as Director of Open Source Enterprise for the intelligence community, giving him a perspective on the government-industry relationship that few on either side can claim. His critique of the status quo is direct: too often, the relationship between government and industry devolves into a transactional exchange driven by contracting mechanics rather than a genuine partnership grounded in mutual understanding of mission need. The result, he said, resembles buying a used car from a dealer you don't trust — even when the outcome is acceptable, the process corrodes the relationship.

The alternative Nixon advocates for is a model where industry engages with customers by first understanding their mission requirements, then honestly assessing what can be delivered and at what cost — a collaborative build rather than a procurement transaction. Janes, a 128-year-old company with more than 500 analysts who physically handle military equipment worldwide, positions itself as a peer collector to the intelligence community rather than a vendor. On the question of AI, Nixon was measured: the technology is only as reliable as the data underneath it, and organizations that chase the LLM hype without interrogating their underlying data sources will be disappointed. Trusted data, he emphasized, is the foundational investment that makes everything else work.

Precise Signals in a Noisy World

Tim Miller, Global Field CTO and Chief Cybersecurity Strategist at Dataminr, frames the core OSINT challenge as a signal-to-noise problem — and argues that the analytical community is, on the whole, approaching it more thoughtfully than many other technology-adjacent fields. Rather than layering AI on top of existing workflows and hoping for improvement, the more effective practitioners are asking a harder question: how do we ensure the AI models we're using are actually calibrated for the precision our mission requires?

Miller described Dataminr's approach as a deliberate departure from reliance on large foundational models, which he noted are broadly capable but not well-suited to the high-specificity demands of intelligence analysis. Instead, the company operates a constellation of 60 purpose-built smaller models, trained on over a decade of intelligence data, that cross-correlate signals before selectively incorporating foundational models where they add value. The goal is to reduce hallucination and avoid the generalization problem that makes broad LLMs unreliable for operational use. What agentic AI offers the analyst, Miller argued, is not a replacement for human judgment but a dramatically faster path to the moment where that judgment can be applied — stripping away hours of noise so the analyst can focus on mapping information to decision.

The Source of Truth Problem

Todd Helfrich, Vice President of Federal Sales at Censys, centers his concern on a challenge that cuts across every conversation at the event: in a world where AI is being used to parse and act on massive OSINT datasets, the accuracy and currency of the underlying data are not a secondary consideration; they are the primary one. Processing stale or low-confidence data through an AI system doesn't just waste money on GPU compute; it actively misleads decision makers and can send analysts down costly investigative dead ends. For Helfrich, a "source of truth" is less about any single platform and more about the confidence framework an organization extends to the vendors and data feeds it relies on.

The workforce dimension of this challenge is one Helfrich returns to with urgency. Despite a decade of conversation about the cybersecurity talent shortage, the gap remains. Technology can help scale what human analysts do, but it cannot replace the need for people who understand the guardrails — who can recognize when a machine-assisted output needs to be interrogated rather than accepted. Helfrich's prescription is investment at the roots: STEM education at the K–12 level, university partnerships, and an industry commitment to building the next generation of practitioners who can work alongside AI rather than simply deferring to it.

Beyond LLMs: The Case for a Knowledge Engine

Bill Wall, CEO of Accrete AI Government, offers one of the more structurally rigorous frameworks for understanding why large language models alone are insufficient for the OSINT practitioner's challenge — and what a more complete solution looks like. The core problem, he argues, is not that LLMs are useless; it's that the open source information environment is inherently chaotic, populated by bots, biased sources, duplicate records, and deliberate disinformation. An LLM applied directly to that environment will hallucinate, generalize, and ultimately erode analyst confidence in outputs that may look authoritative but cannot be verified.

Wall's answer is what Accrete calls a "knowledge engine" or "context engine" — a layer that sits beneath the LLM and does the foundational work of entity extraction, normalization, and relationship mapping before any language model is applied. The practical implication is significant: knowing that two references across different datasets refer to the same individual, and understanding the nature of their relationship to other entities, is what separates data from intelligence. Only once that groundwork is complete does the LLM become a genuinely useful tool for rapid summarization and synthesis. The goal, Wall was clear, is not to replace analysts but to radically expand what they can accomplish — because in the current environment, analysts aren't underperforming, they're simply overwhelmed by a volume of information that no human workforce could process unaided.
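
To make the idea of normalization and relationship mapping ahead of any language model more tangible, here is a small illustrative sketch. It is not Accrete's implementation. It assumes entity mentions have already been extracted as (subject, relation, object) statements, and it uses a hand-written alias table, which is a hypothetical stand-in for whatever curated or learned resolution a real system would use.

```python
import unicodedata
from collections import defaultdict

# Hypothetical alias table mapping known variants to a canonical name.
ALIASES = {
    "j. doe": "john doe",
    "johnny doe": "john doe",
    "acme corp.": "acme corp",
}

def normalize(name):
    """Fold accents, case, and whitespace, then apply the alias table."""
    folded = unicodedata.normalize("NFKD", name)
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    folded = " ".join(folded.lower().split())
    return ALIASES.get(folded, folded)

def build_graph(statements):
    """Attach each relation to canonical entities; sets drop exact duplicates."""
    graph = defaultdict(set)
    for subj, rel, obj in statements:
        graph[normalize(subj)].add((rel, normalize(obj)))
    return graph

# Hypothetical statements pulled from three different datasets.
statements = [
    ("J. Doe", "works_for", "Acme Corp."),
    ("Johnny  Doe", "works_for", "ACME Corp"),
    ("John Doé", "contacted", "Jane Roe"),
]
graph = build_graph(statements)
for entity, relations in graph.items():
    print(entity, sorted(relations))
```

Three differently spelled mentions collapse into one entity with two distinct relationships, and the duplicate employment record disappears. A summarization model given this graph would work from one person with two known links, not from three apparent strangers.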
