DISHFIRE and the Bulk Text Message Program

DISHFIRE and the Bulk Text Message Program is one of the clearest examples of how ordinary communications can become high-value intelligence once they are stored and parsed at scale.

It matters because it sits at the intersection of four worlds:

bulk communications storage,
automated extraction,
mobile intelligence,
and target development.

DISHFIRE was not simply a database of suspicious texts. It was a system for processing and storing SMS traffic at scale (traffic that NSA describes as lawfully collected), paired with an extraction layer that converted text-message content into searchable intelligence fields.

That is why this entry matters so much. It preserves the story of how the NSA treated SMS as a “goldmine,” especially because routine automated messages could reveal travel, finance, contact networks, roaming, and location with very little additional analysis.

Quick profile

Topic type: declassified SMS-collection program
Core subject: an NSA bulk SMS processing and storage system paired with the PREFER extraction layer
Main historical setting: PREFER operational by 2008, large-scale 2011 reporting metrics, and public exposure in January 2014
Best interpretive lens: not “just a dragnet of text content,” but evidence for how SMS storage and automated tagging supported future searches and target development
Main warning: the broad architecture is well supported, but the public record still does not fully explain all collection sources, selector rules, or current retention practices

What this entry covers

This entry is not only about a big number.

It covers a system:

what DISHFIRE was,
what PREFER did,
why text messages were so attractive to analysts,
what categories of information were extracted,
how the system fit into foreign-intelligence collection,
why GCHQ access became controversial,
and why the program mattered in the wider Snowden archive.

That includes:

the DISHFIRE repository,
the PREFER automated parsing layer,
the 194 million SMS per day figure from April 2011,
the extraction of missed calls, roaming alerts, travel, geocoordinates, and financial events,
the concept of content-derived metadata,
the official NSA defense of lawful foreign-intelligence collection,
and the distinction between U.S. minimization claims and broader foreign retention.

So the phrase DISHFIRE and the Bulk Text Message Program should be read broadly. It names both a storage system and an extraction model.

What DISHFIRE was

DISHFIRE was an NSA system that, in the agency’s own description, processed and stored “lawfully collected SMS data.”

That matters because the public record does not describe it merely as a narrow target list or a one-off query tool. It appears instead as a large repository of text-message material that could be searched and mined after collection.

This is historically important.

The value of DISHFIRE was not only what it knew about already identified targets. Its value was also that it retained a large volume of text traffic so that analysts could look backward later, once a new phone number or new person became interesting.

That future-searchability is one of the core themes of the program.

Why SMS was treated as an intelligence goldmine

The leaked NSA slide subtitled “SMS Text Messages: A Goldmine to Exploit” captures the entire logic.

SMS messages are short, informal, and often overlooked by users. But automated texts and system notifications can contain:

contact information,
movement clues,
financial activity,
device changes,
and meeting arrangements.

This matters because DISHFIRE’s value did not come mainly from people confessing things over text. It came from the useful side-information that modern messaging ecosystems generate automatically.

That made SMS an unusually rich intelligence environment.

DISHFIRE versus PREFER

One of the most important distinctions in the public record is the difference between DISHFIRE and PREFER.

DISHFIRE was the repository and processing environment for SMS data. PREFER was the automated extraction layer that identified message types and pulled useful entities from the text.

This is a crucial point.

Many summaries collapse the two together. But the leaked presentation makes the relationship clearer:

DISHFIRE stores and processes the SMS stream,
PREFER parses and tags it.

That is the right way to understand the system.

PREFER as the extraction engine

The leaked slide says PREFER identifies types of automated messages and extracts entities from SMS content daily. It also says PREFER had been operational on DISHFIRE servers since January 2008, inserting content-derived tags into XML output.

This matters because it shows that the intelligence value did not depend on analysts manually reading everything. Automation was central.

PREFER made the SMS stream useful by turning free-form or semi-structured text into categories and fields that other analytic systems could exploit.
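The general idea of turning free-form text into tagged fields can be sketched in a few lines. This is a minimal illustration only: the rule table, the message formats, and the names `RULES`, `TaggedMessage`, and `tag_message` are all hypothetical, and nothing here is drawn from the leaked documents or reflects how PREFER was actually built.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule table: (tag name, pattern). The message formats are
# invented for illustration and are not taken from any leaked document.
RULES = [
    ("missed_call",
     re.compile(r"missed (?:a )?call from (?P<number>\+?\d{7,15})", re.I)),
    ("roaming",
     re.compile(r"welcome to (?P<country>[A-Z][A-Za-z ]+?)[.!]", re.I)),
    ("bank_alert",
     re.compile(r"(?P<amount>\d+(?:\.\d{2})?) (?P<currency>[A-Z]{3}) "
                r"(?:debited|credited)", re.I)),
]


@dataclass
class TaggedMessage:
    text: str
    tags: dict = field(default_factory=dict)


def tag_message(text: str) -> TaggedMessage:
    """Attach one tag for every rule whose pattern matches the text."""
    result = TaggedMessage(text=text)
    for name, pattern in RULES:
        match = pattern.search(text)
        if match:
            # Named groups become the structured fields for this tag.
            result.tags[name] = match.groupdict()
    return result
```

Calling `tag_message("Welcome to France. Calls cost 0.50 EUR per minute.")` would attach a `roaming` tag carrying a `country` field. The point of the sketch is that machine-generated messages follow templates, and templated text is easy to convert into fields.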

Why automated messages mattered most

A particularly revealing part of the public story is the emphasis on automated texts.

The leaked slides and later Guardian reporting describe missed-call alerts, roaming notifications, travel-company itinerary messages, electronic business cards, and banking alerts as especially valuable. These were not always “private conversations” in the usual sense. They were often machine-generated messages.

This is historically important.

It means DISHFIRE’s intelligence power came partly from the way digital life is automatically documented by service systems. The program exploited that routine machine-to-person signaling.

The April 2011 scale

The strongest numerical anchor comes from the leaked 2011 presentation.

It reported 194,184,810 SMS messages per day, averaged over 30 days in April 2011, and marked as not deduped. The same slide also reported:

184,794,279 DISHFIRE message tags
188,299,963 PREFER text slices decoded

This matters because it shows the scale of the system in a concrete and internally measured form.

That is one of the most important facts in the whole record. DISHFIRE was not a niche collection effort. It was large enough to support industrial-scale parsing.

What “not deduped” tells us

The phrase not deduped matters.

It means the daily count should not be read as a count of unique messages or unique people. Some records could be duplicated or represented in more than one form in the processing flow.

This is a useful caution.

It does not make the scale small. It makes the number more precise. The slide was describing processing volume, not a perfectly cleaned end-state count.
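The gap between a processing count and a unique-message count can be shown with a toy example. The key definition below (sender, recipient, timestamp, and text hashed together) is an assumption made for illustration; the public record does not say how duplicates arose or how they would have been identified.

```python
import hashlib


def record_key(sender: str, recipient: str, timestamp: str, text: str) -> str:
    """Hypothetical identity key: a hash of the fields assumed to define a message."""
    raw = "\x1f".join((sender, recipient, timestamp, text))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def count_raw_and_unique(records):
    """Return (records processed, distinct messages) for an iterable of 4-tuples."""
    seen = set()
    total = 0
    for record in records:
        total += 1                      # every record counts toward raw volume
        seen.add(record_key(*record))   # duplicates collapse to one key
    return total, len(seen)
```

A “not deduped” figure corresponds to the first number in the returned pair. How far the second number would fall below it for DISHFIRE is not stated in the public material.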

That is exactly the kind of nuance a serious history should keep.

Content-derived metadata

The leaked slide uses a very important phrase: content-derived metadata.

That phrase helps explain why DISHFIRE matters so much historically.

The system did not just preserve text content or ordinary header fields. It used content to generate new metadata-like fields that could then enrich other analytic systems:

contact chaining,
geolocation,
alternative identifiers,
travel,
and finance.

This is one of the deepest lessons of the program.

The distinction between “content” and “metadata” can blur once content is mined to produce structured tags.

Why that distinction matters

That distinction matters because it changes how surveillance power should be understood.

A text message might be stored as content. But once a system extracts:

a phone number,
a border-crossing event,
a likely location,
a credit-card correlation,
or a meeting indicator,

that content becomes a metadata-like analytic building block.

This matters because DISHFIRE shows how intelligence systems move information from one category into another. That is part of what made the program so powerful.

Missed-call alerts and contact chaining

One of the most striking extraction categories was missed-call alerts.

The leaked slide says PREFER extracted 5,058,114 missed-call alerts per day on average, and explicitly linked them to contact chaining. That means a system-generated text saying someone missed a call could help analysts infer a communications relationship even if they never had access to the call itself.

This is historically important.

It shows how even incidental network notifications can become part of a social-network model.
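Contact chaining, in its generic form, is a graph traversal. The sketch below assumes each missed-call alert has already been reduced to a (caller, callee) pair; the function names and the input shape are hypothetical and say nothing about the actual tooling.

```python
from collections import defaultdict


def build_contact_graph(alerts):
    """alerts: iterable of (caller, callee) pairs taken from missed-call notices."""
    graph = defaultdict(set)
    for caller, callee in alerts:
        # Treat the relationship as undirected for chaining purposes.
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def contacts_within(graph, start, hops):
    """Breadth-first expansion: every number reachable from start in <= hops steps."""
    frontier, seen = {start}, {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph.get(node, ())} - seen
        seen |= frontier
    return seen - {start}
```

With alerts linking A to B and B to C, one hop from A yields only B, while two hops also reach C. That is why a single incidental notification can extend a network model well beyond the original number.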

SIM changes and device linking

The same slide says DISHFIRE/PREFER extracted 6,017,901 SIM card changes per day, using them to create IMSI/IMEI links.

This matters because phone intelligence is not only about the person. It is also about the relationship between:

subscriber identity,
device identity,
and account movement.

SIM-change detection helps analysts understand when a user moves from one device to another or when equipment and subscriber patterns should be linked.

That makes DISHFIRE valuable for device correlation as well as message interpretation.
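Linking subscriber and device identifiers amounts to keeping a two-way index. The following is a minimal sketch, assuming each SIM-change event yields an (IMSI, IMEI) pair observed together; the identifiers and function names are placeholders, not details from the slide.

```python
from collections import defaultdict


def link_identifiers(observations):
    """observations: iterable of (imsi, imei) pairs seen together."""
    devices_by_sim = defaultdict(set)
    sims_by_device = defaultdict(set)
    for imsi, imei in observations:
        devices_by_sim[imsi].add(imei)
        sims_by_device[imei].add(imsi)
    return devices_by_sim, sims_by_device


def sims_sharing_a_device(observations, imsi):
    """Return other IMSIs that were seen in any handset this IMSI has used."""
    devices_by_sim, sims_by_device = link_identifiers(observations)
    linked = set()
    for imei in devices_by_sim.get(imsi, ()):
        linked |= sims_by_device[imei]
    linked.discard(imsi)
    return linked
```

If two SIMs appear in the same handset, the index connects them even though neither message mentions the other. That is the sense in which a SIM change becomes a correlation signal.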

Roaming alerts and border crossings

Another major extraction category was roaming information.

The slide reports 1,658,025 roaming events per day on average and links them to border crossings. This is a powerful example of how ordinary telecom operations can be turned into travel intelligence.

This matters because a roaming alert is not written as an intelligence report. It is written for billing or user notification. But it still reveals movement.

That is a recurring theme in modern SIGINT: administrative telecom data often carries more operational meaning than users realize.

Travel itineraries

The leaked presentation also reported extraction of travel data, including itinerary messages with:

multiple flights,
cancellations,
reschedules,
and delays.

This is historically significant because it shows DISHFIRE reaching into the logistics of everyday mobility. A target does not need to confess a journey if the airline or travel company texts the details to the phone.

That is why system-generated texts were so valuable. They often described the user’s world more directly than the user did.

Geocoordinates and meeting clues

Another revealing category involved geocoordinates.

The slide reports a daily average of 76,142 geocoordinate-related extractions, including route requests, location-based meeting setups, and tracking information. This matters because it links DISHFIRE to physical movement and rendezvous analysis, not just communications relationships.

This is a crucial point.

The program could help answer not only who someone knows, but also where they were headed or where they were planning to meet.

Electronic business cards and names

The slide also reports extraction of 113,672 vCard names per day on average, sometimes with links bridging internet and telephony identifiers and sometimes with images.

This matters because business cards and contact-exchange texts can collapse identity discovery into one message. A contact name, phone number, email address, and even image can arrive in a structured bundle.

That makes DISHFIRE part of the broader intelligence task of turning unknown selectors into named people.

Financial transactions

One of the most publicized aspects of DISHFIRE was its financial extraction capability.

The leaked slide lists:

61,488 credit-card transaction correlations
630,846 money-transfer events
115,480 bank-activity or account events

This matters because it pushed the program beyond communications and into financial pattern analysis. A text about a charge, transfer, or suspicious account activity can reveal:

identity,
travel,
relationships,
spending patterns,
and timing.

That is why the financial angle became one of the most controversial parts of the public reporting.

Why DISHFIRE was especially useful for target development

The Guardian’s reporting on internal GCHQ guidance contains one of the sharpest descriptions of DISHFIRE’s value: the system held a large volume of unselected SMS traffic.

This matters because many intelligence systems only preserve traffic connected to known selectors. DISHFIRE was reportedly useful precisely because it preserved messages from selectors that were not yet targeted.

That is historically important.

It means DISHFIRE functioned as a reservoir for future intelligence development. Once a new number became interesting, analysts could potentially look backward into older text traffic.
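The difference between selector-driven collection and a retrospective archive can be illustrated with a toy index. The sketch assumes that records are stored before anyone knows which numbers will matter; `MessageArchive` and its methods are hypothetical names, and no retention period or storage schema is implied.

```python
from collections import defaultdict
from datetime import date


class MessageArchive:
    """Toy store that keeps every record and indexes it by both phone numbers."""

    def __init__(self):
        self._by_number = defaultdict(list)

    def store(self, day: date, sender: str, recipient: str, text: str) -> None:
        record = (day, sender, recipient, text)
        self._by_number[sender].append(record)
        if recipient != sender:
            self._by_number[recipient].append(record)

    def search(self, number: str, before: date):
        """Return records involving `number` dated earlier than `before`, oldest first."""
        return sorted(r for r in self._by_number.get(number, []) if r[0] < before)
```

Because nothing is filtered at storage time, a number that becomes interesting in May can still be searched against April traffic. A system that only kept traffic for known selectors could not answer that query.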

“Collects pretty much everything it can”

The most famous phrase associated with DISHFIRE came from the internal guide quoted by the Guardian: the system “collects pretty much everything it can.”

That phrase matters because it captures the character of the repository better than any public euphemism. The value of the program came from breadth.

This is one reason DISHFIRE belongs in the larger history of bulk collection rather than only targeted interception. The program’s power came from having a huge pool before the specific target was known.

GCHQ and UK-number searches

Another major part of the story is GCHQ access.

The Guardian reported that internal GCHQ guidance allowed analysts to search DISHFIRE for event data involving UK numbers, provided they used settings that avoided displaying message content without proper warrant authority. The same reporting said analysts were warned that showing message content from such searches without a warrant would be unlawful under UK rules.

This matters because DISHFIRE was not used by NSA alone. According to the reporting, it was also searchable by at least one Five Eyes partner.

That alliance use is one reason the controversy spread beyond the United States.

The foreign-intelligence legal framing

Publicly, NSA defended DISHFIRE as part of its lawful foreign-intelligence mission.

Reuters reported that NSA said DISHFIRE processed and stored lawfully collected SMS data, that its activities were focused on valid foreign intelligence targets, and that privacy protections existed for U.S. person SMS data that might be incidentally collected. NSA’s public EO 12333 page also describes EO 12333 as the foundational authority for NSA’s foreign signals intelligence mission, especially collection on foreign persons wholly outside the United States.

This is historically important.

No official legal memo specific to DISHFIRE appears in the public record. But the public defense of the program fits the broader EO 12333-style foreign-intelligence collection world.

U.S. minimization and incidental collection

Public reporting drew a careful but important distinction regarding U.S. data.

The Guardian reported that communications from U.S. phone numbers appeared to be removed or minimized from the searchable database. Reuters separately quoted NSA saying that some U.S. person SMS data might be incidentally collected and that protections applied to its use, handling, retention, and dissemination.

This matters because the program was not publicly defended as a domestic text-message dragnet. But neither was it represented as perfectly free of U.S. person impact.

That nuance is essential to understanding the controversy.

Why this was more than “text content”

A final key lesson is that DISHFIRE was never just about reading people’s private conversations.

Its deeper value lay in turning everyday communications systems into structured intelligence:

missed calls into relationship graphs,
roaming alerts into border crossings,
banking texts into financial traces,
travel alerts into mobility timelines,
business cards into identity resolution,
and route or meeting texts into geospatial clues.

That is why the program matters so much.

It reveals how ordinary digital life produces intelligence even when users are not consciously disclosing much at all.

Why this belongs in the NSA section

This article belongs in declassified / nsa because DISHFIRE is one of the clearest examples of how NSA combined bulk storage with automated extraction to create a powerful target-development resource.

It helps explain:

how SMS became an intelligence source,
how PREFER transformed content into searchable tags,
how Five Eyes partners could use the resulting event data,
and how lawful foreign-intelligence framing coexisted with real privacy concerns.

That makes DISHFIRE more than a headline scandal. It is a structural case in the history of bulk mobile surveillance.

Why it matters in this encyclopedia

This entry matters because DISHFIRE and the Bulk Text Message Program preserves one of the strongest examples of intelligence value emerging from seemingly trivial communications.

Here DISHFIRE is not only:

a big repository,
a Snowden-era leak,
or a program with a dramatic name.

It is also:

a storage system,
an extraction engine,
a content-derived metadata factory,
a future-search target-development archive,
and a reminder that the richest intelligence often comes from the ordinary machine-generated traces people barely notice.

That makes DISHFIRE indispensable to a serious declassified history of NSA programs.

Frequently asked questions

What was DISHFIRE?

DISHFIRE was an NSA system that processed and stored lawfully collected SMS data at large scale. Public reporting and leaked documents describe it as a repository used for later search and analysis.

What was PREFER?

PREFER was the automated extraction layer that ran on DISHFIRE servers and identified message types and useful entities in SMS content, turning them into searchable tags and analytic fields.

Did DISHFIRE only collect metadata?

No. The public record suggests DISHFIRE involved SMS storage and that PREFER extracted content-derived metadata from message text. The system therefore sits across the content/metadata boundary rather than fitting neatly into only one category.

How many text messages did DISHFIRE handle?

A leaked 2011 NSA slide reported an average of 194,184,810 SMS messages per day in April 2011, marked as not deduped.

What kinds of things were extracted?

Publicly reported categories included missed-call alerts, roaming events, SIM changes, geocoordinates, travel itineraries, electronic business cards, credit-card transactions, money transfers, and bank-account activity.

Why were automated messages so important?

Because they often contained highly structured information about travel, finance, contact networks, and location. That made them easier to parse automatically and highly useful for intelligence analysis.

Did GCHQ use DISHFIRE?

Public Guardian reporting based on internal documents said GCHQ analysts could search certain DISHFIRE event data involving UK numbers while using controls intended to avoid unlawful viewing of content without proper authority.

Was DISHFIRE defended as lawful by NSA?

Yes. Public NSA statements quoted by Reuters said DISHFIRE processed and stored lawfully collected SMS data as part of NSA’s foreign-intelligence mission and that protections existed for incidentally collected U.S. person data.

Editorial note

This entry treats DISHFIRE not as a single surveillance headline, but as a storage-and-extraction architecture. The strongest way to read the program is through transformation. A text message looks small and ordinary to the sender. But at scale, and especially when machine-generated messages are parsed automatically, it can become a travel log, a financial trace, a social graph clue, a device-correlation signal, or a future-search lead. That is why DISHFIRE mattered. It showed how bulk storage combined with automated parsing can turn everyday mobile communications into a durable intelligence resource long before a target is even formally known.
