The Historical Data Patent
On December 31, 2003, the last working day of the year, eight Google engineers including Matt Cutts and Jeffrey Dean filed a patent that would quietly define the next two decades of link building. Its premise is a single sentence buried in the patent's description: "A typical, legitimate document attracts back links slowly." That line has been quoted, misquoted, and misapplied across ten thousand SEO blog posts. What most of them miss is the rest of the filing, including 56 claims and the passages that describe exactly how Google monitors DNS records, detects rotating links, flags expired domains, and distinguishes a CDC outbreak spike from a purchased link campaign. This is that patent.
If you've read my On-Page SEO guide, this patent sits on the other side of the equation: not what's on the page, but what points to it and how that changes over time. It is one of the foundational off-page patents, and the 2024 Google API leak contains attributes that strongly correspond to nearly every major mechanism it describes.
The Honest Hedge
Every analysis has a threshold where certainty ends and inference begins. Here's where that line falls for this patent:
What the patent states: Google monitors link velocity, meaning when links appear, when they disappear, and at what rate. The patent explicitly describes distinguishing topical spikes from manipulative ones (the CDC/SARS example). DNS records, name server profiles, and domain registration patterns are all domain legitimacy factors. Links from updated pages that survive editorial review carry an implicit freshness advantage. Rotating links are devalued. All of these are stated in the patent text, and most have corresponding attributes in the 2024 API leak.
What the API leak suggests: the phraseAnchorSpamDays attribute strongly corresponds to the velocity check, measuring the number of days over which 80% of anchors accumulated. The demotedStart / demotedEnd attributes suggest velocity demotions are time-bounded events, not permanent ones. The droppedRedundantAnchorCount attribute likely corresponds to rotating link detection. And the RegistrationInfo module suggests domain-level data is tracked in production. These are API attribute names, not scoring formulas: we know the inputs exist, but not their weights.
What remains unknown: the exact velocity thresholds that trigger a flag; the specific "good" vs. "bad" name server criteria in 2026 versus 2003; whether expired domain detection still fully resets link equity or has been softened over time; the interaction weights between velocity, freshness, and domain trust signals; and, critically, how much of this 2003 patent has been supplemented or overridden by newer systems like SpamBrain. The patent describes the foundation; how much of that foundation is still load-bearing versus decorative is genuinely unknown.
Patent Metadata
Eight inventors. 56 claims. 472 citing families. Filed on the last day of 2003, this patent is older than YouTube, Chrome, and the iPhone, yet I still see its fingerprints in client data every month. The inventors aren't just any Google engineers. Matt Cutts led Google's webspam team for a decade. Jeffrey Dean co-designed MapReduce and Bigtable, two of Google's core infrastructure systems, and helped lead the work that produced TensorFlow. Paul Haahr was the VP of Search Quality who later testified in the DOJ antitrust trial. When these three co-author a patent, it becomes infrastructure.
What This Patent Does (Plain English)
Most search patents tackle one thing. This one tackles everything that changes over time, which is almost everything. At its core, it describes a system that monitors and scores documents based on their history: not what the page says today, but how the links pointing to it have changed, how the domain has changed, and whether those patterns look natural.
Here's what the system does:
- Monitors when links appear and disappear — tracking the dates and the rate of change
- Weights links by freshness — newer links from recently updated pages carry more signal than stale ones
- Detects velocity anomalies — distinguishing between a topical spike (CDC during an outbreak) and a purchased link campaign
- Catches rotating links — sites that swap out their "featured link" daily get weighted lower than consistent editorial links
- Monitors DNS records — frequent registrar changes, name server swaps, and bulk registrations flag a domain as potentially illegitimate
- Detects expired/doorway domains — throwaway domains used for spam get identified by their registration patterns
- Flags ranking jumps — a document that suddenly ranks across many queries could be topical or an attempt to spam
Link Velocity: The Core Signal
The phrase that launched a thousand SEO blog posts comes from this patent:
"A typical, 'legitimate' document attracts back links slowly. A large spike in the quantity of back links may signal a topical phenomenon (e.g., the CDC web site may develop many links quickly after an outbreak, such as SARS), or signal attempts to spam a search engine by exchanging links, purchasing links, or gaining links from documents without editorial discretion."
This is important: the patent explicitly distinguishes between spikes that are topical and spikes that are manipulative. It doesn't say "all fast link growth is bad." It says the system monitors the rate and decides.
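To make the topical-versus-manipulative distinction concrete, here's a minimal Python sketch. The function name, the 30-day baseline, the 5× spike factor, and the use of a news-mention count as the corroborating signal are all my assumptions for illustration; the patent gives no thresholds and doesn't say how corroboration is scored.

```python
from statistics import mean

def classify_spike(daily_new_links, news_mentions, window=30, spike_factor=5.0):
    """Hypothetical velocity check: compare today's new backlinks to a trailing baseline.

    daily_new_links: new referring links per day, oldest first (today is the last item).
    news_mentions: count of corroborating mentions (news, discussion groups) for the period.
    window and spike_factor are illustrative assumptions, not values from the patent.
    """
    if len(daily_new_links) <= window:
        return "insufficient_history"
    baseline = mean(daily_new_links[-window - 1:-1])  # the `window` days before today
    today = daily_new_links[-1]
    if today <= spike_factor * max(baseline, 1.0):  # floor avoids a zero baseline
        return "normal"
    # A spike occurred: outside coverage points to a topical event (the CDC/SARS case).
    return "topical_spike" if news_mentions > 0 else "suspicious_spike"
```

With a steady two links a day followed by 50 in one day, the same spike comes back as "topical_spike" when there is outside coverage and "suspicious_spike" when there isn't. The spike alone never decides the outcome, which is the point the patent makes.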
The patent describes three specific aspects of link velocity monitoring:
1. Link Appearance and Disappearance
"Search engine 125 may then monitor the time-varying behavior of links to the document, such as when links appear or disappear, the rate at which links appear or disappear over time."
Every link gets a timestamp. Google tracks not just when links appear but when they disappear, and it draws conclusions from both signals. A page that steadily loses inbound links over time can be treated as stale.
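Here's a minimal sketch of what timestamped link tracking could look like, assuming each inbound link is stored with a first-seen date and an optional deletion date. The LinkRecord class and link_rates function are hypothetical names (the fields loosely echo the leaked firstseenDate / deletionDate attributes); the patent doesn't specify a data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class LinkRecord:
    """One inbound link (hypothetical structure)."""
    source_url: str
    first_seen: date
    deleted: Optional[date] = None  # None while the link is still live

def link_rates(links, start, end):
    """Return (links gained per day, links lost per day) over the inclusive window [start, end]."""
    days = (end - start).days + 1
    gained = sum(1 for link in links if start <= link.first_seen <= end)
    lost = sum(1 for link in links if link.deleted is not None and start <= link.deleted <= end)
    return gained / days, lost / days
```

A sustained stretch where the loss rate outpaces the gain rate is the kind of pattern the patent associates with staleness.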
2. Rotating Links Detection
"Search engine 125 may weight documents that have a different featured link each day, despite having a very fresh link, differently (e.g., lower) than documents that are consistently updated and consistently link to a given target document."
This one is subtle. If a page changes which sites it links to on a daily or weekly rotation, the freshness of each individual link doesn't save it — the rotation pattern itself is a negative signal. This catches paid link placements that rotate sponsors, blogroll-for-sale schemes, and any arrangement where links are temporary by design.
I've seen this firsthand. Early in my career, before I knew better, I bought 12 PBN backlinks with a variety of keyword-rich anchor text. The provider fed them in over about 21 days. By the time the last link went live, the page that had been sitting at position 9 started dropping. I built a couple of legitimate guest posts, and it eventually stabilized around position 21. I got hit on link acquisition velocity and anchor text velocity at the same time. That experience taught me something this patent makes explicit: the system doesn't look at any single signal in isolation.
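The rotation penalty can be expressed as a sketch: measure how consistently a source page keeps linking to a target once that link first shows up. The persistence function and the daily-snapshot input are my assumptions; the patent only says rotating featured links get weighted differently (e.g., lower) than consistent ones.

```python
def persistence(daily_targets, target):
    """Share of daily snapshots, counted from the day `target` first appears, that still link to it.

    daily_targets: list of sets, one per day, holding the URLs a source page links to
    (a hypothetical input format). A consistent editorial link scores 1.0; a one-day
    slot in a rotation scores close to 0.
    """
    present = [target in day for day in daily_targets]
    if True not in present:
        return 0.0
    since_first = present[present.index(True):]
    return sum(since_first) / len(since_first)
```

Counting from the first appearance means a newly added but stable link isn't punished just for being new; only links that come and go lose weight.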
3. Link Staleness and Disappearance
"The disappearance of many links can mean that the document to which these links point is stale. Once a document has been determined to be stale, the links contained in that document may be discounted or ignored."
This creates a cascade effect. If a page accumulates many inbound links but then those linking pages themselves become stale, the target page loses equity from two directions: the direct loss of links, and the devaluation of the remaining ones because the linking pages are now considered stale themselves.
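The two-direction loss can be sketched like this. The 50% loss threshold, the 0.2 discount, the one-unit value per link, and the dictionary fields are illustrative assumptions; the patent says only that links in stale documents "may be discounted or ignored."

```python
def is_stale(peak_inlinks, current_inlinks, loss_threshold=0.5):
    """Hypothetical rule: a page that has lost more than `loss_threshold` of its peak inbound links is stale."""
    if peak_inlinks == 0:
        return False
    return (peak_inlinks - current_inlinks) / peak_inlinks > loss_threshold

def incoming_value(sources, stale_discount=0.2):
    """Sum the value of links pointing at a target, one unit per link before discounts.

    Each source is a dict with 'live' (bool), 'peak_inlinks', and 'inlinks' (ints).
    Removed links contribute nothing; links on stale pages are discounted.
    """
    total = 0.0
    for src in sources:
        if not src["live"]:
            continue  # first direction: the link itself is gone
        stale = is_stale(src["peak_inlinks"], src["inlinks"])
        total += stale_discount if stale else 1.0  # second direction: the linking page went stale
    return total
```

Three links that would be worth 3.0 on paper drop to 1.2 when one disappears and another sits on a page that has shed most of its own inbound links.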
Freshness Weighting: Not All Links Age the Same
Most SEOs think about link building in terms of quantity: more links equals more authority. This patent introduces a variable most ignore — time decay.
"Each link may be weighted by a function that increases with the freshness of the link."
The patent goes further with a genuinely elegant observation:
"The date of appearance/change of the document containing a link may be a better indicator of the freshness of the link based on the theory that a good link may go unchanged when a document gets updated if it is still relevant and good."
Read that again. The patent is saying that if a page gets updated but the link to your site remains unchanged, the page's update date may be a better measure of that link's freshness than the date the link first appeared. The link has survived editorial review: someone updated the page and decided the link was still worth keeping. That persistence is a freshness signal, so the link gets treated as fresh because the containing page was recently updated.
This flips the common SEO narrative. Among the most valuable links are those that survive editorial updates — not because age alone matters, but because persistence through changes is itself a freshness signal. A link that's been in a Wikipedia article for eight years, through hundreds of edits, carries an implicit freshness advantage that no freshly built guest post can match.
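As a sketch of that idea, assume we know when a link first appeared, when its containing page was last edited, and that the link is still on the page. The link's effective date becomes the later of the two, and a decaying function turns that date into a weight. The function names, the exponential decay, and the one-year half-life are my placeholders; the patent only says the weight "increases with the freshness of the link."

```python
from datetime import date

def link_freshness_date(link_first_seen, page_last_updated):
    """Effective date for a link that is still on its page (hypothetical helper).

    If the page was edited after the link appeared, the link survived that edit,
    so it is dated by the edit rather than by its first appearance.
    """
    return max(link_first_seen, page_last_updated)

def freshness_weight(fresh_date, today, half_life_days=365):
    """A weight that increases with freshness; exponential decay is an assumption."""
    age_days = (today - fresh_date).days
    return 0.5 ** (age_days / half_life_days)
```

Under these assumptions, a link placed in 2017 on a page edited on June 1, 2024 gets the same weight on June 1, 2025 (0.5) as a link first placed on June 1, 2024.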
Domain-Level Signals: DNS, Name Servers, and Doorway Domains
This is where the patent gets genuinely surprising. Most SEOs associate US7346839B2 with link velocity. But a large section of the patent is dedicated to domain infrastructure monitoring — signals derived from DNS records, hosting history, and registration patterns that are independent of page content.
DNS Record Monitoring
"The DNS record for a domain may be monitored to predict whether a domain is legitimate. Search engine 125 may monitor whether physically correct address information exists over a period of time, whether contact information for the domain changes relatively often, whether there is a relatively high number of changes between different name servers and hosting companies."
Frequent registrar transfers, WHOIS changes, and hosting swaps all factor into a domain legitimacy assessment that operates independently of content quality. For practitioners who buy and sell domains, this isn't theoretical: it may explain why some domain acquisitions need a stabilization period before the content starts performing.
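Here's a minimal sketch of turning DNS and WHOIS history into a volatility measure, assuming periodic snapshots of registrar, name servers, and registrant contact. The DnsSnapshot structure and the changes-per-year metric are my assumptions; the patent lists what to watch, not how to score it.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DnsSnapshot:
    """One observation of a domain's registration and DNS state (hypothetical structure)."""
    observed: date
    registrar: str
    name_servers: frozenset
    contact: str

FIELDS = ("registrar", "name_servers", "contact")

def volatility(snapshots):
    """Changes per year in registrar, name servers, and contact details between consecutive snapshots."""
    snaps = sorted(snapshots, key=lambda s: s.observed)
    if len(snaps) < 2:
        return {field: 0.0 for field in FIELDS}
    years = max((snaps[-1].observed - snaps[0].observed).days, 1) / 365.25
    changes = {field: 0 for field in FIELDS}
    for prev, cur in zip(snaps, snaps[1:]):
        for field in FIELDS:
            if getattr(prev, field) != getattr(cur, field):
                changes[field] += 1
    return {field: count / years for field, count in changes.items()}
```

A domain that has kept the same registrar and name servers for years scores zero across the board; one that hops registrars annually stands out immediately.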
Name Server Profiling
"A 'good' name server may have a mix of different domains from different registrars and have a history of hosting those domains, while a 'bad' name server might host mainly pornography or doorway domains, domains with commercial words (a common indicator of spam), or primarily bulk domains from a single registrar, or might be brand new."
This is reputation by association. The other domains on your name server affect how Google evaluates your domain. If your name server's portfolio is a mixed bag of legitimate businesses — law firms, restaurants, tech companies — that's a positive indicator. If it's predominantly one registrar, commercial keyword domains, or recently created bulk registrations, that's a negative flag. Your DNS neighborhood matters.
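The patent's "good" versus "bad" name server description can be sketched as a few portfolio shares. Everything here is an assumption for illustration: the field names, the 90-day "new" cutoff, and the 0.8 flag limit. It also doesn't model the patent's separate point that a brand-new name server is itself a negative.

```python
from collections import Counter
from datetime import date

def name_server_profile(domains, today, new_days=90):
    """Summarize the portfolio hosted on one name server.

    Each domain is a dict with 'registrar' (str), 'created' (date), and 'commercial' (bool,
    True if the name is built from commercial keywords). All field names are hypothetical.
    """
    if not domains:
        raise ValueError("name server hosts no domains")
    n = len(domains)
    top_registrar_count = Counter(d["registrar"] for d in domains).most_common(1)[0][1]
    return {
        "top_registrar_share": top_registrar_count / n,
        "commercial_share": sum(d["commercial"] for d in domains) / n,
        "new_share": sum((today - d["created"]).days < new_days for d in domains) / n,
    }

def looks_bad(profile, limit=0.8):
    """Illustrative rule: flag portfolios dominated by one registrar, keyword domains, or fresh bulk registrations."""
    return any(share >= limit for share in profile.values())
```

A name server hosting five keyword domains registered last month through one registrar trips every share; one hosting a handful of long-lived domains from different registrars trips none.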
Doorway Domain Detection
"Individuals who attempt to deceive (spam) search engines often use throwaway or 'doorway' domains."
I've bought my fair share of expired and auction domains over the years. Here's what I've learned that this patent helps explain: there's a critical distinction between expired and auctioned. An auctioned domain changes hands, but it doesn't go through the full expiration cycle. An expired domain actually lapses: the registration drops, it sits in limbo, and then someone re-registers it. The success rate of bringing a truly expired domain back to life is dramatically lower than for an auctioned one, and the topic of the new site needs to be nearly one-to-one with the old site. With auction domains, you can stray further from the original topic, but the same principle applies: the more the new site resembles the old one, the more likely the historical equity survives.
Ranking Jump Detection
"A document that jumps in rankings across many queries might be a topical document or it could signal an attempt to spam search engine 125."
Again, the patent uses the same framework as link velocity — a sudden change could be legitimate or manipulative, and the system's job is to distinguish between the two. Like the CDC/SARS example for link spikes, a page that jumps across many queries during a major news event behaves differently from one that jumps because someone purchased a batch of exact-match anchor text links.
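A rough way to quantify "jumps in rankings across many queries" is the share of tracked queries where a page gained a large number of positions. The function name and the 10-position threshold are assumptions for illustration; under the patent's framing, a high share would only send the page to the same topical-versus-spam check applied to link spikes, not decide the outcome.

```python
def ranking_jump_share(before, after, min_gain=10):
    """Share of tracked queries where a document gained at least `min_gain` positions.

    before/after map query -> rank position (1 = top). Only queries present in both
    snapshots are compared. The threshold is a hypothetical choice.
    """
    common = before.keys() & after.keys()
    if not common:
        return 0.0
    jumps = sum(1 for q in common if before[q] - after[q] >= min_gain)
    return jumps / len(common)
```

A page that leaps on two of four tracked queries scores 0.5; a page that drifts up a position or two everywhere scores 0.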
I've seen clients receive a large spike of referring domains inside a 30-day window, followed by an immediate traffic drop, with no core update, no algorithm change, and nothing else happening. The only variable was a flood of backlinks. When I dug into those backlinks, the majority came from extremely questionable sources with keyword-rich anchor text. I ran surgical disavows against those linking domains for each affected site, and every one of those sites recovered.
What I'm seeing now, compared to pre-2022, is a shift in Google's response model. Rather than actively demoting for bad signals, Google increasingly seems to ignore them. What it punishes is the absence of good signals. Attributes corresponding to the mechanism this patent describes still exist in the API, but enforcement appears to have shifted from penalty-based to reward-based. Build good signals, and the bad ones matter less.
Historical Data SEO Implications: What This Means for Your Link Building
1. Velocity Alone Isn't the Trigger — Context Is
The patent's CDC example shows that Google distinguishes between topical spikes (viral content, product launches, PR campaigns) and manipulative ones. The patent itself mentions checking "news articles, discussion groups, etc." on the theory that spam documents won't be mentioned in the news. Modern Google likely extends this to a far broader set of corroborating signals. When something launches on Product Hunt and goes viral, Google can see the Product Hunt page mentioning your brand, the referral traffic, and the backlink spike all happening simultaneously. Cross-referencing entities is trivially easy for Google. Faking that constellation of signals — buying social signals, listicle placements, Facebook ad traffic to justify the links — requires more effort than building something that actually earns the attention.
2. Link Persistence > Link Volume
The freshness weighting mechanism rewards links that survive editorial updates over links that are simply new. A five-year-old link from a regularly updated resource page is likely worth more than five one-month-old links from sites nobody maintains. Build the kind of content that people keep linking to, not the kind that gets linked once and forgotten.
3. Domain Infrastructure Is an Independent Signal
Your DNS history, hosting stability, and name server neighborhood contribute to a domain legitimacy score that operates independently of your content quality. If you're acquiring domains, allow a stabilization period. Avoid rapid registrar changes. And check what else your hosting provider is hosting — your DNS neighborhood is part of your reputation.
4. Rotating Link Placements Are Devalued
Sponsored sidebar links that rotate monthly, blogroll exchanges that shuffle, and any other link placement that changes regularly get weighted lower than a consistent editorial link. If you're paying for link placement, a permanent editorial mention is worth more than any rotating sponsorship, even if the sponsorship generates a higher raw link count.
5. The Disavow Tool Still Works — When It Needs To
For older link profiles with pre-2022 toxic backlink spikes, surgical disavows still have a demonstrable effect. I've recovered every site where I identified and disavowed the velocity-triggering links. My read is that the historical velocity system described in this patent is what makes a disavow effective in these cases: the damage comes from identifiable, time-stamped link patterns rather than from modern ML-based evaluation.
US8577893B1 — "Ranking Based on Reference Contexts" — Anna Patterson and Paul Haahr's surrounding text patent. Where this patent watches when links change, that patent watches what surrounds them. Together, they form a temporal + contextual filter.
US9953049B1 — "Seed Distance PageRank" — defines how far your page sits from Google's trusted seed sites. This patent's velocity signals decide whether your links count; Seed Distance decides how much they're worth.
Google API Leak Cross-Reference: AnchorSpamInfo and RegistrationInfo
The 2024 Google API leak, first reported by Rand Fishkin and investigated by Mike King at iPullRank, exposed attribute names that map onto this patent unusually well. Of the patent's 8 mechanisms, 5 have direct thematic matches, 2 are extended by the API, and 1 has no match. This is one of the strongest patent-to-API correspondences in the entire leak:
| Patent Mechanism | API Attribute | Alignment |
|---|---|---|
| Link appearance/disappearance dates | firstseenDate / creationDate / deletionDate in AnchorsAnchor | ✅ DIRECT MATCH |
| "Legitimate docs attract links slowly" | phraseAnchorSpamDays / phraseAnchorSpamRate in AnchorSpamInfo | ✅ DIRECT MATCH |
| Link freshness weighting | firstseenNearCreation boolean in AnchorsAnchor | ✅ DIRECT MATCH |
| DNS record monitoring | RegistrationInfo.createdDate / expiredDate | ✅ DIRECT MATCH |
| Doorway domain detection | expired boolean per anchor | ✅ DIRECT MATCH |
| Ranking jump detection | Q* scoring system (quality rater integration) | 🔶 API EXTENDS |
| Rotating links detection | droppedRedundantAnchorCount | 🔶 API EXTENDS |
| Name server profile analysis | No direct match found | 📜 PATENT ONLY |
The most telling attribute is phraseAnchorSpamDays, which measures the number of days over which 80% of a page's anchor text accumulated. Combined with phraseAnchorSpamRate, it strongly corresponds to the velocity detection concept described in the patent. The API also reveals something the patent didn't anticipate: demotedStart and demotedEnd dates. Explicit start and end dates suggest velocity-based demotions are temporary, bounded events rather than permanent penalties.
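To make the 80% idea concrete, here's one possible reading as a Python sketch: given the first-seen dates of a page's anchors, find the shortest span of days that contains 80% of them. The function name and the shortest-window interpretation are my assumptions; the leak gives the attribute name and a one-line description, not the formula.

```python
import math
from datetime import date, timedelta

def days_for_fraction(first_seen_dates, fraction=0.8):
    """Length in days of the shortest window containing `fraction` of the anchors' first-seen dates.

    One hypothetical reading of phraseAnchorSpamDays; Google's actual computation is not public.
    """
    dates = sorted(first_seen_dates)
    if not dates:
        return 0
    k = math.ceil(fraction * len(dates))  # how many anchors the window must cover
    return min((dates[i + k - 1] - dates[i]).days + 1 for i in range(len(dates) - k + 1))
```

Ten anchors where eight land within four days produce a value of 4; ten anchors earned one every 30 days produce 211. Under this reading, a small number means a burst.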
The API leak provides attribute names and data types — not the actual scoring formulas. The patent provides the philosophy and the mechanisms. Together they form a strong evidentiary chain. Neither alone is proof; together, they're as close to proof as we get in SEO. The one gap — name server profiling — may operate in a separate pipeline that isn't reflected in the Content Warehouse API.
Citation Network
Patent Family Chain
US20050071741A1 (application, 2003) → US7346839B2 (this patent, 2008) → 6 continuation patents covering document scoring (US7797316B2), content freshness (US7840572B2, US8112426B2), anchor text analysis (US8051071B2), domain trust (US8316029B2), and historical content signals (US8082244B2).
Forward Citations (Key Patents Citing This One)
| Patent | Relevance |
|---|---|
| US9953049B1 | Seed Distance PageRank — combines temporal signals with trust distance calculation |
| US8577893B1 | Reference Contexts — adds surrounding text fingerprinting on top of temporal monitoring |
| US7716225B1 | Behavioral Link Weighting — uses ML to weight links by position, font size, and click probability |
Related Articles on This Site
- US11409748B1 (Passage Ranking) — where this patent evaluates the link graph over time, Passage Ranking evaluates the on-page heading structure. Together they represent the off-page + on-page scoring foundations.
- US8661029B1 (NavBoost) — NavBoost adds behavioral user signals on top of the temporal link signals from this patent. A link that users actually engage with (NavBoost) that is also temporally consistent (this patent) carries maximum weight. See also: How NavBoost Really Works.
- US10235423B2 (Entity Scoring) — Entity Scoring uses knowledge graph entities to validate the relationship between linking and linked pages. This patent watches when links change; Entity Scoring watches who those links connect.
- US9767157B2 (Panda) — Panda evaluates on-page content quality. This patent evaluates off-page link quality over time. A page can survive Panda and still fail this patent's velocity checks — and vice versa.
- Quality Scoring Ensemble — the ensemble system that combines signals from all quality scoring components. This patent's temporal link data feeds into the same quality aggregation pipeline that Panda, NavBoost, and Entity Scoring contribute to.
- US7603350B1 (Entity Trust) — This patent's trust decay mechanism has an entity-level counterpart in the Entity Trust patent. Trust relationships between people also decay over time — and sudden appearances in the entity trust graph may trigger the same temporal flags that Historical Data applies to links.
Historical Data: What Doesn't Matter as Much as SEOs Think
The nature of this patent is a philosophical bet that Matt Cutts and Jeffrey Dean placed in 2003: things that happen naturally leave different patterns than things that are engineered. Legitimate documents attract links slowly. Legitimate domains keep their registrar for years. Legitimate links survive page updates. The nature of this insight hasn't changed and won't change — because it describes how the web, as a human-created information ecosystem, actually behaves.
The flavor (specific velocity thresholds, the exact definition of a "doorway domain," the name server profiling criteria) was the 2003 approach. And honestly, so many new systems have been layered on top that the effects this patent originally aimed for have largely been superseded by more sophisticated mechanisms. SpamBrain is a neural network. This patent is if-then logic. The philosophy survived; the implementation has been absorbed into something far more complex.
That's where I genuinely push back on the SEO community's treatment of this patent. It gets cited constantly — usually the "legitimate documents attract backlinks slowly" line — as if it's the final word on link velocity. It isn't. I've bought expired auction domains. I've seen domain changes that the patent says should reset link equity but didn't. I've watched velocity that should trigger flags pass without any visible effect. The patent says a lot of things that reality doesn't exactly follow.
That's the point. Not that the patent is wrong: it describes a real system with real API correspondence. But it's a 2003 foundation with more than two decades of construction on top. Don't take patent interpretations at face value, whether they come from LinkedIn gurus, SEO bloggers, or me. Test them. Acquire an expired domain and rebuild the same site. Buy an auction domain and change the topic. Observe what happens in Search Console. Nothing beats lived experience, and 2003 is dinosaur-era Google.
Frequently Asked Questions
What does US7346839B2 actually do?
It describes a system for scoring web documents based on historical data — how links appear and disappear over time, how domains change registration, how quickly a page gains backlinks, and whether link patterns look natural or engineered. The patent covers link velocity, freshness weighting, DNS monitoring, and ranking jump detection across 56 claims.
Is the "link velocity penalty" real?
The patent describes a velocity monitoring signal, not an automatic penalty. A spike in backlinks could indicate topical relevance (the patent's CDC/SARS example) or manipulation, and the system distinguishes between the two using corroborating signals. The API leak is consistent with this: the phraseAnchorSpamDays and demotedStart / demotedEnd attributes suggest velocity-based demotions are temporary and bounded.
Does Google really monitor DNS records for SEO?
Yes. The patent explicitly describes monitoring domain registration dates, WHOIS contact changes, registrar transfers, and name server profiles. The API leak contains RegistrationInfo.createdDate and expiredDate attributes that correspond to this. Domain infrastructure is a legitimacy signal that operates independently of content quality.
Do expired domains lose their backlink value?
The patent says they should — it describes doorway domain detection and expired domain flagging. In practice, the answer is nuanced. Auction domains that don't fully expire and are rebuilt with similar content often retain significant link equity. Truly expired domains that are re-registered with completely different content have a much lower success rate. The patent describes the theory; execution varies.
What does the API leak confirm about this patent?
Five of eight patent mechanisms have direct thematic matches in the API — firstseenDate, phraseAnchorSpamDays, phraseAnchorSpamRate, firstseenNearCreation, RegistrationInfo, and the expired boolean. Two additional mechanisms (rotating links, ranking jumps) are extended by the API with attributes the patent didn't anticipate. Only name server profiling has no direct API match. These are attribute-name correspondences, not proof of implementation — we know the data fields exist, not the exact scoring formulas.
How does this patent interact with other Google ranking systems?
This patent provides temporal link signals that feed into the broader quality scoring pipeline alongside NavBoost (behavioral signals), Entity Scoring (knowledge graph signals), Panda (content quality), and Passage Ranking (heading structure). A link must pass multiple filters — temporal consistency, contextual relevance, behavioral engagement — to carry maximum weight.
Is this patent still relevant in 2026?
The philosophy is permanent: natural patterns differ from engineered ones. The specific implementation has been supplemented by more than two decades of newer systems, including SpamBrain (neural network-based link spam detection). The corresponding attributes were still present in the 2024 API leak, maintenance fees are paid through year 12, and 472 families cite this patent. It's foundational infrastructure: not the current frontline, but the bedrock everything else sits on.