Patent US7346839B2: How Google Uses Time Against Link Builders

The Historical Data Patent

On December 31, 2003, the last working day of the year, eight Google engineers including Matt Cutts and Jeffrey Dean filed a patent that would quietly define the next two decades of link building. Its premise is a single sentence buried in the patent's description: "A typical, 'legitimate' document attracts back links slowly." That line has been quoted, misquoted, and misapplied across ten thousand SEO blog posts. What most of them miss is everything around it: 56 claims and a description that covers how Google monitors DNS records, detects rotating links, flags expired domains, and distinguishes a CDC outbreak spike from a purchased link campaign. This is that patent.

If you've read my On-Page SEO guide, this patent sits on the other side of the equation: not what's on the page, but what points to it and how that changes over time. It is one of the foundational off-page patents, and the 2024 Google API leak contains attributes that strongly correspond to most of its major mechanisms (name server profiling is the one exception).

The Honest Hedge

Every analysis has a threshold where certainty ends and inference begins. Here's where that line falls for this patent:

What We Know (From the Patent)

Google monitors link velocity: when links appear, disappear, and at what rate. The patent explicitly describes distinguishing topical spikes from manipulative ones (the CDC/SARS example). DNS records, name server profiles, and domain registration patterns are all domain legitimacy factors. Links from updated pages that survive editorial review carry an implicit freshness advantage. Rotating links are devalued. These all come directly from the patent text, and most have corresponding attributes in the 2024 API leak.

What We Infer

The phraseAnchorSpamDays attribute strongly corresponds to the velocity check; it appears to measure the number of days it takes for 80% of a page's anchors to accumulate. The demotedStart / demotedEnd attributes suggest velocity demotions are time-bounded events, not permanent ones. The droppedRedundantAnchorCount attribute likely corresponds to the rotating links detection. And the RegistrationInfo module suggests domain-level monitoring is active in production. These are API attribute names, not scoring formulas: we know the inputs exist but not the exact weights.

What We Don't Know

The exact velocity thresholds that trigger a flag. The specific "good" vs. "bad" name server criteria in 2026 versus 2003. Whether expired domain detection still fully resets link equity or whether it's been softened over time. The interaction weights between velocity, freshness, and domain trust signals. And critically — how much of this 2003 patent has been supplemented or overridden by newer systems like SpamBrain. The patent describes the foundation; how much of that foundation is still load-bearing versus decorative is genuinely unknown.



Patent Metadata

📄 US7346839B2 — Information Retrieval Based on Historical Data

Patent Number: US 7,346,839 B2
Common Name: The Historical Data Patent / "Link Velocity Patent"
Official Title: Information retrieval based on historical data
Inventors: Anurag Acharya, Matt Cutts, Jeffrey Dean, Paul Haahr, Taher H. Haveliwala, Glen Jeh, Sepandar D. Kamvar, and others
Assignee: Google LLC (originally Google Inc.)
Filed: December 31, 2003
Granted: March 18, 2008
Status: Active (maintenance fees paid through year 12)
Patent Family Chain: US20050071741A1 (application) → US7346839B2 (this patent) → 6 continuation patents (US7797316B2, US7840572B2, US8112426B2, US8051071B2, US8316029B2, US8082244B2)
Forward Citations: 472 citing families, one of the most-cited search patents ever filed
Claims: 56 claims across document scoring, link analysis, domain monitoring, and content freshness
Classification: G06F 16/951 (Information retrieval; Web crawling techniques)
[Figure: screenshot of the Google Patents record for US7346839B2, "Information retrieval based on historical data", showing the abstract, inventors, assignee Google LLC, filing date December 31, 2003, grant date March 18, 2008, and Active status.]
Direct capture from Google Patents: the official patent record for US7346839B2, exactly as filed with the USPTO. Note the 8 inventors including Matt Cutts and Jeffrey Dean, and the filing date of December 31, 2003.

Eight inventors. 56 claims. 472 citing families. Filed the last day of 2003, this patent is older than YouTube, Chrome, and the iPhone, yet I still see its fingerprints in client data every month. The inventors aren't just any Google engineers: Matt Cutts was head of Google's webspam team for a decade. Jeffrey Dean is the architect behind TensorFlow and core Google infrastructure systems such as MapReduce and Bigtable. Paul Haahr was the VP of Search Quality who later testified in the DOJ antitrust trial. When these three co-author a patent, it becomes infrastructure.


What This Patent Does (Plain English)

Most search patents tackle one thing. This one tackles everything that changes over time, which is almost everything. At its core, it describes a system that monitors and scores documents based on their history. Not what the page says today, but how the links pointing to it have changed, how the domain has changed, and whether those patterns look natural.

Here's what the system does:

  1. Monitors when links appear and disappear — tracking the dates and the rate of change
  2. Weights links by freshness — newer links from recently updated pages carry more signal than stale ones
  3. Detects velocity anomalies — distinguishing between a topical spike (CDC during an outbreak) and a purchased link campaign
  4. Catches rotating links — sites that swap out their "featured link" daily get weighted lower than consistent editorial links
  5. Monitors DNS records — frequent registrar changes, name server swaps, and bulk registrations flag a domain as potentially illegitimate
  6. Detects expired/doorway domains — throwaway domains used for spam get identified by their registration patterns
  7. Flags ranking jumps — a document that suddenly ranks across many queries could be topical or an attempt to spam
[Figure: FIG. 3 from US7346839B2, the search engine functional block diagram. SEARCH ENGINE 125 contains DOCUMENT LOCATOR 310, HISTORY COMPONENT 320, and RANKING COMPONENT 330, all connected to DOCUMENT CORPUS 340.]
FIG. 3 from US7346839B2: the search engine architecture. The History Component (320) sits between the Document Locator and the Ranking Component, intercepting document data before scores are generated.

Wait. Let me translate that to human.

[Figure: the Historical Data scoring pipeline. Four parallel input signals (Link Velocity, Freshness Decay, DNS History, and Domain Registration) feed into a central Historical Data Scoring Engine, which produces either a ranking boost or a ranking discount.]
The Historical Data scoring pipeline. Four temporal signal channels converge in a single scoring engine that adjusts rankings up or down based on the historical patterns detected. This is a simplified representation of the system described in US7346839B2.

The phrase that launched a thousand SEO blog posts comes from this patent:

DIRECT PATENT QUOTE

"A typical, 'legitimate' document attracts back links slowly. A large spike in the quantity of back links may signal a topical phenomenon (e.g., the CDC web site may develop many links quickly after an outbreak, such as SARS), or signal attempts to spam a search engine by exchanging links, purchasing links, or gaining links from documents without editorial discretion."

This is important: the patent explicitly distinguishes between spikes that are topical and spikes that are manipulative. It doesn't say "all fast link growth is bad." It says the system monitors the rate and decides.

The patent describes three specific aspects of link velocity monitoring:

1. Link Appearance and Disappearance

PATENT MECHANISM

"Search engine 125 may then monitor the time-varying behavior of links to the document, such as when links appear or disappear, the rate at which links appear or disappear over time."

Every link gets a timestamp. Google tracks not just when links appear, but when they disappear, and it draws conclusions from both signals. A page that steadily loses inbound links over time may be treated as stale.
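
Here's a minimal sketch of what that bookkeeping could look like. It's my own illustration, not Google's code: the `LinkRecord` shape and the `monthly_link_changes` function are hypothetical names, and the only assumption is that each link carries a first-seen date and, once it's gone, a last-seen date.

```python
from collections import Counter
from datetime import date
from typing import Optional

# Hypothetical record for one inbound link: (first_seen, last_seen).
# last_seen stays None while the link is still live.
LinkRecord = tuple[date, Optional[date]]


def monthly_link_changes(links: list[LinkRecord]) -> dict[str, tuple[int, int]]:
    """Return {"YYYY-MM": (links_gained, links_lost)} from link timestamps."""
    gained = Counter()
    lost = Counter()
    for first_seen, last_seen in links:
        gained[first_seen.strftime("%Y-%m")] += 1
        if last_seen is not None:
            lost[last_seen.strftime("%Y-%m")] += 1
    # Report every month in which anything happened, in calendar order.
    months = sorted(set(gained) | set(lost))
    return {month: (gained[month], lost[month]) for month in months}
```

A run of months where the second number keeps outpacing the first is the "steadily losing links" pattern the patent associates with staleness.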

2. Rotating Links Detection

PATENT MECHANISM

"Search engine 125 may weight documents that have a different featured link each day, despite having a very fresh link, differently (e.g., lower) than documents that are consistently updated and consistently link to a given target document."

This one is subtle. If a page changes which sites it links to on a daily or weekly rotation, the freshness of each individual link doesn't save it — the rotation pattern itself is a negative signal. This catches paid link placements that rotate sponsors, blogroll-for-sale schemes, and any arrangement where links are temporary by design.

I've seen this firsthand. When I was early in my career and didn't know better, I bought 12 PBN backlinks with a variety of keyword-rich anchor text. The provider fed them over about 21 days. By the time the last link was live, the page that was sitting at position 9 started dropping. I built a couple of legitimate guest posts, and it eventually stabilized around position 21. I got hit on both velocity of link acquisition and anchor text velocity simultaneously. That experience taught me something this patent makes explicit: the system doesn't look at any single signal in isolation.
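
To make the rotating-link pattern itself concrete, here's a rough sketch. It assumes you have a series of crawl snapshots of one page, each reduced to the set of domains it links out to. The function names and both thresholds are hypothetical; the patent gives no numbers.

```python
def link_persistence(snapshots: list[set[str]]) -> dict[str, float]:
    """Share of crawl snapshots in which each outbound link target appears."""
    if not snapshots:
        return {}
    counts: dict[str, int] = {}
    for targets in snapshots:
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return {target: seen / len(snapshots) for target, seen in counts.items()}


def looks_rotating(snapshots: list[set[str]], min_persistence: float = 0.5) -> bool:
    """True when most link targets show up in fewer than min_persistence of snapshots.

    Both the 0.5 persistence cut-off and the "most targets" rule are
    illustrative assumptions, not values from the patent.
    """
    persistence = link_persistence(snapshots)
    if not persistence:
        return False
    short_lived = sum(1 for share in persistence.values() if share < min_persistence)
    return short_lived / len(persistence) > 0.5
```

A page that features a different sponsor in every snapshot trips the check; a resource page that links to the same two sites in every crawl does not.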

3. Link Staleness and Disappearance

PATENT MECHANISM

"The disappearance of many links can mean that the document to which these links point is stale. Once a document has been determined to be stale, the links contained in that document may be discounted or ignored."

This creates a cascade effect. When many of the links pointing to a page disappear, the page itself may be judged stale, and once it is, the links it contains can be discounted or ignored. The damage runs in two directions: the page loses equity from the links that vanished, and every page it links to gets less value from a source that is now considered stale.
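
A tiny sketch of that rule, under stated assumptions: the patent says stale documents' links "may be discounted or ignored" but gives no threshold, so the 50% loss ratio and the 0.25 discount below are placeholders I picked for illustration, and `outbound_link_multiplier` is a hypothetical name.

```python
def outbound_link_multiplier(
    inbound_peak: int,
    inbound_now: int,
    stale_loss_ratio: float = 0.5,
    stale_discount: float = 0.25,
) -> float:
    """Multiplier applied to the links a document contains.

    Returns 1.0 for a document that kept most of its inbound links and
    stale_discount for one that lost at least stale_loss_ratio of them.
    """
    if inbound_peak <= 0:
        return 1.0  # no link history to judge staleness from
    lost_share = (inbound_peak - inbound_now) / inbound_peak
    return stale_discount if lost_share >= stale_loss_ratio else 1.0
```

Setting `stale_discount` to 0.0 gives the "ignored" variant instead of the "discounted" one.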

[Figure: timeline comparing a natural link growth pattern (a gradual upward curve over 12 months) with a velocity spike pattern in which 80 percent of anchor text accumulates in 30 days, triggering phraseAnchorSpamDays.]
Natural link growth vs. velocity spikes. The left pattern (steady, organic) is what US7346839B2 considers legitimate. The right pattern corresponds to the phraseAnchorSpamDays attribute found in the API leak.

Freshness Weighting: Not All Links Age the Same

Most SEOs think about link building in terms of quantity: more links equals more authority. This patent introduces a variable most ignore — time decay.

FRESHNESS FUNCTION

"Each link may be weighted by a function that increases with the freshness of the link."

The patent goes further with a genuinely elegant observation:

PERSISTENCE SIGNAL

"The date of appearance/change of the document containing a link may be a better indicator of the freshness of the link based on the theory that a good link may go unchanged when a document gets updated if it is still relevant and good."

Read that again. The patent is saying: if a page gets updated but the link to your site remains unchanged, that's a stronger signal than a newly placed link. The link has survived editorial review. Someone updated the page and decided the link was still worth keeping. That persistence is a freshness signal — the link gets treated as fresh because the containing page was recently updated.

This flips the common SEO narrative. Among the most valuable links are those that survive editorial updates — not because age alone matters, but because persistence through changes is itself a freshness signal. A link that's been in a Wikipedia article for eight years, through hundreds of edits, carries an implicit freshness advantage that no freshly built guest post can match.
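
The patent never states the freshness function, only that it "increases with the freshness of the link." Here's one way it could look, as a sketch: I'm assuming exponential decay with a one-year half-life, and I'm measuring freshness from the later of the link's first-seen date and the containing page's last update. Every name and number here is mine.

```python
from datetime import date


def freshness_weight(
    link_first_seen: date,
    page_last_updated: date,
    today: date,
    half_life_days: float = 365.0,
) -> float:
    """Weight in (0, 1] that increases with the freshness of the link.

    Freshness is taken from the later of the two dates, so a link that
    survives an update to its containing page is treated as fresh again.
    """
    reference = max(link_first_seen, page_last_updated)
    age_days = max((today - reference).days, 0)
    return 0.5 ** (age_days / half_life_days)
```

With these assumptions, an eight-year-old link on a page edited today scores 1.0, while a link of the same age on a page nobody has touched since scores under 0.01.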

[Figure: freshness decay with persistence bonus. A persistent link gains trust at each editorial review, while a stale link decays steadily toward zero weight over time.]
Freshness decay: persistence beats novelty. A link that survives page updates rebounds in weight (gold line), while a link on an abandoned source page decays toward irrelevance (dashed line).

Domain-Level Signals: DNS, Name Servers, and Doorway Domains

This is where the patent gets genuinely surprising. Most SEOs associate US7346839B2 with link velocity. But a large section of the patent is dedicated to domain infrastructure monitoring — signals derived from DNS records, hosting history, and registration patterns that are independent of page content.

DNS Record Monitoring

DIRECT PATENT QUOTE

"The DNS record for a domain may be monitored to predict whether a domain is legitimate. Search engine 125 may monitor whether physically correct address information exists over a period of time, whether contact information for the domain changes relatively often, whether there is a relatively high number of changes between different name servers and hosting companies."

Frequent registrar transfers, WHOIS changes, and hosting swaps all factor into a domain legitimacy assessment that operates independently of content quality. For practitioners who buy and sell domains, this isn't theoretical — it's the reason why some domain acquisitions require a stabilization period before the content starts performing.
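
As a sketch of the monitoring the quote describes, here's how you could count infrastructure churn from periodic snapshots of a domain's records. The `Snapshot` layout, the function names, and the threshold of two changes are all hypothetical; the patent only says "relatively often" and "relatively high."

```python
from datetime import date

# Hypothetical snapshot of a domain's public records on one crawl date:
# (observed_on, name_server, hosting_company, contact)
Snapshot = tuple[date, str, str, str]


def count_changes(history: list[Snapshot]) -> dict[str, int]:
    """Count how often each monitored field changed between consecutive snapshots."""
    ordered = sorted(history, key=lambda snap: snap[0])
    changes = {"name_server": 0, "hosting_company": 0, "contact": 0}
    for prev, curr in zip(ordered, ordered[1:]):
        if prev[1] != curr[1]:
            changes["name_server"] += 1
        if prev[2] != curr[2]:
            changes["hosting_company"] += 1
        if prev[3] != curr[3]:
            changes["contact"] += 1
    return changes


def is_unstable(history: list[Snapshot], max_infra_changes: int = 2) -> bool:
    """True when name server plus hosting changes exceed an assumed limit."""
    changes = count_changes(history)
    return changes["name_server"] + changes["hosting_company"] > max_infra_changes
```

This is also a cheap self-audit before buying a domain: pull its historical records, run the count, and see how noisy the history looks.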

Name Server Profiling

PATENT MECHANISM

"A 'good' name server may have a mix of different domains from different registrars and have a history of hosting those domains, while a 'bad' name server might host mainly pornography or doorway domains, domains with commercial words (a common indicator of spam), or primarily bulk domains from a single registrar, or might be brand new."

This is reputation by association. The other domains on your name server affect how Google evaluates your domain. If your name server's portfolio is a mixed bag of legitimate businesses — law firms, restaurants, tech companies — that's a positive indicator. If it's predominantly one registrar, commercial keyword domains, or recently created bulk registrations, that's a negative flag. Your DNS neighborhood matters.
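
One of the patent's "bad" name server traits, "primarily bulk domains from a single registrar," is easy to sketch. The code below only covers that one trait; content type, commercial keywords, and name server age are left out. The function names and the 0.8 cut-off are my assumptions.

```python
from collections import Counter


def registrar_concentration(domains: list[tuple[str, str]]) -> float:
    """Share of a name server's domains held at its most common registrar.

    Each entry is a hypothetical (domain_name, registrar) pair.
    """
    if not domains:
        return 0.0
    by_registrar = Counter(registrar for _, registrar in domains)
    top_count = by_registrar.most_common(1)[0][1]
    return top_count / len(domains)


def looks_like_bulk_name_server(
    domains: list[tuple[str, str]], max_share: float = 0.8
) -> bool:
    """True when one registrar accounts for more than max_share of the domains."""
    return registrar_concentration(domains) > max_share
```

A name server with a healthy spread of registrars lands well under the cut-off; one stuffed with a single registrar's bulk registrations scores 1.0.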

Doorway Domain Detection

PATENT MECHANISM

"Individuals who attempt to deceive (spam) search engines often use throwaway or 'doorway' domains."

I've bought my fair share of expired auction domains over the years. Here's what I've learned that this patent helps explain: there's a critical distinction between expired and auctioned. An auctioned domain changes hands, but it doesn't go through the full expiration cycle. An expired domain actually lapses — the registration drops, it sits in limbo, and then someone re-registers it. The success rate of bringing a truly expired domain back to life is dramatically lower than an auctioned one. The topic of the new site needs to be nearly one-to-one with the old site. With auction domains, you can get further from the original topic, but the same principle applies: the more the new site resembles the old one, the more likely the historical equity survives.

[Figure: Google's domain legitimacy assessment pipeline. Three inputs (DNS Records, Name Server Profile, and Registration History) feed into a Domain Legitimacy Score that produces either a Legitimate or a Flagged result.]
The domain legitimacy assessment pipeline. Google evaluates your domain's infrastructure (DNS stability, name server neighborhood, and registration history) as part of its historical scoring, independent of content quality.

Ranking Jump Detection

PATENT MECHANISM

"A document that jumps in rankings across many queries might be a topical document or it could signal an attempt to spam search engine 125."

Again, the patent uses the same framework as link velocity — a sudden change could be legitimate or manipulative, and the system's job is to distinguish between the two. Like the CDC/SARS example for link spikes, a page that jumps across many queries during a major news event behaves differently from one that jumps because someone purchased a batch of exact-match anchor text links.
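
Detecting the jump is the easy half, and you can approximate it from your own rank tracking data. This sketch assumes two rank snapshots keyed by query (lower number means a better position). The 10-position gain and the 50% share are placeholder thresholds, and it says nothing about whether a jump is topical or spam; that still needs corroborating signals.

```python
def jumped_queries(
    before: dict[str, int], after: dict[str, int], min_gain: int = 10
) -> list[str]:
    """Queries where the document moved up by at least min_gain positions."""
    shared = before.keys() & after.keys()
    return sorted(q for q in shared if before[q] - after[q] >= min_gain)


def is_broad_jump(
    before: dict[str, int],
    after: dict[str, int],
    min_gain: int = 10,
    min_share: float = 0.5,
) -> bool:
    """True when at least min_share of the tracked queries jumped together."""
    shared = before.keys() & after.keys()
    if not shared:
        return False
    return len(jumped_queries(before, after, min_gain)) / len(shared) >= min_share
```

For example, a page that goes from positions 40, 35, and 12 to positions 8, 9, and 11 jumped on two of three queries, which counts as a broad jump under these assumptions.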

I've seen clients receive a large spike of referring domains inside a 30-day window, and the traffic immediately dropped — no core update, no algorithm change, nothing else happened. The only variable was a flood of backlinks. When I dug deep into those backlinks, the majority were from extremely questionable sources with rich anchor text. I performed surgical disavows on every one of those properties, and all of them recovered without exception.

PRACTITIONER NOTE

What I'm seeing now, compared to pre-2022, is a shift in Google's response model. Rather than actively demoting for bad signals, Google increasingly ignores them. What it punishes is the absence of good signals. The mechanism this patent describes still exists in the API, but the enforcement has evolved from penalty-based to reward-based. Build good signals, and the bad ones matter less.


Historical Data SEO Implications: What This Means for Your Link Building

1. Velocity Alone Isn't the Trigger — Context Is

The patent's CDC example shows that Google distinguishes between topical spikes (viral content, product launches, PR campaigns) and manipulative ones. The patent itself mentions checking "news articles, discussion groups, etc." on the theory that spam documents won't be mentioned in the news. Modern Google likely extends this to a far broader set of corroborating signals. When something launches on Product Hunt and goes viral, Google can see the Product Hunt page mentioning your brand, the referral traffic, and the backlink spike all happening simultaneously. Cross-referencing entities is trivially easy for Google. Faking that constellation of signals — buying social signals, listicle placements, Facebook ad traffic to justify the links — requires more effort than building something that actually earns the attention.
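
Here's a toy version of that corroboration logic. It is only a sketch: the patent names news mentions as one possible check, but the spike factor, the mention threshold, and the three labels are my own illustrative choices.

```python
def classify_spike(
    new_links: int,
    typical_monthly_links: float,
    news_mentions: int,
    spike_factor: float = 5.0,
    min_mentions: int = 3,
) -> str:
    """Label a month of link growth as "normal", "topical", or "suspicious".

    A spike is any month with at least spike_factor times the usual link
    count. A spike with enough independent news mentions is treated as
    topical; a spike with none is treated as suspicious.
    """
    baseline = max(typical_monthly_links, 1.0)  # avoid a zero baseline
    if new_links < spike_factor * baseline:
        return "normal"
    return "topical" if news_mentions >= min_mentions else "suspicious"
```

The same 500-link month gets two different labels depending on whether anyone outside the link graph noticed the page, which is the whole point of the CDC example.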

2. Link Persistence > Link Volume

The freshness weighting mechanism rewards links that survive editorial updates over links that are simply new. A five-year-old link from a regularly updated resource page is worth more than five one-month-old links from sites nobody maintains. Build the kind of content that people keep linking to — not the kind that gets linked once and forgotten.

3. Domain Infrastructure Is an Independent Signal

Your DNS history, hosting stability, and name server neighborhood contribute to a domain legitimacy score that operates independently of your content quality. If you're acquiring domains, allow a stabilization period. Avoid rapid registrar changes. And check what else your hosting provider is hosting — your DNS neighborhood is part of your reputation.

4. Rotating Link Placements Are Devalued

Sponsored sidebar links that rotate monthly, blogroll exchanges that shuffle, and any other link placement that changes regularly all get weighted lower than a consistent editorial link. If you're paying for link placement, a permanent editorial mention is worth more than any rotating sponsorship, even if the sponsorship produces a higher raw link count.

5. The Disavow Tool Still Works — When It Needs To

For older link profiles with pre-2022 toxic backlink spikes, surgical disavows still have a demonstrable effect. I've recovered every site where I identified and disavowed the velocity-triggering links. The historical velocity system described in this patent creates the conditions that make disavow effective — when the damage is from identifiable, time-stamped link patterns rather than from modern ML-based evaluation.

Related Patents

US8577893B1 — "Ranking Based on Reference Contexts" — Anna Patterson and Paul Haahr's surrounding text patent. Where this patent watches when links change, that patent watches what surrounds them. Together, they form a temporal + contextual filter.

US9953049B1 — "Seed Distance PageRank" — defines how far your page sits from Google's trusted seed sites. This patent's velocity signals decide whether your links count; Seed Distance decides how much they're worth.


Google API Leak Cross-Reference: AnchorSpamInfo and RegistrationInfo

The 2024 Google API leak, first reported by Rand Fishkin and investigated by Mike King at iPullRank, contains attributes that map onto the 8 mechanisms in this patent: 5 direct thematic matches, 2 extensions, and 1 gap. This is one of the strongest patent-to-API correspondences in the entire leak:

Patent Mechanism | API Attribute | Alignment
Link appearance/disappearance dates | firstseenDate / creationDate / deletionDate in AnchorsAnchor | ✅ CONFIRMED
"Legitimate docs attract links slowly" | phraseAnchorSpamDays / phraseAnchorSpamRate in AnchorSpamInfo | ✅ CONFIRMED
Link freshness weighting | firstseenNearCreation boolean in AnchorsAnchor | ✅ CONFIRMED
DNS record monitoring | RegistrationInfo.createdDate / expiredDate | ✅ CONFIRMED
Doorway domain detection | expired boolean per anchor | ✅ CONFIRMED
Ranking jump detection | Q* scoring system (quality rater integration) | 🔶 API EXTENDS
Rotating links detection | droppedRedundantAnchorCount | 🔶 API EXTENDS
Name server profile analysis | No direct match found | 📜 PATENT ONLY

The most telling attribute is phraseAnchorSpamDays, which measures the number of days for 80% of a page's anchor text to accumulate. Combined with phraseAnchorSpamRate, this strongly corresponds to the velocity detection concept described in the patent. And the API reveals something the patent didn't anticipate: demotedStart and demotedEnd dates. If velocity-based demotions carry explicit start and end dates, they are temporary, bounded events rather than permanent penalties.
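
You can compute the same kind of number for your own pages from a backlink export. This is a sketch of the idea, not the leaked attribute's actual formula: `days_to_accumulate` is a hypothetical function that takes the first-seen date of each anchor and returns how many days it took for 80% of them to show up.

```python
import math
from datetime import date


def days_to_accumulate(first_seen_dates: list[date], share: float = 0.8) -> int:
    """Days from the earliest anchor until `share` of all anchors had been seen."""
    if not first_seen_dates:
        raise ValueError("need at least one anchor date")
    ordered = sorted(first_seen_dates)
    # Number of anchors that make up the requested share, rounded up.
    needed = math.ceil(share * len(ordered))
    return (ordered[needed - 1] - ordered[0]).days
```

A profile where eight of ten anchors land inside three weeks returns a small number; the same ten anchors spread evenly over a year return a large one. Where Google draws the line between the two is exactly what we don't know.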

[Figure: six patent mechanisms from 2003 mapped to their 2024 API leak attributes: Link Timestamps to firstseenDate, Velocity Detection to phraseAnchorSpamDays, Freshness Weighting to firstseenNearCreation, DNS Monitoring to RegistrationInfo, Expired Domains to the expired boolean, and Rotating Links to droppedRedundantAnchorCount.]
Patent → API: 21 years from filing to correspondence. Five of eight patent mechanisms have direct thematic matches in the 2024 Google API leak, with two more extended by attributes the patent didn't anticipate.

Inference vs. Confirmation

The API leak provides attribute names and data types — not the actual scoring formulas. The patent provides the philosophy and the mechanisms. Together they form a strong evidentiary chain. Neither alone is proof; together, they're as close to proof as we get in SEO. The one gap — name server profiling — may operate in a separate pipeline that isn't reflected in the Content Warehouse API.


Citation Network

Patent Family Chain

US20050071741A1 (application, 2003) → US7346839B2 (this patent, 2008) → 6 continuation patents covering document scoring (US7797316B2), content freshness (US7840572B2, US8112426B2), anchor text analysis (US8051071B2), domain trust (US8316029B2), and historical content signals (US8082244B2).

Forward Citations (Key Patents Citing This One)

Patent | Relevance
US9953049B1 | Seed Distance PageRank: combines temporal signals with trust distance calculation
US8577893B1 | Reference Contexts: adds surrounding text fingerprinting on top of temporal monitoring
US7716225B1 | Behavioral Link Weighting: uses ML to weight links by position, font size, and click probability

Related Articles on This Site

  • US11409748B1 (Passage Ranking) — where this patent evaluates the link graph over time, Passage Ranking evaluates the on-page heading structure. Together they represent the off-page + on-page scoring foundations.
  • US8661029B1 (NavBoost) — NavBoost adds behavioral user signals on top of the temporal link signals from this patent. A link that users actually engage with (NavBoost) that is also temporally consistent (this patent) carries maximum weight. See also: How NavBoost Really Works.
  • US10235423B2 (Entity Scoring) — Entity Scoring uses knowledge graph entities to validate the relationship between linking and linked pages. This patent watches when links change; Entity Scoring watches who those links connect.
  • US9767157B2 (Panda) — Panda evaluates on-page content quality. This patent evaluates off-page link quality over time. A page can survive Panda and still fail this patent's velocity checks — and vice versa.
  • Quality Scoring Ensemble — the ensemble system that combines signals from all quality scoring components. This patent's temporal link data feeds into the same quality aggregation pipeline that Panda, NavBoost, and Entity Scoring contribute to.
  • US7603350B1 (Entity Trust) — This patent's trust decay mechanism has an entity-level counterpart in the Entity Trust patent. Trust relationships between people also decay over time — and sudden appearances in the entity trust graph may trigger the same temporal flags that Historical Data applies to links.

Historical Data: What Doesn't Matter as Much as SEOs Think

The nature of this patent is a philosophical bet that Matt Cutts and Jeffrey Dean placed in 2003: things that happen naturally leave different patterns than things that are engineered. Legitimate documents attract links slowly. Legitimate domains keep their registrar for years. Legitimate links survive page updates. The nature of this insight hasn't changed and won't change — because it describes how the web, as a human-created information ecosystem, actually behaves.

The flavor (specific velocity thresholds, the exact definition of a "doorway domain," the name server profiling criteria) was the 2003 approach. And honestly, so many newer systems have been layered on top that much of what this patent originally set out to do is now handled by more sophisticated mechanisms. SpamBrain is a neural network. This patent is if-then logic. The philosophy survived; the implementation has been absorbed into something far more complex.

That's where I genuinely push back on the SEO community's treatment of this patent. It gets cited constantly — usually the "legitimate documents attract backlinks slowly" line — as if it's the final word on link velocity. It isn't. I've bought expired auction domains. I've seen domain changes that the patent says should reset link equity but didn't. I've watched velocity that should trigger flags pass without any visible effect. The patent says a lot of things that reality doesn't exactly follow.

That's the point. Not that the patent is wrong — it describes a real system with real API confirmation. But it's a 2003 foundation with 22 years of construction on top. Don't just take patent interpretations — from LinkedIn gurus, from SEO bloggers, from me — at face value. Test it. Acquire an expired domain and rebuild the same site. Buy an auction domain and change the topic. Observe what happens in Search Console. Nothing beats lived experience, and 2003 is dinosaur-era Google.


Frequently Asked Questions

What does US7346839B2 actually do?

It describes a system for scoring web documents based on historical data — how links appear and disappear over time, how domains change registration, how quickly a page gains backlinks, and whether link patterns look natural or engineered. The patent covers link velocity, freshness weighting, DNS monitoring, and ranking jump detection across 56 claims.

Is the "link velocity penalty" real?

The patent describes a velocity monitoring signal, not an automatic penalty. A spike in backlinks could indicate topical relevance (the patent's CDC/SARS example) or manipulation, and the system distinguishes between the two using corroborating signals. The API leak is consistent with this: the phraseAnchorSpamDays and demotedStart / demotedEnd attributes suggest velocity-based demotions are temporary and bounded.

Does Google really monitor DNS records for SEO?

According to the patent, yes. It explicitly describes monitoring domain registration dates, WHOIS contact changes, registrar transfers, and name server profiles. The API leak contains RegistrationInfo.createdDate and expiredDate attributes that correspond to this. Domain infrastructure is a legitimacy signal that operates independently of content quality.

Do expired domains lose their backlink value?

The patent says they should — it describes doorway domain detection and expired domain flagging. In practice, the answer is nuanced. Auction domains that don't fully expire and are rebuilt with similar content often retain significant link equity. Truly expired domains that are re-registered with completely different content have a much lower success rate. The patent describes the theory; execution varies.

What does the API leak confirm about this patent?

Five of eight patent mechanisms have direct thematic matches in the API — firstseenDate, phraseAnchorSpamDays, phraseAnchorSpamRate, firstseenNearCreation, RegistrationInfo, and the expired boolean. Two additional mechanisms (rotating links, ranking jumps) are extended by the API with attributes the patent didn't anticipate. Only name server profiling has no direct API match. These are attribute-name correspondences, not proof of implementation — we know the data fields exist, not the exact scoring formulas.

How does this patent interact with other Google ranking systems?

This patent provides temporal link signals that feed into the broader quality scoring pipeline alongside NavBoost (behavioral signals), Entity Scoring (knowledge graph signals), Panda (content quality), and Passage Ranking (heading structure). A link must pass multiple filters — temporal consistency, contextual relevance, behavioral engagement — to carry maximum weight.

Is this patent still relevant in 2026?

The philosophy is permanent — natural patterns differ from engineered ones. The specific implementation has been supplemented by 22 years of newer systems, including SpamBrain (neural network-based link spam detection). The API attributes are still active, maintenance fees are paid through year 12, and 472 families cite this patent. It's foundational infrastructure — not the current frontline, but the bedrock everything else sits on.