For Google, Everything Is a Popularity Contest

Uladzik Kryhin/Shutterstock.com

The limits of the search giant's philosophy

When I saw that Google had introduced a “Classic Papers” section of Google Scholar, its search tool for academic journals, I couldn’t help but stroke my chin professorially. What would make a paper a classic, especially for the search giant? In a blog post introducing the feature, Google software engineer Sean Henderson explains the company’s rationale. While some articles gain temporary attention for a new and surprising finding or discovery, others “have stood the test of time,” as Henderson puts it.

How to measure that longevity? Classic Papers selects papers published in 2006, in a wide range of disciplines, which had earned the most citations as of this year. To become a classic, according to Google, is just to have been the most popular over the decade during which Google itself rose to prominence.

It might seem like an unimportant, pedantic gripe to people outside of academia. But Scholar’s classic papers offers a window into how Google conceives of knowledge—and the effect that theory has on the ideas people find with its services.

* * *

Google’s original mission is to “organize the world’s information and make it universally accessible.” It sounds simple enough, if challenging given the quantity of information in the world and the number of people that might access it. But that mission obscures certain questions. What counts as information? By what means is it accessible, and on whose terms?

The universals quickly decay into contingencies. Computers are required, for one. Information that lives offline, in libraries or in people’s heads, must be digitized or recorded to become “universally” accessible. Then users must pay for the broadband or mobile data services necessary to access it.

At a lower level, ordinary searches reveal Google’s selectiveness. A query for “Zelda,” for example, yields six pages of links related to The Legend of Zelda series of Nintendo video games. On the seventh page, a reference to Zelda Fitzgerald appears. By the eighth, a pizzeria called Zelda in Chicago gets acknowledgement, along with Zelda’s café in Newport, Rhode Island. Adding a term to the query, like “novelist” or “pizza,” produces different results, as does searching from a physical location in Chicago or Newport. But the company’s default results for simple searches offer a reminder that organization and accessibility mean something very particular for Google.

That hidden truth starts with PageRank, Google’s first and most important product. Named after Google founder Larry Page, it is the method by which Google vanquished almost all its predecessors in web search. It did so by measuring the reputation of web sites, and using that reputation to improve or diminish their likelihood of appearing earlier in search results.

When I started using the web in 1994, there were 2,738 unique hostnames (e.g., TheAtlantic.com) online, according to Internet Live Stats. That’s few enough that it still made sense to catalog the web in a directory, like a phone book. Which is exactly what the big web business founded that year did. It was called Yahoo!

But by the time Page and Sergey Brin started Google in 1998, the web was already very large, comprising over 2.4 million unique hosts. A directory that large made no sense. Text searches had already been commercialized by Excite in 1993, and both Infoseek and AltaVista appeared in 1995, along with Hotbot in 1996. These and other early search engines used a combination of paid placement and text-matching of query terms against the contents of web pages to produce results.

Those factors proved easy to game. If queries match the words and phrases on web pages, operators can just obscure misleading terms in order to rise in the rankings. Page and Brin proposed an addition. Along with analysis of the content of a page, their software would use its status to make it rise or fall in the results. The PageRank algorithm is complex, but the idea behind it is simple: It treats a link to a webpage as a recommendation for that page. The more recommendations a page has, the more important it becomes to Google. And the more important the pages that link to a page are, the more valuable its recommendations become. Eventually, that calculated importance ranks a page higher or lower in search results.
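The recommendation idea can be sketched in a few lines of Python. This is a simplified illustration, not Google's production system: the function name `pagerank`, the tiny four-page link graph, the damping factor of 0.85, and the fixed iteration count are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate page importance from a link graph.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    # Start by treating every page as equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share of importance.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its importance among the pages it recommends.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its importance evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: three pages recommend "c", and "c" recommends "a".
example_links = {"a": ["c"], "b": ["c"], "d": ["c"], "c": ["a"]}
ranks = pagerank(example_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, "c" collects the most recommendations and ranks first. Page "a" has only one inbound link, but because that link comes from the highly ranked "c," it outranks "b" and "d," which nobody links to.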

Although numerical at heart, Google made search affective instead. The results just felt right, especially compared to other early search tools. That ability to respond as if it knew what its users needed laid the foundation for Google’s success. As the media scholar Siva Vaidhyanathan puts it in his book The Googlization of Everything, relevance became akin to value. But that value was always “relative and contingent,” in Vaidhyanathan’s words. That is, the actual relevance of a web page (whether or not it might best solve the problem or provide the information the user initially sought) became subordinated to the sense of initial delight and subsequent trust in Google’s ability to deliver the “right” results. And those results are derived mostly from a series of recurrent popularity contests PageRank runs behind the scenes.

* * *

Google Scholar’s idea of what makes a paper a classic turns out to be a lot like Google’s idea of what makes a website relevant. Scholarly papers cite other papers. Like a link, a citation is a recommendation. With enough citations, a paper becomes “classic” by having been cited many times. What else would “classic” mean, to Google?

As it turns out, scholars have long used citation count as a measure of the impact of papers and the scholars who write them. But some saw problems with this metric as a measure of scholarly success. For one, a single, killer paper can skew a scholar’s citation count. For another, it’s relatively easy to game citation counts, either through self-citation or via a cabal of related scholars who systematically cite one another.

In 2005, shortly after Google went public, a University of California physicist named Jorge Hirsch tried to solve some of these problems with a new method. Instead of counting total citations, Hirsch’s index (or h-index, as it’s known) measures a scholar’s impact by finding the largest number of papers (call that number h) that have been cited at least h times. A scholar with an h-index of 12, for example, has 12 papers each of which is cited at least 12 times by other papers. H-index downgrades the impact of a few massively successful papers on a scholar’s professional standing, rewarding consistency and longevity in scholarly output instead. Hirsch’s method also somewhat dampens the effect of self- and group-citation by downplaying raw citation counts.
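The definition is easy to state as code. The following is a minimal sketch in Python; the function name `h_index` and the sample citation counts are made up for illustration.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            # The top `position` papers all have at least `position` citations.
            h = position
        else:
            break
    return h


# Hypothetical scholars: one blockbuster paper versus steady output.
one_hit_wonder = [900, 3, 2, 1]
steady_producer = [15, 14, 13, 13, 12, 12, 12, 12, 12, 12, 12, 12]

print(h_index(one_hit_wonder))   # 2
print(h_index(steady_producer))  # 12
```

The first scholar has far more total citations (906 against 151), yet the index favors the second, which is the consistency-over-blockbusters effect described above.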

H-index has become immensely influential in scholarly life, especially in science and engineering. It is not uncommon to hear scholars ask after a researcher’s h-index as a measure of success, or to express pride or anxiety over their own h-indexes. H-index is regularly used to evaluate (and especially to cull) candidates for academic jobs, too. It also has its downsides. It’s hard to compare h-indexes across fields, the measure obscures an individual’s contribution in co-authored papers, and it abstracts scholarly success from its intellectual merit—the actual content of the articles in question.

That makes h-index eminently compatible with life in the Google era. For one, Google Scholar has been a boon to its influence, because it automates the process of counting citations. But for another, Google has helped normalize reference-counting as a general means of measuring relevance and value for information of all kinds, making the process seem less arbitrary and clinical when used by scholars. The geeks brought obsessive numerism to the masses.

Instead of measuring researchers’ success, Google Scholar’s Classic Papers directory defines canon by distance in time. 2006 is about ten years ago—long enough to be hard to remember in full for those who lived through it, but recent enough that Google had found its legs tracking scholarly research (the Scholar service launched in 2004). Classic papers, in other words, are classic to Google more than they are classic to humanity writ large.

In the academy today, scholars maintain professional standing by virtue of the quantity and regularity of their productivity; thus Hirsch’s sneer at brilliant one-offs. Often, that means scholarly work gets produced not because of social, industrial, or even cosmic need, but because the wheels of academic productivity must appear to turn. Pressing toward novel methods or discoveries is still valued, but it’s hard and risky work. Instead, scholars who respond to specific, present conditions in the context of their fields tend to perform best when measured on the calendar of performance reviews.

Looking at the most-cited papers published in 2006, as Google Scholar’s Classic Papers does, mostly reveals how scholars have succeeded at this gambit, whether intentionally or not. For example, the most-cited paper in film is “Narrative complexity in contemporary American television,” by the Middlebury College television studies scholar Jason Mittell. Mittell was one of the first critics to explain the rise of television as high culture, particularly via social-realist serials with complex narratives, like The Sopranos. Mittell’s take was both well-reasoned and well-timed, as shows like Deadwood, Big Love, and The Wire were enjoying their runs when he wrote the paper. That trend has continued uninterrupted for the decade since, making Mittell’s article a popular citation.

Likewise, the most cited 2006 paper in history is “Can history be open source? Wikipedia and the future of the past,” by Roy Rosenzweig. The article offers a history and explanation of Wikipedia, along with an assessment of the website’s quality and accuracy as an historical record (good and bad, it turns out). As with complex TV, the popularity of Rosenzweig’s paper relates largely to the accidents of its origin. Wikipedia was started in 2001, and by 2005 it had begun to exert significant impact on teaching and research. History has a unique relationship to encyclopedic knowledge, giving the field an obvious role in benchmarking the site. Rosenzweig’s paper even discusses the role of Google’s indexing methods in helping to boost Wikipedia’s appearance in search results, and the resulting temptation among students to use Wikipedia as a first source. Just as in Mittell’s case, these circumstances have only amplified in the ten years since the paper’s publication, steadying its influence.

This pattern continues in technical fields. In computer vision, for example, a method of identifying the subject of images is the top cited paper. Image recognition and classification was becoming increasingly important in 2006, and the technique the paper describes, called spatial pyramid matching, remains important as a method for image matching. Once more, Google itself remains an obvious beneficiary of computer vision methods.

To claim that these papers “stand the test of time,” as Henderson does, is suspect. Instead, they show that the most popular scholarship is the kind that happened to find purchase on a current or emerging trend, just at the time that it was becoming a concern for a large group of people in a field, and for whom that interest amplified rather than dissipated. A decade hence, the papers haven’t stood the test of time so much as proved, in retrospect, to have taken the right bet at the right moment—where that moment also corresponds directly with the era of Google’s ascendance and dominance.

* * *

PageRank and Classic Papers reveal Google’s theory of knowledge: What is worth knowing is what best relates to what is already known to be worth knowing. Given a system that construes value by something’s visibility, be it academic paper or web page, the valuable resources are always the ones closest to those that already proved their value.

Google enjoys the benefits of this reasoning as much as anyone. When Google tells people that it has found the most lasting scholarly articles on a subject, for example, the public is likely to believe that story because they also believe Google tends to find the right answers.

But on further reflection, a lot of Google searches do not produce satisfactory answers, products, businesses, or ideas. Instead, they tend to point to other venues with high reputations, like Wikipedia and Amazon, with which the public has also developed an unexamined relationship of trust. When the information, products, and resources Google lists don’t provide a solution to the problem the seeker set out to solve, the user has two options: either continue searching with more and more precise terms and conditions in the hopes of being led to more relevant answers, or shrug and click the links provided, resolved to take what was given. Most choose the latter.

This way of consuming information and ideas has spread everywhere else, too. The goods worth buying are the ones that ship via Amazon Prime. The Facebook posts worth seeing are the ones that show up in the newsfeed. The news worth reading is the stuff that shows up to be tapped on. And as services like Facebook, Twitter, and Instagram incorporate algorithmic methods of sorting information, as Google did for search, all those likes and clicks and searches and hashtags and the rest become votes—recommendations that combine with one another to produce output that’s right by virtue of having been sufficiently right before.

It’s as if Google, the company that promised to organize and make accessible the world’s information, has done the opposite. Almost anything can be posted, published, or sold online today, but most of it cannot be seen. Instead, information remains hidden, penalized for having failed to be sufficiently connected to other, more popular information. But to think differently is so uncommon, the idea of doing so might not even arise—for shoppers and citizens as much as for scholars. All information is universally accessible, but some information is more universally accessible than others.
