recommended reading

Most Data Isn’t 'Big,' and Businesses Are Wasting Money Pretending It Is

HerrBullermann/Shutterstock.com

Big data! If you don’t have it, you better get yourself some. Your competition has it, after all. Bottom line: If your data is little, your rivals are going to kick sand in your face and steal your girlfriend.

There are many problems with the assumptions behind the “big data” narrative (above, in a reductive form) being pushed, primarily, by consultants and IT firmsthat want to sell businesses the next big thing. Fortunately, honest practitioners of big data—aka data scientists—are by nature highly skeptical, and they’ve provided us with a litany of reasons to be weary of many of the claims made for this field. Here they are:

Even web giants like Facebook and Yahoo generally aren’t dealing with big data, and the application of Google-style tools is inappropriate.

Facebook and Yahoo run their own giant, in-house “clusters”—collections of powerful servers—for crunching data. The necessity of these clusters is one of the hallmarks of big data. After all, data isn’t all that “big” if you could chew through it on your PC at home. The necessity of breaking problems into many small parts, and processing each on a large array of computers, characterizes classic big data problems like Google’s need to compute the rank of every single web page on the planet.

But it appears that for both Facebook and Yahoo, those same clusters are unnecessary for many of the tasks which they’re handed. In the case of Facebook, most of the jobs engineers ask their clusters to perform are in the “megabyte to gigabyte” range (pdf), which means they could easily be handled on a single computer—even a laptop.

The story is similar at Yahoo, where it appears the median task size handed to Yahoo’s cluster is 12.5 gigabytes. (pdf) That’s bigger than what the average desktop PC could handle, but it’s no problem for a single powerful server.

All of this is outlined in a paper from Microsoft Research, aptly titled “Nobody ever got fired for buying a cluster,” which points out that a lot of the problems solved by engineers at even the most data-hungry firms don’t need to be run on clusters. And why is that an issue? Because there are vast classes of problems for which clusters are a relatively inefficient—or even totally inappropriate—solution.

Read more at Quartz

(Image via HerrBullermann/Shutterstock.com)

Threatwatch Alert

Thousands of cyber attacks occur each day

See the latest threats

JOIN THE DISCUSSION

Close [ x ] More from Nextgov
 
 

Thank you for subscribing to newsletters from Nextgov.com.
We think these reports might interest you:

  • It’s Time for the Federal Government to Embrace Wireless and Mobility

    The United States has turned a corner on the adoption of mobile phones, tablets and other smart devices, outpacing traditional desktop and laptop sales by a wide margin. This issue brief discusses the state of wireless and mobility in federal government and outlines why now is the time to embrace these technologies in government.

    Download
  • Featured Content from RSA Conference: Dissed by NIST

    Learn more about the latest draft of the U.S. National Institute of Standards and Technology guidance document on authentication and lifecycle management.

    Download
  • A New Security Architecture for Federal Networks

    Federal government networks are under constant attack, and the number of those attacks is increasing. This issue brief discusses today's threats and a new model for the future.

    Download
  • Going Agile:Revolutionizing Federal Digital Services Delivery

    Here’s one indication that times have changed: Harriet Tubman is going to be the next face of the twenty dollar bill. Another sign of change? The way in which the federal government arrived at that decision.

    Download
  • Software-Defined Networking

    So many demands are being placed on federal information technology networks, which must handle vast amounts of data, accommodate voice and video, and cope with a multitude of highly connected devices while keeping government information secure from cyber threats. This issue brief discusses the state of SDN in the federal government and the path forward.

    Download
  • The New IP: Moving Government Agencies Toward the Network of The Future

    Federal IT managers are looking to modernize legacy network infrastructures that are taxed by growing demands from mobile devices, video, vast amounts of data, and more. This issue brief discusses the federal government network landscape, as well as market, financial force drivers for network modernization.

    Download

When you download a report, your information may be shared with the underwriters of that document.