Government Technology Featured Article
February 25, 2013
The Federal Government's Giant Interest in Big Data
The U.S. government’s interest in supercomputing, from ENIAC’s hydrogen bomb simulations to NOAA’s hurricane landfall predictions to the IRS’s tax processing, is so well established that it’s only natural the government would find big data tools and techniques equally attractive. The term “big data” refers to data sets so large that they can’t be captured, processed, stored, searched or analyzed by standard relational database and data visualization technologies. Nor can they be stored on conventional systems; the information has to live in a data warehouse. The federal government regularly generates data sets of this size.
One 2012 study by the Federal Big Data Commission said, “In 2009, the U.S. Government produced 848 petabytes of data and U.S. healthcare data alone reached 150 exabytes. Five exabytes (10^18 gigabytes) [sic; an exabyte is 10^18 bytes] of data would contain all words ever spoken by human beings on earth.” The benefits of using big data tools have become apparent as the tools have matured and as their cost has dropped into a range accessible to more users. One example is the Centers for Medicare & Medicaid Services (CMS), which is using big data to analyze the Medicare reimbursement system for improper payments.
CMS generates terabytes of data every day, and only with the emergence of data warehousing technologies, MapReduce frameworks like Hadoop, and real-time streaming analytics tools has it become possible to analyze these torrents of data.
The goal is clear enough: to discover patterns and anomalies that are not visible when looking at samples from isolated data silos. By taking in data from multiple sources, it is now possible to get a holistic view of a government agency’s situation.
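To make the idea concrete, here is a minimal sketch of the map, shuffle and reduce pattern in plain Python. It is not Hadoop and not how CMS actually works; the claim records, provider and patient IDs, source names and function names are all hypothetical. It illustrates one kind of anomaly that stays hidden inside each silo and only appears once the sources are combined: the same service billed in two places.

```python
from collections import defaultdict

# Hypothetical claim records: (provider_id, patient_id, service_date, amount_usd).
SOURCE_A = [
    ("P1", "pt-01", "2013-01-07", 120.0),
    ("P2", "pt-02", "2013-01-08", 95.0),
    ("P4", "pt-09", "2013-01-09", 400.0),
]
SOURCE_B = [
    ("P1", "pt-03", "2013-01-07", 130.0),
    ("P4", "pt-09", "2013-01-09", 400.0),  # same service, billed a second time
]


def map_phase(source_name, records):
    """Map: emit one (claim key, (source, amount)) pair per record."""
    for provider, patient, date, amount in records:
        yield (provider, patient, date), (source_name, amount)


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: keep only the claim keys that appear in more than one source."""
    flagged = {}
    for key, values in grouped.items():
        sources = sorted({name for name, _ in values})
        if len(sources) > 1:
            flagged[key] = {
                "sources": sources,
                "total_usd": sum(amount for _, amount in values),
            }
    return flagged


def find_cross_source_duplicates(sources):
    """Run map, shuffle and reduce over a dict of {source_name: records}."""
    pairs = (
        pair
        for name, records in sources.items()
        for pair in map_phase(name, records)
    )
    return reduce_phase(shuffle(pairs))


duplicates = find_cross_source_duplicates({"A": SOURCE_A, "B": SOURCE_B})
for claim, info in duplicates.items():
    print(claim, info)
```

Run against either source alone, the function flags nothing; only the combined view surfaces the repeated claim. A real framework would spread the map and reduce steps across many machines, but the shape of the computation is the same.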
The CGA is expected to accumulate several petabytes of data by the end of 2014.
Other federal agencies that have committed to science-related big data initiatives include NASA, the Department of Energy, and the Centers for Disease Control and Prevention.
“Wired” reporter James Bamford wrote that the NSA intends to “intercept, decipher, analyze and store vast amounts of the world’s communications from satellites and underground and undersea cables of international, foreign and domestic networks.” Bamford says this will include “private emails, mobile phone calls and Google searches, as well as personal data trails – travel itineraries, purchases and other digital ‘pocket litter’” to be stored in a dedicated data center in Bluffdale, Utah, that is expected to house an undisclosed number of yottabytes (one yottabyte = 10^24 bytes), which is big data by anyone’s definition.
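For a sense of scale, the storage units mentioned in this article can be compared with a few lines of Python. This is only an illustrative sketch using decimal (SI) prefixes; the helper name `to_bytes` is made up for the example, and the 848-petabyte figure is the one quoted from the 2012 study.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
    "yottabyte": 10**24,
}


def to_bytes(quantity, unit):
    """Convert a quantity in the named unit to bytes."""
    return quantity * UNITS[unit]


# Figure quoted from the 2012 study: U.S. Government data produced in 2009.
gov_2009 = to_bytes(848, "petabyte")
one_yottabyte = to_bytes(1, "yottabyte")

# How many years of 2009-level output would fit in a single yottabyte.
years_per_yottabyte = one_yottabyte // gov_2009
print(f"{years_per_yottabyte:,} years of 2009-level output per yottabyte")
```

By this arithmetic, a single yottabyte would hold more than a million years of the government's 2009 data output.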
Big data is an extraordinary tool for the United States federal government, for better and for worse. The ability to store and sort through yottabytes of data could lead to the government spying on more citizens; however, many bureaus will be able to use this data to improve the lives of American citizens.
Edited by Braden Becker