The Case for Using Algorithms to Validate Broadband Data


Consider it Moneyball for the FCC.

With more than an estimated 2.5 billion gigabytes of data created every day, people, businesses, governments and organizations of every kind are generating and accessing more information than ever before. This information avalanche creates challenges and opportunities. For example, good data put to good use is revolutionizing healthcare, agriculture, manufacturing, and retail. But bad data poorly analyzed can be catastrophic for policymaking. It’s time for the FCC to step into the future by using artificial intelligence tools to address the continuing lack of affordable broadband in many communities, an increasingly entrenched problem of “internet inequality” that harms our economy and democracy and threatens our country’s future global competitiveness.

Congress has charged the Federal Communications Commission with ensuring that every American has access to affordable high-speed internet service. To do so, we must review massive amounts of internet service provider data so we know where broadband is and is not deployed, allowing us to target federal dollars and drive further deployment as efficiently as possible. There is a lot of money at stake: the FCC’s broadband support programs distribute over $9 billion annually. Achieving an accurate understanding of broadband deployment has proven challenging for the Commission, whose process overstates broadband availability and has failed to catch egregiously flawed data submissions. The FCC’s data is currently neither granular nor accurate enough to capture the actual number of homes and businesses that truly have connectivity.

From sabermetrics to the stock market, sophisticated entities have developed data validation models to ensure that high-quality data is used to maximize efficiency and improve outcomes. Unfortunately, the FCC’s current data processes haven’t earned high marks lately. Earlier this year, they failed spectacularly when the Commission did not catch bad data indicating that a brand-new provider had gone from zero customers to providing near-gigabit speeds to 62 million households in just six months. To ensure confidence in and the accuracy of our data, our validation approach must do better at catching flawed data on the front end.

More sophisticated tools have been available and widely used across industry for years. Automated validation algorithms allow organizations to assess trends and identify patterns in real time to determine whether information is potentially suspect. For instance, a bank customer who routinely makes small transactions in her hometown would likely have her account flagged for fraud if a large withdrawal were suddenly made halfway around the world.
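To make the idea concrete, here is a minimal Python sketch of that kind of rule-based check. The thresholds, field names, and `is_suspect` helper are illustrative assumptions for this column, not any bank’s actual system.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float       # withdrawal amount in dollars
    distance_km: float  # distance from the account's home location

# Illustrative thresholds; a real system would calibrate these per account.
AMOUNT_LIMIT = 1_000.0
DISTANCE_LIMIT = 500.0

def is_suspect(txn: Transaction, typical_amount: float) -> bool:
    """Flag a transaction that is both far from home and much larger
    than the account's routine activity."""
    unusually_large = txn.amount > max(AMOUNT_LIMIT, 10 * typical_amount)
    far_from_home = txn.distance_km > DISTANCE_LIMIT
    return unusually_large and far_from_home

# A customer who routinely makes small local purchases...
print(is_suspect(Transaction(amount=40.0, distance_km=3.0), typical_amount=35.0))           # False
# ...suddenly withdraws a large sum halfway around the world.
print(is_suspect(Transaction(amount=5_000.0, distance_km=15_000.0), typical_amount=35.0))   # True
```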

Looking toward the future, machine learning enables organizations managing large datasets to dynamically validate and score data, allowing wider deviations from “normal” patterns in certain circumstances before flagging them for action. For instance, many banks no longer require account holders to notify them before traveling overseas to avoid fraud alerts: the pattern of purchases leading up to a trip can enable a bank to accurately identify legitimate transactions without even knowing the traveler’s final destination.
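As a hedged illustration of that learned, score-based approach, the sketch below trains scikit-learn’s `IsolationForest` on synthetic past activity and scores new observations against it. The made-up data, features, and contamination rate are assumptions for demonstration, not any institution’s model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic history: (amount, distance-from-home) for routine account activity.
history = np.column_stack([
    rng.normal(40, 10, 500),  # small, consistent purchase amounts
    rng.normal(5, 2, 500),    # close to home
])

# Fit an unsupervised anomaly detector on past behavior.
model = IsolationForest(contamination=0.01, random_state=0).fit(history)

# Score new observations instead of applying hard rules.
new_events = np.array([
    [45.0, 4.0],        # routine purchase
    [60.0, 80.0],       # a purchase while traveling (a wider deviation)
    [5000.0, 15000.0],  # large withdrawal halfway around the world
])
print(model.decision_function(new_events))  # continuous anomaly scores
print(model.predict(new_events))            # 1 = normal, -1 = flagged
```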

By adopting more sophisticated data validation algorithms, the FCC could avoid repeating past mistakes. Such algorithms can not only automate the data validation process but also ensure consistency and learn from previous provider submissions to improve error detection. While such technology would enhance our ability to map more accurately who does and does not have broadband, careful consideration must be given to ensuring such code is free of bias.

The FCC receives broadband provider coverage data every six months, creating a rich set of historical time series data. Data validation algorithms work in large part by learning from past information to evaluate future observations against established patterns. If a carrier has years of slow coverage growth, a sudden spike in the provider’s reported coverage area would trigger further review. An algorithm could even consider additional data sources, such as publicly filed network expansion data and changes in demographic trends, to improve the dataset and reduce false positives.
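A minimal sketch of that spike check, assuming hypothetical semiannual coverage figures: flag any new filing whose growth far exceeds the provider’s historical growth. The numbers and the 3x tolerance are illustrative assumptions, not FCC rules.

```python
def flag_coverage_spike(history: list[float], new_report: float,
                        tolerance: float = 3.0) -> bool:
    """Flag a semiannual coverage filing whose growth far exceeds the
    provider's historical growth. `tolerance` is an illustrative multiplier."""
    # Period-over-period growth in covered locations from past filings.
    deltas = [b - a for a, b in zip(history, history[1:])]
    typical_growth = max(sum(deltas) / len(deltas), 1.0)  # avoid a zero baseline
    new_growth = new_report - history[-1]
    return new_growth > tolerance * typical_growth

# Years of slow coverage growth (covered locations, in thousands)...
past = [100.0, 104.0, 107.0, 111.0, 114.0]
print(flag_coverage_spike(past, 118.0))     # modest growth: False
# ...followed by a sudden spike in reported coverage triggers review.
print(flag_coverage_spike(past, 62_000.0))  # implausible jump: True
```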

The steps outlined above would improve the accuracy of future data submitted to the agency. However, a validation model is only as reliable as the historical dataset used to train it. The FCC needs to change its data collection practices, but we also need new information to check the broadband provider data we’re already collecting. For example, a model could compare reported data about certain locations with provider advertisements for those same locations, as in the sketch below. Better yet, the FCC should allow Americans to comment directly on potential map errors through an improved challenge process, injecting transparency and real-world scrutiny into an otherwise opaque process.
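A minimal sketch of that advertisement cross-check, assuming hypothetical `reported` and `advertised` records; the location IDs, speed fields, and mismatch rule are illustrative, not an actual FCC data format.

```python
# Hypothetical records keyed by location ID: download speed (Mbps) from a
# provider's filing vs. the speed the provider advertises to consumers there.
reported   = {"loc-001": 940, "loc-002": 25, "loc-003": 940}
advertised = {"loc-001": 940, "loc-003": 100}

def mismatched_locations(reported: dict, advertised: dict) -> list[str]:
    """Return locations where the filed speed exceeds what the provider
    actually advertises, a signal worth routing to human review."""
    return [loc for loc, speed in reported.items()
            if loc in advertised and speed > advertised[loc]]

print(mismatched_locations(reported, advertised))  # ['loc-003']
```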

Democrats and Republicans agree: The FCC’s broadband maps are terrible. FCC Chairman Pai recently announced that he would share a draft order in August that would allow more granular data to be collected and put in place a citizen challenge process through which consumers can dispute providers’ coverage claims. This is a step forward, but to fix our maps we must not only improve the granularity of our data but also ensure that data submissions are vetted and validated by the Commission, without having to rely on providers or consumers to make sure we get this right. Data validation algorithms can help achieve this goal. With better data, we will be able to make better policy and faithfully execute the core functions of our agency, including addressing internet inequality.

The need for accurate data is clear, and the technology to validate that information already exists. We must put it to work for the American people.

Geoffrey Starks is a commissioner of the Federal Communications Commission. You can follow him on Twitter @GeoffreyStarks.