Data scraped from Facebook, Twitter to build profiles of 48 million people

Facebook CEO Mark Zuckerberg testifies before a House Energy and Commerce Committee hearing regarding the company's use and protection of user data on Capitol Hill in Washington, April 11, 2018. /REUTERS

Personal information scraped from the social media profiles of up to 48 million people was left unsecured on a publicly available web storage platform, potentially allowing anyone to access "highly sensitive" data, a new report has warned.

According to security firm UpGuard, which uncovered the vulnerability, Washington-based Localblox pieced together data from Facebook, LinkedIn, Twitter, Zillow, and other sites to "build a three-dimensional picture on every individual affected", ZD Net reports.

The records were then stored in a single file on a public, unlisted Amazon S3 storage bucket.

While the bucket was secured hours after the researchers alerted Localblox’s CTO to the issue, the entire 1.2 terabyte file containing the information of millions of people had remained available to download for an unspecified amount of time beforehand.

This included names, dates of birth, phone numbers, email addresses, postal addresses, and sometimes, net worth, according to UpGuard.

After the Cambridge Analytica scandal was first uncovered, Facebook Chief Technology Officer Mike Schroepfer detailed the worrying ease with which third parties could scrape public information from most users’ profiles.

And, UpGuard’s new report shows exactly that in action.

On the Localblox website, the firm says it "automatically crawls, discovers, extracts, indexes, maps and augments data in a variety of formats from the web and from exchange networks, adding crowd-sourced verification as needed."

This is used to help "companies acquire and utilize a vast amount of information" from sources on the web.

According to ZD Net, the information in the newly discovered dataset was intended for use in advertising or political campaigning.

"The LocalBlox dataset, 1.2 terabytes in size, contained 48 million records on a lesser or similar number of individual people," UpGuard wrote in an article about the discovery.

"The presence of scraped data from social media sites like Facebook also highlights an important fact: all too often, data held by widely used websites can be targeted by unknown third parties seeking to monetize this information.

"In such cases, both a targeted website like Facebook and any affected users are being victimized, as personal information entrusted to the social network is snatched up for the benefit of a platform of which no one is aware."

SINGLE FILE WITHOUT PASSWORD

While the bucket containing the information was unlisted, it sat on the web storage platform without a password protecting its contents – stored in a single file titled "final_people_data_2017_5_26_48m.json".

It was discovered in late February by Chris Vickery, director of cyber risk research at UpGuard, who notified CTO Ashfaq Rahman.

The file, which was also viewed by ZD Net, contained detailed information on millions of users, including data that could be used to pinpoint their location.

"The sheer breadth of the exposed data includes such information as individuals’ names, physical addresses, dates of birth, scraped LinkedIn job histories, public Facebook data, and individuals’ Twitter handles," according to UpGuard.

"In addition, it appears the prominent real estate site Zillow is used in the process as well, with information being somehow blended from the service's listings into the larger data pool.

"The database appears to work by tracking an IP address, matching collected data to that IP address when able, and thus providing a clearer image of the behavior and background of the user at that IP address."

Much of the information appears to have been pulled from social media and other sites – but, according to ZD Net, Localblox may also have accessed data from non-public sources, such as purchased marketing data.

The firm also says it has a US voter database including 180 million citizens, ZD Net reports.

In response to the claims, Rahman told ZD Net that the personal data did "not link to the actual owners" and that "most" of the data in the 48 million profiles was fabricated and used for internal tests.

He also told the site that "no other individual is believed to have accessed this file from the S3 bucket."

Still, the firm secured the information hours after it was notified of the vulnerability.

Facebook recently revealed it was cracking down on "data scraping" after being hit with harsh criticism in the wake of the Cambridge Analytica scandal.

The site previously allowed users to enter someone's phone number or email address into the search bar to locate that person.

While the tool was helpful for finding friends in some scenarios, for example, in languages which "take more effort to type out a full name", the firm says it was also regularly abused.

As a result, Facebook decided to do away with it entirely.

"Given the scale and sophistication of the activity we've seen, we believe most people on Facebook could have had their public profile scraped in this way," Schroepfer wrote in a blog post about the firm’s numerous privacy updates.

"So we have now disabled this feature. We're also making changes to account recovery to reduce the risk of scraping as well."
