Data scraping treasure trove found in the wild

We bring word of yet more data exposure, in the form of “nonsensitive” data scraping to the tune of 66m records across 3 large databases. The information was apparently scraped from various sources and left to gather dust, for anyone lucky enough to stumble upon it.

What is data scraping?

Data scraping is the gathering of information from websites, either by manual means, which is slow, or by automated processes such as dedicated programs or bots. Often, this scraping is done for dubious purposes: the data can be used for marketing, or for outright threatening behaviour. It also typically relies on the person being scraped having provided much of the grabbable data upfront. It’s frowned upon, but it’s often unclear where things stand legally.
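To make the "automated processes" part concrete, here is a minimal sketch of how a bot might pull fields out of a public profile page. The markup, the class names (name, location, email), and the helper scrape_profile are all invented for illustration; real sites differ, and this says nothing about the tools used to build the databases discussed here.

```python
from html.parser import HTMLParser

# Hypothetical public profile markup; class names are assumptions for this sketch.
SAMPLE_PROFILE = """
<div class="profile">
  <h1 class="name">Jane Example</h1>
  <span class="location">Springfield</span>
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
</div>
"""


class ProfileScraper(HTMLParser):
    """Collects the text of elements whose class matches a wanted field."""

    FIELDS = {"name", "location", "email"}

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.FIELDS:
            self._current = cls

    def handle_data(self, data):
        # Store the first non-empty text found inside a wanted element.
        if self._current and data.strip():
            self.record[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def scrape_profile(html):
    """Return a dict of the publicly displayed fields found in the page."""
    parser = ProfileScraper()
    parser.feed(html)
    return parser.record


record = scrape_profile(SAMPLE_PROFILE)
print(record)
```

Run over thousands of pages in a loop, something this small is enough to assemble a sizeable list, which is why anything displayed publicly should be treated as collectable.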

Scrape all the things

Three large databases were found by security researchers, containing a combined tally of 66,147,856 unique records. At least one instance was exposed due to a lack of authentication. The records are very business-centric, with one (for example) containing full name, email, listed location, employment history, and skills. This sounds very much like the information you see on a public-facing LinkedIn profile. Indeed, many people have said they received breach notifications to their LinkedIn-specific mail, and there’s some mention of GitHub too.

Elsewhere, some 22 million records were found on the second server. These related to job search aggregation data, and included IP address, name, email, and potential job locations. Number 3 sang to the tune of 48 million records, and also sounds like a generic business-centric dump: name, phone, employer, and so on.

Is the threat serious?

The information collected isn’t exactly a red hot dump of personal information, but it’s certainly useful for phishing attempts. It could also prove useful to anyone wanting a ready-made marketing list. The big problem is that even if the ones doing the data scraping had no harmful intentions, the same may not be true of whoever finds the treasure trove.

Given how this information was stumbled upon in the first place, there’s no real way to know how many bad actors got their hands on it first.

How can I reduce the scraping risk?

Well, that’s a good question. Given that the data was (mostly) freely given online in terms of the LinkedIn profile information, it’s all about personal choice. Take a look at your LinkedIn right now. Are you happy with what’s on display? Have you hidden any of it? Perhaps it’s a good idea to remove older roles, or jobs of a sensitive nature. Maybe that phone number doesn’t need to be so prominent. How about location, does it have to be so precise? Or would a broader area suffice?

Unfortunately, many people don’t consider the information they place online to be harmful, until it suddenly is. By the time it’s been scraped, plundered, and jammed into a larger database, it’s already too late to do anything about it.

The only real solution is to control every last aspect of what you’re happy to place in front of everybody else, which for most people involves having to dredge up a list of sites and accounts then start stripping things out. That’s fine; it’s never too late to start pulling things offline that don’t need to be there.

Next steps for anyone affected?

Given the very prominent business angle to this one, it’d be wise to consider who may look to take advantage of it. Alongside the previously mentioned phishers, this is the kind of thing someone could use to back up offers of fake jobs. If you want to become a money mule, this could definitely be the “perfect” lead-in!

A common destination for business-centric grab bags such as this one is the unremarkable job search site. Be on the lookout for a flood of poor quality job offer spam. Be especially wary if the offers come bearing gifts of paid membership, as nobody should pay someone who grabbed their data free of charge and then used it to spam them with nonsense.

Ah yes, spam.

Scraped email lists will inevitably be harvested by spammers, so readjust your quality filters if needed. The good news is, most email offerings do a pretty good job of keeping your mailbox clean.

Almost all of us will end up in a data dump at some point. Whether scraped or hacked, being cautious around strange phone calls and peculiar emails will go a long way towards minimising any further potential harm.

ABOUT THE AUTHOR

Christopher Boyd

Former Director of Research at FaceTime Security Labs. He has a very particular set of skills. Skills that make him a nightmare for threats like you.