Source – sciencemag.org
Last week, the credit reporting agency Equifax announced that malicious hackers had leaked the personal information of 143 million people in its system. That’s reason for concern, of course, but if a hacker wants to access your online data by simply guessing your password, you’re probably toast in less than an hour. Now, there’s more bad news: Scientists have harnessed the power of artificial intelligence (AI) to create a program that, combined with existing tools, figured out more than a quarter of the passwords from a set of more than 43 million LinkedIn profiles. Yet the researchers say the technology may also be used to beat baddies at their own game.
The work could help average users and companies measure the strength of passwords, says Thomas Ristenpart, a computer scientist who studies computer security at Cornell Tech in New York City but was not involved with the study. “The new technique could also potentially be used to generate decoy passwords to help detect breaches.”
The strongest password guessing programs, John the Ripper and hashCat, use several techniques. One is simple brute force, in which they randomly try lots of combinations of characters until they get the right one. But other approaches involve extrapolating from previously leaked passwords and probability methods to guess each character in a password based on what came before. On some sites, these programs have guessed more than 90% of passwords. But they’ve required many years of manual coding to build up their plans of attack.
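The probability method described above, guessing each character from the ones before it, can be illustrated with a small character-level Markov model. This is a minimal sketch under stated assumptions, not how John the Ripper or hashCat actually implement it: the function names `train` and `sample`, the boundary markers, and the three-password training list are all hypothetical, and the context length `order` is assumed to be at least 1.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers, assumed never to occur inside a password.
START, END = "^", "$"

def train(passwords, order=2):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for pw in passwords:
        padded = START * order + pw + END
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    return counts

def sample(counts, order=2, rng=random, max_len=32):
    """Generate one guess, drawing each character given the previous ones."""
    context, out = START * order, []
    while len(out) < max_len:
        followers = counts[context]
        # Pick the next character in proportion to how often it followed
        # this context in the training passwords.
        ch = rng.choices(list(followers), weights=list(followers.values()))[0]
        if ch == END:
            break
        out.append(ch)
        context = (context + ch)[-order:]
    return "".join(out)

# Toy training set standing in for a leaked-password list.
model = train(["password1", "passw0rd", "pass123"])
print([sample(model) for _ in range(5)])
```

Because the model only follows character transitions it has seen, its guesses recombine fragments of the training passwords, which is why such approaches depend heavily on the quality of the leaked lists they learn from.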
The new study aimed to speed this up by applying deep learning, a brain-inspired approach at the cutting edge of AI. Researchers at Stevens Institute of Technology in Hoboken, New Jersey, started with a so-called generative adversarial network, or GAN, which comprises two artificial neural networks. A “generator” attempts to produce artificial outputs (like images) that resemble real examples (actual photos), while a “discriminator” tries to tell real from fake. They help refine each other until the generator becomes a skilled counterfeiter.
Giuseppe Ateniese, a computer scientist at Stevens and paper co-author, compares the generator and discriminator to a police sketch artist and eyewitness, respectively; the sketch artist is trying to produce something that can pass as an accurate portrait of the criminal. GANs have been used to make realistic images, but have not been applied much to text.
The Stevens team created a GAN it called PassGAN and compared it with two versions of hashCat and one version of John the Ripper. The scientists fed each tool tens of millions of leaked passwords from a gaming site called RockYou, and asked them to generate hundreds of millions of new passwords on their own. Then they counted how many of these new passwords matched a set of leaked passwords from LinkedIn, as a measure of how successful they’d be at cracking them.
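The scoring step described above amounts to measuring how much of a target set a list of guesses covers. The sketch below shows one way to compute that; the function name `coverage` and the toy data are hypothetical, and counting each distinct password once is an assumption rather than a detail taken from the paper.

```python
def coverage(guesses, leaked):
    """Return the fraction of distinct leaked passwords found among the guesses."""
    targets = set(leaked)
    if not targets:
        return 0.0
    # Set intersection keeps only the leaked passwords that were guessed.
    return len(targets & set(guesses)) / len(targets)

# Toy stand-ins for a tool's generated guesses and a leaked target set.
generated = ["letmein", "dragon12", "qwerty", "sunshine1"]
target_set = ["qwerty", "letmein", "Tr0ub4dor&3", "correcthorse"]
print(f"{coverage(generated, target_set):.0%} of the target set matched")
```

A higher fraction means the tool's guesses would have cracked more of the target accounts, which is the sense in which the percentages for each tool can be compared.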
On its own, PassGAN generated 12% of the passwords in the LinkedIn set, whereas its three competitors generated between 6% and 23%. But the best performance came from combining PassGAN and hashCat. Together, they were able to crack 27% of passwords in the LinkedIn set, the researchers reported this month in a draft paper posted on arXiv. Even failed passwords from PassGAN seemed pretty realistic: saddracula, santazone, coolarse18.
Using GANs to help guess passwords is “novel,” says Martin Arjovsky, a computer scientist who studies the technology at New York University in New York City. The paper “confirms that there are clear, important problems where applying simple machine learning solutions can bring a crucial advantage,” he says.
Still, Ristenpart says, “It’s unclear to me if one needs the heavy machinery of GANs to achieve such gains.” Perhaps even simpler machine learning techniques could have assisted hashCat just as much, he says. (Arjovsky concurs.) Indeed, an efficient neural net produced by Carnegie Mellon University in Pittsburgh, Pennsylvania, recently showed promise, and Ateniese plans to compare it directly with PassGAN before submitting his paper for peer review.
Ateniese says that though in this pilot demonstration PassGAN gave hashCat an assist, he’s “certain” that future iterations could surpass hashCat. That’s in part because hashCat uses fixed rules and was unable to produce more than 650 million passwords on its own. PassGAN, which invents its own rules, can create passwords indefinitely. “It’s generating millions of passwords as we speak,” he says. Ateniese also says PassGAN will improve with more layers in the neural networks and training on many more leaked passwords.
He compares PassGAN to AlphaGo, the Google DeepMind program that recently beat a human champion at the board game Go using deep learning algorithms. “AlphaGo was devising new strategies that experts had never seen before,” Ateniese says. “So I personally believe that if you give enough data to PassGAN, it will be able to come up with rules that humans cannot think about.”