Ads peddling the victims of human trafficking hide among millions of escort listings online. While identifying similar ads could be the key to taking down a human trafficking organization, the sheer volume of listings — with new ones added each day — makes the task a daunting one for law enforcement.
Researchers at Carnegie Mellon University and McGill University hope to simplify that task by adapting an algorithm first used to spot anomalies in data, like typos in patient information at hospitals or errant figures in accounting, to identify similarities across escort ads. The algorithm scans the text of ads and groups those that share similar wording, which could help law enforcement direct their investigations and better identify human traffickers and their victims, said Christos Faloutsos, the Fredkin Professor in Artificial Intelligence in CMU's School of Computer Science, who led the team.
"Our algorithm can put the millions of advertisements together and highlight the common parts," Faloutsos said. "If they have a lot of things in common, it's not guaranteed, but it's highly likely that it is something suspicious."
The team calls the algorithm InfoShield and will present a paper on their findings at this week's IEEE International Conference on Data Engineering (ICDE).
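The idea Faloutsos describes, putting many texts together and highlighting what they share, can be illustrated with a small sketch. This is not InfoShield itself: it swaps in a simple word-overlap (Jaccard) score and a greedy grouping rule, and the function names, the 0.5 threshold and the sample listings are all made up for illustration.

```python
def tokens(text):
    """Lowercase a text and return its set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two sets have in common (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_texts(texts, threshold=0.5):
    """Greedily group texts whose word overlap with a cluster's first
    member meets the (assumed) threshold. Returns lists of indices."""
    clusters = []
    for i, text in enumerate(texts):
        words = tokens(text)
        for cluster in clusters:
            representative = tokens(texts[cluster[0]])
            if jaccard(words, representative) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([i])
    return clusters


def common_words(texts, cluster):
    """Words that appear in every text of a cluster."""
    return set.intersection(*(tokens(texts[i]) for i in cluster))


# Made-up, harmless listings used only to exercise the sketch.
texts = [
    "fresh bread daily call 555-0101 ask for ann",
    "fresh bread daily call 555-0102 ask for bob",
    "used bicycle for sale good condition",
]

clusters = cluster_texts(texts)
print(clusters)                          # [[0, 1], [2]]
print(sorted(common_words(texts, clusters[0])))
```

The first two listings land in one group because they share most of their wording, while the third stands alone. A production system would need a far more careful notion of similarity than this word-overlap score.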
According to the International Labor Organization, an estimated 24.9 million people are trapped in forced labor. Of those, 55% are women and girls trafficked in the commercial sex industry, where most ads are posted online. The same person may write ads for four to six victims, leading to similar phrasing and duplication among listings.
"Human trafficking is a dangerous societal problem which is difficult to tackle," lead authors Catalina Vajiac and Meng-Chieh Lee wrote. "By looking for small clusters of ads that contain similar phrasing rather than analyzing standalone ads, we're finding the groups of ads that are most likely to be organized activity, which is a strong signal of (human trafficking)."
Vajiac is a Ph.D. student in the Computer Science Department. Lee worked on InfoShield while a visiting student at CMU and will continue to do so when he returns to pursue his Ph.D. Other authors included fellow Tartans Faloutsos and Namyong Park, a Ph.D. student in the Computer Science Department. Reihaneh Rabbany, a former post-doctoral researcher at CMU who is now an assistant professor in the School of Computer Science at McGill, and her students Aayushi Kulshrestha and Sacha Levy collaborated on the research. The team was assisted by experts from Marinus Analytics, a spinoff of CMU's Robotics Institute that uses artificial intelligence, machine learning, predictive modeling and geospatial analysis to combat sex trafficking.
To test InfoShield, the team ran it on a set of escort listings in which experts had already identified trafficking ads. The team found that InfoShield outperformed other algorithms at identifying the trafficking ads, flagging them with 85% precision. Perhaps more importantly, it did not incorrectly flag any escort listings as human trafficking ads when they were not. False positives can quickly erode trust in an algorithm, Faloutsos said.
Proving their success was tricky. The test data set contained actual ads placed by human traffickers. The information in these ads is sensitive and kept private to protect the victims of human trafficking, so the team could not publish examples of the similarities identified or the data set itself. This meant that other researchers could not verify their work.
"We were basically saying, 'Trust us, our algorithm works,'" Vajiac said.
To remedy this, the team looked for public data sets on which to test InfoShield, ones that mimicked what the algorithm looked for in human trafficking data: text and the similarities in it. They turned to Twitter, where bots had created a trove of text with exactly those kinds of similarities.
Bots will often tweet the same information in similar ways. Like a human trafficking ad, the format of a bot tweet might be the same with some pieces of information changed. Rabbany said that in both cases — Twitter bots and human trafficking ads — the goal is to find organized activity.
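The "same format, a few pieces changed" pattern can be shown with a short sketch. It is only an illustration under stated assumptions, not the team's method: it uses `difflib.SequenceMatcher` from Python's standard library to align two texts word by word, and the function name, the `<*>` placeholder and the sample tweets are hypothetical.

```python
from difflib import SequenceMatcher


def template_and_slots(a, b, slot="<*>"):
    """Align two texts word by word. Words they share form the template;
    each stretch where they differ becomes a slot placeholder, and the
    differing words are returned as (from_a, from_b) pairs."""
    words_a, words_b = a.split(), b.split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    template, slots = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            template.extend(words_a[i1:i2])
        else:
            # "replace", "delete" or "insert": the texts diverge here.
            template.append(slot)
            slots.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return " ".join(template), slots


# Made-up bot-style tweets that share a format but differ in two places.
tweet_a = "Win a free phone today at example.com/a1"
tweet_b = "Win a free laptop today at example.com/b2"

template, slots = template_and_slots(tweet_a, tweet_b)
print(template)  # Win a free <*> today at <*>
print(slots)     # [('phone', 'laptop'), ('example.com/a1', 'example.com/b2')]
```

The shared template is what hints at a common author, while the slots hold the details that change from one post to the next.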
Among tweets, InfoShield outperformed other state-of-the-art algorithms at detecting bots. Vajiac said this finding was a surprise, given that other algorithms take into account Twitter-specific metrics such as the number of followers, retweets and likes, and InfoShield did not. The algorithm instead relied solely on the text of the tweets to determine bot or not.
"That speaks a lot to how important text is in finding these types of organizations," Vajiac said.
Although Faloutsos has worked on algorithms for forecasting and anomaly detection for 30 years, this was the first time he applied one to stopping human trafficking. He and the team hope their work plays a role in helping law enforcement rescue victims and in reducing human suffering.
Their work continues. The team talks to experts weekly to learn more about human trafficking and efforts to end it. The more they learn, the more passion they put toward stopping it.
"You see how relevant and impactful your work could be," Rabbany said. "And you see how much work there is to be done, how much room for improvement there is and how much you could bring to the table.