Written by Visiting Professor Peter Cochrane OBE.
Every now and again serendipity draws me into something new, unusual and exciting. And so it was that, following my previous blog on the Truth Engine, I received an invitation to become an advisor to the first global datathon to apply ‘Natural Language Processing’ (NLP) to a representative sample of Fake News and Propaganda that had appeared in the media and on social networks.
The event was co-organised by The Data Sciences Society and QCRI (Qatar) and held on 21 - 27 Jan 2019, with dedicated platforms hosted in Doha, Bangalore, Riyadh and Sofia. A foundation component of the Datathon was the QCRI/MIT-CSAIL Tanbih project, focussed on detecting bias and propaganda in news publications.
By the time I was engaged as an advisor, over 200 participants across 30 countries had already registered and formed 40 teams of students and professionals. Their primary Datathon challenge was stated as follows:
“To develop intelligent systems able to classify entire articles and text fragments as propagandistic or not”
All the teams were challenged with the same standardised AI training dataset, comprising 451 news articles with sentence-level annotations marking the content as propaganda or not. They had fewer than 5 days (and nights) to come up with workable solutions, demonstrate viable capabilities, compile full reports, and present to the judges.
The basic thinking of the organisers is further exemplified by the following definitions and categories:
Propaganda & Fake News Definition: The spreading of ideas, facts, or allegations deliberately to influence opinions with reference to predetermined ends.
Or, as PolitiFact would have it: "Fake news is made-up stuff, masterfully manipulated to look like credible journalistic reports that are easily spread online to large audiences willing to believe the fictions and spread the word."
And for CBS: "Stories that are provably false, with enormous traction in the culture, and consumed by millions of people".
Difficulty Level 1: Build an intelligent system able to detect any propagandistic article
Difficulty Level 2: Detect whether each sentence is propagandistic or not
Difficulty Level 3: Locate and identify each propagandistic technique
So, how did the teams do? The top ten achieved detection accuracies just below or just above a remarkable 86%. This surprised everyone, including me! How did they do it in such a short time? They configured standard, or readily available, Natural Language Processing (NLP) engines and/or components, with AI packages that learned to identify emergent patterns of words, phrases, statements and headline types.
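To give a feel for what 'learning patterns of words' means in practice, here is a minimal sketch of a sentence classifier built on word counts (a naive Bayes model). It is not what any of the teams built, as far as I know; the training sentences are invented for illustration and are not taken from the Datathon dataset, and the names `NaiveBayes`, `tokenize` and `TRAIN` are my own.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes over word counts (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of words seen with it
        self.doc_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(sentence):
                counts[tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, sentence):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus log likelihood with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokenize(sentence):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: NOT from the Datathon's 451 annotated articles.
TRAIN = [
    ("The evil regime will destroy our glorious nation", "propaganda"),
    ("Only traitors and cowards question our heroic leader", "propaganda"),
    ("They are vermin who threaten everything we love", "propaganda"),
    ("The council approved the budget on Tuesday", "neutral"),
    ("Rainfall totals were slightly above average this month", "neutral"),
    ("The company reported quarterly earnings on Thursday", "neutral"),
]

model = NaiveBayes().fit([s for s, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("Traitors threaten our glorious leader"))
print(model.predict("The council reported earnings on Thursday"))
```

With six sentences this is a toy, of course. The point is only that loaded words and phrases leave a statistical fingerprint, and a system trained on hundreds of annotated articles has far more of it to work with.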
If AI is good at anything it is pattern recognition and matching. This is a particularly important quality when identifying patterns hidden in massive data sets that escape human ability. The big question now is: could we significantly improve on these results? My guess is that the old engineering mantra applies: You get 80% of the result for 20% of the effort…and getting to 100% is probably impossible.
The reality is that Fake News and Propaganda will most likely need 5 - 10 distinctly different techniques applied at the same time, as inferred in my previous blog: How to Build a Truth Engine. There I identified Fact Checkers and long-term historical analysis of publications and behaviours, employment, employer, organisation, motivation, and hidden agendas as accessible and workable trending metrics. I am now adding AI applied to NLP to that list.
If there is a negative here it has to be the ‘Dark Side’ watching and accessing the Datathon to learn about new defence strategies. However, the good news is that human habituality is very hard to hide, and AI will continue to learn and adjust accordingly in near real time. So I think this is a war we might just win provided we consolidate our global resources. Watch this space!
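One simple way to picture several techniques working at the same time is a weighted average of their individual scores. The sketch below is only that, a picture: the signal names, the scores and the weights are all hypothetical values I have made up, and a real system would have to learn or calibrate them.

```python
def combined_score(signals, weights):
    """Weighted average of per-technique scores, each assumed to lie in [0, 1].

    Signals without a matching weight are ignored.
    """
    used = [name for name in signals if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no weighted signals supplied")
    return sum(signals[name] * weights[name] for name in used) / total_weight


# Hypothetical weights: how much each technique is trusted.
WEIGHTS = {
    "fact_check": 3,
    "source_history": 2,
    "nlp_classifier": 2,
    "motivation": 1,
}

# Hypothetical scores for one article (1.0 = almost certainly propaganda).
article_signals = {
    "fact_check": 0.9,
    "source_history": 0.7,
    "nlp_classifier": 0.8,
    "motivation": 0.5,
}

print(round(combined_score(article_signals, WEIGHTS), 3))
```

The attraction of combining signals this way is that no single technique has to be perfect; an NLP classifier that is fooled by careful wording can still be outvoted by a poor fact-check record.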
*Peter is a seasoned professional with over 40 years of hands-on management, technology and operational experience. He has been involved in the management and transformation of giant corporations, establishing new companies, advising governments, and the creation and deployment of new technologies, products and management systems.