Machine Learning without Negative Data: A New AI Move

Researchers at the RIKEN Center for Advanced Intelligence Project have developed a way for artificial intelligence to learn to tell things apart without ‘negative data,’ which classifiers of this kind previously required.

But what exactly does this mean?

We humans classify data as well. We can tell blue from black, big from small, living things from objects, and good from bad. Our innate mental capacity lets us make these classifications subconsciously. Artificial intelligence runs on the same idea of classification, but it learns from labelled examples of two kinds: positive data and negative data. Positive data are examples of the class you are interested in (a smiling face, perhaps) and negative data are examples of the other class (a frowning face, perhaps).

What makes this difficult is that a conventional classifier needs both kinds of data to work well. In many real cases, negative data is hard or impossible to collect. You might not be able to find pictures of people frowning, for example. In more realistic terms, consider market research: finding consumers of your own product may be easy, but finding those who chose a competitor's product may be hard, because those consumers are the competitor's positive data, not yours.

Web and app developers may face the same hardship. They can gather data on the people who use or subscribe to their app, but as soon as a user leaves, that user's data is deleted in line with the privacy policy and the protection of personal information. Negative data is once again unavailable.

According to Takashi Ishida from RIKEN, “we have made it possible for computers to learn with only positive data, as long as we have a confidence score for our positive data, constructed from information such as buying intention or the active rate of app users. Using our new method, we can let computers learn a classifier only from positive data equipped with confidence.”

Ishida, Niu Gang, and Masashi Sugiyama proposed attaching a confidence score to each positive example, where the score corresponds to the probability that the example belongs to the positive class rather than the negative one. With positive data and these confidence scores, the computer can learn where to draw the boundary between the two classes even though it never sees a negative example.
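To make the idea concrete, here is a minimal sketch of learning from positive data with confidence scores. It is an illustration under stated assumptions, not the researchers' implementation. It assumes a weighted-loss formulation in which each positive example x with confidence r contributes loss(g(x)) + ((1 - r) / r) * loss(-g(x)), so a low-confidence positive is partly treated as if it were a negative. The linear model, the logistic loss, the synthetic Gaussian data, the exactly known confidence scores, and all function names (`train_pconf`, `pconf_risk`, and so on) are choices made here for illustration only.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def logistic_loss(z):
    # log(1 + exp(-z)), computed without overflow.
    return np.logaddexp(0.0, -z)

def pconf_weights(r):
    # Weight (1 - r) / r on the "treated as negative" loss term.
    # Clipping r is an assumption made here to keep the weights finite.
    r = np.clip(r, 1e-3, 1.0)
    return (1.0 - r) / r

def pconf_risk(w, b, X_pos, r):
    # Average weighted loss over positive examples only.
    g = X_pos @ w + b
    return float(np.mean(logistic_loss(g) + pconf_weights(r) * logistic_loss(-g)))

def train_pconf(X_pos, r, lr=0.1, steps=2000):
    # Plain gradient descent on the risk for a linear scorer g(x) = w.x + b.
    weight = pconf_weights(r)
    w = np.zeros(X_pos.shape[1])
    b = 0.0
    for _ in range(steps):
        s = sigmoid(X_pos @ w + b)
        # Derivative of the per-example loss with respect to g.
        dg = -(1.0 - s) + weight * s
        w -= lr * (X_pos.T @ dg) / len(X_pos)
        b -= lr * dg.mean()
    return w, b

# Synthetic setup (assumption): two unit-variance Gaussian classes, equal priors.
rng = np.random.default_rng(0)
mu_pos = np.array([1.0, 0.0])
mu_neg = -mu_pos

# Training data: positive examples ONLY, each with a confidence score.
X_pos = rng.normal(loc=mu_pos, scale=1.0, size=(1000, 2))
# For this setup the true p(positive | x) is sigmoid(x . (mu_pos - mu_neg)).
r = sigmoid(X_pos @ (mu_pos - mu_neg))

w, b = train_pconf(X_pos, r)

# Negative examples are generated here for evaluation only.
X_test = np.vstack([
    rng.normal(loc=mu_pos, scale=1.0, size=(1000, 2)),
    rng.normal(loc=mu_neg, scale=1.0, size=(1000, 2)),
])
y_test = np.concatenate([np.ones(1000), -np.ones(1000)])
accuracy = float(np.mean(np.sign(X_test @ w + b) == y_test))
print(f"test accuracy on both classes: {accuracy:.3f}")
```

In practice the confidence scores would not be known exactly. As the quote above suggests, they would have to be constructed from signals such as buying intention or the active rate of app users, and the quality of the classifier would depend on how good those scores are.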

They tested the method on pictures, with a T-shirt as the positive class and a sandal as the negative class. They attached confidence scores to the T-shirt pictures and found that, in some cases, the computer could tell the two classes apart without any negative data at all. According to Ishida, “This discovery could expand the range of applications where classification technology can be used. Even in fields where machine learning has been actively used, our classification technology could be used in new situations where only positive data can be gathered due to data regulation or business constraints. In the near future, we hope to put our technology to use in various research fields, such as natural language processing, computer vision, robotics, and bioinformatics.”

Note: Some information used in this article is from

Asif Razzaq is an AI Journalist and Cofounder of Marktechpost, LLC. He is a visionary, entrepreneur and engineer who aspires to use the power of Artificial Intelligence for good.

Asif's latest venture is the development of an Artificial Intelligence Media Platform (Marktechpost) that will revolutionize how people can find relevant news related to Artificial Intelligence, Data Science and Machine Learning.

Asif was featured by Onalytica in its ‘Who’s Who in AI? (Influential Voices & Brands)’ as one of the 'Influential Journalists in AI'. His interview was also featured by Onalytica.