Wow! Cool Laboratory [Researcher Introduction]
Division of Synergetic Information Science,
Research Group of Complex Systems Engineering,
Laboratory of Formative System Engineering
Research fields: Computer science, artificial intelligence
Research themes: Machine learning, data mining, information retrieval
Achieving Accurate Decision Making, Using Human Confidence Judgments,
Contributing to Quality Control in Crowdsourcing
Facilitating Human Computation
The Formative System Engineering Laboratory in the Graduate School of Information Science and Technology is working to develop technology for designing more highly functional software more simply and reliably by combining artificial intelligence (AI) and software engineering (SE) technologies. There are few such laboratories in Japan, and this reflects one of the characteristics of the Graduate School of Information Science and Technology: its call for “the creation of new research fields through fusion between different fields” and “wide-ranging and holistic higher education”.
Associate Professor Satoshi Oyama, who belongs to the laboratory, is carrying on research into the themes of artificial intelligence and machine learning. “Machine learning is a technique by which computers learn from past examples or environments, to achieve better prediction. Within that field, I have been researching basic algorithms and their application to Web searching and data mining, for about ten years”.
The general method for machine learning is “supervised learning”, in which a human provides the correct answer in advance for a given example problem, and the computer learns on that basis. For example, in a task such as image classification, answers like “this image shows/does not show a bird” are provided, and the computer learns to classify images accordingly. The role of supervisor is often taken by the researchers themselves or by students, and a persistent problem has been that time constraints and other issues make it impossible to obtain enough labeled training data. Moreover, however much machine learning advances, it is difficult to eliminate all errors. Truly difficult problems cannot be solved by computers alone, and in some cases the computer must defer to human judgment.
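To make the idea of supervised learning concrete, here is a minimal, hypothetical sketch in the spirit of the “bird / not bird” example: a handful of human-labeled feature vectors stand in for labeled images, and a simple nearest-neighbour rule predicts the label of a new example. The data and feature representation are invented for illustration, not taken from the laboratory's work.

```python
import math

# Toy training set: feature vectors (standing in for extracted image
# features) with human-provided labels, as in the "bird" example.
training_data = [
    ((0.9, 0.8), "bird"),
    ((0.8, 0.9), "bird"),
    ((0.1, 0.2), "not bird"),
    ((0.2, 0.1), "not bird"),
]

def classify(features):
    """1-nearest-neighbour: predict the label of the closest training example."""
    _, label = min(training_data,
                   key=lambda ex: math.dist(ex[0], features))
    return label

print(classify((0.85, 0.90)))  # near the "bird" examples -> "bird"
print(classify((0.15, 0.15)))  # near the "not bird" examples -> "not bird"
```

The quality of such a classifier depends directly on how much correctly labeled training data is available, which is exactly the bottleneck the article describes.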
“One of the goals of artificial intelligence research until now has been to build a smart computer, so that everything can be handled by machine, even with no humans present. In recent years, the ‘human computation’ approach, in which humans and machines cooperate to solve problems that cannot be solved by machines alone, has been attracting attention. The idea of human computation is to employ humans as a computing resource, and with the advent of ‘crowdsourcing’ (Commentary 1), Web services through which large numbers of people can be asked to do jobs (tasks) via the Internet, it is now possible to collect large amounts of answer data”.
Statistical Estimation of Worker Ability and the Accuracy of Confidence Judgments
Enabling More Reliable Quality Control of Crowdsourcing
Crowdsourcing makes it possible to request tasks relatively cheaply via the Internet, so its use is advancing in various fields of computer science, such as image recognition, natural language processing, information retrieval, and databases. However, crowdsourcing employs the general public, so there is no guarantee that all the workers have the necessary ability or diligence to handle the task. Therefore, assessment of the abilities of workers participating in crowdsourcing tasks is essential for quality control.
“There are a number of methods for assessing worker ability, but the simplest is to run tests with data for which the correct answers are known, and use the accuracy rate on the test to estimate ability. Another method is to disperse test questions among the tasks, so that the workers are not aware of them. This approach is close to supervised learning, and is applicable only to cases where the correct answers are known”.
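The gold-standard approach described above can be sketched in a few lines. Everything here is hypothetical illustration: some task IDs are secretly “gold” questions with known answers, and each worker's ability is estimated as their accuracy rate on the gold questions they answered.

```python
# Hypothetical gold questions whose correct answers are known in advance.
gold_answers = {"t1": "A", "t4": "B", "t7": "A"}

# Hypothetical answers from two workers (gold and ordinary tasks mixed).
worker_answers = {
    "worker1": {"t1": "A", "t2": "B", "t4": "B", "t7": "A"},
    "worker2": {"t1": "B", "t3": "A", "t4": "B", "t7": "B"},
}

def estimate_ability(answers, gold):
    """Accuracy rate on the gold questions this worker happened to answer."""
    graded = [(t, a) for t, a in answers.items() if t in gold]
    if not graded:
        return None  # no gold questions answered; ability unknown
    correct = sum(1 for t, a in graded if a == gold[t])
    return correct / len(graded)

for worker, answers in worker_answers.items():
    print(worker, estimate_ability(answers, gold_answers))
```

Dispersing the gold questions among ordinary tasks, as the article notes, keeps workers from treating them differently from real work.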
For problems where the correct answers are not known, the same task (labeling the same data item) can be performed by multiple workers and the results can be judged statistically. A simple method is to integrate the labels by majority vote over the workers’ results. However, in crowdsourcing not all workers have the same probability of error: error rates are influenced by differences in ability and honesty. Associate Professor Oyama is using statistical methods in his research into ways to estimate the true labels in situations where workers may assign incorrect labels. He has proposed and experimentally tested a method in which the reliability of labels is estimated automatically on the basis of workers reporting whether or not they are confident in their own work results (Commentary 2).
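A minimal sketch of the two integration ideas mentioned above, with invented data: plain majority voting, and a variant in which each vote is weighted by an estimate of the worker's accuracy (the accuracy figures here are assumed, not derived from the laboratory's actual model).

```python
from collections import Counter

# Hypothetical labels assigned to the same items by several workers.
labels_by_item = {
    "img1": [("w1", "bird"), ("w2", "bird"), ("w3", "not bird")],
    "img2": [("w1", "not bird"), ("w2", "bird"), ("w3", "not bird")],
}

def majority_vote(votes):
    """Integrate labels by taking the most frequent answer."""
    return Counter(label for _, label in votes).most_common(1)[0][0]

# When workers are estimated to differ in reliability, each vote can be
# weighted by that worker's estimated accuracy instead of counted equally.
estimated_accuracy = {"w1": 0.9, "w2": 0.6, "w3": 0.8}  # assumed estimates

def weighted_vote(votes, accuracy):
    """Integrate labels, weighting each vote by the worker's accuracy."""
    scores = {}
    for worker, label in votes:
        scores[label] = scores.get(label, 0.0) + accuracy.get(worker, 0.5)
    return max(scores, key=scores.get)

for item, votes in labels_by_item.items():
    print(item, majority_vote(votes), weighted_vote(votes, estimated_accuracy))
```

With equal weights the two methods agree here; they diverge when a minority of high-accuracy workers outweighs a careless majority, which is precisely why worker-ability estimation matters for quality control.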
“Overconfident workers might report high confidence even though their work was actually incorrect, and conversely, modest workers might report low confidence despite giving correct answers. There might also be workers who give careless confidence reports without thinking. In this experiment, we investigated the correlation between average worker confidence and accuracy rate, and found that confidence scores contain information that is valuable when estimating the true labels. The breakthrough point is that rather than trusting a worker’s reported self-confidence as-is, we built a model that takes into account differences in the accuracy of self-reporting”.
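The intuition behind calibrating self-reported confidence can be sketched as follows. This is an illustration only, not Oyama's actual statistical model: each worker's confidence reports are checked against gold questions, and votes are then weighted by the calibrated accuracy at that confidence level rather than by the raw confidence itself. All names and data are hypothetical.

```python
# Gold questions with known answers, used to calibrate confidence reports.
gold = {"t1": "A", "t2": "B"}

# Per-worker history: (task, answer, reported_confident?) triples.
history = {
    "overconfident": [("t1", "B", True), ("t2", "A", True)],
    "modest":        [("t1", "A", False), ("t2", "B", False)],
}

def calibrated_accuracy(records, gold, confident):
    """Fraction correct among this worker's answers at a confidence level."""
    relevant = [(t, a) for t, a, c in records if c == confident and t in gold]
    if not relevant:
        return 0.5  # no evidence: assume chance level for a binary task
    return sum(a == gold[t] for t, a in relevant) / len(relevant)

def weighted_answer(votes, history, gold):
    """Combine answers, weighting by calibrated accuracy, not raw confidence."""
    scores = {}
    for worker, answer, confident in votes:
        weight = calibrated_accuracy(history[worker], gold, confident)
        scores[answer] = scores.get(answer, 0.0) + weight
    return max(scores, key=scores.get)

# New task: the overconfident worker says "A" confidently, the modest
# worker says "B" without confidence. Calibration favours the modest worker.
print(weighted_answer(
    [("overconfident", "A", True), ("modest", "B", False)], history, gold))
```

The point mirrors the quote above: a confident report from a worker whose confidence has historically been unreliable should carry less weight than a hesitant report from a consistently accurate worker.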
Focusing Effort on Promoting Collaboration Between Industry, Academia, and Government
Expectations for Knowledge From the Fields of Psychology and Social Sciences
Associate Professor Oyama joined with researchers from the University of Tokyo, University of Tsukuba, Kyoto University, and Kyushu University to establish the Crowdsourcing Research Group*. The Group involves crowdsourcing-related companies as well as computer science researchers, and pursues various aspects of research.
Regarding future research activities, Oyama says, “Research into incentives for honest reporting of one’s own confidence or ability will be important”. He is working with experts in game theory to investigate the relationship between incentive design and quality control. “I want to research what kind of reward setting is appropriate to obtain the highest-quality work results, and to predict how the quality of work results changes when rewards are changed”.
Also, since human computation and crowdsourcing involve people, the social sciences and psychology are also relevant. For example, incentive design is handled in economics and game theory, and the issues of estimating human ability and measuring the accuracy of confidence judgments have already been addressed in cognitive psychology and educational psychology. “This is an interdisciplinary theme, so there are many problems that cannot be solved by computer science researchers alone. I think there are areas in which we can apply knowledge from the social sciences and psychology, and I want to deepen collaboration with people in those fields. I would also like people from companies that are interested in crowdsourcing but hesitant to take the plunge to consult me. I think we have a lot to discuss, such as areas in which we can use our technology, and areas that we can solve through partnership”.
Commentary 1 What is crowdsourcing?
Conceptual illustration of crowdsourcing services. It is simple to use the Internet to ask large numbers of people to perform tasks.
Commentary 2 Quality control in crowdsourcing
The relationship between confidence and accuracy rate. Some workers are overconfident and some are underconfident, so it is necessary to consider differences in the accuracy of workers’ confidence judgments for the purpose of quality control.