Construction of mathematical basis for realizing data rating service

Reference No. 2022a007
Type/Category Grant for Supporting the Advancement of Female Researchers-Workshop (I)
Title of Research Project Construction of mathematical basis for realizing data rating service
Principal Investigator Nakayama Naoko (Mamezou Co., Ltd. Strategic Digital Business Unit / Chief Consultant)
Research Period September 21, 2022 - September 22, 2022
Keyword(s) of Research Fields data rating, mathematical basis, data science, DX, AI
Objectives and Expected Results Recently, the public and private sectors have been promoting digital transformation (DX) in many fields. In this context, the mutual use of data recorded and stored within organizations such as companies, local governments, and educational and research institutions has become one of the most important themes in DX. Traditionally, discussions of data quality in the mutual use of data have dealt mainly with physical or institutional issues, such as uniform specifications and formats across organizations, missing data, and the credibility of data providers. In DX, however, data must be handled and processed as numerical values to enable many types of automation, including AI. It is therefore necessary to evaluate not only the physical state of the data and its institutional standing but also the data itself, the actual object of processing, from a mathematical standpoint, in order to assess whether it is of suitable quality for mathematical processing. For example, as the mutual use of data in DX grows, more and more exchanges are expected to aim at acquiring training data for AI, and in such cases many fields will demand data whose quality is assured mathematically. In addition, data cleansing, which involves checking formal consistency and finding, correcting, and deleting duplicate and erroneous data, can be further automated by taking a mathematical approach.
In this proposal, we conduct joint research aimed at realizing so-called "data rating", which mathematically determines and clarifies data quality. This proposal considers two approaches:
(1) Rule-based data quality assurance:
This approach mainly targets data derived from edge devices. We apply statistical measures (e.g., data variance, comprehensiveness, the proportion of outliers, entropy, missing data) to the data to validate whether it can be used in applications.
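A minimal sketch of how a few of the statistics named in approach (1) could be computed for a single edge-device series, using only the Python standard library. The function name `quality_metrics`, the convention that a missing reading is stored as `None`, the use of Tukey's 1.5 x IQR fences to flag outliers, and the 10-bin histogram for entropy are all illustrative assumptions, not choices taken from the proposal; "comprehensiveness" is not covered here.

```python
import math
import statistics
from collections import Counter
from typing import Optional, Sequence


def quality_metrics(values: Sequence[Optional[float]], bins: int = 10) -> dict:
    """Hypothetical rule-based quality statistics for one sensor series.

    Assumption: a missing reading is encoded as None.
    """
    present = [v for v in values if v is not None]
    n_total = len(values)
    # Share of readings that are missing (1.0 for an empty series).
    missing_ratio = 1 - len(present) / n_total if n_total else 1.0

    if len(present) < 2:
        # Too few readings to estimate spread, quartiles, or a histogram.
        return {"missing_ratio": missing_ratio, "variance": None,
                "outlier_ratio": None, "entropy_bits": None}

    # Sample variance of the readings that are present.
    variance = statistics.variance(present)

    # Outliers: values outside Tukey's fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outlier_ratio = sum(v < lo or v > hi for v in present) / len(present)

    # Shannon entropy (bits) of an equal-width histogram of the readings;
    # a constant series falls into a single bin and has entropy 0.
    vmin, vmax = min(present), max(present)
    width = (vmax - vmin) / bins or 1.0
    counts = Counter(min(int((v - vmin) / width), bins - 1) for v in present)
    n = len(present)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {"missing_ratio": missing_ratio, "variance": variance,
            "outlier_ratio": outlier_ratio, "entropy_bits": entropy}


if __name__ == "__main__":
    series = [20.1, 20.3, None, 20.2, 20.4, 35.0, 20.2, 20.3]
    print(quality_metrics(series))
```

In this example series, one of eight readings is missing and the spike at 35.0 falls outside the IQR fences. A rule-based checker could then compare each statistic against application-specific limits.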
(2) Data quality assurance based on mathematical algorithms:
Deep learning has developed rapidly in recent years, and adversarial examples are now well known as a threat to the safety of deep learning models. Research on attacking deep neural networks using their gradient information has shown that information obtained by exploring the network can yield a high success rate for adversarial attacks. We therefore propose a novel adversarial attack technique based on the information gathered during this exploration, aiming at a higher attack success rate than existing methods. By repeatedly training a deep learning model on large numbers of adversarial examples generated with the proposed technique, we can in turn improve the model's robustness.
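To illustrate the general idea of a gradient-based attack and of adversarial training, here is a small sketch. It is not the proposal's novel technique, which is not specified. Instead it uses the well-known Fast Gradient Sign Method (FGSM), x_adv = x + epsilon * sign(grad_x loss), on a toy logistic-regression model whose input gradient has the closed form (p - y) * w. The functions `fgsm` and `adversarial_training_step`, and the choices of `epsilon` and `lr`, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Probability of class 1 under a logistic-regression model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(w, b, x, y: int) -> float:
    # Binary cross-entropy; the small constant guards against log(0).
    p = predict(w, b, x)
    return -(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y: int, epsilon: float) -> List[float]:
    """FGSM: step each input feature by epsilon in the direction that
    increases the loss. For logistic regression, d(loss)/dx = (p - y) * w."""
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad_x)]


def adversarial_training_step(w, b, x, y: int, epsilon: float,
                              lr: float) -> Tuple[List[float], float]:
    """One gradient-descent step on the loss at the FGSM example, so the
    model learns from the perturbed input instead of the clean one."""
    x_adv = fgsm(w, b, x, y, epsilon)
    p = predict(w, b, x_adv)
    # d(loss)/dw = (p - y) * x_adv and d(loss)/db = (p - y).
    w_new = [wi - lr * (p - y) * xi for wi, xi in zip(w, x_adv)]
    b_new = b - lr * (p - y)
    return w_new, b_new


if __name__ == "__main__":
    w, b, x, y = [2.0, -1.0], 0.5, [0.3, 0.2], 1
    x_adv = fgsm(w, b, x, y, epsilon=0.1)
    print("clean loss:", log_loss(w, b, x, y))
    print("adversarial loss:", log_loss(w, b, x_adv, y))
```

For a deep network the input gradient would come from automatic differentiation rather than a closed form. Training on a stream of such perturbed examples is the basic loop by which adversarial training aims to improve robustness.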
Building on the topics mentioned above, this research proposal will conduct joint research to realize "data rating", that is, mathematically determining and clarifying the quality of data. Aiming to establish a new mathematical foundation (algorithm) for "data rating" through joint research meetings between industry and academia, we will discuss how to implement a "data rating service" in a cloud environment. The "data rating service" mathematically determines the quality of various data in the cloud environment and presents it as a rating. To implement this service, "data rating" must be automated as far as possible; we therefore also envision the use of mathematical infrastructure and AI (Artificial Intelligence). The research meeting will examine how the mathematical infrastructure can validate whether data meets formal requirements and automatically rate whether the data is usable for its users. A mechanism for guaranteeing data quality through the use of AI will be studied at the same time. We expect the outcome of this research to provide a useful perspective on the "mathematical quality of data", a notion for which a clear rationale is generally lacking. Many companies and organizations are still only vaguely aware of this topic, even though the use of data for AI is expected to increase substantially in the future.
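The two steps the service is described as performing, validating formal requirements and then rating usability, can be sketched as follows. This is only an illustration under stated assumptions. The `Rating` record, `check_format` and `rate` functions, the per-field type schema, the penalty weights, and the "A" to "D" grade thresholds are all hypothetical and not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping


@dataclass
class Rating:
    grade: str                     # hypothetical scale: "A" (best) to "D"
    score: float                   # 0.0 (worst) to 1.0 (best)
    issues: List[str] = field(default_factory=list)


def check_format(records: List[Mapping[str, Any]],
                 schema: Mapping[str, type]) -> List[str]:
    """Formal check: each record has every schema field with the expected
    type. A value of None is treated as missing, not as a format error."""
    issues = []
    for i, rec in enumerate(records):
        for name, typ in schema.items():
            if name not in rec:
                issues.append(f"record {i}: missing field '{name}'")
            elif rec[name] is not None and not isinstance(rec[name], typ):
                issues.append(f"record {i}: '{name}' is not {typ.__name__}")
    return issues


def rate(records: List[Mapping[str, Any]],
         schema: Mapping[str, type]) -> Rating:
    """Combine the formal check with a missing-value ratio into a grade."""
    cells = len(records) * len(schema)
    if cells == 0:
        return Rating("D", 0.0, ["no data"])
    issues = check_format(records, schema)
    # Count only fields that are present but empty; absent fields are
    # already reported by check_format.
    missing = sum(1 for rec in records for name in schema
                  if name in rec and rec[name] is None)
    format_error_rate = len(issues) / cells
    missing_ratio = missing / cells
    # Hypothetical weighting: a format error costs twice a missing value.
    score = max(0.0, 1.0 - 2.0 * format_error_rate - missing_ratio)
    for threshold, grade in ((0.95, "A"), (0.85, "B"), (0.70, "C")):
        if score >= threshold:
            return Rating(grade, score, issues)
    return Rating("D", score, issues)


if __name__ == "__main__":
    schema = {"id": int, "temp": float}
    data = [{"id": 1, "temp": 20.5}, {"id": 2, "temp": None},
            {"id": "3", "temp": 21.0}]
    print(rate(data, schema))
```

In a cloud deployment, statistical measures such as those in approach (1) and model-based checks could feed additional terms into the score. How those terms should be weighted is exactly the kind of question the proposal leaves to the joint research.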
Organizing Committee Members (Workshop)
Participants (Short-term Joint Usage)
Tanigawa Takuji (SoftBank Corp. / Ideation Director)
Shinano Yuji (ZIB (Zuse Institute Berlin) / Researcher)
Kondo Masaaki (Keio University / Professor)
Ishihara Toru (Nagoya University / Professor)
Kaji Shizuo (Kyushu University / Professor)
Fujisawa Katsuki (Kyushu University / Professor)