A study on the method of eliminating duplication of ocean temperature and salinity data

Conference: AIIPCC 2022 - The Third International Conference on Artificial Intelligence, Information Processing and Cloud Computing
06/21/2022 - 06/22/2022, online

Proceedings: AIIPCC 2022

Pages: 7
Language: English
Type: PDF

Authors:
Ji, Fengying; Dong, Mingmei; Liu, Yulong; Xu, Shanshan; Wan, Fangfang; Shi, Xiaoxiao; Han, Luyao; Yue, Xinyang; Zhang, Zengjian (National Marine Data and Information Service, China)

Abstract:
Duplicate data introduced during data collection, transmission, exchange, and management make the total data volume uncertain and the data set as a whole unreliable, which leads to erroneous statistical analyses of marine characteristics. Drawing on extensive data processing practice, the authors first clarify the sources and types of duplicate temperature and salinity data, then propose a series of thresholds for identifying duplicates from different instruments, and establish a process for identifying and eliminating them. Applied to the WOD data, this process effectively removes the duplicates. The operational process improves the application value of temperature and salinity datasets and can also be applied to other types of data.
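
The instrument-specific, threshold-based identification described in the abstract can be illustrated with a minimal Python sketch. The record fields, the threshold values, the rule for choosing which copy to keep, and all function names below are assumptions made for illustration only; the abstract does not state the paper's actual thresholds or procedure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Profile:
    """One temperature/salinity cast (hypothetical minimal record)."""
    profile_id: str
    instrument: str   # e.g. "CTD" or "XBT"
    time: datetime
    lat: float        # degrees north
    lon: float        # degrees east
    n_levels: int     # number of depth levels in the cast


# Illustrative thresholds only, NOT the values proposed in the paper.
# instrument -> (maximum time gap, maximum position gap in degrees)
THRESHOLDS = {
    "CTD": (timedelta(minutes=10), 0.01),
    "XBT": (timedelta(minutes=5), 0.05),
}
DEFAULT_THRESHOLD = (timedelta(minutes=1), 0.001)


def is_duplicate(a: Profile, b: Profile) -> bool:
    """Treat two casts from the same instrument type as duplicates when
    they fall within that instrument's time and position thresholds.
    Longitude wrap-around at +/-180 degrees is ignored for brevity."""
    if a.instrument != b.instrument:
        return False
    max_dt, max_deg = THRESHOLDS.get(a.instrument, DEFAULT_THRESHOLD)
    return (
        abs(a.time - b.time) <= max_dt
        and abs(a.lat - b.lat) <= max_deg
        and abs(a.lon - b.lon) <= max_deg
    )


def remove_duplicates(profiles: list[Profile]) -> list[Profile]:
    """Keep one cast per duplicate group, preferring the cast with the
    most depth levels (an assumed tie-breaking rule)."""
    kept: list[Profile] = []
    # Visit the richest casts first so they are the ones retained.
    for p in sorted(profiles, key=lambda q: -q.n_levels):
        if not any(is_duplicate(p, k) for k in kept):
            kept.append(p)
    return sorted(kept, key=lambda q: q.time)
```

The pairwise comparison is quadratic in the number of casts, which is acceptable for a sketch; a large archive would more plausibly first bin casts by time and position so that only nearby casts are compared.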