10.16 An acquisition, curation and management workflow for sustainable, terabyte-scale marine image analysis

Paper title: An acquisition, curation and management workflow for sustainable, terabyte-scale marine image analysis

Authors: Timm Schoening, Kevin Köser, Jens Greinert

DOI: 10.1038/sdata.2018.181

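Optical imaging is a common technique in ocean research. Diving robots, towed cameras, drop-cameras and TV-guided sampling gear all produce image data of the underwater environment. Advances such as 4K cameras, autonomous robots, high-capacity batteries and LED lighting now allow systematic optical monitoring at large spatial scales and in less time, but with greater data volume and velocity. Growing fleets and emerging swarms of autonomous vehicles, which build large data sets in parallel, increase volume and velocity further. Data at this scale calls for automated processing tools to extract as much information as possible. Systematic data analysis benefits from calibrated, geo-referenced data with clear metadata descriptions, particularly for machine vision and machine learning. The costly data acquisition must therefore be documented, and the data should be curated as early as possible, backed up and made publicly available.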

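In the Scientific Data article "An acquisition, curation and management workflow for sustainable, terabyte-scale marine image analysis", Timm Schoening and colleagues at the GEOMAR Helmholtz Centre for Ocean Research Kiel present a complete workflow for sustainable marine image analysis. The authors give guidelines for data acquisition, curation and management and apply them to the use case of a multi-terabyte (TB) deep-sea data set acquired by an autonomous underwater vehicle.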

Figure 1: Schematic of the image data workflow, from acquisition through curation to management.

Abstract: Optical imaging is a common technique in ocean research. Diving robots, towed cameras, drop-cameras and TV-guided sampling gear: all produce image data of the underwater environment. Technological advances like 4K cameras, autonomous robots, high-capacity batteries and LED lighting now allow systematic optical monitoring at large spatial scale and shorter time but with increased data volume and velocity. Volume and velocity are further increased by growing fleets and emerging swarms of autonomous vehicles creating big data sets in parallel. This generates a need for automated data processing to harvest maximum information. Systematic data analysis benefits from calibrated, geo-referenced data with clear metadata description, particularly for machine vision and machine learning. Hence, the expensive data acquisition must be documented, data should be curated as soon as possible, backed up and made publicly available. Here, we present a workflow towards sustainable marine image analysis. We describe guidelines for data acquisition, curation and management and apply it to the use case of a multi-terabyte deep-sea data set acquired by an autonomous underwater vehicle.

About the journal: Scientific Data (https://www.nature.com/sdata/) is a peer-reviewed, open-access journal for descriptions of scientifically valuable datasets, and research that advances the sharing and reuse of scientific data. Scientific Data welcomes submissions from a broad range of research disciplines, including descriptions of big or small datasets, from major consortiums to single research groups. Scientific Data primarily publishes Data Descriptors, a new type of publication that focuses on helping others reuse data, and crediting those who share.

The 2017 journal metrics for Scientific Data are as follows:

• 2-year impact factor: 5.305

• 5-year impact factor: 5.862

• Immediacy index: 0.843

• Eigenfactor® score: 0.00855

• Article Influence Score: 2.597

• 2-year Median: 2

