Named Entity Disambiguation for Maritime-related Data Retrieved from Heterogenous Sources
Jacek Małyszko, Witold Abramowicz, Milena Stróżyna
Abstract: The article concerns the integration and disambiguation of data related to the maritime domain. It describes a system that collects data about several kinds of maritime-related entities (vessels, vessel types, ports, companies, etc.) from different internet sources, merges them, and feeds them into a single database. This process is not trivial, and several challenges must be addressed to conduct it successfully. First, different sources may refer to the same entity in different ways, for example by using different text strings. Additionally, some of these references may be ambiguous, i.e. a reference may point to more than one entity. To enable efficient analysis of data coming from different sources, such ambiguities must be resolved automatically as a preprocessing step, before the data is uploaded to the database and used in further computations. The aim of the disambiguation process is to assign an artificial, unique identifier to each entity and then, where possible, automatically attach that identifier to each data item related to the entity. The article discusses the methods developed for resolving such ambiguities and presents their evaluation.
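The identifier assignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the `EntityRegistry` class, the `normalize` helper, and the sample names are hypothetical, and the matching rule (exact match on a normalized string) is an assumption chosen for brevity. A reference that matches more than one entity is treated as ambiguous and left unresolved.

```python
import itertools
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with one space, trim."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


class EntityRegistry:
    """Hypothetical registry mapping reference strings to artificial IDs."""

    def __init__(self):
        self._next_id = itertools.count(1)
        # normalized alias -> set of entity IDs that use this alias
        self._index = defaultdict(set)

    def register(self, aliases):
        """Create a new entity with a fresh ID and index all its aliases."""
        entity_id = next(self._next_id)
        for alias in aliases:
            self._index[normalize(alias)].add(entity_id)
        return entity_id

    def resolve(self, reference):
        """Return the entity ID, or None if the reference is unknown or ambiguous."""
        candidates = self._index.get(normalize(reference), set())
        if len(candidates) == 1:
            return next(iter(candidates))
        return None


registry = EntityRegistry()
gdansk = registry.register(["Port of Gdansk", "GDANSK"])
star_a = registry.register(["Star"])  # two distinct vessels sharing one name
star_b = registry.register(["STAR"])
```

Here `registry.resolve("  port of GDANSK. ")` yields the identifier of the port despite the differing text string, while `registry.resolve("star")` yields `None` because the name points to two entities. A real system would need additional attributes to separate such cases.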
|Journal series||TransNav - The International Journal on Marine Navigation and Safety of Sea Transportation, ISSN 2083-6473, e-ISSN 2083-6481, (B 12 pts)|
|Publication size in sheets||0.6|
|Keywords in English||Maritime-Related Data, Heterogenous Sources, Disambiguation of Data, Data Sensors, Data Source, Common Operating Picture (COP), Maritime Domain Awareness (MDA), Maritime Mobile Service Identity (MMSI)|
|Score|| = 12.0, 13-12-2019, ArticleFromJournal|
|Publication indicators||= 1|
|Citation count*||4 (2021-07-22)|
* The presented citation count is obtained through analysis of information available on the Internet and is close to the number calculated by the Publish or Perish system.