Entity resolution is the process of disambiguating real-world entities in data. It identifies and reconciles multiple records of a single entity to give a clearer picture of the information the data contains. Entity resolution, also known as record linkage, data matching, or deduplication, is difficult to achieve at scale. Its core task is deciding whether two references in an information system refer to the same real-world entity or to different ones.
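The core decision described above, whether two references denote the same entity, can be sketched as a pairwise comparison. The following is a minimal illustration rather than a production matcher: the function names `normalize`, `similarity`, and `same_entity`, the averaged per-field string similarity, and the 0.85 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a string similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_entity(ref_a: dict, ref_b: dict, threshold: float = 0.85) -> bool:
    """Decide whether two references describe the same entity.

    Averages the similarity of the fields both references share.
    The threshold is an illustrative assumption, not a recommended value.
    """
    shared = [field for field in ref_a if field in ref_b]
    if not shared:
        return False
    score = sum(similarity(ref_a[f], ref_b[f]) for f in shared) / len(shared)
    return score >= threshold


a = {"name": "Jon Smith", "city": "Boston"}
b = {"name": "John Smith", "city": "boston"}
c = {"name": "Mary Jones", "city": "Denver"}

print(same_entity(a, b))  # True: the records differ only by a typo and letter case
print(same_entity(a, c))  # False: both fields disagree
```

Real systems typically weight fields differently and learn the decision boundary from labeled pairs instead of fixing a single threshold.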
For example, the same patient may be admitted to a hospital at different times or through different departments, such as inpatient and outpatient admissions. Here, entity resolution (ER) is the process of comparing the intake information for each visit and deciding which intake records belong to the same patient and which belong to different patients. Entity resolution is valuable because it can match non-identical records despite inconsistencies in the data, without the constant need to formulate new matching rules. The fundamental law of entity resolution restates, in ER vocabulary, the data quality principle of entity identity integrity. When two identities are merged and the identifier of one of them is preserved for the merged identity, it remains possible to find the references that had been resolved to the other identity and to review the link values (entity identifiers) previously assigned to them. Companies use entity resolution to connect disparate data sources and clean their data, detect non-obvious relationships across data silos, and obtain a unified view of their data.
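The point about merged identities can be made concrete with a small sketch. It assumes a hypothetical in-memory `IdentityStore` (the class, its methods, and the identifiers `E1`, `E2`, and `visit-*` are invented for illustration) that records every link value assigned to a reference, so the earlier assignments remain reviewable after a merge.

```python
class IdentityStore:
    """Tracks which entity identifier (link value) each reference is assigned to."""

    def __init__(self):
        self.members = {}  # entity id -> set of reference ids
        self.history = {}  # reference id -> every entity id assigned, in order

    def add_reference(self, ref_id, entity_id):
        """Assign a reference (for example, an intake record) to an identity."""
        self.members.setdefault(entity_id, set()).add(ref_id)
        self.history.setdefault(ref_id, []).append(entity_id)

    def merge(self, survivor_id, absorbed_id):
        """Merge two identities, preserving survivor_id for the merged identity.

        Returns the references whose link value changed.
        """
        if survivor_id == absorbed_id:
            return []
        absorbed = self.members.pop(absorbed_id, set())
        for ref_id in absorbed:
            self.history[ref_id].append(survivor_id)
        self.members.setdefault(survivor_id, set()).update(absorbed)
        return sorted(absorbed)

    def previous_links(self, ref_id):
        """Return the link values a reference held before its current one."""
        return self.history[ref_id][:-1]


store = IdentityStore()
store.add_reference("visit-1", "E1")  # inpatient admission
store.add_reference("visit-2", "E2")  # outpatient admission
store.add_reference("visit-3", "E2")

print(store.merge("E1", "E2"))          # ['visit-2', 'visit-3']
print(store.previous_links("visit-2"))  # ['E2']
```

Keeping the assignment history per reference is one possible design; an audit log keyed by merge event would serve the same review purpose.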
Many sophisticated computational approaches to link prediction and entity resolution can be used to recommend connections on social networks and to identify duplicate accounts that belong to the same person. Entity resolution also helps maintain a strong supply chain by consolidating supplier data that is siloed across multiple business units, regions, and categories of parts and materials. The resolution engine of the OYSTER system is an implementation of the R-Swoosh algorithm combined with an identity management system. Identities are stored as XML documents, which are converted into an in-memory structure at the start of a run and converted back to XML at the end. The explanation function and the identity management system both rely on an index that maps references to identities. We recently open-sourced Zingg, a Spark-based tool that solves entity resolution using machine learning.
This tool can take customer data entered by different relationship managers for the same customer, with typographical errors in names, telephone numbers, addresses, and so on, and clean it up by resolving entities to give a holistic view of each customer.
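To illustrate the R-Swoosh algorithm mentioned above on the kind of customer data just described, here is a minimal sketch. It is not OYSTER's or Zingg's code: the record layout (sets of names and phone numbers), the `match` rule, the 0.8 threshold, and the set-union `merge` are assumptions chosen for the example. Only the control flow in `r_swoosh` follows the algorithm: each record is compared against the resolved set, and a merged record is sent back for re-comparison.

```python
from difflib import SequenceMatcher


def match(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """Hypothetical match rule: a shared phone number, or any sufficiently similar pair of names."""
    if a["phones"] & b["phones"]:
        return True
    return any(
        SequenceMatcher(None, x, y).ratio() >= threshold
        for x in a["names"]
        for y in b["names"]
    )


def merge(a: dict, b: dict) -> dict:
    """Hypothetical merge rule: keep every value seen in either record."""
    return {"names": a["names"] | b["names"], "phones": a["phones"] | b["phones"]}


def r_swoosh(records: list) -> list:
    """Resolve records by repeated match-and-merge in the style of R-Swoosh."""
    pending = list(records)  # records still to be compared
    resolved = []            # records with no match among each other
    while pending:
        current = pending.pop()
        buddy = next((r for r in resolved if match(current, r)), None)
        if buddy is None:
            resolved.append(current)
        else:
            # The merged record may now match others, so it goes back to pending.
            resolved.remove(buddy)
            pending.append(merge(current, buddy))
    return resolved


customers = [
    {"names": {"john smith"}, "phones": {"555-0100"}},
    {"names": {"jon smith"}, "phones": {"555-0199"}},
    {"names": {"j. smith"}, "phones": {"555-0199"}},
    {"names": {"mary jones"}, "phones": {"555-0142"}},
]

for entity in r_swoosh(customers):
    print(sorted(entity["names"]), sorted(entity["phones"]))
```

In this example the three Smith records collapse into one entity: two are linked by a shared phone number, and the third is linked by name similarity to the merged record. Merging by set union keeps all observed values, which is what lets a later record match on evidence contributed by an earlier merge.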