Companies use entity resolution to link disparate data sources into clean data, detect non-obvious relationships across data silos, and obtain a unified view of their information. Entity resolution gathers the records that correspond to the same entity (for example, a customer), which lets businesses draw inferences from large volumes of information held in their systems and applications. It is necessary whenever data sets must be combined on entities that may or may not share a common identifier. It lets companies match non-identical records despite data inconsistencies, without constantly having to write new matching rules. For example, entity resolution helps maintain a strong supply chain by consolidating supplier data from silos spread across multiple business units, regions, geographies, and categories of parts and materials.
It also helps unify customer data before any marketing activity begins. In addition, entity resolution can stop fraudulent sellers from re-enrolling with slightly altered details. It is also used to reconcile products, compare their prices, and determine which vendor offers the lowest price. We recently open-sourced Zingg, a Spark-based tool that performs entity resolution using machine learning. Its approach decouples entity-representation learning from similarity learning, so that what is learned for one entity resolution task can be reused in other entity resolution tasks.
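The two ideas above, grouping non-identical records that refer to the same entity and keeping the representation separate from the similarity, can be illustrated with a small, self-contained toy. Zingg learns both parts from data on Spark; the sketch below instead uses hand-written stand-ins that are assumptions for illustration only: `represent` turns a record into character-trigram counts, `cosine` scores a pair of records, `fit_threshold` picks a per-task cut-off from a few labeled pairs, and `resolve` merges matching records with union-find. None of these functions are part of Zingg's API, and the sample records are invented.

```python
from collections import Counter
from math import sqrt

# Toy stand-ins (assumptions), not Zingg's implementation or API.


def represent(text: str) -> Counter:
    """Representation step: character-trigram counts of a normalized string.

    It uses no labels, so it can be computed once per record and shared by
    every resolution task.
    """
    padded = "  " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(u: Counter, v: Counter) -> float:
    """Similarity step: cosine similarity between two trigram-count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def fit_threshold(labeled_pairs: list[tuple[str, str, bool]]) -> float:
    """Task-specific stand-in for similarity learning.

    Puts the cut-off halfway between the lowest-scoring match and the
    highest-scoring non-match. Assumes at least one pair of each label.
    """
    scores = [(cosine(represent(a), represent(b)), same) for a, b, same in labeled_pairs]
    lowest_match = min(s for s, same in scores if same)
    highest_non_match = max(s for s, same in scores if not same)
    return (lowest_match + highest_non_match) / 2


def resolve(records: list[str], threshold: float) -> list[set[int]]:
    """Group indices of records that appear to refer to the same entity.

    Pairs scoring at or above `threshold` are merged with union-find, so
    matching is transitive: if A matches B and B matches C, all three form
    one entity.
    """
    vectors = [represent(r) for r in records]  # computed once, reused for every pair
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Walk to the cluster root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair: quadratic, so only suitable for small inputs.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    # Collect record indices under their cluster root.
    clusters: dict[int, set[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


# Hypothetical customer task: fit a threshold from two labeled pairs, then resolve.
customer_threshold = fit_threshold([
    ("Jon Smith", "John Smith", True),
    ("Jon Smith", "Mary Jones", False),
])
customers = ["John Smith", "Jon Smith", "JOHN  SMITH", "Mary Jones", "mary jones"]
print(resolve(customers, customer_threshold))  # → [{0, 1, 2}, {3, 4}]

# Hypothetical supplier task: same `represent`, separately fitted threshold.
supplier_threshold = fit_threshold([
    ("Acme Corp", "ACME Corp.", True),
    ("Acme Corp", "Apex Corp", False),
])
print(resolve(["Acme Corp", "ACME Corp.", "Apex Corp"], supplier_threshold))  # → [{0, 1}, {2}]
```

Because `represent` uses no labels, each record's vector can be computed once and shared by both tasks; only the cheap threshold is refitted per task. A single fitted threshold is still a crude rule and the all-pairs comparison does not scale, so treat this only as an illustration of the structure, not of how Zingg learns or scales.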
Entity resolution also offers substantial benefits in the public sector, particularly in health, transportation, finance, law enforcement, and counterterrorism.