Object Fusion in Geographic Information Systems


Abstract
Given two geographic databases, a fusion algorithm should produce all pairs of corresponding objects (i.e., objects that represent the same real-world entity). We describe four fusion algorithms that use only the locations of objects, and we measure their performance in terms of recall and precision. These algorithms are designed to work even when locations are imprecise and each database represents only some of the real-world entities. We test our methods extensively; the tests show that performance depends on the density of the data sources and the degree of overlap among them. All four algorithms are much better than the current state of the art (i.e., the one-sided nearest-neighbor join). One of these four algorithms is best in all cases, at the cost of a small increase in running time compared to the other algorithms.
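To make the baseline and the two quality measures concrete, here is a minimal sketch of a one-sided nearest-neighbor join together with recall and precision over pairs. The function names, the dictionary layout (object id mapped to an (x, y) location) and the sample coordinates are assumptions made for illustration; they are not taken from the project, and the measures are simplified to count only matched pairs.

```python
import math

def nearest_neighbor_join(source_a, source_b):
    """Pair every object of source_a with its closest object in source_b.

    Both sources map an object id to an (x, y) location.
    source_b is assumed to be non-empty.
    """
    pairs = set()
    for id_a, loc_a in source_a.items():
        id_b = min(source_b, key=lambda k: math.dist(loc_a, source_b[k]))
        pairs.add((id_a, id_b))
    return pairs

def recall_precision(result, truth):
    """Recall and precision of a set of result pairs against the true pairs."""
    correct = len(result & truth)
    recall = correct / len(truth) if truth else 1.0
    precision = correct / len(result) if result else 1.0
    return recall, precision

# Hypothetical data: a3 has no counterpart in source_b.
source_a = {"a1": (0, 0), "a2": (10, 10), "a3": (50, 50)}
source_b = {"b1": (1, 0), "b2": (10, 11)}
truth = {("a1", "b1"), ("a2", "b2")}

result = nearest_neighbor_join(source_a, source_b)
recall, precision = recall_precision(result, truth)
```

In this example the join finds both true pairs but also pairs a3 with b2, because every object of the first source is forced to take a partner. That is the kind of error that arises when each database represents only some of the real-world entities.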

The Goal: Fusing Objects that Represent the Same Real-World Entity
Each data source provides data that the other sources do not provide. Hence, we want to integrate objects that represent the same real-world entity in the different sources. Doing so enables us to utilize the different perspectives of the data sources.


Why do we use locations to match objects?

Why is it difficult to use locations?
Missing data makes the problem even more complicated!

Fusion methods
In this work, we present four novel methods for computing fusion sets. The first algorithm checks whether the nearest object is a good candidate for a match.
The second algorithm takes all the objects within some distance bound as candidates and selects the fusion sets with the highest degree of confidence, using a threshold value.
The third algorithm also checks whether the objects within the distance bound have already been matched.
The fourth and last algorithm also takes into account the degree of overlap between the sources.
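
The page does not spell out what makes the nearest object a "good candidate". One plausible reading, shown below purely as an assumption, is to accept a pair only if the two objects are each other's nearest neighbor and lie within a distance bound. The function names, the max_dist parameter and the sample data are hypothetical, and this sketch should not be taken as the project's actual first algorithm.

```python
import math

def nearest(loc, source):
    """Id of the object in source that is closest to loc."""
    return min(source, key=lambda k: math.dist(loc, source[k]))

def mutual_nearest_pairs(source_a, source_b, max_dist):
    """Keep (a, b) only if a and b are each other's nearest neighbor
    and no farther apart than max_dist (an assumed acceptance test)."""
    pairs = set()
    if not source_a or not source_b:
        return pairs
    for id_a, loc_a in source_a.items():
        id_b = nearest(loc_a, source_b)
        if math.dist(loc_a, source_b[id_b]) > max_dist:
            continue  # nearest object is too far away to be a match
        if nearest(source_b[id_b], source_a) == id_a:
            pairs.add((id_a, id_b))
    return pairs

# Hypothetical data: a3 has no counterpart in source_b.
source_a = {"a1": (0, 0), "a2": (10, 10), "a3": (50, 50)}
source_b = {"b1": (1, 0), "b2": (10, 11)}

matched = mutual_nearest_pairs(source_a, source_b, max_dist=5.0)
```

Here a3 is left unmatched, since its nearest object b2 is far away and b2 itself is closer to a2. Unmatched objects can then be reported on their own instead of being forced into a wrong pair.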

Current Work
We are now working to generalize our algorithms and to develop new ones for several other fusion problems. First, the algorithms we present assume that each entity is represented by at most one object in each data set, i.e., a 1:1 matching. There are cases, however, where the matching should be one-to-many, as in the generalization problem (i.e., transforming a map from a large scale to a smaller scale). Second, we use only the location attribute of each object to identify it. In many instances there are other attributes, both spatial (e.g., area, perimeter, or shape of a polygon) and alphanumeric (e.g., name), which may also help increase recall and precision.

We are also working on the fusion of more than two datasets. It may be argued that fusing multiple sources can be done sequentially, i.e., by fusing two sources, then fusing the result with the third source, and so on. However, since any fusion inevitably contains errors, it is not clear whether such an approach leads to good results. A possible solution is to fuse all the sources simultaneously. This, however, is a complicated process, since the number of possible fusion sets is exponential in the number of sources; thus, such an algorithm must be designed with efficiency in mind.
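
The exponential growth can be illustrated with a small count. Assuming, as in the 1:1 setting above, that a fusion set contains at most one object from each source, every source contributes either nothing or one of its objects, so the number of non-empty candidate sets is the product of (size + 1) over all sources, minus one. The function name below is hypothetical and the sketch only counts candidates; it does not enumerate or score them.

```python
import math

def count_candidate_fusion_sets(source_sizes):
    """Number of non-empty sets that take at most one object per source."""
    return math.prod(size + 1 for size in source_sizes) - 1

# Ten objects per source: the count grows by a factor of 11 with each added source.
counts = [count_candidate_fusion_sets([10] * n) for n in range(1, 5)]
```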

Papers

Contact information
Contact person:
      Eliyahu Safra
      Database Lab,
      School of Engineering and Computer Science,
      Hebrew University
      Jerusalem 91904
      Israel

Project members:

Some Relevant GIS Links

Bibliography

Material
