Object Fusion in Geographic Information Systems

[ Abstract ] [ Description ] [ Current work ]
[ Papers ] [ Contact information ] [ Relevent GIS Links ] [ Bibliography ] [ Material ]

Abstract Given two geographic databases, a fusion algorithm should produce all pairs of corresponding objects (i.e., objects that represent the same real-world entity). We develop four fusion algorithms, which only use locations of objects are described and their performance is measured in terms of recall and precision. These algorithms are designed to work even when locations are imprecise and each database represents only some of the real-world entities. We test our methods extensively; the tests show that the performance depends on the density of the data sources and the degree of overlap among them. All four algorithms are much better than the current state of the art (i.e., the one sided nearest-neighbor join). One of these four algorithms is best in all cases, at a cost of a small increase in the running time compared to the other algorithms

The Goal: Fusing Objects that Represent the Same Real-World Entity
Each data source provides data that the other sources do not provide. Hence, we want to integrate objects that represent the same real-world entity in the different sources. Doing so enables us to utilize the different perspectives of the data sources.

Why do we use locations to match objects?

There are no global keys that can tell which objects should be fused
Locations are a natural choice to replace keys, since :
- Each spatial object includes location attributes
- In a perfect world, two objects that represent the same entity have the same location

Why is it difficult to use locations?

In real maps, locations are inaccurate
When an object is depicted as a polygon, it is unclear which point should represent its location
It is difficult to distinguish between the following two cases:
- A pair of objects that represent close entities
- A pair of objects that represent the same entity

Missing data makes the problem even more complicated!

Fusion methods
In this work, we present four novel methods for computing fusion sets. The first algorithm checks whether the nearest object is a good candidate for a match or whether it is not.
The second algorithm takes all the objects within some distance bound as candidates. and selects the fusion sets with the highest degree of confidence using threshold value.
The third algorithm checks also whether the objects within the distance bound have already found a match or whether they have not.
The fourth and last algorithm takes also into account the degree of overlap between the sources.

Current Work We are working now to generalize our algorithms and develop new ones to solve several other fusion problems. First, in the algorithms we present we assume that each entity is presented by one object from each data set at the most, i.e. a 1:1 matching. There are cases, however, when the matching should be one to many, as in the generalization problem (i.e.transforming a map from big scale to a smaller scale). Second, we only use the location attribute of each object to identify him, in many instances there are other attributes, both spatial (e.g. area, perimeter or shape of polygon) and alphanumeric (e.g. name), which may help us as well in increasing the recall and precision.

We also work on the fusion of more than two datasets. It may be argued that multiple sources fusion may be done sequentially, i.e. the fusions of two sources, thereafter the fusion of the result set to the third source and on. However, since any fusion inevitably contains errors, it is not clear if such an operation lead to good results. A possible solution is to do the fusion of all the sources simultaneously. This, however, is a complicated process, since the number of possible fusion sets is exponential in the sources number; thus, such algorithm should be designed thriftily.

Papers

Catriel Beeri, Yaron Kanza, Eliyahu Safra and Yehoshua Sagiv.
Object Fusion in Geographic Information Systems.
In Proceedings of the 30^th International Conference on Very Large Databases (VLDB), Toronto (Canada), September 2004.
(ps, pdf)

Contact information point

Contact person:
      Eliyahu Safra
      database lab,
      School of Engineering and Computer Science,
      Hebrew university
      Jerusalem 91904
      Israel

point

Project members:

Some Relevant GIS Links

Bibliography

B. Amann, C. Beeri, I. Fundulaki, and M. Scholl.
Ontology-based integration of XML Web resources.
In Proceedings of the First International Semantic Web Conference , pages 117-131, Sardinia (Italy), 2002.
O. Boucelma, M. Essid, and Z. Lacroix.
A WFS based mediation system for GIS interoperability.
In Proceedings of the 10^th ACM International Symposium on Advances in Geographic Information Systems, pages 23-28, 2002.
T. Bruns and M. Egenhofer.
Similarity of spatial scenes.
In Proceedings of the 7^th International Symposium on Spatial Data Handling,
pages 31-42, Delft (Netherlands), 1996.
M. A. Cobb, M. J. Chung, H. Foley, F. E. Petry, and K. B. Show.
A rule-based approach for conflation of attribute vector data.
GioInformatica, 2(1) pages 7-33, 1998.
Y. Doytsher and S. Filin.
The detection of corresponding objects in a linear-based map conflation.
Surveying and Land Information Systems, 60(2):117-128, 2000.
Y. Doytsher, S. Filin, and E. Ezra.
Transformation of datasets in a linear-based map conflation framework.
Surveying and Land Information Systems, 61(3):159-169, 2001.
F. T. Fonseca and M. J. Egenhofer.
Ontology driven geographic information systems.
In Proceedings of the 7^th ACM International Symposium on Advances in Geographic Information Systems, pages 14-19, Kansas City (Missouri, US), 1999.
F. T. Fonseca, M. J. Egenhofer, and P. Agouris.
Using ontologies for integrated geographic information systems.
Transactions in GIS, 6(3), 2002.
M. Minami.
Using ArcMap.
Environmental Systems Research Institute, Inc., 2000.
Y. Papakonstantinou, S. Abiteboul, and H. Garcia-Molina.
Object fusion in mediator systems.
In Proceedings of the 22nd International Conference on Very Large Databases, pages 413-424, 1996.
B. Rosen and A. Saalfeld.
Match criteria for automatic alignment.
In Proceedings of 7^th International Symposium on Computer-Assisted Cartography (Auto-Carto 7),
pages 1-20, 1985.
A. Saalfeld.
Conflation-automated map compilation.
International Journal of Geographical Information Systems, 2(3):217-228, 1988.
A. Samal, S. Seth, and K. Cueto.
A feature based approach to conflation of geospatial sources.
International Journal of Geographical Information Science, 18(00):1-31, 2004.
R. Sinkhorn.
A relationship between arbitrary positive matrices and doubly stochastic matrices.
The Annals of Mathematical Statistics, 35(2):876-879, 1964.
R. Sinkhorn.
Diagonal equivalence to matrices with perscribed row and column sums.
The American Mathematical Monthly, 74(4):402-405, 1967.
H. Uitermark, P. V. Oosterom, N. Mars, and M. Molenaar.
Ontology-based geographic data set integration.
In Proceedings of Workshop on Spatio-Temporal Database Management,
pages 60-79, Edinburgh (Scotland), 1999.
G. Wiederhold.
Mediation to deal with heterogeneous data sources.
In Introperating Geographic Information Systems, pages 1-16, 1999.

Material

VLDB poster (ps, pdf)
VLDB presentation (Power Point)

top