Mining Social Structures from Genealogical Data


The starting point of this research project is the large collection of historical documents maintained by the Brabant Historical Information Center (BHIC). A document can be anything ranging from scans of birth and death certificates, memories of succession, or tax declarations, to social photographs or family pictures. Currently, the documents in this collection are tagged by source and subject. Researchers can use keyword-based search on these tags to find documents relevant to their research (either a scan or a pointer to a physical location). This database, however, is far from flawless: many names are duplicated, have several alternative spellings, or even contain mistakes. Furthermore, important semantic links such as the parent-child relation are only implicitly available, which makes even simple tasks, such as finding out whether two given persons are related, very labor intensive.
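To make the two problems above concrete, the following is a minimal sketch in Python and not the project's actual method. It assumes that parent-child links have already been made explicit as pairs of hypothetical person identifiers, and it uses a simple string-similarity ratio from the standard library as a stand-in for real name matching. The function names, the threshold, and the sample data are illustrative assumptions.

```python
from collections import deque
from difflib import SequenceMatcher


def similar_names(a, b, threshold=0.85):
    """Return True if two names are close enough to be spelling variants.

    The threshold is an arbitrary, illustrative choice.
    """
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


def are_related(parent_child, a, b):
    """Return True if persons a and b are connected through parent-child links.

    parent_child is an iterable of (parent_id, child_id) pairs. Links are
    followed in both directions, so two persons also count as connected
    when they only share a descendant.
    """
    neighbours = {}
    for parent, child in parent_child:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)

    # Breadth-first search starting from person a.
    seen = {a}
    queue = deque([a])
    while queue:
        person = queue.popleft()
        if person == b:
            return True
        for other in neighbours.get(person, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return False


# Hypothetical identifiers and links, for illustration only.
links = [("p1", "p2"), ("p1", "p3"), ("p4", "p5")]
print(are_related(links, "p2", "p3"))  # siblings via the shared parent p1
print(are_related(links, "p2", "p5"))  # no connecting path
print(similar_names("Janssen", "Jansen"))
```

Once such links are explicit, the relatedness question becomes a simple graph search. The labor-intensive part lies in deriving the links and resolving name variants from the raw records in the first place.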

Project Overview

This project addresses the problem of how to derive identities of persons and social structures from large sets of genealogical data that are available as text and photographs and contain incomplete information. To do so, we want to investigate and deploy a combination of techniques from data mining, machine learning, and human computation. The project goals are (a) a semantically enriched and cleaned version of the current BHIC database; (b) advanced search tools to support historical research; and (c) automatic tools to support large-scale prosopographical research.

Research Team


    • Frans Oliehoek
    • Bijan Ranjbar-Sahraei

Cultural Heritage

    • Rien Wols
    • Jacques van Rensch