Document retro-conversion for personalized electronic reedition

In this paper, we propose a generic framework to store, retrieve, transform, and present mixed sets of native and virtual documents. We intend to use or develop specific tools organized in a global architecture that spans document analysis and capture, document retrieval, classification and categorization, and the full generation of personal sets of documents matching the user's specific needs and profile. The first step covers document preparation and formal analysis. The second step adds semantic metadata, content indexing, and structural-semantic analysis. The third step helps the user assemble personalized documents. The research is based on large domain-specific sets of documents, for example European Union law documents (many millions of documents, in many file formats and twenty official languages).

Keywords: Document image analysis, neural network, OCR combination.

Year: 2005

Download: Full text [945 kB]

Authors of this publication:

Dalibor Fiala

Phone: +420 377 63 2429

Dalibor is the research group coordinator and an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. He is interested in data mining, web mining, information retrieval, informetrics, and information science.

François Rousselot

François is interested in knowledge acquisition and computational linguistics.