Web Mining Methods for the Detection of Authoritative Sources: Theory and Practice

The development of the information society in recent decades has made it possible to collect, filter, and store huge amounts of data. These data must be further processed to yield valuable information and knowledge. The scientific field concerned with extracting information and knowledge from data has evolved rapidly to cope with the extent and growth of information sources, whose number has increased geometrically since the appearance of the World Wide Web. Traditional approaches in information retrieval, knowledge acquisition, and data mining must all be adapted to the dynamic, heterogeneous, and unstructured data on the Web, and Web mining has emerged as a fully-fledged research discipline. This book presents state-of-the-art knowledge of Web mining from the perspective of searching for authoritative sources. Besides an introduction to the theoretical concepts of Web crawling, ranking algorithms, and social networks, it presents the results of practical experiments. In particular, a new algorithm for bibliographic networks is introduced. This publication will be especially useful to professionals, researchers, and students in the field of data mining and information retrieval.

Keywords: Web mining, Web crawling, ranking algorithms, bibliographic networks, citations, co-authorships, authorities, bibliographic PageRank

Year: 2009

Download: Full text

Authors of this publication:


Dalibor Fiala


Phone: +420 377 63 2429
E-mail: dalfia@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/~dalfia/

Dalibor is the research group coordinator and an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. He is interested in data mining, web mining, information retrieval, informetrics, and information science.

Related Projects:


Project

Social Networks Analysis

Authors:  Karel Ježek, Dalibor Fiala, Michal Nykl
Desc.: Application of the PageRank algorithm and its modifications to the exploration of network structures, particularly citation and co-authorship networks.