Text-Mining Research Group » Download

Almus: Automatic Text Summarizer
Size:	135 kB
Desc.:	The system creates a summary of a set of documents dealing with the same topic. It can also generate an update summary if a base document collection is specified. The summarization method is based on latent semantic analysis.
Related:	Automatic Text Summarisation

EuroSearch - search in a multilingual environment
Size:	1078 kB
Desc.:	This software enables searching a specific multilingual text corpus. The algorithms use a modified vector space model, transformation of texts into language-independent forms, and TF-IDF-based scoring.
Related:	Searching and Summarizing in Multilingual Environment
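
To illustrate the kind of scoring mentioned in the description, here is a minimal sketch of TF-IDF weighting with cosine ranking in a plain (unmodified) vector space model. It is not the EuroSearch implementation: the function names, the smoothed IDF formula, and the toy corpus are assumptions, and the tokens are assumed to have already been transformed into a language-independent form.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    # Assumed IDF variant: log(N / df) + 1, so frequent terms keep a small weight.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in docs
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_tokens, vectors, idf):
    """Return (score, document index) pairs, best match first."""
    counts = Counter(t for t in query_tokens if t in idf)  # ignore unknown terms
    query_vec = {term: count * idf[term] for term, count in counts.items()}
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


# Hypothetical corpus, already reduced to language-independent tokens.
corpus = [
    "parliament adopt budget".split(),
    "court reject appeal".split(),
    "parliament debate court reform".split(),
]
vectors, idf = tf_idf_vectors(corpus)
ranking = search("budget parliament".split(), vectors, idf)
```

In this toy run the first document shares both query terms and ranks first, while the second document shares none and scores zero.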

SPOT - dictionary of domain-specific terminology
Size:	5454 kB
Desc.:	The SPOT application provides the engine for an online ICT and domain-specific dictionary. It helps translators and other interested professionals create, find and use correct translations of complicated or new terms, especially in the field of ICT. The SPOT package contains the compiled application, written in Java with the Spring Framework, along with source code, database scripts and installation documentation.
Related:	SPOT: English-Czech ICT Terminology On-line Review

SVDPlag v1.0
Size:	914 kB
Desc.:	This tool identifies cases of plagiarism in written text. It employs an advanced technique based on the Latent Semantic Analysis (LSA) framework to perform large-scale statistical computations. For that purpose, Singular Value Decomposition (SVD) is used to infer the associations among the common N-grams contained in the examined documents. The tool also supports various text pre-processing techniques. The library was developed in C# for the .NET Framework 3.5, which is required to run it, as is a 64-bit operating system. The supported architecture is x86-64. The tool uses the Extreme Optimization Numerical Libraries for .NET, version 3.5, 64-bit. Older or 32-bit versions of these libraries are not supported.
Related:	Automatic Plagiarism Detection

Sciento
Size:	2058 kB
Desc.:	This software was developed within project 2C06009, co-funded by the Ministry of Education of the Czech Republic. It consists of four modules that allow for a scientometric analysis of DBLP and CiteSeer data. Data from these two digital bibliographic libraries can be imported into a relational SQL database and thoroughly examined. The software implements novel evaluation methods that produce rankings of influential researchers by mining these digital libraries.
Related:	User Profile Mining, Social Networks

TArank
Size:	492 kB
Desc.:	This software applies a novel method called "time-aware PageRank" to analyze bibliographic citation networks.
Related:	User Profile Mining, Social Networks
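
For orientation only, the sketch below shows ordinary PageRank on a small citation network. It is not the TArank method: this page does not describe how time enters the computation, so nothing time-aware is reproduced here. The optional edge weights are an assumption, included only to show where a time-dependent weighting could plug in; the function name and the example graph are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    edges maps (citing, cited) pairs to a positive weight; a weight of 1.0
    for every citation gives the classic unweighted algorithm.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical network: B and C cite A, and C also cites B.
citations = {("B", "A"): 1.0, ("C", "A"): 1.0, ("C", "B"): 1.0}
rank = pagerank(citations)
```

Paper A, cited by both others, receives the highest rank; C, which is never cited, receives the lowest.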

Teraman v1.0
Size:	406 kB
Desc.:	Teraman is a tool for N-gram extraction from large text datasets. Our approach is based on batch processing and can therefore handle texts much larger than the available memory. The process consists of three steps: pre-processing and indexing, N-gram counting, and de-indexing. The tool was developed in C# for the .NET Framework 2.0, which is required to run it.
Related:	Document Classification
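
The three steps named in the description can be sketched as follows. This is an illustration under stated assumptions, not Teraman's actual algorithm or file format: tokens are mapped to integer IDs (indexing), each batch is counted separately and written to a sorted temporary file, the sorted files are merged while summing counts, and the IDs are mapped back to words (de-indexing). All function names are hypothetical, and for simplicity N-grams do not span batch boundaries.

```python
import heapq
import os
import tempfile
from collections import Counter


def index_tokens(tokens, vocab):
    """Step 1: replace each token with a compact integer ID."""
    return [vocab.setdefault(token, len(vocab)) for token in tokens]


def count_batch(ids, n):
    """Step 2a: count the N-grams of one batch that fits in memory."""
    return Counter(tuple(ids[i:i + n]) for i in range(len(ids) - n + 1))


def spill(counts, directory):
    """Step 2b: write one batch's counts to disk, sorted by N-gram."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as run:
        for gram in sorted(counts):
            run.write(" ".join(map(str, gram)) + "\t" + str(counts[gram]) + "\n")
    return path


def read_run(path):
    """Stream (N-gram, count) pairs back from a sorted run file."""
    with open(path) as run:
        for line in run:
            gram, count = line.rstrip("\n").split("\t")
            yield tuple(int(x) for x in gram.split()), int(count)


def merge_runs(paths):
    """Step 2c: merge the sorted runs, summing counts of equal N-grams."""
    current, total = None, 0
    for gram, count in heapq.merge(*(read_run(p) for p in paths)):
        if gram != current:
            if current is not None:
                yield current, total
            current, total = gram, 0
        total += count
    if current is not None:
        yield current, total


def count_ngrams(batches, n):
    vocab = {}
    with tempfile.TemporaryDirectory() as tmp:
        paths = [spill(count_batch(index_tokens(b, vocab), n), tmp) for b in batches]
        words = {i: w for w, i in vocab.items()}
        # Step 3: de-index, turning integer IDs back into the original words.
        return {tuple(words[i] for i in gram): c for gram, c in merge_runs(paths)}


batches = ["to be or not to be".split(), "to be is to do".split()]
bigrams = count_ngrams(batches, 2)
```

The example collects the final counts in a dictionary for convenience; with a truly large dataset the merged stream would be written to an output file instead, so that only one batch is held in memory at a time.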