Dictionaries On-line
Desc.: English-Czech online dictionary based mostly on the ispell database.
Desc.: Still in progress; may be temporarily unavailable. The dictionary is administered and updated by volunteers at the Department of Computer Science and Engineering, University of West Bohemia. Its terminology focuses mostly on computer science and engineering.
Other Research Groups
Desc.: The research activities of ARG relate mostly to applications in Information Retrieval and related disciplines (e.g. data indexing and storage, data modeling, data compression, and text retrieval).
Desc.: The Knowledge Discovery Group aims to develop pre-processing methods for data mining, natural language learning, mining of spatio-temporal data, classification of difficult patterns, integration of data mining tools with database systems, and other areas.
Desc.: The aim of research in Natural Language Engineering (NLE) is to give computer systems the ability to process natural language. This ability is essential for applications such as information retrieval and web search, information extraction and data mining, text summarization, and speech technology. NLE techniques for morphological analysis, part-of-speech tagging, word prediction, and term extraction are already used in real-world applications in these areas, and the technology required for applications such as news summarization or spoken dialogue systems (e.g., systems that can hold a dialogue with customers to give information about train timetables) is already at a very advanced stage of development.
Text Corpora - Useful Sources
Desc.: The American National Corpus (ANC) project is creating a massive electronic collection of American English, including texts of all genres and transcripts of spoken data produced from 1990 onward. The ANC will provide the most comprehensive picture of American English ever created, and will serve as a resource for education, linguistic and lexicographic research, and technology development.
Desc.: The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English.
Desc.: The Czech National Corpus (CNC) is a non-commercial academic project focused on building a large computer-based corpus consisting mainly of written Czech. The CNC provides a very large, modern, and valuable language and information resource.
Desc.: A collection of transcripts of academic speech events recorded at the University of Michigan.
Desc.: Useful information on text corpora and concordancing. The site was originally a corpus linguistics site at Rice University.
Desc.: The Text Corpus Toolkit is a web application designed to simplify the analysis and administration of various text corpora through a simple web interface. Standard text collections include Reuters, Enron-Spam, Ling-Spam, 20 Newsgroups, and others. Text-mining researchers can use the Toolkit to generate various statistics on text corpora.
Desc.: The Survey of English Usage carries out research in English linguistics and was the first centre in Europe to do research with corpora. The Survey is based in the Department of English Language and Literature at UCL.