We host the WebCorpus application so that it can be run on our online workstations, either with Wine or directly.
Quick description of WebCorpus:
WebCorpus is a Hadoop-based framework that enables you to calculate statistics on large web corpora extracted from web crawls.
Features:
- linguistic processing of text corpora of multiple GB or TB in size using Apache Hadoop
- extracts and counts sentences, word n-grams (with or without POS-tags) and cooccurrences
- reads popular web crawl formats (ARC and WARC)
- filters input data by language, duplicate URL, duplicate content and encoding errors
- can be extended with further linguistic counts based on custom UIMA annotations
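
To give a rough idea of what counting word n-grams means, here is a minimal single-machine sketch in plain Java. It is not WebCorpus code and does not use Hadoop or UIMA: the class name NGramCountSketch and the method countNGrams are hypothetical names chosen for this illustration, and the whitespace tokenization and lowercasing are simplifying assumptions. A Hadoop-based framework would distribute the same kind of counting across many machines.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of WebCorpus.
class NGramCountSketch {

    // Counts word n-grams in a text.
    // Assumption: tokens are lowercased and split on whitespace only.
    static Map<String, Integer> countNGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // TreeMap keeps the n-grams in sorted order for stable output.
        Map<String, Integer> counts = new TreeMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return counts;
        }
        String[] tokens = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        // Slide a window of n tokens over the text and count each window.
        for (int i = 0; i + n <= tokens.length; i++) {
            String gram = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {cat saw=1, saw the=1, the cat=2}
        System.out.println(countNGrams("the cat saw the cat", 2));
    }
}
```

In a MapReduce setting, the loop body would roughly correspond to a map step that emits each n-gram with a count of 1, and the merge would correspond to a reduce step that sums the counts per n-gram.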
Programming Language: Java.
©2024. Winfy. All Rights Reserved.
By OD Group OU – Registry code: 1609791 – VAT number: EE102345621.