We host the WebCorpus application so that you can run it on our online workstations, either with Wine or directly.


Quick description of WebCorpus:

WebCorpus is a Hadoop-based framework that enables you to calculate statistics on large web corpora extracted from web crawls.

Features:
  • linguistic processing of text corpora of multiple GB or TB in size using Apache Hadoop
  • extracts and counts sentences, word n-grams (with or without POS tags) and co-occurrences
  • reads popular web crawl formats (ARC and WARC)
  • filters input data by language, duplicate URL, duplicate content and encoding errors
  • can be extended by further linguistic counts based on custom UIMA annotations
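
To give a rough idea of what counting word n-grams means, here is a minimal plain-Java sketch. It is not WebCorpus code and does not use Hadoop: the class name NGramSketch, the method countNGrams and the whitespace tokenization are all assumptions made for illustration. In a Hadoop job the same counting would typically be split into a map step that emits each n-gram and a reduce step that sums the counts.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: this class is not part of WebCorpus.
public class NGramSketch {

    // Counts word n-grams in one sentence, splitting tokens on whitespace.
    public static Map<String, Integer> countNGrams(String sentence, int n) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String trimmed = sentence.trim();
        if (trimmed.isEmpty() || n < 1) {
            return counts;
        }
        String[] tokens = trimmed.split("\\s+");
        // Slide a window of n tokens over the sentence.
        for (int i = 0; i + n <= tokens.length; i++) {
            String gram = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {the cat=2, cat saw=1, saw the=1}
        System.out.println(countNGrams("the cat saw the cat", 2));
    }
}
```

Real corpora would also need sentence splitting, proper tokenization and optional POS tagging before counting, which this sketch leaves out.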


Programming Language: Java.


©2024. Winfy. All Rights Reserved.

By OD Group OU – Registry code: 1609791 – VAT number: EE102345621.