USFD

The Natural Language Processing (NLP) group at the University of Sheffield is one of the largest and most successful research groups in text mining and language processing in the EU. The group is based in the Department of Computer Science, which also includes world-class teams in speech, knowledge and information processing, biotechnology, and machine learning for medical informatics.

The NLP group has a world-leading research record in NLP infrastructures (GATE), information extraction, standardisation, machine learning methods for NLP, dialogue systems, question answering, terminology extraction, and NLP methods for Knowledge Management and the Semantic Web. USFD will build on the results of previous EC and national projects, including AnnoMarket (cloud-based NLP services marketplace), CLARIN (language resources infrastructure), GATECloud (UK JISC project), KHRESMOI (biomedical text mining), EnviLOD (UK JISC project on mining scientific collections with the British Library), and NEON (mining FAO content). USFD also has extensive experience in innovation and knowledge transfer activities, in particular collaborations with and consulting for companies (both SMEs and large corporations), government bodies, and other organisations.


Role in the Project

USFD will assist CNR with project management and will lead WP6 on trans-national access and WP11 on building a new evaluation framework. USFD also leads T3.4 on impact coordination, T5.1 on partnerships with industry, and T5.2 on knowledge transfer, which builds on its experience in generating significant impact with the GATE infrastructure. In addition, USFD is responsible for T4.1 on summer schools, drawing on its track record of running the annual GATE training school. In the JRAs, USFD leads T9.3 on enriching the SoBigData RI with new text and social media mining services.


Infrastructure brought into the project

USFD develops and maintains the world-leading open-source GATE text mining infrastructure and supports its vibrant user community (41,000 software downloads in the past 12 months alone and 265,000 in the past 9 years). GATE has a repository of over 150 text mining and NLP models and algorithms, including Information Extraction (IE), biomedical text mining, ontology-based semantic annotation, machine learning for IE, and NLP evaluation tools, as well as many third-party text mining plugins. USFD also maintains two cloud-based deployments of GATE as text mining platforms-as-a-service: GATECloud (aimed at researchers who want to run GATE-provided or their own text mining pipelines on big data) and AnnoMarket (aimed at companies and other users who want to use pre-packaged, highly scalable GATE text mining web services).