SELFEH Bilingual Corpus

SELFEH (Serbian-English Law Finance Education and Health) is an aligned Serbian-English corpus of documents from the domains of finance, health, law and education. It was developed by the University of Belgrade HLT group as part of its participation in the INTERA project.
 
SELFEH contains around 1 million words per language and is aligned in TMX (Translation Memory eXchange) format. It comprises over 150 documents in XML format; the Serbian side of the corpus is also available in POS-tagged and lemmatised versions, likewise in XML.
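
As an illustration of how TMX-aligned data of this kind might be read, here is a minimal Python sketch that extracts aligned segment pairs from a TMX file using only the standard library. It assumes the generic TMX layout (tu, tuv and seg elements) and the language codes "sr" and "en"; the codes and file names actually used in the SELFEH distribution are not stated here and may differ. The function name read_tmx_pairs and the sample sentences are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Qualified name under which ElementTree exposes the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang="sr", tgt_lang="en"):
    """Yield (source, target) segment pairs from a TMX file.

    `source` is a path or a binary file-like object. The language codes
    are assumptions; adjust them to match the codes used in the file.
    """
    tree = ET.parse(source)
    for tu in tree.getroot().iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older versions use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary subtag, e.g. "sr-Latn" becomes "sr".
                key = lang.lower().split("-")[0]
                segments[key] = "".join(seg.itertext()).strip()
        if src_lang in segments and tgt_lang in segments:
            yield segments[src_lang], segments[tgt_lang]


if __name__ == "__main__":
    # Tiny made-up sample, not taken from the corpus.
    sample = (
        b'<tmx version="1.4"><header/><body><tu>'
        b'<tuv xml:lang="sr"><seg>Dobar dan.</seg></tuv>'
        b'<tuv xml:lang="en"><seg>Good day.</seg></tuv>'
        b"</tu></body></tmx>"
    )
    for sr, en in read_tmx_pairs(io.BytesIO(sample)):
        print(sr, "->", en)
```

For a corpus of this size, loading each TMX file whole with ET.parse should usually be manageable; ET.iterparse would be an alternative if memory becomes a concern.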
 

Useful links