Novel Training Dataset and Labelling Functions

Summary
This report explains the fundamentals of automatically labelling the downloaded scientific text, so that the literature can be preprocessed automatically.