Report on the analysis of company, EU project, policy document, clinical guideline and social media/media data

Summary
The deliverable will focus on two main activities: (a) identifying and investigating EU-funded companies/SMEs and the key topics and themes related to the Health, Demographic Change and Wellbeing Societal Challenge, and (b) constructing a company-topic graph (a multimodal network) in which large components, trends and strong community ties can be isolated in topic-specific health-related sectors and traced in time and space. A range of network measures will be estimated to support the development of innovative impact indicators. The analysis will focus on the companies/SMEs participating in EU research programmes (FP7, H2020), but we will also aim to expand the coverage to the national level.
To this end, existing text analytics tools and workflows for processing the semi-structured and unstructured textual company data harvested and available in the Data4Impact Repository will be adapted, tuned, deployed and integrated in the Data4Impact platform. We plan to exploit and examine the effectiveness of recent advances in deep learning and distributional semantic representations for entity (company) extraction and linking and for topic/theme extraction, together with all necessary pre-processing tools such as text normalisers, morphological analysers, term extractors and syntactic parsers.
Our focus is on harvested data from finalised EU research projects and the expected impacts stated in their final reports. These data will be derived from the CORDIS system. We will deploy content analytics workflows (entity extractors and topic/theme analysers) to detect entities (e.g. organisations such as universities, companies, research organisations, SMEs, labs, hospitals and health organisations, as well as EU projects) mentioned in specific topics and themes related to the Health, Demographic Change and Wellbeing Societal Challenge. We will also build entity-topic networks examining links and correlations of entity mentions to significant topics and themes.
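As an illustration only, the entity-topic network described above could be sketched as a bipartite graph built from (entity, topic) mentions. The sample mentions, the function names and the two measures shown (topic degree and connected components) are assumptions made for this sketch; they are not the project's actual data, tooling or indicator set.

```python
from collections import defaultdict

# Hypothetical (entity, topic) mentions. In practice these would be produced
# by the entity extractors and topic/theme analysers.
mentions = [
    ("Company A", "medical imaging"),
    ("Company A", "e-health"),
    ("University B", "medical imaging"),
    ("Hospital C", "e-health"),
    ("SME D", "active ageing"),
]


def build_bipartite(pairs):
    """Return two adjacency maps: entity -> topics and topic -> entities."""
    entity_topics = defaultdict(set)
    topic_entities = defaultdict(set)
    for entity, topic in pairs:
        entity_topics[entity].add(topic)
        topic_entities[topic].add(entity)
    return dict(entity_topics), dict(topic_entities)


def topic_degree(topic_entities):
    """Number of distinct entities linked to each topic."""
    return {topic: len(entities) for topic, entities in topic_entities.items()}


def connected_components(entity_topics, topic_entities):
    """Connected components of the bipartite graph, largest first.

    Nodes are tagged ("entity", name) or ("topic", name) so that an entity
    and a topic with the same label are never confused.
    """
    adjacency = {}
    for entity, topics in entity_topics.items():
        adjacency[("entity", entity)] = {("topic", t) for t in topics}
    for topic, entities in topic_entities.items():
        adjacency[("topic", topic)] = {("entity", e) for e in entities}

    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        result.append(component)
    return sorted(result, key=len, reverse=True)


entity_topics, topic_entities = build_bipartite(mentions)
degrees = topic_degree(topic_entities)
components = connected_components(entity_topics, topic_entities)
```

Here the largest component groups the entities connected through shared topics, while an entity that shares no topic with the others forms a separate small component. A dedicated graph library would be the more likely choice at scale; plain dictionaries are used only to keep the sketch self-contained.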
This analysis will provide a set of measures contributing to the development of new impact indicators.
Our focus is also on harvested data on policy documents and research activities/trends in selected countries (UK, Germany, Sweden). The same content analytics workflows (entity extractors and topic/theme analysers) will be deployed on these data to detect the same types of entities (organisations and EU projects) mentioned in specific topics and themes related to the Health, Demographic Change and Wellbeing Societal Challenge, and to build entity-topic networks examining links and correlations of entity mentions to significant topics and themes. This analysis will likewise provide a set of measures contributing to the development of new impact indicators.
Additionally, a bibliometric analysis of research impact in the professional sphere will be carried out based on references in clinical guidelines indexed in Clinical Impact (CI), a citation database that indexes clinical guideline references to the research literature. We will use this infrastructure to populate the database with material from the respective use cases (UK, Germany, Sweden), in the form of collected documents with reference lists. The data will be semi-automatically translated into citations and validated against PubMed and WoS as well as the OpenAIRE system. Thereafter, biographical data on researchers (institution, country and research area) will be used to develop impact measures and to map how EU-financed research has come into use in the actual professional health sector setting.
For social media data, we will follow the methodology developed in Nelhans and Gunnarsson Lorentzen (2016) to collect Twitter conversations about research. We will use the Twitter streaming API to filter tweets containing the strings 'dx' and 'doi' or including an embedded dx.doi.org URL. In order to capture the follow-on tweets or follow-on commun