Synthetic Datasets 2

Synthetic data is artificial data simulated from real data using statistical models, so that it represents the population without divulging any actual patient records. This task will involve creating synthetic data from the real population datasets made available in this project. The artificial data will be simulated using the synthpop package in the R programming environment. The synthetic datasets will be validated against the real data by comparing summary statistics and Gaussian distributions. Because no real patient or citizen records are provided or disclosed, synthetic datasets avoid many governance and confidentiality issues while remaining broadly representative of the real data.
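
As a rough illustration of this workflow, the sketch below uses synthpop's syn() and compare() functions. The SD2011 survey dataset bundled with the package stands in for the project's real data, and the selected columns and the seed are assumptions made for the example only, not the project's actual configuration.

```r
library(synthpop)

# Assumption: SD2011 (shipped with synthpop) stands in for the real dataset.
real <- SD2011[, c("sex", "age", "edu", "income")]

# Fit sequential models (CART by default) and draw one synthetic copy.
# The seed is arbitrary and only makes the run reproducible.
synth <- syn(real, seed = 2024)

# The synthetic records are stored in synth$syn with the same columns as the input.
summary(real$age)
summary(synth$syn$age)

# compare() tabulates and plots each variable's distribution in both datasets.
compare(synth, real)
```

Comparing the two summary() outputs gives a quick check on one variable, while compare() reports the observed and synthetic distributions side by side for every variable.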