Data Management Plan

Summary
The overall methodology includes (1) the re-analysis of RADseq data previously obtained by the host research group to quantify connectivity among environments and neutral genome-wide levels of genetic diversity; (2) the sequencing of adult fish from three different environments (Galicia and Norway for both species; the Celtic Sea for the ballan wrasse, L. bergylta, and the Balearic Sea for the hake, M. merluccius) and of larvae from one of these environments (Galicia) to survey functional Single Nucleotide Polymorphisms (SNPs) using high-throughput sequencing (HTS) technologies, in order to reveal the adaptive potential to environmental conditions; and (3) genomic and statistical analyses to produce quantifiable estimates of the adaptive potential (i.e. the resilience index), the effect of harvesting on it, and the likelihood of adaptation to changes in the environment.
RADseq data will be analysed using software such as ipyrad, BayeScan, Arlequin version 3.5.2.2 and DnaSP. Statistics such as the Fst index, the number of migrants and genetic distances will be used to estimate the amount of gene flow among populations. DNA-seq reads obtained by sequencing functional genome regions will be managed and analysed using the Genome Analysis Toolkit (GATK) framework.
We plan to study a minimum of 10 groups of functionally related genes (i.e. grouped by gene pathway or gene ontology), including ~500-1000 coding genes (~2.5-5% of the coding genes in the genome). Sequencing will cover the whole coding length of each gene and 1 kb of sequence upstream of the transcription start site in order to capture putative regulatory sequences. In total, we plan to produce sequences of functional genomic regions for ~150-180 adult samples (~75-90 per species) and 25-30 groups of larvae from different nests.
The screening of potentially adaptive genetic variation will be performed using different approaches:
a) Seascape genomics: we will use SamBada [35] and related software to perform these analyses.
b) Comparison of allele frequencies of functional gene groups with neutral distributions, and estimation of statistics that are informative about the action of balancing selection.
c) Random Forest: this algorithm can detect subtle changes in allele frequencies among environments that are undetectable using traditional population genetics approaches, and we will use it as implemented in R packages.
The deliverables of the project, as well as the pipelines designed for data analysis, will be available for research and non-profit initiatives through the project website, which will be created for this purpose within the host institution's web service. The project will thus be connected to the DigitalCSIC data repository, which is also associated with other registries (re3data, FAIRsharing, etc.) and search platforms (Google Scholar, etc.), ensuring the correct dissemination and availability of the data, methods and results. Moreover, the genomic sequences generated by this project will be deposited in public databases such as the NCBI BioProject database, freely accessible to the whole scientific community.
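As an illustration of the gene-flow statistics named in this summary (the Fst index and the number of migrants), the following minimal Python sketch computes Wright's Fst for a single biallelic SNP from per-population allele frequencies and converts it into an island-model estimate of migrants per generation (Nm). It is not the project pipeline, which relies on the dedicated software listed in the summary. The function names and the example allele frequencies are hypothetical, and equal sample sizes per population are assumed.

```python
# Illustrative sketch only: Wright's Fst for one biallelic SNP and the
# island-model estimate of migrants per generation (Nm).
# All names and numbers here are hypothetical, not project data.


def heterozygosity(p):
    """Expected heterozygosity 2pq for a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(freqs):
    """Wright's Fst = (H_T - H_S) / H_T.

    freqs: allele frequency of the same allele in each population.
    Assumes equal sample sizes, so populations are weighted equally.
    """
    if not freqs:
        raise ValueError("at least one population is required")
    p_bar = sum(freqs) / len(freqs)
    h_t = heterozygosity(p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall, no differentiation to measure
    h_s = sum(heterozygosity(p) for p in freqs) / len(freqs)  # mean within-population value
    return (h_t - h_s) / h_t


def migrants_per_generation(fst_value):
    """Nm approx. (1 / Fst - 1) / 4 under Wright's island model at equilibrium."""
    if fst_value <= 0.0:
        return float("inf")  # no detectable differentiation
    return (1.0 / fst_value - 1.0) / 4.0


# Hypothetical allele frequencies in three sampling environments.
example_freqs = [0.20, 0.50, 0.80]
example_fst = fst(example_freqs)
example_nm = migrants_per_generation(example_fst)
print(f"Fst = {example_fst:.3f}, Nm = {example_nm:.3f}")
```

The island-model conversion rests on strong assumptions (equilibrium, symmetric migration, equal population sizes), so Nm values obtained this way are best read as rough indicators of relative connectivity rather than as direct counts of migrants.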