My group is studying the mechanistic basis of epigenetic regulation in the Polycomb system, a vital epigenetic silencing pathway that is widely conserved from flies to plants to humans. In our experiments we use vernalization, the process by which plants remember winter cold so that they flower only after winter has passed; this memory relies on quantitative epigenetic silencing of the floral repressor FLC. This system has numerous advantages, including slow dynamics and the ability to read out the mitotic heritability of expression states through clonal cell files in the roots. Using computational modelling and experiments (including ChIP and fluorescent reporter imaging), we have shown that cold-induced FLC silencing is essentially an all-or-nothing (bistable), digital process. The quantitative nature of vernalization is generated by digital, chromatin-mediated FLC silencing in a subpopulation of cells whose number increases with the duration of cold. Using a dual fluorescent labelling approach, we have further shown that Polycomb-based epigenetic memory is indeed stored locally in the chromatin (in cis). I will also discuss how further predictions from the computational modelling, including opposing chromatin modification states and additional protein memory storage elements, are being investigated, as well as the mechanisms by which long-term fluctuating temperature signals are sensed before being converted into digital chromatin states for long-term memory storage.
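How digital, cell-by-cell silencing can still produce a quantitative population response can be illustrated with a toy stochastic model. The sketch below assumes, purely for illustration, that each cell switches from an FLC ON state to an OFF state with a fixed, hypothetical probability per day of cold, and that the OFF state is never lost; it is not the group's published model. Under these assumptions the expected fraction of cells still ON after d days is (1 - p)^d, so population-level FLC expression declines gradually with cold duration even though every individual cell is either fully ON or fully OFF.

```python
import numpy as np

def simulate_vernalization(n_cells, cold_days, p_switch, seed=0):
    """Toy model: fraction of cells still expressing FLC after each day of cold.

    Each cell is either ON (1) or OFF (0). OFF is absorbing, standing in for
    heritable epigenetic memory. p_switch is a hypothetical per-day probability.
    """
    rng = np.random.default_rng(seed)
    state = np.ones(n_cells, dtype=int)      # all cells start ON before cold
    fraction_on = []
    for _ in range(cold_days):
        switches = rng.random(n_cells) < p_switch
        state[switches] = 0                  # digital, irreversible silencing
        fraction_on.append(state.mean())     # population-level FLC readout
    return np.array(fraction_on)

# Usage: six weeks of cold with an illustrative 3% daily switching probability.
frac = simulate_vernalization(10_000, 42, 0.03)
expected = 0.97 ** 42                        # analytic expectation (1 - p)^d
```

Longer cold increases the number of silenced cells rather than partially silencing each cell, which is the qualitative behaviour described above.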
In recent years, the success of classical molecular modeling, and the insight it has provided into the fundamentals of complex molecular phenomena, have triggered a strong desire to go beyond the limitations of the information that can be extracted from classical molecular dynamics (MD), especially those limitations that cannot be resolved by advances in computational efficiency. To this aim, effective molecular representations have been developed and used for diverse molecular systems in a variety of coarse-grained (CG) and multiscale (MS) techniques (for a recent review and perspective, see [1]). In the last decade, hybrid particle-continuum approaches such as Single-Chain in Mean-Field (SCMF) [2] and hybrid particle-field MD (hPF-MD) [3,4], which link discrete (particle-based) and continuum (field-based) descriptions in a single simulation volume, have been increasingly applied and validated for different systems. In hPF-MD, the nonbonded forces acting on a particle are expressed as functions of the local density gradients. This reformulation enables much more efficient simulations than standard MD, especially for large parallel runs, because the evaluation of nonbonded pair forces is replaced by building particle-to-mesh density fields and computing the density-field potentials. Both steps scale linearly with the number of particles. The hPF-MD model has been shown to be effective for investigating homopolymers and block copolymers at both CG [5] and atomistic [6] resolutions, also in the presence of solid nanoparticles [7-9], and for liquid-vapor interfaces [10]. The hPF-MD model has also been validated in describing the conformational and dynamical properties of biological systems such as lipid bilayers [11-13], biosurfactants [14] and proteins [15]. More recently, after electrostatics was integrated into the hybrid particle-field scheme [16], the hPF-MD method was successfully applied to charged phospholipids [17].
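The particle-to-mesh idea can be sketched in one dimension. The example assumes two particle types A and B, cloud-in-cell assignment on a periodic grid, and an interaction potential of the type used in the hPF-MD literature, V_A = χ φ_B + (φ_A + φ_B - 1)/κ in units of k_BT, where φ are densities normalized by a reference density ρ0 and κ is a compressibility parameter. The function names, grid size and parameter values are illustrative, and this is not the production hPF-MD code. Both the density assignment and the interpolation of forces back to the particles visit each particle once, which is where the linear scaling comes from.

```python
import numpy as np

def cic_weights(x, box, n_cells):
    """Cloud-in-cell weights of 1D positions on a periodic grid (cell centres at (j + 0.5) * dx)."""
    dx = box / n_cells
    s = x / dx - 0.5
    i0 = np.floor(s).astype(int)
    w1 = s - i0                                   # weight assigned to the right-hand cell
    return i0 % n_cells, (i0 + 1) % n_cells, 1.0 - w1, w1

def density(x, box, n_cells):
    """Particle-to-mesh step: number density per unit length on the grid, O(N)."""
    il, ir, wl, wr = cic_weights(x, box, n_cells)
    rho = np.zeros(n_cells)
    np.add.at(rho, il, wl)
    np.add.at(rho, ir, wr)
    return rho / (box / n_cells)

def hpf_forces(xa, xb, box, n_cells, rho0, chi=1.0, kappa=0.1):
    """Forces (k_BT per unit length) on A and B particles from assumed density-field potentials."""
    dx = box / n_cells
    phi_a = density(xa, box, n_cells) / rho0
    phi_b = density(xb, box, n_cells) / rho0
    excess = phi_a + phi_b - 1.0                  # deviation from incompressibility
    v_a = chi * phi_b + excess / kappa
    v_b = chi * phi_a + excess / kappa

    def grid_force(v):                            # F = -dV/dx by central differences
        return -(np.roll(v, -1) - np.roll(v, 1)) / (2.0 * dx)

    def to_particles(f, x):                       # mesh-to-particle interpolation, O(N)
        il, ir, wl, wr = cic_weights(x, box, n_cells)
        return wl * f[il] + wr * f[ir]

    fa = to_particles(grid_force(v_a), xa)
    fb = to_particles(grid_force(v_b), xb)
    return fa, fb, phi_a, phi_b

# Usage: 500 A and 500 B particles placed at random in a periodic box.
rng = np.random.default_rng(0)
box, n_cells = 10.0, 20
xa, xb = rng.uniform(0, box, 500), rng.uniform(0, box, 500)
fa, fb, phi_a, phi_b = hpf_forces(xa, xb, box, n_cells, rho0=1000 / box)
```

No particle pairs are ever enumerated: particles interact only through the gridded densities.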
During the talk, after an introduction to the basics of the hPF-MD methodology, I will give an overview of the main results, with a special focus on electrostatic interactions, their implementation and their applications, from simple idealized to complex molecular models [18].
References
[1] Unfolding the prospects of computational (bio)materials modelling G.J.A. Sevink, A. Liwo, P. Asinari, D. MacKernan, G. Milano, and I. Pagonabarraga (Featured Article for the special issue: Classical Molecular Dynamics (MD) Simulations: Codes, Algorithms, Force Fields, and Applications) J. Chem. Phys. 2020, 153, 100901
[2] Single chain in mean field simulations: Quasi-instantaneous field approximation and quantitative comparison with Monte Carlo simulations K. Daoulas, M. Muller J. Chem. Phys. 125, 184904 (2006)
[3] Hybrid Particle-Field Molecular Dynamics Simulations for Dense Polymer Systems G. Milano, T. Kawakatsu J. Chem. Phys. 130, 214106, (2009)
[4] Hybrid Particle-Field Molecular Dynamics Simulations: Parallelization and Benchmarks Y. Zhao, A. De Nicola, T. Kawakatsu, G. Milano Journal of Computational Chemistry 33, 868, (2012)
[5] Micellar Drug Nanocarriers and Biomembranes: How do they Interact? A. De Nicola, S. Hezaveh, Y. Zhao, T. Kawakatsu, D. Roccatano, G. Milano Phys. Chem. Chem. Phys. 16, 5093, (2014)
[6] Generation of Well Relaxed All Atom Models of Large Molecular Weight Polymer Melts: A Hybrid Particle-Continuum Approach Based on Particle-Field Molecular Dynamics Simulations A. De Nicola, T. Kawakatsu, G. Milano J. Chem. Theory Comput., 2014, 10 (12), pp 5651–566
[7] Rational Design of Nanoparticle/Monomer Interfaces: A Combined Computational and Experimental Study of In Situ Polymerization of Silica Based Nanocomposites A. De Nicola, R. Avolio, F. Della Monica, G. Gentile, M. Cocca, C. Capacchione, M. E. Errico and G. Milano RSC Advances 2015, 5, 71336-71340
[8] Self-Assembly of Carbon Nanotubes in Polymer Melts: Simulation of Structural and Electrical Behavior by Hybrid Particle-Field Molecular Dynamics Y. Zhao, M. Byshkin, Y. Cong, T. Kawakatsu, L. Guadagno, A. De Nicola, N. Yu, G. Milano and B. Dong, Nanoscale 2016, 8, 15538-15552
[9] Efficient Hybrid Particle-Field Coarse-Grained Model of Polymer Filler Interactions: Multiscale Hierarchical Structure of Carbon Black Particles in Contact with Polyethylene S. Caputo, V. Hristov, A. De Nicola, H. Herbst, A. Pizzirusso, G. Donati, G. Munaò, A. R. Albunia and G. Milano J. Chem. Theory Comput. 2021, 17, 3, 1755-1770.
[10] Efficient and Realistic Simulation of Phase Coexistence G. J. A. Sevink, E. M. Blokhuis, X. Li, G. Milano J. Chem. Phys. 2020, 153, 244121.
[11] A hybrid particle-field molecular dynamics approach: a route toward efficient coarse-grained models for biomembranes G. Milano, T. Kawakatsu, A. De Nicola Physical Biology 10, 045007, (2013)
[12] Toward Chemically Resolved Computer Simulations of Dynamics and Remodeling of Biological Membranes T. A. Soares, S. Vanni, G. Milano, M. Cascella (Perspective) J. Phys. Chem. Lett., 2017, 8 (15), pp 3586–3594
[13] Self-Assembly at the Multi-Scale Level: Challenges and New Avenues for Inspired Synthetic Biology Modelling G. Milano, I. Marzuoli, C. D. Lorenz and F. Fraternali Synthetic Biology: Volume 2 Royal Society of Chemistry Book Series 2017
[14] Self Assembly of Triton X-100 in water solutions: A Multiscale Simulation Study Linking Mesoscale to Atomistic Models A. De Nicola, T. Kawakatsu, C. Rosano, M. Celino, M. Rocco, G. Milano Journal of Chemical Theory and Computation 2015 11 (10), 4959-4971
[15] Hybrid Particle-Field Model for Conformational Dynamics of Peptide Chains S. Løland Bore, G. Milano, M. Cascella Journal of Chemical Theory and Computation 2018, 14 (2), pp 1120–1130
[16] Hybrid particle-field molecular dynamics simulation for polyelectrolyte systems Y.-L. Zhu, Z.-Y. Lu, G. Milano, A.-C. Shi, Z.-Y. Sun Phys. Chem. Chem. Phys. 2016, 18, 9799
[17] Hybrid particle-field molecular dynamics simulations of charged amphiphiles in aqueous environment H. B. Kolli, A. De Nicola, S. Løland Bore, K. Schäfer, T. Kawakatsu, Z.-Y. Lu, Y.-L. Zhu, G. Milano, M. Cascella 2018, 14 (9), pp 4928–4937
[18] Aggregation of Lipid A Variants: A Hybrid Particle-Field Model A. De Nicola, T. A. Soares, D. E. S. Santos, S. Løland Bore, G. J. A. Sevink, M. Cascella, G. Milano Biochimica et Biophysica Acta - General Subjects 1865, 4, 2021, 129570
8-oxo-7,8-dihydro-2′-deoxyguanosine (8-oxodG), a major product of DNA oxidation, has been proposed to have an epigenetic function in gene regulation and has been associated with genome instability. NGS-based methodologies are contributing to the characterization of the role of 8-oxodG in many genome-related functions. However, the number of studies addressing the epigenetic role of 8-oxodG at the genome-wide level is still low, and the mechanisms controlling genomic 8-oxodG accumulation and maintenance have not yet been fully characterized.
In this study, we report the identification and characterization of a set of enhancer regions accumulating 8-oxodG in human epithelial cells. We found that these oxidized enhancers are mainly super-enhancers and are associated with bidirectionally transcribed enhancer RNAs and with activation of the DNA damage response. Moreover, using ChIA-PET and Hi-C data, we identified specific CTCF-mediated chromatin loops in which the oxidized enhancer and promoter regions physically associate. Oxidized enhancers and their associated chromatin loops accumulate endogenous double-strand breaks, which are in turn repaired by the non-homologous end joining (NHEJ) pathway through a transcription-dependent mechanism. Our work provides novel mechanistic insights into the intrinsic fragility of chromatin loops containing oxidized enhancer-promoter pairs and suggests that 8-oxodG accumulation in these loops occurs in a transcription-dependent manner.
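Finding loops in which an oxidized enhancer and a promoter physically associate amounts to an interval-overlap query between loop anchors and annotated regions. The sketch below assumes loops are given as pairs of anchor intervals (as produced by ChIA-PET or Hi-C loop calling) and uses 0-based, half-open coordinates; the `Interval` type, the function names and the toy coordinates are hypothetical. A genome-scale analysis would use an interval index or a dedicated tool such as bedtools rather than this linear scan over regions.

```python
from typing import List, NamedTuple, Tuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def any_overlap(anchor: Interval, regions: List[Interval]) -> bool:
    return any(overlaps(anchor, r) for r in regions)

def enhancer_promoter_loops(
    loops: List[Tuple[Interval, Interval]],
    oxidized_enhancers: List[Interval],
    promoters: List[Interval],
) -> List[Tuple[Interval, Interval]]:
    """Keep loops with an oxidized enhancer at one anchor and a promoter at the other."""
    hits = []
    for left, right in loops:
        if (any_overlap(left, oxidized_enhancers) and any_overlap(right, promoters)) or (
            any_overlap(right, oxidized_enhancers) and any_overlap(left, promoters)
        ):
            hits.append((left, right))
    return hits

# Usage with toy coordinates (not real data).
loops = [
    (Interval("chr1", 1000, 2000), Interval("chr1", 50000, 51000)),
    (Interval("chr1", 3000, 4000), Interval("chr1", 60000, 61000)),
]
enhancers = [Interval("chr1", 1500, 1800)]
promoters = [Interval("chr1", 50500, 50600), Interval("chr1", 60500, 60600)]
hits = enhancer_promoter_loops(loops, enhancers, promoters)
```

Checking both anchor orientations matters because loop callers do not guarantee which anchor carries the enhancer.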
Abstract:
High-throughput techniques and experiments enable researchers to investigate complex biological processes through large-scale analysis of omics data. The growth of big omics data poses continuous computational challenges in the collection, management, analysis and interpretation (mining) of data, as well as in their sharing, visualization, storage and integration, which are needed to extract the emergent information required to understand the biology of complex systems (systems biology). It is therefore imperative to adopt an integrative approach that combines multi-omics data to highlight the interrelationships of different classes of biomolecules and their functions, and to investigate the biological system as a whole (holistic approach).
We present some of the research activities carried out at the Dept. of Agricultural Sciences with the aim of developing computational tools and applying multi-omics data analysis strategies for multiple purposes. The spread of increasingly efficient methods for sequencing DNA (genomics and metagenomics) and RNA (transcriptomics), and of NGS-based genotyping techniques, has made it possible to (i) explore the “sequence space”; (ii) investigate genome structure and organization; (iii) characterize gene function and gene expression patterns; (iv) study food and human microbiomes, with particular focus on large-scale analyses performed at strain-level resolution; (v) provide high-resolution profiling of nucleotide variation within germplasm collections (population genomics); (vi) discover loci associated with key agronomic traits via genome-wide association studies; and (vii) study molecular mechanisms involved in plant-microbe interaction.
Recent publications:
Biological systems are complex entities whose behavior emerges from an enormous number of reactions taking place within and among different internal molecular districts. Dissecting and modeling the entities and interactions that make up these systems is essential to understand the biological processes behind normal and pathological conditions, as well as the perturbations induced by exposure to external molecules such as drugs. The recent explosion of omics data has fueled the creation of diverse systems biology models. Most of these focus on the interactions taking place within a single molecular district and have been successfully used to perform sample stratification, especially in cancer. Despite their proven usefulness, these models have not yet reached the level of complexity needed to distinguish different biological conditions.
One step forward in this direction is the creation of multi-omics models capturing the dynamics taking place within and between omics layers. This approach requires powerful modeling strategies and is still an open research field.
We propose the application of a powerful AI technique based on graph embedding for the creation of a system that, starting from multi-omics measurements, is able to model and generate knowledge about multi-omics interactions.
Here we present a novel approach, implemented as an R package named MultiOmics Network Embedding for SubType Analysis (MoNETA), for the identification of relevant multi-omics relationships between biological samples.
This approach has been applied to the identification of different cancer subtypes using multi-omics data from The Cancer Genome Atlas (TCGA) and Clinical Proteomic Tumor Analysis Consortium (CPTAC) datasets.
MoNETA will be freely available as an R package at https://github.com/BioinfoUninaScala/MoNETA.
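The abstract does not detail MoNETA's embedding algorithm, so the following is only a generic illustration of network embedding for multi-omics sample stratification, using a swapped-in technique: Laplacian eigenmaps on a sample graph obtained by summing per-layer k-nearest-neighbour graphs. It assumes each omics layer is a samples × features matrix with rows aligned to the same samples and no all-zero rows; `k`, `dim` and the small `reg` term that keeps the graph connected are illustrative choices, not MoNETA parameters.

```python
import numpy as np

def knn_affinity(X, k):
    """Symmetric k-nearest-neighbour graph from cosine similarity between samples (rows)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)             # a sample is never its own neighbour
    nbrs = np.argsort(-S, axis=1)[:, :k]     # indices of the k most similar samples
    W = np.zeros_like(S)
    rows = np.repeat(np.arange(X.shape[0]), k)
    W[rows, nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)                # symmetrize

def multiomics_embedding(layers, k=5, dim=2, reg=1e-3):
    """Laplacian-eigenmap coordinates of samples from several omics layers."""
    W = sum(knn_affinity(X, k) for X in layers)          # integrate layers on shared samples
    n = W.shape[0]
    W = W + reg * (np.ones((n, n)) - np.eye(n))          # weak uniform link keeps graph connected
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)                       # eigenvalues in ascending order
    return vecs[:, 1:dim + 1] * d_inv_sqrt[:, None]      # skip the trivial eigenvector

# Usage: two synthetic layers, 20 samples from two hidden groups of 10.
rng = np.random.default_rng(1)

def make_layer():
    a = np.abs(rng.normal(0, 0.5, (10, 5))); a[:, 0] += 5
    b = np.abs(rng.normal(0, 0.5, (10, 5))); b[:, 1] += 5
    return np.vstack([a, b])

emb = multiomics_embedding([make_layer(), make_layer()], k=6, dim=1)
```

Samples that lie close together in the embedding can then be clustered into candidate subtypes with any standard clustering method.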
Gastric cancer (GC) remains one of the major causes of cancer-related mortality worldwide. Molecular heterogeneity is a major determinant of clinical outcomes, and an exhaustive tumor classification is currently missing. Histologically normal tissue adjacent to the tumor (NAT) is commonly used as a control in cancer studies; nevertheless, it shows unique characteristics in several tumor types, which may lead to a suboptimal definition of tumor features. Moreover, several of the limitations of current GC treatments may be due to cancer drug resistance, which leads to tumor recurrence and metastasis. Apoptosis evasion is a causative factor for treatment failure in GC, as in other cancers, and the regulation of intracellular calcium homeostasis has been found to be associated with apoptosis resistance. Finally, although an extensive literature has been produced to better define the subgroups of Lauren’s classification, their characterizing pathways and actionable candidates for clinical practice are still missing.
Here I’d like to show:
Bibliography
Russi S, Calice G, Ruggieri V, Laurino S, La Rocca F, Amendola E, Lapadula C, Compare D, Nardone G, Musto P, De Felice M, Falco G, Zoppoli P. Gastric Normal Adjacent Mucosa Versus Healthy and Cancer Tissues: Distinctive Transcriptomic Profiles and Biological Features. Cancers (Basel). 2019 Aug 26;11(9):1248. doi: 10.3390/cancers11091248. PMID: 31454993; PMCID: PMC6769942.
Zoppoli P, Calice G, Laurino S, Ruggieri V, La Rocca F, La Torre G, Ciuffi M, Amendola E, De Vita F, Petrillo A, Napolitano G, Falco G, Russi S. TRPV2 Calcium Channel Gene Expression and Outcomes in Gastric Cancer Patients: A Clinically Relevant Association. J Clin Med. 2019 May 11;8(5):662. doi: 10.3390/jcm8050662. PMID: 31083561; PMCID: PMC6572141.
Laurino S, Mazzone P, Ruggieri V, Zoppoli P, Calice G, Lapenta A, Ciuffi M, Ignomirelli O, Vita G, Sgambato A, Russi S, Falco G. Cationic Channel TRPV2 Overexpression Promotes Resistance to Cisplatin-Induced Apoptosis in Gastric Cancer Cells. Front Pharmacol. 2021 Oct 4;12:746628. doi: 10.3389/fphar.2021.746628. PMID: 34671260; PMCID: PMC8521017.
Background and rationale. Human tumors are complex systems characterized by molecular, cellular and spatial diversity. The totality of features demonstrating differences within a tumor is termed intra-tumor heterogeneity (ITH). ITH may be one of the mechanisms underlying drug resistance and relapse, triggered, for example, by the selection of malignant clones. Single-cell sequencing approaches coupled with advanced computational analyses have made a huge contribution to understanding the molecular basis of tumor ITH. However, owing to the lack of single-cell data, little is known about these dynamics in tumors such as neuroblastoma (NB), one of the most common solid tumors of childhood. NB affects the developing sympathetic nervous system, and its treatment is still unsuccessful in half of the patients diagnosed with the high-risk subtype. Here we investigated the ITH of Etoposide- and Cisplatin-resistant NB cell lines and their parental cells through single-cell RNA sequencing (scRNA-seq).
Methods. scRNA-seq was performed on the 10X Genomics platform, and barcode filtering, read alignment and UMI counting were carried out using Cell Ranger 3.0.1. Counts were imported into R for quality control (QC) and downstream analysis. Cells were excluded if fewer than 2,000 distinct genes or fewer than 20,000 counts were detected, or if more than 30% of reads mapped to mitochondrial genes. Data were normalized, scaled and log-transformed, and, to remove confounding sources of variation, the percentage of mitochondrial reads, read counts and cell-cycle scores were regressed out using the regularized negative binomial model implemented in the Seurat package. The most variable genes were used for dimensionality reduction, and clustering was carried out with a nearest-neighbor algorithm. Gene set enrichment analysis of the marker genes of each cluster was performed with the WebGestaltR package. CIBERSORTx was used to deconvolute bulk RNA-seq datasets with the scRNA-seq-derived cell clusters, and the resulting scores were correlated with clinical and survival data.
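The cell-level QC filter can be written as a boolean mask over cells. The authors worked in R with Seurat; this Python/NumPy sketch only mirrors the stated thresholds, reading the rule as "keep cells with at least 2,000 detected genes, at least 20,000 counts and at most 30% mitochondrial reads". The function name and the genes × cells matrix layout are assumptions.

```python
import numpy as np

def qc_mask(counts, is_mito, min_genes=2000, min_counts=20000, max_pct_mito=30.0):
    """Boolean mask of cells passing QC.

    counts: genes x cells matrix of UMI counts; is_mito: boolean mask over genes
    marking mitochondrial genes.
    """
    counts = np.asarray(counts)
    n_genes = (counts > 0).sum(axis=0)                  # distinct genes detected per cell
    total = counts.sum(axis=0)                          # total counts per cell
    pct_mito = 100.0 * counts[is_mito].sum(axis=0) / np.maximum(total, 1)
    return (n_genes >= min_genes) & (total >= min_counts) & (pct_mito <= max_pct_mito)

# Usage on a tiny toy matrix with scaled-down thresholds (4 genes x 3 cells; gene 0 is mitochondrial).
counts = np.array([[1, 0, 6],
                   [5, 9, 3],
                   [5, 0, 3],
                   [0, 0, 0]])
is_mito = np.array([True, False, False, False])
keep = qc_mask(counts, is_mito, min_genes=2, min_counts=10, max_pct_mito=30.0)
```

In the toy example, the second cell fails on detected genes and counts, and the third fails on its 50% mitochondrial fraction.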
Results. After QC, we obtained transcriptional profiles of 1514 Etoposide-resistant vs. 2646 parental cells, and of 1160 Cisplatin-resistant vs. 1674 parental cells. t-SNE and UMAP plots showed a clear separation of resistant and parental cells in both conditions and allowed us to identify 8 distinct tumor clusters in Etoposide-resistant/parental cells and 7 in Cisplatin-resistant/parental cells. We found a significant enrichment (FDR ≤ 0.01) of pathways related to the DNA damage response in both drug-resistant cell lines, suggesting that upregulation of the DNA repair machinery may be a drug resistance mechanism in these cells. In addition, both parental cell lines showed cell clusters characterized by genes involved in embryonal differentiation trajectories and by enrichment of neural crest development pathways, reflecting the dynamics of NB cell development. Deconvolution analysis of bulk RNA-seq data with cluster signatures allowed the identification of specific clusters associated (log-rank P ≤ 0.01) with worse or better survival.
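The survival associations above rely on a log-rank test. A minimal NumPy implementation for two groups is sketched below, assuming, hypothetically, that samples have been split into a high and a low group (coded 1 and 0) by a CIBERSORTx cluster score; the splitting rule and function name are illustrative. The p-value uses the chi-square distribution with one degree of freedom, whose survival function at x equals erfc(√(x/2)).

```python
import math
import numpy as np

def logrank_test(time, event, group):
    """Two-group log-rank test.

    time: follow-up times; event: 1 if the event (e.g. death) occurred, 0 if censored;
    group: 1 or 0 for each sample. Returns (chi-square statistic, p-value).
    """
    time, event, group = map(np.asarray, (time, event, group))
    obs_minus_exp = 0.0
    var = 0.0
    for t in np.unique(time[event == 1]):           # distinct event times
        at_risk = time >= t
        n = at_risk.sum()                           # total at risk just before t
        n1 = (at_risk & (group == 1)).sum()         # group 1 at risk
        d = ((time == t) & (event == 1)).sum()      # events at t
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        obs_minus_exp += d1 - d * n1 / n            # observed minus expected in group 1
        if n > 1:                                   # hypergeometric variance
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = obs_minus_exp ** 2 / var
    p_value = math.erfc(math.sqrt(stat / 2.0))      # chi-square(1) survival function
    return stat, p_value

# Usage: group 1 experiences all of its events before group 0.
stat, p = logrank_test(time=list(range(1, 11)), event=[1] * 10, group=[1] * 5 + [0] * 5)
```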
Conclusions. In this study, we applied scRNA-seq and an advanced bioinformatic pipeline to analyze chemoresistant NB cell lines. We identified distinct cell populations characterizing Etoposide- and Cisplatin-resistant NB cell lines, provided insights into plausible mechanisms of chemoresistance and highlighted genes and cluster signatures that are associated with clinical outcomes and potentially actionable as therapeutic targets.
Speaker recent publications
Reactivation of the inactive X chromosome (Xi) has been used to model epigenetic reprogramming in the mouse. Human studies have, however, been hampered by Xi epigenetic instability in pluripotent stem cells and by difficulties in tracking emerging iPSCs. We recently showed that reprogramming female human fibroblasts via fusion with mouse ESCs recapitulates features of in vivo human naïve pluripotency. We used this unique reprogramming system to examine the earliest chromatin and transcriptional events in Xi reactivation. Our study revealed a rapid (1-2 days) and widespread (30-50% of cells) delocalization of XIST RNA and loss of H3K27me3 from the human Xi that precede, and are tightly associated with, the re-expression of selected Xi genes. After cell division, Xi gene reactivation was observed in a similar percentage of hybrids and remained stable over 6 days. The human pluripotency-specific XACT RNA, instead, was re-expressed and coated the Xi only in rare hybrids (1%), suggesting that XACT is not required for early Xi chromatin changes and gene reactivation in the reprogramming context. Collectively, these data distinguish pre- and post-mitotic changes and reveal a hierarchy of epigenetic events that are required for Xi reactivation.
Interestingly, single-cell RNA-FISH and allele-specific RNA sequencing analyses showed that reprogramming-mediated human Xi reactivation was partial and selective for a specific subset of genes. Selective Xi reactivation was neither limited to gene loci residing within specific chromatin domains nor influenced by proximity to the XIST locus. Reactivation was instead associated with stochastic Xi expression prior to reprogramming, as shown by analyses of isogenic fibroblast clones and single cells. Importantly, reprogramming-mediated reactivation remained partial even in cells examined up to six days after fusion, but it was extended to a second group of Xi loci by DNA demethylation. These findings underscore the differential sensitivity of distinct human Xi genes to reprogramming-mediated reactivation and suggest that multiple non-overlapping epigenetic mechanisms maintain silencing along the human Xi.
Automatic analysis of rodent behavior has received growing attention in recent years, since rodents are the reference species for many neuroscientific studies. In parallel, a number of technologies have been developed to automate data interpretation. Thanks to the Digital Ventilated Cage (DVC by Tecniplast), a system that detects animal activity through the generation of tiny electromagnetic fields, we have recently obtained an unbiased picture of in-cage spontaneous mouse behavior and longitudinally tracked locomotor activity over 24-h periods for two months in both sexes of three non-genetically altered mouse strains. The recorded locomotor activity of the three mouse strains was analysed using different, commonly used circadian metrics (i.e., day and night activity, diurnal activity, responses to lights-on and lights-off phases, acrophase, activity onset and regularity disruption index) to capture key behavioral responses. We compared the 24-h spontaneous locomotor activity of the mice and extracted key aspects of the day and night activity patterns of each strain. All analysed metrics show significant differences in the circadian activity of the three selected strains, identifying key differences that characterize strain-specific spontaneous locomotor patterns over the 24-h period. The behavioral differences were also analysed with an unsupervised machine learning approach. Each strain corresponded to a cluster and, notably, the repeated longitudinal measurements of all circadian metrics confirmed that the data from cages housing each strain fell within a specific cluster. A further analysis is in progress to identify the spatial pattern of in-cage spontaneous locomotor activity in the same mouse strains.
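Acrophase, one of the circadian metrics listed above, is commonly estimated with a cosinor fit, i.e. a least-squares fit of a 24-h cosine to the activity time series. The sketch below assumes activity sampled at times `t_hours` measured from an arbitrary origin such as lights-on; it illustrates the standard single-component cosinor and is not necessarily how the DVC analysis software computes the metric.

```python
import numpy as np

def cosinor(t_hours, activity, period=24.0):
    """Least-squares fit of activity = M + A * cos(2*pi*(t - acrophase) / period).

    Returns (mesor M, amplitude A, acrophase in hours after t = 0).
    """
    w = 2.0 * np.pi / period
    X = np.column_stack([np.ones_like(t_hours), np.cos(w * t_hours), np.sin(w * t_hours)])
    (mesor, beta, gamma), *_ = np.linalg.lstsq(X, activity, rcond=None)
    amplitude = np.hypot(beta, gamma)
    # beta*cos(wt) + gamma*sin(wt) = A*cos(wt - phi) with phi = atan2(gamma, beta)
    acrophase = np.mod(np.arctan2(gamma, beta), 2.0 * np.pi) / w
    return mesor, amplitude, acrophase

# Usage: one week of synthetic activity sampled every 15 min, peaking 20 h after the origin.
t = np.arange(0.0, 24.0 * 7, 0.25)
y = 10.0 + 5.0 * np.cos(2.0 * np.pi * (t - 20.0) / 24.0)
mesor, amplitude, acrophase = cosinor(t, y)
```

Fitting the cosine in its linear form (intercept, cosine and sine terms) keeps the problem an ordinary least-squares fit; amplitude and acrophase are recovered afterwards.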
Fuochi et al. 2021. Phenotyping spontaneous locomotor activity in inbred and outbred mouse strains by using Digital Ventilated Cages. Lab Anim (NY) 50(8):215-223. doi: 10.1038/s41684-021-00793-0.
Blood is a complex fluid with non-Newtonian characteristics. It consists primarily of a concentrated suspension of deformable red blood cells (RBCs) [1], which tend to aggregate reversibly into microstructures such as rouleaux; this tendency is a major contributor to the viscoelastic flow behavior of blood. The mechanical response of human blood is strongly affected by RBC properties such as volume fraction, deformability and aggregation [2]. In particular, the tendency of RBCs to form packed structures plays an important role in blood flow behavior, increasing blood viscosity, especially at low shear rates. Currently, both research and clinical hemorheology are mostly based on steady shear measurements of the apparent blood viscosity [3]. Linear viscoelastic tests, such as oscillatory shear, can provide valuable information about blood microstructure, but few results are available in the literature. Recently, the viscoelastic moduli of blood have been investigated by passive microrheology [4], but the applicability of this technique to a heterogeneous material such as blood is questionable.
Here, we present a systematic set of oscillatory shear measurements by conventional bulk rheology to evaluate the storage and loss moduli of whole human blood. The rheological behavior of human blood was characterized both under physiological conditions and in RBC-aggregating media. The latter were obtained by adding a polymer and by increasing the hematocrit above normal physiological levels [5].
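In a small-amplitude oscillatory test with strain γ(t) = γ0 sin(ωt), the stress is σ(t) = γ0 [G′ sin(ωt) + G″ cos(ωt)], so the storage modulus G′ and loss modulus G″ are the in-phase and out-of-phase parts of the complex modulus G* = G′ + iG″. The sketch below recovers them from sampled strain and stress signals by projecting both onto exp(−iωt) over an integer number of cycles. Commercial rheometers perform an equivalent analysis internally; the function name and the synthetic signal values are hypothetical.

```python
import numpy as np

def dynamic_moduli(t, strain, stress, omega):
    """Estimate (G', G'') from strain and stress sampled uniformly over whole oscillation cycles."""
    ref = np.exp(-1j * omega * t)
    strain_amp = 2.0 * np.mean(strain * ref)   # complex amplitude of the strain signal
    stress_amp = 2.0 * np.mean(stress * ref)   # complex amplitude of the stress signal
    g_star = stress_amp / strain_amp           # complex modulus G* = G' + iG''
    return g_star.real, g_star.imag            # storage G', loss G''

# Usage: 1 Hz oscillation, 10 cycles, 200 samples per cycle, 1% strain amplitude.
omega = 2.0 * np.pi * 1.0
t = np.linspace(0.0, 10.0, 2000, endpoint=False)
strain = 0.01 * np.sin(omega * t)
stress = 0.01 * (3.0 * np.sin(omega * t) + 1.0 * np.cos(omega * t))   # G' = 3, G'' = 1 (Pa)
g_storage, g_loss = dynamic_moduli(t, strain, stress, omega)
```

Taking the ratio of complex amplitudes makes the estimate independent of the strain signal's own phase offset.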
[1] G. Tomaiuolo et al., Soft Matter 5, 2009.
[2] O. K. Baskurt and H. J. Meiselman, in Seminars in Thrombosis and Hemostasis (New York: Stratton Intercontinental Medical Book Corporation), 2003.
[3] O. K. Baskurt et al., Clinical Hemorheology and Microcirculation 42, 2009.
[4] L. Campo-Deaño et al., Biomicrofluidics 7, 2013.
[5] G. Tomaiuolo et al., Rheologica Acta 55(6), 2016.
Bardet-Biedl syndrome (BBS) is a genetic ciliopathy characterized in most cases by obesity, polydactyly, retinal dystrophy and cystic kidneys. BBS is closely related to the hetero-octameric protein complex known as the BBSome. Recruitment of the BBSome to ciliary membranes is mediated by binding of the GTP-binding protein ARL6, which binds at the interface between the BBS1 and BBS7 subunits of the BBSome [1]. Specifically, ARL6 binding occurs only in the active state of the BBSome, characterized by an open conformation between the BBS1 and BBS7 β-propeller subunits [2], whereas in the absence of ARL6 (apo form) BBS1 adopts a more closed conformation. Additionally, the most relevant structural and functional properties are exerted by the BBSome core complex formed by the BBS1, 4, 8, 9 and 18 subunits, with BBS18 having a remarkable stabilizing effect on the complex [3]. Experimental data revealed that ubiquitination of BBS1 at residue K143 by the E3 ligase praja2 positively regulates binding to ARL6, but the detailed structural mechanism of action is still unknown, probably because the large size of the system requires long-timescale simulations. We have undertaken this challenge using microsecond-long coarse-grained molecular dynamics (CG-MD) simulations of homology models of both the human BBSome (wt-hBBSome) and its K143-monoubiquitinated form (Ub-hBBSome), followed by essential motion analyses. The CG description builds a simplified representation of the system, extending the accessible time and length scales by orders of magnitude. Our computational approach provided structural insights into the role of ubiquitin (Ub) on the BBSome subunits, which may support the development of therapeutic approaches for ciliopathy disorders.
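Essential motion (essential dynamics) analysis is a principal component analysis of the positional covariance of a trajectory. The minimal NumPy sketch below assumes CG bead coordinates stored as an array of shape (frames, beads, 3) that has already been superposed on a reference structure to remove overall rotation and translation; the array layout and function name are assumptions, not the authors' workflow.

```python
import numpy as np

def essential_dynamics(traj, n_modes=2):
    """PCA of a superposed trajectory.

    traj: array of shape (n_frames, n_beads, 3). Returns the largest eigenvalues,
    the corresponding eigenvectors (3*n_beads x n_modes) and per-frame projections.
    """
    X = traj.reshape(traj.shape[0], -1)          # one 3N-dimensional vector per frame
    X = X - X.mean(axis=0)                       # fluctuations around the average structure
    cov = X.T @ X / (X.shape[0] - 1)             # positional covariance matrix
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]               # sort modes by decreasing variance
    vals, vecs = vals[order], vecs[:, order]
    projections = X @ vecs[:, :n_modes]          # trajectory projected on the essential modes
    return vals[:n_modes], vecs[:, :n_modes], projections

# Usage: synthetic 4-bead trajectory dominated by one collective motion.
rng = np.random.default_rng(2)
n_frames, n_beads = 200, 4
u = rng.normal(size=n_beads * 3); u /= np.linalg.norm(u)
base = rng.normal(size=(n_beads, 3))
amp = 3.0 * np.sin(np.linspace(0.0, 4.0 * np.pi, n_frames))
traj = (base[None] + (amp[:, None] * u[None, :]).reshape(n_frames, n_beads, 3)
        + 0.01 * rng.normal(size=(n_frames, n_beads, 3)))
vals, vecs, proj = essential_dynamics(traj, n_modes=2)
```

Comparing the leading modes of the wild-type and ubiquitinated systems is one way such an analysis can highlight changes in collective motions.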
Understanding intratumor heterogeneity and the interactions between tumor cells and the immune system is a critical step in the study of tumor growth and evolution. Typically, in these studies a large number of unsorted cells from tumor biopsies are subjected to single-cell RNA sequencing (scRNA-seq) and then classified as malignant, stromal or immune cells.
The distinction of malignant from non-malignant cells is a key step in the follow-up analysis of scRNA-seq tumor datasets. The basic idea for solving this problem relies on estimating the common copy number alterations that characterize aneuploid cells. The copy number profiles are obtained by considering the gene expression profile of each cell as a function of the genomic coordinates.
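Expression-based copy number inference of this kind is commonly implemented by ordering genes by genomic position, subtracting the mean expression of reference cells, and smoothing with a moving window within each chromosome. The sketch below illustrates that generic idea; the window size, the integer chromosome codes and the function name are illustrative assumptions, and it is not SCEVAN's implementation.

```python
import numpy as np

def smoothed_cnv_profile(expr, chrom, pos, reference_mean, window=101):
    """Moving-average CNV-like profile per cell, computed chromosome by chromosome.

    expr: cells x genes log-expression; chrom, pos: per-gene integer chromosome code and
    start coordinate; reference_mean: per-gene mean log-expression of reference cells.
    Returns the smoothed profiles (genes in genomic order) and the gene ordering used.
    """
    order = np.lexsort((pos, chrom))              # sort genes by chromosome, then position
    rel = expr[:, order] - reference_mean[order]  # expression relative to reference cells
    chrom_sorted = chrom[order]
    out = np.zeros_like(rel, dtype=float)
    for c in np.unique(chrom_sorted):
        idx = np.flatnonzero(chrom_sorted == c)
        w = min(window, idx.size)                 # shrink the window on short chromosomes
        kernel = np.ones(w) / w
        for i in range(rel.shape[0]):             # smooth each cell within this chromosome
            out[i, idx] = np.convolve(rel[i, idx], kernel, mode="same")
    return out, order

# Usage: two cells, 300 genes on one chromosome; cell 0 carries a gain over genes 100-199.
expr = np.zeros((2, 300)); expr[0, 100:200] = 1.0
chrom = np.ones(300, dtype=int); pos = np.arange(300)
profile, order = smoothed_cnv_profile(expr, chrom, pos, np.zeros(300), window=21)
```

Smoothing within chromosomes prevents signal from one chromosome bleeding into the next.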
The main drawback of this approach is that the clusters of reference non-malignant cells must be identified manually. Recent work that tries to overcome this problem is severely affected by the misidentification of normal cells and, like other methods, was not designed to perform a completely automatic identification of the clones, reporting their breakpoints, their specific and shared alterations, and a complete clonal deconvolution.
We have developed Single CEll Variational ANeuploidy analysis (SCEVAN). It uses a multichannel segmentation algorithm that exploits the assumption that all the cells in a given copy number clone share the same breakpoints. Thus, the smoothed expression profile of every individual cell contributes evidence to the copy number profile of its subclone. SCEVAN exploits a set of stromal and immune signatures, together with the fact that malignant cells often harbor aneuploid copy number events, to automatically discriminate between transformed cells and microenvironment cells. SCEVAN then performs a complete downstream analysis to automatically identify tumor subclones, classifying their specific and shared alterations and reconstructing a clone phylogeny.
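The shared-breakpoint assumption can be illustrated with multichannel binary segmentation, in which a single set of breakpoints is chosen to minimise the within-segment squared error summed over all cells. This is a simpler, swapped-in technique used only to illustrate the idea; the segmentation algorithm actually used by SCEVAN is not reproduced here. The `penalty` and `min_size` parameters are hypothetical tuning knobs.

```python
import numpy as np

def segment_cost(cs, cs2, a, b):
    """Squared error of genes [a, b) around their per-cell mean, summed over cells."""
    n = b - a
    s = cs[:, b] - cs[:, a]
    s2 = cs2[:, b] - cs2[:, a]
    return float(np.sum(s2 - s * s / n))

def multichannel_binseg(X, penalty, min_size=5):
    """Breakpoints shared by all rows (cells) of X (cells x genes in genomic order)."""
    X = np.asarray(X, dtype=float)
    zeros = np.zeros((X.shape[0], 1))
    cs = np.hstack([zeros, np.cumsum(X, axis=1)])        # prefix sums for O(1) segment costs
    cs2 = np.hstack([zeros, np.cumsum(X * X, axis=1)])
    breakpoints, stack = [], [(0, X.shape[1])]
    while stack:
        a, b = stack.pop()
        if b - a < 2 * min_size:
            continue
        full = segment_cost(cs, cs2, a, b)
        best_gain, best_k = 0.0, None
        for k in range(a + min_size, b - min_size + 1):  # both halves keep >= min_size genes
            gain = full - segment_cost(cs, cs2, a, k) - segment_cost(cs, cs2, k, b)
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is not None and best_gain > penalty:   # split only if the joint gain is large
            breakpoints.append(best_k)
            stack.extend([(a, best_k), (best_k, b)])
    return sorted(breakpoints)

# Usage: 10 noisy cells sharing a gain over genes 40-69.
rng = np.random.default_rng(3)
X = rng.normal(0.0, 0.3, (10, 100)); X[:, 40:70] += 1.0
bps = multichannel_binseg(X, penalty=5.0)
```

Because the gain is summed across cells, a breakpoint that is weak in any single noisy profile can still be detected when the whole clone supports it.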
We applied SCEVAN to several datasets encompassing 106 samples and 93,322 cells from different tumor types and technologies, on which SCEVAN was faster and more accurate than state-of-the-art methods. Clonal deconvolution extracted from scRNA-seq can also be used to study tumor evolution: in glioma, for example, it allowed us to confirm that the heterogeneity of glioma subtypes is driven by the clonal architecture and to identify novel drivers of cellular states such as the Proliferative/Progenitor (PPR) subtype.
SCEVAN is available open source as an R package at https://github.com/AntonioDeFalco/SCEVAN.
Synthetic Biology aims at engineering biological systems with new functionalities, with applications ranging from health treatments to bioremediation, production of biofuels and drugs in bioreactors. This is made possible by embedding artificial genetic circuits into living cells, such as bacteria, yeast, and fungi, modifying their natural behavior; that is, by synthetically modifying when and how much genes are expressed to produce proteins or other chemicals of interest.
In this talk we will briefly present the work that we have done at the University of Naples in the context of the project COSY-BIO funded by the European Union, which finished last year, and some of the ongoing research.
Our work has focused on exploiting the so-called “genetic toggle-switch”, a fundamental component in Synthetic Biology that plays a key role in cell differentiation and decision making. Its importance comes from its ability to endow host cells with memory of previous stimuli, allowing them to change their behavior completely in response. Specifically, we show how, thanks to its characteristics, the genetic toggle-switch can be used either to regulate the expression of two proteins of interest to some intermediate level [1-2] or as a reversible memory mechanism allowing cells to differentiate and balance labor in multicellular applications [3-4]. Moreover, we present some recent results on the control of the ratio and growth rate of cell populations for biomedical and industrial applications [5-7].
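The memory behaviour of the genetic toggle-switch can be illustrated with the classic two-repressor model of Gardner, Cantor and Collins (2000), in which each protein represses the synthesis of the other. With the illustrative parameters below (α1 = α2 = 10, Hill exponents of 2), the system is bistable: the state it settles into depends on where it starts, which is the basis of its memory. The inducer inputs and control laws used in the cited works are not included; function names and parameter values are only a sketch.

```python
def toggle_rhs(u, v, a1=10.0, a2=10.0, beta=2.0, gamma=2.0):
    """Dimensionless toggle switch: u and v are the two mutually repressing proteins."""
    du = a1 / (1.0 + v ** beta) - u
    dv = a2 / (1.0 + u ** gamma) - v
    return du, dv

def simulate_toggle(u0, v0, t_end=50.0, dt=0.01, **params):
    """Forward-Euler integration; returns the state (u, v) at t_end."""
    u, v = u0, v0
    for _ in range(int(t_end / dt)):
        du, dv = toggle_rhs(u, v, **params)
        u, v = u + dt * du, v + dt * dv
    return u, v

# Usage: two different starting points settle into two different stable states.
state_u_high = simulate_toggle(5.0, 0.0)   # remembers "u ON, v OFF"
state_v_high = simulate_toggle(0.0, 5.0)   # remembers "v ON, u OFF"
```

Switching between the two states requires a transient input large enough to push the system across the separatrix; once removed, the new state persists.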
[1] D. Fiore, A. Guarino, M. di Bernardo – “Analysis and control of genetic toggle switches subject to periodic multi-input stimulation”, IEEE Control Systems Letters (2018)
[2] A. Guarino, D. Fiore, D. Salzano, M. di Bernardo – "Balancing cell populations endowed with a synthetic toggle switch via adaptive pulsatile feedback control", ACS Synthetic Biology (2020)
[3] D. Fiore, D. Salzano, E. Cristòbal-Cóppulo, J.M. Olm, M. di Bernardo – "Multicellular feedback control of a genetic toggle-switch in microbial consortia", IEEE Control Systems Letters (2020)
[4] D. Salzano, D. Fiore, M. di Bernardo – “Ratiometric control for differentiation of cell populations endowed with synthetic toggle switches”, Proc. of the 58th IEEE Conference on Decision and Control (2019)
[5] D. Fiore, F. Della Rossa, A. Guarino, M. di Bernardo – "Feedback ratiometric control of two microbial populations in a single chemostat", IEEE Control Systems Letters (2021)
[6] V. Fusco, D. Salzano, D. Fiore, M. di Bernardo – "Embedded control of cell growth using tunable genetic systems", International Journal of Robust and Nonlinear Control (2022)
[7] G. Perrino, S. Napolitano, F. Galdi, A. La Regina, D. Fiore, T. Giuliano, M. di Bernardo, D. di Bernardo – "Automatic synchronisation of the cell cycle in budding yeast through closed-loop feedback control", Nature Communications (2021)
The list of known exoplanets is rapidly growing (almost 5000 at the moment), but we know only a handful of rocky planets in their circumstellar habitable zone. We propose a revision of the criteria for defining the habitable zone, based on updated knowledge of the role and diffusion of biometals, and present updated results on the statistics of Earth-like planets in the Galaxy.
Genomic surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the only approach to rapidly monitor and tackle emerging variants of concern (VOC) of the COVID-19 pandemic. Such scrutiny is crucial to limit the spread of VOC that might escape the immune protection conferred by vaccination strategies. It is also becoming clear that efficient genomic surveillance would require monitoring of host gene expression to identify prognostic biomarkers of treatment efficacy and disease progression. Here we applied an integrated workflow to RNA extracted from nasal swabs to obtain, in parallel, the full genome of SARS-CoV-2 and the transcriptome of the host respiratory epithelium, in samples which together represent the majority of the processed Italian genomic samples. We have developed and applied novel proof-of-principle approaches to prioritize possible gain-of-function mutations by leveraging patient metadata, and we isolated patient-specific signatures of SARS-CoV-2 infection. All of these goals were achieved in a cost-effective manner that does not require automation, in an effort to enable any lab with a benchtop sequencer and a limited budget to perform integrated genomic surveillance on premises.
Our approach extends the scope of SARS-CoV-2 genomic surveillance, as it allows the examination of in vivo samples characterized by a predominance of degraded RNA molecules. This capability overcomes the limitations of in vitro and single-cell studies, namely model-specific variation and small sample numbers, respectively. Gene expression data from COVID-19 patients might play a pivotal role as a bridge between genomic data and translational medicine. On the one hand, a gene signature that describes and defines the patient status after SARS-CoV-2 infection might support the surveillance of new variants and help assess their pathogenic effect on the host. On the other hand, it might be used to evaluate the efficacy of new treatments, especially non-vaccine-based ones.
Abstract:
Research activities carried out at the Dept. of Agricultural Sciences involve: i) the development of process-based mechanistic models for the quantitative analysis of biological systems using several approaches, such as Ordinary and Partial Differential Equations (ODE and PDE) and Individual-Based Models (IBM). In this context, we work on integrating different approaches to simulate temporal and spatial dynamics at different scales; ii) the use of System Dynamics models (ODE) to simulate the growth of microbial cultures, mainly driven by metabolic fluxes. PDE models have been used to simulate the emergence of vegetation patterns from plant-soil interactions and, with a similar approach, the differentiation of vascular tissues in plants; iii) the implementation of hybrid models integrating continuous approaches (ODE, PDE) within an IBM framework. Such models have been proposed and applied at the ecological scale to simulate the formation of vegetation patterns and at the tissue/organ scale to simulate xylogenesis and wound closure in plants. The hybrid modeling approach has the major advantage of representing complex systems as sets of modules, each implemented with the mathematical approach most appropriate for the subsystem under consideration; iv) the implementation of pest models in a geospatial framework, including cyberinfrastructures, to enhance model development and exploitation. There is a strong connection with data management, since geospatial data cubes of climate, environmental and pest parameters support the deployment and geospatial use of the models.
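As a minimal illustration of an ODE (System Dynamics) growth model, the sketch below simulates a batch microbial culture with Monod kinetics: the specific growth rate depends on the substrate concentration, and biomass is produced from substrate with a fixed yield. Monod kinetics, the parameter values, the forward-Euler scheme and the function name are assumptions for illustration only; the models developed in the Department are driven by metabolic fluxes and are more detailed.

```python
def monod_batch(x0, s0, mu_max=0.5, ks=0.5, yield_xs=0.5, t_end=48.0, dt=0.01):
    """Forward-Euler integration of a batch culture with Monod kinetics.

    x: biomass (g/L), s: substrate (g/L), mu_max in 1/h, ks in g/L,
    yield_xs: g of biomass produced per g of substrate consumed; times in h.
    """
    x, s = x0, s0
    for _ in range(int(round(t_end / dt))):
        mu = mu_max * s / (ks + s)    # specific growth rate (1/h)
        growth = mu * x               # dX/dt
        x += dt * growth
        s -= dt * growth / yield_xs   # dS/dt = -(1/Y) dX/dt
    return x, s


if __name__ == "__main__":
    x_end, s_end = monod_batch(x0=0.1, s0=10.0)
    # Growth stops once the substrate is exhausted; X + Y*S is conserved.
    print(x_end, s_end)
```

With these assumed values the culture grows until the substrate is depleted, and the final biomass approaches x0 + yield_xs * s0, since the model conserves X + Y·S.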
Background | Neuroblastoma is a paediatric tumour of the peripheral sympathetic nervous system arising from neural-crest cells. It is the second most common childhood solid cancer, and its cure remains a challenge. In recent years, next-generation sequencing of neuroblastoma has documented low somatic mutation rates and few recurrently mutated genes, which limits the search for therapeutic targets. Furthermore, as most studies on neuroblastoma relied mainly on whole-exome sequencing, the role of somatic mutations in noncoding regulatory regions remains underestimated. Moreover, despite growing interest in noncoding cis-regulatory variants as cancer drivers, their study is currently hampered by the numerous challenges and limitations of variant prioritization and interpretation methods and tools.
Aims | We hypothesized that mutated active regulatory elements could de-regulate genes involved in the tumorigenesis of neuroblastoma.
Methods | To overcome the limitations of noncoding driver analysis, we focused on active cis-regulatory elements (aCREs) to design a customized panel for deep sequencing of 56 neuroblastoma tumor and normal DNA sample pairs. We defined aCREs by reanalyzing H3K27ac ChIP-seq peaks from 25 neuroblastoma cell lines; H3K27ac peaks common to these cell lines were the target regions in which we searched for driver mutations. We tested these regulatory regions for an excess of somatic mutations and assessed statistical significance with a global approach accounting for chromatin accessibility and replication timing. Additional validation was provided by analyzing whole-genome sequences of 151 neuroblastomas. Hi-C data analysis was used to identify candidate target genes interacting with the mutated regions. We also used the k-means clustering algorithm to divide transcriptomic data of 498 neuroblastoma samples into two groups based on the expression levels of genes that, according to the Hi-C results, significantly interacted with mutated aCREs. Moreover, we conducted a motif analysis to assess whether the somatic variants within the selected aCREs disrupted or created transcription factor binding motifs.
Results | We identified a significant excess of somatic mutations in aCREs of diverse genes, including IPO7, HAND2 and ARID3A, and used luciferase reporter assays and CRISPR-Cas9 editing to assess the functional consequences of the mutated IPO7 aCRE on candidate target genes (IPO7, TMEM41B, DENND5A) (P < 1.0 × 10⁻³). Patients with noncoding mutations in aCREs showed inferior overall and event-free survival (P < 2.0 × 10⁻³). By multivariable analysis, we confirmed that the effect of the noncoding mutational burden was independent of age at diagnosis, tumor stage, risk group and MYCN status (P < 2.0 × 10⁻²). We also found that the expression profiles of many aCRE target genes (tested singly and in combination) were associated with markers of unfavorable prognosis and low survival rates (P < 5.0 × 10⁻²). Furthermore, the motif analysis identified transcription factors with altered binding motifs. Overall, the biological functions of aCRE target genes and of transcription factors with mutated binding motifs converged on processes related to embryonic development and immune system response (P < 5.0 × 10⁻²). This suggests that the combined effect of noncoding cancer driver mutations is to alter gene sets involved in specific molecular mechanisms underlying neuroblastoma tumorigenesis.
Conclusion | We integrated epigenomic, genomic and transcriptomic data of neuroblastoma to set up an alternative approach for detecting and studying regulatory cancer driver mutations. Our strategy enabled us to identify mutated regulatory regions that may play an important role in regulating biological processes associated with tumor development and immune escape.
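The two-group clustering step described in the Methods can be sketched as a plain k-means (k = 2) on per-sample expression vectors, here with a deterministic initialization (the first sample and the sample farthest from it). The function name, the initialization and the use of unscaled values are assumptions for illustration; in practice expression values are usually normalized first and a library implementation would be used.

```python
def kmeans_two_groups(samples, max_iter=100):
    """Split expression vectors into two clusters with Lloyd's k-means (k = 2)."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic initialisation: first sample and the sample farthest from it.
    c0 = list(samples[0])
    c1 = list(max(samples, key=lambda s: sq_dist(s, c0)))
    centroids = [c0, c1]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        new_labels = [0 if sq_dist(s, centroids[0]) <= sq_dist(s, centroids[1]) else 1
                      for s in samples]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in (0, 1):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Rows: samples; columns: expression of (hypothetical) aCRE target genes.
    expr = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
    print(kmeans_two_groups(expr)[0])
```

The resulting group labels are what would then be compared with clinical variables such as survival.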
Genomic Medicine (GM) is an interdisciplinary medical specialty whose goal is to apply genomic information in clinical practice and research. Genome data, such as those generated by microarrays (MA) and next-generation sequencing (NGS), are natively digital and thus well suited to computational analysis and sharing. Nonetheless, their volume and complexity require ad hoc computational approaches and bioinformatics infrastructures. Commercial and academic organizations have developed several tools for common diagnostic purposes; however, more complex analyses are still poorly covered.
Since our activity is dedicated to patients with rare diseases who manifest complex phenotypes and undergo multiple genomic assays, we have to assemble custom analytical pipelines.
We developed genePryor, a prioritization tool for NGS variants. genePryor aims to quickly identify known pathogenic variants and to highlight potential new disease-causing genes and variants. It integrates information from multiple public and user-provided sources, performs inheritance analysis on multiple pedigrees of variable complexity, and integrates results from MA and NGS. genePryor has been used to analyze data from the Telethon Undiagnosed Disease Program (TUDP, about 1800 WES), contributing to the program's high diagnostic yield, with about 50% conclusive diagnoses and the identification of novel genetic diseases. Planned improvements include integrating tools to automatically match patient and gene phenotypes, considering non-Mendelian patterns of inheritance (such as digenic inheritance), and improving the identification of variants with potential regulatory effects.
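The inheritance analysis mentioned above can be illustrated with a minimal autosomal trio filter. Genotypes are encoded as alternate-allele counts (0, 1, 2); the sketch flags de novo heterozygous and homozygous recessive variants, and genes carrying compound heterozygous variants (one inherited from each parent). The function names, the encoding and the gene labels in the example are hypothetical and do not reproduce genePryor's implementation, which also handles larger pedigrees and additional evidence sources.

```python
def trio_modes(child, father, mother):
    """Inheritance modes compatible with one autosomal variant in a trio.

    Genotypes are alternate-allele counts: 0 = hom ref, 1 = het, 2 = hom alt.
    """
    modes = []
    if child == 1 and father == 0 and mother == 0:
        modes.append("de_novo")
    if child == 2 and father == 1 and mother == 1:
        modes.append("recessive_hom")
    return modes


def compound_het_genes(variants):
    """Genes with at least one paternal and one maternal heterozygous variant.

    variants: iterable of (gene, child_gt, father_gt, mother_gt) tuples.
    """
    origins = {}
    for gene, child, father, mother in variants:
        if child != 1:
            continue  # compound heterozygosity needs het calls in the child
        if father == 1 and mother == 0:
            origins.setdefault(gene, set()).add("paternal")
        elif mother == 1 and father == 0:
            origins.setdefault(gene, set()).add("maternal")
    return sorted(g for g, o in origins.items() if o == {"paternal", "maternal"})


if __name__ == "__main__":
    calls = [("GENE_A", 1, 1, 0), ("GENE_A", 1, 0, 1), ("GENE_B", 1, 1, 0)]
    print(trio_modes(1, 0, 0), compound_het_genes(calls))
```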
To study copy number variants (CNVs) with a potential positional effect, we tested the hypothesis that a CNV may affect gene expression and determine a clinical phenotype by altering the genomic region between a disease gene and its enhancers. We studied 1900 CNVs from the cytogenetic units of Federico II and identified 27 CNVs located in gene-enhancer intervals. After manual curation, we found a consistent match between gene and patient phenotypes for a deletion in the Sonic Hedgehog (SHH) locus carried by a patient with a complex phenotype. For this CNV, the Strings-and-Binders model supported a slight reduction of gene-enhancer interactions, consistent with a potential positional effect of the variant. We plan to repeat the analysis on data from public repositories to verify the generalizability of our hypothesis and to refine the workflow.
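A gene-enhancer interval check of the kind described above can be sketched as follows: the interval is the genomic span lying strictly between a gene and its enhancer, and a CNV is flagged when it overlaps that span without overlapping the gene itself. The coordinate convention (0-based, half-open), the exact criterion, the chromosome names and the function names are assumptions for illustration, not the workflow actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def gene_enhancer_interval(gene, enhancer):
    """Span strictly between a gene and its enhancer (None if they touch or overlap)."""
    if gene.chrom != enhancer.chrom:
        raise ValueError("gene and enhancer must be on the same chromosome")
    if enhancer.end < gene.start:
        return Region(gene.chrom, enhancer.end, gene.start)
    if gene.end < enhancer.start:
        return Region(gene.chrom, gene.end, enhancer.start)
    return None


def in_gene_enhancer_interval(cnv, gene, enhancer):
    """True if the CNV hits the intervening region but leaves the gene body intact."""
    interval = gene_enhancer_interval(gene, enhancer)
    return interval is not None and overlaps(cnv, interval) and not overlaps(cnv, gene)


if __name__ == "__main__":
    gene = Region("chrA", 1_000, 2_000)          # hypothetical coordinates
    enhancer = Region("chrA", 10_000, 10_500)
    print(in_gene_enhancer_interval(Region("chrA", 4_000, 6_000), gene, enhancer))
```

Candidates passing such a filter would still need manual curation and, as above, modelling of the 3D contacts to support a positional effect.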
Peptide Nucleic Acids (PNAs), introduced by Nielsen et al. in 1991 [1], are synthetic DNA/RNA analogues and represent a promising tool for gene modulation in anticancer treatment. In the PNA structure, repetitive N-(2-aminoethyl)glycine units replace the sugar-phosphate backbone of DNA, and the nucleobases are covalently connected to the polyamide chain via carboxymethyl spacers. Thanks to their uncharged peptidyl backbone, PNAs can form hybrid complexes with complementary DNA or RNA strands, and they are resistant to chemical and enzymatic degradation [2-3]. Although the mechanistic aspects of PNA-DNA triplex formation are known, detailed structural information on PNA-DNA heterotriplexes is still missing. Here, conventional and accelerated Molecular Dynamics (cMD and aMD, respectively) simulations helped us elucidate the atomistic structural organisation of two differently protonated PNAs embedded in DNA/PNA triplexes. In particular, aMD improved the sampling of the conformational space by lowering the energy barriers that separate different states of a system, revealing atomistic details of the conformational changes of the two triplex systems. Our findings are in agreement with experimental data and lay the foundation for the further development of novel PNAs in anticancer therapy.
[1] Nielsen, P. E.; Egholm, M.; Berg, R. H.; Buchardt, O. Sequence-selective recognition of DNA by strand displacement with a thymine-substituted polyamide. Science 1991, 254, 1497–1500.
[2] Verona, M. D. et al. Focus on PNA Flexibility and RNA Binding using Molecular Dynamics and Metadynamics. Sci. Rep. 7, 42799.
[3] Zarrilli, F.; Amato, F.; Morgillo, C.M.; Pinto, B.; Santarpia, G.; Borbone, N.; D’Errico, S.; Catalanotti, B.; Piccialli, G.; Castaldo, G.; Oliviero, G. Peptide Nucleic Acids as miRNA Target Protectors for the Treatment of Cystic Fibrosis. Molecules 2017, 22, 1144.
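For reference, the boost potential of accelerated MD in its commonly used formulation (Hamelberg, Mongan and McCammon, 2004) adds ΔV(r) = (E − V(r))² / (α + E − V(r)) whenever the potential V(r) is below a threshold energy E, and leaves V unchanged otherwise; canonical averages are recovered by reweighting each frame with exp(ΔV/kT). The sketch below implements these two expressions; the E and α values used in this work are not reported here, so the numbers in the example are arbitrary.

```python
import math


def amd_boost(v, e_threshold, alpha):
    """Boost energy dV added to potential v (v, E and alpha in the same units)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if v >= e_threshold:
        return 0.0  # above the threshold the potential is left unmodified
    diff = e_threshold - v
    return diff * diff / (alpha + diff)


def reweight_factor(boost, kt):
    """Weight exp(dV / kT) used to recover canonical averages from aMD frames."""
    return math.exp(boost / kt)


if __name__ == "__main__":
    v, e_thr, alpha = -10.0, 0.0, 5.0  # arbitrary example values
    dv = amd_boost(v, e_thr, alpha)
    print(dv, v + dv)  # boosted potential is raised but stays below E
```

Smaller α values flatten the landscape more strongly below E, which speeds up barrier crossing at the cost of noisier reweighting.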
In recent years, mathematical models have provided fundamental support to cancer research. By accounting for various biological and physical processes, these models can reveal insights into the dynamics of tumour growth and invasiveness, thereby supporting the development of pharmacological strategies to control tumour proliferation and invasion. These tools are becoming increasingly widespread in the clinical field, in some cases predicting with very high precision both the disease course of an individual patient and the patient's response to therapy.
Cellular automata are simplified, discrete mechanistic models in which a cell population evolves autonomously in time according to pre-defined rules that capture elementary biological processes, such as cell proliferation, cell motility and cell-cell interactions. In our work, we propose a new in silico model able to mimic some of the peculiar characteristics of tumour cells, such as fast proliferation, high cell motility, impaired cell adhesion and elevated sensitivity to chemotactic stimuli. The goal of this research, run in collaboration with the Houston Methodist Academic Institute, is to investigate tumour growth and invasiveness and their response to specific pharmacological treatments. In particular, tumour cell motility and invasiveness are investigated by mimicking the evolution of cell tissues in 2D and 3D models. Numerical predictions are validated by direct comparison with in vitro experimental data: 2D cell monolayers (wound-healing assays) and 3D spheroids in extracellular matrix scaffolds were monitored by time-lapse microscopy to quantify their dynamic evolution under conditions mimicking those in vivo.
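A minimal 2D lattice version of such a cellular automaton is sketched below: each occupied site may divide into, or migrate to, a randomly chosen empty von Neumann neighbour, and a cell with no free neighbour stays put (a crude contact inhibition). The rules, the periodic boundaries, the probabilities p_div and p_move, and the function names are assumptions for illustration; the model described above also includes adhesion and chemotaxis, which are omitted here.

```python
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # von Neumann neighbourhood


def ca_step(grid, p_div, p_move, rng):
    """Advance an n x n periodic lattice (1 = tumour cell, 0 = empty) by one step."""
    n = len(grid)
    occupied = [(i, j) for i in range(n) for j in range(n) if grid[i][j]]
    rng.shuffle(occupied)  # random update order avoids directional bias
    for i, j in occupied:
        if not grid[i][j]:
            continue
        empty = [((i + di) % n, (j + dj) % n) for di, dj in NEIGHBOURS
                 if not grid[(i + di) % n][(j + dj) % n]]
        if not empty:
            continue  # no free neighbour: contact inhibition
        a, b = rng.choice(empty)
        r = rng.random()
        if r < p_div:
            grid[a][b] = 1  # proliferation: daughter cell in the free site
        elif r < p_div + p_move:
            grid[a][b] = 1  # migration: the cell moves to the free site
            grid[i][j] = 0
    return grid


def count_cells(grid):
    return sum(map(sum, grid))


if __name__ == "__main__":
    rng = random.Random(42)
    grid = [[0] * 50 for _ in range(50)]
    grid[25][25] = 1  # single seed cell
    for _ in range(40):
        ca_step(grid, p_div=0.3, p_move=0.4, rng=rng)
    print(count_cells(grid))
```

Raising p_move relative to p_div produces a more dispersed, invasive colony, which is the kind of qualitative behaviour compared against wound-healing and spheroid experiments.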
Aberrant expression of transcriptional regulators can affect oncogenic gene expression programs in cancer. Mutations in the tumor suppressor TP53 (p53) are commonly found in UV-damaged skin and are thought to protect damaged epidermal cells from senescence and/or oncogene-induced apoptosis, favoring cancer formation. In contrast, the other p53 family members, TP63 (p63) and TP73 (p73), are rarely mutated in cancer, and TP63 is often amplified or overexpressed in squamous cell carcinoma (SCC). Here, we demonstrate that both p63 and p73 are required for cell proliferation in skin SCC and are overexpressed in preneoplastic lesions and in skin SCCs. p63/p73 form heterotetramers and co-occupy thousands of regulatory regions, jointly controlling a transcriptional program that promotes cell proliferation and tumorigenesis. Combining gene targeting with transcriptomic and epigenetic analyses revealed that p63 and p73 control a transcriptional feed-forward circuit that sustains cell proliferation. We find that in skin SCC a key signaling pathway downstream of p63/p73 is the Epidermal Growth Factor Receptor (EGFR)/MAP kinase pathway. p63/p73 directly and positively control transcription of the EGFR ligands, among which amphiregulin (AREG) is the most highly expressed. p63, p73 and AREG are required to maintain the proliferative potential and anchorage-independent growth of skin SCC cells and to promote tumorigenesis. Thus, p63 and p73 act as oncogenic drivers in skin SCC, and AREG is a crucial non-cell-autonomous effector downstream of p63 and p73 in skin SCC formation.
CDC25 phosphatases (CDC25s) are members of the dual-specificity phosphatase (DSP) family and play a critical role in the regulation of the cell cycle. The overexpression of CDC25s in many human cancers supports their clinical significance and has encouraged the pursuit of specific small-molecule inhibitors. Unfortunately, no CDC25 inhibitors with clinical utility are currently available. In recent years, our research group has been actively involved in this field, discovering new drug-like CDC25-targeting molecules with marked antiproliferative effects on cancer cells. Starting from the initial identification of new lead compounds by structure-based virtual screening [1], we embarked on a medicinal chemistry optimization program involving multidisciplinary approaches, in particular computational techniques, which eventually led to the discovery of novel chemotypes that potently inhibit melanoma cell proliferation by triggering apoptosis [2,3]. Targeting CDC25s might therefore open up a new avenue for drug intervention in antimelanoma therapy.
References
[1] Lavecchia, A. et al. J. Med. Chem. 2012, 55, 4142-4158.
[2] Capasso, A. et al. Oncotarget 2015, 6, 40202-40222.
[3] Cerchia, C. et al. J. Med. Chem. 2019, 62, 7089-7110.
The "Omics Sciences" have revolutionized modern biology. To date, every scientific field, from medicine to environmental sciences and including biochemistry and pharmacology, resorts to these sciences to study complex biological systems.
Among these fields, proteomics aims to study the entire set of proteins of a tissue or an organism at a specific moment, with the ambitious prospect of correlating this ‘molecular snapshot’ with the observed phenotype.
From the conception of proteomics as a large-scale evolution of protein chemistry and biochemistry, we have gradually come to define a science that has revolutionized our view of the central dogma of biology, highlighting how every metabolic and functional process results from a complex network of nonlinear interactions between genes, transcripts and proteins.
The rapid success of proteomics and all other "Omics Sciences" in the last decade has been made possible by a strong technological push, supported by the development of powerful bioinformatics tools for the qualitative and quantitative analysis of mass spectrometry data and for the functional analysis and correlation of genes, transcripts, proteins or metabolites to reconstruct the appropriate relationship networks.
In the field of proteomics, two main application areas have taken off: functional proteomics, which aims to define the molecular mechanisms underlying biological processes of interest through the identification of in vivo protein-protein interactions (PPIs) [1]; and differential proteomics, which compares protein expression profiles across multiple biological conditions (e.g. wild type vs mutant, or untreated vs pharmacologically treated) in order to define the biological processes affected by a specific treatment or condition. Different methodologies, using both labelled and label-free approaches, have been developed to carry out qualitative and quantitative analyses of the protein content of samples [2].
Both approaches are widely employed to investigate biological processes in physiological and pathological conditions, such as oncological diseases [3] and neurodegenerative disorders [4], as will be discussed in this presentation.
[1] Iacobucci I. et al. J Proteomics. 2021 Jan 6;230:103990.
[2] Cozzolino F. et al. PLoS One. 2020 Sep 4;15(9):e0238037.
[3] Federico A. et al. Biochim Biophys Acta Gene Regul Mech. 2019 Apr;1862(4):509-521.
[4] Cozzolino F. et al. Hum Mol Genet. 2021 Jun 17;30(13):1175-1187.
Computational protein design has achieved many successes in recent years [1,2]; however, de novo proteins with a tetrathiolate mononuclear metal site have never been characterized in both structure and function. Here, selecting first- and second-sphere residues around the iron center that purposely induce a chosen redox potential is still a difficult task [3]. Apart from repurposed natural scaffolds and small cyclic peptides [4], de novo proteins featuring tetrathiolate metal clusters had never been structurally characterized. We present, for the first time, the structural and functional features of a fully designed FeS4 protein and its cognate Zn adduct, namely METPsc. Although inspired by natural rubredoxins, this miniaturized protein bears no sequence similarity to its known congeners, as assessed by BLASTP. Strikingly, the 28-residue sequence of METPsc stores all the information required to fold around the metal in a tetrahedral geometry and to function as an electron-transfer protein, as confirmed by crystallography, UV-Vis and EPR spectroscopy, and cyclic voltammetry. Finally, we exploited its terminal electron acceptor properties in an artificial electron-transfer chain triggered by visible light. Its applicability in optoelectronics and light-harvesting biodevices is being explored.
DNA methylation is one of the most studied epigenetic modifications, with an established role in regulating gene expression and genome stability. It consists of the enzyme-mediated addition of a methyl group to DNA bases. Acting in concert with other epigenetic marks, DNA methylation shapes the fate and engraves the identity of a cell. Its dysregulation has been linked to pathological conditions, both as an epiphenomenon and as a driver event.
The methylation status of a cytosine residue is usually represented as the proportion of molecules in which the residue is methylated (average methylation), and differential analyses aim to find residues whose methylation status shifts between conditions. Alternatively, the methylation status of a locus can be explored in terms of epialleles, i.e. the possible arrangements of methylated and unmethylated cytosines in individual DNA molecules. Epiallele profiling (assessing the frequency of each possible epiallele) makes it possible to dissect the methylation heterogeneity of a locus.
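The difference between average methylation and epiallele profiles can be shown with a few synthetic reads. In the sketch below, each read is a string of '1' (methylated) and '0' (unmethylated) calls over the same CpG positions; the read encoding, the function names and the use of Shannon entropy as a heterogeneity summary are assumptions for illustration, not the measures implemented in our tools.

```python
import math
from collections import Counter


def average_methylation(reads):
    """Fraction of reads methylated ('1') at each CpG position."""
    n_pos = len(reads[0])
    return [sum(read[k] == "1" for read in reads) / len(reads) for k in range(n_pos)]


def epiallele_profile(reads):
    """Relative frequency of each observed epiallele (full methylation pattern)."""
    counts = Counter(reads)
    return {pattern: c / len(reads) for pattern, c in counts.items()}


def shannon_entropy(profile):
    """Entropy (bits) of an epiallele profile: 0 when a single epiallele is present."""
    return -sum(p * math.log2(p) for p in profile.values() if p > 0)


if __name__ == "__main__":
    sample_a = ["11", "11", "00", "00"]  # fully methylated or fully unmethylated molecules
    sample_b = ["10", "01", "10", "01"]  # every molecule half methylated
    print(average_methylation(sample_a), average_methylation(sample_b))
    print(epiallele_profile(sample_a), epiallele_profile(sample_b))
```

The two samples have identical average methylation (0.5 at both positions) but entirely different epiallele compositions, a distinction that only the epiallele profile captures.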
In recent years, our group has developed bioinformatic tools and computational methods to analyze epiallele profiles in ultra-deep (UD) amplicon bisulfite sequencing data, a targeted sequencing assay in which one or a few loci are sequenced at high depth, enabling a robust estimate of epiallele frequencies [1,2,3,4]. In this way, we gained insights into DNA methylation dynamics, showing that 1) DNA methylation is highly heterogeneous among cells; 2) DNA methylation is mostly a non-stochastic phenomenon, with epiallele profiles being stable across different individuals; 3) according to mathematical models, the observed heterogeneity is compatible with a dynamic equilibrium between DNA methylation and demethylation; 4) epiallele profiles can be a cell-specific signature. Studying epiallele profiles can help track the spatiotemporal evolution of cell-to-cell methylation differences in a cell population.
In the last two years, our group has focused on applying epiallele profile analysis to genome-wide data. To this aim, in collaboration with the group of Dr. Giovanni Scala, we developed a bioinformatic tool, EpistatProfiler. We are currently applying this approach to track the dynamics of epiallele profiles upon neuronal differentiation. We are also investigating how these dynamics can be disrupted in epigenetically dysregulated contexts, such as knock-out of the enzymatic machinery and cancer.
Abstract:
We present some of the research lines carried out at the Dept. of Agricultural Sciences aimed at developing databases for multiple purposes: i) management of geospatial data, in both vector and raster data models, using widespread (PostgreSQL+PostGIS, GeoServer) and cutting-edge (rasdaman) technologies. Non-geospatial data can be stored according to reference standards. Common data include soil, vegetation, pest, climate, environment, hydrology and so forth. The Mascabruno building hosts a data center with ~200 TB of HDD storage and an 8-core rasdaman enterprise license; ii) under the pressure of feeding an ever-growing population and meeting new environmental challenges, future economies and societies will depend on sustainable crop production and protection. Research on natural plant stimulants, hormones and other nutrients aims to improve their efficiency and to understand the physiological and molecular mechanisms behind these increasingly popular crop supports. We are involved in the development of the Sustainable Crop Production Atlas (SCPA) framework to comprehensively annotate and disseminate knowledge on new approaches to sustainable crop production; iii) PRGdb is a web-accessible, open-source database and the first repository providing a comprehensive overview of pathogen receptor genes (PRGs) in plants. The database collects information on isolated and predicted PRGs, together with tools that facilitate their analysis. The latest version (PRGdb 4.0) includes DRAGO 3, a robust PRG prediction tool based on HMM and BLAST searches. Furthermore, the inferred cross-links between genomic and phenotypic information give access to a large body of information that can help answer several biological questions.
In addition, the Department contributes to the implementation of Omics and Metaomics resources in Plant Genomics, Health, Food Sciences and Nutrigenomics, and in Marine Biology, contributing to EU projects, to European infrastructures such as EMBRC, ELIXIR and EMSO, and to European open-science initiatives in the life sciences.
The schematization of DNA structure can be tested with Chern–Simons theory, a topological field theory mostly considered in the context of effective gravity theories. By means of the expectation value of the Wilson loop, derived from this analogue gravity approach, it is possible to find the point-like curvature of genomic strings in the human KRAS gene and in SARS-CoV-2 sequences and to correlate this curvature with genetic mutations. The point-like curvature profile, obtained from the Chern–Simons currents, can be used to infer the positions of the given mutations within the genetic string. Generally, mutations occur at the locations of highest Chern–Simons current gradient, and subsequent mutated sequences appear to have a smoother curvature than the initial ones, in agreement with a free-energy minimization argument.
Molecular Dynamics (MD) is a powerful computational technique used to understand the physical basis of the structure and function of biological systems [1]. In this context, coarse-grained (CG) models have been successfully applied to a broad range of biomolecular systems, including the self-assembly of lipids in aqueous solutions. However, many biologically relevant processes occur on timescales that far exceed those of typical MD simulations with CG models. An innovative simulation technique, named hybrid particle-field (hPF) [2,3], makes it possible to study large-scale systems beyond what is feasible with traditional MD and CG models [4]. A special class of CG models, developed for the hPF technique, has been successfully used to investigate several problems in biophysics [5–8]. The first application of CG hPF models, with parameters for phospholipids only, was published by the Milano group in 2011 [5]. Because the hPF approach speeds up the dynamics, the accelerated self-diffusion leads to a fast self-assembly process. As a net effect, the developed CG models can reproduce, via self-assembly, the lamellar and non-lamellar phases of many lipids and surfactants [5–8]. Our aim is to highlight recent applications and provide a comprehensive overview of hPF CG models for biological applications.
(1) Karplus, M.; McCammon, J. A. Molecular Dynamics Simulations of Biomolecules. Nat. Struct. Biol. 2002, 9 (9), 646–652. https://doi.org/10.1038/nsb0902-646.
(2) Milano, G.; Kawakatsu, T. Hybrid Particle-Field Molecular Dynamics Simulations for Dense Polymer Systems. J. Chem. Phys. 2009, 130 (21), 214106. https://doi.org/10.1063/1.3142103.
(3) Milano, G.; Kawakatsu, T. Pressure Calculation in Hybrid Particle-Field Simulations. J. Chem. Phys. 2010, 133 (21), 214102. https://doi.org/10.1063/1.3506776.
(4) Milano, G.; Kawakatsu, T.; De Nicola, A. A Hybrid Particle–Field Molecular Dynamics Approach: A Route toward Efficient Coarse-Grained Models for Biomembranes. Phys. Biol. 2013, 10 (4), 045007. https://doi.org/10.1088/1478-3975/10/4/045007.
(5) De Nicola, A.; Zhao, Y.; Kawakatsu, T.; Roccatano, D.; Milano, G. Hybrid Particle Field Coarse Grained Models for Biological Phospholipids. J. Chem. Theory Comput. 2011, 7 (9), 2947–2962. https://doi.org/10.1021/ct200132n.
(6) De Nicola, A.; Kawakatsu, T.; Rosano, C.; Celino, M.; Rocco, M.; Milano, G. Self-Assembly of Triton X-100 in Water Solutions: A Multiscale Simulation Study Linking Mesoscale to Atomistic Models. J. Chem. Theory Comput. 2015. https://doi.org/10.1021/acs.jctc.5b00485.
(7) De Nicola, A.; Kawakatsu, T.; Milano, G. A Hybrid Particle-Field Coarse-Grained Molecular Model for Pluronics Water Mixtures. Macromol. Chem. Phys. 2013.
(8) De Nicola, A.; Soares, T. A.; Santos, D. E. S.; Bore, S. L.; Sevink, G. J. A.; Cascella, M.; Milano, G. Aggregation of Lipid A Variants: A Hybrid Particle-Field Model. Biochim. Biophys. Acta BBA - Gen. Subj. 2020, 129570. https://doi.org/10.1016/j.bbagen.2020.129570.
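To illustrate the particle-to-mesh idea behind hPF, the sketch below works in 1D with a single particle species: particles are assigned to a periodic grid with cloud-in-cell weights to obtain a local volume fraction φ, a mean-field potential is evaluated on the grid, and forces are interpolated back to the particles with the same weights. Only a compressibility term V = (φ − 1)/κ (in units of kT) is kept; the Flory-Huggins χ interaction terms, 3D grids, the normalization of φ and the function names are simplifications and assumptions of this example, not the published implementation.

```python
import math


def _cic(x, box, n_cells):
    """Cloud-in-cell: left grid index and the weight given to the right neighbour."""
    h = box / n_cells
    s = (x % box) / h - 0.5  # position in cell units, relative to cell centres
    i = math.floor(s)
    return i, s - i


def volume_fraction(positions, box, n_cells, particle_volume):
    """Particle-to-mesh step: local volume fraction phi on a periodic 1D grid."""
    h = box / n_cells
    phi = [0.0] * n_cells
    for x in positions:
        i, w = _cic(x, box, n_cells)
        phi[i % n_cells] += (1.0 - w) * particle_volume / h
        phi[(i + 1) % n_cells] += w * particle_volume / h
    return phi


def hpf_forces(positions, box, n_cells, particle_volume, kappa):
    """Forces (kT per unit length) from the compressibility term V = (phi - 1)/kappa."""
    h = box / n_cells
    phi = volume_fraction(positions, box, n_cells, particle_volume)
    pot = [(p - 1.0) / kappa for p in phi]
    # Grid force F = -dV/dx by periodic central differences.
    f_grid = [-(pot[(k + 1) % n_cells] - pot[(k - 1) % n_cells]) / (2.0 * h)
              for k in range(n_cells)]
    # Mesh-to-particle step: interpolate with the same CIC weights.
    forces = []
    for x in positions:
        i, w = _cic(x, box, n_cells)
        forces.append((1.0 - w) * f_grid[i % n_cells] + w * f_grid[(i + 1) % n_cells])
    return forces


if __name__ == "__main__":
    cluster_and_probe = [5.0] * 10 + [5.75]
    print(hpf_forces(cluster_and_probe, box=10.0, n_cells=20, particle_volume=0.5, kappa=1.0))
```

Both grid passes cost time proportional to the number of particles, and no pair list is needed: a particle next to a dense cluster is pushed away by the density gradient rather than by explicit pair forces.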
Cancer is a genetic disease resulting from the accumulation of genomic alterations in living cells. Large-scale genomic studies have been instrumental in understanding the recurrent somatic genetic alterations within a cell, including chromosome translocations, single base substitutions and copy-number alterations, and in characterizing their functional effects in transformed cells. One of the main challenges in this field is how to exploit all this molecular information to identify therapeutic targets and to develop personalized therapies. Understanding the molecular features that influence drug sensitivity is key to developing personalized therapies, to predicting which patients should be treated and with which drugs, and finally to evaluating eligibility criteria for oncology trials.
Machine learning models are able to exploit multi-modal screening datasets such as Projects such as Genomics of Drug Sensitivity in Cancer (GDSC), Cancer Cell Line Encyclopedia (CCLE), Cancer Therapeutics Response Portal, NCI-60 and others to develop predictive algorithms useful to associate omics features with response. The basic approach is to use the data from these screenings to train a machine learning model that predicts the 50% inhibitory concentration (IC50) of a drug from the multi-omics profile of a cell line or a tissue sample. There have been several attempts at applying this approach using various machine learning frameworks such as Variational Autoencoders, Deep Networks, Convolutional Neural Networks, ensemble Neural Network models and combination of these approaches with different encodings of the features .
Most of these studies use machine learning models as "black boxes" optimized for prediction accuracy, with no way to interpret the biological mechanisms underlying the predicted outcomes.
Recently, some models have been proposed to address this issue, but they have two main limitations. First, many of them rely only on somatic single nucleotide variants of the screened models: pathway activity, measured by gene expression profiling, is not taken into account, nor are other important genomic alterations such as copy number variations (CNVs), which are of particular interest in cancer progression. Second, they do not take into account the imbalanced nature of the data: in all large-scale screening repositories, IC50 values are clustered around the value representing lack of sensitivity (for AUC-based sensitivity measures, this value is 1), with only a small minority of values indicating that a cell line is sensitive to a specific drug.
To address these limitations, we propose MOViDA (Multi-Omics Visible Drug Activity prediction), a neural network model that extends the visible network approach by incorporating functional information, in the form of pathway activity derived from gene expression and copy number data, into the network. Moreover, MOViDA is trained with the dataset imbalance in mind: we use a random sampler based on a multinomial distribution that accounts for the skewness of the data. We compare MOViDA with DrugCell and show that it is more accurate in predicting drug sensitivity, especially in the classes corresponding to lower AUC, which are of greatest interest. To exploit the biological interpretability of network nodes, we also developed an ad hoc network explanation method that scores the pathways affecting the predicted sensitivity of a given cell line to a drug.
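A multinomial sampler that counteracts the skew toward AUC = 1 could look like the following sketch. The binning scheme (equal-width AUC bins, inverse bin-frequency weights) and the function names are assumptions made for illustration; the abstract does not specify how MOViDA computes its sampling probabilities.

```python
import numpy as np

def inverse_frequency_weights(auc, n_bins=10):
    """Per-sample sampling probabilities that up-weight rare AUC ranges.

    Assumes AUC values in [0, 1]. Samples are grouped into equal-width bins and
    each sample gets a weight inversely proportional to the size of its bin.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = np.digitize(auc, edges[1:-1])          # bin indices 0 .. n_bins - 1
    counts = np.bincount(bins, minlength=n_bins)
    weights = 1.0 / counts[bins]                  # only non-empty bins are indexed
    return weights / weights.sum()

def sample_epoch(probs, n_draws, rng):
    """Draw one training epoch: multinomial counts per sample, expanded to shuffled indices."""
    counts = rng.multinomial(n_draws, probs)
    idx = np.repeat(np.arange(len(probs)), counts)
    rng.shuffle(idx)
    return idx

# Synthetic AUC values skewed toward 1 (lack of sensitivity)
rng = np.random.default_rng(42)
auc = 1.0 - rng.beta(1.0, 4.0, size=5000)
probs = inverse_frequency_weights(auc)
idx = sample_epoch(probs, n_draws=5000, rng=rng)
print("fraction with AUC < 0.5, original:", (auc < 0.5).mean())
print("fraction with AUC < 0.5, resampled:", (auc[idx] < 0.5).mean())
```

With inverse bin-frequency weights, every non-empty AUC bin receives the same total sampling mass, so the rare low-AUC (sensitive) range is seen far more often during training than its raw frequency would allow.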
To make these results useful for other purposes, we identified which Gene Ontology (GO) terms and genes are good predictors of high sensitivity of a cell line to a drug. This explanation provides a basis for hypothesizing drug combinations and cell editing experiments aimed at identifying cell vulnerabilities.
Motivation: DNA methylation is an epigenetic modification, occurring primarily at CpG sites, that is involved in major biological mechanisms such as the regulation of gene expression and genome stability. Association studies based on this modification typically focus on identifying genomic regions whose average DNA methylation differs between conditions. However, studying the methylation status of cytosines at the single-molecule level can provide additional insight into cell-to-cell heterogeneity and cell clonality within a sample. In this context, the different combinations of CpG methylation states that can be observed at a given locus are defined as epialleles. Several bioinformatic tools have been developed to extract epiallelic information from bisulfite sequencing data. Nevertheless, these tools restrict the selection of the regions that can be profiled (e.g., by number of CpG sites), and they do not provide dedicated statistical tests for the epiallele compositions derived from their output.
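The difference between average methylation and epiallele composition can be made concrete with a small sketch. It assumes each read has already been reduced to a string of methylation calls over the same CpG sites, and it uses Shannon entropy as one common heterogeneity summary; neither the representation nor the metric is taken from any specific tool mentioned here.

```python
from collections import Counter
import math

def epiallele_counts(reads):
    """Count epialleles; each read is a string of CpG states ('1' methylated, '0' unmethylated)."""
    return Counter(reads)

def mean_methylation(reads):
    """Average methylation over all CpG calls, i.e. the bulk-level summary."""
    calls = [state == "1" for read in reads for state in read]
    return sum(calls) / len(calls)

def shannon_entropy(counts):
    """Shannon entropy (bits) of the epiallele distribution at a locus."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Two hypothetical samples, 8 reads each, covering the same 4 CpG sites
sample_a = ["1111", "0000"] * 4                   # two clonal-looking populations
sample_b = ["1100", "0011", "1010", "0101"] * 2   # heterogeneous mixture

print(mean_methylation(sample_a), mean_methylation(sample_b))  # 0.5 0.5
print(shannon_entropy(epiallele_counts(sample_a)))             # 1.0
print(shannon_entropy(epiallele_counts(sample_b)))             # 2.0
```

Both samples have identical average methylation, so a region-level comparison would not separate them, whereas their epiallele compositions (and hence their entropies) differ.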
Methods: Here we present a novel workflow for retrieving epiallelic profiles from bisulfite sequencing data. In particular, our workflow supports data loading and filtering, region design, and epiallele extraction. Dedicated statistical tests can then be used to identify regions whose epiallelic composition differs between groups.
Results: We developed EpiStatProfiler, a new R package providing a library of functions to extract and summarise epialleles from any type of bisulfite sequencing data and to perform downstream statistical comparisons among groups. The tool enables a customised selection of target regions according to a set of user-defined parameters (minimum coverage, number of cytosines, minimum window size). It can also analyse strand-specific and non-CG methylation. EpiStatProfiler stores epiallele information in two outputs: a compressed 0-1 matrix containing the epiallele composition of each analysed region, and an additional output containing basic features and multiple metrics derived from the profiled regions. EpiStatProfiler provides a set of functions to perform epiallele-based comparisons in longitudinal and cross-sectional studies. We believe that this package could be a valuable tool for qualitatively analysing methylation heterogeneity in a variety of settings, such as tumor evolution, cell differentiation and disease conditions.
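To illustrate how a 0-1 epiallele matrix can feed a group comparison, the Python sketch below builds such a matrix from read strings and tests for a difference in epiallele composition with a Pearson chi-squared statistic and a label-permutation p-value. This is a stand-in chosen for illustration, not the statistical tests or functions implemented in the EpiStatProfiler R package; it also simplifies by pooling reads within each group and treating them as independent.

```python
import numpy as np

def to_binary_matrix(reads):
    """Reads spanning the same CpG sites ('1' = methylated) -> 0-1 matrix, one row per read."""
    return np.array([[int(c) for c in r] for r in reads], dtype=np.uint8)

def epiallele_table(mat_a, mat_b):
    """Contingency table: one row per distinct epiallele, one column per group."""
    patterns = np.unique(np.vstack([mat_a, mat_b]), axis=0)
    def counts(mat):
        return np.array([np.all(mat == p, axis=1).sum() for p in patterns])
    return np.column_stack([counts(mat_a), counts(mat_b)])

def chi2_stat(table):
    """Pearson chi-squared statistic for a contingency table."""
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    return ((table - expected) ** 2 / expected).sum()

def permutation_pvalue(mat_a, mat_b, n_perm=999, seed=0):
    """Permute group labels across reads and compare chi-squared statistics."""
    rng = np.random.default_rng(seed)
    observed = chi2_stat(epiallele_table(mat_a, mat_b))
    pooled, n_a = np.vstack([mat_a, mat_b]), len(mat_a)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        hits += chi2_stat(epiallele_table(pooled[perm[:n_a]], pooled[perm[n_a:]])) >= observed
    return (hits + 1) / (n_perm + 1)

# Hypothetical groups with equal average methylation but different epialleles
group_a = to_binary_matrix(["1111", "0000"] * 10)
group_b = to_binary_matrix(["1100", "0011", "1010", "0101"] * 5)
print(group_a.mean(), group_b.mean())   # 0.5 0.5
print(permutation_pvalue(group_a, group_b))
```

A permutation p-value avoids relying on the chi-squared approximation, which can be poor when many epialleles have small counts, at the cost of extra computation per region.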