Seoighe Research Group
The focus of our group is on modeling molecular biological data, including gene expression, alternative splicing and the evolution of viruses. We are working on a range of topics, including several with local relevance in South Africa.
Modeling viral evolution, with application to HIV-1 immune evasion and drug resistance
RNA viruses, such as HIV, have extremely high mutation rates, large effective population sizes and short generation times, resulting in a phenomenal capacity to generate diversity and to adapt to changes in their environment. This has important implications for the design and monitoring of drug treatment regimens and is a key reason for the difficulty of developing a globally effective HIV vaccine. Thus the development of tools to understand HIV evolution within and across infected individuals is not only of basic scientific interest, but also has the potential for clinical applications, in terms of guiding the design of vaccines and treatment regimens.
The evolution of molecular sequences is typically modeled as a continuous-time Markov process, characterized by a rate matrix Q with elements q_ij denoting the instantaneous substitution rates from state i to state j, where the character states are typically nucleotides or triplets of nucleotides (codons). This allows determination of a transition probability matrix as a function of time, P(t) = exp(Qt), whose element in row i and column j is the probability that a site in state i is in state j after a time interval t. Typically, sequence evolution is modeled along the branches of a phylogenetic tree, which may be known, or treated as one of the parameters to be estimated.
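As a minimal illustration of how a rate matrix Q determines the transition probabilities, the Python sketch below computes exp(Qt) for a four-state nucleotide model. The Jukes-Cantor rate matrix, the rate parameter mu and the helper names (mat_mul, mat_exp, jukes_cantor_q, transition_matrix) are assumptions chosen for illustration and are not taken from the group's software. The matrix exponential is approximated by scaling and squaring with a truncated Taylor series.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=20, squarings=10):
    """Approximate exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2 ** squarings
    scaled = [[x / scale for x in row] for row in m]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term holds scaled^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    # Undo the scaling: exp(m) = exp(m / 2^s) ^ (2^s)
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def jukes_cantor_q(mu=1.0):
    """Illustrative rate matrix (assumption): every off-diagonal rate is mu/3,
    so each row sums to zero and the total rate of leaving a state is mu."""
    return [[-mu if i == j else mu / 3.0 for j in range(4)] for i in range(4)]


def transition_matrix(q, t):
    """P(t) = exp(Qt): entry [i][j] is the probability of state j after time t,
    given state i at time 0."""
    return mat_exp([[x * t for x in row] for row in q])


if __name__ == "__main__":
    P = transition_matrix(jukes_cantor_q(1.0), 0.5)
    for row in P:
        print(["%.4f" % p for p in row])
```

Each row of P(t) sums to one, and for this simple model the result can be checked against the known closed form, in which the diagonal entries equal 1/4 + (3/4)exp(-4*mu*t/3). Codon models use the same construction with a larger state space.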
Model comparison techniques have been applied in this context to compare the fit of competing models of sequence evolution. Of particular relevance has been the evaluation of evidence for positive Darwinian selection. Sites within molecular sequences that are affected by positive selection provide information about processes such as evasion of host immune responses and the development of drug resistance, which are of clinical significance. My research group's activities in this area have included the development of a fully Bayesian method to assess evidence of positive selection acting on protein-coding sequences using a Markov chain Monte Carlo method (Scheffler & Seoighe 2005). The applicability of phylogenetic models to HIV sequences has been questioned because of the high rate of recombination of HIV sequences (Anisimova, Nielsen, & Yang 2003; Shriner et al. 2003). Each HIV virion packages two copies of the single-stranded RNA genome. Dual infection of cells by divergent viral strains, although relatively rare, has a high probability of resulting in recombination between genetically distinct RNA strands packaged in the same virion. Sequence histories that include recombination cannot be represented as the bifurcating trees that are typically used to model sequence evolution, and application of standard selection models to these data has been shown to give invalid results. Using the batch language of HyPhy (Kosakovsky-Pond et al. 2005), we have developed a method that can be used to model the evolution of recombining sequences (Scheffler, Martin, & Seoighe 2006). More recently we have developed an evolutionary model specifically designed to detect directional selection associated with the evolution of drug resistance (Seoighe et al. 2007). This work was in collaboration with the National Institute for Communicable Diseases (NICD) and the University of the Western Cape.
Cellular immune responses are thought to be a major driving force of HIV evolution (Bhattacharya et al. 2007). We have investigated the relationship between human leukocyte antigen (HLA) alleles and cytotoxic T lymphocyte responses, as measured using the interferon gamma ELISPOT assay, and have reported differences in the predictive capacity of HLA alleles in the context of viral strains and human populations (Ngandu et al. 2007). We have also developed probabilistic models designed to detect sites within viral coding sequences that are responsible for evasion of immune responses (in preparation) and, as part of a collaboration led by UCT virologists, have assessed the attenuating effects of immune escape mutations (Chopera et al. 2008).
Discovery of human polymorphisms that affect gene expression or mRNA splicing
Mutations that affect mRNA splicing are thought to be responsible for a large proportion of human diseases that are caused by point mutations, and up to 74% of human genes are affected by alternative splicing (Johnson et al. 2003). Mutations in splicing factors can have profound biological effects and have already been shown to be of clinical importance, including as causative mutations of retinitis pigmentosa, a genetic disorder that has been studied intensively by collaborators in the UCT Human Genetics Department (Mordes et al. 2006; Rebello et al. 2004). Although the importance of mutations that alter splicing in heritable diseases is well recognized, prior to our work (Nembaware et al. 2004) there had been little appreciation of, and no attempt to measure, the contribution of polymorphism to the diversity of transcript isoforms that is observable in the public databases. This situation has recently changed, and both the heritability of alternative splicing and the usefulness of high-throughput technologies for its detection are now of great interest (Kwan et al. 2007). In particular, given the significance of splicing mutations in human genetic disorders and the recent increase in the power of case-control association studies to identify disease-associated loci, interest in polymorphisms and mutations that affect splicing is likely to grow as efforts intensify to identify the causative mutations at human loci with confirmed disease association.
We have recently identified ~30,000 human polymorphisms at sites that are critical for mRNA splicing (splice donor and acceptor sites, branch points and exonic splice enhancers) and used publicly available tools to estimate the effect of these mutations on mRNA splicing (Nembaware et al., in press). We obtained raw signal intensity data generated using the Affymetrix GeneChip® Human Exon 1.0 ST Array. These data were generated by Huang et al. (2007) from human cell lines used by the International HapMap Consortium. As a result, in addition to expression measurements for approximately 1.4 million probe sets, genome-wide haplotype data are available for these cell lines. For each candidate splicing mutation identified in the genome-wide survey we used robust linear models to assess evidence for a genotype effect on splicing index. After correcting for multiple testing, we obtained more than 1,000 human polymorphisms that are strongly associated with mRNA splicing (Nembaware et al., in press). These mutations are likely either to have a direct effect on mRNA splicing or to be in linkage disequilibrium with mutations not detected in the genome-wide survey that, nonetheless, affect mRNA splicing.
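The kind of per-polymorphism test described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis used in the study: ordinary least squares stands in for the robust linear models, significance is assessed by permutation, and a Bonferroni correction is used because the text does not say which multiple-testing correction was applied. Genotypes are coded as allele counts (0, 1 or 2), and all function names and data values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y (splicing index) on x (genotype coded 0/1/2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_p_value(genotypes, splicing_index, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a genotype effect on splicing index."""
    rng = random.Random(seed)
    observed = abs(ols_slope(genotypes, splicing_index))
    shuffled = list(splicing_index)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype/phenotype association
        if abs(ols_slope(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values (assumed correction method)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Made-up toy data: 12 cell lines, one candidate polymorphism per test
    genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    associated = [0.1, 0.0, 0.2, 0.1, 0.9, 1.1, 1.0, 1.0, 2.1, 1.9, 2.0, 2.0]
    unassociated = [0.5, 1.2, 0.3, 0.9, 1.0, 0.2, 0.8, 1.1, 0.4, 1.0, 0.6, 0.8]
    raw = [permutation_p_value(genotypes, y) for y in (associated, unassociated)]
    print(bonferroni(raw))
```

In a genome-wide setting the same test would be repeated for every candidate polymorphism and probe set pair, which is why the correction for multiple testing matters.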
Polymorphisms with an effect on gene expression (expression quantitative trait loci, or eQTLs) are common (Morley et al. 2004) and of scientific and medical importance. Many of the techniques and some of the same biological cell lines that have been applied to detect splicing mutations have previously been applied to the discovery of eQTLs. We have developed a statistical model that can be used to detect eQTLs or genetic imprinting (preferential expression of one parental allele) from EST data. We implemented a model-comparison method for these data in a maximum likelihood framework and used it to demonstrate extended imprinted regions close to well-known human imprinted genes (Seoighe, Nembaware, & Scheffler 2006). A similar method can be used to detect allele-specific splicing from EST data, and this method forms part of our current work under review for publication.
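To illustrate the general idea of maximum likelihood model comparison for allele counts, the Python sketch below applies a likelihood ratio test to a single heterozygous site. It is a deliberately simplified stand-in, not the published model: it assumes that the ESTs covering the site are independent draws from a binomial distribution and compares a null model of equal allelic expression (p = 0.5) with an alternative in which p is estimated from the data. The function names and the counts in the example are hypothetical.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood of k ESTs carrying one allele out of n,
    omitting the binomial coefficient (it cancels in the ratio)."""
    if p <= 0.0 or p >= 1.0:
        # At the boundary the likelihood is 1 only if the data match exactly
        if (p <= 0.0 and k == 0) or (p >= 1.0 and k == n):
            return 0.0
        return float("-inf")
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def allelic_imbalance_lrt(k, n):
    """Likelihood ratio test of equal allelic expression (p = 0.5) against
    an unconstrained allele frequency estimated as k / n."""
    p_hat = k / n
    stat = 2.0 * (binom_loglik(k, n, p_hat) - binom_loglik(k, n, 0.5))
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


if __name__ == "__main__":
    for k, n in [(10, 20), (19, 20)]:
        stat, p = allelic_imbalance_lrt(k, n)
        print(k, n, round(stat, 3), p)
```

The chi-square approximation is rough for small EST counts, so a test of this kind is best read as a screening statistic rather than an exact p-value.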
Gene function, expression and evolution in plants
We have used genomic and microarray gene expression data from Arabidopsis thaliana to learn about the evolution of plants and plant gene expression. Interests include evolution by whole genome duplication (Seoighe & Gehring 2004) and the impact of gametophytic selection on gene compactness (Seoighe, Gehring, & Hurst 2005). We are currently investigating the relationship between genome organization and gene expression in plants.
Human genome variation and genome-wide association studies of complex disorders
As part of a collaboration with the UCT and University of the Witwatersrand Human Genetics Departments, we propose to develop resources to support genome-wide association studies in southern African populations. The first phase of this project, which involved the generation of genome-wide genotype data from five indigenous southern African populations at almost one million loci using the Affymetrix SNP 6 array, has now been completed. We have inferred phase information from these samples and have begun construction of a database and a web front end to provide access to these data. We are also investigating evidence of selective sweeps acting on southern African human populations using these data.
Research publications
- Chopera DR, Woodman Z, Mlisana K, Mlotshwa M, Martin DP, Seoighe C, Treurnicht F, de Rosa DA, Hide W, Karim SA, Gray CM, Williamson C; CAPRISA 002 Study Team. Transmission of HIV-1 CTL escape variants provides HLA-mismatched recipients with a survival advantage. PLoS Pathog. 2008 Mar 21;4(3):e1000033.
- Poulter GL, Rubin DL, Altman RB, Seoighe C. MScanner: a classifier for retrieving Medline citations. BMC Bioinformatics. 2008 Feb 19;9:108.
- Schwegmann A, Guler R, Cutler AJ, Arendse B, Horsnell WG, Flemming A, Kottmann AH, Ryan G, Hide W, Leitges M, Seoighe C, Brombacher F. Protein kinase C delta is essential for optimal macrophage-mediated phagosomal containment of Listeria monocytogenes. Proc Natl Acad Sci U S A. 2007 Oct 9;104(41):16251-6.
- Ngandu N, Bredell H, Gray CM, Williamson C, Seoighe C, and the HIVNET028 Study Team (2007) CTL response to HIV-1 subtype C is poorly predicted by known epitope motifs. AIDS Research and Human Retroviruses. 2007 Aug;23(8):1033-41.
- Seoighe, C., Ketwaroo, F., Pillay, V., Scheffler, K., Wood, N., Duffet, R., Zvelebil, M., Martinson, N., McIntyre, J., Morris, L., Hide, W. (2007) A model of directional selection applied to the evolution of drug resistance in HIV-1. Molecular Biology and Evolution. Apr;24(4):1025-31.
- Bredenkamp N, Seoighe C, Illing N. (2007) Comparative evolutionary analysis of the FoxG1 transcription factor from diverse vertebrates identifies conserved recognition sites for microRNA regulation. Dev Genes Evol. 2007 Jan 27;
- Seoighe C, Nembaware V, Scheffler K. (2006) Maximum likelihood inference of imprinting and allele-specific expression from EST data. Bioinformatics. 2006 Dec 15;22(24):3032-9.
- Scheffler K, Martin DP, Seoighe C. (2006) Robust inference of positive selection from recombining coding sequences. Bioinformatics. 2006 Oct 15;22(20):2493-9.
- Scheffler K, Seoighe C. (2005) A Bayesian model comparison approach to inferring positive selection. Molecular Biology & Evolution. 2005 Dec;22(12):2531-40.
- Seoighe C, Gehring C, Hurst LD. (2005) Gametophytic Selection in Arabidopsis thaliana Supports the Selective Model of Intron Length Reduction. PLoS Genet. 2005 Aug 5;1(2):e13.
- Scheffler, K. & Seoighe C. (2005) Detecting molecular evidence of positive Darwinian selection. In Information Processing & Living Systems. Vladimir B Bajic & Tan Tin Wee (Eds.). Imperial College Press, London.
- Very low power to detect asymmetric divergence of duplicated genes. Proceedings of the RECOMB 2005 International Workshop on Comparative Genomics. Aoife McLysaght & Daniel H. Huson (Eds). Springer, Heidelberg.
- Nembaware V, Wolfe KH, Bettoni F, Kelso J, Seoighe C. Allele-specific transcript isoforms in human. FEBS Lett. 2004 Nov 5;577(1-2):233-8.
- Seoighe, C., Gehring, C. Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends in Genetics. 2004 Oct;20(10):461-4.
- Grobler, J., Gray, C.M., Rademeyer, C., Seoighe, C., Ramjee, G., Karim, S.A., Morris, L., Williamson, C. The incidence of HIV-1 dual infection and its association with increased viral load set point in a cohort of subtype C infected female sex-workers. Journal of Infectious Diseases, 2004: 190
- Nembaware, V, Said, M, Seoighe, C., Gehring, C.A. Molecular mimicry of a plant signaling molecule enables pathogen control of plant homeostasis. BMC Evol Biol. 2004 Mar 24;4(1):10.
- Swart, E.C., Hide, W.H., Seoighe, C. FRAGS: estimation of coding sequence substitution rates from fragmentary data. BMC Bioinformatics. 2004 Jan 29;5(1):8.
- Seoighe, C. Turning the clock back on ancient genome duplication. Curr Opin Genet Dev. 2003 Dec;13(6):636-43.
- Bredell, H., Seoighe, C., Gilfillan, J., Gray, C., and Williamson, C. 2003. Defining HIV-1 gag genetic diversity in southern Africa and possible impact on proposed cytotoxic T-lymphocyte epitope recognition. South African AIDS Conference 2003, Durban, South Africa.
- Seoighe, C. Johnston, C.R. Shields, D.C. Significantly different patterns of amino acid replacement after gene duplication as compared to after speciation Mol Biol Evol. 2003 Apr;20(4):484-90.
- Nembaware, V. Crum, K. Kelso, JF, Seoighe, C. Impact of the presence of paralogues on sequence divergence in a set of mouse-human orthologues Genome Res. 2002 Sep;12(9):1370-6.
- Ludidi N.N., Heazlewood J.L., Seoighe C., Irving H.R. and Gehring C.A. Expansin-like molecules: Novel functions derived from common domains Journal of Molecular Evolution 2002 May;54(5):587-94.
- Hide WA, Babenko VN, van Heusden PA, Seoighe C, Kelso JF. The contribution of exon-skipping events on chromosome 22 to protein coding diversity. Genome Res. 2001 Nov;11(11):1848-53.
- Seoighe C, Federspiel N, Jones T, Hansen N, Bivolarovic V, Surzycki R, Tamse R, Komp C, Huizar L, Davis RW, Scherer S, Tait E, Shaw DJ, Harris D, Murphy L, Oliver K, Taylor K, Rajandream MA, Barrell BG, Wolfe KH. (2000) Prevalence of small inversions in yeast gene order evolution. Proc Natl Acad Sci U S A. Nov 21
- McLysaght, A., Seoighe, C. and Wolfe, K.H. (2000) High Frequency of Inversions during Eukaryote Gene Order Evolution. In Comparative Genomics. Eds. D. Sankoff and J.H. Nadeau. Kluwer Academic Publishers, pp 47-58.
- Seoighe, C. and Wolfe, K.H. (1999) Yeast genome evolution in the post genome era. Curr Opin Microbiol. Oct;2(5):548-54. Review.
- Seoighe, C. and Wolfe, K.H. (1999) Updated map of duplicated regions in the yeast genome. Gene. Sep 30;238(1):253-61.
- Bradnam, K.R., Seoighe, C., Sharp, P.M. and Wolfe, K.H. (1999) G+C content along and among Saccharomyces cerevisiae chromosomes. Mol. Biol. Evol.
- Seoighe, C. and Wolfe, K.H. (1998) Extent of genomic rearrangement after genome duplication in yeast. Proc. Natl. Acad. Sci. USA 95, 4447-4452.
- Keogh, R.S., Seoighe, C and Wolfe, K.H. (1998) Evolution of gene order and chromosome number in Saccharomyces, Kluyveromyces and related fungi. Yeast 14, 443-457.