James Schnable


E-mail: schnable@unl.edu


Assistant Professor

Co-Founder

Education


I study the regulation of gene expression in plants using comparative genomics and gene expression data. My favorite study system is the grass family (Poaceae). Grass species are critical to both human agriculture and natural ecosystems, which makes understanding how genes are regulated in these species both more important and more feasible (since so many researchers are generating so many useful datasets in different grass species). The most obvious example of how data-rich the grasses are is that they include more species with sequenced genomes than any other plant family.

Whole Genome Duplications

Unlike mammals, flowering plants tend to tolerate polyploidy (possessing more than the normal two genome copies per cell) quite well. Polyploid plants often form distinct species from their diploid parents and can sometimes go on to be incredibly evolutionarily successful. All flowering plant species studied to date are descended from polyploid ancestors. The term "whole genome duplication" refers to the multiple genome copies present within a species descended from a polyploid ancestor.

My own research on whole genome duplication focuses mostly on maize (Zea mays ssp. mays), which, along with its relatives teosinte and Tripsacum, is descended from a polyploid species that lived between five and twelve million years ago. By comparing the genome of maize to that of sorghum, a related grass species that does not share the maize whole genome duplication, it is possible to identify two copies of each sorghum chromosome within the maize genome. These chromosome copies are not intact: duplicate copies of many genes have been lost from the maize genome, primarily through short to medium-sized deletions (anywhere from a few base pairs to a couple of neighboring genes) caused by nonhomologous intrastrand recombination, as we showed in Woodhouse et al. 2010 PLoS Biology (see my Publications page). Gene loss is not evenhanded between the two ancestral chromosome copies. Instead, within each pair of ancestral chromosomes, gene loss is consistently biased toward one copy.
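Once syntenic blocks have been identified, biased gene loss comes down to simple bookkeeping: for each ancestral gene position (anchored on a sorghum gene), record whether a maize duplicate survives on each of the two ancestral chromosome copies, then compare the retention rates. The sketch below assumes that the synteny search has already been done and that every maize region has been assigned to "copy 1" or "copy 2"; the `SyntenicSite` record, its field names, and the toy data are hypothetical and only illustrate the comparison, not the actual pipeline used in the papers cited above.

```python
from dataclasses import dataclass


@dataclass
class SyntenicSite:
    """One ancestral gene position, anchored on a sorghum gene (hypothetical record)."""
    sorghum_gene: str
    kept_copy1: bool  # maize duplicate still present on ancestral chromosome copy 1
    kept_copy2: bool  # maize duplicate still present on ancestral chromosome copy 2


def retention_summary(sites):
    """Count how many ancestral genes survive on each ancestral chromosome copy."""
    n = len(sites)
    kept1 = sum(s.kept_copy1 for s in sites)
    kept2 = sum(s.kept_copy2 for s in sites)
    both = sum(s.kept_copy1 and s.kept_copy2 for s in sites)
    return {
        "sites": n,
        "kept_copy1": kept1,
        "kept_copy2": kept2,
        "kept_both": both,
        "fraction_copy1": kept1 / n if n else 0.0,
        "fraction_copy2": kept2 / n if n else 0.0,
    }


# Toy data (invented for illustration, not real maize counts).
toy_sites = [
    SyntenicSite("sb_gene1", True, True),
    SyntenicSite("sb_gene2", True, False),
    SyntenicSite("sb_gene3", True, False),
    SyntenicSite("sb_gene4", False, True),
]
summary = retention_summary(toy_sites)
print(summary)
```

In this toy example copy 1 retains 3 of 4 genes and copy 2 retains 2 of 4; with real data, a consistent gap of this kind along an entire pair of ancestral chromosomes is what biased gene loss looks like. Sites where both duplicates survive (`kept_both`) are the retained duplicate pairs discussed next.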

Around 4,000 duplicate genes are still present in the maize genome (as many as 5,000 or as few as 2,000, depending on how you define a "gene"). Using published expression data in maize, I demonstrated that genes on the high gene loss ancestral chromosomes tend to be less expressed than their duplicate copies on the equivalent low gene loss ancestral chromosome (Schnable et al. 2011 PNAS; see my Publications page). This is consistent with studies of recent and synthetic allopolyploids, which show that gene copies from one parental species tend to be more highly expressed than the same genes from the other parental species of the polyploid. This unbalanced expression suggests that gene loss is biased between ancestral chromosome copies in maize (and other plant species) because losing a less expressed gene copy is less likely to cause problems for a plant than losing a more expressed gene copy. The finding that maize genes responsible for well-studied mutant phenotypes show an even stronger bias toward the high-expression, low-gene-loss subgenome (Schnable et al. 2011 PLoS One) lends further weight to this model.
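One simple way to ask whether the low-gene-loss copy of each duplicate pair tends to be the more highly expressed one is a paired sign test: count the pairs in which the low-loss copy is higher and compare that count to a fair coin flip. The sketch below is an illustration under that assumption; it is not necessarily the statistic used in Schnable et al. 2011, and the function names and toy expression values are hypothetical.

```python
from math import comb


def sign_test_greater(successes, trials):
    """One-sided exact sign test: P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


def compare_duplicate_expression(pairs):
    """pairs: (low_loss_copy_expr, high_loss_copy_expr) values for one sample.

    Returns (pairs where the low-gene-loss copy is higher, informative pairs, p-value).
    Tied pairs carry no information in a sign test and are dropped.
    """
    informative = [(low, high) for low, high in pairs if low != high]
    low_loss_higher = sum(1 for low, high in informative if low > high)
    n = len(informative)
    return low_loss_higher, n, sign_test_greater(low_loss_higher, n)


# Toy expression values (invented), one tuple per duplicate gene pair.
toy_pairs = [(10.0, 4.0), (8.0, 8.0), (5.0, 1.0), (2.0, 3.0), (7.0, 0.5)]
wins, n, p = compare_duplicate_expression(toy_pairs)
print(wins, n, p)
```

With only four informative toy pairs the result (3 of 4, p = 0.3125) is far from significant; the point of the test is that with thousands of real pairs even a modest but consistent expression bias becomes detectable.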

Of course we've merely exchanged one mystery for another. If biases in gene loss are explained by biases in gene expression, what creates the bias in expression between genes from the two parents in the first place? Answering that question is something I'm still working on.

Conserved Noncoding Sequences and Natural Promoter Bashing

Conserved noncoding sequences are regions near a gene that do not code for protein but show high levels of sequence conservation between orthologs in multiple species, which indicates that they are functionally constrained. It is likely that many of these sequences are involved in regulating gene expression. (Examples of conserved noncoding sequences with experimentally defined functions.) However, in the vast majority of cases, the functions of specific conserved noncoding sequences remain unknown.

The two duplicate genes created by a whole genome duplication initially possess identical or highly similar promoters, with all the same conserved noncoding sequences, and are expected to show very similar patterns of expression across different tissues and in response to different stimuli. Over time, however, important regulatory sequences are removed from one promoter or the other by the same short deletions responsible for the loss of whole genes. By comparing differences in expression patterns between duplicate genes, and examining which conserved noncoding sequences found in other grass species have been deleted from the promoter of one duplicate or the other, we are developing testable hypotheses about the functions of individual promoter elements. A good summary of this technique was published in Freeling et al. 2012 COPB.
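The core logic of this "natural promoter bashing" can be sketched for a single duplicate pair: a conserved noncoding sequence (CNS) retained by one copy but deleted from the other is paired with the expression differences between the two copies. If the copy lacking the CNS has lost expression in a tissue, the CNS is a candidate activator there; if it has gained expression, the CNS is a candidate repressor. The `Duplicate` record, its fields, and the toy gene and CNS names below are hypothetical, and real analyses would need to weigh evidence across many pairs rather than trust any one comparison.

```python
from dataclasses import dataclass


@dataclass
class Duplicate:
    gene_id: str
    cns: set           # IDs of conserved noncoding sequences still near this copy
    expressed_in: set  # tissues or conditions where this copy is expressed


def promoter_bashing_hypotheses(copy_a, copy_b):
    """Link each CNS lost from one duplicate to the expression differences of that copy."""
    hypotheses = []
    for with_cns, without_cns in ((copy_a, copy_b), (copy_b, copy_a)):
        for cns in sorted(with_cns.cns - without_cns.cns):
            # Expression lost along with the CNS: the CNS may activate expression there.
            for tissue in sorted(with_cns.expressed_in - without_cns.expressed_in):
                hypotheses.append((cns, tissue, without_cns.gene_id, "candidate activator"))
            # Expression gained after the CNS was lost: the CNS may repress expression there.
            for tissue in sorted(without_cns.expressed_in - with_cns.expressed_in):
                hypotheses.append((cns, tissue, without_cns.gene_id, "candidate repressor"))
    return hypotheses


# Toy example: geneB has lost cns2 and is no longer expressed in tassel.
gene_a = Duplicate("geneA", {"cns1", "cns2", "cns3"}, {"root", "leaf", "tassel"})
gene_b = Duplicate("geneB", {"cns1", "cns3"}, {"root", "leaf"})
result = promoter_bashing_hypotheses(gene_a, gene_b)
print(result)
```

When one copy has lost several CNSs and differs in several tissues, every combination is reported, which is why these outputs are hypotheses to be tested rather than conclusions.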

I am also experimenting with running the same analysis in reverse: starting with expression data from a specific stimulus, tissue, or mutant; identifying duplicate genes that show dissimilar patterns of expression; finding conserved noncoding sequences associated with the differentially expressed genes but not with their duplicates whose expression is unchanged; and ending up with a list of sequence motifs putatively responsible for driving or repressing expression in that specific mutant, tissue, or stimulus.
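The final step of this reverse analysis, going from CNSs to candidate motifs, could be approximated by counting short sequence words (k-mers) that are more common in CNSs kept only by the responsive duplicate than in CNSs kept by both copies. The sketch below is a deliberately simple stand-in: it assumes that a CNS present in both copies carries the same ID in both dictionaries, ignores reverse complements, and uses a pseudocounted frequency ratio; the input format, function names, and toy sequences are hypothetical. A real analysis would more likely use a dedicated motif-discovery tool such as those in the MEME Suite.

```python
from collections import Counter


def kmers(seq, k):
    """Distinct k-mers in one sequence (forward strand only, for simplicity)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_motifs(pairs, k=6, top=10):
    """Rank k-mers enriched in CNSs kept only by the responsive duplicate.

    pairs: iterable of (responsive_cns, unresponsive_cns), each a dict mapping a
    CNS ID to its sequence; a CNS ID present in both dicts is retained by both
    copies and counts as background.
    """
    foreground, background = Counter(), Counter()
    n_fg = n_bg = 0
    for responsive, unresponsive in pairs:
        for cns_id, seq in responsive.items():
            if cns_id in unresponsive:
                background.update(kmers(seq, k))
                n_bg += 1
            else:
                foreground.update(kmers(seq, k))
                n_fg += 1
    # Fraction of CNSs containing each k-mer, pseudocounted to avoid division by zero.
    scores = {
        kmer: ((foreground[kmer] + 1) / (n_fg + 2)) / ((background[kmer] + 1) / (n_bg + 2))
        for kmer in foreground
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top]


# Toy CNS sequences (invented): "AGCT" occurs in both responsive-only CNSs.
toy_pairs = [
    ({"c1": "AAGCTT", "c2": "CCCC"}, {"c2": "CCCC"}),
    ({"c3": "TAGCTA"}, {}),
]
ranked = candidate_motifs(toy_pairs, k=4, top=3)
print(ranked)
```

Here `AGCT` ranks first because it appears in every CNS that only the responsive copies retained; with real data, the top-ranked words would be the starting list of putative regulatory motifs for that tissue, mutant, or stimulus.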