Alterations
We will introduce rather conservative changes to the genome sequence, initially deleting or relocating genomic features using strategies we feel, from first principles, are unlikely to reduce fitness significantly. At the same time, we will be introducing site-specific recombination sequences, allowing subsequent in vitro evolution of the yeast strains generated. This opens up a whole new dimension – not just one synthetic genome, but whole populations of synthetic genomes will be available for analysis and future study. The basic approach is to encode site-specific recombination sites at carefully chosen positions in the synthetic chromosomes that are not expected to have an effect on fitness upfront, but will enable in vitro evolution or “SCRaMbLE” to “tell us” what is in fact dispensable when we transiently express the appropriate recombinase and select for the survivors.
PCRTags
PCRTags are alterations incorporated into most open reading frames (ORFs) (on average one per ORF, as some ORFs are too small and others contain multiple PCRTags). Each is made by recoding a ~20 bp segment of the coding region of an ORF to a different DNA sequence encoding the same amino acid sequence. PCR primer pairs can then be designed that selectively amplify only the synthetic or only the wild-type sequence. In this way, transformants that have incorporated a synthetic segment can be quickly screened to confirm that the segment has been completely substituted. PCRTags can also be used to monitor the deletion of non-essential segments after SCRaMbLE induction.
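As a rough illustration of the recoding step, the sketch below swaps each codon in a short window for the synonymous codon that differs from it at the most positions, so the encoded protein is unchanged while the DNA diverges enough for selective priming. The function names (`translate`, `mismatches`, `recode_window`) and the example sequence are hypothetical. The sketch also ignores factors a real design tool would have to weigh, such as codon usage, primer melting temperature and uniqueness of the tag within the genome.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate in frame 0; any trailing partial codon is ignored."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def recode_window(orf, first_codon, n_codons=7):
    """Synonymously recode n_codons codons (7 codons = 21 bp) starting at
    codon index first_codon. Stop codons and single-codon amino acids
    (Met, Trp) are left as they are."""
    n_total = len(orf) // 3
    out = []
    for idx in range(n_total):
        original = orf[3 * idx:3 * idx + 3]
        chosen = original
        aa = CODON_TO_AA[original]
        if first_codon <= idx < first_codon + n_codons and aa != "*":
            synonyms = [c for c, a in CODON_TO_AA.items() if a == aa]
            # Pick the synonym that differs most from the wild-type codon.
            chosen = max(synonyms, key=lambda c: mismatches(c, original))
        out.append(chosen)
    out.append(orf[3 * n_total:])  # keep any trailing partial codon untouched
    return "".join(out)


orf = "ATGTCTTTGGCTAAGAGATTGGGTTCTTAA"  # made-up 10-codon ORF
tagged = recode_window(orf, first_codon=1, n_codons=7)
print(orf)
print(tagged)
print(translate(orf) == translate(tagged))
```

Primers ending in the recoded window would then match either the wild-type or the synthetic version, which is what allows the two to be told apart by PCR.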
Telomeres
It has been shown that yeast telomeres can be replaced by artificial telomeres of the simple sequence repeat (TG1-3)n. Although these artificial telomeres lack several features of native telomeres, including nearby origins of replication and sub-telomeric repeats, deleting those features has so far revealed no major impact on chromosome stability or behavior. Even putting artificial telomeres at both ends of the same chromosome has led to only marginal effects on chromosomal stability. Synthetic chromosomes in the Sc2.0 project are thus designed to encode universal telomere caps (UTCs) at their termini, which are specified by ~350 bp of the yeast telomere repeat sequence containing a single copy of the “X” consensus element. Deleting the endogenous sub-telomeric repeat regions typically accounts for a significant decrease in the length of the synthetic chromosome relative to the native sequence, and the reduction can be tens of kilobases.
Transposable Elements
We are keenly interested in transposable element biology, since these elements represent a major component of eukaryotic genomes, and we are also interested in removing repeats in general. The dominant view of transposon biology is that transposons are genome parasites that play no essential role in the genome. This is not in conflict with the observation that transposons have clearly been “domesticated” to play critical biological roles, such as in the vertebrate adaptive immune system. When transposons are domesticated they generally lose their ability to transpose. Preliminary studies suggest yeast retrotransposons are indeed dispensable, but to our knowledge, eukaryotes that completely lack transposon sequences are exceedingly rare. Thus, we will test the hypothesis that transposon-free yeast chromosomes can be synthesized and maintained. Because the remaining Ty1 transposons may be activated to mobilize as transposon copy number is reduced, the SPT3 gene, which is required for Ty1 transposon transcription, can be deleted. We also relocate the tRNA genes, the highly preferred targets of Ty1 transposition, as we go along, minimizing the probability of new insertions into the synthetic chromosome regions. Recent studies point to tRNA regions as hotspots for genomic instability.
tRNA Genes
All natural yeast chromosomes contain one or more yeast tRNA genes, or tDNAs. We are testing the effect of removing all tDNAs from the synthetic chromosomes and re-locating them to a separate “neochromosome”. The strategy here is to encode all tDNAs from a particular chromosome on a centromeric plasmid so that, once synthetic chromosome incorporation in yeast is complete, the balance of tDNAs will be identical to that of the starting strain. Re-location of tDNAs is important because they are hotspots for transposition and genome rearrangements. As such, the synthetic chromosomes may well be quite resistant to such recombinational events relative to normal chromosomes, and should generally produce a more stable genome. We refer to the tDNA “neochromosome” as the “party chromosome” because we expect the tRNA genes to be unstable; however, this instability will be isolated from the synthetic chromosomes. This neochromosome, or chromosomal “tDNA block”, will grow as the synthesis project proceeds.
Introns
There are only about 250-300 introns in the native S. cerevisiae genome. Numerous individual introns have been precisely removed at the DNA level without profound effects on mitotic growth, although effects on meiosis have been noted in some cases. The major exception to this observation is that deletion of introns encoded within certain ribosomal protein genes has been shown to yield fitness defects. Thus, only those introns whose deletion is not associated with fitness defects will be removed; all ribosomal protein gene introns will be retained, at least for the time being. Certain introns contain snoRNAs that may be required for fast growth. In cases where introns are shown to encode snoRNAs or other important small RNA molecules, we will relocate these to non-intronic positions (most snoRNA genes are already non-intronic in S. cerevisiae). Large introns are excellent candidates for homes of yet-to-be-found classes of small RNAs in the yeast genome. The contribution of RNA molecules to regulation of biological complexity in eukaryotes may well have been vastly underestimated, and certainly recent studies of the RNA genomics of metazoans lend credence to this hypothesis. But to what extent does this apply to the relatively streamlined yeast genome? We will be providing evidence for or against this hypothesis depending on the outcome of our intron deletion experiments. If we can continue to delete introns with impunity and without obvious impact on fitness (for example, making an entirely intron-free chromosome), it will suggest that there are few essential RNAs embedded in introns. But if we find specific phenotypic effects of deleting specific introns, this will be a very exciting result, requiring follow-up. Such results would imply either a previously unknown and critical RNA product or a previously unknown essential role for splicing. These two possibilities can be distinguished by expressing the intronic sequences ectopically; if such constructs “complement” intron deletion phenotypes, it is strong presumptive evidence for an intron-encoded product.
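The precise removal of an intron at the DNA level can be pictured with a small sketch. It assumes the gene is given on its sense strand with intron coordinates already annotated (0-based, half-open), and it checks only the canonical GT...AG intron boundaries. The function name `remove_introns` and the toy sequence are hypothetical and are not taken from the project's design software.

```python
def remove_introns(gene, introns):
    """Return the gene sequence with the annotated introns excised.

    gene    -- genomic DNA of one gene, sense strand
    introns -- list of (start, end) pairs, 0-based half-open, non-overlapping
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        intron = gene[start:end]
        # Sanity check: spliceosomal introns normally begin with GT and end with AG.
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"interval {start}-{end} does not look like an intron")
        exons.append(gene[pos:start])  # keep the exon sequence before this intron
        pos = end                      # skip over the intron itself
    exons.append(gene[pos:])           # keep the final exon
    return "".join(exons)


# Toy gene: two short exons flanking a made-up 20 bp intron.
exon1, intron, exon2 = "ATGGCT", "GTATGTTTTACTAACTTTAG", "GCTTAA"
gene = exon1 + intron + exon2
intron_free = remove_introns(gene, [(len(exon1), len(exon1) + len(intron))])
print(intron_free)
```

The excised sequence is the same material one would express ectopically in the complementation test described above.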
Genes unnecessary for laboratory viability
There are genes that we are very confident are not necessary for viability in the lab, either singly or in combination. The “silent” cassettes are one example. Yeast chromosome III contains two “silent” cassettes encoding silent copies of “a” and “alpha” mating type information. They are present because wild yeast contains a gene, HO, which switches the mating type of haploid yeast cells using the silent cassettes as donors. Our laboratory strains already lack a functional HO gene and hence never interconvert. No synthetic lethality has ever been observed with mutations of HO or the silent cassettes, and previous studies have shown that the silent cassettes are dispensable for growth. Thus HO and the silent cassettes will be removed from our chromosome design. We welcome suggestions for other regions where there is a very high level of confidence that deletion will have essentially no phenotypic consequence.
Recoding Strategies
While it would in principle be possible to recode every reading frame in the genome entirely, we felt this was extremely risky and could have unpredictable results. The expression of certain genes is controlled by codon usage, and widespread elimination of this form of regulation might lead to swings in gene expression incompatible with viability. Also, some genes contain specific RNA sequences controlling gene expression through frameshifting and other mechanisms, and the extent of such regulation is not known. Hence, with the exception of PCRTag recoding, we consider it unwise to plan any additional, widespread recoding of ORFs. We are, however, replacing all TAG stop codons with TAA, which frees the TAG codon for subsequent introduction of new coding schemes with unnatural amino acids, as has been done by the Peter Schultz lab and others developing orthogonal translation systems.
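The TAG-to-TAA replacement is simple enough to sketch. The example below assumes ORFs are supplied as (start, end, strand) tuples in forward-strand coordinates (0-based, half-open). The function name `swap_tag_stops` and the toy chromosome are hypothetical. A stop codon on the reverse strand appears as CTA at the start of the feature on the forward strand, so it is rewritten to TTA. The sketch does not check whether the edited bases overlap another gene, which a real design would need to handle.

```python
def swap_tag_stops(chrom, orfs):
    """Replace terminal TAG stop codons with TAA for the annotated ORFs.

    chrom -- forward-strand chromosome sequence
    orfs  -- list of (start, end, strand), 0-based half-open, strand "+" or "-"
    """
    seq = list(chrom)
    for start, end, strand in orfs:
        if strand == "+" and chrom[end - 3:end] == "TAG":
            seq[end - 3:end] = "TAA"
        elif strand == "-" and chrom[start:start + 3] == "CTA":
            # Reverse complement of TAG is CTA; reverse complement of TAA is TTA.
            seq[start:start + 3] = "TTA"
    return "".join(seq)


# Toy chromosome: one forward ORF and one reverse-strand ORF, both ending in TAG.
forward_orf = "ATGAAATAG"
reverse_orf = "CTATTTCAT"  # reverse complement of ATGAAATAG
chrom = "CC" + forward_orf + "GG" + reverse_orf + "CC"
orfs = [(2, 11, "+"), (13, 22, "-")]
print(swap_tag_stops(chrom, orfs))
```

ORFs that already end in TAA or TGA are left unchanged, so the function can be applied to every annotated ORF without first filtering by stop codon.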