Everybody seems to know that DNA, genes, and genomes are made up of the four letters A, T, G, and C. But how many of these are there in a genome? Do different species have different genome sizes? What is the situation in bryophytes? And, how much DNA is there in a whole bryophyte plant? You will find the answers below.
Some DNA background
The genome is the store of the hereditary information. Most cells of living organisms contain genomes. Within any one individual, all cells contain identical genome copies. Genomes consist of DNA, which is a long polymer molecule containing a backbone strand formed of alternating sugar and phosphate molecules. Attached to each sugar molecule is one of the four so-called organic bases (which are better known by their initial letters): adenine, thymine, guanine, and cytosine. Their order determines the information stored on the DNA. These organic bases have the handy property that A pairs well with T and G pairs well with C. This allows the formation of DNA double strands where one strand is complementary to the other, which is great for data integrity. Both strands contain redundant information. If one is damaged, it can easily be repaired using the other strand as a template. Furthermore, when DNA needs to be copied (before a cell splits in two), the strands are separated, and each is used as a template to generate a new complementary strand. This results in two double strands, each comprising an old strand and a newly synthesised one. In animals, fungi, and plants, the genome’s DNA is organised in chromosomes. Today, this term is generally used for a part of a genome that is located on a separate DNA double strand.
How to measure genome sizes
We are very much in the age of whole-genome sequencing, where one can “read” all of the bases of a genome. This is why the amount of DNA in a genome is often measured in base pairs (bp, the number of bases paired in a DNA double strand). Genomes contain a lot of these. To save one typing a lot of zeroes, SI prefixes are commonly used (kbp = 1000 bp, Mbp = 1,000,000 bp, Gbp = 1,000,000,000 bp). For instance, the human genome size is about 3.3 Gpb.
Other than by sequencing whole genomes and inspecting the results, genome sizes can also be measured by a technology called flow cytometry. This is faster, cheaper, and (arguably) more accurate than whole genome sequencing. How does it work? The cells of a specimen’s tissue are broken up in order to release their nuclei (DNA), which are stained with a fluorescent dye. Subsequently, the stained nuclei are sucked through a capillary, a laser beam lights up each one passing through, and a camera detects the resulting signals. The larger the genome, the more dye it takes up, and the stronger the fluorescent signal. Comparing to a reference of known genome size, the genome mass of a species can be determined. The unit of measurement is the pico gram (pg, one millionth of one millionth of a gram), and the mass of a genome is also called that genome’s C-value. In terms of base count, one pg corresponds to slightly less than 1 Gbp.
It is obvious that genome size measurements involve bewildering orders of magnitude, which are not often relevant in every-day life. For comparison, there are about 700,000 letters in the book Oliver Twisst. So, 1 Gbp is about 1400 times the length of Charles Dickens’ famous novel. However, this amount of DNA weighs only about 1 pg, a mass so light that is could not be measured, even with the best scales. On Wikipedia, it is said that 1 pg is about the weight of one E. coli bacterium. Another astonishing fact is that the amount of DNA in a human genome, 3.3 Gbp, if stretched out, would be two to three metres in length. The diameter of most human cells is only one hundred thousandth of this, so DNA clearly has to be packed very tightly!
The human genome size is 3.5 pg (3.3 Gbp), but how do other species compare? Because most genome size estimates were determined by flow cytometry, I will use the unit pg from now on. Again, 1 pg corresponds to about 1,000,000,000 bp. In terms of genome size, the moss Ceratodon purpureus has 0.39 pg, the liverwort Pellia epiphylla has 4.2 pg, and the hornwort Anthoceros punctatus has 0.18 pg. It is obvious that these genome sizes vary a lot. Our three bryophyte examples differ by more than a factor of 20. Checking the Plant C-value Database, the smallest bryophyte genome size reported is 0.16 pg (in the hornwort Leiosporoceros dussii), and the largest is 20.46 pg (in Phyllothallia fuegiana, a liverwort), differing by a factor of almost 130. Flowering plants (angiosperms), of which there are many more genome sizes known, cover an even larger range of genome sizes. The finding that genome sizes vary immensely between species, but without obvious correspondence to organismal complexity, is known as the C-value paradox. Still, genome sizes tend to be similar between related species. In other words, species that share most of their evolutionary history are likely to have similar genome sizes (as they are likely to be similar in other aspects). If more distantly related groups are compared, it is more likely to see large genome sizes differences. For instance, conifers tend to have larger genome sizes than flowering plants, and liverworts tend to have larger genomes than hornworts.
Mechanisms of genome size change
The most drastic way of genome size change is whole-genome duplication. This phenomenon is commonly observed in flowering plants, but it is rather rare in animals. Whole-genome duplication causes an increase in ploidy, which is the number of genomes contained in each of a specie’s cells. Many animals and plants consist of cells that contain two genomes, and we humans are no exception. We inherit one genome each from our mother and father, and species like us are called diploid. Species with higher ploidy levels are generally called polyploid. Whole-genome duplication is often observed following hybridisation in flowering plants, and it can cause new species to originate. This is observed in bryophytes, too, a famous example being the liverwort genus Pellia, where polyploid P. borealis is thought to have originated through doubling the genome of a hybrid between two diverged forms of P. epiphylla.
If, shortly after a whole-genome duplication event, another one happens, then an organism’s ploidy level can rise even higher. Furthermore, if hybridisation happens between organisms of different ploidy levels, it may lead to intermediate and uneven levels. This can lead to a ladder-like distribution of genome sizes in a genus or even within the same species as it is seen, for example, in harebells (Campanula rotundifolia). Judging from the chromosome counts reported in Smith’s British moss flora, this phenomenon is not uncommon in mosses.
Whole-genome duplication is a drastic way of genome size increase. A more subtle way is the duplication of DNA on a more local scale. This can lead to the duplication of genes, creating gene families, and it is often seen that the size of gene families differs between pairs of related species, providing evidence for gene duplication in one lineage (or gene loss in the other, or both). One specific type of DNAs contributing to genome size variation are transposable elements (TEs). These are often called selfish genes and they do not have a function benefitting the organism in which they reside. Instead, they use the cellular machinery to create additional copies of themselves. In doing so, they are like viruses, and some TEs are in fact related to viruses. TEs litter many genomes including the human one, 10% of which is made up by one single type of TE, the 300-bp Alu element.
Finally, genomes may shrink by losing DNA. This is may happen by unequal crossing-over, where genetic recombination happens between identical DNA sequences which are not at corresponding chromosomal positions (like TEs). It is also not unheard of that whole chromosomes are lost, in particular in polyploids, which contain redundant genetic information.
Genome size differences – who has got more junk?
Genome size differences between species are very common, and they can be phenomenal. These differences generally do not reflect a difference in the number of genes. Instead, genomes tend to contain large amounts of repetitive “junk” DNA. Even the moderately-sized genome of Physcomitrella patens is made up to 57% of TEs. While polyploidisation is a drastic way of genome size increase and is associated with large genome size, there are some diploids among the flowering plant species with the largest genome sizes. These are the showy fritillaries (the sister group of the lilies). The study of genome size and its evolutionary implications is a fascinating topic. Why do groups of plants differ so much in genome size? Does natural selection act on genome size? It has been proposed that there might be some genome size threshold that, once passed, does not allow subsequent genome size reduction. How is epigenetics involved in this?
Large genomes are beneficial to microscopic studies of chromosomes, because large genomes tend to consist of large chromosomes, which are easier to prepare and study. This is why many model systems of cytogenetics have very large genomes (cytogenetics is the branch of genetics concerned with the study of chromosomes and cell division). These model systems include: the broad been Vicia faba (13.3 pg); the onion Allium cepa (17.9 pg); grasshoppers (including locusts) like Locusta migratoria (5.3-6.4 pg) and Podisma pedestris (largest known insect genome, 16.9 pg); and, of course, the liverworts of the genus Pellia (Pellia borealis, 7.4 pg).
Unfortunately, large genome size is an obstacle for genomic studies, because it makes the generation of whole genome sequencing data expensive, and so it is no surprising to see that early-established genomics model systems have small genomes like the fruit fly (Drosophila melanogaster, 0.16 pg) and the thale cress (Arabidopsis thaliana, 0.16 pg, too). There is hope for larger genomes – prices for DNA sequencing keep dropping and new technologies emerge, allowing even higher through-put. Still, the smaller a genome, the more individuals can be analysed for the same amount of money.
How much DNA is there in a bryophyte plant?
In genomics studies, it is often desirable to sequence DNA from one individual (rather than pooled DNA from several individuals). The minimum amount of DNA for whole-genome sequencing requested by a sequencing centre (for a “PCR-based Illumina libary”) is about 200 ng. This does not sound much, and it is not something that people working on large, long-lived species usually worry about. But bryophytes are often small, and the question of how much DNA there is in a single individual, becomes relevant. I did not even have a vague idea of this quantity. So, I decided to find out.
I went to a local forested area and collected some shoots of the moss Mnium hornum. Because one cannot know whether different shoots are genetically identical or not, I decided to estimate the amount of DNA in one singe shoot. According to the Plant C-value Database, each cell of M. hornum contains 0.88 pg of DNA. How many cells are there in a shoot? To get an idea of this, I started by looking at an individual leaf. I took some microscopic photos, which I then stitched together.
Then, I used image recognition software to count the number of small “components” in the composite image. The image below shows each component in a different colour. There are 5173 of them. Not all components detected correspond to actual cells (some lie outside the leaf area). So, this count includes some additional “noise”. But this is more than compensated for by the fact that many cells of the leaf were not detected (black regions within the leaf area).
Per leaf, we have 0.88 pg × 5137 cells = 4.5 ng of DNA. I counted about 30 leaves on a single shoot of M. hornum, which takes us to 135 ng overall. This is likely an underestimate. After all, I did not consider the shoot’s stem and rhizoids in my cell count. Also, the image recognition counted fewer cells than there actually were in the leaf (it missed the leaf’s nerve, margin, and many cells of the leaf base). Still, even if the total DNA content of a shoot was twice my estimate, one would still struggle to extract enough DNA for whole-genome sequencing. This is because the efficiency of DNA extraction can be quite low. One never gets all the DNA that there is in one plant. Sad news. What can be done to overcome this problem?
M. hornum is one of the larger acrocarpous mosses and its genome size is not small for a moss. This means that the DNA content of many species is probably too low for extraction and sequencing from a single shoot, and this is probably true for many leafy liverworts, too. One way to avoid this problem might be to focus on pleurocarpous mosses, which tend to have larger plants and thus more cells. But pleurocaps are often very tangled, and it may be hard to decide what is the same plant and what a different one. Another way is to take bryophytes into cultivation. This should be easy and relatively fast for species with vegetative propagules (tubers, gemmae, bulbils, or breaking-off tips), where each new-grown plant is genetically identical to its parent (a clone). Alternatively, if a bryophyte is grown from a single spore, the developing protonema may give rise to multiple shoots. Being genetically identical, their DNA can then be pooled. Where this is not possible, be it because there is no time, or because cultivation fails, technology may help. It is possible to carry out whole-genome sequencing with DNA from as little as one single cell. This is possible by amplifying the cell’s DNA before sequencing. Single-cell sequencing is not the preferred way of whole-genome data generation: It is more expensive and the standard approach, there is more opportunity for contamination, the DNA amplification step tends to introduce biases in the representation of different genome regions, and some parts of the genome are usually missing altogether. Still, this may be better than no data at all.
Genome sizes can be measured by counting the number of base pairs (bp) they contain, or by determining the weight of a genome in pico gram (pg). One pg corresponds to approximately 1,000,000,000 bp. The smallest (known) bryophyte genomes are somewhat over one hundred million bp in size. The largest ones are more than one hundred times that. As far as we can tell, bryophytes tend to have smaller average genome sizes than flowering plants. However, there are some bryophytes with large genomes, including the genus Pellia. Genome sizes may increase by whole-genome duplication (leading to polyploidy) or by local DNA duplication, and the major part of most genomes consists of repetitive junk DNA. Finally, I estimated the amount of DNA present in one shoot of the moss Mnium hornum to 135 ng, which is less than what is usually required for whole-genome sequencing. This issue may be overcome by cultivating genetically identical material for DNA extraction or by whole-genome amplification before sequencing.
- I found this blogpost on genome size variation in harebells very interesting. They have several different ploidy levels, ranging from diploid to hexaploid (6x).
- To check the genome size of plants in your favourite group, have a look at the Plant C-value Database, which by the way as an interesting introduction page.
- For people interested in genomics publications, here is a link to the (open access) paper on the Physcomitrella patens genome.