
‘Dark DNA’ could change how we think about evolution

Image: Pixabay

DNA sequencing technology is helping scientists unravel questions that humans have been asking about animals for centuries. By mapping out animal genomes, we now have a better idea of how the giraffe got its huge neck and why snakes are so long. Genome sequencing allows us to compare and contrast the DNA of different animals and work out how they evolved in their own unique ways.

But in some cases we’re faced with a mystery. Some animal genomes seem to be missing certain genes, ones that appear in other similar species and must be present to keep the animals alive. These apparently missing genes have been dubbed “dark DNA”. And its existence could change the way we think about evolution.

My colleagues and I first encountered this phenomenon when sequencing the genome of the sand rat (Psammomys obesus), a species of gerbil that lives in deserts. In particular we wanted to study the gerbil’s genes related to the production of insulin, to understand why this animal is particularly susceptible to type 2 diabetes.

But when we looked for a gene called Pdx1 that controls the secretion of insulin, we found it was missing, as were 87 other genes surrounding it. Some of these missing genes, including Pdx1, are essential and without them an animal cannot survive. So where are they?

The first clue was that, in several of the sand rat’s body tissues, we found the chemical products that the instructions from the “missing” genes would create. This would only be possible if the genes were present somewhere in the genome, indicating that they weren’t really missing but just hidden.

Fat sand rat (Psammomys obesus). Credit: MinoZig/WikiCommons

The DNA sequences of these genes are very rich in G and C molecules, two of the four “base” molecules that make up DNA. We know GC-rich sequences cause problems for certain DNA-sequencing technologies. This makes it more likely that the genes we were looking for were hard to detect rather than missing. For this reason, we call the hidden sequence “dark DNA” as a reference to dark matter, the stuff that we think makes up about 25% of the universe but that we can’t actually detect.
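To make "GC-rich" concrete, here is a minimal Python sketch of how GC content is typically measured: the fraction of bases in a stretch of DNA that are G or C, computed window by window along a sequence. This is an illustration only, not the method used in the sand rat study. The function names, the toy sequence and the 0.65 threshold are all assumptions chosen for the example.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for each full window of `window` bases."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Made-up toy sequence: AT-rich flanks around a GC-rich core.
toy = "ATATATTAAT" + "GCGGCCGCGC" + "TTAATATAAT"

# Hypothetical cut-off for calling a window "GC-rich"; not taken from the article.
GC_RICH_THRESHOLD = 0.65

for start, gc in windowed_gc(toy, window=10, step=10):
    label = "GC-rich" if gc >= GC_RICH_THRESHOLD else ""
    print(f"{start:>3}  {gc:.2f}  {label}")
```

In a real genome the windows would span thousands of bases, and a region whose GC fraction sits well above the genome-wide average is the kind of stretch that some sequencing methods struggle to read.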

By studying the sand rat genome further, we found that one part of it in particular had many more mutations than are found in other rodent genomes. All the genes within this mutation hotspot now have very GC-rich DNA, and have mutated to such a degree that they are hard to detect using standard methods. Excessive mutation will often stop a gene from working, yet somehow the sand rat’s genes manage to still fulfil their roles despite radical change to the DNA sequence. This is a very difficult task for genes. It’s like winning Countdown using only vowels.

This kind of dark DNA has previously been found in birds. Scientists have found that 274 genes are “missing” from currently sequenced bird genomes. These include the gene for leptin (a hormone that regulates energy balance), which scientists have been unable to find for many years. Once again, these genes have a very high GC content and their products are found in the birds’ body tissues, even though the genes appear to be missing from the genome sequences.

Shedding light on dark DNA

Most textbook definitions of evolution state that it occurs in two stages: mutation followed by natural selection. DNA mutation is a common and continuous process, and occurs completely at random. Natural selection then acts to determine whether mutations are kept and passed on or not, usually depending on whether they result in higher reproductive success. In short, mutation creates the variation in an organism’s DNA, natural selection decides whether it stays or if it goes, and so biases the direction of evolution.

But hotspots of high mutation within a genome mean genes in certain locations have a higher chance of mutating than others. This means that such hotspots could be an underappreciated mechanism that could also bias the direction of evolution, meaning natural selection may not be the sole driving force.

So far, dark DNA seems to be present in two very diverse and distinct types of animal. But it’s still not clear how widespread it could be. Could all animal genomes contain dark DNA and, if not, what makes gerbils and birds so unique? The most exciting puzzle to solve will be working out what effect dark DNA has had on animal evolution.

In the example of the sand rat, the mutation hotspot may have made the animal’s adaptation to desert life possible. But on the other hand, the mutations may have occurred so quickly that natural selection hasn’t been able to act fast enough to remove anything detrimental in the DNA. If true, this would mean that the detrimental mutations could prevent the sand rat from surviving outside its current desert environment.

The discovery of such a weird phenomenon certainly raises questions about how genomes evolve, and what could have been missed from existing genome sequencing projects. Perhaps we need to go back and take a closer look.

Adam Hargreaves, Postdoctoral Research Fellow, University of Oxford

This article was originally published on The Conversation. Read the original article.

Leave a Reply

Susan Irving Murley
Excellent research with more info related to genomes than I have ever read (since I have only read 1 study re: the genome, re: effects of lack of food in future generations). This study is so important and amazing. I would definitely love to learn more about it. Thank you Adam Hargreaves, et al. I love this site, the one that was on my FB page, because it spoon-feeds me important new scientific info that stimulates my brain so well. I have never been on your site; maybe that site is more up my alley of interest, as I love to learn about the brain, DNA, genomes, etc., but I am a lazy learner, wanting to be spoon-fed. Please share more, but I only have time for real scientific info. I give no permission to give my personal information below to the public.

Interesting, but your definition of evolution isn’t great. You omit the contributions due to drift. In small populations drift can occur quickly. Don’t give your audience short shrift.

Frank Abernathy

Dark, hard to find GC rich DNA? Could this be due to a technology issue because of the way DNA is purified and chopped up for sequencing?

Frank Abernathy

Did you use conventional phenol extraction or some other procedure?


Interesting findings and speculation. GC-rich regions are more prone to methylation and subsequent chromatin formation, correct? Perhaps lower expression of these critical metabolic genes is the cause of your poor rodents’ type II diabetes? Poor guys…

Rod Sprague

“This means that such hotspots could be an underappreciated mechanism that could also bias the direction of evolution, meaning natural selection may not be the sole driving force.” I disagree. Dark DNA has been a factor in an organism’s evolution; the ability to deal with or do better with the effects of dark DNA is a trait that was selected for. I was taking a class when the ability of some organisms to pass information to subsequent generations was being noticed. A test question asked if this was Lamarckian or Darwinian evolution. I argued it was Darwinian because the ability to pass on this information was useful to the ancestors of the organisms in question, so it would clearly be Darwinian, as the early stages of this ability would have been selected for, too.


Thank you sir for all the explanation on Dark DNA. I will study hard like you.


No great mystery here: this is a sequencing technology issue. PCR does not amplify GC-rich DNA very well because it has a higher “melting point”. Illumina sequences PCR product; if you’re not amplifying it, you’re not sequencing it. This is not “dark DNA” with mysterious evolutionary implications; you’re simply observing gaps in your genome assembly that are due to amplification bias. Plenty of organisms have GC-rich genomes, especially extremophiles (GC-rich DNA tolerates higher temps, probably not a coincidence). And I would guess the vast majority of organisms have at least some regions of high GC content. This is commonly known, and sequencing bias is well documented. Not sure what’s new here… other than gratuitous use of “Dark DNA” to make it sound more interesting.