In recent years, humanity has created more data than in all previous history combined, and the pace shows no sign of slowing down. But where are we going to put it all?
Scientists keep increasing the capacity of hard drives to hold humanity’s information, and many believe this can continue indefinitely. Others suggest that these gains will eventually be outpaced by the exponential rate at which we generate data. In response to these concerns, scientists have been exploring an unusual solution: storing files, photos and documents in nature’s own information database, DNA.
DNA is dense enough to hold an unfathomable amount of data in extremely small spaces. After all, its double-helix strands carry the entire blueprint of our bodies while being packed inside cell nuclei just 10 micrometers wide. DNA is also naturally abundant and can withstand extremely harsh conditions on Earth; scientists can even recover genetic information from centuries-old DNA.
“Every day, several petabytes of data are generated on the Internet. A single gram of DNA would be enough to store this data. That’s how dense DNA is as a storage medium,” Kasra Tabatabaei, a researcher at the Beckman Institute for Advanced Science and Technology, said in a statement.
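How plausible is a petabyte-scale gram? A rough upper bound takes only a few lines of arithmetic. The Python sketch below assumes an average mass of about 330 g/mol per nucleotide in single-stranded DNA and 2 bits per nucleotide, and it ignores the redundancy, error correction and addressing that real systems need. These figures are assumptions for illustration, not numbers from the study.

```python
# Back-of-envelope upper bound on DNA storage density.
# Assumptions (not from the study): ~330 g/mol average mass per
# single-stranded nucleotide, 2 bits per nucleotide, and zero overhead
# for redundancy, error correction, or strand addressing.

AVOGADRO = 6.02214076e23           # molecules per mole
NUCLEOTIDE_MASS_G_PER_MOL = 330.0  # assumed average nucleotide mass
BITS_PER_NUCLEOTIDE = 2            # four letters -> log2(4) = 2 bits


def bytes_per_gram(bits_per_nt: float = BITS_PER_NUCLEOTIDE) -> float:
    """Idealized number of bytes storable in one gram of single-stranded DNA."""
    nucleotides_per_gram = AVOGADRO / NUCLEOTIDE_MASS_G_PER_MOL
    return nucleotides_per_gram * bits_per_nt / 8


if __name__ == "__main__":
    capacity = bytes_per_gram()
    print(f"about {capacity / 1e15:,.0f} petabytes per gram (idealized)")
```

Under these assumptions the bound lands in the hundreds of exabytes per gram, so "several petabytes" leaves ample headroom for the overhead that practical systems add.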
Tabatabaei is a co-author of a new study, published in last month’s edition of the journal Nano Letters, that could significantly advance the concept of DNA data storage. In essence, the team artificially expanded the DNA alphabet used to encode data, which could greatly increase how much information a given stretch of DNA can hold.
Before we dive into the details, here’s a quick recap of the biology.
DNA encodes genetic information with four nucleotides: adenine (A), guanine (G), cytosine (C), and thymine (T). In a sense, DNA has a four-letter alphabet, and different combinations of letters represent different pieces of information. With just these four letters, nature encodes the genetic information of every living organism, so in theory we should also be able to store a vast amount of digital data with them. What if we had a longer alphabet? Presumably, each letter could then carry more information, letting the same length of DNA hold more data.
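The payoff of a longer alphabet can be put into numbers. If every letter is equally likely and any sequence is allowed, a position drawn from an alphabet of N letters carries log2(N) bits. The sketch below compares four letters with an 11-letter alphabet; real encodings add biochemical constraints and error correction, so these are idealized upper bounds rather than figures from the study.

```python
import math


def bits_per_letter(alphabet_size: int) -> float:
    """Information per position, assuming all letters are equally likely
    and every sequence is allowed (no biochemical constraints)."""
    return math.log2(alphabet_size)


def distinct_sequences(alphabet_size: int, length: int) -> int:
    """Number of different strands of the given length."""
    return alphabet_size ** length


if __name__ == "__main__":
    for size in (4, 11):
        print(
            f"{size} letters: {bits_per_letter(size):.2f} bits per letter, "
            f"{distinct_sequences(size, 10):,} possible 10-letter strands"
        )
```

Four letters give exactly 2 bits per position; 11 letters give about 3.46 bits, roughly 1.7 times as much information per letter under these idealized assumptions.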
Following this line of thinking, the team behind the new study artificially added seven new letters to the DNA repertoire. “Imagine the English alphabet,” Tabatabaei said. “If you only had four letters to use, you could only create so many words. If you had the full alphabet, you could produce unlimited word combinations. It’s the same with DNA. Instead of converting zeros and ones to A, G, C, and T, we can convert zeros and ones to A, G, C, T, and the seven new letters of the storage alphabet.”
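Converting zeros and ones into letters is, at its simplest, a change of number base. The Python sketch below treats a message’s bytes as one large binary number and rewrites it in base 4 (A, C, G, T) or base 11. The seven extra symbols are hypothetical placeholders, since the article does not name the synthetic letters, and this is a toy scheme rather than the study’s encoding; a practical one would also need error correction and would avoid sequences that are hard to synthesize or read.

```python
NATURAL = "ACGT"
# Hypothetical placeholders for the seven synthetic letters (not their real names).
SYNTHETIC = "BDEFHIJ"
ALPHABET_4 = NATURAL
ALPHABET_11 = NATURAL + SYNTHETIC


def encode(data: bytes, alphabet: str) -> str:
    """Rewrite the bytes, read as one big binary number, in base len(alphabet)."""
    base = len(alphabet)
    # Prefix a 0x01 sentinel byte so leading zero bytes survive the round trip.
    n = int.from_bytes(b"\x01" + data, "big")
    letters = []
    while n > 0:
        n, digit = divmod(n, base)
        letters.append(alphabet[digit])
    return "".join(reversed(letters))  # most significant letter first


def decode(strand: str, alphabet: str) -> bytes:
    """Invert encode(): read the letters as base-N digits and recover the bytes."""
    base = len(alphabet)
    index = {letter: i for i, letter in enumerate(alphabet)}
    n = 0
    for letter in strand:
        n = n * base + index[letter]
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the 0x01 sentinel


if __name__ == "__main__":
    message = b"DNA"
    for alphabet in (ALPHABET_4, ALPHABET_11):
        strand = encode(message, alphabet)
        assert decode(strand, alphabet) == message
        print(f"{len(alphabet)} letters: {strand} ({len(strand)} letters)")
```

For the three-byte message b"DNA" (plus the one-byte sentinel), the four-letter version needs 13 letters while the 11-letter version needs only 8.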
To ensure that information encoded in these 11 letters can be retrieved on demand, the researchers also developed a new method for accurately reading data back from synthetic DNA. The system uses deep learning to distinguish the synthetic letters from the natural ones and to tell all 11 letters apart from one another.
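The article does not describe the network itself, so the sketch below swaps in a far simpler technique, a nearest-centroid classifier, purely to show the shape of the task: map a noisy measurement back to one of 11 letters. The per-letter reference signals, the noise levels and the placeholder letters are all hypothetical.

```python
import random

# Four natural letters plus seven hypothetical placeholders for the
# synthetic ones (the article does not name them).
ALPHABET_11 = "ACGT" + "BDEFHIJ"

# Hypothetical reference signal for each letter, in arbitrary units,
# spaced one unit apart.
CENTROIDS = {letter: float(i) for i, letter in enumerate(ALPHABET_11)}


def classify(signal: float) -> str:
    """Return the letter whose reference signal is closest to the reading."""
    return min(CENTROIDS, key=lambda letter: abs(CENTROIDS[letter] - signal))


def accuracy(noise: float, reads_per_letter: int, seed: int = 0) -> float:
    """Fraction of simulated noisy reads assigned to the correct letter."""
    rng = random.Random(seed)
    correct = 0
    for letter in ALPHABET_11:
        for _ in range(reads_per_letter):
            # Simulated reading: reference signal plus Gaussian noise.
            reading = CENTROIDS[letter] + rng.gauss(0.0, noise)
            correct += classify(reading) == letter
    return correct / (len(ALPHABET_11) * reads_per_letter)


if __name__ == "__main__":
    for noise in (0.1, 0.3, 0.5):
        print(f"noise {noise}: accuracy {accuracy(noise, 200):.3f}")
```

Accuracy falls as the noise grows relative to the spacing between reference signals, which is where a more expressive model, such as the deep learning system the researchers used, becomes worthwhile.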
Overall, the approach delivers a highly accurate reading of DNA letter combinations, recovering the information stored in them.
“We tried 77 different combinations of the 11 nucleotides, and our method was able to differentiate each of them perfectly,” Chao Pan, a graduate student at the University of Illinois at Urbana-Champaign and study co-author, said in a statement. Pan added that “the deep learning framework as part of our method for identifying different nucleotides is universal, allowing our approach to be generalized to many other applications.”
DNA is not the only innovative and promising way to preserve our collective data. A research team from Harvard University, for example, is working on encoding information with fluorescent dyes. Still, Tabatabaei remarked that “DNA is nature’s original data storage system. We can use it to store any type of data: images, video, music – anything.”