Biology is a messy, tangled, sloppy system built over a billion years under evolutionary pressure. There's no clean analogy to intelligently designed software, and any analogy you make, like DNA == source code, runs into some mechanism that destroys its predictive power for explaining biological phenomena. Take DNA: computer software doesn't create the machine it's executed on; code is 1D, while DNA is decidedly multi-dimensional, where its folding, epigenetic modifications, and other modifications matter a lot.
All the interesting biology for complex animals happens during the first few stages of development. There's no computational equivalent to this recursively constructive process. Additionally, biology has a single guiding principle through which we understand everything, evolution, and computer analogies really diminish that.
Therefore, biology is biology. It's not analogous to a von Neumann architecture machine, or any other computing device we've created. The first principles are simply different.
My career is in both fields, plus a PhD in one. Tortured analogies of biology as computers make me cringe; they're misleading at best. Sure, the ribosome superficially looks like an FSM, but that gets you basically nowhere.
Comp-sci people: if you're curious about biology, spend some quality time at Khan Academy or edX, watch some university intro-bio lectures on YouTube, read an intro-level undergrad biology textbook, etc.
All of this was just metaphors for non-experts, irrelevant contexts, and making money on pseudoscience books.
Nope. We don't really know yet what exactly the brain is "doing".
But again, in _some_ contexts it is better than nothing.
Unfortunately good metaphors stick, regardless of their relevance and accuracy :(
Whenever I think of how planets move around a star, I always think: "OK. So, I imagine that at some point the universe should know the distance between planet X and star Y, and also, probably, maybe, it should know their masses... but that data is defined nowhere (that we know of). So, is the data computed 'on the fly' every $minimal-unit-of-time?" Perhaps the universe doesn't need to know distances or masses, though.
I guess, to generalize, my point is that things are as they are as a consequence of the physics that they're subject to. A rock has its particular color, weight, and texture due to its elemental composition. A molecule doesn't need to know what it is in order to reflect or absorb certain wavelengths of light, that's what naturally occurs when those wavelengths interact with particles that have a certain composition and state. There's no conscious direction happening at any point, nor is there any data being computed.
The universe is essentially a medium that exists with certain properties and everything in it is stuff that also exists with certain properties. We call the manner in which these properties interact physics. So in summation I'd say that there are intrinsic properties and emergent phenomena based on those properties, that's where a rock gets its color, weight, and texture.
I don't like this analogy too much because it collapses "DNA" into a singular thing that's somehow related to classical computing, even though there's no indication that DNA and the multiple layers of systems interacting with it are limited in the same ways as classical computers.
I prefer to think of the DNA as a medium for memory, one of many mediums for memory.
Memory is everywhere, whether or not anyone or anything remembers it.
That one is beautiful.
This has gone on for several billion years, resulting in a largely stable model within which experiments continue to be run.
As with LLMs, DNA doesn't know anything about the data it is being trained on ("Nature"). And that model continues to change even as the experiments are run.
So DNA is a "blockchain" record of previous and current hypotheses on which traits enable an organism to live to viability. Some of these hypotheses are "dead code," as the environment no longer contains the pressure which made them critical. Some of them are essential to viability. Some of them are experiments whose value has not yet been determined.
Assuming your question is whether IaC could learn from patterns in DNA, I think that's a very interesting idea. Certainly we desire that every load balancer, database, and IAM policy be capable of self-defense, and be the hardiest, most fit version of itself possible.
Where the analogy struggles is that people writing IaC are more in the business of designing "natures" than they are designing individual organisms which would survive a chaotic and hostile "nature" being enforced on them. And people who write IaC might be unhappy to hear that getting to a "viable" database would require launching several thousand databases in an environment and, after some period of changes, seeing which one is performing best so they can clone that "best" database configuration when new databases are needed.
Referring to such deeply established foundation is normal -- I'm sure many of us remember when novel AI approaches like perceptrons and evolutionary algorithms were described by analogy to biology -- but skipping over the established literature to talk about nascent concepts makes you sound less like you're contributing to an educated discussion and more like you're courting angel investors with buzzwords.
At the very least it could be "here's the analogy to <established thing>, and to help round out the concept, <new thing> is also analogous to <established thing> with <important differences>". That makes both a stronger argument and a more educational essay than skipping straight to the new thing with no foundation.
For example, one very important difference is that DNA can be edited and lose history, unlike blockchain, but an intact fossil record could be used to infer how those edits came in over time and space, kind of like an incomplete distributed ledger.
That is a very good point.
It is true that editing blockchain completely destroys the chain, and editing DNA in very specific ways does not. I was thinking (when I wrote it) that because of the way the genes interact it can be completely destructive to take only a single piece of DNA. But, as you correctly point out, we do that all the time. So, yes, the blockchain analogy only partially fits.
Thank you.
1) DNA = source code on disk
2) RNA polymerase = disk read head
3) RNA = source code / functions loaded to memory
4) Ribosome = JIT compiler
5) Proteins = small, single-purpose executables (like Unix commands)
6) Proteins once outside the cell = execution
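The pipeline in that numbered analogy can be sketched in a few lines of code. The codon table below is a tiny hand-picked subset of the real genetic code, and the function names are my own; this is a toy illustration of the "read head" and "JIT compiler" steps, not a real bioinformatics library.

```python
# A tiny subset of the standard genetic code (RNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met",  # start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "UAA": None,   # stop codon
}

def transcribe(dna):
    """RNA polymerase as 'disk read head': copy DNA into RNA (T -> U)."""
    return dna.replace("T", "U")

def translate(rna):
    """Ribosome as 'JIT compiler': read RNA three letters at a time."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE.get(rna[i:i + 3])
        if aa is None:  # stop codon (or codon not in our toy table)
            break
        protein.append(aa)
    return "-".join(protein)

rna = transcribe("ATGTTTGGCTAA")   # -> "AUGUUUGGCUAA"
print(translate(rna))              # -> Met-Phe-Gly
```

Of course, this is exactly the kind of clean one-way pipeline that real cells refuse to follow, which is the point several commenters make below.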
If you think of the body as the hardware, then yes, there is some merit to thinking of DNA as infrastructure as code, operating system and application software.
While you could think of it as encoding infrastructure AND code (as in IaC), you'd need to go beyond that to include the hardware for computing AND physical function (like a whole car plus its computer) in that conception, which is not what IaC means.
The hardware side of DNA is easy to overlook since we don't yet have the necessary (CAD) design tools to easily understand the shape and mechanics of proteins just from reading a DNA sequence, the way we do for macroscale 3D models. But there are hard technological reasons for this.
DNA encodes information, but instead of binary bits organized into bytes (10010110), it uses four bases (A, T, C, G) organized into 3-letter codons, each of which represents one amino acid.
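The arithmetic of that encoding is easy to check yourself (this is my back-of-envelope comparison, not from any bio reference):

```python
import math

# Each base is one of 4 symbols, so it carries log2(4) = 2 bits;
# a byte (8 bits) would hold 4 bases.
bits_per_base = math.log2(4)

# A 3-letter codon has 4**3 = 64 possible values, which map redundantly
# onto ~20 amino acids plus stop signals -- the code is degenerate.
codons = 4 ** 3

print(bits_per_base, codons)  # 2.0 64
```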
The cell assembles chains of amino acids which are then placed in an "oven" where the string of molecules folds back on itself to assemble a complicated and functional 3D shape.
When we say complicated, we really do mean complicated. Even the fastest modern supercomputers are unable to determine the shape of these proteins from the DNA sequence alone. Further, we are unable to reliably simulate the way a folded protein will interact with other molecules.
Perhaps these kinds of problems will someday be solved easily by quantum computers, but for now we are stuck with approximations of questionable accuracy.
But there are very computer-code-like elements to how cells work. Unfortunately it is all spaghetti code. One section of DNA often codes for proteins which bind to one or more other sections of DNA, either increasing or decreasing production of the proteins from those locations.
Additionally, some DNA sections code not for protein but for RNA strings which are used mechanically by themselves or as part of protein complexes like CRISPR-Cas9. RNA is always created as an intermediate step between DNA and protein, but in this case it is used directly as functional RNA (fRNA). RNA can even fold on itself and act similarly to proteins, though it is much more fragile.
The many interactions between protein, DNA and RNA perform a kind of computation but it is very obfuscated.
The following are generalized interactions that take place in a cell (perhaps analogous to machine instructions) written in a kind of pseudocode, to help illustrate the recursive functions involved.
DNA + Protein = RNA;        // transcription
RNA + Protein = Protein;    // translation
Protein = Protein++;        // a protein upregulates another protein
Protein = Protein--;        // ...or downregulates it
Protein = RNA++;            // a protein upregulates an RNA
Protein = RNA--;
RNA = RNA++;                // an RNA upregulates another RNA
RNA = RNA--;
RNA = Protein++;            // an RNA upregulates a protein
RNA = Protein--;
Protein + RNA = DNA;        // reverse transcription
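One way to read that pseudocode is as update rules over species counts. Here is a deliberately crude sketch in that spirit; the rule set and numbers are invented for illustration (real regulatory networks are stochastic and vastly more tangled):

```python
# Toy model: each step, transcription makes RNA, translation makes protein,
# and a protein degrades RNA (the "Protein = RNA--" rule above).
state = {"DNA": 1, "RNA": 0, "Protein": 0}

def step(state):
    s = dict(state)
    if s["DNA"]:                     # DNA + Protein = RNA (transcription)
        s["RNA"] += 1
    if s["RNA"]:                     # RNA + Protein = Protein (translation)
        s["Protein"] += 1
    if s["Protein"] and s["RNA"]:    # Protein = RNA-- (RNA degradation)
        s["RNA"] -= 1
    return s

for _ in range(5):
    state = step(state)
print(state)  # {'DNA': 1, 'RNA': 0, 'Protein': 5}
```

Even in this three-rule toy, the output of one rule feeds the condition of another, which is exactly where the obfuscation the parent comment describes comes from.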
Any protein or fRNA can have multiple functions in a cell and affect the production of other proteins and fRNAs by interacting with DNA, RNA, or other proteins involved in the production chain. In addition, proteins and fRNAs also physically move other proteins and molecules around and make up the structure and machinery of a cell.
Untangling it all is close to impossible currently. There is several billion years worth of tech debt and zero documentation.
BTW: protein structure prediction didn't need supercomputers (in the traditional sense), and the PSP problem wasn't solved by supercomputers applying a high-quality physics function to simulate folding. Instead, it was solved using a combination of ML hardware, a really good algorithm (transformers), and a couple of really good data sets: the known structures of proteins, and the known relationships between proteins.
Instead of running a simulation on a huge supercomputer to predict a single structure, they trained a model which approximates structure well enough to beat every competitor. From what I can tell, most of the resulting quality doesn't come from their force field but from distance constraints that are mostly derived from historical relationships between proteins and the coevolution of their sequences.