Artificial Intelligence – Genetic origami pt. II

Deep Mind puts a branch of science out of business.

  • Deep Mind has crossed a milestone by demonstrating that it can predict protein structure as well as the painstaking, expensive and difficult current methods, potentially offering a big boost to medical and biological research worldwide.
  • In the latest protein structure competition (see here), Deep Mind’s algorithm both convincingly beat the competition and breached a key threshold at which predictions are considered competitive with the traditional experimental methods for determining protein structure.
  • Proteins are the principal product of the genes encoded in DNA and as such are responsible for the vast majority of the functions and features that result from an organism’s genetic make-up.
  • In software terms, DNA is the lines of code while the protein is the app itself, which underlines how crucial it is to understand protein structure.
  • Proteins are chains of amino acids which, after they have been synthesized from DNA, are folded by the electrostatic interactions between the amino acids to make the shape which enables their function.
  • Protein structure is crucial because it is, in effect, the execution of instructions that are encoded on the DNA strand.
  • However, because there are 20 different amino acids that can be used and because proteins can be many hundreds of amino acids long, the number of possible structures that can be formed from one chain is practically infinite.
  • Hence, calculating each possibility by brute force, as a regular computer would do, is impossible: Cyrus Levinthal calculated in 1969 that a protein with 100 amino acids has 3^198 (roughly 3 x 10^94) possible conformations (see here).
  • The traditional way to work out a protein’s structure is to make a very pure crystal of the protein and bombard it with x-rays, which are diffracted when they hit atoms, creating a pattern from which the structure can be deduced.
  • The most famous use of this technique was the discovery of the double-helix structure of DNA in the 1950s, which was predicted by Watson and Crick and proved using this technique by Rosalind Franklin and Maurice Wilkins.
  • Because there are so many possibilities from a sequence of DNA, an algorithm needs to be able to work out where to look and what possibilities to exclude without having to calculate them.
  • This kind of selective search is exactly how Deep Mind was able to create AlphaGo, and its typical structure of multiple neural networks is also present in its AlphaFold algorithms.
  • AlphaFold2 is similar to the original AlphaFold but it has multiple neural networks that interact with each other and iterate to arrive at a predicted structure.
  • In 2018, AlphaFold easily won the competition but it was still a long way adrift of a level of accuracy that could make it a viable method for determining protein structure.
  • This year (the contest is held every 2 years) AlphaFold2 again won easily and, more importantly, achieved a median GDT score of 92.4.
  • The Global Distance Test (GDT) measures, on a scale of 0 to 100, how closely the model’s predicted positions for the amino acid residues match their actual positions as already determined by experiment.
  • A score of 90 or more is considered to be on par with the tried and tested methods in use today for assessing protein structure.
  • Deep Mind has not yet submitted its data for peer review and publication, but assuming that all goes well (which I think it will), Deep Mind has taken this crucial scientific field into a new phase.
  • The system that it is using is not particularly power- or data-hungry and, in my opinion, represents a good use of deep learning.
  • This is because while the permutations of structures are almost infinite, the data set upon which they are based is both finite and stable.
  • Furthermore, the electrostatic and chemical properties of the individual amino acids are relatively well understood, meaning that the data set is also very well defined and almost perfectly labelled.
  • This makes protein structure an ideal use case for deep learning as it plays to all of its strengths and none of its weaknesses.
  • Deep Mind’s genius here is being able to create a system that uses deep learning but is also able to work out which options to examine and which to ignore, which is what I suspect lies behind its excellent results.
  • This will make it practical to estimate the structure of far more proteins, which in turn should accelerate research as well as drug development.
  • Deep Mind may at last be justifying the $500m that Google paid for it as well as the hundreds of millions that have been subsequently spent.
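
As a rough illustration of the Levinthal calculation mentioned above, the short Python sketch below reproduces the arithmetic. It assumes the usual simplification behind the figure: a chain of 100 amino acids has 99 peptide bonds, each bond has two rotatable angles and each angle can settle into one of three stable positions. The function names, the sampling rate of 10^13 structures per second and the approximate age of the universe are illustrative assumptions and do not come from Deep Mind.

```python
# Sketch of Levinthal's counting argument (illustrative assumptions only).

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate, for comparison only


def levinthal_conformations(residues: int, states_per_angle: int = 3) -> int:
    """Count conformations if every backbone angle had a few stable states."""
    if residues < 2:
        raise ValueError("a chain needs at least two residues")
    peptide_bonds = residues - 1
    torsion_angles = 2 * peptide_bonds  # two rotatable angles per bond
    return states_per_angle ** torsion_angles


def years_to_enumerate(conformations: int, samples_per_second: float = 1e13) -> float:
    """Years needed to try every conformation at an assumed sampling rate."""
    return conformations / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    total = levinthal_conformations(100)
    years = years_to_enumerate(total)
    print(f"{total:.2e} conformations")
    print(f"{years:.2e} years to enumerate at 1e13 per second")
    print("longer than the age of the universe:", years > AGE_OF_UNIVERSE_YEARS)
```

Even with a far more generous sampling rate, the answer remains many orders of magnitude beyond the age of the universe, which is why an algorithm has to decide where to look rather than enumerate every option.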

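For readers who want to see how a GDT-style score is put together, the sketch below is a simplified version. It assumes the commonly used form of the score: the percentage of residues whose predicted position lies within 1, 2, 4 and 8 angstroms of the experimental position, averaged over the four cut-offs. It also assumes the two structures have already been superimposed, whereas the real test searches for the best superposition, so this is an approximation and not the contest's scoring code. The function name and the toy coordinates are hypothetical.

```python
# Simplified GDT-style score (assumes structures are already superimposed).
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_score(predicted: Sequence[Point],
              experimental: Sequence[Point],
              cutoffs: Sequence[float] = CUTOFFS_ANGSTROM) -> float:
    """Average percentage of residues within each distance cut-off."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty structures of equal length")
    # Distance between each predicted residue and its experimental position.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues that fall within each cut-off.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Toy example: four residues, with errors of 0.5, 1.5, 3.0 and 9.0 angstroms.
    experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    predicted = [(0.0, 0.5, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), (3.0, 9.0, 0.0)]
    print(gdt_score(predicted, experimental))
```

A perfect prediction scores 100, and a single badly misplaced residue lowers the score only in proportion to its share of the chain, which is part of why the measure is used to compare models across proteins of different lengths.
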
RICHARD WINDSOR

Richard is the founder and owner of the research company Radio Free Mobile. He has 16 years of experience working in sell-side equity research. During his 11-year tenure at Nomura Securities, he focused on the equity coverage of the Global Technology sector.