While much of the AI world debates whether GPT-4o beats Claude 3.5 Sonnet, EvolutionaryScale, an AI research lab founded by former Meta engineers who ran the company’s now-disbanded protein-folding team, is working in a completely different domain: making biology programmable.
The task sounds complicated, but the year-old company is already making waves. Today, it announced the launch of ESM3, a natively multimodal, generative language model that can follow prompts and design novel proteins. In tests, the model generated a novel green fluorescent protein (esmGFP) that, by the company’s estimate, would have taken hundreds of millions of years to evolve naturally.
“esmGFP…has a sequence that is only 58% similar to the closest known fluorescent protein. From the rate of diversification of GFPs found in nature, we estimate that this generation of a new fluorescent protein is equivalent to simulating over 500 million years of evolution,” the company wrote in a pre-print paper posted on its website on Tuesday.
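To make the "58% similar" figure concrete, here is a minimal sketch of how percent identity between two already-aligned protein sequences can be computed. The function name, the toy sequences and the choice of denominator (every column where at least one sequence has a residue) are assumptions for illustration; the pre-print's exact alignment method and similarity definition are not described in this article.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared alignment columns where both residues match.

    Assumes the two sequences are already aligned to the same length,
    with '-' marking a gap. Columns where both are gaps are skipped;
    columns with a gap on one side count as mismatches (one common
    convention, chosen here for illustration).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # nothing to compare in this column
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy, made-up fragments (not real GFP sequences).
identity = percent_identity("MSKGEEL", "MSKGAEL")
gapped = percent_identity("AC-E", "ACDE")
```

Under this convention, a value of 58% would mean that a little over four in ten aligned positions differ from the closest known fluorescent protein.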
In addition to the new model, which comes in three sizes, the startup announced it has raised $142 million in a seed round of funding, led by Nat Friedman, Daniel Gross and Lux Capital. AWS and Nvidia’s venture capital arm also participated in the round. The smallest model has also been open-sourced to accelerate research with the new models.
However, building the model is just the start, and it remains to be seen how much impact it will have in the real world.
Why EvolutionaryScale is targeting biology with AI
While generative AI models have evolved a lot, especially in understanding and reasoning with human language, many have wondered if we can train these models to decipher the core language of life and then use them to develop novel molecules. The core molecules of life (RNA, proteins and DNA) evolved over the last 3.5 billion years through natural chemical reactions. So, having a way to program biology and design new molecules could pave the way to solving some of the biggest challenges faced by humanity, including climate change, plastic pollution and conditions like cancer.
Multiple organizations, including Google DeepMind and Isomorphic Labs, are already in this space, and the latest one to join the fray is EvolutionaryScale. The company, founded in 2023, developed a few protein language models over the last few months, but its latest offering, ESM3, is the largest of all, and it is natively multimodal and generative.
Described as a frontier generative model for biology, ESM3 was trained with 1 trillion teraflops of compute (roughly 10^24 floating-point operations) on 2.78 billion natural proteins sampled from various organisms and biomes, amounting to 771 billion unique tokens. It can jointly reason across three fundamental biological properties of proteins: sequence, structure and function. These three data modalities are represented as tracks of discrete tokens at the input and output of ESM3. As a result, the user can present the model with a combination of partial inputs across the tracks, and the model will provide output predictions for all the tracks, generating novel proteins.
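The idea of prompting with partial inputs across tracks can be pictured with a small data-structure sketch. This is not EvolutionaryScale's API: the class `ProteinPrompt`, the function `complete`, the token values and the stand-in predictor are all hypothetical, and a real model would predict the masked tokens jointly rather than one at a time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

TRACKS = ("sequence", "structure", "function")


@dataclass
class ProteinPrompt:
    """One discrete token per position on each track; None marks a masked token."""
    sequence: List[Optional[str]]
    structure: List[Optional[str]]
    function: List[Optional[str]]

    def __post_init__(self) -> None:
        # All three tracks describe the same positions, so lengths must agree.
        lengths = {len(getattr(self, track)) for track in TRACKS}
        if len(lengths) != 1:
            raise ValueError("all tracks must cover the same positions")

    def masked_positions(self) -> Dict[str, List[int]]:
        """Positions on each track that the user left unspecified."""
        return {
            track: [i for i, tok in enumerate(getattr(self, track)) if tok is None]
            for track in TRACKS
        }


def complete(prompt: ProteinPrompt,
             predict: Callable[[str, int], str]) -> ProteinPrompt:
    """Fill every masked token using `predict`, keeping user-supplied tokens."""
    filled = {
        track: [tok if tok is not None else predict(track, i)
                for i, tok in enumerate(getattr(prompt, track))]
        for track in TRACKS
    }
    return ProteinPrompt(**filled)


def placeholder_predict(track: str, position: int) -> str:
    # Hypothetical stand-in for a model; returns a labeled dummy token.
    return f"<{track}:{position}>"


# A partial prompt: some sequence, one structure token, one function tag.
prompt = ProteinPrompt(
    sequence=["M", None, "K", None],
    structure=[None, None, "s17", None],
    function=[None, None, None, "hydrolase"],
)
completed = complete(prompt, placeholder_predict)
```

The point of the sketch is only the shape of the interaction: whatever the user specifies on any track is kept, and every remaining position on all three tracks is filled in by the model.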
“ESM3’s multimodal reasoning power enables scientists to generate new proteins with an unprecedented degree of control. For example, the model can be prompted to combine structure, sequence and function to propose a potential scaffold for the active site of PETase, an enzyme that degrades polyethylene terephthalate (PET), a target of interest to protein engineers for breaking down plastic waste,” the company explained.
In one case, the company used the model with chain-of-thought prompting to design a novel version of green fluorescent protein, a rare protein that can attach to and mark another protein with its fluorescence, enabling scientists to see the presence of the particular protein in a cell. EvolutionaryScale found that the generated protein has brightness comparable to that of natural fluorescent proteins. By the company’s estimate, it would have taken nature more than 500 million years to evolve such a protein.
The team also noted that ESM3 can self-improve, providing feedback on the quality of its generations. Feedback from lab experiments or existing experimental data can also be applied to align its generations with goals.
Impact remains to be seen
As of now, ESM3 is available in three sizes: small, medium and large. The smallest one, with 1.4B parameters, has been open-sourced with weights and code on GitHub under a non-commercial license. Meanwhile, the medium and large versions, going up to 98B parameters, are available for commercial use by companies through EvolutionaryScale’s API and platforms from partners Nvidia and AWS.
EvolutionaryScale hopes researchers will be able to use the technology to solve some of the world’s biggest problems and benefit human health and society. However, how broadly companies will apply it remains to be seen. The biggest potential beneficiaries of the technology could be pharmaceutical companies, which could use it to lead the development of novel medicines targeting life-threatening conditions.
Previous models from the company have been used for tasks such as improving therapeutically relevant characteristics of antibodies and detecting COVID-19 variants that could pose a major risk to public health.