This is a really creative idea! Love the use of agents to generate ideas for ESM3 to go validate.
I'm curious - when a Eureka moment is stored in the Biofoundry and selected by the user for simulation, are you using the initial RAG to identify relevant gene ontology labels in addition to the domains? For example, building off the woolly mammoth use case, you could condition the protein generation on the gene ontology label GO:0009631, which describes "any process that increases freezing tolerance of an organism in response to low, nonfreezing temperatures".
Thanks Chris! I'll swap with O3-mini when it gets released. It's kind of amazing how creative and non-obvious some of these reasoning models are with idea generation.
I haven't used GO labels yet. For the RAG system, the Pfam dataset comes annotated with pretty robust descriptions + the protein sequences. I used an LLM to summarize and revise them for clarity. Agents use that info when picking a domain and in their decision making. Do you think GO labels would be redundant in this case, or do you see them as more useful for conditioning the functional track in ESM3? Open to testing it out!
It could definitely help! If you want a global property for the protein (such as cold resistance in the woolly mammoth example), then I think the functional annotations will likely do a good job of ensuring that newly generated pieces of the protein conform to that global property. The input domain probably already does this a bit implicitly, but I imagine the functional annotations will make it more robust.