# Past Projects
# PreScribe
Demo
An AI medical scribe that proactively checks off medical requirements in real time as the conversation progresses.
Links:
LLM-based Gene Perturbation Simulator
Put in a target gene, what you did to the gene, and what phenotype you're looking for, and simulate your perturbation on the rest of the gene network! pic.twitter.com/zQ1rx09nUl
Tweet by Daniel George (@degtrdg), March 17, 2024
Perceptual filter based on what's important to you!
Made this w/ @ahadj0 on having an LLM perceive what you will see and only show it to you if it's worth your time.
You define what's important to you and what's not. It reasons whether a tweet in your feed should show up! 1/n pic.twitter.com/02GdxyBmao
Tweet by Daniel George (@degtrdg), January 19, 2024
# Whisper Notes
Demo
A simple voice-to-text clipboard app for quickly recording your thoughts. Use global commands to record and copy the transcript to your clipboard. View and delete previous transcripts.
Links:
# BioConceptVecXplorer
Demo
Knowledge is continuous, but its representation in papers is discrete. We can instead represent concepts as vectors and explore a more fluid latent space. Using vector embeddings trained on 30 million PubMed abstracts, we built a tool that lets researchers form biological analogies and discover relationships not explicitly stated in the literature. This approach was previously used in materials science to discover an antiferromagnetic material not explicitly in the literature. We extended the idea into a tool that enables bioengineers to make similar discoveries.
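To make the analogy idea concrete, here is a minimal sketch of vector arithmetic over embeddings ("a is to b as c is to ?"). The term names and 3-D vectors are invented for illustration, and the `analogy` helper is hypothetical; the real tool used embeddings trained on PubMed abstracts and may rank candidates differently.

```python
import numpy as np

def analogy(embeddings, a, b, c, top_n=1):
    """Rank terms by cosine similarity to vec(b) - vec(a) + vec(c)."""
    query = embeddings[b] - embeddings[a] + embeddings[c]
    query = query / np.linalg.norm(query)
    scores = {}
    for term, vec in embeddings.items():
        if term in (a, b, c):
            continue  # skip the input terms themselves
        scores[term] = float(np.dot(query, vec / np.linalg.norm(vec)))
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy vectors (assumption): not real biomedical embeddings.
toy_embeddings = {
    "gene_A": np.array([1.0, 0.0, 0.0]),
    "disease_A": np.array([1.0, 0.0, 1.0]),
    "gene_B": np.array([0.0, 1.0, 0.0]),
    "disease_B": np.array([0.0, 1.0, 1.0]),
    "unrelated": np.array([1.0, 1.0, 0.0]),
}

# "gene_A is to disease_A as gene_B is to ?"
best_match = analogy(toy_embeddings, "gene_A", "disease_A", "gene_B")
print(best_match)
```

In this toy space the offset between a gene and its disease is the same for both pairs, which is why the arithmetic lands on the matching term.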
We were interviewed about this project for a write-up in a biotech newsletter.
Links:
The following projects are mostly from the genetics lab I work at.
# Spheroid Analysis
Example plot from one of my EDAs
Spheroids are 3D cultures that mimic tissues and micro-tumors better than the 2D cultures grown in Petri dishes. When extending biological circuits from 2D to the more realistic 3D setting, cell morphology could affect things like protein production. Protein synthesis is inherently stochastic, so when building biological circuits it is useful to quantify the noise in the system. This project extends the 2D noise analysis to spheroids. You can find out more in my slides.
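As a rough illustration of what "quantifying noise" can look like, the sketch below computes two commonly used expression-noise measures, the squared coefficient of variation and the Fano factor, from per-cell reporter intensities. Which measure the actual analysis used is not stated here, so treat both the metrics and the sample values as assumptions.

```python
import statistics

def noise_metrics(intensities):
    """Summarize cell-to-cell variability of a reporter signal."""
    mean = statistics.fmean(intensities)
    variance = statistics.pvariance(intensities, mu=mean)
    return {
        "mean": mean,
        "cv2": variance / mean**2,  # squared coefficient of variation
        "fano": variance / mean,    # Fano factor (variance / mean)
    }

# Made-up per-cell fluorescence values, for illustration only.
sample_intensities = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
metrics = noise_metrics(sample_intensities)
print(metrics)
```

Computing the same metrics for a 2D culture and a spheroid of the same circuit would give a direct way to compare the two conditions.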
Links:
# Physically Unclonable Function (PUF) Pipeline and Analysis
PUFs are a type of fingerprint introduced during manufacturing to show the lineage of a device. They are mainly used in integrated circuits, but they can also be used in other domains like biology. Using CRISPR, one can make a PUF that is impossible to reproduce and can be used to determine the lineage of a cell line (CRISPR-PUF). I wrote a pipeline to process the millions of DNA sequencing reads used to compute a ‘distance’ between cell lines. The distance is small if two samples come from the same cell line and large if they come from different ones.
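The distance idea can be sketched as follows. This is not the pipeline's actual metric, which is not described here: it assumes each sample is reduced to a frequency distribution over observed sequences and uses total variation distance as a stand-in. The function names and reads are hypothetical.

```python
from collections import Counter

def sequence_distribution(reads):
    """Convert a list of reads into relative frequencies per sequence."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}

def total_variation_distance(p, q):
    """Half the L1 distance between two distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Toy reads (assumption): two samples of one line and one unrelated line.
line_a = sequence_distribution(["ACGT"] * 6 + ["ACGA"] * 4)
line_a_replicate = sequence_distribution(["ACGT"] * 5 + ["ACGA"] * 5)
line_b = sequence_distribution(["TTGT"] * 7 + ["ACGT"] * 3)

same_line = total_variation_distance(line_a, line_a_replicate)
different_line = total_variation_distance(line_a, line_b)
print(same_line, different_line)
```

With these toy reads the replicate pair scores much lower than the unrelated pair, which is the behavior the pipeline relies on.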
Links:
# Finding Genomic Safe Harbors for CHO
Logic behind pipeline from slides
When you edit a cell line with techniques like CRISPR, adding code without taking into account the function of the surrounding regions can have unintended consequences. Certain regions, called safe harbors, are places where your changes won’t interfere with existing functionality. Code from a paper defined the requirements for safe harbors in the human genome, but we needed it to work for Chinese Hamster Ovary (CHO) cells, so I wrote a Python port that can be extended to any genome. See slides for more info.
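A minimal sketch of one kind of safe-harbor check, assuming the requirement is "stay at least some distance away from every annotated feature". The `Feature` class, the 50 kb threshold, and the coordinates are all hypothetical placeholders; the actual requirements come from the paper and are not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int
    end: int  # inclusive end coordinate

def distance_to_feature(chrom, pos, feature):
    """Distance in bp from a position to a feature, or None if on another chromosome."""
    if chrom != feature.chrom:
        return None
    if feature.start <= pos <= feature.end:
        return 0
    return min(abs(pos - feature.start), abs(pos - feature.end))

def is_safe_harbor(chrom, pos, features, min_distance=50_000):
    """True if the site is at least min_distance bp from every feature on its chromosome."""
    for feature in features:
        d = distance_to_feature(chrom, pos, feature)
        if d is not None and d < min_distance:
            return False
    return True

# Made-up annotation, for illustration only.
genes = [Feature("chr1", 100_000, 120_000), Feature("chr1", 400_000, 450_000)]
print(is_safe_harbor("chr1", 250_000, genes))  # far from both genes
print(is_safe_harbor("chr1", 130_000, genes))  # 10 kb from the first gene
```

Because the check only needs an annotation list and a threshold, swapping in a different genome is a matter of loading a different set of features.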
Links:
# Projects
# Small Molecule Autocomplete RNN
I trained an LSTM-based RNN to efficiently enumerate chemical space given a SMILES input. It uses BFS over the top few possibilities in the probability distribution, and I made the model more creative and incentivized it to give shorter outputs.
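The BFS step can be sketched roughly like this. The `toy_model` function is a hard-coded stand-in for the trained LSTM, and the `top_k`, `max_len`, and end-token choices are assumptions made for the example rather than the settings used in the project.

```python
from collections import deque

def enumerate_completions(prefix, next_token_probs, top_k=2, max_len=6, end_token="$"):
    """Breadth-first expansion of the top_k next tokens until the end token or max_len."""
    completed = []
    queue = deque([(prefix, 1.0)])
    while queue:
        seq, prob = queue.popleft()
        ranked = sorted(next_token_probs(seq).items(), key=lambda kv: kv[1], reverse=True)
        for token, p in ranked[:top_k]:
            if token == end_token:
                completed.append((seq, prob * p))       # finished string
            elif len(seq) + 1 <= max_len:
                queue.append((seq + token, prob * p))   # keep extending
    return sorted(completed, key=lambda kv: kv[1], reverse=True)

def toy_model(seq):
    """Stand-in for the RNN: returns a next-token distribution (assumption)."""
    if len(seq) >= 3:
        return {"$": 0.8, "C": 0.1, "O": 0.1}
    return {"C": 0.5, "O": 0.3, "$": 0.2}

completions = enumerate_completions("C", toy_model, top_k=2, max_len=4)
for smiles, prob in completions[:3]:
    print(smiles, round(prob, 3))
```

Capping `top_k` keeps the search from exploding exponentially, and `max_len` guarantees the enumeration terminates even if the model never emits the end token.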
Links: