Daniel George

Last updated Oct 3, 2024

# Past Projects

The embedded tweets take a bit to load! Sorry about that

# Queue-based clipboard with transcriber for easily capturing context for LLMs

# AI-enabled chart generation

# PreScribe

Demo

An AI medical scribe that proactively checks off requirements for medicine in real time as the conversation goes on.

# Proposal Reviewer

Demo

Creates a diff of your writing given a weakness in it. It looks only at the most relevant parts of the text and proposes specific deletions and insertions that address the weakness.
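To make the "deletions and inserts" idea concrete, here is a minimal sketch of a word-level diff between an original passage and a revised one. It is not the project's code: it assumes the revised text already exists (for example, produced by a model), and `word_diff` is a hypothetical helper built on Python's standard `difflib`.

```python
import difflib


def word_diff(original: str, revised: str) -> list[tuple[str, str]]:
    """Return (op, text) pairs, where op is 'keep', 'delete', or 'insert'."""
    a, b = original.split(), revised.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            edits.append(("keep", " ".join(a[i1:i2])))
            continue
        # 'replace' is reported as a deletion followed by an insertion.
        if i1 != i2:
            edits.append(("delete", " ".join(a[i1:i2])))
        if j1 != j2:
            edits.append(("insert", " ".join(b[j1:j2])))
    return edits


if __name__ == "__main__":
    before = "We will try to improve results"
    after = "We will improve results by 20%"
    for op, text in word_diff(before, after):
        print(f"{op:>6}: {text}")
```

Diffing on words rather than characters keeps each suggested edit readable as a phrase-level change.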

Links:


# Personalized Discovery Fiction Generator

Demo

Uses LLMs to create discovery fiction (like https://michaelnotebook.com/df/index.html): the model writes a first-person narrative of how the student could have discovered what they’re learning.


# LLM-based Gene Perturbation Simulator


# Perceptual filter


# The Orwell Editor v1


# Tools for Stepping into Biology


# Synapse: Insights in context of when you need them

Demo

A writing interface that brings up relevant but slightly tangential work in a reduced representation, giving peripheral vision (https://notes.andymatuschak.org/Peripheral_vision) on things that might lead to a new direction of work. Data is sourced from the researcher’s citation manager (Zotero). Researchers accumulate a pristine dataset of what’s most important to them in these citation managers, which can lead to interesting things coming up. Because the user is already familiar with the source, the reduced representation is enough to give them the gist of the idea.


# Whisper Notes

Demo

A simple voice-to-text clipboard app for quickly recording your thoughts. Use global commands to record and fill your copy register with the transcript. View and delete previous transcripts.

Links:


# BioConceptVecXplorer

Demo

Knowledge is continuous, but its representation in papers is discrete. We can instead represent concepts as vectors and explore a more fluid latent space. Using vector embeddings trained on 30 million PubMed abstracts, we created a tool for researchers to create biological analogies to discover relationships not explicitly in the literature. This was previously done in materials science to discover an anti-ferromagnetic material not explicitly in the literature. We extended this idea as a tool to enable bioengineers to make discoveries as well.
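The analogy idea can be sketched with plain vector arithmetic: for "a is to b as c is to ?", compute `b - a + c` and rank the remaining concepts by cosine similarity. The concept names and 3-dimensional vectors below are made up for illustration; the real tool used embeddings trained on PubMed abstracts, and its exact ranking method is not described here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(embeddings, a, b, c, top_n=1):
    """Answer 'a is to b as c is to ?' by ranking concepts near b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    scored = [(cosine(target, vec), name)
              for name, vec in embeddings.items()
              if name not in (a, b, c)]  # exclude the query concepts themselves
    return [name for _, name in sorted(scored, reverse=True)[:top_n]]


# Hypothetical toy embeddings, not real BioConceptVec vectors.
toy = {
    "gene_A": [1.0, 0.0, 0.0],
    "protein_A": [1.0, 1.0, 0.0],
    "gene_B": [0.0, 0.0, 1.0],
    "protein_B": [0.0, 1.0, 1.0],
    "unrelated": [1.0, 0.0, 1.0],
}

if __name__ == "__main__":
    print(analogy(toy, "gene_A", "protein_A", "gene_B"))
```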

We were interviewed and written about for this project in the following biotech newsletter.

Links:


The following are mostly from the genetics lab I work at.

# Spheroid Analysis

Example plot from one of my EDAs

Spheroids are 3D cultures that mimic tissues and micro tumors better than the 2D cultures we see in Petri dishes for a variety of reasons. When extending biological circuits from 2D to the more realistic 3D setting, the morphology of the cells could affect things like protein production. Protein synthesis is an inherently stochastic process, so quantifying the noise in the system is quite useful when creating biological circuits. This project extends the 2D analysis to spheroids. You can find out more in my slides.
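The text above does not say which noise measure was used. As an illustration only, two measures commonly used for expression noise are the squared coefficient of variation (variance divided by mean squared) and the Fano factor (variance divided by mean). The sketch below computes both from a list of per-cell intensities; the function name and the sample values are made up.

```python
import statistics


def noise_metrics(intensities):
    """Summarize expression noise for per-cell intensity measurements.

    Assumes a non-empty list with a non-zero mean.
    """
    mean = statistics.fmean(intensities)
    var = statistics.pvariance(intensities)  # population variance
    return {
        "mean": mean,
        "cv2": var / mean**2,  # squared coefficient of variation
        "fano": var / mean,    # Fano factor
    }


if __name__ == "__main__":
    # Made-up fluorescence intensities for a handful of cells.
    print(noise_metrics([8.0, 12.0, 9.5, 10.5]))
```

Both measures are normalized by the mean, so populations with different average expression can be compared on the same scale.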

Links:


# Physically Unclonable Function (PUF) Pipeline and Analysis

PUFs are a type of fingerprint that is introduced during manufacturing to show the lineage of a device. They are mainly used in integrated circuits, but they can also be used in other domains like biology. Using CRISPR, one can make a PUF that is impossible to reproduce and can be used to determine the lineage of a cell line (CRISPR-PUF). I wrote a pipeline to process the millions of DNA sequencing reads that are used to find a ‘distance’ between cell lines. If two samples come from the same cell line the distance is small, and if they come from different cell lines it is large.
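The distance metric used in the actual pipeline is not given here. As a rough sketch of the general idea, one simple choice is to turn each sample's reads into a frequency distribution over observed sequences and take the total variation distance between the two distributions, which is 0 for identical distributions and 1 for completely disjoint ones. The function names and the short reads below are illustrative assumptions.

```python
from collections import Counter


def read_frequencies(reads):
    """Map each distinct sequence to its fraction of the sample's reads."""
    counts = Counter(reads)
    total = sum(counts.values())  # assumes at least one read
    return {seq: n / total for seq, n in counts.items()}


def sample_distance(reads_a, reads_b):
    """Total variation distance between two samples' sequence distributions."""
    freq_a = read_frequencies(reads_a)
    freq_b = read_frequencies(reads_b)
    sequences = freq_a.keys() | freq_b.keys()
    return 0.5 * sum(abs(freq_a.get(s, 0.0) - freq_b.get(s, 0.0))
                     for s in sequences)


if __name__ == "__main__":
    # Made-up reads; real samples would contain millions of longer reads.
    line_a = ["ACGT"] * 3 + ["ACGA"]
    replicate = ["ACGT"] * 3 + ["ACGA"]
    other_line = ["TTTT"] * 4
    print(sample_distance(line_a, replicate))
    print(sample_distance(line_a, other_line))
```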

Links:


# Finding Genomic Safe Harbors for CHO

Logic behind pipeline from slides

When you edit a cell line with techniques like CRISPR, if you add code without taking into account the function of the regions around it, there might be unintended consequences. Certain regions are considered safe harbors, where your changes won’t interfere with existing functionality. There was code from a paper that defined requirements for safe harbors in the human genome, but we needed it to work for Chinese Hamster Ovary (CHO) cells, so I wrote a Python port that can be extended to any genome. See slides for more info.
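The specific requirements from the paper are not listed here. To show the shape such a check might take, the sketch below applies one hypothetical requirement: a candidate site must lie at least a minimum distance from every annotated gene on the same chromosome. The `Interval` type, the function names, the coordinates, and the threshold are all assumptions for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def gap(a: Interval, b: Interval) -> int:
    """Base pairs between two intervals on the same chromosome (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def is_candidate_safe_harbor(site, genes, min_gene_distance):
    """True if the site is far enough from every gene on its chromosome."""
    return all(
        gene.chrom != site.chrom or gap(site, gene) >= min_gene_distance
        for gene in genes
    )


if __name__ == "__main__":
    # Made-up annotation and threshold, not values from the paper.
    genes = [Interval("chr1", 1_000, 2_000)]
    print(is_candidate_safe_harbor(Interval("chr1", 60_000, 60_001), genes, 50_000))
    print(is_candidate_safe_harbor(Interval("chr1", 3_000, 3_001), genes, 50_000))
```

Because the check only needs interval annotations, the same function works for any genome once its annotation is loaded.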

Links:

# Projects


# Small Molecule Autocomplete RNN

I trained an LSTM-based RNN to efficiently enumerate chemical space given a SMILES input. It uses BFS to take the top few possibilities in the probability distribution, and I made the model more creative and incentivized it to give shorter outputs.
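A minimal sketch of the search described above: a breadth-first expansion that keeps only the top few next tokens at each step. How the original model was made "more creative" and pushed toward shorter outputs is not specified, so the temperature and per-token length penalty here are assumptions, and `toy_next_token_probs` is a hypothetical stand-in for the trained LSTM.

```python
import math
from collections import deque


def toy_next_token_probs(prefix):
    """Hypothetical stand-in for the trained model's next-token distribution."""
    if len(prefix) >= 4:
        return {"<end>": 1.0}
    return {"C": 0.5, "O": 0.3, "N": 0.15, "<end>": 0.05}


def apply_temperature(probs, temperature):
    """Flatten (T > 1) or sharpen (T < 1) a distribution, then renormalize."""
    scaled = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: p / total for tok, p in scaled.items()}


def enumerate_completions(prefix, next_probs, top_k=2, temperature=1.0,
                          length_penalty=0.1, max_len=6):
    """Breadth-first enumeration keeping the top_k tokens at each step."""
    queue = deque([(prefix, 0.0)])  # (sequence so far, log-probability)
    finished = []
    while queue:
        seq, logp = queue.popleft()
        probs = apply_temperature(next_probs(seq), temperature)
        best = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        for token, p in best:
            if p <= 0.0:
                continue
            if token == "<end>":
                # Penalize each token so shorter completions score higher.
                finished.append((seq, logp + math.log(p) - length_penalty * len(seq)))
            elif len(seq) < max_len:
                queue.append((seq + token, logp + math.log(p)))
    return sorted(finished, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for smiles, score in enumerate_completions("CC", toy_next_token_probs):
        print(f"{smiles}  score={score:.3f}")
```

The toy model does not check chemical validity; a real pipeline would need to validate each completed string with a chemistry toolkit.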

Links: