Daniel George

Last updated Feb 7, 2025

# Past Projects

# Queue-based clipboard with transcriber for easily capturing context for LLMs

peek at something I've been playing with

a queue-based clipboard with a built-in transcriber.

it's brought the activation energy of asking good questions to claude down to zero because it captures my thought process as i'm thinking. i honestly hate having to context switch… pic.twitter.com/GjFBxjiBul
— Daniel George (@degtrdg) September 19, 2024

A tool that brings the activation energy of asking Claude good questions down to zero by capturing my thought process as I'm thinking.
It reduces context switching between reading, copy-pasting, and typing while I'm focused.

# AI-enabled chart generation

Scientists should spend more time doing science and less time fighting with their software.

Introducing @sphinx_bio’s new AI-enabled cell editing, a way for scientists to answer their most important questions in a fraction of the time!

Now you can simply tell Sphinx what you’re… pic.twitter.com/Pff8P9wuSz
— Nicholas Larus-Stone (@nlarusstone) September 4, 2024

I spearheaded this feature from ideation through implementation, and the work laid the foundation for Sphinx’s LLM evaluations.
Read more details in the blog post: https://www.sphinxbio.com/post/ai-enabled-chart-creation

# PreScribe

Demo

An AI medical scribe that proactively checks off medical requirements in real time as the conversation goes on.

# Proposal Reviewer

Creates a diff of your writing that targets a given weakness. It looks only at the most relevant parts of the text and proposes specific deletions and insertions to address that weakness.
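To make the deletion/insertion idea concrete, here is a minimal sketch using Python's standard `difflib`. It is not the project's actual code: the function name `word_diff` is hypothetical, and the revised text is supplied by hand here, whereas the tool would get it from an LLM.

```python
import difflib


def word_diff(original: str, revised: str) -> list[tuple[str, str]]:
    """Return (op, text) pairs where op is 'keep', 'delete', or 'insert'."""
    a, b = original.split(), revised.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("keep", " ".join(a[i1:i2])))
            continue
        # A 'replace' opcode is reported as a deletion followed by an insertion.
        if i1 != i2:
            ops.append(("delete", " ".join(a[i1:i2])))
        if j1 != j2:
            ops.append(("insert", " ".join(b[j1:j2])))
    return ops
```

Calling `word_diff("a very good idea", "a good idea")` yields a `keep`, a `delete` for "very", and another `keep`, which is the kind of targeted edit list a reviewer UI could render.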

Links:

Demo

# Personalized Discovery Fiction Generator

Demo

Uses LLMs to create discovery fiction (like https://michaelnotebook.com/df/index.html): the model writes a first-person narrative of how the student could have discovered what they’re learning.

# LLM-based Gene Perturbation Simulator

LLM-based Gene Perturbation Simulator

Put in a target gene, what you did to the gene, and what phenotype you're looking for, and simulate your perturbation on the rest of the gene network! pic.twitter.com/zQ1rx09nUl
— Daniel George (@degtrdg) March 17, 2024

# Perceptual filter

Perceptual filter based on what's important to you!

Made this w/ @ahadj0 on having an LLM perceive what you will see and only show it to you if it's worth your time.

You define what's important to you and what's not. It reasons whether a tweet in your feed should show up! 1/n pic.twitter.com/02GdxyBmao
— Daniel George (@degtrdg) January 19, 2024

# The Orwell Editor v1

Introducing The Orwell Editor v2 ✨

Now there is:
- copy & paste for your writing to have it analyzed
- shortcuts to make post-it notes
- I'm trying more non-linear writing, so lmk how it is
- upgraded UI to be unobtrusive

Link is below.
cc: @_buildspace @_nightsweekends pic.twitter.com/nqUeRBhVyc
— Daniel George (@degtrdg) April 30, 2023

# Tools for Stepping into Biology

What would a more human medium for sharing understanding in biology look like? Mediums that shed the assumptions of our limitations from when paper was SOTA? https://t.co/Osq41k0iZH

Thanks to @NikoMcCarty for getting me to publish my first piece! pic.twitter.com/l0jPDe0X2k
— Daniel George (@degtrdg) October 14, 2023

https://degtrdg.substack.com/p/tools-for-stepping-into-biology

# Synapse: Insights in context of when you need them

Demo

A writing interface that brings up relevant but slightly tangential work in a reduced representation, giving peripheral vision (https://notes.andymatuschak.org/Peripheral_vision) on things that might lead to a new direction of work. Data is sourced from the researcher’s citation manager (Zotero). Researchers accumulate a pristine dataset of what’s most important to them in these citation managers, which can lead to interesting things coming up. The reduced representation of a source gives the user the gist of an idea they’re already familiar with.

# Whisper Notes

Demo

A simple voice-to-text clipboard app for quickly recording your thoughts. Use global commands to record and fill your copy register with the transcript. View and delete previous transcripts.

# BioConceptVecXplorer

Demo

Knowledge is continuous, but its representation in papers is discrete. We can instead represent concepts as vectors and explore a more fluid latent space. Using vector embeddings trained on 30 million PubMed abstracts, we created a tool for researchers to create biological analogies to discover relationships not explicitly in the literature. This was previously done in material science to discover an anti-ferromagnetic material not explicitly in the literature. We extended this idea as a tool to enable bioengineers to make discoveries as well.
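The analogy idea can be sketched with plain vector arithmetic: "a is to b as c is to ?" becomes a nearest-neighbor search around `vec(b) - vec(a) + vec(c)`. The code below is a minimal illustration, not the tool's implementation. The four-dimensional `TOY` embeddings and the function names are hypothetical stand-ins for the real PubMed-trained vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c, top_n=1):
    """Rank concepts by similarity to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    scored = [(cosine(target, vec), name)
              for name, vec in vectors.items() if name not in (a, b, c)]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


# Hypothetical toy embeddings: [hormone, organ, glucose system, thyroid system]
TOY = {
    "insulin":   [1.0, 0.0, 1.0, 0.0],
    "pancreas":  [0.0, 1.0, 1.0, 0.0],
    "thyroxine": [1.0, 0.0, 0.0, 1.0],
    "thyroid":   [0.0, 1.0, 0.0, 1.0],
    "liver":     [0.0, 1.0, 0.5, 0.0],
}
```

With these toy vectors, `analogy(TOY, "insulin", "pancreas", "thyroxine")` picks out "thyroid". Real embeddings are noisier, so in practice one would inspect the top few candidates, not just the first.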

We were interviewed and written about for this project in a biotech newsletter.

The following projects are mostly from the genetics lab where I work.

# Spheroid Analysis

Example plot from one of my EDAs

Spheroids are 3D cultures that mimic tissues and micro-tumors better than the 2D cultures we see in Petri dishes. When extending biological circuits from 2D to the more realistic 3D setting, the morphology of the cells could affect things like protein production. Protein synthesis is an inherently stochastic process, so when building biological circuits it is useful to quantify the noise in the system. This project extends the 2D noise analysis to spheroids. You can find out more in my slides.
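As an illustration of what "a measure of noise" can look like, the sketch below computes two commonly used summaries of expression noise from per-cell intensities: the squared coefficient of variation (variance / mean²) and the Fano factor (variance / mean). I'm assuming these metrics for the example; the function name and the input format are hypothetical and not taken from the lab's pipeline.

```python
from statistics import mean, pvariance


def noise_metrics(intensities):
    """Summarize expression noise for a list of per-cell intensities."""
    mu = mean(intensities)
    if mu == 0:
        raise ValueError("mean intensity is zero; noise metrics are undefined")
    var = pvariance(intensities, mu)  # population variance around the mean
    return {
        "mean": mu,
        "cv2": var / mu ** 2,   # squared coefficient of variation
        "fano": var / mu,       # Fano factor
    }
```

Comparing `cv2` between matched 2D and spheroid samples is one simple way to ask whether the 3D context changes how noisy a circuit's output is.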

Links:

Github

# Physically Unclonable Function (PUF) Pipeline and Analysis

PUFs are a type of fingerprint that is introduced in manufacturing to show the lineage of a device. They are mainly used in integrated circuits, but they can also be used in other domains like biology. Using CRISPR, one can make a PUF that is impossible to reproduce and can be used to determine the lineage of a cell line (CRISPR-PUF). I wrote a pipeline that processes the millions of DNA sequencing reads used to compute a ‘distance’ between cell lines. The distance is small if two samples come from the same cell line and large if they come from different ones.
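One way to picture such a distance: summarize each sample as the frequency of every distinct sequence among its reads, then compare the two frequency profiles. The sketch below uses total variation distance, which is my assumption for illustration and not necessarily the metric used in CRISPR-PUF or in my pipeline. The function names are hypothetical.

```python
from collections import Counter


def read_profile(reads):
    """Map each distinct read sequence to its relative frequency."""
    if not reads:
        raise ValueError("need at least one read")
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def profile_distance(reads_a, reads_b):
    """Total variation distance between two read profiles, in [0, 1]."""
    p, q = read_profile(reads_a), read_profile(reads_b)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0))
                     for s in set(p) | set(q))
```

Two samples with the same mix of sequences score 0, and samples with no sequences in common score 1.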

Links:

Github

# Finding Genomic Safe Harbors for CHO

Logic behind pipeline from slides

When you edit a cell line with techniques like CRISPR, if you add code without taking into account the function of the regions around it, there can be unintended consequences. Certain regions are considered safe harbors, where your changes won’t interfere with existing functionality. There was code from a paper that defined requirements for safe harbors in the human genome, but we needed it to work for Chinese Hamster Ovary (CHO) cells, so I wrote a Python port that can be extended to any genome. See slides for more info.
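The core filtering step can be sketched as an interval problem: pad every annotated feature by an exclusion distance and keep whatever is left. This is a simplified illustration, not the paper's criteria or my port. The function name, the half-open `(start, end)` coordinates, and the single `min_distance` threshold are all assumptions.

```python
def safe_intervals(chrom_length, features, min_distance):
    """Return half-open (start, end) regions at least min_distance
    away from every annotated feature on one chromosome."""
    # Pad each feature by the exclusion distance and clip to the chromosome.
    padded = sorted(
        (max(0, start - min_distance), min(chrom_length, end + min_distance))
        for start, end in features
    )
    safe, cursor = [], 0
    for start, end in padded:
        if start > cursor:
            safe.append((cursor, start))  # gap before this padded feature
        cursor = max(cursor, end)         # handles overlapping features
    if cursor < chrom_length:
        safe.append((cursor, chrom_length))
    return safe
```

A real pipeline would likely apply different distances per feature type (for example genes versus regulatory elements) and read the features from a genome annotation file, which is what makes the approach portable across genomes.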

Links:

Github

# Projects

# Small Molecule Autocomplete RNN

I trained an LSTM-based RNN to efficiently enumerate chemical space given a SMILES input. It uses BFS to expand the top few possibilities in the probability distribution at each step, and I made the model more creative and incentivized it to give shorter outputs.
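Here is a minimal sketch of the search, assuming a breadth-first expansion of the top-k next tokens and a per-token length penalty to favor shorter outputs. The trained LSTM is replaced by `toy_next_token_probs`, a hypothetical lookup table, and the `END` token, parameter names, and scoring formula are my assumptions, not the original implementation.

```python
import math
from collections import deque

END = "$"  # hypothetical end-of-sequence token


def toy_next_token_probs(prefix):
    """Stand-in for the LSTM: next-token probabilities from the last character."""
    table = {
        "C": {"C": 0.5, "O": 0.3, END: 0.2},
        "O": {END: 0.7, "C": 0.3},
    }
    return table[prefix[-1]]


def enumerate_completions(prefix, next_probs, top_k=2, max_len=4,
                          length_penalty=0.1):
    """BFS over the top_k next tokens; returns (sequence, score) pairs,
    best first. Score = log-probability minus a penalty per added token."""
    queue = deque([(prefix, 0.0)])
    finished = []
    while queue:
        seq, logp = queue.popleft()
        probs = next_probs(seq)
        best = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        for token, p in best:
            new_logp = logp + math.log(p)
            if token == END:
                added = len(seq) - len(prefix)
                finished.append((seq, new_logp - length_penalty * added))
            elif len(seq) + 1 <= max_len:
                queue.append((seq + token, new_logp))
    return sorted(finished, key=lambda item: item[1], reverse=True)
```

With the toy table, `enumerate_completions("C", toy_next_token_probs, top_k=2, max_len=3)` ranks "CO" ahead of "CCO". Raising `top_k` widens the search, and raising `length_penalty` pushes the ranking further toward short molecules.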

Links:

Demo
