My DA(i)LLE adventures

Comic strip generation from my journal entries.

Dec 26, 2023

A month ago I was laid off from Amazon Alexa as part of their latest org-wide reduction, and found myself with a lot more free time. Like, a lot more. It made me realize how much of our lives we spend working, thinking about work, or handling the logistics around work. It’s pretty damn consuming. This break has made me think about the sort of things I really want to pour my time and energy into. I have some thoughts, but that’s for another time.

I journal sometimes and found myself scribbling away in my tattered WeWork notepad with this newfound extra time. I started a Notion document to summarize my daily activities to prevent the feeling of days blending into each other. At night I jot down 3-4 bullet points just to summarize the highlights of my day.

For instance, my journal entry for 12/15/23 was:

Wrote some code in the morning
Played tennis
Went to the gym
Went to a seafood restaurant at night

Lately, I’ve been playing around with ChatGPT and OpenAI’s APIs, in particular DALLE-3, OpenAI’s image generation model. It’s an impressive tool, and because its output is non-deterministic I’m always surprised by the images it comes up with for a given prompt. That gave me the idea to add some imagery to my recent adventures: generate a comic strip of my day-to-day.

The prompt I tested with DALLE-3:

Comic book style with no dialogues. 4 panels of action of a male character with this plot: <Day Summary Here>

What DALLE-3 came up with:

A comic generated from a standard journal entry.

Pretty cool, right? It captures the highlights successfully, although there are nuances that need some more prompt engineering to fix.
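
A minimal sketch of how that prompt could be assembled from the bullet points and sent to the image API. This is an illustration rather than the script used for this post: the function names `build_prompt` and `generate_comic` are made up for the example, and it assumes the `openai` Python package (v1 or later) with an `OPENAI_API_KEY` set in the environment.

```python
import base64

# The prompt from the post, with a placeholder for the day's summary.
PROMPT_TEMPLATE = (
    "Comic book style with no dialogues. "
    "4 panels of action of a male character with this plot: {plot}"
)


def build_prompt(highlights):
    """Join the day's bullet points into one plot sentence per highlight."""
    cleaned = [h.strip().rstrip(".") for h in highlights if h.strip()]
    if not cleaned:
        raise ValueError("need at least one journal highlight")
    return PROMPT_TEMPLATE.format(plot=". ".join(cleaned) + ".")


def generate_comic(prompt, size="1024x1024"):
    """Ask DALL-E 3 for a single image and return it as PNG bytes."""
    # Imported lazily so build_prompt works without the package installed.
    from openai import OpenAI  # assumption: openai>=1.0, key in OPENAI_API_KEY

    response = OpenAI().images.generate(
        model="dall-e-3",
        prompt=prompt,
        n=1,  # DALL-E 3 generates one image per request
        size=size,
        response_format="b64_json",
    )
    return base64.b64decode(response.data[0].b64_json)


if __name__ == "__main__":
    entry = [
        "Wrote some code in the morning",
        "Played tennis",
        "Went to the gym",
        "Went to a seafood restaurant at night",
    ]
    print(build_prompt(entry))
```

Calling `generate_comic(build_prompt(entry))` would return the image bytes, ready to be written to a file. Since the model is non-deterministic, the same prompt gives a different comic each time.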

With my AWS prowess I wrapped the Python script into a tiny service. It fetches daily updates from Notion, generates a comic, and delivers it to my inbox at 9 a.m. the following day:

Components:

Notion Doc: Where I write my daily summaries. Entries are in a tabular format with a static block ID (Notion’s definition of a page or an element within a page).
EventBridge: Runs a simple cron job to invoke the Lambda function every day at 9 a.m.
Lambda: Does all the ‘heavy’ lifting
1. Fetches daily updates from Notion via the Notion API.
2. Calls OpenAI’s image generation API with a prompt and retrieves the comic.
3. Stores the image in S3.
4. Triggers SES to send an email.
S3: Archival storage for all the generated comics.
Simple Email Service (SES): Take a wild guess.
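
The four Lambda steps above can be sketched roughly as follows. Treat this as a sketch under assumptions, not the deployed code: the environment variable names, the `comics/` key layout, the helper names, and the idea that the Notion table has two columns (a date like 12/15/23, then the summary) are all hypothetical. It also assumes the `openai` package is bundled with the function.

```python
import base64
import json
import os
import urllib.request
from datetime import date, timedelta

PROMPT = ("Comic book style with no dialogues. "
          "4 panels of action of a male character with this plot: ")


def fetch_rows(block_id, token):
    """Step 1a: list the child blocks (rows) of the Notion table block.

    Only the first page of up to 100 rows is read in this sketch.
    """
    request = urllib.request.Request(
        f"https://api.notion.com/v1/blocks/{block_id}/children?page_size=100",
        headers={"Authorization": f"Bearer {token}",
                 "Notion-Version": "2022-06-28"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]


def summary_for(rows, day):
    """Step 1b: return the second cell of the row whose first cell is `day`."""
    def cell_text(cell):
        # Each cell is a list of rich-text fragments.
        return "".join(part["plain_text"] for part in cell).strip()

    for row in rows:
        if row.get("type") != "table_row":
            continue
        cells = row["table_row"]["cells"]
        if len(cells) >= 2 and cell_text(cells[0]) == day:
            return cell_text(cells[1])
    return None


def handler(event, context):
    import boto3  # included in the Lambda Python runtime
    from openai import OpenAI  # assumption: packaged with the function

    # Lambda clocks run in UTC, so "yesterday" is yesterday in UTC.
    yesterday = date.today() - timedelta(days=1)
    rows = fetch_rows(os.environ["NOTION_BLOCK_ID"], os.environ["NOTION_TOKEN"])
    summary = summary_for(rows, yesterday.strftime("%m/%d/%y"))
    if not summary:
        return {"status": "no entry"}

    # Step 2: generate the comic.
    result = OpenAI().images.generate(
        model="dall-e-3", prompt=PROMPT + summary, n=1,
        size="1024x1024", response_format="b64_json")
    png = base64.b64decode(result.data[0].b64_json)

    # Step 3: archive it in S3.
    bucket = os.environ["COMIC_BUCKET"]
    key = f"comics/{yesterday.isoformat()}.png"
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=png, ContentType="image/png")

    # Step 4: email a link to the image. The link's lifetime is also
    # capped by the lifetime of the Lambda role's temporary credentials.
    url = s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=86400)
    boto3.client("ses").send_email(
        Source=os.environ["MAIL_FROM"],
        Destination={"ToAddresses": [os.environ["MAIL_TO"]]},
        Message={
            "Subject": {"Data": f"Your comic for {yesterday.isoformat()}"},
            "Body": {"Html": {"Data": f'<img src="{url}" alt="Daily comic">'}},
        },
    )
    return {"status": "sent", "key": key}
```

On the scheduling side, an EventBridge rule with the expression `cron(0 9 * * ? *)` fires daily at 09:00, but rule schedules are evaluated in UTC, so the hour would need shifting to land at 9 a.m. local time.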

It’s been operational for a few days now. I now look forward to the previous day’s comic in my inbox every morning, and DALLE has produced some hilariously entertaining results. Here are a couple:

A day full of eating, meet-ups, and a concert.

I whipped this up over the weekend while avoiding the tropical heat at my parents’ house in Mumbai. There are a bunch of things I’d like to improve to make this a bit more interesting and appropriate:

Style Consistency: At the moment DALLE adopts a different drawing style for each entry unless one is specified. I think this can be fixed with the right prompting, but I do enjoy DALLE experimenting with different styles and colour palettes.
NLP Journal Summaries: Not everyone is as lazy as me; some people actually put effort into their journal entries. Ideally, I’d like to process natural, paragraph-style journal entries for those who write in more detail, instead of simple bullet points. Summarizing paragraphs is pretty straightforward thanks to LLMs, although the challenge lies in extracting what the author thinks is important.
Personalization: Instead of generic cartoon characters I can probably use headshots to generate cartoons that actually look like me. Several tools already generate images this way, such as professional-looking AI headshots and the recently viral ’90s yearbook photos. Perhaps making my own comic book image generation model might be the next step?
Multiple Ingress Points: The current implementation depends on a specific tabular format from Notion to fetch daily updates. Ideally the data scraping should be generic and less rigid.
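
One way the last point might look in code: hide each source behind a tiny interface and normalize whatever text comes back into a list of highlights. This is purely a hypothetical sketch; `JournalSource`, `InMemorySource`, and `to_highlights` are names invented here, and a real Notion or plain-file source would implement the same `entry_for` method.

```python
import re
from typing import Dict, List, Protocol

# Matches a leading bullet ("-", "*", "•") or number ("1.", "2)") marker.
_BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def to_highlights(raw: str) -> List[str]:
    """Split raw entry text into highlights, dropping markers and blanks."""
    lines = (_BULLET.sub("", line).strip() for line in raw.splitlines())
    return [line for line in lines if line]


class JournalSource(Protocol):
    """Anything that can return the raw text of one day's entry."""

    def entry_for(self, day: str) -> str: ...


class InMemorySource:
    """Stand-in source for demos; maps a day string to raw entry text."""

    def __init__(self, entries: Dict[str, str]) -> None:
        self._entries = entries

    def entry_for(self, day: str) -> str:
        return self._entries.get(day, "")


def highlights_for(source: JournalSource, day: str) -> List[str]:
    """Fetch a day's entry from any source and normalize it."""
    return to_highlights(source.entry_for(day))


if __name__ == "__main__":
    source = InMemorySource({"12/15/23": "- Played tennis\n- Went to the gym"})
    print(highlights_for(source, "12/15/23"))
```

The comic generation step then only ever sees a list of strings, so swapping Notion for another note-taking tool would mean writing one new source class.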

Image generation in itself is a fascinating topic. This explainer video from the OG CS educators at Computerphile gives a solid introduction to image diffusion models.

Hope you found this interesting! I’ll post my code on GitHub once I get access to my account again.

Sahil’s Substack
