Over the past ~4 years, I’ve built a library of about 15k bookmarks. I archived most of them so that I can come back later and read each page exactly as it was written when I first read it. I also ran a small knowledge-extraction script over them, producing data points I can query with semantic search. I’ve always been curious how well (or how badly) they are archived, because I seldom review them.

I think it’s time to do something about this collection, so maybe I can start reviewing it month by month, picking the most interesting bookmarks and adding them to this digital garden for reference. The blog is Jekyll-based because Jekyll was the default on GitHub Pages, and I figured it would be easy to switch later to something like Hugo, which I like because it feels blazingly fast.

Let’s do this with Claude: fetch the first 30 bookmarks in my database along with their categories, create some tags for them, make sure their archived data is healthy, and expose them in the blog as a new kind of resource.
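One way to expose bookmarks as their own kind of resource in Jekyll is a collection: declare `bookmarks` under `collections:` in `_config.yml` with `output: true`, and Jekyll renders every file in `_bookmarks/` as a page. What follows is a hypothetical helper, not the code Claude generated; the field names (`id`, `title`, `url`, `created_at`, `tags`, `summary`) are assumptions. It writes one Markdown file per bookmark with YAML front matter, quoting values with `json.dumps`, whose output YAML also accepts as a double-quoted string for ordinary titles and URLs.

```python
import json
from pathlib import Path


def write_bookmark_page(out_dir: Path, bookmark: dict) -> Path:
    """Write one Jekyll collection document for a bookmark.

    Hypothetical field names: id, title, url, created_at, tags, summary.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = ["---"]
    for key in ("title", "url", "created_at"):
        # A JSON string doubles as a YAML double-quoted scalar, so quotes and
        # colons inside titles cannot break the front matter.
        lines.append(f"{key}: {json.dumps(str(bookmark[key]), ensure_ascii=False)}")
    tags = ", ".join(json.dumps(t, ensure_ascii=False) for t in bookmark.get("tags", []))
    lines.append(f"tags: [{tags}]")
    lines.append("---")
    # The page body is the bookmark summary, rendered as Markdown by Jekyll.
    lines.append(bookmark.get("summary", ""))
    path = out_dir / f"{bookmark['id']}.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Keeping one file per bookmark means each page is plain HTML after the build, so the section still works with JavaScript disabled.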

Prompt for Claude Code:

In the postgres database "garden" (connect with $DATABASE_URL) there are a few
tables that correspond to some bookmarks. Give me the oldest 30 bookmarks by
date, and let's modify the template of this Jekyll website to create a new
section for this digital garden called "Bookmarks". This website should work
without javascript, but if the user has javascript enabled, we should allow
searching through the bookmarks based on the embeddings of their summaries.

To make sure the bookmark data makes sense, show me the results of a query as a
table like this one:

| UUID | URL | Creation Date | Title | Summary | Looks correctly bookmarked |

The table "bookmarks" is connected to "bookmark_category" (connected to
"categories"), bookmark_content_references (which references bookmarks with
bookmark_id, has "content", and "strategy": when strategy is "summary-reader"
then "embedding" is the embedding using the multilingual-e5-large embedding
model). A bookmark is not correctly archived if it doesn't have a
summary-reader, doesn't have a correct embedding, or if it has more than one
summary-reader. The title comes from "bookmark_titles", and
"processed_contents" has a strategy_used that when it's "reader", it's the
markdown version of that page.

Claude’s reply:

Here's the validation of your oldest 30 bookmarks. I notice:
  - Several bookmarks are missing a summary-reader (Google Docs, Gmail, etc., which are typically unscrapable)
  - One duplicate due to multiple titles (b2b234ca)
  - 17 out of 30 are correctly bookmarked
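The archiving rule from the prompt, and the duplicate Claude spotted, can be sketched in Python. This is a minimal sketch, not what Claude actually ran: it assumes the joins have already produced rows of `(bookmark_id, url, title, ref_id, strategy, embedding)` (a hypothetical shape), and it treats an embedding as "correct" when it is present, finite, and 1024-dimensional, the output size of multilingual-e5-large. Grouping by bookmark id, and keying summaries by content-reference id, stops a second row from `bookmark_titles` from showing up as a duplicate bookmark or as a second summary-reader.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

E5_LARGE_DIM = 1024  # embedding size of multilingual-e5-large


@dataclass
class BookmarkCheck:
    bookmark_id: str
    url: str
    titles: set[str] = field(default_factory=set)
    # content-reference id -> embedding, only for "summary-reader" references
    summaries: dict[str, list[float] | None] = field(default_factory=dict)

    @property
    def title(self) -> str | None:
        # Several titles for one bookmark: pick one deterministically
        # instead of emitting one row per title.
        return min(self.titles) if self.titles else None

    @property
    def looks_correct(self) -> bool:
        # Missing or duplicated summary-reader means not correctly archived.
        if len(self.summaries) != 1:
            return False
        (embedding,) = self.summaries.values()
        return (
            embedding is not None
            and len(embedding) == E5_LARGE_DIM
            and all(math.isfinite(x) for x in embedding)
        )


def validate(rows) -> list[BookmarkCheck]:
    """rows: (bookmark_id, url, title, ref_id, strategy, embedding) tuples."""
    checks: dict[str, BookmarkCheck] = {}
    for bookmark_id, url, title, ref_id, strategy, embedding in rows:
        check = checks.setdefault(bookmark_id, BookmarkCheck(bookmark_id, url))
        if title is not None:
            check.titles.add(title)
        if strategy == "summary-reader" and ref_id is not None:
            # Keyed by ref_id, so a row repeated by the title join counts once.
            check.summaries[ref_id] = embedding
    return list(checks.values())
```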

OK, let’s move forward only with the ones that are actually interesting. Claude wanted to export all bookmarks at once; it reported that there are actually 10,238 valid bookmarks (“that’s substantial!”). It also wanted to let the user plug in a custom API to create the embeddings. I had to stop it, ask it to export each bookmark to its own file, and ask it to research a smaller embedding model so that we could offer readers semantic search. After I told it to use all-MiniLM-L6-v2, quantize it, re-embed these 30 bookmarks, and ultrathink a good solution, it came up with this new bookmarks section for the blog.
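At its core, the JavaScript-only search ranks the stored summary embeddings by cosine similarity to the query embedding. Here is a minimal Python sketch of that ranking, assuming the summary vectors have already been computed (all-MiniLM-L6-v2 produces 384-dimensional vectors) and stored as `(bookmark_id, vector)` pairs, a hypothetical layout. In the browser the same arithmetic would run in JavaScript after the quantized model embeds the query; the sketch does not cover the quantization step itself.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: list[float],
    index: list[tuple[str, list[float]]],
    k: int = 5,
) -> list[tuple[str, float]]:
    """Brute-force top-k: score every (bookmark_id, vector) pair, keep the best k."""
    scored = [(bookmark_id, cosine(query_vec, vec)) for bookmark_id, vec in index]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Scanning every vector is cheap for a few dozen bookmarks, and stays reasonable even for the ~10k valid ones at 384 dimensions.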

It’s still a beta version, but it’s looking nice!
