I wanted to share a personal productivity workflow I’ve developed for capturing ideas as voice notes, especially when offline, and having them automatically processed and turned into structured tasks in Vikunja.
The inspiration for this comes from the idea that you should never lose a thought.
“Many times you forget them. And if you forget a good idea - you wanna commit suicide” (David Lynch).
https://youtu.be/LVeAuwU-uuU?si=s159lKLdZFjD-jnQ
While extreme xD, it highlights how crucial it is to capture ideas with all their details intact before they fade.
This setup is especially for those moments - in an underground parking garage, on a hike in the woods, or traveling abroad with spotty internet - when you can’t afford to lose a valuable thought. It turns a quick voice memo into a perfectly formatted Vikunja task, waiting for you the next morning.
This is a fairly advanced, self-hosted setup that requires some technical knowledge. I’m sharing the concept and architecture here to spark discussion, get feedback, and see how others might be solving similar problems.
If you’re not so worried about losing the ability to quickly record while offline, here are better and easier online-only options for you:
TL;DR & Prerequisites
The High-Level Flow:
- Capture: Record an offline voice note on an iPhone/Apple Watch.
- Sync: The recording is saved locally and automatically uploaded to cloud storage when a connection is available.
- Process: A scheduled script on a server pulls the audio file.
- Transcribe: The audio is sent to a Speech-to-Text (STT) service.
- Structure: The raw text is sent to a Large Language Model (LLM) to be summarized, formatted, and structured as a JSON object with a title, description, and other metadata.
- Create: The script uses the structured JSON to create a new task in Vikunja via its API, attaching the original audio file.
Core Components & Alternatives:
- Capture Device: iPhone / Apple Watch (can be adapted to Android);
- Automation App: iOS Shortcuts, Scriptable
- Cloud Storage: self-hosted Nextcloud (Can be replaced with hosted Nextcloud options, or any cloud storage that has an API, like Dropbox, Google Drive, etc.)
- Sync Trigger: Home Assistant (Used for a specific offline-sync mechanism via email sending, could be replaced with other automation tools like n8n.)
- Processing Server: Any self-hosted server (e.g., a VPS or home server) running a Python script on a schedule (could be replaced with other automation tools like n8n);
- Speech-to-Text (STT): OpenAI Whisper API (Alternatives: Self-hosted models, or other cloud services.)
- Language Model (LLM): Self-hosted Gemma 3n E4B (Alternatives: OpenAI’s API (GPT models), or services like OpenRouter, which provide access to many models.) An LLM is not required, but it’s the key part that makes this feel like magic.
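To make the moving parts concrete, the whole flow can be sketched as one orchestration function with the components injected, so each one (STT, LLM, storage backend) stays swappable. Every name here is illustrative, not from my actual script:

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class TaskDraft:
    """What the LLM produces for each recording."""
    title: str
    description: str

def process_recordings(
    list_audio: Callable[[], Iterable[bytes]],       # pull new files from cloud storage
    transcribe: Callable[[bytes], str],              # STT step (e.g. Whisper)
    structure: Callable[[str], TaskDraft],           # LLM step: raw text -> title/description
    create_task: Callable[[TaskDraft, bytes], None], # Vikunja API call, attaching the audio
) -> int:
    """Run the full pipeline over all pending recordings; returns tasks created."""
    count = 0
    for audio in list_audio():
        text = transcribe(audio)
        draft = structure(text)
        create_task(draft, audio)
        count += 1
    return count
```

Because each step is a plain callable, you can swap Whisper for a self-hosted model or Nextcloud for Dropbox without touching the pipeline itself.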
Prerequisites:
This is a DIY project. You should be comfortable with:
- Self-hosting applications.
- Basic scripting (the core logic is a Python script).
- Working with APIs.
The Workflow in Detail
Step 1: Capture on your phone/watch (e.g. iOS)
I use an iOS Shortcut on my iPhone and Apple Watch to start recording instantly. On my watch, a double-clench gesture triggers it, which is incredibly convenient.
- When I trigger the shortcut, it starts recording audio.
- When I stop, the recording is saved as an `.m4a` file directly to the local iPhone file system. This is the crucial offline-first part.
- If an internet connection is present, the shortcut tries to immediately upload the file to a specific folder in my Nextcloud storage.
Step 2: Syncing the Recordings When Finally Online
What if I recorded several notes while offline? They need to be uploaded once I’m back online. Manually triggering this is a pain, so I automated it.
- I have a Home Assistant automation that runs on a schedule (e.g., every 10 minutes).
- This automation sends a specific email to an address linked to my iPhone’s native Mail app (can be done by automation systems other than HASS, e.g. n8n);
- An iOS automation is configured to watch for an email with a specific subject (e.g., “Upload Vikunja Voice Recordings”);
- When this email is received, it automatically triggers a different iOS Shortcut. This shortcut’s only job is to scan the local folder for audio files and send them all to Nextcloud (done by calling a Scriptable app script written in JavaScript);
- Each successful upload deletes the corresponding recording from the iPhone’s local folder, so it doesn’t get uploaded again the next time the automation is triggered by email.
Yes, I know, email… But after all these years on iOS, this is the only way I’ve found to remotely trigger a shortcut.
It usually triggers within 10-15 seconds of the email being sent.
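If you’d rather not run Home Assistant just for this, the trigger email can be sent from any scheduled script. A minimal sketch using only Python’s standard library (the SMTP server address and credentials are placeholders):

```python
import smtplib
from email.message import EmailMessage

# Must match the subject the iOS Mail automation is watching for
TRIGGER_SUBJECT = "Upload Vikunja Voice Recordings"

def build_trigger_message(sender: str, recipient: str) -> EmailMessage:
    """The body is irrelevant; the iOS automation only matches on the subject."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = TRIGGER_SUBJECT
    msg.set_content("Trigger: sync offline voice recordings to Nextcloud.")
    return msg

def send_trigger(host: str, user: str, password: str, recipient: str) -> None:
    """Send the trigger email over implicit-TLS SMTP (port 465 by default)."""
    msg = build_trigger_message(user, recipient)
    with smtplib.SMTP_SSL(host) as smtp:  # placeholder SMTP server
        smtp.login(user, password)
        smtp.send_message(msg)
```

Run it from cron every 10 minutes and you get the same effect as the Home Assistant automation.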
Step 3: Server-Side Processing
Now the audio files are sitting in Nextcloud storage. A Python script I host takes over from here.
- A scheduled job (cron) inside the script runs every minute (the more frequent the schedule, the sooner tasks appear in Vikunja);
- The script connects to Nextcloud storage, checks the designated folder for new audio files, and downloads any it finds;
- It then sends each audio file to my Speech-to-Text (STT) service of choice, OpenAI Whisper. The API is fast, accurate, and cheap.
- Pro-Tip: You can pass a list of keywords or technical terms to the Whisper API to improve its recognition accuracy for specific jargon.
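A minimal sketch of the download-and-transcribe step, talking to Nextcloud’s WebDAV file endpoint and OpenAI’s transcription endpoint over plain HTTP (the instance URL, username, and folder are placeholders). The keyword pro-tip above maps to Whisper’s `prompt` parameter:

```python
import requests  # third-party HTTP client (pip install requests)

NEXTCLOUD = "https://cloud.example.com"                 # placeholder instance
DAV_FOLDER = "/remote.php/dav/files/alice/VoiceNotes/"  # hypothetical user/folder
OPENAI_STT = "https://api.openai.com/v1/audio/transcriptions"

def make_keyword_prompt(terms: list[str]) -> str:
    """Whisper's `prompt` parameter biases recognition toward listed jargon."""
    return "Vocabulary that may appear: " + ", ".join(terms)

def download(filename: str, auth: tuple[str, str]) -> bytes:
    """Fetch one recording from the Nextcloud folder over WebDAV."""
    resp = requests.get(NEXTCLOUD + DAV_FOLDER + filename, auth=auth, timeout=30)
    resp.raise_for_status()
    return resp.content

def transcribe(audio: bytes, filename: str, api_key: str) -> str:
    """Send the audio to OpenAI's transcription endpoint; returns plain text."""
    resp = requests.post(
        OPENAI_STT,
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": (filename, audio)},
        data={
            "model": "whisper-1",
            "prompt": make_keyword_prompt(["Vikunja", "Nextcloud", "Gemma"]),
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["text"]
```

Listing the folder for new files is a WebDAV PROPFIND request against the same base URL; I’ve left it out here for brevity.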
Step 4: Structuring the Brain Dump with an LLM
A raw transcription of my thoughts is often unstructured. In fact, this very post was written that way… I just talked to my watch for a whole 20-minute bus ride. And here is where the magic happens.
- The raw text from Whisper is passed to an LLM. I use a self-hosted Gemma 3n E4B model because it’s lightweight, fast, and keeps my data private.
- I use a carefully crafted prompt to ask the LLM to process the text. The prompt looks something like this:
You are a helpful assistant that converts raw, dictated text into a structured task. Take the following text and:
- Create a concise and clear title for the task.
- Write a well-formatted description, summarizing the key points without losing any important details.
- Keep the original transcribed text at the very end of the description under a “### Raw Transcription” heading.
Return your response ONLY as a single JSON object with the keys “title” and “description”. Do not include any other text or markdown formatting around the JSON.
Here is the text:
“[INSERT RAW TEXT FROM WHISPER HERE]”
- The LLM returns a clean JSON object, like `{"title": "Book Weekly Barber Appointment", "description": "..."}`.
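In practice, local models sometimes wrap the JSON in a markdown fence or add chatter despite the “ONLY a single JSON object” instruction, so a defensive parser is worth having. An illustrative sketch (the function name is mine):

```python
import json
import re

def extract_task_json(llm_output: str) -> dict:
    """Parse the LLM reply into a dict, tolerating code fences or stray text."""
    text = llm_output.strip()
    # Strip a markdown code fence if the model added one anyway
    fenced = re.search(r"```(?:json)?\s*(\{.*\})\s*```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    else:
        # Fall back to the first {...} span in the reply
        brace = re.search(r"\{.*\}", text, re.DOTALL)
        if brace:
            text = brace.group(0)
    task = json.loads(text)  # raises ValueError on malformed JSON
    if not {"title", "description"} <= task.keys():
        raise ValueError("LLM reply missing required keys")
    return task
```

If parsing fails, the script can simply retry the LLM call or fall back to using the raw transcription as the description.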
Advanced LLM Capabilities:
You can push this even further by asking the LLM to:
- Extrapolate Metadata: Ask it to identify due dates, start dates, or recurring schedules (e.g., “remind me to do this every Tuesday at 4pm”) and return them as specific JSON fields.
- Identify the Project: You could ask it to parse a project name from the text. The script would then need to look up the corresponding project ID in Vikunja before creating the task.
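For the metadata idea, here is a hedged sketch of how the script might fold optional LLM-extracted fields into the task payload. The `due_date` and `repeat_after` field names are my reading of the Vikunja API at the time of writing; verify them against your instance’s API docs:

```python
from datetime import datetime

def merge_metadata(payload: dict, llm_fields: dict) -> dict:
    """Fold optional LLM-extracted metadata into a Vikunja task payload.

    Assumes the LLM was prompted to return ISO 8601 dates and a repeat
    interval in seconds; both field names below are assumptions to check
    against your Vikunja instance's API documentation.
    """
    out = dict(payload)
    if due := llm_fields.get("due_date"):
        # Validate the date the LLM produced before sending it to the API
        out["due_date"] = datetime.fromisoformat(due).isoformat()
    if every := llm_fields.get("repeat_seconds"):
        out["repeat_after"] = int(every)  # e.g. "every Tuesday" -> 7 * 24 * 3600
    return out
```

Validating the date before the API call matters: LLMs occasionally hallucinate impossible timestamps, and it’s better to drop the field than to fail the whole task creation.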
Step 5: Creating the Task in Vikunja
The final step is straightforward:
- The Python script parses the JSON response from the LLM.
- It constructs a new JSON payload that matches the format required by the Vikunja API’s create-task endpoint.
- It populates the `title` and `description` fields with the data from the LLM.
- Crucially, it also attaches the original audio file to the task. This is invaluable: if the LLM ever misunderstands something, or if I want to recall the tone and emotion of my original thought, I can just play the recording directly from the task.
- The script makes the API call, and the task appears in Vikunja.
If there were multiple recordings, the script loops through them, creating a batch of tasks all at once.
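Assuming your instance matches current Vikunja API conventions (tasks created with a PUT to the project’s tasks endpoint, attachments uploaded as a multipart `files` field), the final step might look like this sketch. The base URL and helper names are mine; double-check the endpoints against your instance’s API docs:

```python
import requests  # third-party HTTP client (pip install requests)

VIKUNJA = "https://vikunja.example.com/api/v1"  # placeholder instance

def build_task_payload(title: str, description: str) -> dict:
    """Minimal Vikunja task body; extend with due dates etc. as needed."""
    return {"title": title, "description": description}

def create_task_with_audio(token: str, project_id: int,
                           title: str, description: str,
                           audio_path: str) -> int:
    headers = {"Authorization": f"Bearer {token}"}
    # Create the task (Vikunja uses PUT on the project's /tasks endpoint)
    resp = requests.put(f"{VIKUNJA}/projects/{project_id}/tasks",
                        json=build_task_payload(title, description),
                        headers=headers, timeout=30)
    resp.raise_for_status()
    task_id = resp.json()["id"]
    # Attach the original recording (multipart field name is "files")
    with open(audio_path, "rb") as f:
        requests.put(f"{VIKUNJA}/tasks/{task_id}/attachments",
                     files={"files": f}, headers=headers,
                     timeout=60).raise_for_status()
    return task_id
```

For a batch of recordings, the script just calls this in a loop, one task per file.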
Let’s Discuss!
While there might be simpler integrations, I love the power and control this setup gives me. It’s fully customizable, works flawlessly offline, and the privacy can be locked down with self-hosted models.
I’m curious to hear your thoughts.
- How do you handle quick idea capture for Vikunja?
- Do you see any potential improvements or simplifications for this workflow?
- Has anyone else experimented with LLMs for task management?
If anyone is interested in the specific Python code or screenshots of the iOS Shortcuts, let me know. I’d be happy to share them.
P.S. I know I’m ignoring some of you out there in other topics, and I’m sincerely sorry. I hope I’ll come back to some of that stuff sooner or later. I just decided to spend some time sharing what might be valuable for fellow automators who love to self-host.