I've built a speech to text app for my wife. Here is how.

Kirill Shirinkin
·Nov 18, 2020·

6 min read

My wife is finishing her thesis at the moment. 👩🏻‍🎓 As part of this thesis she needed to conduct around a dozen hour-long interviews, then transcribe them to text and analyze them (all in German). She was telling me how huge the task of transcribing audio to text would be and how she would need hours and hours of extremely unrewarding work to do it. 🙇🏻‍♀️

Being a developer and a cloud infrastructure engineer myself, I helped her out and built an app. In this app, she could upload her audio files and get an accurate transcription back.

Having the backbone of the app ready, it was quite simple to make an actual production app out of it, and this is exactly what I did. Here I want to share some technical decisions that helped me to ship the complete product in around 8 weeks, working on it mostly on weekends and late evenings. In total it took me around 25 hours to build it.

Rails

The complete application is written in Ruby, with Ruby on Rails. It has been by far my favourite framework for building web applications for more than a decade. As always with Rails, I was able to easily add many things, such as authentication and authorization, file storage, sending emails, an admin interface and background job processing.

One thing I did differently this time is using Que for background processing. I wanted to keep the total number of components to maintain as small as possible. Most background processing libraries require another persistence layer, be it Redis or SQS or something else. Que stores and processes jobs in the same database as the application, which is PostgreSQL in the case of Transcripto.

Stimulus

Extra credit goes to StimulusJS. I am not a frontend developer, but I really like Vue.js - it allows writing a nice, maintainable, component-oriented frontend without going for a complete SPA. With Vue I can just sprinkle interactive components over different parts of the application.

The same applies to Stimulus, but it's even simpler and easier to use. I've tried it on two projects, including Transcripto, and I love its simplicity. Of course, this application has minimal frontend requirements, but if you are making an app that doesn't need to be a full-blown SPA, then I recommend taking a look at Stimulus.

AWS

The infrastructure layer is built completely on AWS. I've used S3 (with S3 Event Notifications), SNS, SQS, ECS Fargate, Transcribe and Aurora Serverless. Standard services like CloudWatch, VPC and IAM are also part of the stack, of course.
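To illustrate how S3 Event Notifications, SNS and SQS can fit together, here is a minimal sketch in plain Ruby. It assumes the notification flows from S3 to an SNS topic and from there to an SQS queue without raw message delivery; the post does not describe the exact wiring, and the function name `uploaded_objects` is hypothetical. It only parses a message body and leaves polling the queue and calling Transcribe out.

```ruby
require "json"
require "cgi"

# Extracts [bucket, key] pairs from an SQS message body that wraps an SNS
# notification, which in turn wraps an S3 event. (Hypothetical helper; the
# S3 -> SNS -> SQS wiring is an assumption, not taken from the post.)
def uploaded_objects(sqs_body)
  envelope = JSON.parse(sqs_body)
  # SNS puts the original S3 event, JSON-encoded, into the "Message" field.
  s3_event = JSON.parse(envelope.fetch("Message"))

  s3_event.fetch("Records", []).filter_map do |record|
    # Only react to newly created objects, e.g. "ObjectCreated:Put".
    next unless record["eventName"].to_s.start_with?("ObjectCreated:")

    s3 = record.fetch("s3")
    # Object keys arrive URL-encoded (a space becomes "+").
    [s3.dig("bucket", "name"), CGI.unescape(s3.dig("object", "key"))]
  end
end
```

Each returned pair identifies one uploaded file, which a background job could then hand over to a transcription step.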

Managing this number of cloud resources properly, across two AWS accounts (one for preprod and one for prod), can be a challenge. For my clients I would normally use Terraform, as I believe it's superior to CloudFormation. But for Transcripto I've tried something new - AWS Copilot.

AWS Copilot

AWS Copilot is a new CLI for building and deploying apps on AWS. It's still young, but it's already great. It makes managing the aforementioned AWS resources, linking multiple AWS accounts and setting up a CI/CD pipeline (with CodeBuild and CodePipeline) very easy.

If you are starting a new product on top of AWS, I highly recommend looking at Copilot. It saved me maybe 5-6 hours, if not more. And yes, in the end it generates a lot of CloudFormation templates, but it's a rare case where I can live with this.

Misc

Other essential tools:

  • ffmpeg - for audio analysis;
  • GitHub - for source control;
  • Figma - for design (my co-founder at mkdev.me, Leo, used it);
  • Bootstrap - still the easiest way to build a good-looking responsive website;
  • Rollbar - for exception tracking;
  • Freshdesk - for a free ticketing system.
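The list above mentions ffmpeg for audio analysis without saying what is analyzed. As a minimal sketch, assuming the analysis includes reading a recording's duration, the `ffprobe` tool that ships with ffmpeg can be called from Ruby like this. The helper names `duration_command`, `parse_duration` and `audio_duration` are hypothetical.

```ruby
require "open3"

# Builds the ffprobe invocation that prints only the container duration
# in seconds. (Assumption: duration is part of the "audio analysis".)
def duration_command(path)
  ["ffprobe", "-v", "error",
   "-show_entries", "format=duration",
   "-of", "default=noprint_wrappers=1:nokey=1",
   path]
end

# Turns ffprobe's textual output, a number followed by a newline,
# into a Float number of seconds.
def parse_duration(output)
  Float(output.strip)
end

# Runs ffprobe and returns the duration in seconds.
# Raises if ffprobe exits with an error (for example, unreadable file).
def audio_duration(path)
  stdout, stderr, status = Open3.capture3(*duration_command(path))
  raise "ffprobe failed: #{stderr.strip}" unless status.success?

  parse_duration(stdout)
end
```

Passing the command as an array keeps file names with spaces or special characters away from the shell.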

I will talk about many parts of the stack in depth on the mkdev YouTube channel.

Transcripto was a fun side project to do - I tried a bunch of new things and helped my wife a lot, which is already a win-win situation.

It's far from perfect - there are many things I want to improve on both the infrastructure and the application side. But I hope this tool will make the lives of many students and journalists much easier.

You can see the full stack on Transcripto's StackShare page.