The Tech Stack Behind AI Video Generation Explained

Adam Antal

Colossyan is a synthetic media company. We are building a world-class video-editing platform focused on human interaction: artificially generated human actors will tell your story.

This blog post gives a glimpse into the kinds of engineering problems and solutions we work on at Colossyan. If you're not a technical person but want to know more about Colossyan, check out our new website, fresh from the oven.

Colossyan is a relatively young product, but it already has many components. We built our stack by leveraging different open-source AI solutions, training them with our actors, and assembling everything into a complete SaaS product.

AI actors

Before we jump to the full stack, let's talk about the heart of our product: the AI actors. AI models such as our actors require compute power to run, and these models usually compromise on either quality or speed: the better the quality of the actor, the longer a user has to wait for it to be generated. The underlying compute resources can also vary a lot based on what the chosen (cloud) provider can offer. As GPUs are quite expensive in the cloud, we have to maintain a flexible setup: autoscaling of GPU-backed instances is a must for any company running AI models at scale in production.

Mia, one of our AI actors

We use AWS, and we run G4 EC2 instances that have GPUs in them. For automatic scaling we leverage AWS ECS, and taking it one step further, we use AWS Batch to process the work in batches. We did not start with stream-based processing, as scaling GPU-backed instances is still a bottleneck most of the time. However, as our models and other parts of the stack improve, we get closer and closer to real-time actor synthesis.
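As an illustrative sketch (the job definition name, container image, and resource values here are hypothetical, not our actual configuration), a GPU-backed AWS Batch job definition looks roughly like this:

```json
{
  "jobDefinitionName": "actor-synthesis",
  "type": "container",
  "containerProperties": {
    "image": "<your-registry>/actor-renderer:latest",
    "vcpus": 4,
    "memory": 16384,
    "resourceRequirements": [
      { "type": "GPU", "value": "1" }
    ]
  }
}
```

The `resourceRequirements` entry tells Batch to place the job only on container instances with a free GPU, so a queue of render jobs drives the scaling of the GPU instances rather than the other way around.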

The stack

Besides the actor AI models, we build our stack on plenty of well-known, industry-standard components. Starting from the top, we ship updates to our React-based UI at a very fast pace - just check out our changelog, powered by Canny. There's usually a major UI update every two weeks.

Our web backend and API are bundled in a Node.js container, orchestrated by Kubernetes. Kubernetes comes with a lot of features that help us ship our product easily (like rolling upgrades), and many auxiliary services help in our everyday work: Prometheus for monitoring, Nginx for ingress, logging, and so on.
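A rolling upgrade is configured on a Kubernetes Deployment roughly like this (names, replica counts, and ports below are illustrative, not our actual manifests):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-backend            # hypothetical service name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-backend
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0        # keep full capacity during a deploy
      maxSurge: 1              # roll out one new pod at a time
  template:
    metadata:
      labels:
        app: web-backend
    spec:
      containers:
        - name: api
          image: <your-registry>/web-backend:latest
          ports:
            - containerPort: 3000
```

With `maxUnavailable: 0`, Kubernetes only tears down an old pod once its replacement is ready, so a deploy never interrupts serving traffic.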

Our product-analytics platform is Mixpanel, which provides us with a lot of insights. We pay close attention to the numbers and raise the bar higher every week, actively monitoring many product-related metrics and detecting any anomalies that occur. Our infrastructure was recently put to the test by our Hide the pain Harold prank video campaign: we peaked at several thousand videos generated in a single day, without major interruptions and with queueing times never exceeding 15 minutes - but that story will be told in a future blog post.

Hide the pain Harold's prank

The future

We'll actively work on defining proper SLAs in the future: by the time we reach product maturity, we have to be ready to provide an excellent service to our customers. Failed video generations and major UI flaws both fall into this category.

There are also plenty of new features on our roadmap. They differ in complexity from an engineering perspective, and we will uncover them in later blog posts.

What's next?

Want to know how AI models are assembled? How were we able to improve the scalability and robustness of the system? Stay tuned, and follow our channel to get notified about our upcoming blog posts that will dive deep into many more engineering topics!

Wanna join us? Check out our job opportunities here!

Thanks to Abel Erdesz and Robert Albornoz for their suggestions.
