AI-powered text-to-video tool

Problems to solve

Videocreator is an audio-video editing and delivery application for fast, smooth professional video creation. From text entered by the user, it generates, within a few minutes, a stunning video file composed of a spoken narrative, video footage and unobtrusive background music.

The client was looking for a trustworthy tech partner who would quickly learn the product's concept and nuances and then develop it from the ground up. Given the complexity of the project, strong technological competencies in building functional web-based software were crucial when choosing a vendor.

The scope of work done by Merixstudio included:

  • designing the application’s functionality based on an initial set of wireframes,
  • building algorithms and backend logic in Python,
  • aligning frontend-operated video editing input with output file generation on the backend,
  • integrating two of the Top 3 stock databases for asset sourcing,
  • scaling AWS environments to support up to 100 users generating videos simultaneously.

Highlights

1 tool for all industries
10 different languages supported
9 team members
millions of stock video files for use

Solutions

Backend

spaCy
Python
Django
Celery

Frontend

Sass
Redux
React
Pixi.js
Next.js
GSAP

Tools

Websockets
Stripe
Postman
Paypal
AWS
Amazon S3
Amazon Polly
Amazon Cognito

QA

Performance tests
Manual Testing
JMeter
Cypress
Automated UI tests
API testing

The idea started out as a couple of wireframes that defined the core functionality at a high level, coupled with two mockup screens. Through a series of discussion sessions and tech consultancy, we proposed a set of solutions to implement for the MVP of the project. This proactive approach to constructing a development plan not only won us the client's trust but also served as a good knowledge base for the early stages of development.

Merixstudio's team adopted the tested ScrumBan workflow and set up a full-stack development team consisting of an Agile project manager, an experienced frontend developer, a backend developer and a Quality Assurance Specialist. To tackle the project's challenges efficiently, we extended the team with another two frontend developers, a backend developer, a Quality Assurance Specialist and a DevOps Engineer. In accordance with best communication practices, they stayed in constant contact with the client (using platforms such as Slack and Google Meet), exchanging ideas, confirming assumptions and consulting on their work with the project's stakeholders.

The first big challenge for development (and the initial focus for the team) was video generation from the user's input. The file generation had to perform smoothly and accurately align audio files, video footage, background music and subtitle layers in one output file. To guarantee the scalability of the application, we prepared the appropriate infrastructure and AWS environment and successfully tested the generation of 100 videos simultaneously.
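To illustrate the alignment problem, here is a minimal sketch that places narrated scenes back to back so that the audio, footage and subtitle layers share the same start and end times. It is not the project's actual engine: the `Scene` and `Cue` classes, the `gap_seconds` parameter and the idea of driving all layers from narration length are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    """One narrated piece of text with its measured audio length (hypothetical)."""
    text: str
    narration_seconds: float


@dataclass(frozen=True)
class Cue:
    """A time span shared by the audio, footage and subtitle layers."""
    text: str
    start: float
    end: float


def align_scenes(scenes: list[Scene], gap_seconds: float = 0.0) -> list[Cue]:
    """Place scenes one after another, optionally separated by a pause."""
    cues: list[Cue] = []
    cursor = 0.0
    for scene in scenes:
        if scene.narration_seconds <= 0:
            raise ValueError("narration length must be positive")
        start = cursor
        end = start + scene.narration_seconds
        cues.append(Cue(scene.text, start, end))
        cursor = end + gap_seconds
    return cues


def music_duration(cues: list[Cue]) -> float:
    """Background music runs from 0 to the end of the last cue."""
    return cues[-1].end if cues else 0.0
```

With two scenes lasting 2.5 and 4.0 seconds and a 0.5 second pause, the second cue starts at 3.0 and the music layer needs to cover 7.0 seconds.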

Another feature considered a core priority was the introduction of Amazon Polly text-to-speech technology to create audio files from a user's text script. The audio output files were then paired with stock video footage that the application's smart algorithms selected by extracting keywords from the script.
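The keyword step can be pictured with a deliberately simple, dependency-free sketch. It ranks words by frequency after dropping a few stop words; this is a stand-in for illustration only, not the project's algorithm (the stack lists spaCy, which this sketch does not use). The `STOP_WORDS` list and the `extract_keywords` function are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = frozenset({
    "a", "an", "and", "the", "of", "in", "on", "to", "is", "are",
    "for", "with", "at", "by", "it", "this", "that",
})


def extract_keywords(script: str, limit: int = 3) -> list[str]:
    """Return up to `limit` frequent words to use as stock-footage search terms."""
    words = re.findall(r"[a-z']+", script.lower())
    candidates = [w for w in words if w not in STOP_WORDS and len(w) > 2]
    counts = Counter(candidates)
    # sorted() is stable, so words with equal counts keep first-appearance order.
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    return [word for word, _ in ranked[:limit]]


if __name__ == "__main__":
    script = ("The chef cooks pasta. The chef serves pasta "
              "with tomato sauce in the kitchen.")
    print(extract_keywords(script))
```

Each returned keyword could then be sent as a search query to a stock footage provider; that integration is outside the scope of this sketch.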

Once the above had been taken care of, our Frontend Engineers were tasked with creating a bespoke editor for the videos users would create. Once all the input has been generated, the application lets users manipulate it on asset-specific timelines and decide the order of the assets used in the final cut.

Lastly, our Python engineers took on the development of an engine that ties together all of the assets and their respective placement in time and generates one output video file, ready to be shared via social media channels. It was a big challenge to ensure that this was done smoothly, both in terms of performance and the alignment of audio and video files.
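One way to think about such an engine is as a function that flattens several asset timelines into consecutive render segments, each listing which asset is active on which track. The sketch below shows that idea under stated assumptions: the `Clip` class, the track names and the `render_segments` function are hypothetical, and the real engine's design is not described in this case study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    track: str    # e.g. "video", "narration", "music", "subtitles"
    asset: str    # hypothetical asset identifier, such as a file name
    start: float  # seconds from the beginning of the output video
    end: float    # seconds from the beginning of the output video


def render_segments(clips: list[Clip]) -> list[tuple[float, float, dict[str, str]]]:
    """Split the timeline at every clip boundary and list active assets per span."""
    for clip in clips:
        if clip.end <= clip.start:
            raise ValueError(f"clip {clip.asset!r} must end after it starts")

    # Every start or end time is a point where the set of active assets may change.
    boundaries = sorted({t for c in clips for t in (c.start, c.end)})

    segments = []
    for seg_start, seg_end in zip(boundaries, boundaries[1:]):
        # Simplification: if two clips overlap on one track, the later one wins.
        active = {
            c.track: c.asset
            for c in clips
            if c.start <= seg_start and c.end >= seg_end
        }
        if active:  # skip gaps where nothing is scheduled
            segments.append((seg_start, seg_end, active))
    return segments
```

Each segment has a fixed set of inputs, so a renderer could process segments independently and join the results in order.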

Key features

Amazon Cognito

Videocreator is one of our client's applications available on an online platform with AI-powered tools. To provide users with a smooth, fuss-free experience, we implemented Cognito as the authentication tool. It lets users move between all the applications without having to log in again.

Amazon Polly

To build a speech-enabled application that works in many different countries, we reached for Amazon Polly. It's an Amazon AI service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. A user provides text, and our application converts it to speech and matches it to the videos.
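A minimal sketch of the text-to-speech step might look like the following. It splits a script into sentence-aligned chunks and sends each one to Polly through boto3's `synthesize_speech` call. The `split_script` and `synthesize` helpers, the `"Joanna"` voice and the 3000-character default are assumptions for illustration; the per-request text limit should be checked against the current Polly documentation, and AWS credentials are required to run `synthesize`.

```python
from __future__ import annotations

import re


def split_script(script: str, max_chars: int = 3000) -> list[str]:
    """Group whole sentences into chunks no longer than max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence) > max_chars:
            raise ValueError("a single sentence exceeds max_chars")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize(script: str, voice_id: str = "Joanna") -> list[bytes]:
    """Return one MP3 byte string per chunk of the script."""
    import boto3  # imported lazily so split_script works without boto3 installed

    polly = boto3.client("polly")
    audio_parts = []
    for chunk in split_script(script):
        response = polly.synthesize_speech(
            Text=chunk, OutputFormat="mp3", VoiceId=voice_id
        )
        audio_parts.append(response["AudioStream"].read())
    return audio_parts
```

Keeping chunk boundaries on sentence ends avoids cutting the narration mid-sentence, which would be audible when the parts are joined.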

Access to millions of video files and images

To make video production even more fun and easy, the application gives users access to millions of high-resolution images, video and audio files provided by the most renowned stock content agencies.

Video editor

A user-friendly and powerful editor lets users make changes to a created video. A user can preview the video, add titles, swap video clips, add transitions, control the timeline and choose the preferred aspect ratio.

Dashboard

A clear, functional dashboard helps users manage their account and keep their videos and all uploaded media in one place.

Powerful admin panel

To provide our client with deep analytics, we developed the admin panel in Django. Thanks to it, our client not only gets access to insightful statistics but can also easily maintain the entire application.
