Generative NLP
Child Safe Generative AI Frameworks
Multilingual AI and generative frameworks
Our AI frameworks are multilingual: they support 50+ languages at high accuracy without language-specific training or configuration.
In this demo, we demonstrate the capability of multilingual frameworks on Miko3 to engage in open-ended conversations in multiple languages on various topics, such as food recipes, the robot’s swimming capabilities, opinion-based queries related to the best footballer, as well as sensitive discussions like parental preference and promoting good habits while discouraging bad ones.
This differs from the available conversational frameworks and SDKs, which require a separate model for each language, deliver significantly lower accuracy, or cannot respond to similar queries in different languages.
A recurring safety issue with AI models and LLMs is ensuring safety and the absence of bias in languages other than English. Our AI models and LLMs deliver the same level of safety and performance in all supported languages.
Generative Audio
Neural Voice-Cloning
Our neural voice cloning system is designed to precisely replicate the voice characteristics of any target speaker, accurately preserving the original speaker’s style, rhythm, and unique vocal properties.
This technology excels at generating voiceovers that keep the authentic nuances of the original speech. Other voice cloning tools and solutions claim to create voice clones, but their generated voices do not preserve the speaker's voice characteristics to a sufficient degree. Our system also does not require any special equipment for audio recording.
The current technology requires 5 minutes of audio data for training. Higher-accuracy model training is also available with 20 minutes of audio data. Generating 15 seconds of audio takes 2-3 seconds.
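As a quick check on the figures above, generation speed can be expressed as a real-time factor: generation time divided by audio duration. The snippet below is only illustrative arithmetic on the quoted numbers. The function name is our own choice for this sketch and is not part of any Miko API.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Figures quoted in the text: 2-3 seconds to generate 15 seconds of audio.
best = real_time_factor(2.0, 15.0)
worst = real_time_factor(3.0, 15.0)
print(f"real-time factor: {best:.2f} to {worst:.2f}")
```

A real-time factor below 1 means audio is produced faster than it plays back. With the quoted figures, generation runs roughly 5 to 7.5 times faster than real time.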
In this demo, we demonstrate neural voice cloning for two famous personalities: Morgan Freeman, a real-life American actor, and Miles Morales, a fictional character from the animated Spider-Man movie series. Both voices speak a paragraph that was never spoken in real life or in any movie, and the voice characteristics and prosody are very similar to the actual voices.
Neural Voiceover Songs
This technology extends our voice cloning capabilities to voiceovers for songs.
Our solution also supports cross-lingual voice cloning, enabling the target speaker to deliver content in a different language while retaining the distinctive attributes of their native speech. This versatility yields high-quality voice synthesis suitable for diverse applications in various linguistic and cultural contexts.
The ability to perform voiceovers for songs is a key differentiator compared to other available voice cloning tools and technologies.
In this demo, we demonstrate neural voiceover for two famous personalities: Morgan Freeman, a real-life American actor, and Miles Morales, a fictional character from the animated Spider-Man movie series. Morgan Freeman sings an Indian cinema (Bollywood) song in Hindi, 'Mere saamne waali khidki mein', while Miles Morales sings 'Wake Me Up' by Avicii.
Edge AI framework
Emotion recognition frameworks
Our facial analysis AI models enable sophisticated analysis of human emotions through advanced visual facial detection, recognition, and emotion extraction.
These models have been optimized to run on edge devices and can also provide high throughput on CPU-only systems.
Our models are fine-tuned on child-focused datasets, a process designed to eliminate false detections and bias. This yields significantly higher accuracy than other emotion recognition tools, even though those tools rely on online facial analysis models and perform well only when high-quality images are available.
Small vocabulary speech recognition
Small vocabulary edge-based AI systems are a niche domain, with only a handful of available solutions that perform at low latency and high accuracy. However, most of these solutions were designed for stationary devices and have performance issues, especially under motor noise and environmental noise.
Our small vocabulary AI engine is optimized for moving platforms and environmental noise while being extremely power efficient, allowing the pipeline to run inference for 4-5 wake word models with only a 4-5 ms overhead on-device.
In this demo, we demonstrate Miko3’s capability to detect the ‘Hey Miko’ utterance with a certain confidence score while rejecting any other speech.
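The accept-or-reject step described above can be sketched as a simple threshold on per-model confidence scores. This is a minimal illustration and not Miko's implementation: the `hey_miko` key, the 0.80 threshold, and the function name are all hypothetical, and the scores are assumed to come from wake word models that are outside the scope of this sketch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-wake-word thresholds; real values would be tuned on data.
THRESHOLDS: Dict[str, float] = {"hey_miko": 0.80}


def detect_wake_word(
    scores: Dict[str, float],
    thresholds: Dict[str, float] = THRESHOLDS,
) -> Optional[Tuple[str, float]]:
    """Return (wake_word, score) for the best-scoring word that clears its
    threshold, or None when the utterance should be rejected."""
    accepted = [
        (word, score)
        for word, score in scores.items()
        if word in thresholds and score >= thresholds[word]
    ]
    if not accepted:
        return None
    return max(accepted, key=lambda item: item[1])


print(detect_wake_word({"hey_miko": 0.93}))  # accepted
print(detect_wake_word({"hey_miko": 0.41}))  # rejected
```

Keeping one threshold per wake word lets several wake word models share the same decision step, at the cost of tuning each threshold separately.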
These AI engines support multiple wake words and use a single model across geographies and user language preferences. In contrast, the available commercial offerings all required separate models for different geographies and accents, which increased complexity and forced users to select the appropriate language and accent preferences.
The training data and quality requirements for creating our AI models are very simple. In contrast, various commercial AI engines required special equipment and environmental conditions to capture the training data, and at least 10 times as much data to create even a baseline model.
Large class audio detection
We illustrate Miko3’s advanced capability to detect various audio events, including music, pet animal sounds (such as dogs and cats), and emergency vehicle sounds (such as ambulances and fire trucks).
This AI model runs as an on-device, large-scale audio event recognition pipeline and is designed to accurately detect 500+ distinct sound signatures.
This system is optimized for power efficiency and low latency, ensuring rapid and reliable audio event detection without compromising performance.
Personalized NLP
Self Initiating Discussions
Our conversational frameworks not only engage in ongoing dialogues but also initiate conversations independently, such as inviting the user to play a game. Following user requests, Miko Mini transitions smoothly into discussing a range of physics-based facts about space. This interactive and autonomous conversational ability sets our AI frameworks apart from existing state-of-the-art companion/social robots and voice assistants, which typically support only one-way interactions and do not initiate conversations to engage users proactively.
This feature empowers parents to personalize the AI companion's interactions, making each child's experience truly unique and tailored.
With the parental app, parents can take control and configure the preferred topics on which Miko should converse with the child. Steerable AI frameworks accept input from the user (the parent) and personalize interactions according to the stated preferences.
As the user engages with our products, our conversation personalization engine models the user persona by learning the user's likes, dislikes, and preferences, and initiates conversations on topics of interest to the child.
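The interplay between parent-configured topics and learned preferences can be sketched with a toy persona model. This is an assumption-laden illustration, not the actual personalization engine: the class, its methods, and the simple plus-one or minus-one scoring are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional


class TopicPreferences:
    """Toy persona model: a running score per topic."""

    def __init__(self, allowed_topics: Iterable[str]):
        # Topics the parent has enabled (hypothetical configuration).
        self.allowed = set(allowed_topics)
        self.scores = defaultdict(float)

    def record(self, topic: str, liked: bool) -> None:
        """Raise the score for a liked topic, lower it for a disliked one."""
        self.scores[topic] += 1.0 if liked else -1.0

    def next_topic(self) -> Optional[str]:
        """Pick the highest-scoring allowed topic with a positive score."""
        candidates = [
            (score, topic)
            for topic, score in self.scores.items()
            if topic in self.allowed and score > 0
        ]
        return max(candidates)[1] if candidates else None


prefs = TopicPreferences(allowed_topics={"space", "animals", "math"})
prefs.record("space", liked=True)
prefs.record("space", liked=True)
prefs.record("animals", liked=True)
prefs.record("math", liked=False)
prefs.record("video games", liked=True)  # not parent-approved, never chosen
print(prefs.next_topic())  # space
```

Filtering by the allowed set before ranking means the parent's configuration always takes precedence over what the model has learned.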
Steerable AI frameworks
The large-scale structured multilingual query framework is used in scenarios where a reasonably large amount of training data is available, and it provides very high-accuracy query categorization.
It belongs to the same class of frameworks as Wit.ai (Facebook), Amazon Lex, Microsoft LUIS, and Rasa NLU. However, we found that these frameworks had significantly lower accuracy and higher response times, leading to costs more than 5x higher.
The semantic query matching AI framework is used for queries belonging to subjective categories. These queries have extremely sparse training data.
The AI framework can support a very wide range of query categories.
The available NLP SDKs and toolkits generalize poorly on these classes of queries and support only a small number of query categories.
In this demo, we demonstrate Miko Mini's ability to converse on various queries related to Miko's personality, such as its ability to go snorkeling, cut fruits and vegetables, or fly, as well as Miko's own weight, Miko's birthday, and giving recipes. These capabilities differentiate Miko robots from existing state-of-the-art voice-enabled systems and social robots in understanding child queries from all domains, including the robot's own personality, and in giving appropriate child-safe responses in multiple languages.
Generative Image Frameworks And Other Image AI Models
Neural Image Search
Our multimodal neural search engine can process 100+ million images, with search and inference latency of less than 2 seconds for the entire pipeline.
The system delivers high-accuracy image search by accepting input text in natural language and returning contextually relevant images.
This is in contrast to other image search systems, which only deliver high accuracies if captions in specific formats or styles accompany the image and if image quality is very high.
The system also delivers high accuracy when it accepts images as input, returning contextually relevant images without needing image descriptions or captions. When text captions or descriptions are provided along with images, our AI systems can provide significantly higher accuracy than unimodal search.
In this demo, we showcase the capabilities of our neural image search engine, which can output specific and relevant images along with confidence scores for a given text query.
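One common way to return ranked images with scores for a text query is to compare embeddings with cosine similarity. The sketch below assumes that approach purely for illustration. The source does not state how the engine computes its confidence scores, and the file names and hand-made 3-dimensional vectors are placeholders standing in for real text and image embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (image_name, score) pairs, best match first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Placeholder vectors; a real system would use learned embeddings.
IMAGE_INDEX = {
    "dog_in_park.jpg": [0.9, 0.1, 0.0],
    "cat_on_sofa.jpg": [0.1, 0.9, 0.0],
    "red_fire_truck.jpg": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]  # pretend embedding of "a dog playing outside"

for name, score in search(query, IMAGE_INDEX):
    print(f"{name}: {score:.2f}")
```

This brute-force scan is fine for a handful of images. An index on the scale of 100+ million images would typically need an approximate nearest-neighbor structure instead.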
Celebrating Excellence
Miko’s AI Patent recognised globally by WIPO!
Miko's innovative AI solutions patent is now among the most notable GenAI patents from across the globe.
AI-Powered Adaptive Learning System
An adaptive learning system with 15+ patents, including numerous patents granted for AI, NLP and more, using artificial intelligence for localizing and mapping users and objects.