ShengShu Launches Motubrain as New Strategic Pillar for Embodied AI, Backed by $293 Million Series B

New updates have been reported about ShengShu Technology.

ShengShu Technology has introduced Motubrain, a unified “world action” model positioned as a general-purpose robotic brain that replaces fragmented, task-specific systems and extends the company’s core video world-model technology into the physical realm. Built on the same multimodal foundations as its flagship Vidu video platform, Motubrain delivers end-to-end perception, prediction, and action control, using video as the central medium to capture time, motion, and causality for robots operating in industrial, commercial, and home environments.

The system has posted top-tier scores on key embodied AI benchmarks, including a 63.77 EWM score on WorldArena and an average of 96.0 across 50 tasks on RoboTwin 2.0, making it the only model to surpass 95.0 in randomized settings and signaling strong generalization across scenarios and robot types. ShengShu reports that Motubrain is already in active deployment with multiple robotics partners, supported by a $293 million Series B round led by Alibaba Cloud and other strategic investors, as the company positions Motubrain and Vidu as twin growth engines for the emerging Physical AI market.

Motubrain’s architecture is designed around a single multimodal model that learns video and action jointly, delivering five capabilities in one training cycle: vision-language-action control, world modeling, video generation, inverse dynamics modeling, and joint video-action prediction. A three-stream Mixture-of-Transformers connects video, language, and action, allowing robots to interpret natural-language instructions, understand their surroundings, forecast outcomes, and generate appropriate action sequences without relying on stitched-together perception and planning modules.
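The article does not describe Motubrain's internals beyond the three-stream design. As a rough illustration only, the Python sketch below assumes the common reading of Mixture-of-Transformers: each modality keeps its own projection weights, while a single attention step runs over the tokens of all three streams together. The class name `ThreeStreamLayer`, the dimensions, and the random weights are hypothetical and are not taken from ShengShu.

```python
import math
import random

MODALITIES = ("video", "language", "action")


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ThreeStreamLayer:
    """Hypothetical layer: per-modality weights, one shared attention over all tokens."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Assumption: every stream owns separate query/key/value/output projections.
        self.params = {
            m: {name: random_matrix(rng, dim, dim) for name in ("q", "k", "v", "out")}
            for m in MODALITIES
        }

    def forward(self, tokens):
        """tokens: list of (modality, vector) pairs in sequence order."""
        q = [matvec(self.params[m]["q"], x) for m, x in tokens]
        k = [matvec(self.params[m]["k"], x) for m, x in tokens]
        v = [matvec(self.params[m]["v"], x) for m, x in tokens]
        scale = math.sqrt(self.dim)
        outputs = []
        for i, (m, x) in enumerate(tokens):
            # Shared attention: each token scores against tokens from every stream.
            scores = [sum(a * b for a, b in zip(q[i], kj)) / scale for kj in k]
            weights = softmax(scores)
            mixed = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(self.dim)]
            # The output projection is again specific to the token's own stream.
            projected = matvec(self.params[m]["out"], mixed)
            outputs.append((m, [xi + pi for xi, pi in zip(x, projected)]))
        return outputs


# Toy sequence: two video tokens, one language token, one action token.
rng = random.Random(1)
layer = ThreeStreamLayer(dim=4)
tokens = [(m, [rng.uniform(-1, 1) for _ in range(4)])
          for m in ("video", "video", "language", "action")]
out = layer.forward(tokens)
```

In this sketch the action token's output depends on the video and language tokens through the shared attention step, which is one way a single model could condition actions on instructions and observations without separate perception and planning modules.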

The model is explicitly engineered for cross-robot scalability, enabling one brain to run on multiple embodiments and improve as more robot types and real-world data are added to the ecosystem. ShengShu’s proprietary latent action framework learns motion directly from unlabelled video, human footage, simulations, and multi-robot trajectories, producing favorable scaling behavior: in tests, task success rates continued to climb as both the number of tasks and training episodes increased, reaching roughly 92% success at 50 tasks and at 27,500 episodes, outperforming comparator systems such as Pi-0.5 and Motus.

Real-world evaluations show Motubrain-trained robots executing multi-step workflows with up to 10 atomic actions, demonstrating adaptive behaviors such as retrying a failed scooping action without explicit retry training data, which shifts performance from simple execution to true task completion. ShengShu has also formed partnerships with robotics players including Astribot, SimpleAI, and Anyverse Dynamics to co-develop full-stack solutions that integrate foundation models, multimodal data, and hardware optimization, aiming to accelerate commercial deployments and expand the addressable market for embodied AI.

Strategically, ShengShu positions Motubrain as its next growth pillar alongside Vidu, which recently ranked first on SuperCLUE’s global Reference-to-Video leaderboard and validates the company’s world-modeling approach on the content side. The linkage between Vidu’s video generation and Motubrain’s action capabilities gives ShengShu a vertically integrated platform spanning digital content and physical automation, with potential implications for manufacturing, logistics, services, and consumer robotics as the company transitions from a pure generative media player into a Physical AI infrastructure provider.

Founder Jun Zhu emphasizes that the long-term goal is a unified world model that blends perception, reasoning, and control rather than a collection of specialized modules, arguing that this architecture is essential to bridge digital simulations and real-world robotics at scale. For executives and investors, the key takeaways are that ShengShu now reports benchmark-leading results in embodied AI, live deployments with robotics partners, and substantial Series B capital behind a strategy to commercialize a general-purpose robotic brain, positioning the company as a contender in the global race to build foundation models for physical automation.

With more than 200 countries and regions already served by Vidu in sectors such as entertainment, advertising, film, and cultural tourism, ShengShu is leveraging its existing global reach and cloud-scale infrastructure to push Motubrain into adjacent markets where simulation, planning, and autonomous execution can be tightly integrated. The combination of strong technical metrics, operational deployments, and strategic funding suggests ShengShu is entering a new phase where embodied AI, not just generative media, will drive its next leg of growth, while partners and customers gain access to a single model capable of orchestrating complex, multi-robot, multi-environment tasks.
