Indian Startups Drive Egocentric Data Collection for Robotics

Synopsis

Indian startups are entering the lucrative egocentric data collection business. This data, captured from a first-person view, is vital for training robots. Leading robotics labs require billions of hours of this data for advanced manipulation and safe operation. Companies like Humyn AI and Objectways are meeting this demand, collecting data across various contexts to fuel the future of robotics.

Captured via wearable cameras, egocentric data is emerging as a key input for robot training
Close to 50 people on a factory floor in Ahmedabad, each wearing a GoPro camera on the forehead, are assembling electronic components: soldering, screwing and finally putting the finished product into a box.

The camera records the process. The footage is then annotated, passed through a quality check and finally delivered to customers, who can use it to train their robots. This kind of data collection is called egocentric: the data is captured from a first-person point of view through wearable cameras.

There is a huge market for this data. A report by Stellaris Venture Partners estimates that leading robotics labs will need 100 million to 1 billion hours of egocentric data in the next 2-3 years.

To tap into this demand, multiple Indian startups such as Humyn AI, FPV Labs and Neo Cambrian are entering the business to build data pipelines for robotics companies. In addition, established data collection companies such as Objectways are expanding to collect data for physical AI companies.

Ishank Gupta, co-founder, Humyn AI, explained that training robots in a single context requires anywhere between 100,000 and 1 million hours of data. He defines a single context as one task, for instance, picking up a glass and placing it on a designated shelf in the kitchen. The current consensus, he said, is that those using egocentric videos to train robotic arms and limbs will need a few billion hours of data. “These billions of hours of data cannot be scraped and have to be created because there is no repository in the world which has such data,” he explained.

This is the biggest bottleneck for robotics labs, which need egocentric data so that robots can learn better hand manipulation and operate safely in complex real-world environments.

Ravi Shankar, president, Objectways, said, “We started noticing this trend in mid 2025.” The company, which was already collecting data for LLMs, started offering egocentric data as well as RGB-D data, which robots use to calculate depth. “We are doing 1000 hours of data per day, and the demand is for 200,000 to 300,000 hours of data,” he said. The company works with Encord, which counts global robotics labs as clients.
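To put the gap between Objectways' output and the quoted demand in perspective, the arithmetic can be sketched as below. This is only an illustration, not a figure from the company: it assumes the 1,000 hours per day rate stays constant, that collection runs every day, and that the 200,000 to 300,000 hours is a fixed backlog (the article does not say over what period that demand applies). The function name `days_to_fill` is made up for this example.

```python
# Figures quoted by Objectways in the article.
HOURS_PER_DAY = 1_000
DEMAND_HOURS = (200_000, 300_000)


def days_to_fill(demand_hours: int, hours_per_day: int) -> float:
    """Days needed to collect demand_hours at a constant daily rate (assumption)."""
    return demand_hours / hours_per_day


for demand in DEMAND_HOURS:
    days = days_to_fill(demand, HOURS_PER_DAY)
    print(f"{demand:,} hours at {HOURS_PER_DAY:,} hours/day: {days:.0f} days")
```

Under those assumptions, meeting the quoted demand would take roughly 200 to 300 days of collection at the current rate.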

How are they collecting data

Humyn Labs has a verified network of people in 18 countries across India, Latin America, Europe and Southeast Asia who collect data based on customer needs. The company is currently collecting data for manufacturing and for residential needs such as washing dishes and folding laundry. The latter is primarily centered in Brazil. Manish Agarwal, co-founder, Humyn Labs, said that the company has a revenue pipeline from global robotics labs, but did not disclose their names.

Objectways has multiple offices across Karur and Coimbatore in Tamil Nadu, where it employs people to record data through the Objectways mobile application. “We started with GoPro cameras, Meta glasses and we even made our own glasses. However they heat up and cannot be used to record for longer hours,” Shankar said. The company then designed its own app to record egocentric data for its clients' needs.

Abhinav Kukreja, co-founder, Neo Cambrian, said in a LinkedIn post that the company is deploying proprietary hardware across manufacturing units in India to collect accurate, detailed data that is closer to the real-world environment. The company is ramping up its team to build the hardware for robotics data collection.

FPV Labs has collected over 10,000 hours of real-world data in the past eight months. However, the company is not looking to sell the data yet. “We are instead investing in the infrastructure to capture, validate and evaluate high-quality data that transfers human state and action representations to robots,” said Abhishek Anand, co-founder.

Anand, drawing a parallel with autonomous vehicles, pointed out that self-driving cars collect thousands of hours of data every day, but only a small fraction is useful for training better models. “Studies like RT-2 show that as little as 1% of data can drive 25% improvement on task success. Quality matters far more than scale,” he added.

Challenges

One of the biggest challenges Shankar faces is a shortage of manpower. In India, the company pays Rs 250-400 per hour for collecting data, and even at these rates it is hard to find people to generate the data.

Humyn’s Agarwal said that the industry is not mature and requirements will keep changing. “You need to be agile and keep up with this change,” he said. In addition, each research lab has its own rules, which means that companies need to build a playbook from scratch for each one, he added.

A Bengaluru-based investor, who is looking at investing in robotics data collection startups, pointed to the fast-paced nature of the business. “You are collecting petabytes of data and this takes weeks to process and send. The biggest challenge here is that by the time the process is done, the requirements change and companies often need to start the process again.”

This editorial summary reflects ET Tech and other public reporting on Indian Startups Drive Egocentric Data Collection for Robotics.

Reviewed by WTGuru editorial team.