Covariant’s CEO on building AI that helps robots learn

Covariant was founded in 2017 with a simple goal: to help robots learn how to pick things up better. This is a huge need among those looking to automate their warehouses, and it’s a lot more complicated than it looks. Most of the goods we encounter have passed through a warehouse at some point. They come in a very wide range of sizes, shapes, textures and colors.

The Bay Area company has built an AI-based system that trains networked robots to improve their picks. A demo on the floor at this year’s ProMat shows how quickly a connected arm can identify, pick, and place a range of different objects.

Co-founder and CEO Peter Chen sat down with TechCrunch at the show last week to discuss robot learning, building foundation models, and, of course, ChatGPT.

TechCrunch: For startups, it makes sense to use as much off-the-shelf hardware as possible.

PC: Yeah. Covariant started in a very different place. We started with pure software and pure AI. The company’s first hires were all AI researchers. There were no mechanical engineers, no robotics engineers. That has allowed us to go much deeper into AI than anyone else. If you look at other robotics companies [at ProMat], they are probably using off-the-shelf or open-source models that have been used in academia.

Like ROS.

Yes. ROS or open-source computer vision libraries. But what we do is fundamentally different. We looked at what academic AI models have to offer, and it’s not enough. Academic AI is built in a lab environment. Those models weren’t built to withstand real-world testing, especially across many customers, millions of skills, and millions of different types of items that the same AI needs to handle.

Many researchers take different approaches to learning. What’s different about yours?

Many of the founding team came from OpenAI, including three of the four co-founders. If you look at what OpenAI has done in the language space over the last three to four years, it has basically taken a foundation model approach to language. Before the recent ChatGPT, there were many natural language processing AIs out there, for search, translation, sentiment detection, and spam detection. The approach before GPT was to train a specific AI on a small subset of data for each use case. Looking at the results now, GPT has basically made the field of translation obsolete, even though it was never trained specifically to translate. The foundation model approach is essentially this: instead of using a small amount of data specific to one situation, or training a model specific to one situation, you train one large, generalized foundation model on much more data, so the AI is more general.

You focus on picking and placement, but are you also laying the groundwork for future applications?

Absolutely. Grasping, or pick-and-place functionality, is arguably the first general capability we provide for robots. But behind the scenes, there’s a lot of 3D understanding and object understanding. There are many cognitive primitives that can be generalized to future robotics applications. That said, grasping and picking is such a vast space that we can work on it for a while.

You started with pick and place because there is a clear need.

There’s a clear need and a clear lack of technology for it. Interestingly, if you had come to this show 10 years ago, you would have been able to find picking robots. They just wouldn’t have worked. The industry has struggled with this for a very long time. People said this wouldn’t work without AI, so people tried niche AI and commercial AI, and those didn’t work either.

Your system is feeding data into a central database and every pick tells the machines how to pick in the future.

Yes. Funnily enough, almost every item we touch goes through a warehouse at some point. It is almost a central clearing place for everything in the physical world. If you start by building AI for warehouses, it’s a great foundation for AI that goes beyond warehouses. Let’s say you take an apple from the field and bring it to the farm. The AI has seen apples before. It has seen strawberries before.

That’s a one-to-one case. You can pick an apple in a fulfillment center, so you can pick an apple in the field. More abstractly, how can these learnings be applied to other aspects of our lives?

If we step away from Covariant specifically and think about where technology trends are headed, we see an interesting convergence of AI, software, and mechatronics. Traditionally, these three areas have been somewhat separate from each other. Mechatronics is what you find when you come to this show. It’s about repeatable motion. If you talk to the salespeople, they tell you about reliability, how this machine can do the same thing over and over again.

The truly amazing evolution that we’ve seen in Silicon Valley in the last 15 to 20 years is in software. People have cracked the code on how to build very complex and very intelligent software. All of these apps that we use are really people harnessing the capabilities of software. We are now at the forefront of AI and are making amazing progress. When you ask me what lies beyond the warehouse, I see these three trends converging toward building highly autonomous physical machines in the real world. We need the convergence of all of these technologies.

You mentioned ChatGPT coming in and blindsiding people making translation software. That’s something that happens in technology. Are you afraid that a GPT will come in and effectively blindside the work Covariant is doing?

That’s a good question for a lot of people, but I think we had an unfair advantage in that we started with roughly the same beliefs that OpenAI had about building foundation models. General AI is a better approach than building niche AI. That’s what we’ve been doing for the last five years. I would say that we are very well positioned, and we are very happy that OpenAI has shown that this philosophy really works. We are very excited to be doing this in the robotics world.
