Hi there,
in recent years there has been a lot of progress in deep reinforcement learning, and many publications show that machine learning can produce stable gaits for robots. Particularly interesting and relatively accessible papers are, for example, https://arxiv.org/abs/1804.10332, where the authors trained both a walking and a galloping gait in simulation and then transferred them to the real robot. In a later publication they went further and learned new gaits via reinforcement learning directly on the robot, without simulation, in less than two hours: https://arxiv.org/abs/1812.11103. There are many more examples and different approaches to this.
This is very remarkable, and it made me wonder whether we can get there with Nybble and Bittle as well.
But let's slow down a little. What is reinforcement learning exactly? The diagram below gives a basic idea of how it works: the Agent, in our case Nybble/Bittle, is placed in an Environment (e.g. a flat floor). There it performs Actions, such as moving its limbs, and receives a Reward defined by the programmer. The Reward is only given when the State is what we actually want, e.g. moving forward. Trapped in this loop, our robot tries to maximize the Reward in every iteration and becomes better and better at the movement.
Source: https://en.wikipedia.org/wiki/Reinforcement_learning#/media/File:Reinforcement_learning_diagram.svg
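To make the loop above a bit more concrete, here is a minimal sketch of the Agent-Environment interaction using the classic Gym API (CartPole is just a stand-in for a Nybble/Bittle environment, and the exact reset/step signatures depend on your Gym version):

```python
import gym

env = gym.make("CartPole-v1")                # stand-in for a Nybble/Bittle environment
observation = env.reset()                    # initial State
for _ in range(1000):
    action = env.action_space.sample()       # the Agent picks an Action (random here)
    observation, reward, done, info = env.step(action)  # new State and Reward from the Environment
    if done:                                 # episode over, e.g. the robot fell
        observation = env.reset()
env.close()
```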
So what I tried to do is to use a simulation environment with flat ground together with a simulation model of Nybble and make it move forward. I implemented this as a Gym training environment in PyBullet together with the reinforcement learning library Stable-Baselines3 (https://stable-baselines3.readthedocs.io/en/master/index.html). There are many working learning algorithms one can use for reinforcement learning. For training I chose SAC (Soft Actor-Critic), which seems to be the current state-of-the-art algorithm for this kind of problem, and applied it to Nybble to see how it performs. The result is definitely still more a crawl than a walk, but it shows the potential.
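If you want an idea of what the training code roughly looks like, here is a minimal sketch with Stable-Baselines3; the environment module and class names are only placeholders for my PyBullet setup and may differ from the actual repository:

```python
from stable_baselines3 import SAC
from opencat_gym_env import OpenCatGymEnv   # hypothetical module/class name for the PyBullet env

env = OpenCatGymEnv()                        # custom gym.Env with flat ground and the Nybble model
model = SAC("MlpPolicy", env, verbose=1)     # Soft Actor-Critic with a default MLP policy
model.learn(total_timesteps=500_000)         # let the agent maximize its reward
model.save("sac_nybble")                     # store the trained policy
```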
This is the result of reinforcement training alone, without any intervention from my side:
The next steps are to improve the training and the resulting gaits. Once the gaits are good in simulation, there are two ways forward: either run the learned policy on Nybble/Bittle, or learn it directly on the robot. For the latter I think I will need an additional set of hardware.
If you want to train a walking gait yourself, you can find the link to my repository below, where I will provide further updates. Make sure to install all the Python libraries listed in the import section of the code.
Hello @Gero
This is awesome!
I'm working on implementing reinforcement learning for Bittle and need some advice.
How does your model make the robot move forward? As I understand it, you use the IMU acceleration data along with yaw, pitch, roll, and the current joint angles as inputs for the neural network. However, this data alone doesn't indicate whether the robot is moving forward; the velocity, not just the acceleration, is needed to calculate the reward.
Am I missing something?
Thank you in advance!
Hi Gero,
This is brilliant work! I'm also trying to deploy my reinforcement learning algorithm on Bittle, but I have some trouble building the serial communication with the robot. It would be super helpful if you could share your code for the serial communication.
Best regards
I finally found some time to push this project further. I updated the simulation model to better match the actual mechanics of Bittle and trained a walking gait controller via reinforcement learning. Using the BiBoard has been very beneficial, thanks for that RZ, because it significantly reduces the latency of the serial communication and of the command execution.
First, you can see a video of the walking gait of the simulation model. It is important to note that Bittle is fully controlled by a neural network with two hidden layers of 256 nodes each. The training was performed with the PPO algorithm.
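As a rough sketch, setting up such a policy network with two hidden layers of 256 nodes in Stable-Baselines3 could look like this (the environment class name is again only a placeholder):

```python
from stable_baselines3 import PPO
from opencat_gym_env import OpenCatGymEnv   # hypothetical module/class name for the PyBullet env

policy_kwargs = dict(net_arch=[256, 256])    # two hidden layers with 256 nodes each
env = OpenCatGymEnv()
model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=1_000_000)       # training budget, adjust as needed
model.save("ppo_bittle")
```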
Next, this controller was applied to Bittle equipped with the BiBoard. The data is transmitted from a notebook via USB over the serial port. Bittle sends its current motor angles and gyro data, and this data is processed by the neural network to generate the next motor angles, so to speak the next step.
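The notebook-side control loop could be sketched roughly like this; the serial port name, baud rate, message format and command string are assumptions on my part and not the exact protocol of the BiBoard firmware:

```python
import numpy as np
import serial                                 # pyserial
from stable_baselines3 import PPO

model = PPO.load("ppo_bittle")                # trained policy from the simulation
ser = serial.Serial("/dev/ttyUSB0", 115200, timeout=1)

while True:
    line = ser.readline().decode(errors="ignore").strip()
    if not line:
        continue
    try:
        observation = np.array(line.split(","), dtype=np.float32)  # gyro data + current motor angles
    except ValueError:
        continue                              # skip malformed lines
    action, _ = model.predict(observation, deterministic=True)     # next motor angles
    command = " ".join(str(int(a)) for a in action) + "\n"          # hypothetical command format
    ser.write(command.encode())
```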
In the next few days I'm planning to update my repository on GitHub and will share the link.
Hi there,
I was able to continue working on this project a little and created a walking gait controller via reinforcement learning, which I could successfully run on Bittle. The controller uses the gyro acceleration, the gyro angles, and the leg angles to generate steps forward. In this setup, Bittle receives the leg angles from the trained neural network, which in turn uses the gyro information sent back from Bittle. I think this latency limits the smoothness and speed of the walking gait.
In the next days I will upload the new code on my repository: https://github.com/ger01d/opencat-gym
Here you can see the simulation model, which is controlled by the trained neural network:
This is the neural network running on an office notebook and controlling Bittle via the serial port:
(I know, it looks kind of painful how Bittle hits the ground with its elbow ...)
I've made a first attempt at running the policy from my last post on Bittle. The so-called reality gap mentioned in the literature (the difference between simulation and real life) is very obvious. And one might ask why I use such a short cable; it looks like Bittle is on a chain...
But nevertheless it's a start.
For the application I used @Alex Young's OpenCat modification (post: gleefully stealing the best ideas), so I could easily send the motor positions generated by the neural network controller via the legpose command. My next steps will be to close the reality gap somehow or to train Bittle directly on the hardware.
Is the result sensitive to friction?
I've made a little update to the reward function. The pitch angle is now taken into account and high pitch angles are penalized. The reason is that during training the Bittle leg configuration tends to walk on the back legs and frequently lift the front legs; the resulting gait somehow reminds me of a walking pug. Unfortunately there is still a little drift towards the left, which could be minimized with a slightly longer training session.
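As a sketch of the idea (the variable names and the weight are illustrative, not the exact values from my code), the reward combines forward progress with a pitch penalty:

```python
import numpy as np

def compute_reward(x_position, previous_x_position, pitch, pitch_weight=0.1):
    forward_progress = x_position - previous_x_position   # reward moving forward
    pitch_penalty = pitch_weight * np.abs(pitch)           # penalize rearing up on the back legs
    return forward_progress - pitch_penalty
```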
I also tried to hold and push it with the cursor during walking (you will notice two holds and one push). This shows me that the controller seems to be robust and can react to disturbances.
I'm also trying to make the Bittle version (leg configuration < <) learn a walking gait. It seems to be a little more difficult; so far it has ended up as something between running and bouncing:
I've made some changes in the code so that the neural network also takes the last joint positions into account, a method that was presented here: https://robotics.sciencemag.org/content/4/26/eaau5872/tab-pdf. The observation space now consists of the body angles and angular velocities plus a history of the last 20 joint positions. I sample the joint positions every 2nd simulation step, which clearly improved the learning rate and made it possible to learn a walking gait. I also had to limit the joints individually to prevent some acrobatic movements and to keep the robot from falling over too often.
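A rough sketch of how such an observation with a joint history can be assembled (names, history handling and the number of joints are illustrative):

```python
from collections import deque
import numpy as np

HISTORY_LENGTH = 20                              # keep the last 20 joint position samples
NUM_JOINTS = 8                                   # e.g. 8 leg joints

joint_history = deque([np.zeros(NUM_JOINTS)] * HISTORY_LENGTH, maxlen=HISTORY_LENGTH)

def build_observation(body_angles, angular_velocities, joint_positions, step_counter):
    if step_counter % 2 == 0:                    # sample the joints every 2nd simulation step
        joint_history.append(joint_positions)
    return np.concatenate([body_angles,
                           angular_velocities,
                           np.concatenate(joint_history)])
```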
I'm still using SAC, and for the result below I used 1000 episodes and 500,000 iteration steps. But I noticed that the results already saturated after 400-500 episodes and there were no more significant improvements.
I also interacted with the model with the cursor again, holding and pushing it to see how it reacts to disturbances: