Lessons learned building an IoT stack for elevators

At uptime, we're developing predictive maintenance for elevators. Our R&D team of 20 hardware and software engineers is focused on building the most advanced IoT platform for the elevator industry, combining data intelligence with easy-to-use web and mobile apps. We also work hand in hand with our team of technicians and domain experts, who are in charge of maintaining more than 1000 elevators in Paris. This gives us a strong feedback loop on the features we are developing.

Our vision is to use technology to increase trust in the elevator maintenance industry. Dozens of maintenance organisations in Europe are interested in this platform, and we are here to make a big change in a broken industry!

 

💔 Why is the current elevator maintenance model broken?

The classic maintenance model works as follows:

◼︎ Technicians are required to follow a fixed schedule and maintain each elevator a given number of times a year (9 times a year in France).
◼︎ Priority is given to all breakdowns (no matter their level of urgency): a technician must cease all other activity as soon as a breakdown occurs and try to fix it.
This leads organisations to fail to deliver an acceptable level of service. Technicians have too many elevators to maintain (in France, about 120 to visit once every 6 weeks!), so the visits become "ghost visits": useless compliance checks that fail to prevent failures. Breakdowns are not properly fixed because, for lack of time and training, root causes are not accurately identified. As a result, elevators break down on average 5 times a year, leading to days of downtime and inconvenience for end customers.

 

🔮 What does predictive maintenance mean?

Our work is focused on implementing a dynamic visit model that tailors maintenance visits to each elevator, guides technicians in their actions and helps them prioritize their work. That means:

◼︎ Precise analysis of failures to understand their root causes and associated curative actions
◼︎ Determining the correct preventive measures and the correct timing to implement them
This way, technicians spend less time on useless tasks, their actions are more timely and accurate, and last-minute interventions are minimized. This leads to fewer breakdowns, a higher uptime rate and an overall high quality of service for the end customer.

 

🏗 How are we building our SaaS platform?

To make this vision come true, we first developed a universal IoT device to gather data from elevator controllers (traffic, door cycles, sensor data, statuses, errors etc.) and analyze this data to give the best insights to our users.

But predictive maintenance is much more than plugging a connected device on top of pre-existing operational processes: it's a complete rebuild of how we do maintenance. That's why a large part of our effort is also focused on helping organisations redesign their processes and on seamlessly integrating this data into their workflows.

 

🗺 Use cases of predictive maintenance powered by IoT data?

The IoT stack we described allows us to store elevator data and analyse it in order to provide real-time insights to any elevator maintenance team. Here are some examples of predictive maintenance features from our custom web and mobile applications:

◼︎ We provide data visualizations to the operational team, helping them to quickly analyse the status of an elevator (the elevator traffic, count of door openings, the most recent faults etc.)
◼︎ Action recommendations are made based on issues detected by the IoT device. They guide the technician on a maintenance visit to focus their work on precise parts of the elevator. Checking for potential issues early is what prevents an elevator breakdown.
◼︎ A technician can assess a breakdown from the mobile application and decide whether or not it is an urgent situation. This way they can prioritize their tasks during the day.
◼︎ When the technician receives a breakdown alert, they are able to perform a first remote diagnosis and determine the root cause of the problem with the help of past IoT data.
...and there are many more features we work on based on IoT data!


🛰 Building a reliable IoT Stack for elevators?

As discussed above, we needed a device able to collect tons of data from elevators. Unsatisfied with what was available on the market, we developed a universally compatible device that can be connected to different types of controller boards. But besides building a connected device, building an IoT solution for elevators means overcoming difficulties very specific to the industry:

◼︎ low connectivity in the buildings where the IoT device is located.
◼︎ discrete IoT events depending on the elevator traffic.
◼︎ a huge variance of the number of events between the elevators, inducing scaling issues.

 

🗼 Choosing the right communication protocol?

Once the device is plugged in (and of course has an internet connection), we need to retrieve its data. When we had to choose how the device would communicate with our backend services, we essentially focused on 5 things:

▪︎ Communications security
▪︎ Minimal bandwidth consumption
▪︎ Protocol ease of use
▪︎ Network loss tolerance
▪︎ Bidirectional communication.

The communication protocols we considered were HTTP, AMQP and MQTT, but there are many more out there.

 

🧐 HTTP, why not?

HTTP is a one-way, request-response protocol. The client needs to initiate the connection and ask for a given resource, then the connection closes. Each time the client wants to send data, the same cycle happens again.

Also, by design, we cannot send commands to clients through HTTP without having them poll a specific endpoint periodically. Note that we could have used websockets, but websockets are less fault-tolerant and consume more bandwidth than MQTT.

The last blocker is that HTTP, with its text-based headers, is heavier than a binary protocol.
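To give a rough sense of that weight difference, here is a minimal sketch that builds the same small JSON event once as an MQTT 3.1.1 PUBLISH packet and once as a bare HTTP/1.1 POST request, then compares byte counts. The topic, host, path and payload fields are hypothetical and not taken from our stack. The comparison ignores TCP and TLS overhead, and a real HTTP client would typically send more headers than shown here.

```python
import json


def encode_remaining_length(n: int) -> bytes:
    """Encode the MQTT 'remaining length' field (7 bits per byte, MSB = continuation)."""
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80  # more length bytes follow
        out.append(byte)
        if n == 0:
            return bytes(out)


def mqtt_publish_packet(topic: str, payload: bytes, qos: int = 1, packet_id: int = 1) -> bytes:
    """Build an MQTT 3.1.1 PUBLISH packet (DUP and RETAIN flags left at 0)."""
    topic_bytes = topic.encode("utf-8")
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    if qos > 0:
        # A packet identifier is only present for QoS 1 and 2.
        variable_header += packet_id.to_bytes(2, "big")
    body = variable_header + payload
    first_byte = 0x30 | (qos << 1)  # packet type 3 (PUBLISH) plus QoS bits
    return bytes([first_byte]) + encode_remaining_length(len(body)) + body


def http_post_request(host: str, path: str, payload: bytes) -> bytes:
    """Build a minimal HTTP/1.1 POST request carrying the same payload."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


# Hypothetical event, topic and endpoint, for illustration only.
payload = json.dumps({"door": "open", "floor": 3}).encode("utf-8")
mqtt = mqtt_publish_packet("elevators/lift-1/events", payload)
http = http_post_request("iot.example.com", "/elevators/lift-1/events", payload)

print("payload bytes:", len(payload))
print("MQTT PUBLISH bytes:", len(mqtt))
print("HTTP POST bytes:", len(http))
```

For small payloads like this one, the MQTT framing adds only a few bytes on top of the topic name, whereas the HTTP request line and headers are repeated in full with every message.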

 

🤔 AMQP vs MQTT?

Both are binary protocols that offer important features. Both implement the pub-sub messaging pattern (it is actually the only one MQTT supports) and provide secure communications.

MQTT is designed to be light, has very low power consumption and can handle low-bandwidth, high-latency networks. AMQP, on the other hand, supports more communication patterns and many more features, which implies a heavier footprint on the network.

Finally, the original design goals of the two protocols helped us choose between them. AMQP was more about speed and reliability in financial environments, whereas MQTT was designed to get data from network- and power-constrained devices, which fitted our case best.

After deciding to go for MQTT, we had to choose which broker to use among several options:

▪︎ a self-hosted open source mosquitto broker
▪︎ AWS IoT Core
▪︎ HiveMQ
▪︎ other cloud provider solutions
We quickly excluded mosquitto as it would require some maintenance that we could not afford.

HiveMQ was an interesting candidate, as they put a lot of effort into the MQTT specifications and have become the broker that supports the most MQTT features. But we finally settled on AWS, which proved to be the best-suited solution for us, despite AWS IoT Core supporting fewer features (no QoS 2, for example).

Because we already used AWS products and thanks to their very comprehensive API, we were able to integrate the whole flow of device creation, certificate management and policies very easily.

 

⚡️ Processing IoT data in real time?

Once we were able to send and receive data through the broker, a new challenge arose. We started receiving around 1000 events per minute during the daytime (and we hope a lot more soon 😉). As we want to display data to end users in real time, we need to persist these IoT events in a database while keeping latency low. So we looked for a solution that scales horizontally, in order to process more and more events over time.

We chose to use task queues: when we receive an event we add it to the queue, then multiple workers insert the events into the database and compute some aggregates. We use Celery, a widely used Python library for processing asynchronous tasks through queues, as it makes it easy to create tasks and distribute them to workers. Redis is our broker: it stores the tasks until they are consumed. The only job of the worker listening to the MQTT topics is to create a Celery task.

Task creation is really quick, and Redis can store a large number of messages (in our case, around 140K messages in 250MB of RAM). We can start more workers if needed and therefore handle any spikes that arise.
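The queue-and-workers pattern can be sketched with the Python standard library alone. This is not Celery code: `queue.Queue` stands in for the Redis broker, threads stand in for Celery workers, and a plain list stands in for the database. The function name `process_events` and the event fields are hypothetical.

```python
import queue
import threading


def process_events(events, num_workers=4):
    """Enqueue every event, let a pool of workers persist them, return what was stored."""
    tasks = queue.Queue()          # stand-in for the Redis broker
    stored = []                    # stand-in for the database table
    stored_lock = threading.Lock()

    def worker():
        while True:
            event = tasks.get()
            if event is None:      # sentinel: no more work for this worker
                tasks.task_done()
                return
            with stored_lock:
                stored.append(event)   # stand-in for the INSERT plus aggregates
            tasks.task_done()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # The listener does as little as possible: it only creates tasks.
    for event in events:
        tasks.put(event)

    # One sentinel per worker so that each of them stops cleanly.
    for _ in workers:
        tasks.put(None)
    for w in workers:
        w.join()
    return stored


# Hypothetical events, for illustration only.
sample = [{"device": "lift-1", "seq": i} for i in range(100)]
print(len(process_events(sample, num_workers=4)), "events stored")
```

Handling a spike amounts to raising `num_workers`. Note that the workers may store events in a different order than they were enqueued, which is why ordering has to be handled explicitly downstream.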

 

🔌 Quickly access the last values of the IoT sensors

Today, we have several workers processing events from several devices at the same time and storing the results in our PostgreSQL database. For every sensor we read from our IoT devices, we wanted to keep the last value in cache (last floor, last door opening state, etc.) so that we can access it quickly. As we can't be sure that events are processed in the same order as they were emitted, we need to make sure that only the most recent event updates the database. So a timestamp is sent along with each event, and we check that this timestamp is more recent than the last one stored before making any update in the database.
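A minimal sketch of this "newest timestamp wins" rule is shown below. It uses an in-memory SQLite database from the standard library as a stand-in for PostgreSQL, and the table, column and function names (`last_value`, `apply_event`) are hypothetical. The check is expressed as a conditional `UPDATE`, so a stale event simply matches no row.

```python
import sqlite3


def make_db():
    """Create an in-memory database holding one row per (device, sensor)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE last_value (
            device_id TEXT NOT NULL,
            sensor    TEXT NOT NULL,
            value     TEXT,
            ts        INTEGER NOT NULL,
            PRIMARY KEY (device_id, sensor)
        )
        """
    )
    return conn


def apply_event(conn, device_id, sensor, value, ts):
    """Store the value only if ts is newer than the cached one. Returns True if stored."""
    with conn:  # commits on success, rolls back on error
        # Make sure a row exists; ts = -1 means "no value seen yet".
        conn.execute(
            "INSERT OR IGNORE INTO last_value (device_id, sensor, value, ts) "
            "VALUES (?, ?, NULL, -1)",
            (device_id, sensor),
        )
        # The timestamp comparison lives in the WHERE clause.
        cur = conn.execute(
            "UPDATE last_value SET value = ?, ts = ? "
            "WHERE device_id = ? AND sensor = ? AND ts < ?",
            (value, ts, device_id, sensor, ts),
        )
        return cur.rowcount == 1


conn = make_db()
print(apply_event(conn, "lift-1", "floor", "3", 100))  # newer than nothing: stored
print(apply_event(conn, "lift-1", "floor", "2", 90))   # older event arriving late: ignored
print(conn.execute("SELECT value, ts FROM last_value").fetchone())
```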

Finally, we needed to solve the issue of concurrent writes, since multiple workers can compute the device state at the same time and overwrite each other's last sensor values. To prevent this, when a worker reads the current device state, it puts an SQL lock on the row and releases it after the update. After computing the device state, the worker forwards it through MQTT so that listeners can get that state and display it appropriately in our applications.

To sum things up, this diagram of our stack illustrates what happens on a regular elevator trip. When a person enters an elevator, the IoT device sends messages to the IoT broker as soon as the doors open. The same happens when the elevator starts moving, reaches a specific floor and finally opens its doors at the end of the trip.
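The effect of that lock can be illustrated with an in-process analogy. In PostgreSQL, a row lock of this kind is typically taken with `SELECT ... FOR UPDATE` inside a transaction. The sketch below replaces it with one `threading.Lock` per device, so that the read-modify-write of a device state cannot interleave between workers. The class and method names are hypothetical, and the short sleep only simulates computation time.

```python
import threading
import time


class DeviceStateStore:
    """Keeps the last (value, timestamp) per sensor for each device."""

    def __init__(self):
        self._states = {}               # device_id -> {sensor: (value, ts)}
        self._locks = {}                # device_id -> lock (stand-in for a row lock)
        self._guard = threading.Lock()  # protects the _locks dictionary itself

    def _lock_for(self, device_id):
        with self._guard:
            return self._locks.setdefault(device_id, threading.Lock())

    def update(self, device_id, sensor, value, ts):
        """Read the state, merge one sensor value, write it back, all under the lock."""
        with self._lock_for(device_id):
            state = dict(self._states.get(device_id, {}))  # read
            time.sleep(0.001)                              # simulated computation
            current = state.get(sensor)
            if current is None or current[1] < ts:         # only newer events win
                state[sensor] = (value, ts)
            self._states[device_id] = state                # write back
            return dict(state)

    def get(self, device_id):
        with self._lock_for(device_id):
            return dict(self._states.get(device_id, {}))


store = DeviceStateStore()
workers = [
    threading.Thread(target=store.update, args=("lift-1", f"sensor-{i}", i, 100 + i))
    for i in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(store.get("lift-1")), "sensors recorded")
```

Without the per-device lock, two workers could read the same old state and the second write would silently drop the first worker's sensor value.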

[Animation: data flow through our IoT stack during an elevator trip]

 

🎓 Our learnings

By building our IoT stack for elevators and scaling it to where we are today, we learned a few things:

◼︎ We are satisfied with the MQTT protocol, particularly in our context of low connectivity, and with how smoothly it integrates into the AWS infrastructure through IoT Core.
◼︎ To scale the ingestion of IoT events from 0 to 1000 per minute, use Celery with Redis as a broker and scale horizontally by adding workers when needed.
◼︎ To make it easier to keep data consistent, it helps to rely on the ACID properties and locking features provided by an RDBMS like PostgreSQL.

 

Thanks to Valentin Montagné, Full Stack Developer, Alexandre Papin, Backend Developer, Thomas Himblot, Team Lead Software engineer, Luca Montaigut, Full Stack Web Developer, and Jean Lebas, Product Manager, for contributing to this article!