Ever wondered how to develop an ML model on Spark and actually make it production-grade? Ever asked yourself how to get an ML model to production quickly without relying on Python’s pickle (that is, without “dumping” it)? Meet MLeap!
MLeap is a common serialization format and execution engine for machine learning pipelines. It supports Spark, Scikit-learn and Tensorflow for training pipelines and exporting them to an MLeap Bundle.
The above paragraph is taken from MLeap's official documentation.
At Digital Turbine, our big-data architecture is based on Spark. As a result, our Data Science team needed to adapt to developing ML with Spark, which can sometimes be challenging.
And so, our story begins…
We developed an ML model using the known workflow:
Then, after a few iterations, we were finally ready to deploy it to production.
But guess what? We couldn’t save it and pass it on to other teams (e.g. the Data Engineering team). The main issues were mostly low-level API bugs in the specific ML library. So we searched for a more collaborative way to share models between the different teams.
From a Data Scientist’s point of view, the main advantages of MLeap are that it is:
Now, let’s get into the code and see how it works.
1. First, import the relevant libraries once MLeap is installed.
2. Once that’s done, serialize your model into a directory on your machine (or cluster).
3. Next, when you want to “unpack” this bundle, deserialize it from that location.
4. Extract the same pipeline that the DS team developed.
5. Finally, transform data on top of it.
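The five steps above can be sketched in PySpark roughly as follows. This is a minimal sketch, not our exact production code. It assumes MLeap's PySpark integration (the `mleap` Python package) is installed and that the matching MLeap Spark jars are on the Spark classpath. The helper names `bundle_uri`, `save_bundle` and `load_bundle`, the arguments `fitted_pipeline` and `sample_df`, and the example path are hypothetical and used only for illustration. The MLeap imports sit inside the functions so the helpers can be defined even on a machine without MLeap.

```python
import os


def bundle_uri(path):
    """Build the 'jar:file:' URI used for a zipped MLeap bundle (hypothetical helper)."""
    return "jar:file:" + os.path.abspath(path)


def save_bundle(fitted_pipeline, sample_df, path):
    """Steps 1-2: import MLeap's Spark support, then serialize a fitted pipeline."""
    # Step 1: importing these modules attaches serializeToBundle and
    # deserializeFromBundle to Spark ML transformers.
    import mleap.pyspark  # noqa: F401
    from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401

    # Step 2: MLeap takes a transformed DataFrame alongside the model
    # so it can record the schema in the bundle.
    fitted_pipeline.serializeToBundle(
        bundle_uri(path), fitted_pipeline.transform(sample_df)
    )


def load_bundle(path):
    """Steps 3-4: unpack the bundle and get back the pipeline the DS team built."""
    import mleap.pyspark  # noqa: F401
    from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401
    from pyspark.ml import PipelineModel

    return PipelineModel.deserializeFromBundle(bundle_uri(path))


# Step 5 (illustrative usage; needs a running Spark session and real data):
# save_bundle(fitted_pipeline, train_df, "/tmp/model.zip")
# predictions = load_bundle("/tmp/model.zip").transform(new_df)
```

Wrapping the calls in small helpers keeps the `jar:file:` path handling in one place, so the Data Science and Data Engineering teams only need to agree on where the bundle is stored.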
That’s all! MLeap is easy to use, easy to implement, and makes it easy to ensure you have production-grade models with Spark!
Enjoy your (Machine) Learning!