PICKLING

Saving and Loading...

Welcome to another video of the series Machine Learning in Bioinformatics With Python. In the last video we trained the models and then checked the accuracy of the those models. Before moving on to making the protections from those models. I would like to share a very important thing and that is saving and loading those trained models.

For the sake of this example we have trained on a very small data so whenever we train our models it only takes a few seconds but when you will start to train your models on huge data. It is going to take a lot of time so after training a model it is always a good idea to save that model in a form of a file.

Let’s get started, in order to save the model we will be using a library and its name is pickle. Just like pickling (preserving vagetables) in real life we will be preserving our models in the world of programming.

First of we will have to import the library like this:

import pickle

So saving the model is very easy we will use pickle.dump() with two arguments. The first argument is the name of the model and the second argument is another function open(). Open function takes two arguments, the first one is the name of the file with extension and the second argument is the mode. See the different types of modes in the following table:

Symbol Format Position File Handeling
r Read Only Beginning Raises I/O error
rb Read Only in Binary Beginning Raises I/O error
r+ Read and Write Beginning Raises I/O error
w Write only Beginning Creates the file otherwise over-writes the data
wb Write only in Binary Beginning Creates the file otherwise over-writes the data
w+ Read and Write Beginning Creates the file otherwise over-writes the data
wb+ Read and Write in Binary Beginning Creates the file otherwise over-writes the data
a Append only End Creates the file otherwise over-writes the data
ab Append only in Binary End Creates the file otherwise over-writes the data
a+ Append and read End Creates the file otherwise over-writes the data
ab+ Append and read in Binary End Creates the file otherwise over-writes the data

You dont need to worry about remembering all the details in the table above this is just for future reference.

Since logistic regression is giving us the highest accuracy we will save that model which can be done by the following code:

pickle.dump(model, open("LogisticRegression.m", "wb"))

Loading the saved model is also very easy. We will have to use pickle.load() function with open() function as an argument. The open function contains two argument the name of the file and the mode and here is the code. Woah it rhymes :P

loaded_model = pickle.load(open("LogisticRegression.m", "rb")

Then you can simply print the accuracy using the loaded model:

accu = loaded_model.score(X_test, y_test)
print("Accuracy of Logistic Regression : ", accu)

That’s all for now and next we will be finally making our predictions.