Sometimes you need to send complex data over a network, save the state of your data to a file on local disk or in a database, or cache the result of an expensive operation. In each of these cases, you need to serialize the data.
Python’s standard library includes the `pickle` module, which performs serialization and de-serialization of Python objects.
In this article, you’ll learn about data serialization and deserialization, the pickle module’s key features and how to serialize and deserialize objects, the types of objects that can and cannot be pickled, and how to modify the pickling behavior in a class.
Object Serialization
Serialization is the process of converting data into a format that can be easily stored or transmitted and later reconstructed.
Pickling is the name given to the serialization process in Python, where Python objects are converted into a byte stream. Unpickling, also known as deserializing, is the inverse operation in which byte data is converted back to its original state, reconstructing the Python object hierarchy.
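To make "reconstructing the object hierarchy" concrete, here is a minimal sketch (the variable names are illustrative) that pickles a dictionary whose two keys point at the same list, where the list also contains itself. After unpickling, the shared and circular references are rebuilt as references, not duplicated into independent copies.

```python
import pickle

# A small object hierarchy: two keys share one list, and the list refers to itself
shared = [1, 2, 3]
shared.append(shared)  # cyclic reference
data = {"a": shared, "b": shared}

payload = pickle.dumps(data)      # pickling: objects -> bytes
restored = pickle.loads(payload)  # unpickling: bytes -> objects

print(type(payload))                      # the pickled form is a bytes object
print(restored["a"] is restored["b"])     # shared reference is preserved
print(restored["a"][3] is restored["a"])  # the cycle is preserved
print(restored["a"] is shared)            # but it is a new object, not the original
```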
The pickle Module
Pickling and unpickling are Python-specific operations, performed with the `pickle` module.
The `pickle` module provides four functions for pickling and unpickling objects:

| Pickling (serialize) | Unpickling (deserialize) |
|---|---|
| `pickle.dump(obj, file)` | `pickle.load(file)` |
| `pickle.dumps(obj)` | `pickle.loads(data)` |
The `pickle.dump()` function writes the serialized byte representation of an object to a specified file or file-like object.

- `obj`: The object to be serialized.
- `file`: The file or file-like object to which the serialized bytes are written. It must accept bytes, so a regular file has to be opened in binary write mode (`"wb"`).
The `pickle.dumps()` function returns the serialized byte representation of an object as a `bytes` object, instead of writing it to a file.

- `obj`: The object to be serialized.
The `pickle.load()` function reads a serialized object from the specified file or file-like object and returns the reconstructed object.

- `file`: The file or file-like object from which the serialized data is read. A regular file has to be opened in binary read mode (`"rb"`).
The `pickle.loads()` function reconstructs an object from a serialized `bytes` object and returns it.

- `data`: The serialized bytes object to reconstruct.
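The two pairs of functions differ only in where the bytes go. In this minimal sketch (the `record` dictionary is illustrative), an `io.BytesIO` object stands in for a real binary file: `dump()` writes the same bytes that `dumps()` returns, and `load()` and `loads()` rebuild equal objects.

```python
import io
import pickle

record = {"lib": "pickle", "version": 2.1}

# dumps() returns the bytes directly
as_bytes = pickle.dumps(record)

# dump() writes the same bytes to any binary file-like object;
# io.BytesIO stands in for a real file opened with "wb"
buffer = io.BytesIO()
pickle.dump(record, buffer)
print(buffer.getvalue() == as_bytes)

# load() reads from a file-like object, loads() from a bytes object
buffer.seek(0)  # rewind before reading
print(pickle.load(buffer) == pickle.loads(as_bytes))
```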
How to Pickle and Unpickle Data
Consider the following scenario: pickling the data and saving it to a file, then unpickling the serialized object from that file to reassemble it in its original form.
```python
import pickle

# Sample data
my_data = {
    "lib": "pickle",
    "build": 4.33,
    "version": 2.1,
    "status": "Active"
}

# Serializing
with open("lib_info.pickle", "wb") as file:
    pickle.dump(my_data, file)

# De-serializing
with open("lib_info.pickle", "rb") as file:
    unpickled_data = pickle.load(file)

print(f"Unpickled Data: {unpickled_data}")
```
The code above serializes the `my_data` dictionary and writes the serialized data to a file called `lib_info.pickle`, which is opened in binary write mode (`wb`).

The serialized data is then read back from `lib_info.pickle`, opened in binary read mode (`rb`), and deserialized using the `pickle.load()` function.
```
Unpickled Data: {'lib': 'pickle', 'build': 4.33, 'version': 2.1, 'status': 'Active'}
```
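Because `pickle.load()` reads exactly one pickled object per call, you can also store several objects in the same file by calling `pickle.dump()` repeatedly. This sketch assumes a hypothetical `records.pickle` file and reads objects back until `EOFError` signals the end of the file.

```python
import pickle

records = [{"lib": "pickle"}, {"lib": "json"}, {"lib": "marshal"}]
path = "records.pickle"  # hypothetical file name

# Each dump() call appends one complete pickle to the file
with open(path, "wb") as file:
    for record in records:
        pickle.dump(record, file)

# Each load() call reads exactly one pickle; EOFError means nothing is left
loaded = []
with open(path, "rb") as file:
    while True:
        try:
            loaded.append(pickle.load(file))
        except EOFError:
            break

print(loaded == records)
```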
Take a look at another example, this time with a class whose attributes hold the results of several arithmetic operations.
```python
import pickle

class SampleOperation:
    square = 5 ** 2
    addition = 5 + 7
    subtraction = 5 - 7
    division = 14 / 2

# Object created
my_obj = SampleOperation()

# Serializing
pickled_data = pickle.dumps(my_obj)
print(f"Pickled Data: {pickled_data}")

# De-serializing
unpickled_data = pickle.loads(pickled_data)
print(f"Unpickled Data (Division): {unpickled_data.division}")
print(f"Unpickled Data (Square): {unpickled_data.square}")
print(f"Unpickled Data (Addition): {unpickled_data.addition}")
print(f"Unpickled Data (Subtraction): {unpickled_data.subtraction}")
```
In the code above, an object of the `SampleOperation` class is created and stored in the `my_obj` variable.

The object is serialized using the `pickle.dumps()` function, and the serialized data is stored in the `pickled_data` variable.

Then, `pickled_data` is deserialized using the `pickle.loads()` function, and the attributes of the unpickled object are printed.
```
Pickled Data: b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x0fSampleOperation\x94\x93\x94)\x81\x94.'
Unpickled Data (Division): 7.0
Unpickled Data (Square): 25
Unpickled Data (Addition): 12
Unpickled Data (Subtraction): -2
```
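This shows that the unpickled object exposes the same attribute values as the original. Note, however, that `square`, `addition`, `subtraction`, and `division` are class attributes rather than instance attributes, so their values are not stored in the pickled bytes: as the output shows, the pickle only references `__main__.SampleOperation`. When unpickling, the values are read from the class definition, which must be importable at that point.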
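The difference between class and instance attributes is easy to check. This sketch uses a hypothetical `Calc` class: only the instance attribute ends up in the pickled bytes, and changing the class attribute after pickling changes what the unpickled object sees.

```python
import pickle

class Calc:
    square = 5 ** 2  # class attribute: lives on the class, not the instance

    def __init__(self):
        self.label = "demo"  # instance attribute: stored in the pickle

data = pickle.dumps(Calc())

print(b"label" in data)   # instance state is serialized
print(b"square" in data)  # class attributes are not

# Unpickled instances read class attributes from the current class definition
Calc.square = 100
restored = pickle.loads(data)
print(restored.label, restored.square)
```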
What Can Be Pickled and Unpickled?
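The `pickle` module can serialize a wide variety of objects, including `None`, booleans, integers, floats, strings, bytes, and tuples, lists, sets, and dictionaries that contain only picklable objects. Functions and classes defined at the top level of a module can also be pickled; they are stored by reference (their module and name), not by value. Instances of such classes are picklable as long as their attributes are.

However, not every object is picklable. File handles, sockets, database connections, and generators, for example, cannot be pickled, and neither can instances of custom classes that hold such objects as attributes, unless the class customizes its pickling behavior (for example, with `__getstate__` and `__setstate__`).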
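If you are unsure whether an object can be pickled, a practical check is simply to try. The sketch below wraps `pickle.dumps()` in a hypothetical `is_picklable()` helper and runs it over a few sample objects. The exception raised for unpicklable objects varies by object and Python version, so the helper catches `PicklingError`, `TypeError`, and `AttributeError`.

```python
import pickle
import sqlite3

def is_picklable(obj):
    """Hypothetical helper: True if pickle.dumps() succeeds for obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True

def top_level_function(x):
    return x * 2

conn = sqlite3.connect(":memory:")
candidates = {
    "int": 42,
    "nested containers": {"nums": [1, 2.5], "pair": (None, True)},
    "top-level function": top_level_function,
    "lambda": lambda x: x,
    "generator": (n for n in range(3)),
    "sqlite3 connection": conn,
}

results = {name: is_picklable(obj) for name, obj in candidates.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
conn.close()
```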
Here’s an example of attempting to pickle a database connection.
```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")

# Pickling db connection object
pickle.dumps(conn)
```
When you run this code, you get a `TypeError` stating that the connection object cannot be pickled:
```
TypeError: cannot pickle 'sqlite3.Connection' object
```
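Similarly, `lambda` functions cannot be pickled. Because `pickle` stores a function as a reference to its module and name, it must be able to look the function up again by that name, and a lambda’s name, `<lambda>`, cannot be looked up in its module. The same limitation applies to functions defined inside other functions.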
```python
import pickle

lambda_obj = lambda x: x ** 2

pickle.dumps(lambda_obj)
```
The code above tries to pickle the `lambda` function object, but raises an error instead:
```
Traceback (most recent call last):
  ...
    pickle.dumps(lambda_obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x000001EB55373E20>: attribute lookup <lambda> on __main__ failed
```
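By contrast, a function defined with `def` at the top level of a module pickles fine, because `pickle` only records where to find it. This minimal sketch (the `square` and `make_power` functions are hypothetical) shows that unpickling returns the very same function object, while a function defined inside another function fails just like a lambda.

```python
import pickle

def square(x):  # a top-level function, defined with def
    return x ** 2

def make_power(n):
    def power(x):  # a nested function: its qualified name contains <locals>
        return x ** n
    return power

data = pickle.dumps(square)
restored = pickle.loads(data)
print(restored is square)  # only a reference to __main__.square was stored
print(restored(4))

nested_error = None
try:
    pickle.dumps(make_power(3))
except (pickle.PicklingError, AttributeError) as exc:
    # The exact exception type and message vary between Python versions
    nested_error = exc
    print(f"{type(exc).__name__}: {exc}")
```

Since only a reference is stored, the code that unpickles the data must be able to import the function from the same module.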
Modify the Pickling Behaviour of the Class
Let’s say you have a class with several attributes, some of which are unpicklable. In that case, you can override the class’s `__getstate__` method to choose what gets pickled.
```python
import pickle

class SampleTask:
    def __init__(self):
        self.first = 2**17
        self.second = "This is a string".upper()
        self.third = lambda x: x**x

obj = SampleTask()

pickle_instance = pickle.dumps(obj)
unpickle = pickle.loads(pickle_instance)
print(unpickle.__dict__)
```
If you run this code as is, it raises an error, because the `lambda` function stored in the `third` attribute cannot be pickled.
```
Traceback (most recent call last):
  ....
    pickle_instance = pickle.dumps(obj)
AttributeError: Can't pickle local object 'SampleTask.__init__.<locals>.<lambda>'
```
To handle this situation, override the `__getstate__` method, which returns the state that `pickle` serializes for an instance. By returning the instance’s attributes without the unpicklable one, you control exactly what gets pickled.
```python
import pickle

class SampleTask:
    def __init__(self):
        self.first = 2**17
        self.second = "This is a string".upper()
        self.third = lambda x: x**x

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['third']
        return state

obj = SampleTask()

pickle_instance = pickle.dumps(obj)
unpickle = pickle.loads(pickle_instance)
print(unpickle.__dict__)
```
In the above example, `__getstate__` makes a copy of the instance’s attribute dictionary, removes the `third` attribute (the lambda) from the copy, and returns it. Working on a copy matters: deleting the key from `self.__dict__` directly would remove the attribute from the original object as a side effect of pickling.
When you run the above example, the unpickled object’s attribute dictionary contains only the picklable attributes:
```
{'first': 131072, 'second': 'THIS IS A STRING'}
```
If you want the unpickled object to have the `third` attribute again, define a `__setstate__` method. It receives the state returned by `__getstate__` and can recreate whatever was left out.
```python
import pickle

class SampleTask:
    def __init__(self):
        self.first = 2**17
        self.second = "This is a string".upper()
        self.third = lambda x: x**x

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['third']
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.third = lambda x: x**x

obj = SampleTask()

pickle_instance = pickle.dumps(obj)
unpickle = pickle.loads(pickle_instance)
print(unpickle.__dict__)
```
During unpickling, `pickle` calls `__setstate__` with the saved state dictionary instead of running `__init__`. Here, the method copies the saved attributes into the instance’s `__dict__` and then recreates the lambda, which was never stored in the pickle.
When you run the above code, the dictionary once again includes the `third` attribute, now holding the recreated lambda function:
```
{'first': 131072, 'second': 'THIS IS A STRING', 'third': <function SampleTask.__setstate__.<locals>.<lambda> at 0x000001C54EEB67A0>}
```
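You can confirm that unpickling bypasses `__init__` with a small experiment. This sketch uses a hypothetical `Tracked` class that logs which of its methods run; after one pickle round trip, the log shows `__init__` (from creating the original object) followed only by `__setstate__`.

```python
import pickle

calls = []  # records which methods run, in order

class Tracked:
    def __init__(self):
        calls.append("__init__")
        self.value = 42

    def __setstate__(self, state):
        calls.append("__setstate__")
        self.__dict__.update(state)

obj = Tracked()
restored = pickle.loads(pickle.dumps(obj))

print(calls)
print(restored.value)
```

This is why `__setstate__`, not `__init__`, is the place to recreate attributes that were left out of the pickle.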
Customizing Pickling: Modifying Class Behavior for Database Connections
As you saw earlier, a database connection cannot be pickled. However, you can pickle an object that wraps a connection by modifying the pickling behavior of its class: exclude the connection when pickling and open a new one when unpickling. Here’s an example.
```python
# pickling_db_obj.py
import pickle
import sqlite3

class DBConnection:
    def __init__(self, db_name):
        self.db_name = db_name
        self.connection = sqlite3.connect(db_name)
        self.cur = self.connection.cursor()

    # Method for creating db table
    def create_table(self):
        self.connection.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
        return self.connection

    # Method for inserting data into db table
    def create_entry(self):
        self.connection.execute("INSERT INTO users (name) VALUES ('Sachin')")
        res = self.connection.execute("SELECT * FROM users")
        result = res.fetchall()
        print(result)
        return self.connection

    # Method for closing db connection
    def close_db_connection(self):
        self.cur.close()
        self.connection.close()
```
The code above defines a `DBConnection` class whose `__init__` method opens an SQLite database connection and creates a cursor. The class also defines three methods: `create_table` (creates a table), `create_entry` (inserts a row, then retrieves and prints all rows), and `close_db_connection` (closes the cursor and the connection).
Next, add a `__getstate__` method to the class to exclude the connection and the cursor from the pickling process.
```python
# pickling_db_obj.py
...
    def __getstate__(self):
        state = self.__dict__.copy()
        # Exclude the connection and cursor from pickling
        del state['connection']
        del state['cur']
        return state

db_conn = DBConnection(":memory:")

pickle_db_conn = pickle.dumps(db_conn)
unpickle_db_conn = pickle.loads(pickle_db_conn)
print(unpickle_db_conn.__dict__)
```
The `__getstate__` method creates a copy of the object’s dictionary, removes the connection (`state['connection']`) and the cursor (`state['cur']`), and returns the resulting dictionary (`state`).
An instance of `DBConnection` is created with the database name `":memory:"`, which tells SQLite to create a temporary database in memory. The instance is then pickled and unpickled, and the attribute dictionary of the unpickled object is printed.
```
{'db_name': ':memory:'}
```
As you can see, the dictionary of the object only contains the database name. The connection and cursor objects have been removed.
To make the unpickled object usable again, add a `__setstate__` method that restores the saved attributes and re-establishes the database connection during unpickling.
```python
# pickling_db_obj.py
...
...
    # Restoring the original state of the object
    def __setstate__(self, state):
        self.__dict__.update(state)
        self.connection = sqlite3.connect(self.db_name)
        self.cur = self.connection.cursor()

db_conn = DBConnection(":memory:")

pickle_db_conn = pickle.dumps(db_conn)
unpickle_db_conn = pickle.loads(pickle_db_conn)

unpickle_db_conn.create_table()
unpickle_db_conn.create_entry()
unpickle_db_conn.close_db_connection()

print(unpickle_db_conn.__dict__)
```
Within `__setstate__`, the instance’s `__dict__` is updated with the saved `state` dictionary, and then a new database connection and cursor are created using the saved `db_name`.
To check that the unpickled instance (`unpickle_db_conn`) works, the `create_table`, `create_entry`, and `close_db_connection` methods are called on it.
When you run the whole script, you will obtain the following output.
```
[('Sachin',)]
{'db_name': ':memory:', 'connection': <sqlite3.Connection object at 0x00000240D6F12A40>, 'cur': <sqlite3.Cursor object at 0x00000240D78044C0>}
```
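The unpickled object works as expected: its dictionary again contains a connection and a cursor alongside the database name. Keep in mind that `__setstate__` opens a brand-new connection rather than restoring the old one. With `":memory:"`, that means a new, empty database, so anything created through the original connection before pickling is not carried over.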
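Because `__setstate__` opens a new connection, what the unpickled object sees depends on where the database lives. The sketch below uses a hypothetical, trimmed-down `DBHandle` class modeled on `DBConnection` and a hypothetical helper, `rows_after_roundtrip`: with `":memory:"` the new connection gets an empty database, while with a file-backed database (created in a temporary directory here) the rows committed before pickling are still there.

```python
import os
import pickle
import sqlite3
import tempfile

class DBHandle:
    """Hypothetical, trimmed-down version of the DBConnection class."""

    def __init__(self, db_name):
        self.db_name = db_name
        self.connection = sqlite3.connect(db_name)

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["connection"]  # connections cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.connection = sqlite3.connect(self.db_name)  # a brand-new connection

def rows_after_roundtrip(db_name):
    """Write a row, pickle/unpickle the handle, then read through the new connection."""
    original = DBHandle(db_name)
    original.connection.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
    original.connection.execute("INSERT INTO users (name) VALUES ('Sachin')")
    original.connection.commit()

    restored = pickle.loads(pickle.dumps(original))
    try:
        return restored.connection.execute("SELECT name FROM users").fetchall()
    except sqlite3.OperationalError:
        return None  # "no such table": the new database is empty
    finally:
        original.connection.close()
        restored.connection.close()

memory_rows = rows_after_roundtrip(":memory:")
with tempfile.TemporaryDirectory() as tmp_dir:
    file_rows = rows_after_roundtrip(os.path.join(tmp_dir, "users.db"))

print(memory_rows)
print(file_rows)
```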
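Keep in mind that if `__getstate__` returns `None`, the `__setstate__` method is not called upon unpickling; with the older pickle protocols 0 and 1, the same happens for any false value, such as an empty dictionary (see the Python `pickle` module documentation).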
While customizing methods such as `__setstate__` gives you flexibility during unpickling, it also points to a security concern: unpickling can execute arbitrary code. A pickle can instruct the unpickler to call any importable function, so loading pickled data from untrusted or malicious sources is a serious security risk.
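To see why, consider `__reduce__`, which lets an object tell `pickle` which callable to invoke, and with which arguments, when it is unpickled. In this harmless sketch (the `record_call` function and `Payload` class are hypothetical), `pickle.loads()` calls `record_call` instead of rebuilding a `Payload`; a malicious pickle could name a far more dangerous function in the same way.

```python
import pickle

side_effects = []

def record_call(message):
    # Stands in for any importable callable, e.g. os.system in a real attack
    side_effects.append(message)
    return message

class Payload:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call record_call(...)"
        return (record_call, ("called during unpickling",))

data = pickle.dumps(Payload())
result = pickle.loads(data)  # runs record_call as a side effect

print(side_effects)
print(type(result))  # a str, not a Payload
```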
So, what can you do to reduce the security risk? The most important rule is never to unpickle data from untrusted sources. If pickled data has to travel through an untrusted channel, sign it with a cryptographic signature, such as an HMAC, and verify the signature before unpickling to make sure it has not been tampered with. You can also restrict which classes and functions the unpickler may load by overriding `pickle.Unpickler.find_class()`, although this reduces the risk rather than eliminating it.
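Signing and restricting can both be sketched with the standard library alone. This is a minimal sketch, not a complete security solution: `sign_pickle`, `load_signed`, `RestrictedUnpickler`, and `SECRET_KEY` are hypothetical names, signing uses `hmac` with SHA-256, and the allow-list of harmless builtins is enforced by overriding `pickle.Unpickler.find_class()`, modeled on the restricted-unpickler example in the Python documentation.

```python
import datetime
import hashlib
import hmac
import io
import pickle

# Hypothetical key for illustration; load a real secret from configuration instead
SECRET_KEY = b"replace-with-a-real-secret"
SIG_SIZE = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256

def sign_pickle(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 signature of the pickled bytes."""
    data = pickle.dumps(obj)
    return hmac.new(key, data, hashlib.sha256).digest() + data

class RestrictedUnpickler(pickle.Unpickler):
    """Only resolves globals from a small allow-list of harmless builtins."""
    SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

    def find_class(self, module, name):
        if module == "builtins" and name in self.SAFE_BUILTINS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def load_signed(blob, key=SECRET_KEY):
    """Verify the signature first, then unpickle with the restricted unpickler."""
    signature, data = blob[:SIG_SIZE], blob[SIG_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        raise ValueError("signature mismatch: refusing to unpickle")
    return RestrictedUnpickler(io.BytesIO(data)).load()

# A validly signed payload round-trips
blob = sign_pickle({"status": "Active", "ids": range(3)})
restored = load_signed(blob)
print(restored)

# A tampered payload is rejected before pickle ever parses it
tamper_error = None
try:
    load_signed(blob[:-1] + b"\x00")
except ValueError as exc:
    tamper_error = str(exc)
print(tamper_error)

# A correctly signed payload that needs a non-allow-listed global is still refused
forbidden_error = None
try:
    load_signed(sign_pickle(datetime.date(2024, 1, 1)))
except pickle.UnpicklingError as exc:
    forbidden_error = str(exc)
print(forbidden_error)
```

The signature check runs first so that tampered bytes never reach the unpickler; the allow-list is a second line of defense if a signed payload still contains something unexpected.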
Conclusion
The `pickle` module lets you serialize Python objects into bytes that can be transmitted over a network or saved to disk, and deserialize them back into objects later. You now know how to do both.
In this article, you’ve learned:
- What object serialization and deserialization are
- How to pickle and unpickle data using the pickle module
- What types of objects can and can’t be pickled
- How to modify the pickling behavior of a class
- How to customize pickling for a class that holds a database connection
That’s all for now.
Keep Coding!