
zipfile – Read And Write ZIP Files Without Extracting It In Python

How often do you work with ZIP files in your day-to-day life?

If you have ever worked with ZIP files, you know that many files and directories can be compressed together into a single file with a .zip extension.

Normally, to read those files, we first need to extract them from the ZIP archive.

In this tutorial, we will implement some Pythonic methods for performing various operations on ZIP files without even having to extract them.

For that purpose, we’ll use Python’s zipfile module to handle the process for us nicely and easily.

What is a ZIP file?

As mentioned above, a ZIP file contains one or more files or directories that have been compressed.

ZIP is an archive file format that supports lossless data compression.

Lossless compression means that the original data can be perfectly reconstructed from the compressed data without losing any information.

If you are wondering what an archive file is, it is simply a single computer file composed of one or more files along with their metadata.

This format was originally created in 1989 and was first implemented in PKWARE, Inc.'s PKZIP utility, as a replacement for the previous ARC compression format by Thom Henderson. The ZIP format was then quickly supported by many software utilities other than PKZIP.

Disk Structure

Figure: Illustration of how the files in a ZIP archive are laid out on disk.

Why do we need ZIP files?

ZIP files can be crucial for anyone who works with computers and deals with large amounts of digital information, because they allow us to:

  • Reduce storage requirements by compressing files without losing any information.
  • Improve transfer speed over a network.
  • Gather related files into one archive for easier management.
  • Protect files by encrypting them.

How to manipulate ZIP files using Python?

Python provides multiple tools for working with compressed data, including lower-level modules such as lzma, bz2, zlib, and tarfile, which compress and decompress files using specific compression algorithms or archive formats.
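To see what "lossless" means in practice, here is a small sketch using three of these lower-level modules. It compresses some made-up, repetitive bytes with zlib, bz2, and lzma and checks that decompressing restores the original data exactly; the sample data is an assumption chosen only because it compresses well.

```python
import bz2
import lzma
import zlib

# Made-up, highly repetitive sample data (compresses well)
data = b"ZIP files save space. " * 100

for name, module in (("zlib", zlib), ("bz2", bz2), ("lzma", lzma)):
    packed = module.compress(data)
    # Lossless: decompressing gives back exactly the original bytes
    assert module.decompress(packed) == data
    print(f"{name}: {len(data)} bytes -> {len(packed)} bytes")
```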

Apart from these, Python has a high-level module called zipfile that lets us read, write, create, extract, and list the contents of ZIP files.

Python’s zipfile

The zipfile module provides convenient classes and functions for reading, writing, and extracting ZIP files.

It has some limitations too, such as:

  • Decryption is slow because it is implemented in pure Python rather than C.
  • It cannot create encrypted ZIP files.
  • Multi-disk ZIP files are not currently supported.

Opening ZIP files for Reading & Writing

zipfile has a class, ZipFile, that lets us open ZIP files in different modes and works much like Python's built-in open() function.

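Four modes are available:

  • r: Read an existing file (the default mode).
  • w: Write a new file, truncating any existing file with the same name.
  • a: Append to an existing file (a new file is created if none exists).
  • x: Exclusively create and write a new file; raises FileExistsError if the file already exists.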

ZipFile is also a context manager and therefore supports the with statement.
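A minimal sketch of reading an archive with the with statement follows. Because the original sample.zip is not available here, the sketch first builds one with two hypothetical members, hello.txt and docs/notes.txt, and then opens it in reading mode and calls .printdir(). The variable name arch matches the description that follows.

```python
import zipfile

# Build a small demo archive so the example is self-contained (hypothetical contents)
with zipfile.ZipFile("sample.zip", mode="w") as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\r\n")
    demo.writestr("docs/notes.txt", "Some notes\r\n")

# Open the archive for reading; "r" is the default mode
with zipfile.ZipFile("sample.zip", mode="r") as arch:
    arch.printdir()  # prints File Name, Modified and Size for each member
```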

Here, we can see that all the files present in the sample.zip archive have been listed.

The first argument we passed to ZipFile is the path of the file, given as a string.

The second argument is the mode. Since reading mode r is the default, passing it explicitly is optional.

Then we called .printdir() on arch, which holds the ZipFile instance, to print the table of contents in a user-friendly format with these columns:

  • File Name
  • Modified
  • Size

Error Handling by using Try & Except

Let's see how zipfile handles errors through the BadZipFile exception, which provides an easily readable error message.
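A sketch of that error handling, assuming a hypothetical bad_sample.zip that is really just a text file: the loop tries the valid sample.zip first and the fake archive second, and catches zipfile.BadZipFile instead of letting the traceback stop the program.

```python
import zipfile

# A valid archive (hypothetical contents) and a plain text file posing as one
with zipfile.ZipFile("sample.zip", mode="w") as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\r\n")
with open("bad_sample.zip", "w") as fake:
    fake.write("This is not really a ZIP file.")

for path in ("sample.zip", "bad_sample.zip"):
    try:
        with zipfile.ZipFile(path) as arch:
            arch.printdir()
    except zipfile.BadZipFile as error:
        # Raised when the file is not a readable ZIP archive
        print(f"{path}: {error}")
```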

The first attempt ran successfully and printed the contents of sample.zip because it is a valid ZIP file, whereas an error was raised when we provided a bad ZIP file.

We can check whether a file is a valid ZIP file by using the is_zipfile() function.

It returns True if the file is a valid ZIP file, judging by its magic number, and False otherwise.
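A quick check with is_zipfile(), using a hypothetical valid sample.zip and a hypothetical fake bad_sample.zip created on the spot, might look like this:

```python
import zipfile

# A real archive and a text file with a .zip name (both hypothetical)
with zipfile.ZipFile("sample.zip", mode="w") as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\r\n")
with open("bad_sample.zip", "w") as fake:
    fake.write("This is not really a ZIP file.")

print(zipfile.is_zipfile("sample.zip"))      # True
print(zipfile.is_zipfile("bad_sample.zip"))  # False
```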


Writing the ZIP file

To open a ZIP file for writing, use write mode w.

If the file you are writing to already exists, w truncates it and writes the new content you pass in.
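A minimal sketch of writing a single file, assuming geek.txt is created on the spot with some placeholder text:

```python
import zipfile

# Create the file to be archived (placeholder content)
with open("geek.txt", "w") as f:
    f.write("Hello from GeekPython!\n")

# "w" creates geekpython.zip, or truncates it if it already exists
with zipfile.ZipFile("geekpython.zip", mode="w") as arch:
    arch.write("geek.txt")
```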

After running the code, geekpython.zip is created and geek.txt is added to it.

Adding multiple files
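One straightforward way to add several files is to call .write() in a loop. The three file names below are hypothetical and are created first so that the sketch runs on its own.

```python
import zipfile

# Hypothetical files to bundle together
filenames = ["hello.txt", "geek.txt", "notes.txt"]
for name in filenames:
    with open(name, "w") as f:
        f.write(f"Contents of {name}\n")

# Write each existing file into one archive
with zipfile.ZipFile("multiple.zip", mode="w") as arch:
    for name in filenames:
        arch.write(name)
```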

Note: The file you pass to .write() must exist.

If you pass the path of a file or directory that does not exist, it raises a FileNotFoundError.
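The sketch below triggers that error on purpose by passing a hypothetical missing.txt that does not exist, and catches it so the archive is still closed cleanly. The archive name demo.zip is an assumption.

```python
import zipfile

with zipfile.ZipFile("demo.zip", mode="w") as arch:
    try:
        # "missing.txt" is a hypothetical file that does not exist on disk
        arch.write("missing.txt")
    except FileNotFoundError as error:
        print(f"Could not add file: {error}")
```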


Appending files to the existing ZIP archive

To add files to an existing ZIP archive, use append mode a.
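A sketch of append mode, assuming a freshly built geekpython.zip and a hypothetical extra.txt: the existing member stays in place and the new file is added after it.

```python
import zipfile

# Start from a one-member archive (placeholder contents)
with zipfile.ZipFile("geekpython.zip", mode="w") as arch:
    arch.writestr("geek.txt", "Hello from GeekPython!\n")

with open("extra.txt", "w") as f:
    f.write("Appended later\n")

# "a" keeps the existing members and adds new ones at the end
with zipfile.ZipFile("geekpython.zip", mode="a") as arch:
    arch.write("extra.txt")
    print(arch.namelist())  # ['geek.txt', 'extra.txt']
```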

Reading Metadata

Several methods help us read the metadata of a ZIP archive:

  • .getinfo(filename): It returns a ZipInfo object that holds information about the member file provided by filename.
  • .infolist(): Return a list containing a ZipInfo object for each member of the archive.
  • .namelist(): Return a list of archive members by name.

There is also .printdir(), which we have already used.

Extracting information about the files in a specified archive using .infolist()
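A possible version of that listing is sketched below. It builds a hypothetical, Deflate-compressed sample.zip so that the compressed size differs from the original size, then prints a few ZipInfo attributes (filename, date_time, file_size, compress_size) for each member.

```python
import zipfile

# Hypothetical archive; ZIP_DEFLATED makes compress_size smaller than file_size
with zipfile.ZipFile("sample.zip", mode="w", compression=zipfile.ZIP_DEFLATED) as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\r\n" * 50)
    demo.writestr("notes.txt", "Some notes\r\n")

with zipfile.ZipFile("sample.zip") as arch:
    for info in arch.infolist():
        print(f"File name:       {info.filename}")
        print(f"Modified:        {info.date_time}")
        print(f"Size (bytes):    {info.file_size}")
        print(f"Compressed size: {info.compress_size}")
        print()
```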

Let's look at some more ZipInfo attributes.
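The sketch below uses .getinfo() to fetch one ZipInfo object and prints a few more of its attributes, including .create_system. The member name hello.txt and its content are hypothetical.

```python
import zipfile

with zipfile.ZipFile("sample.zip", mode="w") as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\r\n")

with zipfile.ZipFile("sample.zip") as arch:
    info = arch.getinfo("hello.txt")  # ZipInfo for a single member
    print(info.create_system)  # system that created the archive, e.g. 3 on Unix-like systems
    print(info.compress_type)  # 0, i.e. ZIP_STORED, the default
    print(info.CRC)            # CRC-32 checksum of the uncompressed data
    print(info.is_dir())       # False for a regular file
```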

The .create_system attribute is an integer that identifies the system the archive was created on:

  • 0 – for Windows
  • 3 – for Unix

An example showing the use of .namelist():
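A minimal sketch, again with a hypothetical two-member sample.zip:

```python
import zipfile

with zipfile.ZipFile("sample.zip", mode="w") as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\r\n")
    demo.writestr("docs/notes.txt", "Some notes\r\n")

with zipfile.ZipFile("sample.zip") as arch:
    names = arch.namelist()  # member names only, in archive order

print(names)  # ['hello.txt', 'docs/notes.txt']
```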

Reading and Writing Member files

Member files are the files stored inside a ZIP archive.

To read the content of a member file without extracting it, we use .read(). It takes name, the name of the file in the archive, and an optional pwd, the password (as bytes) for encrypted files.
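Here is a sketch of .read() together with the .split() call described next. The member content is made up and deliberately uses \r\n line endings; pwd is left out because the member is not encrypted.

```python
import zipfile

with zipfile.ZipFile("sample.zip", mode="w") as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\r\nSecond line\r\nThird line")

with zipfile.ZipFile("sample.zip") as arch:
    content = arch.read("hello.txt")  # returns bytes; pass pwd=b"..." for encrypted members

# Split the bytes on the Windows-style line separator
for line in content.split(b"\r\n"):
    print(line)
```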

We've added .split() to split the stream of bytes into lines using the separator \r\n, and the b prefix is needed because we are working with a bytes object.

Besides .read(), we can use .open(), which lets us read, write, and add member files more flexibly. Like the built-in open() function, it implements the context manager protocol and therefore supports the with statement.
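As a sketch of reading through .open(): the member it returns yields bytes, so wrapping it in io.TextIOWrapper is one way (an approach chosen for this example) to iterate over it as lines of text. The member content is hypothetical.

```python
import io
import zipfile

with zipfile.ZipFile("sample.zip", mode="w") as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\nSecond line\n")

with zipfile.ZipFile("sample.zip") as arch:
    # .open() gives a binary file-like object; wrap it to read decoded text lines
    with io.TextIOWrapper(arch.open("hello.txt"), encoding="utf-8") as text:
        lines = [line.rstrip("\n") for line in text]

print(lines)  # ['Hello, ZIP!', 'Second line']
```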

We can also use .open() with mode "w" to create a new member file and write content to it. To add that member to an existing archive, the archive itself must be opened in append mode a.
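A minimal sketch of that workflow, assuming a small existing sample.zip and a hypothetical member name new_member.txt. Note that a member opened with mode="w" accepts bytes, not str.

```python
import zipfile

with zipfile.ZipFile("sample.zip", mode="w") as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\n")

# The archive must be opened in "a" (or "w"/"x") mode to write a new member
with zipfile.ZipFile("sample.zip", mode="a") as arch:
    with arch.open("new_member.txt", mode="w") as member:
        member.write(b"Written straight into the archive\n")  # expects bytes

with zipfile.ZipFile("sample.zip") as arch:
    print(arch.namelist())  # ['hello.txt', 'new_member.txt']
```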

Extracting the ZIP archive

There are two methods for extracting a ZIP archive:

  • .extractall(): extracts all members of the archive into the current working directory. We can also pass the path of a directory of our choice.

All the member files will be extracted into a folder named files in your current working directory; you can specify any other directory instead.
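The extraction described above could look like this sketch, which first builds a hypothetical sample.zip and then extracts every member into a folder named files:

```python
import zipfile

with zipfile.ZipFile("sample.zip", mode="w") as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\n")
    demo.writestr("docs/notes.txt", "Some notes\n")

with zipfile.ZipFile("sample.zip") as arch:
    # Omit path to extract into the current working directory instead
    arch.extractall(path="files")
```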

  • .extract(): extracts a single member from the archive into the current working directory. Keep in mind that the member must be given by its full name or as a ZipInfo object.

Here, hello.txt is extracted from the archive into the current working directory. To use another output directory, pass it as the path argument of extract(), for example path="output_directory/".
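A sketch of .extract() under the same kind of assumptions, showing both the default destination and a custom path (output_directory/ is just an example name):

```python
import zipfile

with zipfile.ZipFile("sample.zip", mode="w") as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\n")
    demo.writestr("docs/notes.txt", "Some notes\n")

with zipfile.ZipFile("sample.zip") as arch:
    # The member name must match exactly (a ZipInfo object also works)
    extracted = arch.extract("hello.txt")
    # Example output directory; it is created if it does not exist
    arch.extract("docs/notes.txt", path="output_directory/")

print(extracted)  # absolute path of the extracted hello.txt
```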

Creating ZIP files

Creating a ZIP file is simply a matter of writing existing files into a new archive.
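As a sketch, the loop below writes every file from a hypothetical folder my_folder into my_folder.zip. Using pathlib and the arcname argument is a choice made for this example; it keeps the paths inside the archive relative to the folder instead of including my_folder/ itself.

```python
import zipfile
from pathlib import Path

# Hypothetical folder with a few files to archive
folder = Path("my_folder")
(folder / "sub").mkdir(parents=True, exist_ok=True)
(folder / "a.txt").write_text("first file\n")
(folder / "sub" / "b.txt").write_text("second file\n")

with zipfile.ZipFile("my_folder.zip", mode="w") as arch:
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            # arcname stores the path relative to the folder, using "/" separators
            arch.write(path, arcname=path.relative_to(folder).as_posix())
```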

Alternatively, you can add files by directly specifying their full names.

Creating ZIP files using shutil

The shutil module also provides an easy way to make a ZIP archive.

shutil is a module for performing high-level file operations in Python.
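A minimal sketch of the call explained below, assuming a folder named files that is created here with one placeholder file:

```python
import shutil
from pathlib import Path

# Hypothetical folder whose contents will be archived
Path("files").mkdir(exist_ok=True)
Path("files", "hello.txt").write_text("Hello, ZIP!\n")

# base_name="archive", format="zip", root_dir="files" -> creates archive.zip
archive_path = shutil.make_archive("archive", "zip", "files")
print(archive_path)  # path of the new archive, ending in archive.zip
```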

Here, archive is the base name of the ZIP archive to create, zip is the archive format (which also adds the .zip extension to the name), and files is the folder whose contents will be archived.

Unpacking the ZIP archive using shutil
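A minimal sketch, assuming an archive.zip that is built here with one hypothetical member:

```python
import shutil
import zipfile

# Build a small archive.zip to unpack (hypothetical contents)
with zipfile.ZipFile("archive.zip", mode="w") as demo:
    demo.writestr("hello.txt", "Hello, ZIP!\n")

# Unpack into a directory named "archive"; the format is inferred from the .zip extension
shutil.unpack_archive("archive.zip", "archive")
```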

Here, archive.zip is the ZIP archive and archive is the directory it will be extracted into.

Compressing ZIP files

By default, a ZIP archive created with zipfile is actually uncompressed, because ZipFile uses the ZIP_STORED compression method unless told otherwise.

The member files are simply stored in a container without being compressed.

To compress them, we need to pass the compression argument to ZipFile.

There are three constants that select an actual compression method:

  • zipfile.ZIP_DEFLATED: uses the Deflate method and requires the zlib module.
  • zipfile.ZIP_BZIP2: uses the BZIP2 method and requires the bz2 module.
  • zipfile.ZIP_LZMA: uses the LZMA method and requires the lzma module.

We can also set a compression level with the compresslevel argument. With ZIP_DEFLATED it accepts integers from 0 to 9, where 9 gives maximum compression; with ZIP_BZIP2 the range is 1 to 9, and it has no effect with ZIP_STORED or ZIP_LZMA.
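To make the difference visible, the sketch below writes the same made-up, repetitive text once with the default ZIP_STORED and once with ZIP_DEFLATED at compresslevel=9, then compares each member's original and compressed sizes. The archive and member names are hypothetical.

```python
import zipfile

data = "ZIP files save space. " * 1000  # made-up, repetitive sample text

# Default compression: ZIP_STORED (no compression)
with zipfile.ZipFile("stored.zip", mode="w") as arch:
    arch.writestr("sample.txt", data)

# Deflate at the highest level
with zipfile.ZipFile(
    "deflated.zip", mode="w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
) as arch:
    arch.writestr("sample.txt", data)

for name in ("stored.zip", "deflated.zip"):
    with zipfile.ZipFile(name) as arch:
        info = arch.getinfo("sample.txt")
        print(f"{name}: {info.file_size} -> {info.compress_size} bytes")
```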

Did you know that zipfile can run from the command line?

Run zipfile from Command Line Interface

Here are some options that let us list, create, extract, and test ZIP archives from the command line.

-l or --list: List files in a zipfile.

It works just like .printdir().


-c or --create: Create zipfile from source files.

It creates a ZIP archive named shell.zip and adds the files specified above to it.

Creating a ZIP file to archive the entire directory


-e or --extract: Extract zipfile into the target directory.

directory.zip will be extracted into the extracted directory.


-t or --test: Test whether the zipfile is valid or not.
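To keep every example in Python, the sketch below drives the command-line interface through subprocess with the current interpreter; the shell command each call corresponds to is shown in a comment. The file names hello.txt, geek.txt, shell.zip, and extracted are assumptions for the demo.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical input files for the demo
Path("hello.txt").write_text("Hello, ZIP!\n")
Path("geek.txt").write_text("Hello from GeekPython!\n")


def run_zipfile_cli(*args):
    """Run `python -m zipfile <args>` with the current interpreter and return stdout."""
    result = subprocess.run(
        [sys.executable, "-m", "zipfile", *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Shell: python -m zipfile -c shell.zip hello.txt geek.txt
run_zipfile_cli("-c", "shell.zip", "hello.txt", "geek.txt")
# Shell: python -m zipfile -l shell.zip
print(run_zipfile_cli("-l", "shell.zip"))
# Shell: python -m zipfile -t shell.zip
print(run_zipfile_cli("-t", "shell.zip"))
# Shell: python -m zipfile -e shell.zip extracted
run_zipfile_cli("-e", "shell.zip", "extracted")
```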

Conclusion

Phew, that was a long module to cover, and this article still hasn’t covered everything.

However, it is sufficient to get started with the zipfile module and manipulate ZIP archives without extracting them.

ZIP files have several benefits: they save disk space, speed up transfers over a network, and more.

We certainly learned some useful operations that we can perform on ZIP archives with the zipfile module, such as:

  • Reading, writing, and extracting existing ZIP archives
  • Reading the metadata
  • Creating ZIP archives
  • Manipulating member files
  • Running zipfile from the command line

That’s all for now

Keep Coding✌✌