High-level Path Operations Using pathlib Module In Python

The pathlib module is a part of Python’s standard library and allows us to interact with filesystem paths and work with files using various methods and properties on the Path object.

Getting Started With pathlib

The most frequently used class of the pathlib module is Path. It is the best starting point if we are using this module for the first time or are not sure which class to use for our task.
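
A minimal sketch of this first step, assuming the script is saved as main.py and run as a file (so that `__file__` is defined):

```python
from pathlib import Path

# __file__ holds the location of the script that is currently running
# (main.py in this article).
path = Path(__file__)
print(path)
```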

In the above example, first, we imported the Path class from the pathlib module and then instantiated the Path with __file__.

This returns the absolute path to the current file, main.py, on which we are working.

The Path class instantiates the file’s concrete path for the operating system on which the user is working. Because we’re using Windows, we’ll get the following output if we print the type of path.
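
A small sketch of that check. The file name main.py is only illustrative, and the exact text printed for the class differs slightly between Python versions, so the comments only name the class we should expect:

```python
import os
from pathlib import Path, PosixPath, WindowsPath

path = Path("main.py")

# Prints the WindowsPath class on Windows and the PosixPath class elsewhere.
print(type(path))

# The concrete class we expect on the current operating system.
expected_class = WindowsPath if os.name == "nt" else PosixPath
print(type(path) is expected_class)  # True
```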

Before we get into the methods and properties of Path, it’s important to understand that the Path classes are divided into pure paths and concrete paths.

Pure class classification

Pure Paths

Pure paths enable us to manipulate the file paths of another operating system, such as manipulating the Windows path on a Unix machine or vice versa without accessing the operating system.

Pure paths only support computational operations and do not support I/O operations such as reading, writing, or manipulating files.

Pathlib’s PurePath

PurePath is the base class for pure paths: it provides path-handling operations that never touch the filesystem. Consider the example below, in which we instantiate the PurePath() class.
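
A minimal sketch of such an instantiation (main.py is an illustrative file name; it does not need to exist because pure paths never access the filesystem):

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath

path = PurePath("main.py")

# PureWindowsPath on Windows, PurePosixPath on other systems.
print(type(path).__name__)
```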

We got a PureWindowsPath() object when we ran the above code because we are on a Windows machine. On a non-Windows machine, we would get a PurePosixPath() object instead.

The PurePath() has two subclasses, which are as follows:

  • PureWindowsPath()
  • PurePosixPath()

PureWindowsPath

This subclass is implemented for Windows filesystem paths, as the name suggests.

PurePosixPath

This subclass is used for non-Windows filesystem paths.
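
Both subclasses can be instantiated on any operating system, which is what makes cross-platform path manipulation possible. A short sketch (the POSIX path /home/sachin/test.py is a made-up example):

```python
from pathlib import PurePosixPath, PureWindowsPath

# Neither call touches the filesystem, so both work on any machine.
windows_path = PureWindowsPath("D:/SACHIN/Pycharm/test.py")
posix_path = PurePosixPath("/home/sachin/test.py")

print(windows_path)  # D:\SACHIN\Pycharm\test.py
print(posix_path)    # /home/sachin/test.py
```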

PurePath Methods And Properties

PurePath provides several methods that allow us to perform various operations on filesystem paths.

Getting the drive name

The PurePath.drive can be used to extract the drive name from the specified path. We’ll get a string representing the drive name, or an empty string if no drives are present in the path.
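
A sketch with two paths, one with a drive and one without. PureWindowsPath is used so that the result is the same on every platform:

```python
from pathlib import PureWindowsPath

with_drive = PureWindowsPath("D:/SACHIN/Pycharm/test.py").drive
without_drive = PureWindowsPath("/SACHIN/Pycharm/test.py").drive

print(repr(with_drive))     # 'D:'
print(repr(without_drive))  # ''
```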

The first part of the code has a drive name in its path, which we got in the output, but the second part of the code did not, so we got an empty string.

Getting the root and stem

The root is the part of the path that represents the filesystem root (such as \ or /), if any, which we can access with PurePath.root. The stem is the last component of the file path without its suffix, which we can access with PurePath.stem.
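
For illustration, a sketch using the same Windows-style path as the rest of this article:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath("D:/SACHIN/Pycharm/test.py")

root = path.root  # the root separator, without the drive
stem = path.stem  # final component without its suffix

print(repr(root))  # '\\'
print(repr(stem))  # 'test'
```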

Getting the ancestors of the path

The PurePath.parents can be used to access the logical ancestors of the path.

By indexing PurePath.parents, we were able to access the path’s ancestors. Python 3.10 added support for slices and negative index values for PurePath.parents.

We got the full path except for the file name when we used 0, one directory further up when we used 1, and the beginning portion of the path (the drive and root) when we used -1.

Getting the parent

The PurePath.parent allows us to access the logical parent of the path.
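
A sketch that walks up the path one parent at a time:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath("D:/SACHIN/Pycharm/test.py")

parent = path.parent                      # directory containing test.py
grandparent = path.parent.parent          # directory containing Pycharm
top = path.parent.parent.parent           # the drive

print(parent)       # D:\SACHIN\Pycharm
print(grandparent)  # D:\SACHIN
print(top)          # D:\
```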

In the above example, the parent directory of test.py is Pycharm/, the parent directory of Pycharm/ is SACHIN/, and the parent directory of SACHIN/ is the drive D:/, which contains all of these directories and files.

That’s why we got this output D:\SACHIN\Pycharm.

Getting the name and suffix

PurePath.name provides access to the name of the path’s final component, while PurePath.suffix provides access to the file extension of the final component. If the file has multiple extensions, we can get the list of file extensions with PurePath.suffixes.
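
A short sketch of both properties:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath("D:/SACHIN/Pycharm/test.py")

name = path.name      # final component of the path
suffix = path.suffix  # extension of the final component

print(name)    # test.py
print(suffix)  # .py
```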

The last component of the path is test.py, and the extension is .py, which is what we got in the output.

What if our test.py file has extensions like test.py.zip? If we want to extract both extensions, we can use PurePath.suffixes.
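
A sketch for that case. Note that .suffix alone would only return the last extension:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath("D:/SACHIN/Pycharm/test.py.zip")

suffixes = path.suffixes   # every extension, in order
last_suffix = path.suffix  # only the final extension

print(suffixes)     # ['.py', '.zip']
print(last_suffix)  # .zip
```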

Check if a path is absolute

An absolute path is one that has a root and, if the path flavour allows, a drive. We can use the PurePath.is_absolute() method to determine whether or not a path is absolute; it returns a boolean value.
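
A sketch with two Windows-style and two POSIX-style paths:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows flavour: an absolute path needs both a drive and a root.
win_with_drive = PureWindowsPath("D:/SACHIN/Pycharm/test.py").is_absolute()
win_without_drive = PureWindowsPath("/SACHIN/Pycharm/test.py").is_absolute()

# POSIX flavour: only a leading slash matters; "D:" is just a directory name.
posix_rooted = PurePosixPath("/SACHIN/Pycharm/test.py").is_absolute()
posix_with_drive = PurePosixPath("D:/SACHIN/Pycharm/test.py").is_absolute()

print(win_with_drive, win_without_drive)  # True False
print(posix_rooted, posix_with_drive)     # True False
```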

Looking at the first two PureWindowsPath cases, we first get True because the path has both a drive and a root, but then we get False because the path lacks a drive.

In the PurePosixPath cases, we first got True even though the path did not have a drive name because non-Windows paths do not include drive names like Windows paths. But when we used the drive name in the path, we got False.

Combining paths

PurePath.joinpath() allows us to concatenate the path with the argument passed to it.
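
A sketch that joins two extra components onto a base path:

```python
from pathlib import PureWindowsPath

base = PureWindowsPath("D:/SACHIN")

# Each argument becomes one more component of the resulting path.
joined = base.joinpath("Pycharm", "test.py")

print(joined)  # D:\SACHIN\Pycharm\test.py
```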

Matching the path

PurePath.match() takes a glob-style pattern and matches the path against it. It returns True if the path matches and False otherwise.

Case sensitivity follows the path flavour: by default, matching is case-insensitive for PureWindowsPath and case-sensitive for PurePosixPath.
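
A sketch of a few patterns, including one that shows the difference in case sensitivity between the two flavours:

```python
from pathlib import PurePosixPath, PureWindowsPath

path = PureWindowsPath("D:/SACHIN/Pycharm/test.py")

# A relative pattern is matched from the right-hand side of the path.
matches_py = path.match("*.py")
matches_dir = path.match("Pycharm/*.py")
matches_txt = path.match("*.txt")

# Same pattern, different flavours.
windows_upper = PureWindowsPath("test.py").match("*.PY")
posix_upper = PurePosixPath("test.py").match("*.PY")

print(matches_py, matches_dir, matches_txt)  # True True False
print(windows_upper, posix_upper)            # True False
```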

Changing the name

PurePath.with_name() accepts a name argument and returns the new path with the changed file name.

If there is no name in the path, then we’ll get a ValueError.
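
A sketch of both outcomes (main.py is an illustrative new name):

```python
from pathlib import PureWindowsPath

path = PureWindowsPath("D:/SACHIN/Pycharm/test.py")
renamed = path.with_name("main.py")
print(renamed)  # D:\SACHIN\Pycharm\main.py

# A bare drive has no name component, so there is nothing to replace.
try:
    PureWindowsPath("D:/").with_name("main.py")
    name_error_raised = False
except ValueError:
    name_error_raised = True

print(name_error_raised)  # True
```

Like every pure-path operation, this only builds a new path object; no file is renamed on disk.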

Changing the stem

The PurePath.with_stem() method creates a new path with a different stem.

A ValueError is raised if the path does not have a name.
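
A sketch, assuming Python 3.9 or later (the version that introduced with_stem()); the new stem main is illustrative:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath("D:/SACHIN/Pycharm/test.py")

# Only the stem changes; the .py suffix is kept.
new_path = path.with_stem("main")
print(new_path)  # D:\SACHIN\Pycharm\main.py

try:
    PureWindowsPath("D:/").with_stem("main")
    stem_error_raised = False
except ValueError:
    stem_error_raised = True

print(stem_error_raised)  # True
```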

Changing the suffix

We can change the suffix using PurePath.with_suffix(). If the file name lacks a suffix, the provided suffix will be appended.

What happens if we supply an empty string? The file’s suffix will be removed.
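
A sketch of the three cases: replacing a suffix, appending one, and removing one (README is an illustrative file name without an extension):

```python
from pathlib import PureWindowsPath

path = PureWindowsPath("D:/SACHIN/Pycharm/test.py")

replaced = path.with_suffix(".txt")
appended = PureWindowsPath("D:/SACHIN/README").with_suffix(".md")
removed = path.with_suffix("")

print(replaced)  # D:\SACHIN\Pycharm\test.txt
print(appended)  # D:\SACHIN\README.md
print(removed)   # D:\SACHIN\Pycharm\test
```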

Concrete Paths

Concrete paths support I/O operations on filesystem paths in addition to computational operations. Unlike pure paths, concrete paths let us read files, write data to files, and create, rename, or delete files and directories.

We can make system calls on path objects thanks to concrete paths. Concrete paths are subclasses of pure path classes, and there are three ways to instantiate concrete paths:

  • Path()
  • WindowsPath()
  • PosixPath()

Pathlib’s Path

At the beginning of the article, we saw a glimpse of the Path class, which is a subclass of the PurePath class that represents the concrete path of the filesystem path.

When we instantiate the Path() class, it generates either PosixPath or WindowsPath object, depending on the machine we’re working on.
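
A sketch showing the class that Path() produces and its relationship to PurePath (main.py is an illustrative name; the file does not need to exist to create the object):

```python
from pathlib import Path, PurePath

path = Path("main.py")

class_name = type(path).__name__
is_pure = isinstance(path, PurePath)

print(class_name)  # WindowsPath on Windows, PosixPath elsewhere
print(is_pure)     # True, because Path is a subclass of PurePath
```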

The Path() created a concrete Windows path because we’re on a Windows machine.

PosixPath

PosixPath is a subclass of both Path and PurePosixPath that represents concrete non-Windows filesystem paths.

Because PosixPath makes system calls, we can’t instantiate it on our machine, which runs Windows.

We can only instantiate the class that corresponds to our system. For example, we can instantiate the WindowsPath class on Windows machines and the PosixPath class on POSIX-compliant machines.
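
A sketch that demonstrates the restriction on whichever system it runs on, by picking the concrete class that does not belong to the current platform. The variable names are illustrative:

```python
import os
from pathlib import PosixPath, WindowsPath

# Pick the concrete class that does NOT match the current system.
foreign_class = PosixPath if os.name == "nt" else WindowsPath

try:
    foreign_class("main.py")
    instantiation_failed = False
except NotImplementedError as error:
    # Newer Python versions raise a subclass of NotImplementedError here.
    instantiation_failed = True
    print(f"Cannot instantiate {foreign_class.__name__}: {error}")
```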

WindowsPath

WindowsPath is a subclass of both Path and PureWindowsPath that represents concrete Windows filesystem paths.

Path Methods

The Path class provides several methods for performing I/O operations on filesystem paths by interacting with the operating system.

Getting the current working directory and home directory

You may have used os.getcwd() to get the current working directory. Path.cwd() does the same thing, but returns a new path object for the current working directory.

We obtained the path to our current working directory, and we can see that the path separator is a backslash (\) because we are using the Windows operating system.

Path.home() returns the path to the user’s home directory. If the home directory cannot be resolved, a RuntimeError is thrown.
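
A sketch of both class methods. The printed locations depend entirely on the machine and user running the code:

```python
import os
from pathlib import Path

cwd = Path.cwd()    # same location as os.getcwd(), but as a Path object
home = Path.home()  # the current user's home directory

print(cwd)
print(home)
print(cwd == Path(os.getcwd()))  # True
```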

Accessing the components of the path

We’ve seen the PurePath properties that help us access the path’s components. Since Path is a subclass of PurePath, we can use those properties with the Path class as well.
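
A sketch with a hypothetical relative path, docs/notes/report.txt. The file does not need to exist, because these properties only inspect the path itself:

```python
from pathlib import Path

path = Path("docs/notes/report.txt")

print(path.name)    # report.txt
print(path.stem)    # report
print(path.suffix)  # .txt
print(path.parent)  # docs/notes (docs\notes on Windows)
```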

Iterating the directories

Using Path.iterdir(), we can get the path objects of the contents of the specified directory.
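
A self-contained sketch. To avoid depending on any particular project layout, it builds a throwaway directory with tempfile and fills it with an illustrative subdirectory and file before iterating:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "files").mkdir()    # a subdirectory
    (base / "main.py").touch()  # an empty file

    # iterdir() yields Path objects in arbitrary order, so sort the names.
    names = sorted(entry.name for entry in base.iterdir())

print(names)  # ['files', 'main.py']
```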

The path in the above code points to the pathlib_module directory, and we obtained the path objects of the directories and files contained within pathlib_module.

Here is another example of the .iterdir() method.

We iterated through the contents of the files directory, which is located in the current working directory.

Filesystem Modification

Creating a directory

Path.mkdir() creates a new directory at the specified path. The default mode is 0o777 (read, write, and execute permissions for all users and groups), although the process’s umask may restrict the permissions that are actually applied.
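
A sketch that creates new_dir inside a temporary directory, which stands in here for the article’s pathlib_module directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    new_dir = Path(tmp) / "new_dir"
    new_dir.mkdir()

    # Record the result while the temporary directory still exists.
    created = new_dir.is_dir()

print(created)  # True
```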

When we execute the above code, a new directory called new_dir is created in the pathlib_module directory.

If the path already exists, we will receive a FileExistsError. If we run the above code again, we’ll get the following result.

The directory is not created because the path we specified already exists. However, .mkdir() has an exist_ok parameter that, when set to True, suppresses the error.

Note: exist_ok=True only helps when the existing path is a directory. If the path’s final component is an existing non-directory file, a FileExistsError is still raised.
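
A sketch of both behaviours, again inside a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    new_dir = Path(tmp) / "new_dir"
    new_dir.mkdir()

    # Creating the same directory again fails by default.
    try:
        new_dir.mkdir()
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True

    # With exist_ok=True the existing directory is accepted silently.
    new_dir.mkdir(exist_ok=True)
    still_a_directory = new_dir.is_dir()

print(second_call_failed)  # True
print(still_a_directory)   # True
```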

Creating a file

Path.touch() creates a file at the specified path with the default mode=0o666, that is, read and write permissions for all users and groups but no execute permission (subject to the process’s umask). The exist_ok parameter defaults to True, so touching an existing file succeeds and only updates its modification time.

A file called sample.txt will be created. We’ll get a FileExistsError if we set exist_ok=False and run the code again.
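
A sketch of creating sample.txt in a temporary directory and then touching it again with exist_ok=False:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"

    sample.touch()  # creates an empty file
    file_created = sample.is_file()

    sample.touch()  # fine: exist_ok defaults to True

    try:
        sample.touch(exist_ok=False)
        touch_failed = False
    except FileExistsError:
        touch_failed = True

print(file_created)  # True
print(touch_failed)  # True
```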

Renaming the files and directories

Methods like .with_name() and .with_stem() only return a new path object with a different file name; they do not change anything on disk. To actually rename files and directories, we can use Path.rename().
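
A sketch that renames a directory called files to docs. It runs inside a temporary directory so that it does not depend on an existing project layout:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    files_dir = base / "files"
    docs_dir = base / "docs"
    files_dir.mkdir()

    # Rename files -> docs on disk.
    files_dir.rename(docs_dir)

    old_exists = files_dir.exists()
    new_exists = docs_dir.is_dir()

print(old_exists)  # False
print(new_exists)  # True
```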

The directory files will be renamed to docs. What happens if the target file or directory name already exists? On Windows, the code will raise a FileExistsError; on Unix, an existing target file is silently replaced if the user has permission.

The above code threw an error because a directory named files already exists in the project directory.

Directory tree

Removing the directory

Path.rmdir() deletes the directory specified in the path, but only if it is empty. Otherwise, an OSError is raised.

If we attempt to remove a directory that does not exist, we will receive a FileNotFoundError.
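
A sketch covering the three cases: an empty directory, a non-empty directory, and a directory that does not exist. All directory and file names are illustrative:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    empty_dir = base / "empty"
    empty_dir.mkdir()

    full_dir = base / "full"
    full_dir.mkdir()
    (full_dir / "a.txt").touch()

    # An empty directory is removed without complaint.
    empty_dir.rmdir()
    empty_removed = not empty_dir.exists()

    # A non-empty directory raises OSError.
    try:
        full_dir.rmdir()
        non_empty_error = None
    except OSError as error:
        non_empty_error = error

    # A missing directory raises FileNotFoundError.
    try:
        (base / "missing").rmdir()
        missing_error = None
    except FileNotFoundError as error:
        missing_error = error

print(empty_removed)                    # True
print(type(missing_error).__name__)     # FileNotFoundError
```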

Reading and Writing Operations

Path class provides several methods to perform reading and writing operations on the file. Assume we have a text file with some data and we want to read and write that data.

Sample text file

Opening the file

Before reading or writing data, we can open the file specified by the path with the Path class’s .open() method. You may have already used the built-in open(); this method works in the same way.
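
A sketch that writes one line through .open() and reads it back. The file name sample_file.txt follows the article, while its content and location (a temporary directory) are assumptions:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_file.txt"

    # Same modes and keyword arguments as the built-in open().
    with sample.open("w", encoding="utf-8") as file:
        file.write("Hello from pathlib\n")

    with sample.open(encoding="utf-8") as file:
        first_line = file.readline()

print(repr(first_line))  # 'Hello from pathlib\n'
```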

Reading the file

To read the content of the file specified by the path, we can use the Path.read_text() method.
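
A sketch of reading a whole file in one call. The sample file is created first with the built-in open(), and its content is made up for the example:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_file.txt"

    # Prepare a file to read.
    with open(sample, "w", encoding="utf-8") as file:
        file.write("some data\n")

    # read_text() opens the file, returns its content as str, and closes it.
    content = sample.read_text(encoding="utf-8")

print(repr(content))  # 'some data\n'
```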

Writing data to the file

The Path class provides a .write_text() method for writing text data to a file.
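
A sketch showing that .write_text() returns the number of characters written and that a second call overwrites the earlier content (the strings are illustrative):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_file.txt"

    written = sample.write_text("first\n", encoding="utf-8")

    # A second call replaces the previous content instead of appending.
    sample.write_text("second\n", encoding="utf-8")
    final_content = sample.read_text(encoding="utf-8")

print(written)              # 6
print(repr(final_content))  # 'second\n'
```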

Similarly, we can use the Path.write_bytes() method to write binary data to a file. It opens the file in binary mode.
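
A sketch that writes a byte string and reads it back (the payload is illustrative):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_file.txt"

    # write_bytes() returns the number of bytes written.
    byte_count = sample.write_bytes(b"Binary data")
    data = sample.read_bytes()

print(byte_count)  # 11
print(data)        # b'Binary data'
```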

We wrote the binary data to sample_file.txt and then read the file content back using the .read_bytes() method.

Path.read_bytes() opens the file in binary mode and returns the contents of the file as a bytes object.

Conclusion

The pathlib module provides high-level classes for manipulating file paths. These classes can be used to perform various operations on file paths as well as interact with files to perform I/O operations.

Let’s recall what we’ve learned:

  • Pure path and Concrete path classes
  • Path operations using the PurePath class
  • Path class for instantiating concrete paths
  • Methods of the Path class
  • Reading and writing files
  • Modifying the filesystem

Reference – docs.python.org/3/library/pathlib.html



That’s all for now

Keep Coding✌✌